# Quick Start

Get a monitoring stack running in 30 minutes or less.
## Prerequisites

### Required

- **Ansible**: 2.9 or higher
- **Python**: 3.6 or higher on the control node
- **Linux host**: Debian 11/12, Rocky 9, or Ubuntu 24
- **SSH access**: to target hosts, with sudo privileges
- **Network**: outbound internet access for package downloads

### Recommended

- **Two hosts**: one for the monitoring server, one for a client
- **4GB RAM**: minimum per host
- **20GB disk**: for short-term storage
## Installation

### Step 1: Install Collection

```bash
# Install from Ansible Galaxy
ansible-galaxy collection install jackaltx.solti_monitoring

# Verify installation
ansible-galaxy collection list | grep solti_monitoring
```
### Step 2: Create Inventory

```yaml
# inventory.yml
all:
  children:
    monitoring_servers:
      hosts:
        monitor1:
          ansible_host: 192.168.1.10
    monitored_clients:
      hosts:
        monitor1: # Monitor the monitoring server itself
          ansible_host: 192.168.1.10
        client1:
          ansible_host: 192.168.1.20
```
**Why monitor1 appears twice:**

- As a member of `monitoring_servers` → runs InfluxDB and Loki (storage)
- As a member of `monitored_clients` → runs Telegraf and Alloy (collectors)
- Reason: monitor the monitor! Track CPU, memory, and disk usage of the monitoring infrastructure itself
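Before deploying, it is worth confirming that Ansible can reach every host in the inventory. A quick check using Ansible's standard ad-hoc `ping` module (this assumes the SSH and sudo access from the prerequisites is already in place):

```shell
# Connectivity check against every host in the inventory
ansible -i inventory.yml all -m ping
```

Each host should report `"ping": "pong"`; fix any connection errors before continuing.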
### Step 3: Deploy Monitoring Server

```yaml
# deploy-server.yml
---
- name: Deploy Monitoring Server
  hosts: monitoring_servers
  become: true
  roles:
    - role: jackaltx.solti_monitoring.influxdb
      vars:
        influxdb_org: "myorg"
        influxdb_bucket: "telegraf"
    - role: jackaltx.solti_monitoring.loki
      vars:
        loki_local_storage: true
```
Run the playbook:
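Assuming the inventory from Step 2 and this playbook sit in the current directory, the invocation is the standard one:

```shell
# Deploy InfluxDB and Loki to the monitoring server
ansible-playbook -i inventory.yml deploy-server.yml
```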
Expected time: 5-10 minutes

What happens:

- InfluxDB deployed with auto-generated tokens
- Loki deployed with local storage
- monitor1 ready to receive metrics and logs
### Step 4: Get InfluxDB Token

After deployment, get the system-operator token:

```bash
# SSH to monitoring server
ssh monitor1

# List all tokens
sudo podman exec influxdb influx auth list --json

# Find the system-operator token
sudo podman exec influxdb influx auth list --json \
  | jq -r '.[] | select(.description == "system-operator") | .token'
```

Copy the token value - you'll need it for the next step.
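The `jq` filter above selects the entry whose `description` field is `system-operator` and prints only its token. If you want to check the filter before running it against the server, you can try it on a mocked `influx auth list --json` payload (illustrative data, not real tokens):

```shell
# Mock of `influx auth list --json` output (sample data only)
cat > /tmp/auth-sample.json <<'EOF'
[
  {"description": "admin-token",     "token": "AAAA"},
  {"description": "system-operator", "token": "BBBB"}
]
EOF

# Same jq filter as in Step 4
jq -r '.[] | select(.description == "system-operator") | .token' /tmp/auth-sample.json
# prints: BBBB
```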
### Step 5: Configure Connection Database

Create connection configurations for Telegraf outputs:

```yaml
# group_vars/all/telegraf2influxdb_configs.yml
---
telgraf2influxdb_configs:
  localhost:
    url: "http://127.0.0.1:8086"
    token: "" # Leave empty - auto-discovered when collector runs on same host as InfluxDB
    bucket: "telegraf"
    org: "myorg"
    namedrop: '["influxdb_oss"]'
  central:
    url: "http://192.168.1.10:8086"
    token: "Z5VHB6JzIEioWv9MH1_lFgTcfFy_yZR8V7ThhiA0lAdXmeS50_y-rjL8PrXNPrlG1zLziOHZwsxVkqojWaPJ4A==" # Paste token from Step 4
    bucket: "telegraf"
    org: "myorg"
```
**Important - Auto-Discovery Feature:**

- `localhost` config: token is empty (`token: ""`)
  - Why: when Telegraf runs on the same host as InfluxDB, the role automatically discovers the token from the local InfluxDB instance
  - Use case: monitor1 monitoring itself (all-in-one deployment)
- `central` config: token is required (paste from Step 4)
  - Why: remote clients cannot auto-discover tokens
  - Use case: client1 sending metrics to monitor1
Directory structure:

```text
./
├── inventory.yml
├── group_vars/
│   └── all/
│       └── telegraf2influxdb_configs.yml
├── deploy-server.yml
└── deploy-client.yml
```
### Step 6: Deploy Monitoring Clients

```yaml
# deploy-client.yml
---
- name: Deploy Telegraf on Monitor Server
  hosts: monitoring_servers
  become: true
  roles:
    - role: jackaltx.solti_monitoring.telegraf
      vars:
        telegraf_outputs: ['localhost'] # Uses localhost config (auto-discovery)

- name: Deploy Clients
  hosts: monitored_clients
  become: true
  roles:
    - role: jackaltx.solti_monitoring.telegraf
      vars:
        telegraf_outputs: ['central'] # Uses central config (requires token)
    - role: jackaltx.solti_monitoring.alloy
      vars:
        alloy_loki_endpoints:
          - label: central
            endpoint: "192.168.1.10"
```
Run the playbook:
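As in Step 3, run the playbook against the inventory from Step 2:

```shell
# Deploy Telegraf and Alloy to all monitored hosts
ansible-playbook -i inventory.yml deploy-client.yml
```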
Expected time: 3-5 minutes
What happens:
- monitor1 gets Telegraf → uses localhost config (token auto-discovered)
- client1 gets Telegraf → uses central config (token from Step 5)
- Both get Alloy → send logs to monitor1
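Before querying the server, you can sanity-check the collector services on any host with systemd's standard status command (this assumes the roles install units named after the services, which may differ in your deployment):

```shell
# Confirm the collectors are running (execute on monitor1 or client1)
systemctl status telegraf alloy --no-pager
```

Both units should show `active (running)`; if not, check `journalctl -u telegraf` or `journalctl -u alloy` before moving on.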
### Step 7: Verify Data Flow

Check metrics from monitor1 (monitoring itself):

```bash
curl -X POST "http://192.168.1.10:8086/api/v2/query?org=myorg" \
  -H "Authorization: Token YOUR_TOKEN_HERE" \
  -H "Content-Type: application/vnd.flux" \
  -d 'from(bucket:"telegraf") |> range(start:-5m) |> filter(fn:(r) => r["host"] == "monitor1") |> limit(n:5)'
```

Check metrics from client1:

```bash
curl -X POST "http://192.168.1.10:8086/api/v2/query?org=myorg" \
  -H "Authorization: Token YOUR_TOKEN_HERE" \
  -H "Content-Type: application/vnd.flux" \
  -d 'from(bucket:"telegraf") |> range(start:-5m) |> filter(fn:(r) => r["host"] == "client1") |> limit(n:5)'
```

Check logs:

```bash
curl -G "http://192.168.1.10:3100/loki/api/v1/query" \
  --data-urlencode 'query={hostname=~"monitor1|client1"}' \
  --data-urlencode 'limit=10'
```
If you see data from both hosts, congratulations! Your monitoring stack is working.
## Alternative: All-in-One Testing

For testing or single-host deployments, you can deploy everything on one host:

### Simplified Inventory

```yaml
# inventory-allinone.yml
all:
  hosts:
    monitor1:
      ansible_host: 192.168.1.10
  children:
    monitoring_servers:
      hosts:
        monitor1:
    monitored_clients:
      hosts:
        monitor1:
```
### Single Playbook

```yaml
# deploy-allinone.yml
---
- name: Deploy Monitoring Server
  hosts: monitoring_servers
  become: true
  roles:
    - role: jackaltx.solti_monitoring.influxdb
      vars:
        influxdb_org: "myorg"
        influxdb_bucket: "telegraf"
    - role: jackaltx.solti_monitoring.loki
      vars:
        loki_local_storage: true

- name: Deploy Collectors
  hosts: monitored_clients
  become: true
  roles:
    - role: jackaltx.solti_monitoring.telegraf
      vars:
        telegraf_outputs: ['localhost'] # Token auto-discovered
    - role: jackaltx.solti_monitoring.alloy
      vars:
        alloy_loki_endpoints:
          - label: localhost
            endpoint: "127.0.0.1"
```
### Run All-in-One

```bash
# Create minimal connection config
mkdir -p group_vars/all
cat > group_vars/all/telegraf2influxdb_configs.yml <<EOF
---
telgraf2influxdb_configs:
  localhost:
    url: "http://127.0.0.1:8086"
    token: "" # Auto-discovered
    bucket: "telegraf"
    org: "myorg"
    namedrop: '["influxdb_oss"]'
EOF

# Deploy everything
ansible-playbook -i inventory-allinone.yml deploy-allinone.yml
```
**What's special:**

- No manual token retrieval needed - the Telegraf role automatically discovers the token from the local InfluxDB
- Perfect for testing - quick setup to verify the collection works
- Production use - also works for small deployments where everything runs on one server
## Summary

You now have:

- ✅ InfluxDB receiving metrics
- ✅ Loki receiving logs
- ✅ Telegraf collecting system metrics
- ✅ Alloy collecting system logs
- ✅ Verified data flow
Total time: ~30 minutes
Next: Explore the rest of this documentation to customize and expand your monitoring setup.