Verification Procedures¶
Overview¶
Verification confirms that the monitoring components are collecting, storing, and serving data correctly. This page gives step-by-step checks for each component (InfluxDB, Loki, Telegraf, Alloy) and for the end-to-end data flow.
Quick Verification¶
One-Command Health Check¶
# Check all services
systemctl is-active influxdb loki telegraf alloy && \
curl -sf http://localhost:8086/health && \
curl -sf http://localhost:3100/ready && \
echo "✅ All services healthy"
InfluxDB Verification¶
1. Service Status¶
Expected: systemctl status influxdb reports active (running)
2. API Health¶
Expected: HTTP/1.1 200 OK
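The status code can also be checked without reading headers by hand. The following is a minimal sketch, assuming curl is installed; `http_status` and `expect_200` are hypothetical helper names, not part of InfluxDB.

```shell
# Print only the HTTP status code for a URL (curl prints 000 if the connection fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed only when the endpoint answers 200.
expect_200() {
  [ "$(http_status "$1")" = "200" ]
}

# Usage:
#   curl -i http://localhost:8086/health     # shows the status line and JSON body
#   expect_200 http://localhost:8086/health && echo "InfluxDB API healthy"
```

Comparing the numeric code avoids depending on the exact status-line text, which differs between HTTP/1.1 and HTTP/2.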
3. Authentication¶
Expected: JSON list of buckets
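One way to exercise token authentication is to list buckets through the InfluxDB 2.x HTTP API. This is a sketch under assumptions: `INFLUX_URL` and `INFLUX_TOKEN` are hypothetical environment variable names, and the token must have read access to buckets.

```shell
# List buckets using token authentication.
# INFLUX_URL and INFLUX_TOKEN are placeholder variable names for this example.
list_buckets() {
  # -f makes curl exit non-zero on HTTP errors such as 401 Unauthorized
  curl -s -f -H "Authorization: Token ${INFLUX_TOKEN}" \
    "${INFLUX_URL:-http://localhost:8086}/api/v2/buckets"
}

# Usage:
#   INFLUX_TOKEN=YOUR_TOKEN list_buckets
```

A non-zero exit status here usually means the token is missing, invalid, or lacks permission.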
4. Data Ingestion¶
Check recent data:
curl -X POST "http://localhost:8086/api/v2/query?org=myorg" \
-H "Authorization: Token YOUR_TOKEN" \
-H "Content-Type: application/vnd.flux" \
--data-binary 'from(bucket: "telegraf")
|> range(start: -5m)
|> limit(n: 1)'
Expected: Recent data points
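An empty response from the query above means no data arrived in the last five minutes. A small sketch for testing that automatically, assuming the default CSV response format; `has_data_rows` is a hypothetical helper name.

```shell
# Read an InfluxDB CSV query response on stdin.
# Succeed if at least a header line and one data row are present.
has_data_rows() {
  # Strip carriage returns, then count lines that are neither annotations (#...) nor blank.
  [ "$(tr -d '\r' | grep -c -v -e '^#' -e '^$')" -ge 2 ]
}

# Usage: pipe the output of the curl query above into the helper
#   curl -s -X POST ... | has_data_rows && echo "Recent data found"
```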
5. Storage Backend¶
Local storage:
S3 storage:
Loki Verification¶
1. Service Status¶
Expected: systemctl status loki reports active (running)
2. Ready Endpoint¶
Expected: curl -s http://localhost:3100/ready prints ready
3. Label Discovery¶
Expected: JSON response whose data array lists label names (service_type, hostname, etc.)
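A sketch of the label check using Loki's labels endpoint. `LOKI_URL`, `loki_labels`, and `has_label` are hypothetical names for this example, and the grep match is a rough text test rather than real JSON parsing.

```shell
# Fetch the list of known label names from Loki.
loki_labels() {
  curl -s -f "${LOKI_URL:-http://localhost:3100}/loki/api/v1/labels"
}

# Succeed if the given label name appears (quoted) in the JSON on stdin.
has_label() {
  grep -q "\"$1\""
}

# Usage:
#   loki_labels | has_label service_type && echo "service_type label present"
```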
4. Query Test¶
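curl -G "http://localhost:3100/loki/api/v1/query_range" \
--data-urlencode 'query={service_type="fail2ban"}' \
--data-urlencode 'limit=5'
Expected: JSON with log entries from the last hour (if fail2ban logs exist). The query_range endpoint is used here because newer Loki releases may reject plain log selectors on the instant query endpoint.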
5. Log Ingestion Rate¶
# Check Loki metrics
curl -s http://localhost:3100/metrics | grep loki_ingester_chunks_created_total
Expected: Non-zero, increasing counter
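To confirm the counter is actually increasing, sample it twice and compare. A minimal sketch; `sum_counter` is a hypothetical helper, and the 60-second gap is an arbitrary choice (chunks are only created periodically, so a short gap may show no change even on a healthy system).

```shell
# Sum all samples of one counter from Prometheus text-format metrics on stdin.
sum_counter() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # drop the label set, keep the metric name
      if (metric == name) total += $NF
    }
    END { printf "%d\n", total }
  '
}

# Usage:
#   before=$(curl -s http://localhost:3100/metrics | sum_counter loki_ingester_chunks_created_total)
#   sleep 60
#   after=$(curl -s http://localhost:3100/metrics | sum_counter loki_ingester_chunks_created_total)
#   [ "$after" -gt "$before" ] && echo "Counter is increasing"
```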
6. Storage Backend¶
Local storage:
S3 storage:
Telegraf Verification¶
1. Service Status¶
Expected: systemctl status telegraf reports active (running)
2. Configuration Test¶
Expected: Metrics output in InfluxDB line protocol
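Telegraf's `--test` flag runs the configured inputs once and prints the gathered metrics without writing to any output. A sketch, assuming the configuration lives at the package default path; `TELEGRAF_CONF` and `telegraf_test` are hypothetical names.

```shell
# Run Telegraf inputs once and print the gathered metrics to stdout.
# TELEGRAF_CONF is a placeholder variable; adjust the default path if your config lives elsewhere.
telegraf_test() {
  telegraf --config "${TELEGRAF_CONF:-/etc/telegraf/telegraf.conf}" --test "$@"
}

# Usage:
#   telegraf_test
#   telegraf_test --input-filter cpu     # limit the run to one input plugin
```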
3. Check for Errors¶
Expected: No errors (or only transient connection errors during startup)
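Telegraf marks error lines in its log with `E!`. The sketch below filters recent journal entries for that marker, assuming Telegraf runs under systemd and logs to the journal; `telegraf_errors` is a hypothetical helper name.

```shell
# Print Telegraf error lines from the journal; exit status is non-zero when none are found.
# The optional first argument is a journalctl --since value.
telegraf_errors() {
  journalctl -u telegraf --since "${1:-10 minutes ago}" --no-pager | grep ' E! '
}

# Usage:
#   telegraf_errors || echo "No Telegraf errors in the last 10 minutes"
#   telegraf_errors "1 hour ago"
```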
4. Verify Metrics Collection¶
Expected: CPU, memory, disk metrics
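To see at a glance which measurements Telegraf produces, the `--test` output can be reduced to measurement names. A sketch, assuming the `cpu`, `mem`, and `disk` input plugins are enabled in the configuration; `measurements` is a hypothetical helper name.

```shell
# Extract unique measurement names from `telegraf --test` output on stdin.
# Each metric line starts with "> " followed by the measurement name.
measurements() {
  sed -n 's/^> \([^, ]*\).*/\1/p' | sort -u
}

# Usage:
#   telegraf --config /etc/telegraf/telegraf.conf --test \
#     --input-filter cpu:mem:disk 2>/dev/null | measurements
```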
5. Verify Output to InfluxDB¶
Check logs for successful writes:
Or query InfluxDB for recent Telegraf data:
curl -X POST "http://localhost:8086/api/v2/query?org=myorg" \
-H "Authorization: Token YOUR_TOKEN" \
-H "Content-Type: application/vnd.flux" \
--data-binary 'from(bucket: "telegraf")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "cpu")
|> count()'
Alloy Verification¶
1. Service Status¶
Expected: systemctl status alloy reports active (running)
2. Configuration Validation¶
Expected: No errors
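One way to catch syntax errors is `alloy fmt`, which parses the file and fails if it cannot. This is a sketch under assumptions: the path is the package default, `alloy_check_syntax` is a hypothetical helper name, and `alloy fmt` checks syntax only, not whether components are wired correctly.

```shell
# Parse the Alloy configuration; exit status is non-zero on a syntax error.
# The formatted output is discarded because only the exit status matters here.
alloy_check_syntax() {
  alloy fmt "${1:-/etc/alloy/config.alloy}" > /dev/null
}

# Usage:
#   alloy_check_syntax && echo "Alloy config parses cleanly"
#   alloy_check_syntax /path/to/other.alloy
```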
3. Check for Errors¶
Expected: No errors
4. Verify Log Sources¶
Check that Alloy is reading from configured sources:
5. Verify Output to Loki¶
Check logs for successful pushes:
Or query Loki for recent logs:
curl -G "http://localhost:3100/loki/api/v1/query_range" \
--data-urlencode 'query={hostname="YOUR_HOSTNAME"}' \
--data-urlencode 'limit=5'
6. Check Alloy Metrics¶
Expected: Metrics showing write operations
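Alloy exposes its own metrics over HTTP. The sketch below assumes the default listen address of `localhost:12345` and that `loki.write` components export metrics prefixed with `loki_write_`; `ALLOY_URL` and `alloy_write_metrics` are hypothetical names.

```shell
# Print Alloy metrics related to writes toward Loki.
# Exit status is non-zero if the endpoint is unreachable or no matching metric exists.
alloy_write_metrics() {
  curl -s -f "${ALLOY_URL:-http://localhost:12345}/metrics" | grep '^loki_write_'
}

# Usage:
#   alloy_write_metrics | grep sent
```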
End-to-End Verification¶
Metrics Flow (Telegraf → InfluxDB)¶
- Generate load: Run stress or a similar tool
- Wait 15 seconds: Allow the 10-second load run to finish and Telegraf to collect and flush
- Query InfluxDB: Check for high CPU metrics
# Generate CPU load
stress --cpu 2 --timeout 10s &
# Wait for collection
sleep 15
# Query for high CPU
curl -X POST "http://localhost:8086/api/v2/query?org=myorg" \
-H "Authorization: Token YOUR_TOKEN" \
-H "Content-Type: application/vnd.flux" \
--data-binary 'from(bucket: "telegraf")
|> range(start: -1m)
|> filter(fn: (r) => r["_measurement"] == "cpu")
|> filter(fn: (r) => r["_field"] == "usage_user")
|> filter(fn: (r) => r["cpu"] == "cpu-total")
|> max()'
Logs Flow (Alloy → Loki)¶
- Generate log entry: Trigger application event
- Wait 5 seconds: Allow Alloy to ship
- Query Loki: Check for log entry
# Generate fail2ban event (example)
logger -t fail2ban "[sshd] Ban 192.0.2.1"
# Wait for shipping
sleep 5
# Query Loki
curl -G "http://localhost:3100/loki/api/v1/query_range" \
--data-urlencode 'query={service_type="fail2ban"}' \
--data-urlencode 'limit=5'
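A fixed sleep can be too short on a busy host. As an alternative sketch, the query can be retried until it succeeds; `wait_for` is a hypothetical helper, and the attempt count and delay are arbitrary values.

```shell
# Retry a command until it succeeds or the attempts run out.
# Arguments: attempts, delay in seconds, then the command to run.
wait_for() {
  local attempts=$1 delay=$2 i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (log_arrived is a placeholder for your own check, for example a
# function that runs the Loki query above and greps the response for "Ban 192.0.2.1"):
#   wait_for 6 5 log_arrived && echo "Log entry reached Loki"
```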
Verification Checklist¶
Server Components¶
- [ ] InfluxDB service running
- [ ] InfluxDB health endpoint returns 200
- [ ] InfluxDB API accessible with token
- [ ] Recent data in InfluxDB buckets
- [ ] Loki service running
- [ ] Loki ready endpoint returns "ready"
- [ ] Loki labels endpoint returns labels
- [ ] Recent logs in Loki
Client Components¶
- [ ] Telegraf service running
- [ ] Telegraf configuration test passes
- [ ] No errors in Telegraf logs
- [ ] Metrics appearing in InfluxDB
- [ ] Alloy service running
- [ ] Alloy configuration validation passes
- [ ] No errors in Alloy logs
- [ ] Logs appearing in Loki
Network Connectivity¶
- [ ] Clients can reach InfluxDB (port 8086)
- [ ] Clients can reach Loki (port 3100)
- [ ] WireGuard tunnel active (if used)
- [ ] Firewall rules allow monitoring traffic
Automated Verification Script¶
#!/bin/bash
# verify-monitoring.sh
echo "╔════════════════════════════════════════╗"
echo "║ Monitoring Stack Verification ║"
echo "╚════════════════════════════════════════╝"
echo
# Check services
echo "Checking services..."
for svc in influxdb loki telegraf alloy; do
if systemctl is-active --quiet "$svc"; then
echo " ✅ $svc running"
else
echo " ❌ $svc not running"
fi
done
echo
# Check InfluxDB
echo "Checking InfluxDB..."
if curl -s -f http://localhost:8086/health > /dev/null; then
echo " ✅ InfluxDB healthy"
else
echo " ❌ InfluxDB not healthy"
fi
# Check Loki
echo "Checking Loki..."
if curl -s -f http://localhost:3100/ready > /dev/null; then
echo " ✅ Loki ready"
else
echo " ❌ Loki not ready"
fi
echo
# Check data flow
echo "Checking data flow..."
echo " ℹ️ Query InfluxDB for recent data..."
# Add query here
echo " ℹ️ Query Loki for recent logs..."
# Add query here
echo
echo "Verification complete!"
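The script above always ends with "Verification complete!", even when a check failed. A sketch of one way to track failures and fill the "Add query here" placeholders; `check` and `failures` are hypothetical names, and the commands passed to `check` are whatever queries suit your setup.

```shell
# Count failed checks so the script can exit non-zero when something is wrong.
failures=0

# Run a command quietly and print a pass or fail line for it.
# Arguments: description, then the command to run.
check() {
  local desc=$1
  shift
  if "$@" > /dev/null 2>&1; then
    echo "  ✅ $desc"
  else
    echo "  ❌ $desc"
    failures=$((failures + 1))
  fi
}

# Usage:
#   check "InfluxDB healthy" curl -s -f http://localhost:8086/health
#   check "Loki ready" curl -s -f http://localhost:3100/ready
#   [ "$failures" -eq 0 ] || exit 1
```

Exiting non-zero lets cron jobs or CI pipelines detect a failed verification without parsing the output.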
Troubleshooting Failed Verification¶
See the Troubleshooting Guide page for specific failure scenarios and solutions.