Skip to content

Verification Procedures

Overview

Verification ensures monitoring components are collecting, storing, and serving data correctly. This page covers systematic verification procedures.

Quick Verification

One-Command Health Check

# Check all services
systemctl is-active influxdb loki telegraf alloy && \
  curl -s http://localhost:8086/health && \
  curl -s http://localhost:3100/ready && \
  echo "✅ All services healthy"

InfluxDB Verification

1. Service Status

systemctl status influxdb

Expected: active (running)

2. API Health

curl -I http://localhost:8086/health

Expected: HTTP/1.1 200 OK

3. Authentication

curl -H "Authorization: Token YOUR_TOKEN" \
  http://localhost:8086/api/v2/buckets

Expected: JSON list of buckets

4. Data Ingestion

Check recent data:

curl -X POST "http://localhost:8086/api/v2/query?org=myorg" \
  -H "Authorization: Token YOUR_TOKEN" \
  -H "Content-Type: application/vnd.flux" \
  --data-binary 'from(bucket: "telegraf")
    |> range(start: -5m)
    |> limit(n: 1)'

Expected: Recent data points

5. Storage Backend

Local storage:

du -sh /var/lib/influxdb2
ls -lh /var/lib/influxdb2/engine/data

S3 storage:

# Check InfluxDB logs for S3 operations
journalctl -u influxdb | grep -i s3

Loki Verification

1. Service Status

systemctl status loki

Expected: active (running)

2. Ready Endpoint

curl http://localhost:3100/ready

Expected: ready

3. Label Discovery

curl http://localhost:3100/loki/api/v1/labels

Expected: JSON array of labels (service_type, hostname, etc.)

4. Query Test

curl -G "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query={service_type="fail2ban"}' \
  --data-urlencode 'limit=5'

Expected: JSON with log entries (if fail2ban logs exist)

5. Log Ingestion Rate

# Check Loki metrics
curl -s http://localhost:3100/metrics | grep loki_ingester_chunks_created_total

Expected: Non-zero, increasing counter

6. Storage Backend

Local storage:

du -sh /var/lib/loki
ls -lh /var/lib/loki/chunks

S3 storage:

# Check Loki logs for S3 operations
journalctl -u loki | grep -i s3

Telegraf Verification

1. Service Status

systemctl status telegraf

Expected: active (running)

2. Configuration Test

telegraf --config /etc/telegraf/telegraf.conf --test

Expected: Metrics output in InfluxDB line protocol

3. Check for Errors

journalctl -u telegraf -n 50 | grep -i error

Expected: No errors (or only transient connection errors during startup)

4. Verify Metrics Collection

telegraf --config /etc/telegraf/telegraf.conf \
  --input-filter cpu,mem,disk --test

Expected: CPU, memory, disk metrics

5. Verify Output to InfluxDB

Check logs for successful writes:

journalctl -u telegraf -n 100 | grep -i "wrote batch"

Or query InfluxDB for recent Telegraf data:

curl -X POST "http://localhost:8086/api/v2/query?org=myorg" \
  -H "Authorization: Token YOUR_TOKEN" \
  -H "Content-Type: application/vnd.flux" \
  --data-binary 'from(bucket: "telegraf")
    |> range(start: -5m)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> count()'

Alloy Verification

1. Service Status

systemctl status alloy

Expected: active (running)

2. Configuration Validation

alloy validate /etc/alloy/config.alloy

Expected: No errors

3. Check for Errors

journalctl -u alloy -n 50 | grep -i error

Expected: No errors

4. Verify Log Sources

Check that Alloy is reading from configured sources:

journalctl -u alloy | grep -i "reading from"

5. Verify Output to Loki

Check logs for successful pushes:

journalctl -u alloy | grep -i "batch"

Or query Loki for recent logs:

curl -G "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query={hostname="YOUR_HOSTNAME"}' \
  --data-urlencode 'limit=5'

6. Check Alloy Metrics

curl http://127.0.0.1:12345/metrics | grep loki_write

Expected: Metrics showing write operations

End-to-End Verification

Metrics Flow (Telegraf → InfluxDB)

  1. Generate load: Run stress or similar tool
  2. Wait 10 seconds: Allow Telegraf to collect
  3. Query InfluxDB: Check for high CPU metrics
# Generate CPU load
stress --cpu 2 --timeout 10s &

# Wait for collection
sleep 15

# Query for high CPU
curl -X POST "http://localhost:8086/api/v2/query?org=myorg" \
  -H "Authorization: Token YOUR_TOKEN" \
  -H "Content-Type: application/vnd.flux" \
  --data-binary 'from(bucket: "telegraf")
    |> range(start: -1m)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["_field"] == "usage_system")
    |> filter(fn: (r) => r["cpu"] == "cpu-total")
    |> max()'

Logs Flow (Alloy → Loki)

  1. Generate log entry: Trigger application event
  2. Wait 5 seconds: Allow Alloy to ship
  3. Query Loki: Check for log entry
# Generate fail2ban event (example)
logger -t fail2ban "[sshd] Ban 192.0.2.1"

# Wait for shipping
sleep 5

# Query Loki
curl -G "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query={service_type="fail2ban"}' \
  --data-urlencode 'limit=5'

Verification Checklist

Server Components

  • [ ] InfluxDB service running
  • [ ] InfluxDB health endpoint returns 200
  • [ ] InfluxDB API accessible with token
  • [ ] Recent data in InfluxDB buckets
  • [ ] Loki service running
  • [ ] Loki ready endpoint returns "ready"
  • [ ] Loki labels endpoint returns labels
  • [ ] Recent logs in Loki

Client Components

  • [ ] Telegraf service running
  • [ ] Telegraf configuration test passes
  • [ ] No errors in Telegraf logs
  • [ ] Metrics appearing in InfluxDB
  • [ ] Alloy service running
  • [ ] Alloy configuration validation passes
  • [ ] No errors in Alloy logs
  • [ ] Logs appearing in Loki

Network Connectivity

  • [ ] Clients can reach InfluxDB (port 8086)
  • [ ] Clients can reach Loki (port 3100)
  • [ ] WireGuard tunnel active (if used)
  • [ ] Firewall rules allow monitoring traffic

Automated Verification Script

#!/bin/bash
# verify-monitoring.sh

echo "╔════════════════════════════════════════╗"
echo "║   Monitoring Stack Verification        ║"
echo "╚════════════════════════════════════════╝"
echo

# Check services
echo "Checking services..."
for svc in influxdb loki telegraf alloy; do
    if systemctl is-active --quiet $svc; then
        echo "  ✅ $svc running"
    else
        echo "  ❌ $svc not running"
    fi
done
echo

# Check InfluxDB
echo "Checking InfluxDB..."
if curl -s -f http://localhost:8086/health > /dev/null; then
    echo "  ✅ InfluxDB healthy"
else
    echo "  ❌ InfluxDB not healthy"
fi

# Check Loki
echo "Checking Loki..."
if curl -s http://localhost:3100/ready | grep -q "ready"; then
    echo "  ✅ Loki ready"
else
    echo "  ❌ Loki not ready"
fi
echo

# Check data flow
echo "Checking data flow..."
echo "  ℹ️  Query InfluxDB for recent data..."
# Add query here

echo "  ℹ️  Query Loki for recent logs..."
# Add query here

echo
echo "Verification complete!"

Troubleshooting Failed Verification

See "Troubleshooting Guide" page for specific failure scenarios and solutions.