Collection Monitor

This server will be "monitor11.example.com".

Overview

Production metrics and log collector running on a Proxmox VM (local infrastructure).

  • Type: Proxmox VM (local infrastructure)
  • Purpose: Production metrics/log collector (solti-monitoring reference, partial deployment)
  • Stack: InfluxDB, Loki, Telegraf, Alloy

Deployment Details

Stack Components

  • InfluxDB - Metrics storage with S3 backend
  • Loki - Log aggregation with S3 backend
  • Telegraf - Local metrics collection
  • Alloy - Log collection from remote hosts

Playbooks

  • svc-monitor11-metrics.yml - InfluxDB + Telegraf deployment
  • svc-monitor11-logs.yml - Loki + Alloy deployment

Located in: mylab/playbooks/

Configuration

Telegraf

  • Output: localhost InfluxDB
  • Metrics: System metrics, service health

InfluxDB

  • Storage Backend: S3-compatible (storage.example.com:8010)
  • Bucket: influx11
  • Retention: 30 days
  • Organization: example-org

Loki

  • Storage Backend: S3-compatible (storage.example.com:8010)
  • Bucket: loki11
  • Port: 3100

WireGuard

  • Endpoint: 10.10.0.11
  • Purpose: Secure tunnel for remote collectors
  • Clients: prod.example.com
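To confirm that a remote collector such as prod.example.com is still connected over the tunnel, the last handshake time of each peer can be inspected. The following is a minimal sketch, not part of the playbooks: the helper name stale_peers, the interface name wg0, and the 180-second threshold are all illustrative assumptions. It filters the output of "wg show <interface> latest-handshakes", which prints one "public-key<TAB>epoch-seconds" line per peer.

```shell
# stale_peers NOW MAX_AGE
# Reads "pubkey<TAB>epoch" lines on stdin and prints the public key of every
# peer whose last handshake is older than MAX_AGE seconds. A timestamp of 0
# means the peer has never completed a handshake.
stale_peers() {
  awk -v now="$1" -v max="$2" '$2 == 0 || now - $2 > max { print $1 }'
}

# Usage on monitor11 (interface name wg0 is an assumption):
#   wg show wg0 latest-handshakes | stale_peers "$(date +%s)" 180
```

An empty result means every peer has completed a handshake within the chosen window.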

Network Configuration

Internet
   └── monitor11.example.com (Proxmox VM)
        ├── InfluxDB :8086
        ├── Loki :3100
        └── WireGuard :51820
             └── 10.10.0.11 (WireGuard internal IP)
                  └── prod.example.com (10.10.0.1) sends logs/metrics

Access

  • SSH: ssh root@monitor11.example.com
  • InfluxDB API: http://monitor11.example.com:8086
  • Loki API: http://monitor11.example.com:3100
  • WireGuard: 10.10.0.11 (internal network)
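A quick way to check that the TCP endpoints listed above are reachable from a client is sketched below. The helper name check_tcp and the 3-second timeout are assumptions for illustration; the check relies on bash's /dev/tcp redirection and the coreutils timeout command. It does not apply to WireGuard on port 51820, which uses UDP.

```shell
# check_tcp HOST PORT
# Reports whether a TCP connection to HOST:PORT succeeds within 3 seconds.
check_tcp() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open ${host}:${port}"
  else
    echo "closed ${host}:${port}"
    return 1
  fi
}

# Usage:
#   check_tcp monitor11.example.com 8086   # InfluxDB API
#   check_tcp monitor11.example.com 3100   # Loki API
```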

S3 Storage

Both InfluxDB and Loki use S3-compatible storage on s3-server:

storage.example.com:8010
├── loki11/     (Loki chunks and indexes)
└── influx11/   (InfluxDB WAL and data)
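To see whether either service is actually writing objects, the buckets can be listed with any S3 client. The sketch below assumes the AWS CLI is installed and that credentials for the storage server are already configured in the environment; neither is stated in this document. The helper name list_bucket is illustrative.

```shell
# Endpoint taken from the layout above; plain HTTP matches the curl check
# used in the Troubleshooting section.
S3_ENDPOINT="http://storage.example.com:8010"

# list_bucket BUCKET
# Lists the top-level keys of BUCKET on the S3-compatible server.
# Assumes AWS CLI credentials for this server are already configured.
list_bucket() {
  aws --endpoint-url "$S3_ENDPOINT" s3 ls "s3://$1/"
}

# Usage:
#   list_bucket loki11
#   list_bucket influx11
```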

Deployment Commands

The "manage-svc.sh" script dynamically installs a role based on the inventory. These scripts are used for development deploys.

Deploy Metrics Stack

cd mylab
./manage-svc.sh -h monitor11 influxdb deploy
./manage-svc.sh -h monitor11 telegraf deploy

Deploy Logging Stack

cd mylab
./manage-svc.sh -h monitor11 loki deploy
./manage-svc.sh -h monitor11 alloy deploy

Verify Deployment

cd mylab
./svc-exec.sh -h monitor11 influxdb verify
./svc-exec.sh -h monitor11 loki verify
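The deploy and verify commands above can be chained into one run that stops at the first failure. This is a convenience sketch only; the function name deploy_all is hypothetical and not part of the repository. It calls the two scripts with exactly the arguments shown above and must be run from the mylab directory.

```shell
# deploy_all HOST
# Deploys the four roles in the order documented above, then runs the two
# verify steps. Returns non-zero as soon as any step fails.
deploy_all() {
  local host=$1 role
  for role in influxdb telegraf loki alloy; do
    ./manage-svc.sh -h "$host" "$role" deploy || return 1
  done
  for role in influxdb loki; do
    ./svc-exec.sh -h "$host" "$role" verify || return 1
  done
}

# Usage:
#   cd mylab && deploy_all monitor11
```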

Service Management

Check Services

ssh root@monitor11.example.com
systemctl status influxdb
systemctl status loki
systemctl status telegraf
systemctl status alloy
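For a one-line-per-service summary instead of four full status dumps, "systemctl is-active" can be looped over the units. A minimal sketch follows; the helper name check_units is illustrative, and the unit names are the ones used in the commands above.

```shell
# check_units UNIT...
# Prints "<unit> <state>" for each unit and returns 1 if any unit is not active.
check_units() {
  local unit state rc=0
  for unit in "$@"; do
    # is-active prints the state and exits non-zero unless the unit is active.
    state=$(systemctl is-active "$unit" 2>/dev/null) || rc=1
    printf '%-10s %s\n' "$unit" "${state:-unknown}"
  done
  return "$rc"
}

# Usage on monitor11:
#   check_units influxdb loki telegraf alloy
```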

View Logs

journalctl -u influxdb -f
journalctl -u loki -f

Troubleshooting

InfluxDB Not Starting

Check S3 connectivity:

curl http://storage.example.com:8010

Check InfluxDB logs:

journalctl -u influxdb -n 100

Loki Not Receiving Logs

Check Loki status:

curl http://localhost:3100/ready

Test query:

curl -G http://localhost:3100/loki/api/v1/query \
  --data-urlencode 'query={hostname="prod.example.com"}' \
  --data-urlencode 'limit=10'
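If the query returns nothing, it can help to rule out Loki itself by pushing a test line directly to its push API and querying for it. The sketch below assumes Loki accepts unauthenticated requests on localhost without a tenant header, which this document does not confirm. The helper names loki_payload and push_test_line are illustrative, and the hostname label simply mirrors the one used in the test query.

```shell
# loki_payload HOSTNAME EPOCH_NS LINE
# Builds the JSON body for POST /loki/api/v1/push.
# LINE must not contain double quotes or backslashes (no JSON escaping is done).
loki_payload() {
  printf '{"streams":[{"stream":{"hostname":"%s"},"values":[["%s","%s"]]}]}' \
    "$1" "$2" "$3"
}

# push_test_line HOSTNAME LINE
# Sends one log line stamped with the current time (nanosecond epoch string).
push_test_line() {
  loki_payload "$1" "$(date +%s)000000000" "$2" |
    curl -sS -H 'Content-Type: application/json' --data-binary @- \
      http://localhost:3100/loki/api/v1/push
}

# Usage:
#   push_test_line monitor11-selftest "hello from curl"
```

If the pushed line shows up in a query for its hostname label but lines from prod.example.com do not, the problem is more likely on the Alloy or WireGuard side than in Loki.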

Monitoring the Monitor

monitor11 monitors itself:

  • Telegraf collects its own metrics
  • Metrics stored locally in InfluxDB
  • No external dependencies for self-monitoring

References

TODO: these playbooks are not public yet.

  • Playbooks: mylab/playbooks/svc-monitor11-*.yml
  • Inventory: mylab/inventory.yml (monitor11 host definition)
  • CLAUDE.md: Reference Machines section