Collection Monitor
This server will be "monitor11.example.com".
Overview
Production metrics and log collector running as a Proxmox VM on local infrastructure.
- Type: Proxmox VM (local infrastructure)
- Purpose: Production metrics/log collector (solti-monitoring reference, partial deployment)
- Stack: InfluxDB, Loki, Telegraf, Alloy
Deployment Details
Stack Components
- InfluxDB - Metrics storage with S3 backend
- Loki - Log aggregation with S3 backend
- Telegraf - Local metrics collection
- Alloy - Log collection from remote hosts
Playbooks
- svc-monitor11-metrics.yml - InfluxDB + Telegraf deployment
- svc-monitor11-logs.yml - Loki + Alloy deployment
Located in: mylab/playbooks/
Configuration
Telegraf
- Output: localhost InfluxDB
- Metrics: System metrics, service health
InfluxDB
- Storage Backend: S3-compatible (storage.example.com:8010)
- Bucket: influx11
- Retention: 30 days
- Organization: example-org
Loki
- Storage Backend: S3-compatible (storage.example.com:8010)
- Bucket: loki11
- Port: 3100
WireGuard
- Internal address: 10.10.0.11 (listening on port 51820)
- Purpose: Secure tunnel for remote collectors
- Clients: prod.example.com
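To confirm that the tunnel to prod.example.com is actually up, the age of each peer's last handshake can be checked on monitor11. This is a minimal sketch, assuming the WireGuard interface is named `wg0` (hypothetical; the interface name is not documented here) and treating a handshake older than 180 seconds as stale (an assumed threshold). `classify_handshake` and `check_wg_peers` are hypothetical helpers.

```shell
# Hypothetical interface name: the WireGuard interface on monitor11 is not documented.
WG_IFACE="${WG_IFACE:-wg0}"
# Assumed staleness threshold in seconds.
WG_MAX_AGE="${WG_MAX_AGE:-180}"

# classify_handshake LAST NOW: LAST is a peer's latest-handshake Unix time
# (0 means no handshake yet); prints "never", "stale" or "fresh".
classify_handshake() {
    local last="$1" now="$2"
    if [ "$last" -eq 0 ]; then
        echo "never"
    elif [ $((now - last)) -gt "$WG_MAX_AGE" ]; then
        echo "stale"
    else
        echo "fresh"
    fi
}

# check_wg_peers: `wg show IFACE latest-handshakes` prints one
# "<public key><TAB><unix time>" line per peer; report each peer's state.
check_wg_peers() {
    local now
    now="$(date +%s)"
    wg show "$WG_IFACE" latest-handshakes | while read -r key last; do
        echo "$key $(classify_handshake "$last" "$now")"
    done
}
```

Handshakes are renewed roughly every two minutes only while traffic flows, so a "stale" result on an idle tunnel is not necessarily a fault; with prod.example.com shipping logs and metrics continuously, it usually is.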
Network Configuration
Internet
  │
  └── monitor11.example.com (Proxmox VM)
        ├── InfluxDB :8086
        ├── Loki :3100
        └── WireGuard :51820
              │
              └── 10.10.0.11 (WireGuard internal IP)
                    │
                    └── prod.example.com (10.10.0.1) sends logs/metrics
Access
- SSH: ssh root@monitor11.example.com
- InfluxDB API: http://monitor11.example.com:8086
- Loki API: http://monitor11.example.com:3100
- WireGuard: 10.10.0.11 (internal network)
S3 Storage
Both InfluxDB and Loki use S3-compatible storage on s3-server:
storage.example.com:8010
  ├── loki11/ (Loki chunks and indexes)
  └── influx11/ (InfluxDB WAL and data)
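To see how much each service is actually storing, the two buckets can be summarized with the AWS CLI pointed at the S3-compatible server. A sketch, assuming the endpoint is reached over plain `http://` (the scheme is not documented here) and that AWS CLI credentials for this server are already configured; `s3_usage` is a hypothetical helper.

```shell
# Assumed endpoint URL: the scheme (http vs https) is not documented here.
S3_ENDPOINT="${S3_ENDPOINT:-http://storage.example.com:8010}"

# s3_usage BUCKET: prints the object count and total size of one bucket.
# Assumes AWS CLI credentials for the S3-compatible server are configured.
s3_usage() {
    # --summarize appends "Total Objects" and "Total Size" as the last two lines.
    aws --endpoint-url "$S3_ENDPOINT" s3 ls "s3://$1/" \
        --recursive --summarize --human-readable | tail -n 2
}
```

Usage: `s3_usage loki11` and `s3_usage influx11`. Because `--recursive` lists every object, this can take a while on a large Loki bucket.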
Deployment Commands
The "manage-svc.sh" script dynamically installs a role based on the inventory. These scripts are used for development deploys.
Deploy Metrics Stack
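The `manage-svc.sh` invocation is not shown here. As an alternative sketch, the metrics playbook can presumably be run directly with `ansible-playbook`, assuming it is called from the directory that contains `mylab/`, that the inventory host is named `monitor11`, and that the wrapper script adds nothing the playbook depends on. `deploy_metrics` and the `ANSIBLE_PLAYBOOK` override are hypothetical, not part of the repository.

```shell
# Paths from the References section; run from the directory containing mylab/.
INVENTORY="mylab/inventory.yml"
METRICS_PLAYBOOK="mylab/playbooks/svc-monitor11-metrics.yml"
# Hypothetical override hook: point it at another command to inspect the call.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

# deploy_metrics [extra ansible-playbook options...]
# Applies the InfluxDB + Telegraf playbook to monitor11 only.
# Assumption: the inventory host is named "monitor11".
deploy_metrics() {
    "$ANSIBLE_PLAYBOOK" -i "$INVENTORY" --limit monitor11 "$@" "$METRICS_PLAYBOOK"
}
```

`deploy_metrics --check --diff` previews the changes without applying them; `deploy_metrics` on its own applies the playbook.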
Deploy Logging Stack
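A sketch for deploying the logging stack directly with `ansible-playbook`, assuming it is run from the directory containing `mylab/` and that the inventory host is named `monitor11`. It runs Ansible's check mode first and applies the Loki + Alloy playbook only if that dry run succeeds. `deploy_logs` and the `ANSIBLE_PLAYBOOK` override are hypothetical.

```shell
# Paths from the References section; run from the directory containing mylab/.
INVENTORY="mylab/inventory.yml"
LOGS_PLAYBOOK="mylab/playbooks/svc-monitor11-logs.yml"
# Hypothetical override hook: point it at another command to inspect the calls.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

# deploy_logs: dry-run the Loki + Alloy playbook with --check --diff,
# then apply it only if the dry run exited successfully.
deploy_logs() {
    "$ANSIBLE_PLAYBOOK" -i "$INVENTORY" --limit monitor11 --check --diff "$LOGS_PLAYBOOK" || return 1
    "$ANSIBLE_PLAYBOOK" -i "$INVENTORY" --limit monitor11 "$LOGS_PLAYBOOK"
}
```

Check mode can fail on a first deploy when later tasks depend on packages or files that earlier tasks would have created; in that case the second command can be run on its own.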
Verify Deployment
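A minimal reachability check for the two APIs listed under Access. It assumes InfluxDB answers on `/ping` (available in InfluxDB 1.x and 2.x, where it returns 204; the deployed major version is not stated here) and uses Loki's `/ready` endpoint. `verify_endpoints`, its helpers and `MONITOR_HOST` are hypothetical names.

```shell
MONITOR_HOST="${MONITOR_HOST:-monitor11.example.com}"

# http_status URL: prints the HTTP status code; curl prints 000 if no response arrived.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1" || true
}

# is_2xx CODE: succeeds for any 2xx status code.
is_2xx() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# verify_endpoints: probes InfluxDB (/ping) and Loki (/ready); returns 1 if either fails.
verify_endpoints() {
    local rc=0 url code
    for url in "http://$MONITOR_HOST:8086/ping" "http://$MONITOR_HOST:3100/ready"; do
        code="$(http_status "$url")"
        if is_2xx "$code"; then
            echo "OK   $code $url"
        else
            echo "FAIL $code $url"
            rc=1
        fi
    done
    return "$rc"
}
```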
Service Management
Check Services
ssh root@monitor11.example.com
systemctl status influxdb
systemctl status loki
systemctl status telegraf
systemctl status alloy
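The four status checks above can be condensed into a pass/fail summary. A sketch to run on monitor11 itself, assuming the unit names are exactly those used in the `systemctl status` commands; `check_units` and `MONITOR_UNITS` are hypothetical names.

```shell
# Unit names as used in the systemctl status commands above.
MONITOR_UNITS="influxdb loki telegraf alloy"

# check_units: prints one line per unit; returns 1 if any unit is not active.
check_units() {
    local rc=0 unit
    for unit in $MONITOR_UNITS; do
        if systemctl is-active --quiet "$unit"; then
            echo "active   $unit"
        else
            echo "INACTIVE $unit"
            rc=1
        fi
    done
    return "$rc"
}
```

For a quick remote check without defining a function, `systemctl is-active` also accepts several units and prints one state per line: `ssh root@monitor11.example.com systemctl is-active influxdb loki telegraf alloy`.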
View Logs
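Assuming the services log to the systemd journal (the default for units managed with systemctl), their logs can be read with `journalctl`. A sketch with two hypothetical helpers: one for a single unit over a time window, one that follows all four collector units as a single stream.

```shell
# show_logs UNIT [SINCE]: prints journal entries for one unit without paging.
# SINCE accepts any journalctl --since expression; defaults to the last hour.
show_logs() {
    journalctl -u "$1" --since "${2:-1 hour ago}" --no-pager
}

# follow_all: follows all four collector units interleaved (Ctrl-C to stop).
follow_all() {
    journalctl -f -u influxdb -u loki -u telegraf -u alloy
}
```

Usage: `show_logs alloy today` or `follow_all`.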
Troubleshooting
InfluxDB Not Starting
Check S3 connectivity:
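A sketch that sends one unsigned request from monitor11 to the S3 endpoint. An S3-compatible server will typically reject it (often with 403), but any HTTP status shows the endpoint is reachable, whereas curl reports `000` when no response arrived at all. The `http://` scheme is an assumption; `check_s3` and `classify_s3` are hypothetical names.

```shell
# Assumed endpoint URL: the scheme (http vs https) is not documented here.
S3_ENDPOINT="${S3_ENDPOINT:-http://storage.example.com:8010}"

# classify_s3 CODE: curl reports 000 when no HTTP response arrived at all.
# Any other status, even a 403 for this unsigned request, means the server answered.
classify_s3() {
    case "$1" in
        ""|000) echo "unreachable" ;;
        *) echo "reachable (HTTP $1)" ;;
    esac
}

# check_s3: sends one unauthenticated request to the endpoint and classifies it.
check_s3() {
    local code
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$S3_ENDPOINT/" || true)"
    classify_s3 "$code"
}
```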
Check InfluxDB logs:
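Assuming InfluxDB runs as the `influxdb` systemd unit (as in the status commands above), its recent journal can be filtered for likely failure messages. The keyword pattern is a rough guess, not a list of InfluxDB's actual log messages; `influx_errors` is a hypothetical helper.

```shell
# influx_errors [SINCE]: recent influxdb journal lines mentioning likely failures.
# The keyword list is a guess at useful terms, not InfluxDB's exact message texts.
influx_errors() {
    journalctl -u influxdb --since "${1:-1 hour ago}" --no-pager \
        | grep -iE 'error|fail|s3|object.?store|timeout|denied'
}
```

grep exits non-zero when nothing matches, so empty output with a failing status simply means no matching lines in the window.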
Loki Not Receiving Logs
Check Loki status:
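A sketch of a readiness probe run on monitor11: Loki's `/ready` endpoint returns the body `ready` once the instance can serve requests. A second helper reads the `loki_distributor_lines_received_total` counter from `/metrics`; if it stays flat while Alloy is running on the remote host, lines are not arriving. The metric name is as exposed by recent Loki versions and may differ in others; both function names are hypothetical.

```shell
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# loki_ready: /ready returns the body "ready" once Loki can serve requests.
loki_ready() {
    local body
    body="$(curl -s --max-time 5 "$LOKI_URL/ready" || true)"
    if [ "$body" = "ready" ]; then
        echo "loki: ready"
    else
        echo "loki: not ready (${body:-no response})"
        return 1
    fi
}

# loki_lines_received: prints the distributor's received-lines counter(s).
# Run it twice a minute apart; a value that does not grow means no lines arrive.
loki_lines_received() {
    curl -s --max-time 5 "$LOKI_URL/metrics" | grep '^loki_distributor_lines_received_total'
}
```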
Test query:
curl -G http://localhost:3100/loki/api/v1/query --data-urlencode 'query={hostname="prod.example.com"}' --data-urlencode 'limit=10'
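Depending on the Loki version, a bare stream selector sent to the instant `/loki/api/v1/query` endpoint may be rejected with an error saying log queries are not supported as an instant query type. The range endpoint accepts log queries; a sketch using its `since` parameter (a duration relative to the end of the window), assuming the `hostname` label is the one Alloy attaches, as in the query above. `loki_recent` is a hypothetical helper.

```shell
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# loki_recent HOST [SINCE] [LIMIT]: log lines for one hostname label via the
# range endpoint; SINCE sets the window start relative to now (default 1h).
loki_recent() {
    curl -sG "$LOKI_URL/loki/api/v1/query_range" \
        --data-urlencode "query={hostname=\"$1\"}" \
        --data-urlencode "since=${2:-1h}" \
        --data-urlencode "limit=${3:-10}"
}
```

Usage: `loki_recent prod.example.com 15m 50`. An empty `result` array in the JSON response means Loki holds no lines with that label in the window.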
Monitoring the Monitor
monitor11 monitors itself:
- Telegraf collects its own metrics
- Metrics are stored locally in InfluxDB
- No external dependencies for self-monitoring
References
TODO: these playbooks are not public yet.
- Playbooks: mylab/playbooks/svc-monitor11-*.yml
- Inventory: mylab/inventory.yml (monitor11 host definition)
- CLAUDE.md: Reference Machines section