# Integration Points

## Overview
Solti-Monitoring provides monitoring stack roles (InfluxDB, Loki, Telegraf, Alloy, Grafana) and consumes container orchestration and shared services from other collections.
## What Solti-Monitoring Provides

### Monitoring Stack Components
**InfluxDB v2 OSS**

- Time-series metrics storage
- Flux query language
- HTTP API on port 8086
- Token-based authentication

**Loki**

- Log aggregation storage
- LogQL query language
- HTTP API on port 3100
- S3-compatible object storage support

**Telegraf**

- Metrics collection agent
- 300+ input plugins (system, Docker, databases, web servers)
- Multiple output formats (InfluxDB, Prometheus, file)

**Alloy**

- Observability data pipeline
- Log parsing, filtering, and label engineering
- Reduces Loki cardinality and storage costs
- Enables predictable label schemas for programmatic queries

**Grafana (Orchestrator Only)**

- Visualization dashboards
- Multi-datasource support (InfluxDB, Loki, Prometheus)
- Programmatic dashboard creation via HTTP API
### Integration Points
All components integrate via HTTP APIs and standardized protocols:
- Telegraf → InfluxDB (HTTP, port 8086)
- Alloy → Loki (HTTP, port 3100)
- Grafana → InfluxDB/Loki (HTTP queries)
## What Solti-Monitoring Consumes

### Container Orchestration (solti-containers)
Solti-Monitoring does NOT provide container orchestration.
We consume container infrastructure from solti-containers:
- **Podman**: Rootless containers, quadlet systemd integration
- **Testing containers**: Pre-configured test targets (MySQL, Redis, etc.)
- **Molecule scenarios**: Isolated test environments
Example:
```yaml
# solti-monitoring roles deploy INTO containers
# managed by solti-containers orchestration
- hosts: monitoring_servers
  roles:
    - role: jackaltx.solti_containers.podman      # Provides container runtime
    - role: jackaltx.solti_monitoring.influxdb    # Deploys as podman container
```
### Shared Services (solti-ensemble)

**Purpose:** Infrastructure services shared across all SOLTI collections

**Integration:**

- **MariaDB**: Database for application metadata (Grafana dashboards, user data)
- **HashiVault**: Secrets management for InfluxDB tokens, Loki credentials
- **ACME/Traefik**: TLS certificates and reverse proxy for monitoring APIs
Example:
```yaml
# Use HashiVault for InfluxDB admin token
- hosts: monitoring_servers
  roles:
    - role: jackaltx.solti_ensemble.hashivault
    - role: jackaltx.solti_monitoring.influxdb
      vars:
        influxdb_admin_token: "{{ lookup('vault', 'influxdb/admin_token') }}"
```
### Documentation (solti-docs)

**Purpose:** SOLTI framework architecture and testing methodology

**Integration:**
- Testing patterns (molecule scenarios)
- Reporting standards (sprint reports in Reports/)
- Architectural philosophy (SOLTI = Systems Oriented Laboratory Testing Integration)
## External Integrations

### Grafana Dashboard Development
Programmatic dashboard creation via HTTP API:
```python
import os
import requests

# Get admin password (expand ~ explicitly; open() does not)
with open(os.path.expanduser('~/.secrets/grafana.admin.pass')) as f:
    password = f.read().strip()

# Create dashboard via API
dashboard = {
    "dashboard": {...},  # Dashboard JSON
    "message": "Created via API",
    "overwrite": True,
}
response = requests.post(
    "http://localhost:3000/api/dashboards/db",
    auth=('admin', password),
    json=dashboard,
)
```
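On success, `/api/dashboards/db` answers with the saved dashboard's `uid` and `url`, so a quick follow-up check might look like:

```python
# Fail loudly on HTTP errors, then report where Grafana stored the dashboard
response.raise_for_status()
print(response.json().get("url"))
```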
Reference: See CLAUDE.md "Grafana Dashboard Development Workflow" and "Creating Grafana Dashboards Programmatically"
### Prometheus Federation (Optional)
Telegraf can expose metrics in Prometheus format:
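A minimal `telegraf.conf` sketch for this, assuming the plugin's default port 9273 (which matches the scrape target below); `metric_version = 2` is an assumption about the desired metric naming:

```toml
# Serve collected metrics in Prometheus format at http://<host>:9273/metrics
[[outputs.prometheus_client]]
  listen = ":9273"
  metric_version = 2   # assumption: Prometheus-native metric naming
```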
Then scrape it from Prometheus:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'telegraf'
    static_configs:
      - targets: ['monitor11:9273']
```
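Before pointing Prometheus at it, the endpoint can be spot-checked by hand (hostname as in the target list above):

```bash
# Confirm Telegraf is serving Prometheus-format metrics
curl -s http://monitor11:9273/metrics | head
```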
Note: We do NOT currently deploy Prometheus in production. This is documented for potential future use.
### Elasticsearch (Not Currently Used)
Alloy can theoretically forward to Elasticsearch, but we do not currently use Elasticsearch in solti-monitoring.
If needed in the future, Alloy supports multiple outputs:
```yaml
# NOT currently deployed
alloy_additional_outputs:
  - type: elasticsearch
    endpoint: "https://es.example.com:9200"
```
## API Integration

### InfluxDB API
Write metrics programmatically:
```python
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(
    url="http://monitor11.example.com:8086",
    token="YOUR_TOKEN",
    org="example-org",
)

# Synchronous writes avoid losing batched points when a short script exits
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(
    bucket="telegraf",
    record="custom_metric,host=app1 value=42",
)
```
Query metrics:
```python
query_api = client.query_api()
query = '''
from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r["service"] == "alloy")
'''
result = query_api.query(query=query)
```
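`query()` returns a list of Flux tables; iterating their records is the usual next step:

```python
# Walk the Flux tables and print each record's time, field, and value
for table in result:
    for record in table.records:
        print(record.get_time(), record.get_field(), record.get_value())
```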
### Loki API
Query logs programmatically (used by Claude for dashboard development):
```bash
curl -s -G "http://monitor11.example.com:3100/loki/api/v1/query" \
  --data-urlencode 'query={service_type="fail2ban"}' \
  --data-urlencode 'limit=10'
```
Python example:
```python
import time
import requests

now_ns = int(time.time() * 1e9)
params = {
    'query': '{service_type="fail2ban"}',
    'time': now_ns,
}
response = requests.get(
    "http://monitor11.example.com:3100/loki/api/v1/query",
    params=params,
)
data = response.json()
```
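Loki returns matching streams, with log lines nested under `data.result[].values` as `[timestamp_ns, line]` pairs; unpacking looks roughly like:

```python
# Each result entry is one stream: a label set plus [ts_ns, line] pairs
for stream in data['data']['result']:
    labels = stream['stream']
    for ts_ns, line in stream['values']:
        print(labels.get('hostname'), ts_ns, line)
```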
Reference: See CLAUDE.md "Grafana Dashboard Development Workflow" for query testing examples
## Data Export

### InfluxDB Export
Via CLI:
```bash
# Export as CSV
influx query 'from(bucket:"telegraf") |> range(start:-1h)' --format csv > export.csv

# Backup entire database
influx backup /backup/influxdb/$(date +%Y%m%d)
```
### Loki Export
Via API:
```bash
curl -G "http://monitor11.example.com:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={hostname="ispconfig3-server.example.com"}' \
  --data-urlencode 'start=1704067200000000000' \
  --data-urlencode 'end=1704153600000000000' \
  > export.json
```
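The export has the same streams structure as the query responses above, so flattening it to plain log lines is a one-liner with `jq`:

```bash
# Extract just the raw log lines from the exported streams JSON
jq -r '.data.result[].values[][1]' export.json > export.log
```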
## AI-Assisted Integration
Claude Code programmatic access:
```text
User request
    ↓
Claude constructs Loki/InfluxDB query
    ↓
Claude fetches data via HTTP API
    ↓
Claude generates Grafana dashboard JSON
    ↓
Claude deploys via Grafana API
```
How Alloy enables this:

- Predictable label schemas (`service_type`, `hostname`, `jail`, etc.)
- Pre-parsed fields (IP addresses, actions, timestamps)
- Filtered noise (Claude queries return relevant results)
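For a sense of how that label engineering looks in practice, here is a trimmed, illustrative Alloy pipeline; the regex, label names, and the `loki.write.default` writer are assumptions for the sketch, not the shipped role configuration:

```alloy
loki.process "fail2ban" {
  // Extract fields from each fail2ban line (pattern is illustrative only)
  stage.regex {
    expression = `\[(?P<jail>\S+)\]\s+(?P<action>Ban|Unban)\s+(?P<ip>\S+)`
  }

  // Promote only low-cardinality fields to labels; the IP stays a parsed field
  stage.labels {
    values = { jail = "", action = "" }
  }
  stage.static_labels {
    values = { service_type = "fail2ban" }
  }

  forward_to = [loki.write.default.receiver]
}
```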
Example workflow:
```
# 1. Claude tests query against Loki
curl -s -G "http://monitor11.example.com:3100/loki/api/v1/query" \
  --data-urlencode 'query=sum by(jail) (count_over_time({service_type="fail2ban"} [24h]))'

# 2. Claude builds dashboard panel JSON
{
  "type": "stat",
  "targets": [{
    "expr": "sum by(jail) (count_over_time({service_type=\"fail2ban\"} [24h]))"
  }]
}

# 3. Claude deploys to Grafana
curl -X POST -u admin:$PASSWORD \
  -H "Content-Type: application/json" \
  -d @dashboard.json \
  http://localhost:3000/api/dashboards/db
```
Reference: See CLAUDE.md "AI-Assisted Observability Workflow"
## Self-Monitoring
Monitoring servers monitor themselves:
```yaml
- hosts: monitor11.example.com
  roles:
    - jackaltx.solti_monitoring.influxdb
    - jackaltx.solti_monitoring.loki
    - jackaltx.solti_monitoring.telegraf   # Monitors the server itself
    - jackaltx.solti_monitoring.alloy      # Collects server logs
  vars:
    telegraf_outputs_influxdb_endpoint: http://localhost:8086
    alloy_loki_endpoints:
      - endpoint: http://localhost:3100
```
Health checks:
```bash
# InfluxDB health
curl http://monitor11.example.com:8086/health

# Loki health
curl http://monitor11.example.com:3100/ready
```
## Backup Integration

### Automated Backups
```yaml
# InfluxDB backup (NFS mount)
- name: Backup InfluxDB data directory
  cron:
    name: "influxdb_backup"
    minute: "0"
    hour: "2"
    job: "rsync -a /mnt/nfs/influxdb/ /backup/influxdb/$(date +%Y%m%d)/"

# Loki S3 backend (already in object storage)
# No additional backup needed - MinIO handles replication
```
### Disaster Recovery

**InfluxDB:**

1. Restore NFS mount with data directory
2. Start InfluxDB container
3. Verify buckets and retention policies

**Loki:**

1. Restore S3 bucket from MinIO backup
2. Start Loki container
3. Verify log queries return data
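If the raw data directory cannot be recovered, a portable backup made with `influx backup` (see Data Export above) can be loaded back through the CLI; a minimal sketch, assuming a dated backup path like the one that backup command produces:

```bash
# Restore buckets, users, and tokens from a dated portable backup
influx restore /backup/influxdb/20250101 --full
```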
## CI/CD Integration

### Testing Pipeline
```yaml
# .github/workflows/test.yml
name: Test Monitoring Roles
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install collection
        run: ansible-galaxy collection install .
      - name: Run molecule tests
        run: |
          cd solti-monitoring
          molecule test --all
```
**Testing matrix:**

- **Platforms**: Podman (local), Proxmox (remote), GitHub Actions (CI)
- **Distros**: Rocky Linux 9, Debian 12, Ubuntu 24
- **Scenarios**: Default (single role), combined (full stack)
### Deployment Pipeline

See the orchestrator (mylab/) for automated deployment workflows:
```bash
# Test Alloy config before deploy
cd mylab
ansible-playbook playbooks/ispconfig3/91-ispconfig3-alloy-test.yml

# Deploy to production
ansible-playbook playbooks/ispconfig3/22-ispconfig3-alloy.yml
```
Reference: See CLAUDE.md "Alloy Test/Deploy Workflow"
## Current Production Integrations

### monitor11.example.com (Monitoring Server)

**Components:**

- InfluxDB (metrics storage, NFS backend)
- Loki (log storage, S3 backend via MinIO)
- Grafana (visualization, local Podman)
- WireGuard (VPN endpoint for remote collectors)

**Consumers:**

- ispconfig3-server.example.com (Telegraf + Alloy ship data via WireGuard)
### ispconfig3-server.example.com (Monitored Host)

**Components:**

- Telegraf (collects metrics)
- Alloy (collects and processes logs)

**Monitored services:**

- Apache, ISPConfig, Fail2ban, Gitea, Postfix/Dovecot, Bind9

**Shipping:**

- Metrics → monitor11:8086 (via WireGuard)
- Logs → monitor11:3100 (via WireGuard)
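In host_vars terms, that shipping setup reduces to pointing both collector roles at monitor11's WireGuard address (the 10.x address below is a placeholder, not the real tunnel IP):

```yaml
# Hypothetical host_vars for ispconfig3-server.example.com
telegraf_outputs_influxdb_endpoint: "http://10.10.0.11:8086"  # monitor11 over WireGuard (placeholder IP)
alloy_loki_endpoints:
  - endpoint: "http://10.10.0.11:3100"
```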
## Next Steps
- Chapter 2: Quick Start - Deployment patterns
- Chapter 4: Component Guides - Individual role documentation
- Chapter 7: Dashboard Development - Programmatic dashboard creation
- Chapter 8: Testing - Molecule scenarios and validation