InfluxDB
Overview¶
InfluxDB is a time-series database optimized for storing and querying metrics data. The solti-monitoring collection uses InfluxDB v2 OSS for metrics storage.
Important: InfluxDB v2 OSS does NOT support S3 object storage. Use NFS mounts for shared/scalable storage.
Purpose¶
- Store time-series metrics from Telegraf collectors
- Provide query API for Grafana dashboards
- Support retention policies for data lifecycle management
- Enable shared storage via NFS mounts
Installation¶
The influxdb role deploys InfluxDB as a Podman container (consumed from solti-containers) using systemd quadlets:
- role: jackaltx.solti_monitoring.influxdb
  vars:
    influxdb_admin_token: "{{ vault_influxdb_token }}"
    influxdb_org: "example-org"
    influxdb_bucket: "telegraf"
    influxdb_retention: "30d"
Note: This role configures InfluxDB within a container. The Podman container runtime is provided by the solti-containers collection.
Key Configuration Options¶
Basic Configuration¶
influxdb_version: "2.7" # InfluxDB version
influxdb_port: 8086 # HTTP API port
influxdb_admin_user: "admin" # Admin username
influxdb_admin_token: "{{ vault_token }}" # Admin API token (from vault)
influxdb_org: "example-org" # Organization name
influxdb_bucket: "telegraf" # Default bucket
influxdb_retention: "30d" # Data retention period
Storage Options¶
Local storage (default):
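A minimal sketch of the relevant variable, using the default path shown in the basic deployment below:

influxdb_data_path: "/var/lib/influxdb2"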
NFS mount (production):
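A sketch pointing the data path at an existing NFS mount (the mount itself is created separately, as in the NFS-backed playbook below):

influxdb_data_path: "/mnt/nfs/influxdb"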
Why NFS:
- InfluxDB v2 OSS does NOT support S3 object storage
- NFS enables shared storage across multiple hosts (if needed)
- NFS supports simple backup via snapshots
- Cost-effective for medium-scale deployments
Container Configuration¶
Deployed via Podman with systemd quadlets (quadlet files are managed by the role):
influxdb_container_name: "influxdb"
influxdb_image: "docker.io/influxdb:2.7"
influxdb_restart_policy: "always"
Container Runtime: Provided by the solti-containers Podman role
Deployment¶
Basic Deployment (Local Storage)¶
Deploy InfluxDB on a single host:
---
- name: Deploy InfluxDB server
  hosts: monitoring_servers
  become: true
  roles:
    - role: jackaltx.solti_containers.podman # Provides container runtime
    - role: jackaltx.solti_monitoring.influxdb
      vars:
        influxdb_admin_token: "{{ vault_influxdb_token }}"
        influxdb_org: "example-org"
        influxdb_bucket: "telegraf"
        influxdb_data_path: "/var/lib/influxdb2"
Run deployment:
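For example, assuming the playbook above is saved as deploy-influxdb.yml and your inventory is inventory.yml (both filenames are placeholders):

ansible-playbook -i inventory.yml deploy-influxdb.yml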
NFS-Backed Deployment (Production)¶
Deploy with NFS storage backend:
---
- name: Deploy InfluxDB with NFS storage
  hosts: monitoring_servers
  become: true
  pre_tasks:
    # Mount NFS first (pre_tasks run before roles)
    - name: Mount NFS volume
      ansible.posix.mount:
        src: "nas.example.com:/export/influxdb"
        path: "/mnt/nfs/influxdb"
        fstype: nfs
        opts: "rw,sync"
        state: mounted
  roles:
    - role: jackaltx.solti_containers.podman
    - role: jackaltx.solti_monitoring.influxdb
      vars:
        influxdb_admin_token: "{{ vault_influxdb_token }}"
        influxdb_org: "example-org"
        influxdb_bucket: "telegraf"
        influxdb_data_path: "/mnt/nfs/influxdb"
        influxdb_retention: "30d"
Production Example (monitor11):
influxdb_data_path: "/mnt/nfs-s3-server/influxdb"
influxdb_retention: "30d"
influxdb_org: "example-org"
influxdb_bucket: "telegraf"
Service Management¶
Systemd Quadlet¶
InfluxDB runs as a Podman container managed by systemd:
# Check status
systemctl status influxdb
# Start/stop/restart
systemctl start influxdb
systemctl stop influxdb
systemctl restart influxdb
# View logs
journalctl -u influxdb -f
# Check container
podman ps | grep influxdb
Health Check¶
Verify InfluxDB is running:
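Query the health endpoint (the same endpoint used in the Troubleshooting section):

curl http://localhost:8086/health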
Expected response:
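A healthy instance returns JSON along these lines (version and commit will vary):

{"name": "influxdb", "message": "ready for queries and writes", "status": "pass", "checks": [], "version": "2.7.x", "commit": "..."}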
API Access¶
Authentication¶
All API requests require token authentication:
# Using token authentication
curl -H "Authorization: Token YOUR_TOKEN" \
http://localhost:8086/api/v2/buckets
Creating Tokens¶
Create additional tokens for Telegraf clients:
# Via influx CLI (inside container)
podman exec influxdb influx auth create \
--org example-org \
--description "Telegraf write token for ispconfig3" \
--write-bucket telegraf
Or via API:
curl -X POST http://localhost:8086/api/v2/authorizations \
  -H "Authorization: Token ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "orgID": "ORG_ID",
    "description": "Telegraf write token",
    "permissions": [
      {"action": "write", "resource": {"type": "buckets", "name": "telegraf"}}
    ]
  }'
Query API¶
Query metrics using Flux:
curl -X POST "http://localhost:8086/api/v2/query?org=example-org" \
  -H "Authorization: Token YOUR_TOKEN" \
  -H "Content-Type: application/vnd.flux" \
  -d 'from(bucket: "telegraf")
    |> range(start: -1h)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["host"] == "ispconfig3-server.example.com")'
Python Example (used by Claude for dashboard development):
from influxdb_client import InfluxDBClient

client = InfluxDBClient(
    url="http://monitor11.example.com:8086",
    token="YOUR_TOKEN",
    org="example-org"
)

query_api = client.query_api()

query = '''
from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r["service"] == "alloy")
  |> filter(fn: (r) => r["_field"] == "alloy_build_info")
'''

result = query_api.query(query=query)
Retention Policies¶
Bucket Retention¶
Set retention when creating buckets:
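A sketch using the influx CLI inside the container (bucket name, org, and retention period are illustrative):

podman exec influxdb influx bucket create \
  --name telegraf \
  --org example-org \
  --retention 30d \
  -t YOUR_TOKEN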
Update Retention¶
Modify existing bucket retention:
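A sketch with the influx CLI; replace BUCKET_ID with the ID reported by the list command in the next section:

podman exec influxdb influx bucket update \
  --id BUCKET_ID \
  --retention 60d \
  -t YOUR_TOKEN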
List Buckets¶
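List buckets with the influx CLI, or via the API call shown under Authentication:

podman exec influxdb influx bucket list --org example-org -t YOUR_TOKEN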
Resource Requirements¶
Minimum Requirements¶
- CPU: 2 cores
- Memory: 2GB RAM
- Disk: 50GB+ (local) or NFS mount
- Network: 100 Mbps
Sizing Guidance¶
Small deployment (1-10 hosts, current production):
- 2 CPU cores
- 4GB RAM
- 100GB NFS mount
- 30-day retention
Medium deployment (10-50 hosts):
- 4 CPU cores
- 8GB RAM
- 250GB NFS mount
- 60-day retention
Large deployment (50+ hosts):
- Consider deploying multiple regional hubs (see Deployment Patterns)
- Each hub: 4 CPU, 8GB RAM, 250GB NFS
- Distributed deployment preferred over single large instance
Backup and Recovery¶
NFS Snapshot Backup¶
If using NFS storage, leverage NFS server snapshots:
# On NFS server (e.g., FreeNAS/TrueNAS)
# Create snapshot
zfs snapshot tank/influxdb@backup-2026-01-11
# List snapshots
zfs list -t snapshot tank/influxdb
Manual Export/Import¶
# Export specific bucket
podman exec influxdb influx backup /tmp/backup-telegraf -t YOUR_TOKEN
# Copy out of container
podman cp influxdb:/tmp/backup-telegraf /backup/influxdb/
# Restore to new instance
podman cp /backup/influxdb/backup-telegraf influxdb:/tmp/
podman exec influxdb influx restore /tmp/backup-telegraf
Disaster Recovery with NFS¶
- Deploy a new InfluxDB instance
- Mount the same NFS volume to /mnt/nfs/influxdb
- Start the InfluxDB container - data is automatically available
Performance Tuning¶
Cache Settings¶
Adjust cache sizes based on available memory:
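A sketch of the relevant influxd settings, expressed as INFLUXD_* environment variables for the container (byte values are illustrative; how they reach the quadlet depends on your role variables):

# ~2 GiB write cache, ~64 MiB snapshot threshold (illustrative)
INFLUXD_STORAGE_CACHE_MAX_MEMORY_SIZE=2147483648
INFLUXD_STORAGE_CACHE_SNAPSHOT_MEMORY_SIZE=67108864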
Compaction¶
Configure compaction for I/O optimization:
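A sketch of the compaction-related influxd settings, again as environment variables with illustrative values:

# Cap concurrent compactions and compaction throughput burst (bytes)
INFLUXD_STORAGE_MAX_CONCURRENT_COMPACTIONS=2
INFLUXD_STORAGE_COMPACT_THROUGHPUT_BURST=50331648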
Note: When using NFS storage, monitor NFS server I/O during compaction.
Monitoring InfluxDB¶
Internal Metrics¶
InfluxDB exposes Prometheus-format metrics:
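Scrape them directly to verify (a token header is included in case the endpoint is protected in your setup):

curl -H "Authorization: Token YOUR_TOKEN" http://localhost:8086/metrics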
Monitor with Telegraf¶
InfluxDB can monitor itself:
# In telegraf config on monitoring server
[[inputs.http]]
  urls = ["http://localhost:8086/metrics"]
  data_format = "prometheus"
  tagexclude = ["url"]
Key Metrics to Monitor¶
- http_api_requests_total - API request count
- storage_compactions_active - Active compactions
- storage_cache_disk_bytes - Cache disk usage
- http_query_request_duration_seconds - Query latency
- storage_wal_size_current_bytes - WAL size (write-ahead log)
Troubleshooting¶
Check Container Status¶
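A quick check via Podman:

# Confirm the container is up and inspect recent logs
podman ps | grep influxdb
podman logs --tail 50 influxdb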
Check Service Status¶
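And via systemd:

# Check the quadlet-managed unit and recent journal entries
systemctl status influxdb
journalctl -u influxdb --since "1 hour ago"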
Verify API Access¶
# Health check
curl -I http://localhost:8086/health
# Ping endpoint
curl http://localhost:8086/api/v2/ping
# Verify authentication
curl -H "Authorization: Token YOUR_TOKEN" \
http://localhost:8086/api/v2/buckets
Common Issues¶
- Container won't start
  - Check logs: podman logs influxdb
  - Verify data path exists and is writable
  - Check NFS mount if using NFS
- API not accessible
  - Verify port 8086 is open: ss -tulpn | grep 8086
  - Check firewall rules
  - Verify container is running: podman ps
- Authentication errors
  - Verify token is correct
  - Check token permissions: podman exec influxdb influx auth list
- Out of disk space
  - Check NFS mount usage: df -h /mnt/nfs/influxdb
  - Reduce retention period
  - Delete old buckets
- High memory usage
  - Reduce cache sizes
  - Add more RAM
  - Check for expensive queries
- NFS mount issues
  - Verify NFS server is reachable: ping nas.example.com
  - Check mount status: mount | grep influxdb
  - Test NFS write: touch /mnt/nfs/influxdb/test
Security Considerations¶
- Token Security
  - Store admin tokens in Ansible Vault
  - Use minimal permissions for client tokens (write-only for Telegraf)
  - Rotate tokens regularly
- Network Access
  - Restrict port 8086 to the monitoring network
  - Use WireGuard VPN for remote collectors
  - Consider firewall rules limiting source IPs
- TLS/SSL (Optional)
  - InfluxDB supports TLS natively (not currently configured)
  - Alternative: use a reverse proxy (Traefik from solti-ensemble)
- Container Security
  - Run as a rootless Podman container
  - Limit container capabilities
  - Keep the InfluxDB image updated
- Backup Encryption
  - Encrypt the NFS volume at rest (FreeNAS/TrueNAS encryption)
  - Encrypt exported backups before off-site storage
Container Integration¶
This role CONSUMES containers from solti-containers:
- Podman runtime: Provided by the jackaltx.solti_containers.podman role
- Systemd quadlets: This role generates quadlet files; Podman manages the container lifecycle
- Testing: Uses solti-containers test containers in molecule scenarios
This role CONFIGURES the InfluxDB container:
- Generates InfluxDB config files
- Sets up data directories and mounts
- Creates systemd quadlet definitions
- Manages tokens and buckets
Example integration:
- hosts: monitoring_servers
  roles:
    # 1. Provide container runtime
    - role: jackaltx.solti_containers.podman

    # 2. Configure InfluxDB within container
    - role: jackaltx.solti_monitoring.influxdb
      vars:
        influxdb_data_path: "/mnt/nfs/influxdb"
Reference Deployment¶
Production (monitor11.example.com):
influxdb_org: "example-org"
influxdb_bucket: "telegraf"
influxdb_retention: "30d"
influxdb_data_path: "/mnt/nfs-s3-server/influxdb"
influxdb_port: 8086
Characteristics:
- Proxmox VM (4 CPU, 8GB RAM)
- NFS mount to FreeNAS (s3-server.example.com)
- Serves 10 monitored hosts (ispconfig3 + local services)
- WireGuard VPN for remote collection
See also:
- Chapter 2: Deployment Patterns - Hub-and-spoke architecture
- Chapter 6: Reference Deployments - monitor11 detailed configuration