InfluxDB

Overview

InfluxDB is a time-series database optimized for storing and querying metrics data. The solti-monitoring collection uses InfluxDB v2 OSS for metrics storage.

Important: InfluxDB v2 OSS does NOT support S3 object storage. Use NFS mounts for shared/scalable storage.

Purpose

  • Store time-series metrics from Telegraf collectors
  • Provide query API for Grafana dashboards
  • Support retention policies for data lifecycle management
  • Enable shared storage via NFS mounts

Installation

The influxdb role deploys InfluxDB as a Podman container using systemd quadlets (the container runtime is consumed from the solti-containers collection):

- role: jackaltx.solti_monitoring.influxdb
  vars:
    influxdb_admin_token: "{{ vault_influxdb_token }}"
    influxdb_org: "example-org"
    influxdb_bucket: "telegraf"
    influxdb_retention: "30d"

Note: This role configures InfluxDB within a container. The Podman container runtime is provided by solti-containers collection.

Key Configuration Options

Basic Configuration

influxdb_version: "2.7"                    # InfluxDB version
influxdb_port: 8086                        # HTTP API port
influxdb_admin_user: "admin"               # Admin username
influxdb_admin_token: "{{ vault_token }}"  # Admin API token (from vault)
influxdb_org: "example-org"                # Organization name
influxdb_bucket: "telegraf"                # Default bucket
influxdb_retention: "30d"                  # Data retention period

Storage Options

Local storage (default):

influxdb_data_path: "/var/lib/influxdb2"

NFS mount (production):

influxdb_data_path: "/mnt/nfs/influxdb"

Why NFS:

  • InfluxDB v2 OSS does NOT support S3 object storage
  • NFS enables shared storage across multiple hosts (if needed)
  • NFS supports simple backup via snapshots
  • Cost-effective for medium-scale deployments

Container Configuration

Deployed via Podman with systemd quadlets (quadlet files managed by role):

influxdb_container_name: "influxdb"
influxdb_image: "docker.io/influxdb:2.7"
influxdb_restart_policy: "always"

Container Runtime: Provided by solti-containers Podman role

Deployment

Basic Deployment (Local Storage)

Deploy InfluxDB on a single host:

---
- name: Deploy InfluxDB server
  hosts: monitoring_servers
  become: true

  roles:
    - role: jackaltx.solti_containers.podman  # Provides container runtime
    - role: jackaltx.solti_monitoring.influxdb
      vars:
        influxdb_admin_token: "{{ vault_influxdb_token }}"
        influxdb_org: "example-org"
        influxdb_bucket: "telegraf"
        influxdb_data_path: "/var/lib/influxdb2"

Run deployment:

ansible-playbook -i inventory.yml deploy-influxdb.yml

NFS-Backed Deployment (Production)

Deploy with NFS storage backend:

---
- name: Deploy InfluxDB with NFS storage
  hosts: monitoring_servers
  become: true

  tasks:
    # Mount NFS first
    - name: Mount NFS volume
      ansible.posix.mount:
        src: "nas.example.com:/export/influxdb"
        path: "/mnt/nfs/influxdb"
        fstype: nfs
        opts: "rw,sync"
        state: mounted

  roles:
    - role: jackaltx.solti_containers.podman
    - role: jackaltx.solti_monitoring.influxdb
      vars:
        influxdb_admin_token: "{{ vault_influxdb_token }}"
        influxdb_org: "example-org"
        influxdb_bucket: "telegraf"
        influxdb_data_path: "/mnt/nfs/influxdb"
        influxdb_retention: "30d"

Production Example (monitor11):

influxdb_data_path: "/mnt/nfs-s3-server/influxdb"
influxdb_retention: "30d"
influxdb_org: "example-org"
influxdb_bucket: "telegraf"

Service Management

Systemd Quadlet

InfluxDB runs as a Podman container managed by systemd:

# Check status
systemctl status influxdb

# Start/stop/restart
systemctl start influxdb
systemctl stop influxdb
systemctl restart influxdb

# View logs
journalctl -u influxdb -f

# Check container
podman ps | grep influxdb

Health Check

Verify InfluxDB is running:

curl -I http://localhost:8086/health

Expected response:

HTTP/1.1 200 OK
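If automation needs to wait for the service to come up (for example, right after restarting the container), a small readiness poll against /health can help. A minimal Python sketch, assuming a local instance on the default port; the URL and retry numbers are illustrative:

```python
import time
import urllib.request
import urllib.error

HEALTH_URL = "http://localhost:8086/health"  # assumed local instance

def backoff_delays(retries, base=1.0, cap=30.0):
    """Exponential backoff delays: 1, 2, 4, ... seconds, capped at `cap`."""
    return [min(base * 2**i, cap) for i in range(retries)]

def wait_for_influxdb(url=HEALTH_URL, retries=5):
    """Poll the /health endpoint until it returns 200 or retries run out."""
    for delay in backoff_delays(retries):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass
        time.sleep(delay)
    return False
```

wait_for_influxdb() returns True as soon as the endpoint answers 200, so it can gate subsequent provisioning steps.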

API Access

Authentication

All API requests require token authentication:

# Using token authentication
curl -H "Authorization: Token YOUR_TOKEN" \
  http://localhost:8086/api/v2/buckets

Creating Tokens

Create additional tokens for Telegraf clients:

# Via influx CLI (inside container)
podman exec influxdb influx auth create \
  --org example-org \
  --description "Telegraf write token for ispconfig3" \
  --write-bucket telegraf

Or via API:

curl -X POST http://localhost:8086/api/v2/authorizations \
  -H "Authorization: Token ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "orgID": "ORG_ID",
    "description": "Telegraf write token",
    "permissions": [
      {"action": "write", "resource": {"type": "buckets", "name": "telegraf"}}
    ]
  }'
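When scripting token creation, the request body can be assembled and checked before sending. A minimal sketch of building the same payload as the curl call above; ORG_ID is a placeholder, not a value from a real deployment:

```python
import json

def telegraf_write_auth_payload(org_id, bucket_name, description):
    """Build the /api/v2/authorizations request body for a
    write-only token scoped to a single bucket."""
    return {
        "orgID": org_id,
        "description": description,
        "permissions": [
            {
                "action": "write",
                "resource": {"type": "buckets", "name": bucket_name},
            }
        ],
    }

payload = telegraf_write_auth_payload("ORG_ID", "telegraf", "Telegraf write token")
body = json.dumps(payload)  # send as the POST body with Content-Type: application/json
```

Keeping the permission list to a single write action on one bucket follows the least-privilege guidance in the Security Considerations section.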

Query API

Query metrics using Flux:

curl -X POST http://localhost:8086/api/v2/query \
  -H "Authorization: Token YOUR_TOKEN" \
  -H "Content-Type: application/vnd.flux" \
  -d 'from(bucket: "telegraf")
    |> range(start: -1h)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["host"] == "ispconfig3-server.example.com")'

Python Example (used by Claude for dashboard development):

from influxdb_client import InfluxDBClient

# Connect to the InfluxDB v2 API
client = InfluxDBClient(
    url="http://monitor11.example.com:8086",
    token="YOUR_TOKEN",
    org="example-org"
)

# Run a Flux query against the telegraf bucket
query_api = client.query_api()
query = '''
from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r["service"] == "alloy")
  |> filter(fn: (r) => r["_field"] == "alloy_build_info")
'''
result = query_api.query(query=query)
client.close()

Retention Policies

Bucket Retention

Set retention when creating buckets:

podman exec influxdb influx bucket create \
  --name telegraf \
  --org example-org \
  --retention 30d

Update Retention

Modify existing bucket retention:

podman exec influxdb influx bucket update \
  --name telegraf \
  --retention 90d

List Buckets

podman exec influxdb influx bucket list --org example-org
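The CLI accepts duration strings like 30d, while the raw v2 API expects a retentionRules entry with an everySeconds value. A small conversion sketch; handling only single-unit strings is an assumption for brevity:

```python
# Seconds per unit for the duration suffixes used above.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def retention_to_seconds(retention):
    """Convert a duration string like '30d' or '12h' to seconds,
    as expected by the API's retentionRules[].everySeconds."""
    value, unit = int(retention[:-1]), retention[-1]
    if unit not in UNITS:
        raise ValueError(f"unknown retention unit: {unit!r}")
    return value * UNITS[unit]
```

For example, the production retention of "30d" converts to 2592000 seconds.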

Resource Requirements

Minimum Requirements

  • CPU: 2 cores
  • Memory: 2GB RAM
  • Disk: 50GB+ (local) or NFS mount
  • Network: 100 Mbps

Sizing Guidance

Small deployment (1-10 hosts, current production):

  • 2 CPU cores
  • 4GB RAM
  • 100GB NFS mount
  • 30-day retention

Medium deployment (10-50 hosts):

  • 4 CPU cores
  • 8GB RAM
  • 250GB NFS mount
  • 60-day retention

Large deployment (50+ hosts):

  • Consider deploying multiple regional hubs (see Deployment Patterns)
  • Each hub: 4 CPU, 8GB RAM, 250GB NFS
  • Distributed deployment preferred over single large instance
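A rough capacity check behind these tiers: disk grows with host count, write rate, and retention. The bytes-per-point figure below is an assumption (TSM compression often lands at a few bytes per point; real usage varies with series cardinality), so leave generous headroom:

```python
def estimate_disk_gb(hosts, points_per_host_per_sec=10,
                     bytes_per_point=3, retention_days=30):
    """Back-of-envelope on-disk estimate in GB. All rate and
    compression figures are assumptions, not measured values."""
    seconds = retention_days * 86400
    total_bytes = hosts * points_per_host_per_sec * bytes_per_point * seconds
    return total_bytes / 1e9
```

For 10 hosts at these assumed rates the raw estimate is well under the 100GB tier above; the gap is deliberate headroom for cardinality growth, compaction overhead, and the WAL.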

Backup and Recovery

NFS Snapshot Backup

If using NFS storage, leverage NFS server snapshots:

# On NFS server (e.g., FreeNAS/TrueNAS)
# Create snapshot
zfs snapshot tank/influxdb@backup-2026-01-11

# List snapshots
zfs list -t snapshot tank/influxdb

Manual Export/Import

# Export a specific bucket (omit --bucket to back up everything)
podman exec influxdb influx backup /tmp/backup-telegraf --bucket telegraf -t YOUR_TOKEN

# Copy out of container
podman cp influxdb:/tmp/backup-telegraf /backup/influxdb/

# Restore to new instance
podman cp /backup/influxdb/backup-telegraf influxdb:/tmp/
podman exec influxdb influx restore /tmp/backup-telegraf

Disaster Recovery with NFS

  1. Deploy new InfluxDB instance
  2. Mount same NFS volume to /mnt/nfs/influxdb
  3. Start InfluxDB container - data automatically available

Performance Tuning

Cache Settings

Adjust cache sizes based on available memory:

influxdb_cache_max_memory: "1g"
influxdb_cache_snapshot_memory: "25m"

Compaction

Configure compaction for I/O optimization:

influxdb_compaction_throughput: "48m"
influxdb_max_concurrent_compactions: 4

Note: When using NFS storage, monitor NFS server I/O during compaction.

Monitoring InfluxDB

Internal Metrics

InfluxDB exposes Prometheus-format metrics:

curl http://localhost:8086/metrics

Monitor with Telegraf

InfluxDB can monitor itself:

# In telegraf config on monitoring server
[[inputs.http]]
  urls = ["http://localhost:8086/metrics"]
  data_format = "prometheus"
  tagexclude = ["url"]

Key Metrics to Monitor

  • http_api_requests_total - API request count
  • storage_compactions_active - Active compactions
  • storage_cache_disk_bytes - Cache disk usage
  • http_query_request_duration_seconds - Query latency
  • storage_wal_size_current_bytes - WAL size (write-ahead log)
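To check these counters from a script, the Prometheus text format returned by /metrics can be parsed with a few lines. A simplified sketch that treats the label set as an opaque string; the sample data is illustrative, not real InfluxDB output:

```python
def parse_prom_lines(text):
    """Parse simple 'name{labels} value' lines from Prometheus text
    format, skipping comment lines; labels stay part of the key."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_part, _, value = line.rpartition(" ")
        metrics[name_part] = float(value)
    return metrics

sample = """# HELP http_api_requests_total request count
http_api_requests_total{handler="platform"} 1024
storage_compactions_active 2
"""
m = parse_prom_lines(sample)
```

A real checker would fetch the text from http://localhost:8086/metrics and alert when, say, storage_compactions_active stays elevated.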

Troubleshooting

Check Container Status

podman ps -a | grep influxdb
podman logs influxdb --tail 100

Check Service Status

systemctl status influxdb
journalctl -u influxdb -n 100 --no-pager

Verify API Access

# Health check
curl -I http://localhost:8086/health

# Ping endpoint (returns 204 No Content)
curl -i http://localhost:8086/ping

# Verify authentication
curl -H "Authorization: Token YOUR_TOKEN" \
  http://localhost:8086/api/v2/buckets

Common Issues

  1. Container won't start
     • Check logs: podman logs influxdb
     • Verify data path exists and is writable
     • Check NFS mount if using NFS

  2. API not accessible
     • Verify port 8086 is open: ss -tulpn | grep 8086
     • Check firewall rules
     • Verify container is running: podman ps

  3. Authentication errors
     • Verify token is correct
     • Check token permissions: podman exec influxdb influx auth list

  4. Out of disk space
     • Check NFS mount usage: df -h /mnt/nfs/influxdb
     • Reduce retention period
     • Delete old buckets

  5. High memory usage
     • Reduce cache sizes
     • Add more RAM
     • Check for expensive queries

  6. NFS mount issues
     • Verify NFS server is reachable: ping nas.example.com
     • Check mount status: mount | grep influxdb
     • Test NFS write: touch /mnt/nfs/influxdb/test

Security Considerations

  1. Token Security
     • Store admin tokens in Ansible Vault
     • Use minimal permissions for client tokens (write-only for Telegraf)
     • Rotate tokens regularly

  2. Network Access
     • Restrict port 8086 to the monitoring network
     • Use WireGuard VPN for remote collectors
     • Consider firewall rules limiting source IPs

  3. TLS/SSL (Optional)
     • InfluxDB supports TLS natively (not currently configured)
     • Alternative: use a reverse proxy (Traefik from solti-ensemble)

  4. Container Security
     • Run as a rootless Podman container
     • Limit container capabilities
     • Keep the InfluxDB image updated

  5. Backup Encryption
     • Encrypt the NFS volume at rest (FreeNAS/TrueNAS encryption)
     • Encrypt exported backups before off-site storage

Container Integration

This role CONSUMES containers from solti-containers:

  • Podman runtime: Provided by jackaltx.solti_containers.podman role
  • Systemd quadlets: This role generates quadlet files, Podman manages lifecycle
  • Testing: Uses solti-containers test containers in molecule scenarios

This role CONFIGURES InfluxDB container:

  • Generates InfluxDB config files
  • Sets up data directories and mounts
  • Creates systemd quadlet definitions
  • Manages tokens and buckets

Example integration:

- hosts: monitoring_servers
  roles:
    # 1. Provide container runtime
    - role: jackaltx.solti_containers.podman

    # 2. Configure InfluxDB within container
    - role: jackaltx.solti_monitoring.influxdb
      vars:
        influxdb_data_path: "/mnt/nfs/influxdb"

Reference Deployment

Production (monitor11.example.com):

influxdb_org: "example-org"
influxdb_bucket: "telegraf"
influxdb_retention: "30d"
influxdb_data_path: "/mnt/nfs-s3-server/influxdb"
influxdb_port: 8086

Characteristics:

  • Proxmox VM (4 CPU, 8GB RAM)
  • NFS mount to FreeNAS (s3-server.example.com)
  • Serves 10 monitored hosts (ispconfig3 + local services)
  • WireGuard VPN for remote collection

See also:

  • Chapter 2: Deployment Patterns - hub-and-spoke architecture
  • Chapter 6: Reference Deployments - monitor11 detailed configuration