Telegraf Role
Overview
Telegraf is a plugin-driven server agent for collecting and sending metrics. It supports a wide variety of input plugins for system metrics, application metrics, and custom data sources.
Purpose
- Collect system metrics (CPU, memory, disk, network)
- Collect application metrics (databases, web servers, containers)
- Send metrics to InfluxDB for storage and analysis
- Run as a lightweight agent with minimal resource overhead
Installation
The telegraf role installs and configures Telegraf on target hosts:
- role: jackaltx.solti_monitoring.telegraf
  vars:
    telegraf_output_influxdb: true
    telegraf_output_url: "http://monitor.example.com:8086"
    telegraf_output_token: "{{ vault_telegraf_token }}"
    telegraf_output_org: "myorg"
    telegraf_output_bucket: "telegraf"
Key Configuration Options
Output Configuration
InfluxDB output (primary):
telegraf_output_influxdb: true
telegraf_output_url: "http://10.10.0.11:8086" # InfluxDB API endpoint
telegraf_output_token: "{{ vault_influxdb_token }}"
telegraf_output_org: "myorg"
telegraf_output_bucket: "telegraf"
Input Plugins
System metrics (enabled by default):
- cpu - CPU usage statistics
- disk - Disk (filesystem) usage
- diskio - Disk I/O statistics
- mem - Memory usage
- net - Network interface statistics
- system - System load and uptime
- processes - Process count
Additional plugins (opt-in):
telegraf_enable_docker: true # Docker container metrics
telegraf_enable_nginx: true # Nginx web server metrics
telegraf_enable_postgresql: true # PostgreSQL database metrics
telegraf_enable_redis: true # Redis metrics
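Global Tags
Add custom labels to all metrics with telegraf_global_tags, a mapping of tag names to values. The Remote Collection via WireGuard example below sets hostname and site tags this way.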
Service Management
Systemd Service
Telegraf runs as a systemd service:
# Check status
systemctl status telegraf
# Start/stop/restart
systemctl start telegraf
systemctl stop telegraf
systemctl restart telegraf
# Enable at boot
systemctl enable telegraf
Configuration File
Primary configuration: /etc/telegraf/telegraf.conf
Additional configs: /etc/telegraf/telegraf.d/*.conf
Testing Configuration
Validate the configuration without starting the service:
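One way to do this is with Telegraf's --test flag, which loads the configuration, runs each input once, prints the gathered metrics to stdout, and exits without writing to any output. The sketch below assumes the configuration paths listed under Configuration File; the function name telegraf_check_config is hypothetical and not part of the role.

```shell
# Hypothetical helper (not provided by the role): load the configuration,
# run every configured input once, print the metrics, and exit.
# With --test, nothing is sent to the configured outputs.
telegraf_check_config() {
  telegraf --config /etc/telegraf/telegraf.conf \
           --config-directory /etc/telegraf/telegraf.d \
           --test
}
```

Call telegraf_check_config on the target host. A parse error in either location should surface as an error message instead of metric lines. Some inputs need the same permissions as the service account, so running the command as the telegraf user (for example via sudo -u telegraf) may give more representative results.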
Resource Requirements
Minimal footprint:
- CPU: < 1% average
- Memory: 128MB
- Disk: 50MB for binary and configs
- Network: Depends on metric collection frequency
Example Configurations
Basic System Monitoring
Monitor local system only:
telegraf_output_influxdb: true
telegraf_output_url: "http://localhost:8086"
telegraf_output_token: "{{ vault_token }}"
telegraf_output_org: "myorg"
telegraf_output_bucket: "telegraf"
# Only system metrics
telegraf_enable_docker: false
telegraf_enable_nginx: false
Remote Collection via WireGuard
Ship metrics to central server over WireGuard:
telegraf_output_influxdb: true
telegraf_output_url: "http://10.10.0.11:8086" # WireGuard IP
telegraf_output_token: "{{ vault_token }}"
telegraf_output_org: "myorg"
telegraf_output_bucket: "telegraf"
telegraf_global_tags:
  hostname: "{{ ansible_hostname }}"
  site: "remote"
Multi-Output Configuration
Send metrics to multiple destinations:
telegraf_outputs:
  - type: "influxdb_v2"
    url: "http://primary.example.com:8086"
    token: "{{ vault_primary_token }}"
  - type: "influxdb_v2"
    url: "http://backup.example.com:8086"
    token: "{{ vault_backup_token }}"
Troubleshooting
Check Logs
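Because Telegraf runs as a systemd service, its log output can usually be read from the journal. The sketch below assumes the unit is named telegraf and that logging goes to journald rather than to a separate log file; telegraf_logs is a hypothetical helper name.

```shell
# Hypothetical helper (not provided by the role): print the last N lines
# of the Telegraf unit's journal. N defaults to 50.
telegraf_logs() {
  journalctl -u telegraf --no-pager -n "${1:-50}"
}
```

For example, telegraf_logs 200 shows the last 200 lines. To follow the log live while reproducing a problem, journalctl -u telegraf -f is the usual alternative.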
Test Connection to InfluxDB
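A quick reachability check is to query the InfluxDB health endpoint from the host running Telegraf. The sketch below assumes InfluxDB 2.x (which the token, org, and bucket settings in this role suggest) and its /health endpoint; influxdb_health is a hypothetical helper name, and the URL you pass should match telegraf_output_url.

```shell
# Hypothetical helper (not provided by the role): request the InfluxDB
# health endpoint. $1 is the base URL; a trailing slash is tolerated.
# The request gives up after 5 seconds so a blocked port fails fast.
influxdb_health() {
  curl --silent --show-error --max-time 5 "${1%/}/health"
}
```

For example, influxdb_health "http://10.10.0.11:8086" should print a short JSON document when the server is reachable. A curl error such as a refused connection or a timeout points at networking or firewall rules, not at Telegraf itself. This check does not validate the token.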
Verify Metrics Collection
Check that Telegraf is collecting metrics:
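One way to confirm this is to run a single input once and inspect what it prints, combining Telegraf's --test and --input-filter flags. This is a sketch under the assumption that the main configuration lives at /etc/telegraf/telegraf.conf; telegraf_sample_input is a hypothetical helper name, and the default of mem is an arbitrary choice.

```shell
# Hypothetical helper (not provided by the role): run one input plugin
# once and print its metrics without sending them to any output.
# $1 is the input plugin name (for example cpu, mem, disk); default: mem.
telegraf_sample_input() {
  telegraf --config /etc/telegraf/telegraf.conf \
           --input-filter "${1:-mem}" \
           --test
}
```

If telegraf_sample_input disk prints metric lines, the input itself works, and any missing data is more likely an output or network problem.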
Common Issues
- Connection refused: InfluxDB not reachable
  - Check network connectivity
  - Verify InfluxDB is running
  - Check firewall rules
- Authentication failed: Invalid token
  - Verify token has write permissions
  - Check token in InfluxDB UI
- No data in InfluxDB: Metrics not being sent
  - Check telegraf logs for errors
  - Verify output configuration
  - Test with --test flag
Performance Tuning
Collection Interval
Adjust collection frequency:
telegraf_interval: "60s" # Collect every 60 seconds
telegraf_flush_interval: "60s" # Flush to output every 60 seconds
Metric Filtering
Reduce metric volume:
telegraf_metric_filters:
  - measurement: "cpu"
    tags:
      cpu: ["cpu-total"] # Only collect aggregate CPU stats
Security Considerations
- Token Security: Store tokens in Ansible Vault
- TLS/SSL: Use HTTPS endpoints when possible
- Network Security: Use WireGuard for remote collectors
- Least Privilege: Grant minimum required permissions to tokens
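For the token storage point, one option is to encrypt the token as an inline vault value with ansible-vault encrypt_string. The sketch below reads the plaintext from stdin so the token does not end up in shell history; vault_encrypt_token is a hypothetical helper name, and the variable name you pass (such as vault_telegraf_token) is whatever your playbook references.

```shell
# Hypothetical helper (not provided by the role): encrypt a secret read
# from stdin and print a YAML snippet for the given variable name.
# $1 is the variable name, e.g. vault_telegraf_token.
vault_encrypt_token() {
  ansible-vault encrypt_string --stdin-name "$1"
}
```

Run vault_encrypt_token vault_telegraf_token, paste the token, and end input with Ctrl-D. The printed block can then be placed in a vars file. Depending on your setup, ansible-vault may also need a vault password source such as --ask-vault-pass or a configured password file.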
Reference Deployment
See the Reference Deployments chapter for real-world examples:
- monitor11.example.com - Server with local Telegraf
- ispconfig3.example.com - Client shipping metrics via WireGuard