Multi-Destination Setup
Overview
Multi-destination setup allows collectors to send data to multiple endpoints for redundancy, load distribution, or different retention policies.
Use Cases
- High Availability: Send to primary and backup servers
- Regional Distribution: Send to local and central servers
- Tiered Retention: Send to short-term and long-term storage
- Development/Production: Send to both test and production servers
Telegraf Multi-Output
Multiple InfluxDB Outputs
Configure Telegraf to send metrics to multiple InfluxDB instances:
telegraf_outputs:
  - name: "primary"
    type: "influxdb_v2"
    url: "http://primary.example.com:8086"
    token: "{{ vault_primary_token }}"
    org: "myorg"
    bucket: "telegraf"
  - name: "backup"
    type: "influxdb_v2"
    url: "http://backup.example.com:8086"
    token: "{{ vault_backup_token }}"
    org: "myorg"
    bucket: "telegraf"
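The telegraf_outputs variable above appears to be consumed by deployment tooling rather than by Telegraf itself. For reference, a hand-written telegraf.conf with the same two destinations might look like the sketch below. The environment variable names are assumptions and not part of the original setup.

```toml
# Hand-written equivalent of the "primary" and "backup" outputs.
# Assumption: tokens are supplied through environment variables.
[[outputs.influxdb_v2]]
  urls = ["http://primary.example.com:8086"]
  token = "${INFLUX_PRIMARY_TOKEN}"   # hypothetical variable name
  organization = "myorg"
  bucket = "telegraf"

[[outputs.influxdb_v2]]
  urls = ["http://backup.example.com:8086"]
  token = "${INFLUX_BACKUP_TOKEN}"    # hypothetical variable name
  organization = "myorg"
  bucket = "telegraf"
```

Each destination needs its own plugin block. Listing both URLs in a single urls array does not duplicate data, because Telegraf writes each batch to only one URL from that list.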
Regional Setup
Send to local server and central aggregation:
telegraf_outputs:
  - name: "local"
    type: "influxdb_v2"
    url: "http://local.example.com:8086"
    token: "{{ vault_local_token }}"
    org: "myorg"
    bucket: "telegraf"
  - name: "central"
    type: "influxdb_v2"
    url: "http://central.example.com:8086"
    token: "{{ vault_central_token }}"
    org: "global"
    bucket: "all_metrics"
Selective Routing
Send different metrics to different destinations:
# High-frequency metrics to local
[[outputs.influxdb_v2]]
  urls = ["http://local.example.com:8086"]
  namepass = ["cpu", "mem", "disk"]
# Application metrics to central
[[outputs.influxdb_v2]]
  urls = ["http://central.example.com:8086"]
  namepass = ["nginx", "postgresql"]
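With the two filters above, a metric whose name matches neither namepass list is not written anywhere. If that is not intended, a catch-all output can be sketched with namedrop, which excludes the listed names and passes everything else. The bucket name here is a placeholder, and token and organization are omitted as in the example above.

```toml
# Catch-all: every metric NOT already routed by the two outputs above.
[[outputs.influxdb_v2]]
  urls = ["http://central.example.com:8086"]
  bucket = "other_metrics"   # hypothetical bucket name
  namedrop = ["cpu", "mem", "disk", "nginx", "postgresql"]
```

Both namepass and namedrop accept glob patterns, so a prefix such as "nginx*" can stand in for a family of measurements.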
Alloy Multi-Destination
Multiple Loki Endpoints
Configure Alloy to forward logs to multiple Loki instances:
// Primary Loki
loki.write "primary" {
  endpoint {
    url = "http://primary.example.com:3100/loki/api/v1/push"
  }
}
// Backup Loki
loki.write "backup" {
  endpoint {
    url = "http://backup.example.com:3100/loki/api/v1/push"
  }
}
// Send to both
loki.source.journal "system_logs" {
  // Only read journal entries from the sshd unit
  matches = "_SYSTEMD_UNIT=sshd.service"
  labels = {
    service_type = "system",
  }
  forward_to = [
    loki.write.primary.receiver,
    loki.write.backup.receiver,
  ]
}
Selective Log Routing
Route different logs to different endpoints:
// Security logs to secure Loki
loki.write "security" {
  endpoint {
    url = "http://secure.example.com:3100/loki/api/v1/push"
  }
}
// Application logs to general Loki
loki.write "general" {
  endpoint {
    url = "http://general.example.com:3100/loki/api/v1/push"
  }
}
// Fail2ban logs to security
loki.source.journal "fail2ban" {
  matches    = "_SYSTEMD_UNIT=fail2ban.service"
  forward_to = [loki.write.security.receiver]
}
// Apache logs to general
loki.source.file "apache" {
  targets = [{
    __path__ = "/var/log/apache2/access.log",
  }]
  forward_to = [loki.write.general.receiver]
}
Regional + Central Architecture
Send to local and central servers:
loki.write "local" {
  endpoint {
    url = "http://10.10.0.11:3100/loki/api/v1/push" // Local WireGuard
  }
}
loki.write "central" {
  endpoint {
    url = "https://central.example.com/loki/api/v1/push" // Central HTTPS
  }
}
// All logs to both destinations
loki.source.journal "all_logs" {
  forward_to = [
    loki.write.local.receiver,
    loki.write.central.receiver,
  ]
}
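Failure Handling
Retry Configuration
Configure retries for failed writes:
Telegraf:
# Telegraf outputs have no retry counter: a batch that fails to write
# stays in the buffer and is retried at the next flush interval.
[[outputs.influxdb_v2]]
  urls = ["http://primary.example.com:8086"]
  timeout = "5s"  # Abandon a slow request; the batch is retried next flush
Alloy:
loki.write "primary" {
  endpoint {
    url = "http://primary.example.com:3100/loki/api/v1/push"
    max_backoff_retries = 3      // Give up on a batch after 3 retries
    min_backoff_period  = "1s"   // Initial wait between retries
    max_backoff_period  = "30s"  // Upper bound on the wait
  }
}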
Buffering
Buffer data locally when remote endpoints are unavailable:
Telegraf:
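Telegraf buffers unwritten metrics per output, and the buffer size is set in the agent section. The values below are illustrative assumptions rather than recommendations; size metric_buffer_limit to the longest outage you want to ride out.

```toml
# Agent-level settings that govern buffering for every output.
[agent]
  flush_interval = "10s"         # How often buffered metrics are written
  metric_batch_size = 1000       # Maximum metrics per write request
  metric_buffer_limit = 100000   # Unwritten metrics kept per output
```

When an output's buffer is full, the oldest metrics are dropped first. By default the buffer is held in memory, so it does not survive a Telegraf restart.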
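Alloy:
loki.write "primary" {
  endpoint {
    url = "http://primary.example.com:3100/loki/api/v1/push"
    queue_config {
      capacity      = "10MiB"  // Size of the send queue
      drain_timeout = "1m"     // Time allowed to drain the queue on shutdown
    }
  }
  // queue_config applies only when the experimental write-ahead log is enabled
  wal {
    enabled = true
  }
}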
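Load Balancing
Alloy's loki.write does not balance load across endpoints: when several endpoint blocks are defined, every log entry is sent to all of them. To distribute load, place a load balancer in front of the Loki servers and point a single endpoint at it. The configuration below fans out to all three servers:
// Fan-out: each entry is sent to every Loki server listed
loki.write "loki_cluster" {
  endpoint {
    url = "http://loki1.example.com:3100/loki/api/v1/push"
  }
  endpoint {
    url = "http://loki2.example.com:3100/loki/api/v1/push"
  }
  endpoint {
    url = "http://loki3.example.com:3100/loki/api/v1/push"
  }
}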
Cost Optimization
Tiered Storage
Send full data to short-term storage, sampled data to long-term:
telegraf_outputs:
  # High-resolution to local (7 days)
  - name: "local_shortterm"
    url: "http://local.example.com:8086"
    bucket: "telegraf_7d"
  # Sampled data to S3-backed (365 days)
  - name: "s3_longterm"
    url: "http://archive.example.com:8086"
    bucket: "telegraf_365d"
    aggregation_interval: "5m"  # Sample every 5 minutes
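The aggregation_interval key above is presumably handled by the deployment tooling. In a plain telegraf.conf, one way to get a similar effect is an aggregator plugin combined with name-based routing. The sketch below assumes 5-minute means are an acceptable form of sampling; the "_5m" suffix is a hypothetical naming choice, and token and organization are omitted.

```toml
# Emit 5-minute means alongside the raw metrics.
[[aggregators.basicstats]]
  period = "5m"
  stats = ["mean"]
  drop_original = false   # Keep raw metrics for the short-term output
  name_suffix = "_5m"     # Hypothetical suffix used to route aggregates

# Raw, high-resolution metrics to short-term storage.
[[outputs.influxdb_v2]]
  urls = ["http://local.example.com:8086"]
  bucket = "telegraf_7d"
  namedrop = ["*_5m"]

# 5-minute aggregates to long-term storage.
[[outputs.influxdb_v2]]
  urls = ["http://archive.example.com:8086"]
  bucket = "telegraf_365d"
  namepass = ["*_5m"]
```

Retention itself (7 days versus 365 days) is a property of each bucket and is configured in InfluxDB, not in Telegraf.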
Monitoring Multi-Destination Health
Telegraf Metrics
Monitor output health via Telegraf's internal metrics, which are collected only when the [[inputs.internal]] plugin is enabled:
from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "internal_write")
  |> filter(fn: (r) => r["_field"] == "errors")
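Alloy Metrics
Check the metrics that loki.write exposes on Alloy's /metrics endpoint, such as loki_write_sent_entries_total, loki_write_dropped_entries_total, and loki_write_batch_retries_total.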
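One way to collect these metrics, sketched here under assumptions, is to scrape Alloy's own metrics endpoint with Telegraf's Prometheus input so they land next to the Telegraf data. The address assumes Alloy's default HTTP listener on 127.0.0.1:12345; adjust it if Alloy was started with a different --server.http.listen-addr.

```toml
# Scrape Alloy's self-reported metrics with Telegraf.
# Assumption: Alloy listens on its default address, 127.0.0.1:12345.
[[inputs.prometheus]]
  urls = ["http://127.0.0.1:12345/metrics"]
  metric_version = 2   # Store each Prometheus metric as a field
```

This collects every metric Alloy exposes, not only the loki_write_* series, so a field filter may be worth adding once you know which series you need.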
Best Practices
- Test individually first: Verify each destination works before combining
- Monitor write failures: Alert on output errors
- Set appropriate timeouts: Don't block on slow destinations
- Use buffering: Prevent data loss during outages
- Document routing rules: Keep clear notes on what goes where
- Consider costs: Evaluate bandwidth and storage costs
- Security per destination: Different credentials for each endpoint
- Avoid circular routing: Don't create feedback loops
Troubleshooting
One Destination Failing
- Check network connectivity to the failing endpoint
- Verify authentication credentials
- Review endpoint logs for errors
- Check whether the destination has spare capacity
All Destinations Failing
- Check client network connectivity
- Verify client services are running
- Review client logs for errors
- Check client resource usage (CPU, memory, disk)
Data Inconsistency
- Verify all destinations receive the same data
- Check for selective routing rules
- Review buffer and retry settings
- Ensure time synchronization across systems