Multi-Destination Setup

Overview

Multi-destination setup allows collectors to send data to multiple endpoints for redundancy, load distribution, or different retention policies.
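Conceptually this is fan-out: every destination receives its own copy of each batch and tracks its own failures. The Python sketch below illustrates only that idea. It is not how Telegraf or Alloy are implemented, and the sink functions are hypothetical stand-ins for real InfluxDB or Loki clients.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A sink ships one batch and raises on failure. Hypothetical stand-in
# for a real InfluxDB or Loki client.
Sink = Callable[[List[str]], None]


@dataclass
class FanOut:
    sinks: Dict[str, Sink]
    # Unsent data per destination; a failing sink only grows its own backlog.
    pending: Dict[str, List[str]] = field(default_factory=dict)

    def write(self, batch: List[str]) -> Dict[str, bool]:
        """Send batch to every sink and return per-destination success."""
        results: Dict[str, bool] = {}
        for name, sink in self.sinks.items():
            backlog = self.pending.setdefault(name, [])
            backlog.extend(batch)  # each destination gets its own copy
            try:
                sink(list(backlog))
            except Exception:
                results[name] = False  # keep the backlog for the next write
            else:
                backlog.clear()  # delivered, nothing left to retry
                results[name] = True
        return results


if __name__ == "__main__":
    delivered: List[str] = []

    def primary(batch: List[str]) -> None:
        delivered.extend(batch)

    def backup(batch: List[str]) -> None:
        raise ConnectionError("backup unreachable")

    fan = FanOut({"primary": primary, "backup": backup})
    print(fan.write(["cpu usage_idle=97.5"]))  # {'primary': True, 'backup': False}
    print(fan.pending["backup"])               # ['cpu usage_idle=97.5']
```

When the backup comes back, its next write carries the backlog first, so the outage leaves no gap at that destination as long as the backlog is kept.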

Use Cases

High Availability: Send to primary and backup servers
Regional Distribution: Send to local and central servers
Tiered Retention: Send to short-term and long-term storage
Development/Production: Send to both test and production servers

Telegraf Multi-Output

Multiple InfluxDB Outputs

Configure Telegraf to send metrics to multiple InfluxDB instances:

telegraf_outputs:
  - name: "primary"
    type: "influxdb_v2"
    url: "http://primary.example.com:8086"
    token: "{{ vault_primary_token }}"
    org: "myorg"
    bucket: "telegraf"

  - name: "backup"
    type: "influxdb_v2"
    url: "http://backup.example.com:8086"
    token: "{{ vault_backup_token }}"
    org: "myorg"
    bucket: "telegraf"

Regional Setup

Send to a local server and to a central aggregation server:

telegraf_outputs:
  - name: "local"
    type: "influxdb_v2"
    url: "http://local.example.com:8086"
    token: "{{ vault_local_token }}"
    org: "myorg"
    bucket: "telegraf"

  - name: "central"
    type: "influxdb_v2"
    url: "http://central.example.com:8086"
    token: "{{ vault_central_token }}"
    org: "global"
    bucket: "all_metrics"

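Selective Routing

Send different metrics to different destinations with namepass filters. An output that sets namepass writes only the metrics whose measurement name matches one of its patterns (glob wildcards such as disk* are allowed):

# High-frequency system metrics to local
[[outputs.influxdb_v2]]
  urls = ["http://local.example.com:8086"]
  token = "${INFLUX_LOCAL_TOKEN}"  # placeholder, read from the environment
  organization = "myorg"
  bucket = "telegraf"
  namepass = ["cpu", "mem", "disk"]

# Application metrics to central
[[outputs.influxdb_v2]]
  urls = ["http://central.example.com:8086"]
  token = "${INFLUX_CENTRAL_TOKEN}"  # placeholder, read from the environment
  organization = "global"
  bucket = "all_metrics"
  namepass = ["nginx", "postgresql"]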
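Because every output in this example sets namepass, a measurement that matches no list is written nowhere. The Python sketch below mirrors the two lists with the standard-library fnmatch module (an approximation of Telegraf's glob matching, not its code) and shows that diskio, which the exact pattern "disk" does not match, is dropped by both outputs.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Mirrors the namepass lists in the TOML above.
ROUTES: Dict[str, List[str]] = {
    "local": ["cpu", "mem", "disk"],
    "central": ["nginx", "postgresql"],
}


def destinations(measurement: str, routes: Dict[str, List[str]] = ROUTES) -> List[str]:
    """Return the outputs whose namepass globs match this measurement name."""
    return [
        output
        for output, patterns in routes.items()
        if any(fnmatchcase(measurement, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    for name in ["cpu", "nginx", "diskio"]:
        print(name, "->", destinations(name) or "dropped by every output")
    # A wildcard pattern also catches diskio:
    print(destinations("diskio", {"local": ["cpu", "mem", "disk*"]}))  # ['local']
```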

Alloy Multi-Destination

Multiple Loki Endpoints

Configure Alloy to forward logs to multiple Loki instances:

// Primary Loki
loki.write "primary" {
  endpoint {
    url = "http://primary.example.com:3100/loki/api/v1/push"
  }
}

// Backup Loki
loki.write "backup" {
  endpoint {
    url = "http://backup.example.com:3100/loki/api/v1/push"
  }
}

// Send to both
loki.source.journal "system_logs" {
  matches = "_SYSTEMD_UNIT=sshd.service"

  labels = {
    service_type = "system",
  }

  forward_to = [
    loki.write.primary.receiver,
    loki.write.backup.receiver,
  ]
}

Selective Log Routing

Route different logs to different endpoints:

// Security logs to secure Loki
loki.write "security" {
  endpoint {
    url = "http://secure.example.com:3100/loki/api/v1/push"
  }
}

// Application logs to general Loki
loki.write "general" {
  endpoint {
    url = "http://general.example.com:3100/loki/api/v1/push"
  }
}

// Fail2ban logs to security
loki.source.journal "fail2ban" {
  matches = "_SYSTEMD_UNIT=fail2ban.service"
  forward_to = [loki.write.security.receiver]
}

// Apache logs to general
loki.source.file "apache" {
  targets = [{
    __path__ = "/var/log/apache2/access.log",
  }]
  forward_to = [loki.write.general.receiver]
}

Regional + Central Architecture

Send to local and central servers:

loki.write "local" {
  endpoint {
    url = "http://10.10.0.11:3100/loki/api/v1/push"  // Local WireGuard
  }
}

loki.write "central" {
  endpoint {
    url = "https://central.example.com/loki/api/v1/push"  // Central HTTPS
  }
}

// All logs to both destinations
loki.source.journal "all_logs" {
  forward_to = [
    loki.write.local.receiver,
    loki.write.central.receiver,
  ]
}

Failure Handling

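Retry Configuration

Configure retries for failed writes:

Telegraf:

The influxdb_v2 output has no retry-count setting. A failed batch stays in the output's buffer and is retried on every later flush, so limit how long a single write attempt may block with timeout:

[[outputs.influxdb_v2]]
  urls = ["http://primary.example.com:8086"]
  token = "${INFLUX_TOKEN}"  # placeholder, read from the environment
  organization = "myorg"
  bucket = "telegraf"
  timeout = "5s"  # abandon a single write attempt after 5 seconds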

Alloy:

loki.write "primary" {
  endpoint {
    url = "http://primary.example.com:3100/loki/api/v1/push"

    // Exponential backoff between retries of a failed batch
    min_backoff_period  = "1s"
    max_backoff_period  = "30s"
    max_backoff_retries = 3
  }
}
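With a 1s initial delay that doubles up to 30s, three retries wait only 1 + 2 + 4 = 7 seconds in total before the batch is given up, and the 30s cap is never reached. The Python sketch below computes such a schedule. It is a simplified model of exponential backoff (doubling, no jitter), not Alloy's exact algorithm.

```python
from typing import List


def backoff_schedule(min_s: float, max_s: float, retries: int) -> List[float]:
    """Delay before each retry: start at min_s, double each time, never exceed max_s.

    Simplified exponential backoff without jitter.
    """
    delays: List[float] = []
    delay = min_s
    for _ in range(retries):
        delays.append(delay)
        delay = min(delay * 2, max_s)
    return delays


if __name__ == "__main__":
    for retries in (3, 10):
        schedule = backoff_schedule(1, 30, retries)
        print(retries, schedule, "total wait:", sum(schedule), "s")
```

In this model, raising the retry count to 10 stretches the total wait to 181 seconds (1, 2, 4, 8, 16, then five delays of 30), which is enough to ride out a short endpoint restart.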

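Buffering

Buffer data locally when remote endpoints are unavailable:

Telegraf:

[[outputs.influxdb_v2]]
  # token, organization and bucket as in the examples above
  urls = ["http://primary.example.com:8086"]
  metric_batch_size = 1000       # metrics sent per write request
  metric_buffer_limit = 100000   # unsent metrics kept in memory; oldest dropped when full
  flush_interval = "10s"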

Alloy:

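loki.write "primary" {
  endpoint {
    url = "http://primary.example.com:3100/loki/api/v1/push"

    // Pending batches are held in memory and retried with backoff;
    // a longer backoff window rides out longer outages.
    batch_size          = "1MiB"
    max_backoff_period  = "1m"
    max_backoff_retries = 20
  }
}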
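Telegraf's per-output buffer (its metric_buffer_limit setting) is bounded: once full, the oldest metrics are dropped in favour of new ones. How long it covers an outage is the limit divided by the output's metric rate; at a hypothetical 500 metrics per second, 100,000 metrics last about 200 seconds. The Python sketch below models that drop-oldest behaviour with collections.deque. It is an illustration, not Telegraf's implementation.

```python
from collections import deque
from typing import Deque, List


class OutputBuffer:
    """Bounded per-output buffer: when full, the oldest metric is dropped."""

    def __init__(self, limit: int, batch_size: int) -> None:
        # deque(maxlen=...) discards from the left (oldest) on overflow.
        self.buf: Deque[str] = deque(maxlen=limit)
        self.batch_size = batch_size
        self.dropped = 0

    def add(self, metric: str) -> None:
        if len(self.buf) == self.buf.maxlen:
            self.dropped += 1  # the append below evicts the oldest entry
        self.buf.append(metric)

    def next_batch(self) -> List[str]:
        """Oldest metrics first, without removing them (the write may still fail)."""
        return list(self.buf)[: self.batch_size]

    def ack(self, count: int) -> None:
        """Remove metrics only after the destination confirms the write."""
        for _ in range(min(count, len(self.buf))):
            self.buf.popleft()


if __name__ == "__main__":
    buf = OutputBuffer(limit=3, batch_size=2)
    for m in ["m1", "m2", "m3", "m4"]:
        buf.add(m)
    print(list(buf.buf), "dropped:", buf.dropped)  # ['m2', 'm3', 'm4'] dropped: 1
    print(buf.next_batch())                        # ['m2', 'm3']
```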

Load Balancing

Multiple endpoint blocks in one loki.write component do not round-robin: Alloy sends every log line to every endpoint, which replicates data rather than spreading it. To distribute load across several Loki servers, put a load balancer in front of them and point a single endpoint at it:

// One endpoint behind a load balancer (loki-lb.example.com is a placeholder)
loki.write "loki_cluster" {
  endpoint {
    url = "http://loki-lb.example.com:3100/loki/api/v1/push"
  }
}
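The difference between replication and balancing is easy to quantify: with three servers, fan-out writes every line three times, while round-robin writes each line once. The Python sketch below counts deliveries under both schemes using placeholder server names. A real load balancer may use other strategies (least connections, hashing), so round-robin here is only the simplest example.

```python
from collections import Counter
from itertools import cycle
from typing import List

SERVERS = ["loki1", "loki2", "loki3"]  # placeholder server names


def round_robin(lines: List[str], servers: List[str]) -> Counter:
    """Each line is delivered to exactly one server, taking turns."""
    turn = cycle(servers)
    return Counter(next(turn) for _ in lines)


def fan_out(lines: List[str], servers: List[str]) -> Counter:
    """Each line is delivered to every server (replication, not balancing)."""
    return Counter({server: len(lines) for server in servers})


if __name__ == "__main__":
    lines = [f"line {i}" for i in range(9)]
    print(sum(round_robin(lines, SERVERS).values()))  # 9 writes, 3 per server
    print(sum(fan_out(lines, SERVERS).values()))      # 27 writes, 9 per server
```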

Cost Optimization

Tiered Storage

Send full-resolution data to short-term storage and downsampled data to long-term storage. Retention is configured on each InfluxDB bucket (7 and 365 days here), not in Telegraf:

# Full resolution to local (7-day bucket)
telegraf_outputs:
  - name: "local_shortterm"
    type: "influxdb_v2"
    url: "http://local.example.com:8086"
    bucket: "telegraf_7d"

  # Downsampled data to S3-backed archive (365-day bucket)
  - name: "s3_longterm"
    type: "influxdb_v2"
    url: "http://archive.example.com:8086"
    bucket: "telegraf_365d"
    aggregation_interval: "5m"  # Downsample to 5-minute intervals
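Averaging into windows shrinks long-term data by the ratio of window length to collection interval: assuming a 10-second collection interval, a 5-minute window turns 30 points into 1. The Python sketch below averages (timestamp, value) points into fixed 5-minute windows. It shows the arithmetic only and makes no claim about how the aggregation_interval option above is implemented.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, List, Tuple

Point = Tuple[int, float]  # (unix timestamp in seconds, value)


def downsample(points: List[Point], window_s: int = 300) -> List[Point]:
    """Average points into fixed windows, keyed by each window's start time."""
    windows: DefaultDict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        windows[ts - ts % window_s].append(value)
    return [(start, mean(values)) for start, values in sorted(windows.items())]


if __name__ == "__main__":
    # One hour of 10-second samples: 360 points in, 12 points out.
    raw = [(t, float(t % 60)) for t in range(0, 3600, 10)]
    print(len(raw), "->", len(downsample(raw)))  # 360 -> 12
```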

Monitoring Multi-Destination Health

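Telegraf Metrics

Monitor output health via Telegraf's internal metrics, which are collected by the [[inputs.internal]] plugin. Set an alias on each output (for example alias = "primary") so the internal_write series of two influxdb_v2 outputs can be told apart: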

from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "internal_write")
  |> filter(fn: (r) => r["_field"] == "errors")

Alloy Metrics

Check Alloy's write metrics (Alloy serves them on 127.0.0.1:12345 by default):

curl -s http://127.0.0.1:12345/metrics | grep loki_write

Best Practices

Test individually first: Verify each destination works before combining
Monitor write failures: Alert on output errors
Set appropriate timeouts: Don't block on slow destinations
Use buffering: Prevent data loss during outages
Document routing rules: Keep clear notes on what goes where
Consider costs: Evaluate bandwidth and storage costs
Security per destination: Use separate credentials for each endpoint
Avoid circular routing: Don't create feedback loops, such as two servers that relay data to each other

Troubleshooting

One Destination Failing

Check network connectivity to the failing endpoint
Verify authentication credentials
Review endpoint logs for errors
Check whether the destination has spare capacity (disk space, ingestion rate limits)

All Destinations Failing

Check client network connectivity
Verify client services are running
Review client logs for errors
Check client resource usage (CPU, memory, disk)

Data Inconsistency

Verify that all destinations receive the same data
Check for selective routing rules
Review buffer and retry settings
Ensure time synchronization across systems
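One way to check consistency is to count points (or log lines) for the same closed time window on each destination and compare. The Python sketch below flags destinations that trail the best-filled one by more than a tolerance. The 1% default is an arbitrary example, the counts are assumed to come from equivalent queries run against each destination, and only destinations that should hold the same data (no selective routing between them) should be compared. Choose a window that ended a few minutes ago so buffered and retried data has arrived.

```python
from typing import Dict


def find_gaps(counts: Dict[str, int], tolerance: float = 0.01) -> Dict[str, int]:
    """Return {destination: missing_count} for destinations that lag too far.

    counts maps each destination to the number of points (or log lines) it
    holds for the same closed time window and the same routing rule.
    """
    expected = max(counts.values())  # best-filled destination as the reference
    return {
        dest: expected - n
        for dest, n in counts.items()
        if expected - n > expected * tolerance
    }


if __name__ == "__main__":
    print(find_gaps({"primary": 10_000, "backup": 9_950}))  # {} (0.5% gap is within tolerance)
    print(find_gaps({"primary": 10_000, "backup": 9_000}))  # {'backup': 1000}
```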