Multi-Destination Setup

Overview

Multi-destination setup allows collectors to send data to multiple endpoints for redundancy, load distribution, or different retention policies.

Use Cases

  1. High Availability: Send to primary and backup servers
  2. Regional Distribution: Send to local and central servers
  3. Tiered Retention: Send to short-term and long-term storage
  4. Development/Production: Send to both test and production servers

Telegraf Multi-Output

Multiple InfluxDB Outputs

Configure Telegraf to send metrics to multiple InfluxDB instances:

telegraf_outputs:
  - name: "primary"
    type: "influxdb_v2"
    url: "http://primary.example.com:8086"
    token: "{{ vault_primary_token }}"
    org: "myorg"
    bucket: "telegraf"

  - name: "backup"
    type: "influxdb_v2"
    url: "http://backup.example.com:8086"
    token: "{{ vault_backup_token }}"
    org: "myorg"
    bucket: "telegraf"
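
Each entry in the variable above presumably becomes one output stanza in the generated Telegraf configuration. The sketch below is an assumption about that mapping, not the role's actual template: `render_output` is a hypothetical helper, the token value is a placeholder, and the target keys (`alias`, `urls`, `token`, `organization`, `bucket`) are the standard `influxdb_v2` output options.

```python
def render_output(output: dict) -> str:
    """Render one telegraf_outputs entry as a TOML output stanza (assumed mapping)."""
    lines = [
        f'[[outputs.{output["type"]}]]',
        f'  alias = "{output["name"]}"',        # alias distinguishes the outputs in logs
        f'  urls = ["{output["url"]}"]',
        f'  token = "{output["token"]}"',
        f'  organization = "{output["org"]}"',  # YAML "org" maps to TOML "organization"
        f'  bucket = "{output["bucket"]}"',
    ]
    return "\n".join(lines)


primary = {
    "name": "primary",
    "type": "influxdb_v2",
    "url": "http://primary.example.com:8086",
    "token": "example-token",  # placeholder, not a real credential
    "org": "myorg",
    "bucket": "telegraf",
}
print(render_output(primary))
```

Giving each stanza its own alias makes it easier to tell the destinations apart in Telegraf's logs and internal metrics.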

Regional Setup

Send to a local server and a central aggregation server:

telegraf_outputs:
  - name: "local"
    type: "influxdb_v2"
    url: "http://local.example.com:8086"
    token: "{{ vault_local_token }}"
    org: "myorg"
    bucket: "telegraf"

  - name: "central"
    type: "influxdb_v2"
    url: "http://central.example.com:8086"
    token: "{{ vault_central_token }}"
    org: "global"
    bucket: "all_metrics"

Selective Routing

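Use the namepass filter to restrict each output to specific measurements. Connection settings such as token, organization, and bucket are omitted here for brevity: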

# High-frequency metrics to local
[[outputs.influxdb_v2]]
  urls = ["http://local.example.com:8086"]
  namepass = ["cpu", "mem", "disk"]

# Application metrics to central
[[outputs.influxdb_v2]]
  urls = ["http://central.example.com:8086"]
  namepass = ["nginx", "postgresql"]
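
How namepass decides where a metric goes can be sketched in a few lines. This is an illustration only: `OUTPUTS` and `destinations` are hypothetical names, and Python's `fnmatchcase` stands in for Telegraf's glob matching.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table mirroring the two outputs above.
OUTPUTS = {
    "local": ["cpu", "mem", "disk"],
    "central": ["nginx", "postgresql"],
}


def destinations(measurement: str) -> list[str]:
    """Return the outputs whose namepass patterns match the measurement name."""
    return [
        name
        for name, patterns in OUTPUTS.items()
        if any(fnmatchcase(measurement, pattern) for pattern in patterns)
    ]


print(destinations("cpu"))     # ['local']
print(destinations("nginx"))   # ['central']
print(destinations("diskio"))  # [] (matches neither list, so it is written nowhere)
```

The last case is the one to watch: a measurement that matches no namepass list is not sent to any destination, so every measurement you care about should be covered by at least one output.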

Alloy Multi-Destination

Multiple Loki Endpoints

Configure Alloy to forward logs to multiple Loki instances:

// Primary Loki
loki.write "primary" {
  endpoint {
    url = "http://primary.example.com:3100/loki/api/v1/push"
  }
}

// Backup Loki
loki.write "backup" {
  endpoint {
    url = "http://backup.example.com:3100/loki/api/v1/push"
  }
}

// Send to both
loki.source.journal "system_logs" {
  // matches is a string of journal field matches, not a block
  matches = "_SYSTEMD_UNIT=sshd.service"

  labels = {
    service_type = "system",
  }

  forward_to = [
    loki.write.primary.receiver,
    loki.write.backup.receiver,
  ]
}
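
The effect of listing two receivers in forward_to can be pictured as a simple fan-out: every entry is delivered to every receiver. The sketch below is a toy model of that behavior, with `Receiver` and `forward` as hypothetical stand-ins rather than anything from Alloy itself.

```python
class Receiver:
    """Stand-in for a loki.write receiver; records what it is sent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: list[str] = []

    def send(self, entry: str) -> None:
        self.entries.append(entry)


def forward(entry: str, forward_to: list[Receiver]) -> None:
    """Deliver one log entry to every receiver in the list."""
    for receiver in forward_to:
        receiver.send(entry)


primary, backup = Receiver("primary"), Receiver("backup")
for line in ["sshd: accepted", "sshd: disconnected"]:
    forward(line, [primary, backup])

print(len(primary.entries), len(backup.entries))  # 2 2
```

Because each destination receives a full copy, outbound bandwidth grows with the number of receivers.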

Selective Log Routing

Route different logs to different endpoints:

// Security logs to secure Loki
loki.write "security" {
  endpoint {
    url = "http://secure.example.com:3100/loki/api/v1/push"
  }
}

// Application logs to general Loki
loki.write "general" {
  endpoint {
    url = "http://general.example.com:3100/loki/api/v1/push"
  }
}

// Fail2ban logs to security
loki.source.journal "fail2ban" {
  // matches is a string of journal field matches, not a block
  matches = "_SYSTEMD_UNIT=fail2ban.service"
  forward_to = [loki.write.security.receiver]
}

// Apache logs to general
loki.source.file "apache" {
  targets = [{
    __path__ = "/var/log/apache2/access.log",
  }]
  forward_to = [loki.write.general.receiver]
}

Regional + Central Architecture

Send to local and central servers:

loki.write "local" {
  endpoint {
    url = "http://10.10.0.11:3100/loki/api/v1/push"  // Local WireGuard
  }
}

loki.write "central" {
  endpoint {
    url = "https://central.example.com/loki/api/v1/push"  // Central HTTPS
  }
}

// All logs to both destinations
loki.source.journal "all_logs" {
  forward_to = [
    loki.write.local.receiver,
    loki.write.central.receiver,
  ]
}

Failure Handling

Retry Configuration

Configure retries for failed writes:

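Telegraf:

Telegraf outputs have no retry-count setting. A batch that fails to write stays in the output's buffer and is retried at the next flush, so the relevant settings are the request timeout and the flush interval:

[[outputs.influxdb_v2]]
  urls = ["http://primary.example.com:8086"]
  timeout = "5s"          # abandon a single write attempt after 5 seconds
  flush_interval = "10s"  # a failed batch is retried at the next flush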

Alloy:

loki.write "primary" {
  endpoint {
    url = "http://primary.example.com:3100/loki/api/v1/push"

    // Retry settings are arguments of the endpoint block itself
    max_backoff_retries = 3
    min_backoff_period  = "1s"
    max_backoff_period  = "30s"
  }
}
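
To see how much outage the backoff settings above actually cover, it helps to list the delays they produce. This is a minimal sketch, assuming the delay doubles from the minimum backoff up to the maximum; the real schedule may add jitter, which is left out here, and `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(min_backoff: int, max_backoff: int, max_retries: int) -> list[int]:
    """Delay in seconds before each retry: doubles each time, capped at max_backoff."""
    delays = []
    delay = min_backoff
    for _ in range(max_retries):
        delays.append(min(delay, max_backoff))
        delay *= 2
    return delays


print(backoff_schedule(1, 30, 3))       # [1, 2, 4]
print(sum(backoff_schedule(1, 30, 3)))  # 7
print(backoff_schedule(1, 30, 7))       # [1, 2, 4, 8, 16, 30, 30]
```

Under these assumptions, three retries starting at 1s cover only about seven seconds of downtime before a batch is given up, so longer outages need a higher retry count or local buffering.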

Buffering

Buffer data locally when remote endpoints are unavailable:

Telegraf:

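[[outputs.influxdb_v2]]
  urls = ["http://primary.example.com:8086"]
  metric_batch_size = 10000     # metrics sent per write
  metric_buffer_limit = 100000  # metrics held while the endpoint is down; the oldest are dropped beyond this
  flush_interval = "10s"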

Alloy:

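loki.write "primary" {
  endpoint {
    url = "http://primary.example.com:3100/loki/api/v1/push"

    // Queue settings live in queue_config; capacity is a size in bytes
    queue_config {
      capacity      = "10MiB"
      drain_timeout = "1m"  // time allowed to drain the queue on shutdown
    }
  }

  // Assumes an Alloy release with the write-ahead log available (experimental);
  // the WAL persists entries to disk so they survive an endpoint outage
  wal {
    enabled = true
  }
}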
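
The trade-off behind a bounded buffer is easier to see in a small model. The sketch below assumes the policy Telegraf uses, where the oldest metrics are dropped once the limit is reached and a batch leaves the buffer only after a successful write. `MetricBuffer` is a hypothetical class for illustration, not part of either collector.

```python
from collections import deque
from typing import Callable


class MetricBuffer:
    """Bounded FIFO buffer: when full, the oldest metric is dropped."""

    def __init__(self, limit: int) -> None:
        self.items: deque = deque(maxlen=limit)
        self.dropped = 0

    def add(self, metric) -> None:
        if len(self.items) == self.items.maxlen:
            self.dropped += 1  # the append below evicts the oldest entry
        self.items.append(metric)

    def flush(self, batch_size: int, write: Callable[[list], bool]) -> int:
        """Attempt one batch; metrics leave the buffer only if write succeeds."""
        batch = list(self.items)[:batch_size]
        if batch and write(batch):
            for _ in batch:
                self.items.popleft()
            return len(batch)
        return 0


buf = MetricBuffer(limit=5)
for i in range(8):  # endpoint is down while metrics keep arriving
    buf.add(i)
print(buf.dropped, list(buf.items))       # 3 [3, 4, 5, 6, 7]
print(buf.flush(2, lambda batch: False))  # 0 (write failed, nothing removed)
print(buf.flush(2, lambda batch: True))   # 2 (metrics 3 and 4 written)
```

A rough sizing rule follows from this: the buffer limit divided by the number of metrics collected per second gives the outage duration you can ride out without loss.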

Load Balancing

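A loki.write component with several endpoint blocks does not balance load: every log entry is sent to all of the listed endpoints. To distribute load across multiple servers, send to a single address served by a load balancer or gateway in front of the cluster:

// lb.example.com is a placeholder for a load balancer in front of
// loki1.example.com, loki2.example.com, and loki3.example.com
loki.write "loki_cluster" {
  endpoint {
    url = "http://lb.example.com:3100/loki/api/v1/push"
  }
}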

Cost Optimization

Tiered Storage

Send full data to short-term storage, sampled data to long-term:

telegraf_outputs:
  # High-resolution to local (7 days)
  - name: "local_shortterm"
    url: "http://local.example.com:8086"
    bucket: "telegraf_7d"

  # Sampled data to S3-backed (365 days)
  - name: "s3_longterm"
    url: "http://archive.example.com:8086"
    bucket: "telegraf_365d"
    aggregation_interval: "5m"  # Sample every 5 minutes
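
What reducing data to one point per five minutes means in practice can be sketched as follows. How the configuration above implements its interval is not shown in this document, so this example assumes a plain mean per window; `downsample` is a hypothetical helper and the sample points are made up.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds=300):
    """Average (timestamp, value) points into fixed windows keyed by window start."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


points = [(0, 10.0), (60, 20.0), (299, 30.0), (300, 40.0), (540, 60.0)]
print(downsample(points))  # [(0, 20.0), (300, 50.0)]
```

With a 10-second collection interval (an assumed figure), a 5-minute window turns 30 points into one, which is where the long-term storage saving comes from.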

Monitoring Multi-Destination Health

Telegraf Metrics

Monitor output health via Telegraf's internal metrics (this requires the inputs.internal plugin to be enabled):

from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "internal_write")
  |> filter(fn: (r) => r["_field"] == "errors")

Alloy Metrics

Check Alloy's write metrics:

curl -s http://127.0.0.1:12345/metrics | grep loki_write
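
The output of that endpoint is plain Prometheus text, so a per-destination check is easy to script. The sketch below parses a hard-coded sample rather than calling the endpoint; the metric names and labels in `SAMPLE` are illustrative and may differ from what your Alloy version exposes, and `parse_samples` and `dropping` are hypothetical helpers.

```python
SAMPLE = """\
# HELP loki_write_sent_entries_total Number of log entries sent.
# TYPE loki_write_sent_entries_total counter
loki_write_sent_entries_total{host="primary.example.com:3100"} 1200
loki_write_sent_entries_total{host="backup.example.com:3100"} 1180
loki_write_dropped_entries_total{host="primary.example.com:3100"} 0
loki_write_dropped_entries_total{host="backup.example.com:3100"} 20
"""


def parse_samples(text: str) -> dict[str, float]:
    """Map each loki_write series (name plus labels) to its value."""
    samples = {}
    for line in text.splitlines():
        if line.startswith("loki_write"):  # skips # HELP / # TYPE lines
            series, _, value = line.rpartition(" ")
            samples[series] = float(value)
    return samples


def dropping(samples: dict[str, float]) -> list[str]:
    """Return the series that report dropped entries."""
    return sorted(s for s, v in samples.items() if "dropped" in s and v > 0)


print(dropping(parse_samples(SAMPLE)))
```

The parser assumes each sample line ends with its value, with no trailing timestamp.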

Best Practices

  1. Test individually first: Verify each destination works before combining
  2. Monitor write failures: Alert on output errors
  3. Set appropriate timeouts: Don't block on slow destinations
  4. Use buffering: Prevent data loss during outages
  5. Document routing rules: Keep clear notes on what goes where
  6. Consider costs: Evaluate bandwidth and storage costs
  7. Security per destination: Different credentials for each endpoint
  8. Avoid circular routing: Don't create feedback loops

Troubleshooting

One Destination Failing

  1. Check network connectivity to the failing endpoint
  2. Verify authentication credentials
  3. Review endpoint logs for errors
  4. Check whether the destination has capacity

All Destinations Failing

  1. Check client network connectivity
  2. Verify client services are running
  3. Review client logs for errors
  4. Check client resource usage (CPU, memory, disk)

Data Inconsistency

  1. Verify that all destinations receive the same data
  2. Check for selective routing rules
  3. Review buffer and retry settings
  4. Ensure time synchronization across systems
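
The first check in the list above can be automated by comparing record counts per destination over the same time range. This is a minimal sketch: how you obtain the counts (for example, a count query against each server) is left out, and `find_gaps`, the 1% tolerance, and the sample numbers are all assumptions.

```python
def find_gaps(counts: dict[str, int], tolerance: float = 0.01) -> dict[str, float]:
    """Return destinations whose count falls short of the largest count by more than tolerance."""
    reference = max(counts.values())
    if reference == 0:
        return {}
    gaps = {}
    for name, count in counts.items():
        shortfall = (reference - count) / reference
        if shortfall > tolerance:
            gaps[name] = shortfall
    return gaps


counts = {"primary": 1200, "backup": 1180}  # records seen in the same one-hour window
for name, shortfall in find_gaps(counts).items():
    print(f"{name} is missing {shortfall:.1%} of records")
```

Small differences are expected while buffers drain or clocks differ slightly, which is why the comparison uses a tolerance instead of requiring equal counts.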