Label Strategies

Overview

Labels (tags in InfluxDB, labels in Loki) are key-value pairs that provide metadata about metrics and logs. Effective labeling is critical for filtering, aggregating, and organizing observability data.

Why Labels Matter

  1. Filtering: Query specific hosts, services, or environments
  2. Aggregation: Group data for analysis
  3. Organization: Logical separation of data streams
  4. Cardinality: Each unique label combination creates a separate series or stream, so too many combinations cause performance issues

Label Design Principles

1. Low Cardinality

Problem: Each unique label combination creates a new series/stream

Bad (high cardinality):

labels:
  user_id: "12345"        # Thousands of unique users
  request_id: "abc-123"   # Every request is unique
  timestamp: "2024-01-01T12:00:00Z" # Every second is unique

Good (low cardinality):

labels:
  environment: "production"  # Only 3-4 values
  service: "api"             # Limited number of services
  region: "us-east"          # Fixed set of regions
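
To see why one high-cardinality label matters, the worst-case number of series can be estimated as the product of the distinct values of each label. The sketch below is illustrative only: the value counts are assumptions, and `max_series` is a hypothetical helper, not part of Telegraf or Alloy.

```python
from math import prod


def max_series(label_values: dict[str, int]) -> int:
    """Upper bound on series/streams: product of distinct values per label."""
    return prod(label_values.values())


# Assumed value counts for the "good" labels above
low = {"environment": 4, "service": 20, "region": 5}
# The same set with a single high-cardinality label added
high = {**low, "user_id": 10_000}

print(max_series(low))   # 400
print(max_series(high))  # 4000000
```

Real deployments rarely hit the full product, since not every combination occurs, but the bound shows how a single unbounded label multiplies everything else.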

2. Consistent Naming

Use consistent label names across all data sources:

Good:

# Telegraf
telegraf_global_tags:
  hostname: "server1"
  environment: "production"

# Alloy
alloy_global_labels:
  hostname: "server1"
  environment: "production"

Bad (inconsistent):

# Telegraf uses "host"
telegraf_global_tags:
  host: "server1"

# Alloy uses "hostname"
alloy_global_labels:
  hostname: "server1"
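
A mismatch like the one above can be caught before deployment by comparing the label keys of each source. This is a minimal sketch, assuming the tag and label dictionaries have already been loaded from your configuration; `inconsistent_keys` is a hypothetical helper.

```python
def inconsistent_keys(sources: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each label key to the sources that do not define it."""
    all_keys = set().union(*(labels.keys() for labels in sources.values()))
    missing = {}
    for key in sorted(all_keys):
        absent = sorted(name for name, labels in sources.items() if key not in labels)
        if absent:
            missing[key] = absent
    return missing


sources = {
    "telegraf": {"host": "server1", "environment": "production"},
    "alloy": {"hostname": "server1", "environment": "production"},
}
print(inconsistent_keys(sources))
# {'host': ['alloy'], 'hostname': ['telegraf']}
```

An empty result means every source defines the same set of label names.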

3. Descriptive Values

Use clear, descriptive values:

Good: environment="production"
Bad: env="prod"

Good: service_type="fail2ban"
Bad: type="f2b"

4. Hierarchical Structure

Organize labels hierarchically:

labels:
  datacenter: "us-east"
  cluster: "prod-cluster-1"
  hostname: "server01"
  service: "api"
  component: "auth"

Enables queries like:

  - All of datacenter="us-east"
  - All of cluster="prod-cluster-1"
  - Specific hostname="server01"

Standard Label Sets

Telegraf (Metrics)

Global tags (applied to all metrics):

telegraf_global_tags:
  hostname: "{{ ansible_hostname }}"
  environment: "production"
  datacenter: "dc1"
  role: "webserver"

Automatic tags (added by plugins):

  - cpu: CPU identifier (cpu0, cpu1, cpu-total)
  - device: Disk device name
  - interface: Network interface name
  - path: Filesystem mount point

Alloy (Logs)

Global labels (applied to all logs):

alloy_global_labels:
  hostname: "{{ ansible_hostname }}"
  environment: "production"

Source-specific labels:

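// forward_to is required; loki.write.default is an assumed component name
loki.source.journal "fail2ban" {
  forward_to = [loki.write.default.receiver]
  labels = {
    service_type = "fail2ban",
    hostname = env("HOSTNAME"),
  }
}

loki.source.file "apache" {
  targets = [{
    __path__ = "/var/log/apache2/access.log",
    service_type = "web",
    log_type = "access",
  }]
  forward_to = [loki.write.default.receiver]
}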

Infrastructure Labels

Applied to all metrics/logs from a host:

hostname: "server01"           # Unique host identifier
environment: "production"      # prod, staging, dev
datacenter: "us-east-1"       # Physical/cloud location
cluster: "prod-api"           # Logical grouping

Service Labels

Applied to application-specific data:

service_type: "web"           # web, db, cache, queue
service_name: "apache"        # Specific service
application: "myapp"          # Application name
component: "api"              # Application component

Operational Labels

Context about data collection:

collector: "telegraf"         # Data collection agent
version: "1.28"              # Agent/app version

Label Examples by Use Case

Web Server Monitoring

Metrics (Telegraf):

telegraf_global_tags:
  hostname: "web01"
  environment: "production"
  role: "webserver"
  application: "frontend"

Logs (Alloy):

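loki.source.file "apache_access" {
  targets = [{
    __path__ = "/var/log/apache2/access.log",
    service_type = "web",
    log_type = "access",
    hostname = env("HOSTNAME"),
  }]
  // forward_to is required; loki.write.default is an assumed component name
  forward_to = [loki.write.default.receiver]
}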

Database Server Monitoring

telegraf_global_tags:
  hostname: "db01"
  environment: "production"
  role: "database"
  db_type: "postgresql"
  db_cluster: "main"

Multi-Tenant Application

# Labels
hostname: "app01"
environment: "production"
service: "api"

# Use fields (not labels) for high-cardinality data
# Store tenant_id, user_id as fields in the data, not labels
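
One way to enforce this split is an allowlist of label names, with everything else routed to fields. The sketch below assumes records arrive as flat dictionaries; `ALLOWED_LABELS` and `split_record` are hypothetical names chosen for illustration.

```python
# Assumption: only these keys are low-cardinality enough to be labels
ALLOWED_LABELS = {"hostname", "environment", "service"}


def split_record(record: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (labels, fields); keys outside the allowlist become fields."""
    labels = {k: v for k, v in record.items() if k in ALLOWED_LABELS}
    fields = {k: v for k, v in record.items() if k not in ALLOWED_LABELS}
    return labels, fields


labels, fields = split_record({
    "hostname": "app01",
    "environment": "production",
    "service": "api",
    "tenant_id": "t-42",
    "user_id": "12345",
})
print(labels)  # {'hostname': 'app01', 'environment': 'production', 'service': 'api'}
print(fields)  # {'tenant_id': 't-42', 'user_id': '12345'}
```

An allowlist fails safe: a new, unreviewed key ends up as a field rather than silently creating new series.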

Querying by Labels

InfluxDB (Flux)

from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r["environment"] == "production")
  |> filter(fn: (r) => r["role"] == "webserver")
  |> filter(fn: (r) => r["_measurement"] == "cpu")

Loki (LogQL)

{environment="production", service_type="web"}

AND operator (implicit):

{hostname="server01", service_type="fail2ban"}

OR (regex alternation; stream selectors cannot be combined with an or operator):

{service_type=~"web|api"}

Regex matching:

{hostname=~"web.*"}  # Matches web01, web02, etc.

Cardinality Management

Checking Cardinality

InfluxDB:

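// Count unique series in the bucket over the last 7 days
// (group() |> count() would count points, not series)
import "influxdata/influxdb"

influxdb.cardinality(bucket: "telegraf", start: -7d)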

Loki:

# List all label names
curl http://localhost:3100/loki/api/v1/labels

# List the values of one label (the number of values is that label's cardinality)
curl http://localhost:3100/loki/api/v1/label/hostname/values
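
The two Loki endpoints above can be combined into a per-label report. This is a rough sketch, assuming Loki is reachable without authentication and returns the usual JSON body with a `data` list; `fetch_json` and `label_cardinality` are hypothetical helpers, and the `fetch` parameter exists only so the logic can be exercised without a live server.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON response body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def label_cardinality(base_url: str, fetch=fetch_json) -> dict[str, int]:
    """Count distinct values per label, highest first."""
    names = fetch(f"{base_url}/loki/api/v1/labels").get("data", [])
    counts = {
        name: len(fetch(f"{base_url}/loki/api/v1/label/{name}/values").get("data", []))
        for name in names
    }
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))


# Usage against a local instance (requires a running Loki):
# for name, count in label_cardinality("http://localhost:3100").items():
#     print(f"{name}: {count}")
```

Labels at the top of the report are the first candidates for removal or bucketing. Note that this counts values per label, not the number of streams, which depends on how the values combine.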

Reducing Cardinality

Problem: Too many unique label combinations

Solutions:

  1. Use fields instead of labels for high-cardinality data
  2. Aggregate before labeling: Group similar values
  3. Drop unnecessary labels: Remove labels you don't query
  4. Use relabeling: Transform labels to reduce uniqueness

Example - Bad (high cardinality):

labels:
  client_ip: "192.168.1.100"  # Every IP is unique

Example - Good (low cardinality):

labels:
  client_network: "internal"  # Only internal/external
# Store actual IP in log message, not label
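
The bucketing in the example above could be computed as follows. This sketch assumes that "internal" means a private address as defined by Python's `ipaddress` module (which also covers loopback and link-local ranges); `client_network` is a hypothetical helper, and your own network boundaries may differ.

```python
import ipaddress


def client_network(ip: str) -> str:
    """Collapse an IP address into a two-value label."""
    return "internal" if ipaddress.ip_address(ip).is_private else "external"


print(client_network("192.168.1.100"))  # internal
print(client_network("8.8.8.8"))        # external
```

The label now has two possible values regardless of how many clients connect, while the full address stays available in the log line.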

Label Migration

If you need to change labels:

  1. Add new labels alongside old ones (transition period)
  2. Update queries to use new labels
  3. Remove old labels after transition

Example:

# Phase 1: Add new label
labels:
  host: "server01"        # Old
  hostname: "server01"    # New

# Phase 2: Update all queries to use "hostname"

# Phase 3: Remove "host" label
labels:
  hostname: "server01"    # Only new label
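
The phases above amount to a rename with an optional overlap period. A minimal sketch, assuming labels are handled as plain dictionaries somewhere in your pipeline; `migrate_labels` is a hypothetical helper, not a feature of Telegraf or Alloy.

```python
def migrate_labels(
    labels: dict[str, str], renames: dict[str, str], keep_old: bool
) -> dict[str, str]:
    """Copy each old label to its new name; drop the old one unless keep_old."""
    out = dict(labels)
    for old, new in renames.items():
        if old in out:
            out.setdefault(new, out[old])  # never overwrite an existing new label
            if not keep_old:
                del out[old]
    return out


renames = {"host": "hostname"}
phase1 = migrate_labels({"host": "server01"}, renames, keep_old=True)
phase3 = migrate_labels({"host": "server01"}, renames, keep_old=False)
print(phase1)  # {'host': 'server01', 'hostname': 'server01'}
print(phase3)  # {'hostname': 'server01'}
```

Keeping both labels during the transition does not raise cardinality, because the old and new labels always carry the same value.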

Best Practices Summary

  1. Keep cardinality low: < 10 values per label is ideal
  2. Use consistent names: Same label names across tools
  3. Be descriptive: Clear, meaningful values
  4. Plan hierarchy: Organize labels logically
  5. Avoid timestamps: Never use time as a label
  6. Avoid UUIDs: Use fields for unique identifiers
  7. Document labels: Maintain label documentation
  8. Review regularly: Audit labels for unused/redundant ones

Reference Deployment

See Reference Deployments for labeling strategy in practice:

  - monitor11.example.com and ispconfig3.example.com label examples