Label Strategies
Overview
Labels (tags in InfluxDB, labels in Loki) are key-value pairs that provide metadata about metrics and logs. Effective labeling is critical for filtering, aggregating, and organizing observability data.
Why Labels Matter
- Filtering: Query specific hosts, services, or environments
- Aggregation: Group data for analysis
- Organization: Logical separation of data streams
- Cardinality: Every unique label combination creates a separate series or stream, so too many combinations cause performance issues
Label Design Principles
1. Low Cardinality
Problem: Each unique label combination creates a new series/stream
Bad (high cardinality):
labels:
  user_id: "12345"                   # Thousands of unique users
  request_id: "abc-123"              # Every request is unique
  timestamp: "2024-01-01T00:00:00Z"  # A new value every second
Good (low cardinality):
labels:
  environment: "production"  # Only 3-4 values
  service: "api"             # Limited number of services
  region: "us-east"          # Fixed set of regions
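The cost of one high-cardinality label can be estimated with simple arithmetic: in the worst case, the number of series or streams is the product of the distinct-value counts of every label. The following is a minimal Python sketch of that calculation. The function name `max_series` and all counts are hypothetical and chosen only for illustration.

```python
from math import prod


def max_series(distinct_values: dict[str, int]) -> int:
    """Worst-case series count: the product of per-label distinct-value counts."""
    return prod(distinct_values.values())


# Illustrative counts only, not measurements from a real deployment
low = {"environment": 3, "service": 5, "region": 4}
high = {**low, "user_id": 10_000}

print(max_series(low))   # 60
print(max_series(high))  # 600000
```

Real deployments rarely hit the full product because not every combination occurs, but the upper bound shows how a single unbounded label multiplies everything else.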
2. Consistent Naming
Use consistent label names across all data sources:
Good:
# Telegraf
telegraf_global_tags:
  hostname: "server1"
  environment: "production"
# Alloy
alloy_global_labels:
  hostname: "server1"
  environment: "production"
Bad (inconsistent):
# Telegraf uses "host"
telegraf_global_tags:
  host: "server1"
# Alloy uses "hostname"
alloy_global_labels:
  hostname: "server1"
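Where inconsistent names already exist, one option is to map them onto a single canonical name before the data is shipped or compared. The sketch below shows the idea in Python. The alias table and the function name `normalize_labels` are assumptions made for this example, not part of Telegraf or Alloy.

```python
# Hypothetical alias table: tool-specific name -> canonical name
ALIASES = {"host": "hostname", "env": "environment"}


def normalize_labels(labels: dict[str, str]) -> dict[str, str]:
    """Rename known aliases to their canonical label name."""
    normalized: dict[str, str] = {}
    for name, value in labels.items():
        canonical = ALIASES.get(name, name)
        # Refuse to merge two names that disagree about the value
        if canonical in normalized and normalized[canonical] != value:
            raise ValueError(f"conflicting values for label {canonical!r}")
        normalized[canonical] = value
    return normalized


telegraf_tags = {"host": "server1", "environment": "production"}
alloy_labels = {"hostname": "server1", "environment": "production"}
print(normalize_labels(telegraf_tags) == normalize_labels(alloy_labels))  # True
```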
3. Descriptive Values
Use clear, descriptive label names and values:
Good: environment="production"
Bad: env="prod"
Good: service_type="fail2ban"
Bad: type="f2b"
4. Hierarchical Structure
Organize labels hierarchically:
labels:
  datacenter: "us-east"
  cluster: "prod-cluster-1"
  host: "server01"
  service: "api"
  component: "auth"
Enables queries like:
- All of datacenter="us-east"
- All of cluster="prod-cluster-1"
- Specific host="server01"
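The effect of a hierarchy is that the same filter works at any level, from a whole datacenter down to one host. A small Python sketch of that selection logic follows. The `select` helper and the second label set are hypothetical and exist only to make the example runnable.

```python
def select(series: list[dict[str, str]], **wanted: str) -> list[dict[str, str]]:
    """Return the label sets that match every requested label value."""
    return [s for s in series if all(s.get(k) == v for k, v in wanted.items())]


# Illustrative label sets; the second one is made up for contrast
series = [
    {"datacenter": "us-east", "cluster": "prod-cluster-1", "host": "server01"},
    {"datacenter": "us-east", "cluster": "prod-cluster-2", "host": "server02"},
]

print(len(select(series, datacenter="us-east")))       # 2
print(len(select(series, cluster="prod-cluster-1")))   # 1
print(select(series, host="server01")[0]["cluster"])   # prod-cluster-1
```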
Standard Label Sets
Telegraf (Metrics)
Global tags (applied to all metrics):
telegraf_global_tags:
  hostname: "{{ ansible_hostname }}"
  environment: "production"
  datacenter: "dc1"
  role: "webserver"
Automatic tags (added by plugins):
- cpu: CPU identifier (cpu0, cpu1, cpu-total)
- device: Disk device name
- interface: Network interface name
- path: Filesystem mount point
Alloy (Logs)
Global labels (applied to all logs):
Source-specific labels:
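loki.source.journal "fail2ban" {
  labels = {
    service_type = "fail2ban",
    hostname     = env("HOSTNAME"),
  }
  // Required argument; assumes a loki.write "default" component is defined elsewhere
  forward_to = [loki.write.default.receiver]
}
loki.source.file "apache" {
  targets = [{
    __path__     = "/var/log/apache2/access.log",
    service_type = "web",
    log_type     = "access",
  }]
  // Required argument; assumes a loki.write "default" component is defined elsewhere
  forward_to = [loki.write.default.receiver]
}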
Recommended Label Strategy
Infrastructure Labels
Applied to all metrics/logs from a host:
hostname: "server01" # Unique host identifier
environment: "production" # prod, staging, dev
datacenter: "us-east-1" # Physical/cloud location
cluster: "prod-api" # Logical grouping
Service Labels
Applied to application-specific data:
service_type: "web" # web, db, cache, queue
service_name: "apache" # Specific service
application: "myapp" # Application name
component: "api" # Application component
Operational Labels
Context about data collection:
Label Examples by Use Case
Web Server Monitoring
Metrics (Telegraf):
telegraf_global_tags:
  hostname: "web01"
  environment: "production"
  role: "webserver"
  application: "frontend"
Logs (Alloy):
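loki.source.file "apache_access" {
  targets = [{
    __path__     = "/var/log/apache2/access.log",
    service_type = "web",
    log_type     = "access",
    hostname     = env("HOSTNAME"),
  }]
  // Required argument; assumes a loki.write "default" component is defined elsewhere
  forward_to = [loki.write.default.receiver]
}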
Database Server Monitoring
telegraf_global_tags:
  hostname: "db01"
  environment: "production"
  role: "database"
  db_type: "postgresql"
  db_cluster: "main"
Multi-Tenant Application
# Labels
hostname: "app01"
environment: "production"
service: "api"
# Use fields (not labels) for high-cardinality data
# Store tenant_id, user_id as fields in the data, not labels
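One way to enforce this split is an allowlist: only keys on the list become labels, and everything else is kept as a field. The Python sketch below illustrates the idea. `LABEL_KEYS`, `split_record`, and the sample identifier values are assumptions for this example.

```python
# Assumed allowlist of low-cardinality keys that are allowed to become labels
LABEL_KEYS = {"hostname", "environment", "service"}


def split_record(record: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split a record into (labels, fields) using the allowlist."""
    labels = {k: v for k, v in record.items() if k in LABEL_KEYS}
    fields = {k: v for k, v in record.items() if k not in LABEL_KEYS}
    return labels, fields


record = {
    "hostname": "app01",
    "environment": "production",
    "service": "api",
    "tenant_id": "t-42",   # hypothetical value
    "user_id": "12345",
}
labels, fields = split_record(record)
print(sorted(labels))  # ['environment', 'hostname', 'service']
print(sorted(fields))  # ['tenant_id', 'user_id']
```

An allowlist fails safe: a new, unreviewed key ends up as a field rather than silently adding a label dimension.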
Querying by Labels
InfluxDB (Flux)
from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r["environment"] == "production")
  |> filter(fn: (r) => r["role"] == "webserver")
  |> filter(fn: (r) => r["_measurement"] == "cpu")
Loki (LogQL)
AND operator (implicit):
OR operator:
Regex matching:
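The three matcher styles named above can be sketched by building LogQL stream selectors in Python. In LogQL, comma-separated matchers inside one selector are combined with AND, an OR over a single label is usually written as a regex alternation with `=~`, and regex matchers are fully anchored. The helper names below are hypothetical, the label values are taken from examples elsewhere on this page, and the helpers assume values contain no quotes or regex metacharacters.

```python
def selector(**matchers: str) -> str:
    """AND: equality matchers joined by commas in one stream selector."""
    body = ", ".join(f'{name}="{value}"' for name, value in matchers.items())
    return "{" + body + "}"


def any_of(label: str, values: list[str]) -> str:
    """OR over one label, written as a regex alternation."""
    return "{" + f'{label}=~"' + "|".join(values) + '"}'


def regex(label: str, pattern: str) -> str:
    """Regex match on one label; LogQL anchors the pattern at both ends."""
    return "{" + f'{label}=~"{pattern}"' + "}"


print(selector(environment="production", service_type="web"))
# {environment="production", service_type="web"}
print(any_of("service_type", ["web", "db"]))
# {service_type=~"web|db"}
print(regex("hostname", "web.*"))
# {hostname=~"web.*"}
```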
Cardinality Management
Checking Cardinality
InfluxDB:
Loki:
# List the label names Loki knows about
curl http://localhost:3100/loki/api/v1/labels
# List the values of one label; the number of values is that label's cardinality
curl http://localhost:3100/loki/api/v1/label/hostname/values
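The two endpoints above can be combined into a per-label count. The following Python sketch assumes the usual Loki response shape, a JSON object whose `data` key holds a list of strings, and the same `localhost:3100` address as the curl commands. The function names are hypothetical, and `fetch` is injectable so the counting logic can be exercised without a running server.

```python
import json
from urllib.request import urlopen

LOKI_URL = "http://localhost:3100"  # same address as the curl commands


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def label_cardinality(fetch=fetch_json, base_url: str = LOKI_URL) -> dict[str, int]:
    """Map each label name to its number of distinct values."""
    names = fetch(f"{base_url}/loki/api/v1/labels").get("data") or []
    return {
        name: len(fetch(f"{base_url}/loki/api/v1/label/{name}/values").get("data") or [])
        for name in names
    }
```

Calling `label_cardinality()` with no arguments queries the live server; sorting the result by count puts the labels most worth reviewing at the top.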
Reducing Cardinality
Problem: Too many unique label combinations
Solutions:
1. Use fields instead of labels for high-cardinality data
2. Aggregate before labeling: Group similar values
3. Drop unnecessary labels: Remove labels you don't query
4. Use relabeling: Transform labels to reduce uniqueness
Example - Bad (high cardinality):
Example - Good (low cardinality):
labels:
  client_network: "internal"  # Only internal/external
# Store actual IP in log message, not label
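Collapsing an address into `internal` or `external` is an instance of aggregating before labeling. A minimal Python sketch using the standard `ipaddress` module is shown below. Treating `is_private` as the definition of "internal" is an assumption made here (it also covers loopback and link-local ranges), and the function name `client_network` is chosen to match the label above.

```python
import ipaddress


def client_network(ip: str) -> str:
    """Collapse an IP address into one of two label values."""
    return "internal" if ipaddress.ip_address(ip).is_private else "external"


print(client_network("10.0.0.5"))  # internal
print(client_network("8.8.8.8"))   # external
```

The label now has two possible values regardless of how many clients connect, while the full address remains available in the log line itself.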
Label Migration
If you need to change labels:
- Add new labels alongside old ones (transition period)
- Update queries to use new labels
- Remove old labels after transition
Example:
# Phase 1: Add new label
labels:
  host: "server01"      # Old
  hostname: "server01"  # New
# Phase 2: Update all queries to use "hostname"
# Phase 3: Remove "host" label
labels:
  hostname: "server01"  # Only new label
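The same phases can be expressed as a small transformation applied to each label set. This Python sketch is illustrative only; `migrate_labels` and its `keep_old` flag are hypothetical names, not a feature of Telegraf or Alloy.

```python
def migrate_labels(labels: dict[str, str], old: str, new: str, keep_old: bool) -> dict[str, str]:
    """Copy the old label's value to the new name; optionally drop the old label."""
    migrated = dict(labels)
    if old in migrated:
        # Do not overwrite a value that was already set under the new name
        migrated.setdefault(new, migrated[old])
        if not keep_old:
            del migrated[old]
    return migrated


phase1 = migrate_labels({"host": "server01"}, "host", "hostname", keep_old=True)
phase3 = migrate_labels(phase1, "host", "hostname", keep_old=False)
print(phase1)  # {'host': 'server01', 'hostname': 'server01'}
print(phase3)  # {'hostname': 'server01'}
```

Note that during the transition both labels carry the same value, so the number of series or streams does not grow.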
Best Practices Summary
- Keep cardinality low: Fewer than 10 values per label is ideal; identifiers such as hostname may exceed this as long as the set stays bounded
- Use consistent names: Same label names across tools
- Be descriptive: Clear, meaningful values
- Plan hierarchy: Organize labels logically
- Avoid timestamps: Never use time as a label
- Avoid UUIDs: Use fields for unique identifiers
- Document labels: Maintain label documentation
- Review regularly: Audit labels for unused/redundant ones
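Some of these rules can be checked mechanically during a label audit. The sketch below flags label names from a denylist and values that parse as UUIDs. The denylist contents and the function names are assumptions for illustration, and a real audit would also look at the value counts reported by the backend.

```python
import uuid

# Assumed denylist of label names that are almost always high-cardinality
FORBIDDEN_NAMES = {"timestamp", "time", "request_id", "user_id"}


def is_uuid(value: str) -> bool:
    """True if the value parses as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def lint_labels(labels: dict[str, str]) -> list[str]:
    """Return one message per label that breaks a rule."""
    problems = []
    for name, value in labels.items():
        if name in FORBIDDEN_NAMES:
            problems.append(f"{name}: high-cardinality label name")
        elif is_uuid(value):
            problems.append(f"{name}: value looks like a UUID")
    return problems


print(lint_labels({"environment": "production", "timestamp": "2024-01-01"}))
# ['timestamp: high-cardinality label name']
```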
Reference Deployment
See Reference Deployments for labeling strategy in practice:
- monitor11.example.com and ispconfig3.example.com label examples