
Retention Policies

Overview

Retention policies define how long data is kept before being deleted. Proper retention configuration balances storage costs, compliance requirements, and data availability.

Why Retention Matters

  1. Cost Control: Limit storage growth
  2. Compliance: Meet regulatory requirements
  3. Performance: Smaller datasets query faster
  4. Capacity Planning: Predictable storage needs

InfluxDB Retention

Bucket-Level Retention

Each InfluxDB bucket has its own retention policy:

# One bucket per resolution, each with its own retention period
influxdb_buckets:
  - name: "telegraf_hourly"
    retention: "7d"
    description: "High-resolution metrics"

  - name: "telegraf_daily"
    retention: "90d"
    description: "Daily aggregates"

  - name: "telegraf_monthly"
    retention: "365d"
    description: "Monthly summaries"

Setting Retention via Role

- role: jackaltx.solti_monitoring.influxdb
  vars:
    influxdb_bucket: "telegraf"
    influxdb_retention: "30d"

Updating Retention

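Change the retention period of an existing bucket. influx bucket update identifies the bucket by ID (its --name flag renames the bucket), so look up the ID first:

# Find the bucket ID
influx bucket list --name telegraf

# Set a 90-day retention period
influx bucket update \
  --id BUCKET_ID \
  --retention 90d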

Via the HTTP API, where everySeconds is the retention period in seconds (7776000 = 90 days):

curl -X PATCH "http://localhost:8086/api/v2/buckets/BUCKET_ID" \
  -H "Authorization: Token YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"retentionRules": [{"type": "expire", "everySeconds": 7776000}]}'
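The everySeconds value is simply the retention period converted to seconds (90 × 86400 = 7776000). A minimal Python sketch of that conversion follows. It assumes durations are written with s, m, h, d, or w suffixes, as in the examples on this page. retention_to_seconds and expire_rule are hypothetical helpers and are not part of the role or the influx CLI:

```python
import json
import re

# Seconds per unit; days and weeks are treated as fixed lengths (86400 s per day).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
TOKEN = re.compile(r"(\d+)([smhdw])")


def retention_to_seconds(duration: str) -> int:
    """Convert a duration such as "90d" or "1h30m" to seconds."""
    duration = duration.strip()
    if duration == "0":
        return 0  # "0" is this page's notation for infinite retention
    tokens = TOKEN.findall(duration)
    # Reject anything with leftover characters, such as "90 days" or "1y".
    if not tokens or "".join(n + u for n, u in tokens) != duration:
        raise ValueError(f"unsupported duration: {duration!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in tokens)


def expire_rule(duration: str) -> dict:
    """Build the JSON body for PATCH /api/v2/buckets/BUCKET_ID."""
    return {"retentionRules": [
        {"type": "expire", "everySeconds": retention_to_seconds(duration)}
    ]}


if __name__ == "__main__":
    print(json.dumps(expire_rule("90d")))
```

When run as a script, it prints the same JSON body that the curl example above sends.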

Infinite Retention

influxdb_retention: "0"  # Keep forever (not recommended)

Warning: Infinite retention leads to unbounded storage growth.

Loki Retention

Global Retention

Configure retention for all log streams:

- role: jackaltx.solti_monitoring.loki
  vars:
    loki_retention: "30d"

Stream-Level Retention

Configure different retention per log type (via labels):

loki_retention_config:
  - selector: '{service_type="audit"}'
    retention: "365d"      # Keep audit logs for 1 year

  - selector: '{service_type="debug"}'
    retention: "7d"        # Keep debug logs for 1 week

  - selector: '{service_type="application"}'
    retention: "30d"       # Keep application logs for 1 month
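The following is a simplified Python model of how a stream presumably picks up its retention from rules like these. The first rule whose label matchers all match wins, and a stream that matches none of them falls back to the global loki_retention value. Treat it as an illustration only. It handles equality matchers only, and Loki itself resolves overlapping selectors with a per-rule priority rather than list order. RULES, GLOBAL_RETENTION, and retention_for are hypothetical names:

```python
import re

# Hypothetical rules mirroring loki_retention_config above: (selector, retention).
RULES = [
    ('{service_type="audit"}', "365d"),
    ('{service_type="debug"}', "7d"),
    ('{service_type="application"}', "30d"),
]
GLOBAL_RETENTION = "30d"  # mirrors loki_retention

# Matches label="value" pairs only.
MATCHER = re.compile(r'(\w+)="([^"]*)"')


def parse_selector(selector: str) -> dict:
    """Turn '{a="x", b="y"}' into {"a": "x", "b": "y"}."""
    if any(op in selector for op in ("!=", "=~", "!~")):
        raise ValueError(f"only equality matchers are supported: {selector}")
    return dict(MATCHER.findall(selector))


def retention_for(labels: dict) -> str:
    """Return the retention of the first rule whose matchers all hold."""
    for selector, retention in RULES:
        required = parse_selector(selector)
        if all(labels.get(key) == value for key, value in required.items()):
            return retention
    return GLOBAL_RETENTION  # no stream-level rule matched
```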

Compaction and Deletion

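Retention is enforced by Loki's compactor (compactor setting retention_enabled: true). It:

  1. Marks chunks older than the retention period for deletion during each compaction run
  2. Waits for retention_delete_delay (default: 2h) after marking
  3. Deletes the marked chunks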

loki_retention_delete_delay: "2h"
loki_compaction_interval: "10m"
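Together, these settings mean data is removed some time after it crosses the retention boundary, not exactly at it. The Python sketch below gives a rough model under three assumptions: an entry becomes eligible once it is older than the retention period, it is marked at the next compaction run (at most one compaction interval later), and it is deleted once the delete delay has passed. The model ignores chunk boundaries and the time the deletion itself takes. deletion_window is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone


def deletion_window(entry_time: datetime, retention: timedelta,
                    delete_delay: timedelta = timedelta(hours=2),
                    compaction_interval: timedelta = timedelta(minutes=10)):
    """Approximate earliest and latest deletion time for one log entry."""
    eligible = entry_time + retention          # entry is now past retention
    earliest = eligible + delete_delay         # marked immediately
    latest = eligible + compaction_interval + delete_delay  # marked one run later
    return earliest, latest


if __name__ == "__main__":
    written = datetime(2024, 1, 1, tzinfo=timezone.utc)
    earliest, latest = deletion_window(written, timedelta(days=30))
    print(earliest.isoformat(), latest.isoformat())
```

For an entry written at midnight UTC on 2024-01-01, with 30-day retention and the defaults above, the script prints 2024-01-31T02:00:00+00:00 2024-01-31T02:10:00+00:00.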

Retention Strategy by Data Type

Metrics (InfluxDB)

System metrics (CPU, memory, disk):

  • Retention: 30-90 days
  • Rationale: Sufficient for troubleshooting and capacity planning

Application metrics (app-specific):

  • Retention: 30-90 days
  • Rationale: Correlate with logs and events

Business metrics (KPIs, analytics):

  • Retention: 365+ days
  • Rationale: Year-over-year comparison, trend analysis

Logs (Loki)

Application logs:

  • Retention: 30 days
  • Rationale: Recent troubleshooting

Security logs (fail2ban, auth):

  • Retention: 90-365 days
  • Rationale: Compliance, security audits

Audit logs (admin actions):

  • Retention: 365+ days
  • Rationale: Compliance requirements

Debug logs:

  • Retention: 7 days
  • Rationale: Temporary troubleshooting

Access logs (web, API):

  • Retention: 30 days
  • Rationale: Traffic analysis, debugging

Multi-Tier Retention

Downsampling Strategy

Keep detailed metrics short-term, aggregated metrics long-term:

  1. Tier 1 (Raw data): 7 days, 10-second intervals
  2. Tier 2 (5-minute aggregates): 30 days
  3. Tier 3 (1-hour aggregates): 365 days
  4. Tier 4 (Daily summaries): Forever

influxdb_buckets:
  - name: "telegraf_raw"
    retention: "7d"

  - name: "telegraf_5m"
    retention: "30d"

  - name: "telegraf_1h"
    retention: "365d"
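Point counts show why the longer tiers are cheap to keep. The short Python sketch below counts how many points one series holds in each bucket once the bucket is full. It uses the retention periods and intervals from the tiers above, where the 5-minute and 1-hour intervals are the aggregate windows. Point count is only a rough proxy for storage, because compression differs between tiers:

```python
from datetime import timedelta

# (bucket, retention, interval between points) for the tiers described above.
TIERS = [
    ("telegraf_raw", timedelta(days=7), timedelta(seconds=10)),
    ("telegraf_5m", timedelta(days=30), timedelta(minutes=5)),
    ("telegraf_1h", timedelta(days=365), timedelta(hours=1)),
]


def points_per_series(retention: timedelta, interval: timedelta) -> int:
    """Points one series holds once its bucket is full (steady state)."""
    return int(retention / interval)


for name, retention, interval in TIERS:
    print(f"{name}: {points_per_series(retention, interval):,} points per series")
```

It prints 60,480 points for telegraf_raw, 8,640 for telegraf_5m, and 8,760 for telegraf_1h. A full year of hourly data is therefore roughly one seventh the size of a single week of raw 10-second data.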

Create downsampling tasks:

option task = {name: "downsample_5m", every: 5m}

from(bucket: "telegraf_raw")
  |> range(start: -10m)
  |> aggregateWindow(every: 5m, fn: mean)
  |> to(bucket: "telegraf_5m")
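To see what each run of the task writes, here is a small Python model of the aggregateWindow(every: 5m, fn: mean) step for a single series. It is an illustration, not Flux. It assumes windows aligned to the Unix epoch and stamped with their stop time (aggregateWindow's default), and it skips empty windows, where Flux would emit nulls by default. aggregate_window_mean is a hypothetical name:

```python
from collections import defaultdict
from statistics import mean


def aggregate_window_mean(points, every: int = 300) -> list:
    """Model aggregateWindow(every: 5m, fn: mean) for one series.

    points: iterable of (unix_seconds, value) pairs.
    Each window covers [start, stop) and the result is stamped with stop.
    """
    windows = defaultdict(list)
    for ts, value in points:
        stop = ts - ts % every + every  # end of the window containing ts
        windows[stop].append(value)
    return sorted((stop, mean(values)) for stop, values in windows.items())
```

Because the task reads the last 10 minutes every 5 minutes, each window is aggregated twice. The second run produces a point with the same measurement, tags, field, and timestamp, so InfluxDB overwrites the earlier value instead of storing a duplicate. Raw points that arrive a few minutes late are still picked up by the second pass.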

Compliance Considerations

Regulatory Requirements

  • GDPR: Personal data retention limits
  • HIPAA: Healthcare data retention (6 years)
  • SOX: Financial data retention (7 years)
  • PCI-DSS: Payment card data retention limits

Implementing Compliance

  1. Identify regulated data: Which logs/metrics contain sensitive info
  2. Set appropriate retention: Match regulatory requirements
  3. Document policy: Maintain retention policy documentation
  4. Audit regularly: Verify retention settings
  5. Secure deletion: Ensure deleted data is unrecoverable

Example Compliance Configuration

# HIPAA-compliant retention
loki_retention_config:
  - selector: '{data_class="phi"}'
    retention: "2190d"  # 6 years

# GDPR-compliant retention
  - selector: '{data_class="pii"}'
    retention: "90d"    # Delete after purpose fulfilled
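The audit step in the list above lends itself to automation. The minimal Python sketch below checks (selector, retention) pairs like the example configuration against a policy table. The POLICY values and the audit function are hypothetical. The selector parsing handles only the single-label {data_class="..."} form, and retention must be written in whole days:

```python
import re

# Hypothetical policy table keyed by the data_class label.
# Replace these figures with the outcome of your own compliance review.
POLICY = {
    "phi": {"min_days": 2190},  # keep at least ~6 years
    "pii": {"max_days": 90},    # delete once the purpose is fulfilled
}

SELECTOR = re.compile(r'\{data_class="(\w+)"\}')
DAYS = re.compile(r"(\d+)d")


def audit(config: list) -> list:
    """Return policy violations for a list of (selector, retention) pairs."""
    problems = []
    for selector, retention in config:
        cls_match = SELECTOR.fullmatch(selector)
        days_match = DAYS.fullmatch(retention)
        if not cls_match or not days_match:
            problems.append(f"cannot audit {selector} with retention {retention}")
            continue
        data_class, days = cls_match.group(1), int(days_match.group(1))
        rule = POLICY.get(data_class, {})
        if "min_days" in rule and days < rule["min_days"]:
            problems.append(f"{data_class}: {days}d is below the {rule['min_days']}d minimum")
        if "max_days" in rule and days > rule["max_days"]:
            problems.append(f"{data_class}: {days}d exceeds the {rule['max_days']}d maximum")
    return problems


if __name__ == "__main__":
    # The example configuration from this page.
    example = [('{data_class="phi"}', "2190d"), ('{data_class="pii"}', "90d")]
    print(audit(example) or "no violations")
```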

Monitoring Retention

Check Current Retention

InfluxDB:

influx bucket list
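For scripted checks, the influx CLI can print its output as JSON with the --json flag. The minimal Python sketch below maps each bucket to its retention in seconds. It assumes the output is an array of bucket objects carrying name and retentionRules, as in the InfluxDB v2 bucket API, so verify this against your CLI version. retention_by_bucket and infinite_buckets are hypothetical helpers, and a bucket without an expire rule is treated as infinite retention:

```python
import json


def retention_by_bucket(bucket_json: str) -> dict:
    """Map bucket name to retention in seconds; 0 means data never expires."""
    result = {}
    for bucket in json.loads(bucket_json):
        expire = [r for r in bucket.get("retentionRules", [])
                  if r.get("type") == "expire"]
        # No expire rule, or everySeconds of 0, is treated here as infinite.
        result[bucket["name"]] = expire[0].get("everySeconds", 0) if expire else 0
    return result


def infinite_buckets(retention: dict) -> list:
    """Names of buckets that never expire data (see the warning above)."""
    return sorted(name for name, seconds in retention.items() if seconds == 0)
```

For example, pass it the stdout of subprocess.run(["influx", "bucket", "list", "--json"], capture_output=True, text=True) and review anything that infinite_buckets reports.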

Loki:

# Check config file
grep -A 5 retention /etc/loki/loki.yaml

Storage Growth Tracking

Monitor storage growth to verify retention is working:

# InfluxDB storage
du -sh /var/lib/influxdb2

# Loki storage
du -sh /var/lib/loki

# S3 bucket size (if using S3)
aws s3 ls --summarize --recursive s3://influx11/
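The Python sketch below takes a comparable reading from a script, for example to record one sample per day for trend tracking. It sums apparent file sizes, so expect small differences from du, which counts allocated blocks. It does not follow symlinks and skips files deleted mid-scan, which compaction may cause. tree_size_bytes is a hypothetical helper:

```python
import os


def tree_size_bytes(root: str) -> int:
    """Sum apparent sizes of regular files under root (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not count link targets
            try:
                total += os.path.getsize(path)
            except FileNotFoundError:
                continue  # removed mid-scan, e.g. by compaction
    return total


if __name__ == "__main__":
    for path in ("/var/lib/influxdb2", "/var/lib/loki"):
        print(f"{path}: {tree_size_bytes(path) / 1e9:.2f} GB")
```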

Alerts

Set up alerts for:

  • Storage growth exceeding expected rate
  • Retention deletion failures
  • Storage approaching capacity
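Once retention is in effect and ingest is steady, storage should level off, so sustained growth suggests that deletion is not keeping up. The minimal Python sketch below implements the two size-based checks, given two dated samples. The thresholds and the Sample, growth_per_day, and check names are hypothetical. Deletion failures themselves are better detected from the services' own logs:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    day: float       # days since the first sample
    used_bytes: int  # bytes in use at that time


def growth_per_day(old: Sample, new: Sample) -> float:
    """Average bytes added per day between two samples."""
    return (new.used_bytes - old.used_bytes) / (new.day - old.day)


def check(old: Sample, new: Sample, capacity: int,
          max_growth_per_day: float, min_days_left: float = 14) -> list:
    """Return alert messages for excessive growth and approaching capacity."""
    alerts = []
    rate = growth_per_day(old, new)
    if rate > max_growth_per_day:
        alerts.append(f"growth of {rate:.0f} bytes/day exceeds {max_growth_per_day:.0f}")
    if rate > 0:
        days_left = (capacity - new.used_bytes) / rate
        if days_left < min_days_left:
            alerts.append(f"storage full in about {days_left:.1f} days")
    return alerts
```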

Adjusting Retention

When to Increase Retention

  • Compliance requirements change
  • Need longer historical data
  • Storage costs decrease
  • Business needs change

When to Decrease Retention

  • Storage costs too high
  • Running out of disk space
  • Data rarely accessed
  • Compliance allows shorter retention

Impact Assessment

Before changing retention:

  1. Query patterns: Check how old data is typically queried
  2. Storage impact: Calculate storage savings
  3. User impact: Notify users of retention changes
  4. Compliance: Verify changes meet requirements

Backup vs Retention

  • Retention: Automatic deletion of old data
  • Backup: Separate copy for disaster recovery

Best practice: Retention ≠ Backup

  • Set retention based on operational needs
  • Create backups for disaster recovery
  • Archive old data separately if needed for compliance

Cost Optimization

Storage Cost by Retention

Example calculation, assuming a constant ingest rate so that storage and cost scale linearly with retention:

Current: 30d retention = 100 GB = $50/month
Option 1: 7d retention = 23 GB = $12/month (76% savings)
Option 2: 90d retention = 300 GB = $150/month (200% increase)
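The Python sketch below applies the same linear scaling so that other retention periods can be priced quickly. The baseline (100 GB and $50 for 30 days, an implied $0.50 per GB-month) comes from the example. project and pct_change are hypothetical helpers, and the sketch assumes a constant ingest rate and a flat per-GB price:

```python
def project(retention_days: float, base_days: float = 30,
            base_gb: float = 100, base_cost: float = 50) -> tuple:
    """Scale storage (GB) and monthly cost ($) linearly from a baseline."""
    factor = retention_days / base_days
    return base_gb * factor, base_cost * factor


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means savings)."""
    return (new - old) / old * 100


for days in (7, 30, 90):
    gb, cost = project(days)
    print(f"{days}d retention: {gb:.1f} GB, ${cost:.2f}/month")
```

project(7) gives about 23.3 GB and $11.67, and pct_change(50, 150) returns 200.0 for the 90-day option.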

Retention Recommendations

Cost-optimized:

  • System metrics: 7-14 days
  • Application logs: 7-14 days
  • Security logs: 30 days (minimum)

Balanced:

  • System metrics: 30 days
  • Application logs: 30 days
  • Security logs: 90 days

Compliance-focused:

  • System metrics: 90 days
  • Application logs: 90 days
  • Security logs: 365 days

Reference Deployment

See the Reference Deployments chapter for the retention configuration used in production:

  • monitor11.example.com: 30-day retention with S3 backend