Fail2ban Journald Migration

Overview

This page documents the migration from file-based fail2ban logging to journald-based logging, carried out in the reference deployment on 2026-01-01.

Background

Before (file-based):

  • Fail2ban wrote logs to /var/log/fail2ban.log
  • Alloy read from the log file
  • Logs were pre-parsed with jail and action labels

After (journald-based):

  • Fail2ban writes to the systemd journal
  • Alloy reads from journald
  • Logs require parsing at query time

Why Migrate?

Benefits of journald:

  1. Structured logging (automatic metadata)
  2. No log rotation issues
  3. Centralized log management
  4. Better systemd integration
  5. Automatic enrichment (PID, UID, etc.; see the example below)
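
The automatic enrichment is easy to see in the raw journal entries. The sketch below dumps a few fail2ban entries as JSON and prints some of the metadata fields journald attaches on its own (the field names are standard journald metadata; the snippet assumes fail2ban is already logging to the journal on this host):

#!/usr/bin/env python3
"""Show the metadata journald attaches to fail2ban log entries."""
import json
import subprocess

# Dump the last few fail2ban journal entries as JSON, one object per line.
out = subprocess.run(
    ["journalctl", "-u", "fail2ban", "-n", "3", "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    entry = json.loads(line)
    # _PID, _UID and _SYSTEMD_UNIT are added by journald automatically;
    # MESSAGE is the log line fail2ban wrote.
    print(entry.get("_SYSTEMD_UNIT"), entry.get("_PID"),
          entry.get("_UID"), entry.get("MESSAGE"))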

Configuration Changes

Fail2ban Configuration

Enable journald backend:

# /etc/fail2ban/jail.local
[DEFAULT]
backend = systemd

Restart fail2ban:

systemctl restart fail2ban

Alloy Configuration Changes

OLD configuration (file-based):

loki.source.file "fail2ban" {
  targets = [
    {
      __path__ = "/var/log/fail2ban.log",
      job = "fail2ban",
    },
  ]

  # Pre-parsing via pipeline
  forward_to = [loki.process.fail2ban.receiver]
}

loki.process "fail2ban" {
  forward_to = [loki.write.default.receiver]

  stage.regex {
    expression = "fail2ban\\.actions.*\\[(?P<jail>[^\\]]+)\\].*(?P<action_type>Ban|Unban)"
  }

  stage.labels {
    values = {
      jail = "",
      action_type = "",
    }
  }
}

NEW configuration (journald):

loki.source.journal "fail2ban" {
  matches = "_SYSTEMD_UNIT=fail2ban.service"

  labels = {
    service_type = "fail2ban",
    hostname     = env("HOSTNAME"),
  }

  forward_to = [loki.write.default.receiver]
}

Key differences:

  • Source changed from loki.source.file to loki.source.journal
  • No pre-parsing (parsing moved to queries)
  • Labels changed: job="fail2ban" → service_type="fail2ban"
  • Simpler configuration

Query Migration

OLD Query Pattern

Pre-migration (labels available):

{job="fail2ban", action_type="Ban", jail="sshd"}

Labels were added by Alloy during collection.

NEW Query Pattern

Post-migration (parse in query):

{service_type="fail2ban"}
| regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<banned_ip>\d+\.\d+\.\d+\.\d+)`
| action="Ban"
| jail="sshd"

Parsing now happens in the LogQL query itself.
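
The named capture groups in that regexp take over the role of the old Alloy-assigned labels. A quick way to sanity-check the pattern before putting it into dashboards is to run it against a sample line; Python's re module uses the same (?P<name>...) syntax as Go's RE2 here, and the sample line below is only illustrative:

import re

# Same pattern as in the LogQL query above.
PATTERN = re.compile(
    r"\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+"
    r"(?P<banned_ip>\d+\.\d+\.\d+\.\d+)"
)

sample = "fail2ban.actions [693]: NOTICE [sshd] Ban 203.0.113.7"
print(PATTERN.search(sample).groupdict())
# {'jail': 'sshd', 'action': 'Ban', 'banned_ip': '203.0.113.7'}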

Dashboard Updates

Update Queries in Grafana

Example: Ban count by jail

OLD query:

sum by(jail) (count_over_time({job="fail2ban", action_type="Ban"} [24h]))

NEW query:

sum by(jail) (
  count_over_time(
    {service_type="fail2ban"}
    | regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban)`
    [24h]
  )
)
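
Before touching any dashboards, the rewritten query can be run directly against Loki's HTTP API to confirm it returns the expected per-jail counts. A minimal sketch, assuming Loki's API is reachable at localhost:3100 as in the verification examples later on this page:

#!/usr/bin/env python3
"""Run the new per-jail ban count query against Loki."""
import requests

LOKI_URL = "http://localhost:3100"

QUERY = (
    'sum by(jail) (count_over_time({service_type="fail2ban"} '
    '| regexp `\\[(?P<jail>[^\\]]+)\\]\\s+(?P<action>Ban)` [24h]))'
)

# Instant query endpoint; metric queries return one vector sample per jail.
resp = requests.get(f"{LOKI_URL}/loki/api/v1/query", params={"query": QUERY})
resp.raise_for_status()

for sample in resp.json()["data"]["result"]:
    print(sample["metric"].get("jail", "<no jail>"), sample["value"][1])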

Dashboard Panel Updates

Update all dashboard panels with fail2ban queries:

  1. Identify panels: List all panels using fail2ban data
  2. Update queries: Replace with new LogQL patterns
  3. Test queries: Verify data appears correctly
  4. Update transformations: May need to adjust for new label structure

Automation Script

#!/usr/bin/env python3
"""
Update Grafana dashboard queries for fail2ban journald migration
"""
import re

import requests

GRAFANA_URL = "http://localhost:3000"
GRAFANA_TOKEN = "YOUR_TOKEN"

# New stream selector plus inline parser that replaces the old Alloy labels.
NEW_SELECTOR = (
    '{service_type="fail2ban"} '
    '| regexp `\\[(?P<jail>[^\\]]+)\\]\\s+(?P<action>Ban|Unban)`'
)

def update_query(old_query):
    """Convert an old file-based query to the journald format"""
    # Replace the whole old stream selector, including any extra matchers
    # such as action_type or jail.  Old label matchers must be re-added
    # manually as label filter expressions (e.g. | action="Ban").
    # A callable replacement avoids re.sub's escape handling on backslashes.
    return re.sub(r'\{job="fail2ban"[^}]*\}', lambda _: NEW_SELECTOR, old_query)

def update_dashboard(dashboard_uid):
    """Update dashboard queries"""
    # Fetch dashboard
    resp = requests.get(
        f"{GRAFANA_URL}/api/dashboards/uid/{dashboard_uid}",
        headers={"Authorization": f"Bearer {GRAFANA_TOKEN}"}
    )
    dashboard = resp.json()['dashboard']

    # Update queries
    for panel in dashboard.get('panels', []):
        for target in panel.get('targets', []):
            if 'expr' in target and 'fail2ban' in target['expr']:
                target['expr'] = update_query(target['expr'])

    # Save dashboard
    requests.post(
        f"{GRAFANA_URL}/api/dashboards/db",
        headers={"Authorization": f"Bearer {GRAFANA_TOKEN}"},
        json={
            "dashboard": dashboard,
            "message": "Update fail2ban queries for journald",
            "overwrite": True
        }
    )

# Run for fail2ban dashboard
update_dashboard("fail2ban")
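
The "Identify panels" step above can be scripted as well. A minimal sketch that lists every dashboard containing a fail2ban query, using Grafana's search API and the same URL/token placeholders as the script above:

#!/usr/bin/env python3
"""List dashboards whose panel queries reference fail2ban."""
import requests

GRAFANA_URL = "http://localhost:3000"
GRAFANA_TOKEN = "YOUR_TOKEN"
HEADERS = {"Authorization": f"Bearer {GRAFANA_TOKEN}"}

# Enumerate all dashboards, then inspect each one's panel queries.
search = requests.get(
    f"{GRAFANA_URL}/api/search", params={"type": "dash-db"}, headers=HEADERS
)
search.raise_for_status()

for hit in search.json():
    dash = requests.get(
        f"{GRAFANA_URL}/api/dashboards/uid/{hit['uid']}", headers=HEADERS
    ).json()["dashboard"]
    exprs = [
        target.get("expr", "")
        for panel in dash.get("panels", [])
        for target in panel.get("targets", [])
    ]
    if any("fail2ban" in expr for expr in exprs):
        print(hit["uid"], hit["title"])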

Migration Timeline

Actual migration in reference deployment:

  • 2026-01-01 04:18 UTC: Last file-based log entry
  • 2026-01-01 04:41 UTC: First journald log entry
  • Gap: 23 minutes during migration

Data Continuity

Handling the Gap

Options:

  1. Accept the gap: Brief break in log collection during the cutover (chosen approach)
  2. Run both temporarily: Dual logging during the transition
  3. Backfill: Import old logs into Loki with the new format (see the sketch below)
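
For option 3, old entries from /var/log/fail2ban.log can be replayed through Loki's push API with the new label set. A minimal sketch, assuming fail2ban's default log line format and UTC timestamps; note that Loki rejects samples older than its configured reject_old_samples_max_age unless that limit is raised for the backfill:

#!/usr/bin/env python3
"""Backfill the old fail2ban.log into Loki with the new label set."""
from datetime import datetime, timezone
import requests

LOKI_URL = "http://localhost:3100"
LOG_FILE = "/var/log/fail2ban.log"

values = []
with open(LOG_FILE) as fh:
    for line in fh:
        # Default fail2ban log lines start with "YYYY-mm-dd HH:MM:SS,mmm".
        try:
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # skip continuation lines without a leading timestamp
        ts = ts.replace(tzinfo=timezone.utc)  # assumes logs were written in UTC
        # Loki expects nanosecond Unix timestamps as strings.
        values.append([str(int(ts.timestamp() * 1e9)), line.rstrip("\n")])

payload = {
    "streams": [
        {
            # "HOSTNAME" is a placeholder; use the same value Alloy sets.
            "stream": {"service_type": "fail2ban", "hostname": "HOSTNAME"},
            "values": values,
        }
    ]
}

resp = requests.post(f"{LOKI_URL}/loki/api/v1/push", json=payload)
resp.raise_for_status()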

Querying Across Migration

Query old and new data separately:

# Old data (before 2026-01-01 04:18 UTC)
{job="fail2ban", action_type="Ban"}

# New data (after 2026-01-01 04:41 UTC)
{service_type="fail2ban"}
| regexp `(?P<action>Ban|Unban)`
| action="Ban"

LogQL log queries cannot union two different stream selectors, so for a combined view add both queries as separate targets in the same Grafana panel, or merge the results client-side (see the sketch below).
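
If a single merged stream is needed (for example for an export or an incident review), the two queries can be run through Loki's query_range API and merged by timestamp on the client. A minimal sketch, assuming Loki at localhost:3100; the query windows are illustrative and bracket the migration timestamps from the timeline above:

#!/usr/bin/env python3
"""Merge pre- and post-migration fail2ban ban events from Loki."""
import requests

LOKI_URL = "http://localhost:3100"

OLD_QUERY = '{job="fail2ban", action_type="Ban"}'
NEW_QUERY = (
    '{service_type="fail2ban"} '
    '| regexp `(?P<action>Ban|Unban)` | action="Ban"'
)

def fetch(query, start, end, limit=5000):
    """Return (timestamp_ns, line) tuples for a LogQL range query."""
    resp = requests.get(
        f"{LOKI_URL}/loki/api/v1/query_range",
        params={"query": query, "start": start, "end": end, "limit": limit},
    )
    resp.raise_for_status()
    entries = []
    for stream in resp.json()["data"]["result"]:
        entries.extend((int(ts), line) for ts, line in stream["values"])
    return entries

# Loki accepts RFC3339 or Unix timestamps for start/end.
old = fetch(OLD_QUERY, "2025-12-25T00:00:00Z", "2026-01-01T04:18:00Z")
new = fetch(NEW_QUERY, "2026-01-01T04:41:00Z", "2026-01-08T00:00:00Z")

# Combined, time-ordered view across the migration boundary.
for ts, line in sorted(old + new):
    print(ts, line)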

Testing the Migration

Pre-Migration Testing

  1. Test journald collection:

    journalctl -u fail2ban -n 20
    

  2. Test Alloy configuration:

    alloy validate /etc/alloy/config.alloy
    

  3. Deploy to test environment: Verify before production

Post-Migration Verification

  1. Check logs in Loki:

    curl -G "http://localhost:3100/loki/api/v1/query" \
      --data-urlencode 'query={service_type="fail2ban"}' \
      --data-urlencode 'limit=5'
    

  2. Verify parsing works:

    {service_type="fail2ban"}
    | regexp `\[(?P<jail>[^\]]+)\]`
    | jail="sshd"
    

  3. Check dashboard: Verify panels show data

Lessons Learned

What Went Well

  1. Simpler configuration: Journald is easier to configure
  2. Better metadata: Automatic enrichment
  3. No log rotation: One less thing to manage

Challenges

  1. Query complexity: Parsing moved to queries (more complex)
  2. Dashboard updates: All queries needed updating
  3. Migration gap: Brief data gap during cutover

Recommendations

  1. Plan downtime: Accept brief gap for cleaner migration
  2. Update dashboards first: Test queries before migrating
  3. Document changes: Keep notes for future reference
  4. Communicate: Notify users of dashboard changes

Rollback Procedure

If migration needs to be reversed:

  1. Revert Alloy configuration:

    cp /backup/config.alloy.old /etc/alloy/config.alloy
    systemctl restart alloy
    

  2. Disable journald backend (if desired):

    # /etc/fail2ban/jail.local
    [DEFAULT]
    backend = auto
    

  3. Revert dashboard queries: Restore old LogQL patterns

Similar Migrations

This pattern applies to other services:

  • Mail logs: Postfix journald migration
  • DNS logs: Bind9 journald migration
  • System logs: Any file → journald migration

Use the same approach:

  1. Change the Alloy source from loki.source.file to loki.source.journal
  2. Update labels to match the new structure
  3. Move parsing to queries
  4. Update dashboards