Fail2ban Journald Migration

Overview

This page documents the migration from file-based fail2ban logging to journald-based logging, carried out in the reference deployment on 2026-01-01.

Background

Before (file-based):

  • Fail2ban wrote logs to /var/log/fail2ban.log
  • Alloy read from the log file
  • Logs were pre-parsed with jail and action labels

After (journald-based):

  • Fail2ban writes to the systemd journal
  • Alloy reads from journald
  • Logs require parsing at query time

Why Migrate?

Benefits of journald:

  1. Structured logging (automatic metadata)
  2. No log rotation issues
  3. Centralized log management
  4. Better systemd integration
  5. Automatic enrichment (PID, UID, etc.; see the example below)
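
The automatic enrichment is easy to see in the raw journal entries. The sketch below dumps a few fail2ban entries as JSON and prints some of the metadata fields journald attaches on its own (the field names are standard journald metadata; the snippet assumes fail2ban is already logging to the journal on this host):

#!/usr/bin/env python3
"""Show the metadata journald attaches to fail2ban log entries."""
import json
import subprocess

# Dump the last few fail2ban journal entries as JSON, one object per line.
out = subprocess.run(
    ["journalctl", "-u", "fail2ban", "-n", "3", "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    entry = json.loads(line)
    # _PID, _UID and _SYSTEMD_UNIT are added by journald automatically;
    # MESSAGE is the log line fail2ban wrote.
    print(entry.get("_SYSTEMD_UNIT"), entry.get("_PID"),
          entry.get("_UID"), entry.get("MESSAGE"))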

Configuration Changes

Fail2ban Configuration

Enable journald backend:

# /etc/fail2ban/jail.local
[DEFAULT]
backend = systemd

Restart fail2ban:

systemctl restart fail2ban

Alloy Configuration Changes

OLD configuration (file-based):

loki.source.file "fail2ban" {
  targets = [
    {
      __path__ = "/var/log/fail2ban.log",
      job = "fail2ban",
    },
  ]

  # Pre-parsing via pipeline
  forward_to = [loki.process.fail2ban.receiver]
}

loki.process "fail2ban" {
  forward_to = [loki.write.default.receiver]

  stage.regex {
    expression = "fail2ban\\.actions.*\\[(?P<jail>[^\\]]+)\\].*(?P<action_type>Ban|Unban)"
  }

  stage.labels {
    values = {
      jail = "",
      action_type = "",
    }
  }
}

NEW configuration (journald):

loki.source.journal "fail2ban" {
  matches = "_SYSTEMD_UNIT=fail2ban.service"

  labels = {
    service_type = "fail2ban",
    hostname     = env("HOSTNAME"),
  }

  forward_to = [loki.write.default.receiver]
}

Key differences:

  • Source changed from loki.source.file to loki.source.journal
  • No pre-parsing (parsing moved to queries)
  • Labels changed: job="fail2ban" → service_type="fail2ban"
  • Simpler configuration

Query Migration

OLD Query Pattern

Pre-migration (labels available):

{job="fail2ban", action_type="Ban", jail="sshd"}

Labels were added by Alloy during collection.

NEW Query Pattern

Post-migration (parse in query):

{service_type="fail2ban"}
| regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<banned_ip>\d+\.\d+\.\d+\.\d+)`
| action="Ban"
| jail="sshd"

Parsing now happens in the LogQL query itself.
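
The named capture groups in that regexp take over the role of the old Alloy-assigned labels. A quick way to sanity-check the pattern before putting it into dashboards is to run it against a sample line; Python's re module uses the same (?P<name>...) syntax as Go's RE2 here, and the sample line below is only illustrative:

import re

# Same pattern as in the LogQL query above.
PATTERN = re.compile(
    r"\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+"
    r"(?P<banned_ip>\d+\.\d+\.\d+\.\d+)"
)

sample = "fail2ban.actions [693]: NOTICE [sshd] Ban 203.0.113.7"
print(PATTERN.search(sample).groupdict())
# {'jail': 'sshd', 'action': 'Ban', 'banned_ip': '203.0.113.7'}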

Dashboard Updates

Update Queries in Grafana

Example: Ban count by jail

OLD query:

sum by(jail) (count_over_time({job="fail2ban", action_type="Ban"} [24h]))

NEW query:

sum by(jail) (
  count_over_time(
    {service_type="fail2ban"}
    | regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban)`
    [24h]
  )
)
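
Before touching any dashboards, the rewritten query can be run directly against Loki's HTTP API to confirm it returns the expected per-jail counts. A minimal sketch, assuming Loki's API is reachable at localhost:3100 as in the verification examples later on this page:

#!/usr/bin/env python3
"""Run the new per-jail ban count query against Loki."""
import requests

LOKI_URL = "http://localhost:3100"

QUERY = (
    'sum by(jail) (count_over_time({service_type="fail2ban"} '
    '| regexp `\\[(?P<jail>[^\\]]+)\\]\\s+(?P<action>Ban)` [24h]))'
)

# Instant query endpoint; metric queries return one vector sample per jail.
resp = requests.get(f"{LOKI_URL}/loki/api/v1/query", params={"query": QUERY})
resp.raise_for_status()

for sample in resp.json()["data"]["result"]:
    print(sample["metric"].get("jail", "<no jail>"), sample["value"][1])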

Dashboard Panel Updates

Update all dashboard panels with fail2ban queries:

  1. Identify panels: List all panels using fail2ban data
  2. Update queries: Replace with new LogQL patterns
  3. Test queries: Verify data appears correctly
  4. Update transformations: May need to adjust for new label structure

Automation Script

#!/usr/bin/env python3
"""
Update Grafana dashboard queries for fail2ban journald migration
"""
import re

import requests

GRAFANA_URL = "http://localhost:3000"
GRAFANA_TOKEN = "YOUR_TOKEN"

# New stream selector plus inline parser that replaces the old Alloy labels.
NEW_SELECTOR = (
    '{service_type="fail2ban"} '
    '| regexp `\\[(?P<jail>[^\\]]+)\\]\\s+(?P<action>Ban|Unban)`'
)

def update_query(old_query):
    """Convert an old file-based query to the journald format"""
    # Replace the whole old stream selector, including any extra matchers
    # such as action_type or jail.  Old label matchers must be re-added
    # manually as label filter expressions (e.g. | action="Ban").
    # A callable replacement avoids re.sub's escape handling on backslashes.
    return re.sub(r'\{job="fail2ban"[^}]*\}', lambda _: NEW_SELECTOR, old_query)

def update_dashboard(dashboard_uid):
    """Update dashboard queries"""
    # Fetch dashboard
    resp = requests.get(
        f"{GRAFANA_URL}/api/dashboards/uid/{dashboard_uid}",
        headers={"Authorization": f"Bearer {GRAFANA_TOKEN}"}
    )
    dashboard = resp.json()['dashboard']

    # Update queries
    for panel in dashboard.get('panels', []):
        for target in panel.get('targets', []):
            if 'expr' in target and 'fail2ban' in target['expr']:
                target['expr'] = update_query(target['expr'])

    # Save dashboard
    requests.post(
        f"{GRAFANA_URL}/api/dashboards/db",
        headers={"Authorization": f"Bearer {GRAFANA_TOKEN}"},
        json={
            "dashboard": dashboard,
            "message": "Update fail2ban queries for journald",
            "overwrite": True
        }
    )

# Run for fail2ban dashboard
update_dashboard("fail2ban")
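
The "Identify panels" step above can be scripted as well. A minimal sketch that lists every dashboard containing a fail2ban query, using Grafana's search API and the same URL/token placeholders as the script above:

#!/usr/bin/env python3
"""List dashboards whose panel queries reference fail2ban."""
import requests

GRAFANA_URL = "http://localhost:3000"
GRAFANA_TOKEN = "YOUR_TOKEN"
HEADERS = {"Authorization": f"Bearer {GRAFANA_TOKEN}"}

# Enumerate all dashboards, then inspect each one's panel queries.
search = requests.get(
    f"{GRAFANA_URL}/api/search", params={"type": "dash-db"}, headers=HEADERS
)
search.raise_for_status()

for hit in search.json():
    dash = requests.get(
        f"{GRAFANA_URL}/api/dashboards/uid/{hit['uid']}", headers=HEADERS
    ).json()["dashboard"]
    exprs = [
        target.get("expr", "")
        for panel in dash.get("panels", [])
        for target in panel.get("targets", [])
    ]
    if any("fail2ban" in expr for expr in exprs):
        print(hit["uid"], hit["title"])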

Migration Timeline

Actual migration in reference deployment:

  • 2026-01-01 04:18 UTC: Last file-based log entry
  • 2026-01-01 04:41 UTC: First journald log entry
  • Gap: 23 minutes during migration

Data Continuity

Handling the Gap

Options:

  1. Accept the gap: Brief break in log collection during the cutover (chosen approach)
  2. Run both temporarily: Dual logging during the transition
  3. Backfill: Import old logs into Loki with the new format (see the sketch below)
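
For option 3, old entries from /var/log/fail2ban.log can be replayed through Loki's push API with the new label set. A minimal sketch, assuming fail2ban's default log line format and UTC timestamps; note that Loki rejects samples older than its configured reject_old_samples_max_age unless that limit is raised for the backfill:

#!/usr/bin/env python3
"""Backfill the old fail2ban.log into Loki with the new label set."""
from datetime import datetime, timezone
import requests

LOKI_URL = "http://localhost:3100"
LOG_FILE = "/var/log/fail2ban.log"

values = []
with open(LOG_FILE) as fh:
    for line in fh:
        # Default fail2ban log lines start with "YYYY-mm-dd HH:MM:SS,mmm".
        try:
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # skip continuation lines without a leading timestamp
        ts = ts.replace(tzinfo=timezone.utc)  # assumes logs were written in UTC
        # Loki expects nanosecond Unix timestamps as strings.
        values.append([str(int(ts.timestamp() * 1e9)), line.rstrip("\n")])

payload = {
    "streams": [
        {
            # "HOSTNAME" is a placeholder; use the same value Alloy sets.
            "stream": {"service_type": "fail2ban", "hostname": "HOSTNAME"},
            "values": values,
        }
    ]
}

resp = requests.post(f"{LOKI_URL}/loki/api/v1/push", json=payload)
resp.raise_for_status()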

Querying Across Migration

Query old and new data separately:

# Old data (before 2026-01-01 04:18 UTC)
{job="fail2ban", action_type="Ban"}

# New data (after 2026-01-01 04:41 UTC)
{service_type="fail2ban"}
| regexp `(?P<action>Ban|Unban)`
| action="Ban"

LogQL log queries cannot union two different stream selectors, so for a combined view add both queries as separate targets in the same Grafana panel, or merge the results client-side (see the sketch below).
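
If a single merged stream is needed (for example for an export or an incident review), the two queries can be run through Loki's query_range API and merged by timestamp on the client. A minimal sketch, assuming Loki at localhost:3100; the query windows are illustrative and bracket the migration timestamps from the timeline above:

#!/usr/bin/env python3
"""Merge pre- and post-migration fail2ban ban events from Loki."""
import requests

LOKI_URL = "http://localhost:3100"

OLD_QUERY = '{job="fail2ban", action_type="Ban"}'
NEW_QUERY = (
    '{service_type="fail2ban"} '
    '| regexp `(?P<action>Ban|Unban)` | action="Ban"'
)

def fetch(query, start, end, limit=5000):
    """Return (timestamp_ns, line) tuples for a LogQL range query."""
    resp = requests.get(
        f"{LOKI_URL}/loki/api/v1/query_range",
        params={"query": query, "start": start, "end": end, "limit": limit},
    )
    resp.raise_for_status()
    entries = []
    for stream in resp.json()["data"]["result"]:
        entries.extend((int(ts), line) for ts, line in stream["values"])
    return entries

# Loki accepts RFC3339 or Unix timestamps for start/end.
old = fetch(OLD_QUERY, "2025-12-25T00:00:00Z", "2026-01-01T04:18:00Z")
new = fetch(NEW_QUERY, "2026-01-01T04:41:00Z", "2026-01-08T00:00:00Z")

# Combined, time-ordered view across the migration boundary.
for ts, line in sorted(old + new):
    print(ts, line)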

Testing the Migration

Pre-Migration Testing

  1. Test journald collection:

    journalctl -u fail2ban -n 20
    

  2. Test Alloy configuration:

    alloy validate /etc/alloy/config.alloy
    

  3. Deploy to test environment: Verify before production

Post-Migration Verification

  1. Check logs in Loki:

    curl -G "http://localhost:3100/loki/api/v1/query" \
      --data-urlencode 'query={service_type="fail2ban"}' \
      --data-urlencode 'limit=5'
    

  2. Verify parsing works:

    {service_type="fail2ban"}
    | regexp `\[(?P<jail>[^\]]+)\]`
    | jail="sshd"
    

  3. Check dashboard: Verify panels show data

Lessons Learned

What Went Well

  1. Simpler configuration: Journald is easier to configure
  2. Better metadata: Automatic enrichment
  3. No log rotation: One less thing to manage

Challenges

  1. Query complexity: Parsing moved to queries (more complex)
  2. Dashboard updates: All queries needed updating
  3. Migration gap: Brief data gap during cutover

Recommendations

  1. Plan downtime: Accept brief gap for cleaner migration
  2. Update dashboards first: Test queries before migrating
  3. Document changes: Keep notes for future reference
  4. Communicate: Notify users of dashboard changes

Rollback Procedure

If migration needs to be reversed:

  1. Revert Alloy configuration:

    cp /backup/config.alloy.old /etc/alloy/config.alloy
    systemctl restart alloy
    

  2. Disable journald backend (if desired):

    # /etc/fail2ban/jail.local
    [DEFAULT]
    backend = auto
    

  3. Revert dashboard queries: Restore old LogQL patterns

Similar Migrations

This pattern applies to other services:

  • Mail logs: Postfix journald migration
  • DNS logs: Bind9 journald migration
  • System logs: Any file → journald migration

Use the same approach:

  1. Change the Alloy source from loki.source.file to loki.source.journal
  2. Update labels to match the new structure
  3. Move parsing to queries
  4. Update dashboards