Fail2ban Journald Migration¶
Overview¶
This page documents the migration from file-based fail2ban logging to journald-based logging, carried out in the reference deployment on 2026-01-01.
Background¶
Before (file-based):
- Fail2ban wrote logs to /var/log/fail2ban.log
- Alloy read the log file directly
- Logs were pre-parsed into jail and action_type labels at collection time
After (journald-based):
- Fail2ban writes to the systemd journal
- Alloy reads from journald
- Logs require parsing in queries
Why Migrate?¶
Benefits of journald:
1. Structured logging (automatic metadata)
2. No log rotation issues
3. Centralized log management
4. Better systemd integration
5. Automatic enrichment (PID, UID, etc.)
Configuration Changes¶
Fail2ban Configuration¶
Enable journald backend:
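A minimal sketch, assuming local overrides are kept in /etc/fail2ban/fail2ban.local; the SYSTEMD-JOURNAL logtarget sends fail2ban's own log output to the journal (the per-jail backend = systemd option, which controls how fail2ban reads the logs it monitors, is a separate setting):

# /etc/fail2ban/fail2ban.local (assumed override location)
[Definition]
logtarget = SYSTEMD-JOURNAL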
Restart fail2ban:
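For example, assuming fail2ban runs as a systemd service:

sudo systemctl restart fail2ban
sudo fail2ban-client status   # confirm the server is back up and jails are loaded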
Alloy Configuration Changes¶
OLD configuration (file-based):
loki.source.file "fail2ban" {
targets = [
{
__path__ = "/var/log/fail2ban.log",
job = "fail2ban",
},
]
# Pre-parsing via pipeline
forward_to = [loki.process.fail2ban.receiver]
}
loki.process "fail2ban" {
forward_to = [loki.write.default.receiver]
stage.regex {
expression = "fail2ban.actions.*\[(?P<jail>[^\]]+)\].*(?P<action_type>Ban|Unban)"
}
stage.labels {
values = {
jail = "",
action_type = "",
}
}
}
NEW configuration (journald):
loki.source.journal "fail2ban" {
  matches = "_SYSTEMD_UNIT=fail2ban.service"

  labels = {
    service_type = "fail2ban",
    hostname     = env("HOSTNAME"),
  }

  forward_to = [loki.write.default.receiver]
}
Key differences:
- Source changed from loki.source.file to loki.source.journal
- No pre-parsing (parsing moved to queries)
- Labels changed: job="fail2ban" → service_type="fail2ban"
- Simpler configuration
Query Migration¶
OLD Query Pattern¶
Pre-migration (labels available):
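A representative example of the shape these queries took (exact panel queries may have varied):

{job="fail2ban", action_type="Ban", jail="sshd"}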
Labels were added by Alloy during collection.
NEW Query Pattern¶
Post-migration (parse in query):
{service_type="fail2ban"}
| regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<banned_ip>\d+\.\d+\.\d+\.\d+)`
| action="Ban"
| jail="sshd"
Parsing now happens in the LogQL query.
Dashboard Updates¶
Update Queries in Grafana¶
Example: Ban count by jail
OLD query:
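A reconstruction using the old label set (the exact panel query may have differed slightly):

sum by(jail) (
  count_over_time({job="fail2ban", action_type="Ban"}[24h])
)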
NEW query:
sum by(jail) (
  count_over_time(
    {service_type="fail2ban"}
    | regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban)`
    [24h]
  )
)
Dashboard Panel Updates¶
Update all dashboard panels with fail2ban queries:
- Identify panels: List all panels using fail2ban data
- Update queries: Replace with new LogQL patterns
- Test queries: Verify data appears correctly
- Update transformations: May need to adjust for new label structure
Automation Script¶
#!/usr/bin/env python3
"""
Update Grafana dashboard queries for fail2ban journald migration
"""
import json
import re

import requests
GRAFANA_URL = "http://localhost:3000"
GRAFANA_TOKEN = "YOUR_TOKEN"
NEW_SELECTOR = (
    '{service_type="fail2ban"} '
    '| regexp `\\[(?P<jail>[^\\]]+)\\]\\s+(?P<action>Ban|Unban)`'
)

def update_query(old_query):
    """Convert old query to new format"""
    # Replace the whole old selector (including any extra matchers and the
    # closing brace) with the new selector plus a regexp stage; re-add any
    # dropped filters (e.g. action_type) as LogQL label filters if needed.
    # A callable replacement avoids re.sub interpreting the backslashes.
    return re.sub(r'\{job="fail2ban"[^}]*\}', lambda _: NEW_SELECTOR, old_query)
def update_dashboard(dashboard_uid):
    """Update dashboard queries"""
    # Fetch dashboard
    resp = requests.get(
        f"{GRAFANA_URL}/api/dashboards/uid/{dashboard_uid}",
        headers={"Authorization": f"Bearer {GRAFANA_TOKEN}"},
    )
    resp.raise_for_status()
    dashboard = resp.json()['dashboard']

    # Update queries (panels nested inside rows are not traversed here)
    for panel in dashboard.get('panels', []):
        for target in panel.get('targets', []):
            if 'expr' in target and 'fail2ban' in target['expr']:
                target['expr'] = update_query(target['expr'])

    # Save dashboard
    resp = requests.post(
        f"{GRAFANA_URL}/api/dashboards/db",
        headers={"Authorization": f"Bearer {GRAFANA_TOKEN}"},
        json={
            "dashboard": dashboard,
            "message": "Update fail2ban queries for journald",
            "overwrite": True,
        },
    )
    resp.raise_for_status()
# Run for fail2ban dashboard
update_dashboard("fail2ban")
Migration Timeline¶
Timeline of the actual migration in the reference deployment:
- 2026-01-01 04:18 UTC: Last file-based log entry
- 2026-01-01 04:41 UTC: First journald log entry
- Gap: 23 minutes during migration
Data Continuity¶
Handling the Gap¶
Options:
- Accept gap: Short outage during migration (chosen approach)
- Run both temporarily: Dual logging during transition
- Backfill: Import old logs to Loki with new format
Querying Across Migration¶
Query both old and new data:
# Old data (before 2026-01-01 04:18 UTC)
{job="fail2ban", action_type="Ban"}

# New data (after 2026-01-01 04:41 UTC)
{service_type="fail2ban"} |= "Ban"

LogQL log queries cannot union two different stream selectors in a single expression, so to view bans across the cutover, add both queries as separate targets in the same Grafana panel (or run them side by side in Explore).
Testing the Migration¶
Pre-Migration Testing¶
- Test journald collection (see the commands after this list)
- Test Alloy configuration (see the commands after this list)
- Deploy to test environment: Verify before production
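For example, assuming the Alloy configuration lives at /etc/alloy/config.alloy (adjust paths and unit names to the deployment):

# Confirm fail2ban entries are reaching the journal
journalctl -u fail2ban.service -n 20 --no-pager

# Syntax-check the updated Alloy configuration (catches parse errors, not wiring mistakes)
alloy fmt /etc/alloy/config.alloy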
Post-Migration Verification¶
- Check logs in Loki (see the example queries after this list)
- Verify parsing works (see the example queries after this list)
- Check dashboard: Verify panels show data
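For example, using logcli against the Loki instance (Grafana Explore works just as well):

# Check logs in Loki
logcli query '{service_type="fail2ban"}' --since=1h

# Verify parsing works
logcli query '{service_type="fail2ban"} | regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)` | action="Ban"' --since=24h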
Lessons Learned¶
What Went Well¶
- Simpler configuration: Journald is easier to configure
- Better metadata: Automatic enrichment
- No log rotation: One less thing to manage
Challenges¶
- Query complexity: Parsing moved to queries (more complex)
- Dashboard updates: All queries needed updating
- Migration gap: Brief data gap during cutover
Recommendations¶
- Plan downtime: Accept brief gap for cleaner migration
- Update dashboards first: Test queries before migrating
- Document changes: Keep notes for future reference
- Communicate: Notify users of dashboard changes
Rollback Procedure¶
If the migration needs to be reversed:
- Revert Alloy configuration (see the sketch after this list)
- Disable journald backend, if desired (see the sketch after this list)
- Revert dashboard queries: Restore old LogQL patterns
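A minimal sketch, assuming the Alloy systemd unit is named alloy and fail2ban overrides live in /etc/fail2ban/fail2ban.local:

# Restore the previous loki.source.file / loki.process blocks, then:
sudo systemctl restart alloy

# Point fail2ban's own logging back at the file:
#   /etc/fail2ban/fail2ban.local
#   [Definition]
#   logtarget = /var/log/fail2ban.log
sudo systemctl restart fail2ban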
Similar Migrations¶
This pattern applies to other services:
- Mail logs: Postfix journald migration
- DNS logs: Bind9 journald migration
- System logs: Any file → journald migration
Use the same approach:
1. Change Alloy from loki.source.file to loki.source.journal
2. Update labels to match new structure
3. Move parsing to queries
4. Update dashboards
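For other units, the same component works with just the match and labels swapped. A minimal sketch for Postfix, assuming its logs are emitted under postfix.service (unit names vary by distribution):

loki.source.journal "postfix" {
  matches = "_SYSTEMD_UNIT=postfix.service"

  labels = {
    service_type = "postfix",
    hostname     = env("HOSTNAME"),
  }

  forward_to = [loki.write.default.receiver]
}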