Loki
Overview¶
Grafana Loki is a log aggregation system inspired by Prometheus. It stores logs efficiently and provides powerful querying via LogQL.
Purpose¶
- Store logs from Alloy collectors
- Provide query API for Grafana dashboards
- Enable log aggregation across multiple hosts
- Support S3-compatible object storage backends
Installation¶
The loki role deploys Loki as a Podman container using systemd quadlets:
```yaml
- role: jackaltx.solti_monitoring.loki
  vars:
    loki_version: "2.9"
    loki_port: 3100
    loki_retention: "30d"
```
Key Configuration Options¶
Basic Configuration¶
```yaml
loki_version: "2.9"        # Loki version
loki_port: 3100            # HTTP API port
loki_retention: "30d"      # Log retention period
loki_max_chunk_age: "2h"   # Maximum chunk age before flush
```
Storage Backends¶
Local filesystem (default):
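The filesystem backend needs no extra configuration beyond the defaults. The variables below are a sketch (`loki_data_dir` is an assumed variable name; check the role defaults — the data path matches the backup section later in this page):

```yaml
loki_storage_type: "filesystem"
loki_data_dir: "/var/lib/loki"   # assumed variable name; default data directory
```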
S3-compatible storage:
```yaml
loki_storage_type: "s3"
loki_s3_endpoint: "storage.example.com:8010"
loki_s3_bucket: "loki11"
loki_s3_access_key: "{{ vault_s3_access }}"
loki_s3_secret_key: "{{ vault_s3_secret }}"
loki_s3_region: "us-east-1"   # Optional
```
Container Configuration¶
Deployed via Podman with systemd quadlets:
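The role renders a quadlet unit along these lines. This is an illustrative sketch, not the role's actual template — the file path, image tag, and volume mounts are assumptions:

```ini
# /etc/containers/systemd/loki.container (illustrative; actual template may differ)
[Unit]
Description=Grafana Loki log aggregation
After=network-online.target

[Container]
Image=docker.io/grafana/loki:2.9
PublishPort=3100:3100
Volume=/var/lib/loki:/loki:Z
Volume=/etc/loki/loki-config.yaml:/etc/loki/local-config.yaml:Z
Exec=-config.file=/etc/loki/local-config.yaml

[Service]
Restart=always

[Install]
WantedBy=multi-user.target
```

Podman's quadlet generator turns this file into a `loki.service` systemd unit at boot or after `systemctl daemon-reload`.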
Deployment¶
Basic Deployment¶
Deploy Loki on a single host:
```yaml
---
- name: Deploy Loki server
  hosts: monitoring_servers
  become: true
  roles:
    - role: jackaltx.solti_monitoring.loki
      vars:
        loki_retention: "30d"
```
Run deployment:
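For example (the playbook filename and inventory path are placeholders; substitute your own):

```shell
ansible-playbook -i inventory/hosts.yml deploy-loki.yml
```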
S3-Backed Deployment¶
Deploy with S3 storage backend:
```yaml
- role: jackaltx.solti_monitoring.loki
  vars:
    loki_storage_type: "s3"
    loki_s3_endpoint: "storage.example.com:8010"
    loki_s3_bucket: "loki11"
    loki_s3_access_key: "{{ vault_s3_access }}"
    loki_s3_secret_key: "{{ vault_s3_secret }}"
```
Service Management¶
Systemd Quadlet¶
Loki runs as a Podman container managed by systemd:
```bash
# Check status
systemctl status loki

# Start/stop/restart
systemctl start loki
systemctl stop loki
systemctl restart loki

# View logs
journalctl -u loki -f

# Check container
podman ps | grep loki
```
Health Check¶
Verify Loki is running:
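Query the readiness endpoint:

```shell
curl http://localhost:3100/ready
```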
Expected response:
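A healthy instance answers the readiness probe with:

```
ready
```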
API Access¶
Query API¶
Instant query (single value):
```bash
curl -G "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query={service_type="fail2ban"}' \
  --data-urlencode 'limit=10'
```
Range query (time series):
```bash
curl -G "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={service_type="fail2ban"}' \
  --data-urlencode 'start=1735000000000000000' \
  --data-urlencode 'end=1735100000000000000' \
  --data-urlencode 'limit=100'
```
Push API¶
Alloy uses the push API to send logs:
```
POST http://localhost:3100/loki/api/v1/push
Content-Type: application/json
```

```json
{
  "streams": [
    {
      "stream": {"service_type": "fail2ban", "hostname": "server1"},
      "values": [
        ["1735000000000000000", "Log message here"]
      ]
    }
  ]
}
```
Label Discovery¶
List all labels:
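```shell
curl "http://localhost:3100/loki/api/v1/labels"
```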
Get values for a label:
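For example, to list the values of the `service_type` label:

```shell
curl "http://localhost:3100/loki/api/v1/label/service_type/values"
```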
LogQL Query Language¶
Basic Queries¶
Stream selector:
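A stream selector matches streams by their labels:

```logql
{service_type="fail2ban"}
```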
Filter logs:
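A line filter keeps only lines containing a substring:

```logql
{service_type="fail2ban"} |= "Ban"
```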
Regex filter:
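The `|~` operator filters lines against a regular expression:

```logql
{service_type="fail2ban"} |~ "Ban|Unban"
```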
Log Parsing¶
Extract fields with regex:
```logql
{service_type="fail2ban"}
  | regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)`
  | jail="sshd"
```
JSON parsing:
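For services that emit structured JSON logs, the `json` parser extracts fields as labels. The `service_type="myapp"` selector and `level` field here are hypothetical examples:

```logql
{service_type="myapp"} | json | level="error"
```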
Aggregations¶
Count logs over time:
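```logql
count_over_time({service_type="fail2ban"}[5m])
```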
Rate of logs:
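```logql
rate({service_type="fail2ban"}[5m])
```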
Sum by label:
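For example, summing the per-stream rate across each host:

```logql
sum by (hostname) (rate({service_type="fail2ban"}[5m]))
```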
Retention Configuration¶
Retention Period¶
Set retention in role variables:
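```yaml
loki_retention: "30d"   # e.g. 7d, 30d, 90d
```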
Compaction¶
Loki automatically compacts chunks. Configure compaction settings:
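The underlying Loki `compactor` block looks roughly like the sketch below; whether the role exposes these as variables depends on the role's defaults, so treat the values as illustrative:

```yaml
compactor:
  working_directory: /var/lib/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
```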
Resource Requirements¶
Minimum Requirements¶
- CPU: 2 cores
- Memory: 1GB RAM
- Disk: 50GB+ (depends on log volume and retention)
- Network: 100 Mbps
Sizing Guidance¶
| Deployment | Scale | CPU | RAM | Storage |
|---|---|---|---|---|
| Small | 1-10 hosts, low log volume | 2 cores | 1 GB | 50 GB |
| Medium | 10-50 hosts, moderate log volume | 4 cores | 2 GB | 200 GB |
| Large | 50+ hosts, high log volume | 8+ cores | 4+ GB | 1 TB+ or S3 backend |
Performance Tuning¶
Chunk Configuration¶
Optimize chunk sizes:
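These are Loki's `ingester` settings; the values shown are illustrative starting points, not role defaults:

```yaml
ingester:
  chunk_target_size: 1572864   # target compressed chunk size (~1.5 MB)
  max_chunk_age: 2h            # flush chunks older than this
  chunk_idle_period: 30m       # flush chunks with no new data after this
```

Larger chunks mean fewer index entries and better compression, at the cost of more memory per stream.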
Cache Settings¶
Tune cache for better performance:
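A sketch using Loki's embedded results cache (available in the 2.x series); sizes are illustrative:

```yaml
query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100
```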
Ingestion Rate Limits¶
Prevent resource exhaustion:
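Loki's `limits_config` caps per-tenant and per-stream ingestion; the values below are illustrative:

```yaml
limits_config:
  ingestion_rate_mb: 4           # per-tenant ingestion rate
  ingestion_burst_size_mb: 6     # allowed burst above the rate
  per_stream_rate_limit: "3MB"
  per_stream_rate_limit_burst: "15MB"
```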
Backup and Recovery¶
Filesystem Backend¶
Manual backup:
```bash
# Stop Loki
systemctl stop loki

# Backup data directory
tar -czf loki-backup.tar.gz /var/lib/loki/

# Start Loki
systemctl start loki
```
S3 Backend¶
When using S3, data is automatically stored in object storage. For disaster recovery:
1. Deploy a new Loki instance
2. Configure the same S3 endpoint and bucket
3. Data is automatically available
Monitoring Loki¶
Metrics Endpoint¶
Loki exposes Prometheus metrics:
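```shell
curl http://localhost:3100/metrics
```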
Key Metrics to Monitor¶
- `loki_ingester_chunks_created_total` - Chunks created
- `loki_request_duration_seconds` - Query latency
- `loki_ingester_memory_chunks` - Chunks in memory
- `loki_panic_total` - Application panics
Troubleshooting¶
Check Container Status¶
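```shell
podman ps | grep loki
podman logs loki
```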
Check Service Status¶
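```shell
systemctl status loki
journalctl -u loki -n 50
```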
Verify API Access¶
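```shell
curl http://localhost:3100/ready
```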
Test Query¶
```bash
curl -G "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query={service_type="fail2ban"}' \
  --data-urlencode 'limit=5'
```
Common Issues¶
- Container won't start: Check logs with `podman logs loki`
- API not accessible: Verify port 3100 is open; check the firewall
- No logs appearing: Check Alloy collectors are configured correctly
- Out of disk space: Reduce retention period or use S3 backend
- High memory usage: Reduce chunk cache sizes or add more RAM
- Query timeouts: Optimize queries, add time range filters
Security Considerations¶
- Network Access: Restrict port 3100 to monitoring network
- Authentication: Consider adding authentication via reverse proxy
- TLS/SSL: Use HTTPS in production (requires reverse proxy)
- S3 Credentials: Store S3 keys in Ansible Vault
- Log Sanitization: Avoid logging sensitive data
Reference Deployment¶
See the Reference Deployments chapter for a real-world example:

- monitor11.example.com - Loki with S3 backend