Log Analysis

Overview

Effective log analysis is crucial for troubleshooting, security monitoring, and understanding system behavior. This page covers techniques for analyzing logs in Loki using LogQL.

Basic Log Queries

Stream Selection

Select by label:

{service_type="fail2ban"}
{hostname="server01"}
{service_type="web", log_type="access"}

Multiple label values:

{hostname=~"server01|server02"}
{service_type=~"fail2ban|apache"}  # Regex OR

Exclude labels:

{service_type!="debug"}

Filtering Log Lines

Text Matching

Contains:

{service_type="fail2ban"} |= "Ban"

Does not contain:

{service_type="web"} != "200"

Case-insensitive:

{service_type="web"} |~ "(?i)error"

Multiple conditions:

{service_type="web"}
  |= "POST"
  |= "/api/"
  != "200"

Log Parsing

Regex Parsing

Extract fields:

{service_type="fail2ban"}
| regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)`
| jail="sshd"

Apache access log parsing:

{service_type="web", log_type="access"}
| regexp `^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)`
| status="500"
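Before running a parsing expression against Loki, it can help to check it offline against a sample line. The Python sketch below reuses the access log expression above. The sample lines are invented Common Log Format entries, and Python's re module is only a stand-in for the RE2 engine Loki uses: the two agree on this pattern's syntax, but not on every regex feature.

```python
import re

# Same expression as the LogQL regexp stage above.
ACCESS_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)'
)


def parse_access_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = ACCESS_RE.search(line)
    return match.groupdict() if match else None


# Hypothetical sample lines, not taken from a real server.
sample = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 500 2326'
no_body = '203.0.113.7 - - [10/Oct/2024:13:55:37 +0000] "GET /logo.png HTTP/1.1" 304 -'

fields = parse_access_line(sample)
print(fields["status"], fields["path"])
print(parse_access_line(no_body))
```

The second sample logs "-" as the response size, so (?P<size>\d+) does not match it and no fields are extracted. Widening that group to \S+ is one option if such lines matter.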

JSON Parsing

For JSON-formatted logs:

{service_type="application"}
| json
| level="error"
| message=~"database.*timeout"
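The json stage turns JSON properties into labels. Loki's documentation describes nested objects being flattened with "_" as the separator and arrays being skipped. The Python sketch below imitates that flattening for a made-up log line so that label names can be predicted before writing filters. It is an approximation, not Loki's implementation: for example, it does not sanitize characters that are invalid in label names.

```python
import json


def flatten_json_labels(line):
    """Approximate the label names and values extracted from one JSON log line."""
    labels = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}_{key}" if prefix else key, child)
        elif isinstance(value, list):
            return  # arrays are skipped
        elif prefix:
            # Label values are strings; non-string scalars are rendered as JSON.
            labels[prefix] = value if isinstance(value, str) else json.dumps(value)

    walk("", json.loads(line))
    return labels


# Hypothetical application log line.
line = (
    '{"level": "error", "message": "database query timeout", "attempt": 3, '
    '"request": {"method": "GET", "tags": ["a", "b"]}}'
)
print(flatten_json_labels(line))
```

With this input, the nested request.method property becomes request_method, which is the name a label filter such as | request_method="GET" would need to use.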

Pattern Matching

Use pattern for simpler parsing:

{service_type="mail"}
| pattern `<timestamp> <level> <message>`
| level="error"

Aggregations

Count Logs

Count over time:

count_over_time({service_type="fail2ban"} [24h])

Group by label:

sum by(jail) (count_over_time({service_type="fail2ban"} [24h]))

Top N results:

topk(20, sum by(ip) (
  count_over_time(
    {service_type="fail2ban"}
    | regexp `Ban\s+(?P<ip>\d+\.\d+\.\d+\.\d+)` [7d]
  )
))

Rate Calculations

Log rate:

rate({service_type="web"} [5m])

Error rate:

sum(rate({service_type="web"} |= "error" [5m]))

Percentage:

sum(rate({service_type="web"} |= "error" [5m]))
/
sum(rate({service_type="web"} [5m]))
* 100

Time-Based Analysis

Time Ranges

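Last hour:

count_over_time({service_type="fail2ban"} [1h])

A range such as [1h] is only valid inside a range aggregation like count_over_time or rate. A plain log query takes no range of its own.

Specific time range: Use the Grafana time picker or the API start and end parameters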
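When a query goes through the HTTP API, the window comes from the start and end parameters, which accept Unix timestamps in nanoseconds. A minimal Python sketch for computing a "last N hours" window follows. The helper name time_window_ns is illustrative only.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


def to_epoch_ns(moment):
    """Convert an aware datetime to whole-second precision epoch nanoseconds."""
    # Integer multiplication avoids float rounding at nanosecond scale.
    return int(moment.timestamp()) * NS_PER_SECOND


def time_window_ns(hours, now=None):
    """Return (start, end) in epoch nanoseconds covering the last `hours` hours."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return to_epoch_ns(start), to_epoch_ns(end)


# A fixed "now" keeps the example reproducible.
start_ns, end_ns = time_window_ns(1, now=datetime(2024, 12, 24, tzinfo=timezone.utc))
print(start_ns, end_ns)
```

The two integers can be passed as the start and end values of a query_range request.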

Bucketed Counts

Count logs per hour (there is no built-in hour label; run this as a range query with a 1h step so each point covers one hour):

sum(count_over_time({service_type="web"} [1h]))

Time Series

Logs over time:

sum(count_over_time({service_type="fail2ban"} |= "Ban" [5m]))

Advanced Queries

Multi-Stage Pipeline

{service_type="fail2ban"}
| regexp `\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)`
| action="Ban"
| jail=~"sshd|dovecot"
| ip!~"192\.168\..*"
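To see which lines survive each stage, the same pipeline can be imitated offline. The Python sketch below is only a model of the query above: the sample lines are invented, and re.fullmatch is used because LogQL label regex matchers are anchored at both ends.

```python
import re

# Same expression as the regexp stage above.
BAN_RE = re.compile(
    r"\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)"
)


def run_pipeline(lines):
    """Yield the extracted fields of lines that pass every filter stage."""
    for line in lines:
        match = BAN_RE.search(line)
        if not match:
            continue  # nothing extracted, so action="Ban" cannot hold
        fields = match.groupdict()
        if fields["action"] != "Ban":  # | action="Ban"
            continue
        if not re.fullmatch(r"sshd|dovecot", fields["jail"]):  # | jail=~"sshd|dovecot"
            continue
        if re.fullmatch(r"192\.168\..*", fields["ip"]):  # | ip!~"192\.168\..*"
            continue
        yield fields


# Hypothetical fail2ban lines.
samples = [
    "2024-10-10 13:55:36,101 fail2ban.actions [812]: NOTICE [sshd] Ban 203.0.113.7",
    "2024-10-10 13:56:02,455 fail2ban.actions [812]: NOTICE [sshd] Unban 203.0.113.7",
    "2024-10-10 13:57:10,009 fail2ban.actions [812]: NOTICE [dovecot] Ban 192.168.1.20",
    "2024-10-10 13:58:44,730 fail2ban.actions [812]: NOTICE [postfix] Ban 198.51.100.4",
]
for fields in run_pipeline(samples):
    print(fields)
```

Only the first sample passes: the second is an Unban, the third comes from a 192.168.x.x address, and the fourth belongs to a jail outside sshd and dovecot.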

Log Context

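Logs before and after a match: Use Grafana's "Show context" feature. LogQL has no syntax for a window relative to a match, so the manual alternative takes two steps. First find the matching line and note its timestamp:

{service_type="web"}
  |= "500"

Then set the time picker to about 5m either side of that timestamp and query the stream without the filter:

{service_type="web"}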

Metric Queries from Logs

Bytes transferred:

sum(
  sum_over_time(
    {service_type="web", log_type="access"}
    | regexp `\s(?P<bytes>\d+)$`
    | unwrap bytes [1h]
  )
)

Common Analysis Patterns

Security Analysis

Failed login attempts:

{service_type="system"}
  |= "Failed password"
  | regexp `Failed password for (?P<user>\S+) from (?P<ip>\S+)`

Top banned IPs:

topk(20, sum by(ip) (
  count_over_time(
    {service_type="fail2ban"}
    | regexp `Ban\s+(?P<ip>\d+\.\d+\.\d+\.\d+)` [7d]
  )
))

Unusual activity hours:

sum(count_over_time({service_type="web"} [1h]))
# Run as a range query with a 1h step and visualize in a Grafana heatmap
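Grouping by hour of day (for example, all 03:00 to 04:00 traffic across several days) can also be done client-side on exported entries. A minimal Python sketch, assuming nanosecond epoch timestamps given as strings, which is the form used in query_range responses, and hours counted in UTC. The function name is illustrative.

```python
from collections import Counter
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def count_by_hour_of_day(timestamps_ns):
    """Count entries per UTC hour of day (0-23) from nanosecond epoch timestamps."""
    counts = Counter()
    for ts in timestamps_ns:
        seconds = int(ts) // NS_PER_SECOND
        counts[datetime.fromtimestamp(seconds, tz=timezone.utc).hour] += 1
    return counts


# Hypothetical timestamps: two entries just after midnight UTC, one an hour later.
sample = ["1734998400000000000", "1734998460000000000", "1735002000000000000"]
print(sorted(count_by_hour_of_day(sample).items()))
```

Hours with counts far above their usual level are candidates for closer inspection.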

Application Monitoring

Error frequency:

sum(rate({service_type="application"} |~ "(?i)error|exception" [5m]))

Slow requests (if duration in logs):

{service_type="web"}
  | regexp `duration=(?P<duration>\d+)ms`
  | duration > 1000

User activity:

sum by (user_id) (
  count_over_time(
    {service_type="application"}
    | json
    | user_id!="" [1h]
  )
)

System Monitoring

Service restarts:

{service_type="system"}
  |~ "Started|Stopped"

Disk space warnings:

{service_type="system"}
  |~ "disk.*full|out of space"

OOM events:

{service_type="system"}
  |= "Out of memory"
  | regexp `Killed process (?P<pid>\d+) \((?P<process>[^)]+)\)`

Query Optimization

Performance Tips

  1. Always use time ranges: Avoid unbounded queries
  2. Use specific labels: Filter by label before parsing
  3. Avoid regex when possible: Use literal matching when you can
  4. Limit results: LogQL has no limit clause, so set a line limit (Grafana's "Line limit" option or the API limit parameter) for exploratory queries
  5. Use instant queries for tables: Faster than range queries

Slow:

{service_type="web"} |~ "error"  # Regex scan over all web logs

Fast:

{service_type="web", log_type="error"} |= "timeout"  # Label filter first, then a literal match

Query Examples

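Debug query performance: LogQL has no stats stage. Query statistics are available in Grafana's Query inspector, and logcli prints them with the --stats flag:

logcli query --stats '{service_type="web"}'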

Grafana Integration

Log Panel

  • Use "Logs" visualization
  • Enable "Live" mode for real-time
  • Use "Show context" for surrounding logs
  • Apply filters interactively

Table Panel

Use instant queries for tables:

topk(10, sum by(jail) (count_over_time({service_type="fail2ban"} [24h])))

Add transformations:

  • "Labels to fields" for Loki queries
  • "Organize fields" to reorder columns
  • "Sort by" to order results

Time Series Panel

Use range queries for graphs:

sum(rate({service_type="web"} [5m])) by (log_type)

Export and Reporting

Export Logs

Via API:

curl -G "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={service_type="fail2ban"}' \
  --data-urlencode 'start=1735000000000000000' \
  --data-urlencode 'end=1735100000000000000' \
  -o logs-export.json
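The same request can be assembled in a script. The Python sketch below only builds the URL, so it runs without a Loki server. The helper name is illustrative, and the optional limit parameter is included because the API caps the number of entries returned per request unless a limit is given.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url, query, start_ns, end_ns, limit=None):
    """Build a query_range URL with the LogQL query safely URL-encoded."""
    params = {"query": query, "start": str(start_ns), "end": str(end_ns)}
    if limit is not None:
        params["limit"] = str(limit)
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


url = build_query_range_url(
    "http://localhost:3100",
    '{service_type="fail2ban"}',
    1735000000000000000,
    1735100000000000000,
    limit=1000,
)
print(url)
```

The resulting URL can be fetched with urllib.request.urlopen or any HTTP client. For exports larger than the limit, one approach is to request consecutive time windows.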

Parse JSON output:

jq -r '.data.result[].values[][1]' logs-export.json
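For scripted post-processing, the jq expression above has a direct Python equivalent. The sketch assumes a log query response, where each entry in data.result carries a values list of [timestamp, line] pairs. The sample payload is invented and trimmed to the fields used here.

```python
def extract_lines(payload):
    """Return every log line from a query_range response holding stream results."""
    return [
        line
        for stream in payload["data"]["result"]
        for _timestamp, line in stream["values"]
    ]


# Hypothetical response body, as produced by json.load on logs-export.json.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "streams",
        "result": [
            {
                "stream": {"service_type": "fail2ban"},
                "values": [
                    ["1735000000000000000", "[sshd] Ban 203.0.113.7"],
                    ["1735000001000000000", "[sshd] Unban 203.0.113.7"],
                ],
            }
        ],
    },
}

for line in extract_lines(sample_payload):
    print(line)
```

Metric queries return a different result shape (matrix or vector), so this helper applies to log queries only.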

Scheduled Reports

Use Grafana alerting/reporting to:

  1. Create a dashboard with key queries
  2. Set up a scheduled snapshot or PDF export
  3. Send via email or webhook

Best Practices

  1. Start broad, narrow down: Begin with label filters, then add log line filters
  2. Test queries incrementally: Add one filter at a time
  3. Use descriptive labels: Make queries readable
  4. Document complex queries: Add comments in dashboards
  5. Save useful queries: Create dashboard or alert from working query
  6. Monitor query performance: Avoid expensive queries that time out
  7. Use templates: Create reusable query patterns

Reference

  • LogQL documentation: https://grafana.com/docs/loki/latest/logql/
  • Query examples in Dashboard Development chapter