Service Management¶

Overview¶

All monitoring components run as systemd services: some as native units, others as Podman containers managed through systemd Quadlet units. This page covers common service management operations.

Service Status¶

Check Service Status¶

# InfluxDB
systemctl status influxdb

# Loki
systemctl status loki

# Telegraf
systemctl status telegraf

# Alloy
systemctl status alloy

Check All Monitoring Services¶

systemctl status influxdb loki telegraf alloy

Starting and Stopping Services¶

Individual Services¶

# Start service
systemctl start influxdb
systemctl start loki
systemctl start telegraf
systemctl start alloy

# Stop service
systemctl stop influxdb
systemctl stop loki
systemctl stop telegraf
systemctl stop alloy

# Restart service
systemctl restart influxdb
systemctl restart loki
systemctl restart telegraf
systemctl restart alloy

Multiple Services¶

# Start all server components
systemctl start influxdb loki

# Start all client components
systemctl start telegraf alloy

# Restart all monitoring services
systemctl restart influxdb loki telegraf alloy

Enable/Disable Auto-Start¶

Enable Services at Boot¶

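systemctl enable influxdb
systemctl enable loki
systemctl enable telegraf
systemctl enable alloy

# Note: units generated by Quadlet cannot be enabled this way (systemctl
# reports them as transient or generated). For those, boot-time start is
# controlled by the [Install] section of the .container file.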

Disable Services at Boot¶

systemctl disable influxdb
systemctl disable loki
systemctl disable telegraf
systemctl disable alloy

Check if Service is Enabled¶

systemctl is-enabled influxdb
systemctl is-enabled loki
systemctl is-enabled telegraf
systemctl is-enabled alloy
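
The per-service checks above can be combined into a single report. The following is a minimal sketch, not part of the original tooling: `report_services` is a hypothetical helper name, and the unit names passed to it are the ones used on this page.

```shell
#!/bin/bash
# Hypothetical helper: print boot-time and runtime state for each unit.

report_services() {
    local unit enabled active
    printf '%-10s %-10s %s\n' "UNIT" "ENABLED" "ACTIVE"
    for unit in "$@"; do
        # Both commands exit non-zero for disabled or inactive units,
        # so "|| true" keeps the loop going under "set -e".
        enabled=$(systemctl is-enabled "$unit" 2>/dev/null || true)
        active=$(systemctl is-active "$unit" 2>/dev/null || true)
        printf '%-10s %-10s %s\n' "$unit" "${enabled:-unknown}" "${active:-unknown}"
    done
}
```

Call it as `report_services influxdb loki telegraf alloy` to get one line per unit.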

Viewing Logs¶

Recent Logs¶

# View last 50 lines
journalctl -u influxdb -n 50
journalctl -u loki -n 50
journalctl -u telegraf -n 50
journalctl -u alloy -n 50

Follow Logs (Real-time)¶

# Tail logs in real-time
journalctl -u influxdb -f
journalctl -u loki -f
journalctl -u telegraf -f
journalctl -u alloy -f

Logs Since Timestamp¶

# Logs from last hour
journalctl -u telegraf --since "1 hour ago"

# Logs from specific time
journalctl -u alloy --since "2024-01-01 10:00:00"

# Logs between times
journalctl -u influxdb --since "09:00" --until "10:00"

Filter Logs by Priority¶

# Errors and above (err, crit, alert, emerg)
journalctl -u telegraf -p err

# Warning and above
journalctl -u alloy -p warning
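
The time and priority filters above can be combined to get a quick error count per service. This is a sketch under assumptions: `count_errors` is a hypothetical helper name, and it counts output lines, so a multi-line message is counted more than once.

```shell
#!/bin/bash
# Hypothetical helper: count error-priority journal lines per unit.

count_errors() {
    local since="$1"
    shift
    local unit count
    for unit in "$@"; do
        # -p err matches err and anything more severe; -q suppresses
        # informational notices; -o cat prints only the message text.
        count=$(journalctl -u "$unit" -p err --since "$since" \
                    --no-pager -q -o cat | wc -l)
        printf '%s: %d error line(s) since %s\n' "$unit" "$count" "$since"
    done
}
```

For example, `count_errors "1 hour ago" influxdb loki telegraf alloy` prints one summary line per unit.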

Container Management (Podman)¶

View Containers¶

# List running containers
podman ps

# List all containers (including stopped)
podman ps -a

# Filter by name
podman ps --filter name=influxdb
podman ps --filter name=loki

Container Logs¶

# View container logs
podman logs influxdb
podman logs loki

# Follow container logs
podman logs -f influxdb
podman logs -f loki

# Last 100 lines
podman logs --tail 100 influxdb

Container Information¶

# Inspect container
podman inspect influxdb
podman inspect loki

# Container stats (CPU, memory)
podman stats influxdb loki

# Container processes
podman top influxdb

Restarting Containers¶

# Restart via systemd (preferred)
systemctl restart influxdb
systemctl restart loki

# Direct container restart (not recommended)
podman restart influxdb
podman restart loki

Configuration Reload¶

Telegraf¶

Reload configuration without restart:

# Send SIGHUP to reload
systemctl reload telegraf

# Or restart (if reload not supported)
systemctl restart telegraf

Alloy¶

Test configuration before reload:

# Validate configuration
alloy validate /etc/alloy/config.alloy

# Reload configuration (experimental)
systemctl reload alloy

# Restart if reload fails
systemctl restart alloy
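
The validate, reload, and restart steps above can be chained so that an invalid configuration never reaches the running service. A minimal sketch, assuming the config path shown above; `apply_alloy_config` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: apply an Alloy configuration change safely.

apply_alloy_config() {
    local config="${1:-/etc/alloy/config.alloy}"

    # Refuse to touch the running service if validation fails.
    if ! alloy validate "$config"; then
        echo "Validation failed; leaving alloy untouched" >&2
        return 1
    fi

    # Try a reload first; fall back to a full restart if it fails.
    if systemctl reload alloy; then
        echo "alloy reloaded"
    else
        echo "Reload failed; restarting alloy" >&2
        systemctl restart alloy
    fi
}
```

Running `apply_alloy_config` with no argument uses the default path. The function returns non-zero when validation fails or when the fallback restart fails.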

InfluxDB and Loki¶

Configuration changes require restart:

systemctl restart influxdb
systemctl restart loki

Service Dependencies¶

Start Order¶

When starting the monitoring stack:

1. Server components first: InfluxDB, Loki
2. Wait for readiness: Check health endpoints
3. Client components: Telegraf, Alloy

# Start servers
systemctl start influxdb loki

# Verify servers are ready
curl http://localhost:8086/health
curl http://localhost:3100/ready

# Start clients
systemctl start telegraf alloy
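
The readiness step above is a one-shot check, so it can fail if a server is still starting. The sketch below polls an endpoint until it answers; `wait_for_url` is a hypothetical helper name, and the retry count and delay are arbitrary defaults.

```shell
#!/bin/bash
# Hypothetical helper: poll a URL until it answers with a success status.

wait_for_url() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # -f turns HTTP error responses (for example a 503 while a server
        # is still starting) into a non-zero exit code; -s and -o hide
        # the progress meter and the response body.
        if curl -fs -o /dev/null "$url"; then
            echo "ready: $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "timed out waiting for $url" >&2
    return 1
}
```

After `systemctl start influxdb loki`, run `wait_for_url http://localhost:8086/health && wait_for_url http://localhost:3100/ready` and start the clients only if both calls succeed.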

Stop Order¶

When stopping the monitoring stack:

1. Client components first: Telegraf, Alloy
2. Server components: InfluxDB, Loki

# Stop clients
systemctl stop telegraf alloy

# Stop servers
systemctl stop influxdb loki

Orchestrator Tools¶

manage-svc.sh¶

Service lifecycle management:

# Deploy service
./manage-svc.sh deploy monitor11-metrics
./manage-svc.sh deploy ispconfig3-alloy

# Remove service
./manage-svc.sh remove monitor11-metrics

# Prepare environment
./manage-svc.sh prepare monitor11

svc-exec.sh¶

Execute specific tasks:

# Verify service
./svc-exec.sh verify monitor11-metrics
./svc-exec.sh verify ispconfig3-alloy

# Restart service
./svc-exec.sh restart monitor11-metrics

# Configure service
./svc-exec.sh configure ispconfig3-alloy

Health Checks¶

InfluxDB¶

# Health endpoint (returns a JSON status document)
curl -s http://localhost:8086/health

# Ping endpoint (returns 204 No Content when the server is up)
curl -I http://localhost:8086/ping

# Check with token
curl -H "Authorization: Token YOUR_TOKEN" \
  http://localhost:8086/api/v2/buckets
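
For scripted checks, the health response can be reduced to a pass/fail exit code. The sketch below assumes the InfluxDB 2.x response format, where the JSON body carries a top-level `status` field that is `pass` when the server is healthy. Both function names are hypothetical, and the extraction is a plain text match rather than a real JSON parser.

```shell
#!/bin/bash
# Hypothetical helpers: turn the InfluxDB /health response into an exit code.

health_status() {
    # Print the value of the first "status" field found on stdin.
    grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | head -n 1 \
        | sed 's/.*"\([^"]*\)"$/\1/'
}

influxdb_is_healthy() {
    local url="${1:-http://localhost:8086/health}"
    local status
    # If curl fails, status stays empty and the comparison below fails.
    status=$(curl -fsS "$url" | health_status)
    [ "$status" = "pass" ]
}
```

A call such as `influxdb_is_healthy || echo "InfluxDB is not healthy"` fits into cron jobs or deployment scripts. Where `jq` is available, `jq -r .status` is a more robust replacement for `health_status`.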

Loki¶

# Ready endpoint
curl http://localhost:3100/ready

# Metrics endpoint
curl http://localhost:3100/metrics

# Query test
curl -G "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query={service_type="fail2ban"}' \
  --data-urlencode 'limit=1'

Telegraf¶

# Check service status
systemctl status telegraf

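# Test configuration: run the inputs once and print the metrics to stdout,
# without writing to any output
telegraf --config /etc/telegraf/telegraf.conf --test

# Check recent log lines for errors (for example failed writes to InfluxDB);
# no output means none were logged
journalctl -u telegraf -n 20 | grep -i error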

Alloy¶

# Check service status
systemctl status alloy

# Validate configuration
alloy validate /etc/alloy/config.alloy

# Check metrics endpoint
curl http://127.0.0.1:12345/metrics

Automation Scripts¶

Health Check Script¶

#!/bin/bash
# check-monitoring.sh
# Exits non-zero if any monitoring service is not running.

check_service() {
    if systemctl is-active --quiet "$1"; then
        echo "✅ $1 is running"
        return 0
    else
        echo "❌ $1 is not running"
        return 1
    fi
}

echo "Checking monitoring services..."
failed=0
for service in influxdb loki telegraf alloy; do
    check_service "$service" || failed=1
done
exit "$failed"

Restart All Script¶

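#!/bin/bash
# restart-monitoring.sh
set -euo pipefail

echo "Stopping clients..."
systemctl stop telegraf alloy

echo "Restarting servers..."
systemctl restart influxdb loki

echo "Waiting for servers..."
# Poll the health endpoints instead of sleeping for a fixed time.
# curl retries on refused connections and on transient HTTP errors such as 503.
for url in http://localhost:8086/health http://localhost:3100/ready; do
    curl -fsS -o /dev/null --retry 15 --retry-delay 2 --retry-connrefused "$url"
done

echo "Starting clients..."
systemctl start telegraf alloy

echo "Checking status..."
systemctl --no-pager status influxdb loki telegraf alloy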

Best Practices¶

Always check status before operations: Verify service state
Use systemd to start, stop, and restart containers: Reserve podman commands for inspection (ps, logs, inspect, stats)
Follow proper start/stop order: Start servers before clients; stop clients before servers
Check logs after restart: Verify no errors occurred
Test configuration before applying: Avoid service failures
Use orchestrator tools: Leverage manage-svc.sh and svc-exec.sh
Document custom procedures: Keep runbooks for complex operations