Comprehensive System Monitoring and Logging Guide for Ubuntu Server 22.04

Monitoring and logging show when an Ubuntu server is running short of CPU, memory, disk, or network capacity, and why. This guide starts with the built-in command-line tools and journald, then covers the Prometheus/Grafana and ELK stacks, custom scripts, log management, and alerting.

Prerequisites

Ubuntu Server 22.04 LTS
Root or sudo access
Basic understanding of Linux systems
Sufficient storage for logs and metrics

Built-in Monitoring Tools

System Resource Monitoring

top and htop

# Install htop
sudo apt install htop -y

# Basic top usage
top

# htop with tree view
htop -t

System Load and Memory

# Load average
uptime
cat /proc/loadavg

# Memory usage
free -h
cat /proc/meminfo

# Detailed memory stats
vmstat 1 5

CPU Information

# CPU info
lscpu
cat /proc/cpuinfo

# Install sysstat (provides mpstat and iostat)
sudo apt install sysstat -y

# CPU usage per core, refreshed every second
mpstat -P ALL 1

Disk Usage and I/O

# Disk usage
df -h
df -i  # inode usage

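# Directory sizes (sudo avoids permission errors under /var)
sudo du -sh /var/*
sudo apt install ncdu iotop -y
ncdu /  # Interactive disk usage

# Disk I/O statistics (iostat is part of sysstat)
iostat -x 1
sudo iotop  # Real-time I/O per process, needs root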

Network Statistics

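# Install the tools used below (netstat comes from net-tools)
sudo apt install ifstat net-tools iftop nload bmon -y

# Network interfaces
ip -s link show
ifstat

# Network connections
ss -tuln
sudo netstat -tulpn  # -p shows every process name only when run as root

# Bandwidth monitoring
sudo iftop
nload
bmon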

Process Monitoring

Process Management

# List processes
ps aux
ps -ef --forest

# Find specific process
pgrep -la nginx
pidof nginx

# Process tree
pstree -p

# System calls (attaching to another user's process needs root)
sudo strace -p <PID>

Resource Usage by Process

# Top CPU consumers
ps aux --sort=-%cpu | head

# Top memory consumers
ps aux --sort=-%mem | head

# Process accounting
sudo apt install acct -y
sudo accton on
sa  # Summary of process accounting

System Logging with journald

Basic journald Usage

# View all logs
journalctl

# Follow logs in real-time
journalctl -f

# Show logs since last boot
journalctl -b

# Show previous boot logs
journalctl -b -1

# Filter by time
journalctl --since "2024-01-19 10:00:00"
journalctl --since "1 hour ago"
journalctl --until "2024-01-19 12:00:00"

Filtering Logs

# By unit/service
journalctl -u nginx
journalctl -u ssh.service

# By priority
journalctl -p err
journalctl -p warning

# By process
journalctl _PID=1234
journalctl _UID=1000

# Kernel messages
journalctl -k

journald Configuration

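sudo nano /etc/systemd/journald.conf

[Journal]
Storage=persistent
Compress=yes
SplitMode=uid
MaxRetentionSec=1month
MaxFileSec=1week
SystemMaxUse=1G
# SystemKeepFree takes an absolute size, not a percentage
SystemKeepFree=2G
ForwardToSyslog=yes

# Apply the changes
sudo systemctl restart systemd-journald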

Export and Backup Logs

# Export to JSON
journalctl -o json > logs.json

# Export specific time range
journalctl --since "2024-01-19" --until "2024-01-20" > daily_logs.txt

# Verify journal integrity
journalctl --verify
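
Exported logs are easier to triage once they are summarized. The sketch below counts entries per syslog priority from the journal's export format (journalctl -o export), which writes one FIELD=value pair per line. The function name count_by_priority is hypothetical, and the parsing assumes PRIORITY is stored as plain text, which is the usual case.

```shell
#!/bin/sh
# count_by_priority: read `journalctl -o export` text on stdin and print
# "<priority> <count>" per syslog priority (0 = emerg ... 7 = debug).
count_by_priority() {
    awk -F= '$1 == "PRIORITY" { count[$2]++ }
             END { for (p in count) print p, count[p] }' | sort -n
}

# Example on a live system:
#   journalctl -b -o export | count_by_priority
```

A sudden rise in the count for priority 3 (err) or lower since the last boot is a reasonable cue to look closer with journalctl -p err -b.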

Advanced System Monitoring with Prometheus

Install Prometheus

# Create user
sudo useradd --no-create-home --shell /bin/false prometheus

# Download Prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvf prometheus-2.45.0.linux-amd64.tar.gz

# Install binaries
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

# Create directories (alerts/ holds the rule files referenced in prometheus.yml)
sudo mkdir -p /etc/prometheus/alerts /var/lib/prometheus

# Copy the console files that the service unit points to
sudo cp -r prometheus-2.45.0.linux-amd64/consoles /etc/prometheus/
sudo cp -r prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

Configure Prometheus

sudo nano /etc/prometheus/prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: []

rule_files:
  - "alerts/*.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']

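  # Requires an Apache exporter listening on port 9117, which this guide
  # does not install; remove this job if you do not run one.
  - job_name: 'apache'
    static_configs:
      - targets: ['localhost:9117']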
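
Before starting Prometheus, it helps to confirm that every scrape target in the file actually answers. The sketch below pulls the host:port entries out of the inline targets lists and requests /metrics from each one. The function names list_targets and check_targets are hypothetical, and the sed expression only understands the single-line targets: [...] style used in this guide, not block-style YAML lists. For syntax validation, promtool check config /etc/prometheus/prometheus.yml is the proper tool.

```shell
#!/bin/sh
# list_targets FILE: print every host:port found in inline `targets: [...]` lists.
list_targets() {
    sed -n 's/.*targets: *\[\(.*\)\].*/\1/p' "$1" |
        tr ',' '\n' | tr -d "'\" " | sed '/^$/d'
}

# check_targets FILE: request /metrics from each target and report the result.
check_targets() {
    list_targets "$1" | while read -r target; do
        if curl -fsS --max-time 5 -o /dev/null "http://$target/metrics"; then
            echo "OK   $target"
        else
            echo "FAIL $target"
        fi
    done
}

# Example:
#   check_targets /etc/prometheus/prometheus.yml
```

A FAIL line for a target usually means the exporter is not installed or not running, which is what the Prometheus targets page would also report as down.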

Create Prometheus Service

sudo nano /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus

Install Node Exporter

# Download Node Exporter
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
tar xvf node_exporter-1.6.0.linux-amd64.tar.gz

# Install
sudo cp node_exporter-1.6.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Node Exporter Service

sudo nano /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
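
Once the service is running, a quick check is to read a single metric straight from the exporter. Node Exporter serves plain text with one sample per line in the form name value. The helper below, with the hypothetical name metric_value, prints the value for an exact metric name; it is a minimal sketch that assumes a metric without labels, such as node_load1.

```shell
#!/bin/sh
# metric_value NAME: read Prometheus exposition text on stdin and print the
# value of every sample whose first field is exactly NAME.
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Example on the server:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

If the command prints nothing, either the exporter is not reachable on port 9100 or the metric name does not match exactly.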

Visualization with Grafana

Install Grafana

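# Add repository (apt-key is deprecated on 22.04, so keep the key in its own keyring)
sudo apt install -y software-properties-common gnupg
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list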

# Install
sudo apt update
sudo apt install grafana -y

# Start service
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

Configure Grafana

# Access Grafana at http://server-ip:3000
# Default login: admin/admin (change the password when prompted at first login)

# Configure Prometheus data source
# URL: http://localhost:9090

Import Dashboards

# Popular dashboard IDs:
# 1860 - Node Exporter Full
# 3662 - Prometheus 2.0 Overview
# 7362 - MySQL Overview

ELK Stack Setup

Install Elasticsearch

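# Add repository (apt-key is deprecated on 22.04, so keep the key in its own keyring)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list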

# Install
sudo apt update
sudo apt install elasticsearch -y

# Configure
sudo nano /etc/elasticsearch/elasticsearch.yml

cluster.name: my-cluster
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
http.port: 9200
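# If the package installer added cluster.initial_master_nodes to this file,
# remove it; it cannot be combined with single-node discovery.
discovery.type: single-node
# Security is disabled here for a localhost-only test node. Leave it enabled
# (the 8.x default) on anything reachable from the network.
xpack.security.enabled: false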

sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

Install Kibana

sudo apt install kibana -y

# Configure
sudo nano /etc/kibana/kibana.yml

server.port: 5601
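# "0.0.0.0" listens on every interface. With Elasticsearch security disabled,
# restrict access with a firewall or bind to "localhost" behind a reverse proxy.
server.host: "0.0.0.0"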
elasticsearch.hosts: ["http://localhost:9200"]

sudo systemctl enable kibana
sudo systemctl start kibana

Install Logstash

sudo apt install logstash -y

# Basic configuration
sudo nano /etc/logstash/conf.d/02-beats-input.conf

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

sudo systemctl enable logstash
sudo systemctl start logstash

Install Filebeat

sudo apt install filebeat -y

# Configure
sudo nano /etc/filebeat/filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/syslog
    - /var/log/auth.log
  fields:
    service: system

- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    service: nginx-access

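# Filebeat allows only one output. This one ships directly to Elasticsearch;
# to send events through the Logstash beats input on port 5044, replace it with:
# output.logstash:
#   hosts: ["localhost:5044"]
output.elasticsearch:
  hosts: ["localhost:9200"]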

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded

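# Optional: the system and nginx modules read the same files as the manual
# inputs above, so use either the modules or those inputs to avoid duplicates.
sudo filebeat modules enable system nginx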
sudo systemctl enable filebeat
sudo systemctl start filebeat

Custom Monitoring Scripts

Disk Space Alert

sudo nano /usr/local/bin/disk_space_alert.sh

#!/bin/bash
# Emails and logs a warning for every filesystem at or above THRESHOLD percent.
THRESHOLD=80
EMAIL="admin@example.com"

# -P keeps each filesystem on one line; snap loop mounts always report 100%, so skip them
df -P | grep -vE '^Filesystem|tmpfs|cdrom|/dev/loop' | awk '{ print $5 " " $1 }' | while read -r usage partition
do
  usage=${usage%\%}  # strip the trailing percent sign
  if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "WARNING: Partition \"$partition\" is ${usage}% full" | mail -s "Disk Space Alert on $(hostname)" "$EMAIL"
    logger "Disk space warning: $partition is ${usage}% full"
  fi
done
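
To try the threshold logic without sending mail, the comparison can be separated from the notification. The sketch below reads df -P output on stdin and prints each mount point at or above a given percentage. The function name over_threshold is hypothetical, and the parsing assumes mount points without spaces.

```shell
#!/bin/sh
# over_threshold PERCENT: read `df -P` output on stdin and print
# "<mount point> <use%>" for each filesystem at or above PERCENT.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, use
    }'
}

# Dry run against the live system, ignoring tmpfs and snap loop mounts.
# sed -n keeps the header on line 1, which over_threshold expects to skip.
df -P | sed -n '1p; /tmpfs/d; /loop/d; p' | over_threshold 80
```

Running it with a low value such as 1 lists nearly every filesystem, which is a simple way to confirm the parsing before relying on the alert.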

Service Health Check

sudo nano /usr/local/bin/service_health_check.sh

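#!/bin/bash
# Run as root (for example from root's crontab) so systemctl can start services.
SERVICES="nginx postgresql ssh"
LOGFILE="/var/log/service_health.log"

for service in $SERVICES; do
    if systemctl is-active --quiet "$service"; then
        echo "$(date): $service is running" >> "$LOGFILE"
    else
        echo "$(date): WARNING - $service is not running" >> "$LOGFILE"
        # Report a restart only if systemctl actually succeeded
        if systemctl start "$service"; then
            logger "Service $service was down and has been restarted"
        else
            logger "Service $service is down and could not be restarted"
        fi
    fi
done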
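
A service that keeps getting restarted deserves a closer look than one that failed once. The sketch below counts the WARNING lines per service in the log file written by the health check; the function name restart_counts is hypothetical, and it relies on the exact "WARNING - <service> is not running" wording used in that script.

```shell
#!/bin/sh
# restart_counts LOGFILE: print "<service> <count>" for every service that the
# health check found stopped, sorted by service name.
restart_counts() {
    sed -n 's/.*WARNING - \(.*\) is not running$/\1/p' "$1" |
        awk '{ count[$1]++ } END { for (s in count) print s, count[s] }' |
        sort
}

# Example:
#   restart_counts /var/log/service_health.log
```

A high count for one service suggests the automatic restart is hiding a recurring fault, so checking journalctl -u for that unit is the natural next step.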

Performance Metrics Collector

sudo nano /usr/local/bin/collect_metrics.sh

#!/bin/bash
METRICS_DIR="/var/log/metrics"
mkdir -p $METRICS_DIR

DATE=$(date +%Y%m%d_%H%M%S)
OUTPUT_FILE="$METRICS_DIR/metrics_$DATE.json"

# Collect metrics
{
    echo "{"
    echo "  \"timestamp\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\","
    echo "  \"hostname\": \"$(hostname)\","
    echo "  \"load_average\": $(cat /proc/loadavg | awk '{print "["$1", "$2", "$3"]"}'),"
    echo "  \"memory\": {"
    free -b | awk 'NR==2{printf "    \"total\": %s,\n    \"used\": %s,\n    \"free\": %s,\n    \"available\": %s\n", $2, $3, $4, $7}'
    echo "  },"
    echo "  \"disk\": ["
    df -B1 | tail -n +2 | while read -r line; do
        echo "    {"
        echo "$line" | awk '{printf "      \"filesystem\": \"%s\",\n      \"size\": %s,\n      \"used\": %s,\n      \"available\": %s,\n      \"use_percent\": \"%s\"\n", $1, $2, $3, $4, $5}'
        echo "    },"
    done | sed '$ s/,$//'
    echo "  ],"
    echo "  \"network\": {"
    # Count established TCP connections (column 2 of "ss -s" is the total TCP socket count)
    echo "    \"connections\": $(ss -Htn state established | wc -l)"
    echo "  }"
    echo "}"
} > "$OUTPUT_FILE"

Log Analysis and Management

Log Rotation

sudo nano /etc/logrotate.d/custom-apps

/var/log/myapp/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        systemctl reload myapp >/dev/null 2>&1 || true
    endscript
}

Log Analysis Tools

# Install log analysis tools
sudo apt install logwatch goaccess -y

# Configure logwatch
sudo nano /etc/logwatch/conf/logwatch.conf
# MailTo = admin@example.com
# Detail = High

# Use goaccess for web logs (sudo is needed to write under /var/www/html)
sudo goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED

Centralized Logging with rsyslog

# On log server
sudo nano /etc/rsyslog.conf

# Uncomment these lines:
module(load="imudp")
input(type="imudp" port="514")

module(load="imtcp")
input(type="imtcp" port="514")

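# Add template (note: *.* also matches the log server's own messages,
# so they are written under /var/log/remote/<its hostname>/ as well)
$template RemoteLogs,"/var/log/remote/%HOSTNAME%/%PROGRAMNAME%.log"
*.* ?RemoteLogs
& stop

sudo systemctl restart rsyslog

# On client servers
sudo nano /etc/rsyslog.conf

# Add at the end (@@ forwards over TCP, a single @ uses UDP):
*.* @@log-server-ip:514

sudo systemctl restart rsyslog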
Alert Configuration

Email Alerts Setup

# Install mail utilities
sudo apt install mailutils postfix -y

# Configure postfix as Internet Site

Prometheus Alerting Rules

sudo mkdir -p /etc/prometheus/alerts
sudo nano /etc/prometheus/alerts/node_alerts.yml

groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 90% (current value: {{ $value }}%)"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs|squashfs"} / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "Disk space is below 10% (current value: {{ $value }}%)"
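
The HighCPUUsage expression is easier to trust once the arithmetic is clear: it takes the share of time the CPUs spent idle over a window and subtracts it from 100. The sketch below performs the same calculation by hand from two samples of /proc/stat taken one second apart. It is an illustration of the idea only, not how Prometheus computes rate(); the function names cpu_sample and cpu_busy_pct are hypothetical.

```shell
#!/bin/sh
# cpu_sample: print "<idle> <total>" jiffies from the aggregate "cpu" line of
# /proc/stat (fields after the label: user nice system idle iowait irq softirq steal).
cpu_sample() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9; i++) total += $i
        print $5, total
    }' /proc/stat
}

# cpu_busy_pct IDLE1 TOTAL1 IDLE2 TOTAL2: percentage of non-idle time
# between two samples, i.e. 100 - idle_share * 100.
cpu_busy_pct() {
    awk -v i1="$1" -v t1="$2" -v i2="$3" -v t2="$4" 'BEGIN {
        if (t2 == t1) { print "0.0"; exit }
        printf "%.1f\n", 100 - (i2 - i1) / (t2 - t1) * 100
    }'
}

if [ -r /proc/stat ]; then
    first=$(cpu_sample); sleep 1; second=$(cpu_sample)
    # Word splitting is intentional: each sample expands to two arguments.
    cpu_busy_pct $first $second
fi
```

For example, if idle time grew by 50 jiffies while total time grew by 100, the CPUs were idle half the time and the result is 50.0, well under the 80 threshold in the rule.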

Performance Tuning for Monitoring

Optimize Prometheus Storage

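# Configure retention
sudo nano /etc/systemd/system/prometheus.service

# Append to ExecStart (add a trailing backslash to the line that was last before):
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.retention.size=10GB

# Reload systemd and restart Prometheus to apply
sudo systemctl daemon-reload
sudo systemctl restart prometheus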
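
To choose sensible retention values, the Prometheus documentation gives a rough sizing formula: disk space = retention time in seconds x ingested samples per second x bytes per sample, with about 1 to 2 bytes per sample as a typical figure. The sketch below applies it; the function name tsdb_bytes is hypothetical, and the 1000 series and 2 bytes per sample in the example are assumptions, so treat the result as an estimate only.

```shell
#!/bin/sh
# tsdb_bytes SERIES SCRAPE_INTERVAL_S RETENTION_DAYS [BYTES_PER_SAMPLE]
# Estimated TSDB size in bytes. Samples per second = SERIES / SCRAPE_INTERVAL_S.
# BYTES_PER_SAMPLE defaults to 2 (an assumed upper-end average).
tsdb_bytes() {
    awk -v series="$1" -v interval="$2" -v days="$3" -v bps="${4:-2}" \
        'BEGIN { printf "%.0f\n", days * 86400 * series * bps / interval }'
}

# 1000 active series scraped every 15s and kept for 30 days
tsdb_bytes 1000 15 30   # prints 345600000 (about 346 MB)
```

The number of active series on a running server can be read from the prometheus_tsdb_head_series metric, which gives a better input than a guess.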

Elasticsearch Optimization

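sudo nano /etc/elasticsearch/jvm.options.d/heap.options

# Set heap size (at most 50% of RAM, and below about 31GB so compressed object pointers stay enabled)
-Xms4g
-Xmx4g

sudo systemctl restart elasticsearch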
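
The heap rule can be turned into a small calculation. The sketch below takes total memory in kB, halves it, and caps the result at 31 GB, then prints matching -Xms and -Xmx lines. The function name heap_gb is hypothetical, and the 31 GB cap is a conservative assumption for keeping compressed object pointers; a dedicated Elasticsearch host is also assumed.

```shell
#!/bin/sh
# heap_gb MEM_TOTAL_KB: half of RAM in whole GiB, at least 1 and at most 31.
heap_gb() {
    half=$(( $1 / 2 / 1048576 ))
    if [ "$half" -lt 1 ]; then half=1; fi
    if [ "$half" -gt 31 ]; then half=31; fi
    echo "$half"
}

# Print suggested settings for this machine (Linux only).
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    gb=$(heap_gb "$mem_kb")
    printf '%s\n' "-Xms${gb}g" "-Xmx${gb}g"
fi
```

On a server that also runs Kibana, Logstash, and Prometheus, a smaller heap than the calculated value leaves room for the other services and the filesystem cache.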

Best Practices

Regular Reviews: Schedule weekly log reviews
Retention Policies: Define log retention based on compliance needs
Alert Fatigue: Avoid too many alerts; focus on actionable ones
Backup Metrics: Regularly back up Prometheus data
Security: Secure monitoring endpoints with authentication
Documentation: Document what each metric means and why each threshold was chosen
Automation: Automate responses to common alerts

Troubleshooting

High Resource Usage

# Find resource-intensive processes
ps aux | sort -nrk 3,3 | head -n 10  # CPU
ps aux | sort -nrk 4,4 | head -n 10  # Memory

# Check I/O wait
iostat -x 1

Missing Metrics

# Check exporters
curl http://localhost:9100/metrics  # Node exporter
curl http://localhost:9090/metrics  # Prometheus

# Verify targets in Prometheus
# http://localhost:9090/targets

Log Issues

# Check disk space
df -h /var/log

# Verify log permissions
ls -la /var/log/

# Force log rotation
sudo logrotate -f /etc/logrotate.conf

Conclusion

This guide covered monitoring and logging on Ubuntu Server 22.04, from the built-in tools to the Prometheus/Grafana and ELK stacks. Review your alert thresholds, retention settings, and dashboards regularly and adjust them as your infrastructure changes.
