Comprehensive System Monitoring and Logging Guide for Ubuntu Server 22.04
Effective monitoring and logging are crucial for maintaining healthy Ubuntu servers. This guide covers everything from built-in tools to advanced monitoring stacks like Prometheus/Grafana and the ELK stack.
Prerequisites
- Ubuntu Server 22.04 LTS
- Root or sudo access
- Basic understanding of Linux systems
- Sufficient storage for logs and metrics
Built-in Monitoring Tools
System Resource Monitoring
top and htop
# Install htop
sudo apt install htop -y
# Basic top usage
top
# htop with tree view
htop -t
System Load and Memory
# Load average
uptime
cat /proc/loadavg
# Memory usage
free -h
cat /proc/meminfo
# Detailed memory stats
vmstat 1 5
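A load average is easier to interpret relative to the number of CPU cores: a load of 4 is light on a 16-core host and heavy on a 2-core one. The following is a minimal sketch of that normalization; load_per_core is a hypothetical helper name, not a standard tool.

```shell
# load_per_core LOAD CORES
# Prints the load average divided by the core count, to two decimals.
# (load_per_core is an illustrative helper, not part of any package.)
load_per_core() {
    local load="$1" cores="$2"
    # awk performs the floating-point division that shell arithmetic cannot
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}
```

On a live system it could be called with the first field of /proc/loadavg and the output of nproc, for example: load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)". Values that stay above 1.00 suggest more runnable work than the cores can serve.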
CPU Information
# CPU info
lscpu
cat /proc/cpuinfo
# Install sysstat (provides mpstat and iostat)
sudo apt install sysstat -y
# CPU usage per core
mpstat -P ALL 1
Disk Usage and I/O
# Disk usage
df -h
df -i # inode usage
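# Directory sizes
sudo du -sh /var/*
# Install ncdu and iotop (not part of the default server install)
sudo apt install ncdu iotop -y
ncdu / # Interactive disk usage
# Disk I/O statistics
iostat -x 1 # Provided by sysstat
sudo iotop # Real-time I/O (needs root)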
Network Statistics
# Network interfaces
ip -s link show
ifstat
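# Network connections
ss -tuln
sudo netstat -tulpn # Legacy tool, requires: sudo apt install net-tools
# Bandwidth monitoring (install the tools first)
sudo apt install iftop nload bmon -y
sudo iftop
nload
bmon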
Process Monitoring
Process Management
# List processes
ps aux
ps -ef --forest
# Find specific process
pgrep -la nginx
pidof nginx
# Process tree
pstree -p
# System calls (root is needed to attach to other users' processes)
sudo strace -p <PID>
Resource Usage by Process
# Top CPU consumers
ps aux --sort=-%cpu | head
# Top memory consumers
ps aux --sort=-%mem | head
# Process accounting
sudo apt install acct -y
sudo accton on
sa # Summary of process accounting
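Services that fork many workers (nginx, postgres) spread their memory across several rows of ps output, so the per-process view can understate the real footprint. As a rough sketch, the rows can be summed per command name; rss_by_command is a hypothetical helper, and it assumes command names without spaces.

```shell
# rss_by_command
# Reads "RSS_KB COMMAND" pairs on stdin and prints the total resident
# memory (in KB) per command name, largest first.
# Assumption: command names contain no spaces.
rss_by_command() {
    awk '{ total[$2] += $1 }
         END { for (c in total) printf "%d %s\n", total[c], c }' | sort -rn
}
```

A typical call would be: ps -eo rss=,comm= | rss_by_command | head. Note that RSS counts shared pages once per process, so the totals are an upper bound rather than an exact figure.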
System Logging with journald
Basic journald Usage
# View all logs
journalctl
# Follow logs in real-time
journalctl -f
# Show logs since last boot
journalctl -b
# Show previous boot logs
journalctl -b -1
# Filter by time
journalctl --since "2024-01-19 10:00:00"
journalctl --since "1 hour ago"
journalctl --until "2024-01-19 12:00:00"
Filtering Logs
# By unit/service
journalctl -u nginx
journalctl -u ssh.service
# By priority
journalctl -p err
journalctl -p warning
# By process
journalctl _PID=1234
journalctl _UID=1000
# Kernel messages
journalctl -k
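The filters above can be combined with a little post-processing to see which programs produce the most errors. The sketch below counts entries per syslog identifier; it assumes input in journalctl's short-iso output format (timestamp, hostname, identifier[PID]:, message), and count_by_identifier is a hypothetical helper name.

```shell
# count_by_identifier
# Reads journalctl short-iso lines on stdin and prints "COUNT IDENTIFIER",
# most frequent first. Assumes the third field looks like "sshd[1234]:"
# or "kernel:".
count_by_identifier() {
    awk '
        /^--/ { next }                      # skip "-- Boot ... --" marker lines
        {
            id = $3
            sub(/\[[0-9]+\]:?$/, "", id)    # drop a trailing "[PID]:"
            sub(/:$/, "", id)               # drop a bare trailing colon
            count[id]++
        }
        END { for (i in count) printf "%d %s\n", count[i], i }
    ' | sort -rn
}
```

For example: journalctl -p err -b -o short-iso | count_by_identifier | head.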
journald Configuration
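sudo nano /etc/systemd/journald.conf
[Journal]
Storage=persistent
Compress=yes
SplitMode=uid
MaxRetentionSec=1month
MaxFileSec=1week
SystemMaxUse=1G
# Size options take bytes or K/M/G suffixes, not percentages; adjust to your disk
SystemKeepFree=2G
ForwardToSyslog=yes
# Apply the changes
sudo systemctl restart systemd-journald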
Export and Backup Logs
# Export to JSON
journalctl -o json > logs.json
# Export specific time range
journalctl --since "2024-01-19" --until "2024-01-20" > daily_logs.txt
# Verify journal integrity
journalctl --verify
Advanced System Monitoring with Prometheus
Install Prometheus
# Create user
sudo useradd --no-create-home --shell /bin/false prometheus
# Download Prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvf prometheus-2.45.0.linux-amd64.tar.gz
# Install binaries
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
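# Create directories (alerts/ holds the rule files referenced in prometheus.yml)
sudo mkdir -p /etc/prometheus/alerts /var/lib/prometheus
# Copy the console files referenced by the service unit
sudo cp -r prometheus-2.45.0.linux-amd64/consoles /etc/prometheus/
sudo cp -r prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus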
Configure Prometheus
sudo nano /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: []
rule_files:
  - "alerts/*.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
  # Requires an Apache exporter listening on port 9117; remove this job if unused
  - job_name: 'apache'
    static_configs:
      - targets: ['localhost:9117']
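YAML indentation mistakes are easy to make in this file, and Prometheus refuses to start on an invalid configuration. The promtool binary installed earlier can validate the file first. The wrapper below is only a sketch: check_prometheus_config and the PROMTOOL override variable are hypothetical names introduced here for illustration.

```shell
# check_prometheus_config [FILE]
# Validates a Prometheus configuration with "promtool check config".
# PROMTOOL may point at a different binary (an assumption made for testing).
check_prometheus_config() {
    local config="${1:-/etc/prometheus/prometheus.yml}"
    local promtool="${PROMTOOL:-promtool}"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 2
    fi
    # promtool exits non-zero when the file or its rule files are invalid
    "$promtool" check config "$config"
}
```

Running check_prometheus_config before every restart of the service catches syntax errors without taking monitoring offline.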
Create Prometheus Service
sudo nano /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
Install Node Exporter
# Download Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
tar xvf node_exporter-1.6.0.linux-amd64.tar.gz
# Install
sudo cp node_exporter-1.6.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
Node Exporter Service
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
Visualization with Grafana
Install Grafana
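# Add repository (apt-key is deprecated on Ubuntu 22.04, so use a dedicated keyring)
sudo apt install -y apt-transport-https software-properties-common wget gnupg
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list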
# Install
sudo apt update
sudo apt install grafana -y
# Start service
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
Configure Grafana
# Access Grafana at http://server-ip:3000
# Default login: admin/admin (Grafana asks for a new password on first login)
# Configure Prometheus data source
# URL: http://localhost:9090
Import Dashboards
# Popular dashboard IDs:
# 1860 - Node Exporter Full
# 3662 - Prometheus 2.0 Overview
# 7362 - MySQL Overview
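Instead of adding the Prometheus data source by hand in the web interface, Grafana can load it from a provisioning file at startup. The sketch below writes such a file; write_prometheus_datasource is a hypothetical helper, and the default directory assumes the layout of the Grafana Debian package.

```shell
# write_prometheus_datasource [DIR]
# Writes a Grafana data source provisioning file for the local Prometheus.
# DIR defaults to the provisioning path used by the Debian package (assumption).
write_prometheus_datasource() {
    local dir="${1:-/etc/grafana/provisioning/datasources}"
    cat > "$dir/prometheus.yml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}
```

The function needs write access to the target directory, so it would typically be run as root, followed by sudo systemctl restart grafana-server. Keeping the data source in a file makes the setup reproducible across servers.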
ELK Stack Setup
Install Elasticsearch
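# Add repository (apt-key is deprecated on Ubuntu 22.04, so use a dedicated keyring)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list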
# Install
sudo apt update
sudo apt install elasticsearch -y
# Configure
sudo nano /etc/elasticsearch/elasticsearch.yml
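# The 8.x package may already have written security settings and
# cluster.initial_master_nodes to this file: edit existing keys rather than
# duplicating them, and remove cluster.initial_master_nodes for single-node discovery
cluster.name: my-cluster
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
http.port: 9200
discovery.type: single-node
# Disabling security is only reasonable for a local test setup
xpack.security.enabled: false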
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
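Once the service is up, the cluster health endpoint gives a quick signal of whether Elasticsearch is usable. The sketch below extracts the status field from that JSON using only grep and sed; cluster_status is a hypothetical helper, and it assumes security is disabled as in the configuration above so that plain HTTP works.

```shell
# cluster_status
# Reads the JSON body of GET /_cluster/health on stdin and prints the
# value of its "status" field (green, yellow or red).
cluster_status() {
    grep -o '"status" *: *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}
```

For example: curl -s http://localhost:9200/_cluster/health | cluster_status. On a single-node setup a yellow status is common, because replica shards have no second node to be assigned to.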
Install Kibana
sudo apt install kibana -y
# Configure
sudo nano /etc/kibana/kibana.yml
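server.port: 5601
# 0.0.0.0 listens on all interfaces; restrict access with a firewall or bind to "localhost" behind a reverse proxy
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]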
sudo systemctl enable kibana
sudo systemctl start kibana
Install Logstash
sudo apt install logstash -y
# Basic configuration
sudo nano /etc/logstash/conf.d/02-beats-input.conf
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
sudo systemctl enable logstash
sudo systemctl start logstash
Install Filebeat
sudo apt install filebeat -y
# Configure
sudo nano /etc/filebeat/filebeat.yml
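filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/syslog
      - /var/log/auth.log
    fields:
      service: system
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.log
    fields:
      service: nginx-access
# Only one output can be active. To send events through Logstash instead, use:
# output.logstash:
#   hosts: ["localhost:5044"]
output.elasticsearch:
  hosts: ["localhost:9200"]
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
# Optional: the system and nginx modules read the same files as the inputs above,
# so enable either the modules or the manual inputs to avoid duplicate events
sudo filebeat modules enable system nginx
sudo systemctl enable filebeat
sudo systemctl start filebeat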
Custom Monitoring Scripts
Disk Space Alert
sudo nano /usr/local/bin/disk_space_alert.sh
#!/bin/bash
THRESHOLD=80
EMAIL="admin@example.com"
# -P keeps each filesystem on one line; snap squashfs mounts always report 100%, so skip them
df -P -x tmpfs -x devtmpfs -x squashfs | awk 'NR > 1 { print $5 " " $1 }' | while read -r usage partition; do
    usage=${usage%\%} # strip the trailing percent sign
    if [ "$usage" -ge "$THRESHOLD" ]; then
        echo "WARNING: Partition \"$partition\" is ${usage}% full" | mail -s "Disk Space Alert on $(hostname)" "$EMAIL"
        logger "Disk space warning: $partition is ${usage}% full"
    fi
done
Service Health Check
sudo nano /usr/local/bin/service_health_check.sh
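#!/bin/bash
# Run as root so that systemctl start can succeed
SERVICES="nginx postgresql ssh"
LOGFILE="/var/log/service_health.log"
for service in $SERVICES; do
    if systemctl is-active --quiet "$service"; then
        echo "$(date): $service is running" >> "$LOGFILE"
    else
        echo "$(date): WARNING - $service is not running" >> "$LOGFILE"
        # Only report a restart if it actually worked
        if systemctl start "$service"; then
            logger "Service $service was down and has been restarted"
        else
            logger "Service $service is down and could not be restarted"
        fi
    fi
done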
Performance Metrics Collector
sudo nano /usr/local/bin/collect_metrics.sh
#!/bin/bash
METRICS_DIR="/var/log/metrics"
mkdir -p "$METRICS_DIR"
DATE=$(date +%Y%m%d_%H%M%S)
OUTPUT_FILE="$METRICS_DIR/metrics_$DATE.json"
# Collect metrics
{
    echo "{"
    echo "  \"timestamp\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\","
    echo "  \"hostname\": \"$(hostname)\","
    echo "  \"load_average\": $(awk '{print "["$1", "$2", "$3"]"}' /proc/loadavg),"
    echo "  \"memory\": {"
    free -b | awk 'NR==2{printf "    \"total\": %s,\n    \"used\": %s,\n    \"free\": %s,\n    \"available\": %s\n", $2, $3, $4, $7}'
    echo "  },"
    echo "  \"disk\": ["
    # -P keeps each filesystem on a single line so the fields stay aligned
    df -P -B1 | tail -n +2 | while read -r line; do
        echo "    {"
        echo "$line" | awk '{printf "      \"filesystem\": \"%s\",\n      \"size\": %s,\n      \"used\": %s,\n      \"available\": %s,\n      \"use_percent\": \"%s\"\n", $1, $2, $3, $4, $5}'
        echo "    },"
    done | sed '$ s/,$//'
    echo "  ],"
    echo "  \"network\": {"
    # Count established TCP connections (the second field of "ss -s" is the total TCP count instead)
    echo "    \"connections\": $(ss -Htn state established | wc -l)"
    echo "  }"
    echo "}"
} > "$OUTPUT_FILE"
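The three scripts above only help if they run on a schedule. One option is a file in /etc/cron.d, sketched below. The helper name write_monitoring_cron, the file name custom-monitoring and the intervals are all assumptions chosen for illustration; adjust them to your environment.

```shell
# write_monitoring_cron [FILE]
# Writes a cron.d file that schedules the three monitoring scripts as root.
# File name and intervals are illustrative assumptions.
write_monitoring_cron() {
    local target="${1:-/etc/cron.d/custom-monitoring}"
    cat > "$target" <<'EOF'
# m h dom mon dow user command
*/15 * * * * root /usr/local/bin/disk_space_alert.sh
*/5 * * * * root /usr/local/bin/service_health_check.sh
*/10 * * * * root /usr/local/bin/collect_metrics.sh
EOF
}
```

Before scheduling, the scripts must be executable: sudo chmod +x /usr/local/bin/disk_space_alert.sh /usr/local/bin/service_health_check.sh /usr/local/bin/collect_metrics.sh. Because collect_metrics.sh writes a new file on every run, old files in /var/log/metrics should be pruned periodically.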
Log Analysis and Management
Log Rotation
sudo nano /etc/logrotate.d/custom-apps
/var/log/myapp/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        systemctl reload myapp >/dev/null 2>&1 || true
    endscript
}
Log Analysis Tools
# Install log analysis tools
sudo apt install logwatch goaccess -y
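# Configure logwatch (copy the packaged defaults, then set the values below)
sudo cp /usr/share/logwatch/default.conf/logwatch.conf /etc/logwatch/conf/
sudo nano /etc/logwatch/conf/logwatch.conf
MailTo = admin@example.com
Detail = High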
# Use goaccess for web logs
goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED
Centralized Logging with rsyslog
# On log server
sudo nano /etc/rsyslog.conf
# Uncomment these lines:
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")
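# Add template
$template RemoteLogs,"/var/log/remote/%HOSTNAME%/%PROGRAMNAME%.log"
*.* ?RemoteLogs
& stop
# Restart to apply
sudo systemctl restart rsyslog
# On client servers
sudo nano /etc/rsyslog.conf
# Add at the end (@@ forwards over TCP, a single @ over UDP):
*.* @@log-server-ip:514
# Restart to apply
sudo systemctl restart rsyslog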
Alert Configuration
Email Alerts Setup
# Install mail utilities
sudo apt install mailutils postfix -y
# Configure postfix as Internet Site
Prometheus Alerting Rules
sudo mkdir -p /etc/prometheus/alerts
sudo nano /etc/prometheus/alerts/node_alerts.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        # avg by (instance) evaluates each host separately instead of averaging all hosts together
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 90% (current value: {{ $value }}%)"
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs|squashfs"} / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "Free disk space is below 10% (current value: {{ $value }}%)"
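The HighCPUUsage expression can look opaque at first. rate() turns the ever-growing idle-seconds counter into "idle seconds per second", which is the idle fraction of a core; subtracting that percentage from 100 gives the busy percentage. The sketch below reproduces the arithmetic for one core from two counter readings; cpu_busy_percent is a hypothetical helper used only to illustrate the formula.

```shell
# cpu_busy_percent IDLE_START IDLE_END SECONDS
# Mirrors 100 - rate(idle_counter) * 100 for a single core:
# (IDLE_END - IDLE_START) / SECONDS is the idle fraction over the window.
cpu_busy_percent() {
    local idle_start="$1" idle_end="$2" seconds="$3"
    awk -v a="$idle_start" -v b="$idle_end" -v s="$seconds" \
        'BEGIN { printf "%.1f\n", 100 - ((b - a) / s) * 100 }'
}
```

If the idle counter moves from 1000 to 1060 over a 300-second window, the core was idle for 60 of 300 seconds, so cpu_busy_percent 1000 1060 300 prints 80.0, which sits exactly at the alert threshold.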
Performance Tuning for Monitoring
Optimize Prometheus Storage
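# Configure retention
sudo nano /etc/systemd/system/prometheus.service
# Append to ExecStart (end the preceding flag line with a backslash):
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.retention.size=10GB
# Reload the unit and restart to apply
sudo systemctl daemon-reload
sudo systemctl restart prometheus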
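To choose sensible retention values, it helps to estimate how much disk the time series database will need. The Prometheus documentation gives a rough formula: retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample (roughly 1 to 2 bytes). The sketch below applies it; estimate_tsdb_bytes is a hypothetical helper, and the default of 2 bytes per sample is a deliberately pessimistic assumption.

```shell
# estimate_tsdb_bytes RETENTION_DAYS SAMPLES_PER_SECOND [BYTES_PER_SAMPLE]
# Prints the approximate disk space in bytes needed for the given retention.
# BYTES_PER_SAMPLE defaults to 2 (upper end of the typical range, assumption).
estimate_tsdb_bytes() {
    local days="$1" samples_per_sec="$2" bytes_per_sample="${3:-2}"
    echo $(( days * 86400 * samples_per_sec * bytes_per_sample ))
}
```

The ingestion rate of a running server can be read from the query rate(prometheus_tsdb_head_samples_appended_total[1h]). At 1000 samples per second and 30 days of retention, estimate_tsdb_bytes 30 1000 prints 5184000000, about 5.2 GB, which fits within the 10GB size limit used above.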
Elasticsearch Optimization
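# Put overrides in jvm.options.d instead of editing jvm.options directly
sudo nano /etc/elasticsearch/jvm.options.d/heap.options
# Elasticsearch sizes the heap automatically by default. When overriding, use at most
# 50% of RAM and stay below about 31 GB so compressed object pointers remain enabled
-Xms4g
-Xmx4g
sudo systemctl restart elasticsearch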
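The heap sizing rule can be written down as a small calculation. This is a sketch only: recommended_heap_gb is a hypothetical helper, and the cap of 31 GB is a conservative assumption chosen to stay under the threshold where the JVM stops using compressed object pointers.

```shell
# recommended_heap_gb TOTAL_RAM_GB
# Prints half of the RAM in whole GB, capped at 31 and never below 1.
recommended_heap_gb() {
    local ram_gb="$1"
    local heap=$(( ram_gb / 2 ))
    if [ "$heap" -gt 31 ]; then
        heap=31
    fi
    if [ "$heap" -lt 1 ]; then
        heap=1
    fi
    echo "$heap"
}
```

For a server with 8 GB of RAM the function prints 4, matching the -Xms4g and -Xmx4g values shown above. Both flags should always be set to the same value so the heap is not resized at runtime.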
Best Practices
- Regular Reviews: Schedule weekly log reviews
- Retention Policies: Define log retention based on compliance needs
- Alert Fatigue: Avoid too many alerts; focus on actionable ones
- Backup Metrics: Regularly back up Prometheus data
- Security: Secure monitoring endpoints with authentication
- Documentation: Document what each metric means and why each threshold was chosen
- Automation: Automate responses to common alerts
Troubleshooting
High Resource Usage
# Find resource-intensive processes
ps aux | sort -nrk 3,3 | head -n 10 # CPU
ps aux | sort -nrk 4,4 | head -n 10 # Memory
# Check I/O wait
iostat -x 1
Missing Metrics
# Check exporters
curl http://localhost:9100/metrics # Node exporter
curl http://localhost:9090/metrics # Prometheus
# Verify targets in Prometheus
# http://localhost:9090/targets
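When an exporter responds but a dashboard panel stays empty, it can help to confirm that one specific metric is actually exposed. The sketch below pulls a single unlabelled sample out of the Prometheus text exposition format; metric_value is a hypothetical helper, and it deliberately ignores samples that carry labels.

```shell
# metric_value NAME
# Reads Prometheus text exposition format on stdin and prints the value of
# the sample whose metric name is exactly NAME. Comment lines (# HELP,
# # TYPE) and labelled samples such as name{cpu="0"} do not match.
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}
```

For example: curl -s http://localhost:9100/metrics | metric_value node_load1. An empty result means the metric is missing or only exists with labels.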
Log Issues
# Check disk space
df -h /var/log
# Verify log permissions
ls -la /var/log/
# Force log rotation
sudo logrotate -f /etc/logrotate.conf
Conclusion
This comprehensive guide covered monitoring and logging on Ubuntu Server 22.04, from built-in tools to advanced stacks like Prometheus/Grafana and ELK. Effective monitoring is essential for maintaining reliable systems. Remember to regularly review and adjust your monitoring strategy based on your infrastructure's evolving needs.