SUSE Backup and Disaster Recovery: Complete Enterprise Guide
This comprehensive guide covers backup and disaster recovery solutions for SUSE Linux Enterprise Server, including backup strategies, tools and technologies, automation, disaster recovery planning, and business continuity management.
Introduction to Backup and Disaster Recovery
SUSE backup and disaster recovery encompasses:
- Backup Strategies: Full, incremental, differential, and synthetic backups
- Backup Tools: tar, rsync, Amanda, Bacula, and enterprise solutions
- Storage Options: Local, NAS, SAN, tape, and cloud storage
- Disaster Recovery: Planning, testing, and recovery procedures
- Business Continuity: RTO/RPO objectives and high availability
- Compliance: Data retention and regulatory requirements
Backup Strategy Development
Backup Planning Framework
# Create backup policy document
cat > /usr/local/share/backup-policy.md << 'EOF'
# SUSE Backup Policy Framework
## Backup Classifications
1. **Critical Systems** (RTO: 1hr, RPO: 15min)
- Database servers
- Authentication systems
- Core business applications
2. **Important Systems** (RTO: 4hr, RPO: 1hr)
- File servers
- Web servers
- Development systems
3. **Standard Systems** (RTO: 24hr, RPO: 24hr)
- User workstations
- Test environments
- Non-critical services
## Backup Schedule
- **Critical**: Continuous replication + hourly snapshots
- **Important**: Daily incremental + weekly full
- **Standard**: Daily incremental + monthly full
## Retention Policies
- Daily backups: 7 days
- Weekly backups: 4 weeks
- Monthly backups: 12 months
- Yearly backups: 7 years
## Storage Requirements
- Local: 2x active data size
- Remote: 1x active data size
- Archive: Based on retention policy
EOF
# Backup inventory script
cat > /usr/local/bin/backup-inventory.sh << 'EOF'
#!/bin/bash
# Generate backup inventory
INVENTORY_FILE="/var/log/backup/inventory-$(date +%Y%m%d).csv"
mkdir -p /var/log/backup
echo "Hostname,IP,Category,Data_Size,Backup_Method,Schedule,Retention" > "$INVENTORY_FILE"
# Scan network for systems
for ip in $(nmap -sn 192.168.1.0/24 | grep "report for" | awk '{print $NF}' | tr -d '()'); do
hostname=$(nslookup $ip | grep "name =" | awk '{print $4}' | sed 's/\.$//')
# Determine category based on hostname
if [[ "$hostname" =~ db|auth|ldap ]]; then
category="Critical"
method="Replication+Snapshot"
schedule="Continuous"
retention="7years"
elif [[ "$hostname" =~ web|file|app ]]; then
category="Important"
method="Incremental+Full"
schedule="Daily"
retention="1year"
else
category="Standard"
method="Incremental"
schedule="Daily"
retention="3months"
fi
# Estimate data size (would need actual measurement)
data_size="Unknown"
echo "$hostname,$ip,$category,$data_size,$method,$schedule,$retention" >> "$INVENTORY_FILE"
done
echo "Backup inventory saved to: $INVENTORY_FILE"
EOF
chmod +x /usr/local/bin/backup-inventory.sh
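A minimal usage sketch (the scanned subnet is hard-coded in the script, so adjust it to your environment before running):
# Run the inventory scan, then review the newest CSV in table form
/usr/local/bin/backup-inventory.sh
column -s, -t < "$(ls -t /var/log/backup/inventory-*.csv | head -1)" | head -20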
Backup Infrastructure Setup
# Create backup infrastructure
cat > /usr/local/bin/setup-backup-infrastructure.sh << 'EOF'
#!/bin/bash
# Setup backup infrastructure
# Create backup directories
BACKUP_ROOT="/backup"
mkdir -p "$BACKUP_ROOT"/{daily,weekly,monthly,yearly}
mkdir -p "$BACKUP_ROOT"/staging
mkdir -p "$BACKUP_ROOT"/logs
mkdir -p "$BACKUP_ROOT"/configs
# Create backup user (skip if it already exists)
id backup &>/dev/null || useradd -r -d "$BACKUP_ROOT" -s /bin/bash backup
chown -R backup:backup "$BACKUP_ROOT"
# Setup backup repository structure
cat > "$BACKUP_ROOT/configs/repository.conf" << 'CONFIG'
# Backup Repository Configuration
BACKUP_ROOT="/backup"
STAGING_DIR="$BACKUP_ROOT/staging"
LOG_DIR="$BACKUP_ROOT/logs"
# Retention settings
DAILY_RETAIN=7
WEEKLY_RETAIN=4
MONTHLY_RETAIN=12
YEARLY_RETAIN=7
# Compression settings
COMPRESSION="gzip"
COMPRESSION_LEVEL=6
# Encryption settings
ENCRYPT_BACKUPS="yes"
GPG_RECIPIENT="backup@example.com"
CONFIG
# Create backup catalog database
sqlite3 "$BACKUP_ROOT/backup_catalog.db" << 'SQL'
CREATE TABLE backups (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hostname TEXT NOT NULL,
backup_type TEXT NOT NULL,
backup_date DATETIME DEFAULT CURRENT_TIMESTAMP,
backup_size INTEGER,
backup_path TEXT,
checksum TEXT,
status TEXT,
retention_date DATE
);
CREATE TABLE backup_logs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
backup_id INTEGER,
log_date DATETIME DEFAULT CURRENT_TIMESTAMP,
log_level TEXT,
message TEXT,
FOREIGN KEY (backup_id) REFERENCES backups(id)
);
CREATE INDEX idx_hostname ON backups(hostname);
CREATE INDEX idx_backup_date ON backups(backup_date);
CREATE INDEX idx_status ON backups(status);
SQL
echo "Backup infrastructure created in $BACKUP_ROOT"
EOF
chmod +x /usr/local/bin/setup-backup-infrastructure.sh
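A quick smoke test after running the setup, using standard sqlite3 shell commands to confirm the catalog schema was created:
# Run the setup once as root, then verify the catalog database
sudo /usr/local/bin/setup-backup-infrastructure.sh
sqlite3 /backup/backup_catalog.db ".tables"
sqlite3 /backup/backup_catalog.db "SELECT COUNT(*) FROM backups;"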
Backup Tools and Technologies
System Backup with tar and rsync
# Advanced tar backup script
cat > /usr/local/bin/system-backup-tar.sh << 'EOF'
#!/bin/bash
# System backup using tar
BACKUP_ROOT="/backup"
HOSTNAME=$(hostname -f)
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_TYPE="${1:-full}"
EXCLUDE_FILE="/etc/backup/exclude.list"
# Load configuration
source "$BACKUP_ROOT/configs/repository.conf"
# Logging function
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_DIR/backup-$DATE.log"
}
# Create exclude list if not exists
if [ ! -f "$EXCLUDE_FILE" ]; then
cat > "$EXCLUDE_FILE" << 'EXCLUDE'
/proc/*
/sys/*
/dev/*
/run/*
/tmp/*
/var/tmp/*
/var/cache/*
/backup/*
*.tmp
*.swp
*.log
EXCLUDE
fi
# Determine backup destination
case "$BACKUP_TYPE" in
full)
DEST_DIR="$BACKUP_ROOT/weekly"
TAR_OPTIONS="--create"
;;
incremental)
DEST_DIR="$BACKUP_ROOT/daily"
SNAPSHOT_FILE="$BACKUP_ROOT/.snapshots/$HOSTNAME.snar"
mkdir -p "$(dirname $SNAPSHOT_FILE)"
TAR_OPTIONS="--create --listed-incremental=$SNAPSHOT_FILE"
;;
differential)
DEST_DIR="$BACKUP_ROOT/daily"
REFERENCE_DATE=$(date -d "last sunday" +%Y-%m-%d)
TAR_OPTIONS="--create --newer-mtime=$REFERENCE_DATE"
;;
esac
BACKUP_FILE="$DEST_DIR/${HOSTNAME}-${BACKUP_TYPE}-${DATE}.tar.gz"
log "Starting $BACKUP_TYPE backup of $HOSTNAME"
# Create backup
tar $TAR_OPTIONS \
--gzip \
--preserve-permissions \
--one-file-system \
--exclude-from="$EXCLUDE_FILE" \
--file="$BACKUP_FILE" \
--verbose \
/ 2>&1 | tee -a "$LOG_DIR/backup-$DATE.log"
# Encrypt if configured
if [ "$ENCRYPT_BACKUPS" = "yes" ]; then
log "Encrypting backup"
gpg --trust-model always --encrypt --recipient "$GPG_RECIPIENT" "$BACKUP_FILE"
rm -f "$BACKUP_FILE"
BACKUP_FILE="${BACKUP_FILE}.gpg"
fi
# Generate checksum of the final (possibly encrypted) backup file
CHECKSUM=$(sha256sum "$BACKUP_FILE" | awk '{print $1}')
# Update catalog
sqlite3 "$BACKUP_ROOT/backup_catalog.db" << SQL
INSERT INTO backups (hostname, backup_type, backup_size, backup_path, checksum, status)
VALUES ('$HOSTNAME', '$BACKUP_TYPE', $(stat -c%s "$BACKUP_FILE"), '$BACKUP_FILE', '$CHECKSUM', 'completed');
SQL
log "Backup completed: $BACKUP_FILE"
EOF
chmod +x /usr/local/bin/system-backup-tar.sh
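Typical scheduling and a restore sketch for the tar-based backups; the hostname, dates, and target directory below are illustrative:
# Weekly full on Sunday night, incrementals the rest of the week (root's crontab)
# 0 1 * * 0 /usr/local/bin/system-backup-tar.sh full
# 0 1 * * 1-6 /usr/local/bin/system-backup-tar.sh incremental
# Restore example: unpack the full backup, then replay incrementals in order
# (if ENCRYPT_BACKUPS=yes, gpg --decrypt the .gpg files first)
cd /mnt/restore-target
tar xzpf /backup/weekly/host01-full-20240107-010000.tar.gz --listed-incremental=/dev/null
tar xzpf /backup/daily/host01-incremental-20240108-010000.tar.gz --listed-incremental=/dev/null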
# Rsync backup script
cat > /usr/local/bin/system-backup-rsync.sh << 'EOF'
#!/bin/bash
# System backup using rsync
BACKUP_ROOT="/backup"
REMOTE_HOST="${1:-backup-server.example.com}"
HOSTNAME=$(hostname -f)
DATE=$(date +%Y%m%d-%H%M%S)
# Rsync options
RSYNC_OPTS="-avzH --delete --numeric-ids --relative"
RSYNC_OPTS="$RSYNC_OPTS --exclude-from=/etc/backup/exclude.list"
RSYNC_OPTS="$RSYNC_OPTS --link-dest=../latest"
# Logging
LOG_FILE="/var/log/backup/rsync-$DATE.log"
mkdir -p "$(dirname $LOG_FILE)"
echo "Starting rsync backup to $REMOTE_HOST" | tee "$LOG_FILE"
# Create remote directory
ssh $REMOTE_HOST "mkdir -p /backup/$HOSTNAME/$DATE"
# Perform rsync
rsync $RSYNC_OPTS \
--log-file="$LOG_FILE" \
--stats \
/ $REMOTE_HOST:/backup/$HOSTNAME/$DATE/
# Update latest symlink
ssh $REMOTE_HOST "cd /backup/$HOSTNAME && rm -f latest && ln -s $DATE latest"
# Verify backup
if rsync --dry-run --itemize-changes $RSYNC_OPTS / $REMOTE_HOST:/backup/$HOSTNAME/latest/ | grep -q "^>"; then
echo "WARNING: Backup verification failed - differences detected" | tee -a "$LOG_FILE"
exit 1
fi
echo "Rsync backup completed successfully" | tee -a "$LOG_FILE"
EOF
chmod +x /usr/local/bin/system-backup-rsync.sh
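Because the rsync scheme keeps hard-linked, date-stamped snapshots plus a latest symlink on the backup server, individual files or trees can be pulled back directly; the snapshot date below is an example:
# Restore a single file from the most recent snapshot
rsync -avH --numeric-ids backup-server.example.com:/backup/$(hostname -f)/latest/etc/fstab /tmp/fstab.restored
# Restore a directory tree from a specific snapshot
rsync -avH --numeric-ids backup-server.example.com:/backup/$(hostname -f)/20240107-010000/etc/ /etc/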
Database Backup
# PostgreSQL backup script
cat > /usr/local/bin/backup-postgresql.sh << 'EOF'
#!/bin/bash
# PostgreSQL backup script
BACKUP_DIR="/backup/postgresql"
DATE=$(date +%Y%m%d-%H%M%S)
RETENTION_DAYS=30
# Create backup directory
mkdir -p "$BACKUP_DIR"/{daily,weekly,archives}
# Backup all databases
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$BACKUP_DIR/backup.log"
}
log "Starting PostgreSQL backup"
# Get list of databases (excluding templates)
DATABASES=$(sudo -u postgres psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate;")
for DB in $DATABASES; do
log "Backing up database: $DB"
# Dump database
sudo -u postgres pg_dump \
--format=custom \
--blobs \
--verbose \
--file="$BACKUP_DIR/daily/${DB}-${DATE}.dump" \
"$DB" 2>&1 | tee -a "$BACKUP_DIR/backup.log"
# Compress dump
gzip "$BACKUP_DIR/daily/${DB}-${DATE}.dump"
done
# Backup global objects
sudo -u postgres pg_dumpall \
--globals-only \
--file="$BACKUP_DIR/daily/globals-${DATE}.sql"
# Create weekly backup on Sunday
if [ $(date +%w) -eq 0 ]; then
cp -a "$BACKUP_DIR/daily"/*-${DATE}.* "$BACKUP_DIR/weekly/"
fi
# WAL archiving setup
cat > /var/lib/pgsql/data/archive_command.sh << 'SCRIPT'
#!/bin/bash
# PostgreSQL WAL archiving - set archive_command = '/var/lib/pgsql/data/archive_command.sh %p %f'
WAL_ARCHIVE="/backup/postgresql/wal_archive"
mkdir -p "$WAL_ARCHIVE"
# Archive WAL file: $1 is the full path (%p), $2 is the file name (%f)
test ! -f "$WAL_ARCHIVE/$2" && cp "$1" "$WAL_ARCHIVE/$2"
SCRIPT
chmod +x /var/lib/pgsql/data/archive_command.sh
# Cleanup old backups
find "$BACKUP_DIR/daily" -name "*.dump.gz" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR/daily" -name "*.sql" -mtime +$RETENTION_DAYS -delete
log "PostgreSQL backup completed"
EOF
chmod +x /usr/local/bin/backup-postgresql.sh
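The daily dumps are gzipped custom-format archives, so restores go through pg_restore; the database and file names below are placeholders, and the postgresql.conf lines show the settings the WAL archiving script assumes:
# Restore a single database from a compressed custom-format dump
sudo -u postgres createdb appdb
gunzip -c /backup/postgresql/daily/appdb-20240107-020000.dump.gz | sudo -u postgres pg_restore -d appdb
# Restore roles and tablespaces captured by pg_dumpall --globals-only
sudo -u postgres psql -f /backup/postgresql/daily/globals-20240107-020000.sql postgres
# postgresql.conf settings required for WAL archiving (default SLES data directory assumed)
# wal_level = replica
# archive_mode = on
# archive_command = '/var/lib/pgsql/data/archive_command.sh %p %f'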
# MySQL/MariaDB backup script
cat > /usr/local/bin/backup-mysql.sh << 'EOF'
#!/bin/bash
# MySQL/MariaDB backup script
BACKUP_DIR="/backup/mysql"
DATE=$(date +%Y%m%d-%H%M%S)
MYSQL_USER="backup"
MYSQL_PASS="backup_password"
mkdir -p "$BACKUP_DIR"/{daily,weekly,binlogs}
# Full backup with binary logs
mysqldump \
--user=$MYSQL_USER \
--password=$MYSQL_PASS \
--all-databases \
--single-transaction \
--routines \
--triggers \
--events \
--flush-logs \
--master-data=2 \
--result-file="$BACKUP_DIR/daily/full-backup-${DATE}.sql"
# Compress backup
gzip "$BACKUP_DIR/daily/full-backup-${DATE}.sql"
# Backup binary logs
mysql -u$MYSQL_USER -p$MYSQL_PASS -e "SHOW MASTER STATUS\G" > "$BACKUP_DIR/binlogs/position-${DATE}.txt"
cp /var/lib/mysql/mysql-bin.* "$BACKUP_DIR/binlogs/"
# Point-in-time recovery setup
cat > "$BACKUP_DIR/restore-point-in-time.sh" << 'SCRIPT'
#!/bin/bash
# MySQL point-in-time recovery
BACKUP_FILE="$1"
STOP_DATETIME="$2"
# Restore full backup
gunzip -c "$BACKUP_FILE" | mysql -u root -p
# Apply binary logs up to specific time
mysqlbinlog --stop-datetime="$STOP_DATETIME" /backup/mysql/binlogs/mysql-bin.* | mysql -u root -p
SCRIPT
chmod +x "$BACKUP_DIR/restore-point-in-time.sh"
echo "MySQL backup completed"
EOF
chmod +x /usr/local/bin/backup-mysql.sh
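Rather than passing the password on the command line (visible in ps output), the backup credentials can live in an option file that mysqldump and mysql read automatically; the user and password here are the same placeholders used above:
# /root/.my.cnf (mode 600) - removes the need for -p on the command line
cat > /root/.my.cnf << 'CNF'
[client]
user=backup
password=backup_password
CNF
chmod 600 /root/.my.cnf
# Privileges typically needed by the backup user (run in the mysql client):
# GRANT SELECT, LOCK TABLES, SHOW VIEW, EVENT, TRIGGER, RELOAD, REPLICATION CLIENT ON *.* TO 'backup'@'localhost';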
Virtual Machine Backup
# KVM VM backup script
cat > /usr/local/bin/backup-kvm-vms.sh << 'EOF'
#!/bin/bash
# KVM virtual machine backup
BACKUP_DIR="/backup/vms"
DATE=$(date +%Y%m%d-%H%M%S)
COMPRESS="yes"
mkdir -p "$BACKUP_DIR"/{images,configs,snapshots}
# Get list of running VMs
RUNNING_VMS=$(virsh list --name)
for VM in $RUNNING_VMS; do
echo "Backing up VM: $VM"
# Save VM configuration
virsh dumpxml "$VM" > "$BACKUP_DIR/configs/${VM}-${DATE}.xml"
# Record the current disk targets and paths BEFORE the snapshot so the base images get backed up
DISKS=$(virsh domblklist "$VM" | grep -E "vd|sd" | awk '{print $1":"$2}')
# Create an external, disk-only snapshot; guest writes go to new overlay files
SNAPSHOT_NAME="backup-${DATE}"
virsh snapshot-create-as "$VM" "$SNAPSHOT_NAME" \
--description "Backup snapshot" \
--disk-only \
--atomic
for DISK_INFO in $DISKS; do
DISK_NAME=$(echo "$DISK_INFO" | cut -d: -f1)
DISK_PATH=$(echo "$DISK_INFO" | cut -d: -f2)
# Backup the (now read-only) base disk image
if [ "$COMPRESS" = "yes" ]; then
qemu-img convert -O qcow2 -c "$DISK_PATH" \
"$BACKUP_DIR/images/${VM}-${DISK_NAME}-${DATE}.qcow2"
else
cp "$DISK_PATH" "$BACKUP_DIR/images/${VM}-${DISK_NAME}-${DATE}.img"
fi
# Merge the overlay back into the base image for this disk
virsh blockcommit "$VM" "$DISK_NAME" --active --pivot
done
# Remove the snapshot metadata (data was merged by blockcommit; leftover overlay files can then be deleted)
virsh snapshot-delete "$VM" "$SNAPSHOT_NAME" --metadata
done
# Backup offline VMs (comm requires both lists to be sorted)
OFFLINE_VMS=$(comm -23 <(virsh list --all --name | grep -v "^$" | sort) \
<(echo "$RUNNING_VMS" | grep -v "^$" | sort))
for VM in $OFFLINE_VMS; do
echo "Backing up offline VM: $VM"
# Save configuration
virsh dumpxml "$VM" > "$BACKUP_DIR/configs/${VM}-${DATE}.xml"
# Copy disk images
for DISK_PATH in $(virsh domblklist "$VM" | grep -E "vd|sd" | awk '{print $2}'); do
DISK_NAME=$(basename "$DISK_PATH")
cp "$DISK_PATH" "$BACKUP_DIR/images/${VM}-${DISK_NAME}-${DATE}"
done
done
echo "VM backup completed"
EOF
chmod +x /usr/local/bin/backup-kvm-vms.sh
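Restoring a VM from these backups is the reverse: copy the disk image back into place, point the saved XML at it, and re-define the domain; file names and paths below are illustrative:
# Restore a VM disk image and its definition
cp /backup/vms/images/web01-vda-20240107-030000.qcow2 /var/lib/libvirt/images/web01.qcow2
# Edit the saved XML if the disk path changed, then define and start the VM
virsh define /backup/vms/configs/web01-20240107-030000.xml
virsh start web01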
Enterprise Backup Solutions
Bacula Configuration
# Install Bacula
sudo zypper install bacula-dir bacula-sd bacula-fd bacula-console
# Configure Bacula Director
cat > /etc/bacula/bacula-dir.conf << 'EOF'
# Bacula Director Configuration
Director {
Name = bacula-dir
DIRport = 9101
QueryFile = "/etc/bacula/query.sql"
WorkingDirectory = "/var/lib/bacula"
PidDirectory = "/var/run"
Maximum Concurrent Jobs = 20
Password = "director_password"
Messages = Daemon
}
# Storage Daemon
Storage {
Name = File
Address = backup-server.example.com
SDPort = 9103
Password = "storage_password"
Device = FileStorage
Media Type = File
Maximum Concurrent Jobs = 10
}
# Catalog
Catalog {
Name = MyCatalog
dbname = "bacula"; dbuser = "bacula"; dbpassword = "bacula_db_password"
}
# File Sets
FileSet {
Name = "Full Set"
Include {
Options {
signature = MD5
compression = GZIP
}
File = /
}
Exclude {
File = /proc
File = /sys
File = /tmp
File = /var/tmp
File = /backup
}
}
FileSet {
Name = "Database Set"
Include {
Options {
signature = MD5
compression = GZIP
}
File = /var/lib/pgsql
File = /var/lib/mysql
}
}
# Schedules
Schedule {
Name = "WeeklyCycle"
Run = Full 1st sun at 23:05
Run = Differential 2nd-5th sun at 23:05
Run = Incremental mon-sat at 23:05
}
Schedule {
Name = "DailyCycle"
Run = Full sun at 23:05
Run = Incremental mon-sat at 23:05
}
# Client definitions
Client {
Name = server1-fd
Address = server1.example.com
FDPort = 9102
Catalog = MyCatalog
Password = "client1_password"
File Retention = 60 days
Job Retention = 6 months
AutoPrune = yes
}
# Job definitions
Job {
Name = "BackupServer1"
Type = Backup
Level = Incremental
Client = server1-fd
FileSet = "Full Set"
Schedule = "WeeklyCycle"
Storage = File
Messages = Standard
Pool = File
Priority = 10
Write Bootstrap = "/var/lib/bacula/%c.bsr"
}
# Restore job
Job {
Name = "RestoreFiles"
Type = Restore
Client = server1-fd
FileSet = "Full Set"
Storage = File
Pool = File
Messages = Standard
Where = /tmp/bacula-restores
}
# Pool definition
Pool {
Name = File
Pool Type = Backup
Recycle = yes
AutoPrune = yes
Volume Retention = 365 days
Maximum Volume Bytes = 50G
Maximum Volumes = 100
Label Format = "Vol-"
}
# Messages
Messages {
Name = Standard
mailcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula: %t %e of %c %l\" %r"
operatorcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula: Intervention needed for %j\" %r"
mail = backup-admin@example.com = all, !skipped
operator = backup-operator@example.com = mount
console = all, !skipped, !saved
append = "/var/log/bacula/bacula.log" = all, !skipped
catalog = all
}
Messages {
Name = Daemon
mailcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula daemon message\" %r"
mail = backup-admin@example.com = all, !skipped
console = all, !skipped, !saved
append = "/var/log/bacula/bacula-daemon.log" = all, !skipped
}
EOF
# Start Bacula services
systemctl enable bacula-dir bacula-sd bacula-fd
systemctl start bacula-dir bacula-sd bacula-fd
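Once the daemons are running, jobs can be driven and checked from bconsole; the job name matches the sample configuration above:
# Run the backup job for server1 and check director status
echo "run job=BackupServer1 yes" | bconsole
echo "status director" | bconsole
# List recent jobs and their termination status
echo "list jobs" | bconsole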
Amanda Configuration
# Install Amanda
sudo zypper install amanda-server amanda-client
# Configure Amanda
mkdir -p /etc/amanda/DailySet1
cd /etc/amanda/DailySet1
# Create amanda.conf
cat > amanda.conf << 'EOF'
# Amanda Configuration
org "Example Organization"
mailto "backup-admin@example.com"
dumpcycle 7 days
runspercycle 7
tapecycle 14 tapes
dumpuser "amanda"
inparallel 4
netusage 8000 Kbps
# Holding disk
holdingdisk hd1 {
directory "/var/amanda/holding"
use -100 Mb
chunksize 1Gb
}
# Virtual tape type
define tapetype DISK {
length 100 Gb
filemark 4 Kb
}
tapetype "DISK"
define dumptype global {
comment "Global definitions"
auth "bsdtcp"
estimate calcsize
compress client best
index yes
record yes
}
define dumptype user-tar {
global
program "GNUTAR"
comment "User partitions dumped with tar"
priority low
}
define dumptype comp-root {
global
program "GNUTAR"
comment "Root partitions with compression"
compress client fast
priority high
}
# Tape changer
define changer vtape {
tpchanger "chg-disk:/var/amanda/vtapes/DailySet1"
property "num-slot" "10"
property "auto-create-slot" "yes"
}
tpchanger "vtape"
# Network interfaces
define interface local {
use 10000 kbps
}
define interface lan {
use 1000 kbps
}
EOF
# Create disklist
cat > disklist << 'EOF'
# Amanda Disk List
server1.example.com / comp-root
server1.example.com /home user-tar
server2.example.com / comp-root
server2.example.com /var user-tar
EOF
# Initialize Amanda
su - amanda -c "amcheck DailySet1"
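After amcheck reports no errors, nightly runs are normally driven from the amanda user's crontab and reviewed with amreport; a minimal sketch (binary paths may differ by installation):
# In the amanda user's crontab:
# 45 0 * * * /usr/sbin/amcheck -m DailySet1
# 15 1 * * * /usr/sbin/amdump DailySet1
# Review the most recent run
su - amanda -c "amreport DailySet1"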
Disaster Recovery Planning
DR Plan Documentation
# Create DR plan template
cat > /usr/local/share/disaster-recovery-plan.md << 'EOF'
# Disaster Recovery Plan
## 1. Executive Summary
This document outlines the disaster recovery procedures for SUSE Linux Enterprise Server infrastructure.
## 2. Recovery Objectives
- **RTO (Recovery Time Objective)**: Maximum acceptable downtime
- Critical systems: 1 hour
- Important systems: 4 hours
- Standard systems: 24 hours
- **RPO (Recovery Point Objective)**: Maximum acceptable data loss
- Critical systems: 15 minutes
- Important systems: 1 hour
- Standard systems: 24 hours
## 3. Contact Information
- DR Coordinator: John Doe (555-0100)
- IT Manager: Jane Smith (555-0101)
- Network Admin: Bob Johnson (555-0102)
- Database Admin: Alice Brown (555-0103)
## 4. System Priorities
1. Authentication servers (LDAP, Kerberos)
2. Database servers
3. File servers
4. Application servers
5. Web servers
6. Development systems
## 5. Recovery Procedures
### 5.1 Phase 1: Assessment (0-30 minutes)
1. Assess extent of disaster
2. Activate DR team
3. Establish command center
4. Begin communication plan
### 5.2 Phase 2: Infrastructure Recovery (30 min - 2 hours)
1. Restore network connectivity
2. Restore authentication services
3. Restore DNS/DHCP services
4. Verify core infrastructure
### 5.3 Phase 3: System Recovery (2-8 hours)
1. Restore critical systems from backups
2. Restore database servers
3. Restore application servers
4. Restore file servers
### 5.4 Phase 4: Validation (8-12 hours)
1. Verify system functionality
2. Test application connectivity
3. Validate data integrity
4. User acceptance testing
### 5.5 Phase 5: Return to Production (12-24 hours)
1. Redirect users to recovered systems
2. Monitor system performance
3. Document lessons learned
4. Update DR procedures
## 6. Recovery Scenarios
### 6.1 Single Server Failure
- Use HA failover if available
- Restore from latest backup
- Rebuild from configuration management
### 6.2 Storage Failure
- Failover to replicated storage
- Restore from backup storage
- Rebuild RAID arrays
### 6.3 Site Failure
- Failover to DR site
- Restore from offsite backups
- Redirect network traffic
## 7. Testing Schedule
- Monthly: Backup restoration test
- Quarterly: Single system recovery
- Annually: Full DR simulation
## 8. Appendices
- A: System inventory
- B: Network diagrams
- C: Backup locations
- D: Vendor contacts
- E: Recovery checklists
EOF
# Create recovery runbooks
cat > /usr/local/bin/generate-recovery-runbook.sh << 'EOF'
#!/bin/bash
# Generate system-specific recovery runbook
SYSTEM="$1"
RUNBOOK_DIR="/usr/local/share/runbooks"
mkdir -p "$RUNBOOK_DIR"
cat > "$RUNBOOK_DIR/${SYSTEM}-recovery.md" << RUNBOOK
# Recovery Runbook: $SYSTEM
## Pre-Recovery Checklist
- [ ] Backup media available
- [ ] Network connectivity verified
- [ ] Hardware/VM resources ready
- [ ] Recovery credentials available
## Recovery Steps
### 1. Base System Installation
\`\`\`bash
# Boot from SLES installation media
# Perform minimal installation
# Configure network:
ip addr add $(ip addr show | grep "inet " | grep -v "127.0.0.1" | head -1 | awk '{print $2}') dev eth0
ip route add default via $(ip route | grep default | awk '{print $3}')
\`\`\`
### 2. Restore System Configuration
\`\`\`bash
# Mount backup location
mount backup-server:/backup /mnt/backup
# Restore base system
cd /
tar xzf /mnt/backup/latest/${SYSTEM}-full-backup.tar.gz
# Restore bootloader
grub2-install /dev/sda
grub2-mkconfig -o /boot/grub2/grub.cfg
\`\`\`
### 3. Service-Specific Recovery
$(case $SYSTEM in
*db*) echo "# Database recovery
pg_restore -h localhost -U postgres -d mydb /mnt/backup/postgresql/latest.dump"
;;
*web*) echo "# Web server recovery
systemctl enable nginx
systemctl start nginx"
;;
*) echo "# Application recovery
# Restore application data
# Start application services"
;;
esac)
### 4. Validation
\`\`\`bash
# System validation
systemctl status
ss -tlnp
df -h
# Service validation
$(case $SYSTEM in
*db*) echo "psql -U postgres -c 'SELECT version();'"
;;
*web*) echo "curl -I http://localhost"
;;
*) echo "# Test application functionality"
;;
esac)
\`\`\`
## Post-Recovery Tasks
- [ ] Update DNS records
- [ ] Verify monitoring
- [ ] Test user access
- [ ] Document recovery time
- [ ] Update inventory
RUNBOOK
echo "Recovery runbook created: $RUNBOOK_DIR/${SYSTEM}-recovery.md"
EOF
chmod +x /usr/local/bin/generate-recovery-runbook.sh
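Generating runbooks for each class of system is then a one-liner per host; the hostnames are examples:
# Generate runbooks for a database, a web, and a generic application server
for sys in db1 web1 app1; do
/usr/local/bin/generate-recovery-runbook.sh "$sys"
done
ls /usr/local/share/runbooks/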
Automated Recovery Testing
# DR test automation script
cat > /usr/local/bin/dr-test-automation.sh << 'EOF'
#!/bin/bash
# Automated DR testing
TEST_TYPE="${1:-backup-restore}"
TEST_ENV="/dr-test"
LOG_FILE="/var/log/dr-test-$(date +%Y%m%d-%H%M%S).log"
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
}
# Test backup restoration
test_backup_restore() {
local BACKUP_FILE="$1"
local TEST_DIR="$TEST_ENV/restore-test"
log "Testing backup restore: $BACKUP_FILE"
mkdir -p "$TEST_DIR"
cd "$TEST_DIR"
# Extract backup
tar xzf "$BACKUP_FILE" 2>&1 | tee -a "$LOG_FILE"
# Verify critical files
CRITICAL_FILES=(
"etc/passwd"
"etc/group"
"etc/fstab"
"etc/sysconfig/network"
)
for file in "${CRITICAL_FILES[@]}"; do
if [ -f "$file" ]; then
log "✓ Found: $file"
else
log "✗ Missing: $file"
return 1
fi
done
# Cleanup
rm -rf "$TEST_DIR"
log "Backup restore test completed successfully"
return 0
}
# Test VM recovery
test_vm_recovery() {
local VM_BACKUP="$1"
local TEST_VM="dr-test-vm"
log "Testing VM recovery: $VM_BACKUP"
# Boot a throwaway copy so the test never modifies the backup image
mkdir -p "$TEST_ENV"
TEST_DISK="$TEST_ENV/${TEST_VM}.qcow2"
cp "$VM_BACKUP" "$TEST_DISK"
# Create test VM from the copy (assumes an "isolated" libvirt network is defined)
virt-install \
--name "$TEST_VM" \
--memory 2048 \
--vcpus 2 \
--disk "$TEST_DISK",bus=virtio \
--import \
--network network=isolated \
--graphics none \
--noautoconsole
# Wait for VM to boot
sleep 30
# Test VM connectivity
if virsh domstate "$TEST_VM" | grep -q "running"; then
log "✓ VM is running"
# Get VM IP
VM_IP=$(virsh domifaddr "$TEST_VM" | grep ipv4 | awk '{print $4}' | cut -d/ -f1)
if ping -c 3 "$VM_IP" > /dev/null 2>&1; then
log "✓ VM network connectivity OK"
else
log "✗ VM network connectivity failed"
fi
else
log "✗ VM failed to start"
fi
# Cleanup
virsh destroy "$TEST_VM" 2>/dev/null
virsh undefine "$TEST_VM" --remove-all-storage
log "VM recovery test completed"
}
# Test database recovery
test_database_recovery() {
local DB_BACKUP="$1"
local TEST_DB="test_recovery_db"
log "Testing database recovery: $DB_BACKUP"
# Create test database
sudo -u postgres createdb "$TEST_DB"
# Restore backup (custom-format dump, so use pg_restore rather than psql)
gunzip -c "$DB_BACKUP" | sudo -u postgres pg_restore -d "$TEST_DB" 2>&1 | tee -a "$LOG_FILE"
# Verify tables
TABLE_COUNT=$(sudo -u postgres psql -t -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='public';" "$TEST_DB")
if [ "$TABLE_COUNT" -gt 0 ]; then
log "✓ Database restored with $TABLE_COUNT tables"
else
log "✗ Database restoration failed"
fi
# Cleanup
sudo -u postgres dropdb "$TEST_DB"
log "Database recovery test completed"
}
# Main test execution
case "$TEST_TYPE" in
backup-restore)
LATEST_BACKUP=$(ls -t /backup/weekly/*.tar.gz | head -1)
test_backup_restore "$LATEST_BACKUP"
;;
vm-recovery)
LATEST_VM_BACKUP=$(ls -t /backup/vms/images/*.qcow2 | head -1)
test_vm_recovery "$LATEST_VM_BACKUP"
;;
database-recovery)
LATEST_DB_BACKUP=$(ls -t /backup/postgresql/daily/*.dump.gz | head -1)
test_database_recovery "$LATEST_DB_BACKUP"
;;
full-dr)
log "Starting full DR test"
# Run all tests
test_backup_restore "$(ls -t /backup/weekly/*.tar.gz | head -1)"
test_vm_recovery "$(ls -t /backup/vms/images/*.qcow2 | head -1)"
test_database_recovery "$(ls -t /backup/postgresql/daily/*.dump.gz | head -1)"
;;
esac
# Generate report
cat > "/var/log/dr-test-report-$(date +%Y%m%d).txt" << REPORT
Disaster Recovery Test Report
============================
Date: $(date)
Test Type: $TEST_TYPE
Log File: $LOG_FILE
Test Results:
$(grep -E "✓|✗" "$LOG_FILE")
Summary:
- Passed: $(grep -c "✓" "$LOG_FILE")
- Failed: $(grep -c "✗" "$LOG_FILE")
REPORT
log "DR test completed. Report generated."
EOF
chmod +x /usr/local/bin/dr-test-automation.sh
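The tests can be run ad hoc or scheduled; for example, a monthly restore test and a quarterly full DR run (cron entries shown as comments):
# 0 2 1 * * /usr/local/bin/dr-test-automation.sh backup-restore
# 0 3 1 1,4,7,10 * /usr/local/bin/dr-test-automation.sh full-dr
/usr/local/bin/dr-test-automation.sh backup-restore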
Backup Automation and Scheduling
Centralized Backup Management
# Master backup scheduler
cat > /usr/local/bin/master-backup-scheduler.sh << 'EOF'
#!/bin/bash
# Centralized backup scheduler
BACKUP_CONFIG="/etc/backup/schedule.conf"
LOG_DIR="/var/log/backup"
STATE_FILE="/var/lib/backup/scheduler.state"
# Load backup schedule
source "$BACKUP_CONFIG"
# Initialize working directories and state file
mkdir -p "$(dirname "$STATE_FILE")" "$LOG_DIR"
[ -f "$STATE_FILE" ] || echo "{}" > "$STATE_FILE"
# Schedule backup job
schedule_backup() {
local HOST="$1"
local TYPE="$2"
local TIME="$3"
case "$TYPE" in
full|incremental|differential)
ssh "$HOST" "/usr/local/bin/system-backup-tar.sh $TYPE" &
;;
database)
ssh "$HOST" "/usr/local/bin/backup-postgresql.sh" &
;;
vm)
ssh "$HOST" "/usr/local/bin/backup-kvm-vms.sh" &
;;
esac
# Update state
echo "{\"$HOST\": {\"last_$TYPE\": \"$(date)\", \"status\": \"running\"}}" | \
jq -s '.[0] * .[1]' "$STATE_FILE" - > "$STATE_FILE.tmp" && \
mv "$STATE_FILE.tmp" "$STATE_FILE"
}
# Monitor backup jobs
monitor_backups() {
while true; do
# Check job status
for job in $(jobs -p); do
if ! kill -0 "$job" 2>/dev/null; then
# Job completed
wait "$job"
STATUS=$?
# Update state based on exit status
fi
done
sleep 60
done
}
# Main scheduler loop
while true; do
CURRENT_HOUR=$(date +%H)
CURRENT_DOW=$(date +%u)
# Check schedule
for SCHEDULE_ENTRY in "${BACKUP_SCHEDULE[@]}"; do
IFS=':' read -r HOST TYPE FREQUENCY TIME <<< "$SCHEDULE_ENTRY"
case "$FREQUENCY" in
daily)
if [ "$CURRENT_HOUR" = "$TIME" ]; then
schedule_backup "$HOST" "$TYPE" "$TIME"
fi
;;
weekly)
if [ "$CURRENT_DOW" = "7" ] && [ "$CURRENT_HOUR" = "$TIME" ]; then
schedule_backup "$HOST" "$TYPE" "$TIME"
fi
;;
esac
done
# Wait until next hour
sleep 3600
done &
# Start monitoring
monitor_backups
EOF
chmod +x /usr/local/bin/master-backup-scheduler.sh
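The scheduler sources /etc/backup/schedule.conf and expects a BACKUP_SCHEDULE array of host:type:frequency:hour entries, matching the parsing above; a minimal example configuration (hostnames and hours are placeholders):
mkdir -p /etc/backup
cat > /etc/backup/schedule.conf << 'EOF'
# host:type:frequency:hour (hour as two digits, matching `date +%H`)
BACKUP_SCHEDULE=(
"server1.example.com:incremental:daily:01"
"server1.example.com:full:weekly:02"
"db1.example.com:database:daily:03"
)
EOF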
# Create systemd service
cat > /etc/systemd/system/backup-scheduler.service << 'EOF'
[Unit]
Description=Master Backup Scheduler
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/master-backup-scheduler.sh
Restart=always
User=backup
Group=backup
[Install]
WantedBy=multi-user.target
EOF
systemctl enable backup-scheduler.service
systemctl start backup-scheduler.service
Backup Monitoring and Reporting
# Backup monitoring dashboard
cat > /usr/local/bin/backup-monitoring.sh << 'EOF'
#!/bin/bash
# Backup monitoring and reporting
CATALOG_DB="/backup/backup_catalog.db"
REPORT_DIR="/var/www/html/backup-reports"
EMAIL_RECIPIENT="backup-admin@example.com"
mkdir -p "$REPORT_DIR"
# Generate HTML dashboard
generate_dashboard() {
cat > "$REPORT_DIR/index.html" << 'HTML'
<!DOCTYPE html>
<html>
<head>
<title>Backup Dashboard</title>
<meta http-equiv="refresh" content="300">
<style>
body { font-family: Arial, sans-serif; margin: 20px; }
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #4CAF50; color: white; }
.success { color: green; }
.warning { color: orange; }
.error { color: red; }
.summary { margin: 20px 0; padding: 15px; background: #f0f0f0; }
</style>
</head>
<body>
<h1>Backup System Dashboard</h1>
<p>Last updated: $(date)</p>
HTML
# Summary statistics
TOTAL_BACKUPS=$(sqlite3 "$CATALOG_DB" "SELECT COUNT(*) FROM backups WHERE backup_date > datetime('now', '-24 hours');")
SUCCESSFUL=$(sqlite3 "$CATALOG_DB" "SELECT COUNT(*) FROM backups WHERE status='completed' AND backup_date > datetime('now', '-24 hours');")
FAILED=$(sqlite3 "$CATALOG_DB" "SELECT COUNT(*) FROM backups WHERE status='failed' AND backup_date > datetime('now', '-24 hours');")
TOTAL_SIZE=$(sqlite3 "$CATALOG_DB" "SELECT SUM(backup_size) FROM backups WHERE backup_date > datetime('now', '-24 hours');" | numfmt --to=iec)
cat >> "$REPORT_DIR/index.html" << HTML
<div class="summary">
<h2>24-Hour Summary</h2>
<p>Total Backups: $TOTAL_BACKUPS</p>
<p>Successful: <span class="success">$SUCCESSFUL</span></p>
<p>Failed: <span class="error">$FAILED</span></p>
<p>Total Size: $TOTAL_SIZE</p>
</div>
<h2>Recent Backups</h2>
<table>
<tr>
<th>Hostname</th>
<th>Type</th>
<th>Date</th>
<th>Size</th>
<th>Status</th>
<th>Path</th>
</tr>
HTML
# Recent backup list
sqlite3 -html "$CATALOG_DB" << 'SQL' >> "$REPORT_DIR/index.html"
SELECT
hostname,
backup_type,
datetime(backup_date, 'localtime') as backup_date,
printf("%.2f GB", backup_size/1024.0/1024.0/1024.0) as size,
CASE
WHEN status='completed' THEN '<span class="success">Completed</span>'
WHEN status='failed' THEN '<span class="error">Failed</span>'
ELSE '<span class="warning">' || status || '</span>'
END as status,
backup_path
FROM backups
ORDER BY backup_date DESC
LIMIT 50;
SQL
cat >> "$REPORT_DIR/index.html" << 'HTML'
</table>
<h2>Backup Trends</h2>
<canvas id="backupChart"></canvas>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<script>
// Add chart visualization here
</script>
</body>
</html>
HTML
}
# Check backup health
check_backup_health() {
local ISSUES=""
# Check for missed backups
MISSED=$(sqlite3 "$CATALOG_DB" << 'SQL'
SELECT hostname
FROM (SELECT DISTINCT hostname FROM backups) h
WHERE NOT EXISTS (
SELECT 1 FROM backups b
WHERE b.hostname = h.hostname
AND b.backup_date > datetime('now', '-24 hours')
);
SQL
)
if [ -n "$MISSED" ]; then
ISSUES+="Missed backups in last 24 hours:\n$MISSED\n\n"
fi
# Check for failed backups
FAILED=$(sqlite3 "$CATALOG_DB" "SELECT hostname, backup_date FROM backups WHERE status='failed' AND backup_date > datetime('now', '-24 hours');")
if [ -n "$FAILED" ]; then
ISSUES+="Failed backups:\n$FAILED\n\n"
fi
# Check disk space
BACKUP_FS_USAGE=$(df -h /backup | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$BACKUP_FS_USAGE" -gt 80 ]; then
ISSUES+="WARNING: Backup filesystem is ${BACKUP_FS_USAGE}% full\n\n"
fi
# Send alert if issues found
if [ -n "$ISSUES" ]; then
echo -e "Backup System Issues Detected:\n\n$ISSUES" | \
mail -s "Backup System Alert - $(date +%Y-%m-%d)" "$EMAIL_RECIPIENT"
fi
}
# Retention management
manage_retention() {
# Remove expired backups
sqlite3 "$CATALOG_DB" \
"SELECT backup_path FROM backups WHERE retention_date < date('now') AND status = 'completed';" | \
while read -r BACKUP_PATH; do
if [ -f "$BACKUP_PATH" ]; then
rm -f "$BACKUP_PATH"
echo "Removed expired backup: $BACKUP_PATH"
fi
done
# Update catalog
sqlite3 "$CATALOG_DB" "DELETE FROM backups WHERE retention_date < date('now');"
}
# Main execution
generate_dashboard
check_backup_health
manage_retention
echo "Backup monitoring completed at $(date)"
EOF
chmod +x /usr/local/bin/backup-monitoring.sh
# Add to cron
echo "*/15 * * * * /usr/local/bin/backup-monitoring.sh" | crontab -
Cloud Backup Integration
S3 Compatible Storage
# S3 backup script
cat > /usr/local/bin/backup-to-s3.sh << 'EOF'
#!/bin/bash
# Backup to S3 compatible storage
S3_BUCKET="s3://company-backups"
S3_ENDPOINT="https://s3.example.com"
AWS_ACCESS_KEY_ID="access_key"
AWS_SECRET_ACCESS_KEY="secret_key"
# Configure AWS CLI
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
# Backup function
backup_to_s3() {
local SOURCE="$1"
local DEST_PREFIX="$2"
local DATE=$(date +%Y%m%d-%H%M%S)
# Create tarball
TEMP_FILE="/tmp/backup-${DATE}.tar.gz"
tar czf "$TEMP_FILE" -C "$(dirname "$SOURCE")" "$(basename "$SOURCE")"
# Calculate checksum
CHECKSUM=$(sha256sum "$TEMP_FILE" | awk '{print $1}')
# Upload to S3
aws s3 cp "$TEMP_FILE" \
"${S3_BUCKET}/${DEST_PREFIX}/${DATE}/backup.tar.gz" \
--endpoint-url "$S3_ENDPOINT" \
--storage-class GLACIER \
--metadata "checksum=$CHECKSUM,hostname=$(hostname),date=$DATE"
# Cleanup
rm -f "$TEMP_FILE"
echo "Backup uploaded to S3: ${S3_BUCKET}/${DEST_PREFIX}/${DATE}/"
}
# Restore from S3
restore_from_s3() {
local S3_PATH="$1"
local DEST_DIR="$2"
# Download from S3
TEMP_FILE="/tmp/restore-$(date +%s).tar.gz"
aws s3 cp "$S3_PATH" "$TEMP_FILE" --endpoint-url "$S3_ENDPOINT"
# Extract
mkdir -p "$DEST_DIR"
tar xzf "$TEMP_FILE" -C "$DEST_DIR"
# Cleanup
rm -f "$TEMP_FILE"
echo "Restored from S3 to: $DEST_DIR"
}
# S3 lifecycle policy
create_lifecycle_policy() {
cat > /tmp/lifecycle-policy.json << 'JSON'
{
"Rules": [
{
"ID": "Archive old backups",
"Status": "Enabled",
"Transitions": [
{
"Days": 30,
"StorageClass": "GLACIER"
},
{
"Days": 90,
"StorageClass": "DEEP_ARCHIVE"
}
],
"Expiration": {
"Days": 2555
}
}
]
}
JSON
aws s3api put-bucket-lifecycle-configuration \
--bucket "company-backups" \
--lifecycle-configuration file:///tmp/lifecycle-policy.json \
--endpoint-url "$S3_ENDPOINT"
}
# Main execution
case "$1" in
backup)
backup_to_s3 "$2" "$3"
;;
restore)
restore_from_s3 "$2" "$3"
;;
setup-lifecycle)
create_lifecycle_policy
;;
esac
EOF
chmod +x /usr/local/bin/backup-to-s3.sh
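Example invocations; the bucket, endpoint, and credentials are the placeholders defined at the top of the script, and the object date is illustrative:
# Back up /etc under the configs/ prefix, restore it to a scratch directory, and apply the lifecycle policy
/usr/local/bin/backup-to-s3.sh backup /etc configs
/usr/local/bin/backup-to-s3.sh restore s3://company-backups/configs/20240107-040000/backup.tar.gz /tmp/etc-restore
/usr/local/bin/backup-to-s3.sh setup-lifecycle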
Azure Backup Integration
# Azure backup script
cat > /usr/local/bin/backup-to-azure.sh << 'EOF'
#!/bin/bash
# Backup to Azure Blob Storage
AZURE_STORAGE_ACCOUNT="companybackups"
AZURE_STORAGE_KEY="storage_key"
AZURE_RESOURCE_GROUP="backup-rg"
CONTAINER_NAME="backups"
# Ensure the Azure CLI is installed (the aka.ms installer script targets Debian/Ubuntu;
# on SLES, install the azure-cli package from the Microsoft repository with zypper instead)
if ! command -v az &> /dev/null; then
echo "Azure CLI (az) not found - install it before running this script" >&2
exit 1
fi
# Export storage account credentials for the az commands below
export AZURE_STORAGE_ACCOUNT AZURE_STORAGE_KEY
# Backup to Azure
backup_to_azure() {
local SOURCE="$1"
local BLOB_PREFIX="$2"
local DATE=$(date +%Y%m%d-%H%M%S)
# Create snapshot
SNAPSHOT_FILE="/tmp/snapshot-${DATE}.tar.gz"
tar czf "$SNAPSHOT_FILE" -C "$(dirname "$SOURCE")" "$(basename "$SOURCE")"
# Upload to Azure
az storage blob upload \
--account-name "$AZURE_STORAGE_ACCOUNT" \
--account-key "$AZURE_STORAGE_KEY" \
--container-name "$CONTAINER_NAME" \
--name "${BLOB_PREFIX}/${DATE}/backup.tar.gz" \
--file "$SNAPSHOT_FILE" \
--tier Archive
# Set metadata
az storage blob metadata update \
--account-name "$AZURE_STORAGE_ACCOUNT" \
--account-key "$AZURE_STORAGE_KEY" \
--container-name "$CONTAINER_NAME" \
--name "${BLOB_PREFIX}/${DATE}/backup.tar.gz" \
--metadata hostname=$(hostname) date=$DATE
# Cleanup
rm -f "$SNAPSHOT_FILE"
echo "Backup uploaded to Azure: ${CONTAINER_NAME}/${BLOB_PREFIX}/${DATE}/"
}
# Setup Azure backup policy
setup_azure_policy() {
# Create lifecycle management policy
cat > /tmp/azure-lifecycle.json << 'JSON'
{
"rules": [
{
"enabled": true,
"name": "archiveoldbackups",
"type": "Lifecycle",
"definition": {
"actions": {
"baseBlob": {
"tierToArchive": {
"daysAfterModificationGreaterThan": 30
},
"delete": {
"daysAfterModificationGreaterThan": 2555
}
}
},
"filters": {
"blobTypes": ["blockBlob"],
"prefixMatch": ["backups/"]
}
}
}
]
}
JSON
az storage account management-policy create \
--account-name "$AZURE_STORAGE_ACCOUNT" \
--resource-group "$AZURE_RESOURCE_GROUP" \
--policy @/tmp/azure-lifecycle.json
}
# Main execution
"$@"
EOF
chmod +x /usr/local/bin/backup-to-azure.sh
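Because the script dispatches with "$@", the first argument is the function to run; for example:
# Upload /etc under the "configs" prefix and apply the lifecycle policy
/usr/local/bin/backup-to-azure.sh backup_to_azure /etc configs
/usr/local/bin/backup-to-azure.sh setup_azure_policy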
Compliance and Reporting
Compliance Audit Script
# Backup compliance audit
cat > /usr/local/bin/backup-compliance-audit.sh << 'EOF'
#!/bin/bash
# Backup compliance audit and reporting
AUDIT_DIR="/var/log/backup/audit"
COMPLIANCE_DB="/var/lib/backup/compliance.db"
mkdir -p "$AUDIT_DIR"
# Initialize compliance database
sqlite3 "$COMPLIANCE_DB" << 'SQL'
CREATE TABLE IF NOT EXISTS compliance_checks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
check_date DATETIME DEFAULT CURRENT_TIMESTAMP,
hostname TEXT,
check_type TEXT,
status TEXT,
details TEXT
);
CREATE TABLE IF NOT EXISTS retention_compliance (
id INTEGER PRIMARY KEY AUTOINCREMENT,
check_date DATETIME DEFAULT CURRENT_TIMESTAMP,
data_type TEXT,
required_retention TEXT,
actual_retention TEXT,
compliant BOOLEAN
);
SQL
# Check backup compliance
check_backup_compliance() {
local HOSTNAME="$1"
local REQUIRED_FREQUENCY="$2"
# Check if backups are running per schedule
LAST_BACKUP=$(sqlite3 /backup/backup_catalog.db \
"SELECT MAX(backup_date) FROM backups WHERE hostname='$HOSTNAME' AND status='completed';")
if [ -z "$LAST_BACKUP" ]; then
STATUS="non-compliant"
DETAILS="No successful backups found"
else
HOURS_SINCE=$((( $(date +%s) - $(date -d "$LAST_BACKUP" +%s) ) / 3600))
case "$REQUIRED_FREQUENCY" in
daily)
if [ "$HOURS_SINCE" -gt 24 ]; then
STATUS="non-compliant"
DETAILS="Last backup $HOURS_SINCE hours ago (>24h)"
else
STATUS="compliant"
DETAILS="Last backup $HOURS_SINCE hours ago"
fi
;;
weekly)
if [ "$HOURS_SINCE" -gt 168 ]; then
STATUS="non-compliant"
DETAILS="Last backup $HOURS_SINCE hours ago (>168h)"
else
STATUS="compliant"
DETAILS="Last backup $HOURS_SINCE hours ago"
fi
;;
esac
fi
# Record compliance check
sqlite3 "$COMPLIANCE_DB" \
"INSERT INTO compliance_checks (hostname, check_type, status, details) \
VALUES ('$HOSTNAME', 'backup_frequency', '$STATUS', '$DETAILS');"
}
# Check retention compliance
check_retention_compliance() {
local DATA_TYPE="$1"
local REQUIRED_DAYS="$2"
# Get oldest available backup
OLDEST_BACKUP=$(sqlite3 /backup/backup_catalog.db \
"SELECT MIN(backup_date) FROM backups WHERE backup_type LIKE '%$DATA_TYPE%' AND status='completed';")
if [ -n "$OLDEST_BACKUP" ]; then
ACTUAL_DAYS=$((( $(date +%s) - $(date -d "$OLDEST_BACKUP" +%s) ) / 86400))
if [ "$ACTUAL_DAYS" -ge "$REQUIRED_DAYS" ]; then
COMPLIANT=1
else
COMPLIANT=0
fi
else
ACTUAL_DAYS=0
COMPLIANT=0
fi
sqlite3 "$COMPLIANCE_DB" \
"INSERT INTO retention_compliance (data_type, required_retention, actual_retention, compliant) \
VALUES ('$DATA_TYPE', '$REQUIRED_DAYS days', '$ACTUAL_DAYS days', $COMPLIANT);"
}
# Generate compliance report
generate_compliance_report() {
REPORT_FILE="$AUDIT_DIR/compliance-report-$(date +%Y%m%d).html"
cat > "$REPORT_FILE" << 'HTML'
<!DOCTYPE html>
<html>
<head>
<title>Backup Compliance Report</title>
<style>
body { font-family: Arial, sans-serif; margin: 20px; }
.compliant { color: green; }
.non-compliant { color: red; }
table { border-collapse: collapse; width: 100%; margin: 20px 0; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #4CAF50; color: white; }
.summary { background: #f0f0f0; padding: 15px; margin: 20px 0; }
</style>
</head>
<body>
<h1>Backup System Compliance Report</h1>
<p>Generated: $(date)</p>
HTML
# Compliance summary
TOTAL_CHECKS=$(sqlite3 "$COMPLIANCE_DB" "SELECT COUNT(*) FROM compliance_checks WHERE date(check_date) = date('now');")
COMPLIANT=$(sqlite3 "$COMPLIANCE_DB" "SELECT COUNT(*) FROM compliance_checks WHERE status='compliant' AND date(check_date) = date('now');")
COMPLIANCE_RATE=$(( TOTAL_CHECKS > 0 ? COMPLIANT * 100 / TOTAL_CHECKS : 0 ))
cat >> "$REPORT_FILE" << HTML
<div class="summary">
<h2>Executive Summary</h2>
<p>Total Compliance Checks: $TOTAL_CHECKS</p>
<p>Compliant: $COMPLIANT</p>
<p>Compliance Rate: ${COMPLIANCE_RATE}%</p>
</div>
<h2>Backup Frequency Compliance</h2>
<table>
<tr>
<th>Hostname</th>
<th>Check Time</th>
<th>Status</th>
<th>Details</th>
</tr>
HTML
sqlite3 -html "$COMPLIANCE_DB" << 'SQL' >> "$REPORT_FILE"
SELECT
hostname,
datetime(check_date, 'localtime') as check_time,
CASE
WHEN status='compliant' THEN '<span class="compliant">Compliant</span>'
ELSE '<span class="non-compliant">Non-Compliant</span>'
END as status,
details
FROM compliance_checks
WHERE date(check_date) = date('now')
ORDER BY hostname;
SQL
cat >> "$REPORT_FILE" << 'HTML'
</table>
<h2>Retention Compliance</h2>
<table>
<tr>
<th>Data Type</th>
<th>Required Retention</th>
<th>Actual Retention</th>
<th>Status</th>
</tr>
HTML
sqlite3 -html "$COMPLIANCE_DB" << 'SQL' >> "$REPORT_FILE"
SELECT
data_type,
required_retention,
actual_retention,
CASE
WHEN compliant=1 THEN '<span class="compliant">Compliant</span>'
ELSE '<span class="non-compliant">Non-Compliant</span>'
END as status
FROM retention_compliance
WHERE date(check_date) = date('now');
SQL
cat >> "$REPORT_FILE" << 'HTML'
</table>
<h2>Recommendations</h2>
<ul>
<li>Review and address all non-compliant systems</li>
<li>Verify backup schedules are properly configured</li>
<li>Ensure sufficient storage for retention requirements</li>
<li>Schedule regular compliance reviews</li>
</ul>
</body>
</html>
HTML
echo "Compliance report generated: $REPORT_FILE"
# Email report if non-compliant items found
if [ "$COMPLIANCE_RATE" -lt 100 ]; then
mail -s "Backup Compliance Alert - ${COMPLIANCE_RATE}% Compliant" \
-a "Content-Type: text/html" \
backup-admin@example.com < "$REPORT_FILE"
fi
}
# Run compliance checks
# Add your systems and requirements here
check_backup_compliance "server1.example.com" "daily"
check_backup_compliance "server2.example.com" "daily"
check_backup_compliance "db1.example.com" "daily"
# Check retention compliance
check_retention_compliance "database" 90
check_retention_compliance "system" 30
check_retention_compliance "archive" 2555
# Generate report
generate_compliance_report
EOF
chmod +x /usr/local/bin/backup-compliance-audit.sh
# Schedule compliance checks
echo "0 8 * * * /usr/local/bin/backup-compliance-audit.sh" | crontab -
Best Practices
Backup Best Practices Document
# Create best practices guide
cat > /usr/local/share/backup-best-practices.md << 'EOF'
# SUSE Backup and Disaster Recovery Best Practices
## Backup Strategy
1. **3-2-1 Rule**
- 3 copies of important data
- 2 different storage media
- 1 offsite copy
2. **Regular Testing**
- Monthly restoration tests
- Quarterly DR drills
- Annual full site recovery
3. **Documentation**
- Maintain recovery runbooks
- Document dependencies
- Keep contact lists updated
## Technical Best Practices
### System Backups
- Use incremental backups for efficiency
- Implement snapshot technology where available
- Compress backups to save space
- Encrypt sensitive backup data
- Verify backup integrity
### Database Backups
- Use native backup tools
- Implement point-in-time recovery
- Test restoration regularly
- Monitor backup performance impact
- Coordinate with application teams
### Virtual Machine Backups
- Use snapshot-based backups
- Quiesce applications before backup
- Consider agent-based and agentless options
- Backup VM configurations
- Test VM recovery procedures
### Storage Considerations
- Monitor backup storage capacity
- Implement retention policies
- Use deduplication where appropriate
- Consider tiered storage
- Plan for growth
### Security
- Encrypt backups at rest and in transit
- Implement access controls
- Audit backup access
- Secure backup credentials
- Test backup integrity
### Monitoring and Alerting
- Monitor backup job status
- Alert on failures immediately
- Track backup windows
- Monitor storage usage
- Generate regular reports
## Disaster Recovery
### Planning
- Define RTO and RPO for each system
- Identify critical dependencies
- Document recovery procedures
- Maintain vendor contacts
- Plan for various scenarios
### Testing
- Schedule regular DR tests
- Document test results
- Update procedures based on findings
- Train staff on procedures
- Validate recovery times
### Communication
- Establish notification procedures
- Maintain contact lists
- Define escalation paths
- Document decision trees
- Plan status updates
## Compliance
- Understand regulatory requirements
- Implement appropriate retention
- Document compliance measures
- Conduct regular audits
- Maintain audit trails
EOF
Conclusion
A comprehensive backup and disaster recovery strategy is essential for SUSE Linux Enterprise Server environments. Regular testing, automation, proper documentation, and continuous monitoring ensure data protection and business continuity. Following best practices and maintaining compliance with regulatory requirements provides a robust foundation for enterprise data protection.