- Marco Morath

Backup Retention Script

Anyone who has backups written to a folder knows the problem: creating the files is easy in any file system, but cleaning them up can be a lot of work. This is especially true if you don't simply want to delete every file older than a certain age; deleting by age alone is easy enough with find. I faced this problem a few weeks ago. A program had been writing backups to a folder over a long period of time, and I didn't want to delete them wholesale, but according to a specific retention logic.

All files from the last 14 days should be kept. After that, only the most recent file of each week should be kept, going back 13 weeks before the current date (roughly a quarter of a year). In addition, one file per month should be kept for up to one year before the current date.
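The three retention windows boil down to plain epoch arithmetic. This is a minimal sketch of that idea, with illustrative variable names; the same calculation appears in the full script:

```shell
#!/bin/bash
# Sketch of the three retention cutoffs, computed from the current time.
# Files newer than daily_cutoff are all kept; between daily_cutoff and
# weekly_cutoff one file per week survives; between weekly_cutoff and
# monthly_cutoff one file per month survives; anything older is deleted.
now=$(date +%s)
daily_cutoff=$(( now - 86400 * 14 ))        # 14 days back
weekly_cutoff=$(( now - 86400 * 7 * 13 ))   # 13 weeks back
monthly_cutoff=$(( now - 86400 * 30 * 12 )) # ~12 months back (30-day months)
echo "daily window:   $(( (now - daily_cutoff) / 86400 )) days"
echo "weekly window:  $(( (now - weekly_cutoff) / 86400 )) days"
echo "monthly window: $(( (now - monthly_cutoff) / 86400 )) days"
```

Note that the monthly window uses 30-day months, so "one year" is really 360 days; that approximation is good enough for retention purposes.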

I wrote a backup retention script that implements this deletion logic:

#!/bin/bash

# Script:   
# Version:  1.0
# Author:   Marco Morath
# License:  GNU GPLv3 <https://www.gnu.org/licenses/gpl-3.0.html>

# Usage note:
# If the dryrun parameter is passed, no deletion is performed. The script only
# reports which files would be deleted if it were actually run.

# Directory containing the backups
backup_dir="/mnt/backupfolder"

# Keep older backups: 14 daily, 13 weekly, 12 monthly
daily_keep=14
weekly_keep=13
monthly_keep=12

# Log directory and file
log_dir="/home/user/logdirectory"
log_file="$log_dir/$(date +%Y-%m-%d)-backup-retention.log"

# Log rotation (suppress the ls error on the very first run, when no log exists yet)
log_files=($(ls -1tr "$log_dir"/*backup-retention*.log 2>/dev/null))
log_keep=14

# Delete old log files
if [ ${#log_files[@]} -gt $log_keep ]; then
    for log in "${log_files[@]:0:$((${#log_files[@]} - $log_keep))}"; do
        echo "Deleting log file: $log"
        rm -f "$log"
    done
fi

# Create a new log file
touch "$log_file"

# List of backups sorted by change time, oldest first
backups=($(ls -1tr --time=ctime "$backup_dir"))

# Arrays for the backups to keep
daily_backups=()
weekly_backups=()
monthly_backups=()

# Date values
today_in_seconds=$(date +%s)
daily_oldest_in_seconds=$(( $today_in_seconds - ( 86400 * $daily_keep ) ))
weekly_oldest_in_seconds=$(( $today_in_seconds - ( 86400 * 7 * $weekly_keep ) ))
monthly_oldest_in_seconds=$(( $today_in_seconds - ( 86400 * 30 * $monthly_keep ) ))

# Function to determine the ISO week of a timestamp
get_week_of_year() {
    date -d "@$1" +%Y-%V
}

# Function to determine the month of a timestamp
get_month_of_year() {
    date -d "@$1" +%Y-%m
}

# Backup categorization
declare -A weekly_latest
declare -A monthly_latest

for backup in "${backups[@]}"; do
    backup_date_seconds=$(stat -c %Y "$backup_dir/$backup")

    if [ $backup_date_seconds -ge $daily_oldest_in_seconds ]; then
        daily_backups+=("$backup")

    elif [ $backup_date_seconds -ge $weekly_oldest_in_seconds ]; then
        week_of_year=$(get_week_of_year $backup_date_seconds)
        if [[ -z "${weekly_latest[$week_of_year]}" || $backup_date_seconds -gt $(stat -c %Y "$backup_dir/${weekly_latest[$week_of_year]}") ]]; then
            weekly_latest[$week_of_year]=$backup
        fi

    elif [ $backup_date_seconds -ge $monthly_oldest_in_seconds ]; then
        month_of_year=$(get_month_of_year $backup_date_seconds)
        if [[ -z "${monthly_latest[$month_of_year]}" || $backup_date_seconds -gt $(stat -c %Y "$backup_dir/${monthly_latest[$month_of_year]}") ]]; then
            monthly_latest[$month_of_year]=$backup
        fi

    fi

done

# Store the most recent weekly and monthly backups in arrays
for backup in "${weekly_latest[@]}"; do
    weekly_backups+=("$backup")
done

for backup in "${monthly_latest[@]}"; do
    monthly_backups+=("$backup")
done

# Merge all backups to keep
keep_backups=("${daily_backups[@]}" "${weekly_backups[@]}" "${monthly_backups[@]}")

# Logging function
log() {
    echo "$1" | tee -a "$log_file"
}

# Document the start of the run
log "##### Cleanup started ( $(date +"%d-%m-%Y %H:%M:%S") ) #####"

# Check whether the "dryrun" parameter is set
dryrun=false
if [ "$1" == "dryrun" ]; then
    dryrun=true
    log "Dry run enabled: no backups will be deleted."
fi

# Determine backups to keep
log "==== Backups to keep ===="
number=0
for backup in "${backups[@]}"; do
    if [[ " ${keep_backups[*]} " == *" $backup "* ]]; then

        number=$(($number+1))
        if $dryrun; then
            log "$number Would keep: $backup"

        else
            log "$number Keeping backup: $backup"

        fi

    fi

done

# Determine backups to delete
log "==== Backups to delete ===="
number=0
for backup in "${backups[@]}"; do
    if [[ " ${keep_backups[*]} " != *" $backup "* ]]; then

        number=$(($number+1))
        if $dryrun; then
            log "$number Would delete: $backup"

        else
            log "$number Deleting backup: $backup"
            rm -f "$backup_dir/$backup"

        fi

    fi

done

log "##### Backup cleanup finished ( $(date +"%d-%m-%Y %H:%M:%S") ) #####"

Git repository

The first section of the script defines the retention periods, i.e. how many daily, weekly, and monthly versions to keep. The storage location of the backups and the location of the log file must also be set there.

The script can then be run manually or on a schedule via crontab.
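For a scheduled run, a crontab entry along these lines works; the installation path `/usr/local/bin/backup-retention.sh` and the 02:30 run time are assumptions you should adapt:

```shell
# Hypothetical crontab entry (install with: crontab -e):
# run the retention script every night at 02:30.
30 2 * * * /usr/local/bin/backup-retention.sh
```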

If you pass the dryrun parameter, the script runs without changing any files and shows what would happen during a real run.
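The dryrun switch is nothing more than a comparison on the first positional parameter. The check can be sketched in isolation like this, where the `set --` line simulates passing the argument on the command line:

```shell
#!/bin/bash
# Demonstrates the dryrun flag logic in isolation.
set -- dryrun          # simulate calling the script as: ./script.sh dryrun
dryrun=false
if [ "$1" == "dryrun" ]; then
    dryrun=true
fi
echo "dryrun=$dryrun"  # prints "dryrun=true"
```

Because the flag defaults to false, any other argument (or none at all) results in a real deletion run.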

The script has been carefully written and tested. Nevertheless, unexpected or unwanted effects, and thus data loss, cannot be ruled out. Please review and test the script before using it. No liability is accepted for loss of data!
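One safe way to test it is against a throwaway directory filled with backdated dummy files. This is a sketch under the assumption that GNU touch with the -d option is available; the file names are purely illustrative:

```shell
#!/bin/bash
# Create a disposable directory with dummy backups of various ages,
# then point backup_dir at it and run the script with the dryrun parameter.
tmpdir=$(mktemp -d)
for days in 1 5 20 45 100 200 400; do
    # Backdate each file's modification time to cover all retention tiers.
    touch -d "$days days ago" "$tmpdir/backup-${days}d.tar.gz"
done
find "$tmpdir" -type f | wc -l   # 7 dummy files
```

With ages spread from 1 to 400 days, a dry run against this directory exercises the daily, weekly, and monthly tiers as well as the delete path.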