(tl;dr: Beware of pipes with set -e. And write more checks.)
At the Chaosdorf, we have an automated weekly backup of all servers and other hosts. The script uses set -e right at the start and reports its success with send_nsca just before quitting. A freshness threshold is used to produce an alert if a backup run does not report in time.
This sounds like nothing can go wrong without being noticed. However, there is a problem: backup_external uses pipes. And in a pipe, only the exit status of the last command is actually evaluated, so set -e never sees a failure earlier in the pipeline:
descent ~ > ( set -e; false | true; echo foo )
foo
So, if something along the way (e.g. tar or gpg) has a problem, the script will happily run along and report its success at the end. Which will result in something like this:
flux ~ > sudo ls -l /chaosdorf/backups/09 | fgrep feedback
-rw-r--r-- 1 chaosdorf chaosdorf 0 Mar 4 00:03 feedback.chaosdorf.dn42_etc.tar.xz.gpg
-rw-r--r-- 1 chaosdorf chaosdorf 24K Mar 4 00:03 feedback.chaosdorf.dn42_packages
-rw-r--r-- 1 chaosdorf chaosdorf 0 Mar 4 00:03 feedback.chaosdorf.dn42_root.tar.xz.gpg
-rw-r--r-- 1 chaosdorf chaosdorf 0 Mar 4 00:03 feedback.chaosdorf.dn42_usr_local.tar.xz.gpg
-rw-r--r-- 1 chaosdorf chaosdorf 0 Mar 4 00:03 feedback.chaosdorf.dn42_var_local.tar.xz.gpg
-rw-r--r-- 1 chaosdorf chaosdorf 0 Mar 4 00:03 feedback.chaosdorf.dn42_var_log.tar.xz.gpg
In this case, it was likely GPG refusing to work on a read-only filesystem (it's an embedded host running on an SD card, so making it read-only makes sense).
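One way to make such pipe failures fatal is the pipefail option, which bash (among other shells) supports: with it, a pipeline fails if any command in it fails, not just the last one. The following is only a minimal sketch, assuming backup_external is run by bash; run_snippet is a hypothetical helper that exists purely for this demonstration and is not part of the backup script.

```shell
#!/bin/bash
# Hypothetical demo helper: run a snippet in a fresh bash process and report
# whether it ran to its end. A separate process is used so that the snippet's
# own set -e is not influenced by the calling context.
run_snippet() {
    if bash -c "$1" >/dev/null 2>&1; then
        echo "completed"
    else
        echo "aborted"
    fi
}

# Default behaviour: only the exit status of 'true' counts, so the
# failing 'false' goes unnoticed and the echo is reached.
run_snippet 'set -e; false | true; echo foo'

# With pipefail: the failing 'false' makes the whole pipeline fail,
# and set -e stops the snippet before the echo.
run_snippet 'set -e; set -o pipefail; false | true; echo foo'
```

The first call prints "completed", the second "aborted". In a real script this would amount to adding set -o pipefail next to the existing set -e, provided the interpreter supports it.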
The good thing about this failure mode is that the failed backups are all empty files, and finding empty files is as easy as running find -size 0. So now we have a second check on the receiving host to alert me whenever an obviously failed backup is transferred.
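Such a check on the receiving host could look roughly like the sketch below. It is not the actual Chaosdorf check: the function name check_empty_backups is made up for illustration, the exit codes assume the usual Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN), and the directory in the example call is taken from the listing above.

```shell
#!/bin/sh
# Hypothetical check: report empty files below a backup directory.
# Exit codes assume the Nagios plugin convention:
# 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_empty_backups() {
    dir=$1

    if [ ! -d "$dir" ]; then
        echo "UNKNOWN: $dir is not a directory"
        return 3
    fi

    # -size 0 matches zero-length files; -type f skips directories etc.
    empty=$(find "$dir" -type f -size 0)

    if [ -n "$empty" ]; then
        echo "CRITICAL: empty backup files found:"
        echo "$empty"
        return 2
    fi

    echo "OK: no empty backup files in $dir"
    return 0
}

# Example call (path assumed from the listing above):
# check_empty_backups /chaosdorf/backups
```

Note that this only catches backups that failed in the most obvious way. A truncated but non-empty archive would still pass, which is one more reason not to rely on a single check.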
So:
- Never, ever trust a single check
- If you have the disk space, keep more than just the most recent three backups (I actually did this right)