(tldr: Beware of pipes with
set -e. And write more checks.)
At the Chaosdorf, we have an automated weekly backup
of all servers and other hosts. The script uses
set -e right at the start
and reports its success with
send_nsca just before quitting. A freshness
threshold is used to produce an alert if a backup run does not report in time.
This sounds as if nothing could go wrong without being noticed. However, there is a problem: backup_external uses pipes. And in a pipeline, only the exit status of the last command is actually evaluated, so set -e never sees a failure earlier in the pipe:
descent ~ > ( set -e; false | true; echo foo )
foo
So if something along the way (e.g. tar or gpg) has a problem, the script happily carries on and reports its success at the end, which results in something like this:
flux ~ > sudo ls -l /chaosdorf/backups/09 | fgrep feedback
-rw-r--r-- 1 chaosdorf chaosdorf   0 Mar 4 00:03 feedback.chaosdorf.dn42_etc.tar.xz.gpg
-rw-r--r-- 1 chaosdorf chaosdorf 24K Mar 4 00:03 feedback.chaosdorf.dn42_packages
-rw-r--r-- 1 chaosdorf chaosdorf   0 Mar 4 00:03 feedback.chaosdorf.dn42_root.tar.xz.gpg
-rw-r--r-- 1 chaosdorf chaosdorf   0 Mar 4 00:03 feedback.chaosdorf.dn42_usr_local.tar.xz.gpg
-rw-r--r-- 1 chaosdorf chaosdorf   0 Mar 4 00:03 feedback.chaosdorf.dn42_var_local.tar.xz.gpg
-rw-r--r-- 1 chaosdorf chaosdorf   0 Mar 4 00:03 feedback.chaosdorf.dn42_var_log.tar.xz.gpg
In this case, it was likely GPG refusing to work on a read-only filesystem (it's an embedded host running on an SD card, so making it read-only makes sense).
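The pipe behaviour described earlier can be changed with bash's pipefail option, which makes a pipeline fail if any command in it fails. The following is only a minimal sketch, assuming the backup script runs under bash (I have not stated which shell it actually uses); status_of is a hypothetical helper that exists just for this demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a snippet in a fresh bash, print its exit status.
status_of() {
    bash -c "$1" >/dev/null 2>&1
    echo $?
}

# Without pipefail the pipeline's status is that of `true`,
# so set -e does not fire and the snippet exits successfully.
status_of 'set -e; false | true; echo foo'              # prints 0

# With pipefail the failing `false` makes the whole pipeline fail,
# and set -e aborts the snippet before it reaches `echo foo`.
status_of 'set -e -o pipefail; false | true; echo foo'  # prints 1
```

Even with pipefail in place, a second, independent check is still worth having, since this only covers failures that show up as a non-zero exit status.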
The good thing about this failure mode is that the failed backups are all
empty files, and finding empty files is as easy as running
find -size 0. So now we have a
second check on the receiving host to alert me whenever an obviously failed
backup is transferred.
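Such a check on the receiving host could look roughly like the sketch below. This is not the actual check we run: the function name check_empty_backups and the messages are made up for illustration, and the exit codes simply follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

```shell
#!/usr/bin/env bash
# Hypothetical Nagios-style check: fail if DIR contains empty regular files.
check_empty_backups() {
    local dir=$1 empty
    # -size 0 matches files with no content; -type f skips directories.
    # If find itself fails (e.g. unreadable directory), report UNKNOWN.
    empty=$(find "$dir" -type f -size 0) || return 3
    if [ -n "$empty" ]; then
        echo "CRITICAL: empty backup files found:"
        echo "$empty"
        return 2
    fi
    echo "OK: no empty backup files in $dir"
    return 0
}
```

Calling it as check_empty_backups /chaosdorf/backups would, under these assumptions, flag the zero-byte .tar.xz.gpg files from the listing above while leaving the non-empty packages file alone.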
- Never, ever trust a single check
- If you have the disk space, keep more than just the most recent three backups (I actually did this right)