Posted on 2020-04-28.
It's said that if you aren't testing your backups periodically, you don't really have backups. But testing backups is a tedious chore. It needs to be automated as much as possible.
Fully automating it would defeat the purpose. For example, if I have a script that validates the backup every day and emails me when there's an error, and I never get any emails, maybe my backup works - or maybe the scheduler failed to run the script. Having the script email me on success too would be better. But still, if the email just says "no problems!", there's the possibility that the script is buggy and overlooking major issues in the data.[1] Given the high frequency of bugs in software in general, that's a non-negligible risk.
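To make that concrete, here's a minimal sketch (not the script from this post) of the "email on success too" approach: the check runs daily and always sends a report, so a missing email points at the scheduler rather than the backup. The backup location, addresses, and checks are all placeholder assumptions.

    import smtplib
    from email.message import EmailMessage
    from pathlib import Path

    BACKUP_DIR = Path("/backups/latest")  # hypothetical location

    def check_backup(root: Path) -> list:
        """Cheap sanity checks; a real validation would restore and inspect."""
        if not root.exists():
            return [f"{root} does not exist"]
        if not any(p.is_file() for p in root.rglob("*")):
            return [f"{root} contains no files"]
        return []

    def notify(subject: str, body: str) -> None:
        # Assumes a local SMTP server; swap in whatever notification you use.
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = "backup-check@localhost"
        msg["To"] = "me@localhost"
        msg.set_content(body)
        with smtplib.SMTP("localhost") as s:
            s.send_message(msg)

    problems = check_backup(BACKUP_DIR)
    # Email either way: a missing "OK" email then means the scheduler itself
    # failed, which is exactly the silent failure described above.
    if problems:
        notify("backup check: FAILED", "\n".join(problems))
    else:
        notify("backup check: OK", f"checked {BACKUP_DIR}")

Of course, this only moves the problem: the script can still be buggy, which is why the reports below show actual numbers instead of a bare pass/fail.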
Ideally, the computer should collect all the information I want to see to have reasonable confidence that the backup is working, and present it in a compact form. Two types of information that can help are metrics and thumbnails.
I think the most helpful metrics are simple per-dataset counts - files, bytes, items - that are cheap to compute and easy to compare across snapshots.
My tool for managing personal data exports tracks user-defined metrics for each dataset and generates reports comparing them over time, like this:
    Metrics for todoist-fullsync:
    name      1 days ago    8 days ago
    ------------------------------
    files              1             1
    bytes          82363         86661
    items             85            87

    Metrics for instagram:
    name       16 days ago   117 days ago   303 days ago
    ------------------------------------------------
    files                1              1              1
    bytes        136144015      131003691       77073582
    photos             481            461            253
    videos              10              9              5
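The post doesn't show the tool's interface, so the following is only a guess at the shape of the idea: a user-defined metric as a small function per dataset that counts files, bytes, and items in one export snapshot, plus a report that lines snapshots up side by side. The export.json layout is a made-up assumption.

    import json
    from pathlib import Path

    def todoist_metrics(snapshot: Path) -> dict:
        """User-defined metrics for one todoist-fullsync export snapshot."""
        files = [p for p in snapshot.rglob("*") if p.is_file()]
        # Hypothetical layout: one JSON file holding the full sync payload.
        data = json.loads((snapshot / "export.json").read_text())
        return {
            "files": len(files),
            "bytes": sum(p.stat().st_size for p in files),
            "items": len(data["items"]),
        }

    def report(name: str, snapshots: dict) -> None:
        """Print metrics side by side, one column per snapshot age."""
        ages = list(snapshots)
        print(f"Metrics for {name}:")
        print("name".ljust(8) + "".join(a.rjust(15) for a in ages))
        for metric in next(iter(snapshots.values())):
            row = "".join(str(snapshots[a][metric]).rjust(15) for a in ages)
            print(metric.ljust(8) + row)

    # Example with the numbers from the report above:
    report("todoist-fullsync", {
        "1 days ago": {"files": 1, "bytes": 82363, "items": 85},
        "8 days ago": {"files": 1, "bytes": 86661, "items": 87},
    })

The point of the side-by-side columns is that anomalies are obvious at a glance: a photo count that suddenly drops between snapshots is a red flag no bare "no problems!" email would surface.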
When I want to manually check that a backup archive is good, one of the things I do is glance at some of the files in it to make sure there are no glaring problems. Doing this one file at a time is boring. I wrote a tool that recursively walks the files in an archive or directory and builds a report with thumbnails of all of them, so all I have to do is scroll through it.
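Here's a sketch of the thumbnail-report idea, assuming Pillow for image handling; the actual tool isn't shown in the post, and this version handles common image formats only. The paths in the final call are hypothetical.

    import html
    from pathlib import Path
    from PIL import Image  # Pillow

    def thumbnail_report(root: Path, out: Path, size=(200, 200)) -> None:
        """Recursively thumbnail images under `root` into one HTML page."""
        thumbs = out / "thumbs"
        thumbs.mkdir(parents=True, exist_ok=True)
        parts = []
        for i, p in enumerate(sorted(root.rglob("*"))):
            if p.suffix.lower() not in {".jpg", ".jpeg", ".png", ".gif"}:
                continue
            thumb = thumbs / f"{i}.png"
            try:
                with Image.open(p) as im:
                    im.thumbnail(size)  # shrinks in place, keeps aspect ratio
                    im.save(thumb, "PNG")
                parts.append(f'<figure><img src="thumbs/{thumb.name}">'
                             f'<figcaption>{html.escape(str(p))}</figcaption></figure>')
            except OSError as e:
                # Unreadable or corrupt files are exactly what this check is for.
                parts.append(f'<p>FAILED: {html.escape(str(p))}: {html.escape(str(e))}</p>')
        (out / "report.html").write_text("\n".join(parts))

    thumbnail_report(Path("extracted-backup"), Path("report"))  # hypothetical paths

Note that decode failures go into the report rather than aborting the run: a file that won't open is precisely the kind of glaring problem the scroll-through is meant to catch.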