Published on 2020-04-28.
I'm a little bit of a data hoarder. When an app or cloud service becomes an important part of my life, I try to have some process in place to regularly back up my information from it. An excellent post by karlicoss on "Building data liberation infrastructure" inspired me to improve my own processes and share them.
I wrote a tool called export_manager to help run and organize exports. It ensures your data is organized by date in a consistent directory structure, tracks metrics about your exported data, and produces reports to help you detect problems. Additionally, I made spot_check_files to make it quick to visually inspect an export for bad or missing data; see "Making backup validation easier".
What follows is documentation on how I export data from various services. For each, I provide a sample config.toml file for pulling the data into export_manager. Tools that I wrote myself are in bold.
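For orientation, here is a skeleton config.toml annotated with how the options in the examples below fit together. This is only my summary of the patterns in this post, not the tool's full reference, and your_export_tool and item_count are placeholders:

```toml
# Skeleton config.toml for one dataset, summarizing the options used in
# the examples below (see export_manager's own docs for the full reference).

# A shell command that produces the export. export_manager supplies
# $DATASET_PATH (the dataset's directory, where this config and any
# secrets files live) and $PARCEL_PATH (where the new export should be
# written; some commands append an extension, others treat it as a
# directory).
cmd = "your_export_tool > $PARCEL_PATH.json"

# Alternatively, for apps that drop backups somewhere on their own,
# ingest matching files instead of running a command:
# ingest.paths = "/path/where/the/app/writes/backups/*.zip"
# ingest.time_source = "mtime"

# Keep the exports in version control.
git = true

# How often to collect a new export, and how many old ones to keep.
interval = "1 day"
keep = 1

# Metrics are shell commands run against each export; export_manager
# records their output so its reports can help you spot problems.
metrics.item_count.cmd = "jq '.items | length' $PARCEL_PATH"
```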
App/service | Export tool | Notes |
---|---|---|
Castro | n/a | Enable daily backups to iCloud in the app settings, and have iCloud Drive sync to your Mac.
Example export_manager config: This will cause the backups to be moved from iCloud Drive into your export folder whenever export_manager runs.

```toml
git = true
ingest.paths = "/Users/YOURUSERNAME/Library/Mobile Documents/iCloud~co~supertop~castro/Documents/Backups/*.castrobackup"
interval = "7 days"
keep = 1
```
Evernote | **exporteer_evernote_osx** | Uses AppleScript to invoke the Mac app's export functionality, producing HTML or .enex files.
Example export_manager config: This assumes you have installed the tool.

```toml
# Commit anything a previous run left uncommitted (hence the "HACK"
# commit message), ask Evernote to sync, then export; the "|| true"
# parts keep either preparatory step from aborting the export.
cmd = "((git add . && git commit -m 'HACK: add missed files') || true) && (exporteer_evernote_osx sync || true) && exporteer_evernote_osx export -n $PARCEL_PATH"
git = true
interval = "1 day"
keep = 1
```
Feedly | **exporteer_feedly** | Uses puppeteer to log in to the site and download the OPML export.
(They have an API you could use to get the OPML, but if I understand correctly, users without a paid plan cannot refresh their API tokens when they expire.
So, use of the API would require frequent manual intervention.)
Example export_manager config: This assumes you have installed the tool.

```toml
cmd = "source $DATASET_PATH/secrets.sh && exporteer_feedly > $PARCEL_PATH.xml"
git = true
interval = "1 day"
keep = 1

# Track the number of feeds you're subscribed to.
metrics.feeds.cmd = "xmllint --xpath 'count(//outline[@type=\"rss\"])' $PARCEL_PATH"
```

You will also need a secrets.sh file in the dataset directory containing:

```sh
export FEEDLY_EMAIL=youremail
export FEEDLY_PASSWORD=yourpassword
```
Goodreads | karlicoss/goodrexport | Retrieves data from the Goodreads API and builds an XML file.
(Goodreads also has CSV export functionality, but doesn't make it available through the API; you could probably use puppeteer to script retrieval of it.)
Example export_manager config: This assumes you've cloned the goodrexport repo somewhere.

```toml
cmd = "python3 /PATH_TO_goodrexport/export.py --secrets $DATASET_PATH/secrets.py > $PARCEL_PATH.xml"
git = true
interval = "1 day"
keep = 1

# Track the number of books on your "read" and "to-read" shelves.
metrics.read.cmd = "xmllint --xpath 'count(//shelf[@name=\"read\"])' $PARCEL_PATH"
metrics.to_read.cmd = "xmllint --xpath 'count(//shelf[@name=\"to-read\"])' $PARCEL_PATH"
```

You will also need a secrets.py file containing:

```python
user_id = 'your_user_id'  # this is a number
key = 'your_api_key'  # this is the KEY, not the SECRET; the secret does not appear to be used
```
Google Docs/Sheets/Slides | **export-google-docs** | Uses Google APIs from a Google Apps Script to export your documents as Office files (docx/xlsx/pptx) and PDFs into a zip file in a folder on your Google Drive.
Example export_manager config: This assumes you have set up the export-google-docs script in your Google account. This config will then cause the backups to be moved from Google Drive to your export folder whenever export_manager runs.

```toml
ingest.paths = "/Users/YOURUSERNAME/Google Drive/PATH_TO_BACKUP_FOLDER/GoogleDocs*.zip"
ingest.time_source = "mtime"
interval = "1 day"
keep = 10

# Track the number of docs and spreadsheets.
metrics.docx_files.cmd = "unzip -l $PARCEL_PATH | grep -c .docx"
metrics.xlsx_files.cmd = "unzip -l $PARCEL_PATH | grep -c .xlsx"
```
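Since the zip also contains the PDF copies, the same pattern should extend naturally if you want to track those as well; this pdf_files metric is my own addition rather than part of the config above:

```toml
# Hypothetical extra metric: count the PDF copies in the zip. As with
# the metrics above, grep -c counts matching lines; the unescaped "."
# matches any character, which is close enough for this listing.
metrics.pdf_files.cmd = "unzip -l $PARCEL_PATH | grep -c .pdf"
```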
IMDB | **exporteer_imdb** | Uses puppeteer to log in to the website and download the CSVs of your watchlist and ratings.
Example export_manager config: This assumes you have installed the tool.

```toml
cmd = "source $DATASET_PATH/secrets.sh && exporteer_imdb $PARCEL_PATH"
git = true
interval = "1 day"
keep = 1
```

You will also need a secrets.sh file containing:

```sh
export IMDB_EMAIL=youremail
export IMDB_PASSWORD=yourpassword
```
iTunes | n/a | Go to Preferences -> Advanced and check the box "Share iTunes Library XML with other applications".
iTunes will maintain an up-to-date XML export of your library in the ~/Music folder.
(Sadly, I've read this doesn't work on Catalina, so I'll have to find a new approach after I upgrade.)
Example export_manager config: This configures export_manager to periodically make copies of the iTunes XML file.

```toml
cmd = "cp '/Users/YOURUSERNAME/Music/iTunes/iTunes Music Library.xml' $PARCEL_PATH.xml"
git = true
interval = "14 days"
keep = 1

# Track the number of playlists you have.
metrics.playlists.cmd = "xmllint --xpath 'count(//key[text()=\"Playlists\"]/following::array[1]/dict)' $PARCEL_PATH"
```
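You could track the number of tracks the same way. The XPath below is an untested sketch that assumes the standard layout of the library plist, where the Tracks key is followed by a dict containing one key (the track ID) per track; verify it against your own file first:

```toml
# Hypothetical extra metric: count tracks, assuming the "Tracks" <key>
# is followed by a <dict> with one <key> per track.
metrics.tracks.cmd = "xmllint --xpath 'count(//key[text()=\"Tracks\"]/following::dict[1]/key)' $PARCEL_PATH"
```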
Pocket | karlicoss/pockexport | Retrieves JSON data from the Pocket API.
Example export_manager config: This assumes you've cloned the pockexport repo somewhere.

```toml
cmd = "python3 /PATH_TO_pockexport/export.py --secrets $DATASET_PATH/secrets.py > $PARCEL_PATH.json"
git = true
interval = "1 day"
keep = 1

# Track the number of items in your list and your archive.
metrics.list_items.cmd = "jq '.list | map(select(.status == \"0\")) | length' $PARCEL_PATH"
metrics.archive_items.cmd = "jq '.list | map(select(.status == \"1\")) | length' $PARCEL_PATH"
```

You will also need a secrets.py file containing:

```python
consumer_key = 'your_key'
access_token = 'your_token'
```
Todoist | **exporteer_todoist** | Retrieves JSON data or backup zips from the Todoist API.
Example export_manager config: This assumes you have installed the tool. If you want to store the JSON produced by the sync API, use a config like:

```toml
cmd = "source $DATASET_PATH/secrets.sh && exporteer_todoist full_sync | jq > $PARCEL_PATH.json"
git = true
interval = "1 day"
keep = 1

# Track the number of tasks.
metrics.items.cmd = "jq '.items | length' $PARCEL_PATH"
```

Or, if you want to store the zip files from the backup API (only available on paid accounts), use a config like:

```toml
cmd = "source $DATASET_PATH/secrets.sh && exporteer_todoist latest_backup > $PARCEL_PATH.zip"
interval = "4 days"
keep = 5

# Track the number of files in the backup, and the date it was produced.
metrics.csv_files.cmd = "unzip -l $PARCEL_PATH | grep -c csv"
metrics.date.cmd = "unzip -l $PARCEL_PATH | awk 'NR == 4 {print $2}'"
```

In either case, you'll need a secrets.sh file containing:

```sh
export TODOIST_API_TOKEN=your_token_here
```
Trello | jtpio/trello-full-backup | Retrieves data from the Trello API and organizes it into a set of files and folders.
Example export_manager config: This assumes you have installed the tool.

```toml
cmd = "source $DATASET_PATH/secrets.sh && trello-full-backup -td $PARCEL_PATH"
git = true
interval = "1 day"
keep = 1

# Track the number of boards you have.
metrics.my_boards.cmd = "ls -1 $PARCEL_PATH/me | wc -l"
```

You will also need a secrets.sh file containing:

```sh
export TRELLO_API_KEY=your_key_here
export TRELLO_TOKEN=your_token_here
```