Backup tooling

Published on 2020-04-28.

I'm a little bit of a data hoarder. When an app or cloud service becomes an important part of my life, I try to have some process in place to regularly back up my information from it. An excellent post by karlicoss on "Building data liberation infrastructure" inspired me to improve my own processes and share them.

I wrote a tool called export_manager to help run and organize exports. It keeps your data organized by date in a consistent directory structure, tracks metrics about your exported data, and produces reports to help you detect problems. Additionally, I made spot_check_files to help quickly inspect an export visually for bad or missing data; see "Making backup validation easier".
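
To give a rough idea of the layout: each service gets its own dataset directory containing a config.toml, and each run of the export adds a dated "parcel" of data that the metrics and reports are computed from. The exact names below are illustrative rather than taken from export_manager's documentation:

myservice/
  config.toml                  # settings like the examples below
  data/
    2020-04-28T120000Z.xml     # one dated parcel per export run (name format is illustrative)
  metrics.csv                  # metrics gathered from each parcel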

What follows is documentation on how I export data from various services. For each, I list the export tool I use, some notes on setup, and a sample config.toml for pulling the data into export_manager. Tools that I wrote myself are in bold.

Castro (export tool: n/a)
Enable daily backups to iCloud in the app settings, and have iCloud Drive sync to your Mac.
Example export_manager config

This will cause the backups to be moved from iCloud Drive into your export folder whenever export_manager runs.

git = true
ingest.paths = "/Users/YOURUSERNAME/Library/Mobile Documents/iCloud~co~supertop~castro/Documents/Backups/*.castrobackup"
interval = "7 days"
keep = 1

Evernote (export tool: exporteer_evernote_osx)
Uses AppleScript to invoke the Mac app's export functionality, producing HTML files or enex files.
Example export_manager config

This assumes you have installed the tool via pip3 install exporteer_evernote_osx.

cmd = "((git add . && git commit -m 'HACK: add missed files') || true) && (exporteer_evernote_osx sync || true) && exporteer_evernote_osx export -n $PARCEL_PATH"
git = true
interval = "1 day"
keep = 1

Feedly (export tool: exporteer_feedly)
Uses puppeteer to log in to the site and download the OPML export. (They have an API you could use to get the OPML, but if I understand correctly, users without a paid plan cannot refresh their API tokens when they expire. So, use of the API would require frequent manual intervention.)
Example export_manager config

This assumes you have installed the tool via npm install -g exporteer_feedly.

cmd = "source $DATASET_PATH/secrets.sh && exporteer_feedly > $PARCEL_PATH.xml"
git = true
interval = "1 day"
keep = 1
# Track the number of feeds you're subscribed to
metrics.feeds.cmd = "xmllint --xpath 'count(//outline[@type=\"rss\"])' $PARCEL_PATH"

You will also need a secrets.sh file in the same directory as the config.toml:

export FEEDLY_EMAIL=youremail
export FEEDLY_PASSWORD=yourpassword
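
Since this file holds credentials in plain text, it's worth locking down its permissions (the same goes for the other secrets files below):

chmod 600 secrets.sh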

Goodreads (export tool: karlicoss/goodrexport)
Retrieves data from the Goodreads API and builds an XML file. (Goodreads also has CSV export functionality, but doesn't make it available through the API; you could probably use puppeteer to script retrieval of it.)
Example export_manager config

This assumes you've cloned the goodrexport repo somewhere.

cmd = "python3 /PATH_TO_goodrexport/export.py --secrets $DATASET_PATH/secrets.py > $PARCEL_PATH.xml"
git = true
interval = "1 day"
keep = 1
# Track the number of books on your "read" and "to-read" shelves.
metrics.read.cmd = "xmllint --xpath 'count(//shelf[@name=\"read\"])' $PARCEL_PATH"
metrics.to_read.cmd = "xmllint --xpath 'count(//shelf[@name=\"to-read\"])' $PARCEL_PATH"

You will also need a secrets.py file in the same directory as the config.toml:

user_id = 'your_user_id' # this is a number
key = 'your_api_key' # this is the KEY, not the SECRET - the secret does not appear to be used

Google Docs/Sheets/Slides (export tool: export-google-docs)
Uses Google APIs from a Google Apps Script to export your documents as Office files (docx/xlsx/pptx) and PDFs into a zip file in a folder on your Google Drive.
Example export_manager config

This assumes you have set up the export-google-docs script in your Google account. Then, this config will cause the backups to be moved from Google Drive to your export folder whenever export_manager runs.

ingest.paths = "/Users/YOURUSERNAME/Google Drive/PATH_TO_BACKUP_FOLDER/GoogleDocs*.zip"
ingest.time_source = "mtime"
interval = "1 day"
keep = 10
# Track the number of docs and spreadsheets.
metrics.docx_files.cmd = "unzip -l $PARCEL_PATH | grep -c .docx"
metrics.xlsx_files.cmd = "unzip -l $PARCEL_PATH | grep -c .xlsx"

IMDB (export tool: exporteer_imdb)
Uses puppeteer to log in to the website and download the CSVs of your watchlist and ratings.
Example export_manager config

This assumes you have installed the tool via npm install -g exporteer_imdb.

cmd = "source $DATASET_PATH/secrets.sh && exporteer_imdb $PARCEL_PATH"
git = true
interval = "1 day"
keep = 1

You will also need a secrets.sh file in the same directory as the config.toml:

export IMDB_EMAIL=youremail
export IMDB_PASSWORD=yourpassword

iTunes (export tool: n/a)
Go to Preferences -> Advanced and check the box "Share iTunes Library XML with other applications". iTunes will maintain an up-to-date XML export of your library in the ~/Music folder. (Sadly, I've read this doesn't work on Catalina, so I'll have to find a new approach after I upgrade.)
Example export_manager config

This configures export_manager to periodically make copies of the iTunes XML file:

cmd = "cp '/Users/YOURUSERNAME/Music/iTunes/iTunes Music Library.xml' $PARCEL_PATH.xml"
git = true
interval = "14 days"
keep = 1
# Track the number of playlists you have.
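# (Counts the dict entries in the array that follows the "Playlists" key in the library plist.)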
metrics.playlists.cmd = "xmllint --xpath 'count(//key[text()=\"Playlists\"]/following::array[1]/dict)' $PARCEL_PATH"

Pocket (export tool: karlicoss/pockexport)
Retrieves JSON data from the Pocket API.
Example export_manager config

This assumes you've cloned the pockexport repo somewhere.

cmd = "python3 /PATH_TO_pockexport/export.py --secrets $DATASET_PATH/secrets.py > $PARCEL_PATH.json"
git = true
interval = "1 day"
keep = 1
# Track the number of items in your list and your archive.
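# (Pocket's API uses status "0" for unread items and "1" for archived items.)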
metrics.list_items.cmd = "jq '.list | map(select(.status == \"0\")) | length' $PARCEL_PATH"
metrics.archive_items.cmd = "jq '.list | map(select(.status == \"1\")) | length' $PARCEL_PATH"

You will also need a secrets.py file in the same directory as the config.toml:

consumer_key = 'your_key'
access_token = 'your_token'

Todoist (export tool: exporteer_todoist)
Retrieves JSON data or backup zips from the Todoist API.
Example export_manager config

This assumes you have installed the tool via pip3 install exporteer_todoist.

If you want to store the JSON produced by the sync API, use a config.toml like this:

cmd = "source $DATASET_PATH/secrets.sh && exporteer_todoist full_sync | jq > $PARCEL_PATH.json"
git = true
interval = "1 day"
keep = 1
# Track the number of tasks.
metrics.items.cmd = "jq '.items | length' $PARCEL_PATH"

Or if you want to store the zip files from the backup API (only available on paid accounts), use a config.toml like this:

cmd = "source $DATASET_PATH/secrets.sh && exporteer_todoist latest_backup > $PARCEL_PATH.zip"
interval = "4 day"
keep = 5
# Track the number of files in the backup, and the date it was produced.
metrics.csv_files.cmd = "unzip -l $PARCEL_PATH | grep -c csv"
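# (In the output of unzip -l, the first file entry is on line 4; its second field is the file's date.)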
metrics.date.cmd = "unzip -l $PARCEL_PATH | awk 'NR == 4 {print $2}'"

In either case, you'll need a secrets.sh file in the same directory as the config.toml file:

export TODOIST_API_TOKEN=your_token_here

Trello (export tool: jtpio/trello-full-backup)
Retrieves data from the Trello API and organizes it into a set of files and folders.
Example export_manager config

This assumes you have installed the tool via pip3 install trello-full-backup.

cmd = "source $DATASET_PATH/secrets.sh && trello-full-backup -td $PARCEL_PATH"
git = true
interval = "1 day"
keep = 1
metrics.my_boards.cmd = "ls -1 $PARCEL_PATH/me | wc -l"

You will also need a secrets.sh file in the same directory as the config.toml:

export TRELLO_API_KEY=your_key_here
export TRELLO_TOKEN=your_token_here
