Published on 2020-04-28.
I'm a little bit of a data hoarder. When an app or cloud service becomes an important part of my life, I try to have some process in place to regularly back up my information from it. An excellent post by karlicoss on "Building data liberation infrastructure" inspired me to improve my own processes and share them.
I wrote a tool called export_manager to help run and organize exports. It ensures your data is organized by date in a consistent directory structure, tracks metrics about your exported data, and produces reports to help you detect problems. Additionally, I made spot_check_files to make it quick to visually inspect an export for bad or missing data; see "Making backup validation easier".
What follows is documentation on how I export data from various services. For each, I provide a sample config.toml file for pulling the data into export_manager. Tools that I wrote myself are in bold.
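For orientation, here is a skeleton config.toml annotated with how the options in the examples below fit together. This is only my summary of the patterns in this post, not the tool's full reference, and your_export_tool and item_count are placeholders:

```toml
# Skeleton config.toml for one dataset, summarizing the options used in
# the examples below (see export_manager's own docs for the full reference).

# A shell command that produces the export. export_manager supplies
# $DATASET_PATH (the dataset's directory, where this config and any
# secrets files live) and $PARCEL_PATH (where the new export should be
# written; some commands append an extension, others treat it as a
# directory).
cmd = "your_export_tool > $PARCEL_PATH.json"

# Alternatively, for apps that drop backups somewhere on their own,
# ingest matching files instead of running a command:
# ingest.paths = "/path/where/the/app/writes/backups/*.zip"
# ingest.time_source = "mtime"

# Keep the exports in version control.
git = true

# How often to collect a new export, and how many old ones to keep.
interval = "1 day"
keep = 1

# Metrics are shell commands run against each export; export_manager
# records their output so its reports can help you spot problems.
metrics.item_count.cmd = "jq '.items | length' $PARCEL_PATH"
```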
App/service | Export tool | Notes |
---|---|---|
Castro | n/a | Enable daily backups to iCloud in the app settings, and have iCloud Drive sync to your Mac.
Example export_manager config: This will cause the backups to be moved from iCloud Drive into your export folder whenever export_manager runs.

```toml
git = true
ingest.paths = "/Users/YOURUSERNAME/Library/Mobile Documents/iCloud~co~supertop~castro/Documents/Backups/*.castrobackup"
interval = "7 days"
keep = 1
```
Evernote | **exporteer_evernote_osx** | Uses AppleScript to invoke the Mac app's export functionality, producing HTML or .enex files.
Example export_manager config: This assumes you have installed the tool.

```toml
# Commit anything a previous run left uncommitted (hence the "HACK"
# commit message), ask Evernote to sync, then export; the "|| true"
# parts keep either preparatory step from aborting the export.
cmd = "((git add . && git commit -m 'HACK: add missed files') || true) && (exporteer_evernote_osx sync || true) && exporteer_evernote_osx export -n $PARCEL_PATH"
git = true
interval = "1 day"
keep = 1
```
Feedly | **exporteer_feedly** | Uses puppeteer to log in to the site and download the OPML export.
(They have an API you could use to get the OPML, but if I understand correctly, users without a paid plan cannot refresh their API tokens when they expire.
So, use of the API would require frequent manual intervention.)
Example export_manager config: This assumes you have installed the tool.

```toml
cmd = "source $DATASET_PATH/secrets.sh && exporteer_feedly > $PARCEL_PATH.xml"
git = true
interval = "1 day"
keep = 1

# Track the number of feeds you're subscribed to.
metrics.feeds.cmd = "xmllint --xpath 'count(//outline[@type=\"rss\"])' $PARCEL_PATH"
```

You will also need a secrets.sh file in the dataset directory containing:

```sh
export FEEDLY_EMAIL=youremail
export FEEDLY_PASSWORD=yourpassword
```
Goodreads | karlicoss/goodrexport | Retrieves data from the Goodreads API and builds an XML file.
(Goodreads also has CSV export functionality, but doesn't make it available through the API; you could probably use puppeteer to script retrieval of it.)
Example export_manager config: This assumes you've cloned the goodrexport repo somewhere.

```toml
cmd = "python3 /PATH_TO_goodrexport/export.py --secrets $DATASET_PATH/secrets.py > $PARCEL_PATH.xml"
git = true
interval = "1 day"
keep = 1

# Track the number of books on your "read" and "to-read" shelves.
metrics.read.cmd = "xmllint --xpath 'count(//shelf[@name=\"read\"])' $PARCEL_PATH"
metrics.to_read.cmd = "xmllint --xpath 'count(//shelf[@name=\"to-read\"])' $PARCEL_PATH"
```

You will also need a secrets.py file containing:

```python
user_id = 'your_user_id'  # this is a number
key = 'your_api_key'  # this is the KEY, not the SECRET; the secret does not appear to be used
```
Google Docs/Sheets/Slides | **export-google-docs** | Uses Google APIs from a Google Apps Script to export your documents as Office files (docx/xlsx/pptx) and PDFs into a zip file in a folder on your Google Drive.
Example export_manager config: This assumes you have set up the export-google-docs script in your Google account. This config will then cause the backups to be moved from Google Drive to your export folder whenever export_manager runs.

```toml
ingest.paths = "/Users/YOURUSERNAME/Google Drive/PATH_TO_BACKUP_FOLDER/GoogleDocs*.zip"
ingest.time_source = "mtime"
interval = "1 day"
keep = 10

# Track the number of docs and spreadsheets.
metrics.docx_files.cmd = "unzip -l $PARCEL_PATH | grep -c .docx"
metrics.xlsx_files.cmd = "unzip -l $PARCEL_PATH | grep -c .xlsx"
```
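Since the zip also contains the PDF copies, the same pattern should extend naturally if you want to track those as well; this pdf_files metric is my own addition rather than part of the config above:

```toml
# Hypothetical extra metric: count the PDF copies in the zip. As with
# the metrics above, grep -c counts matching lines; the unescaped "."
# matches any character, which is close enough for this listing.
metrics.pdf_files.cmd = "unzip -l $PARCEL_PATH | grep -c .pdf"
```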
IMDB | **exporteer_imdb** | Uses puppeteer to log in to the website and download the CSVs of your watchlist and ratings.
Example export_manager config: This assumes you have installed the tool.

```toml
cmd = "source $DATASET_PATH/secrets.sh && exporteer_imdb $PARCEL_PATH"
git = true
interval = "1 day"
keep = 1
```

You will also need a secrets.sh file containing:

```sh
export IMDB_EMAIL=youremail
export IMDB_PASSWORD=yourpassword
```
iTunes | n/a | Go to Preferences -> Advanced and check the box "Share iTunes Library XML with other applications".
iTunes will maintain an up-to-date XML export of your library in the ~/Music folder.
(Sadly, I've read this doesn't work on Catalina, so I'll have to find a new approach after I upgrade.)
Example export_manager config: This configures export_manager to periodically make copies of the iTunes XML file.

```toml
cmd = "cp '/Users/YOURUSERNAME/Music/iTunes/iTunes Music Library.xml' $PARCEL_PATH.xml"
git = true
interval = "14 days"
keep = 1

# Track the number of playlists you have.
metrics.playlists.cmd = "xmllint --xpath 'count(//key[text()=\"Playlists\"]/following::array[1]/dict)' $PARCEL_PATH"
```
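You could track the number of tracks the same way. The XPath below is an untested sketch that assumes the standard layout of the library plist, where the Tracks key is followed by a dict containing one key (the track ID) per track; verify it against your own file first:

```toml
# Hypothetical extra metric: count tracks, assuming the "Tracks" <key>
# is followed by a <dict> with one <key> per track.
metrics.tracks.cmd = "xmllint --xpath 'count(//key[text()=\"Tracks\"]/following::dict[1]/key)' $PARCEL_PATH"
```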
Pocket | karlicoss/pockexport | Retrieves JSON data from the Pocket API.
Example export_manager config: This assumes you've cloned the pockexport repo somewhere.

```toml
cmd = "python3 /PATH_TO_pockexport/export.py --secrets $DATASET_PATH/secrets.py > $PARCEL_PATH.json"
git = true
interval = "1 day"
keep = 1

# Track the number of items in your list and your archive.
metrics.list_items.cmd = "jq '.list | map(select(.status == \"0\")) | length' $PARCEL_PATH"
metrics.archive_items.cmd = "jq '.list | map(select(.status == \"1\")) | length' $PARCEL_PATH"
```

You will also need a secrets.py file containing:

```python
consumer_key = 'your_key'
access_token = 'your_token'
```
Todoist | **exporteer_todoist** | Retrieves JSON data or backup zips from the Todoist API.
Example export_manager config: This assumes you have installed the tool. If you want to store the JSON produced by the sync API, use a config like:

```toml
cmd = "source $DATASET_PATH/secrets.sh && exporteer_todoist full_sync | jq > $PARCEL_PATH.json"
git = true
interval = "1 day"
keep = 1

# Track the number of tasks.
metrics.items.cmd = "jq '.items | length' $PARCEL_PATH"
```

Or, if you want to store the zip files from the backup API (only available on paid accounts), use a config like:

```toml
cmd = "source $DATASET_PATH/secrets.sh && exporteer_todoist latest_backup > $PARCEL_PATH.zip"
interval = "4 days"
keep = 5

# Track the number of files in the backup, and the date it was produced.
metrics.csv_files.cmd = "unzip -l $PARCEL_PATH | grep -c csv"
metrics.date.cmd = "unzip -l $PARCEL_PATH | awk 'NR == 4 {print $2}'"
```

In either case, you'll need a secrets.sh file containing:

```sh
export TODOIST_API_TOKEN=your_token_here
```
Trello | jtpio/trello-full-backup | Retrieves data from the Trello API and organizes it into a set of files and folders.
Example export_manager config: This assumes you have installed the tool.

```toml
cmd = "source $DATASET_PATH/secrets.sh && trello-full-backup -td $PARCEL_PATH"
git = true
interval = "1 day"
keep = 1

# Track the number of boards you have.
metrics.my_boards.cmd = "ls -1 $PARCEL_PATH/me | wc -l"
```

You will also need a secrets.sh file containing:

```sh
export TRELLO_API_KEY=your_key_here
export TRELLO_TOKEN=your_token_here
```