Git-annex notes and configuration
Author: Lee Hinman
Date: 2018-09-25 14:18:30
Introduction
These are my notes from trying to use git-annex (http://git-annex.branchable.com/).
Use at your own risk.
A word about names:
thulcandra is my local Linux laptop running Fedora, where I do most of my development and browsing when on the road.
perelandra is my local Linux laptop running Fedora, where I do most of my development and browsing when on the road (this laptop is in addition to thulcandra).
ivalice is a Linux desktop running Fedora in my house, where I run containers for DNS servers, Jenkins CI, and other miscellaneous things.
writequit is the machine that hosts http://writequit.org. I have it through a hosting provider; it's fairly powerful and also hosts my Matrix homeserver, IRC bridge, etc.
rpi is a Raspberry Pi that I run RetroPie on for retro/emulator gaming :)
General setup and generating metadata for all files
I find it useful to set this up before committing files, since it uses a git-annex pre-commit hook to generate the metadata:
git config --global annex.genmetadata true
I also recommend that you check the consistency of git repos as they are being transferred with:
git config --global transfer.fsckObjects true
git config --global fetch.fsckObjects true
git config --global receive.fsckObjects true
Also, on Fedora in particular, you'll need to use gpg2 instead of gpg:
git config --global gpg.program gpg2
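The portable settings from this section can be applied and then read back in one go, which makes it easy to confirm they were actually stored. This is a minimal sketch that uses only plain git config; it deliberately leaves out the Fedora-specific gpg.program line, and like the commands above it writes to your global ~/.gitconfig.

```shell
#!/bin/sh
# Apply the portable global settings from this section, then read each one
# back to confirm it was stored. Stops at the first failing command (set -e).
set -eu

git config --global annex.genmetadata true
git config --global transfer.fsckObjects true
git config --global fetch.fsckObjects true
git config --global receive.fsckObjects true

for key in annex.genmetadata transfer.fsckObjects fetch.fsckObjects receive.fsckObjects; do
    printf '%s = %s\n' "$key" "$(git config --global --get "$key")"
done
```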
Documents
Usually this would be my computer's ~/Documents folder, but because a lot of programs decide they want to put random stuff in there (like Amazon MP3, Adobe stuff, Zoom, etc. ಠ_ಠ), I created my own folder, ~/docs, that I use for storing actual documents.
Secure Documents
These are documents that contain sensitive information, mostly forms and applications for things. I
also keep an encrypted KeePassX DB inside this repo. This repo lives in $HOME/lockbox.
Git-annex has a nice way of doing this using git-remote-gcrypt, which you will need to download and
install in order for this to work.
mkdir ~/lockbox
cd ~/lockbox
git init
git annex init
# add some files, etc
cp ~/my-secrets.txt .
git annex add .
git commit -m "Add secrets"
Next, I need to add an encrypted remote. I'll create it on writequit (my remote server):
# This is on WRITEQUIT, not local!
git init --bare lockbox
Then I add it as an encrypted remote. You will need to replace ABCD1234 with your actual gpg key ID:
cd ~/lockbox
git annex initremote writequit-gcrypt type=gcrypt gitrepo=ssh://writequit.org/home/hinmanm/lockbox keyid=ABCD1234
git annex sync writequit-gcrypt
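If you're not sure which key ID to use in place of ABCD1234 in the initremote command above, GnuPG can list your secret keys in its machine-readable colon format, where field 5 of each "sec" record is the key ID. The helper below is a hypothetical sketch (first_secret_keyid is not part of git-annex or GnuPG) that prints the first one; compare its output with gpg2 --list-secret-keys before using it for the lockbox.

```shell
#!/bin/sh
# first_secret_keyid (hypothetical helper): read a colon-format key listing
# on stdin and print the key ID (field 5) of the first "sec" record.
# Prints nothing if there is no secret key in the input.
first_secret_keyid() {
    awk -F: '$1 == "sec" { print $5; exit }'
}

# Possible use, assuming gpg2 is installed and you have a secret key:
#   KEYID=$(gpg2 --list-secret-keys --with-colons | first_secret_keyid)
#   git annex initremote writequit-gcrypt type=gcrypt \
#       gitrepo=ssh://writequit.org/home/hinmanm/lockbox keyid="$KEYID"
```

If you have several secret keys, pick the right one by hand rather than relying on "first".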
Then I can copy the encrypted content to writequit
git annex copy . --to writequit-gcrypt
Since this is REALLY important stuff, I also set up a repo on a USB backup drive
git init --bare /run/media/hinmanm/seagate-usbdrive/lockbox
git annex initremote seagate-usbdrive-gcrypt type=gcrypt gitrepo=/run/media/hinmanm/seagate-usbdrive/lockbox keyid=ABCD1234
And then sync the content to it
git annex sync seagate-usbdrive-gcrypt --content
Now, because this is important, I ensure that I always have 2 copies of the data
git annex numcopies 2
And now that git-annex has ensured that I have enough copies, I can drop all the local copies of my files, so that I'm not keeping sensitive data on my local laptop
git annex drop .
In the future, if I need a file, I can do git annex get $file to retrieve the file from an encrypted remote.
If I ever lose the computer, I can recover the encrypted lockbox by doing the following:
git clone gcrypt::ssh://writequit.org/home/hinmanm/lockbox lockbox
And then enabling the encrypted git-annex remote from inside the clone:
cd lockbox
git annex enableremote writequit-gcrypt gitrepo=ssh://writequit.org/home/hinmanm/lockbox
Photos
For photos, I want to keep some locally, and all of them backed up on an external USB drive. I want git-annex to keep track of what is where, so I know where to go and get the file.
On the laptop side:
mkdir ~/pics
cd ~/pics
git init
git annex init "laptop"
mkdir /Volumes/MINIDRIVE1/pics
cd /Volumes/MINIDRIVE1/pics
git init
git annex init "usbdrive"
git remote add laptop ~/pics
cd ~/pics
git remote add usbdrive /Volumes/MINIDRIVE1/pics
# ... add files ...
git annex sync
And when I want to move them off of my local machine and only on the external drive:
cd ~/pics
git annex move <folder> --to usbdrive
git annex sync
On the USB drive side:
cd /Volumes/MINIDRIVE1/pics
git annex sync
Then, if I am running low on space, I can safely "drop" images and git-annex will ensure I still have at least one copy of the data:
cd ~/pics
git annex drop photos-from-trip
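The USB drive side of this setup only works while the drive is plugged in and mounted. A small guard around git annex sync avoids confusing errors when it isn't. This is a minimal sketch; sync_if_present is a hypothetical helper name, and treating a missing .git directory as "not mounted" is an assumption that fits the non-bare repository created above.

```shell
#!/bin/sh
# sync_if_present (hypothetical helper): run "git annex sync" in a repository
# that lives on removable media, but only if it is currently reachable.
sync_if_present() {
    repo=$1
    # A missing .git directory usually means the drive is not mounted.
    if [ ! -d "$repo/.git" ]; then
        echo "skipping $repo: not mounted or not a git repository" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory does not change.
    (cd "$repo" && git annex sync)
}
```

For example, sync_if_present /Volumes/MINIDRIVE1/pics would sync the drive's copy when it is attached and print a short message otherwise.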
Custom tagged views
http://git-annex.branchable.com/tips/metadata_driven_views/
What I would like to do is to be able to view documents based on a time structure, instead of by category, so if I have:
$ tree -d ~/pics
/Users/hinmanm/pics
├── christmas-party-2011-12
└── christmas-party-2013-12
It would be nice if I could have:
$ tree -d ~/pics
/Users/hinmanm/pics
├── 2011
│   └── christmas-party-2011-12
└── 2013
    └── christmas-party-2013-12
To tag a folder with year and month metadata (or with a free-form tag):
git annex metadata --set year=2012 pics-from-2012-03
git annex metadata --set month=03 pics-from-2012-03
# or, with a random tag:
git annex metadata --tag europe pics-from-europe
And then check out the view:
git annex view year=* month=*
To do the tagging automatically, we can configure genmetadata:
git config annex.genmetadata true
# or, globally
git config --global annex.genmetadata true
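annex.genmetadata takes the year and month from each file's modification date when the file is added, which may not match when the photos were taken. When the folder names already encode the date, as in christmas-party-2011-12, the tags could instead be derived from the name. This is a sketch under that naming assumption; year_month_from_name and tag_dir_by_name are hypothetical helpers, and only the git annex metadata --set calls come from the commands above.

```shell
#!/bin/sh
# year_month_from_name (hypothetical helper): print "YEAR MONTH" for a folder
# name ending in -YYYY-MM, e.g. christmas-party-2011-12 gives "2011 12".
# Prints nothing and returns 1 if the name does not end that way.
year_month_from_name() {
    ym=$(printf '%s\n' "$1" |
        sed -n 's/.*-\([0-9][0-9][0-9][0-9]\)-\([0-9][0-9]\)$/\1 \2/p')
    [ -n "$ym" ] || return 1
    printf '%s\n' "$ym"
}

# tag_dir_by_name (hypothetical helper): set year= and month= metadata on the
# annexed files under a folder, using the date encoded in the folder's name.
tag_dir_by_name() {
    ym=$(year_month_from_name "$(basename "$1")") || return 1
    # Deliberately unquoted $ym: split "2011 12" into $2 (year) and $3 (month).
    set -- "$1" $ym
    git annex metadata --set "year=$2" "$1"
    git annex metadata --set "month=$3" "$1"
}
```

For example, running tag_dir_by_name christmas-party-2011-12 inside ~/pics would set year=2011 and month=12 on that folder's files, so git annex view year=* month=* can group them by date.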
Videos
I'd like to save various shows and presentations that people have given. I use the youtube-dl tool for downloading most of the videos.
On my laptop side:
cd
mkdir videos
cd videos
git init
git annex init "laptop"
git remote add delta ssh://delta-local/mnt/data/annex/videos
# ... add files ...
git annex sync
On the file server side:
cd /mnt/data/annex
mkdir videos
cd videos
git init
git annex init "delta"
git remote add laptop ssh://xanadu/Users/hinmanm/videos
# ... add files ...
git annex sync
Podcasts
I recently switched to using git-annex to manage podcasts, so far I really like it.
Here's how I set it up:
mkdir ~/podcasts
cd ~/podcasts
git init
git annex init "laptop"
echo "http://theshipshow.com/podcast.xml" >> podcast-urls
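Appending with >> will happily add the same feed twice, which just makes importfeed do redundant work. A small helper can append a URL only when it isn't already listed. This is a sketch; add_podcast_url is a hypothetical name, and it assumes one URL per line in podcast-urls.

```shell
#!/bin/sh
# add_podcast_url (hypothetical helper): append a feed URL to the URL list
# (podcast-urls by default) only if that exact line is not already present.
add_podcast_url() {
    url=$1
    file=${2:-podcast-urls}
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the URL as a fixed string
    if grep -qxF -- "$url" "$file"; then
        echo "already subscribed: $url" >&2
    else
        printf '%s\n' "$url" >> "$file"
    fi
}
```

For example, add_podcast_url "http://theshipshow.com/podcast.xml" run twice leaves a single line in podcast-urls.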
Repeat for any podcast URLs you want to follow. I then have two scripts:
get-podcasts.sh
#!/bin/sh
xargs git annex importfeed --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' < podcast-urls
And fast-get-podcasts.sh:
#!/bin/sh
xargs git annex importfeed --fast --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' < podcast-urls
The only difference between the two is the --fast option, which means only the metadata is downloaded instead of all the files. I tend to use fast-get-podcasts.sh when I'm at a cafe or coffee shop, and get-podcasts.sh when I'm at home.
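Since the two scripts differ only in --fast, they could also be folded into one function that passes any extra options through. This is a sketch, assuming podcast-urls sits in the current directory as above; get_podcasts is a hypothetical name, not part of git-annex.

```shell
#!/bin/sh
# get_podcasts (hypothetical wrapper): one function in place of the two
# scripts above. Extra arguments, such as --fast, are passed straight
# through to "git annex importfeed".
get_podcasts() {
    # Single quotes stop the shell from expanding ${feedtitle} and friends;
    # git-annex fills them in itself for each feed item.
    xargs git annex importfeed "$@" \
        --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' \
        < podcast-urls
}
```

Running get_podcasts at home and get_podcasts --fast at a cafe would then match the two scripts. xargs appends the feed URLs after the options, so "$@" and --template still come before the URLs on the final command line.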
When I've only downloaded the metadata, I can get the actual file with git annex get <file>, like so:
cd ~/podcasts/The_Ship_Show/
git annex get "2015_04_21-Episode_55__I_Don_t_Always_Test__But_When_I_Do....mp3"
And then git-annex will download the file. When I'm done with it, or don't want to keep it around, I do:
git annex drop "2015_04_21-Episode_55__I_Don_t_Always_Test__But_When_I_Do....mp3"
And the file is dropped from the local repo.