Git-annex notes and configuration

Author Lee Hinman
Date 2018-09-25 14:18:30

Introduction

These are my notes from trying out git-annex (http://git-annex.branchable.com/).

Use at your own risk.

A word about names:

  • thulcandra is my local Linux laptop running Fedora, where I do most of my development and browsing when on the road
  • perelandra is a second local Linux laptop running Fedora, used the same way as thulcandra for development and browsing when on the road
  • ivalice is a Linux desktop running Fedora in my house, where I run containers for DNS servers, Jenkins CI, and other miscellaneous things
  • writequit is the machine that hosts http://writequit.org. I have it through a hosting provider; it's fairly powerful and also hosts my Matrix homeserver, IRC bridge, etc.
  • rpi is a Raspberry Pi that I run RetroPie on for retro/emulator gaming :)

General setup and generating metadata for all files

I find it useful to set this up before adding and committing files, since git-annex generates the metadata as files are added:

git config --global annex.genmetadata true

I also recommend that you check the consistency of git repos as they are being transferred with:

git config --global transfer.fsckObjects true
git config --global fetch.fsckObjects true
git config --global receive.fsckObjects true
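
To confirm that all three settings took effect, a small helper can read them back. This is only a sketch: check_fsck_objects is a hypothetical function name of my own, and it relies on nothing beyond git config --get.

```shell
# Hypothetical helper: print the effective value of each fsckObjects setting.
check_fsck_objects() {
    for key in transfer fetch receive; do
        # git config --get exits non-zero when the key is not set
        value=$(git config --get "$key.fsckObjects") || value="(unset)"
        printf '%s.fsckObjects = %s\n' "$key" "$value"
    done
}
```

Running check_fsck_objects after the three git config commands should print true for each key.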

Also, on Fedora in particular, you'll need to use gpg2 instead of gpg:

git config --global gpg.program gpg2

Documents

Usually this would be my computer's ~/Documents folder, but because a lot of programs decide they want to put random stuff in there (like Amazon MP3, Adobe stuff, Zoom, etc ಠ_ಠ), I created my own folder in ~/docs that I am using for storing actual documents.

Secure Documents

These are documents that contain sensitive information, mostly forms and applications for things. I also keep an encrypted KeePassX DB inside this repo, which lives in $HOME/lockbox. Git-annex has a nice way of handling this using git-remote-gcrypt, which you will need to download and install for this to work.

mkdir ~/lockbox
cd ~/lockbox
git init
git annex init

# add some files, etc
cp ~/my-secrets.txt .
git annex add .
git commit -m "Add secrets"

Next, I need to add an encrypted remote. I'll do that on writequit (my remote server):

# This is on WRITEQUIT, not local!
git init --bare lockbox

Then I add it as an encrypted remote. You will need to replace ABCD1234 with your actual GPG key ID:

cd ~/lockbox
git annex initremote writequit-gcrypt type=gcrypt gitrepo=ssh://writequit.org/home/hinmanm/lockbox keyid=ABCD1234
git annex sync writequit-gcrypt

Then I can copy the encrypted content to writequit:

git annex copy . --to writequit-gcrypt

Since this is REALLY important stuff, I also set up a repo on a USB backup drive:

git init --bare /run/media/hinmanm/seagate-usbdrive/lockbox
git annex initremote seagate-usbdrive-gcrypt type=gcrypt gitrepo=/run/media/hinmanm/seagate-usbdrive/lockbox keyid=ABCD1234

And then sync the content to it:

git annex sync seagate-usbdrive-gcrypt --content

Now, because this is important, I ensure that I always have 2 copies of the data:

git annex numcopies 2

Now that both encrypted remotes hold a copy, I can drop all the local copies of my files so that I'm not keeping sensitive data on my laptop. git-annex refuses the drop unless it can confirm that enough other copies exist:

git annex drop .

In the future, if I need a file, I can do git annex get $file to retrieve the file from an encrypted remote.
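
The get-then-drop cycle can be wrapped so that a secret never lingers on the laptop. A minimal sketch, assuming a POSIX shell and that it is run from inside ~/lockbox; with_secret is a name I made up for illustration:

```shell
# Hypothetical helper: with_secret FILE COMMAND [ARGS...]
# Fetches FILE from a remote, runs COMMAND on it, then drops the local copy.
with_secret() {
    file=$1
    shift
    git annex get "$file" || return 1
    status=0
    "$@" "$file" || status=$?
    # The local copy is dropped again even if COMMAND failed
    git annex drop "$file"
    return "$status"
}
```

For example, with_secret my-secrets.txt less fetches the file, pages it, and drops it afterwards. The drop still honors numcopies, so git-annex refuses it if it cannot confirm enough other copies.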

If I ever lose the computer, I can recover the encrypted lockbox by doing the following:

git clone gcrypt::ssh://writequit.org/home/hinmanm/lockbox lockbox

And then, from inside the new clone, enabling it as a git-annex remote:

cd lockbox
git annex enableremote writequit-gcrypt gitrepo=ssh://writequit.org/home/hinmanm/lockbox

Photos

For photos, I want to keep some locally, and all of them backed up on an external USB drive. I want git-annex to keep track of what is where, so I know where to go and get the file.

On the laptop side:

mkdir ~/pics
cd ~/pics
git init
git annex init "laptop"

On the USB drive side:

mkdir /Volumes/MINIDRIVE1/pics
cd /Volumes/MINIDRIVE1/pics
git init
git annex init "usbdrive"
git remote add laptop ~/pics

Back on the laptop side:

cd ~/pics
git remote add usbdrive /Volumes/MINIDRIVE1/pics

... add files ...

git annex sync

And when I want to move them off my local machine so that they live only on the external drive:

cd ~/pics
git annex move <folder> --to usbdrive
git annex sync

On the USB drive side:

cd /Volumes/MINIDRIVE1/pics
git annex sync

Then, if I am running low on space, I can safely "drop" images and git-annex will ensure I still have at least one copy of the data:

cd ~/pics
git annex drop photos-from-trip
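
Before dropping, it can be reassuring to see which files exist only on the laptop. A sketch using git-annex's file matching options (--in, --not); not_on_remote is a hypothetical helper name, and usbdrive is the remote name from the setup above:

```shell
# Hypothetical helper: not_on_remote [REMOTE]
# Lists annexed files whose content is present here but not on REMOTE.
not_on_remote() {
    git annex find --in here --not --in "${1:-usbdrive}"
}
```

Run from ~/pics, not_on_remote usbdrive printing nothing means every locally present file also has a copy on the drive.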

Custom tagged views

http://git-annex.branchable.com/tips/metadata_driven_views/

What I would like to do is to be able to view documents based on a time structure, instead of by category, so if I have:

$ tree -d ~/pics
/Users/hinmanm/pics
├── christmas-party-2011-12
└── christmas-party-2013-12

It would be nice if I could have:

$ tree -d ~/pics
/Users/hinmanm/pics
├── 2011
│   └── christmas-party-2011-12
└── 2013
    └── christmas-party-2013-12

To tag a folder with year and month fields:

git annex metadata --set year=2012 pics-from-2012-03
git annex metadata --set month=03 pics-from-2012-03
# or, with a random tag:
git annex metadata --tag europe pics-from-europe
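
Since my folder names already end in a year and month, the two fields could be derived from the name instead of typed by hand. A rough sketch, assuming folders are named like christmas-party-2011-12 (a trailing -YYYY-MM); tag_by_name is a hypothetical helper, not a git-annex command:

```shell
# Hypothetical helper: tag_by_name DIR
# Derives year and month from a trailing -YYYY-MM in DIR's name and sets them.
tag_by_name() {
    dir=${1%/}
    # Prints "YYYY MM" when the name ends in -YYYY-MM, nothing otherwise
    suffix=$(printf '%s\n' "$dir" |
        sed -n 's/.*-\([0-9]\{4\}\)-\([0-9]\{2\}\)$/\1 \2/p')
    if [ -z "$suffix" ]; then
        echo "no -YYYY-MM suffix in: $dir" >&2
        return 1
    fi
    year=${suffix% *}
    month=${suffix#* }
    git annex metadata --set "year=$year" --set "month=$month" "$dir"
}
```

For example, tag_by_name christmas-party-2011-12 sets year=2011 and month=12 on everything in that folder.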

And then check out the view:

git annex view year=* month=*

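To do the date tagging automatically, we can enable annex.genmetadata, which makes git-annex record year, month, and day metadata from each file's modification date when the file is added: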

git config annex.genmetadata true
# or, globally
git config --global annex.genmetadata true

Videos

I'd like to save various shows and presentations that people have given. I use the youtube-dl tool for downloading most of the videos.

On the laptop side:

cd
mkdir videos
cd videos
git init
git annex init "laptop"
git remote add delta ssh://delta-local/mnt/data/annex/videos

... add files ...

git annex sync

On the file server side:

cd /mnt/data/annex
mkdir videos
cd videos
git init
git annex init "delta"
git remote add laptop ssh://xanadu/Users/hinmanm/videos

... add files ...

git annex sync

Podcasts

I recently switched to using git-annex to manage podcasts, and so far I really like it.

Here's how I set it up:

mkdir ~/podcasts
cd ~/podcasts
git init
git annex init "laptop"
echo "http://theshipshow.com/podcast.xml" >> podcast-urls

Repeat for each podcast feed URL you want to follow. I then have two scripts:

get-podcasts.sh

#!/bin/sh
xargs git annex importfeed --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' < podcast-urls

And fast-get-podcasts.sh

#!/bin/sh
xargs git annex importfeed --fast --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' < podcast-urls

The only difference between the two is the --fast option, which records only the metadata instead of downloading all the files. I tend to use fast-get-podcasts.sh when I'm at a cafe or coffee shop, and get-podcasts.sh when I'm at home.
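
Because the two scripts differ by a single flag, they could be folded into one. A sketch under that assumption; import_podcasts is a hypothetical name, and unlike the xargs versions it calls importfeed once per line of podcast-urls:

```shell
#!/bin/sh
# Hypothetical replacement for both scripts: import_podcasts [--fast]
import_podcasts() {
    mode=${1:-}
    # Same template as in get-podcasts.sh and fast-get-podcasts.sh
    set -- --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}'
    if [ "$mode" = "--fast" ]; then
        set -- --fast "$@"
    fi
    # One importfeed call per non-empty line of podcast-urls
    while read -r url; do
        [ -n "$url" ] || continue
        # stdin is redirected so the command cannot eat the URL list
        git annex importfeed "$@" "$url" < /dev/null
    done < podcast-urls
}
```

At a coffee shop I would call import_podcasts --fast from ~/podcasts, and plain import_podcasts at home.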

After downloading only the metadata, I can get the actual file with git annex get <file>, like so:

cd ~/podcasts/The_Ship_Show/
git annex get "2015_04_21-Episode_55__I_Don_t_Always_Test__But_When_I_Do....mp3"

And then git-annex will download the file. When I'm done or don't want to keep the file around, I do:

git annex drop "2015_04_21-Episode_55__I_Don_t_Always_Test__But_When_I_Do....mp3"

And the file is dropped from the local repo.

Synchronize the files to phone

I have a separate podcast method for my phone; I still need to figure out how to sync the two…

TODO S3 backup

I would like to keep local and off-site (meaning S3) backups of my photos, using git-annex.
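
I have not set this up yet, so the following is only an untested sketch of how it might look with git-annex's S3 special remote. The remote name s3backup and the helper name setup_s3_backup are placeholders of my own; the AWS credentials are read from the usual AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

```shell
# Hypothetical helper: setup_s3_backup GPG_KEY_ID
# Creates an encrypted S3 special remote and copies all content to it.
setup_s3_backup() {
    keyid=${1:-}
    if [ -z "$keyid" ]; then
        echo "usage: setup_s3_backup GPG_KEY_ID" >&2
        return 1
    fi
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
        return 1
    fi
    # s3backup is a placeholder remote name
    git annex initremote s3backup type=S3 encryption=hybrid keyid="$keyid" || return 1
    git annex copy . --to s3backup
}
```

It would be run once from ~/pics, for example setup_s3_backup ABCD1234, after which git annex copy . --to s3backup pushes newly added photos.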
