Git annex

Usual operations

More on this topic in Git annex walkthrough.

Adding an annex to git repository.

$ git annex init

Adding one file.

$ git annex add myfile
add myfile ok
(recording state in git...)
$ git commit  -m"myfile added to annex"

When you add a file to the annex the content itself is stored in git-annex’s backend, .git/annex/ and the symlink is added to git.

When you commit it, a symlink to the content is committed to git.

Adding files in directory.

$ git annex add .
add foo.txt
add bigdoc.pdf
$ git commit -a -m added

all the unregistered files are added to the annex; but if annex.largefiles is configured only matched files are added to the annex other files are added directly to the git repository.

Droping the file content.

$ git annex drop myfile

Thi will remove myfile from your annex, but only if some exemplar exist in a remote annex. The symlink registered in git is unchanged, but will become broken, and is repaired if you do again a git annex get.

You can force to drop a file without checking a backup with:

$ git annex drop --force myfile

By default annex try to keep one exemplar of each file in some repository, but you can tweak the number with numcopies.

Modifying files.

In order to modify a file you have to replace the symlink by a copy of the file, it is the unlock operation.

$ git annex unlock myfile

Then you can modify your file, and register the modification with git.

$ git commit myfile -m"new version of myfile"

and git annex will record the new file, and git commit the change, the file stay in unlocked mode until you run

$ git annex lock myfile

If before comitting you add your new version of the file to the annex.

$ git annex add myfile

annex lock it again by replacing it by a symlink.

Refs: git-annex: unlocked files <tips/unlocked_files/,
git-annex: git-annex-unlock, git-annex: git-annex-lock, git-annex: git-annex-add.

Defining which files are stored in annex.

If you put in your .gitattributes file:

* annex.largefiles=((largerthan=100kb)and(not(mimetype=text/*)))or(mimetype=application/*)

or set your git config with

$ git config annex.largefiles '(largerthan=100kb and not mimetype=text/*) or mimetype=application/*'

Then when adding files

$ git annex add .
add foo.txt (non-large file; adding content to git repository) ok
add bigdoc.pdf
$ git commit -a -m added

Refs: largefiles

Which annexed files are no longer used.

$ git annex unused
unused . (checking for unused data...) (checking master...)
Some annexed data is no longer used by any files:
NUMBER  KEY
1       SHA256E-s27--8c20c91f834dabcf6b9ba5de670572136ce8382b2ebf897fa86c8d2d71f081ef.txt
2       SHA256E-s5--819f04e5706f509de5a6b833d3f561369156820b4240c7c26577b223e59aae97.txt
3       SHA256E-s18--12a4f82f1286e21ee87c63a14efb3e96d25ff26f82128377b87c8c34fc071499.txt

As the key is part of the commit message you can inspect which symlink in some commit had this key for target.

$ git log --name-status -S'SHA256E-s5--819f04e5706f509de5a6b833d3f561369156820b4240c7c26577b223e59aae97.txt'

If you want to drop extra copies of unused files you can do

$ git annex dropunused 1

to drop the file n° 1 or

$ git annex dropunused

to drop all unsused files, but the drop is effective only if git-annex can find in some remote with an other copy of the file. If you want to bypass checks and deleting without condition the file

$ git annex dropunused --force

of course you can no longer checkout some commit which used the file.

Adding a remote.

Any git remote can be also used as an annex remote. Either ordinary remotes or bare remotes. You create the remote repository with either ref:cloning your repo <remote_clone> or ref:creating a bare remote from scratch <creating_bare_remote>.

If you want to use the remote as an annex remote you declare it on the server with:

$ git annex init origin

where origin is the description you have choosen for the remote annex, usually you would choose either the same name than the remote name, or a derived explicit name. You can change description with git-annex describe

In any case you have added the remote to your local repository with:

$ git remote add origin myserver.org:bare-annex.git

It is also a good practive to add the local repo on the server

$ git remote add laptop mylaptop.org:myrepo

then you do an initial push from local repo to the origin remote

$ git push origin master git-annex

Then your remote appear on both repo

git annex info
$ git annex info
repository mode: indirect
trusted repositories: 0
semitrusted repositories: 4
    00000000-0000-0000-0000-000000000001 -- web
    00000000-0000-0000-0000-000000000002 -- bittorrent
    143a9c6a-99cd-4e60-8d66-277990d99d94 -- laptop [here]
    39e9aa8a-d369-4b2e-aa37-d6c4d3282ad5 -- [origin]

The content of the annex is not yet on origin you can synchronize it with git-annex get --all you can also use git-annex sync --content.

It is also possible to work on the local repo an do:

$ git annex copy --all --to=origin

If you do want to organize where is stored your annex files you can get a view of the present situation on all accessible servers by:

$ git annex whereis
whereis SHA256E-s1561353--2052371eabac0cb7eda1a8056d003060a7043274e4593e1091e5acacb1c96096.pdf (2 copies)
    143a9c6a-99cd-4e60-8d66-277990d99d94 -- [laptop]
    39e9aa8a-d369-4b2e-aa37-d6c4d3282ad5 -- origin [here]
ok
whereis SHA256E-s9407097--ee52bc1e983d0b1b1cb7a23e5cc280b94f798bb764ae2bea7c5f5460949fa56e.pdf (2 copies)
    143a9c6a-99cd-4e60-8d66-277990d99d94 -- [laptop]
    39e9aa8a-d369-4b2e-aa37-d6c4d3282ad5 -- origin [here]

You find in git-annex documentation how to handle special remotes such S3, rsync, webdav, and more than thirty remotes in the list of special remotes.

Refs:
centralized repo on your own server, using ssh remotes, git-annex: bare_repositories, git-annex: git-annex-sync, git-annex: git-annex-get, git-annex: git-annex-copy, git-annex: git-annex-whereis.