Garbage Collecting

Refs:

Git user manual: dangling-objects

git-gc(1), git-fsck(1) git-prune(1), git-pack-refs(1), git-repack(1), git-prune-packed(1), git-reflog(1).

Maintenance and Data Recovery has some examples of repository cleaning.

In a repository, some object become unreachable by any refs, during some operations, like deleting a branch, deleting an unreachable tag, rebasing, expiring entries in the reflog … These unreachable objects can be reported by git-fsck.

git fsck

If you use the default options:

$ git fsck
dangling tree 4df800e6a1c6ba57821e4e20680566492bbb5e81

report only the dangling objects.

You can add the object unreachable by any reference with:

$ git fsck --unreachable
unreachable tree 6e946512b1b4841dff713bb65e78a8ddbf0171d3
unreachable tree 4df800e6a1c6ba57821e4e20680566492bbb5e81
unreachable tree 125a284a7a428d91e199a3c9d7f7a834613d1d13

Both commands above use also the reflog. If you want to look at the object which are dangling except from the reflog or object unreachable except the reflog you do:

$ git fsck --dangling --no-reflogs
dangling commit 2b21a6a8e775a43eafbe1b9e8b1fb5debe77b2ee
dangling commit c3e1ecc9f0e7a67c9a05db92d6c6548a1c965830
dangling commit 1702677e8ca31c0536745b36280397457bf20002
dangling commit b746f71fa14416e22920f38b8439bbb1972481a3
dangling commit 408c05cafd2bfdb861676dd54a0ed083f7fdfcaa
dangling commit 6c92fe1d7f47e397c3ccd2713ddd9c3b8239d1c8
...
$ git fsck --unreachable --no-reflogs
unreachable commit 2b21a6a8e775a43eafbe1b9e8b1fb5debe77b2ee
unreachable tree 39610cc0e1126a2bb73cb3345b9228f1ccb374d9
unreachable commit c3e1ecc9f0e7a67c9a05db92d6c6548a1c965830
unreachable tree e981cec3e50cf10b59b544d86be681a7030cb0a6
...

Automated garbage collection with gc --auto.

Git stores the objects either one by one as loose objects, or with a very efficient method in packs. But if the size of packs is a lot lesser than the cumulated loose objects, the access time in a pack is longer. So git does not pack the objects after each operations, but only check the state of the repository with gc --auto.

gc --auto look if the number of loose objects exceeds gc.auto (default 6700) and then run git repack -d -l which in turn run git-prune-packed. setting gc.auto to 0 disables repacking. When the numbers of packs is greater than gc.autopacklimit (default 50, 0 disable it) git gc --auto consolidates them into one larger pack.

When doing a git gc --aggressive the efficiency of git-repack depends of gc.aggressiveWindow (default 250).

git gc --auto also pack refs when gc.packrefs has its default value of true, the refs are then placed in a single file $GIT_DIR/packed-refs, each modification of a ref again create a new ref in GIT_DIR/refs hierarchy that override the corresponding packed ref.

The gc command also run by default with the --prune option, which clean unreachable loose objects that are older than gc.pruneExpire (default 14 days). If you want to use an other date you have to add --prune=<date>, --prune=all prunes loose objects regardless of their age. Note that it may not be what you want on a shared repository, where an other operation could be run concurrently.

An unreachable commit is never pruned as long it is in a reflog(1), but gc --auto run git reflog expire to prune reflog entries that are older than gc.reflogExpire (default 90 days) or unreachable entries older than gc.reflogExpireUnreachable (default 30 days). These values can be also configured for each individual ref see git-config(1).

Records of conflicted merge are also kept gc.rerereresolved (default 60 days) or gc.rerereunresolved (default 15 days) for unresolved merges.

Forced garbage collection.

You can have an idea of the state of your repository by issuing git count-objects -vH

After some operation that creates a lot of unreachables objects, like rebasing or filtering branches you may want to run git gc without waiting the three months of expirability. This is also a necessity if you have to delete an object, now unreachable, but that contains some sensible data, or a very big object that was added and then deleted from the history (see the filter branch section).

As the operation is recorded in the reflog, you expire it with:

git reflog expire --expire=now --all

And you garbage collect all unreferenced objects with:

git gc --aggressive --prune=now