Maintenance and Data Recovery has some examples of repository cleaning.
In a repository, some object become unreachable by any refs, during some operations, like deleting a branch, deleting an unreachable tag, rebasing, expiring entries in the reflog … These unreachable objects can be reported by git-fsck.
If you use the default options:
$ git fsck dangling tree 4df800e6a1c6ba57821e4e20680566492bbb5e81
report only the dangling objects.
You can add the object unreachable by any reference with:
$ git fsck --unreachable unreachable tree 6e946512b1b4841dff713bb65e78a8ddbf0171d3 unreachable tree 4df800e6a1c6ba57821e4e20680566492bbb5e81 unreachable tree 125a284a7a428d91e199a3c9d7f7a834613d1d13
Both commands above use also the reflog. If you want to look at the object which are dangling except from the reflog or object unreachable except the reflog you do:
$ git fsck --dangling --no-reflogs dangling commit 2b21a6a8e775a43eafbe1b9e8b1fb5debe77b2ee dangling commit c3e1ecc9f0e7a67c9a05db92d6c6548a1c965830 dangling commit 1702677e8ca31c0536745b36280397457bf20002 dangling commit b746f71fa14416e22920f38b8439bbb1972481a3 dangling commit 408c05cafd2bfdb861676dd54a0ed083f7fdfcaa dangling commit 6c92fe1d7f47e397c3ccd2713ddd9c3b8239d1c8 ... $ git fsck --unreachable --no-reflogs unreachable commit 2b21a6a8e775a43eafbe1b9e8b1fb5debe77b2ee unreachable tree 39610cc0e1126a2bb73cb3345b9228f1ccb374d9 unreachable commit c3e1ecc9f0e7a67c9a05db92d6c6548a1c965830 unreachable tree e981cec3e50cf10b59b544d86be681a7030cb0a6 ...
Automated garbage collection with
Git stores the objects either one by one as loose objects, or with a
very efficient method in packs. But if the size of packs is a lot
lesser than the cumulated loose objects, the access time in a pack is
longer. So git does not pack the objects after each operations, but
only check the state of the repository with
gc --auto look if the number of loose objects exceeds
6700) and then run
git repack -d -l which in turn run
gc.auto to 0 disables repacking. When the numbers of packs is
gc.autopacklimit (default 50, 0 disable it)
git gc --auto consolidates them into one larger pack.
When doing a
git gc --aggressive the efficiency of git-repack depends
of gc.aggressiveWindow (default 250).
git gc --auto also pack refs when
gc.packrefs has its default
true, the refs are then placed in a single file
$GIT_DIR/packed-refs, each modification of a ref again create a
new ref in
GIT_DIR/refs hierarchy that override the corresponding
gc command also run by default with the
which clean unreachable loose objects that are older than
gc.pruneExpire (default 14 days). If you want to use an other date
you have to add
--prune=all prunes loose
objects regardless of their age. Note that it may not be what you want
on a shared repository, where an other operation could be run
An unreachable commit is never pruned as long it is in a
gc --auto run
git reflog expire to prune reflog entries that are
gc.reflogExpire (default 90 days) or unreachable
entries older than
gc.reflogExpireUnreachable (default 30 days).
These values can be also configured for each individual ref see
Records of conflicted merge are also kept
(default 60 days) or
gc.rerereunresolved (default 15 days) for
Forced garbage collection.¶
You can have an idea of the state of your repository by issuing
git count-objects -vH
After some operation that creates a lot of unreachables objects, like
rebasing or filtering branches you
may want to run
git gc without waiting the three months of
expirability. This is also a necessity if you have to delete an
object, now unreachable, but that contains some sensible data, or a
very big object that was added and then deleted from the history (see
the filter branch section).
As the operation is recorded in the reflog, you expire it with:
git reflog expire --expire=now --all
And you garbage collect all unreferenced objects with:
git gc --aggressive --prune=now