What's the difference between `git gc` and `git repack -ad; git prune`?

Tags: git, git-gc

7

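Is there any difference between `git gc` and `git repack -ad; git prune`? If so, what additional steps does `git gc` perform (or vice versa)? Which one is better to use with regard to space optimization or safety?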

- Microsoft Linux TM

3 Answers

1

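`git help gc` contains a few hints...

The optional configuration variable `gc.rerereResolved` indicates how long records of conflicted merges you resolved earlier are kept.

The optional configuration variable `gc.rerereUnresolved` indicates how long records of conflicted merges you have not resolved are kept.

I believe these are not performed if you only run `git repack -ad; git prune`.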

- AnoE

0

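Note that `git gc` does run `git prune`, which was improved in Git 2.22 (Q2 2019).

"git prune" has been taught to take advantage of reachability bitmaps when able.

See commit cc80c95, commit c2bf473, commit fde67d6, commit d55a30b (14 Feb 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit f7213a3, 07 Mar 2019)

`prune`: use bitmaps for reachability traversal

Pruning generally has to traverse the whole commit graph in order to see which objects are reachable.
This is the exact problem that reachability bitmaps were meant to solve, so let's use them (if they're available, of course).

See here for more on reachability bitmaps.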

Here are timings on git.git:
Test                            HEAD^             HEAD
------------------------------------------------------------------------
5304.6: prune with bitmaps      3.65(3.56+0.09)   1.01(0.92+0.08) -72.3%
And on linux.git:
Test                            HEAD^               HEAD
--------------------------------------------------------------------------
5304.6: prune with bitmaps      35.05(34.79+0.23)   3.00(2.78+0.21) -91.4%
The tests show a pretty optimal case, as we'll have just repacked and should have pretty good coverage of all refs with our bitmaps.
But that's actually pretty realistic: normally prune is run via "gc" right after repacking.

Notes on the implementation: the change is actually in reachable.c, so it would improve reachability traversals by "reflog expire --stale-fix", as well.
Those aren't performed regularly, though (a normal "git gc" doesn't use --stale-fix), so they're not really worth measuring. There's a low chance of regressing that caller, since the use of bitmaps is totally transparent from the caller's perspective.

And:

See commit fe6f2b0 (18 Apr 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit d1311be, 08 May 2019)

`prune`: lazily perform reachability traversal

The general strategy of "git prune" is to do a full reachability walk, then for each loose object see if we found it in our walk.
But if we don't have any loose objects, we don't need to do the expensive walk in the first place.

This patch postpones that walk until the first time we need to see its results.

Note that this is really a specific case of a more general optimization, which is that we could traverse only far enough to find the object under consideration (i.e., stop the traversal when we find it, then pick up again when asked about the next object, etc).
That could save us in some instances from having to do a full walk. But it's actually a bit tricky to do with our traversal code, and you'd need to do a full walk anyway if you have even a single unreachable object (which you generally do, if any objects are actually left after running git-repack).

So in practice this lazy-load of the full walk catches one easy but common case (i.e., you've just repacked via git-gc, and there's nothing unreachable).

The perf script is fairly contrived, but it does show off the improvement:
 Test                            HEAD^             HEAD
 -------------------------------------------------------------------------
 5304.4: prune with no objects   3.66(3.60+0.05)   0.00(0.00+0.00) -100.0%
and would let us know if we accidentally regress this optimization.

Note also that we need to take special care with prune_shallow(), which relies on us having performed the traversal.
So this optimization can only kick in for a non-shallow repository. Since this is easy to get wrong and is not covered by existing tests, let's add an extra test to t5304 that covers this case explicitly.
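The lazy-walk idea in this commit message can be illustrated with a small Python sketch. This is a toy model, not Git's code: the `LazyReachability` class, the `prune` helper, and the dictionary-based object graph are all hypothetical names invented for illustration. The only point is that the expensive walk is postponed until the first loose object asks about it.

```python
class LazyReachability:
    """Toy model: compute the reachable set only when first asked (hypothetical)."""

    def __init__(self, tips, edges):
        self._tips = tips      # starting points, e.g. ref tips
        self._edges = edges    # object id -> ids it points to
        self._seen = None      # filled in by the first query
        self.walks = 0         # how many full walks were performed

    def _walk(self):
        self.walks += 1
        seen, stack = set(), list(self._tips)
        while stack:
            obj = stack.pop()
            if obj in seen:
                continue
            seen.add(obj)
            stack.extend(self._edges.get(obj, ()))
        return seen

    def is_reachable(self, obj):
        if self._seen is None:          # postpone the walk until needed
            self._seen = self._walk()
        return obj in self._seen


def prune(loose_objects, reach):
    """Return the loose objects that would be pruned (the unreachable ones)."""
    return [obj for obj in loose_objects if not reach.is_reachable(obj)]


edges = {"c2": ["c1"], "c1": []}
reach = LazyReachability(["c2"], edges)
print(prune([], reach), reach.walks)           # no loose objects, so no walk happens
print(prune(["c1", "x"], reach), reach.walks)  # first query triggers exactly one walk
```

With no loose objects the walk counter stays at zero, which mirrors the "prune with no objects" timing above.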

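Beyond the `reachable.c` point quoted earlier, the commit message for `prune: use bitmaps for reachability traversal` (fde67d6) adds two more notes on the implementation:

- The bitmap case could actually get away without creating a `struct object`, and instead the caller could just look up each object id in the bitmap result. However, this would be a marginal improvement in runtime, and it would make the callers much more complicated. They'd have to handle both the bitmap and non-bitmap cases separately, and in the case of `git-prune`, we'd also have to tweak `prune_shallow()`, which relies on our `SEEN` flags.
- Because we do create real object structs, we go through a few contortions to create ones of the right type. This isn't strictly necessary (`lookup_unknown_object()` would suffice), but it's more memory efficient to use the correct types, since we already know them.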

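When "git prune" started using reachability bitmaps in Git 2.22 (2019), the "do not lose recently created objects and those that are reachable from them" safety, which protects us from race conditions, was disabled by mistake. This has been corrected in Git 2.32 (Q2 2021).

See commit 2ba582b, commit 1e951c6 (28 Apr 2021) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 6e08cbd, 07 May 2021)

`prune`: save reachable-from-recent objects with bitmaps

Reported-by: David Emett
Signed-off-by: Jeff King

We pass our prune expiration to mark_reachable_objects(), which will traverse not only the reachable objects, but consider any recent ones as tips for reachability; see d3038d2 ("prune: keep objects reachable from recent objects", 2014-10-15, Git v2.2.0-rc0 -- merge) for details.

However, this interacts badly with the bitmap code path added in fde67d6 ("prune: use bitmaps for reachability traversal", 2019-02-13, Git v2.22.0-rc0 -- merge listed in batch #2).

If we hit the bitmap-optimized path, we return immediately to avoid the regular traversal, accidentally skipping the "also traverse recent" code.

Instead, we should do an if-else for the bitmap versus regular traversal, and then follow up with the "recent" traversal in either case.
This reuses the "rev_info" for a bitmap and then a regular traversal, but that should work OK (the bitmap code clears the pending array in the usual way, just like a regular traversal would).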
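The shape of that fix can be sketched in Python. This is only an illustration of the control flow described in the commit message, not Git's C code: `walk`, `mark_reachable`, the `bitmap` argument (a plain set standing in for a real reachability bitmap), and the object names are all hypothetical.

```python
def walk(tips, edges, seen):
    """Depth-first walk from `tips`, adding every object reached to `seen`."""
    stack = list(tips)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(edges.get(obj, ()))


def mark_reachable(ref_tips, recent_objects, edges, bitmap=None):
    """Toy model of the corrected flow (hypothetical names throughout)."""
    seen = set()
    if bitmap is not None:
        seen |= bitmap                 # bitmap path: reuse the precomputed answer
    else:
        walk(ref_tips, edges, seen)    # regular path: walk from the refs
    # In either case, also treat recent objects as tips. The buggy version
    # returned right after the bitmap branch and never reached this line.
    walk(recent_objects, edges, seen)
    return seen


edges = {"c1": ["t1"], "new": ["blob"]}
with_bitmap = mark_reachable(["c1"], ["new"], edges, bitmap={"c1", "t1"})
regular = mark_reachable(["c1"], ["new"], edges)
print(sorted(with_bitmap))
print(with_bitmap == regular)
```

Both paths now keep `new` and `blob`, the recently created objects that no ref points to yet.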

- VonC

- Enrico Campidoglio · Accepted Answer

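The difference is that, by default, `git gc` is very conservative about which housekeeping tasks are needed. For example, it won't run `git repack` unless the number of loose objects in the repository is above a certain threshold (configurable via the `gc.auto` variable). Also, `git gc` runs more tasks than just `git repack` and `git prune`.

According to the documentation, `git gc` runs: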

git-prune
git-reflog
git-repack
git-rerere

More specifically, by looking at the source code of gc.c (lines 338-343)¹ we can see that it invokes at most the following commands:

pack-refs --all --prune
reflog expire --all
repack -d -l
prune --expire
worktree prune --expire
rerere gc
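To make the comparison with `git repack -ad; git prune` concrete, here is a minimal Python sketch that diffs the two lists of steps. The step names are taken from the list above with their arguments dropped; `GC_STEPS`, `MANUAL_STEPS`, and `extra_steps` are hypothetical names used only for this illustration.

```python
# Sub-commands `git gc` may invoke, per the list above (arguments omitted).
GC_STEPS = ["pack-refs", "reflog expire", "repack", "prune",
            "worktree prune", "rerere gc"]

# The manual alternative from the question.
MANUAL_STEPS = ["repack", "prune"]


def extra_steps(full, partial):
    """Return the steps in `full` that `partial` does not cover, in order."""
    covered = set(partial)
    return [step for step in full if step not in covered]


print(extra_steps(GC_STEPS, MANUAL_STEPS))
```

The result is the set of tasks you would skip by running only `repack` and `prune` by hand.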

Depending on the number of packs (lines 121-126), it may run `repack` with the `-A` option instead (lines 203-212):

/*
 * If there are too many loose objects, but not too many
 * packs, we run "repack -d -l". If there are too many packs,
 * we run "repack -A -d -l".  Otherwise we tell the caller
 * there is no need.
 */
if (too_many_packs())
    add_repack_all_option();
else if (!too_many_loose_objects())
    return 0;

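Notice on lines 211-212 of the `need_to_gc` function that if there aren't enough loose objects in the repository, `gc` is not run at all.

This is further clarified in the documentation:

Housekeeping is required if there are too many loose objects or too many packs in the repository. If the number of loose objects exceeds the value of the `gc.auto` configuration variable, then all loose objects are combined into a single pack using `git repack -d -l`. Setting the value of `gc.auto` to `0` disables automatic packing of loose objects. If the number of packs exceeds the value of `gc.autoPackLimit`, then existing packs (except those marked with a `.keep` file) are consolidated into a single pack by using the `-A` option of `git repack`.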
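The decision described by the C snippet and the documentation can be modelled in a few lines of Python. This is a simplified sketch, not Git's implementation: `choose_repack` is a hypothetical name, it takes plain object and pack counts as inputs (Git's own check may count them differently), and treating a non-positive pack limit as "no limit" is an assumption.

```python
from typing import Optional


def choose_repack(loose_objects: int, packs: int,
                  gc_auto: int, gc_auto_pack_limit: int) -> Optional[str]:
    """Return the repack command an automatic gc would pick, or None.

    `gc_auto` and `gc_auto_pack_limit` stand in for the `gc.auto` and
    `gc.autoPackLimit` configuration variables.
    """
    if gc_auto <= 0:
        return None  # automatic gc is disabled

    # Assumption: a pack limit of 0 or less means the pack check is off.
    too_many_packs = gc_auto_pack_limit > 0 and packs > gc_auto_pack_limit
    too_many_loose = loose_objects > gc_auto

    if too_many_packs:
        return "repack -A -d -l"   # consolidate existing packs as well
    if not too_many_loose:
        return None                # nothing to do
    return "repack -d -l"          # only pack the loose objects


print(choose_repack(10000, 3, gc_auto=6700, gc_auto_pack_limit=50))
print(choose_repack(10, 60, gc_auto=6700, gc_auto_pack_limit=50))
print(choose_repack(10, 3, gc_auto=6700, gc_auto_pack_limit=50))
```

The threshold values in the usage lines are example inputs; check `git help config` for the defaults your Git version uses.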

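As you can see, `git gc` strives to do the right thing based on the state of the repository.

In general, I'd recommend running `git gc --auto`, simply because it does the least amount of work necessary to keep the repository in good shape: safe, and without wasting too many resources.

However, keep in mind that garbage collection may already be triggered automatically after certain commands, unless this behavior is disabled by setting the `gc.auto` configuration variable to `0`.

From the documentation:

--auto
With this option, `git gc` checks whether any housekeeping is required; if not, it exits without performing any work. Some git commands run `git gc --auto` after performing operations that could create many loose objects.

So for most repositories you shouldn't need to run `git gc` explicitly all that often, since it will already be taken care of for you.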

¹ As of commit a0a1831, made on 2016-08-08.
