




See also: - Trilarion


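Git tracks the content of files, not their names. So if a file is simply renamed without its content changing, Git can detect that easily. (Git does not track renames, it detects them; using git mv, or git rm plus git add, amounts to the same operation.)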

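When a file is added to the repository, its name goes into a tree object, and its actual content is stored in the repository as a blob (binary large object). If another file with identical content already exists, Git does not add a second blob for it. In fact it cannot: a loose object is stored on the filesystem under a path derived from the hash of its content, with the first two characters of the hash as the directory name and the rest as the file name. Detecting an exact rename therefore only requires comparing hashes.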
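The content-addressing described above can be sketched without Git itself. This is only an illustration under stated assumptions: a SHA-1 repository, loose (unpacked) objects, and sha1sum from GNU coreutils being available. The helper names blob_id and loose_object_path are made up for this example.

```shell
# Compute the id Git would give a file's content as a blob:
# SHA-1 over the header "blob <size>\0" followed by the raw bytes.
# Assumes a SHA-1 repository and that sha1sum is installed.
blob_id() {
  size=$(wc -c <"$1" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Map an object id to the path of its loose object file:
# first two hex characters are the directory, the rest the file name.
loose_object_path() {
  printf '.git/objects/%s/%s\n' \
    "$(printf '%s' "$1" | cut -c1-2)" \
    "$(printf '%s' "$1" | cut -c3-)"
}
```

Two files with the same bytes get the same id regardless of their names, which is why an unmodified rename can be recognised by comparing ids alone.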

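To detect renames of files that were also slightly modified, Git uses a similarity algorithm with a threshold to decide whether a pair counts as a rename. See, for example, the -M flag of git diff. There are also configuration values such as merge.renameLimit (the number of files to consider when performing rename detection during a merge).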
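A rough way to watch the threshold at work is to rename a file, change it slightly, and ask git diff for the status under different -M values. The sketch below uses a throwaway repository; the helper name rename_demo, the file names, and the demo identity are all invented for illustration.

```shell
# Build a scratch repository, rename a file with a small edit,
# and print how git diff classifies the change for a given -M threshold.
rename_demo() {
  threshold=$1                       # e.g. 50% or 100%, passed to -M
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" || exit 1
    git init -q .
    git config user.name demo
    git config user.email demo@example.invalid
    seq 1 100 >numbers.txt
    git add numbers.txt
    git commit -q -m initial
    git mv numbers.txt renamed.txt
    echo 101 >>renamed.txt           # small edit on top of the rename
    git add renamed.txt
    git commit -q -m 'rename with a small edit'
    git diff -M"$threshold" --name-status HEAD^ HEAD
  )
  status=$?
  rm -rf "$dir"
  return $status
}
```

Running rename_demo 50% should report a single status line starting with R, while rename_demo 100% restricts detection to exact renames, so the same change should show up as a separate addition and deletion.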


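
"You don't need to think about how to do it" - I think that is exactly the problem? - bain
Unfortunately, these algorithms don't seem to work properly in my case. Git seems to get confused by some .orig files that Kdiff3 accidentally left behind and that got checked in... Git seems to think the .orig files were renamed to something else, when in fact other files were the source of the rename. Please forgive me if I have misunderstood my situation; I don't want to post false information. - Shawn Eary
@ShawnEary I had a similar experience today: git cherry-pick updated the wrong file because it mistook an addition for a rename. Unfortunately I had already pushed the change, so I had to re-add the correct file manually. In my opinion, Git's rename detection is a foolish concept; it should stick to explicit renames made by the user (as hg does). - Frank Schmitt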



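With Git 2.33 (Q3 2021), the documentation for "git diff -l<n>" and diff.renameLimit has been updated, and the default values of these limits have been raised.

See commit 94b82d5, commit 9dd29db, commit 6623a52, commit 05d2c61 (15 Jul 2021) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 268055b, 28 Jul 2021)

rename: bump limit defaults yet again

Signed-off-by: Elijah Newren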

These were last bumped in commit 92c57e5 ("bump rename limit defaults (again)", 2011-02-19, Git v1.7.5-rc0 -- merge), and were bumped both because processors had gotten faster, and because people were getting ugly merges that caused problems and reporting it to the mailing list (suggesting that folks were willing to spend more time waiting).

Since that time:

  • Linus has continued recommending kernel folks to set diff.renameLimit=0 (maps to 32767, currently)
  • Folks with repositories with lots of renames were happy to set merge.renameLimit above 32767, once the code supported that, to get correct cherry-picks
  • Processors have gotten faster
  • It has been discovered that the timing methodology used last time probably used too large example files.

The last point is probably worth explaining a bit more:

  • The "average" file size used appears to have been average blob size in the linux kernel history at the time (probably v2.6.25 or something close to it).
  • Since bigger files are modified more frequently, such a computation weights towards larger files.
  • Larger files may be more likely to be modified over time, but are not more likely to be renamed -- the mean and median blob size within a tree are a bit higher than the mean and median of blob sizes in the history leading up to that version for the linux kernel.
  • The mean blob size in v2.6.25 was half the average blob size in history leading to that point
  • The median blob size in v2.6.25 was about 40% of the mean blob size in v2.6.25.
  • Since the mean blob size is more than double the median blob size, any file as big as the mean will not be compared to any files of median size or less (because they'd be more than 50% dissimilar).
  • Since it is the number of files compared that provides the O(n^2) behavior, median-sized files should matter more than mean-sized ones.

The combined effect of the above is that the file size used in past calculations was likely about 5x too large.
Combine that with a CPU performance improvement of ~30%, and we can increase the limits by a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated time limits.
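The arithmetic behind that factor can be checked quickly. This is only a sketch of the calculation as stated in the commit message (cost growing with the square of the file count, so the limit scales with the square root); the function name scale_factor is made up here.

```shell
# Factor by which a rename limit can grow at constant runtime, given
#   $1: how many times too large the assumed file size was
#   $2: fractional CPU speed-up since the last measurement
scale_factor() {
  awk -v s="$1" -v c="$2" 'BEGIN { printf "%.2f\n", sqrt(s / (1 - c)) }'
}

scale_factor 5 0.3   # prints 2.67
```

Applied to the previous diff.renameLimit default of 400, a factor of 2.67 gives roughly 1069 files.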

Keeping the same approximate time limit probably makes sense for diff.renameLimit (there is no progress feedback in e.g. git log -p), but the experience above suggests merge.renameLimit could be extended significantly.
In fact, it probably would make sense to have an unlimited default setting for merge.renameLimit, but that would likely need to be coupled with changes to how progress is displayed.
(See for details in that area.)
For now, let's just bump the approximate time limit from 10s to 1m.

(Note: We do not want to use actual time limits, because getting results that depend on how loaded your system is that day feels bad, and because we don't discover that we won't get all the renames until after we've put in a lot of work rather than just upfront telling the user there are too many files involved.)

Using the original time limit of 2s for diff.renameLimit, and bumping merge.renameLimit from 10s to 60s, I found the following timings using the simple script at the end of this commit message (on an AWS c5.xlarge which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):

N   Timing
0    1.995s
0   59.973s

So let's round down to nice even numbers and bump the limits from 400->1000, and from 1000->7000.

Here is the measure_rename_perf script (adapted from in particular to avoid triggering the linear handling from basename-guided rename detection):


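#!/bin/bash
# Usage: measure_rename_perf N   (expects a text file named "sample" next to the script)

n=$1; shift

rm -rf repo
mkdir repo && cd repo
git init -q -b main

# mkdata DIR COUNT: create COUNT files in DIR, each a line-prefixed
# variant of ../sample so that no two files are identical
mkdata() {
  mkdir $1
  for i in `seq 1 $2`; do
    (sed "s/^/$i /" <../sample
     echo tag: $1
    ) >$1/$i
  done
}
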
mkdata initial $n
git add .
git commit -q -m initial

mkdata new $n
git add .
cd new
for i in *; do git mv $i $i.renamed; done
cd ..
git rm -q -rf initial
git commit -q -m new

time git diff-tree -M -l0 --summary HEAD^ HEAD

git config now includes in its man page:



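Still with Git 2.33 (Q3 2021), there is also the following update:

See commit 94b82d5, commit 9dd29db, commit 6623a52, commit 05d2c61 (15 Jul 2021) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 268055b, 28 Jul 2021)

doc: clarify documentation for rename/copy limits

Signed-off-by: Elijah Newren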

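  • 9027f53 ("Do linear-time/space rename logic for exact renames", 2007-10-25, Git v1.5.4-rc0 -- merge)
  • bd24aa2 ("diffcore-rename: guide inexact rename detection based on basenames", 2021-02-14, Git v2.31.0-rc1 -- merge)
  • (As a side note, for copy detection the basename-guided inexact rename detection is turned off, and exact renames only cause the sources (not the destinations) to be removed from the set of files used in quadratic detection. So for copy detection the documentation was closer to correct.)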

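    git config now includes in its man page:

    The number of files to consider in the exhaustive portion of copy/rename detection; equivalent to the 'git diff' option -l.

    git config now includes in its man page:


    If not specified, defaults to the value of diff.renameLimit.
    If neither merge.renameLimit nor diff.renameLimit is specified, currently defaults to 1000.

    diff-options now includes in its man page: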
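The fallback order described there can be mimicked with git config lookups. This is a sketch of the documented behaviour, not Git's internal code; the helper name effective_merge_rename_limit is hypothetical, and the built-in default is passed in by the caller because it differs between Git versions.

```shell
# Resolve the rename limit a merge would use, reading one config file:
#   $1: path to a Git config file
#   $2: built-in default to fall back on when neither key is set
effective_merge_rename_limit() {
  git config --file "$1" --get merge.renameLimit ||
    git config --file "$1" --get diff.renameLimit ||
    echo "$2"
}
```

For example, with only diff.renameLimit set in the given file, the function prints that value; once merge.renameLimit is set as well, it takes precedence.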




