Why does git stash -p take a long time to start?

8

In my repository, git diff and git stash both run quickly, finishing in under a second. However, git stash -p takes about 20 seconds before it shows the first hunk. Why is that?


2
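I cannot reproduce this. You should tell us the Git version, the operating system and its version, the size of the repository, and how many changes there are when you stash. If you are on Linux, look at the output of "strace -f git stash -p" and see what it is doing. - John Zwinck
@JohnZwinck: git 2.7.4, Ubuntu 16.04.4. With no changes it returns immediately, but with even one tiny change it takes more than 20 seconds. The .git directory is 2 GB, and the whole directory including .git is 9 GB. Comparing git stash and git stash -p under strace -f, the latter makes many more read calls. - Tor Klingberg
Could you share the list of files you are stashing with -p? - Vinay Prajapati
Without actual logs, including how long the commands take, the git version, and system details, it is impossible to tell why this is slow on your machine. Please add more details. - Vinay Prajapati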
2 Answers

2
This should improve with Git 2.25.2 (March 2020), which adds a code simplification.
See the discussion.
See commit 26f924d (07 Jan 2020) by Elijah Newren (newren). (Merged by Junio C Hamano -- gitster -- in commit a3648c0, 22 Jan 2020)

unpack-trees: exit check_updates() early if updates are not wanted

Signed-off-by: Elijah Newren

check_updates() has a lot of code that repeatedly checks whether o->update or o->dry_run are set.

(Note that o->dry_run is a near-synonym for !o->update, but not quite as per commit 2c9078d05bf2 ("unpack-trees: add the dry_run flag to unpack_trees_options", 2011-05-25, Git v1.7.6-rc0).)
In fact, this function almost turns into a no-op whenever the condition

!o->update || o->dry_run

is met.

Simplify the code by checking this condition at the beginning of the function, and when it is true, do the few things that are relevant and return early.

There are a few things that make the conversion not quite obvious:

  • The fact that check_updates() does not actually turn into a no-op when updates are not wanted may be slightly surprising.
    However, commit 33ecf7eb61 (Discard "deleted" cache entries after using them to update the working tree, 2008-02-07, Git v1.5.5-rc0) put the discarding of unused cache entries in check_updates() so we still need to keep the call to remove_marked_cache_entries().
    It's possible this call belongs in another function, but it is certainly needed as tests will fail if it is removed.
  • The original called remove_scheduled_dirs() unconditionally.
    Technically, commit 7847892716 (unlink_entry(): introduce schedule_dir_for_removal(), 2009-02-09, Git v1.6.3-rc0) should have made that call conditional, but it didn't matter in practice because remove_scheduled_dirs() becomes a no-op when all the calls to unlink_entry() are skipped.
    As such, we do not need to call it.
  • When (o->dry_run && o->update), the original would have two calls to git_attr_set_direction() surrounding a bunch of skipped updates.
    These two calls to git_attr_set_direction() cancel each other out and thus can be omitted when o->dry_run is true just as they already are when !o->update.
  • The code would previously call setup_collided_checkout_detection() and report_collided_checkout() even when o->dry_run.
    However, this was just an expensive no-op because setup_collided_checkout_detection() merely cleared the CE_MATCHED flag for each cache entry, and report_collided_checkout() reported which ones had it set.
    Since a dry-run would skip all the checkout_entry() calls, CE_MATCHED would never get set and thus no collisions would be reported.
    Since we can't detect the collisions anyway without doing updates, skipping the collisions detection setup and reporting is an optimization.
  • The code previously would call get_progress() and display_progress() even when (!o->update || o->dry_run).
    This served to show how long it took to skip all the updates, which is somewhat useless.
    Since we are skipping the updates, we can skip showing how long it takes to skip them.

1
After upgrading, the time dropped from 6 seconds to 4 seconds. Not spectacular, but better than nothing. - Kyle Heironimus

1
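I have noticed the same problem. It started at least a year ago and has not improved since. I also use git on a very large repo. Unfortunately, in my case it also contains a lot of binary data, because it merely mirrors an SVN repo using git_svn, and my colleagues thought that putting binary test data into the repo was a good idea.
This is not an answer, only hints and guesses about where to look.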
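The biggest difference seems to be that stash -p calls the function stash_patch, whereas a plain stash calls stash_working_tree.
Inside stash_patch, subprocesses invoke other git commands. One of them is read-tree (see: man git-read-tree). The final command looks like this: GIT_INDEX_FILE=index.stash.<PID> git read-tree HEAD. This takes practically no time.
The next step is another subprocess invocation: GIT_INDEX_FILE=index.stash.<PID> git add--interactive --patch=stash -- <PATH>. This is where all the reads come from, and where all the time is spent. Interestingly, merely running GIT_INDEX_FILE=index.stash.<PID> git status after GIT_INDEX_FILE=index.stash.<PID> git read-tree HEAD is just as expensive as running git add--interactive. add--interactive is actually a Perl script that implements add -p. I don't know Perl and find it hard to read, but it probably checks the working directory state using the same code as git status.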
The basic idea seems to be:
  • Create a temporary index from HEAD
  • Interactively add changes to that index
  • Save the modified temporary index as a tree
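The three steps above can be sketched non-interactively in a throwaway repository. This is only an illustration under stated assumptions, not what git stash does internally line for line: the index file name index.stash.demo, the file a.txt and the commit identity are made up, and a plain git add stands in for the interactive git add--interactive --patch=stash step.

```shell
#!/bin/sh
set -e

# Throwaway repository with one commit and one uncommitted change.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo one > a.txt
git add a.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m init
echo two >> a.txt

# Step 1: create a temporary index from HEAD (hypothetical file name).
GIT_INDEX_FILE="$repo/.git/index.stash.demo"
export GIT_INDEX_FILE
git read-tree HEAD
head_tree=$(git rev-parse 'HEAD^{tree}')
tree_before=$(git write-tree)

# Step 2: add changes to the temporary index.
# Assumption: plain "git add" replaces the interactive hunk selection.
git add a.txt

# Step 3: save the modified temporary index as a tree object.
tree_after=$(git write-tree)

echo "HEAD tree:          $head_tree"
echo "tree before adding: $tree_before"
echo "tree after adding:  $tree_after"
```

Because GIT_INDEX_FILE points at the temporary index, the repository's regular index is not modified by any of these commands.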
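The expensive part seems to be obtaining the state of the working directory relative to the temporary index. Why it is so expensive I don't know. Possibly some cached data is invalid and git has to read all files in the working copy, at least to some extent, to compare them with the temporary index; understanding this would require digging into the internals of git status.
I tried to measure it like this: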

GIT_INDEX_FILE=.git/index.stash.test git read-tree HEAD
GIT_TRACE_PERFORMANCE=/tmp/trace_status GIT_INDEX_FILE=.git/index.stash.test git st .

The result looks like this:

20:31:20.439868 read-cache.c:2290       performance: 0.000269090 s:  read cache .git/index.stash.test
20:31:20.441368 preload-index.c:147     performance: 0.001419629 s:   preload index
20:32:15.568433 read-cache.c:1605       performance: 55.128484420 s:  refresh index
20:32:15.568611 diff-lib.c:251          performance: 0.000054503 s:  diff-files
20:32:15.568847 unpack-trees.c:1546     performance: 0.000004362 s:    traverse_trees
20:32:15.568868 unpack-trees.c:447      performance: 0.000008189 s:    check_updates
20:32:15.568874 unpack-trees.c:1643     performance: 0.000040807 s:   unpack_trees
20:32:15.568879 diff-lib.c:537          performance: 0.000079322 s:  diff-index
20:32:15.569115 name-hash.c:600         performance: 0.000197074 s:   initialize name hash
20:32:15.573785 dir.c:2326              performance: 0.004883714 s:  read directory 
20:32:15.574904 read-cache.c:3017       performance: 0.001083674 s:  write index, changed mask = 82
20:32:15.575125 trace.c:475             performance: 55.135763475 s: git command: /usr/lib/git-core/git status .
20:32:15.575421 trace.c:475             performance: 55.136831211 s: git command: git st .
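To see at a glance which phase dominates a trace like this, the lines can be sorted by duration. The following is a small sketch that assumes the line layout shown above (timestamp, source location, "performance:", seconds, "s:", label); the sample lines are copied from the trace, and the temporary file is only there to make the snippet self-contained.

```shell
#!/bin/sh
# Sample lines copied from the trace above; in practice, point awk at
# the file named in GIT_TRACE_PERFORMANCE (here /tmp/trace_status).
trace=$(mktemp)
cat > "$trace" <<'EOF'
20:31:20.439868 read-cache.c:2290       performance: 0.000269090 s:  read cache .git/index.stash.test
20:31:20.441368 preload-index.c:147     performance: 0.001419629 s:   preload index
20:32:15.568433 read-cache.c:1605       performance: 55.128484420 s:  refresh index
20:32:15.568611 diff-lib.c:251          performance: 0.000054503 s:  diff-files
20:32:15.573785 dir.c:2326              performance: 0.004883714 s:  read directory
EOF

# Print "<seconds><TAB><label>" per line, slowest first.
# LC_ALL=C keeps "." as the decimal separator for sort -n.
summary=$(awk '$3 == "performance:" {
    t = $4
    $1 = $2 = $3 = $4 = $5 = ""   # drop the leading columns
    sub(/^ +/, "")                # trim the spaces left behind
    print t "\t" $0
}' "$trace" | LC_ALL=C sort -rn)

printf '%s\n' "$summary" | head -n 3
rm -f "$trace"
```

On a complete trace the enclosing "git command" lines will sort to the top as well, since their durations include all nested phases.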

My repository looks like this:

>$ du -hd 1
1,1M    ./.idea
74M     ./code
3,0G    ./.git
2,4G    ./test-data
5,5G    .

When the same tracing is applied directly to git stash -p, the picture is similar:

20:43:55.968088 read-cache.c:1605       performance: 59.716998605 s:  refresh index
20:43:55.969584 trace.c:475             performance: 59.719061140 s: git command: git update-index --refresh

The man page for git update-index --refresh says:

USING --REFRESH
       --refresh does not calculate a new sha1 file or bring the index up to date for mode/content changes. But what it does do is to "re-match" the stat information of a file with the index, so that you can refresh the index for a
       file that hasn’t been changed but where the stat entry is out of date.

       For example, you’d want to do this after doing a git read-tree, to link up the stat index details with the proper files.
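The behaviour described in this excerpt can be observed in a throwaway repository. This is a sketch under assumptions: the index file name index.demo, the file name and the commit identity are made up, and git diff-files is used here as a way to show which entries look changed by stat information alone. A plausible reading, not confirmed by the source, is that an index freshly written by read-tree carries no stat data, so the refresh has to read each tracked file to verify it, which would be slow with gigabytes of test data.

```shell
#!/bin/sh
set -e

# Throwaway repository with a single committed, unmodified file.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo hello > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Build a fresh index from HEAD (hypothetical file name).
GIT_INDEX_FILE="$repo/.git/index.demo"
export GIT_INDEX_FILE
git read-tree HEAD

# Before the refresh, the unmodified file is listed because its stat
# entry in the new index does not match the file on disk.
before=$(git diff-files --name-only)

# Re-match the stat information with the files in the working tree.
git update-index --refresh

# After the refresh, nothing is listed any more.
after=$(git diff-files --name-only)

echo "before refresh: '$before'"
echo "after refresh:  '$after'"
```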
