In my repository, git diff and git stash both run quickly, finishing in under a second. However, git stash -p takes about 20 seconds before it shows the first hunk. Why is that?
See the commit by newren (7 Jan 2020).
(Merged by Junio C Hamano -- gitster -- in commit a3648c0, 22 Jan 2020)
unpack-trees: exit check_updates() early if updates are not wanted
Signed-off-by: Elijah Newren
check_updates() has a lot of code that repeatedly checks whether o->update or o->dry_run are set.
(Note that o->dry_run is a near-synonym for !o->update, but not quite as per commit 2c9078d05bf2 ("unpack-trees: add the dry_run flag to unpack_trees_options", 2011-05-25, Git v1.7.6-rc0).)
In fact, this function almost turns into a no-op whenever the condition !o->update || o->dry_run is met.
Simplify the code by checking this condition at the beginning of the function, and when it is true, do the few things that are relevant and return early.
There are a few things that make the conversion not quite obvious:
- The fact that check_updates() does not actually turn into a no-op when updates are not wanted may be slightly surprising.
  However, commit 33ecf7eb61 (Discard "deleted" cache entries after using them to update the working tree, 2008-02-07, Git v1.5.5-rc0) put the discarding of unused cache entries in check_updates(), so we still need to keep the call to remove_marked_cache_entries().
  It's possible this call belongs in another function, but it is certainly needed as tests will fail if it is removed.
- The original called remove_scheduled_dirs() unconditionally.
  Technically, commit 7847892716 (unlink_entry(): introduce schedule_dir_for_removal(), 2009-02-09, Git v1.6.3-rc0) should have made that call conditional, but it didn't matter in practice because remove_scheduled_dirs() becomes a no-op when all the calls to unlink_entry() are skipped.
  As such, we do not need to call it.
- When (o->dry_run && o->update), the original would have two calls to git_attr_set_direction() surrounding a bunch of skipped updates.
  These two calls to git_attr_set_direction() cancel each other out and thus can be omitted when o->dry_run is true just as they already are when !o->update.
- The code would previously call setup_collided_checkout_detection() and report_collided_checkout() even when o->dry_run.
  However, this was just an expensive no-op because setup_collided_checkout_detection() merely cleared the CE_MATCHED flag for each cache entry, and report_collided_checkout() reported which ones had it set.
  Since a dry-run would skip all the checkout_entry() calls, CE_MATCHED would never get set and thus no collisions would be reported.
  Since we can't detect the collisions anyway without doing updates, skipping the collisions detection setup and reporting is an optimization.
- The code previously would call get_progress() and display_progress() even when (!o->update || o->dry_run).
  This served to show how long it took to skip all the updates, which is somewhat useless.
  Since we are skipping the updates, we can skip showing how long it takes to skip them.
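The stash -p case invokes the function stash_patch. Otherwise it is stash_working_tree.
Within stash_patch, there are child processes that call other git commands. One of them is read-tree (see: man git-read-tree). The final command looks like this: GIT_INDEX_FILE=index.stash.<PID> git read-tree HEAD. It takes practically no time.
The next one is GIT_INDEX_FILE=index.stash.<PID> git add--interactive --patch=stash -- <PATH>. This is where all the reads come from, and where all the time is spent.
What is interesting: calling just GIT_INDEX_FILE=index.stash.<PID> git status after GIT_INDEX_FILE=index.stash.<PID> git read-tree HEAD is as expensive as calling git add--interactive. In fact, add--interactive is a Perl script that implements add -p. I am not familiar with Perl and find it hard to read, but it probably checks the working directory status and uses the same code as git status.
Internals of git status: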
GIT_INDEX_FILE=.git/index.stash.test git read-tree HEAD
GIT_TRACE_PERFORMANCE=/tmp/trace_status GIT_INDEX_FILE=.git/index.stash.test git st .
20:31:20.439868 read-cache.c:2290 performance: 0.000269090 s: read cache .git/index.stash.test
20:31:20.441368 preload-index.c:147 performance: 0.001419629 s: preload index
20:32:15.568433 read-cache.c:1605 performance: 55.128484420 s: refresh index
20:32:15.568611 diff-lib.c:251 performance: 0.000054503 s: diff-files
20:32:15.568847 unpack-trees.c:1546 performance: 0.000004362 s: traverse_trees
20:32:15.568868 unpack-trees.c:447 performance: 0.000008189 s: check_updates
20:32:15.568874 unpack-trees.c:1643 performance: 0.000040807 s: unpack_trees
20:32:15.568879 diff-lib.c:537 performance: 0.000079322 s: diff-index
20:32:15.569115 name-hash.c:600 performance: 0.000197074 s: initialize name hash
20:32:15.573785 dir.c:2326 performance: 0.004883714 s: read directory
20:32:15.574904 read-cache.c:3017 performance: 0.001083674 s: write index, changed mask = 82
20:32:15.575125 trace.c:475 performance: 55.135763475 s: git command: /usr/lib/git-core/git status .
20:32:15.575421 trace.c:475 performance: 55.136831211 s: git command: git st .
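To see at a glance where the time goes in a trace like the one above, the entries can be ranked by duration. This is only a minimal sketch: it assumes the log has the same layout as the output shown here (a "performance:" token followed by the duration in seconds and an "s:" marker), and the function name slowest_trace_entries is hypothetical, not part of git.

```shell
# slowest_trace_entries FILE [N]
# Print the N slowest entries (default 5) of a GIT_TRACE_PERFORMANCE log
# as "<seconds><TAB><label>", slowest first.
slowest_trace_entries() {
    file=$1
    count=${2:-5}
    # Locate the "performance:" token on each line. The field after it is
    # the duration, then comes the "s:" marker, then the free-form label.
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "performance:") {
                label = ""
                for (j = i + 3; j <= NF; j++)
                    label = label (j > i + 3 ? " " : "") $j
                printf "%s\t%s\n", $(i + 1), label
                break
            }
        }
    }' "$file" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1nr |
        head -n "$count"
}
```

For the trace above, slowest_trace_entries /tmp/trace_status 3 would list the two "git command" totals followed by "refresh index", which accounts for nearly all of the elapsed time.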
My repository looks like this:
>$ du -hd 1
1,1M ./.idea
74M ./code
3,0G ./.git
2,4G ./test-data
5,5G .
Tracing git stash -p directly gives a similar picture:
20:43:55.968088 read-cache.c:1605 performance: 59.716998605 s: refresh index
20:43:55.969584 trace.c:475 performance: 59.719061140 s: git command: git update-index --refresh
The man page for git update-index --refresh says:
USING --REFRESH
--refresh does not calculate a new sha1 file or bring the index up to date for mode/content changes. But what it does do is to "re-match" the stat information of a file with the index, so that you can refresh the index for a
file that hasn’t been changed but where the stat entry is out of date.
For example, you’d want to do this after doing a git read-tree, to link up the stat index details with the proper files.
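The refresh cost can also be measured in isolation, without going through git stash -p at all. The sketch below repeats the steps used in this answer (read-tree into a temporary index, then a refresh of that index) under GIT_TRACE_PERFORMANCE. The function name trace_refresh and the temporary index name are hypothetical, and it assumes a git recent enough to support rev-parse --absolute-git-dir.

```shell
# trace_refresh TRACE_FILE
# Populate a throwaway index from HEAD, then time a refresh of it.
# TRACE_FILE must be an absolute path, otherwise git does not treat
# the value of GIT_TRACE_PERFORMANCE as a file name.
trace_refresh() {
    trace_file=$1
    git_dir=$(git rev-parse --absolute-git-dir) || return 1
    tmp_index="$git_dir/index.stash.test"

    # A freshly read tree has no stat information for the working files yet.
    GIT_INDEX_FILE="$tmp_index" git read-tree HEAD || return 1

    # The refresh has to re-match every entry against the working tree.
    # -q keeps going quietly when entries turn out to need an update.
    GIT_TRACE_PERFORMANCE="$trace_file" GIT_INDEX_FILE="$tmp_index" \
        git update-index -q --refresh
    status=$?

    rm -f "$tmp_index"
    return "$status"
}
```

Run from inside the repository, for example as trace_refresh /tmp/trace_refresh, and then look at the "refresh index" line in the resulting file. The real index is left untouched because only the temporary one is passed through GIT_INDEX_FILE.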
The .git directory is 2GB, and the whole directory including .git is 9GB. Comparing the strace -f output of git stash and git stash -p, the latter makes many more read calls. - Tor Klingberg