Git: clone using shared local storage (with hard links)

Question

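I would like to make it easier for a large number of developers to repeatedly clone a very large remote git repository. Some kind of local, per-user cache is necessary. There are obviously many ways to do this; I am just surprised that what seems to me the most natural approach does not appear to exist in git.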

Is there an industry-standard practice for this? Is there a git option I have misunderstood?

Ideal solution

#first clone - very slow.
git clone ssh://remote.repo/repo.git repo1
#subsequent clones - lightning fast
git clone --shared-with-hard-links repo1 ssh://remote.repo/repo.git repo2

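In this fictional solution, no .git/objects/info/alternates is created; objects are simply shared between the clones through hard links, just as with rsync's --link-dest option, or as git clone does when the source repo is on the local filesystem.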
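The local-clone behaviour that this fictional option borrows can be checked directly. The following is a minimal sketch, assuming a POSIX shell and a throwaway repository under a mktemp directory; the directory names, the file contents and the demo identity are made up for illustration. It clones from a local path and compares the inode of one object file in both repositories.

```shell
#!/bin/sh
# Sketch: does a clone from a local path share object files via hard links?
set -e
work=$(mktemp -d)                      # scratch area (hypothetical location)
git init -q "$work/repo1"
echo hello > "$work/repo1/file.txt"
git -C "$work/repo1" add file.txt
git -C "$work/repo1" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"

# Cloning from a plain path (not a URL) lets git link the object files.
git clone -q "$work/repo1" "$work/repo2"

# Locate the loose object that stores the HEAD commit of repo1.
oid=$(git -C "$work/repo1" rev-parse HEAD)
d=$(printf %s "$oid" | cut -c1-2)
f=$(printf %s "$oid" | cut -c3-)

# Equal inode numbers mean both paths are the same file on disk.
ino1=$(ls -i "$work/repo1/.git/objects/$d/$f" | awk '{print $1}')
ino2=$(ls -i "$work/repo2/.git/objects/$d/$f" | awk '{print $1}')
if [ "$ino1" = "$ino2" ]; then
    echo "object $oid is hard-linked"
else
    echo "object $oid was copied (hard links not available here)"
fi

# A hard-linked clone does not use an alternates file.
if [ -e "$work/repo2/.git/objects/info/alternates" ]; then
    alternates=yes
else
    alternates=no
fi
echo "alternates file present: $alternates"
```

If the two directories are on different filesystems, git falls back to copying, so the script reports a copy rather than failing.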

The alternatives I see are all less than ideal:

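I can do git clone --reference repo1 ssh://remote.repo/repo.git repo2, but that depends on repo1 continuing to exist; if repo1 is deleted, repo2 becomes unusable.
I can do git clone --dissociate --reference repo1 ssh://remote.repo/repo.git repo2, but then the storage is not shared, so I am now using twice the disk space, and it is probably still relatively slow.
There are various hacks that need wrappers around clone and pull. The complexity is obviously trivial compared with real programming, but running your SCM under a pile of wrappers really just ends up being a hassle and should be avoided where possible.
- Keep a git "cache" repository in a central location on each developer's machine, with a wrapper around clone that automatically fetches into it and then uses clone --reference <cache>.
- Remember every clone; subsequent clones look for an existing local clone, clone locally from it (creating hard links), and then fix things up. Roughly, something like this: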

#find any existing clones... repo1
git clone /path/to/repo1 repo2
git remote rm origin
git remote add origin ssh://remote.repo/repo.git
git fetch
#Abandon any local changes made in the other workspace
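# $gitdir and run_cmd come from the surrounding wrapper script (not shown)
for ref in $(git --git-dir "$gitdir" for-each-ref refs/heads --format "%(refname)") ; do
    refbase=${ref#refs/heads/}   # keeps nested branch names such as feature/x intact
    run_cmd git --git-dir "$gitdir" update-ref "$ref" "refs/remotes/origin/$refbase"
done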

But all of this feels like a hack; surely there is a better way?

Thanks,
Mort

Notes:

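We do actually have a local LAN mirror. Because the repository is so large, we need something better than just that to get reasonable clone speeds.
The repo is large. A clone takes 11 minutes over GigE, and up to 40 minutes on Windows.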

Update

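The best I have come up with is to keep a cache in /var/cache/git/<repo_name>.git that is a clone --mirror of the central repository. New clones use the --shared option, which both reduces the space and time needed for the initial clone and speeds up subsequent fetches. A wrapper script clones a new workspace by doing this: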

git --git-dir /var/cache/git/<repo_name>.git remote update
git clone --shared /var/cache/git/<repo_name>.git
cd <repo_name>    # the clone lands in ./<repo_name>; set-url must run inside it
git remote set-url origin ssh://remote.repo/repo.git
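The steps above could be wrapped in a small shell function. This is only a sketch: the function name cached_clone and its argument order are hypothetical, and it assumes the cache is created with clone --mirror on first use, as described in the update.

```shell
#!/bin/sh
# cached_clone REMOTE_URL CACHE_DIR DEST   (hypothetical helper, not a git command)
# Refreshes a per-machine mirror, then makes a --shared clone that borrows its objects.
cached_clone() {
    cc_remote=$1
    cc_cache=$2
    cc_dest=$3

    if [ -d "$cc_cache" ]; then
        # The cache already exists: bring it up to date with the central repository.
        git --git-dir "$cc_cache" remote update >/dev/null || return 1
    else
        # First use on this machine: the slow, full download happens once here.
        git clone -q --mirror "$cc_remote" "$cc_cache" || return 1
    fi

    # --shared writes .git/objects/info/alternates in the clone instead of copying objects.
    git clone -q --shared "$cc_cache" "$cc_dest" || return 1

    # Point origin at the real remote so later fetches and pushes bypass the cache.
    git -C "$cc_dest" remote set-url origin "$cc_remote"
}
```

A call might look like cached_clone ssh://remote.repo/repo.git /var/cache/git/repo.git repo2. The workspace still depends on the cache: if objects are deleted from the cache, clones that borrow them can break.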

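I was hoping for something that relies on hard links, since those are not affected by objects being deleted from the shared cache. But I guess nothing like that exists.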

- Mort

Interesting reading: https://github.com/git/git/commit/908700c0082487f9c859b951370148ff7e8acb97 (for the upcoming Git 2.7) - VonC

1 Answer

- Matthieu Moy · Accepted Answer

By default, Git uses hard links when you clone a local repository. So you can...

git clone /path/to/repo /path/to/clone
cd /path/to/clone
git remote add upstream http://example.com/path/to/repo/to/clone
git fetch upstream

But this has a number of drawbacks:

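The next git gc will break the hard links and eat up your disk space.
It only works if /path/to/repo and /path/to/clone are on the same partition.
You have to be careful with the tools you use on the result; for example, rsync without the -H option will copy every hard-linked file in full.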
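The first drawback can be observed on a toy repository. This is a rough sketch, assuming a POSIX shell, a filesystem that supports hard links, and a scratch directory from mktemp; the helper count_shared and all paths are invented for the demonstration. It counts object files with more than one link in the clone before and after git gc.

```shell
#!/bin/sh
# Sketch: git gc rewrites objects into a fresh pack, which is no longer hard-linked.
set -e
work=$(mktemp -d)                      # scratch area (hypothetical location)
git init -q "$work/repo1"
echo hello > "$work/repo1/file.txt"
git -C "$work/repo1" add file.txt
git -C "$work/repo1" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"
git clone -q "$work/repo1" "$work/repo2"

# count_shared REPO: number of object files that have more than one hard link
count_shared() {
    find "$1/.git/objects" -type f -links +1 | wc -l | tr -d ' '
}

before=$(count_shared "$work/repo2")
git -C "$work/repo2" gc -q             # packs loose objects into a new pack file
after=$(count_shared "$work/repo2")

echo "shared object files before gc: $before"
echo "shared object files after gc:  $after"
```

The new pack is a separate file, so from that point on the clone pays for its own copy of the history.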

I think .git/objects/info/alternates is the better choice in most cases.
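For comparison, here is a hedged sketch of the alternates route using clone --reference, followed by the manual steps that make the clone independent again if the reference repository has to go away. The scratch paths and the demo identity are assumptions, and a file:// URL stands in for the real remote.

```shell
#!/bin/sh
# Sketch: borrow objects through alternates, then detach from the reference later.
set -e
work=$(mktemp -d)                      # scratch area (hypothetical location)
git init -q "$work/repo1"
echo hello > "$work/repo1/file.txt"
git -C "$work/repo1" add file.txt
git -C "$work/repo1" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"

# file:// forces the normal transport; --reference borrows objects from repo1.
git clone -q --reference "$work/repo1" "file://$work/repo1" "$work/repo2"

alt_file="$work/repo2/.git/objects/info/alternates"
borrowed=$(cat "$alt_file")            # object directory that repo2 depends on
echo "repo2 borrows objects from: $borrowed"

# Before deleting repo1: copy the borrowed objects into repo2's own pack,
# then drop the alternates file so repo2 stands alone.
git -C "$work/repo2" repack -a -d -q
rm "$alt_file"
git -C "$work/repo2" fsck
```

Until the repack and rm steps are run, deleting repo1 would leave repo2 with missing objects, which is the dependency described in the question.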