How to split a Git repository and follow directory renames?

31

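I currently have one huge git repository containing many projects, each in its own subdirectory. I need to split it into separate repositories, one per project.

I tried git filter-branch --prune-empty --subdirectory-filter PROJECT master

However, many of the project directories were renamed several times over their lifetime, and git filter-branch does not follow renames, so the extracted repository effectively has no history before the most recent rename.

How can I efficiently extract one subdirectory from a huge git repository and follow all of that directory's renames back into the past?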


2
It would be good to see a solution using git-filter-repo, since it is now recommended as the replacement for git-filter-branch. - Martin Thøgersen
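For reference, a minimal git-filter-repo sketch along the lines this comment asks for might look like the following. It is an assumption-laden outline, not a tested recipe from this thread: NAME1, NAME2 and NAME3 are placeholders for the folder's past names, `extract_project` is a hypothetical wrapper name, and git-filter-repo must be installed separately. `--path` keeps only the listed folders, and each `--path-rename OLD:` entry with an empty right-hand side moves that folder's contents to the repository root.

```shell
#!/bin/sh
# Sketch only: NAME1, NAME2 and NAME3 are placeholders for the folder's past names.
# Assumes git-filter-repo is installed and that this runs inside a fresh clone.
extract_project() {
    git filter-repo \
        --path NAME1/ --path NAME2/ --path NAME3/ \
        --path-rename NAME1/: \
        --path-rename NAME2/: \
        --path-rename NAME3/:
}
```

The function is only defined here, not called. git-filter-repo refuses to rewrite a repository that is not a fresh clone unless forced, so the safest way to try this is on a throwaway clone.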
3 Answers

19
Thanks to @Chronial, I was able to write a script that processes my git repository the way I need:
git filter-branch --prune-empty --index-filter '
    # Delete files which are NOT needed
    git ls-files -z | egrep -zv "^(NAME1|NAME2|NAME3)/" |
        xargs -0 -r git rm --cached -q
    # Move files to root directory
    git ls-files -s | sed -e "s-\t\(NAME1\|NAME2\|NAME3\)/-\t-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
        git update-index --index-info &&
        ( test ! -f "$GIT_INDEX_FILE.new" \
            || mv -f "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE" )
'

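Basically, it does the following:
  1. Deletes all files outside the three directories NAME1, NAME2 and NAME3 (one project was renamed NAME1 -> NAME2 -> NAME3 over its lifetime).

  2. Moves everything inside those three directories to the root of the repository.

  3. I need to test whether "$GIT_INDEX_FILE.new" exists, because importing from svn into git creates commits that contain no files (only directories). This is only needed if the repository was originally created with 'git svn clone'.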
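To see what the path-rewriting step does, here is a minimal sketch that applies the same kind of substitution to one sample line of `git ls-files -s` output. The function name `strip_project_prefix` and the sample entry (with a made-up object ID) are illustrative assumptions. It uses `sed -E` with a literal tab rather than the GNU-only `\t` and `\|`, so it is a portable variant of the command in the answer, not a copy of it.

```shell
#!/bin/sh
# Hypothetical helper: strip a leading NAME1/, NAME2/ or NAME3/ from the path
# column of `git ls-files -s` output ("<mode> <object> <stage><TAB><path>").
strip_project_prefix() {
    tab=$(printf '\t')
    # Only a folder name directly after the tab (the start of the path) matches.
    sed -E "s-${tab}(NAME1|NAME2|NAME3)/-${tab}-"
}

# Sample index entry with a made-up object ID, not taken from a real repository.
printf '100644 1111111111111111111111111111111111111111 0\tNAME1/src/main.c\n' |
    strip_project_prefix
```

The rewritten lines are what `git update-index --index-info` reads to build the new index. Entries whose path does not start with one of the three names pass through unchanged.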


2
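Just to add to this amazing answer from my own experience: anyone on a Mac needs to install GNU grep, sed and findutils with homebrew, and replace egrep with gegrep, xargs with gxargs, and sed with gsed accordingly. - thomasmichaelwallace
While this script works on root directories called NAME1, NAME2 or NAME3, it also works on subdirectories called NAME1, NAME2 or NAME3, pulling those up to the root of the repository. However, it will not work correctly if NAME1, NAME2 or NAME3 are directories whose names contain spaces. - Mort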

8

I had a very large repository and needed to extract a single folder from it; even with --index-filter the run was estimated to take 8 hours. Here is what I did instead:

  1. Obtain a list of all the past names of the folder. In my case there were only two, old-name and new-name.
  2. For each name:

    $ git checkout master
    $ git checkout -b filter-old-name
    $ git filter-branch --subdirectory-filter old-name
    

    This will give you several disconnected branches, each containing history for one of the names.

  3. The filter-old-name branch should end with the commit which renamed the folder, and the filter-new-name branch should begin with the same commit. (The same applies if there was more than one rename: you'll wind up with an equivalent number of branches, each with a commit shared with the next one along.) One should delete everything and the other should recreate it again. Make sure that these two commits have identical contents; if they don't, the file was modified in addition to being renamed, and you will need to merge the changes. (In my case I didn't have this problem so I don't know how to solve it.)

    An easy way to check this is to try rebasing filter-new-name on top of filter-old-name and then squashing the two commits together: git should complain that this produces an empty commit. (Note that you will want to do this on a spare branch and then delete it: rebasing deletes the Committer information from the commits, thus losing some of the history you want to keep.)

  4. The next step is to graft the two branches together, skipping the two commits which renamed the folder. (Otherwise there will be a weird jump where everything is deleted and recreated.) This involves finding the full SHA (all 40 characters!) of the two commits and putting them into git's info, with the new name branch's commit first, and the old name branch's commit second.

    $ echo $NEW_NAME_SECOND_COMMIT_SHA1 $OLD_NAME_PENULTIMATE_COMMIT_SHA1 >> .git/info/grafts
    

    If you've done this right, git log --graph should now show a line from the end of the new history to the start of the old history.

  5. This graft is currently temporary: it is not yet part of the history, and won't follow along with clones or pushes. To make it permanent:

    $ git filter-branch
    

    This will refilter the branch without trying to make any further changes, making the graft permanent (changing all of the commits in the filter-new-name branch). You should now be able to delete the .git/info/grafts file.

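Once all of this is done, you should have the complete history for both of the folder's names on the filter-new-name branch. You can then use it as a separate repository, merge it into another one, or do whatever you like with that history.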
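Because step 4 depends on full 40-character SHAs, a small guard like the one below could catch an abbreviated or mistyped hash before it reaches the grafts file. This is only a sketch: `is_full_sha1` and `graft_line` are hypothetical helper names, not git commands.

```shell
#!/bin/sh
# Hypothetical helpers: print one graft line "<child> <parent>" only if both
# arguments are full 40-character hexadecimal SHA-1 names.
is_full_sha1() {
    case "$1" in
        *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 40 ]
}

graft_line() {
    if is_full_sha1 "$1" && is_full_sha1 "$2"; then
        printf '%s %s\n' "$1" "$2"
    else
        echo "graft_line: both arguments must be full 40-character SHA-1s" >&2
        return 1
    fi
}
```

Inside the repository it could be used as `graft_line "$NEW_NAME_SECOND_COMMIT_SHA1" "$OLD_NAME_PENULTIMATE_COMMIT_SHA1" >> .git/info/grafts`. Note that recent Git versions deprecate the grafts file in favor of `git replace --graft <commit> <parent>`, which records the same child and parent pair.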

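This approach is much faster than the "per-commit" filters. However, I ran into problems when joining the two branches: I got some conflicts that I really cannot explain. - Luis Ayuso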

6
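I don't think git has anything built in for this. You will need to build your own filter. Just use git filter-branch --prune-empty --tree-filter YOURSCRIPT. Your script then has to identify the correct folder (perhaps by the name of a specific file inside it, or you may have a list of all the names the project has had), delete everything else, and move the folder's contents up one level.
If your repo is very large and you don't have time to run such a script, you can get the same result faster with --index-filter, but that script is more complicated to write: you have to use git commands that modify the index instead of commands that modify the file system.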
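As a rough illustration of what YOURSCRIPT could look like for --tree-filter, the sketch below works purely on the file system of the checked-out tree. It assumes you already know the project's past folder names and pass them as arguments. The function name `hoist_project_dir` and the temporary folder name are hypothetical choices, and it keeps only the first listed name that exists in a given commit.

```shell
#!/bin/sh
# Hypothetical tree-filter body: keep only the project folder and move its
# contents to the top level of the current directory (the checked-out tree).
# Arguments: the project's past folder names, e.g. NAME1 NAME2 NAME3.
hoist_project_dir() {
    keep=
    for name in "$@"; do
        if [ -d "$name" ]; then
            keep=$name
            break
        fi
    done

    # Park the folder under a temporary name so its contents cannot clash
    # with top-level entries that are about to be deleted.
    tmp=.hoist-tmp.$$
    if [ -n "$keep" ]; then
        mv -- "$keep" "$tmp"
    fi

    # Remove every other top-level entry, dotfiles included.
    for entry in * .[!.]* ..?*; do
        [ -e "$entry" ] || [ -L "$entry" ] || continue   # unmatched glob
        case $entry in
            "$tmp"|.git) continue ;;                     # never touch these
        esac
        rm -rf -- "$entry"
    done

    # Commits without any of the folders end up empty.
    [ -n "$keep" ] || return 0

    # Move the folder's contents up one level, then drop the empty folder.
    for entry in "$tmp"/* "$tmp"/.[!.]* "$tmp"/..?*; do
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        mv -- "$entry" .
    done
    rmdir -- "$tmp"
}
```

Assuming the function is saved in a file at a hypothetical absolute path such as /path/to/hoist.sh, it could be run as git filter-branch --prune-empty --tree-filter '. /path/to/hoist.sh; hoist_project_dir NAME3 NAME2 NAME1' master. Try it on a throwaway clone first, since it deletes files in the current directory.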
