Truncating Git commit history with LibGit2Sharp

3

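I plan to use LibGit2/LibGit2Sharp and Git in an unorthodox way, so I would like someone familiar with the API to confirm whether what I propose can work in theory. :)

Scenario

Only a master branch will exist in the repository. A large number of directories containing large binary and non-binary files will be tracked and committed. Most of the binary files will change between commits. Because of disk space limitations (the disk fills up frequently), the repository should hold no more than 10 commits.

The API does not provide a function that truncates the commit history from a specified CommitId back to the initial commit of the master branch and deletes any Git objects left dangling as a result.

I have tested the ReferenceCollection.RewriteHistory method and can use it to remove the parents from a commit. This gives me a new commit history that runs from CommitId to HEAD. However, it still leaves all of the old commits, along with any references or blobs that are unique to those commits. My plan right now is simply to clean up these dangling Git objects myself. Does anyone see a problem with this approach, or have a better one?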

2 Answers

3
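"But that still leaves all of the old commits and any references or blobs that are unique to those commits. My plan right now is to simply clean up these dangling Git objects myself."
When rewriting the history of a repository, LibGit2Sharp takes care not to discard the rewritten references. By default they are stored under the refs/original namespace. This can be changed through the RewriteHistoryOptions parameter.
To get rid of the old commits, trees and blobs, you first have to remove those references. This can be done with the following code: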
foreach (var reference in repo.Refs.FromGlob("refs/original/*"))
{
    repo.Refs.Remove(reference);
}

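The next step is to purge the Git objects that are now dangling. However, this cannot be done through LibGit2Sharp (not yet supported). One option is to shell out to git with the following command: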
git gc --aggressive

This will reduce the size of the repository in a very effective/destructive/non-recoverable way.
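As a rough sketch of what "shelling out to git" could look like from C#, the helper below starts the git executable with System.Diagnostics.Process. The class and member names (GitCli, Run, PruneUnreachable, PruneCommands) are made up for this illustration, and it assumes git is on the PATH. Only gc --aggressive comes from the answer above. The reflog expire step and the --prune=now flag are additions on my part: by default git gc keeps unreachable loose objects that are younger than two weeks, and commits still referenced from a reflog are not treated as unreachable, so without them the space may not be reclaimed right away.

```csharp
using System;
using System.Diagnostics;

public static class GitCli
{
    // Arguments passed to `git`, in order. Only `gc --aggressive` comes from the
    // answer above; the reflog step and --prune=now are assumptions added here.
    public static readonly string[] PruneCommands =
    {
        "reflog expire --expire=now --all",
        "gc --prune=now --aggressive"
    };

    // Runs `git <arguments>` in workingDirectory and returns the process exit code.
    // Output is not redirected, so it goes to the parent's console if there is one.
    public static int Run(string workingDirectory, string arguments)
    {
        var startInfo = new ProcessStartInfo("git", arguments)
        {
            WorkingDirectory = workingDirectory,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    // Runs each prune command in turn and throws on the first one that fails.
    public static void PruneUnreachable(string repositoryPath)
    {
        foreach (string arguments in PruneCommands)
        {
            int exitCode = Run(repositoryPath, arguments);
            if (exitCode != 0)
            {
                throw new InvalidOperationException(
                    "git " + arguments + " failed with exit code " + exitCode);
            }
        }
    }
}
```

Calling GitCli.PruneUnreachable(repositoryPath) after the refs/original references have been removed would then run both commands. Make sure any Repository instance for the same path has been disposed first, and remember that the result cannot be undone.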

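"Does anyone see a problem with this approach, or have a better one?"

Your approach looks valid.

Update

"Does anyone see a problem with this approach, or have a better one?"

If the limit is disk size, another option would be to use a tool such as git-annex or git-bin to store the large binary files outside of the Git repository. See this SO question for different views on the topic and potential drawbacks (deployment, lock-in, etc.).

"I will try the RewriteHistoryOptions and foreach code that you provided. However, for now it looks like File.Delete for the dangling Git objects."

Beware, this may be a bumpy road.

  • Git stores objects in two formats: loose (each object is saved as one file on disk) or packed (one entry on disk contains many objects). Removing objects from a pack file tends to be somewhat complex, because it requires rewriting the pack file.
  • On Windows, entries in the .git\objects folder are usually read-only files. File.Delete cannot remove them in that state. You first have to clear the read-only attribute, for example with a call to File.SetAttributes(path, FileAttributes.Normal);
  • Although you may be able to identify which commits have been rewritten, determining which trees and blobs are dangling/unreachable may turn into quite a complex task.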
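The first two points can be illustrated with a small sketch that uses only System.IO. The class and method names (LooseObjects, PathFor, TryDelete, HasPackFiles) are hypothetical, and the code assumes a full 40-character object id and a gitDir argument that points at the .git folder. It relies on two facts about Git's on-disk layout: a loose object is stored under objects/ in a folder named after the first two characters of its id, and pack files live in objects/pack with a .pack extension.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LooseObjects
{
    // A loose object with id "ab12..." is stored at <gitDir>/objects/ab/12...
    public static string PathFor(string gitDir, string sha)
    {
        return Path.Combine(gitDir, "objects", sha.Substring(0, 2), sha.Substring(2));
    }

    // Deletes the loose object file for sha. Returns false when no such file
    // exists, which can mean the object is stored in a pack file or is absent.
    public static bool TryDelete(string gitDir, string sha)
    {
        string path = PathFor(gitDir, sha);
        if (!File.Exists(path))
        {
            return false;
        }
        // Clear the read-only attribute first, otherwise File.Delete fails on Windows.
        File.SetAttributes(path, FileAttributes.Normal);
        File.Delete(path);
        return true;
    }

    // True when the repository contains at least one pack file. Objects stored
    // there cannot be removed by deleting individual files.
    public static bool HasPackFiles(string gitDir)
    {
        string packDir = Path.Combine(gitDir, "objects", "pack");
        return Directory.Exists(packDir)
            && Directory.EnumerateFiles(packDir, "*.pack").Any();
    }
}
```

Checking HasPackFiles before attempting any manual clean-up is one way to notice that plain file deletion will not be enough. The sketch does nothing about the third point: it does not work out which objects are actually unreachable.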

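Does LibGit2Sharp store files in the packed format? I can see that it accepts packed files in a repository, but I don't see any particular API that would cause packing to occur. If I have to deal with packed files, my job becomes much harder. - user3092651
Also, if files are being packed, can I prevent that behavior? - user3092651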

0
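Based on the suggestions above, here is the preliminary (still being tested) C# code that I came up with. It truncates the master branch at a specific SHA, creating a new initial commit. It also removes all dangling references and blobs.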
using System;
using System.Collections.Generic;
using System.IO;
using LibGit2Sharp;

public class RepositoryUtility
{
    public RepositoryUtility()
    {
    }
    public String[] GetPaths(Commit commit)
    {
        List<String> paths = new List<string>();
        RecursivelyGetPaths(paths, commit.Tree);
        return paths.ToArray();
    }
    private void RecursivelyGetPaths(List<String> paths, Tree tree)
    {
        foreach (TreeEntry te in tree)
        {
            paths.Add(te.Path);
            if (te.TargetType == TreeEntryTargetType.Tree)
            {
                RecursivelyGetPaths(paths, te.Target as Tree);
            }
        }
    }
    public void TruncateCommits(String repositoryPath, Int32 maximumCommitCount)
    {
        //dispose the repository when done so that its native handles are released
        using (IRepository repository = new Repository(repositoryPath))
        {
            Int32 count = 0;
            string newInitialCommitSHA = null;
            foreach (Commit masterCommit in repository.Head.Commits)
            {
                count++;
                if (count == maximumCommitCount)
                {
                    newInitialCommitSHA = masterCommit.Sha;
                }
            }
            //there must be parent commits to the commit we want to set as the new initial commit
            if (count > maximumCommitCount)
            {
                TruncateCommits(repository, repositoryPath, newInitialCommitSHA);
            }
        }
    }
    private void RecursivelyCheckTreeItems(Tree tree,Dictionary<String, TreeEntry> treeItems, Dictionary<String, GitObject> gitObjectDeleteList)
    {
        foreach (TreeEntry treeEntry in tree)
        {
            //if the object (blob or tree) is not referenced by any commit being kept then add it to the deletion list
            if (!treeItems.ContainsKey(treeEntry.Target.Sha))
            {
                if (!gitObjectDeleteList.ContainsKey(treeEntry.Target.Sha))
                {
                    gitObjectDeleteList.Add(treeEntry.Target.Sha, treeEntry.Target);
                }
            }
            if (treeEntry.TargetType == TreeEntryTargetType.Tree)
            {
                RecursivelyCheckTreeItems(treeEntry.Target as Tree, treeItems, gitObjectDeleteList);
            }
        }
    }
    private void RecursivelyAddTreeItems(Dictionary<String, TreeEntry> treeItems, Tree tree)
    {
        foreach (TreeEntry treeEntry in tree)
        {
            //check for existence because if a file is renamed it can exist under a tree multiple times with the same SHA
            if (!treeItems.ContainsKey(treeEntry.Target.Sha))
            {
                treeItems.Add(treeEntry.Target.Sha, treeEntry);
            }
            if (treeEntry.TargetType == TreeEntryTargetType.Tree)
            {
                RecursivelyAddTreeItems(treeItems, treeEntry.Target as Tree);
            }
        }
    }
    private void TruncateCommits(IRepository repository, String repositoryPath, string newInitialCommitSHA)
    {
        //tree entries (keyed by target SHA) that are reachable from the commits being kept
        Dictionary<String, TreeEntry> treeItems = new Dictionary<string, TreeEntry>();
        Commit selectedCommit = null;
        Dictionary<String, GitObject> gitObjectDeleteList = new Dictionary<String, GitObject>();
        //loop through the commits, starting at the head and moving towards the initial commit
        foreach (Commit masterCommit in repository.Head.Commits)
        {
            //if non null then we have already found the commit where we want the truncation to occur
            if (selectedCommit != null)
            {
                //this commit is older than the truncation point (we iterate from newest to oldest), so add it to our deletion list
                gitObjectDeleteList.Add(masterCommit.Sha, masterCommit);
                //check the trees and blobs of this commit to see if they should be deleted
                RecursivelyCheckTreeItems(masterCommit.Tree, treeItems, gitObjectDeleteList);
            }
            else
            {
                //have we found the commit that we want to be the initial commit
                if (String.Equals(masterCommit.Sha, newInitialCommitSHA, StringComparison.OrdinalIgnoreCase))
                {
                    selectedCommit = masterCommit;
                }
                //this commit is the new initial commit or newer, so record the tree entries that need to be kept.
                RecursivelyAddTreeItems(treeItems, masterCommit.Tree);
            }
        }

        //this function simply clears out the parents of the new initial commit
        Func<Commit, IEnumerable<Commit>> rewriter = (c) => { return new Commit[0]; };
        //perform the rewrite
        repository.Refs.RewriteHistory(new RewriteHistoryOptions() { CommitParentsRewriter = rewriter }, selectedCommit);

        //clean up the references now in refs/original and remove the commits that they point to
        foreach (var reference in repository.Refs.FromGlob("refs/original/*"))
        {
            repository.Refs.Remove(reference);
            //skip branch reference on file deletion
            if (reference.CanonicalName.IndexOf("master", 0, StringComparison.CurrentCultureIgnoreCase) == -1)
            {
                //delete the Blob from the file system
                DeleteGitBlob(repositoryPath, reference.TargetIdentifier);
            }
        }
        //now remove any tags that reference commits that are going to be deleted in the next step
        foreach (var reference in repository.Refs.FromGlob("refs/tags/*"))
        {
            if (gitObjectDeleteList.ContainsKey(reference.TargetIdentifier))
            {
                repository.Refs.Remove(reference);
            }
        }
        //remove the collected commits, trees and blobs from the Git object database
        foreach (KeyValuePair<String, GitObject> kvp in gitObjectDeleteList)
        {
            //delete the loose object file from the file system
            DeleteGitBlob(repositoryPath, kvp.Value.Sha);
        }
    }

    //deletes the loose object file for the given SHA; objects stored in pack files are not touched
    private void DeleteGitBlob(String repositoryPath, String blobSHA)
    {
        String shaDirName = System.IO.Path.Combine(repositoryPath, ".git", "objects", blobSHA.Substring(0, 2));
        String shaFileName = System.IO.Path.Combine(shaDirName, blobSHA.Substring(2));
        //if the directory exists
        if (System.IO.Directory.Exists(shaDirName))
        {
            //get the files in the directory
            String[] directoryFiles = System.IO.Directory.GetFiles(shaDirName);
            foreach (String directoryFile in directoryFiles)
            {
                //if we found the file to delete
                if (String.Equals(shaFileName, directoryFile, StringComparison.OrdinalIgnoreCase))
                {
                    //if readonly set the file to RW
                    FileInfo fi = new FileInfo(shaFileName);
                    if (fi.IsReadOnly)
                    {
                        fi.IsReadOnly = false;
                    }
                    //delete the file
                    File.Delete(shaFileName);
                    //eliminate the directory if only one file existed 
                    if (directoryFiles.Length == 1)
                    {
                        System.IO.Directory.Delete(shaDirName);
                    }
                }
            }
        }
    }
}

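Thanks for all of your help, it is sincerely appreciated.

Please note that I edited this from the original code because it did not take directories into account.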

