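The git-fatfiles script from Vi's answer is lovely if you want to see the size of all your blobs, but it is so slow as to be unusable. I removed the 40-line output limit, and it tried to use all of my computer's RAM instead of finishing. It also gave inaccurate results when I summed the output to see the total space used by a file.
I rewrote it in Rust, which I find to be less error-prone than other languages. I also added a feature that sums up the space used by all commits in the various directories if the --directories flag is passed. Paths can be given to limit the search to certain files or directories.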
src/main.rs:
use std::{
    collections::HashMap,
    io::{self, BufRead, BufReader, Write},
    path::{Path, PathBuf},
    process::{Command, Stdio},
    thread,
};

use bytesize::ByteSize;
use structopt::StructOpt;

#[derive(Debug, StructOpt)]
#[structopt()]
pub struct Opt {
    #[structopt(
        short,
        long,
        help("Show the size of directories based on files committed in them.")
    )]
    pub directories: bool,

    #[structopt(help("Optional: only show the size info about certain paths."))]
    pub paths: Vec<String>,
}

/// Maps every object id reachable from any ref to the path it was committed under.
fn get_revs_for_paths(paths: Vec<String>) -> HashMap<String, PathBuf> {
    let mut process = Command::new("git");
    let mut process = process.arg("rev-list").arg("--all").arg("--objects");

    // Restrict the listing to the given paths, if any were passed.
    if !paths.is_empty() {
        process = process.arg("--").args(paths);
    }

    let output = process
        .output()
        .expect("Failed to execute command git rev-list.");

    let mut id_map = HashMap::new();
    for line in io::Cursor::new(output.stdout).lines() {
        // Lines look like "<id> <path>"; commits have no path and are skipped.
        if let Some((k, v)) = line
            .expect("Failed to get line from git command output.")
            .split_once(' ')
        {
            id_map.insert(k.to_owned(), PathBuf::from(v));
        }
    }
    id_map
}

/// Maps each blob id to its on-disk size, as reported by `git cat-file --batch-check`.
fn get_sizes_of_objects(ids: Vec<&String>) -> HashMap<String, u64> {
    let mut process = Command::new("git")
        .arg("cat-file")
        .arg("--batch-check=%(objectname) %(objecttype) %(objectsize:disk)")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to execute command git cat-file.");
    let mut stdin = process.stdin.take().expect("Could not open child stdin.");

    // Copy the ids so that they can be moved into the writer thread.
    let ids: Vec<String> = ids.into_iter().cloned().collect();

    // Writing to stdin blocks once the child's output pipe fills up, so the
    // ids are written from a separate thread while this thread reads stdout.
    let write_thread = thread::spawn(move || {
        for obj_id in ids {
            writeln!(stdin, "{}", obj_id).expect("Could not write to child stdin");
        }
        // Close stdin so that git cat-file sees end of input and exits.
        drop(stdin);
    });

    let output = process
        .stdout
        .take()
        .expect("Could not get output of command git cat-file.");

    let mut id_map = HashMap::new();
    for line in BufReader::new(output).lines() {
        let line = line.expect("Failed to get line from git command output.");

        // Keep only blobs; trees, commits and missing objects are ignored.
        let line_split: Vec<&str> = line.split(' ').collect();
        if let [id, "blob", size] = &line_split[..] {
            id_map.insert(
                id.to_string(),
                size.parse::<u64>().expect("Could not convert size to int."),
            );
        }
    }
    write_thread.join().unwrap();
    id_map
}

fn main() {
    let opt = Opt::from_args();

    let revs = get_revs_for_paths(opt.paths);
    let sizes = get_sizes_of_objects(revs.keys().collect());

    // Pair every blob's size with the path it was committed under.
    let file_sizes: Vec<(&Path, u64)> = sizes
        .iter()
        .map(|(id, size)| (revs[id].as_path(), *size))
        .collect();

    // Sum the sizes of all blobs that share a path (or a parent directory).
    let mut file_size_sums: HashMap<&Path, u64> = HashMap::new();
    for (mut path, size) in file_sizes.into_iter() {
        if opt.directories {
            let parent = path.parent();
            path = match parent {
                Some(parent) => parent,
                _ => {
                    eprintln!("File has no parent directory: {}", path.display());
                    continue;
                }
            };
        }

        *(file_size_sums.entry(path).or_default()) += size;
    }

    let sizes: Vec<(&Path, u64)> = file_size_sums.into_iter().collect();
    print_sizes(sizes);
}

/// Prints the entries sorted by size, smallest first.
fn print_sizes(mut sizes: Vec<(&Path, u64)>) {
    sizes.sort_by_key(|(_path, size)| *size);
    for file_size in sizes.iter() {
        println!("{:10}{}", ByteSize(file_size.1), file_size.0.display());
    }
}
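
The line-parsing step in get_sizes_of_objects can be tried without a repository. The following is only a sketch: parse_blob_line is a hypothetical helper that is not part of the program above, the sample lines are made up, and unlike the original it returns None rather than panicking when the size is not a number.

```rust
/// Hypothetical helper (not in the program above): parse one output line of
/// `git cat-file --batch-check="%(objectname) %(objecttype) %(objectsize:disk)"`
/// and return (object id, on-disk size) for blobs only.
fn parse_blob_line(line: &str) -> Option<(String, u64)> {
    let mut fields = line.split(' ');
    let id = fields.next()?;
    let kind = fields.next()?;
    let size = fields.next()?;

    // Reject lines with extra fields and anything that is not a blob.
    if fields.next().is_some() || kind != "blob" {
        return None;
    }

    // Assumption: an unparsable size is treated as "skip this line".
    Some((id.to_owned(), size.parse().ok()?))
}

fn main() {
    // Made-up sample lines: a blob, a tree, and an object git could not find.
    let sample = ["1111aaaa blob 120", "2222bbbb tree 45", "3333cccc missing"];
    for line in sample.iter() {
        println!("{:?}", parse_blob_line(line));
    }
}
```

Only the first sample line yields a value; the tree has the wrong type and the missing-object line has too few fields, which mirrors the slice pattern `[id, "blob", size]` used in the program.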
Cargo.toml:
[package]
name = "git-fatfiles"
version = "0.1.0"
edition = "2018"

[dependencies]
structopt = { version = "0.3" }
bytesize = { version = "1" }
Options:
USAGE:
    git-fatfiles [FLAGS] [paths]...

FLAGS:
    -d, --directories    Show the size of directories based on files committed in them.
    -h, --help           Prints help information

ARGS:
    <paths>...    Optional: only show the size info about certain paths.
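
To make the effect of --directories more concrete, here is a minimal sketch of the summation step in isolation. The function name sum_sizes and the sample paths and sizes are assumptions made up for illustration; the real program gets its data from git rev-list and git cat-file.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical helper: sum blob sizes per path, or per parent directory
/// when `by_directory` is set (the behaviour of the --directories flag).
fn sum_sizes(entries: &[(PathBuf, u64)], by_directory: bool) -> HashMap<&Path, u64> {
    let mut sums: HashMap<&Path, u64> = HashMap::new();
    for (path, size) in entries {
        let key = if by_directory {
            match path.parent() {
                Some(parent) => parent,
                // Only an empty path or a filesystem root has no parent.
                None => continue,
            }
        } else {
            path.as_path()
        };
        *sums.entry(key).or_default() += *size;
    }
    sums
}

fn main() {
    // Made-up data: the two src/main.rs rows stand for two versions of one file.
    let entries: Vec<(PathBuf, u64)> = vec![
        (PathBuf::from("src/main.rs"), 100),
        (PathBuf::from("src/main.rs"), 50),
        (PathBuf::from("src/lib.rs"), 30),
        (PathBuf::from("README.md"), 10),
    ];

    let mut rows: Vec<(&Path, u64)> = sum_sizes(&entries, true).into_iter().collect();
    rows.sort_by_key(|(_path, size)| *size);
    for (path, size) in rows {
        println!("{:>6} {}", size, path.display());
    }
}
```

Files in the repository root have an empty parent path, so with by_directory set their sizes are grouped under an empty directory name.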
git repack -a -d shrunk my 956MB repo to 250MB. Great success! Thanks! - AlexGrafedu
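This may mislead you into thinking the superproject is large when the size actually comes from a submodule; the answers below need to be run inside the submodule directory. - esmit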