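How can I efficiently determine the time of the initial commit in a GitHub repository? Repositories have a "created_at" property, but for repositories that contain imported history, the oldest commit may be much earlier.
From the command line, something like this works:
git rev-list --max-parents=0 HEAD
However, I don't see an equivalent in the GitHub API.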
With the GraphQL API, first fetch the latest commit on the branch and ask for totalCount and endCursor:
{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
For totalCount and the pageInfo object, it returns something like this:
"totalCount": 931886,
"pageInfo": {
  "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}
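I have no source for the format of the cursor string b961f8dc8976c091180839f4483d67b7c2ca2578 0, but I tested a few other repositories with more than 1000 commits, and it always seems to look like this:
<static hash> <incremented_number>
So, if totalCount is greater than 1, you just subtract 2 from totalCount, put the result in the cursor, and fetch the oldest commit (or initial commit, if you prefer):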
{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1, after: "b961f8dc8976c091180839f4483d67b7c2ca2578 931884") {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
This gives the following output (the initial commit by Linus Torvalds):
{
  "data": {
    "repository": {
      "ref": {
        "target": {
          "history": {
            "nodes": [
              {
                "message": "Linux-2.6.12-rc2\n\nInitial git repository build. I'm not bothering with the full history,\neven though we have it. We can create a separate \"historical\" git\narchive of that later if we want to, and in the meantime it's about\n3.2GB when imported into git - space that would just make the early\ngit days unnecessarily complicated, when we don't have a lot of good\ninfrastructure for it.\n\nLet it rip!",
                "committedDate": "2005-04-16T22:20:36Z",
                "authoredDate": "2005-04-16T22:20:36Z",
                "oid": "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2",
                "author": {
                  "email": "torvalds@ppc970.osdl.org",
                  "name": "Linus Torvalds"
                }
              }
            ],
            "totalCount": 931886,
            "pageInfo": {
              "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 931885"
            }
          }
        }
      }
    }
  }
}
A simple Python implementation that gets the first commit looks like this:
import requests

token = "YOUR_TOKEN"
name = "linux"
owner = "torvalds"
branch = "master"

# $cursor is nullable, so the first request can simply send null for it
query = """
query ($name: String!, $owner: String!, $branch: String!, $cursor: String) {
  repository(name: $name, owner: $owner) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: 1, after: $cursor) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
"""


def getHistory(cursor):
    r = requests.post(
        "https://api.github.com/graphql",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "query": query,
            "variables": {
                "name": name,
                "owner": owner,
                "branch": branch,
                "cursor": cursor,
            },
        },
    )
    r.raise_for_status()
    return r.json()["data"]["repository"]["ref"]["target"]["history"]


# in the first request, cursor is null
history = getHistory(None)
totalCount = history["totalCount"]
if totalCount > 1:
    # the cursor looks like "<hash> <index>"; point it just before the last commit
    cursor = history["pageInfo"]["endCursor"].split(" ")
    cursor[1] = str(totalCount - 2)
    history = getHistory(" ".join(cursor))
    print(history["nodes"][0])
else:
    print("got oldest commit (initial commit)")
    print(history["nodes"][0])
You can find a JavaScript example linked in this post.
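The path parameter (in the history(...) resolver) can be used to get the first commit for a given subdirectory or file. https://docs.github.com/en/rest/commits/commits#list-commits--parameters. - Alex Rintt
If the data has already been cached (on GitHub's side), and depending on the precision you need, this can be done with just two requests.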
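First, check whether there are any commits from before the creation time by sending a GET request to /repos/:owner/:repo/commits with the until parameter (as suggested in VonC's answer), and limit the number of commits returned to 1 with the per_page parameter.
If there are commits from before the creation time, you can then call the contributor statistics endpoint (/repos/:owner/:repo/stats/contributors). The response contains a weeks list for each contributor, and the oldest w value falls in the same week as the oldest commit.
If you need the exact timestamp, you can use the commits endpoint again, with since set to the oldest week value and until set to 7 days after it.
Note that the statistics endpoint may return 202, meaning the statistics are not available yet; in that case, retry after a few seconds.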
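The statistics steps above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names oldest_week, week_window and get_json are made up for this example, the standard library is used instead of any particular HTTP client, and the retry count and delay are arbitrary.

```python
import json
import time
import urllib.request
from datetime import datetime, timedelta, timezone

API = "https://api.github.com"


def oldest_week(stats):
    """Return the smallest `w` (Unix timestamp of a week start) over all contributors."""
    return min(week["w"] for contributor in stats for week in contributor["weeks"])


def week_window(w):
    """Return (since, until) ISO 8601 strings covering the 7 days starting at `w`."""
    start = datetime.fromtimestamp(w, tz=timezone.utc)
    end = start + timedelta(days=7)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


def get_json(path, token, retries=5, delay=3):
    """GET an API path, retrying while the answer is 202 (stats not ready yet).

    Hypothetical helper: `retries` and `delay` are arbitrary values for this sketch.
    """
    request = urllib.request.Request(
        API + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    for _ in range(retries):
        with urllib.request.urlopen(request) as response:
            if response.status != 202:
                return json.load(response)
        time.sleep(delay)
    raise RuntimeError(f"{path} still returned 202 after {retries} attempts")
```

With a token, stats = get_json("/repos/OWNER/REPO/stats/contributors", token) followed by week_window(oldest_week(stats)) would give the since and until values for the final call to the commits endpoint.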
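One suggestion is to list the commits of the repository (see the GitHub API v3 section), using the until parameter set to the time the repository was created (plus one day, for example).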
GET /repos/:owner/:repo/commits
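That way, you list all the commits created when the repository was created or before it: this limits the list by excluding every commit created after the repository itself.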
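As a rough sketch of that request, the URL could be built like this. The function name commits_before_creation_url is made up for the example, and created_at is assumed to be in the "YYYY-MM-DDTHH:MM:SSZ" form that the repository object uses.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def commits_before_creation_url(owner, repo, created_at, per_page=1):
    """Build the list-commits URL with `until` set one day after `created_at`.

    Hypothetical helper for illustration; it only builds the URL and sends nothing.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    created = datetime.strptime(created_at, fmt)
    until = (created + timedelta(days=1)).strftime(fmt)
    params = urlencode({"until": until, "per_page": per_page})
    return f"https://api.github.com/repos/{owner}/{repo}/commits?{params}"


# Placeholder owner, repo and date, just to show the call
print(commits_before_creation_url("OWNER", "REPO", "2011-09-04T22:48:12Z"))
```

Keep in mind that the commits come back newest first, so this narrows the list down but does not by itself return the oldest commit.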
Was githop renamed to retrogit? https://github.com/mihaip/retrogit/commit/6cc162c91b25bd26da379da2c1656fff6c199a1a - testworks
REPO="owner/repo"
URL="https://api.github.com/repos/$REPO/commits"
# Keep the headers in an array so each one reaches curl as a single argument
H=(-H "Accept: application/vnd.github+json"
   -H "X-GitHub-Api-Version: 2022-11-28")
# Fetch headers and body together, dropping the HTTP status line
response=$(curl -s -L --include "${H[@]}" "$URL" | awk 'NR > 1')
# Split the output into header and json
header=$(echo "$response" | awk 'BEGIN{RS="\r\n";ORS="\r\n"} /^[a-zA-Z0-9-]+:/')
commits=$(echo "$response" | awk '!/^[a-zA-Z0-9-]+:/')
# If paginated, get last page
if [[ $header == *"link"* ]]; then
  # Extract the last page value
  link_line=$(echo "$header" | grep -i "^link:")
  last_page=$(echo "$link_line" | sed -n 's/.*page=\([0-9]\+\)[^0-9].*rel="last".*/\1/p')
  # Get last-page commits
  commits=$(curl -s -L "${H[@]}" "$URL?page=$last_page")
fi
# Print first commit
echo "$commits" | jq '.[-1].commit'
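The step that pulls the last page number out of the Link header can also be sketched in Python. This is a minimal illustration, not a full Link header parser: the function name last_page is made up, and it assumes the header uses the usual <url>; rel="..." form with no commas inside the URLs.

```python
import re


def last_page(link_header):
    """Return the page number of the rel="last" link, or None if there is none."""
    for part in link_header.split(","):
        # [?&]page= avoids matching the per_page parameter by accident
        match = re.search(r'<[^>]*[?&]page=(\d+)[^>]*>\s*;\s*rel="last"', part)
        if match:
            return int(match.group(1))
    return None


# Made-up header value in the usual shape, for illustration only
example = (
    '<https://api.github.com/repositories/1/commits?page=2>; rel="next", '
    '<https://api.github.com/repositories/1/commits?page=34>; rel="last"'
)
print(last_page(example))
```

The oldest commit is then the last element of the JSON array returned for that page.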
https://github.com/USER/REPO/commits?after=LAST_COMMIT_SHA+COMMIT_COUNT_MINUS_2
# Example. Commit count in this case was 1573
https://github.com/sindresorhus/refined-github/commits/master?after=a76ed868a84cd0078d8423999faaba7380b0df1b+1571
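Building that URL from the latest commit SHA and the commit count could look like the sketch below. The function name first_commit_page_url is made up for this example, and the after=SHA+N format is only what was observed on the website, not something documented.

```python
def first_commit_page_url(owner, repo, branch, last_commit_sha, commit_count):
    """Build the commits-page URL whose `after` value skips to the oldest commit.

    Hypothetical helper; relies on the observed, undocumented `after=SHA+N` format.
    """
    if commit_count < 2:
        raise ValueError("with fewer than 2 commits the first page already shows the oldest one")
    return (
        f"https://github.com/{owner}/{repo}/commits/{branch}"
        f"?after={last_commit_sha}+{commit_count - 2}"
    )


# Same values as in the example above: 1573 commits on master
print(first_commit_page_url(
    "sindresorhus", "refined-github", "master",
    "a76ed868a84cd0078d8423999faaba7380b0df1b", 1573,
))
```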
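Use trial and error on the page number:
https://github.com/fatfreecrm/fat_free_crm/commits/master?page=126
Looking at the Git history, for example with gitk, can help you make that trial and error more efficient.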