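Is there a way to access the data in the "Repositories contributed to" module on GitHub profile pages through the GitHub API? Ideally the entire list, not just the top five, which seem to be all you can get on the web.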
With the GraphQL API v4, you can now get these contributed repositories using:
{
  viewer {
    repositoriesContributedTo(first: 100, contributionTypes: [COMMIT, ISSUE, PULL_REQUEST, REPOSITORY]) {
      totalCount
      nodes {
        nameWithOwner
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
If you have contributed to more than 100 repositories (including your own), you will need to paginate by specifying after: "END_CURSOR_VALUE" in repositoriesContributedTo on the next request.
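The pagination described above can be sketched in Python as follows. This is only an illustration: `run_query` is a hypothetical callable (not part of any GitHub library) that is assumed to send the query and variables to the GraphQL endpoint and return the decoded JSON response, and the query is the one from this answer with an `$after` variable added.

```python
from typing import Callable, Dict, Iterator, Optional

# The query from the answer, with the cursor passed in as a variable.
QUERY = """
query($after: String) {
  viewer {
    repositoriesContributedTo(first: 100, after: $after,
        contributionTypes: [COMMIT, ISSUE, PULL_REQUEST, REPOSITORY]) {
      nodes { nameWithOwner }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def iter_contributed_repos(
        run_query: Callable[[str, Dict], Dict]) -> Iterator[str]:
    """Yield nameWithOwner from every page until hasNextPage is false.

    run_query is a hypothetical helper supplied by the caller; it must
    return the decoded JSON body of the GraphQL response.
    """
    after: Optional[str] = None  # no cursor on the first request
    while True:
        data = run_query(QUERY, {"after": after})
        conn = data["data"]["viewer"]["repositoriesContributedTo"]
        for node in conn["nodes"]:
            yield node["nameWithOwner"]
        if not conn["pageInfo"]["hasNextPage"]:
            break
        # Feed the cursor of this page into the next request.
        after = conn["pageInfo"]["endCursor"]
```

In practice `run_query` would POST to https://api.github.com/graphql with an authenticated token; keeping it as a parameter makes the loop easy to try out with canned responses.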
includeUserRepositories: true includes your own projects as well. - Joachim Breitner
Using Google BigQuery with the GitHub Archive, I pulled all the repositories I made pull requests to with:
SELECT repository_url
FROM [githubarchive:github.timeline]
WHERE payload_pull_request_user_login ='rgbkrk'
GROUP BY repository_url;
You can use similar semantics to pull out the number of repositories you have contributed to and the languages they were written in:
SELECT COUNT(DISTINCT repository_url) AS count_repositories_contributed_to,
COUNT(DISTINCT repository_language) AS count_languages_in
FROM [githubarchive:github.timeline]
WHERE payload_pull_request_user_login ='rgbkrk';
If you are looking for overall contributions, which includes issues reported, use
SELECT COUNT(DISTINCT repository_url) AS count_repositories_contributed_to,
COUNT(DISTINCT repository_language) AS count_languages_in
FROM [githubarchive:github.timeline]
WHERE actor_attributes_login = 'rgbkrk'
GROUP BY repository_url;
The difference here is actor_attributes_login, which comes from the Issue Events API.
You may also want to capture your own repositories, which may not have issues or PRs filed by you.
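The githubarchive:github.timeline table has been deprecated. - sulaiman sudirman
githubarchive:year.2017), so the current query would look like: SELECT repo.name FROM [githubarchive:year.2017] WHERE actor.login ='rgbkrk' GROUP BY repo.name; - gene_wood
SELECT repo.name FROM `githubarchive.year.2019` WHERE actor.login ='rgbkrk' GROUP BY repo.name; By the way, `*` can be used in place of the year. - PF4Public
I previously tried to implement something like this for a GitHub summarizer... These are the steps to get the repositories a user has contributed to but does not own (using my user as an example):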
https://api.github.com/search/issues?q=type:pr+state:closed+author:megawac&per_page=100&page=1
https://api.github.com/repos/jashkenas/underscore/contributors
repos/:owner/:repo/contributors
https://api.github.com/users/megawac/subscriptions
https://api.github.com/users/megawac/orgs
https://api.github.com/orgs/jsdelivr/repos
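This will miss repositories where the user has submitted no pull requests but has been added as a contributor. We can increase our odds of finding these repositories by searching for:
1) any issue opened (not just closed pull requests)
2) repositories the user has starred
Clearly, this requires many more requests than we would like to make, but what can you do when they make you fudge features \o/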
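The contributor check in the steps above could be sketched roughly like this in Python. The function names are made up for illustration, and `get_contributors` is a hypothetical callable assumed to return the login names from repos/:owner/:repo/contributors for a given repository; no real HTTP client is implied.

```python
from typing import Callable, Iterable, List


def search_url(user: str, page: int = 1) -> str:
    """Build the closed-pull-request search URL used in the first step."""
    return ("https://api.github.com/search/issues"
            f"?q=type:pr+state:closed+author:{user}&per_page=100&page={page}")


def repos_with_contributor(
        candidates: Iterable[str],
        user: str,
        get_contributors: Callable[[str], List[str]]) -> List[str]:
    """Keep the candidate repos whose contributor list contains user.

    candidates are "owner/repo" strings gathered from the search,
    subscriptions and org listings; duplicates are checked only once.
    """
    seen = set()
    result = []
    for repo in candidates:
        if repo in seen:
            continue  # avoid a second contributors request for the same repo
        seen.add(repo)
        if user in get_contributors(repo):
            result.append(repo)
    return result
```

Deduplicating before the lookup matters here because every candidate costs one extra API request.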
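You can get the last year or so through GitHub's GraphQL API, as shown in Bertrand Martel's answer.
Everything going back to 2011 can be found in GitHub Archive, as described in Kyle Kelley's answer. However, BigQuery's syntax and GitHub's API seem to have changed, and the examples shown there no longer work as of 08/2020.
So here is how I found all the repositories I have contributed to: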
SELECT distinct repo.name
FROM (
SELECT * FROM `githubarchive.year.2011` UNION ALL
SELECT * FROM `githubarchive.year.2012` UNION ALL
SELECT * FROM `githubarchive.year.2013` UNION ALL
SELECT * FROM `githubarchive.year.2014` UNION ALL
SELECT * FROM `githubarchive.year.2015` UNION ALL
SELECT * FROM `githubarchive.year.2016` UNION ALL
SELECT * FROM `githubarchive.year.2017` UNION ALL
SELECT * FROM `githubarchive.year.2018`
)
WHERE (type = 'PushEvent'
OR type = 'PullRequestEvent')
AND actor.login = 'YOUR_USER'
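Some of the repositories returned have only a name, with no user or organization part, but I had to process the results manually afterwards anyway.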
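Since the query above repeats one line per yearly table, a small Python helper could generate it for any range of years. This is a sketch under assumptions: `build_query` is a hypothetical name, it assumes a `githubarchive.year.<YYYY>` table exists for every year in the range, and it interpolates the user name directly into the SQL text, so it should only be fed trusted input.

```python
def build_query(user: str, first_year: int, last_year: int) -> str:
    """Return the UNION ALL query over the yearly githubarchive tables."""
    # One SELECT per yearly table, joined the same way as the manual query.
    tables = " UNION ALL\n".join(
        f"    SELECT * FROM `githubarchive.year.{year}`"
        for year in range(first_year, last_year + 1)
    )
    return (
        "SELECT DISTINCT repo.name\n"
        "FROM (\n"
        f"{tables}\n"
        ")\n"
        "WHERE (type = 'PushEvent' OR type = 'PullRequestEvent')\n"
        f"  AND actor.login = '{user}'"
    )


if __name__ == "__main__":
    # Prints the same query as above for the years 2011 through 2018.
    print(build_query("YOUR_USER", 2011, 2018))
```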
https://api.github.com/search/repositories?q=%20+fork:true+user:username
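Setting the fork parameter to true ensures that you query all of the user's repositories, forks included.
However, if you want to make sure the user actually contributed to a repository rather than just forking it, you should iterate through every repository returned by the "search" request and check whether the user is among its contributors. This is rather cumbersome, because GitHub returns only 100 contributors and there is no workaround...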
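I came across this problem as well. (GithubAPI: Get repositories a user has ever committed in)
One practical hack I found is a project called http://www.githubarchive.org/. They log all public events starting from 2011. Not ideal, but it can be helpful.
So, for example, in your case: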
SELECT payload_pull_request_head_repo_clone_url
FROM [githubarchive:github.timeline]
WHERE payload_pull_request_base_user_login='outoftime'
GROUP BY payload_pull_request_head_repo_clone_url;
https://github.com/jreidthompson/noaa.git
https://github.com/kkrol89/sunspot.git
https://github.com/rterbush/sunspot.git
https://github.com/ottbot/cassandra-cql.git
https://github.com/insoul/cequel.git
https://github.com/mcordell/noaa.git
https://github.com/hackhands/sunspot_rails.git
https://github.com/lgierth/eager_record.git
https://github.com/jnicklas/sunspot.git
https://github.com/klclee/sunspot.git
https://github.com/outoftime/cequel.git
You can experiment at bigquery.cloud.google.com, and the data schema can be found here: https://github.com/igrigorik/githubarchive.org/blob/master/bigquery/schema.js
"""
Get all your repos contributed to for the past year.
This uses Selenium and Chrome to login to github as your user, go through
your contributions page, and grab the repo from each day's contribution page.
Requires python3, selenium, and Chrome with chromedriver installed.
Change the username variable, and run like this:
GITHUB_PASS="mypassword" python3 github_contributions.py
"""
import os
import time
from pprint import pprint as pp
from urllib.parse import urlsplit

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

username = 'jessejoe'
password = os.environ['GITHUB_PASS']
repos = []

driver = webdriver.Chrome()
driver.get('https://github.com/login')
# find_element(By...) replaces the find_element_by_* helpers removed in Selenium 4
driver.find_element(By.ID, 'login_field').send_keys(username)
password_elem = driver.find_element(By.ID, 'password')
password_elem.send_keys(password)
password_elem.submit()

# Wait indefinitely for 2-factor code
if 'two-factor' in driver.current_url:
    print('2-factor code required, go enter it')
while 'two-factor' in driver.current_url:
    time.sleep(1)

driver.get('https://github.com/{}'.format(username))

# Get all days that aren't colored gray (no contributions)
contrib_days = driver.find_elements(
    By.XPATH, "//*[@class='day' and @fill!='#eeeeee']")

for day in contrib_days:
    day.click()
    # Wait until done loading
    WebDriverWait(driver, 10).until(
        lambda driver: 'loading' not in driver.find_element(
            By.CSS_SELECTOR, '.contribution-activity').get_attribute('class'))
    # Get all contribution URLs
    contribs = driver.find_elements(By.CSS_SELECTOR, '.contribution-activity a')
    for contrib in contribs:
        url = contrib.get_attribute('href')
        # Only care about repo owner and name from URL
        repo_path = urlsplit(url).path
        repo = '/'.join(repo_path.split('/')[0:3])
        if repo not in repos:
            repos.append(repo)
    # Have to click something else to remove pop-up on current day
    driver.find_element(By.CSS_SELECTOR, '.vcard-fullname').click()

driver.quit()
pp(repos)
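It uses Python and Selenium to automate a Chrome browser to log in to GitHub, go to your contributions page, click each day, and grab the repo name from any contributions. Since this page only shows one year's worth of activity, that is all you can get with this script.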
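The part of the script that turns a contribution link into a repo name can be tried on its own, without a browser. The sketch below isolates that step; the function names and the example URLs are made up for illustration.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def repo_from_url(url: str) -> str:
    """Return '/owner/name' from a GitHub URL, dropping anything after it."""
    path = urlsplit(url).path
    # The path starts with '/', so the first three pieces are
    # '', the owner, and the repo name.
    return '/'.join(path.split('/')[0:3])


def unique_repos(urls: Iterable[str]) -> List[str]:
    """Map URLs to repo paths, keeping first-seen order without duplicates."""
    return list(dict.fromkeys(repo_from_url(url) for url in urls))


if __name__ == "__main__":
    links = [
        "https://github.com/jashkenas/underscore/pull/1",
        "https://github.com/jashkenas/underscore/issues/2",
        "https://github.com/outoftime/cequel/commits?author=someone",
    ]
    print(unique_repos(links))
```

Using `dict.fromkeys` keeps the order in which repositories were first seen, which matches what the `if repo not in repos` check in the script does.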
This script uses gh, jq, and bash, and reads the ${GH_USER} environment variable:
#!/bin/bash
ghjq() { # <endpoint> <filter>
    # filter all pages of authenticated requests to https://api.github.com
    gh api --paginate "$1" | jq -r "$2"
}
repos="$(
    ghjq "users/$GH_USER/repos" '.[].full_name'
    ghjq "search/issues?q=is:pr+author:$GH_USER+is:merged" \
        '.items[].repository_url | sub(".*github.com/repos/"; "")'
    ghjq "users/$GH_USER/subscriptions" '.[].full_name'
    # left unquoted on purpose so that each org login is its own loop item
    for org in $(ghjq "users/$GH_USER/orgs" '.[].login'); do
        ghjq "orgs/$org/repos" '.[].full_name'
    done
)"
repos="$(echo "$repos" | sort -u)"
# print repo if user is a contributor
for repo in $repos; do
    # exact match on the login, across every page of contributors
    if ghjq "repos/$repo/contributors" '.[].login' | grep -qxF "$GH_USER"; then
        echo "$repo"
    fi
done
https://developer.github.com/v3/activity/events/#list-public-events-performed-by-a-user
We need to ask GitHub to implement this in their API.