Pull all Gists from GitHub?

Question


27

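Is there an API call or a script that can pull all of my Gists from GitHub into an external git repository, or at least return a list of their names? I know each one is a separate git repository, so I assume the best I can do is get the list and then script cloning them all onto my local machine.

Edit 1: I know how to pull and push git repositories from one service to another. I am specifically looking for someone who knows how to collect an authoritative list of all the Gists I own, private and public. I also think this could be useful to others. It is not so much about migration as it is a backup strategy, of sorts.

Edit 2: It appears this might not be possible. I apparently did not search hard enough for the updated GitHub/Gist API. The other API calls work with simple curl commands, but the v1 API for Gist does not. The API documentation still lists retrieving all private and public Gists as TBD, so I think the whole thing is hopeless unless some enlightened soul helps me out.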

$ curl http://github.com/api/v2/json/repos/show/alharaka
{"repositories":[{"url":"https://github.com/alharaka/babushka","has_wiki":true,"homepage":"http:
... # tons of more output
$ echo $?
0
$

This one does not work so well.

$ curl https://gist.github.com/api/v1/:format/gists/:alharaka
$ echo $?
0
$

Edit 3: Before I am asked, I did notice the difference in API versions; this "brilliant hack" did not help either. Still very cool, though.

$ curl https://gist.github.com/api/v2/:format/gists/:alharaka # Notice v2 instead of v1
$ echo $?
0
$

- alharaka

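I think this question needs a clearer description of what you are trying to do. - Soren

I anticipated that. See above. If it is still unclear, I don't know how else to make it clearer: the API lets me use JSON (authenticated or not) to get data about repositories or write to them. Pretty cool, I must say. However, not all of the functionality has been implemented yet. - songei2f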

https://gist.github.com/1622504 - endolith

12 Answers

15

Here is an adaptation for API v3 of nicerobot's script, which was originally written for API v1:

#!/usr/bin/env python
# Clone or update all a user's gists
# curl -ks https://raw.github.com/gist/5466075/gist-backup.py | USER=fedir python
# USER=fedir python gist-backup.py

import json
import urllib
from subprocess import call
from urllib import urlopen
import os
import math
USER = os.environ['USER']

perpage=30.0
userurl = urlopen('https://api.github.com/users/' + USER)
public_gists = json.load(userurl)
gistcount = public_gists['public_gists']
print "Found gists : " + str(gistcount)
pages = int(math.ceil(float(gistcount)/perpage))
print "Found pages : " + str(pages)

f=open('./contents.txt', 'w+')

for page in range(pages):
    pageNumber = str(page + 1)
    print "Processing page number " + pageNumber
    pageUrl = 'https://api.github.com/users/' + USER  + '/gists?page=' + pageNumber + '&per_page=' + str(int(perpage))
    u = urlopen (pageUrl)
    gists = json.load(u)
    startd = os.getcwd()
    for gist in gists:
        gistd = gist['id']
        # GitHub no longer serves the unauthenticated git:// protocol, so use HTTPS
        gistUrl = 'https://gist.github.com/' + gistd + '.git'
        if os.path.isdir(gistd):
            os.chdir(gistd)
            call(['git', 'pull', gistUrl])
            os.chdir(startd)
        else:
            call(['git', 'clone', gistUrl])
        if gist['description'] == None:
            description = ''
        else:
            description = gist['description'].encode('utf8').replace("\r",' ').replace("\n",' ')
        print >> f, gist['id'], gistUrl, description

- Fedir RYKHTIK

The curl statement now seems to be: curl -ks https://gist.githubusercontent.com/fedir/5466075/raw/gist-backup.py | USER=fedir python - philshem

@philshem Thanks, that's right, GitHub changed the URL; I have updated the answer. - Fedir RYKHTIK

5

A version of @Fedir's script that accounts for GitHub's pagination (if you have a few hundred Gists):

#!/usr/bin/env python
# Clone or update all a user's gists
# curl -ks https://raw.github.com/gist/5466075/gist-backup.py | USER=fedir python
# USER=fedir python gist-backup.py

import json
import urllib
from subprocess import call
from urllib import urlopen
import os
import math
USER = os.environ['USER']

perpage=30.0
userurl = urlopen('https://api.github.com/users/' + USER)
public_gists = json.load(userurl)
gistcount = public_gists['public_gists']
print "Found gists : " + str(gistcount)
pages = int(math.ceil(float(gistcount)/perpage))
print "Found pages : " + str(pages)

f=open('./contents.txt', 'w+')

for page in range(pages):
    pageNumber = str(page + 1)
    print "Processing page number " + pageNumber
    pageUrl = 'https://api.github.com/users/' + USER  + '/gists?page=' + pageNumber + '&per_page=' + str(int(perpage))
    u = urlopen (pageUrl)
    gists = json.load(u)
    startd = os.getcwd()
    for gist in gists:
        gistd = gist['id']
        # GitHub no longer serves the unauthenticated git:// protocol, so use HTTPS
        gistUrl = 'https://gist.github.com/' + gistd + '.git'
        if os.path.isdir(gistd):
            os.chdir(gistd)
            call(['git', 'pull', gistUrl])
            os.chdir(startd)
        else:
            call(['git', 'clone', gistUrl])
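
Computing the page count from `public_gists` only accounts for public gists. As an alternative sketch (not part of the script above), the GitHub API also returns a `Link` response header that points at the next page, and a client can follow it until it is no longer present. The helper name `next_page_url` is hypothetical:

```python
import re


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    # Each entry in the header looks like: <https://...>; rel="next"
    for match in re.finditer(r'<([^>]+)>\s*;\s*rel="([^"]+)"', link_header):
        if match.group(2) == "next":
            return match.group(1)
    return None
```

In a download loop, fetch a page, process its gists, then continue with `next_page_url(response.headers.get("Link"))` until it returns `None`.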

- saranicole

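Nice improvement. It looks like the page iteration should start from 1 rather than 0, otherwise you will not get the last page. - Fedir RYKHTIK

Nice script, easily adapted to pull private Gists. - bryan_basho

How can I use this script to pull my private Gists? - yeedle

@bryan_basho, "easily adapted" how? - brasofilo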
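
On the private-gist question raised in these comments: the per-user endpoint that the scripts above call is documented as listing public gists. A minimal sketch of one way around that, assuming you have a personal access token with the `gist` scope: an authenticated request to `https://api.github.com/gists` lists the gists of the token's owner, secret ones included. The function names and the client name in the `User-Agent` header are made up for illustration.

```python
import json
import urllib.request

API_URL = "https://api.github.com/gists"  # gists of the authenticated user


def build_request(token, page=1, per_page=100):
    """Build an authenticated request for one page of the gist list."""
    url = "{}?per_page={}&page={}".format(API_URL, per_page, page)
    headers = {
        "Authorization": "token " + token,
        "Accept": "application/vnd.github+json",
        "User-Agent": "gist-backup-sketch",  # arbitrary client name
    }
    return urllib.request.Request(url, headers=headers)


def list_my_gists(token):
    """Yield every gist, requesting pages until an empty one comes back."""
    page = 1
    while True:
        with urllib.request.urlopen(build_request(token, page)) as resp:
            gists = json.load(resp)
        if not gists:
            return
        yield from gists
        page += 1
```

Calling `list_my_gists(token)` and passing each gist's `git_pull_url` to `git clone` then works the same way as in the scripts above.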

5

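Based on the hint in this answer, I wrote this simple Python script, which works well for me.

It is very minimal code with almost no error checking, and it clones all of a user's gists into the current directory.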

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Clone all gists of GitHub username given on the command line."""

import subprocess
import sys
import requests

if len(sys.argv) > 1:
    gh_user = sys.argv[1]
else:
    print("Usage: clone-gists.py <GitHub username>")
    sys.exit(1)

req = requests.get('https://api.github.com/users/%s/gists' % gh_user)

for gist in req.json():
    ret = subprocess.call(['git', 'clone', gist['git_pull_url']])
    if ret != 0:
        print("ERROR cloning gist %s. Please check output." % gist['id'])

See https://gist.github.com/SpotlightKid/042491a9a2987af04a5a for a version that also handles updates.
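
Handling updates comes down to one decision per gist: clone when there is no local copy, pull when there is. The sketch below shows only that decision and is not the linked script's actual code; the function name `git_command` and the one-directory-per-gist-id layout are assumptions.

```python
import os


def git_command(gist, base_dir="."):
    """Return the git command that brings one gist up to date locally."""
    target = os.path.join(base_dir, gist["id"])
    if os.path.isdir(os.path.join(target, ".git")):
        # Already cloned: fast-forward the existing working copy.
        return ["git", "-C", target, "pull", "--ff-only"]
    # First run: clone into a directory named after the gist id.
    return ["git", "clone", gist["git_pull_url"], target]
```

The returned list can be passed straight to `subprocess.call`, in place of the bare `git clone` call in the loop above.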

- Chris Arndt

4

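You can do this with the GitHub CLI and a bit of Bash scripting. The goal is to download every Gist and save it in its own directory with a human-readable name.

1. Install the GitHub CLI: sudo apt install gh or brew install gh, then log in with gh auth login.
2. Define a function that converts a string to slug format; it is useful when creating the folders for the Gists. See the slugify() function below. Example: slugify "hello world" becomes hello-world.
3. Loop over all Gists and clone each one into a separate folder with a readable name.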

# function that creates a slug from a text

slugify(){ echo "$1" | iconv -t ascii//TRANSLIT | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+|-+$//g' | tr A-Z a-z; }

# initializes a counter and lists every gist in reverse order
# then clones all of them in a directory named: COUNTER-gist-description

cnt=0; gh gist list --limit 1000 | cut -f1,2 | tac | while read -r id name; do ((cnt++)); gh gist clone "$id" "$cnt-$(slugify "$name")"; done

Result:

1-my-first-gist/
2-my-second-gist/
3-my-third-gist/
...
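
For anyone scripting this in Python rather than Bash, the slug step can be approximated as below. This is a sketch, not an exact port: `unicodedata` normalization simply drops characters it cannot decompose to ASCII, whereas `iconv` with `//TRANSLIT` may substitute look-alike characters.

```python
import re
import unicodedata


def slugify(text):
    """Lowercase ASCII slug: runs of other characters become one hyphen."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-zA-Z0-9]+", "-", ascii_text).strip("-").lower()
```

As with the Bash version, `slugify("hello world")` gives `hello-world`.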

- hlorand

3

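Adding to Thomas Traum's answer from a few answers back: it seems a user agent is now required: http://developer.github.com/v3/#user-agent-required.

So I did an exercise of my own: https://github.com/sanusart/gists-backup. It is aware of pagination, duplicate descriptions, and missing descriptions.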

- sanusart

Yes. Updated it a bit. - sanusart

2

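I wrote a quick node.js script as an exercise. It downloads all gists and saves them, under the same file names as the original gist, in folders named after the gist description. https://gist.github.com/thomastraum/5227541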

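var request = require('request')
    , path = require('path')
    , fs = require('fs')
    , url = "https://api.github.com/users/thomastraum/gists"
    , savepath = './gists';

// GitHub rejects API requests that lack a User-Agent header
var options = { url: url, headers: { 'User-Agent': 'gists-download-script' } };

request(options, function (error, response, body) {

    if (!error && response.statusCode == 200) {

        var gists = JSON.parse( body );
        gists.forEach( function(gist) {

            console.log( "description: ", gist.description );
            // fall back to the gist id when there is no description
            var dir = path.join( savepath, gist.description || gist.id );

            // recursive: true also creates savepath on the first run
            fs.mkdir( dir, { recursive: true }, function(err){
                if (err) { return console.error( err ); }
                for(var file in gist.files){

                    var raw_url = gist.files[file].raw_url;
                    var filename = gist.files[file].filename;

                    console.log( "downloading... " + filename );
                    request(raw_url).pipe(fs.createWriteStream( path.join( dir, filename ) ));
                }
            });
        });

    }

});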

- Thomas Traum

1

This Ruby gem looks like it can help with your problem. I have not tried it yet, but it looks promising.

First

gem install gisty

You need to put

export GISTY_DIR="$HOME/dev/gists"

in your .bashrc or .zshrc; this directory is where your gists are saved.

You need to run

git config --global github.user your_id
git config --global github.token your_token

which adds these settings to your .gitconfig.

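Usage

gisty post file1 file2 ...
Post file1 and file2 to your gist.

gisty private_post file1 file2 ...
Post file1 and file2 privately.

gisty sync
Sync with all of your gists.

gisty pull_all
Pull from the remote repositories into the local ones.

gisty list
List the local gist repositories that have been cloned.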

- studiomohawk

It now uses OAuth. As of this writing, it works great with v3 of the Gist API! - Mark Jaquith

0

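If all you need is to download every gist uploaded by a particular user, this simple Python script will do it.

Gist information for a particular user is publicly exposed through the API.

"https://api.github.com/users/" + username + "/gists"

You can simply loop over the JSON the API exposes to get the list of gists, then either clone them or download them through the given raw URLs. The simple script below loops over the JSON, extracts each file name and raw URL, downloads every gist, and saves it to a local folder.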

import requests

# Replace the placeholder with the correct username
username = "username"
url = "https://api.github.com/users/" + username + "/gists"

resp = requests.get(url)
gists = resp.json()

for gist in gists:
    for file in gist["files"]:
        fname = gist["files"][file]["filename"]
        furl = gist["files"][file]["raw_url"]
        print("{}:{}".format(fname, furl)) # This lists out all gists

        # Use this to download all gists
        pyresp = requests.get(furl)

        # The target folder must already exist
        with open("../folder/" + fname, "wb") as pyfile:
            for chunk in pyresp.iter_content(chunk_size=1024):
                if chunk:
                    pyfile.write(chunk)
        print("{} downloaded successfully".format(fname))

- HVS

0

Update, March 2021 (Python 3)

This program works well if the user has a large number of gists that share the same file name.

import requests, json, time, uuid
headers = {"content-type" : "application/json"}
url =  'https://api.github.com/users/ChangeToYourTargetUser/gists?per_page=100&page='

for page in range(1,100):  # GitHub page numbers start at 1
    print('page: ' + str(page))
    r = requests.get(url+str(page), headers = headers)
    r.raise_for_status()  # stop on API errors such as rate limiting
    data = r.json()
    if not data:  # an empty page means there are no more gists
        break
    # one metadata file per page, so earlier pages are not overwritten
    metadata_file = './data/my_gist_list_{}.json'.format(page)
    # Getting metadata
    prettyJson = json.dumps(data, indent=4, sort_keys=True)
    with open(metadata_file, 'w') as f:
        f.write(prettyJson)

    print('Metadata obtained as {}'.format(metadata_file))

    # Downloading files (only the first file of each gist)
    counter = 0
    for i in data:
        time.sleep(1.1)
        files_node = i['files']
        file_name = [k for k in files_node][0]
        r = requests.get(files_node[file_name]['raw_url'])
        # a random UUID as the local name avoids clashes between equal file names
        with open('./data/{}'.format(str(uuid.uuid4())), 'w') as f:
            f.write(r.text)
        print('Download' + str(i))
        counter += 1

    print('{} files successfully downloaded.'.format(counter))
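
Random UUIDs avoid clashes but discard the original file names. One alternative, sketched here on the assumption that a gist id combined with a file name is unique (file names are unique within a single gist), is to prefix each file with its gist id. The helper name `local_name` is hypothetical and not part of the script above.

```python
import os


def local_name(gist, filename, base_dir="./data"):
    """Build a collision-free local path from the gist id and file name."""
    # Replace path separators so a file name can never escape base_dir.
    safe = filename.replace("/", "_").replace(os.sep, "_")
    return os.path.join(base_dir, "{}_{}".format(gist["id"], safe))
```

In the download loop this would replace the `uuid.uuid4()` name, for example `open(local_name(i, file_name), 'w')`.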

- juanMSFT

Is that Python? - Jens Baitinger


- Koraktor · Accepted Answer

Version 3 of the GitHub API makes this possible in a fairly simple way:

https://api.github.com/users/koraktor/gists

This gives you a list of all of that user's Gists, and the list provides various URLs, including the API URL of each individual Gist, for example:

https://api.github.com/gists/921286

See the Gists API v3 documentation.
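
Each entry in that list is a JSON object. As a sketch of getting the "list of names" the question asks for, the function below reduces a decoded response to the id, description, file names, and clone URL of each gist. The field names (`id`, `description`, `files`, `git_pull_url`) are the ones the other answers in this thread rely on; the function name and the sample data are made up for illustration.

```python
def summarize(gists):
    """Reduce decoded gist JSON to the fields needed for a backup."""
    summary = []
    for gist in gists:
        summary.append({
            "id": gist["id"],
            # The API returns null for gists without a description.
            "description": gist.get("description") or "",
            "files": sorted(gist.get("files", {})),
            "clone_url": gist["git_pull_url"],
        })
    return summary


# Made-up sample shaped like one entry of the API response.
sample = [{
    "id": "abc123",
    "description": None,
    "files": {"b.txt": {}, "a.py": {}},
    "git_pull_url": "https://gist.github.com/abc123.git",
}]
print(summarize(sample))
```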