How can I use `tqdm` in Python to show progress while downloading data online?

6
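I can find some documentation explaining how to use the tqdm package, but I can't work out from it how to produce a progress bar while downloading data online.
Below is example code for downloading data that I copied from ResidentMario.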
import requests


def download_file(url, filename):
    """
    Helper method handling downloading large files from `url` to `filename`. Returns a pointer to `filename`.
    """
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return filename


dat = download_file("https://data.cityofnewyork.us/api/views/h9gi-nx95/rows.csv?accessType=DOWNLOAD",
                    "NYPD Motor Vehicle Collisions.csv")

Could someone show me how to use the tqdm package here to display the download progress?

Thanks

3 Answers

13

Currently I would do something like this:

import requests
from tqdm import tqdm


def download_file(url, filename):
    """
    Helper method handling downloading large files from `url` to `filename`. Returns a pointer to `filename`.
    """
    chunkSize = 1024
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as f:
        # One tick per byte; the total comes from the Content-Length header
        pbar = tqdm(unit="B", total=int(r.headers['Content-Length']))
        for chunk in r.iter_content(chunk_size=chunkSize):
            if chunk:  # filter out keep-alive new chunks
                pbar.update(len(chunk))
                f.write(chunk)
        pbar.close()  # finish the bar so later terminal output is not garbled
    return filename

Thanks @silmaril... I also adjusted unit_scale and unit_divisor following Shenez Chen's answer to make the output more readable. Cheers :-)
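
To illustrate what the `unit_scale` and `unit_divisor` adjustment mentioned in this comment does, here is a minimal sketch. It assumes an in-memory list of byte chunks as a stand-in for `r.iter_content(...)`, and `copy_with_progress` is a hypothetical helper name, not part of tqdm or requests.

```python
import io

from tqdm import tqdm


def copy_with_progress(chunks, out, total=None):
    """Write byte chunks to `out`, advancing a byte-based progress bar."""
    written = 0
    # unit_scale=True lets tqdm scale the counter with K/M/G prefixes;
    # unit_divisor=1024 makes those prefixes binary (1024-based) instead of 1000-based.
    with tqdm(total=total, unit="B", unit_scale=True, unit_divisor=1024) as pbar:
        for chunk in chunks:
            out.write(chunk)
            pbar.update(len(chunk))
            written += len(chunk)
    return written


# Stand-in for r.iter_content(chunk_size=1024): 2048 chunks of 1024 bytes each.
fake_chunks = [b"\0" * 1024 for _ in range(2048)]
buffer = io.BytesIO()
n_bytes = copy_with_progress(fake_chunks, buffer, total=2048 * 1024)
```

With these two options the bar reports scaled sizes and rates rather than a raw byte count, which is usually easier to read for large downloads.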

3

pbar.clear() and pbar.close()

Update the progress bar manually; this is useful for streaming operations such as reading files. https://github.com/tqdm/tqdm#returns

import requests
from tqdm import tqdm


def download_file(url, filename):
    """
    Helper method handling downloading large files from `url` to `filename`. Returns a pointer to `filename`.
    """
    r = requests.get(url, stream=True)

    with open(filename, 'wb') as f:
        pbar = tqdm(unit="B", unit_scale=True, unit_divisor=1024, total=int(r.headers['Content-Length']))
        pbar.clear()  # clear 0% info
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                pbar.update(len(chunk))
                f.write(chunk)
        pbar.close()
    return filename
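
The answers here read `r.headers['Content-Length']` directly, which raises `KeyError` when a server omits that header (for example with chunked transfer encoding). A minimal sketch of one way to fall back, assuming plain dicts as stand-ins for `r.headers`; `total_from_headers` is a hypothetical helper name:

```python
from tqdm import tqdm


def total_from_headers(headers):
    """Return Content-Length as an int, or None when it is absent or invalid."""
    value = headers.get("Content-Length")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Plain dicts stand in for r.headers here (assumption for illustration).
known = total_from_headers({"Content-Length": "4096"})
unknown = total_from_headers({"Transfer-Encoding": "chunked"})

# With total=None tqdm still shows the running byte count and rate,
# only without a percentage or remaining-time estimate.
counted = 0
with tqdm(total=unknown, unit="B", unit_scale=True, unit_divisor=1024) as pbar:
    for chunk in (b"x" * 1024 for _ in range(4)):
        pbar.update(len(chunk))
        counted += len(chunk)
```

Passing the result as `total=` keeps the manual `pbar.update(len(chunk))` loop unchanged whether or not the size is known in advance.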


-2
Thanks silmaril, but the following code makes more sense to me.
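import math

import requests
from tqdm import tqdm


def download_file(url, filename):
    r = requests.get(url, stream=True)
    filelength = int(r.headers['Content-Length'])

    with open(filename, 'wb') as f:
        # One tick per 1024-byte chunk; round up so a final partial chunk is counted
        pbar = tqdm(total=math.ceil(filelength / 1024))
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                pbar.update()
                f.write(chunk)
        pbar.close()
    return filename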

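Basically, you would need to perform two HTTP requests to download a single file, which is inefficient if the target URL is processed dynamically. - silmaril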
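
When the bar's total is expressed in chunks rather than bytes, as in the answer above, the rounding matters. The small worked example below assumes every chunk except the last is exactly `chunk_size` bytes, which `iter_content` does not strictly guarantee; `chunk_count_estimates` is a hypothetical helper name.

```python
import math


def chunk_count_estimates(file_length, chunk_size=1024):
    """Return (truncated, rounded_up) chunk counts for a file of `file_length` bytes."""
    truncated = int(file_length / chunk_size)          # drops a final partial chunk
    rounded_up = math.ceil(file_length / chunk_size)   # counts the final partial chunk
    return truncated, rounded_up


estimates = chunk_count_estimates(2500)
print(estimates)  # (2, 3)
```

A 2500-byte file arrives as three chunks under that assumption, so a truncated total of 2 would leave the bar overshooting. Counting bytes with `pbar.update(len(chunk))`, as the other answers do, avoids the question entirely.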
