Downloading files with wget using Python

Question

python linux

32

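How can I download files (videos) with Python using wget and save them locally? There will be a lot of files, so how do I know that one file has finished downloading, so that the next download can start automatically?

Thanks.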

- CoreIs

3

How do you do this? First, search for all the previously asked questions like yours: http://stackoverflow.com/questions/tagged/wget+python. Second, read this specific question: https://dev59.com/XnRC5IYBdhLWcg3wD87r - S.Lott

6 Answers

21

Don't do this. Use urllib2 or urlgrabber instead.
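
As a rough illustration of what a download looks like without spawning wget, the sketch below uses `urllib.request`, the Python 3 successor to `urllib2`. The function name `download` and the chunked copy through `shutil.copyfileobj` are illustrative choices, not something taken from this answer.

```python
import shutil
import urllib.request


def download(url, save_as):
    """Stream the response body for `url` into the local file `save_as`."""
    # copyfileobj() copies the body in chunks, so a large video
    # is never held in memory all at once.
    with urllib.request.urlopen(url) as response, open(save_as, "wb") as out:
        shutil.copyfileobj(response, out)
    return save_as
```

The call returns only after the file has been written, so calling `download()` in a loop fetches the files one after another.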

- Ignacio Vazquez-Abrams

18

Why shouldn't wget be used? This answer needs more explanation. - muhuk

11

Because it spawns a whole new process to do something that Python itself is perfectly capable of doing. - Ignacio Vazquez-Abrams

6

Because it hurts portability. - Liz Av

9

Are you sure it is easy to write the equivalent of wget -rl1 -I /stuff/i/want/ http://url/<incrementing number> with one of those libraries? - Seppo Erviälä

7

wget works through my VPN client, whereas urllib gives me this error for https: urlopen error Tunnel connection failed: 407 Proxy Authentication Required. - tommy.carstensen

15

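If you use os.system() to spawn a wget process, it will block until wget finishes the download (or exits with an error). So just call os.system('wget blah') in a loop until you have downloaded all of your files.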
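
A minimal sketch of that blocking loop, written with `subprocess.run` rather than `os.system` so that no shell is involved. The helper names `wget_command` and `download_all`, the `dest_dir` argument, and the injectable `run` parameter are assumptions made for illustration; the sketch also assumes `wget` is installed and on the PATH.

```python
import subprocess


def wget_command(url, dest_dir="."):
    """Build the argument list for one wget invocation."""
    # -P tells wget which directory to save into.
    return ["wget", "-P", dest_dir, url]


def download_all(urls, dest_dir=".", run=subprocess.run):
    """Download URLs one at a time; return the URLs whose wget call failed."""
    failed = []
    for url in urls:
        # run() blocks until wget exits, so the next download starts
        # only after the current one has finished.
        result = run(wget_command(url, dest_dir))
        if result.returncode != 0:
            failed.append(url)
    return failed
```

A non-zero exit status from wget means that download failed, so the returned list can be retried or logged.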

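Alternatively, you can use urllib2 or httplib. You will have to write a fair amount of code, but you will get better performance, since you can reuse a single HTTP connection to download many files instead of opening a new connection for each file.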
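
One way the connection reuse could look, using `http.client` (the Python 3 name for `httplib`). This is a sketch under assumptions: every file lives on the same host, the server keeps the connection alive, and the names `fetch_paths`, `dest_dir` and `connection_cls` are hypothetical.

```python
import http.client
import os
import shutil


def fetch_paths(host, paths, dest_dir=".", connection_cls=http.client.HTTPSConnection):
    """Download several paths from one host over a single HTTP connection."""
    conn = connection_cls(host)
    saved = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            if response.status != 200:
                # Drain the body so the connection can be reused, then skip.
                response.read()
                continue
            filename = os.path.join(dest_dir, os.path.basename(path) or "index.html")
            with open(filename, "wb") as out:
                # The body must be consumed before the next request is sent.
                shutil.copyfileobj(response, out)
            saved.append(filename)
    finally:
        conn.close()
    return saved
```

Each response body is read completely before the next request goes out, which is what allows the same connection to serve every file.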

- Adam Rosenfield

Isn't os.system() discouraged, with subprocess recommended as the replacement? - alper

9

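There is no reason to use os.system. Avoid writing a shell script in Python and go with something like urllib.urlretrieve or an equivalent.

Edit... to answer the second part of your question, you can set up a thread pool using the standard library Queue class. Since you are doing a lot of downloading, the GIL shouldn't be a problem (the threads spend most of their time waiting on network I/O). Generate a list of the URLs you wish to download and feed them to your work queue; it will handle pushing requests to the worker threads.

I'm waiting on a database update to finish, so please bear with me.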


#!/usr/bin/env python3
import logging
import sys
import threading
import urllib.request  # Python 3 home of urlretrieve
from queue import Queue

NUM_WORKERS = 5


class Downloader(threading.Thread):
    """Worker thread that downloads (url, filename) pairs taken from a queue."""

    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        while True:
            download_url, save_as = self.queue.get()
            # sentinel: (None, None) tells this worker to stop
            if not download_url:
                return
            try:
                urllib.request.urlretrieve(download_url, filename=save_as)
            except Exception as e:
                logging.warning("error downloading %s: %s", download_url, e)


if __name__ == '__main__':
    queue = Queue()
    threads = []
    for i in range(NUM_WORKERS):
        threads.append(Downloader(queue))
        threads[-1].start()

    # read one URL per line from stdin
    for line in sys.stdin:
        url = line.strip()
        if not url:
            continue  # a blank line would otherwise look like the sentinel
        filename = url.split('/')[-1]
        print("Download %s as %s" % (url, filename))
        queue.put((url, filename))

    # if we get here, stdin has gotten the ^D
    print("Finishing current downloads")
    for i in range(NUM_WORKERS):
        queue.put((None, None))
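
The same worker-pool idea can be sketched more compactly with `concurrent.futures.ThreadPoolExecutor` from the standard library. This is a different mechanism from the hand-written Queue and Thread classes in this answer, and the names `fetch`, `fetch_all` and `fetch_one` are hypothetical.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Download one URL into the current directory; return (url, error or None)."""
    filename = url.split("/")[-1]
    try:
        urllib.request.urlretrieve(url, filename=filename)
        return url, None
    except Exception as exc:
        return url, exc


def fetch_all(urls, workers=5, fetch_one=fetch):
    """Run the downloads on a pool of worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands URLs to idle workers; leaving the with-block
        # waits for every download to finish.
        return list(pool.map(fetch_one, urls))
```

Calling `fetch_all(list_of_urls)` returns one `(url, error)` pair per input URL, so failed downloads can be picked out afterwards.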

- McJeff

1

The line download_url, save_as = queue.get() has a bug. It should be download_url, save_as = self.queue.get(). - disfated

1

Install wget from PyPI: http://pypi.python.org/pypi/wget/0.3

pip install wget

Then run it as described in the documentation:

python -m wget <url>

- BozoJoe

19

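For anyone else who is confused: the linked library does not use wget, it uses urllib. It currently does not support anything like the features of wget (http://www.gnu.org/software/wget/). - Rob Russell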

-7

There is no need to use Python. Avoid writing a shell script in Python and go with something like Bash or an equivalent.

- davr

3

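Writing shell scripts in Python is fine. If you want to get something done quickly and you don't like bash syntax, use Python. If you are working on a larger project, then try to avoid external calls. - Jabba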

5

Python is a great scripting language. - Mark Lakata


- Mark Lakata · Accepted Answer

Short answer. To get a single file:

 import urllib.request
 urllib.request.urlretrieve("http://google.com/index.html", filename="local/index.html")

You can figure out how to loop that if necessary.
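
One possible shape for that loop, as a sketch. The function name `fetch_each`, the `dest_dir` argument, and the rule of naming each file after the last path segment of its URL are assumptions made for illustration.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def fetch_each(urls, dest_dir="."):
    """Download each URL in turn into dest_dir; return the saved paths."""
    saved = []
    for url in urls:
        # Name the local file after the last path segment of the URL.
        name = os.path.basename(urlsplit(url).path) or "index.html"
        target = os.path.join(dest_dir, name)
        # urlretrieve() returns only after the whole file has been written,
        # so the next iteration starts once this download is complete.
        urllib.request.urlretrieve(url, filename=target)
        saved.append(target)
    return saved
```

Because `urlretrieve()` blocks, no extra check is needed to know that one file has finished before the next begins. `dest_dir` must already exist.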