Timeout for Python `requests.get` entire response

Question

I'm gathering statistics on a list of websites, and I'm using the requests library for simplicity. Here is my code:

import requests

data = []
websites = ['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r = requests.get(w, verify=False)
    data.append((r.url, len(r.content), r.elapsed.total_seconds(),
                 str([(l.status_code, l.url) for l in r.history]),
                 str(r.headers.items()), str(r.cookies.items())))

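Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.

This question has come up before, but none of the answers are clean.

I hear that not using requests might be a good idea, but then how would I get the nice things requests offers (the ones in the tuple)?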

- Kiarash

Possible duplicate of "How to perform time limited response download with python requests?" - yprez

Related: "Read timeout using either urllib2 or any other http library" - jfs

22 Answers

- Chris Johnson · Answer 1

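This may be overkill, but the Celery distributed task queue has good support for timeouts.

In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task once the limit has been exceeded.

Under the covers, this uses the same signals approach as referenced in your "before" post, but in a more usable and manageable way. And if the list of websites you are monitoring is long, you might benefit from its primary feature: all kinds of ways to manage the execution of a large number of tasks.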
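To make the signals approach mentioned above concrete, here is a minimal standard-library sketch of a soft time limit. It is not Celery's implementation; `soft_time_limit`, `SoftTimeLimit` and `slow_call` are hypothetical names used only for illustration, and the sketch assumes a Unix-like system and code running in the main thread (where `SIGALRM` handlers can be installed).

```python
import signal
import time
from contextlib import contextmanager


class SoftTimeLimit(Exception):
    """Raised inside the running code when the time budget is used up."""


@contextmanager
def soft_time_limit(seconds):
    """Raise SoftTimeLimit in the calling code after `seconds` (Unix, main thread)."""
    def _on_alarm(signum, frame):
        raise SoftTimeLimit("exceeded {} s".format(seconds))

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # accepts fractional seconds
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)    # cancel a pending alarm
        signal.signal(signal.SIGALRM, previous)    # restore the old handler


def slow_call():
    time.sleep(5)  # stand-in for a request that hangs
    return "done"


try:
    with soft_time_limit(0.2):
        result = slow_call()
except SoftTimeLimit:
    result = None  # clean up or record the failure, then carry on
```

Because the exception is raised inside the running code, the loop can catch it, record the failure and move on to the next site. A hard limit, by contrast, would need a separate process that can be killed.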

- ub_marco · Answer 2

If you are using the option stream=True, you could do this:

import time

import requests

r = requests.get(
    'http://url_to_large_file',
    timeout=1,  # relevant only for underlying socket
    stream=True)

with open('/tmp/out_file.txt', 'wb') as f:
    start_time = time.time()
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
        if time.time() - start_time > 8:
            raise Exception('Request took longer than 8s')

This solution needs neither signals nor multiprocessing.
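The same idea can be pulled into a small reusable helper. This is only a sketch: `read_with_deadline`, `DeadlineExceeded` and `fake_chunks` are hypothetical names, and the helper takes any iterable of byte chunks so it can be tried without a network. With requests you would presumably pass `r.iter_content(chunk_size=1024)` from a response opened with `stream=True`.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when reading all chunks took longer than allowed."""


def read_with_deadline(chunks, max_seconds, clock=time.monotonic):
    """Join byte chunks from an iterable, giving up after max_seconds in total."""
    deadline = clock() + max_seconds
    parts = []
    for chunk in chunks:
        # The check only runs between chunks, so a single stalled read is
        # still bounded by the socket-level timeout, not by this deadline.
        if clock() > deadline:
            raise DeadlineExceeded("took longer than {} s".format(max_seconds))
        parts.append(chunk)
    return b"".join(parts)


def fake_chunks(count, delay):
    """Stand-in for r.iter_content(): yields `count` chunks, `delay` s apart."""
    for _ in range(count):
        time.sleep(delay)
        yield b"x" * 4
```

Using `time.monotonic` rather than `time.time` keeps the deadline unaffected by system clock adjustments. The small socket `timeout` is still needed, since it is the only thing that interrupts a read that never returns.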

- John Smith · Answer 3

Although the question is about requests, I find this very easy to do with pycurl's CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.

No threading or signaling required:

import io
import traceback

import pycurl

url = 'http://www.example.com/example.zip'
timeout_ms = 1000
raw = io.BytesIO()  # the response body is collected here
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms)  # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
    c.perform()
except pycurl.error:
    traceback.print_exc()  # error generated on timeout; use pass to stay silent
finally:
    c.close()

- Denis Kuzin · Answer 4

Here is another solution, taken from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads.

Before downloading the body, you can find out the size of the content:

import requests

TOO_LONG = 10*1024*1024  # 10 MB
big_url = "http://ipv4.download.thinkbroadband.com/1GB.zip"
r = requests.get(big_url, stream=True)  # stream=True defers the body download
print(r.headers['content-length'])
# 1073741824

if int(r.headers['content-length']) < TOO_LONG:
    # download content:
    content = r.content

But be careful: a sender can set an incorrect value in the 'content-length' response field.
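Since the header cannot be trusted, one way to guard against it is to count the bytes actually received while streaming. The sketch below is an illustration only: `read_capped` and `TooLarge` are hypothetical names, and the function works on any iterable of byte chunks, such as `r.iter_content(chunk_size=8192)` on a response opened with `stream=True`.

```python
class TooLarge(Exception):
    """Raised when more bytes arrive than the caller is willing to accept."""


def read_capped(chunks, max_bytes):
    """Join byte chunks, aborting as soon as the running total exceeds max_bytes."""
    received = 0
    parts = []
    for chunk in chunks:
        received += len(chunk)
        if received > max_bytes:
            # Stop immediately instead of trusting a declared Content-Length.
            raise TooLarge("received more than {} bytes".format(max_bytes))
        parts.append(chunk)
    return b"".join(parts)
```

This bounds the download size regardless of what the server declares, and it also covers responses that carry no 'content-length' header at all.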

- Fayzan qureshi · Answer 5

timeout = (connection timeout, data read timeout), or give a single argument (timeout=1)

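import requests

try:
    req = requests.request('GET', 'https://www.google.com', timeout=(1, 1))
    print(req)
except requests.ConnectTimeout:
    print("CONNECT TIME OUT")  # the first value of the tuple was exceeded
except requests.ReadTimeout:
    print("READ TIME OUT")  # the second value of the tuple was exceeded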

- Christian Long · Answer 6

There is a package called timeout-decorator that you can use to time out any Python function.

import time

import timeout_decorator

@timeout_decorator.timeout(5)
def mytest():
    print("Start")
    for i in range(1, 10):
        time.sleep(1)
        print("{} seconds have passed".format(i))

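It uses the signals approach that some answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-threaded environment).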
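For comparison, the multiprocessing variant can be sketched with the standard library alone. This is not how timeout-decorator is implemented; `run_with_hard_timeout`, `_call_and_send` and `slow_job` are hypothetical names, and the sketch assumes a Unix-like system (it asks for the "fork" start method) and a picklable return value.

```python
import multiprocessing
import time


def _call_and_send(conn, func, args):
    """Child-process entry point: run func and send the outcome to the parent."""
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_hard_timeout(func, args=(), seconds=5.0):
    """Run func(*args) in a child process and kill it after `seconds`."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like OS
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_call_and_send, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent only reads
    try:
        if not parent_conn.poll(seconds):  # nothing arrived in time
            raise TimeoutError("no result within {} s".format(seconds))
        status, value = parent_conn.recv()
    finally:
        if proc.is_alive():
            proc.terminate()  # hard stop: the child gets no chance to clean up
        proc.join()
        parent_conn.close()
    if status == "error":
        raise RuntimeError(value)
    return value


def slow_job():
    time.sleep(5)  # stand-in for a request that hangs
    return "done"
```

A call such as `run_with_hard_timeout(slow_job, seconds=0.3)` raises `TimeoutError` and leaves no worker behind. The price is process start-up overhead per call and the requirement that results can be pickled.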

- ACEE · Answer 7

This code works for socket errors 11004 and 10060...

# -*- encoding:UTF-8 -*-
__author__ = 'ACE'
import requests
from PyQt4.QtCore import *
from PyQt4.QtGui import *


class TimeOutModel(QThread):
    Existed = pyqtSignal(bool)
    TimeOut = pyqtSignal()

    def __init__(self, fun, timeout=500, parent=None):
        """
        @param fun: function or lambda
        @param timeout: ms
        """
        super(TimeOutModel, self).__init__(parent)
        self.fun = fun

        self.timeer = QTimer(self)
        self.timeer.setInterval(timeout)
        self.timeer.timeout.connect(self.time_timeout)
        self.Existed.connect(self.timeer.stop)
        self.timeer.start()

        self.setTerminationEnabled(True)

    def time_timeout(self):
        self.timeer.stop()
        self.TimeOut.emit()
        self.quit()
        self.terminate()

    def run(self):
        self.fun()


bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip")

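a = QApplication([])


def on_timeout():
    # Called in the GUI thread once the timer has fired
    print('timeout')
    a.quit()


z = TimeOutModel(bb, 500)
z.TimeOut.connect(on_timeout)
z.start()  # without start() the request never runs

a.exec_()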

- DovaX · Answer 8

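The biggest problem is that if the connection can't be established, the requests package waits too long and blocks the rest of the program.

There are several ways to tackle this, but when I looked for a one-liner similar to requests, I couldn't find anything. That's why I built a wrapper around requests called reqto ("requests timeout"), which supports a proper timeout for all the standard methods.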

pip install reqto

The syntax is identical to requests

import reqto

response = reqto.get('https://pypi.org/pypi/reqto/json', timeout=1)
# Will raise an exception on Timeout
print(response)

Moreover, you can set up a custom timeout function.

def custom_function(parameter):
    print(parameter)


response = reqto.get('https://pypi.org/pypi/reqto/json', timeout=5,
                     timeout_function=custom_function,
                     timeout_args="Timeout custom function called")
# Will call timeout_function instead of raising an exception on Timeout
print(response)

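An important note: the import line

import reqto

needs to come before any other imports that work with requests, threading, etc., because monkey_patch is run in the background.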

- technico · Answer 9

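I've tried many of the solutions on this page and still ran into instabilities, random hangs and poor connection performance.

I'm now using Curl, and I'm really happy with its "max time" functionality and with the overall performance, even with such a poor implementation: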

import subprocess  # Python 3 replacement for the removed commands module
content = subprocess.getoutput('curl -m6 -Ss "http://mywebsite.xyz"')

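Here, I defined a max time of 6 seconds, which covers both the connection time and the transfer time.

I'm sure Curl has a nice Python binding, if you prefer to stick to Python syntax :)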

- Dima Tisnek · Answer 10

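If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g.:

closes the underlying socket, and ideally
triggers an exception if requests retries the operation

Note that depending on the system libraries, you may be unable to set a deadline on DNS resolution.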
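The watchdog idea can be illustrated on a plain socket. Reaching the socket underneath a requests response is not part of its public API, so the sketch below deliberately does not touch requests; `recv_with_watchdog` is a hypothetical helper. It uses `shutdown()` rather than `close()`, on the assumption (true on common Unix-like systems) that shutting a socket down wakes up a `recv` blocked in another thread.

```python
import socket
import threading


def recv_with_watchdog(sock, nbytes, seconds):
    """Blocking recv that a watchdog thread interrupts after `seconds`."""
    fired = threading.Event()

    def _abort():
        fired.set()
        try:
            sock.shutdown(socket.SHUT_RDWR)  # wakes up the blocked recv
        except OSError:
            pass  # socket was already closed or disconnected

    watchdog = threading.Timer(seconds, _abort)
    watchdog.daemon = True
    watchdog.start()
    try:
        data = sock.recv(nbytes)
    except OSError as exc:
        if fired.is_set():
            raise TimeoutError("watchdog fired after {} s".format(seconds)) from exc
        raise
    finally:
        watchdog.cancel()  # no-op if the timer has already fired
    if fired.is_set():
        # recv returned because of the shutdown, not because data arrived
        raise TimeoutError("watchdog fired after {} s".format(seconds))
    return data
```

The DNS caveat above still applies: the watchdog can only break a socket that already exists, so time spent resolving the host name is not covered.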