Python中的requests.get出现连接超时错误

4

语言版本:Python 3.6.3
IDE 版本:PyCharm 2017.2.3

我正在尝试解析一个天气网站以打印某个地方的天气。由于我正在学习Python,之前我使用了 urllib.request.urlopen(url).read() 并且它可以工作。现在,我正在修改代码以使用 BeautifulSoup4requests 模块。以下是我的代码:

from bs4 import *
import requests
url = "https://www.accuweather.com/en/in/dhenkanal/189844/weather-forecast/189844"
data = requests.get(url)
soup = BeautifulSoup(data.text, "html.parser")
print(soup.find('div', {'class': 'info'}))

但是每次我尝试运行代码时,都会出现以下错误:
Traceback (most recent call last):
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
chunked=chunked)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 387, in _make_request
six.raise_from(e, None)
File "", line 2, in raise_from
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 383, in _make_request
httplib_response = conn.getresponse()
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1331, in getresponse
response.begin()
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 297, in begin
version, status, reason = self._read_status()
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 258, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\socket.py", line 586, in readinto
return self._sock.recv_into(b)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 1009, in recv_into
return self.read(nbytes, buffer)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 871, in read
return self._sslobj.read(len, buffer)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 631, in read
v = self._sslobj.read(len, buffer)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\adapters.py", line 440, in send
timeout=timeout
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\util\retry.py", line 357, in increment
raise six.reraise(type(error), error, _stacktrace)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\packages\six.py", line 685, in reraise
raise value.with_traceback(tb)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
chunked=chunked)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 387, in _make_request
six.raise_from(e, None)
File "", line 2, in raise_from
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 383, in _make_request
httplib_response = conn.getresponse()
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1331, in getresponse
response.begin()
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 297, in begin
version, status, reason = self._read_status()
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 258, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\socket.py", line 586, in readinto
return self._sock.recv_into(b)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 1009, in recv_into
return self.read(nbytes, buffer)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 871, in read
return self._sslobj.read(len, buffer)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 631, in read
v = self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', TimeoutError(10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond', None, 10060, None))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:/Projects/Python/Practice/Practice1.py", line 5, in 
data = requests.get(url)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "C:\Users\Nrusingh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\adapters.py", line 490, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', TimeoutError(10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond', None, 10060, None))

Process finished with exit code 1  

这是什么错误,如何纠正它?为什么在urllib中可以工作,但在requests中却不能?

很抱歉要添加外部链接,因为我不知道如何将错误日志添加到问题中,而且stackoverflow也不允许我在问题中添加我的错误日志。 - Nrusingh Prasad Acharya
已编辑,这是一个大问题 :) - roganjosh
短回答,使用名为“用户代理”的标题。回答如下 :) - dina
2个回答

5

我直接使用了您的代码,但是我得到了相同的错误。然后我按照浏览器发送请求的方式进行了跟踪。有些服务器如果请求中没有预期的头信息,它们将不会响应,并用作后端处理的一部分。原来这个服务器正在查找一个叫做user-agent的头信息,通常用于确定请求来自哪个客户端。现在,下面是修正过的代码,它可以正常工作:

from bs4 import *
import requests
url = "https://www.accuweather.com/en/in/dhenkanal/189844/weather-forecast/189844"

headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

data = requests.get(url, headers=headers)
soup = BeautifulSoup(data.text, "html.parser")

现在你可以玩弄你的汤!实际上,您可以传递更多的标头,如accept、dnt、pragma、accept-language、cache-control等。这些http标头的说明是另一个问题,另一个时间再说。希望它有所帮助 :)

2

尝试增加您的requests.get方法的超时参数:

requests.get(url, headers=headers, timeout=5)

但是,如果您的脚本被服务器阻止以防止爬取尝试。如果是这种情况,您可以通过设置适当的标头来伪造网络浏览器。

{"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)", "Referer": "http://example.com"}

你最终的代码

import requests
url = "https://www.accuweather.com/en/in/dhenkanal/189844/weather-forecast/189844"
headers = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)", "Referer": "http://example.com"}
data = requests.get(url,headers=headers,timeout=5)

我甚至无法在浏览器中加载链接。 120的超时时间不足以防止GET错误。 - roganjosh
我在5秒超时后得到了结果。 - Argus Malware
1
在头部添加用户代理,以避免他们的脚本阻止爬取。 - Argus Malware

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接