ProxyError ("Cannot connect to proxy") when using requests.get() or requests.post() in Python

Question

python proxy request python-requests http-proxy

8

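I have two URLs to fetch data from. With my code, the first URL works, whereas the second one raises a ProxyError.

I am using the requests library in Python 3. I searched Google and this site for the problem, without success.

My code snippet is: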

    import requests

    proxies = {
      'http': 'http://user:pass@xxx.xxx.xxx.xxx:xxxx',
      'https': 'http://user:pass@xxx.xxx.xxx.xxx:xxxx',
    }

    url1 = 'https://en.oxforddictionaries.com/definition/act'
    url2 = 'https://dictionary.cambridge.org/dictionary/english/act'

    r1 = requests.get(url1, proxies=proxies)
    r2 = requests.get(url2, proxies=proxies)

url1 works fine, but url2 gives the following error:

    ProxyError: HTTPSConnectionPool(host='dictionary.cambridge.org', port=443): Max retries exceeded with url: /dictionary/english/act (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response',)))

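The same happens with requests.post().

Please explain why this happens. Is there a difference in the handshake between the two URLs?
urllib.request.urlopen works fine, so I am explicitly looking for an answer that uses requests.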

- thepunitsingh

2 Answers

0

    import time

    import requests

    PROXYPOOL_URL = 'http://127.0.0.1:5555/random'
    url = 'https://www.nmpa.gov.cn/datasearch/search-result.html'

    headers = {
        'User-Agent': 'Chrome',
        'Referer': 'https://www.nmpa.gov.cn/datasearch/home-index.html?79QlcAyHig6m=1636513393895',
        'Host': 'nmpa.gov.cn',
        'Origin': 'https://nmpa.gov.cn',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Connection': 'close'
    }


    def get_random_proxy():
        """Get a random proxy (host:port) from the local proxy pool."""
        return requests.get(PROXYPOOL_URL).text.strip()


    def start_requests(cookies):
        # Send the cookies from the first response back with the request.
        # (Set-Cookie is a response header and has no effect when sent by a client.)
        s = requests.get(url=url, headers=headers, cookies=cookies, stream=True,
                         timeout=(5, 5), verify=False)
        s.encoding = 'utf8'
        print(s.status_code)
        print(s)


    while True:
        proxy = {'http': 'http://' + get_random_proxy(), 'https': 'https://' + get_random_proxy()}
        print(proxy)
        try:
            # "Connection: close" asks the server not to keep the connection open
            res = requests.get(url='https://nmpa.gov.cn',
                               headers={'User-Agent': 'Chrome', 'Connection': 'close'},
                               proxies=proxy, timeout=10, verify=False)
            print(res.status_code)
            if res.status_code == 200:
                print(res.cookies)
                time.sleep(10)
                start_requests(res.cookies)
                break
        except Exception as error:
            # Any failure (including ProxyError): wait, then retry with a new proxy
            time.sleep(10)
            print("Connection failed:", error)

- lilin

6

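While this code may solve the problem, including an explanation of how it does so would help improve the quality of your post and would probably earn more upvotes. Remember that you are answering the question for future readers, not just the person asking now. Please edit your answer to add an explanation and indicate which limitations and assumptions apply. - Suraj Rao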

- Phoenix · Accepted Answer

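When I used the headers keyword argument and set the User-Agent string to Chrome, I was able to get a valid response for url2.

    r2 = requests.get(url2, proxies=proxies, headers={'User-Agent': 'Chrome'})

To answer your first question, the likely cause is the server-side configuration. The server may be configured to reject requests from unknown user agents or requests with a missing User-Agent header.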
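
To see what the server actually receives, it can help to inspect the User-Agent that requests sends by default and compare it with the overridden one. The following is a minimal sketch, not part of the original answer: the helper names build_session and outgoing_user_agent are hypothetical, and the requests are only prepared, never sent, so no proxy or network access is needed to run it.

```python
import requests


def build_session(user_agent, proxies=None):
    """Return a Session that sends `user_agent` (and uses `proxies`) on every request."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    if proxies:
        session.proxies.update(proxies)
    return session


def outgoing_user_agent(session, url):
    """Prepare (but do not send) a GET request and return its User-Agent header."""
    prepared = session.prepare_request(requests.Request('GET', url))
    return prepared.headers['User-Agent']


url2 = 'https://dictionary.cambridge.org/dictionary/english/act'

# What requests sends when no User-Agent is given: "python-requests/<version>"
default_ua = outgoing_user_agent(requests.Session(), url2)

# The same request through a session configured with the header from the answer
custom_ua = outgoing_user_agent(build_session('Chrome'), url2)

print(default_ua)
print(custom_ua)
```

Setting the header on a Session rather than on each call means that both get() and post() pick it up, which matters here because the question reports the same failure for both methods. Whether a given server or proxy accepts the value 'Chrome' is an assumption that has to be checked against the real endpoint.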