Host header missing from HTTP requests made with the Python requests library

9

Where is the Host header field, which HTTP/1.1 requires, in the HTTP request message generated by the Python requests library?

import requests

response = requests.get("https://www.google.com/")
print(response.request.headers)

Output:

{'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}


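Use https://httpbin.org/get to see all the headers the server received: response = requests.get("https://httpbin.org/get"); print(response.content). It looks like Host is sent, but it is not shown afterwards. If you set a wrong Host with get(..., headers={'Host': 'example.com'}), the request still works, so the field does not seem to be that strictly required, and the header then does show up in response.request.headers. - furas
1 Answer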

7
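requests does not add a Host header to the request by default. If one is not added explicitly, the decision is delegated to the underlying http module.
See this part of http/client.py:
(skip_host is True if a 'Host' header is explicitly provided in requests.get)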
    if self._http_vsn == 11:
        # Issue some standard headers for better HTTP/1.1 compliance

        if not skip_host:
            # this header is issued *only* for HTTP/1.1
            # connections. more specifically, this means it is
            # only issued when the client uses the new
            # HTTPConnection() class. backwards-compat clients
            # will be using HTTP/1.0 and those clients may be
            # issuing this header themselves. we should NOT issue
            # it twice; some web servers (such as Apache) barf
            # when they see two Host: headers

            # If we need a non-standard port,include it in the
            # header.  If the request is going through a proxy,
            # but the host of the actual URL, not the host of the
            # proxy.

            netloc = ''
            if url.startswith('http'):
                nil, netloc, nil, nil, nil = urlsplit(url)

            if netloc:
                try:
                    netloc_enc = netloc.encode("ascii")
                except UnicodeEncodeError:
                    netloc_enc = netloc.encode("idna")
                self.putheader('Host', netloc_enc)
            else:
                if self._tunnel_host:
                    host = self._tunnel_host
                    port = self._tunnel_port
                else:
                    host = self.host
                    port = self.port

                try:
                    host_enc = host.encode("ascii")
                except UnicodeEncodeError:
                    host_enc = host.encode("idna")

                # As per RFC 273, IPv6 address should be wrapped with []
                # when used as Host header

                if host.find(':') >= 0:
                    host_enc = b'[' + host_enc + b']'

                if port == self.default_port:
                    self.putheader('Host', host_enc)
                else:
                    host_enc = host_enc.decode("ascii")
                    self.putheader('Host', "%s:%s" % (host_enc, port)) 

So when we inspect the headers that requests sent to the server, we do not see the 'Host' header.
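The behaviour of the quoted code can be observed without any network access. The following is a minimal sketch, not part of requests or http.client: RecordingConnection and wire_request are hypothetical names, and overriding send() to collect the bytes instead of writing them to a socket is an assumption made purely for illustration.

```python
import http.client


class RecordingConnection(http.client.HTTPConnection):
    """Hypothetical helper: collects outgoing bytes instead of sending them."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.sent = b""

    def send(self, data):
        # No socket is opened; we only record what would go on the wire.
        self.sent += data


def wire_request(host, port=None, skip_host=False):
    """Return the request head http.client would emit for GET /get."""
    conn = RecordingConnection(host, port)
    # putrequest() runs the code quoted above and adds Host unless skip_host is set.
    conn.putrequest("GET", "/get", skip_host=skip_host)
    conn.putheader("User-Agent", "python-requests/2.22.0")
    conn.endheaders()
    return conn.sent.decode("ascii")


print(wire_request("httpbin.org"))                  # Host added by http.client
print(wire_request("httpbin.org", 8080))            # non-default port is appended
print(wire_request("httpbin.org", skip_host=True))  # no Host line at all
```

The Host line appears in the first two requests even though the caller never passed one, which matches the point that the header is added below the layer where requests keeps its own header dictionary.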

If we send a request to http://httpbin.org/get and print the response, we can see that the Host header was indeed sent.

import requests

response = requests.get("http://httpbin.org/get")
print('Response from httpbin/get')
print(response.json())
print()
print('response.request.headers')
print(response.request.headers)

Output:

Response from httpbin/get
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 
 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.20.0'},
 'origin': 'XXXXXX', 'url': 'https://httpbin.org/get'}

response.request.headers
{'User-Agent': 'python-requests/2.20.0', 'Accept-Encoding': 'gzip, deflate', 
 'Accept': '*/*', 'Connection': 'keep-alive'}
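The same difference can be seen on the prepared request alone, before anything is sent. This is a small sketch, assuming the requests package is installed; the names default and explicit are only illustrative, and example.com is the deliberately wrong Host value from the comment above.

```python
import requests

session = requests.Session()

# Headers that requests prepares on its own: session defaults only, no Host.
default = session.prepare_request(
    requests.Request("GET", "https://httpbin.org/get")
)
print(default.headers)

# An explicitly supplied Host is stored on the prepared request, so it is
# the one case where response.request.headers would later show the header.
explicit = session.prepare_request(
    requests.Request("GET", "https://httpbin.org/get", headers={"Host": "example.com"})
)
print(explicit.headers)
```

No request is sent here; prepare_request only builds the PreparedRequest object that response.request would refer to after a real call.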
