Python 3: Get an HTTP page

Question

python http python-3.x

31

How can I get the contents of an HTTP page using Python? So far all I have is the request, and I have imported the http.client module.

- BiscottiGummyBears

6 Answers

14

Using the built-in module "http.client"

import http.client

connection = http.client.HTTPSConnection("api.bitbucket.org", timeout=2)
connection.request('GET', '/2.0/repositories')
response = connection.getresponse()
print('{} {} - a response on a GET request by using "http.client"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "http.client"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Using the third-party library "requests"

import requests

response = requests.get("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "requests"'.format(response.status_code, response.reason))
content = response.content.decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "requests"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Using the built-in module "urllib.request"

import urllib.request

response = urllib.request.urlopen("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "urllib.request"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "urllib.request"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Notes:

Python 3.4
The responses will most likely differ only in content

- PADYMKO

2

You can also use the requests library. I found it particularly useful because it makes it easier to retrieve and display the HTTP headers.

import requests

source = 'http://www.pythonlearn.com/code/intro-short.txt'

r = requests.get(source)

print('Display actual page\n')
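# r.text is the decoded body; iterating over r itself yields raw 128-byte chunks, not lines
for line in r.text.splitlines():
    print(line.strip())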

print('\nDisplay all headers\n')
print(r.headers)

- dimsum88

Is this Python 3? - Nam G VU

1

pip install requests

import requests

r = requests.get('https://api.spotify.com/v1/search?type=artist&q=beyonce')
r.json()

- Anthony Awuley

0

Add the following code to decode the data into a human-readable form:

text = f.read().decode('utf-8')

- SKGoC
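The answer above hard-codes UTF-8. As a minimal sketch, assuming the server declares its encoding in the Content-Type header, the charset can be read from that header instead. The helper name `decode_body` and its `default` parameter are hypothetical, introduced here only for illustration.

```python
from email.message import Message


def decode_body(raw: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode raw response bytes using the charset named in a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter, or None.
    charset = msg.get_content_charset() or default
    # errors="replace" substitutes undecodable bytes instead of raising.
    return raw.decode(charset, errors="replace")
```

With a response object `f` from `urllib.request.urlopen`, the same lookup is available directly as `f.headers.get_content_charset()`, so a call might look like `decode_body(f.read(), f.headers.get("Content-Type", ""))`.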

0

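https://dev59.com/4WAf5IYBdhLWcg3w52Ll#41862742 Take a look at this. It is similar to your problem, very simple, and takes only a few lines of code. It really helped me a lot when I realized that Python 3 can't simply use get_page.

It's a good alternative. (Hope this helps, cheers!)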

- buda__

- Greg Hewgill · Accepted Answer

56

Using urllib.request is probably the easiest way to do this:

import urllib.request
f = urllib.request.urlopen("http://stackoverflow.com")
print(f.read())

- Greg Hewgill
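Note that `f.read()` returns bytes, so the `print` above shows a `b'...'` literal. A slightly fuller sketch of the same urllib.request approach follows; it closes the connection with a context manager and decodes the body to text. The helper name `fetch_text`, the 10-second timeout, and the UTF-8 fallback are assumptions for illustration, not part of the answer.

```python
import urllib.request


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Return the body of `url` as text (hypothetical helper)."""
    # The context manager closes the connection even if reading fails.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # raw bytes of the body
        # Use the charset from the Content-Type header; assume UTF-8 if absent.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


# Example call (needs network access):
# print(fetch_text("http://stackoverflow.com")[:200])
```

`urlopen` raises `urllib.error.URLError` (or its subclass `HTTPError`) on failure, so callers may want to wrap the call in a `try` block.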

Tried that, I got "AttributeError: 'module' object has no attribute 'urlopen'" - BiscottiGummyBears

1

Sorry, I just noticed that you are using Python 3. I have updated my example to match. - Greg Hewgill

2

@Davide Gualano: The urllib2 module from Python 2.x has been merged into the urllib set of modules in Python 3.x: http://docs.python.org/library/urllib2.html - Greg Hewgill

@Greg: My bad, I didn't read the question title carefully :) - Davide Gualano
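To illustrate the module merge mentioned in the comments, here is a rough sketch of where the common Python 2 names live in Python 3. The helpers `build_request` and `fetch` are hypothetical and exist only to show the imports in context; returning empty bytes on failure is a simplification.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_request(base_url: str, params: dict) -> urllib.request.Request:
    """Build a GET request with a query string (hypothetical helper)."""
    # Python 2: urllib.urlencode  ->  Python 3: urllib.parse.urlencode
    query = urllib.parse.urlencode(params)
    # Python 2: urllib2.Request   ->  Python 3: urllib.request.Request
    return urllib.request.Request(base_url + "?" + query)


def fetch(request: urllib.request.Request, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw body, or b"" on failure."""
    try:
        # Python 2: urllib2.urlopen  ->  Python 3: urllib.request.urlopen
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    # Python 2: urllib2.URLError  ->  Python 3: urllib.error.URLError
    except urllib.error.URLError:
        return b""
```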