How can I get the contents of an HTTP page using Python? So far all I have is a request object, and I have imported the http.client module.
Using urllib.request
This is probably the simplest way to do it:
import urllib.request
f = urllib.request.urlopen("http://stackoverflow.com")
print(f.read())
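Note that f.read() above returns bytes and the connection is never closed explicitly. A minimal sketch of the same call with a with-block, a timeout, and decoding follows. fetch_text is a hypothetical helper name, and UTF-8 is only an assumed default encoding, not something the server is guaranteed to use.

```python
import urllib.request


def fetch_text(url, timeout=10.0, encoding="utf-8"):
    """Return the body of `url` as a str (hypothetical helper)."""
    # The with-block closes the underlying socket even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, not str
    # errors="replace" avoids a UnicodeDecodeError when the page is not
    # really in the assumed encoding.
    return raw.decode(encoding, errors="replace")
```

It would be called as print(fetch_text("http://stackoverflow.com")[:100]).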
Using the built-in module "http.client"
import http.client
connection = http.client.HTTPSConnection("api.bitbucket.org", timeout=2)
connection.request('GET', '/2.0/repositories')
response = connection.getresponse()
print('{} {} - a response on a GET request by using "http.client"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')
Result:
200 OK - a response on a GET request by using "http.client"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...
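The http.client snippet above leaves the connection open. As a rough sketch, the same steps can be wrapped so the socket is always released. http_get and its parameters are hypothetical names chosen for illustration, and decoding as UTF-8 is an assumption about the server.

```python
import http.client


def http_get(host, path="/", port=None, use_https=True, timeout=5.0):
    """Hypothetical helper: GET `path` from `host`, return (status, reason, text)."""
    conn_cls = http.client.HTTPSConnection if use_https else http.client.HTTPConnection
    connection = conn_cls(host, port, timeout=timeout)
    try:
        connection.request("GET", path)
        response = connection.getresponse()
        # Read the body before closing; the response shares the socket.
        body = response.read().decode("utf-8", errors="replace")
        return response.status, response.reason, body
    finally:
        connection.close()  # always release the socket
```

For the example above this would be status, reason, content = http_get("api.bitbucket.org", "/2.0/repositories").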
Using the third-party library "requests"
import requests
response = requests.get("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "requests"'.format(response.status_code, response.reason))
content = response.content.decode('utf-8')
print(content[:100], '...')
Result:
200 OK - a response on a GET request by using "requests"
{"pagelen": 10,"values": [{"scm": "hg","website": "","has_wiki": true,"name": "tweakmsg","links ...
Using the built-in module "urllib.request"
import urllib.request
response = urllib.request.urlopen("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "urllib.request"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')
Result:
200 OK - a response on a GET request by using "urllib.request"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...
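One difference worth knowing: urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses instead of returning them, so response.status is only ever seen for successful requests. A minimal sketch that returns the status either way is below. get_status_and_body is a hypothetical name, and the UTF-8 decoding is an assumption.

```python
import urllib.error
import urllib.request


def get_status_and_body(url, timeout=10.0):
    """Hypothetical helper: return (status, text) for success and HTTP errors alike."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The exception still carries the status code and the error page body.
        return err.code, err.read().decode("utf-8", errors="replace")
    # urllib.error.URLError (DNS failure, refused connection, ...) is not
    # caught here and propagates to the caller.
```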
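import requests
source = 'http://www.pythonlearn.com/code/intro-short.txt'
r = requests.get(source)
print('Display actual page\n')
# Iterating over r directly yields fixed-size byte chunks, not lines,
# so split the decoded text into lines instead.
for line in r.text.splitlines():
    print(line.strip())
print('\nDisplay all headers\n')
print(r.headers)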
pip install requests
import requests
r = requests.get('https://api.spotify.com/v1/search?type=artist&q=beyonce')
r.json()
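r.json() parses the response body as JSON and returns ordinary Python objects. If installing requests is not an option, a rough standard-library counterpart could look like the following. This is a swapped-in technique and not how requests works internally; fetch_json is a hypothetical name, and the sketch assumes the endpoint returns UTF-8 encoded JSON without requiring authentication.

```python
import json
import urllib.request


def fetch_json(url, timeout=10.0):
    """Hypothetical helper: GET `url` and parse the body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumes the body is UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))
```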
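Adding the following decodes the raw bytes into readable text (f here is presumably the object returned by urllib.request.urlopen):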
text = f.read().decode('utf-8')
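Hard-coding 'utf-8' works for most pages but garbles ones served in another encoding. A small sketch that takes the charset from the Content-Type header when one is present is below. decode_body is a hypothetical helper, and falling back to UTF-8 is an assumption rather than a rule.

```python
from email.message import Message


def decode_body(raw, content_type="", default="utf-8"):
    """Hypothetical helper: decode bytes using the charset named in Content-Type."""
    header = Message()
    if content_type:
        header["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is given.
    charset = header.get_content_charset() or default
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The server named an encoding Python does not know; use the default.
        return raw.decode(default, errors="replace")
```

With the f from above this would be text = decode_body(f.read(), f.headers.get("Content-Type", "")).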
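https://dev59.com/4WAf5IYBdhLWcg3w52Ll#41862742 Take a look at this. It is similar to your problem, and it is very simple, with only a few lines of code. It really helped me once I realized that Python 3 cannot simply use get_page.
It is a good alternative. (Hope this helps, cheers!)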
The urllib2 module has been merged into the urllib modules in Python 3.x: http://docs.python.org/library/urllib2.html - Greg Hewgill