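I have been going through the Python Requests documentation, but I cannot find anything that covers what I am trying to do.
In my script I set allow_redirects=True.
I want to know whether the page was redirected somewhere else, and what the new URL is.
For example, if the start URL is www.google.com/redirect
and the final URL is www.google.co.uk/redirected,
how do I get that final URL?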
The response.history attribute is a list of the responses that led to the final URL; the final URL itself can be found in response.url.

import requests

response = requests.get(someurl)
if response.history:
    print("Request was redirected")
    for resp in response.history:
        print(resp.status_code, resp.url)
    print("Final destination:")
    print(response.status_code, response.url)
else:
    print("Request was not redirected")
Demo:
>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/3')
>>> response.history
(<Response [302]>, <Response [302]>, <Response [302]>)
>>> for resp in response.history:
...     print(resp.status_code, resp.url)
...
302 http://httpbin.org/redirect/3
302 http://httpbin.org/redirect/2
302 http://httpbin.org/redirect/1
>>> print(response.status_code, response.url)
200 http://httpbin.org/get
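If the same check is needed in several places, the logic above can be wrapped in small helpers. This is only a sketch: the names `redirect_chain` and `was_redirected` are hypothetical and not part of Requests, and the helpers rely only on the `history`, `status_code` and `url` attributes already used above.

```python
def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, ending with the final response."""
    hops = [(resp.status_code, resp.url) for resp in response.history]
    hops.append((response.status_code, response.url))
    return hops


def was_redirected(response):
    """Return True if at least one redirect was followed to reach `response`."""
    return bool(response.history)
```

With a response obtained as in the demo, redirect_chain(response)[-1] would be the final status code and URL.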
r = requests.get('http://github.com/', allow_redirects=False)
r.status_code # 302
r.url # http://github.com, not https.
r.headers['Location'] # https://github.com/ -- the redirect destination
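One caveat when reading the Location header by hand: servers may send a relative reference instead of an absolute URL. A minimal sketch of resolving it against the request URL follows; the helper name `redirect_target` is hypothetical, and it assumes a response fetched with allow_redirects=False that exposes `status_code`, `url` and `headers` as above.

```python
from urllib.parse import urljoin


def redirect_target(response):
    """Return the absolute URL a redirect response points to, or None."""
    location = response.headers.get("Location")
    if location is None or not (300 <= response.status_code < 400):
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return urljoin(response.url, location)
```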
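Any idea what is going on with r.next? I thought it would contain a PreparedRequest pointing to the redirect URL, but that does not seem to be the case... - Elias Strehle

I think calling requests.head is safer than calling requests.get when handling URL redirects. See the related GitHub issue here: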
r = requests.head(url, allow_redirects=True)
print(r.url)
The documentation has this section: https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history
import requests
r = requests.get('http://www.github.com')
r.url
#returns https://www.github.com instead of the http page you asked for
For Python 3.5, you can use the following code:
import urllib.request
res = urllib.request.urlopen(starturl)
finalurl = res.geturl()
print(finalurl)
import requests

def expand_short_url(url):
    r = requests.head(url, allow_redirects=False)
    r.raise_for_status()
    if 300 < r.status_code < 400:
        url = r.headers.get('Location', url)
    return url
Usage (the short URL points to this question's page):
short_url = 'https://tinyurl.com/' + '4d4ytpbx'
full_url = expand_short_url(short_url)
print(full_url)
Output:
https://dev59.com/questions/WmIj5IYBdhLWcg3wPy4a
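All of the answers above apply when the final URL exists and works.
If the final URL does not seem to work, the following is a way to capture all of the redirects.
I had a case where the final URL no longer worked, and the other approaches (such as the URL history) raised an error.
Code snippet: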
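import requests

long_url = ''
url = 'http://example.com/bla-bla'
try:
    # Follow one hop at a time until a response has no Location header
    # (KeyError) or the request itself fails.
    while True:
        long_url = requests.head(url).headers['location']
        print(long_url)
        url = long_url
except (KeyError, requests.RequestException):
    print(long_url)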
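A loop like the one above keeps going for as long as every response carries a Location header, so a redirect cycle would never terminate. The following is a sketch of the same idea with a hop limit and cycle detection. The name `trace_redirects` and the `max_hops` default are assumptions for illustration, and the HTTP call is passed in as the `head` parameter (for example `requests.head`) so the function itself does not depend on any particular library.

```python
from urllib.parse import urljoin


def trace_redirects(url, head, max_hops=10):
    """Return every URL visited, stopping at the first non-redirect or failure.

    `head` is any callable that takes a URL and returns an object with
    `status_code` and `headers` attributes (hypothetical parameter).
    """
    visited = [url]
    for _ in range(max_hops):
        try:
            resp = head(url)
        except Exception:
            break  # this hop failed; keep what was collected so far
        location = resp.headers.get("Location")
        if not (300 <= resp.status_code < 400) or not location:
            break  # not a redirect, so this is the end of the chain
        url = urljoin(url, location)  # Location may be relative
        if url in visited:
            break  # redirect cycle
        visited.append(url)
    return visited
```

Calling trace_redirects('http://example.com/bla-bla', requests.head) would return the list of URLs collected up to the point where the chain ends or breaks, so a dead final URL still shows up as the last entry.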
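I could not use the requests library and had to take a different approach. Here is the code I am posting as a solution. (Get redirect URL using requests)
This approach actually opens the browser, waits for the browser to log the URL in its history, and then reads the last URL in your history. I wrote this code for Google Chrome, but you should be able to follow along if you use a different browser.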
import webbrowser
import sqlite3
import time
import pandas as pd
import shutil
webbrowser.open("https://twitter.com/i/user/2274951674")
# source_file is where the browser history is saved. I was using Chrome, but the process should be the same for a different browser.
source_file = 'C:\\Users\\{your_user_id}\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\History'
# The history file was locked, so it could not be opened directly; copy it to a different location first.
destination_file = 'C:\\Users\\{user}\\Downloads\\History'
time.sleep(30)  # the history file is updated with some delay; waiting 30 seconds gives the last URL time to be logged
shutil.copy(source_file, destination_file)  # copy the file
con = sqlite3.connect(destination_file)  # connect to the copied browser history
cursor = con.execute("SELECT * FROM urls")
names = [description[0] for description in cursor.description]
urls = cursor.fetchall()
con.close()
df_history = pd.DataFrame(urls,columns=names)
last_url = df_history.loc[len(df_history)-1,'url']
print(last_url)
>>https://twitter.com/ozanbayram01
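The snippet above takes the last row of the urls table, which may not be the most recently visited page if that URL was already in the history. A lighter sketch, without pandas, asks SQLite for the newest entry directly. It assumes the copied file has a urls table with url and last_visit_time columns; that schema is an assumption about Chrome's history format and may differ between versions. The function name `last_visited_url` is hypothetical.

```python
import sqlite3


def last_visited_url(history_copy_path):
    """Return the most recently visited URL from a copied history file, or None.

    Assumes a `urls` table with `url` and `last_visit_time` columns.
    """
    con = sqlite3.connect(history_copy_path)
    try:
        row = con.execute(
            "SELECT url FROM urls ORDER BY last_visit_time DESC LIMIT 1"
        ).fetchone()
    finally:
        con.close()
    return row[0] if row else None
```

It would be called with the path of the copied file, for example last_visited_url(destination_file).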
urllib2. - logi-kal