Python Requests library redirect new URL

Question

Tags: python, http, redirect, python-requests

145

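I've been looking through the Python Requests documentation, but I can't find anything for what I'm trying to do.

In my script I am setting allow_redirects=True.

I would like to know whether the page was redirected somewhere else, and if so, what the new URL is.

For example, if the start URL was: www.google.com/redirect

and the final URL is www.google.co.uk/redirected

How do I get that URL?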

- Daniel Pilch

See this answer for handling this with urllib2. - logi-kal

Please check my solution, which uses a web browser, here (https://stackoverflow.com/questions/62503861/to-get-redirected-url-with-requests/70869177#70869177). - Shahin Shirazi

8 Answers

96

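This answers a slightly different question, but since I got stuck on this myself, I hope it's useful to someone else. If you want to use allow_redirects=False and get the first redirect directly, rather than following the whole chain of redirects, and you just want the redirect location from the 302 response object, then r.url won't work. Instead, it's in the "Location" header: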

r = requests.get('http://github.com/', allow_redirects=False)
r.status_code  # 302
r.url  # http://github.com, not https.
r.headers['Location']  # https://github.com/ -- the redirect destination

- hwjp

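Thanks - this sped up my URL referral script (which has thousands of URLs) by several seconds. - ahinkle

Do you know what's up with r.next? I thought it would contain a PreparedRequest pointing to the redirect URL, but that doesn't seem to be the case... - Elias Strehle

It's worth mentioning that this answer only gives you the first redirect URL. If that URL would itself redirect to yet another URL, you will miss it. - Nioooooo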
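
A caveat when reading the header directly: the `Location` value is allowed to be a relative reference, so it may need to be resolved against the URL that was requested. Below is a minimal sketch using the standard library's `urljoin`; the helper name `resolve_location` is made up for illustration and is not part of Requests.

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Return the absolute redirect target for a Location header value.

    `location` may be an absolute URL, a relative reference, or None
    (meaning the response carried no Location header).
    """
    if not location:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the URL that was originally requested.
    return urljoin(request_url, location)
```

With the response from the answer above, this would be called as `resolve_location(r.url, r.headers.get('Location'))`.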

52

I think requests.head is safer to call than requests.get when handling URL redirects. See a GitHub issue here:

r = requests.head(url, allow_redirects=True)
print(r.url)

- Geng Jiawen

4

This should be the accepted answer. Short and sweet. - Volatil3

8

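Not all servers respond to a HEAD request the same way they respond to a GET. - Blender

This worked really well for me for extracting the final redirected URL, and saved a lot of manual work on 30,000 URLs. - Ashish Tripathi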
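
Since some servers treat HEAD differently from GET, one possible compromise is to try HEAD first and fall back to GET only when HEAD returns an error status. The sketch below assumes `session` is a `requests.Session` (or anything with compatible `head` and `get` methods). The name `final_url` and the "status 400 or above" trigger are illustrative choices, not something the answers prescribe.

```python
def final_url(url, session, timeout=10):
    """Return the URL reached after following redirects.

    Tries a cheap HEAD request first; if the server answers with an
    error status, retries with GET, since some servers reject or
    mishandle HEAD.
    """
    response = session.head(url, allow_redirects=True, timeout=timeout)
    if response.status_code >= 400:
        # stream=True avoids downloading the body; we only need the URL.
        response = session.get(
            url, allow_redirects=True, timeout=timeout, stream=True
        )
        response.close()  # release the connection without reading the body
    return response.url
```

Typical use would be `final_url('http://github.com/', requests.Session())` after `import requests`.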

47

The documentation has this blurb: https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history

import requests

r = requests.get('http://www.github.com')
r.url
#returns https://www.github.com instead of the http page you asked for

- Back2Basics

13

For Python 3.5, you can use the following code:

import urllib.request
res = urllib.request.urlopen(starturl)
finalurl = res.geturl()
print(finalurl)

- Shuai.Z

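This is the correct answer for Python 3.5. It took me a while to find, thanks. - jjj

Could you complete your answer, if you have found out how to handle the redirect with Python 3? Thanks. - Vladimir Despotovic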

2

I wrote the following function to get the full URL from a short URL (bit.ly, t.co, ...):

import requests

def expand_short_url(url):
    r = requests.head(url, allow_redirects=False)
    r.raise_for_status()
    if 300 < r.status_code < 400:
        url = r.headers.get('Location', url)

    return url

Usage (the short URL points to this question):

short_url = 'https://tinyurl.com/' + '4d4ytpbx'
full_url = expand_short_url(short_url)
print(full_url)

Output:

https://dev59.com/questions/WmIj5IYBdhLWcg3wPy4a

- Jossef Harush Kadouri

0

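All of the answers apply when the final URL exists and works. If the final URL doesn't work, the following is a way to capture all of the redirects. I had a case where the final URL no longer worked, and other approaches (such as the URL history) raised an error.
Code snippet: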

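import requests

long_url = ''
url = 'http://example.com/bla-bla'
try:
    while True:
        # requests.head does not follow redirects by default, so each call
        # returns one hop; a missing Location header ends the loop.
        long_url = requests.head(url).headers['location']
        print(long_url)
        url = long_url
except (KeyError, requests.RequestException):
    # Either there is no further redirect or the last URL is unreachable.
    print(long_url)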

- Tushar
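
A hop-by-hop loop like the one above can spin forever on a redirect cycle, so a bounded variant may be safer. The sketch below assumes `session` is a `requests.Session`. The name `trace_redirects` and the `max_hops` limit are illustrative. It catches `OSError` because the exceptions Requests raises derive from it.

```python
from urllib.parse import urljoin


def trace_redirects(url, session, max_hops=10, timeout=10):
    """Follow Location headers one hop at a time.

    Returns every URL seen, including the last one even if it turns
    out to be unreachable.
    """
    chain = [url]
    for _ in range(max_hops):
        try:
            response = session.head(
                chain[-1], allow_redirects=False, timeout=timeout
            )
        except OSError:
            break  # the last URL in the chain could not be fetched
        location = response.headers.get('Location')
        if not location:
            break  # no further redirect
        # Location may be relative, so resolve it against the current URL.
        next_url = urljoin(chain[-1], location)
        if next_url in chain:
            break  # redirect loop detected
        chain.append(next_url)
    return chain
```

Calling `trace_redirects('http://example.com/bla-bla', requests.Session())` would return the list of URLs visited, with the final entry being the last redirect target that was seen.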

-1

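I wasn't able to use the requests library and had to take a different approach. Here is the code I posted as a solution to this thread. (To get the redirected URL with requests)

This approach actually opens a browser, waits for the browser to log the URL in its history, and then reads the last URL in your history. I wrote this code for Google Chrome, but you should be able to follow along if you are using a different browser.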

import webbrowser
import sqlite3
import pandas as pd
import shutil
import time  # needed for the time.sleep() call below

webbrowser.open("https://twitter.com/i/user/2274951674")
#source file is where the history of your webbroser is saved, I was using chrome, but it should be the same process if you are using different browser
source_file = 'C:\\Users\\{your_user_id}\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\History'
# could not directly connect to history file as it was locked and had to make a copy of it in different location
destination_file = 'C:\\Users\\{user}\\Downloads\\History'
time.sleep(30) # there is some delay to update the history file, so 30 sec wait give it enough time to make sure your last url get logged
shutil.copy(source_file,destination_file) # copying the file.
con = sqlite3.connect('C:\\Users\\{user}\\Downloads\\History')#connecting to browser history
cursor = con.execute("SELECT * FROM urls")
names = [description[0] for description in cursor.description]
urls = cursor.fetchall()
con.close()
df_history = pd.DataFrame(urls,columns=names)
last_url = df_history.loc[len(df_history)-1,'url']
print(last_url)

>>https://twitter.com/ozanbayram01

- Shahin Shirazi

- Martijn Pieters · Accepted Answer

You are looking for the request history. The response.history attribute is a list of the responses that led to the final URL, and the final URL can be found in response.url.

response = requests.get(someurl)
if response.history:
    print("Request was redirected")
    for resp in response.history:
        print(resp.status_code, resp.url)
    print("Final destination:")
    print(response.status_code, response.url)
else:
    print("Request was not redirected")

Demo:

>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/3')
>>> response.history
(<Response [302]>, <Response [302]>, <Response [302]>)
>>> for resp in response.history:
...     print(resp.status_code, resp.url)
... 
302 http://httpbin.org/redirect/3
302 http://httpbin.org/redirect/2
302 http://httpbin.org/redirect/1
>>> print(response.status_code, response.url)
200 http://httpbin.org/get
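
If the chain is needed as data rather than printed, the same two attributes can be folded into a small helper. This is only a sketch: `redirect_chain` is a made-up name, and it relies on nothing beyond the `history`, `status_code` and `url` attributes shown above.

```python
def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, final response last."""
    hops = [(resp.status_code, resp.url) for resp in response.history]
    hops.append((response.status_code, response.url))
    return hops


def was_redirected(response):
    """True if at least one redirect was followed to reach `response`."""
    return bool(response.history)
```

For the httpbin demo above, `redirect_chain(response)` would contain the three 302 hops followed by the final 200 entry.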