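According to the documentation, neither the `Content-Disposition` header nor its `filename` parameter is required. I also checked many links on the internet and did not find any responses that carried a `Content-Disposition` header. So in most cases I would not rely on it too much: if the header is present and names an attachment, use its filename; otherwise take the name from the request URL. (Note: I read the URL from `req.url`, the URL of the final response, because there may be redirects and we want the "real" filename.) I use `werkzeug` because it seems more robust and handles both quoted and unquoted filenames. In the end I came up with this solution (it uses the walrus operator `:=`, so it requires Python 3.8 or later):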

    from urllib.parse import urlparse

    import requests
    from werkzeug.http import parse_options_header


    def get_filename(url: str) -> str:
        # Any requests.exceptions.RequestException simply propagates to the caller.
        with requests.get(url) as req:
            # Prefer the server-supplied name, but only for "attachment" responses.
            if content_disposition := req.headers.get("Content-Disposition"):
                param, options = parse_options_header(content_disposition)
                if param == "attachment" and (filename := options.get("filename")):
                    return filename
            # Fall back to the last path segment of the final URL;
            # req.url reflects any redirects that requests followed.
            path = urlparse(req.url).path
            return path[path.rfind("/") + 1:]

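The URL fallback above slices the raw path, so a percent-encoded name such as `My%20Report.pdf` comes back still encoded. Below is a minimal standard-library sketch of a decoding variant; the helper name `filename_from_url` is hypothetical, and decoding is an assumption about what you want rather than part of the answer.

```python
from posixpath import basename
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of url, percent-decoded.

    Hypothetical helper. Assumption: callers pass the *final* URL
    (e.g. response.url after redirects), as the answer does with req.url.
    """
    # urlparse separates the query string and fragment, so "?download=1" is ignored.
    path = urlparse(url).path
    # basename() keeps everything after the last "/", like path[path.rfind('/') + 1:].
    # unquote() turns escapes such as "%20" back into characters.
    return unquote(basename(path))
```

Inside `get_filename`, the last two lines could become `return filename_from_url(req.url)`. Because decoding happens after splitting on `/`, an encoded `%2F` becomes a literal slash in the result, so sanitize the name before using it as a path on disk.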
I wrote some tests with `pytest` and `requests_mock`:

    import pytest
    import requests
    import requests_mock

    from main import get_filename  # the function above, saved as main.py

    TEST_URL = "https://pwrk.us/report.pdf"


    @pytest.mark.parametrize(
        "headers,expected_filename",
        [
            # Quoted filename in an attachment header.
            ({"Content-Disposition": 'attachment; filename="filename.pdf"'}, "filename.pdf"),
            # Unquoted filename containing spaces.
            (
                {"Content-Disposition": "attachment; filename=filename with spaces.pdf"},
                "filename with spaces.pdf",
            ),
            # No filename, or not an attachment: fall back to the URL.
            ({"Content-Disposition": "attachment;"}, "report.pdf"),
            ({"Content-Disposition": "inline;"}, "report.pdf"),
            # No Content-Disposition header at all.
            ({}, "report.pdf"),
        ],
    )
    def test_get_filename(headers, expected_filename):
        with requests_mock.Mocker() as m:
            m.get(TEST_URL, text="resp", headers=headers)
            assert get_filename(TEST_URL) == expected_filename


    def test_get_filename_exception():
        with requests_mock.Mocker() as m:
            # Make the mocked request fail; get_filename should let the error propagate.
            m.get(TEST_URL, exc=requests.exceptions.RequestException)
            with pytest.raises(requests.exceptions.RequestException):
                get_filename(TEST_URL)

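…0c9605301e48beda0f000000.pdf" (because that is the filename in the request), but luckily I decided to test it first. Firefox wants to save it as "Mater Sci Eng B47(1997)33.pdf". - Jongware

`content-disposition: inline; filename="Mater Sci Eng B47 (1997) 33.pdf"`. By the way, many PDF documents have a title embedded in them, but not all do, and if the PDF content is stored in binary form the title can be hard to get at. - PM 2Ring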
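The header quoted in the comments has the type `inline`, not `attachment`, so the `param == "attachment"` check in `get_filename` would skip it and fall back to the URL name. If you want to honour `filename` whatever the disposition type, here is a sketch that swaps werkzeug for the standard library's `email.message.Message`, whose `get_filename()` parses MIME-style header parameters. The names `filename_from_disposition` and `require_attachment` are hypothetical, not part of the answer.

```python
from email.message import Message
from typing import Optional


def filename_from_disposition(
    header: Optional[str], require_attachment: bool = False
) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition header, or None.

    Hypothetical helper built on the stdlib email parser instead of werkzeug.
    With require_attachment=False, "inline; filename=..." is accepted too.
    """
    if not header:
        return None
    msg = Message()
    msg["Content-Disposition"] = header
    # get_content_disposition() returns the lowercased type, e.g. "inline".
    if require_attachment and msg.get_content_disposition() != "attachment":
        return None
    # get_filename() strips surrounding quotes and decodes RFC 2231 "filename*=" values.
    # Only Content-Disposition is set here, so it returns None when there is no filename.
    return msg.get_filename()
```

With the header from the comment, `filename_from_disposition('inline; filename="Mater Sci Eng B47 (1997) 33.pdf"')` returns `Mater Sci Eng B47 (1997) 33.pdf`, while passing `require_attachment=True` restores the answer's attachment-only behaviour.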