How to check in Python whether a URL is a web page link or a file link

Question

How to check in Python whether a URL is a web page link or a file link

5

Suppose I have links like the following:

    http://example.com/index.html
    http://example.com/stack.zip
    http://example.com/setup.exe
    http://example.com/news/

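Of the links above, the first and fourth are web page links, while the second and third are file links.

These are just a few examples of file links, such as .zip and .exe, but there could be many other file types.

Is there a standard way to tell a file URL apart from a web page link? Thanks in advance.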

- Bishwash

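Use the Content-Type of the HTTP response, together with the URL's extension, e.g. html, zip, exe. - Omid Raha

@Omid Raha: I was hoping for a built-in function that checks this. - Bishwash

OK, please check my answer. - Omid Raha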

2 Answers

1

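import urllib  # Python 2; in Python 3, urlopen lives in urllib.request

# Open the URL and list the response headers as (name, value) tuples.
mytest = urllib.urlopen('http://www.sec.gov')
mytest.headers.items()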

[('content-length', '20833'), ('expires', 'Sun, 02 Feb 2014 19:36:12 GMT'), ('server', 'SEC'), ('connection', 'close'), ('cache-control', 'max-age=0'), ('date', 'Sun, 02 Feb 2014 19:36:12 GMT'), ('content-type', 'text/html')]

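mytest.headers.items() is a list of tuples; in my example, you can see that the last item in the list describes the content.

I'm not sure whether the length of the list will vary, so you can iterate over it to find the tuple that contains "content-type".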
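
The lookup described above could be sketched like this in Python 3. The helper name `find_content_type` is hypothetical (it is not part of any library), and the sample list is a shortened copy of the headers shown earlier in this answer. Names are compared in lowercase because HTTP header names are case-insensitive, and servers differ in how they capitalise them.

```python
def find_content_type(header_items):
    """Return the value of the Content-Type header, or None if absent.

    header_items is a list of (name, value) tuples, such as the one
    returned by response.headers.items().
    """
    for name, value in header_items:
        # HTTP header names are case-insensitive, so normalise before comparing.
        if name.lower() == 'content-type':
            return value
    return None


# Sample headers, shortened from the output shown earlier in this answer.
headers = [
    ('content-length', '20833'),
    ('server', 'SEC'),
    ('connection', 'close'),
    ('content-type', 'text/html'),
]
print(find_content_type(headers))
```

With a live Python 3 response object, `response.headers.get('Content-Type')` performs the same case-insensitive lookup without an explicit loop.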

- PyNEwbie

- Omid Raha · Accepted Answer

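import urllib  # Python 2; in Python 3, urlopen lives in urllib.request
import mimetypes


def guess_type_of(link, strict=True):
    # First guess the MIME type from the URL's extension, without any network access.
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # No known extension: fetch the URL and read the Content-Type header.
        u = urllib.urlopen(link)
        link_type = u.headers.gettype() # or using: u.info().gettype()
    return link_type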

Demo:

links = ['https://dev59.com/n3zaa4cB1Zd3GeqPKwgv', # It's an HTML page
         'http://upload.wikimedia.org/wikipedia/meta/6/6d/Wikipedia_wordmark_1x.png', # It's a PNG file
         'http://commons.wikimedia.org/wiki/File:Typing_example.ogv', # It's an HTML page
         'http://upload.wikimedia.org/wikipedia/commons/e/e6/Typing_example.ogv'   # It's an OGV file
]

for link in links:
    print(guess_type_of(link))

Output:

text/html
image/x-png
text/html
application/ogg
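
For Python 3, where `urllib.urlopen` and `gettype()` no longer exist, the same idea might look like the sketch below. It is an adaptation rather than the code from this answer: the names `guess_type_of_py3`, `is_web_page` and `WEB_PAGE_TYPES` are hypothetical, the `HEAD` request (which avoids downloading the body) is an added design choice, and treating only `text/html` and `application/xhtml+xml` as web pages is an assumption. Some servers reject or mishandle `HEAD`, so a fallback to `GET` may be needed in practice.

```python
import mimetypes
import urllib.request

# Assumption: only these MIME types count as a "web page".
WEB_PAGE_TYPES = {'text/html', 'application/xhtml+xml'}


def guess_type_of_py3(link, strict=True, timeout=10):
    """Guess the MIME type of a URL, first by extension, then via HTTP."""
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # HEAD asks the server for headers only, so no body is downloaded.
        request = urllib.request.Request(link, method='HEAD')
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # get_content_type() returns the type without parameters
            # such as "; charset=utf-8".
            link_type = response.headers.get_content_type()
    return link_type


def is_web_page(mime_type):
    """Return True if the MIME type looks like an HTML page."""
    return mime_type in WEB_PAGE_TYPES


# Extension-only check (strict=False), so this runs without network access.
for link in ['http://example.com/index.html', 'http://example.com/news/']:
    mime_type = guess_type_of_py3(link, strict=False)
    print(link, mime_type, is_web_page(mime_type))
```

Because the extension is checked first, a URL whose path ends in a recognised extension is never fetched, even if the server actually returns an HTML page for it. When that matters, the Content-Type header is the more reliable signal.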