urllib.urlopen returns an old page?

Question

4

I have a very simple HTML page (a directory listing) that I am trying to read with urllib, like this:

page = urllib.urlopen(coreRepositoryUrl).read()

The problem is that the HTML I get back is older than the current version of the page. The info() function returns the following:

Date: Fri, 19 Apr 2013 18:48:09 GMT
Server: Apache/2.0.52 (Fedora)
Content-Type: text/html; charset=UTF-8
Connection: close
Age: 481084

The page was last updated today (2013-04-25). Which component could be doing the caching?

- zeller

Can you add your link? urlopen().info() works for me with _google.com_ (PasteBin). - awesoon

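@soon It is a local build server. (Unfortunately, I can't reach pastebin through the company proxy...) But I just found a similar question, and the answer is disappointing... https://dev59.com/Q3A65IYBdhLWcg3w4CwL - zeller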

1

urllib may use its own cache (under some conditions, see tempcache and ftpcache in URLopener), which is unrelated to HTTP caching. urllib.urlcleanup() clears that cache. urllib2 doesn't cache anything. - jfs

1 Answer

- acj · Accepted Answer

Add a "Cache-Control" header with the value "max-age=0" to your request.

import urllib2
req = urllib2.Request(url)
req.add_header('Cache-Control', 'max-age=0')
resp = urllib2.urlopen(req)
content = resp.read()

With this header, every cache along the way will revalidate its cached entry.
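For readers on Python 3, where urllib2 was merged into urllib.request, the same idea can be sketched roughly as follows. The helper names build_fresh_request, fetch_fresh and describe_age are hypothetical and not part of any library. describe_age only converts the Age header value into a timedelta; an Age header normally means the response was served by a cache rather than generated by the origin server for this request, so it is a useful hint when hunting for the caching component (a proxy is a likely candidate, though that is an assumption here).

```python
import datetime
import urllib.request


def build_fresh_request(url):
    """Build a Request that asks caches on the path to revalidate."""
    req = urllib.request.Request(url)
    # Same header as in the urllib2 snippet above.
    req.add_header("Cache-Control", "max-age=0")
    return req


def describe_age(headers):
    """Return the Age header as a timedelta, or None if it is absent."""
    age = headers.get("Age")
    if age is None:
        return None
    return datetime.timedelta(seconds=int(age))


def fetch_fresh(url):
    """Fetch url with revalidation requested; return (body, age)."""
    req = build_fresh_request(url)
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        age = describe_age(resp.headers)
    return body, age
```

Calling fetch_fresh(coreRepositoryUrl) should return the page body together with the age reported by any cache. For the headers in the question, describe_age gives 5 days, 13:38:04, which matches the gap between the Date header (19 Apr) and the day the question was asked (25 Apr).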