Python - How to get the timezone from an RSS feed

Question


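I need to get the publication date of an RSS feed, and I also need to know which timezone that date is in. I store the date as UTC, and I want an additional field that holds the timezone information so that I can work with the date later using both fields.

My current code is: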

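from datetime import datetime
from time import struct_time

for entry in feed['entries']:
    if hasattr(entry, 'published'):
        if isinstance(entry.published_parsed, struct_time):
            # published_parsed is already normalized to UTC; keep year..second only
            dt = datetime(*entry.published_parsed[:6])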

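The resulting dt is the correct date and time in UTC, but I also need the original timezone. Can anyone help?

Edit:

For future reference, even though it is not part of my original question: if you need to handle non-standard timezone names (e.g. EST), you have to build a conversion table that matches your own requirements. Thanks to this answer: Parsing date/time string with timezone abbreviated name in Python?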
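A minimal sketch of such a conversion table, using only the standard library. The table contents, the `TZ_OFFSETS` name and the `parse_with_abbreviation` helper are assumptions for illustration and do not come from the linked answer; which offsets an abbreviation should map to depends on your feeds.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical conversion table: pick the offsets that match your own feeds.
TZ_OFFSETS = {
    "EST": timedelta(hours=-5),
    "EDT": timedelta(hours=-4),
    "PST": timedelta(hours=-8),
}

def parse_with_abbreviation(text):
    """Parse a date such as 'Thu, 26 Dec 2013 14:24:04 EST' into an aware datetime."""
    # Split off the trailing timezone abbreviation.
    date_part, _, abbreviation = text.rpartition(" ")
    naive = datetime.strptime(date_part, "%a, %d %b %Y %H:%M:%S")
    name = abbreviation.upper()
    offset = TZ_OFFSETS[name]  # raises KeyError for names missing from the table
    return naive.replace(tzinfo=timezone(offset, name))

dt = parse_with_abbreviation("Thu, 26 Dec 2013 14:24:04 EST")
print(dt)  # 2013-12-26 14:24:04-05:00
print(dt.astimezone(timezone.utc))
```

If you already depend on dateutil, its parser.parse function accepts a tzinfos mapping that serves the same purpose.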

- Meir

Regarding your edit: use pytz to work with timezones in Python. - jfs

1 Answer


- Omid Raha · Accepted Answer

You can use the parser.parse method from the dateutil package.

For example, for Stack Overflow:

import feedparser
from dateutil import parser, tz

url = 'http://stackoverflow.com/feeds/tag/python'
feed = feedparser.parse(url)
published = feed.entries[0].published
dt = parser.parse(published)

print(published)  # raw date string from the feed
print(dt)  # timezone-aware datetime
print(dt.utcoffset())  # offset of the original timezone from UTC
print(dt.astimezone(tz.tzutc()))  # the same instant, converted to UTC

2012-11-28T19:07:32Z
2012-11-28 19:07:32+00:00
0:00:00
2012-11-28 19:07:32+00:00

You can see that published ends with Z, which means the timezone is UTC.

See the history of date formats in feedparser:

Atom 1.0 states that all date elements “MUST conform to the date-time 
production in RFC 3339. 
In addition, an uppercase T character MUST be used to separate date and time, 
and an uppercase Z character MUST be present in the absence of 
a numeric time zone offset.”

Another example:

import feedparser
from dateutil import parser, tz

url = 'http://omidraha.com/rss/'
feed = feedparser.parse(url)
published = feed.entries[0].published
dt = parser.parse(published)

print(published)  # raw date string from the feed
print(dt)  # timezone-aware datetime
print(dt.utcoffset())  # offset of the original timezone from UTC
print(dt.astimezone(tz.tzutc()))  # the same instant, converted to UTC

Thu, 26 Dec 2013 14:24:04 +0330
2013-12-26 14:24:04+03:30
3:30:00
2013-12-26 10:54:04+00:00

But this also depends on the format of the date data you receive and on the feed type.
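To tie this back to the question (one UTC field plus one timezone field), here is a rough sketch that uses only the standard library. It assumes the feed delivers either an RFC 822 date (typical for RSS 2.0) or an RFC 3339 date (Atom); `split_utc_and_offset` is a hypothetical helper name, and storing the offset as seconds is just one possible choice.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def split_utc_and_offset(published):
    """Return (utc_datetime, offset_seconds) for an RSS or Atom date string."""
    try:
        # RSS 2.0 style (RFC 822), e.g. "Thu, 26 Dec 2013 14:24:04 +0330"
        dt = parsedate_to_datetime(published)
    except (TypeError, ValueError):
        # Atom style (RFC 3339), e.g. "2012-11-28T19:07:32Z"
        dt = datetime.fromisoformat(published.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        raise ValueError("date string carries no timezone information")
    offset_seconds = int(dt.utcoffset().total_seconds())
    return dt.astimezone(timezone.utc), offset_seconds

utc_dt, offset = split_utc_and_offset("Thu, 26 Dec 2013 14:24:04 +0330")
print(utc_dt)  # 2013-12-26 10:54:04+00:00
print(offset)  # 12600

# Rebuild the original local time from the two stored fields.
local_dt = utc_dt.astimezone(timezone(timedelta(seconds=offset)))
print(local_dt)  # 2013-12-26 14:24:04+03:30
```

Note that a fixed offset is not a full timezone: it records how far the publisher was from UTC at that moment, but not the region or its daylight saving rules.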