Parsing (ics / icalendar) files with Python

80

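I have a .ics file in the following format. What is the best way to parse it? I need to retrieve the Summary, Description, and Time for each of the entries.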

BEGIN:VCALENDAR
X-LOTUS-CHARSET:UTF-8
VERSION:2.0
PRODID:-//Lotus Development Corporation//NONSGML Notes 8.0//EN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:India
BEGIN:STANDARD
DTSTART:19500101T020000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID="India":20100615T111500
DTEND;TZID="India":20100615T121500
TRANSP:OPAQUE
DTSTAMP:20100713T071035Z
CLASS:PUBLIC
DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n

UID:12D3901F0AD9E83E65257743001F2C9A-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-APPTTYPE:0
X-LOTUS-CHILD_UID:12D3901F0AD9E83E65257743001F2C9A
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID="India":20100628T130000
DTEND;TZID="India":20100628T133000
TRANSP:OPAQUE
DTSTAMP:20100628T055408Z
CLASS:PUBLIC
DESCRIPTION:
SUMMARY:smart energy management
LOCATION:8778/92050462
UID:07F96A3F1C9547366525775000203D96-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-NOTICETYPE:A
X-LOTUS-APPTTYPE:3
X-LOTUS-CHILD_UID:07F96A3F1C9547366525775000203D96
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID="India":20100629T110000
DTEND;TZID="India":20100629T120000
TRANSP:OPAQUE
DTSTAMP:20100713T071037Z
CLASS:PUBLIC
SUMMARY:meeting
UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-APPTTYPE:0
X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
END:VEVENT
6 Answers

99

The icalendar package looks nice.

For instance, to write a file:

from icalendar import Calendar, Event
from datetime import datetime
from pytz import UTC # timezone

cal = Calendar()
cal.add('prodid', '-//My calendar product//mxm.dk//')
cal.add('version', '2.0')

event = Event()
event.add('summary', 'Python meeting about calendaring')
event.add('dtstart', datetime(2005,4,4,8,0,0,tzinfo=UTC))
event.add('dtend', datetime(2005,4,4,10,0,0,tzinfo=UTC))
event.add('dtstamp', datetime(2005,4,4,0,10,0,tzinfo=UTC))
event['uid'] = '20050115T101010/27346262376@mxm.dk'
event.add('priority', 5)

cal.add_component(event)

with open('example.ics', 'wb') as f:
    f.write(cal.to_ical())

Tadaaa, you get this file:

BEGIN:VCALENDAR
PRODID:-//My calendar product//mxm.dk//
VERSION:2.0
BEGIN:VEVENT
DTEND;VALUE=DATE:20050404T100000Z
DTSTAMP;VALUE=DATE:20050404T001000Z
DTSTART;VALUE=DATE:20050404T080000Z
PRIORITY:5
SUMMARY:Python meeting about calendaring
UID:20050115T101010/27346262376@mxm.dk
END:VEVENT
END:VCALENDAR

But what lies in this file?

with open('example.ics', 'rb') as g:
    gcal = Calendar.from_ical(g.read())
for component in gcal.walk():
    print(component.name)

You can see it easily:

>>> 
VCALENDAR
VEVENT
>>> 

What about parsing the data about the events:

g = open('example.ics','rb')
gcal = Calendar.from_ical(g.read())
for component in gcal.walk():
    if component.name == "VEVENT":
        print(component.get('summary'))
        print(component.get('dtstart'))
        print(component.get('dtend'))
        print(component.get('dtstamp'))
g.close()

Now you get:

>>> 
Python meeting about calendaring
20050404T080000Z
20050404T100000Z
20050404T001000Z
>>> 

1
However, it seems to return datetimes as naive datetimes, with no utcoffset. :( - kojiro
7
@BradMontgomery, it looks like the iCalendar package has changed maintainers, and version 3.0 is available under a BSD license here: https://github.com/collective/icalendar - mpdaugherty
1
@mpdaugherty That's great news! Glad to see the code is getting some maintenance :) - Brad Montgomery
1
A similar example using the new API (and pytz for UTC handling) can be found here: http://icalendar.readthedocs.org/en/latest/examples.html - x29a
5
@zvyn The command print(component.get('dtstart')) returns something like <icalendar.prop.vDDDTypes object at 0x7f25bb8f46d8>. To get a datetime object such as 2019-11-06 10:00:00-01:00, use print(component.get('dtstart').dt) instead. - AstroFloyd

17

You could also use the vobject module for this: http://pypi.python.org/pypi/vobject

If you have a sample.ics file, you can read its contents like this:

import vobject

# read the data from the file
with open("sample.ics") as f:
    data = f.read()

# parse the top-level component with vobject
cal = vobject.readOne(data)

# Get Summary (of the first VEVENT only)
print('Summary: ', cal.vevent.summary.valueRepr())
# Get Description
print('Description: ', cal.vevent.description.valueRepr())

# Get Time
print('Time (as a datetime object): ', cal.vevent.dtstart.value)
print('Time (as a string): ', cal.vevent.dtstart.valueRepr())

1
readOne will parse only one vevent. Give example of readComponents - Khurshid Alam

10

Just starting out with Python; the comments above were very helpful, so I thought I'd post a more complete example.

# ics to csv example
# dependency: https://pypi.org/project/vobject/

import vobject
import csv

with open('sample.csv', mode='w', newline='') as csv_out:
    csv_writer = csv.writer(csv_out, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(['WHAT', 'WHO', 'FROM', 'TO', 'DESCRIPTION'])

    # read the data from the file
    data = open("sample.ics").read()

    # iterate through the contents
    for cal in vobject.readComponents(data):
        for component in cal.components():
            if component.name == "VEVENT":
                # write to csv
                csv_writer.writerow([component.summary.valueRepr(),component.attendee.valueRepr(),component.dtstart.valueRepr(),component.dtend.valueRepr(),component.description.valueRepr()])


5

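Four years later, and understanding the ICS format a bit better, if those were the only fields I needed, I'd just use the native string methods: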

import io

# Probably not a valid .ics file, but we don't really care for the example
# it works fine regardless
file = io.StringIO('''
BEGIN:VCALENDAR
X-LOTUS-CHARSET:UTF-8
VERSION:2.0
DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n

SUMMARY:smart energy management
LOCATION:8778/92050462
DTSTART;TZID="India":20100629T110000
DTEND;TZID="India":20100629T120000
TRANSP:OPAQUE
DTSTAMP:20100713T071037Z
CLASS:PUBLIC
SUMMARY:meeting
UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-APPTTYPE:0
X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
END:VEVENT
'''.strip())

parsing = False
for line in file:
    field, _, data = line.partition(':')
    if field in ('SUMMARY', 'DESCRIPTION', 'DTSTAMP'):
        parsing = True
        print(field)
        print('\t'+'\n\t'.join(data.split('\n')))
    elif parsing and not data:
        print('\t'+'\n\t'.join(field.split('\n')))
    else:
        parsing = False

Storing the data and parsing the datetimes is left as an exercise for the reader (always use UTC).
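For the datetime part, a rough stdlib-only sketch might look like the following. It assumes the only TZID in the file is "India" and that it is a fixed +05:30 offset, as the VTIMEZONE block in the question declares; `KNOWN_OFFSETS` and `parse_ics_datetime` are hypothetical names invented for this example, not part of any library.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a hand-written table of the TZIDs that occur in the file.
KNOWN_OFFSETS = {"India": timezone(timedelta(hours=5, minutes=30))}


def parse_ics_datetime(line):
    """Turn a DTSTART/DTEND/DTSTAMP content line into an aware UTC datetime."""
    field, _, value = line.strip().partition(":")
    _name, *params = field.split(";")
    value = value.strip()

    if value.endswith("Z"):
        # A trailing Z marks a timestamp that is already in UTC.
        parsed = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
        return parsed.replace(tzinfo=timezone.utc)

    tzid = None
    for param in params:
        key, _, val = param.partition("=")
        if key.upper() == "TZID":
            tzid = val.strip('"')

    local = datetime.strptime(value, "%Y%m%dT%H%M%S")
    # Raises KeyError for a TZID (or a floating time) not listed in the table.
    return local.replace(tzinfo=KNOWN_OFFSETS[tzid]).astimezone(timezone.utc)


print(parse_ics_datetime('DTSTART;TZID="India":20100629T110000'))
print(parse_ics_datetime("DTSTAMP:20100713T071037Z"))
```

This deliberately ignores date-only values and time zones with daylight saving rules; once you need those, a proper library is the safer route.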

Old answer below


You could use a regex:

import re
text = open('sample.ics').read()  # your text; the file name is a placeholder
print(re.search("SUMMARY:.*?:", text, re.DOTALL).group())
print(re.search("DESCRIPTION:.*?:", text, re.DOTALL).group())
print(re.search("DTSTAMP:.*:?", text, re.DOTALL).group())

I believe it may be possible to skip the first and last words, but I'm not sure how to do that with a regex. You could do it this way, though:

print(' '.join(re.search("SUMMARY:.*?:", text, re.DOTALL).group().replace(':', ' ').split()[1:-1]))
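One way to drop the field name and the trailing text with the regex itself is a capture group anchored to a single line. This is only a sketch: the `text` sample and the `field_values` helper are made up for the example, and it does not handle folded (continued) lines or properties that carry parameters such as `DTSTART;TZID=...`.

```python
import re

# Hypothetical sample text, trimmed down from the question's file.
text = """BEGIN:VEVENT
DTSTAMP:20100628T055408Z
DESCRIPTION:
SUMMARY:smart energy management
LOCATION:8778/92050462
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20100713T071037Z
SUMMARY:meeting
END:VEVENT
"""


def field_values(name, text):
    """Return the value of every 'NAME:value' line, without the field name."""
    # re.MULTILINE makes ^ and $ match at each line; the group captures the value.
    pattern = re.compile(rf"^{re.escape(name)}:(.*)$", re.MULTILINE)
    return [match.group(1).strip() for match in pattern.finditer(text)]


print(field_values("SUMMARY", text))
print(field_values("DTSTAMP", text))
```

Because `.` does not match a newline without `re.DOTALL`, each match stops at the end of its own line, so there is no need to trim the next field name afterwards.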

2
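Don't reinvent the wheel! - Dirk
1
@Dirk I think it is good for the community to have multiple ways of doing things. Who knows, in some situations the ics parsers may not work, and Wayne's answer could save somebody's day! - Jonathan Komar
3
@Dirk Definitely don't reinvent the wheel, but also don't add more than you need. If you only need a couple of simple fields, you really don't need more than the std lib. If I were doing much more than that, I'd probably just install a library, especially if I were actually trying to create appointments. - Wayne Werner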

4
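In case anyone else is looking at this, the ics package seems to be better updated than the others mentioned in this thread: https://pypi.org/project/ics/
Here's some sample code I'm using:
from ics import Calendar, Event

# in_file is the path to your .ics file
with open(in_file, 'r') as file:
    ics_text = file.read()

c = Calendar(ics_text)
for e in c.events:
    print(e.name)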

1
"Better updated" is relative. From the docs: "ics.py always uses UTC for internal representation of dates. This is wrong and leads to many problems." - minusf

-2

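I'd parse each line and search for your keywords, then get the index and extract it along with the next X characters (however many you think you'll need). Then parse that much smaller string into whatever form you need.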
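A rough sketch of that line-by-line idea, using only built-in string methods, could look like this. The helper names `unfold` and `extract` and the `sample` lines are invented for illustration. Instead of grabbing a fixed number of characters, it takes everything after the first colon, and it first joins folded lines (in the iCalendar format a line that starts with a space or tab continues the previous line).

```python
def unfold(lines):
    """Join folded content lines onto the line they continue."""
    unfolded = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line[:1] in (" ", "\t") and unfolded:
            # Continuation: drop the single leading whitespace character.
            unfolded[-1] += line[1:]
        elif line:
            unfolded.append(line)
    return unfolded


def extract(lines, keywords=("SUMMARY", "DESCRIPTION", "DTSTART")):
    """Collect the wanted fields of every VEVENT as a list of dicts."""
    events, current = [], None
    for line in unfold(lines):
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None:
            field, _, value = line.partition(":")
            name = field.split(";")[0].upper()  # strip parameters like ;TZID=...
            if name in keywords:
                current[name] = value
    return events


# Hypothetical input, including one folded SUMMARY line.
sample = [
    "BEGIN:VEVENT",
    'DTSTART;TZID="India":20100628T130000',
    "SUMMARY:smart energy",
    "  management",
    "LOCATION:8778/92050462",
    "END:VEVENT",
]

for event in extract(sample):
    print(event)
```

The values are still raw strings here, so escapes such as `\n` in a DESCRIPTION and the date formats would need further handling.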


Content provided by Stack Overflow.