I have logs from an Exchange server, for example:
2010-05-20T01:53:33.097Z,12.10.53.144,,12.10.53.200,EXHUB-10,08CCC3F50C35F2D2;2010-05-20T01:53:32.128Z;0,EXHUB-10\Default EXHUB-10,SMTP,RECEIVE,829888,,norma@ccc.gov.my,,521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",lee.cheesung@gmail.com,<>,00A:
I use this regular expression to match the line and capture its fields in groups:
(\d{4}-\d{2}-\d{2})(?:[\w\s]+)(\d+:\d+:\d+.\d+)(?:[\w+\d.]*),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(['"].*['"]|.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(?:(\d{4}-\d{2}-\d{2}\w\d{2}:\d{2}:\d{2}.\d+)(?:\w+)*)*(.*)
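Basically, the fields in the log are separated by commas. Unfortunately, when the user types a comma in the "email subject" field, the log wraps that field in double quotes. The example above shows this: the date "Monday May 24, 2010" in the subject contains a comma.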
.....521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",lee.keesung@gmail.com,.....
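How can I capture the whole subject, including its commas but without the surrounding double quotes, in one specific group (group 19)?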
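
To make the goal concrete, here is a minimal sketch in Python of the behaviour I am after. It is not a regex solution: it uses the standard `csv` module and assumes the log follows ordinary CSV quoting, where a field containing a comma is wrapped in double quotes. The function name `parse_log_line` and the subject position (index 17, counted from the sample line above) are my own assumptions, not something taken from Exchange documentation.

```python
import csv

# Sample line copied from the log above (raw string keeps the backslash as-is).
LINE = r'''2010-05-20T01:53:33.097Z,12.10.53.144,,12.10.53.200,EXHUB-10,08CCC3F50C35F2D2;2010-05-20T01:53:32.128Z;0,EXHUB-10\Default EXHUB-10,SMTP,RECEIVE,829888,,norma@ccc.gov.my,,521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",lee.cheesung@gmail.com,<>,00A:'''

# Assumption: the subject is the 18th comma-separated field (index 17),
# as counted in the sample line; adjust if the real log layout differs.
SUBJECT_INDEX = 17


def parse_log_line(line):
    """Split one log line into fields, honouring double-quoted fields."""
    # csv.reader accepts any iterable of strings, so wrap the line in a list.
    return next(csv.reader([line]))


fields = parse_log_line(LINE)
subject = fields[SUBJECT_INDEX]
print(subject)
```

With this approach the commas inside the quoted subject are kept and the enclosing double quotes are dropped by the parser. If a regex is still required, a similar idea would be to match the subject field with an alternation such as `(?:"([^"]*)"|([^,]*))`, at the cost of the subject landing in one of two groups.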