Getting a date from a complex string in Python

Question

3

I'm trying to use datetime.strptime to build a single datetime from two strings. The time is easy (e.g. 8:53PM), so I can do something like this:

theTime = datetime.strptime(givenTime, "%I:%M%p")

However, the other string is not just a date; it is a link that looks like http://site.com/?year=2011&month=10&day=5&hour=11. I know I could do something like this:

theDate = datetime.strptime(givenURL, "http://site.com/?year=%Y&month=%m&day=%d&hour=%H")

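But I don't want to take the hour from the link, because it has already been retrieved elsewhere. Is there a way to put in a dummy symbol (e.g. %x or something similar) that acts as a flexible placeholder for that last variable?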

In the end, I'd like to have a single line similar to the following:

theDateTime = datetime.strptime(givenURL + givenTime, "http://site.com/?year=%Y&month=%m&day=%d&hour=%x%I:%M%p")

(Although obviously %x would not be the symbol to use, since it already means something else.) Any ideas?

- alukach

3 Answers

1

# Python 2 code (print statements, map() returning a list)
import datetime
import re

givenURL  = 'http://site.com/?year=2011&month=10&day=5&hour=11'
givenTime = '08:53PM'

print ' givenURL == ' + givenURL
print 'givenTime == ' + givenTime

# Capture year, month and day; the hour is matched but not captured
regx = re.compile(r'year=(\d\d\d\d)&month=(\d\d?)&day=(\d\d?)&hour=\d\d?')
print '\nmap(int,regx.search(givenURL).groups()) ==',map(int,regx.search(givenURL).groups())

theDate = datetime.date(*map(int,regx.search(givenURL).groups()))
# Parsing only a time gives a datetime with the default date 1900-01-01
theTime = datetime.datetime.strptime(givenTime, "%I:%M%p")

print '\ntheDate ==',theDate,type(theDate)
print '\ntheTime ==',theTime,type(theTime)


# Overwrite the default date with the date taken from the URL
theDateTime = theTime.replace(theDate.year,theDate.month,theDate.day)
print '\ntheDateTime ==',theDateTime,type(theDateTime)

Result

 givenURL == http://site.com/?year=2011&month=10&day=5&hour=11
givenTime == 08:53PM

map(int,regx.search(givenURL).groups()) == [2011, 10, 5]

theDate == 2011-10-05 <type 'datetime.date'>

theTime == 1900-01-01 20:53:00 <type 'datetime.datetime'>

theDateTime == 2011-10-05 20:53:00 <type 'datetime.datetime'>
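
For readers on Python 3, the same idea might look roughly like the sketch below. It is not the answer's original code: the function name combine_url_and_time and the snake_case variable names are made up for illustration, and it uses datetime.combine rather than replace to join the two parts.

```python
import re
from datetime import date, datetime

given_url = 'http://site.com/?year=2011&month=10&day=5&hour=11'
given_time = '08:53PM'

# Capture year, month and day; the hour parameter is matched but not captured.
date_re = re.compile(r'year=(\d{4})&month=(\d\d?)&day=(\d\d?)&hour=\d\d?')


def combine_url_and_time(url, time_text):
    """Hypothetical helper: date from the URL, time from the separate string."""
    match = date_re.search(url)
    if match is None:
        raise ValueError('no date found in %r' % url)
    the_date = date(*map(int, match.groups()))
    # strptime returns a datetime dated 1900-01-01; keep only its time part.
    the_time = datetime.strptime(time_text, '%I:%M%p').time()
    return datetime.combine(the_date, the_time)


print(combine_url_and_time(given_url, given_time))  # 2011-10-05 20:53:00
```

Raising ValueError on a non-matching URL is an assumption of this sketch; the original code would fail with an AttributeError on None instead.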

Edit 1

Since strptime() is slow, I reworked my code to do without it.

# Python 2 benchmark (xrange, print statements; time.clock was removed in Python 3.8)
from datetime import datetime
import re
from time import clock


n = 10000

givenURL  = 'http://site.com/?year=2011&month=10&day=5&hour=11'
givenTime = '08:53AM'

# eyquem
# Groups 1-5: year, month, day, hour, minute; group 6 is set only for PM/pm
regx = re.compile(r'year=(\d\d\d\d)&month=(\d\d?)&day=(\d\d?)&hour=\d\d? (\d\d?):(\d\d?)(PM|pm)?')
t0 = clock()
for i in xrange(n):
    given = givenURL + ' ' + givenTime
    mat = regx.search(given)
    grps = map(int,mat.group(1,2,3,4,5))
    grps[3] %= 12 # 12-hour clock: 12AM is hour 0, 12PM becomes 12 after the +12 below
    if mat.group(6):
        grps[3] += 12 # when it is PM/pm, the hour must be augmented with 12
    theDateTime1 = datetime(*grps)
print clock()-t0,"seconds   eyquem's code"
print theDateTime1


print

# Artsiom Rudzenka
dateandtimePattern = "http://site.com/?year=%Y&month=%m&day=%d&time=%I:%M%p"
t0 = clock()
for i in xrange(n):
    # Drop the URL's hour, append the separate time, then parse everything in one call
    theDateTime2 = datetime.strptime(givenURL.split('&hour=')[0] + '&time=' + givenTime, dateandtimePattern)
print clock()-t0,"seconds   Artsiom's code"
print theDateTime2

print
print theDateTime1 == theDateTime2

Result

0.460598763251 seconds   eyquem's code
2011-10-05 08:53:00

2.10386180366 seconds   Artsiom's code
2011-10-05 08:53:00

True

My code runs about 4.5 times faster. That may be of interest if a large number of such conversions has to be performed.
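
To repeat the comparison on Python 3, something along these lines could be used. It is only a sketch: the function names parse_with_regex and parse_with_strptime are invented here, timeit replaces the removed time.clock, and the timings it prints will differ from the figures above depending on machine and Python version.

```python
import re
import timeit
from datetime import datetime

given_url = 'http://site.com/?year=2011&month=10&day=5&hour=11'
given_time = '08:53AM'

# Groups 1-5: year, month, day, hour, minute; group 6: AM/PM marker.
pattern = re.compile(
    r'year=(\d{4})&month=(\d\d?)&day=(\d\d?)&hour=\d\d? (\d\d?):(\d\d)(AM|PM)',
    re.IGNORECASE)
strptime_format = 'http://site.com/?year=%Y&month=%m&day=%d&time=%I:%M%p'


def parse_with_regex():
    mat = pattern.search(given_url + ' ' + given_time)
    year, month, day, hour, minute = map(int, mat.group(1, 2, 3, 4, 5))
    # 12-hour to 24-hour: 12AM -> 0, 12PM -> 12, 8PM -> 20.
    hour = hour % 12 + (12 if mat.group(6).upper() == 'PM' else 0)
    return datetime(year, month, day, hour, minute)


def parse_with_strptime():
    text = given_url.split('&hour=')[0] + '&time=' + given_time
    return datetime.strptime(text, strptime_format)


if __name__ == '__main__':
    n = 10000
    print(timeit.timeit(parse_with_regex, number=n), 'seconds  regex')
    print(timeit.timeit(parse_with_strptime, number=n), 'seconds  strptime')
    print(parse_with_regex() == parse_with_strptime())
```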

- eyquem

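Very impressive, but unfortunately your method is less readable than the other one, so someone at my skill level may get a bit lost. Thanks for the input, it's a pretty cool approach. - alukach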

0

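This can't be done with a format string. However, if the hour in the URL doesn't matter, you can parse it from the URL as in your first example and then call theDateTime.replace(hour=hour_from_a_different_source).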

That way you don't have to do any extra parsing.
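
A minimal sketch of that suggestion, assuming the time from the other source arrives as a string such as 8:53PM (the variable names here are illustrative, not from the question):

```python
from datetime import datetime

given_url = 'http://site.com/?year=2011&month=10&day=5&hour=11'
url_pattern = 'http://site.com/?year=%Y&month=%m&day=%d&hour=%H'

# Parse the whole URL, including the hour that is not actually wanted.
from_url = datetime.strptime(given_url, url_pattern)

# Stand-in for the time that was retrieved elsewhere.
other = datetime.strptime('8:53PM', '%I:%M%p')

# Overwrite the unwanted hour (and the minute) with the values from the other source.
the_datetime = from_url.replace(hour=other.hour, minute=other.minute)
print(the_datetime)  # 2011-10-05 20:53:00
```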

- Brent Newey

- Artsiom Rudzenka · Accepted Answer

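If you simply want to skip the hour in the URL, you can use split, for example:

from datetime import datetime

givenURL = 'http://site.com/?year=2011&month=10&day=5&hour=11'
pattern = "http://site.com/?year=%Y&month=%m&day=%d"
# Everything from '&hour=' onward is cut off before parsing
theDate = datetime.strptime(givenURL.split('&hour=')[0], pattern)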

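I'm not sure I understood you correctly, but:

from datetime import datetime

givenURL = 'http://site.com/?year=2011&month=10&day=5&hour=11'
givenTime = '8:53PM'  # the time obtained elsewhere, as in the question
datePattern = "http://site.com/?year=%Y&month=%m&day=%d"
timePattern = "&time=%I:%M%p"

# Replace the URL's hour with the separate time, then parse both in one call
theDateTime = datetime.strptime(givenURL.split('&hour=')[0] + '&time=' + givenTime, datePattern + timePattern)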
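
As a different technique from the split-based one above, the query string can also be parsed with the standard library's urllib.parse (Python 3), so the result does not depend on the order of the parameters. This is a sketch under that assumption; datetime_from_url_and_time is a hypothetical helper name.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlparse


def datetime_from_url_and_time(url, time_text):
    """Hypothetical helper: date fields from the query string, time from time_text."""
    # parse_qs maps each parameter name to a list of values, e.g. {'year': ['2011'], ...}
    params = parse_qs(urlparse(url).query)
    year, month, day = (int(params[key][0]) for key in ('year', 'month', 'day'))
    # The URL's own hour parameter is simply never read.
    clock = datetime.strptime(time_text, '%I:%M%p')
    return datetime(year, month, day, clock.hour, clock.minute)


print(datetime_from_url_and_time(
    'http://site.com/?year=2011&month=10&day=5&hour=11', '8:53PM'))
# 2011-10-05 20:53:00
```

A missing year, month or day parameter raises KeyError in this sketch; whether that is the right behavior depends on how the URLs are produced.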