Pandas datetime conversion from a particular ISO format

Question

8

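Thanks in advance for your help.

I am trying to convert datetime strings in ISO format into datetime objects. I have tried several approaches without success. Please help.

For example, I have a DataFrame whose "time" column looks like the sample below. The data is pulled from a database, and this is the format it comes out in.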

2018-12-04T04:39:26Z
2018-12-04T05:10:54.6Z
2018-12-04T05:17:32Z
2018-12-04T10:51:20.5Z
...

So far I have tried the following, and none of it worked:

df.index = pd.to_datetime(df.index, format = "%Y-%m-%dT%H:%M:%SZ", errors='ignore')

df.index = pd.to_datetime(df.index)

df.time = df.time.map(lambda x: pd.to_datetime(dt.datetime.strptime(x, '%Y-%m-%dT%H:%M:%SZ'), format = '%d/%m/%Y %H:%M'))

Thanks again!

- J. Reyes

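I made those attempts before I realized that the time list contains two ISO formats. How should I handle that? - J. Reyes

Which pandas version are you using, and what errors do you get? df.index = pd.to_datetime(df.index) works for me on the sample you posted. - Tom Ron

Can you post the error it produces, together with the exact code you called? What is the problem with the two different formats? When I paste this sample data into IPython and call pandas.to_datetime, it works on every entry, with no errors and correct results. What incorrect result did you get? - ely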

3

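You probably want pd.to_datetime(df.index, errors='coerce'). That handles both formats while coercing completely invalid dates to NaT. In your first attempt there are two formats, so one of them does not match the given format string and raises an error, and with errors='ignore' the input is simply returned unchanged. - ALollz

It is hard to help without more code to back up the question. Please show the head of the real DataFrame. - Pedro Lobito

Thanks everyone for the help. I reran it with ALollz's suggestion and it worked! Thank you! pd.to_datetime(df.index, errors='coerce') - J. Reyes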

3 Answers

2

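I had been meaning to answer this question. In the end I just wrote a function that handles the different data inputs and builds a DataFrame with column names. Thanks to ALollz for the comment about pd.to_datetime(df.index, errors='coerce').

So, to convert the index from ISO-formatted strings to datetimes, I settled on the following sequence: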

import pandas as pd

df = pd.DataFrame([[-1.8, '2018-09-14T13:36:00Z']], columns=['current', 'time'])
df.set_index('time', inplace=True)   # use the time column as the index, modifying df in place
df.index = pd.to_datetime(df.index, errors='coerce')   # unparseable strings become NaT

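After converting the text to datetimes, check that the dates are correct. With errors='coerce', any string that could not be parsed shows up as NaT. If dates are wrong or missing, you may need to specify the format explicitly.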
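The check described above could look roughly like this. It is only a sketch: `unparsed_values` is a hypothetical helper name, and the sample strings (including the deliberately invalid one) are made up for illustration.

```python
import pandas as pd


def unparsed_values(raw, parsed):
    """Return the raw strings that ended up as NaT after conversion.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    return [value for value, missing in zip(raw, parsed.isna()) if missing]


# Illustrative input: two valid ISO timestamps and one invalid string.
raw = ["2018-12-04T04:39:26Z", "2018-12-04T05:17:32Z", "not a timestamp"]

# errors="coerce" turns anything that cannot be parsed into NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw, errors="coerce")

bad = unparsed_values(raw, parsed)
print(f"{len(bad)} of {len(raw)} values could not be parsed: {bad}")
```

If the list of unparsed values contains timestamps that look valid, that is a sign the format needs to be given explicitly rather than left to inference.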

Thanks!

- J. Reyes

-1

The pandas.to_datetime() method has an 'infer_datetime_format' parameter. The documentation says:

infer_datetime_format : boolean, default False
If True and no format is given, attempt to infer the format of the datetime strings, 
and if it can be inferred, switch to a faster method of parsing them. 
In some cases this can increase the parsing speed by ~5-10x.

So set infer_datetime_format to True and leave the format parameter at its default. That worked for me.

Here is my case:

>>> hours_df.head()
                            Open    High   Close     Low         Volume
Date                                                                   
2020-01-05T02:00:00.000Z  7457.9  7481.5  7431.3  7442.1  1147.57478328
2020-01-05T01:00:00.000Z  7374.8    7479  7374.8  7457.9  2709.45095966
2020-01-05T00:00:00.000Z  7354.9  7392.1  7354.2  7374.7   642.60575144

>>> hours_df.index
Index(['2020-01-05T02:00:00.000Z', '2020-01-05T01:00:00.000Z',
       '2020-01-05T00:00:00.000Z'],
      dtype='object', name='Date')

>>> hours_df.index = pd.to_datetime(hours_df.index, infer_datetime_format=True)

>>> hours_df.index
DatetimeIndex(['2020-01-05 02:00:00+00:00', '2020-01-05 01:00:00+00:00',
               '2020-01-05 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', name='Date', freq=None)

>>> hours_df.head()
                             Open    High   Close     Low         Volume
Date                                                                    
2020-01-05 02:00:00+00:00  7457.9  7481.5  7431.3  7442.1  1147.57478328
2020-01-05 01:00:00+00:00  7374.8    7479  7374.8  7457.9  2709.45095966
2020-01-05 00:00:00+00:00  7354.9  7392.1  7354.2  7374.7   642.60575144
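For readers on newer pandas: infer_datetime_format is deprecated as of pandas 2.0, where a strict form of format inference is the default and passing the flag has no effect. For input like the question's, where whole-second and fractional-second ISO strings are mixed, a minimal sketch under the assumption of pandas 2.0 or later is to pass format="ISO8601":

```python
import pandas as pd

# Sample values copied from the question: whole and fractional seconds mixed.
raw = [
    "2018-12-04T04:39:26Z",
    "2018-12-04T05:10:54.6Z",
    "2018-12-04T05:17:32Z",
    "2018-12-04T10:51:20.5Z",
]

# Assumption: pandas >= 2.0. format="ISO8601" accepts ISO 8601 strings
# that are not all written in exactly the same layout.
index = pd.to_datetime(raw, format="ISO8601")
print(index)
```

On older pandas versions this format value is not available, and the plain pd.to_datetime call from the comments is the thing to try.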

- Shi-Xiaopeng

- Rysicin · Accepted Answer

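A bit late, but I think this answer needs to be visible to make people's lives easier.

If, as you say, the data is pulled from a database, you can do the conversion directly while building the DataFrame. Most pandas read functions have a "parse_dates" parameter. As the documentation notes:

Note: A fast-path exists for iso8601-formatted dates.

So even if you have two or more columns containing dates, it can be done very simply.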

df = pd.read_csv("x.csv", parse_dates=["Date1", "Date2"], names=["ID", "Date1", "Date2"])
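To try this without a file on disk, here is a small self-contained sketch. The CSV text is made-up sample data standing in for "x.csv", read through io.StringIO; it has no header row, which is why names is supplied.

```python
import io

import pandas as pd

# Stand-in for "x.csv": no header row, two ISO 8601 timestamp columns.
csv_text = (
    "1,2018-12-04T04:39:26Z,2018-12-04T05:17:32Z\n"
    "2,2018-12-05T04:39:26Z,2018-12-05T05:17:32Z\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    names=["ID", "Date1", "Date2"],      # column names, since the data has no header
    parse_dates=["Date1", "Date2"],      # convert both columns while reading
)

# Both date columns should now have a datetime dtype instead of object.
print(df.dtypes)
```

It is still worth checking df.dtypes afterwards, because a column that pandas could not parse may be left as plain strings.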