Python Pandas: pandas.to_datetime() swaps day and month when the day is less than 13

Question

27

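I wrote code that reads multiple files. In some of the files, the day and month are swapped whenever the day is less than 13, while any day of 13 or above, e.g. 13/06/11, stays correct (DD/MM/YY). I tried to fix it as shown below, but it does not work.

My data frame looks like this. The actual datetimes run from 12 June 2015 to 13 June 2015. When I read my datetime column as a string, the dates stay correct (dd/mm/yyyy):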

tmp                     p1 p2 
11/06/2015 00:56:55.060  0  1
11/06/2015 04:16:38.060  0  1
12/06/2015 16:13:30.060  0  1
12/06/2015 21:24:03.060  0  1
13/06/2015 02:31:44.060  0  1
13/06/2015 02:37:49.060  0  1

But when I change the column's dtype to datetime, the day and month are swapped for every day less than 13.

Output:

print(df)
tmp                  p1 p2 
06/11/2015 00:56:55  0  1
06/11/2015 04:16:38  0  1
06/12/2015 16:13:30  0  1
06/12/2015 21:24:03  0  1
13/06/2015 02:31:44  0  1
13/06/2015 02:37:49  0  1

Here is my code:

I loop over the files:

df = pd.read_csv(PATH+file, header = None,error_bad_lines=False , sep = '\t')

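Once my code has finished reading all the files, I concatenate them. The problem is that my datetime column needs to be of datetime type, so when I change its type with pd.to_datetime(), the day and month are swapped whenever the day is less than 13.

While the datetime column is still of string type, the dates are correct: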

print(tmp) # as a result I get 11.06.2015 12:56:05 (11june2015)

But when I change the column type, I get this:

tmp = pd.to_datetime(tmp, unit = "ns")
tmp = temps_absolu.apply(lambda x: x.replace(microsecond=0))
print(tmp) # I get 06-11-2016 12:56:05 (06november2015 its not the right date)

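The question is: which command should I use or change to stop the day and month from being swapped when the day is less than 13? UPDATE: this command swaps the day and month for every row of my column.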

tmp =  pd.to_datetime(tmp, unit='s').dt.strftime('%#m/%#d/%Y %H:%M:%S')

So, in order to swap only the incorrect dates, I wrote a condition:

for t in tmp:
        if (t.day < 13):
            t = datetime(year=t.year, month=t.day, day=t.month, hour=t.hour, minute=t.minute, second = t.second)

But that does not work either.

- Oumab10

1

What is your question? - Scott Boston

The question is: how do I stop the day and month from being swapped? - Oumab10

4 Answers

1

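I solved my problem, but with a memory-hungry method: I first split my tmp column into a date column and a time column, then split the date column again into day, month and year. That way I can look for the days less than 13 and swap them with the corresponding month.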

df['tmp'] = pd.to_datetime(df['tmp'], unit='ns')
df['tmp'] = df['tmp'].apply(lambda x: x.replace(microsecond=0))
df['date'] = [d.date() for d in df['tmp']]
df['time'] = [d.time() for d in df['tmp']]
df[['year','month','day']] = df['date'].apply(lambda x: pd.Series(x.strftime("%Y-%m-%d").split("-")))

df['day'] = pd.to_numeric(df['day'], errors='coerce')
df['month'] = pd.to_numeric(df['month'], errors='coerce')
df['year'] = pd.to_numeric(df['year'], errors='coerce')


# Loop to look for days less than 13 and then swap the day and month.
# Note: enumerate() yields positions, so df.loc[index, ...] assumes a default 0..n-1 index.
for index, d in enumerate(df['day']):
    if d < 13:
        df.loc[index, 'day'], df.loc[index, 'month'] = df.loc[index, 'month'], df.loc[index, 'day']

# Convert the series to string type so they can be merged

df['day'] = df['day'].astype(str)
df['month'] = df['month'].astype(str)
df['year'] = df['year'].astype(str)
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
df['date'] = df['date'].astype(str)
df['time'] = df['time'].astype(str)

# Merge the date and time and place the result in our column

df['tmp'] = pd.to_datetime(df['date'] + ' ' + df['time'])

# Drop the added columns

df.drop(columns=['date', 'year', 'month', 'day', 'time'], inplace=True)

- Oumab10

This works great. Thank you very much for your time! - Miguel Gonzalez

During the loop, pandas keeps raising KeyError: 905. Do you know how to fix it? Thanks in advance. - Miguel Gonzalez
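
One plausible reading of the KeyError reported above: enumerate() produces positions 0, 1, 2, ..., while df.loc looks up index labels, so the loop fails as soon as a position such as 905 is not a label in the index (for example after concatenating files or filtering rows). Below is a minimal sketch of a swap that does not depend on the index, using a boolean mask. The toy frame and its index values are made up for illustration and are not taken from the thread.

```python
import pandas as pd

# Toy frame with a non-default index (hypothetical values).
df = pd.DataFrame({"day": [6, 13, 6], "month": [11, 6, 12]}, index=[10, 20, 30])

# Rows whose day is below 13 are the ones the answer above swaps.
mask = df["day"] < 13

# to_numpy() drops the column labels, so the values are assigned by position:
# old "month" goes into "day" and old "day" goes into "month".
df.loc[mask, ["day", "month"]] = df.loc[mask, ["month", "day"]].to_numpy()

print(df)
```

Because the mask selects rows by label, the same code works whether the index is 0..n-1 or something else.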

1

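I ran into the same problem. In my case the dates were the index column (called "Date"). Applying to_datetime() directly to the data frame with "Date" as the index, as in the solutions above, did not work for me. I had to call read_csv() without setting "Date" as the index first, then apply to_datetime() to the column, and only then set the index to "Date".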

df= pd.read_csv(file, parse_dates=True)
df.Date = pd.to_datetime(df.Date, dayfirst=True)
df = df.set_index('Date')

- Irina S.
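
A possible variation on the answer above, not part of the original post: read_csv() itself accepts parse_dates and dayfirst, so the column can be parsed day-first while the file is read and then promoted to the index. The CSV text and column names below are made up for illustration; this is a sketch under those assumptions.

```python
import io
import pandas as pd

# Hypothetical file contents; a real path can be passed to read_csv instead.
csv_text = "Date,p1\n11/06/2015,0\n12/06/2015,0\n13/06/2015,1\n"

# Parse the "Date" column as day-first while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], dayfirst=True)

# Only now make it the index, mirroring the order of steps in the answer.
df = df.set_index("Date")

print(df.index)
```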

0

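I had the same problem: the day and month were swapped for days before the 13th. This approach worked for me. Basically, I reorder the date as a string, using a conditional and the to_datetime function.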

def calendario(fecha):
    # A parsed day below 13 means the value was read month-first, so swap back.
    if fecha.day < 13:
        dia_real = fecha.month
        mes_real = fecha.day

        # Zero-pad day and month so that '%d%m%Y' parses unambiguously.
        nfecha = f'{dia_real:02d}{mes_real:02d}{fecha.year}'
        nfecha = pd.to_datetime(nfecha, format='%d%m%Y', errors='ignore')

    else:
        nfecha = fecha

    return nfecha

df['Nueva_fecha']=df['Fecha'].apply(calendario)

The output is as expected.

- Pachu MS

- Scott Boston · Accepted Answer

61

You can use the dayfirst parameter of pd.to_datetime.

pd.to_datetime(df.tmp, dayfirst=True)

Output:

0   2015-06-11 00:56:55
1   2015-06-11 04:16:38
2   2015-06-12 16:13:30
3   2015-06-12 21:24:03
4   2015-06-13 02:31:44
5   2015-06-13 02:37:49
Name: tmp, dtype: datetime64[ns]

- Scott Boston

5

Why hasn't this answer been accepted? It works perfectly, thank you! - Dionysos Da Vinci

Does the error persist if the date format is specified? - Miguel Gonzalez

1

@MiguelGonzalez No, if you use the exact format %d/%m/%Y, the error does not occur. - Scott Boston
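
To illustrate the comment above, here is a minimal sketch that passes an explicit format to pd.to_datetime. The strings are taken from the question's sample; the time part of the format (%H:%M:%S.%f) is an assumption added to match those strings, since the comment only mentions %d/%m/%Y.

```python
import pandas as pd

# Sample values from the question, written day-first with fractional seconds.
tmp = pd.Series([
    "11/06/2015 00:56:55.060",
    "12/06/2015 16:13:30.060",
    "13/06/2015 02:31:44.060",
])

# With an explicit format nothing is guessed: the first field is always the day.
parsed = pd.to_datetime(tmp, format="%d/%m/%Y %H:%M:%S.%f")

print(parsed)
```

An explicit format also makes a malformed row raise an error instead of being silently reinterpreted.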

1

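@AmitTiwari This is the US convention: we write February 1st, 2022 as 2/1/2022. Elsewhere in the world, February 1st, 2022 may be written as 1/2/2022. Pandas assumes the first format, and you can override that with the dayfirst parameter so that pandas reads 1/2/2022 as February 1st, 2022 rather than January 2nd, 2022. Hence the so-called "flipping" only happens when the "month" is greater than 12. - Scott Boston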
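
The behaviour described in the comment above can be checked on single strings. This is a small sketch with made-up dates; recent pandas versions may also emit a warning for the last call, because the string cannot be month-first.

```python
import pandas as pd

# Ambiguous string: both fields could be a month.
default = pd.to_datetime("01/02/2022")                   # read month-first
day_first = pd.to_datetime("01/02/2022", dayfirst=True)  # read day-first

# Unambiguous string: 13 cannot be a month, so pandas reads it day-first.
fallback = pd.to_datetime("13/02/2022")

print(default, day_first, fallback)
```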

1

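Thank you very much, @Scott Boston, your answer was really helpful! I had searched through many resources to solve this problem, but it was only when I found your answer that it was actually solved. Thanks! - Yeo Keat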
