我有两个数据框 (logs
和 failures
),我想将它们合并,以便在 logs
中添加一列,该列的值是在“failures”中找到的最近日期。
下面是生成logs
,failures
和所需的output
的代码:
import pandas as pd
logs=pd.DataFrame({'date-time':pd.Series(['23/10/2015 10:20:54','22/10/2015 09:51:32','21/10/2015 06:51:32','28/10/2015 16:59:32','25/10/2015 04:41:32','24/10/2015 11:50:11']),'var1':pd.Series([0,1,3,1,2,4])})
logs['date-time']=pd.to_datetime(logs['date-time'])
failures=pd.DataFrame({'date':pd.Series(['23/10/2015 00:00:00','22/10/2015 00:00:00','21/10/2015 00:00:00']),'failure':pd.Series([1,1,1])})
failures['date']=pd.to_datetime(failures['date'])
output=pd.DataFrame({'date-time':pd.Series(['23/10/2015 10:20:54','22/10/2015 09:51:32','21/10/2015 06:51:32','28/10/2015 16:59:32','25/10/2015 04:41:32','24/10/2015 11:50:11']),'var1':pd.Series([0,1,3,1,2,4]),'closest_failure':pd.Series(['23/10/2015 00:00:00','22/10/2015 00:00:00','21/10/2015 00:00:00','23/10/2015 00:00:00','23/10/2015 00:00:00','23/10/2015 00:00:00'])})
output['date-time']=pd.to_datetime(output['date-time'])
有什么想法吗?真实数据集非常大,因此效率也是一个问题。