How to align time series datasets with missing values for plotting

3
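I have three datasets with missing values. Each dataset has a time column and a data column, and the minimum time difference between two rows is 1 second (00:00:01):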
Dataset 1:          Dataset 2:          Dataset 3:  
00:00:00    81                          00:00:00    70
00:00:01    81                      
00:00:02    81                      
00:00:03    81                          00:00:03    99
00:00:04    81                          00:00:04    100
00:00:05    80      00:00:05    80      00:00:05    101
00:00:06    80      00:00:06    100         
                    00:00:07    92      00:00:07    88
00:00:08    83      00:00:08    80      00:00:08    88
00:00:09    84      00:00:09    83      00:00:09    87
00:00:10    86                      
00:00:11    89                      
00:00:12    90                      
00:00:13    92                          00:00:13    92
00:00:14    94                          00:00:14    94
00:00:15    94      00:00:15    96      00:00:15    93
00:00:16    96      00:00:16    97          
00:00:17    98      00:00:17    100     00:00:17    99
00:00:18    100                         00:00:18    99
00:00:19    101                         00:00:19    101
00:00:20    103                     

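For easier visualization, the table above shows missing values as empty fields. The actual data is dense, i.e. the rows for missing times are simply absent, like this: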
Dataset 1:          Dataset 2:          Dataset 3:  
00:00:00    81      00:00:05    80      00:00:00    70
00:00:01    81      00:00:06    100     00:00:03    99
00:00:02    81      00:00:07    92      00:00:04    100
00:00:03    81      00:00:08    80      00:00:05    101
00:00:04    81      00:00:09    83      00:00:07    88
00:00:05    80      00:00:15    96      00:00:08    88
00:00:06    80      00:00:16    97      00:00:09    87
00:00:08    83      00:00:17    100     00:00:13    92
00:00:09    84                          00:00:14    94
00:00:10    86                          00:00:15    93
00:00:11    89                          00:00:17    99
00:00:12    90                          00:00:18    99
00:00:13    92                          00:00:19    101
00:00:14    94                      
00:00:15    94                      
00:00:16    96                      
00:00:17    98                      
00:00:18    100                     
00:00:19    101                     
00:00:20    103                     

Now I would like to align this data so that it can be plotted like this:

Combined

and like this:

Split

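My naive approach would be:

  1. Find the minimum and maximum time across the datasets.
  2. Create a table with one row per second in that range and three value columns, all initialized to n/a.
  3. Loop over each dataset and assign its values to that table.

Are there Python functions or libraries that perform these steps efficiently? Or is there a better way to accomplish this?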

Regards!

1 Answer

3
You can use concat to join all the DataFrames on an index built from the time column:
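import pandas as pd

# Index each frame by its time column, then place the value columns side by side;
# times missing from a frame become NaN in that frame's column
dfs = [df1, df2, df3]
df = pd.concat([x.set_index('time')['val'] for x in dfs],
               axis=1,
               keys=['a','b','c'],
               sort=True)
print (df)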
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
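
The snippet above assumes that df1, df2 and df3 already exist. As a minimal self-contained sketch, the frames below are hypothetical, built from the first few rows of the question's data with the column names 'time' and 'val' taken from the answer's code; the name aligned is also just illustrative:

```python
import pandas as pd

# Hypothetical inputs: only the rows that actually exist in each dataset
df1 = pd.DataFrame({"time": ["00:00:03", "00:00:04", "00:00:05"], "val": [81, 81, 80]})
df2 = pd.DataFrame({"time": ["00:00:05"], "val": [80]})
df3 = pd.DataFrame({"time": ["00:00:03", "00:00:05"], "val": [99, 101]})

# Each Series is indexed by time; concat along axis=1 aligns them on the
# union of all time labels and fills the gaps with NaN
aligned = pd.concat(
    [frame.set_index("time")["val"] for frame in (df1, df2, df3)],
    axis=1,
    keys=["a", "b", "c"],
    sort=True,
)
print(aligned)
```

Because the alignment happens on the index, no explicit loop over the rows is needed.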

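If some times are missing from all of the DataFrames, concat will not create rows for them. In that case you can use DataFrame.asfreq, which requires a DatetimeIndex: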

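# asfreq needs a DatetimeIndex, so convert the time labels first
df.index = pd.to_datetime(df.index)
# One row per second; recent pandas versions prefer 's' over the older alias 'S'
df = df.asfreq('s')
# Drop the date part again and keep only the time of day
df.index = df.index.time
print (df)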
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
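
In the question's data every second appears in at least one dataset, so the output is unchanged here. As a rough illustration of what asfreq adds, the hypothetical frame below has no row for 00:00:02 at all; the explicit format argument and the name filled are assumptions made for this sketch:

```python
import pandas as pd

nan = float("nan")

# Hypothetical aligned frame in which 00:00:02 is absent from every dataset
df = pd.DataFrame(
    {"a": [81.0, 81.0, 83.0], "b": [nan, 80.0, 92.0]},
    index=["00:00:00", "00:00:01", "00:00:03"],
)

# Parse the labels into timestamps; an explicit format avoids guessing
df.index = pd.to_datetime(df.index, format="%H:%M:%S")

# Reindex to a regular 1-second grid; new rows are filled with NaN
filled = df.asfreq("s")

# Keep only the time of day as the row label
filled.index = filled.index.time
print(filled)
```

The result has four rows, and the inserted row for 00:00:02 holds NaN in every column.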

To plot all columns in one chart, use DataFrame.plot:

df.plot()

To get a separate plot for each column, use:

df.plot(subplots=True)

1
Thanks! Works perfectly. I only needed to add some interpolation: df = df.interpolate(method='linear') - Hyndrix
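
A small sketch of what the interpolation mentioned in the comment does to an aligned column, using hypothetical values. Note that method='linear' treats the rows as equally spaced, which fits here only if the frame already has one row per second:

```python
import pandas as pd

nan = float("nan")

# Hypothetical column with a leading gap and an interior gap
df = pd.DataFrame({"b": [nan, 80.0, nan, 100.0]})

# Interior gaps are filled on a straight line between their neighbors;
# with the default settings the leading gap stays NaN
smoothed = df.interpolate(method="linear")
print(smoothed)
```

Plotting the interpolated frame draws continuous lines where the raw data would otherwise leave breaks at the NaN rows.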
