Time difference between adjacent DataFrame rows

3

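Similar to this question, I want to compute the time difference between rows in a DataFrame. Unlike that question, however, the difference should be computed per group, using a groupby on id.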

For example, this DataFrame:

import pandas as pd

df = pd.DataFrame(
    {'id': [6, 6, 6, 6, 6, 10, 10, 10, 10, 10],
     'timestamp': ['2016-04-01 00:04:00', '2016-04-01 00:04:20', '2016-04-01 00:04:30',
                   '2016-04-01 00:04:35', '2016-04-01 00:04:54', '2016-04-30 13:04:59',
                   '2016-04-30 13:05:00', '2016-04-30 13:05:12', '2016-04-30 13:05:20',
                   '2016-04-30 13:05:51']}
)
print(df)
    id        timestamp
0    6  2016-04-01 00:04:00
1    6  2016-04-01 00:04:20
2    6  2016-04-01 00:04:30
3    6  2016-04-01 00:04:35
4    6  2016-04-01 00:04:54
5   10  2016-04-30 13:04:59
6   10  2016-04-30 13:05:00
7   10  2016-04-30 13:05:12
8   10  2016-04-30 13:05:20
9   10  2016-04-30 13:05:51

Next, I want to create a column named ΔT that shows the difference, like this:

df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S')
df['ΔT'] = df.groupby('id').index.to_series().diff().astype('timedelta64[s]')

AttributeError: 'DataFrameGroupBy' object has no attribute 'index'

Expected output:

    id        timestamp        ΔT
0    6  2016-04-01 00:04:00    0
1    6  2016-04-01 00:04:20   20
2    6  2016-04-01 00:04:30   10
3    6  2016-04-01 00:04:35    5
4    6  2016-04-01 00:04:54   19
5   10  2016-04-30 13:04:59    0
6   10  2016-04-30 13:05:00    1
7   10  2016-04-30 13:05:12   12
8   10  2016-04-30 13:05:20    8
9   10  2016-04-30 13:05:51   31
2 Answers

3
df.groupby('id')['timestamp'].diff().dt.total_seconds().fillna(0)
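
For reference, here is a minimal, self-contained sketch of how the one-liner above could be assigned to the ΔT column. It uses a shortened copy of the question's data and assumes the timestamp column has already been converted with pd.to_datetime and that rows are in chronological order within each id. The final astype(int) is an addition, there only to match the integer column in the expected output.

```python
import pandas as pd

# Shortened copy of the question's data (three rows per id).
df = pd.DataFrame({
    "id": [6, 6, 6, 10, 10, 10],
    "timestamp": [
        "2016-04-01 00:04:00", "2016-04-01 00:04:20", "2016-04-01 00:04:30",
        "2016-04-30 13:04:59", "2016-04-30 13:05:00", "2016-04-30 13:05:12",
    ],
})
# diff() needs real datetimes, not strings.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")

df["ΔT"] = (
    df.groupby("id")["timestamp"]
      .diff()               # difference to the previous row within each id
      .dt.total_seconds()   # timedelta -> float seconds (NaN for the first row of a group)
      .fillna(0)            # first row of each group has no predecessor
      .astype(int)          # assumption: whole seconds are wanted, as in the expected output
)
print(df)
```

Selecting the 'timestamp' column before calling diff() keeps the result a Series, so it can be assigned directly to a new column.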

3
Try:
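# Select the timestamp column explicitly so diff() returns a Series
# (timestamp must already be datetime, as in the question).
df["ΔT"] = df.groupby("id")["timestamp"].diff()
# .dt.seconds is the seconds component of each timedelta (whole days excluded).
df["ΔT"] = df["ΔT"].dt.seconds
# The first row of each group has no previous row, so fill it with 0.
df["ΔT"] = df["ΔT"].fillna(0).astype(int)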
print(df)

Output:

   id           timestamp  ΔT
0   6 2016-04-01 00:04:00   0
1   6 2016-04-01 00:04:20  20
2   6 2016-04-01 00:04:30  10
3   6 2016-04-01 00:04:35   5
4   6 2016-04-01 00:04:54  19
5  10 2016-04-30 13:04:59   0
6  10 2016-04-30 13:05:00   1
7  10 2016-04-30 13:05:12  12
8  10 2016-04-30 13:05:20   8
9  10 2016-04-30 13:05:51  31
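
One caveat on the snippet above: .dt.seconds returns only the seconds component of each timedelta (0 to 86399) and ignores whole days, whereas .dt.total_seconds() returns the full duration. For the question's data every gap is under a day, so both give the same result. The sketch below uses hypothetical data (not from the question) with a gap longer than a day to show where they diverge.

```python
import pandas as pd

# Hypothetical data: a single id whose last gap is 1 day and 10 seconds.
gaps = pd.DataFrame({
    "id": [1, 1, 1],
    "timestamp": pd.to_datetime(
        ["2016-04-01 00:00:00", "2016-04-01 00:00:30", "2016-04-02 00:00:40"]
    ),
})
delta = gaps.groupby("id")["timestamp"].diff()

# Seconds component only: the whole day in the last gap is dropped.
seconds_component = delta.dt.seconds.fillna(0).astype(int)
# Full duration in seconds: the day is included.
total_seconds = delta.dt.total_seconds().fillna(0).astype(int)

print(seconds_component.tolist())
print(total_seconds.tolist())
```

If gaps of a day or more are possible in the real data, .dt.total_seconds() is likely the safer choice.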
