I have a pandas DataFrame like this:
>df
leg speed
1 10
1 11
1 12
1 13
1 12
1 15
1 19
1 12
2 10
2 10
2 12
2 15
2 19
2 11
: :
I want to create a new column roll_speed that holds the rolling mean of speed over the last 5 rows, with two additional conditions:
- Group by leg (rows belonging to a different leg must not be taken into account).
- The rolling window should grow from 1 up to a maximum of 5, depending on how many rows are available. For example, in leg == 1 the first row has only one row to calculate from, so the rolling speed should be 10/1 = 10. For the second row only two rows are available, so the rolling speed should be (10+11)/2 = 10.5.

Expected output:

leg  speed  roll_speed
1    10     10      # 10/1
1    11     10.5    # (10+11)/2
1    12     11      # (10+11+12)/3
1    13     11.5    # (10+11+12+13)/4
1    12     11.6    # (10+11+12+13+12)/5
1    15     12.6    # (11+12+13+12+15)/5
1    19     14.2    # (12+13+12+15+19)/5
1    12     14.2    # (13+12+15+19+12)/5
2    10     10      # 10/1
2    10     10      # (10+10)/2
2    12     10.7    # (10+10+12)/3
2    15     11.8    # (10+10+12+15)/4
2    19     13.2    # (10+10+12+15+19)/5
2    11     13.4    # (10+12+15+19+11)/5
:    :
What I tried:
df['roll_speed'] = df.speed.rolling(5).mean()
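But this just returns NA for any row where fewer than five rows are available for the calculation, and it also ignores the leg grouping. How can I solve this? Any help is appreciated!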
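For reference, here is a minimal sketch of one direction that might work, not necessarily the only or best one. With an integer window, `rolling` defaults `min_periods` to the window size, which is why the first four rows come back as NA; setting `min_periods=1` lets the window shrink to whatever rows are available. Wrapping the call in `groupby("leg")` with `transform` should keep each leg separate. The sample frame below is rebuilt from the rows shown in the question and assumes the rows are already in the desired order within each leg.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (the ":" rows are omitted).
df = pd.DataFrame({
    "leg":   [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "speed": [10, 11, 12, 13, 12, 15, 19, 12, 10, 10, 12, 15, 19, 11],
})

# Rolling mean over at most 5 rows, computed separately for each leg.
# min_periods=1 allows a result as soon as one row is available,
# so the window effectively grows from 1 to 5 at the start of each leg.
df["roll_speed"] = (
    df.groupby("leg")["speed"]
      .transform(lambda s: s.rolling(window=5, min_periods=1).mean())
)

print(df.round({"roll_speed": 1}))
```

`transform` returns a Series aligned with the original index, so the result can be assigned straight back as a column. The values are unrounded floats; the rounding in the `print` call is only for display.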