Pandas数据框聚合固定数量的行。

Question

Pandas数据框聚合固定数量的行。

3

我正在处理一些数据，我想获取每匹马在其最近6场比赛中（包括本次比赛）的排名(finishing position)。本次比赛的日期定义为'race_id'。

是否有一种方法可以使用groupby和agg，但仅聚合前6个值？

数据框如下：

finishing_position  horse_id    race_id
 1                  K01         2014011
 2                  K02         2014011
 3                  M01         2014011
 4                  K01         2014012
 2                  K01         2014021
 3                  K01         2014031
 1                  M01         2015011
 2                  K01         2016012
 1                  K02         2016012
 3                  M01         2016012
 4                  J01         2016012

I want the result to be

finishing_position  horse_id    race_id     recent
 1                  K01         2014011
 2                  K02         2014011
 3                  M01         2014011
 4                  K01         2014012     1
 2                  K01         2014021     1/4
 3                  K01         2014031     1/4/2
 1                  M01         2015011     3
 2                  K01         2016012     1/4/2/3
 1                  K02         2016012     2
 3                  M01         2016012     3/1
 4                  J01         2016012

- boygood

2个回答

1

修订了@Wen的答案，使其仅聚合到前N条记录。

df['recent']=df.finishing_position.astype(str)+'/'
df['recent']=df.groupby('horse_id').recent.apply(lambda x : x.cumsum().shift().str[:-1].fillna(''))

def last_n_record(string,recent_no):
    count = string.count('/')
    if count+1 >= recent_no:
       return string.split('/',count - recent_no + 1)[-1]
    else:
       return string

recent_no = 3 # Lets take 3 recent records as demo
df['recent'] = df.recent.apply(lambda x: last_n_record(x,recent_no))
df
    finishing_position horse_id  race_id recent
0                    1      K01  2014011       
1                    2      K02  2014011       
2                    3      M01  2014011       
3                    4      K01  2014012      1
4                    2      K01  2014021    1/4
5                    3      K01  2014031  1/4/2
6                    1      M01  2015011      3
7                    2      K01  2016012  4/2/3
8                    1      K02  2016012      2
9                    3      M01  2016012    3/1
10                   4      J01  2016012

- R.yan

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- BENY · Accepted Answer

我们可以使用cumsum与groupby一起使用。

df['recent']=df.finishing_position.astype(str)+'/'
df['recent']=df.groupby('horse_id').recent.apply(lambda x : x.cumsum().shift().str[:-1].fillna(''))
df
Out[140]: 
    finishing_position horse_id  race_id   recent
0                    1      K01  2014011         
1                    2      K02  2014011         
2                    3      M01  2014011         
3                    4      K01  2014012        1
4                    2      K01  2014021      1/4
5                    3      K01  2014031    1/4/2
6                    1      M01  2015011        3
7                    2      K01  2016012  1/4/2/3
8                    1      K02  2016012        2
9                    3      M01  2016012      3/1
10                   4      J01  2016012