pandas DataFrame.groupby and apply with a custom function

Question

I have a DataFrame with many duplicate rows (I need each Type/StrikePrice pair to be unique), like this:

                   Pos  AskPrice
Type  StrikePrice
C     1500.0       10    281.6
C     1500.0       11    281.9
C     1500.0       12    281.7     <- I need this one
P     1400.0       30    1200.5
P     1400.0       31    1250.2    <- I need this one

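How can I group by Type + StrikePrice and apply my own logic (a custom function) to decide which row of each group to keep, for example the row with the largest Pos?

The expected result is: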

                   Pos  AskPrice
Type  StrikePrice
C     1500.0       12    281.7
P     1400.0       31    1250.2

Many thanks!

- user2528473

1 Answer

- jezrael · Accepted Answer

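First call reset_index so that every row gets a unique index label, then use groupby with idxmax to find the label of the row with the largest Pos in each group, select those rows with loc, and finally restore the MultiIndex with set_index: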

df = df.reset_index()
df = (df.loc[df.groupby(['Type','StrikePrice'])['Pos'].idxmax()]
        .set_index(['Type','StrikePrice']))
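
To make the idxmax approach easy to try, here is a minimal self-contained sketch. It rebuilds the sample frame from the question; the variable names flat, best_rows and result are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
df = pd.DataFrame(
    {
        "Type": ["C", "C", "C", "P", "P"],
        "StrikePrice": [1500.0, 1500.0, 1500.0, 1400.0, 1400.0],
        "Pos": [10, 11, 12, 30, 31],
        "AskPrice": [281.6, 281.9, 281.7, 1200.5, 1250.2],
    }
).set_index(["Type", "StrikePrice"])

# Flatten the MultiIndex so every row has a unique integer label.
flat = df.reset_index()

# For each (Type, StrikePrice) group, idxmax gives the label of the
# row where Pos is largest.
best_rows = flat.groupby(["Type", "StrikePrice"])["Pos"].idxmax()

# Select those rows and restore the original MultiIndex.
result = flat.loc[best_rows].set_index(["Type", "StrikePrice"])
print(result)
```

The reset_index step matters because the original MultiIndex labels repeat within a group, so a label returned by idxmax on the original frame would match every row of that group rather than a single row.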

Alternatively, use sort_values and drop_duplicates:

df = (df.reset_index()
       .sort_values(['Type','StrikePrice', 'Pos'])
       .drop_duplicates(['Type','StrikePrice'], keep='last')
       .set_index(['Type','StrikePrice']))
print(df)

                  Pos  AskPrice
Type StrikePrice               
C    1500.0        12     281.7
P    1400.0        31    1250.2

If you need a custom function, use GroupBy.apply:

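def f(x):
    # Keep the row(s) whose Pos equals the group maximum (ties are all kept).
    return x[x['Pos'] == x['Pos'].max()]

# Group by both index levels; group_keys=False avoids adding the keys again.
df = df.groupby(level=[0,1], group_keys=False).apply(f)
print(df)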
                  Pos  AskPrice
Type StrikePrice               
C    1500.0        12     281.7
P    1400.0        31    1250.2
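
Note that f above keeps every row that ties for the maximum Pos. If exactly one row per group is required, a custom function can apply its own tie-breaking rule. The following is a sketch under assumptions: the function name pick_one, the rule "lowest AskPrice wins a tie", and the sample data with a deliberate tie are all hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical sample with a deliberate tie on Pos in the C/1500.0 group.
df = pd.DataFrame(
    {
        "Type": ["C", "C", "C", "P", "P"],
        "StrikePrice": [1500.0, 1500.0, 1500.0, 1400.0, 1400.0],
        "Pos": [12, 11, 12, 30, 31],
        "AskPrice": [281.6, 281.9, 281.7, 1200.5, 1250.2],
    }
).set_index(["Type", "StrikePrice"])


def pick_one(group):
    """Return a one-row frame: highest Pos, ties broken by lowest AskPrice."""
    ordered = group.sort_values(["Pos", "AskPrice"], ascending=[False, True])
    return ordered.iloc[[0]]  # list indexer keeps a DataFrame, not a Series


# Group by both index levels and let pick_one choose a row per group.
result = df.groupby(level=[0, 1], group_keys=False).apply(pick_one)
print(result)
```

Positional selection with iloc is used here because the MultiIndex labels are not unique within a group, so label-based selection could not single out one row.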