For each kind of animal, I want to get the size of the individual with the greatest weight. Here is my test code:
import numpy as np
import pandas as pd
print('numpy version =', np.__version__)
print('pandas version =', pd.__version__)
print()
def get_size_with_max_weight(subf):
    # Print the sub-frame so each call made by apply is visible
    print(subf)
    return subf['size'][subf['weight'].idxmax()]
df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
print(df)
print()
gf = df.groupby('animal').apply(get_size_with_max_weight)
print()
print(gf)
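I expected the function passed to apply to be executed exactly once per group of the DataFrame. However, when I use the result of idxmax() as an index into another column, the function is executed twice for the first two groups. Here is the output: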
numpy version = 1.18.5
pandas version = 1.0.5

  animal size  weight  adult
0    cat    S       8  False
1    dog    S      10  False
2    cat    M      11  False
3   fish    M       1  False
4    dog    M      20  False
5    cat    L      12   True
6    cat    L      12   True

  animal size  weight  adult
0    cat    S       8  False
2    cat    M      11  False
5    cat    L      12   True
6    cat    L      12   True
  animal size  weight  adult
1    dog    S      10  False
4    dog    M      20  False
  animal size  weight  adult
0    cat    S       8  False
2    cat    M      11  False
5    cat    L      12   True
6    cat    L      12   True
  animal size  weight  adult
1    dog    S      10  False
4    dog    M      20  False
  animal size  weight  adult
3   fish    M       1  False

animal
cat     L
dog     M
fish    M
dtype: object
As you can see, the cat and dog groups are each printed twice. If I do not use idxmax(), this does not happen. What is going wrong?
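
For comparison, the same result can be computed without apply at all, so the number of times a user function gets called stops mattering. This is only a minimal sketch, assuming the goal is just the size of the heaviest row per animal; the names heaviest_idx and size_of_heaviest are introduced here for illustration and are not part of the test code above. It does not explain the repeated calls, it only sidesteps them.

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

# Row label of the heaviest animal in each group.
# On ties (the two 12-weight cats), idxmax keeps the first occurrence.
heaviest_idx = df.groupby('animal')['weight'].idxmax()

# Look up 'size' at those row labels, then re-index the values by animal.
size_of_heaviest = pd.Series(df.loc[heaviest_idx, 'size'].to_numpy(),
                             index=heaviest_idx.index)
print(size_of_heaviest)
```

Because no Python function is passed to apply here, nothing is printed per group, and the values match the gf result shown above (cat L, dog M, fish M).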