Using Pandas groupby() + apply() with arguments

Question

I want to use df.groupby() in combination with apply() to apply a function to each row of each group.

I normally use the following code, which usually works (note that this is without groupby()):

df.apply(myFunction, args=(arg1,))

With groupby(), I tried the following:

df.groupby('columnName').apply(myFunction, args=(arg1,))

However, I get the following error:

TypeError: myFunction() got an unexpected keyword argument 'args'

So my question is: how can I use groupby() and apply() with a function that takes arguments?

- beta

This can be done with df.groupby('columnName').apply(myFunction, ('arg1')). - Zero

@Zero That's a very good answer, because it is very close to the OP's attempted solution and doesn't require a lambda. I suggest you post it as an answer. - DontDivideByZero

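@Zero, I have exactly the same problem as the OP, but this doesn't work for me; I still get the same error as the OP. Also, may I ask why your comment should work and the OP's approach (which is the same as mine) doesn't? I haven't found this documented anywhere. - Pythonista anonymous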

Try .apply(myFunction, args=('arg1',)), and note the comma after arg1.

Actually, I just tried it myself, and it doesn't work either... - beta

3 Answers

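The error occurs because pandas.DataFrame.apply has an args parameter (a tuple), while pandas.core.groupby.GroupBy.apply does not.

When you call .apply on a DataFrame, you can use this parameter; when you call .apply on a groupby object, you cannot. GroupBy.apply instead forwards any extra keyword arguments straight to your function, which is why myFunction receives an unexpected keyword argument named args.

In @MaxU's answer, the expression lambda x: myFunction(x, arg1) is passed as func (the first parameter); because arg1 is already supplied inside the lambda, there is no need to pass any additional *args/**kwargs.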
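A minimal sketch of the closure idea, assuming a toy frame with hypothetical columns key and val and a hypothetical my_function standing in for myFunction. Selecting the val column keeps the grouping column out of each group, and the lambda is the only callable GroupBy.apply ever sees:

```python
import pandas as pd

# Toy data (hypothetical): two groups, 'x' and 'y'
df = pd.DataFrame({'key': ['x', 'x', 'y'], 'val': [1, 2, 5]})

def my_function(group, arg1):
    # Hypothetical stand-in for myFunction: scale the group's sum by arg1
    return group.sum() * arg1

arg1 = 10

# The lambda receives each group and closes over arg1, so no extra
# *args/**kwargs have to be passed to GroupBy.apply itself
result = df.groupby('key')['val'].apply(lambda g: my_function(g, arg1))
print(result)
```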

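An example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2], 'col2': [3, 4, 5]})

# Called on a DataFrame: `args` is a 1-tuple of extra positional arguments,
# so each column is passed to np.sum as np.sum(column, 0)
print(df.apply(np.sum, args=(0,)))  # equiv to df.sum(0)
print(df.apply(np.sum, axis=1))     # `axis` belongs to DataFrame.apply; equiv to df.sum(1)

# Called on a groupby object of the DataFrame: GroupBy.apply has no `args`
# parameter, so `args=(0,)` is forwarded to np.sum and raises a TypeError
try:
    df.groupby('col1').apply(np.sum, args=(0,))
except TypeError as exc:
    print(exc)  # e.g. sum() got an unexpected keyword argument 'args'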

- Brad Solomon

This works well for me:

df2 = df.groupby('columnName').apply(lambda x: my_function(x, arg1, arg2))
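As a sketch (the frame, column names and my_function below are hypothetical), the same two-argument call can also be written without a lambda, because GroupBy.apply forwards extra positional or keyword arguments to the function. All three forms should produce the same result:

```python
import pandas as pd

# Toy data (hypothetical)
df = pd.DataFrame({'key': ['x', 'x', 'y'], 'val': [1, 2, 5]})

def my_function(group, arg1, arg2):
    # Hypothetical function: scale the group's sum by arg1, then add arg2
    return group.sum() * arg1 + arg2

grouped = df.groupby('key')['val']

via_lambda = grouped.apply(lambda g: my_function(g, 10, 1))
# Extra positional arguments after func are forwarded to it
via_args = grouped.apply(my_function, 10, 1)
# Extra keyword arguments are forwarded too (just not one named "args")
via_kwargs = grouped.apply(my_function, arg1=10, arg2=1)

print(via_lambda)
```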

- Hitesh Somani

- MaxU - stand with Ukraine · Accepted Answer

pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does.

So try this:

df.groupby('columnName').apply(lambda x: myFunction(x, arg1))

or, as @Zero suggested, pass the extra argument positionally:

df.groupby('columnName').apply(myFunction, ('arg1'))
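Note that ('arg1') is the same as 'arg1': parentheses without a comma do not create a tuple, so the value simply arrives as the second positional argument. Another option, not mentioned in this thread, is functools.partial from the standard library, which binds the argument before the function reaches GroupBy.apply. A minimal sketch comparing the two, assuming a hypothetical my_function and toy data:

```python
import functools

import pandas as pd

# Toy data (hypothetical)
df = pd.DataFrame({'key': ['x', 'x', 'y'], 'val': [1, 2, 5]})

def my_function(group, arg1):
    # Hypothetical stand-in for myFunction: the group's max scaled by arg1
    return group.max() * arg1

# Positional form: GroupBy.apply forwards 10 as my_function's second argument
positional = df.groupby('key')['val'].apply(my_function, 10)

# functools.partial binds arg1 up front, so apply only sees a one-argument callable
bound = functools.partial(my_function, arg1=10)
via_partial = df.groupby('key')['val'].apply(bound)

print(positional)
```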

Demo:

In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))

In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2  3
4  3  4  1

In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:

In [85]: df.apply(f, args=(10,))
Out[85]:
a    40
b    40
c    40
dtype: int64

With GroupBy.apply you can pass named arguments:

In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30

or positional arguments (note that (10) is simply the integer 10, not a tuple):

In [87]: df.groupby('a').apply(f, (10))
Out[87]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30