Passing columns as arguments to a pandas groupby apply function

Question

Suppose I have the following DataFrame:

import numpy as np
import pandas as pd

a = np.random.rand(10)
b = np.random.rand(10) * 10
c = np.random.rand(10) * 100
groups = np.array([1, 1, 2, 2, 2, 2, 3, 3, 4, 4])
df = pd.DataFrame({"a": a, "b": b, "c": c, "groups": groups})

I want to group df by groups and apply the following function to two columns (a and b) of each group:

def my_fun(x, y):
    # weighted average of x, using y as the weights
    tmp = np.sum(x * y) / np.sum(y)
    return tmp

This is what I tried:

df.groupby("groups").apply(my_fun,("a","b"))

But that does not work and raises an error:

ValueError: Unable to coerce to Series, the length must be 4: given 2

The final output should be a single number per group. I could solve this with a loop, but I assume there is a better way? Thanks.

- Ress

1 Answer

- Quang Hoang · Accepted Answer

Without changing your function, you want to do:

df.groupby("groups").apply(lambda d: my_fun(d["a"],d["b"]))

Output:

groups
1    0.603284
2    0.183289
3    0.828273
4    0.361103
dtype: float64
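
As background on why the original call failed: apply passes each group's sub-DataFrame as the first positional argument and forwards any extra arguments after it, so my_fun received the whole group as x and the tuple ("a", "b") as y. The sketch below illustrates this with a hypothetical probe function and a small made-up frame df_demo (neither comes from the question). Selecting [["a", "b"]] before apply is an optional choice made here so that only those two columns reach the function.

```python
import pandas as pd

# Hypothetical hand-made data, not the random frame from the question
df_demo = pd.DataFrame({
    "a": [1.0, 3.0, 2.0, 4.0],
    "b": [1.0, 1.0, 1.0, 3.0],
    "groups": [1, 1, 2, 2],
})

seen = []  # records what apply actually hands to the function

def probe(x, y):
    # x is the group's sub-DataFrame, y is whatever extra argument was passed
    seen.append((type(x).__name__, y))
    return 0.0

# Same calling pattern as in the question: one extra argument, a tuple
df_demo.groupby("groups")[["a", "b"]].apply(probe, ("a", "b"))

print(seen[0])
```

The first recorded call shows a DataFrame as x and the untouched tuple as y, which is why x*y in my_fun could not be evaluated as intended. The lambda in the answer avoids this by pulling the two columns out of the group before calling my_fun.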

That said, you can rewrite the function so that it takes the DataFrame as its first positional argument:

def myfunc(data, val_col, weight_col):
    return np.sum(data[val_col]*data[weight_col])/np.sum(data[weight_col])

df.groupby('groups').apply(myfunc, 'a', 'b')
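
A minimal sketch of the rewritten function on data small enough to check by hand. The frame df_demo and its values are made up for illustration, and the comparison against np.average is an addition here, not part of the answer; it relies on the fact that sum(x*y)/sum(y) is the weighted average of x with weights y. The column names are passed once positionally and once as keyword arguments, since apply forwards both kinds to the function.

```python
import numpy as np
import pandas as pd

def myfunc(data, val_col, weight_col):
    # weighted average of data[val_col], weighted by data[weight_col]
    return np.sum(data[val_col] * data[weight_col]) / np.sum(data[weight_col])

# Hypothetical hand-made data, not the random frame from the question
df_demo = pd.DataFrame({
    "a": [1.0, 3.0, 2.0, 4.0],
    "b": [1.0, 1.0, 1.0, 3.0],
    "groups": [1, 1, 2, 2],
})

grouped = df_demo.groupby("groups")[["a", "b"]]

# Extra positional arguments are forwarded after the group DataFrame
by_position = grouped.apply(myfunc, "a", "b")

# Keyword arguments are forwarded as well
by_keyword = grouped.apply(myfunc, val_col="a", weight_col="b")

# Cross-check with NumPy's built-in weighted average
via_average = grouped.apply(lambda d: np.average(d["a"], weights=d["b"]))

print(by_position)
```

For group 1 the result is (1*1 + 3*1) / 2 = 2.0 and for group 2 it is (2*1 + 4*3) / 4 = 3.5, and all three variants agree.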