Extracting the mean of each column in Pandas

Question

3

I have a DataFrame, df1, that shows viewers' ratings and the genres of each movie:

movie_id | rating | action | comedy | drama
0          4        1        1        1
1          5        0        1        0
2          3        0        1        1
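
To make the snippets in this thread easier to run, here is a minimal sketch that rebuilds the sample table. The values are copied from the table above; the name df1 follows the question, while the answers refer to the same data as df.

```python
import pandas as pd

# Sample data copied from the question's table.
df1 = pd.DataFrame(
    {
        "movie_id": [0, 1, 2],
        "rating": [4, 5, 3],
        "action": [1, 0, 0],
        "comedy": [1, 1, 1],
        "drama": [1, 0, 1],
    }
)

# The answers use the name df for the same frame.
df = df1

print(df1)
```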

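A 1 means the movie belongs to that genre (for example, it is an action movie); a 0 means it does not.

I extracted the average rating for a single genre. For action movies, for example, I did this: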

new=df1[df1["action"]==1]
new['rating'].mean()

For the sample data this returns 4. Now I need the average rating for every genre, and the result should look like this:

action | comedy | drama
4        4        3.5

Any suggestions on how to approach this?

- kayak

3 Answers

2

You can melt the genre columns into long form and keep only the rows where the value equals 1. Then group by genre and compute the mean.

pd.melt(
    df,
    value_vars=["action", "comedy", "drama"],
    var_name="genre",
    id_vars=["movie_id", "rating"],
).query("value == 1").groupby("genre")["rating"].mean()

This gives

genre
action    4.0
comedy    4.0
drama     3.5
Name: rating, dtype: float64
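
The chained call can be easier to follow when split into its three steps. The following is a sketch of the same melt approach on the question's sample data; the intermediate names long_form, tagged and genre_means are illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "movie_id": [0, 1, 2],
        "rating": [4, 5, 3],
        "action": [1, 0, 0],
        "comedy": [1, 1, 1],
        "drama": [1, 0, 1],
    }
)

# Step 1: reshape to long form, one row per (movie, genre) pair.
long_form = df.melt(
    id_vars=["movie_id", "rating"],
    value_vars=["action", "comedy", "drama"],
    var_name="genre",
)

# Step 2: keep only the pairs where the movie belongs to the genre.
tagged = long_form[long_form["value"] == 1]

# Step 3: average the ratings within each genre.
genre_means = tagged.groupby("genre")["rating"].mean()
print(genre_means)
```

Boolean indexing is used here in place of query; both select the same rows.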

- user1717828

1

Multiply the action, comedy and drama columns by the rating column, replace 0 with np.nan, and compute the mean:

(df.iloc[:, 2:]
   .mul(df.rating, axis = 0)
   # mean implicitly excludes nulls during computations
   .replace(0, np.nan) 
   .mean()
)
action    4.0
comedy    4.0
drama     3.5
dtype: float64

If you want DataFrame-like output, pass mean to agg inside a list; this returns a DataFrame rather than a Series:

(df.iloc[:, 2:]
   .mul(df.rating, axis = 0)
   .replace(0, np.nan) 
   .agg(['mean']) # note the `mean` is in a list
)

      action  comedy  drama
mean     4.0     4.0    3.5
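
One caveat worth checking, assuming the rating scale can include 0 (the question does not say): replacing 0 after the multiplication also discards a genuine rating of 0. The sketch below uses hypothetical data, with a fourth movie rated 0 that is not in the question, and compares this approach with masking the genre flags before multiplying.

```python
import numpy as np
import pandas as pd

# Hypothetical data: movie 3 (rating 0, action only) is NOT in the question.
df = pd.DataFrame(
    {
        "movie_id": [0, 1, 2, 3],
        "rating": [4, 5, 3, 0],
        "action": [1, 0, 0, 1],
        "comedy": [1, 1, 1, 0],
        "drama": [1, 0, 1, 0],
    }
)
genres = df[["action", "comedy", "drama"]]

# Replace after multiplying: the real rating of 0 becomes NaN and is skipped.
after_replace = genres.mul(df["rating"], axis=0).replace(0, np.nan).mean()

# Mask the flags first: only "not in this genre" becomes NaN.
masked_first = genres.where(genres == 1).mul(df["rating"], axis=0).mean()

print(after_replace["action"])  # 4.0, the zero rating was dropped
print(masked_first["action"])   # 2.0, the mean of 4 and 0
```

If ratings are never 0, the two variants agree, as they do on the question's sample data.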

- sammywemmy

- BENY · Accepted Answer

In your case, we can select the genre columns, mask every 0 as NaN, and then multiply by the rating.

out = df.loc[:,['action','comedy','drama']].where(lambda x : x==1).mul(df.rating,axis=0).mean()
Out[377]: 
action    4.0
comedy    4.0
drama     3.5
dtype: float64

If you need a DataFrame:

out = out.to_frame().T
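
For reference, a self-contained sketch of this approach on the question's sample data. Passing name="mean" to to_frame is an addition for a readable row label; with the plain to_frame().T above, the row is labeled 0.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "movie_id": [0, 1, 2],
        "rating": [4, 5, 3],
        "action": [1, 0, 0],
        "comedy": [1, 1, 1],
        "drama": [1, 0, 1],
    }
)

# Keep the 1 flags, turn every 0 into NaN, then scale each row by its rating.
out = (
    df.loc[:, ["action", "comedy", "drama"]]
    .where(lambda x: x == 1)
    .mul(df["rating"], axis=0)
    .mean()  # NaN entries are skipped by default
)

# Turn the Series into a one-row DataFrame with genres as columns.
frame = out.to_frame(name="mean").T
print(frame)
```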