How to aggregate duplicate rows with multiple columns in a data frame

Question

5

I have a data.frame that looks like this (the real one has many more columns and rows):

    Gene      Cell1    Cell2    Cell3     
1      A          2        7        8 
2      A          5        2        9 
3      B          2        7        8
4      C          1        4        3

I want to sum the rows that share the same Gene value, to get a result like this:

    Gene      Cell1    Cell2    Cell3     
1      A          7        9       17  
2      B          2        7        8
3      C          1        4        3

Based on earlier answers, I tried the aggregate function, but I can't figure out how to get the result above. Here is the code I tried:

aggregate(df[,-1], list(df[,1]), FUN = sum)

Does anyone know what I am doing wrong?

- Euclides

What is wrong with the result you get from aggregate? - Bea

2 Answers

4

Alternatively, use dplyr:

library(dplyr)
df %>%
  group_by(Gene) %>%
  summarise_all(sum) %>%
  data.frame() -> newdf # so that newdf can further be used, if needed

- jay.sf

1

Other methods work too, but this one is more robust and intuitive. I like that I don't have to declare which columns to sum. - Ahdee
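In current dplyr releases, summarise_all() is superseded by across(). A minimal sketch of the same grouping with that syntax follows, assuming dplyr 1.0.0 or later; the data is rebuilt from the question, and newdf follows the naming in the answer above.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  Cell2 = c(7, 2, 7, 4),
  Cell3 = c(8, 9, 8, 3)
)

# Sum every non-grouping column within each Gene;
# across(everything(), ...) skips the grouping column automatically
newdf <- df %>%
  group_by(Gene) %>%
  summarise(across(everything(), sum)) %>%
  as.data.frame()
```

As with summarise_all(), no column names need to be listed, so the call should carry over unchanged to a data frame with many more Cell columns.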

- lukeA · Accepted Answer

6

aggregate(df[,-1], list(Gene=df[,1]), FUN = sum)
#   Gene Cell1 Cell2 Cell3
# 1    A     7     9    17
# 2    B     2     7     8
# 3    C     1     4     3

This will give you the desired output.
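For a reproducible check, the example data from the question can be rebuilt and run through the same call. The sketch below assumes a plain data.frame named df, as in the question. It also shows that an unnamed grouping list produces a key column called Group.1, which is the only visible difference from the desired output. The formula interface is an equivalent alternative that was not part of the original answer.

```r
# Example data from the question, as a plain data.frame
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  Cell2 = c(7, 2, 7, 4),
  Cell3 = c(8, 9, 8, 3)
)

# Unnamed grouping list, as in the question: the key column is named Group.1
res_unnamed <- aggregate(df[, -1], list(df[, 1]), FUN = sum)
names(res_unnamed)
# "Group.1" "Cell1" "Cell2" "Cell3"

# Named grouping list, as in the answer: the key column is named Gene
res_named <- aggregate(df[, -1], list(Gene = df[, 1]), FUN = sum)

# Formula interface: sum all remaining columns, grouped by Gene
res_formula <- aggregate(. ~ Gene, data = df, FUN = sum)
```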

- lukeA

When we run the code above, we get an error: aggregate.data.frame(df[, -1], list(Gene = df[, 1]), FUN = sum) : arguments must have same length. - Manoj Kumar

@ManojKumar Please add the output of str(df) to your post. - lukeA

Sure, @lukeA, here it is:

Classes 'data.table' and 'data.frame': 4 obs. of 4 variables:
 $ Gene : chr "A" "A" "B" "C"
 $ Cell1: int 2 5 2 1
 $ Cell2: int 7 2 7 4
 $ Cell3: int 8 9 8 3
 - attr(*, ".internal.selfref")=<externalptr>

- Manoj Kumar

2

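@ManojKumar Thanks. You have a data.table object, and indexing works a bit differently there. So you can do this: aggregate(df[,-1], list(Gene=df[[1]]), FUN = sum). But if you already have a data.table, you may want to take advantage of its own aggregation: df[, lapply(.SD, sum), by=Gene]. - lukeA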
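The two suggestions in this comment can be tried side by side. The sketch below assumes a recent data.table release, where dt[, 1] returns a one-column data.table rather than a vector (hence the length error above), while dt[[1]] extracts the column itself. The object name dt is hypothetical; the data matches the str() output posted earlier.

```r
library(data.table)

# Same data as the str() output above, built as a data.table
dt <- data.table(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2L, 5L, 2L, 1L),
  Cell2 = c(7L, 2L, 7L, 4L),
  Cell3 = c(8L, 9L, 8L, 3L)
)

# Base R aggregate: dt[[1]] passes the Gene column as a plain vector
res_base <- aggregate(dt[, -1], list(Gene = dt[[1]]), FUN = sum)

# Native data.table grouping: .SD holds every column except the "by" column
res_dt <- dt[, lapply(.SD, sum), by = Gene]
```

Both results should contain one row per Gene; the data.table form avoids the round trip through aggregate and tends to scale better on large tables.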