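Although others have asked similar questions, their data structures were somewhat different. My dataset contains several grouping variables plus columns of numeric data. I need to sum the numeric data in each row and write the result to a new column. See the DATA dataset and the desired RESULT table below. I would like a solution that uses the mutate function from dplyr, since dplyr is the package I mainly use to manipulate my data. I can already do this with gather (from tidyr), group_by, and summarise, but with very large datasets the "gathered" table grows past 2,000,000 rows. Thanks for your help.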
DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                  DATE = c("1","1","2","2","3","3","3","4","4"),
                  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))
RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                    DATE = c("1","1","2","2","3","3","3","4","4"),
                    STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                    STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                    SUM_STUFF = c(3, 6, 90, 120, 300, 600, 900, 15000, 18000))
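For reference, the reshape-based route mentioned in the question might look roughly like the sketch below. This is only a guess at that workflow, not the asker's actual code: ROW_ID is a hypothetical helper column added here because SITE and DATE do not identify a row uniquely, and the names long, sums and via_gather are likewise made up for illustration. It shows why the intermediate table grows: the long format holds one row per original row per numeric column.

```r
library(dplyr)
library(tidyr)

DATA <- data.frame(
  SITE = c("A","A","A","A","B","B","B","C","C"),
  DATE = c("1","1","2","2","3","3","3","4","4"),
  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000)
)

# ROW_ID is a hypothetical helper: it tags each original row so the
# values can be summed back per row after reshaping.
long <- DATA %>%
  mutate(ROW_ID = row_number()) %>%
  gather(key = "VARIABLE", value = "VALUE", STUFF, STUFF2)

# The long table has nrow(DATA) * (number of numeric columns) rows.
nrow(long)

# Sum the values belonging to each original row.
sums <- long %>%
  group_by(ROW_ID) %>%
  summarise(SUM_STUFF = sum(VALUE))

# Attach the per-row sums back onto the wide data.
via_gather <- DATA %>%
  mutate(ROW_ID = row_number()) %>%
  left_join(sums, by = "ROW_ID") %>%
  select(-ROW_ID)
```

With two numeric columns the long table is twice the size of DATA; with many numeric columns it scales accordingly, which is presumably where the 2,000,000-row table comes from.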
mutate(DATA, SUM_STUFF = rowSums(DATA[,3:4])) should be one way to do it. - jazzurro
within(DATA, { SUMS=rowSums(DATA[,3:4]) }). - r2evans
within(DATA, { SUMS=rowSums(DATA[,sapply(DATA, is.numeric)]) }), which sums across all numeric columns ... a bit aggressive, but no magic constants! - r2evans
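The comment suggestions can also be written so that columns are picked by name or by type inside mutate, rather than by position. The following is a minimal sketch, assuming dplyr 1.0.0 or later (for across and where); the names by_name and all_numeric are only illustrative. No reshaping is involved, so no enlarged intermediate table is created.

```r
library(dplyr)

DATA <- data.frame(
  SITE = c("A","A","A","A","B","B","B","C","C"),
  DATE = c("1","1","2","2","3","3","3","4","4"),
  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000)
)

# Name the columns explicitly instead of relying on positions 3:4.
by_name <- DATA %>%
  mutate(SUM_STUFF = rowSums(across(c(STUFF, STUFF2))))

# Or sum every numeric column, mirroring the sapply(DATA, is.numeric) idea.
all_numeric <- DATA %>%
  mutate(SUM_STUFF = rowSums(across(where(is.numeric))))
```

If the numeric columns may contain missing values, rowSums accepts na.rm = TRUE; whether dropping NAs is appropriate depends on the data.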