I often use the following dplyr pattern to compute summary statistics for a data frame:
1. Aggregate <-
2. Original dataset %>%
3. group_by() %>%
4. filter() %>%
5. summarize() %>%
6. left_join() back to Aggregate
For example:
library(dplyr)
set.seed(1)  # for reproducibility
Original <- data.frame(A = 1:100, B = sample(LETTERS, 100, replace = TRUE), C = rnorm(100))
# Calculate 1st summary statistic
Aggregate <- Original %>%
  group_by(B) %>%
  filter(A > 50) %>%
  summarize(meanC = mean(C))
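# Calculate 2nd summary statistic and join it back onto Aggregate
# (the piped summary must go in as y = .; y = Original would join the raw rows)
Aggregate <- Original %>%
  group_by(B) %>%
  summarize(Q = sum(C)) %>%
  left_join(x = Aggregate, y = ., by = "B")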
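One possible alternative to the two steps above, sketched under the assumption that both statistics share the same grouping (B): they can be computed in a single summarize() call, with the A > 50 condition moved inside mean() as a subset of C, so no left_join() is needed. The set.seed(1) call is an illustrative addition for reproducibility, not part of the original code.

```r
library(dplyr)

set.seed(1)  # added only so the example is reproducible
Original <- data.frame(A = 1:100,
                       B = sample(LETTERS, 100, replace = TRUE),
                       C = rnorm(100))

# Both statistics in one grouped pass: instead of a separate filter() step,
# the A > 50 condition subsets C inside mean() for each group.
Aggregate <- Original %>%
  group_by(B) %>%
  summarize(meanC = mean(C[A > 50]),
            Q     = sum(C))
```

One behavioral difference to keep in mind: the filter() version drops any letter that has no rows with A > 50, whereas this version keeps that letter with meanC = NaN (the mean of an empty vector).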
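My question has two parts:
A) Is there a better way to build a summary-statistics table from another table? The left join feels very clunky.
B) How would I do this the "data.table" way, i.e., how do I join back to the aggregate table?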
Aggregate[Aggregate[,meanC:=mean(C),by=.(B)]]
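For part B, a minimal data.table sketch, assuming the same data is built directly as a data.table: the grouped summary is written as DT[i, j, by], and the "join back to Aggregate" step can be done as an update-on-join, X[Y, col := i.col, on = "B"], which adds the column to X by reference. The names Q_tbl and Aggregate2 and the set.seed(1) call are illustrative additions.

```r
library(data.table)

set.seed(1)  # added only so the example is reproducible
Original <- data.table(A = 1:100,
                       B = sample(LETTERS, 100, replace = TRUE),
                       C = rnorm(100))

# 1st statistic: i filters rows (A > 50), j summarizes, by groups.
Aggregate <- Original[A > 50, .(meanC = mean(C)), by = B]

# 2nd statistic over all rows (Q_tbl is a hypothetical name).
Q_tbl <- Original[, .(Q = sum(C)), by = B]

# Join back to Aggregate: update-on-join adds Q to Aggregate by reference.
# i.Q refers to the Q column of the table in the i position (Q_tbl).
Aggregate[Q_tbl, Q := i.Q, on = "B"]

# Alternative with no join at all: both statistics in one grouped j.
Aggregate2 <- Original[, .(meanC = mean(C[A > 50]), Q = sum(C)), by = B]
```

The update-on-join keeps only the rows already in Aggregate, which mirrors left_join(x = Aggregate, ...) in the dplyr version; the one-pass Aggregate2 instead keeps every letter, with meanC = NaN where no row has A > 50.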
Thanks for any suggestions.
orig[ , meanC := mean(C), by=B] - DanY
The ':='() notation, as well as lapply with .SDcols. - DanY
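A short sketch of the data.table features mentioned in the comments, assuming the data is created as a data.table named orig (the seed and the name means are illustrative additions). := adds a grouped statistic as a new column of orig by reference, so every row carries its group's value and no join is needed; the functional form (written with backticks here, equivalent to ':='() in the comment) assigns several columns at once; and lapply(.SD, ...) with .SDcols applies one function to a chosen set of columns per group.

```r
library(data.table)

set.seed(1)  # added only so the example is reproducible
orig <- data.table(A = 1:100,
                   B = sample(LETTERS, 100, replace = TRUE),
                   C = rnorm(100))

# := adds the group statistic as a new column of orig, by reference:
# every row gets the mean of C for its own B group (orig keeps 100 rows).
orig[, meanC := mean(C), by = B]

# Functional form of := assigns several grouped columns in one call.
orig[, `:=`(meanC = mean(C), Q = sum(C)), by = B]

# lapply over .SD applies one function to each column named in .SDcols;
# the result has one row per B with the mean of A and the mean of C.
means <- orig[, lapply(.SD, mean), by = B, .SDcols = c("A", "C")]
```

Note that the := approach annotates the original rows rather than producing a separate one-row-per-group summary table; the lapply(.SD, ...) form is the one that returns a summary table.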