I have data in a data frame in this format:
  grp1 grp2 grp3 grp4 result
1    0    1    0    0      1
2    1    0    0    0      0
3    0    0    0    1      1
4    0    0    0    1      1
5    1    0    0    0      0
6    0    1    0    0      1
...
This can be generated with:
set.seed(13)
groups <- c("grp1", "grp2", "grp3", "grp4", "result")
# Randomly assign each to group and a result
x <- do.call(rbind, lapply(1:50, function(x) c(sample(c(1,0,0,0), 4), sample(0:1, 1))))
df <- data.frame(x)
colnames(df) <- groups
My goal is to get the data into the following format:
  group      freq
1  grp1 0.5625000
2  grp2 0.5000000
3  grp3 0.6250000
4  grp4 0.2857143
where freq is the proportion of rows within each group that have result == 1.
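To make that definition concrete, here is a minimal base R sketch of the calculation, assuming each row belongs to exactly one group (as in the generated data). The names `grp_cols` and `out` are only illustrative, and the generating code is repeated so the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained
set.seed(13)
x <- do.call(rbind, lapply(1:50, function(i) c(sample(c(1, 0, 0, 0), 4), sample(0:1, 1))))
df <- data.frame(x)
colnames(df) <- c("grp1", "grp2", "grp3", "grp4", "result")

# Illustrative name: the indicator columns that mark group membership
grp_cols <- c("grp1", "grp2", "grp3", "grp4")

# For each group, take the mean of result over the rows flagged as members;
# the mean of a 0/1 column is the proportion of 1s
freq <- sapply(grp_cols, function(g) mean(df$result[df[[g]] == 1]))

out <- data.frame(group = grp_cols, freq = unname(freq))
out
```

A group with no members would give `NaN` here, since the mean of an empty vector is undefined.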
My current attempt uses dplyr:
library(dplyr)
df %>%
  group_by(grp1, grp2, grp3, grp4, result) %>%
  summarize(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  select(-n) %>%
  filter(result == 1)
which results in:
  grp1 grp2 grp3 grp4 result      freq
1    0    0    0    1      1 0.5625000
2    0    0    1    0      1 0.5000000
3    0    1    0    0      1 0.6250000
4    1    0    0    0      1 0.2857143
reshape2::melt or tidyr::gather. - Gregor Thomas
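One possible reading of the reshaping suggestion, sketched here with `tidyr::pivot_longer` (the replacement for `gather` in tidyr 1.0.0 and later) rather than `gather` itself. This is an assumption about what the comment intends, not a confirmed answer; the column names `group` and `member` and the object `long` are illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data so the snippet is self-contained
set.seed(13)
x <- do.call(rbind, lapply(1:50, function(i) c(sample(c(1, 0, 0, 0), 4), sample(0:1, 1))))
df <- data.frame(x)
colnames(df) <- c("grp1", "grp2", "grp3", "grp4", "result")

long <- df %>%
  # Stack the four indicator columns into (group, member) pairs
  pivot_longer(cols = starts_with("grp"), names_to = "group", values_to = "member") %>%
  # Keep only the row for the group each observation actually belongs to
  filter(member == 1) %>%
  group_by(group) %>%
  # The mean of a 0/1 result is the proportion of 1s within the group
  summarize(freq = mean(result))

long
```

Once the data is in long form, a single `group_by(group)` replaces grouping on all four indicator columns at once, which is what forced the wide output in the attempt.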