Calculating grouped ratios with dplyr

6
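With the following data frame, I want to group by replicate and group, then calculate the ratio of the treatment (case) values to the control values.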
dataIn <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L), .Label = c("case", "controls"), class = "factor"), treatment = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "EPA", class = "factor"), 
    replicate = structure(c(2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L), .Label = c("four", 
    "one", "three", "two"), class = "factor"), fatty_acid_family = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "saturated", class = "factor"), 
    fatty_acid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "14:0", class = "factor"), 
    quant = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755
    )), .Names = c("group", "treatment", "replicate", "fatty_acid_family", 
"fatty_acid", "quant"), class = "data.frame", row.names = c(NA, 
-8L))

I tried using dplyr:

group_by(dataIn, replicate, group) %>% transmute(ratio = quant[group=="case"]/quant[group=="controls"])

But this fails with Error: incompatible size (%d), expecting %d (the group size) or 1.

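At first I thought this was because I was trying to create 4 ratios from a df 8 rows deep, so I figured summarise might be the answer (collapsing each group down to one ratio). That doesn't work either, so my understanding is clearly off.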

group_by(dataIn, replicate, group) %>% summarise(ratio = quant[group=="case"]/quant[group=="controls"])

  replicate    group ratio
1      four     case    NA
2      four controls    NA
3       one     case    NA
4       one controls    NA
5     three     case    NA
6     three controls    NA
7       two     case    NA
8       two controls    NA

I'd appreciate any advice on what I'm doing wrong, or on whether this can be done with dplyr at all.

Thanks.


2
Don't group by group - eddi
2 Answers

10

You could try:

group_by(dataIn, replicate) %>% 
    summarise(ratio = quant[group=="case"]/quant[group=="controls"])
#Source: local data frame [4 x 2]
#
#  replicate    ratio
#1      four 1.078562
#2       one 1.333333
#3     three 1.070573
#4       two 1.446449

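Because you grouped by both replicate and group, each group contains only case rows or only control rows, so you can't access data from both at the same time. Grouping by replicate alone keeps the case and control rows for a replicate together.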
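To make that concrete, here is a minimal base R sketch of what each grouping exposes. The two-row data frame `one` is a hypothetical slice holding only the replicate "one" values from the question, and the variable names are made up for illustration.

```r
# Hypothetical slice of the question's data: replicate "one" only.
one <- data.frame(
  group = c("case", "controls"),
  quant = c(6.16, 4.62)
)

# Grouped by replicate AND group, each piece holds a single row.
# Inside the "case" piece there is no control row to divide by.
case_piece <- one[one$group == "case", ]
n_controls <- length(case_piece$quant[case_piece$group == "controls"])
n_controls  # zero control values are visible here

# Grouped by replicate alone, both rows are visible together,
# so the ratio is well defined.
ratio <- one$quant[one$group == "case"] / one$quant[one$group == "controls"]
ratio
```

The empty subset in the first case is what leaves summarise with nothing to compute a ratio from.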


1
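Thanks, works great. I think I understand the problem with "group" now, but I need to play with dplyr and read the documentation again. - duff
Hi, I have many "treated" datasets (e.g. "case" in this example) against a single set of "control" values. You don't need to specify "case" in order to iterate over the cases (plural is my addition), i.e. summarise(ratio = quant / quant[group=="controls"]) - Ben G Small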
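A sketch of that last idea, with several treated groups against one control per replicate. The data frame `toy` and its group labels are hypothetical. It uses mutate rather than summarise, on the assumption that exactly one control row exists per replicate; as far as I know, recent dplyr releases warn when summarise returns more than one row per group, and mutate sidesteps that.

```r
library(dplyr)

# Hypothetical data: two treated groups and one control per replicate.
toy <- data.frame(
  replicate = c("one", "one", "one", "two", "two", "two"),
  group     = c("caseA", "caseB", "controls", "caseA", "caseB", "controls"),
  quant     = c(6, 9, 3, 8, 2, 4)
)

ratios <- toy %>%
  group_by(replicate) %>%
  # The single control value is recycled across every row of its replicate.
  mutate(ratio = quant / quant[group == "controls"]) %>%
  ungroup() %>%
  # Drop the control rows, whose ratio is trivially 1.
  filter(group != "controls")

ratios
```

If a replicate had zero control rows, or more than one, the division would no longer line up one-to-one, so it is worth checking that assumption before relying on the result.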

1

@talat's answer was very helpful to me. I created a minimal reproducible example to help myself understand it:

df <- structure(list(a = c("a", "a", "b", "b", "c", "c", "d", "d"), 
    b = c(1, 2, 1, 2, 1, 2, 1, 2), c = c(22, 15, 5, 0.2, 107, 
    6, 0.2, 4)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

#   a b     c
# 1 a 1  22.0
# 2 a 2  15.0
# 3 b 1   5.0
# 4 b 2   0.2
# 5 c 1 107.0
# 6 c 2   6.0
# 7 d 1   0.2
# 8 d 2   4.0

library(dplyr)

df %>%  
  group_by(a) %>% 
  summarise(prop = c[b == 1] / c[b == 2])

#   a      prop
# 1 a  1.466667
# 2 b 25.000000
# 3 c 17.833333
# 4 d  0.050000

