Comparing means by group (ANOVA) using dplyr in R

Question

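I have summarised results (N, mean, sd) of survey questions for different subgroups (e.g. by course, age group, gender). I would like to identify the items where the subgroups differ in a statistically significant way, so that I can investigate those results further. Ideally this should happen while preparing the data for a report in R Markdown using tidyverse / dplyr.

My data looks like this: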

> head(demo, 11)
# A tibble: 11 x 7
# Groups:   qid, subgroup [3]
     qid question subgroup name       N  mean    sd
   <int> <chr>    <chr>    <chr>  <dbl> <dbl> <dbl>
 1     1 noise     NA       total   214  3.65 1.03
 2     1 noise     course   A       11  4     0.77
 3     1 noise     course   B       47  3.55  1.16
 4     1 noise     course   C       31  3.29  1.24
 5     1 noise     course   D       40  3.8   0.85
 6     1 noise     course   E       16  3.38  1.09
 7     1 noise     course   F       11  3.55  1.13
 8     1 noise     course   G       25  4.12  0.73
 9     1 noise     course   H       25  3.68  0.85
10     1 noise     gender   f       120 3.65  1.07
11     1 noise     gender   m       93  3.67  0.98

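What I need is a new column that is TRUE if there is a statistically significant difference within the subgroup for a given question, and FALSE otherwise. Something like sigdiff below: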

     qid question subgroup name       N  mean    sd     sigdiff     
   <int> <chr>    <chr>    <chr>  <dbl> <dbl> <dbl>       <lgl>
 2     1 noise     course   A       11  4     0.77        FALSE
 3     1 noise     course   B       47  3.55  1.16        FALSE 
 4     1 noise     course   C       31  3.29  1.24        FALSE 
 5     1 noise     course   D       40  3.8   0.85        FALSE 
 6     1 noise     course   E       16  3.38  1.09        FALSE 
 7     1 noise     course   F       11  3.55  1.13        FALSE 
 8     1 noise     course   G       25  4.12  0.73        FALSE 
 9     1 noise     course   H       25  3.68  0.85        FALSE

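Now, a very neat way to determine whether there is a significant difference between any of the groups is to adapt this approach, which is based on the rpsychi package.

However, I failed to adapt it to my grouped tibble. My (failed) attempt was to simply call the function that performs the ANOVA through dplyr's new group_map function: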

if(!require(rpsychi)){install.packages("rpsychi")}
library(rpsychi)
if(!require(tidyverse)){install.packages("tidyverse")}
library(tidyverse)

#' function establishing significant difference
#' between survey answers within subgroups

anovagrptest <- function(grpsum){
  
      anovaresult <- ind.oneway.second(grpsum$mean, grpsum$sd, grpsum$N, sig.level = 0.05)
      
      # compare critical F Value
      fcrit <- qf(.95, anovaresult$anova.table$df[1], anovaresult$anova.table$df[2])
      if(anovaresult$anova.table$F[1] > fcrit){return(TRUE)
      }else{return(FALSE)}
    }

#' pass the subset of the data for the group to the function which 
#' "returns a list of results from calling .f on each group"

relquestions <- demo %>% 
  group_by(qid, subgroup) %>% 
  group_map(~ anovagrptest(.x))

The code aborts with "Error in delta.upper + dfb : non-numeric argument to binary operator". Many thanks for your ideas.

- user13782962

1 Answer

- Martin Gal · Accepted Answer

I think the NA in your data is causing the problem. First of all, I don't think you need that map function (but to be honest, I'm not 100% sure).

demo %>% 
  select(-id) %>%
  group_by(qid, subgroup) %>%
  mutate(new_column = ind.oneway.second(mean, sd, N, sig.level = 0.05) %>%
           {qf(.95, .[["anova.table"]][["df"]][1], .[["anova.table"]][["df"]][2]) < .[["anova.table"]][["F"]][1]})

results in

Error: Problem with `mutate()` input `new_column`.
x non-numeric argument for binary operator
i Input `new_column` is ``%>%`(...)`.
i The error occured in group 3: qid = 1, subgroup = NA.
Run `rlang::last_error()` to see where the error occurred.
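
A plausible explanation, stated here as an assumption rather than something confirmed by the rpsychi documentation: the group with subgroup = NA contains only the single "total" row, and a one-way ANOVA needs at least two groups to compare. The base R sketch below counts rows per subgroup on a small hypothetical stand-in for `demo` (the name `demo_small` is made up for illustration, and only subgroup is counted because qid is constant here):

```r
# Hypothetical stand-in for a few rows of `demo` (values copied from the question).
demo_small <- data.frame(
  qid      = c(1, 1, 1, 1, 1),
  subgroup = c(NA, "course", "course", "gender", "gender"),
  name     = c("total", "A", "B", "f", "m"),
  N        = c(214, 11, 47, 120, 93),
  mean     = c(3.65, 4, 3.55, 3.65, 3.67),
  sd       = c(1.03, 0.77, 1.16, 1.07, 0.98),
  stringsAsFactors = FALSE
)

# Count rows per subgroup, keeping NA as its own level.
rows_per_group <- table(demo_small$subgroup, useNA = "ifany")
print(rows_per_group)

# Subgroups with fewer than two rows leave nothing to compare in a one-way ANOVA.
too_small <- names(rows_per_group)[as.vector(rows_per_group) < 2]
```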

Once I drop the rows containing NA

demo %>% 
  select(-id) %>%
  group_by(qid, subgroup) %>%
  drop_na() %>%
  mutate(new_column = ind.oneway.second(mean, sd, N, sig.level = 0.05) %>%
           {qf(.95, .[["anova.table"]][["df"]][1], .[["anova.table"]][["df"]][2]) < .[["anova.table"]][["F"]][1]})

I get

# A tibble: 10 x 8
# Groups:   qid, subgroup [2]
     qid question subgroup name      N  mean    sd new_column
   <dbl> <chr>    <chr>    <chr> <dbl> <dbl> <dbl> <lgl>  
 1     1 noise    course   A        11  4     0.77 FALSE  
 2     1 noise    course   B        47  3.55  1.16 FALSE  
 3     1 noise    course   C        31  3.29  1.24 FALSE  
 4     1 noise    course   D        40  3.8   0.85 FALSE  
 5     1 noise    course   E        16  3.38  1.09 FALSE  
 6     1 noise    course   F        11  3.55  1.13 FALSE  
 7     1 noise    course   G        25  4.12  0.73 FALSE  
 8     1 noise    course   H        25  3.68  0.85 FALSE  
 9     1 noise    gender   f       120  3.65  1.07 FALSE  
10     1 noise    gender   m        93  3.67  0.98 FALSE
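
For comparison, the same decision can be sketched in base R without rpsychi, using the textbook one-way ANOVA formulas for summary statistics. This is not the rpsychi implementation: the helper name `sig_from_summary` and its arguments are hypothetical, and it returns NA instead of failing when a group has fewer than two rows, which is one possible way to handle the "total" row.

```r
# One-way ANOVA from per-group summary statistics (n, mean, sd).
# Returns TRUE/FALSE for significance at `alpha`, or NA when fewer than
# two groups are supplied (e.g. the single "total" row).
# Hypothetical helper, not part of rpsychi or dplyr.
sig_from_summary <- function(n, m, s, alpha = 0.05) {
  k <- length(n)
  if (k < 2) return(NA)
  grand_mean <- sum(n * m) / sum(n)          # weighted grand mean
  ss_between <- sum(n * (m - grand_mean)^2)  # between-group sum of squares
  ss_within  <- sum((n - 1) * s^2)           # within-group sum of squares
  df_between <- k - 1
  df_within  <- sum(n) - k
  f_value <- (ss_between / df_between) / (ss_within / df_within)
  p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
  p_value < alpha
}

# Values copied from the question's `demo` tibble.
course_sig <- sig_from_summary(
  n = c(11, 47, 31, 40, 16, 11, 25, 25),
  m = c(4, 3.55, 3.29, 3.8, 3.38, 3.55, 4.12, 3.68),
  s = c(0.77, 1.16, 1.24, 0.85, 1.09, 1.13, 0.73, 0.85)
)
gender_sig <- sig_from_summary(c(120, 93), c(3.65, 3.67), c(1.07, 0.98))
total_sig  <- sig_from_summary(214, 3.65, 1.03)
```

In a grouped pipeline, a helper like this could be called as `mutate(sigdiff = sig_from_summary(N, mean, sd))` after `group_by(qid, subgroup)`, so that the single-row group gets NA and `drop_na()` is no longer needed to avoid the error.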