汇总表格(平均值 + 标准误)和2因素方差分析的p值

4

我正在尝试制作一个表格,输出通常通过2-way anova进行分析的大型研究的摘要统计信息,查看两个变量的主效应以及交互效应。

我希望能够快速运行统计数据,并以易于阅读的格式输出,如果有漂亮的格式那就更好了。

我已经能够获得2-way anova输出,并且我还使用了gtsummary包和tbl_summary来制作表格。但是,我无法弄清楚如何按多个变量分组。我的解决方案是创建一个新变量,将两个独立变量组合起来,以便将数据分成正确的组。

可复现的示例如下。

我想知道是否有一种方法可以制作与我所拥有的均值(sem)输出相同的表格,但具有我的2-way anova结果(也在下面粘贴)。在这个titanic示例中,我想要一个针对“性别”主效应的P值列,下一列是“上船”的主效应的p值,然后是一个交互的p值。

有什么想法吗?

library(titanic)
library(tidyverse)
library(gtsummary)
library(plotrix) #has a std.error function


##I really want to look at a 2-way anova, looking for the p-value for Sex, Embarked, and their interaction.
#This code just allows me to make a table with the 4 columns I want, but of course it now won't do the correct stats.
df <- titanic_train %>%
  filter(Embarked != "C" &  Embarked != "") %>%
  mutate(grp = paste(Sex, Embarked)) #add a new column that combines Sex & Pclass

#code to make my table 
  
table1 <- df %>%  
  select(grp, Age, Fare, Survived) %>%
  tbl_summary(
    by = grp,  ##can't figure out a way to put 2 variables here (Sex & Embarked)
    missing = "ifany", 
    statistic = all_continuous() ~ "{mean} ({std.error})",
    digits = all_continuous() ~ 1) %>% #this puts 1 decimal place for all values
   modify_header(stat_by = md("**{level}**<br>N =  {n}")) %>%
  bold_labels() %>%
  modify_spanning_header(all_stat_cols() ~ "**These are the Columns I Want**") %>%
  add_p(test = everything() ~ "aov",  ##This is a 1-way ANOVA, but I need 2 variables
  )

table1

#these are the p-values I want in my table:
two_way_anova_age <- aov(Age ~ Sex * Embarked, data = df)
summary(two_way_anova_age)

two_way_anova_fare <- aov(Fare ~ Sex * Embarked, data = df)
summary(two_way_anova_fare)

two_way_anova_surv <- aov(Survived ~ Sex * Embarked, data = df)
summary(two_way_anova_surv)
1个回答

2

以下是如何在gtsummary表中合并结果。

library(gtsummary)
library(titanic)
library(tidyverse)
library(plotrix) #has a std.error function

packageVersion("gtsummary")
#> [1] '1.4.0'

# create smaller version of the dataset
df <- 
  titanic_train %>%
  select(Sex, Embarked, Age, Fare) %>%
  filter(Embarked != "") # deleting empty Embarked status

# first, write a little function to get the 2-way ANOVA p-values in a table
# function to get 2-way ANOVA p-values in tibble
twoway_p <- function(variable) {
  paste(variable, "~ Sex * Embarked") %>%
    as.formula() %>%
    aov(data = df) %>% 
    broom::tidy() %>%
    select(term, p.value) %>%
    filter(complete.cases(.)) %>%
    pivot_wider(names_from = term, values_from = p.value) %>%
    mutate(
      variable = .env$variable,
      row_type = "label"
    )
}

# add all results to a single table (will be merged with gtsummary table in next step)
twoway_results <-
  bind_rows(
    twoway_p("Age"),
    twoway_p("Fare")
  )
twoway_results
#> # A tibble: 2 x 5
#>            Sex Embarked `Sex:Embarked` variable row_type
#>          <dbl>    <dbl>          <dbl> <chr>    <chr>   
#> 1 0.00823      3.97e- 1         0.611  Age      label   
#> 2 0.0000000191 4.27e-16         0.0958 Fare     label


tbl <-
  # first build a stratified `tbl_summary()` table to get summary stats by two variables
  df %>%
  tbl_strata(
    strata =  Sex,
    .tbl_fun =
      ~.x %>%
      tbl_summary(
        by = Embarked,
        missing = "no",
        statistic = all_continuous() ~ "{mean} ({std.error})",
        digits = everything() ~ 1
      ) %>%
      modify_header(all_stat_cols() ~ "**{level}**")
  ) %>%
  # merge the 2way ANOVA results into tbl_summary table
  modify_table_body(
    ~.x %>%
      left_join(
        twoway_results,
        by = c("variable", "row_type")
      )
  ) %>%
  # by default the new columns are hidden, add a header to unhide them
  modify_header(list(
    Sex ~ "**Sex**", 
    Embarked ~ "**Embarked**", 
    `Sex:Embarked` ~ "**Sex * Embarked**"
  )) %>%
  # adding spanning header to analysis results
  modify_spanning_header(c(Sex, Embarked, `Sex:Embarked`) ~ "**Two-way ANOVA p-values**") %>%
  # format the p-values with a pvalue formatting function
  modify_fmt_fun(c(Sex, Embarked, `Sex:Embarked`) ~ style_pvalue) %>%
  # update the footnote to be nicer looking
  modify_footnote(all_stat_cols() ~ "Mean (SE)")

在此输入图片描述

这是由reprex包(v1.0.0)于2021年3月27日创建的


1
这看起来很不错。我直接复制并粘贴了代码;使用remotes::install_github("ddsjoberg/gtsummary")进行安装,检查包版本(获取1.3.7.9010),但是我在"could not find "tbl_strata"和"sound not find "modify_fmt_fun"两个方面都遇到了错误。 - Erin Giles
你能再检查一下开发版的安装是否出现了错误吗?重新启动R并尝试从一个新的会话中进行安装。 - Daniel D. Sjoberg
1
好的,成功了,谢谢!我不得不使用remotes::install_github("ddsjoberg/gtsummary", force = TRUE)-- 我想那是唯一的区别。 - Erin Giles
太棒了,这是一张可爱的桌子! - Daniel D. Sjoberg
下一个问题,如果有人能帮忙--我现在正在尝试将这个(或类似的)转换为ANOVA输出,以循环遍历约100列并将所有内容放入表格中。 有没有办法做到这一点?通过上面的示例,我可以按编号选择列 df <- titanic_train %>% select(Sex, Embarked, (1:10)) %>% select(!("Name" | "Ticket")) %>% filter(Embarked != "") # 删除空的Embarked状态 -- 问题是如何在bind_rows部分上执行类似的操作。老实说,这个问题不需要太漂亮 - 可以循环遍历一堆双向方差分析并导出到CSV。 - Erin Giles

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接