Tidy, vectorized fisher.test in R

Question

3

I would like to vectorize the fisher.test function so that it can be applied to several variables at once.

For example, build a toy dataset:

library(tidyverse)
library(broom)
n <- 20
# size = c(0, 1) is recycled, so every other outcome is forced to 0
outcome <- rbinom(n, size = c(0, 1), prob = 0.7)
feature1 <- rbinom(n, size = 2, prob = 0.5) + 10
feature2 <- rbinom(n, size = 1, prob = 0.5) + 20
df <- tibble(outcome, feature1, feature2)

Here is the expected output for a single feature:

df %>%
  select(outcome, feature1) %>%
  count(outcome, feature1) %>%
  pivot_wider(names_from = feature1, values_from = n) %>%
  fisher.test(.) %>%
  tidy()

I tried group_modify to apply this to several variables at once, but it does not work:

df %>% 
  pivot_longer(c("feature1", "feature2"), names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  group_modify(~count(outcome, value)) %>%
  pivot_wider(names_from = value, values_from = n) %>%
  fisher.test(.) %>%
  tidy()

I get the error message: no applicable method for 'count' applied to an object of class "c('integer', 'numeric')"

Ideally, my final output would be:

all_pvalues <- tribble(
  ~variable,  ~p.value,
  "feature1", 0.805,
  "feature2", 0.582)

Thanks in advance for your help,

- T. Walter

4 Answers

3

Using base R:

stack(lapply(df[-1], \(x) fisher.test(df$outcome, x)$p.value))[2:1]
       ind    values
1 feature1 0.5939033
2 feature2 0.1576883
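
The one-liner works because fisher.test(x, y) cross-tabulates two vectors internally, so no explicit contingency table is needed. A more verbose sketch of the same idea follows, with the table() step spelled out. The deterministic toy data and the names features, p_values and all_pvalues are assumptions made here for reproducibility; they are not part of the original answer.

```r
# Deterministic toy data (an assumption; the question uses rbinom() instead)
df <- data.frame(
  outcome  = rep(c(0, 1), times = c(8, 12)),
  feature1 = rep(c(10, 11, 12), length.out = 20),
  feature2 = rep(c(20, 21), times = c(6, 14))
)

# Every column except the outcome is treated as a feature
features <- setdiff(names(df), "outcome")

# Build each contingency table explicitly and keep only the p-value;
# vapply() guarantees one numeric value per feature
p_values <- vapply(
  features,
  function(f) fisher.test(table(df$outcome, df[[f]]))$p.value,
  numeric(1)
)

all_pvalues <- data.frame(variable = features, p.value = unname(p_values))
all_pvalues
```

Using vapply() instead of lapply() plus stack() keeps the result a plain numeric vector, which some may find easier to read, at the cost of a few more lines.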

- akrun

3

A faster solution. +1 - langtang

1

One approach is to use group_split and map_dfr:

library(dplyr)
library(broom)
library(tidyr)
library(purrr)
df |>
  pivot_longer(c("feature1", "feature2"), names_to = "variable", values_to = "value") |>
  group_split(variable) |>
  set_names(c("feature1", "feature2")) |>
  map_dfr(~ .x |>
            count(outcome, value) |>
            pivot_wider(names_from = value, values_from = n) |>
            mutate(across(everything(), \(x) replace_na(x, 0))) |>
            # drop outcome so only the count columns form the contingency table
            select(-outcome) |>
            fisher.test() |>
            tidy(), .id = "variable") |>
  select(variable, p.value)

Output (values depend on the randomly generated data):

# A tibble: 2 × 2
  variable p.value
  <chr>      <dbl>
1 feature1  0.0991
2 feature2  0.438

- Julian

1

I finally found a solution:

df %>%
  pivot_longer(-outcome, names_to = "variable", values_to = "value") %>%
  group_by(outcome, variable, value) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  group_by(variable) %>%
  group_modify(~pivot_wider(data = ., names_from = value, values_from = n)) %>%
  replace(is.na(.), 0) %>%
  select(-outcome) %>%
  group_modify(~tidy(fisher.test(x=.))) %>%
  select(variable, p.value) %>%
  ungroup()
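
For comparison, the group_modify attempt in the question most likely fails for two reasons: the formula ~count(outcome, value) never passes the current group's data to count(), so count() is dispatched on the global outcome vector, and the later fisher.test(.) would receive all groups at once rather than one table per variable. A minimal sketch that keeps the whole test inside group_modify is shown below. The deterministic toy data is an assumption made for reproducibility, and table() is swapped in for the count/pivot_wider steps.

```r
library(dplyr)
library(tidyr)

# Deterministic toy data (an assumption; the question uses rbinom() instead)
df <- tibble(
  outcome  = rep(c(0, 1), times = c(8, 12)),
  feature1 = rep(c(10, 11, 12), length.out = 20),
  feature2 = rep(c(20, 21), times = c(6, 14))
)

all_pvalues <- df |>
  pivot_longer(-outcome, names_to = "variable", values_to = "value") |>
  group_by(variable) |>
  # .x is the data of the current group; table() builds the contingency
  # table directly, so no pivot_wider() or NA filling is needed
  group_modify(~ tibble(p.value = fisher.test(table(.x$outcome, .x$value))$p.value)) |>
  ungroup()

all_pvalues
```

Returning a one-column tibble from the lambda keeps every group's result the same shape, whereas tidy() returns extra columns for 2 x 2 tables.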

- T. Walter

- langtang · Accepted Answer

cols <- names(df)[grepl("^feature", names(df))]

tibble(
  feature = cols,
  # all_of() avoids the deprecation warning for selecting with an external vector
  pvalue = t(reframe(df, across(all_of(cols), ~ fisher.test(outcome, .x)$p.value)))[, 1]
)

Output:

  feature  pvalue
  <chr>     <dbl>
1 feature1  0.837
2 feature2  0.642
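
A possible variant of the same across() idea reshapes the one-row result with pivot_longer() instead of t(), which yields the variable/p.value layout requested in the question. This is a sketch under assumptions: the deterministic toy data and the starts_with("feature") selector are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Deterministic toy data (an assumption; the question uses rbinom() instead)
df <- tibble(
  outcome  = rep(c(0, 1), times = c(8, 12)),
  feature1 = rep(c(10, 11, 12), length.out = 20),
  feature2 = rep(c(20, 21), times = c(6, 14))
)

all_pvalues <- df |>
  # one p-value per feature column, giving a single-row tibble
  summarise(across(starts_with("feature"), ~ fisher.test(outcome, .x)$p.value)) |>
  # reshape to one row per feature
  pivot_longer(everything(), names_to = "variable", values_to = "p.value")

all_pvalues
```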