Formatting data for a chi-squared test in R

Question

3

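I'm trying to reformat my data so that I can run a chi-squared test in R. My data has the independent variable in one column, and the counts for each outcome of that variable are spread across two other columns. Here is an example of my data format.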

> example <- data.frame(category = c("x","y","x","y"), true = c(2,4,6,3), false = c(7,9,3,5))
> example
  category true false
1        x    2     7
2        y    4     9
3        x    6     3
4        y    3     5

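As far as I can tell, chisq.test can't handle data in this format, so I think I need to reshape it to look like the "good example" below before I can run the function. My problem is that, for a large data set, I'm not sure there is an easy way to do this kind of pivot.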

> good_example <- data.frame(category = c('x','x','y','y','x','x','y','y'),
                           variable = c('true','false','true','false','true','false','true','false'),
                           count = c(2,7,4,9,6,3,3,5))
> good_example
  category variable count
1        x     true     2
2        x    false     7
3        y     true     4
4        y    false     9
5        x     true     6
6        x    false     3
7        y     true     3
8        y    false     5
> tab <- tapply(good_example$count, list(good_example$category, good_example$variable), FUN=sum)
> chisq.test(tab, correct = FALSE)

    Pearson's Chi-squared test

data:  tab
X-squared = 0.50556, df = 1, p-value = 0.4771

- Emma Beck

1

With tidyr, you can use pivot_longer with code like the following: pivot_longer(example, cols = c("true", "false"), names_to = "variable", values_to = "count"). - Ben
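
For readers who want to see that comment carried through to the test, here is a minimal sketch. It assumes tidyr is installed and reuses the `example` data frame from the question. The `xtabs` step and the names `long` and `tab` are illustrative choices, not part of the original comment.

```r
library(tidyr)

# Toy data from the question.
example <- data.frame(category = c("x", "y", "x", "y"),
                      true  = c(2, 4, 6, 3),
                      false = c(7, 9, 3, 5))

# Reshape to one row per (category, variable) pair, as in the "good example".
long <- pivot_longer(example, cols = c("true", "false"),
                     names_to = "variable", values_to = "count")

# Sum the counts into a category-by-variable contingency table.
tab <- xtabs(count ~ category + variable, data = long)

chisq.test(tab, correct = FALSE)
```

Going through the long format is only needed if you want that layout for other reasons; the answers below build the table directly from the wide data.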

1 Answer

- StupidWolf · Accepted Answer

If you only need to total the true and false counts for x and y, you can do the following:

tab = do.call(rbind,by(example[,-1],example$category,colSums))
chisq.test(tab,correct=FALSE)

A more concise version (pointed out by @markus) splits the data by category and applies sum to every column except the one used for splitting:

tab = aggregate(.~category, example, sum)
chisq.test(tab[,-1],correct=FALSE)  # drop the category column before testing

Or use a dplyr / tidyr version:

library(dplyr)
tab = example %>% group_by(category) %>% summarise_all(sum)
chisq.test(tab[,-1],correct=FALSE)
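
As a cross-check on the approaches above, the following base-R sketch builds the same 2x2 table with `rowsum()` and recomputes Pearson's statistic by hand from the expected counts. Using `rowsum()` and the names `obs`, `expected`, and `x2` are my own illustrative choices, not something from the original answer.

```r
# Toy data from the question.
example <- data.frame(category = c("x", "y", "x", "y"),
                      true  = c(2, 4, 6, 3),
                      false = c(7, 9, 3, 5))

# rowsum() adds up the count columns within each category and keeps the
# category labels as row names, so no column has to be dropped afterwards.
tab <- rowsum(example[, c("true", "false")], group = example$category)

res <- chisq.test(tab, correct = FALSE)

# Pearson's statistic by hand: expected count = row total * column total / n.
obs <- as.matrix(tab)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
x2 <- sum((obs - expected)^2 / expected)

# Both values should agree with the X-squared reported in the question.
print(c(chisq.test = unname(res$statistic), by_hand = x2))
```

The table here is x: 8 true / 10 false and y: 7 true / 14 false, which is why every variant in this thread gives the same result as the asker's `tapply` version.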