How to summarise by category and count subgroups in dplyr

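Using the built-in Titanic dataset, I currently count the number of observations for the variable "Class". How can I create new columns that contain the counts of Survived == "Yes" and Survived == "No"?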

library(dplyr)

as.data.frame(Titanic) %>%
  mutate_if(is.character, as.factor) %>%
  group_by(Class) %>%
  summarise("Number of Observations" = n())

# A tibble: 4 × 2
  Class `Number of Observations`
  <fct>                    <int>
1 1st                          8
2 2nd                          8
3 3rd                          8
4 Crew                         8

I would like to get something like this:

# A tibble: 4 × 4
  Class `Number of Observations` Survived.Yes Survived.No
  <fct>                    <int>        <int>       <int>
1 1st                          8            4           4
2 2nd                          8            4           4
3 3rd                          8            4           4
4 Crew                         8            4           4

I tried adding "Survived" to the group_by() call, but then each survival status ends up on a separate row instead of in its own column.

as.data.frame(Titanic) %>% 
  mutate_if(is.character, as.factor) %>% 
  group_by(Class, Survived) %>%
  summarise("Number of Observations" = n() )

# A tibble: 8 × 3
# Groups:   Class [4]
  Class Survived `Number of Observations`
  <fct> <fct>                       <int>
1 1st   No                              4
2 1st   Yes                             4
3 2nd   No                              4
4 2nd   Yes                             4
5 3rd   No                              4
6 3rd   Yes                             4
7 Crew  No                              4
8 Crew  Yes                             4

Any suggestions would be much appreciated.

1 Answer

You can use sum(Survived == "Yes") to get the count of "Yes" within each group.
as.data.frame(Titanic) %>% 
  group_by(Class) %>%
  summarise(
    "Number of Observations" = n(),
    across(Survived, list(Yes = ~ sum(. == "Yes"),
                          No  = ~ sum(. == "No"))))

# # A tibble: 4 x 4
#   Class `Number of Observations` Survived_Yes Survived_No
#   <fct>                    <int>        <int>       <int>
# 1 1st                          8            4           4
# 2 2nd                          8            4           4
# 3 3rd                          8            4           4
# 4 Crew                         8            4           4
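
As a minimal sketch of the same idea without across(), the sum(Survived == "Yes") expression mentioned above can be written directly inside summarise(). The name class_summary and the column names Survived_Yes and Survived_No are chosen here for illustration only; they mirror the names that across() generates.

```r
library(dplyr)

# Count rows per Class, then count how many of those rows have each
# survival status. Comparing Survived to a string gives a logical
# vector, and sum() counts its TRUE values.
class_summary <- as.data.frame(Titanic) %>%
  group_by(Class) %>%
  summarise(
    `Number of Observations` = n(),
    Survived_Yes = sum(Survived == "Yes"),
    Survived_No  = sum(Survived == "No")
  )

print(class_summary)
```

This form is easier to read when only one column is involved; across() pays off when the same set of counts has to be applied to several columns at once.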

You can also use pivot_wider() from tidyr:
library(tidyr)

as.data.frame(Titanic) %>%
  add_count(Class, name = "Number of Observations") %>%
  # id_cols is named explicitly; newer tidyr versions do not accept it positionally
  pivot_wider(id_cols = c(Class, last_col()),
              names_from = Survived, names_prefix = "Survived_",
              values_from = Survived, values_fn = length)

# # A tibble: 4 x 4
#   Class `Number of Observations` Survived_No Survived_Yes
#   <fct>                    <int>       <int>        <int>
# 1 1st                          8           4            4
# 2 2nd                          8           4            4
# 3 3rd                          8           4            4
# 4 Crew                         8           4            4

You don't even need to attach any additional packages:
addmargins(xtabs(~ Class + Survived, Titanic), 2)

#       Survived
# Class  No Yes Sum
#   1st   4   4   8
#   2nd   4   4   8
#   3rd   4   4   8
#   Crew  4   4   8

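I had the same problem and used (and upvoted) your first solution. Thanks! But I don't understand the purpose of the tilde "~" in front of the "sum" terms. Could you comment? - W Barker
@WBarker The tilde "~" is a special syntax supported by the package "purrr" for writing functions. For example, Yes = ~ sum(. == "Yes") is equivalent to Yes = function(x) sum(x == "Yes"). More usage can be found on the ?across help page. - Darren Tsai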
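
To make the equivalence described in the comment above concrete, here is a small sketch that runs the first solution twice, once with the "~" shorthand and once with an ordinary anonymous function, and compares the results. The names titanic_df, with_lambda and with_function are illustrative and do not appear in the original answer.

```r
library(dplyr)

titanic_df <- as.data.frame(Titanic)

# Formula shorthand: "." stands for the column passed in by across()
with_lambda <- titanic_df %>%
  group_by(Class) %>%
  summarise(across(Survived, list(Yes = ~ sum(. == "Yes"))))

# The same computation written as an ordinary anonymous function
with_function <- titanic_df %>%
  group_by(Class) %>%
  summarise(across(Survived, list(Yes = function(x) sum(x == "Yes"))))

# Both variants are expected to produce the same Survived_Yes column
print(all.equal(with_lambda, with_function))
```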
