Group by two columns and summarise multiple columns

4

I have a data frame that I want to group by the "State" and "Date" columns, summing the values of the other columns, as in the format below.

df

State  Female  Male  Date
------------------------------
Texas  2       2     01/01/04
Texas  3       1     01/01/04
Texas  5       4     02/01/04
Cali   1       1     05/06/05
Cali   2       1     05/06/05
Cali   3       1     10/06/05
Cali   1       2     10/06/05
NY     10      5     11/06/05
NY     11      6     12/06/05

Expected result

df

State  Female  Male  Date
------------------------------
Texas  5       3     01/01/04
Texas  5       4     02/01/04
Cali   3       2     05/06/05
Cali   4       3     10/06/05
NY     10      5     11/06/05
NY     11      6     12/06/05


I tried using group_by and summarise, but I am not sure how to do the same for two columns.
df <- df_homicides %>% 
        group_by(state) %>% 
        summarise(Female = sum(Female))

Thanks for your help!
2 Answers

3

We can use across together with summarise, available in dplyr >= 1.0.0.

library(dplyr)
df %>%
   group_by(State, Date) %>%
   summarise(across(everything(), sum, na.rm = TRUE), .groups = 'drop')
# A tibble: 6 x 4
#  State Date       Female  Male
#  <chr> <chr>       <int> <int>
#1 Cali  05/06/2005      3     2
#2 Cali  10/06/2005      4     3
#3 NY    11/06/2005     10     5
#4 NY    12/06/2005     11     6
#5 Texas 01/01/2004      5     3
#6 Texas 02/01/2004      5     4
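In dplyr 1.1.0 and later, passing extra arguments such as na.rm through across() is deprecated and produces a warning. A minimal sketch of the same summary written with an anonymous function, which avoids that warning, could look like this. It rebuilds the sample data from this answer with data.frame() so it runs on its own; the name res is only illustrative.

```r
library(dplyr)

# Sample data from this answer, rebuilt so the snippet is self-contained
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"),
  stringsAsFactors = FALSE
)

# everything() skips the grouping columns, so only Female and Male are summed.
# na.rm is supplied inside the lambda rather than through across()'s `...`.
res <- df %>%
  group_by(State, Date) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(res)
```

The lambda form also makes it easy to restrict the summary to chosen columns, for example across(c(Female, Male), ...), if the data frame later gains non-numeric columns.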

Or use aggregate from base R

aggregate(.~ State + Date, df, sum, na.rm = TRUE)
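One detail worth knowing about the formula interface: its default na.action is na.omit, so a row with an NA in any column is dropped before sum ever sees it, and na.rm = TRUE alone does not bring it back. The sketch below names the summed columns explicitly with cbind() and passes na.action = na.pass so that na.rm does the work. The NA is introduced here only for illustration and is not part of the original data; df_na and res_pass are illustrative names.

```r
# Sample data from this answer, rebuilt so the snippet is self-contained
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"),
  stringsAsFactors = FALSE
)

# Hypothetical missing value, added only to show the NA handling
df_na <- df
df_na$Female[1] <- NA

# na.action = na.pass keeps rows containing NA in the model frame;
# na.rm = TRUE is forwarded to sum() and removes the NA per column.
res_pass <- aggregate(cbind(Female, Male) ~ State + Date,
                      data = df_na, FUN = sum,
                      na.rm = TRUE, na.action = na.pass)

print(res_pass)
```

The row order of the aggregate result can differ from the dplyr output, so compare values by State and Date rather than by position.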

Data

df <-  structure(list(State = c("Texas", "Texas", "Texas", "Cali", "Cali", 
"Cali", "Cali", "NY", "NY"), Female = c(2L, 3L, 5L, 1L, 2L, 3L, 
1L, 10L, 11L), Male = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L), 
    Date = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005", 
    "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"
    )), class = "data.frame", row.names = c(NA, -9L))

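This dplyr function does not work. Error in across(): ! Must only be used inside data-masking verbs like mutate(), filter(), and group_by().
@treetopdewdrop, I only get a warning message with dplyr 1.1.2. I even tried the current CRAN version, dplyr 1.1.3, and still get only a warning. So from your message it is not clear which version of dplyr you are using.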

1

Try this. You can use summarise_all() to aggregate multiple variables with the function you want. Here is the code:

library(dplyr)
#Code
df %>%
  group_by(State, Date) %>%
  summarise_all(.funs = sum, na.rm = TRUE)

Output:

# A tibble: 6 x 4
# Groups:   State [3]
  State Date       Female  Male
  <chr> <chr>       <int> <int>
1 Cali  05/06/2005      3     2
2 Cali  10/06/2005      4     3
3 NY    11/06/2005     10     5
4 NY    12/06/2005     11     6
5 Texas 01/01/2004      5     3
6 Texas 02/01/2004      5     4
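The "# Groups: State [3]" line in the output means the result is still grouped by State, because summarise_all() removes only the last grouping level. As a small sketch, group_vars() can confirm this and ungroup() returns a plain tibble, which is usually safer before further mutate() or summarise() calls. The names res_grouped and res_flat are illustrative.

```r
library(dplyr)

# Sample data from this answer, rebuilt so the snippet is self-contained
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"),
  stringsAsFactors = FALSE
)

# Same call as in this answer; the result keeps State as a grouping variable
res_grouped <- df %>%
  group_by(State, Date) %>%
  summarise_all(.funs = sum, na.rm = TRUE)

# Inspect the remaining grouping, then drop it
print(group_vars(res_grouped))
res_flat <- ungroup(res_grouped)
print(group_vars(res_flat))
```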

Some data used:
#Data
df <- structure(list(State = c("Texas", "Texas", "Texas", "Cali", "Cali", 
"Cali", "Cali", "NY", "NY"), Female = c(2L, 3L, 5L, 1L, 2L, 3L, 
1L, 10L, 11L), Male = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L), 
    Date = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005", 
    "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"
    )), class = "data.frame", row.names = c(NA, -9L))
