Computing subtotals with dplyr and tidyr

library(dplyr)

# Sample data: medals won by each country in each sport
expand.grid(country = c('Sweden', 'Norway', 'Denmark', 'Finland'),
            sport = c('curling', 'crosscountry', 'downhill')) %>% 
  mutate(medals = sample(0:3, 12, TRUE)) ->
  data

With dcast from reshape2 this takes a single line. Giving the margins custom names takes an extra step.
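library(reshape2)

data %>% 
  dcast(country ~ sport, margins = TRUE, fun.aggregate = sum) %>% 
  # optional: rename the margins from `(all)` to "Total"
  rename(Total = `(all)`) %>% 
  # country is a factor here; as.character() keeps ifelse() from returning factor codes
  mutate(country = ifelse(country == "(all)", "Total", as.character(country)))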
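For comparison, base R can add the same margins without reshape2. The sketch below is an illustration, not part of the question: it hard-codes the medal counts from the answer's output table instead of calling sample() so the result is reproducible, builds the country × sport table with xtabs(), and appends row, column, and grand totals with addmargins(), which labels them "Sum" rather than "(all)".

```r
# Sample data as in the question; medal counts copied from the answer's
# output table (instead of sample()) so the result is reproducible
data <- expand.grid(country = c("Sweden", "Norway", "Denmark", "Finland"),
                    sport   = c("curling", "crosscountry", "downhill"))
data$medals <- c(0, 1, 2, 3,  2, 1, 2, 0,  0, 0, 1, 2)

# xtabs() sums medals for each country x sport cell;
# addmargins() appends a "Sum" row and a "Sum" column
tab <- addmargins(xtabs(medals ~ country + sport, data = data))
print(tab)
```

The result is a table object; as.data.frame.matrix(tab) turns it into an ordinary data frame if needed.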

My dplyr + tidyr approach is verbose. What is the best (compact and readable) way to write this with tidyr and dplyr?

library(dplyr)
library(tidyr)

# Column totals: medals per sport, labelled country = 'Total'
data %>% 
  group_by(sport) %>% 
  summarise(medals = sum(medals)) %>% 
  mutate(country = 'Total') ->
  sport_totals

# Row totals: medals per country, labelled sport = 'Total'
data %>% 
  group_by(country) %>% 
  summarise(medals = sum(medals)) %>% 
  mutate(sport = 'Total') ->
  country_totals

# Grand total
data %>% 
  summarise(medals = sum(medals)) %>% 
  mutate(sport = 'Total',
         country = 'Total') ->
  totals

# Stack the data with all totals, then widen to one column per sport
data %>% 
  bind_rows(country_totals, sport_totals, totals) %>% 
  spread(sport, medals)
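The three intermediate objects above can be folded into a single bind_rows() call. This sketch assumes dplyr ≥ 1.0 and tidyr ≥ 1.0 (for pivot_wider(), the successor of spread()). It builds the data with stringsAsFactors = FALSE so that 'Total' combines with the country and sport columns without factor coercion, and it hard-codes the medal counts from the answer's table for reproducibility.

```r
library(dplyr)
library(tidyr)

# Character (not factor) columns; medal counts copied from the answer's table
data <- expand.grid(country = c("Sweden", "Norway", "Denmark", "Finland"),
                    sport   = c("curling", "crosscountry", "downhill"),
                    stringsAsFactors = FALSE) %>%
  mutate(medals = c(0, 1, 2, 3,  2, 1, 2, 0,  0, 0, 1, 2))

result <- bind_rows(
  data,
  # column totals: one row per sport
  data %>% group_by(sport) %>% summarise(medals = sum(medals)) %>%
    mutate(country = "Total"),
  # row totals: one row per country
  data %>% group_by(country) %>% summarise(medals = sum(medals)) %>%
    mutate(sport = "Total"),
  # grand total
  data %>% summarise(medals = sum(medals)) %>%
    mutate(country = "Total", sport = "Total")
) %>%
  # one column per sport; rows and columns keep their order of first appearance
  pivot_wider(names_from = sport, values_from = medals)

print(result)
```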

This is extremely basic and easy to do in Excel, yet it takes a lot of effort in R. I suggest looking at rpivotTable. - Nettle
1 Answer


I don't know whether this is the best (compact and readable) way, but it works ;)

data %>%
  spread(sport, medals) %>%                      # one column per sport
  mutate(Total = rowSums(.[2:4])) %>%            # row totals over the three sport columns
  rbind(., data.frame(country = "Total", t(colSums(.[2:5]))))  # append the column totals

  country curling crosscountry downhill Total
1  Sweden       0            2        0     2
2  Norway       1            1        0     2
3 Denmark       2            2        1     5
4 Finland       3            0        2     5
5   Total       6            5        3    14
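The indices 2:4 and 2:5 above assume exactly three sports. A sketch that selects the numeric columns by type instead, assuming dplyr ≥ 1.0 (for across() and where()) and tidyr ≥ 1.0, with the medal counts from the table above hard-coded for reproducibility:

```r
library(dplyr)
library(tidyr)

# Medal counts copied from the output table above
data <- expand.grid(country = c("Sweden", "Norway", "Denmark", "Finland"),
                    sport   = c("curling", "crosscountry", "downhill"),
                    stringsAsFactors = FALSE) %>%
  mutate(medals = c(0, 1, 2, 3,  2, 1, 2, 0,  0, 0, 1, 2))

wide <- data %>%
  pivot_wider(names_from = sport, values_from = medals) %>%
  # row totals over every numeric column; `.` is the piped data, as in the answer
  mutate(Total = rowSums(select(., where(is.numeric))))

result <- wide %>%
  bind_rows(
    # column totals: one extra row summing each numeric column
    wide %>%
      summarise(across(where(is.numeric), sum)) %>%
      mutate(country = "Total")
  )

print(result)
```

Because the totals are computed over whatever numeric columns pivot_wider() produces, adding a fourth sport needs no change to the code.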
