How to use map2() with dplyr to mutate() over lists of columns

Question

4

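I recently needed to compile a data frame of student grades (one row per student, with an ID column and several integer-valued columns, one per score component). I needed to combine a "master" data frame with several "correction" data frames (mostly NA, with a few updates to the master data) so that the result holds the maximum value found across the master and all of the corrections.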

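I got this working by copy-pasting a series of mutate() calls (see the example below), but that feels inelegant. What I would like to do is compare the columns pairwise, using something like map2 and two lists of columns. Something like the following (which does not work as written):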

list_of_cols1 <- list(col1.x, col2.x, col3.x)
list_of_cols2 <- list(col1.y, col2.y, col3.y)
map2(list_of_cols1, list_of_cols2, ~ column = pmax(.x, .y, na.rm=T))

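I can't figure out how to do this. My question is: how can I specify these lists of columns and mutate them in a single map2() call inside a dplyr pipeline? Or is that not even possible, and am I going about this the wrong way?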

Minimal working example

library(tidyverse)

master <- tibble(
  id=c(1,2,3), 
  col1=c(1,1,1),
  col2=c(2,2,2),
  col3=c(3,3,3)
)
correction1 <- tibble(
  id=seq(1,3),
  col1=c(NA, NA, 2 ),
  col2=c( 1, NA, 3 ),
  col3=c(NA, NA, NA)
)

result <- reduce(
  # Ultimately there would be several correction data frames
  list(master, correction1), 
  function(x,y) {
    x <- x %>% 
      left_join(
        y,
        by = c("id")
      ) %>%
      # Wish I knew how to do this mutate call with map2 
      mutate(
        col1 = pmax(col1.x, col1.y, na.rm=T), 
        col2 = pmax(col2.x, col2.y, na.rm=T), 
        col3 = pmax(col3.x, col3.y, na.rm=T)
      ) %>%
      select(id, col1:col3)
  }
)

The result is

> result
# A tibble: 3 x 4
     id  col1  col2  col3
  <int> <dbl> <dbl> <dbl>
1     1     1     2     3
2     2     1     2     3
3     3     2     3     3

- user1642246

Please clarify: should a correction only be applied when its value is greater than the one in "master"? - acylam

Good question, but no: the goal is to find the maximum value across the master and correction1 (and correction2, correction3, etc.) tables. - user1642246

3 Answers

1

Don't use left_join; just bind the rows and then summarize. For example:

result <- reduce(
  list(master, correction1), 
  function(x,y) {
    bind_rows(x, y) %>%
      group_by(id) %>%
      summarize_all(max, na.rm=T)
  }
)
result
#     id  col1  col2  col3
#   <dbl> <dbl> <dbl> <dbl>
# 1     1     1     2     3
# 2     2     1     2     3
# 3     3     2     3     3

In fact, you don't even need reduce, because bind_rows can take a list of data frames (or several data frames at once).

Adding another table:

correction2 <- tibble(id=2,col1=NA,col2=8,col3=NA)
bind_rows(master, correction1, correction2) %>% 
  group_by(id) %>%
  summarize_all(max, na.rm=T)

- MrFlick

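Ah, so simple! It never even occurred to me that I could work by rows instead of comparing columns. This was the first answer, and it is very concise and elegant. - user1642246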
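A note on the bind-and-summarize approach above: in dplyr 1.0.0 and later, summarize_all() is superseded by across(), and max(x, na.rm = TRUE) returns -Inf (with a warning) when every value in a group is NA. The following is a minimal sketch of one way to handle both points, not part of the original answer. The helper safe_max and the table new_student are hypothetical names introduced only for illustration.

```r
library(dplyr)

master <- tibble(id = c(1, 2, 3), col1 = c(1, 1, 1), col2 = c(2, 2, 2), col3 = c(3, 3, 3))
correction1 <- tibble(id = seq(1, 3), col1 = c(NA, NA, 2), col2 = c(1, NA, 3), col3 = c(NA, NA, NA))

# Hypothetical extra table: a student who only appears in a correction,
# with some components still missing.
new_student <- tibble(id = 4, col1 = NA_real_, col2 = 5, col3 = NA_real_)

# Hypothetical helper: maximum that yields NA instead of -Inf
# when all values in the group are missing.
safe_max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

result <- bind_rows(master, correction1, new_student) %>%
  group_by(id) %>%
  summarise(across(everything(), safe_max), .groups = "drop")

result
```

For ids 1 to 3 this gives the same values as the answer above; for the hypothetical id 4, col1 and col3 stay NA rather than becoming -Inf.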

1

If the "correction" tables always have the same structure as the "master" table, you can do the following:

library(dplyr)
library(purrr)

update_master = function(...){
  map(list(...), as.matrix) %>%
    reduce(pmax, na.rm = TRUE) %>%
    data.frame()
}

update_master(master, correction1)

To allow id to hold character values, modify it as follows:

update_master = function(x, ...){
  map(list(x, ...), function(x) as.matrix(x[-1])) %>%
    reduce(pmax, na.rm = TRUE) %>%
    data.frame(id = x[[1]], .)
}

update_master(master, correction1)

Result:

  id col1 col2 col3
1  1    1    2    3
2  2    1    2    3
3  3    2    3    3

- acylam
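The matrix approach above compares values position by position, so it seems to rely on every table listing the same ids in the same row order. A hedged sketch of a variant that sorts each table by id first is shown below. The function name update_master_aligned and the table correction_shuffled are hypothetical, and the sketch still assumes every table contains exactly the same set of ids.

```r
library(dplyr)
library(purrr)

master <- tibble(id = c(1, 2, 3), col1 = c(1, 1, 1), col2 = c(2, 2, 2), col3 = c(3, 3, 3))

# Hypothetical correction table whose rows are not in id order.
correction_shuffled <- tibble(
  id   = c(3, 1, 2),
  col1 = c(2, NA, NA),
  col2 = c(3, 1, NA),
  col3 = c(NA, NA, NA)
)

# Hypothetical variant of update_master: sort by id, then take the
# element-wise maximum of the value columns (everything except column 1).
update_master_aligned <- function(x, ...) {
  tables <- map(list(x, ...), ~ arrange(.x, id))
  mats <- map(tables, ~ as.matrix(.x[-1]))
  merged <- reduce(mats, function(a, b) pmax(a, b, na.rm = TRUE))
  data.frame(id = tables[[1]]$id, merged)
}

aligned <- update_master_aligned(master, correction_shuffled)
aligned
```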

- rdh · Accepted Answer

Sorry, this doesn't answer your question about map2. I find that in tidy R it is easier to aggregate over rows than over columns:

library(dplyr)

master <- tibble(
  id=c(1,2,3), 
  col1=c(1,1,1),
  col2=c(2,2,2),
  col3=c(3,3,3)
)
correction1 <- tibble(
  id=seq(1,3),
  col1=c(NA, NA, 2 ),
  col2=c( 1, NA, 3 ),
  col3=c(NA, NA, NA)
)

result <- list(master, correction1) %>% 
  bind_rows() %>% 
  group_by(id) %>% 
  summarise_all(max, na.rm = TRUE)

result
#> # A tibble: 3 x 4
#>      id  col1  col2  col3
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     1     1     2     3
#> 2     2     1     2     3
#> 3     3     2     3     3
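
For completeness, the pairwise column comparison that the question asked about can probably be done with map2() as well. The following is only a sketch under stated assumptions, not a method taken from any of the answers: the helper merge_max and the vector cols are hypothetical names, and the code assumes the shared value columns pick up left_join's default ".x" and ".y" suffixes.

```r
library(dplyr)
library(purrr)

master <- tibble(id = c(1, 2, 3), col1 = c(1, 1, 1), col2 = c(2, 2, 2), col3 = c(3, 3, 3))
correction1 <- tibble(id = seq(1, 3), col1 = c(NA, NA, 2), col2 = c(1, NA, 3), col3 = c(NA, NA, NA))

# Names of the value columns to compare (hypothetical helper variable).
cols <- c("col1", "col2", "col3")

# Hypothetical helper: join two tables on id, then walk the ".x" and ".y"
# columns in parallel with map2(), keeping the larger value of each pair.
merge_max <- function(x, y, cols) {
  joined <- left_join(x, y, by = "id")
  merged <- map2(
    as.list(joined[paste0(cols, ".x")]),
    as.list(joined[paste0(cols, ".y")]),
    ~ pmax(.x, .y, na.rm = TRUE)
  )
  # map2() keeps the ".x" names, so restore the plain column names.
  bind_cols(select(joined, id), as_tibble(set_names(merged, cols)))
}

result_map2 <- reduce(
  list(master, correction1),
  function(x, y) merge_max(x, y, cols)
)

result_map2
```

Because map2() works on lists rather than inside mutate(), the columns are pulled out of the joined table by name and bound back together afterwards. The row-binding approach in the answers is still shorter when the tables share the same columns.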