Using dplyr::across in R to compute the difference between two columns for multiple column prefixes

3
zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

diff_func <- function(col) {
  return(`{col}Agg` - `{colPg}`)
}

zed %>% 
  dplyr::mutate(dplyr::across(.cols = c('a', 'b'), .fns = diff_func, .names = "{col}Diff"))

# we want the output that this outputs, without having to have a mutate for each field.
zed <- zed %>%
  dplyr::mutate(aDiff = aAgg - aPg) %>%
  dplyr::mutate(bDiff = bAgg - bPg)

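We are trying to create multiple columns with dplyr's across function. For each column prefix (a and b in this case), we want to compute prefixAgg - prefixPg and name the new column prefixDiff. The last 3 lines of the code sample above produce the desired output. Our current diff_func is incorrect and throws an error.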

Is there a function that can be passed to across to produce this output?

2 Answers

5
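We may need to loop over either the "Agg" or the "Pg" columns, get the matching column by replacing the substring in the current column name (cur_column()), and then modify .names accordingly: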
library(dplyr)
library(stringr)
zed %>%
   mutate(across(ends_with("Agg"), ~ .x -
   get(str_replace(cur_column(), "Agg", "Pg")), 
   .names = "{str_replace(.col, 'Agg', 'Diff')}"))

-output

  aAgg bAgg aPg bPg aDiff bDiff
1    5    8   6   7    -1     1
2   10   16   9  15     1     1
3   15   24  11  22     4     2
4   20   32  24  26    -4     6
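
If the prefixes are known up front, as with the c('a', 'b') the question tried to pass to across(), the per-prefix rule (prefixAgg - prefixPg, stored as prefixDiff) can also be sketched as a plain loop. This is only an illustrative base R alternative, not part of the answer above; `add_prefix_diffs` is a hypothetical helper name.

```r
zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

# Hypothetical helper: for each prefix p, add a column pDiff = pAgg - pPg
add_prefix_diffs <- function(df, prefixes) {
  for (p in prefixes) {
    df[[paste0(p, "Diff")]] <- df[[paste0(p, "Agg")]] - df[[paste0(p, "Pg")]]
  }
  df
}

result <- add_prefix_diffs(zed, c("a", "b"))
result
```

The loop trades the tidyselect helpers for explicit name construction, which may be easier to follow when the set of prefixes is short and fixed.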

Or use two across() calls and take their difference. The result is a data frame column, which can then be unpacked with unpack().

library(tidyr)
zed %>% 
  mutate(Diff = across(ends_with("Agg")) - across(ends_with("Pg"))) %>% 
  unpack(where(is.data.frame), names_sep = "")
# A tibble: 4 × 6
   aAgg  bAgg   aPg   bPg DiffaAgg DiffbAgg
  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1     5     8     6     7       -1        1
2    10    16     9    15        1        1
3    15    24    11    22        4        2
4    20    32    24    26       -4        6

Note: the columns can be renamed if needed, for example by setting .names in the first across():

zed %>% 
  mutate(across(ends_with("Agg"), 
  .names = "{str_remove(.col, 'Agg')}Diff") - 
      across(ends_with("Pg")))
  aAgg bAgg aPg bPg aDiff bDiff
1    5    8   6   7    -1     1
2   10   16   9  15     1     1
3   15   24  11  22     4     2
4   20   32  24  26    -4     6
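
For the unpack() variant shown earlier, whose new columns come out as DiffaAgg and DiffbAgg, one possible way to rename them afterwards is rename_with(). The sketch below is an assumption about how that renaming could be done, not something the answer itself shows; the regular expression simply moves the captured prefix in front of "Diff".

```r
library(dplyr)
library(tidyr)
library(stringr)

zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

result <- zed %>%
  # difference of the two column sets, stored as one data frame column
  mutate(Diff = across(ends_with("Agg")) - across(ends_with("Pg"))) %>%
  # flatten it into columns named DiffaAgg, DiffbAgg
  unpack(where(is.data.frame), names_sep = "") %>%
  # turn "Diff<prefix>Agg" into "<prefix>Diff"
  rename_with(~ str_replace(.x, "^Diff(.*)Agg$", "\\1Diff"), starts_with("Diff"))

names(result)
```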

Or use across2 from the dplyover package:

library(dplyover)
zed %>%
  mutate(across2(ends_with("Agg"), ends_with("Pg"), `-`, 
  .names_fn = ~ str_replace(.x, "Agg_.*", "Diff")))
  aAgg bAgg aPg bPg aDiff bDiff
1    5    8   6   7    -1     1
2   10   16   9  15     1     1
3   15   24  11  22     4     2
4   20   32  24  26    -4     6

3

A split.default + dplyr solution (possibly the fastest one):

zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

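library(dplyr, warn.conflicts = FALSE)
zed %>% 
  # group the columns by their one-character prefix ("a", "b")
  split.default(
    sub('^(.{1}).*', '\\1', names(zed))
  ) %>% 
  # within each group, subtract the second column (Pg) from the first (Agg)
  lapply(
    function(.x) .x[[1]] - .x[[2]]
  ) %>% 
  # name the results aDiff, bDiff
  setNames(., paste0(names(.), 'Diff')) %>% 
  # splice the named list into mutate() as new columns
  mutate(zed, !!!.)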
#>   aAgg bAgg aPg bPg aDiff bDiff
#> 1    5    8   6   7    -1     1
#> 2   10   16   9  15     1     1
#> 3   15   24  11  22     4     2
#> 4   20   32  24  26    -4     6

Created on 2022-08-09 by the reprex package (v2.0.1)
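
The regular expression in this answer keeps only the first character of each column name, and the subtraction relies on the Agg column coming before the Pg column within each group. A minimal base R sketch that assumes only that every column name ends in "Agg" or "Pg" (so longer prefixes would also work) might look like this; it is a variation for illustration, not the answerer's code.

```r
zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

# Strip the "Agg"/"Pg" suffix to recover each column's prefix
prefix <- sub("(Agg|Pg)$", "", names(zed))

# One data frame per prefix, each holding its Agg and Pg columns
groups <- split.default(zed, prefix)

# Look the columns up by name so column order does not matter
diffs <- lapply(names(groups), function(p) {
  groups[[p]][[paste0(p, "Agg")]] - groups[[p]][[paste0(p, "Pg")]]
})
names(diffs) <- paste0(names(groups), "Diff")

result <- cbind(zed, as.data.frame(diffs))
result
```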

