Using dplyr::starts_with() with a lambda function

Question


I have the following setup:

library(dplyr)
library(tidyr)
dat = data.frame('A' = 1:3, 'C_1' = 1:3, 'C_2' = 1:3, 'M' = 1:3)

The following code works:

dat %>% rowwise %>% mutate(Anew = list({function(x) c(x[1]^2, x[2] + 5, x[3] + 1)}(c(M, C_1, C_2)))) %>% ungroup %>% unnest_wider(Anew, names_sep = "")

However, when I try to pick the columns by name with dplyr::starts_with(), the following does not work:

dat %>% rowwise %>% mutate(Anew = list({function(x) c(x[1]^2, x[2] + 5, x[3] + 1)}(c(M, starts_with('C_'))))) %>% ungroup %>% unnest_wider(Anew, names_sep = "")

Any pointers on how to apply starts_with() correctly in this case would be very helpful.

PS: This is a continuation of my earlier post, Apply custom function that returns multiple values after dplyr::rowwise().

- Brian Smith


What happens when the number of C_ columns is not 2? This seems similar to the case I showed, i.e. nm1 <- c(C_1 = 5, C_2 = 1); dat %>% mutate(across(starts_with("C_"), ~ .x + nm1[cur_column()], .names = "{.col}_new")). - akrun

I have edited the original post to reflect my actual problem more accurately. - Brian Smith


I guess you need c_across, i.e.

dat %>% rowwise %>% mutate(Anew = list((function(x) c(x[1]^2, x[2] + 5, x[3] + 1))(c_across(starts_with("C_"))))) %>% unnest_wider(Anew, names_sep = "")

Also, your data would need a C_3 column, because x[3] refers to the third selected column. - akrun

2 Answers


If we wrap starts_with in c_across, and assume there is a third column whose name starts with C_, the ad hoc lambda function works.

library(dplyr)
library(tidyr)
dat %>%
  rowwise %>%
  mutate(Anew = list((function(x) c(x[1]^2, x[2] + 5, x[3] + 1))(
    c_across(starts_with("C_"))))) %>%
  unnest_wider(Anew, names_sep = "")

-output

# A tibble: 3 × 8
      A   C_1   C_2   C_3     M Anew1 Anew2 Anew3
  <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1     1     1     1     1     1     1     6     2
2     2     2     2     2     2     4     7     3
3     3     3     3     3     3     9     8     4
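For the two-column data in the question, M can be passed to c_across together with the C_ columns, so no C_3 column is needed. This is a minimal sketch under that assumption; the helper name sq_shift is hypothetical and only wraps the lambda from the question for readability.

```r
library(dplyr)
library(tidyr)

# Data from the question: only two C_ columns
dat <- data.frame(A = 1:3, C_1 = 1:3, C_2 = 1:3, M = 1:3)

# Hypothetical helper with the same body as the lambda in the question
sq_shift <- function(x) c(x[1]^2, x[2] + 5, x[3] + 1)

res <- dat %>%
  rowwise() %>%
  # c_across() returns the selected values of the current row as an
  # unnamed vector, in selection order: M, C_1, C_2
  mutate(Anew = list(sq_shift(c_across(c(M, starts_with("C_")))))) %>%
  ungroup() %>%
  unnest_wider(Anew, names_sep = "")

res
```

Because the vector is unnamed, the new columns should come out as Anew1, Anew2 and Anew3 rather than carrying the source column names.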

Alternatively, instead of using rowwise, we can create a named list of functions and apply them column-wise with across, which is more efficient.

fns <- list(C_1 = function(x) x^2, C_2 = function(x) x + 5, 
      C_3 = function(x) x + 1)
dat %>%
   mutate(across(starts_with("C_"), 
    ~ fns[[cur_column()]](.x), .names = "Anew{seq_along(.fn)}"))

-output

  A C_1 C_2 C_3 M Anew1 Anew2 Anew3
1 1   1   1   1 1     1     6     2
2 2   2   2   2 2     4     7     3
3 3   3   3   3 3     9     8     4
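The same named-list idea can be adapted to the columns in the original question (M, C_1, C_2). The sketch below assumes one function per selected column, keyed by column name; the list name fns_q is hypothetical.

```r
library(dplyr)

# Data from the question: only two C_ columns
dat <- data.frame(A = 1:3, C_1 = 1:3, C_2 = 1:3, M = 1:3)

# Hypothetical lookup table: one function per column, keyed by column name
fns_q <- list(
  M   = function(x) x^2,
  C_1 = function(x) x + 5,
  C_2 = function(x) x + 1
)

res <- dat %>%
  # cur_column() gives the name of the column across() is currently processing
  mutate(across(c(M, starts_with("C_")),
                ~ fns_q[[cur_column()]](.x),
                .names = "Anew{.col}"))

res
```

Looking functions up by name avoids relying on positional indices such as x[3], so the code does not depend on how many C_ columns exist, as long as each selected column has an entry in the list.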

data

dat <- data.frame('A' = 1:3, 'C_1' = 1:3, 'C_2' = 1:3, C_3 = 1:3, 'M' = 1:3)

- akrun


- G. Grothendieck · Accepted Answer

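starts_with must be used inside a selecting function, so we can write the code below. across is also a selecting function, so across(M | starts_with('C_')) could be used in place of select(...). c_across is a selecting function too, but it does not preserve names.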

dat %>%
  rowwise %>%
  mutate(Anew = list({function(x) c(x[1]^2, x[2] + 5, x[3] + 1)}
    (select(cur_data(), M, starts_with('C_'))))) %>%
  ungroup %>%
  unnest_wider(Anew, names_sep = "")
## # A tibble: 3 × 7
##       A   C_1   C_2     M AnewM AnewC_1 AnewC_2
##   <int> <int> <int> <int> <dbl>   <dbl>   <dbl>
## 1     1     1     1     1     1       6       2
## 2     2     2     2     2     4       7       3
## 3     3     3     3     3     9       8       4
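In dplyr 1.1.0 and later, cur_data() is deprecated and pick() covers the same use. The following is a sketch assuming such a dplyr version; it also flattens the one-row selection with unlist() so the function receives a named vector. The helper name sq_shift is hypothetical.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where pick() is available
library(tidyr)

dat <- data.frame(A = 1:3, C_1 = 1:3, C_2 = 1:3, M = 1:3)

# Hypothetical helper with the same body as the lambda in the question
sq_shift <- function(x) c(x[1]^2, x[2] + 5, x[3] + 1)

res <- dat %>%
  rowwise() %>%
  # pick() is a selecting function, so starts_with() is valid inside it;
  # unlist() turns the one-row result into a named vector (M, C_1, C_2)
  mutate(Anew = list(sq_shift(unlist(pick(M, starts_with("C_")))))) %>%
  ungroup() %>%
  unnest_wider(Anew, names_sep = "")

res
```

Since the names survive, the new columns should be AnewM, AnewC_1 and AnewC_2, matching the output shown above.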

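group_modify can also be used here, and it allows formula notation for the anonymous function. Grouping by A works because each value of A identifies exactly one row. The indices in the anonymous function have been reordered to match the input order: .x holds the non-grouping columns C_1, C_2 and M, in that order.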

dat %>%
  group_by(A) %>%
  group_modify(~ cbind(.x, Anew = c(.x[3]^2, .x[1] + 5, .x[2] + 1))) %>%
  ungroup
## # A tibble: 3 × 7
##       A   C_1   C_2     M Anew.M Anew.C_1 Anew.C_2
##   <int> <int> <int> <int>  <dbl>    <dbl>    <dbl>
## 1     1     1     1     1      1        6        2
## 2     2     2     2     2      4        7        3
## 3     3     3     3     3      9        8        4
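To see why the indices had to be reordered, one can inspect the columns that each group's .x contains. This is a small sketch using group_map, which passes the same per-group data frame to the function; the variable name cols_seen is hypothetical.

```r
library(dplyr)

dat <- data.frame(A = 1:3, C_1 = 1:3, C_2 = 1:3, M = 1:3)

# For each group, return the column names of .x
# (the grouping column A is not part of .x by default)
cols_seen <- dat %>%
  group_by(A) %>%
  group_map(~ names(.x))

cols_seen[[1]]
```

The first element should list C_1, C_2 and M, which is why .x[3] refers to M and .x[1] to C_1 in the group_modify call.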