Rename / mutate data.frames in a `rowwise` list-column in `dplyr`, using different column names on the LHS

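I am experimenting with list-columns of data.frames in {dplyr} 1.0.0, and I wonder whether it is possible to rename and manipulate columns with rename() and mutate() in a pipe, on nested data.frames held in a rowwise tibble.

Why do I want to know / do this? As far as I understand, the philosophy of {dplyr} 1.0.0 recommends rowwise() over applying the {purrr} map family to columns. Below I first show how I did this before {dplyr} 1.0.0, followed by some examples (most of which do not work) of how it might be done with {dplyr} 1.0.0.

Although {rlang} supports glue strings on the left-hand side (LHS), which can be used when writing custom {dplyr} functions, this does not yet seem to be supported on the LHS of {dplyr} functions applied to a rowwise tibble (at least my examples below do not work).

For rename, I found a way using rename_with(), but I do not know how to make the same thing work with mutate.

I also do not understand most of the error messages I get. They more or less say that I am not using a string on the LHS before :=, but in rowwise mode the column I refer to (new) actually is a character vector of length 1.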

library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(purrr)

myiris <- iris %>% 
  nest_by(Species, .key = "mydat") %>% 
  ungroup %>% 
  mutate(new = letters[1:3])

# our data looks like this
# we want to use the strings in column `new` on the LHS of `rename` and `mutate`
myiris
#> # A tibble: 3 x 3
#>   Species                 mydat new  
#>   <fct>      <list<tbl_df[,4]>> <chr>
#> 1 setosa               [50 x 4] a    
#> 2 versicolor           [50 x 4] b    
#> 3 virginica            [50 x 4] c

# For reference: under dplyr < 1.0 I did the following:

# rename in pipe
# working
myiris %>% 
  mutate(mydat = map2(mydat, new,
                      ~ rename_at(.x, "Sepal.Length", function(z) paste(.y)))) %>% 
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 4
#>       a Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   5.1         3.5          1.4         0.2
#> 2   4.9         3            1.4         0.2
#> 3   4.7         3.2          1.3         0.2
#> 4   4.6         3.1          1.5         0.2
#> # ... with 46 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 4
#>       b Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   7           3.2          4.7         1.4
#> 2   6.4         3.2          4.5         1.5
#> 3   6.9         3.1          4.9         1.5
#> 4   5.5         2.3          4           1.3
#> # ... with 46 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 4
#>       c Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   6.3         3.3          6           2.5
#> 2   5.8         2.7          5.1         1.9
#> 3   7.1         3            5.9         2.1
#> 4   6.3         2.9          5.6         1.8
#> # ... with 46 more rows

# mutate in pipe
# was never working even under dplyr < 1.0.0
myiris %>% 
  mutate(mydat = map2(mydat, new,
                      ~ mutate(.x, eval(.y) := .y))) %>% 
  pull(mydat)
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `map2(mydat, new, ~mutate(.x, `:=`(eval(.y), .y)))`.

# mutate with custom function
# working
mymutate <- function(df, y) {
  mutate(df, !! y := y)
}

myiris %>% 
  mutate(mydat = map2(mydat, new,
                      ~ mymutate(.x, .y))) %>% 
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width a    
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 a    
#> 2          4.9         3            1.4         0.2 a    
#> 3          4.7         3.2          1.3         0.2 a    
#> 4          4.6         3.1          1.5         0.2 a    
#> # ... with 46 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width b    
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          7           3.2          4.7         1.4 b    
#> 2          6.4         3.2          4.5         1.5 b    
#> 3          6.9         3.1          4.9         1.5 b    
#> 4          5.5         2.3          4           1.3 b    
#> # ... with 46 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width c    
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          6.3         3.3          6           2.5 c    
#> 2          5.8         2.7          5.1         1.9 c    
#> 3          7.1         3            5.9         2.1 c    
#> 4          6.3         2.9          5.6         1.8 c    
#> # ... with 46 more rows





# dplyr >= 1.0.0
# objective: `rename()` or `mutate()` in pipe on list-column of data.frames 
#            while using different column names on LHS coming from another
#            column (here `new`)

myiris_row <- myiris %>% rowwise

# rename --------
# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename({{new}} := "Sepal.Length"))) 
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(...)`.
#> i The error occured in row 1.

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename(!! new := "Sepal.Length")))  
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(...)`.
#> i The error occured in row 1.

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename(!! sym(new) := "Sepal.Length")))  
#> Error: Only strings can be converted to symbols

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename(all_of(new) := "Sepal.Length")))  
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(mydat %>% rename(`:=`(all_of(new), "Sepal.Length")))`.
#> i The error occured in row 1.

# working, but only with `rename_with()`
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename_with(~ new, "Sepal.Length")))  %>%
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 4
#>       a Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   5.1         3.5          1.4         0.2
#> 2   4.9         3            1.4         0.2
#> 3   4.7         3.2          1.3         0.2
#> 4   4.6         3.1          1.5         0.2
#> # ... with 46 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 4
#>       b Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   7           3.2          4.7         1.4
#> 2   6.4         3.2          4.5         1.5
#> 3   6.9         3.1          4.9         1.5
#> 4   5.5         2.3          4           1.3
#> # ... with 46 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 4
#>       c Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   6.3         3.3          6           2.5
#> 2   5.8         2.7          5.1         1.9
#> 3   7.1         3            5.9         2.1
#> 4   6.3         2.9          5.6         1.8
#> # ... with 46 more rows


# mutate ------
# the values of the new column don't matter
# here we just use the same input as the name, to show that RHS evaluation is easier.

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% mutate(!! new := new))) 
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(...)`.
#> i The error occured in row 1.

# not working
myiris %>% 
  mutate(mydat = list(mydat %>% mutate(!! sym(new) := new))) 
#> Error: Only strings can be converted to symbols

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% mutate(all_of(new) := new))) 
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(mydat %>% mutate(`:=`(all_of(new), new)))`.
#> i The error occured in row 1.

# almost working (what's going on in the data[[1]] btw!)
myiris_row %>% 
  mutate(mydat = list(mydat %>% mutate("{{new}}" := new)))  %>%
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width `promise_fn(3L)`
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>           
#> 1          5.1         3.5          1.4         0.2 a               
#> 2          4.9         3            1.4         0.2 a               
#> 3          4.7         3.2          1.3         0.2 a               
#> 4          4.6         3.1          1.5         0.2 a               
#> # ... with 46 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width `"b"`
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          7           3.2          4.7         1.4 b    
#> 2          6.4         3.2          4.5         1.5 b    
#> 3          6.9         3.1          4.9         1.5 b    
#> 4          5.5         2.3          4           1.3 b    
#> # ... with 46 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width `"c"`
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          6.3         3.3          6           2.5 c    
#> 2          5.8         2.7          5.1         1.9 c    
#> 3          7.1         3            5.9         2.1 c    
#> 4          6.3         2.9          5.6         1.8 c    
#> # ... with 46 more rows

Created on 2020-12-22 by the reprex package (v0.3.0)


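This is interesting, especially that your function does not work when "run on the fly": myiris %>% mutate(mydat = map2(mydat, new, function(df, y) df %>% mutate(!! y := y))) %>% pull(mydat). It could be related to how the object is looked up in the environment. - akrun
This never really worked, because the bang-bang operator !! only works at the top level and not within nested expressions. It would be fine if eval(y) := or eval_tidy(y) := could be used instead, but they do not work. - TimTeaFan
It would be best to open an issue on their GitHub page. - akrun
I was thinking about that, but maybe there is a practical way to do it. I am confused by all the possible notations for LHS expressions, so I posted it here first to make sure. - TimTeaFan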
1 Answer


You can use quote() to protect your !! from the outer call, and then unquote again with !! in the nested call:

myiris_row %>% 
  mutate(mydat = list(mydat %>% mutate(!! quote(!!new) := new))) %>%
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width a    
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>
#>  1          5.1         3.5          1.4         0.2 a    
#>  2          4.9         3            1.4         0.2 a    
#>  3          4.7         3.2          1.3         0.2 a    
#>  4          4.6         3.1          1.5         0.2 a    
#>  5          5           3.6          1.4         0.2 a    
#>  6          5.4         3.9          1.7         0.4 a    
#>  7          4.6         3.4          1.4         0.3 a    
#>  8          5           3.4          1.5         0.2 a    
#>  9          4.4         2.9          1.4         0.2 a    
#> 10          4.9         3.1          1.5         0.1 a    
#> # ... with 40 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width b    
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>
#>  1          7           3.2          4.7         1.4 b    
#>  2          6.4         3.2          4.5         1.5 b    
#>  3          6.9         3.1          4.9         1.5 b    
#>  4          5.5         2.3          4           1.3 b    
#>  5          6.5         2.8          4.6         1.5 b    
#>  6          5.7         2.8          4.5         1.3 b    
#>  7          6.3         3.3          4.7         1.6 b    
#>  8          4.9         2.4          3.3         1   b    
#>  9          6.6         2.9          4.6         1.3 b    
#> 10          5.2         2.7          3.9         1.4 b    
#> # ... with 40 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width c    
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>
#>  1          6.3         3.3          6           2.5 c    
#>  2          5.8         2.7          5.1         1.9 c    
#>  3          7.1         3            5.9         2.1 c    
#>  4          6.3         2.9          5.6         1.8 c    
#>  5          6.5         3            5.8         2.2 c    
#>  6          7.6         3            6.6         2.1 c    
#>  7          4.9         2.5          4.5         1.7 c    
#>  8          7.3         2.9          6.3         1.8 c    
#>  9          6.7         2.5          5.8         1.8 c    
#> 10          7.2         3.6          6.1         2.5 c    
#> # ... with 40 more rows

myiris_row %>% 
  mutate(mydat = list(mydat %>% rename(!! quote(!!new) := "Sepal.Length"))) %>%
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 4
#>        a Sepal.Width Petal.Length Petal.Width
#>    <dbl>       <dbl>        <dbl>       <dbl>
#>  1   5.1         3.5          1.4         0.2
#>  2   4.9         3            1.4         0.2
#>  3   4.7         3.2          1.3         0.2
#>  4   4.6         3.1          1.5         0.2
#>  5   5           3.6          1.4         0.2
#>  6   5.4         3.9          1.7         0.4
#>  7   4.6         3.4          1.4         0.3
#>  8   5           3.4          1.5         0.2
#>  9   4.4         2.9          1.4         0.2
#> 10   4.9         3.1          1.5         0.1
#> # ... with 40 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 4
#>        b Sepal.Width Petal.Length Petal.Width
#>    <dbl>       <dbl>        <dbl>       <dbl>
#>  1   7           3.2          4.7         1.4
#>  2   6.4         3.2          4.5         1.5
#>  3   6.9         3.1          4.9         1.5
#>  4   5.5         2.3          4           1.3
#>  5   6.5         2.8          4.6         1.5
#>  6   5.7         2.8          4.5         1.3
#>  7   6.3         3.3          4.7         1.6
#>  8   4.9         2.4          3.3         1  
#>  9   6.6         2.9          4.6         1.3
#> 10   5.2         2.7          3.9         1.4
#> # ... with 40 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 4
#>        c Sepal.Width Petal.Length Petal.Width
#>    <dbl>       <dbl>        <dbl>       <dbl>
#>  1   6.3         3.3          6           2.5
#>  2   5.8         2.7          5.1         1.9
#>  3   7.1         3            5.9         2.1
#>  4   6.3         2.9          5.6         1.8
#>  5   6.5         3            5.8         2.2
#>  6   7.6         3            6.6         2.1
#>  7   4.9         2.5          4.5         1.7
#>  8   7.3         2.9          6.3         1.8
#>  9   6.7         2.5          5.8         1.8
#> 10   7.2         3.6          6.1         2.5
#> # ... with 40 more rows
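
To see why the quote() protection helps, it can be useful to look at what is left of the expression once the outer verb has resolved its `!!`. The following is only a sketch: it assumes that `rlang::expr()` applies the same unquoting rules as the dplyr verbs, and `f`, `x` and `y` are placeholder names that do not appear in the original answer. The last lines illustrate a likely reason for the errors in the question, namely that an unprotected `!!new` is resolved in the calling environment, where `new` is presumably found as the function `methods::new()` rather than as the data column.

```r
library(rlang)
library(methods)  # attached by default in most sessions; loaded explicitly here

# Base quote() does not interpret `!!`, so it returns the call `!!x` untouched.
protected <- quote(!!x)

# The outer `!!` is resolved right away: it evaluates `quote(!!x)` and inlines
# the result, leaving a fresh `!!x` for a nested quoting function to resolve.
outer <- expr(f(!!quote(!!x) := y))

# The LHS of `:=` inside `f()` is now the still-unevaluated call `!!x`.
lhs <- outer[[2]][[2]]
identical(lhs, protected)

# Without the protection, the outer verb evaluates `new` before any row is
# processed. Assuming no object called `new` exists in the workspace, that
# lookup finds a function, which is neither a string nor a symbol.
is.function(new)
```

In the rowwise pipeline the same thing happens one level up: the outer mutate() consumes the first `!!`, and the inner mutate() or rename() then resolves `!!new` once per row, when `new` is a single string.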

Thanks! The double bang-bang notation looks a bit odd, but it works! It even solves the {dplyr} < 1.0.0 case, e.g. myiris %>% mutate(mydat = map2(mydat, new, ~ mutate(.x, !! quote(!!.y) := .y))) - TimTeaFan
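
For comparison, the custom-function route from the question should also carry over to the rowwise workflow, because a `!!` written inside a helper function is only seen by the inner verb. This is a minimal sketch assuming dplyr >= 1.0.0; the helper names `add_named_col()` and `rename_sepal()` and the output columns `added` and `renamed` are made up for illustration and are not part of the original thread.

```r
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)

# Hypothetical helper: adds a column whose name and value are both `nm`.
add_named_col <- function(df, nm) {
  mutate(df, !!nm := nm)
}

# Hypothetical helper: renames Sepal.Length to the string held in `nm`.
rename_sepal <- function(df, nm) {
  rename(df, !!nm := "Sepal.Length")
}

# Same setup as in the question: one nested data.frame per species,
# plus a column `new` holding the desired column name for each row.
myiris_row <- iris %>%
  nest_by(Species, .key = "mydat") %>%
  ungroup() %>%
  mutate(new = letters[1:3]) %>%
  rowwise()

# In rowwise mode `mydat` is a single data.frame and `new` a single string,
# so both can be passed straight to the helpers.
res <- myiris_row %>%
  mutate(added   = list(add_named_col(mydat, new)),
         renamed = list(rename_sepal(mydat, new)))

names(res$added[[1]])    # last column is named "a"
names(res$renamed[[2]])  # first column is named "b"
```

The trade-off is one extra function definition per operation, in exchange for avoiding the nested `!! quote(!!new)` notation.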
