dplyr: how to reference columns by column index rather than column name using mutate?

Question

73

Using dplyr, you can do something like this:

iris %>% head %>% mutate(sum=Sepal.Length + Sepal.Width) 
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

But above, I reference the columns by their names. How can I use column indices 1 and 2 to achieve the same result?

I have the following, but it does not feel very elegant.

iris %>% head %>% mutate(sum=apply(select(.,1,2),1,sum))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

- Alby

6 Answers

5

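An alternative to repeatedly using . in mutate that also respects grouping is dplyr::cur_data_all(). From help(cur_data_all):

cur_data_all() gives the current data for the current group (including grouping variables)

Consider the following: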

iris %>% group_by(Species) %>% mutate(sum = .[[1]] + .[[2]]) %>% head
#Error: Problem with `mutate()` column `sum`.
#ℹ `sum = .[[1]] + .[[2]]`.
#ℹ `sum` must be size 50 or 1, not 150.
#ℹ The error occurred in group 1: Species = setosa.

If you use cur_data_all() instead, it works without issue:

iris %>% mutate(sum = select(cur_data_all(),1) + select(cur_data_all(),2)) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length
#1          5.1         3.5          1.4         0.2  setosa          8.6
#2          4.9         3.0          1.4         0.2  setosa          7.9
#3          4.7         3.2          1.3         0.2  setosa          7.9
#4          4.6         3.1          1.5         0.2  setosa          7.7
#5          5.0         3.6          1.4         0.2  setosa          8.6
#6          5.4         3.9          1.7         0.4  setosa          9.3

The same approach works with the extraction operator ([[), here with cur_data():

iris %>% mutate(sum = cur_data()[[1]] + cur_data()[[2]]) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
#1          5.1         3.5          1.4         0.2  setosa 8.6
#2          4.9         3.0          1.4         0.2  setosa 7.9
#3          4.7         3.2          1.3         0.2  setosa 7.9
#4          4.6         3.1          1.5         0.2  setosa 7.7
#5          5.0         3.6          1.4         0.2  setosa 8.6
#6          5.4         3.9          1.7         0.4  setosa 9.3
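The two cur_data examples above run on ungrouped data. As a grouped check of the same idea, here is a minimal sketch that assumes dplyr 1.1.0 or later, where cur_data() and cur_data_all() are deprecated in favor of pick(). The names by_index and by_name are only illustrative.

```r
library(dplyr)

# Assumes dplyr >= 1.1.0: pick() selects columns of the current group
# with tidyselect semantics, so integer positions are accepted.
by_index <- iris %>%
  group_by(Species) %>%
  mutate(sum = pick(1)[[1]] + pick(2)[[1]]) %>%
  ungroup()

# Reference result computed by column name.
by_name <- iris %>%
  group_by(Species) %>%
  mutate(sum = Sepal.Length + Sepal.Width) %>%
  ungroup()

all.equal(by_index$sum, by_name$sum)
```

pick() cannot select grouping columns, so positions are probably best read as positions among the remaining columns. In iris the grouping column Species comes last, so 1 and 2 are unaffected.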

- Ian Campbell

5

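I am a bit late to the game, but my personal strategy in cases like this is to write my own tidyverse-compliant function that does exactly what I want. By tidyverse-compliant, I mean that the first argument of the function is a data frame and that the output is a vector that can be added to the data frame.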

sum_cols <- function(x, col1, col2){
   x[[col1]] + x[[col2]]
}

iris %>%
  head %>%
  mutate(sum = sum_cols(x = ., col1 = 1, col2 = 2))

- SavedByJESUS

1

This can now be done quite nicely with a combination of dplyr::rowwise() and dplyr::c_across() (packageVersion("dplyr") >= 1.0.0).

library(dplyr)

packageVersion("dplyr")
#> [1] '1.0.10'

iris %>% 
  head %>% 
  rowwise() %>% 
  mutate(sum = sum(c_across(c(1, 2))))
#> # A tibble: 6 × 6
#> # Rowwise: 
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   sum
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
#> 1          5.1         3.5          1.4         0.2 setosa    8.6
#> 2          4.9         3            1.4         0.2 setosa    7.9
#> 3          4.7         3.2          1.3         0.2 setosa    7.9
#> 4          4.6         3.1          1.5         0.2 setosa    7.7
#> 5          5           3.6          1.4         0.2 setosa    8.6
#> 6          5.4         3.9          1.7         0.4 setosa    9.3

Created on 2022-11-01 with reprex v2.0.2
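Note the `# Rowwise:` line in the output: the result stays row-wise, so later verbs keep operating one row at a time. A small sketch (res is an illustrative name) that drops the row-wise state with ungroup() once the sum has been computed:

```r
library(dplyr)

res <- iris %>%
  head() %>%
  rowwise() %>%
  mutate(sum = sum(c_across(c(1, 2)))) %>%
  ungroup()  # drop the row-wise state so later verbs act on the whole table

inherits(res, "rowwise_df")
```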

- Dan Adams

1

What do you think about this version?
Inspired by @SavedByJESUS's answer.

applySum <- function(df, ...) {
  assertthat::assert_that(...length() > 0, msg = "one or more column indexes are required")
  mutate(df, Sum = apply(as.data.frame(df[, c(...)]), 1, sum))
}

iris %>%
  head(2) %>%
  applySum(1, 2)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
#
### you can select and sum more than two columns with the same function
#
iris %>%
  head(2) %>%
  applySum(1, 2, 3, 4)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species  Sum
1          5.1         3.5          1.4         0.2  setosa 10.2
2          4.9         3.0          1.4         0.2  setosa  9.5

- benaja

0

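To address the issue that @pluke raised in the comments: dplyr doesn't really support column indices.

This is not a perfect solution, but you can use base R to get around it: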


iris[1] <- iris[1] + iris[2]
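Note that the line above overwrites the first column of iris. A sketch that instead adds the sum as a new column on a copy, indexing by position with base R only (df is an illustrative name):

```r
df <- head(iris)             # work on a copy so iris itself is untouched
df$sum <- df[[1]] + df[[2]]  # [[i]] extracts column i as a plain vector
df
```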

- Nina Sonneborn

Related to the comment about dplyr not supporting column indices... I wonder what the looping solution would be? - Markm0705

- jeremycg · Accepted Answer

99

You can try:

iris %>% head %>% mutate(sum = .[[1]] + .[[2]])

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

- jeremycg

12

Note that this will not play well with group_by: iris %>% group_by(Species) %>% mutate(sum = .[[1]] + .[[2]]) fails, whereas iris %>% group_by(Species) %>% mutate(sum = Sepal.Length + Sepal.Width) works. - MrFlick

2

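@MrFlick - Maybe I'm missing something. Why would the grouping matter when you are calculating row by row? If they are doing other operations, they could probably add an ungroup() in there and then regroup. I have found that necessary before. - Rich Scriven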

8

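It is more of a warning that this method really bypasses a lot of the dplyr infrastructure, so it may break things like grouping that would otherwise work. Essentially, you are skipping the "data=" parameter of mutate. You are correct for a row-wise mutate(), but consider this case: iris %>% group_by(Species) %>% summarize(x=mean(.[[1]] + .[[2]])). This is not a good "general" way to specify columns by index. - MrFlick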
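To make the summarize() case concrete: `.` is the whole data frame handed over by the pipe, not the current group, so every group receives the same overall mean. A sketch comparing the positional and the named version (bad and good are illustrative names):

```r
library(dplyr)

# `.` holds all 150 rows of iris, so x is identical for every Species.
bad <- iris %>%
  group_by(Species) %>%
  summarize(x = mean(.[[1]] + .[[2]]))

# Column names are resolved within each group.
good <- iris %>%
  group_by(Species) %>%
  summarize(x = mean(Sepal.Length + Sepal.Width))

length(unique(bad$x))   # a single value repeated for all groups
length(unique(good$x))  # one distinct value per group
```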

6

How does referencing by column index work when you are setting the mutated column? iris %>% head %>% mutate(.[[1]] = .[[1]] + .[[2]]) gives an error: Error: unexpected '=' in "iris %>% head %>% mutate(.[[1]] =" - pluke
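The left-hand side of an argument has to be a name, which is why .[[1]] = ... fails to parse. One possible workaround, sketched here under the assumption that rlang's := injection is available (as it is in recent dplyr versions), looks the names up by position first. The names df, target, other and res are illustrative.

```r
library(dplyr)

df <- head(iris)
target <- names(df)[1]  # "Sepal.Length"
other  <- names(df)[2]  # "Sepal.Width"

# `!!target :=` injects the string as the output column name;
# .data[[...]] reads a column whose name is stored in a variable.
res <- df %>%
  mutate(!!target := .data[[target]] + .data[[other]])

res
```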

As of dplyr 1.0.0, there is a workaround: df %>% group_by(eval(names(.)[1])) %>% ... - Jorge Esteban Mendoza

1

Another caveat of this solution is that the native pipe operator |> does not support the . notation. - cbrnr
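With the native pipe (R 4.1.0 or later), one alternative sketch avoids `.` altogether by resolving the positions to names up front and using the .data pronoun, which also respects grouping. The names col1, col2 and res are illustrative.

```r
library(dplyr)

# Resolve positions to names once, outside the pipeline.
col1 <- names(iris)[1]
col2 <- names(iris)[2]

res <- iris |>
  group_by(Species) |>
  mutate(sum = .data[[col1]] + .data[[col2]]) |>
  ungroup()

head(res)
```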