How can I use dplyr's "summarise" function with a dynamically specified column name?

Question

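I'm using the summarize function from the dplyr package in R to compute group means from a table. I'd like to do this dynamically, using a column-name string stored in another variable.

Here is the "normal" way, which of course works: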

library(dplyr)
myTibble <- group_by( iris, Species)
summarise( myTibble, avg = mean( Sepal.Length))

# A tibble: 3 x 2
  Species     avg
  <fct>      <dbl>
1 setosa      5.01
2 versicolor  5.94
3 virginica   6.59

What I would like to do, however, is this:

myTibble <- group_by( iris, Species)
colOfInterest <- "Sepal.Length"
summarise( myTibble, avg = mean( colOfInterest))

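I have read the Programming with dplyr page and tried many combinations of quo, enquo, !!, .dots=(...), and so on, but I haven't found the right way to do it yet.

I'm also aware of this answer, but 1) when I use the standard-evaluation function summarise_, R tells me it is deprecated, and 2) that answer doesn't seem elegant at all. So, is there a good, simple way to do this?

Thanks!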

- Vance

2 Answers

Another solution:

iris %>% 
  group_by(Species) %>% 
  summarise_at(vars("Sepal.Length"), mean) %>%
  ungroup()

# A tibble: 3 x 2
  Species    Sepal.Length
  <fct>             <dbl>
1 setosa             5.01
2 versicolor         5.94
3 virginica          6.59

- Florian
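
A note on the summarise_at() approach above: in dplyr 1.0.0 and later the scoped verbs (the _at, _if and _all variants) are superseded by across(). The following is a minimal sketch of the same summary written that way, assuming dplyr >= 1.0.0 is installed; the variable name res is only for illustration.

```r
library(dplyr)

colOfInterest <- "Sepal.Length"

# all_of() selects the column whose name is stored in the string,
# and across() applies mean() to it within each group.
res <- iris %>%
  group_by(Species) %>%
  summarise(across(all_of(colOfInterest), mean), .groups = "drop")

print(res)
```

As with summarise_at(), the result column keeps the original column name (Sepal.Length) rather than being renamed to avg.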


- G. Grothendieck · Accepted Answer

1) Use !!sym(...) like this:

colOfInterest <- "Sepal.Length"
iris %>% 
  group_by(Species) %>%
  summarize(avg = mean(!!sym(colOfInterest))) %>%
  ungroup

giving:

# A tibble: 3 x 2
  Species      avg
  <fct>      <dbl>
1 setosa      5.01
2 versicolor  5.94
3 virginica   6.59

2) A second approach is:

colOfInterest <- "Sepal.Length"
iris %>% 
  group_by(Species) %>%
  summarize(avg = mean(.data[[colOfInterest]])) %>%
  ungroup
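
The .data[[...]] form is convenient when the column name arrives as a function argument. Below is a rough sketch of that idea, assuming dplyr >= 1.0.0; group_mean() and its argument names are hypothetical helpers made up for this example, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: both the grouping column and the value column
# are passed as character strings.
group_mean <- function(data, group_col, value_col) {
  data %>%
    group_by(across(all_of(group_col))) %>%              # group by the named column
    summarise(avg = mean(.data[[value_col]]), .groups = "drop")
}

res <- group_mean(iris, "Species", "Petal.Width")
print(res)
```

Calling group_mean(iris, "Species", "Sepal.Length") should reproduce the table shown under 1).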

Of course, in base R this is straightforward:

aggregate(list(avg = iris[[colOfInterest]]), iris["Species"], mean)
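
A variant of the base R approach, in case the formula interface of aggregate() is preferred: reformulate() from the stats package can build the formula from strings. This is only a sketch; groupCol and res are names introduced here for illustration.

```r
colOfInterest <- "Sepal.Length"
groupCol <- "Species"  # illustrative name, not from the original question

# reformulate() builds the formula Sepal.Length ~ Species from the two strings.
f <- reformulate(groupCol, response = colOfInterest)

res <- aggregate(f, data = iris, FUN = mean)
print(res)
```

Here the result column keeps the name Sepal.Length instead of avg.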