Using do.call with the standard evaluation version of dplyr

Question

How can I use summarise_, the standard evaluation version of summarise in dplyr, with variable arguments and functions so that a do.call call works?

## Some sample data, function, and variables to interpolate
set.seed(0)
dat <- data.frame(a=runif(10), b=runif(10))
fn <- function(x, y) IQR(x / y, na.rm = TRUE)
funs <- list(fn="fn")
targs <- list("a", "b")

This is the lazyeval::interp call I am trying to get working:

library(dplyr)
library(lazyeval)  # provides interp()
interp(~do.call(fn, xs), .values=list(fn=funs$fn, xs=targs))
# ~do.call("fn", list("a", "b"))

But it does not work:

dat %>%
  summarise_(out = interp(~do.call(fn, xs), .values=list(fn=funs$fn, xs=targs)))

Expected result:

dat %>%
  summarise(out = do.call(fn, list(a, b)))
#        out
# 1 1.084402

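If I add some print statements, I can see that the problem is that "a" and "b" are not being interpreted as column names (they stay plain strings), but I have not figured out how to quote them correctly.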

fn <- function(x, y) { print(x); print(y); IQR(x / y, na.rm = TRUE) }
dat %>%
  summarise_(out = interp(~do.call(fn, xs), fn=funs$fn, xs=targs))
# [1] "a"
# [1] "b"
# Error: non-numeric argument to binary operator

- Rorschach

I may be missing something (I never understood the appeal of the interp construct), but dat %>% summarise(out = do.call(fn, unname(.[unlist(targs)]))) works to get list(a,b), or dat %>% summarise(out = do.call(fn, lapply(targs, function(x) .[[x]]))). - Frank

Do you really need do.call here? Couldn't it just be something like dat %>% summarise_(out = interp(~f(x,y), f = as.name(funs$fn), x = as.name(targs[[1]]), y = as.name(targs[[2]])))? - talat

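Could you clarify the case where you do not know how many arguments the function takes? If you give an example of what you want to achieve, someone may be able to help you find a solution that does not need do.call. - aosmith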

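@TheTime I was thinking that the eval(parse(text=...)) paradigm might work, with targs = "list(a, b)", but that creates the same problem, where a is still interpreted as "a". Using unquoted strings is sometimes the bane of R and causes endless problems. - Mike Williamson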

I don't understand why the answer provided by @Frank does not work. Where should group_by be added to make it run properly? - Sam Dickson

1 Answer

- kdauria · Accepted Answer

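The targs argument needs to be a call object. In the second line below (and the third), the variables inside the call (a and b) need to be of class name. Reading ?call, ?as.name, and ?is.language helps in understanding that line.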

dat <- data.frame(a=runif(10), b=runif(10), grp=rep(1:2, each=5))
targs_quoted = do.call(call, c("list", lapply(targs, as.name)), quote=TRUE)
# In hardcoded form, targs_quoted = quote(list(a, b))
dat %>%
  group_by(grp) %>%
  summarise_(out = interp(~do.call(fn, xs), 
                          .values=list(fn=funs$fn, xs=targs_quoted)))

# Source: local data frame [2 x 2]
#     
#       grp       out
#     (int)     (dbl)
#  1     1  1.0754497
#  2     2  0.9892201
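
To make the call and name classes mentioned above concrete, here is a minimal base R sketch that rebuilds targs_quoted and inspects it. The small three-row data frame is a made-up illustration, not the data from the question, and the final eval only mimics how column names get resolved; it is not a description of what dplyr does internally.

```r
# Column names held as strings, as in the question
targs <- list("a", "b")

# Turn each string into a name (symbol), then assemble an unevaluated list(a, b) call.
# quote = TRUE stops do.call from trying to evaluate the symbols a and b.
arg_names <- lapply(targs, as.name)
targs_quoted <- do.call(call, c("list", arg_names), quote = TRUE)

class(arg_names[[1]])                       # "name"
class(targs_quoted)                         # "call"
identical(targs_quoted, quote(list(a, b)))  # TRUE

# Hypothetical toy data: evaluating the call inside a data frame
# resolves a and b to its columns instead of to the strings "a" and "b"
toy <- data.frame(a = c(1, 2, 3), b = c(2, 4, 6))
cols <- eval(targs_quoted, toy)
str(cols)
```

Because the elements are names rather than strings, whatever evaluates the call later decides what a and b refer to, which is what lets summarise_ look them up as columns.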

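The dplyr "nse" (non-standard evaluation) vignette is very helpful here. I found that . always refers to the whole table, not to the grouped table. That is why some of the suggestions in the comments do not "work" as expected.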
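
The difference between evaluating against the whole table and against each group can be sketched in base R alone. This is only an analogy, assuming split() and eval() as stand-ins for grouping and summarising; it does not reproduce dplyr's implementation, and the names expr, whole, and per_group are introduced here for illustration.

```r
set.seed(0)
dat <- data.frame(a = runif(10), b = runif(10), grp = rep(1:2, each = 5))
fn <- function(x, y) IQR(x / y, na.rm = TRUE)

# Unevaluated call: a and b are names, so they are looked up
# in whatever data the call is evaluated against
expr <- quote(do.call(fn, list(a, b)))

# Evaluated against the whole table: a single value from all 10 rows
whole <- eval(expr, dat)

# Evaluated against each group's rows: one value per group
per_group <- sapply(split(dat, dat$grp), function(d) eval(expr, d))

whole
per_group
```

An expression that reaches for the full table, as the . based suggestions do, corresponds to the first eval and yields the same value regardless of grouping, while an expression built from column names corresponds to the second.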