How to export objects to a parallel cluster within a function in R

Question

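I am writing a function that combines and organizes data and then runs MCMC chains in parallel using the parallel package that ships with base R. My function is below.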

dm100zip <- function(y, n.burn = 1, n.it = 3000, n.thin = 1) {
  y <- array(c(as.matrix(y[,2:9]), as.matrix(y[ ,10:17])), c(length(y$Plot), 8, 2))
  nplots <- nrow(y)
  ncap1 <- apply(y[,1:8, 1],1,sum)
  ncap2 <- apply(y[,1:8, 2],1,sum)
  ncap <- as.matrix(cbind(ncap1, ncap2))
  ymax1 <- apply(y[,1:8, 1],1,sum)
  ymax2 <- apply(y[,1:8, 2],1,sum)

  # Bundle data for JAGS/BUGS
  jdata100 <- list(y=y, nplots=nplots, ncap=ncap)

  # Set initial values for Gibbs sampler
  inits100 <- function(){
    list(p0=runif(1, 1.1, 2),
      p.precip=runif(1, 0, 0.1),
      p.day = runif(1, -.5, 0.1))
  }

  # Set parameters of interest to monitor and save
  params100 <- c("N", "p0")

  # Run JAGS in parallel for improved speed
  CL <- makeCluster(3) # set number of clusters = to number of desired chains
  clusterExport(cl=CL, list("jdata100", "params100", "inits100", "ymax1", "ymax2", "n.burn", "n.it", "n.thin")) # make data available to jags in diff cores
  clusterSetRNGStream(cl = CL, iseed = 5312)

  out <- clusterEvalQ(CL, {
    library(rjags)
    load.module('glm')
    jm <- jags.model("dm100zip.txt", jdata100, inits100, n.adapt = n.burn, n.chains = 1)
    fm <- coda.samples(jm, params100, n.iter = n.it, thin = n.thin)
    return(as.mcmc(fm))

  })

  out.list <- mcmc.list(out) # group output from each core into one list
  stopCluster(CL)

  return(out.list)
}

When I run the function, I get an error saying that n.burn, n.it, and n.thin, which are used in the clusterExport call, cannot be found. For example:

dm100zip.list.nain <- dm100zip(NAIN, n.burn = 1, n.it = 3000, n.thin = 1) # returns error

If I assign a value to each of them before running the function, it uses those values and runs fine. For example:

n.burn = 1
n.it = 1000
n.thin = 1
dm100zip.list.nain <- dm100zip(NAIN, n.burn = 1, n.it = 3000, n.thin = 1)

This runs fine, but it uses n.it = 1000 rather than 3000.

Why can clusterExport use objects in the global environment, but not the values assigned within the function? Is there a way around this?

- djhocking

2 Answers

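Because function arguments in R are handled through lazy evaluation, you need to make sure that any default arguments actually exist in the function's execution environment. In fact, the R core authors included the force function for this purpose; it is simply function(x) x, and it forces an argument to be converted from a promise into an evaluated value. Try making the following modification: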

dm100zip <- function(y, n.burn = 1, n.it = 3000, n.thin = 1) {
  force(n.burn); force(n.it); force(n.thin)
  # The rest of your code as above...
}

If you want to read about these issues in more detail, see the "Lazy evaluation" section of Hadley's material on functions.
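As a small illustration of what force does, independent of the cluster problem, the sketch below builds two closures from the same argument. The names make_adder_lazy, make_adder_forced, and k are made up for this example and do not come from the question.

```r
# Hypothetical helpers, not part of the question's code.
make_adder_lazy <- function(n) {
  function(x) x + n      # n is still an unevaluated promise at this point
}
make_adder_forced <- function(n) {
  force(n)               # evaluate n now, while the caller's k is still 1
  function(x) x + n
}

k <- 1
add_lazy <- make_adder_lazy(k)
add_forced <- make_adder_forced(k)
k <- 10

lazy_result <- add_lazy(1)      # n is first evaluated here, so it sees k = 10
forced_result <- add_forced(1)  # n was fixed at 1 when the closure was built
```

Here lazy_result is 11 and forced_result is 2, because the unforced promise is only evaluated on first use.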

- Robert Krzyzanowski

Thanks for the information. I didn't know about the force function. Unfortunately, I still get the same error: > dm100zip.list.nain <- dm100zip(NAIN, n.burn = 1, n.it = 3000, n.thin = 1) Error in get(name, envir = envir) : object 'n.burn' not found. Called from: eval(substitute(browser(skipCalls = pos), list(pos = 9 - frame)), envir = sys.frame(frame)). - djhocking

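There is no need to force evaluation of variables that are exported via clusterExport. You may need to force evaluation of variables that are sent to the workers implicitly, in the serialized environment of a user-supplied worker function, but that is an issue with functions such as parLapply and clusterApplyLB, not with clusterEvalQ. - Steve Weston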
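The parLapply case mentioned in this comment can be sketched as follows. This is only an illustration under assumed names (scale_in_parallel, multiplier, and k are hypothetical); it shows a worker function whose enclosing environment travels to the workers, which is where forcing an argument can matter.

```r
library(parallel)

# Hypothetical example function, not part of the question's code.
scale_in_parallel <- function(xs, multiplier) {
  force(multiplier)  # evaluate the argument before the closure is serialized
  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)
  # The anonymous function is created inside scale_in_parallel, so its
  # enclosing environment (which holds multiplier) is sent along with it.
  parLapply(cl, xs, function(x) x * multiplier)
}

k <- 3
scaled <- unlist(scale_in_parallel(1:4, multiplier = k))
```

No clusterExport call is needed here, because multiplier reaches the workers as part of the function's environment rather than as an exported global.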

- Steve Weston · Accepted Answer

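By default, clusterExport looks in the global environment for the variables named in "varlist". In your case, it should look in the local environment of the dm100zip function. To do that, use the "envir" argument of clusterExport: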

clusterExport(cl=CL, list("jdata100", "params100", "inits100", "ymax1",
                          "ymax2", "n.burn", "n.it", "n.thin"),
              envir=environment())

Note that variables in "varlist" that are defined in the global environment will still be found, but values defined in dm100zip take precedence.
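The precedence described above can be checked with a minimal sketch that leaves out JAGS entirely. The function report_n_it is a hypothetical stand-in for dm100zip; it only exports n.it and asks each worker to report the value it received.

```r
library(parallel)

# Hypothetical stand-in for dm100zip: no JAGS, just reports the exported value.
report_n_it <- function(n.it = 3000) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)
  # Look up "n.it" starting from this function's environment,
  # not from the global environment (the default).
  clusterExport(cl, varlist = "n.it", envir = environment())
  # Each worker evaluates n.it in its own global environment.
  unlist(clusterEvalQ(cl, n.it))
}

n.it <- 1000                               # a global with the same name
worker_values <- report_n_it(n.it = 3000)  # both workers report 3000
```

Without envir = environment(), the same call run at top level would export the global value 1000 instead, or fail with "object 'n.it' not found" if no such global exists, which matches the behavior described in the question.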