这段代码是有效的:
library(plyr)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=FALSE)
虽然这段代码失败了:
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)
stopWorkers(workers)
>Error in do.ply(i) : task 3 failed - "subscript out of bounds"
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
我正在使用R 2.1.12,plyr 1.4和doSMP 1.0-1。有人找到了解决方法吗?
编辑:回应Andrie的问题,这里有进一步的说明:
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #1
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #2
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #3
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #4
stopWorkers(workers)
前三个函数都能正常工作,但是它们都需要大约3秒钟的时间。第二个函数会发出警告,表示没有注册并行后端,因此按顺序执行。第四个函数会出现我在原帖中提到的相同错误。
/编辑:更加好奇了:在我的 Mac 上,以下内容可以正常工作:
library(plyr)
library(doMC)
registerDoMC()
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)
但是它失败了:
library(plyr)
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)
stopWorkers(workers)
这个也失败了:
library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)
stopCluster(cl)
我想各种用于foreach的并行后端不是可以互换的。
do.ply
函数的封闭中传递大量信息。默认情况下,这些数据不会被传递,并且需要对.export
参数进行调整才能正常工作。仍然不确定如何在一般情况下解决这个问题。 - hadley