Parallel processing in caret does not work on R 2.13.0

I am using the R package caret, and parallel processing does not work. If I try to run the example for the train function:
library(caret)
library(mlbench)
data(BostonHousing)

library(doMC)
registerDoMC(2)

## NOTE: don't run models from RWeka when using
### multicore. The session will crash.

## The code for train() does not change:
set.seed(1)
usingMC <-  train(medv ~ .,
                  data = BostonHousing, 
                  "glmboost")

I get the following error message:
Error in names(resamples) <- gsub("^\\.", "", names(resamples)) : 
  attempt to set an attribute on NULL
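The message itself is the generic error base R raises when you assign names to a `NULL` object. That suggests `resamples` was `NULL` by the time `train()` tried to label it, although the thread does not confirm this. The minimal base-R sketch below reproduces only the message, not the caret bug. The resample names in it are made up.

```r
# Minimal base-R reproduction of the *message* only (not of the caret bug):
# assigning names to a NULL object fails with this error.
resamples <- NULL
err <- tryCatch(
  {
    # Same shape of statement as in the error text, with made-up names
    names(resamples) <- gsub("^\\.", "", c(".Resample01", ".Resample02"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(err)
```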

I am using an early-2011 MacBook Pro with a 2.3 GHz Intel Core i5, running Mac OS X 10.6.8.

R session info:

R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] caret_5.13-20   cluster_1.14.2  reshape_0.8.4   plyr_1.7.1      lattice_0.19-33 mlbench_2.1-0   doMC_1.2.3      multicore_0.1-7
[9] foreach_1.3.2   codetools_0.2-8 iterators_1.0.5

loaded via a namespace (and not attached):
[1] compiler_2.13.0 grid_2.13.0     rpart_3.1-51    tools_2.13.0

Is there anything I can do to fix this?


I also get the same error message, seemingly at random, when training on other data with method = "gbm". I have not seen it with methods such as svmLinear, svmPoly, svmRadial, svmRadialCost or rda. - djhurio
2 Answers

  1. It may be difficult to find someone who can reproduce your error: With

    > sessionInfo ()
    R version 2.15.0 (2012-03-30)
    Platform: x86_64-pc-linux-gnu (64-bit)
    

    [...snip...]

    other attached packages:
     [1] mboost_2.1-2    caret_5.15-023  cluster_1.14.2  reshape_0.8.4  
     [5] plyr_1.7.1      lattice_0.20-6  doMC_1.2.5      multicore_0.1-7
     [9] iterators_1.0.6 foreach_1.4.0   mlbench_2.1-0          
    
    loaded via a namespace (and not attached):
    [1] codetools_0.2-8  compiler_2.15.0  grid_2.15.0      Matrix_1.0-6    
    [5] splines_2.15.0   survival_2.36-14 tools_2.15.0    
    

    it works.

  2. This means you will probably need to dig into the code yourself: traceback() and debug() should help.
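The helper sketched below covers the first step. It records the call stack at the moment an error is raised, which is roughly what `traceback()` shows interactively, but it can be used from a script. `capture_error_stack()`, `f()` and `g()` are hypothetical names for illustration.

The helper only sees calls in the current R process. An error raised inside a forked doMC/multicore worker is likely to reach the parent without the worker's frames.

```r
# Hypothetical helper: evaluate `expr` and, if it signals an error, record
# the call stack at the point of the error (similar to what traceback()
# prints interactively) before returning the condition object.
capture_error_stack <- function(expr) {
  stack <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      # Calling handler runs *before* the stack unwinds, so sys.calls()
      # still contains the frames that led to the error.
      error = function(e) stack <<- sys.calls()
    ),
    # Exiting handler: stop the error here and return the condition.
    error = function(e) e
  )
  list(result = result, stack = stack)
}

# Toy failing call chain: f() calls g(), and g() signals the error.
f <- function() g()
g <- function() stop("attempt to set an attribute on NULL")

out <- capture_error_stack(f())
print(conditionMessage(out$result))
# Outermost call first, innermost last (traceback() lists them in reverse)
for (cl in out$stack) print(cl)
```

For the question's case you would wrap the original call, for example `out <- capture_error_stack(train(medv ~ ., data = BostonHousing, "glmboost"))`, and then look through `out$stack`.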



I cannot reproduce the problem, at least on R 2.14.0:

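The caret code does not have separate versions for sequential and parallel processing, so I am not sure where the problem lies. Does the sequential version work? What about other models? You could also try running it in a fresh session.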
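One way to answer "does the sequential version work?" in a single script is a fallback:

1. Try the parallel backend first.
2. If that fails, switch foreach to its sequential backend with `registerDoSEQ()` and rerun the same call.

`with_sequential_fallback()` is a hypothetical helper, not part of caret or foreach. The caret usage is left in comments because it needs caret, mlbench and doMC installed.

```r
# Hypothetical helper (not part of caret or foreach): try the parallel run
# first; if it raises an error, report it and rerun the work sequentially,
# returning which mode produced the result.
with_sequential_fallback <- function(run_parallel, run_sequential) {
  tryCatch(
    list(mode = "parallel", value = run_parallel()),
    error = function(e) {
      message("Parallel run failed: ", conditionMessage(e))
      list(mode = "sequential", value = run_sequential())
    }
  )
}

# Toy usage: the "parallel" function fails, so the sequential one is used.
res <- with_sequential_fallback(
  function() stop("attempt to set an attribute on NULL"),
  function() "sequential result"
)
print(res$mode)

# Sketch for the question's example (requires caret, mlbench and doMC;
# registerDoSEQ() comes from foreach, which doMC attaches):
# library(caret); library(mlbench); library(doMC)
# data(BostonHousing)
# fit <- with_sequential_fallback(
#   function() {
#     registerDoMC(2)
#     set.seed(1)
#     train(medv ~ ., data = BostonHousing, "glmboost")
#   },
#   function() {
#     registerDoSEQ()
#     set.seed(1)
#     train(medv ~ ., data = BostonHousing, "glmboost")
#   }
# )
# fit$mode
```

If `fit$mode` comes back as "sequential", the failure is specific to the parallel backend.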

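Also, you may want to email the package maintainer directly (unless you have already done so and I missed it); that is likely to get you a better answer.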

> library(caret)

<-snip->

> library(mlbench)
> data(BostonHousing)
> 
> library(doMC)

<-snip->

> registerDoMC(2)
> 
> ## NOTE: don't run models from RWeka when using
> ### multicore. The session will crash.
> 
> ## The code for train() does not change:
> set.seed(1)
> usingMC <-  train(medv ~ .,
+                   data = BostonHousing, 
+                   "glmboost")
Warning message:
In glmboost.matrix(x = c(0.00632, 0.02731, 0.02729, 0.03237, 0.06905,  :
  model with centered covariates does not contain intercept
> usingMC
506 samples
 13 predictors

No pre-processing
Resampling: Bootstrap (25 reps) 

Summary of sample sizes: 506, 506, 506, 506, 506, 506, ... 

Resampling results across tuning parameters:

  mstop  RMSE  Rsquared  RMSE SD  Rsquared SD
  50     5.44  0.663     0.484    0.0661     
  100    5.33  0.675     0.518    0.0669     
  150    5.27  0.681     0.526    0.0661     

Tuning parameter 'prune' was held constant at a value
 of 'no'
RMSE was used to select the optimal model using 
 the smallest value.
The final values used for the model were mstop = 150
 and prune = no. 
> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
 [1] mboost_2.1-1    doMC_1.2.5      multicore_0.1-7
 [4] iterators_1.0.5 mlbench_2.1-0   caret_5.15-023 
 [7] foreach_1.4.0   cluster_1.14.1  reshape_0.8.4  
[10] plyr_1.7.1      lattice_0.20-0 

loaded via a namespace (and not attached):
[1] codetools_0.2-8  compiler_2.14.0  grid_2.14.0     
[4] Matrix_1.0-3     splines_2.14.0   survival_2.36-10
[7] tools_2.14.0  

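This bug is quite strange. I ran into exactly the same problem today while running caret on an OpenStack cluster. However, I found that when I commented out registerDoMC(), the code ran successfully. - Ekaba Bisong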
