Computing R-squared (% Var explained) for a combined randomForest object
When running a randomForest regression, the resulting object includes the R-squared value, reported as "% Var explained: ...".
library(randomForest)
library(doSNOW)
library(foreach)
library(ggplot2)
dat <- data.frame(ggplot2::diamonds[1:1000,1:7])
rf <- randomForest(formula = carat ~ ., data = dat, ntree = 500)
rf
# Call:
# randomForest(formula = carat ~ ., data = dat, ntree = 500)
# Type of random forest: regression
# Number of trees: 500
# No. of variables tried at each split: 2
#
# Mean of squared residuals: 0.001820046
# % Var explained: 95.22
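However, when a foreach loop is used to fit and combine several randomForest objects, the R-squared value is no longer available, as noted in ?combine:
The confusion, err.rate, mse and rsq components (as well as the corresponding components in the test component, if they exist) of the combined object will be NULL.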
cl <- makeCluster(8)
registerDoSNOW(cl)
rfPar <- foreach(ntree = rep(63, 8),
                 .combine = combine,
                 .multicombine = TRUE,
                 .packages = "randomForest") %dopar%
  {
    randomForest(formula = carat ~ ., data = dat, ntree = ntree)
  }
stopCluster(cl)
rfPar
# Call:
# randomForest(formula = carat ~ ., data = dat, ntree = ntree)
# Type of random forest: regression
# Number of trees: 504
# No. of variables tried at each split: 2
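To make the missing components concrete, here is a minimal sequential sketch (no cluster needed). It assumes the built-in mtcars data as a stand-in for dat, and the names rf_a, rf_b and rf_ab are made up for illustration. It checks which pieces of the combined object are gone and which remain.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Two small forests on a built-in data set (mtcars stands in for dat).
rf_a <- randomForest(mpg ~ ., data = mtcars, ntree = 100)
rf_b <- randomForest(mpg ~ ., data = mtcars, ntree = 100)
rf_ab <- combine(rf_a, rf_b)

# The summary components behind the printed statistics are dropped ...
is.null(rf_ab$mse)
is.null(rf_ab$rsq)

# ... while the response and the averaged out-of-bag predictions are kept.
length(rf_ab$y)
length(rf_ab$predicted)
```

Since y and predicted are still present, the two statistics can in principle be recomputed from them after the fact.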
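Since this was not really answered in this post: is it possible to compute the R-squared (% Var explained) and the mean of squared residuals from a randomForest object after the fact?
(Critics of this kind of parallelization may argue for caret::train(... method = "parRF") or other approaches. However, that turns out to take forever. In fact, an answer could be useful for anyone who uses combine to merge randomForest objects...)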
R2 <- 1 - var(rfPar$y - rfPar$predicted) / var(rfPar$y)
- gibbone
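The one-liner above uses the variance of the residuals. As far as I can tell, randomForest itself reports 1 - mse / (var(y) * (n - 1) / n), with mse being the mean of the squared out-of-bag residuals, so the two can differ slightly. Below is a rough sketch under that assumption. oob_fit_stats is a hypothetical helper, not part of the package, and the toy vectors merely stand in for rfPar$y and rfPar$predicted.

```r
# Hypothetical helper (not part of randomForest): recompute both summary
# statistics from the observed response and the out-of-bag predictions.
oob_fit_stats <- function(y, predicted) {
  stopifnot(length(y) == length(predicted))
  # Observations that were never out of bag have NA predictions; skip them.
  ok <- !is.na(predicted)
  y <- y[ok]
  predicted <- predicted[ok]
  n <- length(y)
  mse <- mean((y - predicted)^2)  # mean of squared residuals
  # Assumed definition: variance of y with divisor n instead of n - 1.
  rsq <- 1 - mse / (var(y) * (n - 1) / n)
  c(mse = mse, pct_var_explained = 100 * rsq)
}

# Toy vectors standing in for rfPar$y and rfPar$predicted.
y_obs  <- c(1, 2, 3, 4, 5)
y_pred <- c(1.1, 1.9, 3.2, 3.8, 5.0)
stats <- oob_fit_stats(y_obs, y_pred)
print(stats)
```

With the objects from the question this would be called as oob_fit_stats(rfPar$y, rfPar$predicted). Note that the combined predicted values are an average over the sub-forests, so the result should only be expected to approximate what a single 504-tree forest would print.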