Plotting decision trees from h2o.randomForest() and h2o.gbm()

11

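I'm looking for an efficient way to plot trees for h2o's RF and GBM models in RStudio, H2O's Flow, or a local HTML page, similar to the image linked below. Specifically, how do you plot trees for the objects (fitted models) rf1 and gbm1 produced by the code below, perhaps by parsing the output of h2o.download_pojo(rf1) or h2o.download_pojo(gbm1)?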

http://i.stack.imgur.com/3OWx1.png

# # The following two commands remove any previously installed H2O packages for R.
# if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
# if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }

# # Next, we download packages that H2O depends on.
# pkgs <- c("methods","statmod","stats","graphics","RCurl","jsonlite","tools","utils")
# for (pkg in pkgs) {
#   if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }
# }
# 
# # Now we download, install h2o package
# install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/3/R")))
library(h2o)

h2o.init(nthreads = -1, max_mem_size = "2G")
h2o.removeAll()  ##clean slate - just in case the cluster was already running

## Load data - available to download from link below
## https://www.dropbox.com/s/gu8e2o0mzlozbu4/SampleData.csv?dl=0
df <- h2o.importFile(path = normalizePath("../SampleData.csv"))

splits <- h2o.splitFrame(df, c(0.4, 0.3), seed = 1234)

train <- h2o.assign(splits[[1]], "train.hex")
valid <- h2o.assign(splits[[2]], "valid.hex")
test <- h2o.assign(splits[[3]], "test.hex")  ## third split (remaining 30%), not the validation split again

predictor_col_start_pos <- 2
predictor_col_end_pos <- 169
predicted_col_pos <- 1

rf1 <- h2o.randomForest(training_frame = train, validation_frame = valid, 
                        x = predictor_col_start_pos:predictor_col_end_pos, y = predicted_col_pos, 
                        model_id = "rf_covType_v1", ntrees = 2000, stopping_rounds = 10, score_each_iteration = TRUE, 
                        seed = 2001)

gbm1 <- h2o.gbm(training_frame = train, validation_frame = valid, x = predictor_col_start_pos:predictor_col_end_pos, 
            y = predicted_col_pos, model_id = "gbm_covType2", seed = 2002, ntrees = 20, 
            learn_rate = 0.2, max_depth = 10, stopping_rounds = 2, stopping_tolerance = 0.01, 
            score_each_iteration = TRUE)


## Next step would be to plot trees for fitted models rf1 and gbm1
# print the model, POJO (Plain Old Java Object) to screen
h2o.download_pojo(rf1)
h2o.download_pojo(gbm1)

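A reproducible example has been provided. - Webby
Thanks for providing a reproducible example. We can now migrate this to [SO]. If you wait a moment, it should arrive there shortly. - gung - Reinstate Monica
Is there a way to plot this graphically, e.g. to visualize the final tree? - alwaysaskingquestions
Is that what you meant? - alwaysaskingquestions
Bump. I'm also very interested in a solution here. - constiii
@alwaysaskingquestions I haven't found a way to do that yet. - Webby
2 Answers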

9
I think this may be the solution you are looking for:
library(h2o)
h2o.init()
df = h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
model = h2o.gbm(model_id = "model",
            training_frame = df,
            x = c("Year", "Month", "DayofMonth", "DayOfWeek", "UniqueCarrier"),
            y = "IsDepDelayed",
            max_depth = 3,
            ntrees = 5)
h2o.download_mojo(model, getwd(), FALSE)

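Now download the latest stable h2o release from http://www.h2o.ai/download/ and run the PrintMojo tool from the command line. The commands below assume h2o.jar and model.zip are in the current directory; the second one requires Graphviz's dot.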

java -cp h2o.jar hex.genmodel.tools.PrintMojo --tree 0 -i model.zip -o model.gv
dot -Tpng model.gv -o model.png

Open model.png.

More information: http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html
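If you would rather stay inside R, the two shell commands above can be wrapped with system2(). This is only a minimal sketch: print_mojo_args() and render_mojo_tree() are hypothetical helper names (not part of h2o), and it assumes java and Graphviz's dot are on your PATH and that h2o.jar and model.zip sit where you point it.

```r
# Hypothetical helper: build the argument vector for the PrintMojo call above.
print_mojo_args <- function(h2o_jar, mojo_zip, gv_file, tree = 0) {
  c("-cp", h2o_jar, "hex.genmodel.tools.PrintMojo",
    "--tree", as.character(tree),
    "-i", mojo_zip,
    "-o", gv_file)
}

# Hypothetical helper: run PrintMojo, then dot, to render one tree as a PNG.
# Assumes `java` and `dot` are available on the PATH.
render_mojo_tree <- function(h2o_jar, mojo_zip, png_file, tree = 0) {
  gv_file <- paste0(tools::file_path_sans_ext(png_file), ".gv")

  status <- system2("java",
                    shQuote(print_mojo_args(h2o_jar, mojo_zip, gv_file, tree)))
  if (status != 0) stop("PrintMojo failed with exit status ", status)

  status <- system2("dot", shQuote(c("-Tpng", gv_file, "-o", png_file)))
  if (status != 0) stop("dot failed with exit status ", status)

  invisible(png_file)
}

# Usage (not run here; needs h2o.jar and the model.zip written by h2o.download_mojo):
# render_mojo_tree("h2o.jar", "model.zip", "model.png", tree = 0)
```

Keeping the argument construction in its own function makes it easy to print and check the exact command before running it.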


Source code: https://github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/main/java/hex/genmodel/tools/PrintMojo.java - WesternGun
How would you plot this graph with color/shape indicating the classification at the terminal nodes? - RealViaCauchy

1

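The new Tree API, introduced in version 3.22.0.1 (October 2018), was a game changer for visualizing H2O trees. The general workflow was illustrated with a diagram (image not available here), and a detailed example with code can be found in the article "Finally, You Can Plot H2O Decision Trees in R".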
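To give a rough idea of what the Tree API exposes, here is a minimal sketch in R. It assumes h2o 3.22.0.1 or later and that the H2OTree object returned by h2o.getModelTree() carries per-node vectors of equal length in the slots node_ids, features, thresholds, predictions, left_children and right_children. tree_nodes_table() is a hypothetical helper name, not part of h2o, and this is not the code from the article mentioned above.

```r
# Hypothetical helper: flatten the per-node slots of an H2OTree object
# into a data frame, one row per node, for inspection or custom plotting.
tree_nodes_table <- function(tree) {
  data.frame(
    node_id     = tree@node_ids,
    feature     = tree@features,       # split column, where the node has a split
    threshold   = tree@thresholds,     # numeric split point, where applicable
    prediction  = tree@predictions,
    left_child  = tree@left_children,
    right_child = tree@right_children,
    stringsAsFactors = FALSE
  )
}

# Usage (not run here; needs a running H2O cluster and a fitted tree model,
# e.g. the GBM called `model` from the other answer). In the R API,
# tree_number is 1-based.
# tree <- h2o::h2o.getModelTree(model = model, tree_number = 1)
# head(tree_nodes_table(tree))
```

From a table like this you can build an edge list (node_id to left_child and right_child) and hand it to whichever graph or tree plotting package you prefer.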

