I have a set of CSV files with a result column for training, and a set of test CSV files without the result column.
library(h2o)
h2o.init()

# Load the training data and push it to the H2O cluster
train <- read.csv(train_file, header = TRUE)
train.h2o <- as.h2o(train)

# Response column, and every other column as a predictor
y <- "Result"
x <- setdiff(names(train.h2o), y)

model <- h2o.deeplearning(x = x,
                          y = y,
                          training_frame = train.h2o,
                          model_id = "my_model",
                          epochs = 5000,
                          hidden = c(50),
                          stopping_rounds = 5,
                          stopping_metric = "misclassification",
                          stopping_tolerance = 0.001,
                          seed = 1)

# Load the test data (no Result column) and predict
test <- read.csv(test_file, header = TRUE)
test.h2o <- as.h2o(test)
pred <- h2o.predict(model, test.h2o)
When I try to predict results for the test data, I get a bunch of error messages such as:
1: In doTryCatch(return(expr), name, parentenv, handler) :
Test/Validation dataset column 'ColumnName' has levels not trained on: [ABCD, BCDE]
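The `1: In ...` prefix is how R prints warnings rather than errors, so `pred` may well have been created despite the message. To see which values trigger it, a check along these lines can help. It is only a minimal sketch in base R: the toy data frames stand in for the real CSVs, and `unseen_levels` is a hypothetical helper, not part of h2o.

```r
# Hypothetical toy data standing in for the real train/test CSVs.
train <- data.frame(ColumnName = c("AAAA", "BBBB", "AAAA"),
                    Num = c(1, 2, 3),
                    Result = c("yes", "no", "yes"),
                    stringsAsFactors = TRUE)
test <- data.frame(ColumnName = c("AAAA", "ABCD", "BCDE"),
                   Num = c(4, 5, 6),
                   stringsAsFactors = TRUE)

# Hypothetical helper (not an h2o function): for every categorical column
# present in both frames, return the test values that never occur in train.
unseen_levels <- function(train, test) {
  shared <- intersect(names(train), names(test))
  # Treat factor and character columns as categorical
  is_cat <- vapply(shared, function(col) {
    is.factor(train[[col]]) || is.character(train[[col]])
  }, logical(1))
  out <- lapply(shared[is_cat], function(col) {
    setdiff(unique(as.character(test[[col]])),
            unique(as.character(train[[col]])))
  })
  names(out) <- shared[is_cat]
  # Keep only the columns that actually have unseen values
  out[lengths(out) > 0]
}

unseen <- unseen_levels(train, test)
print(unseen)
```

Running this on the plain data frames before calling `as.h2o()` shows how many test rows are affected, which helps when deciding whether those rows need separate handling.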
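H2O used to be able to handle values that appear in the test data but not in the training data. I found some posts online where people say they got this to work, but it has not worked for me.
How can I avoid these errors and predict values for the test data?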