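I am trying to tune my model using xgboost in R. Training the plain model works fine, but when I use caret there is a problem with the metric.
I tried converting the class column to a factor, but it still did not work.
Here is my data: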
ID  var1  var2  TARGET
1   5     0     1
2   4     3     1
3   4     2     0
4   3     1     0
5   2     4     1
6   1     2     1
7   5     3     1
8   4     1     0
9   4     1     0
10  2     4     1
11  5     5     1
To load and prepare it, I do
train <- read.csv()
train.y <- train$TARGET
train$TARGET <- NULL
train$ID <- NULL
train.y <- lapply(train.y, factor)
Then I set up the model parameters.
xgb_grid_1 = expand.grid(
  nrounds = 1000,
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4, 6, 8, 10),
  gamma = 1
)
# pack the training control parameters
xgb_trcontrol_1 = trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",               # save losses across all models
  classProbs = TRUE,                  # set to TRUE for AUC to be computed
  summaryFunction = twoClassSummary,
  allowParallel = TRUE
)
After all of the above, I call the train function.
xgb_train_1 = train(
  x = train,
  y = train.y,
  trControl = xgb_trcontrol_1,
  tuneGrid = xgb_grid_1,
  method = "xgbTree"
)
It gives me
Error in train.default(x = train, y = train.y, trControl = xgb_trcontrol_1, :
Metric RMSE not applicable for classification models
Why is this happening?
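From ?train, it seems the metric argument is being set to RMSE (metric = ifelse(is.factor(y), "Accuracy", "RMSE")). So I would try making your outcome a factor with train.y <- factor(train$TARGET), or setting metric = "Accuracy" explicitly. - user20650

With train.y <- factor(train$TARGET) I get a different error: at least one of the class levels is not a valid R variable name. - paveltr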
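
Both errors in this thread seem to come from how the outcome is built, so here is a minimal base-R sketch of that one step. It assumes the TARGET column holds the eleven 0/1 values shown in the question; the vector name target and the labels "neg" and "pos" are made up for illustration, and any names that pass make.names unchanged should do.

```r
# Hypothetical copy of the TARGET column from the question's data
target <- c(1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1)

# lapply() returns a list of one-element factors, not a factor,
# so is.factor() is FALSE and train() falls back to the RMSE metric
as_list <- lapply(target, factor)
is.factor(as_list)

# factor() on the whole vector gives a real factor; the labels replace
# the levels "0" and "1", which are not valid R variable names
# ("neg" and "pos" are illustrative choices, not required names)
train.y <- factor(target, levels = c(0, 1), labels = c("neg", "pos"))
is.factor(train.y)
levels(train.y)

# A level is usable with classProbs = TRUE if make.names() leaves it unchanged
all(make.names(levels(train.y)) == levels(train.y))
```

Passing this train.y as y should avoid both messages. Since twoClassSummary reports ROC, Sens and Spec rather than Accuracy, adding metric = "ROC" to the train() call is probably also what is wanted here.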