R caret train error: Error in evalSummaryFunction: cannot compute class probabilities for regression

6
> cv.ctrl <- trainControl(method = "repeatedcv", repeats = 3,
+                         summaryFunction = twoClassSummary,
+                         classProbs = TRUE)
> 
> set.seed(35)
> glm.tune.1 <- train(y ~ bool_3,
+                     data = train.batch,
+                     method = "glm",
+                     metric = "ROC",
+                     trControl = cv.ctrl)
Error in evalSummaryFunction(y, trControl, classLevels, metric, method) : 
  train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()
In addition: Warning message:
In train.default(x, y, weights = w, ...) :
  cannnot compute class probabilities for regression


 > str(train.batch)
'data.frame':   128046 obs. of  42 variables:
 $ offer               : int  1194044 1194044 1194044 1194044 1194044 1194044 1194044 1194044 1194044 1194044 ...
 $ avgPrice            : num  2.68 2.68 2.68 2.68 2.68 ...
 ...
 $ bool_3              : int  0 0 0 0 0 0 0 1 0 0 ...
 $ y                   : num  0 1 0 0 0 1 1 1 1 0 ...

Since classProbs is set to TRUE in cv.ctrl, I don't understand why this error message appears. Can anyone advise?
2 Answers

9
Apparently, the error occurs because my y is not a factor.
The following code works fine:
library(caret)
library(mlbench)
data(Sonar)

ctrl <- trainControl(method = "cv", 
                     summaryFunction = twoClassSummary, 
                     classProbs = TRUE)
set.seed(1)
gbmTune <- train(Class ~ ., data = Sonar,
                 method = "gbm",
                 metric = "ROC",
                 verbose = FALSE,                    
                 trControl = ctrl)

Then run:

Sonar$Class = as.numeric(Sonar$Class)

and the same code now fails:

> gbmTune <- train(Class ~ ., data = Sonar,
+                  method = "gbm",
+                  metric = "ROC",
+                  verbose = FALSE,                    
+                  trControl = ctrl)
Error in evalSummaryFunction(y, trControl, classLevels, metric, method) : 
  train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()
In addition: Warning message:
In train.default(x, y, weights = w, ...) :
  cannnot compute class probabilities for regression

However, the caret documentation for train says:
y   a numeric or factor vector containing the outcome for each sample.

5
train can be used for regression (when y is numeric) and classification (when y is a factor). - topepo
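To make the numeric-versus-factor distinction concrete, here is a minimal base-R sketch of checking an outcome vector and recoding it before it is passed to train. The vector y_num is a hypothetical stand-in for train.batch$y, not the asker's data. The level names "No" and "Yes" are chosen because, with classProbs = TRUE, caret builds probability column names from the factor levels, so levels such as "0" and "1" that are not valid R variable names are rejected.

```r
# Hypothetical stand-in for the numeric 0/1 outcome in train.batch$y
y_num <- c(0, 1, 0, 0, 0, 1, 1, 1, 1, 0)

# A numeric outcome is what leads train() to fit a regression model
is.factor(y_num)   # FALSE

# Recode as a factor with explicit, syntactically valid level names
y_fac <- factor(y_num, levels = c(0, 1), labels = c("No", "Yes"))
is.factor(y_fac)   # TRUE
levels(y_fac)

# Valid level names pass through make.names() unchanged;
# bare digits such as "0" and "1" would be rewritten to "X0" and "X1"
names_ok <- all(levels(y_fac) == make.names(levels(y_fac)))
names_ok
make.names(c("0", "1"))
```

Note that twoClassSummary treats the first factor level as the event of interest, so the order given in levels affects which class the reported sensitivity and specificity refer to.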

1
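If you change the values in y to "Yes" and "No" instead of 1 and 0, and store the result back in the data frame as a factor, the code will run.
# Recode the outcome in place so train() treats the problem as classification
train.batch$y <- factor(ifelse(train.batch$y == 0, "No", "Yes"))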
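As a rough end-to-end illustration, the sketch below applies this recoding and then repeats the train call from the question. The data frame toy.batch and its simulated columns are hypothetical stand-ins for train.batch, which is not available here. It assumes the caret package is installed, along with pROC, which twoClassSummary uses to compute the ROC metric.

```r
library(caret)  # twoClassSummary also needs the pROC package installed

# Simulated stand-in for train.batch (hypothetical data, not the asker's)
set.seed(35)
n <- 500
bool_3 <- rbinom(n, 1, 0.3)
y_num <- rbinom(n, 1, ifelse(bool_3 == 1, 0.7, 0.3))
toy.batch <- data.frame(bool_3 = bool_3, y = y_num)

# Recode the numeric outcome as a factor with valid level names
toy.batch$y <- factor(toy.batch$y, levels = c(0, 1), labels = c("No", "Yes"))

# Same resampling setup as in the question
cv.ctrl <- trainControl(method = "repeatedcv", repeats = 3,
                        summaryFunction = twoClassSummary,
                        classProbs = TRUE)

# With a factor outcome, train() fits a classifier and can report ROC
glm.tune.1 <- train(y ~ bool_3,
                    data = toy.batch,
                    method = "glm",
                    metric = "ROC",
                    trControl = cv.ctrl)

glm.tune.1$results[, c("ROC", "Sens", "Spec")]
```

The only change relative to the failing call is the type of y; the trainControl settings and the formula are the same as in the question.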
