Using Adaboost with the caret package in R

Question


r machine-learning data-mining classification adaboost


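I have been using the ada R package for a while, and recently started using caret as well. According to the documentation, caret's train() function should have an option to use ada. However, when I use the same syntax as in my ada() call, caret throws an error.

Below is a demonstration using the wine sample dataset.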

library(doSNOW)
registerDoSNOW(makeCluster(2, type = "SOCK"))  # parallel backend used by train()
library(caret)
library(ada)

# stringsAsFactors = TRUE keeps 'good' a factor on R >= 4.0, where the default became FALSE
wine = read.csv("http://www.nd.edu/~mclark19/learn/data/goodwine.csv",
                stringsAsFactors = TRUE)

set.seed(1234) # so that the indices will be the same when re-run
trainIndices = createDataPartition(wine$good, p = 0.8, list = FALSE)
wanted = !colnames(wine) %in% c("free.sulfur.dioxide", "density", "quality",
                                "color", "white")

wine_train = wine[trainIndices, wanted]
wine_test = wine[-trainIndices, wanted]
cv_opts = trainControl(method = "cv", number = 10)


### now, the example that works using ada()

results_ada <- ada(good ~ ., data = wine_train,
                   control = rpart.control(maxdepth = 30, cp = 0.01,
                                           minsplit = 20, xval = 10),
                   iter = 500)

##this works, and gives me a confusion matrix.

results_ada
     ada(good ~ ., data = wine_train, control = rpart.control(maxdepth = 30, 
     cp = 0.01, minsplit = 20, xval = 10), iter = 500)
     Loss: exponential Method: discrete   Iteration: 500 
      Final Confusion Matrix for Data:
      Final Prediction
      etc. etc. etc. etc.

##Now, the calls that don't work. 

results_ada = train(good ~ ., data = wine_train, method = "ada",
                    control = rpart.control(maxdepth = 30, cp = 0.01,
                                            minsplit = 20, xval = 10),
                    iter = 500)
   Error in train.default(x, y, weights = w, ...) : 
   final tuning parameters could not be determined
   In addition: Warning messages:
   1: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method,  :
    There were missing values in resampled performance measures.
   2: In train.default(x, y, weights = w, ...) :
    missing values found in aggregated results

### this doesn't work, either

results_ada = train(good ~ ., data = wine_train, method = "ada", trControl = cv_opts,
                    maxdepth = 10, nu = 0.1, iter = 50)

  Error in train.default(x, y, weights = w, ...) : 
  final tuning parameters could not be determined
  In addition: Warning messages:
  1: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method,  :
    There were missing values in resampled performance measures.
  2: In train.default(x, y, weights = w, ...) :
   missing values found in aggregated results

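My guess is that train() needs some additional input, but the warnings it throws give me no hint about what is missing. I might also be missing a dependency, but there is no hint about what that would be...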

- Bryan

4 Answers


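Look up ?train and search for ada, and you will see:

Method value: ada, from package ada, with tuning parameters: iter, maxdepth, nu (classification only)

So you must be missing the nu and maxdepth arguments.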
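caret can also list the tuning parameters of a model directly, which is a quicker check than searching ?train. A minimal sketch using caret's modelLookup(); the rows are filtered to the "ada" method in case the installed caret version returns related methods as well.

```r
library(caret)

# Ask caret which tuning parameters its "ada" wrapper searches over
ada_params <- modelLookup("ada")
ada_params <- ada_params[ada_params$model == "ada", ]
print(ada_params[, c("parameter", "label", "forClass")])

# These are the column names a tuneGrid for method = "ada" must use
tuning_names <- as.character(ada_params$parameter)
print(tuning_names)
```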

- nograpes

Look at my last train() call: it includes all the parameters you mention. results_ada = train(good~., data=wine_train, method="ada", trControl=cv_opts, maxdepth=10, nu=0.1, iter=50) - Bryan

Also, I tried removing trControl=cv_opts, but it made no difference. I still get the error. - Bryan


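What is the data type of wine$good? If it is a factor, state that explicitly:

wine$good <- as.factor(wine$good)
stopifnot(is.factor(wine$good))

Reason: R packages often need some help telling classification apart from regression, and caret may contain generic code that wrongly treats the exercise as a regression problem (ignoring the fact that ada only does classification).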
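One way to see which problem type caret inferred is the modelType field of the fitted train object. A minimal sketch on simulated data: the data frame sym, its columns and the level labels "no"/"yes" are made up for illustration (modelled on vijucat's later example), and the single-row grid and 3-fold CV only keep the run short. With a factor outcome, caret should report "Classification".

```r
library(caret)

set.seed(42)
# Hypothetical toy data: a 0/1 outcome and one numeric predictor
sym <- data.frame(up   = sample(c(1, 0), 100, replace = TRUE),
                  blah = sample(1:10, 100, replace = TRUE))

# Stored as numbers, the outcome looks like a regression target
print(class(sym$up))

# Convert to a factor with syntactically valid level names
sym$up <- factor(sym$up, levels = c(0, 1), labels = c("no", "yes"))
stopifnot(is.factor(sym$up))

# One fixed tuning combination, 3-fold CV, to keep the example fast
fit <- train(up ~ ., data = sym, method = "ada",
             trControl = trainControl(method = "cv", number = 3),
             tuneGrid = data.frame(iter = 20, maxdepth = 1, nu = 0.1))
print(fit$modelType)
```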

- vijucat

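I tried your suggestion (explicitly casting "wine" as a factor), but I still get the error... Does the reproducible example above run on your system? - Bryan

I finally tried it, and sorry, I get the same error as you and can't resolve it. method="rf" works fine, but I suppose that is no consolation, since what you really want is method="ada". - vijucat

Aha, train(up ~ ., data=sym[,c(6, 14)], "ada") runs fine without any parameters specified! - vijucat

It seems that "Tuning parameter 'nu' was held constant at a value of 0.1. Accuracy was used to select the optimal model using the largest value. The final values used for the model were iter = 50, maxdepth = 1 and nu = 0.1." - vijucat

Could you provide a reproducible example? Also, what if I do want to pass parameter values? - Bryan

This one runs, but I'm not yet sure about the parameter values: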

sym <- data.frame("up" = sample(c(1, 0), 100, replace = TRUE),
                  "blah" = sample(1:10, 100, replace = TRUE))
sym$up <- as.factor(sym$up)
train(up ~ ., data = sym, method = "ada", max.iter = 5)

- vijucat


Put the parameters in tuneGrid:

Grid <- expand.grid(maxdepth = 25, nu = 2, iter = 100)
results_ada = train(good ~ ., data = wine_train, method = "ada",
                    trControl = cv_opts, tuneGrid = Grid)

This will work.
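tuneGrid can also hold several rows, in which case train() cross-validates every combination and stores the best one in bestTune. A minimal sketch that uses caret's twoClassSim() simulated data in place of the wine file; the grid values are arbitrary and chosen only to keep the run short.

```r
library(caret)

set.seed(1234)
# Simulated two-class data; the outcome is the factor column "Class"
sim <- twoClassSim(200)

# Every combination of these values is evaluated by 5-fold CV
grid <- expand.grid(iter = c(25, 50), maxdepth = c(1, 2), nu = 0.1)

fit <- train(Class ~ ., data = sim, method = "ada",
             trControl = trainControl(method = "cv", number = 5),
             tuneGrid = grid)

print(fit$results[, c("iter", "maxdepth", "nu", "Accuracy")])
print(fit$bestTune)  # the winning row of the grid
```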

- prabhanjan reddy

Content provided by Stack Overflow.

- TomR · Accepted Answer

So this seems to work:

wineTrainInd <- wine_train[!colnames(wine_train) %in% "good"]
wineTrainDep <- as.factor(wine_train$good)

results_ada = train(x = wineTrainInd, y = wineTrainDep, method="ada")

results_ada
Boosted Classification Trees 

5199 samples
   9 predictors
   2 classes: 'Bad', 'Good' 

No pre-processing
Resampling: Bootstrapped (25 reps) 

Summary of sample sizes: 5199, 5199, 5199, 5199, 5199, 5199, ... 

Resampling results across tuning parameters:

  iter  maxdepth  Accuracy  Kappa  Accuracy SD  Kappa SD
  50    1         0.732     0.397  0.00893      0.0294  
  50    2         0.74      0.422  0.00853      0.0187  
  50    3         0.747     0.437  0.00759      0.0171  
  100   1         0.736     0.411  0.0065       0.0172  
  100   2         0.742     0.428  0.0075       0.0173  
  100   3         0.748     0.442  0.00756      0.0158  
  150   1         0.737     0.417  0.00771      0.0184  
  150   2         0.745     0.435  0.00851      0.0198  
  150   3         0.752     0.449  0.00736      0.016   

Tuning parameter 'nu' was held constant at a value of 0.1
Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were iter = 150, maxdepth = 3 and nu
 = 0.1.

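The reason was found in another question:

caret::train: specify model-generation-parameters

I think you are passing the tuning parameters as arguments, while train is trying to find the optimal tuning parameters itself. If you do want to run your own grid search, you can define a parameter grid and pass it as tuneGrid.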
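The split described above can be sketched as follows: tuning parameters (iter, maxdepth, nu) go in tuneGrid, and any other ada() argument goes through train()'s `...`. This assumes the installed caret's "ada" wrapper forwards extra arguments such as control on to ada(); printing the wrapper's fit function with getModelInfo() is a way to check that, and to see whether it overwrites maxdepth inside a supplied control. Passing a tuning parameter such as iter through `...` as well would likely duplicate an argument the wrapper already sets, which may be what produced the missing-value warnings in the question. twoClassSim() data stands in for wine_train here.

```r
library(caret)
library(rpart)  # for rpart.control()

# Inspect how caret calls ada::ada() (the printed code depends on the caret version)
print(getModelInfo("ada", regex = FALSE)$ada$fit)

set.seed(1234)
sim <- twoClassSim(200)  # simulated stand-in for wine_train

fit <- train(Class ~ ., data = sim, method = "ada",
             trControl = trainControl(method = "cv", number = 5),
             # tuning parameters: one fixed combination
             tuneGrid = data.frame(iter = 50, maxdepth = 3, nu = 0.1),
             # non-tuning argument, assumed to be forwarded to ada()
             control = rpart.control(cp = 0.01, minsplit = 20, xval = 0))
print(fit$finalModel)
```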