User-defined metric in the caret package

6

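I would like to use the caret package with a metric that is not one of the default options. In the example below I use the Metrics package. I have read all the related questions on StackOverflow as well as the guide on the caret website, but I still get errors.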

In the example below, I want to use mean absolute error (MAE).

Create a function:

maefunction<-function(data, lev=NULL, model=NULL){
  require(Metrics)
  MAE<-mae(data[, "obs"], data[, "pred"])
  out<-c(MAE)
  out
}

Now I pass the function to trainControl.
library(caret)
GBM<-train(train$result~., data=train, method="gbm", trControl=trainControl(summaryFunction=maefunction), metric=MAE)

I get the following message.
Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) : 
Results must be all atomic, or all data frames
In addition: Warning messages:
1: In if (metric %in% c("Accuracy", "Kappa")) stop(paste("Metric",  :
  the condition has length > 1 and only the first element will be used
2: In if (metric == "ROC" & !ctrl$classProbs) stop("train()'s use of ROC codes requires                class probabilities. See the classProbs option of trainControl()") :
  the condition has length > 1 and only the first element will be used
3: In if (!(metric %in% perfNames)) { :
  the condition has length > 1 and only the first element will be used
4: In train.default(x, y, weights = w, ...) :
  The metric "4" was not in the result set.  will be used instead.The metric "0.5" was    not in the result set.  will be used instead.
2 Answers

15
I think you need to use a named vector (see the example below). I don't say that explicitly in the documentation, so I will update that section. Max
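library(caret)    # train(), trainControl()
library(Metrics)  # mae(); without it the summary function cannot find mae
library(mlbench)
data(BostonHousing)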

maeSummary <- function (data,
                        lev = NULL,
                        model = NULL) {
   out <- mae(data$obs, data$pred)  
   names(out) <- "MAE"
   out
}

mControl <- trainControl(summaryFunction = maeSummary)
marsGrid <- expand.grid(degree = 1, nprune = (1:10) * 2)

set.seed(1)
earthFit <- train(medv ~ .,
                  data = BostonHousing,
                  method = "earth",
                  tuneGrid = marsGrid,
                  metric = "MAE",
                  maximize = FALSE,
                  trControl = mControl)
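
To see why the names matter, here is a minimal base-R sketch. It is not caret's actual code: it assumes, as the `if (!(metric %in% perfNames))` warning in the question suggests, that train() looks up the `metric` string among the names of whatever the summary function returns. The toy data frame and both helper functions are made up for illustration.

```r
# Toy resample: the summary function receives a data frame with
# columns obs and pred (values here are invented for illustration).
toy <- data.frame(obs = c(1, 2, 3, 4), pred = c(1.5, 2, 2.5, 5))

# Hypothetical summary functions; MAE is computed in base R so the
# sketch does not depend on the Metrics package.
unnamedSummary <- function(data, lev = NULL, model = NULL) {
  mean(abs(data$obs - data$pred))
}
namedSummary <- function(data, lev = NULL, model = NULL) {
  c(MAE = mean(abs(data$obs - data$pred)))
}

metric <- "MAE"
unnamed_ok <- metric %in% names(unnamedSummary(toy))  # names() is NULL here
named_ok <- metric %in% names(namedSummary(toy))
print(c(unnamed = unnamed_ok, named = named_ok))
```

Only the named version gives a lookup by `metric = "MAE"` something to match against.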

1
When I run this code I get the following message: "Error in ctrl$summaryFunction(testOutput, lev, method):". Has something changed since this answer was written? - Jonno Bourne
1
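I got the first part of that error, saying that mae could not be found. That was because the code as posted did not load the Metrics package you originally used; simply including library(Metrics) resolves it. In any case, this lets you pick the tuning parameter (nprune) of the model with the lowest MAE, but you might be interested in having the earth algorithm itself minimize MAE directly for each combination of tuning parameters. - Nicolás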
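
One way to avoid the "could not find function" failure described in the comment above is to let the summary function check for the package itself. The following is only a sketch: the name `maeSummarySafe` and the base-R fallback are assumptions for illustration, not part of the original answer. It calls `Metrics::mae(actual, predicted)` when Metrics is installed and falls back to `mean(abs(...))` otherwise.

```r
# Hypothetical summary function that does not rely on library(Metrics)
# having been called beforehand.
maeSummarySafe <- function(data, lev = NULL, model = NULL) {
  value <- if (requireNamespace("Metrics", quietly = TRUE)) {
    Metrics::mae(data$obs, data$pred)   # actual first, predicted second
  } else {
    mean(abs(data$obs - data$pred))     # same quantity in base R
  }
  c(MAE = value)                        # named, so metric = "MAE" can match
}

# Invented values, used only to exercise the function once.
check <- data.frame(obs = c(10, 20, 30), pred = c(12, 18, 33))
result <- maeSummarySafe(check)
print(result)
```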

0
mae <- function(pred, obs) 
{
  isNA <- is.na(pred)
  pred <- pred[!isNA]
  obs <- obs[!isNA]
  if (!is.factor(obs) && is.numeric(obs)) {
    if (length(obs) + length(pred) == 0) {
      out <- rep(NA, 3)  # one NA per name assigned below (MAE, RMSE, Rsquared)
    }
    else {
      if (length(unique(pred)) < 2 || length(unique(obs)) < 
          2) {
        resamplCor <- NA
      }
      else {
        resamplCor <- try(cor(pred, obs, use = "pairwise.complete.obs"), 
                          silent = TRUE)
        if (inherits(resamplCor, "try-error"))
          resamplCor <- NA
      }
      mse <- mean((pred - obs)^2)
      mae <- mean(abs(pred - obs))
      n <- length(obs)
      out <- c(mae, sqrt(mse), resamplCor^2)
    }
    names(out) <- c("MAE", "RMSE", "Rsquared")
  }
  else {
    if (length(obs) + length(pred) == 0) {
      out <- rep(NA, 2)
    }
    else {
      pred <- factor(pred, levels = levels(obs))
      if (!requireNamespace("e1071", quietly = TRUE)) stop("package e1071 is required")
      out <- unlist(e1071::classAgreement(table(obs, pred)))[c("diag", 
                                                               "kappa")]
    }
    names(out) <- c("Accuracy", "Kappa")
  }
  if (any(is.nan(out))) 
    out[is.nan(out)] <- NA
  out
}



MAEFunction <- function (data, lev = NULL, model = NULL) 
{
  if (is.character(data$obs)) 
    data$obs <- factor(data$obs, levels = lev)
  mae(data[, "pred"], data[, "obs"])
}
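
Before passing a multi-metric summary function like this to trainControl, it can help to call it once on a small hand-made data frame and check the names it returns. The sketch below is self-contained and uses a simplified stand-in (`regSummary`, a hypothetical name) rather than the function above; the toy values are invented.

```r
# Simplified regression-only summary: returns three named metrics.
regSummary <- function(data, lev = NULL, model = NULL) {
  err <- data$pred - data$obs
  c(MAE = mean(abs(err)),
    RMSE = sqrt(mean(err^2)),
    Rsquared = cor(data$pred, data$obs)^2)
}

# Same shape as the data frame a summary function receives: obs and pred.
toy <- data.frame(obs = c(1, 2, 3, 4), pred = c(1.5, 2, 2.5, 5))
res <- regSummary(toy)
print(names(res))
```

Any of the three names can then be given as `metric` in train(); for error measures such as MAE and RMSE, set `maximize = FALSE` as in the first answer.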
