How do I compute the AUC for a ranger random forest model?

Question

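How can I compute the AUC for a ranger model? Ranger is a fast implementation of the random forest algorithm in R. I'm using the following code to build a ranger model for classification and to get predictions from it: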

library(ranger)

# Build the model using the ranger() function
ranger.model <- ranger(formula, data = data_train, importance = 'impurity',
                       write.forest = TRUE, num.trees = 3000,
                       mtry = sqrt(length(currentComb)),
                       classification = TRUE)
# Get the predictions from the ranger model
pred.data <- predict(ranger.model, data = data_test)
table(pred.data$predictions)

But I don't know how to compute the AUC.

Any ideas?

- user2947767

1 Answer

- Artem Sokolov · Accepted Answer

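The key to computing AUC is having a way to rank the test samples from "most likely to be positive" to "least likely to be positive". Modify your training call to include probability = TRUE. pred.data$predictions should now be a matrix of class probabilities. Note the column that corresponds to your "positive" class. This column provides the ranking we need to compute AUC.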
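As a rough sketch of that change, the snippet below fits a probability forest and pulls out the positive-class column. The two-class iris subset, the 70-row training split, num.trees = 500, and the choice of "virginica" as the positive class are all assumptions made for illustration; they stand in for your own formula, data_train, and data_test.

```r
library(ranger)

# Illustrative data only: a two-class subset of iris stands in for the
# asker's data_train / data_test (an assumption, not the original data).
set.seed(42)
dat <- droplevels(subset(iris, Species != "setosa"))
idx <- sample(nrow(dat), 70)
data_train <- dat[idx, ]
data_test  <- dat[-idx, ]

# probability = TRUE grows a probability forest instead of a
# hard-classification forest.
ranger.model <- ranger(Species ~ ., data = data_train,
                       num.trees = 500, probability = TRUE)

pred.data <- predict(ranger.model, data = data_test)

# The predictions are now a matrix with one column per class.
head(pred.data$predictions)

# Scores for the class treated as positive (here assumed to be "virginica").
scores <- pred.data$predictions[, "virginica"]
```

Selecting the column by class name rather than by position avoids depending on the order of the factor levels.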

To actually compute AUC, we will use Equation (3) from Hand and Till (2001). We can implement this equation as follows:

## An AUC estimate that doesn't require explicit construction of an ROC curve
auc <- function( scores, lbls )
{
  stopifnot( length(scores) == length(lbls) )
  jp <- which( lbls > 0 ); np <- length( jp )
  jn <- which( lbls <= 0); nn <- length( jn )
  s0 <- sum( rank(scores)[jp] )
  (s0 - np*(np+1) / 2) / (np*nn)
}

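where scores is the column of pred.data$predictions that corresponds to the positive class, and lbls is the corresponding test labels encoded as a binary vector (1 for positive, 0 or -1 for negative).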
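To make the expected inputs concrete, here is a small worked example. The function is repeated so the snippet runs on its own; the score values, the factor truth, and the "pos"/"neg" level names are made up for illustration. With real data, scores would be the positive-class column of pred.data$predictions, and the labels would come from the response column of data_test.

```r
# Same rank-based AUC estimate as above, repeated so this runs standalone.
auc <- function(scores, lbls) {
  stopifnot(length(scores) == length(lbls))
  jp <- which(lbls > 0);  np <- length(jp)   # positive examples
  jn <- which(lbls <= 0); nn <- length(jn)   # negative examples
  s0 <- sum(rank(scores)[jp])                # rank sum of the positives
  (s0 - np * (np + 1) / 2) / (np * nn)
}

# Hypothetical test labels stored as a factor, converted to 1/0.
truth <- factor(c("pos", "pos", "pos", "neg", "neg", "neg"))
lbls  <- as.integer(truth == "pos")

# Hypothetical positive-class probabilities for the same six samples.
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)

# 8 of the 9 positive/negative pairs are ordered correctly, so AUC = 8/9.
auc(scores, lbls)
```

The one misordered pair is the positive scored 0.35 against the negative scored 0.6, which is why the result falls just short of 1.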