Not enough distinct predictions to compute area under the ROC curve

5
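I am trying to compute the AUC, where labels is a numeric vector whose elements are 1 (x15) and 0 (x500), and predictions is a numeric vector of probabilities from a glm [binomial]. This should be very simple, but it fails with the error "Not enough distinct predictions to compute area under the ROC curve". I must be doing something silly, but I cannot spot the problem. Could you help me take a look?
The code is as follows: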
library(AUC)
#read the data, which come from an earlier species distribution modelling step
prob<-read.csv("prob.csv")
labels<-read.csv("labels.csv")
#prob is
#labels is

roc(prob,labels)

#Gives this error (which I'm NOT interested in)
Error in `[.data.frame`(predictions, pred.order) : undefined columns selected
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'

#I convert both objects to numeric vectors
prob<-as.numeric(prob[,2])
labels<-as.numeric(labels[,2])
#Verify that each one is a numeric vector
class(prob)
[1] "numeric"
class(labels)
[1] "numeric"

#call the roc function
roc(prob,labels)

Error in roc(modbrapred, pbbra) : # THIS is the error I'm interested in
  Not enough distinct predictions to compute area under the ROC curve.
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'    

The data are as follows:

labels.csv
"","x"
"1",1
"2",1
"3",1
"4",1
"5",1
"6",1
...
"164",1
"165",1
"166",0
"167",0
"168",0
"169",0
"170",0
"171",0
"172",0 
...
"665",0

prob.csv
"","x"
"1",0.977465874525236
"2",0.989692657762578
"3",0.989692657762578
"4",0.988038430564019
"5",0.443188602491041
"6",0.409732585195485
...
"164",0.988607910625475
"165",0.986296936078692
"166",7.13529696560611e-05
"167",0.000419255989134081
"168",0.00295825183558019
"169",0.00182941235784709
"170",4.85601026999172e-09
"171",0.000953106471289961
"172",1.70252014430306e-05
...
"665",8.13413358866349e-08

1
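Could you please provide a reproducible example? - dayne
Please read how to create a reproducible example. You should edit your question to include something we can copy/paste into R to get the same error, including the library() calls needed to run the code. You're right that it should be easy, so it's unclear exactly how you've made it difficult. - MrFlick
Thanks for your comments. I have now included part of the real data. - user2942623
But you haven't provided the code that generates the error, and you haven't specified which library you are using. As it stands, this question cannot be answered. Please read MrFlick's comment again until you fully understand it. - Calimo
Thanks for the tip. I have now uploaded the code and data. - user2942623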
1 Answer

28

The problem was that my "labels" object was a numeric vector, but roc needs a factor. So I converted it:

labels <- factor(labels)

After that, it works fine.
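For anyone who wants to try the fix without the original CSV files, here is a minimal sketch. It assumes the AUC package is installed, and it uses synthetic data that only mimics the class balance described in the question (15 ones, 500 zeros). The names labels_num, labels_fac, roc_obj and auc_value are made up for illustration.

```r
library(AUC)  # assumption: the AUC package is installed

# Synthetic stand-ins for labels.csv and prob.csv (NOT the asker's data)
set.seed(1)
labels_num <- c(rep(1, 15), rep(0, 500))           # numeric labels, as in the question
prob <- c(runif(15, 0.4, 1), runif(500, 0, 0.6))   # made-up predicted probabilities

# The fix from the answer: pass the labels to roc() as a factor with levels 0 and 1
labels_fac <- factor(labels_num, levels = c(0, 1))

roc_obj <- roc(prob, labels_fac)   # predictions first, labels second
auc_value <- auc(roc_obj)          # area under the ROC curve
print(auc_value)
```

Running class(labels_fac) before the call to roc is a quick way to confirm that the labels really are a factor at that point.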

Thank you for your time.

