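To compute an exact, deterministic AUC score, we should aggregate by "confid" to handle the case where the confidence values are not all unique. Then we simply compute the trapezoid area for each unique confidence value and sum the results. We also need to check for the case where all labels are zero or all labels are one: the AUC is undefined there, and the query returns 0. Note that the multiplications can overflow the integer type; you can use BIGINT to prevent that.
MS SQL implementation: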
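select
    -- Return 0 when all labels are 0 or all are 1 (the denominator would be 0).
    -- If the counts may overflow INT, cast Ones and Zeros to BIGINT first.
    IIF(SUM(Ones) * SUM(Zeros) <> 0,
        -- Each confid group adds Ones * (Height + 0.5 * Zeros): full credit for
        -- zeros ranked below it, half credit for zeros tied with it.
        SUM(IIF(Zeros * Ones > 0, 0.5 * Zeros * Ones + Height * Ones, Height * Ones))
            / (SUM(Ones) * SUM(Zeros)),
        0)
from (
    select
        Zeros,
        Ones,
        -- Running count of zeros. A mixed group's zeros are deferred to the next
        -- row (via PrevZeros), so for any row with Ones > 0 Height counts only
        -- the zeros with a strictly lower confid.
        SUM(IIF(Zeros * Ones > 0, 0, Zeros) + IIF(PrevZeros * PrevOnes > 0, PrevZeros, 0))
            OVER (ORDER BY PD) as Height
    from (
        -- One row per distinct confidence value.
        select
            confid as PD,
            SUM(label) as Ones,
            SUM(ABS(1 - label)) as Zeros,
            LAG(SUM(label), 1, NULL) OVER (ORDER BY confid) as PrevOnes,
            LAG(SUM(ABS(1 - label)), 1, NULL) OVER (ORDER BY confid) as PrevZeros
        from T
        group by confid
    ) q1
) q2;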
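
To sanity-check the query on a small sample, the same tie-aware computation can be sketched in Python. This is only an illustrative cross-check, not part of the SQL solution: the function name tie_aware_auc and the input shape (an iterable of (confid, label) pairs with integer labels 0 or 1) are assumptions made for the example. It follows the query's convention of returning 0 when only one class is present.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def tie_aware_auc(rows: Iterable[Tuple[float, int]]) -> float:
    """AUC from (confid, label) pairs, label in {0, 1}; 0.0 if one class only."""
    # Aggregate by confidence value, like GROUP BY confid in the query.
    groups = defaultdict(lambda: [0, 0])  # confid -> [zeros, ones]
    for confid, label in rows:
        groups[confid][label] += 1  # label 0 -> zeros slot, label 1 -> ones slot

    total_zeros = sum(zeros for zeros, _ in groups.values())
    total_ones = sum(ones for _, ones in groups.values())
    if total_zeros == 0 or total_ones == 0:
        return 0.0  # same convention as the query for single-class input

    area = 0.0
    zeros_below = 0  # plays the role of Height in the query
    for confid in sorted(groups):
        zeros, ones = groups[confid]
        # Full credit for zeros ranked below, half credit for tied zeros.
        area += ones * (zeros_below + 0.5 * zeros)
        zeros_below += zeros
    return area / (total_ones * total_zeros)


if __name__ == "__main__":
    sample = [(0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1)]
    print(tie_aware_auc(sample))  # 0.75
```

Loading the same sample rows into T and running the query should give the same value, up to the decimal formatting of the SQL result.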