Suppose my retrieval system has an NDCG score of 0.8. How do I interpret this score? How do I tell the reader that this score is significant?
Example: Suppose we have [Doc_1, Doc_2, Doc_3, Doc_4, Doc_5]
Doc_1 is 100% relevant
Doc_2 is 70% relevant
Doc_3 is 95% relevant
Doc_4 is 20% relevant
Doc_5 is 100% relevant
CG = 100 + 70 + 95 + 20 + 100 ###(Index of the doc doesn't matter)
= 385
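The cumulative gain above can be checked with a minimal Python sketch. The names `relevance` and `cumulative_gain` are chosen here for illustration only; the scores are the percentages listed in the example.

```python
# Relevance scores (in percent) from the example, in ranked order.
relevance = [100, 70, 95, 20, 100]

def cumulative_gain(scores):
    """Sum of relevance scores; the position of each document plays no role."""
    return sum(scores)

cg = cumulative_gain(relevance)
# Reversing the ranking leaves CG unchanged, because CG ignores order.
cg_reversed = cumulative_gain(list(reversed(relevance)))
print(cg, cg_reversed)
```

Because CG is the same for any ordering of the same documents, it cannot tell a good ranking from a bad one, which is what the position discount in DCG addresses.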
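Discounted cumulative gain (DCG) is an evaluation metric used in information retrieval. It divides each document's relevance by a logarithmic discount based on its rank position.
DCG = SUM( relevancyAt(index) / log2(index + 1) ) ### where index runs from 1 to 5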
Doc_1 is 100 / log2(2) = 100.00
Doc_2 is 70 / log2(3) = 044.17
Doc_3 is 95 / log2(4) = 047.50
Doc_4 is 20 / log2(5) = 008.61
Doc_5 is 100 / log2(6) = 038.69
DCG = 100 + 44.17 + 47.5 + 8.61 + 38.69
DCG = 238.97
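The DCG sum can be reproduced with a short Python sketch, assuming 1-based rank positions and the log2(index + 1) discount used in this example. The function name `dcg` is an illustrative choice, not a library API.

```python
import math

# Relevance scores (in percent) from the example, in ranked order.
relevance = [100, 70, 95, 20, 100]

def dcg(scores):
    """Discounted cumulative gain: each score is divided by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(scores, start=1)  # ranks start at 1
    )

dcg_value = dcg(relevance)
print(round(dcg_value, 2))
```

Summing the unrounded terms gives about 238.96; the hand calculation arrives at 238.97 because each term is rounded to two decimals before adding.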
IDCG is the DCG of the ideal ordering (sorted by relevance, highest first): Doc_1, Doc_5, Doc_3, Doc_2, Doc_4
Doc_1 is 100 / log2(2) = 100.00
Doc_5 is 100 / log2(3) = 063.09
Doc_3 is 95 / log2(4) = 047.50
Doc_2 is 70 / log2(5) = 030.15
Doc_4 is 20 / log2(6) = 007.74
IDCG = 100 + 63.09 + 47.5 + 30.15 + 7.74
IDCG = 248.48
nDCG(5) = DCG / IDCG
= 238.97 / 248.48
= 0.96
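The full normalization can be sketched in Python as follows. This assumes the relevance scores listed at the top of the example (Doc_2 = 70) and obtains the ideal ordering by sorting those same scores in descending order; `dcg` and `ndcg` are illustrative names, not a library API.

```python
import math

def dcg(scores):
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks from 1."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(scores, start=1)
    )

def ndcg(scores):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(scores, reverse=True)  # best possible ordering
    ideal_dcg = dcg(ideal)
    # Guard against division by zero when every score is 0.
    return dcg(scores) / ideal_dcg if ideal_dcg > 0 else 0.0

# Doc_1 .. Doc_5 in the order the system returned them.
relevance = [100, 70, 95, 20, 100]
ndcg_value = ndcg(relevance)
print(round(ndcg_value, 4))
```

With Doc_2 scored at 70, this evaluates to roughly 0.96. A ranking that is already in ideal order scores exactly 1.0, so a score such as 0.8 can be read as the ranking achieving 80% of the DCG that the ideal ordering of the same documents would achieve.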
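Conclusion:
In the example above, the nDCG value is not a prediction accuracy; it measures how effectively the documents are ranked. Gain accumulates from the top of the result list to the bottom, and the gain of each result is discounted more heavily at lower ranks.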
Reference: Wikipedia