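I am new to array programming and am finding it hard to interpret the sklearn.metrics.label_ranking_average_precision_score function. I need help understanding how it is computed, and I would welcome any tips for learning NumPy array programming.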
In general, I know that precision is
((True Positives) / (True Positives + False Positives))
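As a reference point, that definition can be written as a tiny Python sketch. The function name `precision` and the choice to return 0.0 when nothing was predicted positive are my own assumptions for illustration, not taken from any library.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # Assumed convention: no positive predictions gives precision 0.0.
        return 0.0
    return true_positives / predicted_positives


# 3 correct positive predictions and 1 incorrect one.
example_precision = precision(3, 1)
print(example_precision)
```

The ranking-based score asked about below applies this same ratio, but to a list of labels sorted by score and cut off at each true label.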
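The reason I am asking is that I came across the Kaggle audio tagging competition and saw this post, which says they use the LWRAP function to compute the score when a response has more than one correct label. I started reading to find out how this score is computed, but I found it hard to interpret. My two difficulties are:
1) Interpreting the math function from the documentation; I am not sure how ranks are used in the score calculation
2) Interpreting the NumPy array operations in the code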
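To make the first difficulty concrete, here is a minimal sketch of how I currently read the documented formula, written with plain loops instead of vectorized code. The helper name `lrap_by_hand` is hypothetical, and the sketch assumes every sample has at least one true label and at least one false label; it is my reading of the formula, not sklearn's actual implementation.

```python
import numpy as np


def lrap_by_hand(y_true, y_score):
    """Loop-based label ranking average precision (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    per_sample = []
    for truth, scores in zip(y_true, y_score):
        precisions = []
        for j in np.flatnonzero(truth):
            # Labels scored at least as high as true label j.
            at_or_above = scores >= scores[j]
            rank = at_or_above.sum()            # rank of label j (1 = best)
            hits = (at_or_above & truth).sum()  # true labels within that rank
            precisions.append(hits / rank)
        # Average over the true labels of this sample.
        per_sample.append(np.mean(precisions))
    # Average over samples.
    return float(np.mean(per_sample))


y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
by_hand = lrap_by_hand(y_true, y_score)
print(by_hand)
```

In the first sample the true label is ranked 2nd, giving 1/2; in the second it is ranked 3rd, giving 1/3; the mean is 5/12, about 0.4167. These inputs are the example from the sklearn documentation for label_ranking_average_precision_score, which reports 0.416... for them.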
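The function I am reading comes from a Google Colab document. I then tried reading the sklearn documentation, but could not understand it properly.
The code for the one-sample calculation is as follows: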
import numpy as np


# Core calculation of label precisions for one test sample.
def _one_sample_positive_class_precisions(scores, truth):
    """Calculate precisions for each true class for a single sample.

    Args:
      scores: np.array of (num_classes,) giving the individual classifier scores.
      truth: np.array of (num_classes,) bools indicating which classes are true.

    Returns:
      pos_class_indices: np.array of indices of the true classes for this sample.
      pos_class_precisions: np.array of precisions corresponding to each of those
        classes.
    """
    num_classes = scores.shape[0]
    pos_class_indices = np.flatnonzero(truth > 0)
    # Only calculate precisions if there are some true classes.
    if not len(pos_class_indices):
        return pos_class_indices, np.zeros(0)
    # Retrieval list of classes for this sample (best score first).
    retrieved_classes = np.argsort(scores)[::-1]
    # class_rankings[top_scoring_class_index] == 0 etc.
    # (np.int, np.bool and np.float were removed in NumPy 1.24; use builtins.)
    class_rankings = np.zeros(num_classes, dtype=int)
    class_rankings[retrieved_classes] = np.arange(num_classes)
    # Which of these is a true label?
    retrieved_class_true = np.zeros(num_classes, dtype=bool)
    retrieved_class_true[class_rankings[pos_class_indices]] = True
    # Num hits for every truncated retrieval list.
    retrieved_cumulative_hits = np.cumsum(retrieved_class_true)
    # Precision of retrieval list truncated at each hit, in order of pos_labels.
    precision_at_hits = (
        retrieved_cumulative_hits[class_rankings[pos_class_indices]] /
        (1 + class_rankings[pos_class_indices].astype(float)))
    return pos_class_indices, precision_at_hits
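For the second difficulty, the array operations above can be traced one step at a time on a small input. The scores and truth values below are made up for illustration, and the printed intermediate arrays are what I would expect each step of the function to hold for this input.

```python
import numpy as np

# Made-up sample: 4 classes, classes 1 and 2 are the true labels.
scores = np.array([0.1, 0.9, 0.4, 0.7])
truth = np.array([False, True, True, False])

# Indices of the true classes.
pos_class_indices = np.flatnonzero(truth)            # [1 2]

# Class indices sorted from highest score to lowest.
retrieved_classes = np.argsort(scores)[::-1]         # [1 3 2 0]

# Inverse permutation: class_rankings[c] is the 0-based rank of class c.
class_rankings = np.zeros(scores.shape[0], dtype=int)
class_rankings[retrieved_classes] = np.arange(scores.shape[0])  # [3 0 2 1]

# Mark which rank positions hold a true class.
retrieved_class_true = np.zeros(scores.shape[0], dtype=bool)
retrieved_class_true[class_rankings[pos_class_indices]] = True  # [T F T F]

# Running count of true classes seen down the ranked list.
retrieved_cumulative_hits = np.cumsum(retrieved_class_true)     # [1 1 2 2]

# Precision at each true class: hits so far / (rank + 1).
ranks_of_true = class_rankings[pos_class_indices]                # [0 2]
precision_at_hits = retrieved_cumulative_hits[ranks_of_true] / (1 + ranks_of_true)
print(precision_at_hits)
```

Class 1 is ranked first, so its precision is 1/1. Class 2 is ranked third with two true classes among the top three, so its precision is 2/3. The assignment `class_rankings[retrieved_classes] = np.arange(...)` is the step that turns "which class is at each rank" into "which rank each class has".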