How do I compute a confusion matrix for multiclass classification in scikit-learn?

Question

python scikit-learn classification confusion-matrix

10

I have a multiclass classification task. When I run the following script, which is based on a scikit-learn example:

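from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsRestClassifier

classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))

y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)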

I get this error:

File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
    raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported

I tried passing labels=classifier.classes_ to confusion_matrix(), but it did not help.

y_test and y_pred look like this:

y_test =
array([[0, 0, 0, 1, 0, 0],
   [0, 0, 0, 0, 1, 0],
   [0, 1, 0, 0, 0, 0],
   ..., 
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 1, 0, 0],
   [0, 0, 0, 0, 1, 0]])


y_pred = 
array([[0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 0],
   ..., 
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 0]])

- YNR
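
The error message names the target type that scikit-learn inferred. As a minimal sketch (the small arrays below are made-up stand-ins, not the asker's data), `sklearn.utils.multiclass.type_of_target` can be used to check how a one-hot array and its 1-D index form are classified:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for a one-hot encoded target like the y_test above
y_onehot = np.array([[0, 0, 1],
                     [0, 1, 0],
                     [1, 0, 0]])

# The same targets as a 1-D array of class indices
y_labels = y_onehot.argmax(axis=1)

onehot_type = type_of_target(y_onehot)
label_type = type_of_target(y_labels)

print(onehot_type)  # the 2-D indicator form
print(label_type)   # the 1-D label form
```

The 2-D form is the one reported in the traceback, which suggests the fix lies in the shape of y rather than in the labels argument.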

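Why are your y_pred and y_test one-hot encoded arrays? What are your original class labels? You should post your code, starting from how you transform y. - Vivek Kumar

@VivekKumar I binarized y_train and y_test with y_test = label_binarize(y_test, classes=[0, 1, 2, 3, 4, 5]) so that they could be used with OneVsRestClassifier(). - YNR

You should pass your original (non-binarized) classes to confusion_matrix. You need to inverse-transform your y_pred to recover its original classes. - Vivek Kumar

@VivekKumar Thanks. I used the non-binarized version and the problem was solved. - YNR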
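
The resolution from the comments can be sketched as follows. This is only an illustration under assumptions: the data comes from `make_classification` rather than the asker's dataset, there are 3 classes instead of 6, and `n_estimators` is reduced to keep it quick. The point is that y stays a 1-D array of class labels throughout, so `predict` returns 1-D labels that `confusion_matrix` accepts.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class data (an assumption; stands in for the real dataset)
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# y_train is left as 1-D integer labels, with no label_binarize step
classifier = OneVsRestClassifier(
    GradientBoostingClassifier(n_estimators=20, max_depth=3, learning_rate=0.1))
y_pred = classifier.fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes
cnf_matrix = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cnf_matrix)
```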

3 Answers

8

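First, you need to create the array of output labels. Suppose you have three classes, 'cat', 'dog', and 'house', indexed 0, 1, 2, and the predictions for two samples are 'dog' and 'house'. Your output will be:

y_pred = np.array([[0, 1, 0], [0, 0, 1]])

Running y_pred.argmax(1) gives [1, 2]. This array holds the original label indices, that is, ['dog', 'house'].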

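import numpy as np
from keras.utils import to_categorical  # older Keras exposed this as np_utils.to_categorical

num_classes = 3

# from label indices to categorical (one-hot)
y_prediction = np.array([1, 2])
y_categorical = to_categorical(y_prediction, num_classes)

# from categorical (one-hot) back to label indices
y_pred = y_categorical.argmax(1)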
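
The same round trip can be sketched with NumPy alone, which avoids the Keras dependency. The `class_names` array and the use of `np.eye` for one-hot encoding are illustrative choices, not part of the answer above:

```python
import numpy as np

# Class names in index order: 0 -> cat, 1 -> dog, 2 -> house
class_names = np.array(["cat", "dog", "house"])

# Label indices for two samples predicted as 'dog' and 'house'
y_prediction = np.array([1, 2])

# Index -> one-hot: pick the matching rows of an identity matrix
y_onehot = np.eye(len(class_names), dtype=int)[y_prediction]

# One-hot -> index: position of the 1 in each row
y_index = y_onehot.argmax(axis=1)

# Index -> class name
y_names = class_names[y_index]

print(y_onehot)
print(y_index)
print(y_names)
```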

- Naomi Fridman

0

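I simply subtracted the y_test output matrix from the predicted y_pred matrix, keeping both in categorical (one-hot) format. Where the result is -1 I treat it as a false negative, and where it is 1 as a false positive.

Next: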

if output_matrix[i,j] == 1 and predictions_matrix[i,j] == 1:  
    produced_matrix[i,j] = 2

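This gives the following encoding:

-1: false negative
1: false positive
0: true negative
2: true positive

Finally, with some simple counting you can produce any confusion matrix.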
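
The encoding above can be sketched in vectorized form as follows. The two small arrays are made-up examples, and the variable names are hypothetical. Note that counting the codes per column yields per-class TP/FP/FN/TN totals (one 2x2 table per class), which is a different summary from the single class-by-class matrix returned by `confusion_matrix`.

```python
import numpy as np

# Made-up one-hot targets (true classes 1, 2, 0, 1) and predictions (1, 1, 0, 2)
output_matrix = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])
predictions_matrix = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])

# prediction - truth: -1 = false negative, 1 = false positive, 0 = agreement
produced_matrix = predictions_matrix - output_matrix

# Separate true positives (both 1) from true negatives (both 0)
produced_matrix[(output_matrix == 1) & (predictions_matrix == 1)] = 2

# Count each code per class (column)
tp = (produced_matrix == 2).sum(axis=0)
fp = (produced_matrix == 1).sum(axis=0)
fn = (produced_matrix == -1).sum(axis=0)
tn = (produced_matrix == 0).sum(axis=0)

print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)
```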

- mcchran

- Azhar Khan · Accepted Answer

Here is the solution that worked for me:

import numpy as np
from sklearn.metrics import confusion_matrix

# Convert each one-hot row to its class index
y_test_non_category = [np.argmax(t) for t in y_test]
y_predict_non_category = [np.argmax(t) for t in y_predict]

conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)

Here y_test and y_predict are categorical variables, such as one-hot vectors.
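
One caveat worth checking before relying on argmax: a prediction row that is all zeros (as in the y_pred shown in the question) has no 1 to locate, so argmax returns index 0 and the row is silently counted as class 0. A minimal sketch of one way to surface this, assuming made-up data and using -1 as a hypothetical "no class predicted" label:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up one-hot data; the second prediction row is all zeros
y_test = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
y_predict = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]])

y_true_idx = y_test.argmax(axis=1)
y_pred_idx = y_predict.argmax(axis=1)

# Mark rows with no predicted class instead of letting them count as class 0
no_prediction = y_predict.sum(axis=1) == 0
y_pred_idx[no_prediction] = -1  # hypothetical sentinel label

# Include the sentinel as an extra label; rows are true, columns are predicted
labels = [-1, 0, 1, 2]
conf_mat = confusion_matrix(y_true_idx, y_pred_idx, labels=labels)
print(conf_mat)
```

The first column then shows how many samples of each true class received no prediction at all.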