AxisError: axis 1 is out of bounds for array of dimension 1 when calculating AUC

Question

python-3.x scikit-learn multiclass-classification

12

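I have a classification problem where I have the pixel values of an 8x8 image and the digit that the image represents. My task is to predict the digit (the 'Number' attribute) from the pixel values using RandomForestClassifier. The digit can take values from 0 to 9.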

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(train_df[input_var], train_df[target])
test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1]
roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovr")

This raises an AxisError.

Traceback (most recent call last):
  File "dap_hazi_4.py", line 44
    roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovo")
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 383
    roc_auc_score(multi_class, average, sample_weight)
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 440
    if not np.allclose(1, y_score.sum(axis=1)):
  File "/home/balint/.local/lib/python3.6/site-packages/numpy/core/_methods.py", line 38
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)
AxisError: axis 1 is out of bounds for array of dimension 1

- Bálint Béres

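I managed to solve my problem. Because my classification problem is multiclass, I had to binarize the target column before fitting and before computing the AUC score. - Bálint Béres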

What exactly did you do, @Bálint Béres? - Manuel

I used @mclzc's answer to "Calculate sklearn.roc_auc_score for multi-class". - Bálint Béres

5

If you get this error when using functions such as sklearn.model_selection.cross_validate, you only need to set needs_proba=True, as in make_scorer(roc_auc_score, multi_class='ovo', needs_proba=True). - lhaferkamp
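
The comment above is about making the cross-validation scorer receive class probabilities rather than predicted labels. A minimal sketch of the same idea, using scikit-learn's built-in scorer name "roc_auc_ovo" in place of a hand-built make_scorer call. The digits dataset, the estimator settings, and cv=3 are assumptions chosen for illustration, not the asker's setup.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Stand-in data (assumption): 8x8 digit images with labels 0-9.
X, y = load_digits(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42)

# "roc_auc_ovo" is a predefined scorer that evaluates one-vs-one AUC
# from the estimator's predict_proba output, not from predict().
results = cross_validate(model, X, y, cv=3, scoring="roc_auc_ovo")
scores = results["test_score"]
print(scores)
```

The string "roc_auc_ovr" selects the one-vs-rest variant in the same way.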

3 Answers

4

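Actually, since your problem is multiclass, the labels must be one-hot encoded. The "multi_class" parameter only works once the labels are one-hot encoded, so providing one-hot encoded labels resolves the error.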

Suppose you have 100 test labels containing 5 unique classes. The test label matrix must then have shape (100, 5), not (100, 1).

- Lalith Bharadwaj Baru

I have the same problem here. How do I convert my pred from (45520,) to (45520, 5)? - arilwan

If you are using TensorFlow or Keras, you can do this with tf.keras.utils.to_categorical(.) or keras.utils.to_categorical(.). - Lalith Bharadwaj Baru

If anyone is using sklearn, use LabelBinarizer to convert the labels to one-hot format. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html#sklearn.preprocessing.LabelBinarizer - Murilo
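
A small sketch of what the LabelBinarizer suggestion above looks like in practice. The label values are made up for illustration; the point is only the change in shape from (n_samples,) to (n_samples, n_classes).

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical integer labels with three distinct classes.
labels = np.array([0, 2, 1, 2, 0])

binarizer = LabelBinarizer()
one_hot = binarizer.fit_transform(labels)  # one column per class

print(labels.shape)   # (5,)
print(one_hot.shape)  # (5, 3)

# inverse_transform maps the one-hot rows back to the original labels.
restored = binarizer.inverse_transform(one_hot)
```

Note that with only two classes LabelBinarizer returns a single column, shape (n_samples, 1), rather than two.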

1

Are you sure the [:,1] in test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1] is correct? That probably yields a 1-D array.

- Minh-Long Luu

- bhola prasad · Accepted Answer

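As others have suggested, the error comes from the problem being multiclass. All you need to do is predict probabilities instead of classes. I ran into this problem before, and doing this solved it.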

Here is how to do it:

# you might be predicting the class this way
pred = clf.predict(X_valid)

# change it to predict the probabilities which solves the AxisError problem.
pred_prob = clf.predict_proba(X_valid)
roc_auc_score(y_valid, pred_prob, multi_class='ovr')
0.8164900342274142

# shape before
pred.shape
(256,)
pred[:5]
array([1, 2, 1, 1, 2])

# shape after
pred_prob.shape
(256, 3)
pred_prob[:5]
array([[0.  , 1.  , 0.  ],
       [0.02, 0.12, 0.86],
       [0.  , 0.97, 0.03],
       [0.  , 0.8 , 0.2 ],
       [0.  , 0.42, 0.58]])
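
For reference, an end-to-end sketch that applies this answer to a setup like the one in the question. It assumes scikit-learn's bundled digits dataset (8x8 images, labels 0-9) as a stand-in for the asker's train_df and test_df, which are not shown in full. The names X_train, X_test, proba, and one_column are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 64 pixel features per image, labels 0-9.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(X_train, y_train)

# Keep every column: one probability per class, shape (n_samples, 10).
proba = forest_model.predict_proba(X_test)

# Slicing out a single column, as in the question, leaves a 1-D array,
# so y_score.sum(axis=1) inside roc_auc_score has no axis 1 to sum over.
one_column = proba[:, 1]

auc = roc_auc_score(y_test, proba, average="macro", multi_class="ovr")
print(proba.shape, one_column.shape, auc)
```

Here the true labels stay as plain integers; only the score array needs one column per class.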