XGBoost raises ValueError when using an sklearn metric

Question

python machine-learning scikit-learn classification xgboost

I am trying to use XGBClassifier with a metric from sklearn.metrics as eval_metric, together with a validation set, as the XGBoost documentation recommends.

A minimal working example (MWE) looks like this:

import numpy as np
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

x_train, y_train = np.random.rand(10,3), np.where(np.random.rand(10,)>0.5, 1, 0)
x_valid, y_valid = np.random.rand(5,3), np.where(np.random.rand(5,)>0.5, 1, 0)

model = XGBClassifier(
    n_estimators=100,
    eval_metric=accuracy_score
)

model.fit(
    X=x_train, y=y_train,
    eval_set=[(x_train, y_train), (x_valid, y_valid)]
)

This code raises the following error:

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-5-b63cd5cfabda> in <cell line: 1>()
----> 1 model.fit(
      2     X=x_train, y=y_train,
      3     eval_set=[(x_train, y_train), (x_valid, y_valid)]
      4 )

9 frames

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
     93 
     94     if len(y_type) > 1:
---> 95         raise ValueError(
     96             "Classification metrics can't handle a mix of {0} and {1} targets".format(
     97                 type_true, type_pred

ValueError: Classification metrics can't handle a mix of binary and continuous targets

The same code works if I comment out the eval_set line or use eval_metric="error" instead. What am I doing wrong, and how can I fix it?

Edit: I would like to use other metrics in the future, such as sklearn.metrics.balanced_accuracy_score or sklearn.metrics.recall_score.

- Paquique

1 Answer

- Learning is a mess · Accepted Answer

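The reason is that xgboost feeds probability outputs to the evaluation function (here, your accuracy metric), whereas sklearn's accuracy_score expects hard decisions (1 or 0) rather than probabilities. It does not know your decision threshold, so it cannot map the probabilities to hard decisions.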
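To see the mismatch in isolation, here is a minimal sketch that uses only sklearn. The `y_prob` array is made up for illustration and merely stands in for the probabilities XGBoost hands to the metric. It is not taken from a fitted model.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0])
# Hypothetical probabilities, standing in for what XGBoost passes to the metric
y_prob = np.array([0.2, 0.7, 0.4, 0.1])

# Passing probabilities directly reproduces the error from the question
try:
    accuracy_score(y_true, y_prob)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Thresholding at 0.5 yields hard 0/1 labels that the metric accepts
y_hard = (y_prob > 0.5).astype(int)
acc = accuracy_score(y_true, y_hard)
print(acc)
```

Once the probabilities are turned into labels, the metric computes normally. Three of the four thresholded labels match here, so the accuracy is 0.75.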

You can use

import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=100,
    eval_metric='error'
)

or

model = xgb.XGBClassifier(
    n_estimators=100,
    eval_metric='error@0.6'
)

to change the threshold from 0.5 to 0.6. See https://xgboost.readthedocs.io/en/stable/parameter.html.

Recall is not among xgboost's built-in options, so you need to threshold the predictions manually:

import numpy as np
import xgboost as xgb
from sklearn.metrics import recall_score

x_train, y_train = np.random.rand(10,3), np.where(np.random.rand(10,)>0.5, 1, 0)
x_valid, y_valid = np.random.rand(5,3), np.where(np.random.rand(5,)>0.5, 1, 0)

# xgboost passes predicted probabilities, so convert them to hard labels first
def thresholded_recall_score(y_true, y_preds, thresh=0.5):
    return recall_score(y_true, y_preds > thresh)

model = xgb.XGBClassifier(
    n_estimators=100,
    eval_metric=thresholded_recall_score
)

model.fit(
    X=x_train, y=y_train,
    eval_set=[(x_train, y_train), (x_valid, y_valid)]
)
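Since the question also mentions balanced_accuracy_score, the same idea can be generalized. The sketch below is one possible approach, not part of xgboost or sklearn: `make_thresholded_metric` is a hypothetical helper name, and it assumes binary classification where the metric receives a 1-D array of positive-class probabilities. The sample arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score


def make_thresholded_metric(metric, thresh=0.5):
    """Wrap a label-based sklearn metric so it accepts probabilities.

    Hypothetical helper; assumes binary, 1-D positive-class probabilities.
    """
    def wrapped(y_true, y_prob):
        # Convert probabilities to hard 0/1 labels before scoring
        y_hard = (np.asarray(y_prob) > thresh).astype(int)
        return metric(y_true, y_hard)

    # Give the wrapper a readable name derived from the wrapped metric
    wrapped.__name__ = f"thresholded_{metric.__name__}"
    return wrapped


# Made-up data to exercise the wrappers without fitting a model
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.2, 0.7, 0.4, 0.1])

thresholded_recall = make_thresholded_metric(recall_score)
thresholded_bal_acc = make_thresholded_metric(balanced_accuracy_score)
lenient_recall = make_thresholded_metric(recall_score, thresh=0.3)

print(thresholded_recall(y_true, y_prob))
print(thresholded_bal_acc(y_true, y_prob))
print(lenient_recall(y_true, y_prob))
```

A wrapped metric can then be passed the same way as thresholded_recall_score above, for example eval_metric=make_thresholded_metric(balanced_accuracy_score), on an xgboost version that accepts a callable there.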