What is the use of base_score in an xgboost multi-class model?

I am trying to understand how XGBoost works for binary and multi-class classification. In binary classification, I observed that base_score is treated as the starting probability, and it also has a significant effect on the calculated Gain and Cover.
In multi-class classification, I am not sure how the base_score parameter matters, because it showed me the same Gain and Cover values for different (any) base_score values.
Also, I am unable to figure out why there is a factor of 2 when calculating cover in the multi-class case, i.e. 2*p*(1-p).
Can someone help me with these two questions?

A discussion of applying base_score to a multiclass classifier is here: https://dev59.com/e1YN5IYBdhLWcg3wVm6n (does this help with the "first part" of your question?) - jared_mamrot
Yes, you need to read the whole page to find the relevant part: "Your answer for the two-class (binary) case doesn't make any sense in the multiclass case. See the equivalent base_margin default in the multiclass discussion they link to in #1380, where xgboost (pre-2017) used to assume base_score = 1/nclasses, a rather questionable prior if there is class imbalance, but they say 'it washes out if you use enough training steps', which is not good for out-of-the-box performance in data exploration." For more discussion: https://github.com/dmlc/xgboost/issues/2222 - jared_mamrot
I agree with the base_score = 1/nclasses point. But one thing I have noticed: in the binary classification case the base score is used as the initial probability and therefore affects the gain and cover values. In the multi-class case, however, whatever value I pass as the base score in R (.5, .6, .7), it is always overridden by 1/nclasses, and it gets added to the odds of the final leaf node. Could you explain why, in the multi-class case, it is added at the end to the leaf nodes rather than being treated as a starting probability as in binary classification? - jayantphor
Hopefully my answer helps explain what is going on. If anything is unclear, please leave a comment. - Alexander Pivovarov
I feel the xgboost documentation does a poor job of explaining what happens under the hood. I'm actually surprised that what I describe here isn't explicitly mentioned in the documentation. - Alexander Pivovarov
1 Answer

To answer your question, let's look at what actually happens in xgboost when you do multi-class classification with the multi:softmax objective and, say, 6 classes. If you train a classifier specifying num_boost_round=5, how many trees would you expect xgboost to train for you? The correct answer is 30 trees. The reason is that softmax expects num_classes=6 different scores for each training row, so that xgboost can compute gradients/hessians with respect to each of those 6 scores and use them to build a new tree for each of the scores (effectively updating 6 parallel models in order to output 6 updated scores per sample). To ask the xgboost classifier to output the final 6 values for each sample, e.g. from a test set, you need to call bst.predict(xg_test, output_margin=True) (where bst is your classifier and xg_test is the test set). The output of a regular bst.predict(xg_test) is effectively the same as picking the class with the highest of the 6 values from bst.predict(xg_test, output_margin=True).
If you are interested, you can look at the contents of all the trees with the bst.trees_to_dataframe() function (where bst is your trained classifier).
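As a quick sanity check (a sketch of my own, not part of the original example further below; the toy data and parameter values are made up), you can verify both claims - the 5 * 6 = 30 trees and the argmax relationship between the two predict calls:

import numpy as np
import xgboost as xgb

# Hypothetical toy data: 100 rows, 4 features, 6 classes.
X = np.random.rand(100, 4)
y = np.random.randint(0, 6, size=100)
dtrain = xgb.DMatrix(X, label=y)

bst = xgb.train({'objective': 'multi:softmax', 'num_class': 6}, dtrain, num_boost_round=5)

# One tree is grown per class per boosting round: 5 rounds * 6 classes = 30 trees.
print(len(bst.get_dump()))  # expected: 30

# Plain predict is equivalent to taking the argmax over the 6 raw scores of each row.
margins = bst.predict(dtrain, output_margin=True)  # shape (100, 6)
print(np.array_equal(bst.predict(dtrain), np.argmax(margins, axis=1)))  # expected: True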
Now to the question of what base_score does in the multi:softmax case. The answer is - it is added as a starting score to each of the 6 classes' scores, before any trees are added. So if you apply, e.g., base_score=42.0, you will be able to observe that all the values in bst.predict(xg_test, output_margin=True) are also increased by 42. At the same time, in the softmax case, increasing the scores of all classes by an equal amount changes nothing, so applying a base_score different from 0 has no visible effect in the multi:softmax case.
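The shift-invariance itself is easy to check outside of xgboost; here is a small numpy sketch of my own (the score values are just the case #1 margins from the output further below):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([0.50240415, 0.5003637, 0.49870378])
shifted = scores + 42.0  # what base_score=42.0 would do to every class score

# Adding the same constant to every class score leaves the softmax probabilities
# (and therefore the argmax / predicted class) unchanged.
print(np.allclose(softmax(scores), softmax(shifted)))  # True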
Compare this behaviour to binary classification. While it is almost identical to multi:softmax with 2 classes, the big difference is that xgboost only tries to produce 1 score, for class 1, leaving the score for class 0 fixed at 0.0. Because of that, when you use base_score in binary classification it is only added to the score of class 1, which increases the starting predicted probability of class 1. In theory it would be meaningful to pass multiple base scores for multiple classes (one per class), but you cannot do that with base_score. Instead, you can use the set_base_margin functionality applied to the training set, but it does not work very conveniently with the default predict, so after that you will need to always use it with output_margin=True and set the same values with set_base_margin as you used for your training data (if you want to use set_base_margin in the multi-class case you will need to flatten the margin values as suggested here).
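To see the contrast with the binary case, here is a separate small sketch of my own (toy random data and made-up parameter values): two binary:logistic boosters that differ only in base_score end up with different raw margins, i.e. the bias really does shift the single class-1 score instead of washing out:

import numpy as np
import xgboost as xgb

np.random.seed(0)
X = np.random.rand(200, 4)
y = np.random.binomial(1, X[:, 0])
dtrain = xgb.DMatrix(X, label=y)

def train_binary(base_score):
    params = {'objective': 'binary:logistic', 'eta': 0.1, 'max_depth': 1,
              'base_score': base_score}
    return xgb.train(params, dtrain, num_boost_round=3)

m_low = train_binary(0.5).predict(dtrain, output_margin=True)
m_high = train_binary(0.9).predict(dtrain, output_margin=True)

# Unlike the multi:softmax case, the margins differ: base_score biases the single
# class-1 score, and that bias does not cancel inside the sigmoid.
print(np.allclose(m_low, m_high))  # expected: False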
Here is an example of how they work:
import numpy as np
import xgboost as xgb
TRAIN = 1000
TEST = 2
F = 10

def gen_data(M):
    np_train_features = np.random.rand(M, F)
    np_train_labels = np.random.binomial(2, np_train_features[:,0])
    return xgb.DMatrix(np_train_features, label=np_train_labels)

def regenerate_data():
    np.random.seed(1)
    return gen_data(TRAIN), gen_data(TEST)

param = {}
param['objective'] = 'multi:softmax'
param['eta'] = 0.001
param['max_depth'] = 1
param['nthread'] = 4
param['num_class'] = 3


def sbm(xg_data, original_scores):
    # Repeat the per-class margins for every row (list * num_row is Python list
    # repetition) and pass them as one long array: one margin per class per row.
    xg_data.set_base_margin(np.array(original_scores * xg_data.num_row()).reshape(-1, 1))

num_round = 3

print("#1. No base_score, no set_base_margin")
xg_train, xg_test = regenerate_data()
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("Easy to see that in this case all scores/margins have 0.5 added to them initially, which is default value for base_score here for some bizzare reason, but it doesn't really affect anything, so no one cares.")
print()
bst1 = bst

print("#2. Use base_score")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to previous case.")
print()
bst2 = bst

print("#3. Use very large base_score and screw up numeric precision")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8e10
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have too big number added to them and xgboost thinks all probabilities are equal so picks class 0 as prediction.")
print("But the training actually was fine - only predict is being affect here. If you set normal base margins for test set you can see (also can look at bst.trees_to_dataframe()).")
xg_train, xg_test = regenerate_data() # if we don't regenerate the dataframe here xgboost seems to be either caching it or somehow else remembering that it didn't have base_margins and result will be different.
sbm(xg_test, [0.1, 0.1, 0.1])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print()
bst3 = bst

print("#4. Use set_base_margin for training")
xg_train, xg_test = regenerate_data()
# base_score is only used in train/test whenever set_base_margin is not applied.
# Peculiarly, the trained model will remember this value even if it was trained on a
# dataset which had set_base_margin applied. In that case this base_score will be used
# if and only if the test set passed to `bst.predict` didn't have `set_base_margin` applied to it.
param['base_score'] = 4.2
sbm(xg_train, [-0.4, 0., 0.8])
bst = xgb.train(param, xg_train, num_round)
sbm(xg_test, [-0.4, 0., 0.8])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("Working - the base margin values added to the classes skewing predictions due to low eta and small number of boosting rounds.")
print("If we don't set base margins for `predict` input it will use base_score to start all scores with. Bizzare, right? But then again, not much difference on what to add here if we are adding same value to all classes' scores.")
xg_train, xg_test = regenerate_data() # regenerate test and don't set the base margin values
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print()
bst4 = bst

print("Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.")
print(bst1.trees_to_dataframe().iloc[1,])
print()
print(bst2.trees_to_dataframe().iloc[1,])
print()
print(bst3.trees_to_dataframe().iloc[1,])
print()
print(bst4.trees_to_dataframe().iloc[1,])

The output of this is the following:
#1. No base_score, no set_base_margin
[[0.50240415 0.5003637  0.49870378]
 [0.49863306 0.5003637  0.49870378]]
[0. 1.]
Easy to see that in this case all scores/margins have 0.5 added to them initially, which is the default value for base_score here for some bizarre reason, but it doesn't really affect anything, so no one cares.

#2. Use base_score
[[5.8024044 5.800364  5.798704 ]
 [5.798633  5.800364  5.798704 ]]
[0. 1.]
In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to the previous case.

#3. Use very large base_score and screw up numeric precision
[[5.8e+10 5.8e+10 5.8e+10]
 [5.8e+10 5.8e+10 5.8e+10]]
[0. 0.]
In this case all scores/margins have too big a number added to them and xgboost thinks all probabilities are equal, so it picks class 0 as the prediction.
But the training actually was fine - only predict is being affected here. If you set normal base margins for the test set you can see that (you can also look at bst.trees_to_dataframe()).
[[0.10240632 0.10036398 0.09870315]
 [0.09863247 0.10036398 0.09870315]]
[0. 1.]

#4. Use set_base_margin for training
[[-0.39458954  0.00102317  0.7973728 ]
 [-0.40044016  0.00102317  0.7973728 ]]
[2. 2.]
Working - the base margin values added to the classes are skewing the predictions, due to the low eta and small number of boosting rounds.
If we don't set base margins for the `predict` input it will use base_score to start all scores with. Bizarre, right? But then again, it doesn't matter much what we add here if we are adding the same value to all classes' scores.
[[4.2054105 4.201023  4.1973724]
 [4.1995597 4.201023  4.1973724]]
[0. 1.]

Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.
Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                0
Node                1
ID                0-1
Feature          Leaf
Split             NaN
Yes               NaN
No                NaN
Missing           NaN
Gain       0.00180733
Cover         100.858
Name: 1, dtype: object

Thank you for the detailed explanation! When trying to answer @jayantphor's question I experimented with setting base_margin and using output_margin=True on a sample dataset, but I couldn't see the effect I expected - perhaps I need to flatten the base_margin values as you describe. Could you provide a reproducible example of how to effectively set base_margin values for a multiclass XGBoost classification problem (in R or Python)? - jared_mamrot
Thanks @Alexander and jared_mamrot for the help. So far I have observed the following:
  1. In binary xgboost, cover and value are influenced by the base score, and there is no separate addition of the base score.
  2. In multi-class classification, however, cover, gain and value are not affected, whatever base_score is used. In addition, the base score is added to the value of the final leaf node.
- jayantphor
I still cannot figure out the following two things:
  1. We know that in the multi-class case base_score should cancel out when the values are normalised into probabilities. But should the gain and cover values include it?
  2. When modelling, say, 3 outcomes: p1_unadj = exp(z1), p2_unadj = exp(z2), p3_unadj = exp(z3), and then p1_adj = p1_unadj / sum(p1_unadj, p2_unadj, p3_unadj). Here base_score may cancel out, but the tree nodes should carry the effect of base_score, and the gain and cover values should be different for different base_score values.
- jayantphor
@jared_mamrot - set_base_margin is implemented quite differently from how base_score works, even though you might think the former is just a more general way of doing the latter. I will try to prepare some reasonable examples. - Alexander Pivovarov
@jayantphor - the behaviour you describe is exactly due to the fact that the binary classifier produces one score/output/margin per sample (all the same thing); in multi:softmax notation this would translate to the score of class "1", while the score of class "0" is fixed at the value 0.0. Because of this asymmetry, base_score has an effect in binary classification. In multi-class classification, adding the same base_score to all num_classes scores does not affect the gradient computation (the gradient of the loss with respect to the scores), so what you end up seeing is the same gain and the same cover values (and no effect on training overall). - Alexander Pivovarov
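A quick way to convince yourself of this point (a sketch of my own, using the gradient p - onehot of softmax cross-entropy and the hessian 2*p*(1-p) that the question observed in the cover values):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_hess(scores, label):
    p = softmax(scores)
    onehot = np.eye(len(scores))[label]
    g = p - onehot           # gradient of softmax cross-entropy w.r.t. each class score
    h = 2.0 * p * (1.0 - p)  # hessian, with the factor of 2 mentioned in the question
    return g, h

scores = np.array([0.1, -0.3, 0.7])
g0, h0 = grad_hess(scores, label=2)
g1, h1 = grad_hess(scores + 5.8, label=2)  # same base_score-like shift on every class

# Identical gradients/hessians -> identical trees -> identical gain and cover.
print(np.allclose(g0, g1), np.allclose(h0, h1))  # True True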
@jared_mamrot - added a reproducible example. - Alexander Pivovarov
