What is the use of base_score in an xgboost multi-class model?

I am trying to understand how XGBoost works for binary and multi-class classification. In binary classification, I observed that base_score is treated as the starting probability, and it also has a significant effect on the calculated Gain and Cover.
In multi-class classification, I am not sure how the base_score parameter matters, because it showed me the same Gain and Cover values for different (any) base_score values.
Also, I am unable to figure out why there is a factor of 2 when calculating cover in the multi-class case, i.e. 2*p*(1-p).
Can someone help me with these two questions?

A discussion of applying base_score to a multiclass classifier is here: https://dev59.com/e1YN5IYBdhLWcg3wVm6n (does this help with the "first part" of your question?) - jared_mamrot
Yes, you need to read the whole page to find the relevant part: "Your answer for the two-class (binary) case doesn't make any sense in the multiclass case. See the equivalent base_margin default in the multiclass discussion they link to in #1380, where xgboost (pre-2017) used to assume base_score = 1/nclasses, a rather questionable prior if there is class imbalance, but they say 'it washes out if you use enough training steps', which is not good for out-of-the-box performance in data exploration." For more discussion: https://github.com/dmlc/xgboost/issues/2222 - jared_mamrot
I agree with the base_score = 1/nclasses point. But one thing I have noticed: in the binary classification case the base score is used as the initial probability and therefore affects the gain and cover values. In the multi-class case, however, whatever value I pass as the base score in R (.5, .6, .7), it is always overridden by 1/nclasses, and it gets added to the odds of the final leaf node. Could you explain why, in the multi-class case, it is added at the end to the leaf nodes rather than being treated as a starting probability as in binary classification? - jayantphor
Hopefully my answer helps explain what is going on. If anything is unclear, please leave a comment. - Alexander Pivovarov
I feel the xgboost documentation does a poor job of explaining what happens under the hood. I'm actually surprised that what I describe here isn't explicitly mentioned in the documentation. - Alexander Pivovarov
1 Answer

To answer your question, let's look at what actually happens in xgboost when you do multi-class classification with the multi:softmax objective and, say, 6 classes. If you train a classifier specifying num_boost_round=5, how many trees would you expect xgboost to train for you? The correct answer is 30 trees. The reason is that softmax expects num_classes=6 different scores for each training row, so that xgboost can compute gradients/hessians with respect to each of those 6 scores and use them to build a new tree for each of the scores (effectively updating 6 parallel models in order to output 6 updated scores per sample). To ask the xgboost classifier to output the final 6 values for each sample, e.g. from a test set, you need to call bst.predict(xg_test, output_margin=True) (where bst is your classifier and xg_test is the test set). The output of a regular bst.predict(xg_test) is effectively the same as picking the class with the highest of the 6 values from bst.predict(xg_test, output_margin=True).
If you are interested, you can look at the contents of all the trees with the bst.trees_to_dataframe() function (where bst is your trained classifier).
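As a quick sanity check (a sketch of my own, not part of the original example further below; the toy data and parameter values are made up), you can verify both claims - the 5 * 6 = 30 trees and the argmax relationship between the two predict calls:

import numpy as np
import xgboost as xgb

# Hypothetical toy data: 100 rows, 4 features, 6 classes.
X = np.random.rand(100, 4)
y = np.random.randint(0, 6, size=100)
dtrain = xgb.DMatrix(X, label=y)

bst = xgb.train({'objective': 'multi:softmax', 'num_class': 6}, dtrain, num_boost_round=5)

# One tree is grown per class per boosting round: 5 rounds * 6 classes = 30 trees.
print(len(bst.get_dump()))  # expected: 30

# Plain predict is equivalent to taking the argmax over the 6 raw scores of each row.
margins = bst.predict(dtrain, output_margin=True)  # shape (100, 6)
print(np.array_equal(bst.predict(dtrain), np.argmax(margins, axis=1)))  # expected: True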
Now to the question of what base_score does in the multi:softmax case. The answer is - it is added as a starting score to each of the 6 classes' scores, before any trees are added. So if you apply, e.g., base_score=42.0, you will be able to observe that all the values in bst.predict(xg_test, output_margin=True) are also increased by 42. At the same time, in the softmax case, increasing the scores of all classes by an equal amount changes nothing, so applying a base_score different from 0 has no visible effect in the multi:softmax case.
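The shift-invariance itself is easy to check outside of xgboost; here is a small numpy sketch of my own (the score values are just the case #1 margins from the output further below):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([0.50240415, 0.5003637, 0.49870378])
shifted = scores + 42.0  # what base_score=42.0 would do to every class score

# Adding the same constant to every class score leaves the softmax probabilities
# (and therefore the argmax / predicted class) unchanged.
print(np.allclose(softmax(scores), softmax(shifted)))  # True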
Compare this behaviour to binary classification. While it is almost identical to multi:softmax with 2 classes, the big difference is that xgboost only tries to produce 1 score, for class 1, leaving the score for class 0 fixed at 0.0. Because of that, when you use base_score in binary classification it is only added to the score of class 1, which increases the starting predicted probability of class 1. In theory it would be meaningful to pass multiple base scores for multiple classes (one per class), but you cannot do that with base_score. Instead, you can use the set_base_margin functionality applied to the training set, but it does not work very conveniently with the default predict, so after that you will need to always use it with output_margin=True and set the same values with set_base_margin as you used for your training data (if you want to use set_base_margin in the multi-class case you will need to flatten the margin values as suggested here).
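To see the contrast with the binary case, here is a separate small sketch of my own (toy random data and made-up parameter values): two binary:logistic boosters that differ only in base_score end up with different raw margins, i.e. the bias really does shift the single class-1 score instead of washing out:

import numpy as np
import xgboost as xgb

np.random.seed(0)
X = np.random.rand(200, 4)
y = np.random.binomial(1, X[:, 0])
dtrain = xgb.DMatrix(X, label=y)

def train_binary(base_score):
    params = {'objective': 'binary:logistic', 'eta': 0.1, 'max_depth': 1,
              'base_score': base_score}
    return xgb.train(params, dtrain, num_boost_round=3)

m_low = train_binary(0.5).predict(dtrain, output_margin=True)
m_high = train_binary(0.9).predict(dtrain, output_margin=True)

# Unlike the multi:softmax case, the margins differ: base_score biases the single
# class-1 score, and that bias does not cancel inside the sigmoid.
print(np.allclose(m_low, m_high))  # expected: False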
Here is an example of how they work:
import numpy as np
import xgboost as xgb
TRAIN = 1000
TEST = 2
F = 10

def gen_data(M):
    np_train_features = np.random.rand(M, F)
    np_train_labels = np.random.binomial(2, np_train_features[:,0])
    return xgb.DMatrix(np_train_features, label=np_train_labels)

def regenerate_data():
    np.random.seed(1)
    return gen_data(TRAIN), gen_data(TEST)

param = {}
param['objective'] = 'multi:softmax'
param['eta'] = 0.001
param['max_depth'] = 1
param['nthread'] = 4
param['num_class'] = 3


def sbm(xg_data, original_scores):
    # Repeat the per-class margins for every row (list * num_row is Python list
    # repetition) and pass them as one long array: one margin per class per row.
    xg_data.set_base_margin(np.array(original_scores * xg_data.num_row()).reshape(-1, 1))

num_round = 3

print("#1. No base_score, no set_base_margin")
xg_train, xg_test = regenerate_data()
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("Easy to see that in this case all scores/margins have 0.5 added to them initially, which is default value for base_score here for some bizzare reason, but it doesn't really affect anything, so no one cares.")
print()
bst1 = bst

print("#2. Use base_score")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to previous case.")
print()
bst2 = bst

print("#3. Use very large base_score and screw up numeric precision")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8e10
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have too big number added to them and xgboost thinks all probabilities are equal so picks class 0 as prediction.")
print("But the training actually was fine - only predict is being affect here. If you set normal base margins for test set you can see (also can look at bst.trees_to_dataframe()).")
xg_train, xg_test = regenerate_data() # if we don't regenerate the dataframe here xgboost seems to be either caching it or somehow else remembering that it didn't have base_margins and result will be different.
sbm(xg_test, [0.1, 0.1, 0.1])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print()
bst3 = bst

print("#4. Use set_base_margin for training")
xg_train, xg_test = regenerate_data()
# base_score is only used in train/test whenever set_base_margin is not applied.
# Peculiarly, the trained model will remember this value even if it was trained on a
# dataset which had set_base_margin applied. In that case this base_score will be used
# if and only if the test set passed to `bst.predict` didn't have `set_base_margin` applied to it.
param['base_score'] = 4.2
sbm(xg_train, [-0.4, 0., 0.8])
bst = xgb.train(param, xg_train, num_round)
sbm(xg_test, [-0.4, 0., 0.8])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("Working - the base margin values added to the classes skewing predictions due to low eta and small number of boosting rounds.")
print("If we don't set base margins for `predict` input it will use base_score to start all scores with. Bizzare, right? But then again, not much difference on what to add here if we are adding same value to all classes' scores.")
xg_train, xg_test = regenerate_data() # regenerate test and don't set the base margin values
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print()
bst4 = bst

print("Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.")
print(bst1.trees_to_dataframe().iloc[1,])
print()
print(bst2.trees_to_dataframe().iloc[1,])
print()
print(bst3.trees_to_dataframe().iloc[1,])
print()
print(bst4.trees_to_dataframe().iloc[1,])

The output of this is the following:
#1. No base_score, no set_base_margin
[[0.50240415 0.5003637  0.49870378]
 [0.49863306 0.5003637  0.49870378]]
[0. 1.]
Easy to see that in this case all scores/margins have 0.5 added to them initially, which is the default value for base_score here for some bizarre reason, but it doesn't really affect anything, so no one cares.

#2. Use base_score
[[5.8024044 5.800364  5.798704 ]
 [5.798633  5.800364  5.798704 ]]
[0. 1.]
In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to the previous case.

#3. Use very large base_score and screw up numeric precision
[[5.8e+10 5.8e+10 5.8e+10]
 [5.8e+10 5.8e+10 5.8e+10]]
[0. 0.]
In this case all scores/margins have too big a number added to them and xgboost thinks all probabilities are equal, so it picks class 0 as the prediction.
But the training actually was fine - only predict is being affected here. If you set normal base margins for the test set you can see that (you can also look at bst.trees_to_dataframe()).
[[0.10240632 0.10036398 0.09870315]
 [0.09863247 0.10036398 0.09870315]]
[0. 1.]

#4. Use set_base_margin for training
[[-0.39458954  0.00102317  0.7973728 ]
 [-0.40044016  0.00102317  0.7973728 ]]
[2. 2.]
Working - the base margin values added to the classes are skewing the predictions, due to the low eta and small number of boosting rounds.
If we don't set base margins for the `predict` input it will use base_score to start all scores with. Bizarre, right? But then again, it doesn't matter much what we add here if we are adding the same value to all classes' scores.
[[4.2054105 4.201023  4.1973724]
 [4.1995597 4.201023  4.1973724]]
[0. 1.]

Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.
Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                0
Node                1
ID                0-1
Feature          Leaf
Split             NaN
Yes               NaN
No                NaN
Missing           NaN
Gain       0.00180733
Cover         100.858
Name: 1, dtype: object

Thank you for the detailed explanation! When trying to answer @jayantphor's question I experimented with setting base_margin and using output_margin=True on a sample dataset, but I couldn't see the effect I expected - perhaps I need to flatten the base_margin values as you describe. Could you provide a reproducible example of how to effectively set base_margin values for a multiclass XGBoost classification problem (in R or Python)? - jared_mamrot
Thanks @Alexander and jared_mamrot for the help. So far I have observed the following:
  1. In binary xgboost, cover and value are influenced by the base score, and there is no separate addition of the base score.
  2. In multi-class classification, however, cover, gain and value are not affected, whatever base_score is used. In addition, the base score is added to the value of the final leaf node.
- jayantphor
I still cannot figure out the following two things:
  1. We know that in the multi-class case base_score should cancel out when the values are normalised into probabilities. But should the gain and cover values include it?
  2. When modelling, say, 3 outcomes: p1_unadj = exp(z1), p2_unadj = exp(z2), p3_unadj = exp(z3), and then p1_adj = p1_unadj / sum(p1_unadj, p2_unadj, p3_unadj). Here base_score may cancel out, but the tree nodes should carry the effect of base_score, and the gain and cover values should be different for different base_score values.
- jayantphor
@jared_mamrot - set_base_margin is implemented quite differently from how base_score works, even though you might think the former is just a more general way of doing the latter. I will try to prepare some reasonable examples. - Alexander Pivovarov
@jayantphor - the behaviour you describe is exactly due to the fact that the binary classifier produces one score/output/margin per sample (all the same thing); in multi:softmax notation this would translate to the score of class "1", while the score of class "0" is fixed at the value 0.0. Because of this asymmetry, base_score has an effect in binary classification. In multi-class classification, adding the same base_score to all num_classes scores does not affect the gradient computation (the gradient of the loss with respect to the scores), so what you end up seeing is the same gain and the same cover values (and no effect on training overall). - Alexander Pivovarov
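A quick way to convince yourself of this point (a sketch of my own, using the gradient p - onehot of softmax cross-entropy and the hessian 2*p*(1-p) that the question observed in the cover values):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_hess(scores, label):
    p = softmax(scores)
    onehot = np.eye(len(scores))[label]
    g = p - onehot           # gradient of softmax cross-entropy w.r.t. each class score
    h = 2.0 * p * (1.0 - p)  # hessian, with the factor of 2 mentioned in the question
    return g, h

scores = np.array([0.1, -0.3, 0.7])
g0, h0 = grad_hess(scores, label=2)
g1, h1 = grad_hess(scores + 5.8, label=2)  # same base_score-like shift on every class

# Identical gradients/hessians -> identical trees -> identical gain and cover.
print(np.allclose(g0, g1), np.allclose(h0, h1))  # True True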
@jared_mamrot - added a reproducible example. - Alexander Pivovarov
