To answer your question, let's look at what actually happens when you use the multi:softmax objective in xgboost for multi-class classification with, say, 6 classes. Say you want to train a classifier and specify num_boost_round=5 - how many trees would you expect xgboost to train for you? The correct answer is 30 trees. The reason is that softmax expects num_class=6 different scores for every training row, so that xgboost can compute gradients/hessians with respect to each of these 6 scores and use them to build a new tree for each of the scores (effectively updating 6 parallel models in order to output 6 updated scores per sample). To ask the xgboost classifier to output the final 6 values for each sample, e.g. from a test set, you need to call bst.predict(xg_test, output_margin=True) (where bst is your classifier and xg_test is e.g. the test set). The output of a regular bst.predict(xg_test) is effectively the same as picking the class with the highest of the 6 values in bst.predict(xg_test, output_margin=True).
If you are interested, you can look at the contents of all the trees using the bst.trees_to_dataframe() function (where bst is your trained classifier).
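If you want to verify the "one tree per class per round" claim and the argmax equivalence yourself, here is a minimal sketch; the random data and names like bst6/dtrain are just for this illustration:
import numpy as np
import xgboost as xgb

X = np.random.rand(500, 10)
y = np.random.randint(0, 6, size=500)   # 6 classes, labels 0..5
dtrain = xgb.DMatrix(X, label=y)
bst6 = xgb.train({'objective': 'multi:softmax', 'num_class': 6}, dtrain, num_boost_round=5)

df = bst6.trees_to_dataframe()
print(df['Tree'].nunique())              # expect 30 = 5 rounds * 6 classes

margins = bst6.predict(dtrain, output_margin=True)   # shape (500, 6)
preds = bst6.predict(dtrain)                          # class labels
print(np.array_equal(margins.argmax(axis=1), preds))  # expect True (barring ties)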
Now to the question of what base_score does in the multi:softmax case. The answer is: it is added as a starting score to each of the 6 classes' scores, before any trees are added. So if you, e.g., apply base_score=42.0, you will be able to observe that all the values in bst.predict(xg_test, output_margin=True) also increase by 42. At the same time, for softmax, increasing the scores of all classes by an equal amount changes nothing, so in the multi:softmax case applying a base_score different from 0 has no visible effect.
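A quick numeric check of why that is - softmax is invariant to adding the same constant to every class score (plain numpy, nothing xgboost-specific):
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([0.5, -1.2, 2.0])
print(softmax(scores))
print(softmax(scores + 42.0))   # identical probabilities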
Compare this behavior to binary classification. It is almost the same as multi:softmax with 2 classes, with the big difference that xgboost only tries to produce 1 score for class 1, leaving the score for class 0 equal to 0.0. Because of that, when you use base_score in binary classification it is only added to the score of class 1, thus increasing the starting prediction probability for class 1. In theory, with multiple classes it would be meaningful to, e.g., pass multiple base scores (one per class), but you cannot do that via base_score. Instead you can use the set_base_margin functionality applied to the training set, but it doesn't work very conveniently with the default predict, so after doing that you will always need to use it with output_margin=True and add the same values as you used with set_base_margin on your training data (if you want to use set_base_margin in the multi-class case you will need to flatten the margin values as suggested here).
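To make that flattening concrete, here is a small sketch. The layout shown is an assumption on my part (it is also what the sbm helper in the example below builds): the per-class margins of each row laid out consecutively in one flat array.
import numpy as np
import xgboost as xgb

per_class = [-0.4, 0.0, 0.8]           # one starting margin per class (3 classes)
n_rows = 2
# assumed layout: [row0_c0, row0_c1, row0_c2, row1_c0, row1_c1, row1_c2]
flat_margins = np.array(per_class * n_rows)

dmat = xgb.DMatrix(np.random.rand(n_rows, 10))
dmat.set_base_margin(flat_margins)
print(dmat.get_base_margin())          # the flattened margins we just set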
Here is a full example of how these all work together:
import numpy as np
import xgboost as xgb
TRAIN = 1000  # number of training rows
TEST = 2      # number of test rows
F = 10        # number of features
def gen_data(M):
    # Random features; labels 0/1/2 (3 classes) correlated with feature 0.
    np_train_features = np.random.rand(M, F)
    np_train_labels = np.random.binomial(2, np_train_features[:,0])
    return xgb.DMatrix(np_train_features, label=np_train_labels)

def regenerate_data():
    # Reset the seed so every experiment below sees identical data.
    np.random.seed(1)
    return gen_data(TRAIN), gen_data(TEST)
param = {}
param['objective'] = 'multi:softmax'
param['eta'] = 0.001
param['max_depth'] = 1
param['nthread'] = 4
param['num_class'] = 3
def sbm(xg_data, original_scores):
    # Repeat the per-class starting scores for every row and set them as the base margins.
    xg_data.set_base_margin(np.array(original_scores * xg_data.num_row()).reshape(-1, 1))
num_round = 3
print("#1. No base_score, no set_base_margin")
xg_train, xg_test = regenerate_data()
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("Easy to see that in this case all scores/margins have 0.5 added to them initially, which is default value for base_score here for some bizzare reason, but it doesn't really affect anything, so no one cares.")
print()
bst1 = bst
print("#2. Use base_score")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to previous case.")
print()
bst2 = bst
print("#3. Use very large base_score and screw up numeric precision")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8e10
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have too big number added to them and xgboost thinks all probabilities are equal so picks class 0 as prediction.")
print("But the training actually was fine - only predict is being affect here. If you set normal base margins for test set you can see (also can look at bst.trees_to_dataframe()).")
xg_train, xg_test = regenerate_data()
sbm(xg_test, [0.1, 0.1, 0.1])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print()
bst3 = bst
print("#4. Use set_base_margin for training")
xg_train, xg_test = regenerate_data()
param['base_score'] = 4.2
sbm(xg_train, [-0.4, 0., 0.8])
bst = xgb.train(param, xg_train, num_round)
sbm(xg_test, [-0.4, 0., 0.8])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("Working - the base margin values added to the classes skewing predictions due to low eta and small number of boosting rounds.")
print("If we don't set base margins for `predict` input it will use base_score to start all scores with. Bizzare, right? But then again, not much difference on what to add here if we are adding same value to all classes' scores.")
xg_train, xg_test = regenerate_data()
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print()
bst4 = bst
print("Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.")
print(bst1.trees_to_dataframe().iloc[1,])
print()
print(bst2.trees_to_dataframe().iloc[1,])
print()
print(bst3.trees_to_dataframe().iloc[1,])
print()
print(bst4.trees_to_dataframe().iloc[1,])
The output of this is:
[[0.50240415 0.5003637 0.49870378]
[0.49863306 0.5003637 0.49870378]]
[0. 1.]
Easy to see that in this case all scores/margins have 0.5 added to them initially, which is the default value for base_score here for some bizarre reason, but it doesn't really affect anything, so no one cares.
[[5.8024044 5.800364 5.798704 ]
[5.798633 5.800364 5.798704 ]]
[0. 1.]
In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to the previous case.
[[5.8e+10 5.8e+10 5.8e+10]
[5.8e+10 5.8e+10 5.8e+10]]
[0. 0.]
In this case all scores/margins have such a big number added to them that xgboost thinks all probabilities are equal, so it picks class 0 as the prediction.
But the training actually was fine - only predict is being affected here. If you set normal base margins for the test set you can see that (you can also look at bst.trees_to_dataframe()).
[[0.10240632 0.10036398 0.09870315]
[0.09863247 0.10036398 0.09870315]]
[0. 1.]
[[-0.39458954 0.00102317 0.7973728 ]
[-0.40044016 0.00102317 0.7973728 ]]
[2. 2.]
Working - the base margin values added to the classes are skewing the predictions due to the low eta and small number of boosting rounds.
If we don't set base margins for the `predict` input it will use base_score to start all scores with. Bizarre, right? But then again, it doesn't make much difference what we add here if we are adding the same value to all classes' scores.
[[4.2054105 4.201023 4.1973724]
[4.1995597 4.201023 4.1973724]]
[0. 1.]
Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.
Tree 0
Node 1
ID 0-1
Feature Leaf
Split NaN
Yes NaN
No NaN
Missing NaN
Gain 0.000802105
Cover 157.333
Name: 1, dtype: object
Tree 0
Node 1
ID 0-1
Feature Leaf
Split NaN
Yes NaN
No NaN
Missing NaN
Gain 0.000802105
Cover 157.333
Name: 1, dtype: object
Tree 0
Node 1
ID 0-1
Feature Leaf
Split NaN
Yes NaN
No NaN
Missing NaN
Gain 0.000802105
Cover 157.333
Name: 1, dtype: object
Tree 0
Node 1
ID 0-1
Feature Leaf
Split NaN
Yes NaN
No NaN
Missing NaN
Gain 0.00180733
Cover 100.858
Name: 1, dtype: object