Dumping an XGBClassifier model to text

Question

python xgboost multilabel-classification boosting

10

I trained a multi-label classification model with XGBoost and want to reimplement this model in another system.

Is it possible to see a text dump of my XGBClassifier model by using the dump_model method of the XGBoost Booster?

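Edit: I found that model._Booster.dump_model(outputfile) writes a dump file like the one below. However, nothing in it specifies a class. My model has 10 classes, but the dump file only lists boosters, with no class labels. So I'm not sure whether it is the model for all classes or for just one of them.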

booster[0]:
0:[101<0.142245024] yes=1,no=2,missing=1
    1:[107<0.102833837] yes=3,no=4,missing=3
        3:[101<0.039123565] yes=7,no=8,missing=7
            7:leaf=-0.0142603116
            8:leaf=0.023763923
        4:[101<0.0646461397] yes=9,no=10,missing=9
            9:leaf=-0.0345750563
            10:leaf=-0.0135767004
    2:[107<0.238691002] yes=5,no=6,missing=5
        5:[103<0.0775454491] yes=11,no=12,missing=11
            11:leaf=0.188941464
            12:leaf=0.0651629418
        6:[101<0.999929309] yes=13,no=14,missing=13
            13:leaf=0.00403384864
            14:leaf=0.236842111
booster[1]:
0:[102<0.014829753] yes=1,no=2,missing=1
    1:[102<0.00999682024] yes=3,no=4,missing=3
        3:[107<0.0966737345] yes=7,no=8,missing=7
            7:leaf=-0.0387153365
            8:leaf=-0.0486520194
        4:[107<0.0922582299] yes=9,no=10,missing=9
            9:leaf=0.0301927216
            10:leaf=-0.0284226239
    2:[102<0.199759275] yes=5,no=6,missing=5
        5:[107<0.12201979] yes=11,no=12,missing=11
            11:leaf=0.093562685
            12:leaf=0.0127987256
        6:[107<0.298737913] yes=13,no=14,missing=13
            13:leaf=0.227570012
            14:leaf=0.113037519

- Sabri Karagönen

Is this your complete output file? For a 10-label classification your model should contain [n_estimators*10] trees, so it does look strange. - Nick

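No, there are about 800 boosters. I only included the first two. So as I understand it, they should be split evenly among the classes and then summed for each class? - Sabri Karagönen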

1 Answer

- Michael Nelson · Accepted Answer

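Looking at the source code and at the output on sample data, it seems that the nth tree estimates the likelihood that a given instance belongs to class n modulo num_class. I believe xgboost uses the softmax function, so you would add the output of the ith tree to weight[i%10] and then apply softmax to the resulting weights.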
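To make the index layout concrete, here is a minimal sketch of how tree indices would map to classes under that reading. The counts come from the question (10 classes, roughly 800 boosters); treating 800 as exact is an assumption made only for illustration.

```python
from collections import defaultdict

num_class = 10      # number of classes, from the question
num_boosters = 800  # "about 800" in the question; assumed exact here

# Tree i contributes to class i % num_class, so the classes interleave:
# trees 0..9 form boosting round 0, trees 10..19 form round 1, and so on.
trees_by_class = defaultdict(list)
for i in range(num_boosters):
    trees_by_class[i % num_class].append(i)

print(trees_by_class[3][:3])   # first trees that feed class 3
print(len(trees_by_class[3]))  # trees per class = number of boosting rounds
```

With these numbers each class receives 80 trees, one per boosting round.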

Assuming you have a function booster_output(features, booster_index) that returns the output of the nth boosted tree for the given feature values, something like this should work:

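import numpy as np

num_class = 10
num_boosters = 800

# Accumulate each tree's raw output into the score of the class it belongs to.
# booster_output and feature_values are assumed to be defined elsewhere.
weight_of_classes = [0.0] * num_class
for i in range(num_boosters):
    weight_of_classes[i % num_class] += booster_output(feature_values, i)


def softmax(x):
    # Subtract the maximum before exponentiating for numerical stability.
    x = np.asarray(x, dtype=float)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()


probability_of_classes = softmax(weight_of_classes)
print(probability_of_classes)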
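The booster_output function is left undefined above. One possible way to build it, sketched here under assumptions, is to parse the text dump into per-tree node tables and walk each tree from node 0. The names parse_dump, SPLIT_RE and LEAF_RE are hypothetical helpers, not part of xgboost; the parser only handles the plain `id:[feature<threshold] yes=..,no=..,missing=..` and `id:leaf=..` lines shown in the question, and this version takes the parsed trees as an extra third argument. xgboost works in 32-bit floats internally, so values very close to a threshold may be routed differently than in this double-precision sketch.

```python
import re

# Hypothetical helpers for the dump format shown in the question.
SPLIT_RE = re.compile(r"^(\d+):\[(.+?)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)")
LEAF_RE = re.compile(r"^(\d+):leaf=([^,\s]+)")


def parse_dump(text):
    """Return a list of trees; each tree maps node id -> node tuple."""
    trees = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("booster["):
            trees.append({})  # start a new tree
            continue
        m = SPLIT_RE.match(line)
        if m:
            nid, feat, thr, yes, no, missing = m.groups()
            trees[-1][int(nid)] = ("split", feat, float(thr),
                                   int(yes), int(no), int(missing))
            continue
        m = LEAF_RE.match(line)
        if m:
            trees[-1][int(m.group(1))] = ("leaf", float(m.group(2)))
    return trees


def booster_output(features, booster_index, trees):
    """Walk one tree; `features` maps feature name (str) -> value or None."""
    tree = trees[booster_index]
    node = tree[0]
    while node[0] == "split":
        _, feat, thr, yes, no, missing = node
        value = features.get(feat)
        if value is None:
            node = tree[missing]  # follow the default branch
        elif value < thr:
            node = tree[yes]
        else:
            node = tree[no]
    return node[1]


# booster[0] copied from the dump in the question.
SAMPLE_DUMP = """booster[0]:
0:[101<0.142245024] yes=1,no=2,missing=1
    1:[107<0.102833837] yes=3,no=4,missing=3
        3:[101<0.039123565] yes=7,no=8,missing=7
            7:leaf=-0.0142603116
            8:leaf=0.023763923
        4:[101<0.0646461397] yes=9,no=10,missing=9
            9:leaf=-0.0345750563
            10:leaf=-0.0135767004
    2:[107<0.238691002] yes=5,no=6,missing=5
        5:[103<0.0775454491] yes=11,no=12,missing=11
            11:leaf=0.188941464
            12:leaf=0.0651629418
        6:[101<0.999929309] yes=13,no=14,missing=13
            13:leaf=0.00403384864
            14:leaf=0.236842111
"""

trees = parse_dump(SAMPLE_DUMP)
print(booster_output({"101": 0.1, "107": 0.05}, 0, trees))
```

For the sample input the walk goes 0 -> 1 -> 3 -> 8. To plug this into the loop above, the trees argument could be bound once, for example with functools.partial.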