XGBoost plot_importance doesn't show feature names

Question

39

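I'm using XGBoost with Python and have successfully trained a model using the XGBoost train() function, called on DMatrix data. The matrix was created from a Pandas dataframe, which has feature names for the columns.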

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y,
                                              test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(Xtrain, label=ytrain)
dval = xgb.DMatrix(Xval, label=yval)

# Early stopping needs at least one evaluation set to monitor.
model = xgb.train(xgb_params, dtrain, num_boost_round=60, evals=[(dval, "val")],
                  early_stopping_rounds=50, maximize=False, verbose_eval=10)

fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model, max_num_features=5, ax=ax)

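I now want to see the feature importance using the xgboost.plot_importance() function, but the resulting plot doesn't show the feature names. Instead, the features are listed as f1, f2, f3, and so on.

I think the problem is that I converted my original Pandas dataframe into a DMatrix. How can I associate the feature names properly so that the feature importance plot shows them?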

- stackoverflowuser2010

9 Answers

28

You need to use the feature_names parameter when creating your xgb.DMatrix.

dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)

- piRSquared

I already tried this approach, but I still get f## as the feature names in the feature importance plot.

7

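train_test_split may convert the dataframe to NumPy arrays, which no longer carry column information.

You can either do what @piRSquared suggested and pass the features as a parameter to the DMatrix constructor, or convert the NumPy arrays returned from train_test_split back into dataframes and then use your code.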

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y,
                                              test_size=0.2, random_state=42)

# See the two lines below: wrap the arrays back into dataframes with column names
Xtrain = pd.DataFrame(data=Xtrain, columns=feature_names)
Xval = pd.DataFrame(data=Xval, columns=feature_names)

dtrain = xgb.DMatrix(Xtrain, label=ytrain)

- Vivek Kumar

6

With the Scikit-Learn wrapper interface "XGBClassifier", plot_importance returns a "matplotlib Axes" object, so we can use axes.set_yticklabels.

plot_importance(model).set_yticklabels(['feature1','feature2'])

- Vincent M.K

3

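You should specify feature_names when instantiating the XGBoost classifier:

clf = xgb.XGBClassifier(feature_names=feature_names)

Be careful: if you wrap the xgb classifier in a sklearn pipeline that performs any selection on the columns (e.g. VarianceThreshold), the xgb classifier will fail when trying to fit or transform.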
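
One way to keep names in sync after a column-selection step is to filter the name list with the same boolean mask the selector applied. scikit-learn selectors such as VarianceThreshold expose that mask through get_support(). The sketch below uses a hand-written mask so it runs without scikit-learn; the helper surviving_feature_names and the column names are hypothetical.

```python
from typing import List, Sequence


def surviving_feature_names(
    feature_names: Sequence[str], support_mask: Sequence[bool]
) -> List[str]:
    """Return the names of the columns that a selection step kept."""
    if len(feature_names) != len(support_mask):
        raise ValueError("feature_names and support_mask must have the same length")
    return [name for name, keep in zip(feature_names, support_mask) if keep]


# Hypothetical columns; the mask stands in for selector.get_support().
feature_names = ["age", "income", "constant_flag", "tenure"]
support_mask = [True, True, False, True]

kept_names = surviving_feature_names(feature_names, support_mask)
print(kept_names)
```

The filtered list has one name per column that actually reaches the classifier, which is the list the model or the plot needs.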

- Gianmario Spacagna

I don't see 'feature_names' as a parameter of xgb.XGBClassifier(). Is this a mistake? - PingPong

3

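I found an alternative while playing around with feature_names. While experimenting, I wrote the code below, which works on XGBoost v0.80, the version I'm currently running.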

## Saving the model to disk
model.save_model('foo.model')
with open('foo_fnames.txt', 'w') as f:
    f.write('\n'.join(model.feature_names))

## Later, when you want to retrieve the model...
nThreads = 4  # number of threads to use (assumed value)
model2 = xgb.Booster({"nthread": nThreads})
model2.load_model("foo.model")

with open("foo_fnames.txt", "r") as f:
    feature_names2 = f.read().split("\n")

model2.feature_names = feature_names2
model2.feature_types = None
fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model2, max_num_features = 5, ax=ax)

So feature_names is saved separately and added back later. For some reason, feature_types also needs to be initialized, even if the value is None.
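
The save-and-restore step can be sketched on its own, without a model. This version swaps the newline-joined text file for JSON, so a name containing a line break would still round-trip. The helper names and the file name foo_fnames.json are hypothetical.

```python
import json
import os
import tempfile
from typing import List


def save_feature_names(path: str, feature_names: List[str]) -> None:
    """Write the feature names next to the saved model as a JSON list."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(list(feature_names), f)


def load_feature_names(path: str) -> List[str]:
    """Read the feature names back in their original order."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical names; the second one contains a line break on purpose.
feature_names = ["age", "days since\nlast visit", "tenure"]

with tempfile.TemporaryDirectory() as tmp_dir:
    names_path = os.path.join(tmp_dir, "foo_fnames.json")
    save_feature_names(names_path, feature_names)
    restored_names = load_feature_names(names_path)

print(restored_names == feature_names)
```

The restored list can then be assigned to the loaded booster's feature_names, as in the snippet above.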

- Peter VanderMeer

1

If you have trained a model like this:

model = XGBClassifier(
    max_depth = 8, 
    learning_rate = 0.25, 
    n_estimators = 50, 
    objective = "binary:logistic",
    n_jobs = 4
)

# x, y are pandas DataFrame
model.fit(train_data_x, train_data_y)

You can use model.get_booster().get_fscore() to get the feature names and feature importances; the result is a Python dict.
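
If that dictionary still comes back keyed by default names, it can be remapped after the fact. The sketch below assumes XGBoost's default zero-based naming (f0, f1, ...) and that feature_names is in the same column order used for training. The helper rename_default_features and the sample scores are hypothetical.

```python
import re
from typing import Dict, Sequence

_DEFAULT_NAME = re.compile(r"^f(\d+)$")


def rename_default_features(
    scores: Dict[str, float], feature_names: Sequence[str]
) -> Dict[str, float]:
    """Map keys such as 'f0' and 'f2' to the names at those column positions."""
    renamed = {}
    for key, value in scores.items():
        match = _DEFAULT_NAME.match(key)
        if match and int(match.group(1)) < len(feature_names):
            renamed[feature_names[int(match.group(1))]] = value
        else:
            # Leave anything that is not a recognisable default name untouched.
            renamed[key] = value
    return renamed


# Hypothetical scores; features never used in a split are simply absent.
scores = {"f0": 12.0, "f2": 3.0}
feature_names = ["age", "income", "tenure"]

named_scores = rename_default_features(scores, feature_names)
print(named_scores)
```

This only relabels the scores; it does not change the names stored in the booster itself.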

- Badger Titan

xgb.plot_importance() also works with XGBClassifier :) - Kattia

1

Do you know why plot_importance and feature_importance give different results for an xgb model? - Noob Programmer

0

Rename the ytick labels by passing feature_names, as a list of strings, to matplotlib.axes.Axes.set_yticklabels.

 fig, ax = plt.subplots(1,1,figsize=(10,10))
 xgb.plot_importance(model, max_num_features=5, ax=ax)
 ax.set_yticklabels(feature_names)
 plt.show()
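
One caveat when relabeling ticks by hand: the labels have to follow the order of the bars, not the order of the columns. The sketch below assumes plot_importance sorts features by ascending score and keeps only the highest max_num_features, so the labels are rebuilt the same way. The helper tick_labels_in_plot_order and the sample scores are hypothetical.

```python
from typing import Dict, List, Optional


def tick_labels_in_plot_order(
    scores: Dict[str, float], max_num_features: Optional[int] = None
) -> List[str]:
    """Order labels bottom-to-top, matching bars sorted by ascending score."""
    ordered = sorted(scores.items(), key=lambda item: item[1])
    if max_num_features is not None:
        # Keep only the highest-scoring features, still in ascending order.
        ordered = ordered[-max_num_features:]
    return [name for name, _ in ordered]


# Hypothetical importance scores keyed by real feature names.
scores = {"age": 12.0, "income": 30.0, "tenure": 3.0, "region": 7.0}

labels = tick_labels_in_plot_order(scores, max_num_features=3)
print(labels)
```

A list built this way could be passed to ax.set_yticklabels in place of the raw feature_names list.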

- Mac

0

You can also make the code simpler by not using DMatrix at all. The column names are used as labels:

from xgboost import XGBClassifier, plot_importance
model = XGBClassifier()
model.fit(Xtrain, ytrain)
plot_importance(model)

- Ferro


- Darrrrrren · Accepted Answer

If you are using the scikit-learn wrapper, you need to access the underlying XGBoost Booster and set the feature names on it, rather than on the scikit model, like so:

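import joblib
import xgboost

model = joblib.load("your_saved.model")
# Set the names on the underlying Booster, not on the scikit-learn wrapper
model.get_booster().feature_names = ["your", "feature", "name", "list"]
xgboost.plot_importance(model.get_booster())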