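I used a random forest classifier to find which features contribute to the prediction for a specific row in my dataset. However, I get two values per feature instead of one, and I'm not sure why. Here is my code.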
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti
X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)
# Build a DataFrame from the generated features and labels
df = pd.DataFrame({'Feature 1':X[:,0],
                   'Feature 2':X[:,1],
                   'Feature 3':X[:,2],
                   'Feature 4':X[:,3],
                   'Feature 5':X[:,4],
                   'Feature 6':X[:,5],
                   'Class':y})
y_train = df['Class']
X_train = df.drop('Class',axis = 1)
rf = RandomForestClassifier(n_estimators=50,
                            random_state=0)
rf.fit(X_train, y_train)
print ("-"*20)
importances = rf.feature_importances_
indices = X_train.columns
instances = X_train.loc[[60]]
print(rf.predict(instances))
print ("-"*20)
prediction, biases, contributions = ti.predict(rf, instances)
for i in range(len(instances)):
    print ("Instance", i)
    print ("-"*20)
    print ("Bias (trainset mean)", biases[i])
    print ("-"*20)
    print ("Feature contributions:")
    print ("-"*20)
    for c, feature in sorted(zip(contributions[i],
                                 indices),
                             key=lambda x: ~abs(x[0].any())):
        print (feature, np.round(c, 3))
    print ("-"*20)
Here is the output of my code. Why do the bias and the feature contributions each print two values instead of one?
--------------------
[0]
--------------------
Instance 0
--------------------
Bias (trainset mean) [ 0.49854 0.50146]
--------------------
Feature contributions:
--------------------
Feature 1 [ 0.16 -0.16]
Feature 2 [-0.024 0.024]
Feature 3 [-0.154 0.154]
Feature 4 [ 0.172 -0.172]
Feature 5 [ 0.029 -0.029]
Feature 6 [ 0.019 -0.019]
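
A quick sanity check on the numbers above may help narrow down what the pairs mean. This is only a sketch: it assumes that each pair holds one value per class (class 0 first, class 1 second), which is my guess at how treeinterpreter reports results for a classifier. The helper name `reconstruct` is made up for illustration, and the values are copied by hand from the printed output, so they are rounded.

```python
import numpy as np

# Values copied from the printed output above (contributions are rounded to 3 decimals).
bias = np.array([0.49854, 0.50146])
contributions = np.array([
    [ 0.160, -0.160],   # Feature 1
    [-0.024,  0.024],   # Feature 2
    [-0.154,  0.154],   # Feature 3
    [ 0.172, -0.172],   # Feature 4
    [ 0.029, -0.029],   # Feature 5
    [ 0.019, -0.019],   # Feature 6
])

def reconstruct(bias, contributions):
    """Hypothetical helper: add the per-feature contributions to the bias, column by column."""
    return bias + contributions.sum(axis=0)

probs = reconstruct(bias, contributions)
print(np.round(probs, 3))      # one value per column
print(probs.sum())             # the two columns together
print(int(np.argmax(probs)))   # index of the larger column
```

Under that assumption, the first column comes to roughly 0.70 and the second to roughly 0.30. The two add up to 1, and the larger one is at index 0, which matches the `[0]` printed by `rf.predict(instances)`. That would also explain why the two values in every pair mirror each other.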