How to set a threshold for a scikit-learn random forest model

Question

11

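After looking at the precision-recall curve, I want to set the threshold to 0.4. How do I apply that to my random forest model (binary classification)? Any sample with probability < 0.4 should be labeled 0, and any sample with probability >= 0.4 should be labeled 1.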

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

random_forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=12)
random_forest.fit(X_train, y_train)

# predict() uses the default decision rule, not a custom threshold
predicted = random_forest.predict(X_test)
accuracy = accuracy_score(y_test, predicted)

Documentation: Precision-Recall

- BigData

3 Answers

1

random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, y_train)

threshold = 0.4

predicted = random_forest.predict_proba(X_test)
predicted[:,0] = (predicted[:,0] < threshold).astype('int')
predicted[:,1] = (predicted[:,1] >= threshold).astype('int')


accuracy = accuracy_score(y_test, predicted)
print(round(accuracy,4,)*100, "%")

This fails at the final accuracy step with the error: "ValueError: Can't handle mix of binary and multilabel-indicator".

- BigData

0

sklearn.metrics.accuracy_score takes 1-d arrays, but your predicted array is 2-d (one column per class). That is what causes the error.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
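
To make the shape mismatch concrete, here is a small sketch. The toy data, the model settings, and the variable names are assumptions for illustration only; they are not from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy binary data (assumption: stands in for the asker's X_train / y_train).
rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = (X[:, 0] > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

proba = model.predict_proba(X)  # 2-d: one row per sample, one column per class
labels = model.predict(X)       # 1-d: one label per sample

print(proba.shape, labels.shape)

# Selecting a single column yields the 1-d array that accuracy_score expects.
positive = (proba[:, 1] >= 0.4).astype(int)
print(positive.shape)
```

Passing the full 2-d array of 0/1 values alongside a 1-d y_test is what makes scikit-learn treat the prediction as a multilabel indicator.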

- pawan

Don't pass predicted as a whole; select the predicted class you want the accuracy_score for. Use one of predicted[:,0] or predicted[:,1] instead of predicted. - ddragosd
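
One caveat, sketched under assumptions: after the thresholding in the attempt above, the two columns are not interchangeable. In a binary problem P(class 0) + P(class 1) = 1, so `predicted[:,0] < 0.4` marks samples where P(class 1) > 0.6, while `predicted[:,1] >= 0.4` marks samples where P(class 1) >= 0.4. The probability values below are made up for illustration.

```python
import numpy as np

threshold = 0.4

# Hypothetical predict_proba output: columns are P(class 0), P(class 1).
proba = np.array([[0.7, 0.3],
                  [0.5, 0.5],
                  [0.2, 0.8]])

# Column 0 rule from the attempt: 1 only where P(class 1) > 0.6.
col0 = (proba[:, 0] < threshold).astype(int)

# Column 1 rule: 1 where P(class 1) >= 0.4, which is what the question asks for.
col1 = (proba[:, 1] >= threshold).astype(int)

print(col0)
print(col1)
```

The middle sample (P(class 1) = 0.5) is labeled differently by the two rules, so column 1 is the one that matches the 0.4 threshold described in the question.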

- Stev · Accepted Answer

Assuming you are doing binary classification, it's quite easy:

threshold = 0.4

predicted_proba = random_forest.predict_proba(X_test)
predicted = (predicted_proba[:, 1] >= threshold).astype('int')

accuracy = accuracy_score(y_test, predicted)
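
For reference, a minimal end-to-end sketch of the approach above. The synthetic data from `make_classification`, the split size, and the `random_state` values are assumptions standing in for the asker's data. The lookup through `classes_` is an extra precaution added here so the code does not rely on the positive class being in column 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumption: replaces the asker's X and y).
X, y = make_classification(n_samples=500, n_features=10, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=12
)

random_forest = RandomForestClassifier(n_estimators=100, random_state=12)
random_forest.fit(X_train, y_train)

threshold = 0.4

# Find the column holding P(class == 1) instead of assuming its position.
positive_col = list(random_forest.classes_).index(1)
proba_positive = random_forest.predict_proba(X_test)[:, positive_col]

# 1-d array of labels: 1 where P(class 1) >= threshold, otherwise 0.
predicted = (proba_positive >= threshold).astype(int)

accuracy = accuracy_score(y_test, predicted)
print(f"accuracy at threshold {threshold}: {accuracy:.4f}")
```

Because 0.4 is lower than the default cut-off used by `predict`, this labels at least as many samples as class 1 as `random_forest.predict(X_test)` does.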