我需要开发一个模型,该模型将免费(或接近免费)地消除假负值。为此,我绘制了召回率-精度曲线,并确定阈值应设置为0.11。
我的问题是,如何在模型训练时定义阈值?如果稍后在评估时定义它就没有意义,因为它不会反映新数据。
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
rfc_model = RandomForestClassifier(random_state=101)
rfc_model.fit(X_train, y_train)
rfc_preds = rfc_model.predict(X_test)
recall_precision_vals = []
for val in np.linspace(0, 1, 101):
predicted_proba = rfc_model.predict_proba(X_test)
predicted = (predicted_proba[:, 1] >= val).astype('int')
recall_sc = recall_score(y_test, predicted)
precis_sc = precision_score(y_test, predicted)
recall_precision_vals.append({
'Threshold': val,
'Recall val': recall_sc,
'Precis val': precis_sc
})
recall_prec_df = pd.DataFrame(recall_precision_vals)
有什么想法吗?