我认为Scikit中用于多类分类的f1_macro将会使用以下公式进行计算:
2 * Macro_precision * Macro_recall / (Macro_precision + Macro_recall)
但是手动检查显示结果不同,比scikit计算的值略高。我查阅了文档,没有找到公式。
例如,鸢尾花数据集产生了以下结果:
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
data=pd.DataFrame({
'sepal length':iris.data[:,0],
'sepal width':iris.data[:,1],
'petal length':iris.data[:,2],
'petal width':iris.data[:,3],
'species':iris.target
})
X=data[['sepal length', 'sepal width', 'petal length', 'petal width']]
y=data['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf=RandomForestClassifier(n_estimators=100)
clf.fit(X_train,y_train)
y_pred=clf.predict(X_test)
#Compute metrics using scikit
from sklearn import metrics
print(metrics.confusion_matrix(y_test, y_pred))
print(metrics.classification_report(y_test, y_pred))
pre_macro = metrics.precision_score(y_test, y_pred, average="macro")
recall_macro = metrics.recall_score(y_test, y_pred, average="macro")
f1_macro_scikit = metrics.f1_score(y_test, y_pred, average="macro")
print ("Prec_macro_scikit:", pre_macro)
print ("Rec_macro_scikit:", recall_macro)
print ("f1_macro_scikit:", f1_macro_scikit)
输出:
Prec_macro_scikit: 0.9555555555555556
Rec_macro_scikit: 0.9666666666666667
f1_macro_scikit: 0.9586466165413534
然而,使用以下内容进行手动计算:
f1_macro_manual = 2 * pre_macro * recall_macro / (pre_macro + recall_macro )
yields:
f1_macro_manual: 0.9610789980732178
我正试图弄清楚这种差异。