I have built a classifier and would like to save it for future use. It covers several algorithms (logistic regression, naive Bayes, SVM):
X, y = tfidf(df, ngrams = 1)
X, y = under_sample.fit_resample(X, y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=40)
df_result = df_result.append(training_naive(X_train, X_test, y_train, y_test), ignore_index = True)
df_result = df_result.append(training_logreg(X_train, X_test, y_train, y_test), ignore_index = True)
df_result = df_result.append(training_svm(X_train, X_test, y_train, y_test), ignore_index = True)
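This is the last step of my code, where I compare the different algorithms. training_svm, training_logreg and training_naive are functions. For example, training_svm is defined as follows: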
def training_svm(X_train_log, X_test_log, y_train_log, y_test_log):
    folds = StratifiedKFold(n_splits = 3, shuffle = True, random_state = 40)
    clf = svm.SVC(kernel='linear') # Linear Kernel
    clf.fit(X_train_log, y_train_log)
    res = pd.DataFrame(columns = ['Preprocessing', 'Model', 'Precision', 'Recall', 'F1-score', 'Accuracy'])
    y_pred = clf.predict(X_test_log)
    # scikit-learn metrics take (y_true, y_pred) in that order
    f1 = f1_score(y_test_log, y_pred, average = 'weighted')
    pres = precision_score(y_test_log, y_pred, average = 'weighted')
    rec = recall_score(y_test_log, y_pred, average = 'weighted')
    acc = accuracy_score(y_test_log, y_pred)
    res = res.append({'Model': f'SVM', 'Precision': pres,
                      'Recall': rec, 'F1-score': f1, 'Accuracy': acc}, ignore_index = True)
    return res
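I want to use and test the models on new data, so I would like to know how to save and reuse them. I think it should be done something like this: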
import pickle
# save
with open('model.pkl','wb') as f:
    pickle.dump(clf,f)
# load
with open('model.pkl', 'rb') as f:
    clf2 = pickle.load(f)
clf2.predict(X[0:1])
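To make the question more concrete, here is a minimal sketch of how I imagine the pickle idea generalizing to several models. The helper names save_bundle and load_bundle are hypothetical (my own, not from any library). It also assumes that each training function would be changed to return its fitted clf, which my current functions do not do. Presumably the fitted TF-IDF vectorizer would have to go in the same file, since new text has to be transformed with the same vocabulary before calling predict.

```python
import pickle
from typing import Any, Dict


def save_bundle(objects: Dict[str, Any], path: str) -> None:
    """Pickle a dict of named objects (e.g. fitted models) into one file."""
    with open(path, "wb") as f:
        pickle.dump(objects, f)


def load_bundle(path: str) -> Dict[str, Any]:
    """Load the dict previously written by save_bundle."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical usage in my project (clf_svm, clf_logreg, clf_naive and
# vectorizer are assumed names for the fitted objects):
#
#   save_bundle({"svm": clf_svm, "logreg": clf_logreg,
#                "naive": clf_naive, "tfidf": vectorizer}, "models.pkl")
#   models = load_bundle("models.pkl")
#   X_new = models["tfidf"].transform(new_texts)
#   models["svm"].predict(X_new)
```

Keeping everything in one dict means a single file holds all three classifiers plus the preprocessing step, but I am not sure whether this is the recommended approach.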
(Here clf would have to be the fitted classifier, but in my code it only exists inside the training function.)
Could you explain how to extend this to my project?