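I want to run a grid search to tune my model, but it takes far too long to execute. My full dataset has only about 15,000 observations and 30-40 variables. I successfully ran a random forest through grid search in roughly an hour and a half, but now that I have switched to SVC, the search has been running for more than 9 hours and still has not finished. Here is my cross-validation code: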
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X_train and y_train are my training split (not shown here).
SVM_Classifier = SVC(random_state=7)

# Every combination of these values is tried for every kernel.
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [1, 0.1, 0.01, 0.001],
              'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
              'degree': [0, 1, 2, 3, 4, 5, 6]}

grid_obj = GridSearchCV(SVM_Classifier,
                        return_train_score=True,
                        param_grid=param_grid,
                        scoring='roc_auc',
                        cv=3,
                        n_jobs=-1)

grid_fit = grid_obj.fit(X_train, y_train)
SVMC_opt = grid_fit.best_estimator_

print('=' * 20)
print("best estimator: " + str(grid_obj.best_estimator_))
print("best params: " + str(grid_obj.best_params_))
print('best score:', grid_obj.best_score_)
print('=' * 20)
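I have already reduced the number of cross-validation folds from 10 to 3, and I am using n_jobs=-1 so that all cores are used. Is there anything else I can do to speed this up?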
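
For reference, here is a rough count of how many cross-validation fits the grid above implies, next to a split version of the grid. This is only a sketch: `count_candidates` is a helper I wrote for illustration (not part of scikit-learn), and the split grid and its `degree` values are my own guess at a smaller search, not something I have validated. The idea relies on two things I am fairly sure of: `degree` is only used by the `poly` kernel and `gamma` is not used by the `linear` kernel, so the single dict repeats many equivalent fits, and `GridSearchCV` accepts a list of dicts as `param_grid`.

```python
from math import prod


def count_candidates(grid):
    """Count parameter combinations in a grid (one dict, or a list of dicts)."""
    grids = grid if isinstance(grid, list) else [grid]
    return sum(prod(len(values) for values in g.values()) for g in grids)


C_values = [0.1, 1, 10, 100]
gamma_values = [1, 0.1, 0.01, 0.001]

# The grid from my question: every value is crossed with every kernel.
full_grid = {
    'C': C_values,
    'gamma': gamma_values,
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'degree': [0, 1, 2, 3, 4, 5, 6],
}

# Hypothetical split grid: each kernel only gets the parameters it uses.
# The degree range [2, 3, 4] is an assumption, chosen only for illustration.
split_grid = [
    {'kernel': ['linear'], 'C': C_values},
    {'kernel': ['rbf', 'sigmoid'], 'C': C_values, 'gamma': gamma_values},
    {'kernel': ['poly'], 'C': C_values, 'gamma': gamma_values,
     'degree': [2, 3, 4]},
]

cv = 3  # number of folds used in my GridSearchCV call
full_fits = count_candidates(full_grid) * cv
split_fits = count_candidates(split_grid) * cv

print('fits with the original grid:', full_fits)
print('fits with the split grid:', split_fits)
```

If this reasoning holds, `split_grid` could be passed as `param_grid` in place of the single dict, which would cut the number of fits considerably before changing anything else.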