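I'm late to the party, but here is another solution/insight using `Pipeline()`:

- a sub-pipeline that contains your model (regressor/classifier) as its single component
- a main pipeline made up of the usual components:
  - preprocessing components such as scalers, dimensionality reduction, etc.
  - your refitted `GridSearchCV(regressor, param)`, which tunes your model to the desired/best parameters (note: don't forget `refit=True`), based on @Vivek Kumar's remark (ref)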

    from sklearn import set_config
    from sklearn.linear_model import SGDRegressor
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    # X_train and y_train are assumed to be defined already.

    # Sub-pipeline: the model is its only step.
    sgd_subpipeline = Pipeline(steps=[
        ('SGD', SGDRegressor(random_state=0)),
    ])

    param_grid = {
        'SGD__loss': ['squared_error', 'epsilon_insensitive', 'squared_epsilon_insensitive', 'huber'],
        'SGD__penalty': ['l2', 'l1', 'elasticnet'],
        'SGD__alpha': [0.0001, 0.001, 0.01],
        'SGD__l1_ratio': [0.15, 0.25, 0.5]
    }

    grid_search = GridSearchCV(sgd_subpipeline, param_grid, cv=5, n_jobs=-1, verbose=True, refit=True)
    grid_search.fit(X_train, y_train)
    best_sgd_reg = grid_search.best_estimator_

    print('=========================================[Best Hyperparameters info]=====================================')
    # No `scoring` is passed, so best_score_ is the regressor's default score (R^2), not MAE.
    print('Best score: %.3f' % grid_search.best_score_)
    print('Best Config: %s' % grid_search.best_params_)
    print('==========================================================================================================')

    # Main pipeline: preprocessing first, then the grid search as the final step.
    # Fitting it runs the search again, this time on the scaled features.
    sgd_pipeline = Pipeline(steps=[('scaler', MinMaxScaler()),
                                   ('SGD', grid_search),
                                   ])
    sgd_pipeline.fit(X_train, y_train)

    # Render estimators as plain text.
    set_config(display="text")

![img](https://i.imgur.com/vI7qJkR.jpg)
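
Once the main pipeline is fitted, the search results are still reachable through the named step. The following is only a minimal sketch of that idea: the synthetic data from `make_regression`, the deliberately tiny grid, and the variable names `sub`, `search`, `pipe` and `fitted_search` are my assumptions for illustration, not part of the setup above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (an assumption), used only so that the sketch runs on its own.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Deliberately tiny grid to keep the run short.
param_grid = {'SGD__alpha': [0.0001, 0.001]}
sub = Pipeline(steps=[('SGD', SGDRegressor(random_state=0))])
search = GridSearchCV(sub, param_grid, cv=3, refit=True)

pipe = Pipeline(steps=[('scaler', MinMaxScaler()), ('SGD', search)])
pipe.fit(X_train, y_train)

# The fitted GridSearchCV is reachable as a named step of the outer pipeline.
fitted_search = pipe.named_steps['SGD']
print(fitted_search.best_params_)

# predict() scales X_test, then delegates to the refitted best estimator.
y_pred = pipe.predict(X_test)
print(y_pred.shape)
```

Because `refit=True`, the search step can predict directly with the best configuration, which is what lets it act as the final estimator of the pipeline.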
Alternatively, you can use `TransformedTargetRegressor` (especially if you need to inverse-scale `y`, as @mloning commented here) and chain this component in, wrapping your regression model (ref).
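Note:

- You don't need to set the `transformer` argument unless you need scaling; see the related posts 1, 2, 3, 4, as well as its score
- Mind this remark (here) about not scaling `y`, because: "...scaling `y` actually loses the units..." and "...do the transformation outside the pipeline..."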
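
To make the first note concrete, here is a small sketch of what setting `transformer` does. It assumes toy data and uses `LinearRegression` as a stand-in regressor (both are my choices for illustration): the inner model is trained on the scaled target, and the wrapper maps predictions back to the original units.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Toy data (an assumption): y = 1000 * x, so the target is on a much larger scale than x.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 1000.0 * X.ravel()

# LinearRegression is a stand-in for any regressor.
ttr = TransformedTargetRegressor(regressor=LinearRegression(),
                                 transformer=MinMaxScaler())
ttr.fit(X, y)

# The inner regressor was trained on y scaled to [0, 1] ...
inner_pred = ttr.regressor_.predict(X)
print(inner_pred.max())

# ... while predictions from the wrapper are mapped back to the original units.
outer_pred = ttr.predict(X)
print(outer_pred.max())
```

The full pipeline below leaves `transformer` unset, so `y` passes through unchanged.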

    from sklearn import set_config
    from sklearn.compose import TransformedTargetRegressor
    from sklearn.linear_model import SGDRegressor
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    # X_train and y_train are assumed to be defined already.

    # Sub-pipeline: the model is its only step.
    sgd_subpipeline = Pipeline(steps=[
        ('SGD', SGDRegressor(random_state=0)),
    ])

    param_grid = {
        'SGD__loss': ['squared_error', 'epsilon_insensitive', 'squared_epsilon_insensitive', 'huber'],
        'SGD__penalty': ['l2', 'l1', 'elasticnet'],
        'SGD__alpha': [0.0001, 0.001, 0.01],
        'SGD__l1_ratio': [0.15, 0.25, 0.5]
    }

    grid_search = GridSearchCV(sgd_subpipeline, param_grid, cv=5, n_jobs=-1, verbose=True, refit=True)
    grid_search.fit(X_train, y_train)
    best_sgd_reg = grid_search.best_estimator_

    print('=========================================[Best Hyperparameters info]=====================================')
    # No `scoring` is passed, so best_score_ is the regressor's default score (R^2), not MAE.
    print('Best score: %.3f' % grid_search.best_score_)
    print('Best Config: %s' % grid_search.best_params_)
    print('==========================================================================================================')

    # Main pipeline: scale X, then wrap the grid search in TransformedTargetRegressor.
    # No transformer is set here, so y is passed through unchanged.
    TTR_sgd_pipeline = Pipeline(steps=[('scaler', MinMaxScaler()),
                                       ('TTR', TransformedTargetRegressor(regressor=grid_search,
                                                                          check_inverse=False))
                                       ])
    TTR_sgd_pipeline.fit(X_train, y_train)

    # Render estimators as an HTML diagram (e.g. in a notebook).
    set_config(display="diagram")

![img](https://i.imgur.com/jKtaOQx.jpg)
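
One detail worth knowing with this variant: `TransformedTargetRegressor` fits a clone of whatever you pass as `regressor` and stores it as `regressor_`. A minimal sketch of where the search results end up, again assuming synthetic data, a tiny grid, and illustrative names (`search`, `pipe`, `fitted_search`) that are not part of the code above:

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (an assumption), used only so that the sketch runs on its own.
X_train, y_train = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Deliberately tiny grid to keep the run short.
search = GridSearchCV(Pipeline(steps=[('SGD', SGDRegressor(random_state=0))]),
                      {'SGD__alpha': [0.0001, 0.001]}, cv=3, refit=True)

pipe = Pipeline(steps=[
    ('scaler', MinMaxScaler()),
    ('TTR', TransformedTargetRegressor(regressor=search, check_inverse=False)),
])
pipe.fit(X_train, y_train)

# The fitted copy of the search is stored on the wrapper as `regressor_`.
fitted_search = pipe.named_steps['TTR'].regressor_
print(fitted_search.best_params_)

# The original `search` object was only cloned, so it remains unfitted.
print(hasattr(search, 'best_params_'))
```

So after fitting the pipeline, read `best_params_` from `regressor_` rather than from the object you passed in.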