sklearn: using Pipeline and TransformedTargetRegressor to scale both x (data) and y (target)

3

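I want to use Pipeline and TransformedTargetRegressor together to handle all of the scaling (data and target). Is it possible to combine Pipeline and TransformedTargetRegressor? And how do I get results out of the TransformedTargetRegressor?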

$ cat test_ttr.py
#!/usr/bin/python
# -*- coding: UTF-8 -*-

from sklearn.datasets import make_regression
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor

def main():
    x, y = make_regression()

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    model = linear_model.Ridge(alpha=1)

    pipe = Pipeline([('scale', preprocessing.StandardScaler()), ('model', model)])
    treg = TransformedTargetRegressor(regressor=pipe, transformer=preprocessing.MinMaxScaler())

    treg.fit(x_train, y_train)

    print(pipe.get_params()['model__alpha']) # OK !
    print(treg.get_params()['regressor__model__coef']) # KO ?!

if __name__ == '__main__':
    main()

But I can't get results (for example, the coefficients) out of the TransformedTargetRegressor:

1
Traceback (most recent call last):
  File ".\test_ttr.py", line 26, in <module>
    main()
  File ".\test_ttr.py", line 23, in main
    print(treg.get_params()['regressor__model__coef']) # KO ?!
TypeError: 'TransformedTargetRegressor' object is not subscriptable
2 Answers

4

The error is in this line of your code:

print(treg.get_params()['regressor__model__coef']) # KO ?!

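TransformedTargetRegressor has no parameter named 'regressor__model__coef': get_params() only returns the hyperparameters passed to the constructors, not attributes learned during fit such as coef_.

You can list all available parameters by calling treg.get_params(), which returns: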

{'check_inverse': True,
 'func': None,
 'inverse_func': None,
 'regressor': Pipeline(memory=None,
          steps=[('scale',
                  StandardScaler(copy=True, with_mean=True, with_std=True)),
                 ('model',
                  Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None,
                        normalize=False, random_state=None, solver='auto',
                        tol=0.001))],
          verbose=False),
 'regressor__memory': None,
 'regressor__model': Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
       random_state=None, solver='auto', tol=0.001),
 'regressor__model__alpha': 1,
 'regressor__model__copy_X': True,
 'regressor__model__fit_intercept': True,
 'regressor__model__max_iter': None,
 'regressor__model__normalize': False,
 'regressor__model__random_state': None,
 'regressor__model__solver': 'auto',
 'regressor__model__tol': 0.001,
 'regressor__scale': StandardScaler(copy=True, with_mean=True, with_std=True),
 'regressor__scale__copy': True,
 'regressor__scale__with_mean': True,
 'regressor__scale__with_std': True,
 'regressor__steps': [('scale',
   StandardScaler(copy=True, with_mean=True, with_std=True)),
  ('model',
   Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
         random_state=None, solver='auto', tol=0.001))],
 'regressor__verbose': False,
 'transformer': MinMaxScaler(copy=True, feature_range=(0, 1)),
 'transformer__copy': True,
 'transformer__feature_range': (0, 1)}
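
To make the difference between hyperparameters and fitted attributes concrete, here is a minimal sketch. It assumes a reasonably recent scikit-learn; the random_state=0 argument and the variable names are my additions for illustration, not part of the original post.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# random_state is an assumption added here so the run is repeatable
x, y = make_regression(random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge(alpha=1))])
treg = TransformedTargetRegressor(regressor=pipe, transformer=MinMaxScaler())
treg.fit(x, y)

params = treg.get_params()

# Constructor arguments (hyperparameters) are listed by get_params() ...
has_alpha = "regressor__model__alpha" in params

# ... but nothing learned during fit (coef_, intercept_, ...) appears there.
has_coef = any("coef" in key for key in params)

# fit() works on a clone, so the pipeline passed to the constructor stays unfitted.
template_fitted = hasattr(pipe.named_steps["model"], "coef_")

print(has_alpha, has_coef, template_fitted)
```

The fitted clone is stored on treg.regressor_, which is where learned attributes have to be looked up.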

You can obtain results such as the R2 score with:
treg.score(x_test, y_test)

which returns

0.7506837388137267

To make predictions, use:

treg.predict(x_test)
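
As a rough illustration of what predict does with the target scaling, the sketch below compares the wrapper's output with the same computation done by hand. It is a sketch under assumptions: random_state=0 and the variable names are mine, and it relies on the fitted attributes regressor_ and transformer_ that TransformedTargetRegressor exposes after fit.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# random_state is an assumption added here so the run is repeatable
x, y = make_regression(random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge(alpha=1))])
treg = TransformedTargetRegressor(regressor=pipe, transformer=MinMaxScaler())
treg.fit(x, y)

# Predictions from the wrapper are already mapped back to the original y scale.
pred = treg.predict(x)

# The same thing by hand: predict in the scaled target space,
# then undo the MinMaxScaler with the fitted transformer.
scaled_pred = treg.regressor_.predict(x)
manual = treg.transformer_.inverse_transform(scaled_pred.reshape(-1, 1)).ravel()

same = np.allclose(pred, manual)
print(same)
```

So there is no need to invert the target scaling yourself; score and predict on the wrapper both work in the original units of y.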

The documentation is very useful; you can read it here and here.


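Yes, of course. I am looking for a way to get any kind of information (coef_, ...) out of the TransformedTargetRegressor: it seems that only some of it can be retrieved, not all of it. - fghoussen
It isn't clear from your question what other information you want. What else do you need? - Kim Tang
Any information (input or output) that any kind of model may need: coef_, alpha, and so on. Or any other information, depending on the type of model used. - fghoussen
You can use treg.get_params() to get the alpha parameter and all the other information about the model, as I mentioned in my answer. Take a look at everything it returns; it is very detailed. - Kim Tang
For example, you can't get the values of coef_ or estimators_ (bagging, forests) from treg.get_params(). - fghoussen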

1

The best solution I have found (I'm not sure whether accessing members directly is good practice):

$ cat test_ttr.py
#!/usr/bin/python
# -*- coding: UTF-8 -*-

from sklearn.datasets import make_regression
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor

def main():
    x, y = make_regression()

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    model = linear_model.Ridge(alpha=1)

    pipe = Pipeline([('scale', preprocessing.StandardScaler()), ('model', model)])
    treg = TransformedTargetRegressor(regressor=pipe, transformer=preprocessing.MinMaxScaler())

    treg.fit(x_train, y_train)

    print(treg.regressor_['model'].coef_)
    print(treg.regressor_['model'].alpha)

if __name__ == '__main__':
    main()


$ python test_ttr.py
[-1.13077347e-02  4.44189754e-03  2.39262548e-03  1.72868998e-02
  9.98554629e-03  4.66877821e-02 -4.25349208e-03  1.94027088e-03
  5.64007062e-05  3.08491096e-03 -3.50818087e-05 -1.11165790e-02
 -6.67893402e-03 -3.01372675e-03  3.70455557e-03  5.05148384e-03
  9.39056280e-03  5.63774373e-03 -4.07545049e-03 -5.98363493e-03
 -8.21146459e-03  1.20560099e-02  5.79147139e-03 -3.87135045e-03
  3.62289162e-03 -5.32527728e-03  1.05227189e-02 -3.32636550e-03
  2.24062002e-02  5.36611024e-03  4.42517510e-03  2.98492436e-04
 -3.48722166e-03 -8.16323005e-03 -1.74921354e-03 -2.47793718e-03
  2.00056722e-02  9.02842425e-03 -4.22978758e-03  2.37737450e-03
 -7.93388529e-03  1.22910175e-02  1.34225568e-03 -3.51697078e-03
  4.20992326e-03  4.35675123e-03 -8.07619773e-04  1.13628592e-02
  4.12219590e-03  6.92190818e-03 -2.44482599e-03 -3.12429604e-03
 -5.43930166e-03  3.27253280e-02  4.11909724e-03  3.83302056e-03
  1.34754164e-02 -8.62591922e-04 -4.14770516e-03 -7.02794996e-03
 -2.04141679e-03 -8.93807591e-04 -1.50736158e-03  3.51801088e-03
 -1.26757035e-02 -8.46096567e-04  6.70465585e-02 -1.12191639e-02
  6.08120935e-03 -9.07017386e-03 -2.13280853e-03 -2.24764380e-03
  6.98012623e-03 -9.26042982e-03 -2.93708218e-03  5.74605237e-04
 -1.41308272e-03  5.24419314e-03  3.41054848e-02  7.80090716e-03
  7.33259527e-02 -4.78241365e-03  2.38806342e-04  3.84449219e-04
  5.49127586e-02 -6.91505707e-04 -4.14642042e-04  3.43961614e-03
  5.20966922e-04 -5.47828158e-03 -7.04740862e-04  4.68760531e-02
  4.12140344e-03 -5.16221700e-03 -7.35235898e-03  7.68674585e-03
 -4.39094201e-03  5.05034775e-03  5.75523532e-03 -6.17177294e-03]
1

Stack Overflow users, feel free to improve this answer if you can!


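This is a perfectly fine solution. - Mohsin hasan
In my opinion that is the right solution: regressor is actually a pipeline (and so is the fitted regressor, regressor_), so you have to go into it to access the learned parameters. Another way into the pipeline is through named_steps (treg.regressor_.named_steps['model'].coef_), but that is exactly equivalent to your solution. - amiola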
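
The two access paths discussed here, indexing the fitted pipeline by step name and going through named_steps, can be compared directly. This is a small sketch: random_state=0 and the variable names are my own additions, and it assumes a scikit-learn version in which Pipeline supports indexing by step name.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# random_state is an assumption added here so the run is repeatable
x, y = make_regression(random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge(alpha=1))])
treg = TransformedTargetRegressor(regressor=pipe, transformer=MinMaxScaler())
treg.fit(x, y)

# Path 1: index the fitted pipeline by step name.
model_by_index = treg.regressor_["model"]

# Path 2: go through the named_steps mapping.
model_by_name = treg.regressor_.named_steps["model"]

# Both return the very same fitted Ridge instance.
same_object = model_by_index is model_by_name
same_coef = np.array_equal(model_by_index.coef_, model_by_name.coef_)

print(same_object, same_coef, model_by_index.coef_.shape)
```

Note that these coefficients are expressed in the scaled feature and target spaces, since Ridge only ever sees the output of StandardScaler and the MinMaxScaler-transformed y.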
