Predicting future values after polynomial regression in Python

python tensorflow machine-learning scikit-learn

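I'm currently using TensorFlow and SkLearn to try to build a model that predicts the sales of a product X based on the outdoor temperature in degrees Celsius. I use the temperature dataset as my x variable and the sales as my y variable. As the figure below shows, there is some correlation between temperature and sales:

First, I tried a linear regression to see how well it fits. Here is the code: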

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(x_train, y_train) #fit tries to fit the x variable and y variable.

#Let's try to plot it out.
y_pred = model.predict(x_train)

plt.scatter(x_train,y_train)
plt.plot(x_train,y_pred,'r')
plt.legend(['Predicted Line', 'Observed data'])
plt.show()

This gives a prediction line that fits rather poorly:

A very nice thing is that sklearn provides a way to predict a value from a temperature, so if I write:

model.predict(15)

I get the output:

array([6949.05567873])

This is exactly what I want; I'd just like the line to fit better, so I tried polynomial regression with sklearn as follows:

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=8, include_bias=False) #the bias is avoiding the need to intercept
x_new = poly.fit_transform(x_train)
new_model = LinearRegression()
new_model.fit(x_new,y_train)

#plotting
y_prediction = new_model.predict(x_new) #this actually predicts x...?
plt.scatter(x_train,y_train)
plt.plot(x_new[:,0], y_prediction, 'r')
plt.legend(['Predicted line', 'Observed data'])
plt.show()

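Now the line seems to fit better:

My problem now is that I can't use new_model.predict(x), since it results in "ValueError: shapes (1,1) and (8,) not aligned: 1 (dim 1) != 8 (dim 0)". I understand this is because I'm using a degree-8 polynomial, but is there a way to predict the y value from a single temperature using the polynomial regression model?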

- Thomas

Could you try new_model.predict([x for _ in range(8)])? - Sheldore

If I write new_model.predict([[30 for x_train in range(8)]]), I do get an output, but it is **array([2862.55322278])**. According to the model, I would expect an output above 15k. Any idea why I get such a low number? - Thomas

1 Answer

- Manny · Accepted Answer

Try new_model.predict([[x**a for a in range(1,9)]]) or, following the code you already have, new_model.predict(poly.transform([[x]])). In both cases the input must be 2D, with one row per sample.

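Since you fit a line of the form:

y = a*x^1 + b*x^2 + ... + h*x^8

you need to transform your input in the same way, i.e. turn it into the polynomial terms x^1 through x^8, with no constant (bias) column and without the coefficients. That is what you passed to the linear regression's fit function, which learns the coefficients a through h (plus an intercept, which LinearRegression fits by default). The plot you showed uses only the x^1 term you indexed into (x_new[:,0]), which means the data you are using has more columns than that one.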
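The transformation described above can be checked directly. The following is a minimal sketch, assuming a single input feature and an illustrative value x = 2.0 (not taken from the question); it compares the row produced by PolynomialFeatures with the powers written out by hand.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Same settings as in the question: degree 8, no constant (bias) column.
poly = PolynomialFeatures(degree=8, include_bias=False)
poly.fit(np.zeros((1, 1)))  # fitting only records the number of input columns

x = 2.0  # illustrative input value (an assumption, not from the question)

# What the transformer produces for one sample with one feature.
features = poly.transform([[x]])

# The same row built by hand: x^1, x^2, ..., x^8.
manual = np.array([[x ** a for a in range(1, 9)]])

print(features.shape)                 # (1, 8)
print(np.allclose(features, manual))  # True
```

This also suggests why passing the same number eight times gives a misleading prediction: the model expects the powers of x in its eight columns, not eight copies of x.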

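One final note: always make sure your training data and your future/validation data go through the same preprocessing steps, so that the model works correctly.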
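One way to keep the preprocessing identical for training and prediction is to bundle both steps into a scikit-learn pipeline. This is a sketch and not part of the original answer; the data is a synthetic stand-in, and the names pipeline, rng and prediction are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (an assumption): 200 temperatures and noisy sales.
rng = np.random.default_rng(0)
x_train = rng.uniform(-5, 35, size=(200, 1))
y_train = 500 * x_train[:, 0] + rng.normal(0, 100, size=200)

# The pipeline expands the features and fits the regression in one object.
pipeline = make_pipeline(
    PolynomialFeatures(degree=8, include_bias=False),
    LinearRegression(),
)
pipeline.fit(x_train, y_train)

# predict() takes raw temperatures; the expansion is applied internally.
prediction = pipeline.predict([[15.0]])
print(prediction.shape)  # (1,)
```

With this arrangement there is no separate transform call to forget, so a raw temperature such as 15 can be passed straight to predict as long as it is wrapped as a 2D row.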

Here are some details:

Let's start by running your code on synthetic data.

from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from numpy.random import rand

x_train = rand(1000,1)
y_train = rand(1000,1)

poly = PolynomialFeatures(degree=8, include_bias=False) # no constant column; LinearRegression fits the intercept itself
x_new = poly.fit_transform(x_train)
new_model = LinearRegression()
new_model.fit(x_new,y_train)

#plotting
y_prediction = new_model.predict(x_new) #this predicts y
plt.scatter(x_train,y_train)
plt.plot(x_new[:,0], y_prediction, 'r')
plt.legend(['Predicted line', 'Observed data'])
plt.show()

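Now we can predict a y value by transforming an x value into the same degree-8 polynomial terms, without the bias column. The exact number varies from run to run because the data is random.

print(new_model.predict(poly.transform([[0.25]])))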

[[0.47974408]]
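The same call works for several inputs at once. Below is a small sketch under stated assumptions: the data is seeded so the run is repeatable, and x_future and y_future are hypothetical names for a batch of new inputs and their predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Seeded synthetic data so the run is repeatable (the seed is an assumption).
rng = np.random.default_rng(42)
x_train = rng.random((1000, 1))
y_train = rng.random((1000, 1))

poly = PolynomialFeatures(degree=8, include_bias=False)
new_model = LinearRegression().fit(poly.fit_transform(x_train), y_train)

# Five evenly spaced inputs, reshaped to one column: shape (5, 1).
x_future = np.linspace(0.0, 1.0, 5).reshape(-1, 1)

# transform (not fit_transform) reuses the already fitted transformer.
y_future = new_model.predict(poly.transform(x_future))
print(y_future.shape)  # (5, 1)
```

Each row of x_future is one sample, so the prediction has one row per input value.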