Scikit-learn: My linear regression is a jumbled mess instead of a straight line

3

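I am trying to plot a simple regression line, but the result is a jumbled mess. Is this because I fitted the model with two features, so that the only appropriate visualization would be a 3D plane?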

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

# prepare data
# (load_boston was removed in scikit-learn 1.2, so this needs an older release)
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)[['AGE','RM']]
y = boston.target

# split dataset into training and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=33)

# apply linear regression on dataset
lm = LinearRegression()
lm.fit(X_train, y_train)
pred_train = lm.predict(X_train)
pred_test = lm.predict(X_test)

#plot relationship between RM and price
plt.scatter(X_train['RM'],
            y_train,
            c='g',
            s=40,
            alpha=0.5)
plt.plot(X_train['RM'], pred_train, color='r')
plt.title('Relationship between RM and Price')
plt.ylabel('Price')
plt.xlabel('RM')


2 Answers

4
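You are right. You are training on two features, AGE and RM, but plotting against only one of them, RM, in a 2D graph. Because each prediction also depends on AGE, the predictions plotted against RM alone do not fall on a single line. Try a 3D plot instead. In general, linear regression with two features yields a plane. It is still a linear regression, which is why we use the term "hyperplane": with one feature it reduces to a line, with two features to a plane, and so on.
Here is the 3D output: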
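# add_subplot(projection='3d') is used here because recent Matplotlib releases no longer accept gca(projection='3d')
plt3d = plt.figure().add_subplot(projection='3d')
plt3d.view_init(azim=135)
plt3d.plot_trisurf(X_train['RM'].values, X_train['AGE'].values, pred_train, alpha=0.7, antialiased=True)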
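
To make the "two features give a plane" point concrete, here is a minimal sketch that evaluates a fitted model on a regular AGE x RM grid and draws it as a surface. It uses synthetic stand-in data rather than the Boston dataset, and the coefficients used to generate the data, the grid size, and the helper name make_plane_grid are assumptions made for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption): two columns playing the roles of AGE and RM.
rng = np.random.default_rng(0)
age = rng.uniform(0, 100, size=200)
rm = rng.uniform(4, 9, size=200)
X = np.column_stack([age, rm])  # column order: AGE, RM
y = -0.05 * age + 8.0 * rm - 25.0 + rng.normal(0, 2.0, size=200)

lm = LinearRegression().fit(X, y)


def make_plane_grid(model, age_values, rm_values, n=20):
    """Evaluate the fitted model on a regular n x n grid of (AGE, RM) pairs."""
    age_grid, rm_grid = np.meshgrid(
        np.linspace(age_values.min(), age_values.max(), n),
        np.linspace(rm_values.min(), rm_values.max(), n),
    )
    grid_points = np.column_stack([age_grid.ravel(), rm_grid.ravel()])
    pred_grid = model.predict(grid_points).reshape(age_grid.shape)
    return age_grid, rm_grid, pred_grid


age_grid, rm_grid, pred_grid = make_plane_grid(lm, age, rm)

if __name__ == "__main__":
    ax = plt.figure().add_subplot(projection="3d")
    ax.scatter(rm, age, y, c="g", alpha=0.5)  # observed points
    ax.plot_surface(rm_grid, age_grid, pred_grid, color="r", alpha=0.4)  # fitted plane
    ax.set_xlabel("RM")
    ax.set_ylabel("AGE")
    ax.set_zlabel("Price")
    plt.show()
```

Every grid prediction equals lm.intercept_ + lm.coef_[0] * AGE + lm.coef_[1] * RM, which is why the surface is flat.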



0
The problem is that the values must be in sorted order when you plot them:
plt.plot(np.sort(X_train['RM']), np.sort(pred_train), color='r')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

# prepare data
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)[['AGE','RM']]
y = boston.target

# split dataset into training and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=33)

# apply linear regression on dataset
lm = LinearRegression()
lm.fit(X_train, y_train)
pred_train = lm.predict(X_train)
pred_test = lm.predict(X_test)

#plot relationship between RM and price
plt.scatter(X_train['RM'],
            y_train,
            c='g',
            s=40,
            alpha=0.5)
plt.plot(np.sort(X_train['RM']), np.sort(pred_train), color='r')
plt.title('Relationship between RM and Price')
plt.ylabel('Price')
plt.xlabel('RM')
plt.show()

Result: output-plot

A 3D plot may make it easier to visualize the relationship between RM and AGE: 3d-plot
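
One caveat about the sorted plot: np.sort reorders the RM values and the predictions separately, so a plotted point no longer necessarily pairs an RM value with the prediction for the same row. An alternative, sketched below, is to draw a one-dimensional slice of the fitted plane by holding AGE at its mean and varying only RM. The synthetic data, the choice of the mean as the fixed AGE value, and the helper name rm_slice are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption): two columns playing the roles of AGE and RM.
rng = np.random.default_rng(1)
age = rng.uniform(0, 100, size=200)
rm = rng.uniform(4, 9, size=200)
X = np.column_stack([age, rm])  # column order: AGE, RM
y = -0.05 * age + 8.0 * rm - 25.0 + rng.normal(0, 2.0, size=200)

lm = LinearRegression().fit(X, y)


def rm_slice(model, age_values, rm_values, n=50):
    """Predict along an evenly spaced RM axis with AGE fixed at its mean."""
    rm_line = np.linspace(rm_values.min(), rm_values.max(), n)
    age_fixed = np.full(n, age_values.mean())
    pred_line = model.predict(np.column_stack([age_fixed, rm_line]))
    return rm_line, pred_line


rm_line, pred_line = rm_slice(lm, age, rm)

if __name__ == "__main__":
    plt.scatter(rm, y, c="g", s=40, alpha=0.5)
    plt.plot(rm_line, pred_line, color="r")  # straight line with slope lm.coef_[1]
    plt.title("Relationship between RM and Price (AGE held at its mean)")
    plt.xlabel("RM")
    plt.ylabel("Price")
    plt.show()
```

Because AGE is constant along the slice, the red line is exactly straight and its slope is the fitted RM coefficient.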


Here is a Stack Overflow question where my answer provides Python code for surface fitting, with a 3D scatter plot, a 3D surface plot, and a contour plot: https://stackoverflow.com/questions/55030369/python-surface-fitting-of-variables-of-different-dimensionto-get-unknown-paramet - James Phillips
