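I am building an application in Python that predicts PM2.5 pollution values from a dataframe. I am using the values for November and want to start by building a linear regression model. How can I do the linear regression without using the dates? I only need to predict the PM2.5 values; the dates are known.
This is what I have tried so far: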
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
data = pd.read_csv("https://raw.githubusercontent.com/iulianastroia/csv_data/master/final_dataframe.csv")
data['day'] = pd.to_datetime(data['day'], dayfirst=True)
#Splitting the dataset into training(70%) and test(30%)
X_train, X_test, y_train, y_test = train_test_split(
    data['day'], data['pm25'], test_size=0.3, random_state=0
)
#Fitting Linear Regression to the dataset
lin_reg = LinearRegression()
lin_reg.fit(data['day'], data['pm25'])
This code raises the following error:
ValueError: Expected 2D array, got 1D array instead:
array=['2019-11-01T00:00:00.000000000' '2019-11-01T00:00:00.000000000'
'2019-11-01T00:00:00.000000000' ... '2019-11-30T00:00:00.000000000'
'2019-11-30T00:00:00.000000000' '2019-11-30T00:00:00.000000000'].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
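To make the shape complaint in the error concrete, here is a minimal sketch of the difference between a 1D and a 2D selection. The three-row dataframe is a hypothetical stand-in for the real CSV (only the column names `day` and `pm25` are taken from the question), so treat it as an illustration rather than the actual data.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV: same column names, made-up values.
data = pd.DataFrame({
    "day": pd.to_datetime(["2019-11-01", "2019-11-02", "2019-11-03"]),
    "pm25": [12.0, 15.5, 11.2],
})

one_d = data["day"]        # single brackets -> Series, 1D
two_d = data[["day"]]      # double brackets -> DataFrame, 2D with one column
reshaped = data["day"].to_numpy().reshape(-1, 1)  # 1D array turned into one column

print(one_d.shape)     # 1D: (n_samples,)
print(two_d.shape)     # 2D: (n_samples, 1)
print(reshaped.shape)  # 2D: (n_samples, 1)
```

scikit-learn estimators expect the feature matrix to be 2D (samples by features), which is why the single-bracket selection is rejected while either of the two 2D forms has the expected shape.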
lin_reg.fit(data[['day']], data['pm25']), note the double brackets. - Quang Hoang
Why not use X_train and y_train to fit your model? - petezurich
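Putting both comments together, one possible sketch looks like the following. It makes a few assumptions that are not in the original post: the data is synthetic (random values standing in for `final_dataframe.csv`), the helper column `day_num` is a name made up for this example, and the date is encoded as "days since the first date" because a linear model needs a numeric feature. Other encodings (day of month, ordinal date) would work in a similar way.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real CSV (assumed columns: 'day', 'pm25').
rng = np.random.default_rng(0)
days = pd.date_range("2019-11-01", "2019-11-30", freq="D").repeat(4)
data = pd.DataFrame({"day": days, "pm25": rng.uniform(5, 40, size=len(days))})

# Encode the date as a number: days elapsed since the first date.
# 'day_num' is a hypothetical column name introduced for this sketch.
data["day_num"] = (data["day"] - data["day"].min()).dt.days

X = data[["day_num"]]  # double brackets keep X two-dimensional
y = data["pm25"]

# Split into training (70%) and test (30%) sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit on the training split only, then evaluate on the held-out test split.
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
y_pred = lin_reg.predict(X_test)
print("R^2 on the test set:", lin_reg.score(X_test, y_test))
```

Fitting on `X_train` and `y_train` instead of the full dataframe keeps the test rows unseen, so the score on `X_test` is a fair estimate. With random synthetic values the score itself is meaningless; the point is only the shape handling and the train/test usage.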