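I used read_sql_query to generate a pandas DataFrame from a database. It has three columns: "result", "speed", and "weight".
I want to use scikit-learn's LinearRegression to fit result = f(speed, weight).
I have not yet found the right syntax for passing this DataFrame, or slices of its columns, to LinearRegression.fit(y, X).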
print(df['result'].shape)
print(df[['speed', 'weight']].shape)
(8L,)
(8, 2)
But I cannot pass it to fit:
lm.fit(df['result'], df[['speed', 'weight']])
It raises a DeprecationWarning and a ValueError:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19.
ValueError: Found arrays with inconsistent numbers of samples: [1 8]
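One plausible reading of the `[1 8]` in that error, assuming the scikit-learn version in use promoted a 1-D first argument to a single row (which is what the deprecation warning hints at): the `result` column, passed where the feature matrix belongs, is seen as 1 sample with 8 features, while the second argument has 8 rows. The sketch below reproduces only that shape arithmetic with NumPy; the array names and values are illustrative, not taken from the question.

```python
import numpy as np

# Stand-ins for df['result'] and df[['speed', 'weight']] (values are arbitrary).
result = np.arange(8)                    # 1-D, shape (8,)
features = np.arange(16).reshape(8, 2)   # 2-D, shape (8, 2)

# Assumption: a 1-D array passed as X is treated as one row of 8 features.
as_single_row = np.atleast_2d(result)

print(as_single_row.shape)   # (1, 8): 1 sample
print(features.shape[0])     # 8: 8 samples, hence the mismatch
```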
How can I pass the target and feature DataFrames to fit efficiently and cleanly?
Here is how I generated the example:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
data2 = np.random.randint(1, high=100, size=len(days))
data3 = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'test': days, 'result': data, 'speed': data2, 'weight': data3})
df = df.set_index('test')
print(df)
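With a frame like this one, a minimal sketch of the call, assuming the only problem is the argument order: `LinearRegression.fit` takes the feature matrix first and the target second, i.e. `fit(X, y)`. The frame is rebuilt here without the date index so the block runs on its own; `rng`, `lm`, `X`, and `y` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Small random frame with the same three columns as in the question.
rng = np.random.RandomState(1111)
df = pd.DataFrame({
    'result': rng.randint(1, 100, size=8),
    'speed': rng.randint(1, 100, size=8),
    'weight': rng.randint(1, 100, size=8),
})

X = df[['speed', 'weight']]   # 2-D features, shape (8, 2)
y = df['result']              # 1-D target, shape (8,)

lm = LinearRegression()
lm.fit(X, y)                  # features first, target second

print(lm.coef_)               # one coefficient per feature column
print(lm.intercept_)
```

Selecting the features with a list of column names (`df[['speed', 'weight']]`) keeps them two-dimensional, which is the shape `fit` expects for `X`.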
df['result'].values
sometimes you need df.iloc[:, :-1]
- Shihe Zhang