Sklearn: ValueError: Found input variables with inconsistent numbers of samples: [1, 6]

Question

X = [ 1994.  1995.  1996.  1997.  1998.  1999.]
y = [1.2 2.3 3.4 4.5 5.6 6.7]
clf = LinearRegression()
clf.fit(X,y)

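This raises the error above. X and y are both NumPy arrays.

How can I get rid of this error?

I tried the approach given here, reshaping X and y with X.reshape((-1,1)) and y.reshape((-1,1)). However, it did not work.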

- humble

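How did you declare a NumPy array with that syntax? Do the values have to be separated by commas, or can it hold only a single value? You can use X = [1994, 1995, 1996, 1997, 1998, 1999]. - Yaman Ahlawat

Reshape X with X.reshape(-1,1); there is no need to reshape y. - Vivek Kumar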

a = [1994, 1995, 1996, 1997, 1998, 1999], converted to an array with X=np.array(a). The same goes for y. When I print X, it shows what I posted. - humble
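
A quick way to check the suggestion in the comments is to print the array shapes. This is a minimal sketch, assuming the arrays are built with np.array as described; X_2d is a name introduced here for illustration. scikit-learn estimators expect X to be 2-D, shaped (n_samples, n_features), while y can stay 1-D.

```python
import numpy as np

# Same values as in the question, built from plain lists.
X = np.array([1994., 1995., 1996., 1997., 1998., 1999.])
y = np.array([1.2, 2.3, 3.4, 4.5, 5.6, 6.7])

# X_2d is an illustrative name: one row per sample, one column per feature.
X_2d = X.reshape(-1, 1)

print(X.shape)     # (6,)   1-D, not a sample-by-feature matrix
print(X_2d.shape)  # (6, 1) six samples, one feature
print(y.shape)     # (6,)   a 1-D target is fine as it is
```

Note that reshape returns a new array and does not modify X in place, so the result has to be assigned or passed to fit directly.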

3 Answers

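I ran into a similar problem, input variables with inconsistent numbers of samples, when doing a train/test split. In my case, I solved it by passing the stratify parameter.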

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

- Patrick Muoka
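
For reference, here is a self-contained sketch of a stratified split. The data is made up (20 rows with hypothetical binary labels). stratify treats its argument as class labels, so it may not apply to a continuous regression target like the one in the question. Whatever the parameters, the split should return X and y pieces with matching lengths, which is worth checking before calling fit.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data for illustration: 20 samples, 1 feature, 2 balanced classes.
X = np.arange(20).reshape(-1, 1)
y = np.array([0, 1] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Each X piece should have as many rows as its matching y piece.
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```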

import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20

df_house = pd.read_csv('CSVFiles/kc_house_data.csv',index_col = 0,engine ='c')

df_house.drop(df_house.columns[[1, 0, 10, 11,12, 13, 14, 15, 16, 17,18]], axis=1, inplace=True)

reg=linear_model.LinearRegression()
df_y=df_house[df_house.columns[1:2]]


df_house.drop(df_house.columns[[6, 7, 8, 5]], axis=1, inplace=True)


x_train, x_test, y_train, y_test=train_test_split(df_house, df_y, test_size=0.1, random_state=7)

print(x_train.shape, y_train.shape)

reg.fit(x_train, x_test)

LinearRegression(copy_x=True, fit_intercept=True, n_jobs=1, normalize=False )

My Shape is :
(19451, 5) (19451, 1)

ValueError: Found input variables with inconsistent numbers of samples: [19451, 2162]

- Syed Hasan
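
Judging from the numbers in the message, 19451 is the number of training rows and 2162 the number of test rows. That suggests the problem is the call reg.fit(x_train, x_test): the test features are passed where the training target y_train belongs. Below is a small sketch that reproduces this on synthetic data; features and target are hypothetical stand-ins for df_house and df_y.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins (assumed for illustration): 100 rows, 5 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 5))
target = features @ np.array([1.0, 2.0, 3.0, 4.0, 5.0])

x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.1, random_state=7)

reg = LinearRegression()

# Passing x_test as the target mixes 90 training rows with 10 test rows.
try:
    reg.fit(x_train, x_test)
    message = ""
except ValueError as err:
    message = str(err)
print(message)

# Fitting on the matching pair works: same number of rows in X and y.
reg.fit(x_train, y_train)
score = reg.score(x_test, y_test)
print(score)
```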

- Vivek Kumar · Accepted Answer

This works for me. Before reshaping, make sure the arrays are NumPy arrays.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.asarray([ 1994.,  1995.,  1996.,  1997.,  1998.,  1999.])
y = np.asarray([1.2, 2.3, 3.4, 4.5, 5.6, 6.7])

clf = LinearRegression()
clf.fit(X.reshape(-1,1),y)


clf.predict([[1997]])  # predict expects 2-D input, one row per sample
#Output: array([ 4.5])

clf.predict([[2001]])
#Output: array([ 8.9])
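
As for where the [1, 6] in the message comes from: it is consistent with X being read as 1 sample with 6 values while y has 6 samples. The sketch below illustrates that using sklearn.utils.check_consistent_length, the scikit-learn utility that compares sample counts. Treating X as a single row via reshape(1, -1) is an assumption made here to mimic the situation; recent scikit-learn versions may instead reject a 1-D X with a different message.

```python
import numpy as np
from sklearn.utils import check_consistent_length

X = np.asarray([1994., 1995., 1996., 1997., 1998., 1999.])
y = np.asarray([1.2, 2.3, 3.4, 4.5, 5.6, 6.7])

# Column layout: 6 samples in X and 6 in y, so the check passes silently.
check_consistent_length(X.reshape(-1, 1), y)

# Row layout (assumed here for illustration): 1 sample in X against 6 in y.
try:
    check_consistent_length(X.reshape(1, -1), y)
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```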