How to fix "ValueError: Expected 2D array, got 1D array instead" in sklearn/python?

Question

6

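Hello. I have just started learning machine learning and want to work through a simple example. I would like to use a classifier to classify the files on my hard disk by file type. The code I have written is: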

import sklearn
import numpy as np


#Importing a local data set from the desktop
import pandas as pd
mydata = pd.read_csv('file_format.csv',skipinitialspace=True)
print mydata


x_train = mydata.script
y_train = mydata.label

#print x_train
#print y_train
x_test = mydata.script

from sklearn import tree
classi = tree.DecisionTreeClassifier()

classi.fit(x_train, y_train)

predictions = classi.predict(x_test)
print predictions

I get the following error:

  script  class  div   label
0       5      6    7    html
1       0      0    0  python
2       1      1    1     csv
Traceback (most recent call last):
  File "newtest.py", line 21, in <module>
    classi.fit(x_train, y_train)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 116, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 410, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 5.  0.  1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

It would be really helpful if someone could help me fix this code!

- Karthik Bhojaraj

2

Hint: read the error message. - Julien

6 Answers

3

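from sklearn.linear_model import LinearRegression

# dataset is assumed to be an already loaded pandas DataFrame
X = dataset.iloc[:, 0].values   # 1D array, shape (n_samples,)
y = dataset.iloc[:, 1].values

regressor = LinearRegression()
X = X.reshape(-1, 1)            # reshape returns a new array, so assign it back
regressor.fit(X, y)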

This is the code I used. reshape is not an in-place operation, so the reshaped result has to be assigned back to the variable, as shown above.
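To illustrate the point about reshape not working in place, here is a minimal sketch with a made-up three-element array (the values and variable names are only for illustration):

```python
import numpy as np

X = np.array([5.0, 0.0, 1.0])    # 1D array, shape (3,)

X.reshape(-1, 1)                 # the result is discarded; X itself is unchanged
shape_after_bare_call = X.shape

X = X.reshape(-1, 1)             # assigning the result back keeps the 2D version
shape_after_assignment = X.shape

print(shape_after_bare_call)
print(shape_after_assignment)
```

Only the second call changes what X refers to, which is why the assignment matters before calling fit.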

- Ameya Marathe

1

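You need to create a 2D array.

You may be passing the input like this:

model.predict([1,2,0,4])

But this is wrong.

You need to pass it like this:

model.predict([[1,2,0,4]])

Note the two pairs of square brackets instead of one.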
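A small sketch of the difference, assuming a reasonably recent scikit-learn where 1D input raises ValueError. The training data and the choice of DecisionTreeClassifier are made up for illustration; any estimator would behave similarly:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data (hypothetical): 3 samples with 4 features each
X_train = [[1, 2, 0, 4], [0, 0, 1, 1], [3, 1, 2, 0]]
y_train = ["a", "b", "c"]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

try:
    model.predict([1, 2, 0, 4])              # 1D input is rejected
    raised = False
except ValueError:
    raised = True

prediction = model.predict([[1, 2, 0, 4]])   # one sample wrapped in an outer list: 2D
print(raised, prediction)
```

The outer list is the list of samples, the inner list is the feature vector of one sample.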

- Gaurav Singh Rathore

0

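Suppose you initially have:

X = dataset.iloc[:, 1].values

This selects the column at index 1 for all rows and returns it as a 1D array. Now change it to:

X = dataset.iloc[:, 1:2].values

Here 1:2 means the half-open range [1, 2): the upper bound is excluded, so you still get only the column at index 1, but because it is a slice the result stays 2D.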
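The shapes can be checked directly. This is a sketch on a small made-up DataFrame (the column names a, b, c are hypothetical):

```python
import pandas as pd

dataset = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

one_d = dataset.iloc[:, 1].values      # integer index: shape (3,)
two_d = dataset.iloc[:, 1:2].values    # slice: shape (3, 1)

print(one_d.shape, two_d.shape)
```

Both hold the same values from column b; only the number of dimensions differs.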

- codexaxor

0

You can easily make it 2D when selecting the column.

x_train = mydata[['script']]
y_train = mydata[['label']]
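As a rough end-to-end check, the sketch below rebuilds the three rows printed in the question inline (instead of reading file_format.csv) and fits the same classifier. It keeps y as a 1D Series, which scikit-learn also accepts; random_state=0 is an addition for reproducibility:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The three rows shown in the question's output, rebuilt by hand
mydata = pd.DataFrame({
    "script": [5, 0, 1],
    "class": [6, 0, 1],
    "div": [7, 0, 1],
    "label": ["html", "python", "csv"],
})

x_train = mydata[["script"]]   # double brackets: DataFrame with shape (3, 1)
y_train = mydata["label"]      # single brackets: Series with shape (3,)

classi = DecisionTreeClassifier(random_state=0)
classi.fit(x_train, y_train)

# Predicting on the training rows only to show that the call now works
predictions = classi.predict(x_train)
print(predictions)
```

With the double brackets in place, fit no longer raises the "Expected 2D array" error.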

- sameer_nubia

0

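A simple solution that gets the shape right automatically is, instead of using:

X=dataset.iloc[:, 0].values

you can use:

X=dataset.iloc[:, :-1].values

This fits the case where you have only two columns and want the first one: the code selects every column except the last, and because it is a slice the result stays 2D.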

- Nabreezy

- cs95 · Accepted Answer

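When passing input to the classifier, pass a 2D array (of shape (M, N), where N >= 1), not a 1D array (of shape (N,)). The error message is quite clear:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.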

from sklearn.model_selection import train_test_split

# X.shape should be (N, M) where M >= 1
X = mydata[['script']]
# y can stay 1D, with shape (N,)
y = mydata['label']
# perform label encoding if "label" contains strings
# y = pd.factorize(mydata['label'])[0]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
...

clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
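To make the two reshape calls from the error message concrete, here is a minimal sketch using the three values shown in the traceback (the variable names are only for illustration):

```python
import numpy as np

array = np.array([5.0, 0.0, 1.0])

as_samples = array.reshape(-1, 1)    # three samples, one feature each
as_features = array.reshape(1, -1)   # one sample with three features

print(as_samples.shape, as_features.shape)
```

For the question's data, where script is a single feature column, the first form is the one that matches.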

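Some other helpful tips:

Split your data into proper training and test portions. Do not test on the training data; that gives an inaccurate estimate of your classifier's strength.
I recommend factorizing your labels so that you are working with integers. It's just easier.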
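For the factorizing tip, a possible sketch with pd.factorize on a few made-up labels (the label values are hypothetical, chosen to resemble the question's data):

```python
import pandas as pd

labels = pd.Series(["html", "python", "csv", "html"])

# codes holds one integer per row; uniques maps each integer back to its label
codes, uniques = pd.factorize(labels)

print(codes)
print(list(uniques))

# Integer predictions can be turned back into labels by indexing uniques
decoded = list(uniques[codes])
```

Keeping uniques around makes it possible to report predictions with the original label names.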