Preprocessing in scikit-learn - single sample - deprecation warning

Question

After a fresh install of Anaconda on Ubuntu, I'm using scikit-learn to preprocess my data in several ways before running a classification task.

from sklearn import preprocessing

scaler = preprocessing.MinMaxScaler().fit(train)
train = scaler.transform(train)    
test = scaler.transform(test)

This all works fine, but if I have a new sample (temp below) that I want to classify (and therefore want to preprocess in the same way), I run:

temp = [1,2,3,4,5,5,6,....................,7]
temp = scaler.transform(temp)

and then I get a deprecation warning...

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 
and will raise ValueError in 0.19. Reshape your data either using 
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
if it contains a single sample.

So the question is: how should I rescale a single sample like this?

I suppose an alternative (not a very good one) would be...

temp = [temp, temp]
temp = scaler.transform(temp)
temp = temp[0]

But I'm sure there is a better way.
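For comparison, here is a minimal sketch showing that the workaround above and a direct reshape to one row produce the same scaled sample. The training data and the values in temp are hypothetical; only MinMaxScaler, transform and reshape come from the thread.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: 3 samples x 4 features
train = np.array([[0, 0, 0, 0],
                  [5, 10, 20, 40],
                  [10, 20, 40, 80]], dtype=float)
scaler = MinMaxScaler().fit(train)

temp = [5, 5, 10, 20]  # hypothetical new sample with 4 features

# Workaround from the question: duplicate the sample, transform, keep row 0
workaround = scaler.transform([temp, temp])[0]

# Direct route: make a (1, n_features) array, transform, take row 0
direct = scaler.transform(np.array(temp).reshape(1, -1))[0]

print(np.allclose(workaround, direct))  # True
```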

- Chris Arthur

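Well... you've already answered it yourself. It's right there in the warning message: use X.reshape(-1, 1) if your data has a single feature, and X.reshape(1, -1) if it contains a single sample. If your data isn't a numpy array, convert it first with np.array(data). - pzelasko

8 Answers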

Well, it actually looks like the warning is telling you what to do.

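As part of the uniform interface of sklearn.pipeline stages, the rule of thumb is:

When you see X, it should be an np.array with two dimensions
When you see y, it should be an np.array with a single dimension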
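To illustrate that rule of thumb, a small sketch with a classifier: X is 2-D (n_samples, n_features), y is 1-D (n_samples,), and even a single new sample is passed as a 2-D array with one row. KNeighborsClassifier and the data are arbitrary choices made for this illustration, not part of the answer.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# X: 2-D, shape (n_samples, n_features); y: 1-D, shape (n_samples,)
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [5.0, 5.0],
              [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
print(X.shape, y.shape)  # (4, 2) (4,)

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# A single new sample is still 2-D: one row, n_features columns
sample = np.array([4.5, 5.5]).reshape(1, -1)
print(clf.predict(sample))  # [1]
```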

So you should consider the following:

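import numpy as np

temp = [1,2,3,4,5,5,6,....................,7]
# This makes it into a 2d array of shape (len(temp), 1): one column, i.e. one feature
# (for a single sample with many features, use .reshape((1, -1)) instead)
temp = np.array(temp).reshape((len(temp), 1))
temp = scaler.transform(temp)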
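The orientation matters when the scaler was fitted on several features, as in the question. A sketch with hypothetical data: a row of shape (1, 3) is transformed as one sample, while a column of shape (3, 1) is read as three samples with one feature each, which, in the scikit-learn versions I'm aware of, transform rejects with a ValueError because the feature count doesn't match what was seen in fit.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical scaler fitted on data with 3 features
train = np.array([[0.0, 0.0, 0.0],
                  [10.0, 20.0, 30.0]])
scaler = MinMaxScaler().fit(train)

temp = [5.0, 10.0, 15.0]  # one new sample with 3 features

row = np.array(temp).reshape(1, -1)     # shape (1, 3): 1 sample, 3 features
column = np.array(temp).reshape(-1, 1)  # shape (3, 1): 3 samples, 1 feature each

row_scaled = scaler.transform(row)      # works: feature count matches
print(row_scaled)

column_failed = False
try:
    scaler.transform(column)            # 1 feature vs. the 3 seen in fit
except ValueError as err:
    column_failed = True
    print("ValueError:", err)
```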

- Ami Tavory

What is the 'np' object? - Michał Tajchert

@Tajchert Sorry - import numpy as np. - Ami Tavory

I've only just started learning Python, so that wasn't obvious. - Michał Tajchert

# This makes it into a 2d array
temp = [1,2,3,4,5,5,6,....................,7]
# one instance
temp = np.array(temp).reshape((1, -1))
print(model.predict(temp))
- Manoj Kumar

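Does this mean sklearn has decided to stop supporting native Python lists? Is there a way to do this without numpy? - Eb Abadi

You can simplify the second command by letting numpy infer the unknown dimension with -1, e.g.: temp = np.array(temp).reshape((-1, 1)). - thanos.a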

This might help:

temp = ([[1,2,3,4,5,6,.....,7]])
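This also answers the comment about native lists: a list of lists is already 2-D, so no numpy call is needed on the input side. A sketch with hypothetical values (the elided list above is left as-is); note that transform still returns a NumPy array.

```python
from sklearn.preprocessing import MinMaxScaler

# Plain Python lists: the outer list holds the samples (rows)
train = [[1, 10],
         [3, 30]]
scaler = MinMaxScaler().fit(train)

# One sample written as a list containing one row is already 2-D
temp = [[2, 20]]
scaled = scaler.transform(temp)  # NumPy array of shape (1, 2)
print(scaled)
```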

- Bharath M Shetty

.values.reshape(-1,1) is accepted without any warning

.reshape(-1,1) is accepted, but with a deprecation warning
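The difference is presumably that `.values` hands back a plain NumPy array, whereas calling `.reshape` directly on a pandas Series went through pandas' own `Series.reshape`, which pandas deprecated. A minimal sketch of the warning-free route, assuming a hypothetical single-column Series `s`; `Series.to_numpy()` (pandas 0.24+) is an equivalent I'm adding, not something from the answer.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])  # hypothetical single-feature column

col = s.values.reshape(-1, 1)        # .values is a 1-D ndarray; make it (4, 1)
col2 = s.to_numpy().reshape(-1, 1)   # same result via to_numpy()
print(s.values.shape, col.shape, col2.shape)  # (4,) (4, 1) (4, 1)
```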

- Analytics

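-1 stands for an unknown dimension of the array, which numpy infers from the array's size and the other dimensions. Read more about the "newshape" parameter in the numpy.reshape documentation.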

# X is a 1-d ndarray

# If we want a COLUMN vector (many/one/unknown samples, 1 feature)
X = X.reshape(-1, 1)

# If we want a ROW vector (one sample, many/one/unknown features)
X = X.reshape(1, -1)
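A quick sketch of how numpy fills in the -1 for a hypothetical 1-D array of six values; at most one dimension may be -1, and reshape(-1) flattens back to 1-D.

```python
import numpy as np

X = np.arange(6)          # 1-D, shape (6,)
col = X.reshape(-1, 1)    # -1 inferred as 6 -> shape (6, 1)
row = X.reshape(1, -1)    # -1 inferred as 6 -> shape (1, 6)
grid = X.reshape(-1, 2)   # -1 inferred as 3 -> shape (3, 2)

print(col.shape, row.shape, grid.shape)  # (6, 1) (1, 6) (3, 2)
print(col.reshape(-1).shape)             # back to 1-D: (6,)
```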

- Kuntal Bhattacharjee

You can always reshape, for example:

import numpy as np

temp = np.array([1,2,3,4,5,5,6,7])  # a plain list has no .reshape method

temp = temp.reshape(len(temp), 1)

because the main issue is that your temp.shape is (8,) when what you need is (8, 1).
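Besides reshape, numpy has two other standard ways to add the missing axis; they're shown here as alternatives I'm adding, not as part of the answer. Indexing with np.newaxis gives the (8, 1) column, and np.atleast_2d gives a (1, 8) row, i.e. the single-sample orientation.

```python
import numpy as np

temp = np.array([1, 2, 3, 4, 5, 5, 6, 7])
print(temp.shape)                     # (8,)

as_column = temp[:, np.newaxis]       # same as temp.reshape(-1, 1) -> (8, 1)
as_row = np.atleast_2d(temp)          # same as temp.reshape(1, -1) -> (1, 8)
print(as_column.shape, as_row.shape)  # (8, 1) (1, 8)
```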

- Francisco Pereira

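I had the same problem and got the same deprecation warning. I was using a numpy array of shape [23, 276] when I got this message. I tried reshaping as the warning suggested, but got nowhere. Then I took each row from the numpy array (since I was iterating over it) and assigned it to a list variable. That worked fine, without any warning.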

array = []
array.append(temp[0])  # a list holding one row -> read as one sample (2-D)

You can then use the Python list object (here 'array') as input to scikit-learn functions. It's not the most efficient solution, but it works for me.
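The reason this works is that an integer index like temp[0] drops the row axis, and wrapping it in a list puts that axis back. A sketch using a hypothetical zero array with the shape mentioned above; slicing with temp[0:1] or indexing with a list keeps the array 2-D directly.

```python
import numpy as np

temp = np.zeros((23, 276))  # hypothetical array with the shape from the answer

print(temp[0].shape)    # (276,)   integer index drops the row axis -> 1-D
print(temp[0:1].shape)  # (1, 276) a slice keeps it -> 2-D, one sample
print(temp[[0]].shape)  # (1, 276) indexing with a list also keeps it 2-D

as_list = [temp[0]]     # the answer's approach: a list holding one row
print(np.array(as_list).shape)  # (1, 276)
```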

- shan89

import pandas as pd
from sklearn.linear_model import LinearRegression

# df is assumed to be a pandas DataFrame with columns 'x_1' and 'target'
X = df[['x_1']]
X_n = X.values.reshape(-1, 1)
y = df['target']
y_n = y.values
model = LinearRegression()
model.fit(X_n, y_n)

y_pred = pd.Series(model.predict(X_n), index=X.index)
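Note that df[['x_1']] (double brackets) is already a one-column DataFrame of shape (n, 1), so the extra reshape is harmless but not needed; single brackets give a 1-D Series, which is the y-shaped input. A sketch with a hypothetical DataFrame following y = 2x + 1:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data following target = 2 * x_1 + 1
df = pd.DataFrame({"x_1": [0.0, 1.0, 2.0, 3.0],
                   "target": [1.0, 3.0, 5.0, 7.0]})

print(df["x_1"].shape)    # (4,)   single brackets -> Series, 1-D
print(df[["x_1"]].shape)  # (4, 1) double brackets -> DataFrame, 2-D

model = LinearRegression().fit(df[["x_1"]], df["target"])  # no reshape needed
print(model.coef_, model.intercept_)  # approximately [2.] and 1.0
```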

- gregor256

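When answering an old question that already has highly upvoted and accepted answers, please take the time to explain how your new answer contributes to the topic. Please edit your answer to explain why it is better/new/improved. - joanis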

Page content provided by Stack Overflow.

- Mike · Accepted Answer

Just listen to what the warning is telling you:

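Reshape your data with X.reshape(-1, 1) if your data has a single feature/column, and with X.reshape(1, -1) if it contains a single sample.

For your type of example (if you have multiple features/columns):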

import numpy as np
temp = np.array(temp).reshape(1, -1)

For a single feature/column:

import numpy as np
temp = np.array(temp).reshape(-1, 1)
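For the single-feature case, a sketch under hypothetical data: the scaler is fitted on a column of values, new values are reshaped to (n, 1) before transform, and .ravel() turns the result back into a 1-D array.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data: one value per sample
values = np.array([10.0, 20.0, 30.0])
scaler = MinMaxScaler().fit(values.reshape(-1, 1))     # fitted on shape (3, 1)

new_values = np.array([15.0, 25.0])
scaled = scaler.transform(new_values.reshape(-1, 1))   # shape (2, 1)
print(scaled.ravel())  # back to 1-D, approximately [0.25, 0.75]
```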