Data types, data shapes, and pad_sequences

5

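I cannot make sense of the error message I get from this code. The x_train part comes from a working example that shows how to use an LSTM in Keras.

The mytrain part is just an example I put together to understand what the various functions do.

As the printed messages show, x_train and mytrain have the same type and shape.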

from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
import numpy as np

max_features = 80
maxlen = 5

# from the example
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print('x_train type: ', type(x_train))
print('x_train shape:', x_train.shape)
sequence.pad_sequences(x_train, maxlen=maxlen)

# my test code
mytrain = np.ones_like(x_train)
print('mytrain type:', type(mytrain))
print('mytrain shape:', mytrain.shape)
mytrain2 = sequence.pad_sequences(mytrain, maxlen=maxlen)

Output:

D:\python\python.exe D:/workspace/YYYY/test/test_sequences.py
Using TensorFlow backend.
x_train type:  <class 'numpy.ndarray'>
x_train shape: (25000,)
Traceback (most recent call last):
  File "D:/workspace/YYYY/test/test_sequences.py", line 22, in <module>
    mytrain2 = sequence.pad_sequences(mytrain, maxlen=10)
  File "D:\python\lib\site-packages\keras\preprocessing\sequence.py", line 42, in pad_sequences
    'Found non-iterable: ' + str(x))
mytrain type: <class 'numpy.ndarray'>
ValueError: `sequences` must be a list of iterables. Found non-iterable: 1
mytrain shape: (25000,)

If I use something like mytrain = np.asarray([[1, 2, 3]]) (a list of iterables), it works, but I cannot see what the difference is between x_train and mytrain in the code above.

2 Answers

4
Problem:

When you print x_train, you get the following output:

[ [1, 14, 22, 16, 43, 2, 2, 2, 2, 65, 2, 2, 66, 2, 4, 2, 36, 2, 5, 25, 2, 43, 2, 2, 50, 2, 2, 9, 35, 2, 2, 5, 2, 4, 2, 2, 2, 2, 2, 2, 39, 4, 2, 2, 2, 17, 2, 38, 13, 2, 4, 2, 50, 16, 6, 2, 2, 19, 14, 22, 4, 2, 2, 2, 4, 22, 71, 2, 12, 16, 43, 2, 38, 76, 15, 13, 2, 4, 22, 17, 2, 17, 12, 16, 2, 18, 2, 5, 62, 2, 12, 8, 2, 8, 2, 5, 4, 2, 2, 16, 2, 66, 2, 33, 4, 2, 12, 16, 38, 2, 5, 25, 2, 51, 36, 2, 48, 25, 2, 33, 6, 22, 12, 2, 28, 77, 52, 5, 14, 2, 16, 2, 2, 8, 4, 2, 2, 2, 15, 2, 4, 2, 7, 2, 5, 2, 36, 71, 43, 2, 2, 26, 2, 2, 46, 7, 4, 2, 2, 13, 2, 2, 4, 2, 15, 2, 2, 32, 2, 56, 26, 2, 6, 2, 2, 18, 4, 2, 22, 21, 2, 2, 26, 2, 5, 2, 30, 2, 18, 51, 36, 28, 2, 2, 25, 2, 4, 2, 65, 16, 38, 2, 2, 12, 16, 2, 5, 16, 2, 2, 2, 32, 15, 16, 2, 19, 2, 32]
 ...,
 [1, 17, 6, 2, 2, 7, 4, 2, 22, 45, 2, 8, 2, 14, 2, 4, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 39, 14, 2, 4, 2, 9, 2, 50, 2, 12, 47, 4, 2, 5, 2, 7, 38, 2, 2, 2, 7, 4, 2, 2, 9, 24, 6, 78, 2, 17, 2, 2, 21, 27, 2, 2, 5, 2, 2, 2, 2, 4, 2, 7, 4, 2, 42, 2, 2, 35, 2, 2, 29, 2, 27, 2, 8, 2, 12, 2, 21, 2, 2, 9, 6, 66, 78, 2, 4, 2, 2, 5, 2, 2, 2, 2, 6, 2, 8, 2, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 2, 8, 2, 2, 2, 21, 60, 27, 2, 9, 43, 2, 2, 2, 10, 10, 12, 2, 40, 4, 2, 20, 12, 16, 5, 2, 2, 72, 7, 51, 6, 2, 22, 4, 2, 2, 9]]

Each element is a list. mytrain, on the other hand, is:

[1 1 1 ..., 1 1 1]

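This is just a flat array of integers, so its elements are not iterable, which is exactly what the error message complains about.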
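To make the difference concrete, here is a simplified sketch of the kind of per-element check that produces this error. The helper name check_sequences is hypothetical and plain lists stand in for the arrays; it is not the Keras source, only an illustration of why a flat collection of integers is rejected while a collection of lists is accepted.

```python
def check_sequences(sequences):
    """Hypothetical helper: return each element's length, or raise like the traceback."""
    lengths = []
    for x in sequences:
        # An int has no length, so it cannot be treated as a sequence.
        if not hasattr(x, "__len__"):
            raise ValueError(
                "`sequences` must be a list of iterables. "
                "Found non-iterable: " + str(x)
            )
        lengths.append(len(x))
    return lengths


nested = [[1, 14, 22], [1, 17, 6, 2]]  # shaped like x_train: each element is a list
flat = [1, 1, 1]                       # shaped like mytrain: each element is an int

print(check_sequences(nested))  # [3, 4]

try:
    check_sequences(flat)
except ValueError as err:
    message = str(err)
    print(message)
```

The outer type and shape are irrelevant to this check; only the type of each element matters.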

Solution:

This should do what you need:

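mytrain = []
for i in range(x_train.shape[0]):
    # one array of ones per review, with the same length as that review
    mytrain.append(np.ones(len(x_train[i])))
# dtype=object keeps the ragged result 1-D; recent NumPy versions reject ragged input without it
mytrain = np.asarray(mytrain, dtype=object)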

Indeed:

('x_train type: ', <type 'numpy.ndarray'>)
('x_train shape:', (25000,))
('mytrain type:', <type 'numpy.ndarray'>)
('mytrain shape:', (25000,))

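Thanks. I had noticed that using a list, mytrain = np.asarray([[1, 2, 3]]), works. However, print('mytrain shape:', mytrain.shape) then outputs mytrain shape: (1, 3), whereas x_train is (N,). I am still confused. - Antonio Sesto
Yes, because mytrain is now, in short, an array of arrays, and is therefore treated as a 2D array. x_train is an array of lists and is treated as a 1D array. If you try to print x_train[0].shape, it will tell you that you cannot get the shape of a list, because element 0 of x_train is indeed a list. If you print mytrain[0].shape, you get (3,), because that element is an array. Array of arrays = 2D array. - Michele Tonutti
Thanks. So basically it depends on the size of the lists in the array: if they all have the same size, it is treated as a 2D array. It is quite obvious once you realise it, but a bit obscure the first time you read about shapes. - Antonio Sesto
No, it depends on the type! An array containing arrays (mytrain) is a 2D array. An array containing lists (x_train) is a 1D array in which each element is a list. mytrain = np.array; mytrain[0] = np.array; x_train = np.array; x_train[0] = list. - Michele Tonutti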
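The shapes discussed in these comments can be checked directly with a small NumPy-only sketch. The names x_like, ones and rect are made up for illustration, and x_like is a two-element stand-in built by hand rather than the real IMDB data.

```python
import numpy as np

# Stand-in for x_train: a 1-D object array whose elements are Python lists
# of different lengths.
x_like = np.empty(2, dtype=object)
x_like[0] = [1, 14, 22]
x_like[1] = [1, 17]

# Stand-in for the question's mytrain: ones_like copies only the outer shape,
# so every element becomes a single 1 instead of a list.
ones = np.ones_like(x_like)

# The rectangular case from the comments.
rect = np.asarray([[1, 2, 3]])

print(x_like.shape, type(x_like[0]))  # (2,) <class 'list'>
print(ones.shape, type(ones[0]))
print(rect.shape, rect[0].shape)      # (1, 3) (3,)
```

x_like and ones report the same type and the same shape, which is why the printed messages in the question look identical even though only one of them holds sequences.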

2

I just had to put mytrain inside square brackets [] and the problem was solved.

# my test code
mytrain = np.ones_like(x_train)
print('mytrain type:', type(mytrain))
print('mytrain shape:', mytrain.shape)
mytrain2 = sequence.pad_sequences([mytrain], maxlen=maxlen)
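It may be worth checking what the brackets actually do: [mytrain] is a list with a single element, so the whole array is treated as one long sequence rather than as 25000 separate ones. The sketch below imitates the default pad_sequences behaviour (pad with 0 on the left, truncate from the left) using NumPy only. pad_pre is a hypothetical stand-in written for this illustration, not the Keras implementation, and the 8-element array is a small substitute for the real data.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Hypothetical stand-in: left-pad or left-truncate each sequence to maxlen."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]  # keep only the last maxlen items
        if kept:
            out[i, maxlen - len(kept):] = kept
    return out


mytrain = np.ones(8, dtype=object)      # small stand-in for the (25000,) array
wrapped = pad_pre([mytrain], maxlen=5)  # one sequence of 8 items
print(wrapped.shape)                    # (1, 5)

per_row = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8, 9]], maxlen=5)
print(per_row.tolist())                 # [[0, 0, 1, 2, 3], [5, 6, 7, 8, 9]]
```

Under these assumptions the wrapped call returns a single row, so the error goes away but the result does not have one row per training example.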
