数据类型、数据形状和pad_sequences

Question

数据类型、数据形状和pad_sequences

5

我无法理解我在这段代码中收到的错误信息。其中x_train部分来自于一个展示如何在Keras中使用LSTM的工作示例。

mytrain部分只是我为了理解各种函数而尝试的一个示例。

从消息中可以看出，x_train和mytrain具有相同的类型和形状。

from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
import numpy as np

max_features = 80
maxlen = 5

# from the example
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print('x_train type: ', type(x_train))
print('x_train shape:', x_train.shape)
sequence.pad_sequences(x_train, maxlen=maxlen)

# my test code
mytrain = np.ones_like(x_train)
print('mytrain type:', type(mytrain))
print('mytrain shape:', mytrain.shape)
mytrain2 = sequence.pad_sequences(mytrain, maxlen=maxlen)

输出：

D:\python\python.exe D:/workspace/YYYY/test/test_sequences.py
Using TensorFlow backend.
x_train type:  <class 'numpy.ndarray'>
x_train shape: (25000,)
Traceback (most recent call last):
  File "D:/workspace/YYYY/test/test_sequences.py", line 22, in <module>
    mytrain2 = sequence.pad_sequences(mytrain, maxlen=10)
  File "D:\python\lib\site-packages\keras\preprocessing\sequence.py", line 42, in pad_sequences
    'Found non-iterable: ' + str(x))
mytrain type: <class 'numpy.ndarray'>
ValueError: `sequences` must be a list of iterables. Found non-iterable: 1
mytrain shape: (25000,)

如果我使用像mytrain = np.asarray([[1, 2, 3]])（可迭代列表）这样的内容，它可以工作，但我无法理解在先前代码中x_train和mytrain之间的区别。

- Antonio Sesto

2个回答

2

我只需要将mytrain放在方括号[]中，问题就得到了解决。

# my test code
mytrain = np.ones_like(x_train)
print('mytrain type:', type(mytrain))
print('mytrain shape:', mytrain.shape)
mytrain2 = sequence.pad_sequences([mytrain], maxlen=maxlen)

- Duc Vu

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Michele Tonutti · Accepted Answer

问题：

打印 x_train 时，您会得到以下输出：

[ [1, 14, 22, 16, 43, 2, 2, 2, 2, 65, 2, 2, 66, 2, 4, 2, 36, 2, 5, 25, 2, 43, 2, 2, 50, 2, 2, 9, 35, 2, 2, 5, 2, 4, 2, 2, 2, 2, 2, 2, 39, 4, 2, 2, 2, 17, 2, 38, 13, 2, 4, 2, 50, 16, 6, 2, 2, 19, 14, 22, 4, 2, 2, 2, 4, 22, 71, 2, 12, 16, 43, 2, 38, 76, 15, 13, 2, 4, 22, 17, 2, 17, 12, 16, 2, 18, 2, 5, 62, 2, 12, 8, 2, 8, 2, 5, 4, 2, 2, 16, 2, 66, 2, 33, 4, 2, 12, 16, 38, 2, 5, 25, 2, 51, 36, 2, 48, 25, 2, 33, 6, 22, 12, 2, 28, 77, 52, 5, 14, 2, 16, 2, 2, 8, 4, 2, 2, 2, 15, 2, 4, 2, 7, 2, 5, 2, 36, 71, 43, 2, 2, 26, 2, 2, 46, 7, 4, 2, 2, 13, 2, 2, 4, 2, 15, 2, 2, 32, 2, 56, 26, 2, 6, 2, 2, 18, 4, 2, 22, 21, 2, 2, 26, 2, 5, 2, 30, 2, 18, 51, 36, 28, 2, 2, 25, 2, 4, 2, 65, 16, 38, 2, 2, 12, 16, 2, 5, 16, 2, 2, 2, 32, 15, 16, 2, 19, 2, 32]
 ...,
 [1, 17, 6, 2, 2, 7, 4, 2, 22, 45, 2, 8, 2, 14, 2, 4, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 39, 14, 2, 4, 2, 9, 2, 50, 2, 12, 47, 4, 2, 5, 2, 7, 38, 2, 2, 2, 7, 4, 2, 2, 9, 24, 6, 78, 2, 17, 2, 2, 21, 27, 2, 2, 5, 2, 2, 2, 2, 4, 2, 7, 4, 2, 42, 2, 2, 35, 2, 2, 29, 2, 27, 2, 8, 2, 12, 2, 21, 2, 2, 9, 6, 66, 78, 2, 4, 2, 2, 5, 2, 2, 2, 2, 6, 2, 8, 2, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 2, 8, 2, 2, 2, 21, 60, 27, 2, 9, 43, 2, 2, 2, 10, 10, 12, 2, 40, 4, 2, 20, 12, 16, 5, 2, 2, 72, 7, 51, 6, 2, 22, 4, 2, 2, 9]]

每个元素都是一个列表。而mytrain是：

[1 1 1 ..., 1 1 1]

这只是一个整数列表。

解决方案:

这应该可以满足您的需求：

mytrain = []
for i in range(0,x_train.shape[0]):
    mytrain.append(np.ones(len(x_train[i])))
mytrain = np.asarray(mytrain)

确实：

('x_train type: ', <type 'numpy.ndarray'>)
('x_train shape:', (25000,))
('mytrain type:', <type 'numpy.ndarray'>)
('mytrain shape:', (25000,))