What shape should the input and output data of an LSTM neural network have?

Question


python multidimensional-array neural-network keras recurrent-neural-network


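I have a question about how to prepare my data for a NN with an LSTM layer. I have many files, each containing hundreds of lines. Each file represents a song, and each line represents a note with four values. I want the NN to read the notes in sequences of 10 so that it can predict the next note from them. If needed, we can fix the number of notes per song at 5000. So I just want to know what shape my input and output data should have and how to define the first LSTM layer.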

from keras.layers import LSTM
from keras.models import Sequential

model = Sequential()
model.add(LSTM(32, input_shape=(5000, 4), return_sequences=True))

In summary:

One file has 5000 rows and 4 columns and represents one song.
One row in the file represents one note with 4 values.
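A minimal sketch of turning that file layout into a single NumPy array of shape (n_songs, 5000, 4). The file format is an assumption, since the question does not specify it: here each song is a hypothetical `*.txt` file with one note per row and 4 comma-separated values. Songs longer than `notes_per_song` are truncated so that every song has the same length.

```python
import glob
import os

import numpy as np


def load_songs(directory, notes_per_song=5000, note_dim=4):
    """Load every song file in `directory` into an array of shape
    (n_songs, notes_per_song, note_dim).

    Assumption: each file is plain text, one note per row, with `note_dim`
    comma-separated values per row.
    """
    songs = []
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        notes = np.loadtxt(path, delimiter=",", ndmin=2)
        # Truncate so every song has exactly notes_per_song notes.
        notes = notes[:notes_per_song, :note_dim]
        if notes.shape[0] < notes_per_song:
            raise ValueError(f"{path} has only {notes.shape[0]} notes")
        songs.append(notes)
    # Stack the equal-length songs along a new leading axis.
    return np.stack(songs)
```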

Thanks for your help.

- Juan

Could you provide a sample of your input data? - grovina

The input data has thousands of rows, so I can't post it here. There are 4 columns, and each value is a number between 0 and 1. - Juan

2 Answers


I want the neural network to read the notes in sequences of 10 so that it can predict the next note from them.

I have never used Keras, but I think you should first convert these notes into ids. For example: (aa, bb, cc, dd) as 1, (ab, bb, cc, dd) as 2, and so on.
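A minimal sketch of the note-to-id conversion, assuming each song is a NumPy array of shape (n_notes, 4). Ids here start at 0 rather than 1 and are assigned in order of first appearance. Note that this only gives a useful vocabulary if identical notes actually repeat exactly; with arbitrary floating-point values between 0 and 1, almost every note could end up with its own id.

```python
import numpy as np


def build_vocabulary(songs):
    """Map each distinct note (a row of 4 values) to an integer id, starting at 0."""
    note_to_id = {}
    for song in songs:
        for note in song:
            key = tuple(note.tolist())  # rows must be hashable to serve as dict keys
            if key not in note_to_id:
                note_to_id[key] = len(note_to_id)
    return note_to_id


def encode(song, note_to_id):
    """Convert a song of shape (n_notes, 4) into a 1-D array of note ids."""
    return np.array([note_to_id[tuple(note.tolist())] for note in song])
```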

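Then you can read 10 ids/notes with the encoder and add a projection that maps the final state onto the 11th note. If you want to be able to test the model with any 10 consecutive notes from a song, you also need to train on the 2nd to 11th notes, with the 12th note as the target after the projection, and so on until the last note is the target. That covers one song; repeat the process until all songs are done.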
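The sliding procedure described above (notes 1 to 10 predict note 11, notes 2 to 11 predict note 12, and so on, song by song) can be sketched as follows. This assumes each song has already been encoded as a 1-D sequence of ids; windows never cross song boundaries.

```python
import numpy as np


def make_id_windows(id_sequences, window_length=10):
    """Build (inputs, targets) from a list of per-song id sequences.

    Each input is `window_length` consecutive ids; its target is the id
    that immediately follows them.
    """
    inputs, targets = [], []
    for ids in id_sequences:
        # The last valid start leaves exactly one id after the window as the target.
        for start in range(len(ids) - window_length):
            inputs.append(ids[start:start + window_length])
            targets.append(ids[start + window_length])
    return np.array(inputs), np.array(targets)
```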

You can recover the notes exactly from the ids by building a vocabulary that converts between the two.
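The reverse direction of that vocabulary, turning predicted ids back into notes, could look like this. It is a sketch that assumes the vocabulary is a dict mapping 4-value note tuples to integer ids.

```python
import numpy as np


def invert_vocabulary(note_to_id):
    """Build the reverse lookup id -> note from a note -> id dictionary."""
    return {note_id: note for note, note_id in note_to_id.items()}


def decode(ids, id_to_note):
    """Turn a sequence of ids back into an array of shape (n_notes, 4)."""
    return np.array([id_to_note[i] for i in ids])
```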

- Lerner Zhang

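Thanks for your help, I'll try that. So imagine x, an array containing the notes of one song. I would use it as the input data. But how should my output data be built? Should it be something other than the first ten notes? - Juan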

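That depends on your task. If your problem is generating the next note given only the previous ten notes, then I suggest the approach in my answer. Could you describe your problem in more detail? - Lerner Zhang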

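Yes, I want the neural network to predict one note from 10 notes. But before doing the machine learning, I need to define the input and output arrays. I now know what to put in the input, but how do I fill the output? In the first step, the network will take the first 10 notes, make a prediction for the 11th note, and compare it with the actual 11th note, which should be in the first row of the output. But what about the rest of the output? - Juan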
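To make the pairing concrete, here is a toy illustration with made-up data: a 13-note song in which every note is simply [k, k, k, k], so the row index is visible. Output row k is the note that follows input window k, which means the rest of the output is just the following notes in order.

```python
import numpy as np

window_length = 10
# Toy song (made-up data): 13 notes, note k is [k, k, k, k].
song = np.repeat(np.arange(13)[:, None], 4, axis=1)

x_rows, y_rows = [], []
for i in range(len(song) - window_length):
    x_rows.append(song[i:i + window_length])  # notes i .. i+9 form input row i
    y_rows.append(song[i + window_length])    # note i+10 is output row i
x = np.array(x_rows)
y = np.array(y_rows)

print(x.shape, y.shape)  # (3, 10, 4) (3, 4)
print(y[:, 0])           # [10 11 12]
```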

Content provided by Stack Overflow.

- rvinas · Accepted Answer

The input shape of the first LSTM layer should be (None, 10, 4), and the output shape of the model will be (None, 4). I use None to denote the batch size.
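Why the leading dimension is written as None: the number of samples per batch is not fixed. Keras does its own batching inside `fit()` via `batch_size`; the sketch below (with a hypothetical helper `iterate_batches` and placeholder zeros) only illustrates that every batch is (b, 10, 4) for inputs and (b, 4) for targets, with b equal to the batch size except for a possibly smaller last batch.

```python
import numpy as np


def iterate_batches(x, y, batch_size=32):
    """Yield consecutive (x_batch, y_batch) slices of at most batch_size samples."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


# Placeholder data with the shapes from the answer: 100 windows of 10 notes, 4 values each.
x = np.zeros((100, 10, 4))
y = np.zeros((100, 4))

batch_sizes = [xb.shape[0] for xb, _ in iterate_batches(x, y)]
print(batch_sizes)  # [32, 32, 32, 4]
print({xb.shape[1:] for xb, _ in iterate_batches(x, y)})  # {(10, 4)}
```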

I wrote a simple LSTM as an example:

import numpy as np
from keras.layers import LSTM
from keras.models import Sequential

batch_size = 32
window_length = 10
note_dim = 4
n_samples = 5000

# Input data. TODO: Slide window and modify it to use real data
x = np.ones(shape=(n_samples, window_length, note_dim))
y = np.ones(shape=(n_samples, note_dim))

# Define model
model = Sequential()
model.add(LSTM(note_dim, input_shape=(window_length, note_dim))) # The batch dimension is implicit here

model.compile('sgd', 'mse')
model.fit(x=x, # Batch input shape is: (None, window_length, note_dim)
          y=y, # Batch output shape is: (None, note_dim)
          batch_size=batch_size)
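Once the model is trained, one way to use it is to generate notes autoregressively: predict the next note from the last 10, append it, slide the window by one and repeat. This is a sketch, not part of the original answer; `generate` and `predict_fn` are hypothetical names, and `predict_fn` is assumed to map one window of shape (10, 4) to one note of shape (4,).

```python
import numpy as np


def generate(seed, predict_fn, n_new_notes, window_length=10):
    """Extend `seed` (shape (>= window_length, note_dim)) by n_new_notes notes.

    Each step feeds the most recent `window_length` notes to `predict_fn`
    and appends the predicted note.
    """
    notes = [np.asarray(n, dtype=float) for n in seed]
    for _ in range(n_new_notes):
        window = np.stack(notes[-window_length:])  # shape (window_length, note_dim)
        notes.append(np.asarray(predict_fn(window), dtype=float))
    return np.stack(notes)
```

With the model above, `predict_fn` could wrap `model.predict` by adding and removing the batch dimension, for example `generate(x[0], lambda w: model.predict(w[None], verbose=0)[0], n_new_notes=100)`.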

If you need a more complex model (for example, 2 LSTM layers), you can define it like this:

# ...
# Define model
hidden_size = 50
model = Sequential()
model.add(LSTM(hidden_size, input_shape=(window_length, note_dim), return_sequences=True)) # The batch dimension is implicit here
model.add(LSTM(note_dim))
# ...

Update: answer to your first comment.

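x should contain all of your songs, after sliding a window over each of them. For example, suppose you have a variable songs with shape (n_songs, notes_per_song, note_dim) that contains all of your songs. Then you can create x and y as follows: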

# ...
# Input data
# Suppose that variable `songs` is an array with shape: (n_songs, notes_per_song, note_dim).
n_songs, notes_per_song, _ = songs.shape
samples_per_song = notes_per_song - window_length
n_samples = n_songs * samples_per_song
x = np.zeros(shape=(n_samples, window_length, note_dim))
y = np.zeros(shape=(n_samples, note_dim))
for n, song in enumerate(songs):
    for i in range(samples_per_song):
        x[i + n*samples_per_song, :, :] = song[i:(i + window_length), :]  # notes i .. i+window_length-1
        y[i + n*samples_per_song, :] = song[i + window_length, :]  # note that you want to predict (y is 2-D)
# ...
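The double loop above can also be vectorized. A sketch assuming NumPy 1.20 or later (which added `sliding_window_view`); `make_windows` is a hypothetical name, and the sample order (song by song, then window by window) matches the index `i + n*samples_per_song` used in the loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(songs, window_length=10):
    """Vectorized equivalent of the loop above.

    songs: array of shape (n_songs, notes_per_song, note_dim).
    Returns x with shape (n_samples, window_length, note_dim) and y with shape
    (n_samples, note_dim), where n_samples = n_songs * (notes_per_song - window_length).
    """
    n_songs, notes_per_song, note_dim = songs.shape
    # Windows of window_length + 1 notes along the note axis:
    # shape (n_songs, notes_per_song - window_length, note_dim, window_length + 1).
    windows = sliding_window_view(songs, window_length + 1, axis=1)
    # Move the window axis before the note_dim axis: (..., window_length + 1, note_dim).
    windows = np.moveaxis(windows, -1, 2)
    x = windows[:, :, :window_length, :].reshape(-1, window_length, note_dim)
    y = windows[:, :, window_length, :].reshape(-1, note_dim)  # the note after each window
    return x, y
```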