LSTM for predicting multiple rows of data over single/multiple time steps

5
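I have 250 days of data, 72 features per training sample, and one target variable column. I want to predict the next 30 days for each of the 21351 rows, each with 72 features. How should I reshape my input and output data? I am somewhat confused, and the library reports errors about incompatible shapes.
I previously reshaped the data like this: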
trainX.reshape(1, len(trainX), trainX.shape[1])

trainY.reshape(1, len(trainX))

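But it raised an error:

ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 250 target samples.

The same happens with the following code: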

trainX.reshape(1, len(trainX), trainX.shape[1])

trainY.reshape(len(trainX), )

And the same error occurs with:

trainX.reshape(1, len(trainX), trainX.shape[1])

trainY.reshape(len(trainX), 1)
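
The error compares the first axis of the inputs with the first axis of the targets. The following is a minimal sketch of that mismatch, assuming toy zero-filled arrays with 250 rows and 72 columns in place of the real data (the names `x_one_sample`, `y_flat`, and `x_per_day` are illustrative only):

```python
import numpy as np

# Toy stand-ins for the arrays in the question (values are arbitrary).
trainX = np.zeros((250, 72))   # 250 days, 72 features
trainY = np.zeros((250, 1))    # one target value per day

# reshape returns a new array, so the result has to be assigned.
x_one_sample = trainX.reshape(1, len(trainX), trainX.shape[1])
y_flat = trainY.reshape(len(trainX), )

# The first axis is the sample axis: 1 input sample vs 250 target samples.
print(x_one_sample.shape[0], y_flat.shape[0])    # 1 250

# Treating each day as its own one-step sample makes the counts agree.
x_per_day = trainX.reshape(trainX.shape[0], 1, trainX.shape[1])
print(x_per_day.shape, y_flat.shape)             # (250, 1, 72) (250,)
```

Whether one time step per sample is the right modelling choice is a separate question; the sketch only shows why the sample counts disagree.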

Currently, trainX is reshaped as:

trainX.reshape(trainX.shape[0], 1, trainX.shape[1])

array([[[  4.49027601e+00,  -3.71848297e-01,  -3.71848297e-01, ...,
           1.06175239e+17,   1.24734085e+06,   5.16668131e+00]],

       [[  2.05921386e+00,  -3.71848297e-01,  -3.71848297e-01, ...,
           8.44426594e+17,   1.39098642e+06,   4.01803817e+00]],

       [[  9.25515792e+00,  -3.71848297e-01,  -3.71848297e-01, ...,
           4.08800518e+17,   1.24441013e+06,   3.69129399e+00]],

       ..., 
       [[  3.80037999e+00,  -3.71848297e-01,  -3.71848297e-01, ...,
           1.35414902e+18,   1.23823291e+06,   3.54601899e+00]],

       [[  3.73994822e+00,  -3.71848297e-01,   8.40698741e+00, ...,
           3.93863169e+17,   1.25693299e+06,   3.29993440e+00]],

       [[  3.56843035e+00,  -3.71848297e-01,   1.53710656e+00, ...,
           3.28306336e+17,   1.22667253e+06,   3.36569960e+00]]])

trainY after reshaping is:

trainY.reshape(trainY.shape[0], )

array([[-0.7238661 ],

       [-0.43128777],

       [-0.31542821],

       [-0.35185375],

       ...,

       [-0.28319519],

       [-0.28740503],

       [-0.24209411],

       [-0.3202021 ]])

And testX is reshaped as:

testX.reshape(1, testX.shape[0], testX.shape[1])

array([[[ -3.71848297e-01,  -3.71848297e-01,  -3.71848297e-01, ...,
          -3.71848297e-01,   2.73982042e+06,  -3.71848297e-01],

        [ -3.71848297e-01,  -3.71848297e-01,  -3.71848297e-01, ...,
          -3.71848297e-01,   2.73982042e+06,  -3.71848297e-01],

        [ -3.71848297e-01,  -3.71848297e-01,  -3.71848297e-01, ...,
           2.00988794e+18,   1.05992636e+06,   2.49920150e+01],

       ..., 

        [ -3.71848297e-01,  -3.71848297e-01,  -3.71848297e-01, ...,
          -3.71848297e-01,  -3.71848297e-01,  -3.71848297e-01],

        [ -3.71848297e-01,  -3.71848297e-01,  -3.71848297e-01, ...,
          -3.71848297e-01,  -3.71848297e-01,  -3.71848297e-01],

        [ -3.71848297e-01,  -3.71848297e-01,  -3.71848297e-01, ...,
          -3.71848297e-01,  -3.71848297e-01,  -3.71848297e-01]]])

The error message is:

ValueError: Error when checking : expected lstm_25_input to have shape (None, 1, 72) but got array with shape (1, 2895067, 72)

Edit 1:

Here is my model code:

trainX = trainX.reshape(trainX.shape[0], 1, trainX.shape[1])
trainY = trainY.reshape(trainY.shape[0], )
testX = testX.reshape(1, testX.shape[0], testX.shape[1])

model = Sequential()

model.add(LSTM(100, return_sequences=True, input_shape=(trainX.shape[0], trainX.shape[2])))
model.add(LSTM(100))
model.add(Dense(1, activation='linear'))

model.compile(loss='mse', optimizer='adam')

model.fit(trainX, trainY, epochs=500, shuffle=False, verbose=1)

model.save('model_lstm.h5')

model = load_model('model_lstm.h5')

prediction = model.predict(testX, verbose=0)


ValueError Traceback (most recent call last)
in ()
     43 model.compile(loss='mse', optimizer='adam')
     44
---> 45 model.fit(exog, endog, epochs=50, shuffle=False, verbose=1)
     46
     47 start_date = endog_end + timedelta(days = 1)

D:\AnacondaIDE\lib\site-packages\keras\models.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
    865                               class_weight=class_weight,
    866                               sample_weight=sample_weight,
--> 867                               initial_epoch=initial_epoch)
    868
    869     def evaluate(self, x, y, batch_size=32, verbose=1,

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1520             class_weight=class_weight,
   1521             check_batch_axis=False,
-> 1522             batch_size=batch_size)
   1523         # Prepare validation data.
   1524         do_validation = False

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_batch_axis, batch_size)
   1376                                     self._feed_input_shapes,
   1377                                     check_batch_axis=False,
-> 1378                                     exception_prefix='input')
   1379         y = _standardize_input_data(y, self._feed_output_names,
   1380                                     output_shapes,

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    142                             ' to have shape ' + str(shapes[i]) +
    143                             ' but got array with shape ' +
--> 144                             str(array.shape))
    145     return arrays
    146

ValueError: Error when checking input: expected lstm_31_input to have shape (None, 250, 72) but got array with shape (21351, 1, 72)

Edit 2:

After trying @Paddy's updated solution, I got the following error when calling predict():

ValueError Traceback (most recent call last)
in ()
      1 model = load_model('model_lstm.h5')
      2
----> 3 prediction = model.predict(exog_test, verbose=0)
      4 # for x in range(0, len(exog_test)):

D:\AnacondaIDE\lib\site-packages\keras\models.py in predict(self, x, batch_size, verbose)
    911         if not self.built:
    912             self.build()
--> 913         return self.model.predict(x, batch_size=batch_size, verbose=verbose)
    914

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in predict(self, x, batch_size, verbose, steps)
   1693         x = _standardize_input_data(x, self._feed_input_names,
   1694                                     self._feed_input_shapes,
-> 1695                                     check_batch_axis=False)
   1696         if self.stateful:
   1697             if x[0].shape[0] > batch_size and x[0].shape[0] % batch_size != 0:

D:\AnacondaIDE\lib\site-packages\keras\engine\training.py in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    130                             ' to have ' + str(len(shapes[i])) +
    131                             ' dimensions, but got array with shape ' +
--> 132                             str(array.shape))
    133         for j, (dim, ref_dim) in enumerate(zip(array.shape, shapes[i])):
    134             if not j and not check_batch_axis:

ValueError: Error when checking : expected lstm_64_input to have 3 dimensions, but got array with shape (2895067, 72)

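I have trained the model successfully, but it gives me an error when I call predict(). - Fawad Khalil
You are passing the wrong dimensions for testX. - DJK
@djk47463, could you provide the code for reshaping testX? That is where I am stuck now. - Fawad Khalil
The main concern is getting the desired output as per the requirements. - Fawad Khalil
I needed to put 91 in the loop, like this: exog2_sep16 = np.concatenate([exog_sep16[x : x + 92,:].reshape(1, 92, exog_sep16.shape[1]) for x in range(exog_sep16.shape[0]-91)]). This solved the problem. Please check whether I did it right. - Fawad Khalil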
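
The last comment builds overlapping 92-row windows from a 2-D array. A rough sketch of the same idea follows, extended to pair each window with the next 30 target values, since the question asks for a 30-day forecast. The helper `make_windows` and its `window` and `horizon` parameters are hypothetical names introduced here, the data is random, and the choice of "the 30 target values right after each window" as the label is an assumption rather than something the thread confirms:

```python
import numpy as np

def make_windows(features, target, window, horizon):
    """Slice a (days, n_features) array into overlapping windows.

    Returns X with shape (n_samples, window, n_features) and
    Y with shape (n_samples, horizon), where each row of Y holds the
    `horizon` target values that immediately follow the matching window.
    """
    n_samples = features.shape[0] - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("not enough rows for this window and horizon")
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    Y = np.stack([target[i + window:i + window + horizon]
                  for i in range(n_samples)])
    return X, Y

# Toy data: 250 days, 72 features, one target value per day.
features = np.random.rand(250, 72)
target = np.random.rand(250)

X, Y = make_windows(features, target, window=92, horizon=30)
print(X.shape, Y.shape)   # (129, 92, 72) (129, 30)
```

With targets shaped (n_samples, 30), the final Dense layer would need 30 units instead of 1, and the LSTM's input_shape would be (92, 72).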
2 Answers

2

You have:

trainX = trainX.reshape(trainX.shape[0], 1, trainX.shape[1])
trainY = trainY.reshape(trainY.shape[0], )
testX = testX.reshape(1, testX.shape[0], testX.shape[1])

You want:

trainX = trainX.reshape(trainX.shape[0], 1, trainX.shape[1])
trainY = trainY.reshape(trainY.shape[0], )
testX = testX.reshape(testX.shape[0],1, testX.shape[1])

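You mixed up the samples and time-step dimensions in testX. The model expects (samples, time steps, features), so each row of testX must be its own sample with a single time step.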
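
To make the difference concrete, here is a small NumPy sketch, assuming a toy testX with 5 rows instead of the real 2895067. The `matches` helper and the `EXPECTED` tuple are hypothetical and only mimic the kind of shape check the error message describes:

```python
import numpy as np

# Toy stand-in for testX: 5 rows, 72 features.
testX = np.arange(5 * 72, dtype=float).reshape(5, 72)

# (samples, time steps, features), as reported in the error message.
EXPECTED = (None, 1, 72)

def matches(shape, expected):
    """True if every axis except the sample axis equals the expected size."""
    return len(shape) == len(expected) and tuple(shape[1:]) == tuple(expected[1:])

wrong = testX.reshape(1, testX.shape[0], testX.shape[1])   # 1 sample, 5 time steps
right = testX.reshape(testX.shape[0], 1, testX.shape[1])   # 5 samples, 1 time step

print(wrong.shape, matches(wrong.shape, EXPECTED))   # (1, 5, 72) False
print(right.shape, matches(right.shape, EXPECTED))   # (5, 1, 72) True
```

In the corrected layout, `right[i, 0]` is exactly row `i` of the original testX, so no values are moved between rows.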


Yes, I tried it and it solved my problem. I had simply gotten the dimensions of testX wrong. Thanks. - Fawad Khalil

1
Try this reshape:
trainX.reshape(len(trainX),1, trainX.shape[1])

trainY.reshape(len(trainX), 1)

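In general, though, you have two options: either reshape the input data or change the model parameters.
Pay attention to the error message; it already tells you everything!
OK, here is an update to your code: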
trainX = trainX.reshape(trainX.shape[0], trainX.shape[1],1)
trainY = trainY.reshape(trainY.shape[0],)
testX = testX.reshape(testX.shape[0], testX.shape[1], 1)

model = Sequential()

model.add(LSTM(100, return_sequences= True, input_shape=(trainX.shape[1],1) ))
model.add(LSTM(100, return_sequences= False))
model.add(Dense(1, activation='linear'))

model.compile(loss='mse', optimizer='adam')

model.fit(trainX, trainY, epochs=500, shuffle=False, verbose=1)

model.save('model_lstm.h5')

model = load_model('model_lstm.h5')

prediction = model.predict(testX, verbose=0)

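Yes, I have already reshaped as you said. Please look at the last three reshape snippets for trainX, trainY, and testX. After that it becomes incompatible with the shape of testX (as the error message states). Note that I want to predict multiple steps (say, 30 days) for each record in testX. - Fawad Khalil
Also, why did you move the 1 to the last argument of reshape()? That separates each feature into its own array. - Fawad Khalil
Please check my answer again :) - Vadim
About the 1: usually the smallest number goes on the right. If you want to learn more about how to prepare shapes for an LSTM, you can look into it. - Vadim
No, I don't think so, because it splits the columns of each row into rows of single elements. Wesley's answer solved the problem. Thanks again for your time :) - Fawad Khalil
The model computes mean squared error. What changes can we make in model.compile to compute multiple losses (such as mse, mae, rmse)? - Asif Khan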
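
The disagreement in the comments about where the 1 goes in reshape() can be checked on a tiny array. This is only an illustrative sketch with a toy 3x4 array, and the variable names are made up for the example:

```python
import numpy as np

# Toy array: 3 rows (records), 4 features each.
x = np.arange(12).reshape(3, 4)

# 1 in the middle: each record is one time step carrying all 4 features.
rows_as_steps = x.reshape(x.shape[0], 1, x.shape[1])

# 1 at the end: each record becomes 4 time steps of a single feature.
features_as_steps = x.reshape(x.shape[0], x.shape[1], 1)

print(rows_as_steps.shape, rows_as_steps[0].tolist())
# (3, 1, 4) [[0, 1, 2, 3]]
print(features_as_steps.shape, features_as_steps[0].tolist())
# (3, 4, 1) [[0], [1], [2], [3]]
```

Both reshapes are valid NumPy, but they describe different sequences to an LSTM: the first keeps the features of a record together in one step, while the second feeds them one at a time as if they were consecutive time steps.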
