使用Python中的LSTM预测未来值

8

这段代码可以预测指定股票的值,预测时间仅限于当前日期,不能超过训练数据集中的日期。该代码来自我之前提出的一个问题,所以我对它的理解很低。我猜想解决方法可能只需要改变简单的变量,以增加额外的时间,但我不知道应该改变哪个值。

import pandas as pd
import numpy as np
import yfinance as yf
import os
import matplotlib.pyplot as plt
from IPython.display import display
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

pd.options.mode.chained_assignment = None

# download the data
df = yf.download(tickers=['AAPL'], period='2y')

# split the data
train_data = df[['Close']].iloc[: - 200, :]
valid_data = df[['Close']].iloc[- 200:, :]

# scale the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(train_data)

train_data = scaler.transform(train_data)
valid_data = scaler.transform(valid_data)

# extract the training sequences
x_train, y_train = [], []

for i in range(60, train_data.shape[0]):
    x_train.append(train_data[i - 60: i, 0])
    y_train.append(train_data[i, 0])

x_train = np.array(x_train)
y_train = np.array(y_train)

# extract the validation sequences
x_valid = []

for i in range(60, valid_data.shape[0]):
    x_valid.append(valid_data[i - 60: i, 0])

x_valid = np.array(x_valid)

# reshape the sequences
x_train = x_train.reshape(x_train.shape[0], 
x_train.shape[1], 1)
x_valid = x_valid.reshape(x_valid.shape[0], 
x_valid.shape[1], 1)

# train the model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, 
input_shape=x_train.shape[1:]))
model.add(LSTM(units=50))
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x_train, y_train, epochs=50, batch_size=128, verbose=1)

# generate the model predictions
y_pred = model.predict(x_valid)
y_pred = scaler.inverse_transform(y_pred)
y_pred = y_pred.flatten()

# plot the model predictions
df.rename(columns={'Close': 'Actual'}, inplace=True)
df['Predicted'] = np.nan
df['Predicted'].iloc[- y_pred.shape[0]:] = y_pred
df[['Actual', 'Predicted']].plot(title='AAPL')

display(df)

plt.show()
1个回答

19
你可以训练你的模型来预测未来的序列(例如接下来的30天),而不是像目前这样只预测下一个值(即明天的值)。
为了做到这一点,你需要将输出定义为y[t: t + H](而不是当前代码中的y[t]),其中y是时间序列,H是预测期间的长度(即你希望预测的天数)。你还需要将最后一层的输出数量设置为H(而不是当前代码中的1)。
您仍可以将输入定义为y[t - T: t],其中T是回顾期的长度(或时间步数),因此模型的输入形状仍为(T, 1)。回顾期T通常比预测期H长(即T > H),并且通常设置为T = m * H的倍数,其中m > 1为整数。

enter image description here

import numpy as np
import pandas as pd
import yfinance as yf
import tensorflow as tf
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler
pd.options.mode.chained_assignment = None
tf.random.set_seed(0)

# download the data
df = yf.download(tickers=['AAPL'], period='1y')
y = df['Close'].fillna(method='ffill')
y = y.values.reshape(-1, 1)

# scale the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(y)
y = scaler.transform(y)

# generate the input and output sequences
n_lookback = 60  # length of input sequences (lookback period)
n_forecast = 30  # length of output sequences (forecast period)

X = []
Y = []

for i in range(n_lookback, len(y) - n_forecast + 1):
    X.append(y[i - n_lookback: i])
    Y.append(y[i: i + n_forecast])

X = np.array(X)
Y = np.array(Y)

# fit the model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(n_lookback, 1)))
model.add(LSTM(units=50))
model.add(Dense(n_forecast))

model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, Y, epochs=100, batch_size=32, verbose=0)

# generate the forecasts
X_ = y[- n_lookback:]  # last available input sequence
X_ = X_.reshape(1, n_lookback, 1)

Y_ = model.predict(X_).reshape(-1, 1)
Y_ = scaler.inverse_transform(Y_)

# organize the results in a data frame
df_past = df[['Close']].reset_index()
df_past.rename(columns={'index': 'Date', 'Close': 'Actual'}, inplace=True)
df_past['Date'] = pd.to_datetime(df_past['Date'])
df_past['Forecast'] = np.nan
df_past['Forecast'].iloc[-1] = df_past['Actual'].iloc[-1]

df_future = pd.DataFrame(columns=['Date', 'Actual', 'Forecast'])
df_future['Date'] = pd.date_range(start=df_past['Date'].iloc[-1] + pd.Timedelta(days=1), periods=n_forecast)
df_future['Forecast'] = Y_.flatten()
df_future['Actual'] = np.nan

results = df_past.append(df_future).set_index('Date')

# plot the results
results.plot(title='AAPL')

enter image description here

参见这个答案以了解不同的方法。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接