SARIMAX out-of-sample forecast with exogenous data

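I am working on a time series analysis using SARIMAX and have been really struggling with it.
I think I have successfully fit a model and used it to make predictions; however, I don't know how to make out-of-sample predictions with exogenous data.
I may be doing the whole thing wrong, so I have included my steps below with some sample data.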
import pandas as pd
import numpy as np
import statsmodels.api as sm

# Defining Sample data
df = pd.DataFrame({'date':['2019-01-01','2019-01-02','2019-01-03',
                         '2019-01-04','2019-01-05','2019-01-06',
                         '2019-01-07','2019-01-08','2019-01-09',
                         '2019-01-10','2019-01-11','2019-01-12'],
                  'price':[78,60,62,64,66,68,70,72,74,76,78,80],
                 'factor1':[178,287,152,294,155,245,168,276,165,275,178,221]
                })
# Parsing the date column (ISO format, e.g. 2019-01-01)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

df = df.set_index('date')
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df.dropna(inplace=True)

# Splitting Data into test and training sets manually
train = df.loc['2019-01-01':'2019-01-09']
test = df.loc['2019-01-10':'2019-01-12']

# converting the index of the test and train datasets to a daily PeriodIndex
train.index = pd.DatetimeIndex(train.index).to_period('D')
test.index = pd.DatetimeIndex(test.index).to_period('D')

# Defining and fitting the model with training data for endogenous and exogenous data

model=sm.tsa.statespace.SARIMAX(train['price'],
                                order=(0, 0, 0),
                                seasonal_order=(0, 0, 0,12), 
                                exog=train.iloc[:,1:],
                                time_varying_regression=True,
                                mle_regression=False)
model_1= model.fit(disp=False)

# Defining exogenous data for testing 
exog_test=test.iloc[:,1:]

# Forecasting out of sample data with exogenous data
forecast = model_1.forecast(3, exog=exog_test)

So my question is really about the last line: what do I do if I want more than 3 steps?

1 Answer

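I will try to answer this one, as it mostly comes down to the data shapes and the documentation of the statsmodels package.
According to the documentation, `steps` is an integer: the number of steps to forecast from the end of the sample. That also means that if you want forecasts for more than three steps, you need to provide additional array data for the training and test data (note: both are needed), because the exogenous array passed to `forecast` must have exactly `steps` rows. (https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html) (https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.forecast.html)
When I increased the number of steps by one, I got two errors:
ValueError: cannot reshape array of size 3 into shape (4,1)
ValueError: Provided exogenous values are not of the appropriate shape. Required (4, 1), got (3, 1).
In other words, the number of rows in the exogenous data does not match the number of periods you are asking to forecast.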
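The shape rule behind those errors can be sketched without fitting a model. The helper below, `check_exog_for_forecast`, is hypothetical (it is not part of statsmodels); it only mirrors the requirement stated in the error message, namely that the exogenous array must have shape `(steps, k_exog)`.

```python
import numpy as np


def check_exog_for_forecast(exog, steps, k_exog):
    """Hypothetical helper: return exog as a (steps, k_exog) array or raise ValueError."""
    arr = np.asarray(exog, dtype=float)
    if arr.ndim == 1:
        # Treat a flat sequence as a single exogenous regressor
        arr = arr.reshape(-1, 1)
    if arr.shape != (steps, k_exog):
        raise ValueError(
            f"exog must have shape {(steps, k_exog)}, got {arr.shape}"
        )
    return arr


# Three rows of factor1 are enough for a 3-step forecast ...
ok = check_exog_for_forecast([275, 178, 221], steps=3, k_exog=1)

# ... but not for a 4-step forecast
try:
    check_exog_for_forecast([275, 178, 221], steps=4, k_exog=1)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Running a check like this before calling `forecast` makes the mismatch visible early, with the expected and actual shapes side by side.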
In short, extending the test set gives you the additional forecasts just fine. Here is the working code and a link to a working notebook:

https://colab.research.google.com/drive/1o9KXAe61EKH6bDI-FJO3qXzlWjz9IHHw?usp=sharing

import pandas as pd
import numpy as np
# from sklearn.model_selection import train_test_split
# why import this if you want to do the train/test split manually?

# Defining Sample data
df=pd.DataFrame({'date':['2019-01-01','2019-01-02','2019-01-03',
                         '2019-01-04','2019-01-05','2019-01-06',
                         '2019-01-07','2019-01-08','2019-01-09',
                         '2019-01-10','2019-01-11','2019-01-12'],
                  'price':[78,60,62,64,66,68,70,72,74,76,78,80],
                 'factor1':[178,287,152,294,155,245,168,276,165,275,178,221]
                })
# Parsing the date column (ISO format, e.g. 2019-01-01)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

df = df.set_index('date')
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df.dropna(inplace=True)

# Splitting Data into test and training sets manually
train=df.loc['2019-01-01':'2019-01-09']
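# I made a change here #CHANGED 10 to 09 so one more day got added to the test set
# that means my exog input array is now (4, 1); with a second exog column it would be (4, 2)
# steps must be an integer and has to match the number of exog rows (4 here)
# Note: 2019-01-09 is also the last training row, and exog rows are consumed by position,
# so these rows are shifted one day relative to the forecast dates (2019-01-10 to 2019-01-13)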

test=df.loc['2019-01-09':'2019-01-12']

# converting the index of the test and train datasets to a daily PeriodIndex
train.index = pd.DatetimeIndex(train.index).to_period('D')
test.index = pd.DatetimeIndex(test.index).to_period('D')

# Defining and fitting the model with training data for endogenous and exogenous data
import statsmodels.api as sm

model=sm.tsa.statespace.SARIMAX(train['price'],
                                order=(0, 0, 0),
                                seasonal_order=(0, 0, 0,12), 
                                exog=train.iloc[:,1:],
                                time_varying_regression=True,
                                mle_regression=False)
model_1= model.fit(disp=False)

# Defining exogenous data for testing 
exog_test=test.iloc[:,1:]

# Forecasting out of sample data with exogenous data
forecast = model_1.forecast(4, exog=exog_test)
 
