Python ARIMA模型中的外生变量如何在样本外进行预测

Question

Python ARIMA模型中的外生变量如何在样本外进行预测

11

我正在尝试使用Python中的statsmodels ARIMA包预测时间序列，并包括一个外生变量，但无法确定在预测步骤中插入外生变量的正确方法。请参见此处的文档。

import numpy as np
from scipy import stats
import pandas as pd

import statsmodels.api as sm

vals = np.random.rand(13)
ts = pd.TimeSeries(vals)
df = pd.DataFrame(ts, columns=["test"])
df.index = pd.Index(pd.date_range("2011/01/01", periods = len(vals), freq = 'Q'))

fit1 = sm.tsa.ARIMA(df, (1,0,0)).fit()
#this works fine:
pred1 = fit1.predict(start=12, end = 16)
print(pred1)

Out[32]: 
2014-03-31    0.589121
2014-06-30    0.747575
2014-09-30    0.631322
2014-12-31    0.654858
2015-03-31    0.650093
Freq: Q-DEC, dtype: float64

现在加入一个趋势外生变量。

exogx = np.array(range(1,14))
#to make this easy, let's look at the ols of the trend (arima(0,0,0))
fit2 = sm.tsa.ARIMA(df, (0,0,0),exog = exogx).fit()
print(fit2.params)

const    0.555226
x1       0.013132
dtype: float64

print(fit2.fittedvalues)

2011-03-31    0.568358
2011-06-30    0.581490
2011-09-30    0.594622
2011-12-31    0.607754
2012-03-31    0.620886
2012-06-30    0.634018
2012-09-30    0.647150
2012-12-31    0.660282
2013-03-31    0.673414
2013-06-30    0.686546
2013-09-30    0.699678
2013-12-31    0.712810
2014-03-31    0.725942
Freq: Q-DEC, dtype: float64

注意，正如我们所预期的那样，这是一条趋势线，每增加一个时间刻度就会增加0.013132（当然，这是随机数据，因此如果您运行它，值将不同，但正负趋势故事将相同）。因此，下一个值（对于时间=14）应该是0.555226 + 0.013132 * 14 = 0.739074。

#out of sample exog should be (14,15,16)
pred2 = fit2.predict(start = 12, end = 16, exog = np.array(range(13,17)))
print(pred2)
2014-03-31    0.725942
2014-06-30    0.568358
2014-09-30    0.581490
2014-12-31    0.594622
2015-03-31    0.765338
Freq: Q-DEC, dtype: float64

因此，2014-03-31预测（最后一个样本）是正确的，但2014-06-30从头开始（t = 1），但请注意2015-03-31（实际上是预测的最后一个观察值，无论视野如何）拾取t = 16（即，（value-intercept）/ beta =（0.765338-0.555226）/ 0.013132）。

为了使这更清晰，注意当我膨胀x mat的值时会发生什么。

fit2.predict(start = 12, end = 16, exog = np.array(range(13,17))*10000)
Out[41]: 
2014-03-31       0.725942
2014-06-30       0.568358
2014-09-30       0.581490
2014-12-31       0.594622
2015-03-31    2101.680532
Freq: Q-DEC, dtype: float64

看到2015-03-31的值爆炸了，但是其他xmat的值都没有被考虑？我在这里做错了什么？？

我已经尝试了各种传递exog变量的方法（改变维度、将exog作为矩阵、使exog与输入加上horizon一样长等），但仍未解决。如果有建议，将不胜感激。

我使用Anaconda2.1的2.7版本 numpy 1.8.1 scipy 0.14.0 pandas 0.14.0 statsmodels 0.5.0

我已经在Windows 7 64位和CentOS 64位上验证了这个问题。

另外，我正在使用ARIMA来实现ARIMA功能，上述内容仅供说明（也就是说，我不能“只使用OLS...”，因为我想象中会被建议）。由于项目的限制（以及更一般地说，在基础Spark中不支持R），我也不能“只使用R”。

以下是代码的有趣部分，您可以自己尝试：

import numpy as np
from scipy import stats
import pandas as pd
import statsmodels.api as sm

vals = np.random.rand(13)
ts = pd.TimeSeries(vals)
df = pd.DataFrame(ts, columns=["test"])
df.index = pd.Index(pd.date_range("2011/01/01", periods = len(vals), freq = 'Q'))

exogx = np.array(range(1,14))
fit2 = sm.tsa.ARIMA(df, (0,0,0),exog = exogx).fit()
print(fit2.fittedvalues)
pred2 = fit2.predict(start = 12, end = 16, exog = np.array(range(13,17))*10000)
print(pred2)

- Jim Crozier

请注意，以下文章中提到了这些问题（但没有直接讨论）：https://github.com/statsmodels/statsmodels/issues/1076http://stackoverflow.com/questions/18721547/armax-model-forecasting-leads-to-valueerror-matrices-are-not-aligned-when-pas - Jim Crozier

3个回答

2

在拟合 fit2 时，您已经提到了外生变量，因此无需重复：

exogx = np.array(range(1,5)) # I think you will need 4 exegeneous variables to perform an ARIMAX(0,0,0) since you want out of sample forecast with 4 steps ahead
fit2 = sm.tsa.ARIMA(df, (0,0,0),exog = exogx).fit()
# if you want to do an out-of-sample-forecast use fit2.forecast(steps) instead
#I would do this
pred = fit2.forecast(steps = 4)
fcst_index = pd.date_range(start = df.shift(1,'10T').index[-1]  , periods = 4, freq = '10T')
fcst_serie = pd.Series(data = pred1[0], index = fcst_index)
print fcst_serie

希望这会有所帮助！这是一篇很棒的文章。我以前从未尝试过在ARIMA模型中使用外生变量，但论文表明，无论你使用的领域如何，它并不是真正相关的（如果需要，我可以搜索论文，或者你可以自行搜索）。

- Saltigué

有人成功让它工作了吗？我也遇到了同样的问题。 - user3861925

1

如果有人正在使用.forecast()方法，这是我进行一步预测的方法。

history为训练数组
exog为外部变量数组
Y_exog_test为对应的外部变量样本。

将其更改为 .SARIMAX()，它应该可以工作

model = sm.tsa.statespace.ARIMA(history, trend='c', order=(1,1,1),seasonal_order=(0,1,0,24),exog=yexog)
model_fit = model.fit()
predicted = model_fit.forecast(step=1,exog=[[Y_exog_test]], dynamic=True)

- Niri

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- jseabold · Accepted Answer

建议您发布这个问题到 GitHub问题追踪器以获取更好的支持。我已经提交了一个票务。

如果没有在那里提交票务，我可能会忘记它。这些日子很忙。

k_ar == 0的特殊情况下逻辑存在一个错误，应该被修复了。如果您能/不能尝试一下该补丁，请告诉我。如果不行，我可以做更严格的测试并合并它。

Statsmodels与Spark搭配？我很感兴趣。