How do I use Prophet's make_future_dataframe with multiple regressors?


make_future_dataframe seems to produce a dataframe that contains only the date (ds) column, so the code below fails with ValueError: Regressor 'var' missing from dataframe when I try to generate forecasts:

from prophet import Prophet  # the package was named fbprophet before version 1.0

m = Prophet()
m.add_country_holidays(country_name='US')
m.add_regressor('var')
m.fit(df)
forecasts = m.predict(m.make_future_dataframe(periods=7))

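The Python documentation does not seem to cover how to handle this with Prophet. Is my only option to write extra code that lags every regressor by the forecast horizon (for example, using the variable at t-7 to produce a 7-day daily forecast)?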
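
The lagging idea mentioned above can be sketched with plain pandas. This is only an illustration under stated assumptions: the toy data, the column name var_lag7, and the use of pd.date_range as a stand-in for make_future_dataframe are all made up for the example. Shifting the regressor by the forecast horizon means every future row already has a known value.

```python
import pandas as pd

horizon = 7  # number of days to forecast ahead

# Toy daily history with one regressor (values are made up)
history = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=14, freq="D"),
    "var": [float(i) for i in range(14)],
})

# Extend the calendar by `horizon` days (stand-in for make_future_dataframe)
extended = pd.DataFrame({
    "ds": pd.date_range(history["ds"].min(), periods=len(history) + horizon, freq="D")
})
extended = extended.merge(history, on="ds", how="left")

# Lag the regressor by the horizon: the row for day t holds the value from day t-7
extended["var_lag7"] = extended["var"].shift(horizon)

# Historical rows that have a lagged value can be used for training
train = extended.iloc[horizon:len(history)]
# Future rows get their lagged value from the last `horizon` observed days
future_rows = extended.iloc[len(history):]

print(future_rows[["ds", "var_lag7"]])
```

With this layout the model would be trained on var_lag7 instead of var (add_regressor('var_lag7')), at the cost of losing the first 7 rows of history and of using week-old information.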

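Hi, I am running into the same problem :( Did you find a better solution? - Aymen Mouelhi
This is probably because Prophet does not know what value to give the variable in each generated row. - Sameer Mahajan
I have opened an issue at https://github.com/facebook/prophet/issues/2068. - Sameer Mahajan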
1 Answer

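The problem is that future = m.make_future_dataframe(...) creates a dataset called future whose only column is the date column ds. To predict with a model that has regressors, the future dataset also needs a column for each regressor.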
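
A quick check along these lines can confirm whether a dataframe is ready before calling predict. It is a minimal sketch, not part of Prophet: regressor_problems is a hypothetical helper name, and the toy frame built with pd.date_range stands in for the output of make_future_dataframe. As far as I know, predict also rejects NaN values in regressor columns, so the helper flags those as well.

```python
import pandas as pd

def regressor_problems(future, regressors):
    """Return {regressor: reason} for every regressor that is not usable yet."""
    problems = {}
    for name in regressors:
        if name not in future.columns:
            problems[name] = "missing column"
        elif future[name].isna().any():
            problems[name] = "contains NaN"
    return problems

# Stand-in for make_future_dataframe: a frame with only the 'ds' column
future = pd.DataFrame({"ds": pd.date_range("2023-01-01", periods=4, freq="W")})
before = regressor_problems(future, ["var"])

# Regressor column present but only partly filled
future["var"] = [1.0, 2.0, None, None]
partial = regressor_problems(future, ["var"])

# Regressor column fully filled (toy values)
future["var"] = [1.0, 2.0, 3.0, 4.0]
after = regressor_problems(future, ["var"])

print(before, partial, after)
```

An empty dictionary means every regressor has a value on every row.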

I solved this by forecasting the values of the regressor variables and filling them into future_w_regressors, a dataset that merges future with regression_data.

Suppose you already have a trained model called model:

# List of regressors     
regressors = ['Total Minutes','Sent Emails','Banner Active']

# My data is weekly so I project out 1 year (52 weeks), this is what I want to forecast
future = model.make_future_dataframe(52, freq='W')

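At this point, running model.predict(future) gives the same error you have been seeing. We need to merge the regressors in. I merge regression_data with future so that the past observations are filled in. As you can see, the forward-looking observations (at the end of the table) are empty.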

# regression_data is the dataframe I used to train the model (include all covariates)
# merge the data you used to train the model 
future_w_regressors = regression_data[regressors+['ds']].merge(future, how='outer', on='ds')
future_w_regressors

     Total Minutes   Sent Emails  Banner Active          ds
0         7.129552  9.241493e-03            0.0  2018-01-07
1         7.157242  8.629305e-14            0.0  2018-01-14
2         7.155367  8.629305e-14            0.0  2018-01-21
3         7.164352  8.629305e-14            0.0  2018-01-28
4         7.165526  8.629305e-14            0.0  2018-02-04
...            ...           ...            ...         ...
283            NaN           NaN            NaN  2023-06-11
284            NaN           NaN            NaN  2023-06-18
285            NaN           NaN            NaN  2023-06-25
286            NaN           NaN            NaN  2023-07-02
287            NaN           NaN            NaN  2023-07-09

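Solution 1: Forecast the regressors

Next, I create a dataset that contains only the rows with empty regressor values, loop over the regressors, train a simple Prophet model on each one, predict its values for the future dates, fill those values into the empty dataset, and then copy them into future_w_regressors.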

# Get the segment for which we have no regressor values
is_future = future_w_regressors[regressors[0]].isnull()
empty_future = future_w_regressors[is_future].copy()  # copy so the assignments below do not hit a view
only_future = empty_future[['ds']]

# Forecast each regressor separately and store the result in empty_future
for regressor in regressors:
    # Prep a new training dataset, renamed to the ds/y columns Prophet expects
    train = regression_data[['ds', regressor]].rename(columns={regressor: 'y'})

    # Train a model for this regressor
    rmodel = Prophet(weekly_seasonality=False)  # turning this off is specific to my case
    rmodel.fit(train)
    regressor_predictions = rmodel.predict(only_future)

    # Replace the empty values in the empty dataset with the predicted values from the regressor model
    empty_future[regressor] = regressor_predictions['yhat'].values

# Fill in the values for all regressors in the future_w_regressors dataset
future_w_regressors.loc[is_future, regressors] = empty_future[regressors].values

The future_w_regressors table now has no missing values.

future_w_regressors

     Total Minutes    Sent Emails  Banner Active          ds
0         7.129552   9.241493e-03       0.000000  2018-01-07
1         7.157242   8.629305e-14       0.000000  2018-01-14
2         7.155367   8.629305e-14       0.000000  2018-01-21
3         7.164352   8.629305e-14       0.000000  2018-01-28
4         7.165526   8.629305e-14       0.000000  2018-02-04
...            ...            ...            ...         ...
283       7.161023  -1.114906e-02       0.548577  2023-06-11
284       7.156832  -1.138025e-02       0.404318  2023-06-18
285       7.150829  -5.642398e-03       0.465311  2023-06-25
286       7.146200  -2.989316e-04       0.699624  2023-07-02
287       7.145258   1.568782e-03       0.962070  2023-07-09

I can now run predict to get my forecast, which extends into 2023 (the original data ends in 2022):

model.predict(future_w_regressors)

    ds  trend   yhat_lower  yhat_upper  trend_lower trend_upper Banner Active   Banner Active_lower Banner Active_upper Sent Emails Sent Emails_lower   Sent Emails_upper   Total Minutes   Total Minutes_lower Total Minutes_upper additive_terms  additive_terms_lower    additive_terms_upper    extra_regressors_additive   extra_regressors_additive_lower extra_regressors_additive_upper yearly  yearly_lower    yearly_upper    multiplicative_terms    multiplicative_terms_lower  multiplicative_terms_upper  yhat
0   2018-01-07  2.118724    2.159304    2.373065    2.118724    2.118724    0.000000    0.000000    0.000000    3.681765e-04    3.681765e-04    3.681765e-04    0.076736    0.076736    0.076736    0.152302    0.152302    0.152302    0.077104    0.077104    0.077104    0.075198    0.075198    0.075198    0.0 0.0 0.0 2.271026
1   2018-01-14  2.119545    2.109899    2.327498    2.119545    2.119545    0.000000    0.000000    0.000000    3.437872e-15    3.437872e-15    3.437872e-15    0.077034    0.077034    0.077034    0.098945    0.098945    0.098945    0.077034    0.077034    0.077034    0.021911    0.021911    0.021911    0.0 0.0 0.0 2.218490
2   2018-01-21  2.120366    2.074524    2.293829    2.120366    2.120366    0.000000    0.000000    0.000000    3.437872e-15    3.437872e-15    3.437872e-15    0.077014    0.077014    0.077014    0.064139    0.064139    0.064139    0.077014    0.077014    0.077014    -0.012874   -0.012874   -0.012874   0.0 0.0 0.0 2.184506
3   2018-01-28  2.121187    2.069461    2.279815    2.121187    2.121187    0.000000    0.000000    0.000000    3.437872e-15    3.437872e-15    3.437872e-15    0.077110    0.077110    0.077110    0.050180    0.050180    0.050180    0.077110    0.077110    0.077110    -0.026931   -0.026931   -0.026931   0.0 0.0 0.0 2.171367
4   2018-02-04  2.122009    2.063122    2.271638    2.122009    2.122009    0.000000    0.000000    0.000000    3.437872e-15    3.437872e-15    3.437872e-15    0.077123    0.077123    0.077123    0.046624    0.046624    0.046624    0.077123    0.077123    0.077123    -0.030498   -0.030498   -0.030498   0.0 0.0 0.0 2.168633
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
283 2023-06-11  2.062645    2.022276    2.238241    2.045284    2.078576    0.025237    0.025237    0.025237    -4.441732e-04   -4.441732e-04   -4.441732e-04   0.077074    0.077074    0.077074    0.070976    0.070976    0.070976    0.101867    0.101867    0.101867    -0.030891   -0.030891   -0.030891   0.0 0.0 0.0 2.133621
284 2023-06-18  2.061211    1.975744    2.199376    2.043279    2.077973    0.018600    0.018600    0.018600    -4.533835e-04   -4.533835e-04   -4.533835e-04   0.077029    0.077029    0.077029    0.025293    0.025293    0.025293    0.095176    0.095176    0.095176    -0.069883   -0.069883   -0.069883   0.0 0.0 0.0 2.086504
285 2023-06-25  2.059778    1.951075    2.162531    2.041192    2.077091    0.021406    0.021406    0.021406    -2.247903e-04   -2.247903e-04   -2.247903e-04   0.076965    0.076965    0.076965    0.002630    0.002630    0.002630    0.098146    0.098146    0.098146    -0.095516   -0.095516   -0.095516   0.0 0.0 0.0 2.062408
286 2023-07-02  2.058344    1.953027    2.177666    2.039228    2.076373    0.032185    0.032185    0.032185    -1.190929e-05   -1.190929e-05   -1.190929e-05   0.076915    0.076915    0.076915    0.006746    0.006746    0.006746    0.109088    0.109088    0.109088    -0.102342   -0.102342   -0.102342   0.0 0.0 0.0 2.065090
287 2023-07-09  2.056911    1.987989    2.206830    2.037272    2.075110    0.044259    0.044259    0.044259    6.249949e-05    6.249949e-05    6.249949e-05    0.076905    0.076905    0.076905    0.039813    0.039813    0.039813    0.121226    0.121226    0.121226    -0.081414   -0.081414   -0.081414   0.0 0.0 0.0 2.096724
288 rows × 28 columns

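Note that I trained the model for each regressor naively. If you want better forecasts of these independent variables, you can tune those models.

Solution 2: Reuse last year's regressor values

Alternatively, you may not want to compound the uncertainty of the regressor forecasts into the main forecast, and only want to see how the forecast changes under different regressor values. In that case, you can simply copy last year's regressor values into the missing rows of future_w_regressors. This has the added benefit of making it easy to simulate a decrease or increase relative to current regressor levels.
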
from datetime import timedelta

last_date = regression_data.iloc[-1]['ds']
one_year_ago = last_date - timedelta(days=365) # works with data at any scale

last_year_of_regressors = regression_data.loc[regression_data['ds']>one_year_ago, regressors]

# If you want to simulate a 10% drop in levels compared to this year 
last_year_of_regressors = last_year_of_regressors * 0.9    
    
future_w_regressors.loc[future_w_regressors[regressors[0]].isnull(), regressors] = last_year_of_regressors.iloc[:len(future_w_regressors[future_w_regressors[regressors[0]].isnull()])].values
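
The copy above relies on row position, so it only works if the last-year slice has at least as many rows as there are missing future rows. A date-aligned variant is sketched below as one possible alternative. It assumes a regular weekly grid and at least one full year of history; the toy data and the names future_dates, shifted, and future_filled are made up for illustration, and pd.date_range stands in for make_future_dataframe.

```python
import pandas as pd

regressors = ["Total Minutes"]  # one toy regressor keeps the example short

# Toy weekly history: 60 weeks of made-up values
regression_data = pd.DataFrame({
    "ds": pd.date_range("2022-01-02", periods=60, freq="W"),
    "Total Minutes": [float(i) for i in range(60)],
})

# The 52 weeks to forecast (stand-in for the future rows of make_future_dataframe)
future_dates = pd.DataFrame({
    "ds": pd.date_range(regression_data["ds"].max() + pd.Timedelta(weeks=1),
                        periods=52, freq="W")
})

# Move the history forward by 52 weeks so each future date lines up
# with the same week one year earlier
shifted = regression_data[["ds"] + regressors].copy()
shifted["ds"] = shifted["ds"] + pd.Timedelta(weeks=52)

# A left merge on ds fills each future row by date rather than by position
future_filled = future_dates.merge(shifted, on="ds", how="left")

# Optional scenario: simulate a 10% drop relative to last year's levels
scenario = future_filled.copy()
scenario[regressors] = scenario[regressors] * 0.9

print(future_filled.head())
```

Any future date without a match one year earlier stays NaN after the merge, which makes gaps in the history visible instead of silently misaligning rows.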
