Forecasting a data frame by group in R with the Prophet package

12

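I'm using a new package released by Facebook called Prophet. It does time series forecasting, and I'd like to apply it by group.

Scroll down to the R section of the quick start: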

https://facebookincubator.github.io/prophet/docs/quick_start.html

Here is my attempt:

grouped_output = df %>% group_by(group) %>%
  do(m = prophet(df[,c(1,3)])) %>%
  do(future = make_future_dataframe(m, period = 7)) %>%
  do(forecast = prophet:::predict.prophet(m, future))

grouped_output[[1]]

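I now need to extract the results from the list for each group, but I'm having trouble doing so.

Here is my original data frame, without groups: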

ds <- as.Date(c('2016-11-01','2016-11-02','2016-11-03','2016-11-04',
                   '2016-11-05','2016-11-06','2016-11-07','2016-11-08',
                   '2016-11-09','2016-11-10','2016-11-11','2016-11-12',
                   '2016-11-13','2016-11-14','2016-11-15','2016-11-16',
                   '2016-11-17','2016-11-18','2016-11-19','2016-11-20',
                   '2016-11-21','2016-11-22','2016-11-23','2016-11-24',
                   '2016-11-25','2016-11-26','2016-11-27','2016-11-28',
                   '2016-11-29','2016-11-30'))
y <- c(15,17,18,19,20,54,67,23,12,34,12,78,34,12,3,45,67,89,12,111,123,112,14,566,345,123,567,56,87,90)
y<-as.numeric(y)
df <- data.frame(ds, y)

df

           ds   y
1  2016-11-01  15
2  2016-11-02  17
3  2016-11-03  18
4  2016-11-04  19
5  2016-11-05  20
6  2016-11-06  54
7  2016-11-07  67
8  2016-11-08  23
9  2016-11-09  12
10 2016-11-10  34
11 2016-11-11  12
12 2016-11-12  78
13 2016-11-13  34
14 2016-11-14  12
15 2016-11-15   3
16 2016-11-16  45
17 2016-11-17  67
18 2016-11-18  89
19 2016-11-19  12
20 2016-11-20 111
21 2016-11-21 123
22 2016-11-22 112
23 2016-11-23  14
24 2016-11-24 566
25 2016-11-25 345
26 2016-11-26 123
27 2016-11-27 567
28 2016-11-28  56
29 2016-11-29  87
30 2016-11-30  90

The function currently works when I run it for a single group:

#install.packages('prophet')
library(prophet)
m<-prophet(df)
future <- make_future_dataframe(m, period = 7)
forecast <- prophet:::predict.prophet(m, future)

forecast$yhat
 [1]  -2.649032 -29.762095 128.169781  59.573684 -11.623727 107.473617 -29.949730 -42.862455 -62.378408 104.797639  46.868610
[12] -12.502864 119.282058  -4.914921  -4.402638 -10.643570 169.309505 123.321261  74.734746 215.856347  99.290218 105.508059
[23] 102.882915 284.245984 237.401258 185.688202 321.466962 197.451536 194.280518 180.535663 349.304365 288.684031 222.337210
[34] 342.968499 203.648851 185.377165

Now I want to change this so that prophet:::predict is applied to each group. The new data frame with groups looks like this:

ds <- as.Date(c('2016-11-01','2016-11-02','2016-11-03','2016-11-04',
            '2016-11-05','2016-11-06','2016-11-07','2016-11-08',
            '2016-11-09','2016-11-10','2016-11-11','2016-11-12',
            '2016-11-13','2016-11-14','2016-11-15','2016-11-16',
            '2016-11-17','2016-11-18','2016-11-19','2016-11-20',
            '2016-11-21','2016-11-22','2016-11-23','2016-11-24',
            '2016-11-25','2016-11-26','2016-11-27','2016-11-28',
            '2016-11-29','2016-11-30',


            '2016-11-01','2016-11-02','2016-11-03','2016-11-04',
            '2016-11-05','2016-11-06','2016-11-07','2016-11-08',
            '2016-11-09','2016-11-10','2016-11-11','2016-11-12',
            '2016-11-13','2016-11-14','2016-11-15','2016-11-16',
            '2016-11-17','2016-11-18','2016-11-19','2016-11-20',
            '2016-11-21','2016-11-22','2016-11-23','2016-11-24',
            '2016-11-25','2016-11-26','2016-11-27','2016-11-28',
            '2016-11-29','2016-11-30'))
y <- c(15,17,18,19,20,54,67,23,12,34,12,78,34,12,3,45,67,89,12,111,123,112,14,566,345,123,567,56,87,90,
   45,23,12,10,21,34,12,45,12,44,87,45,32,67,1,57,87,99,33,234,456,123,89,333,411,232,455,55,90,21)
y<-as.numeric(y)

group<-c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
     "A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
     "B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
     "B","B","B","B","B","B","B","B","B","B","B","B","B","B","B")
df <- data.frame(ds,group, y)

df

           ds group   y
1  2016-11-01     A  15
2  2016-11-02     A  17
3  2016-11-03     A  18
4  2016-11-04     A  19
5  2016-11-05     A  20
6  2016-11-06     A  54
7  2016-11-07     A  67
8  2016-11-08     A  23
9  2016-11-09     A  12
10 2016-11-10     A  34
11 2016-11-11     A  12
12 2016-11-12     A  78
13 2016-11-13     A  34
14 2016-11-14     A  12
15 2016-11-15     A   3
16 2016-11-16     A  45
17 2016-11-17     A  67
18 2016-11-18     A  89
19 2016-11-19     A  12
20 2016-11-20     A 111
21 2016-11-21     A 123
22 2016-11-22     A 112
23 2016-11-23     A  14
24 2016-11-24     A 566
25 2016-11-25     A 345
26 2016-11-26     A 123
27 2016-11-27     A 567
28 2016-11-28     A  56
29 2016-11-29     A  87
30 2016-11-30     A  90
31 2016-11-01     B  45
32 2016-11-02     B  23
33 2016-11-03     B  12
34 2016-11-04     B  10
35 2016-11-05     B  21
36 2016-11-06     B  34
37 2016-11-07     B  12
38 2016-11-08     B  45
39 2016-11-09     B  12
40 2016-11-10     B  44
41 2016-11-11     B  87
42 2016-11-12     B  45
43 2016-11-13     B  32
44 2016-11-14     B  67
45 2016-11-15     B   1
46 2016-11-16     B  57
47 2016-11-17     B  87
48 2016-11-18     B  99
49 2016-11-19     B  33
50 2016-11-20     B 234
51 2016-11-21     B 456
52 2016-11-22     B 123
53 2016-11-23     B  89
54 2016-11-24     B 333
55 2016-11-25     B 411
56 2016-11-26     B 232
57 2016-11-27     B 455
58 2016-11-28     B  55
59 2016-11-29     B  90
60 2016-11-30     B  21

How can I use the prophet package to predict y-hat by group rather than for the data as a whole?

2 Answers

19

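Here is one approach: use tidyr::nest to nest the data by group, purrr::map to fit a model for each group, and then retrieve y-hat as requested. I took your code and folded it into mutate calls that compute new columns with purrr::map.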

library(prophet)
library(dplyr)
library(purrr)
library(tidyr)

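# Nest each group's rows into a list-column, then fit and forecast per group.
# (tidyr >= 1.0 syntax; older versions used nest(-group).)
d1 <- df %>% 
  nest(data = -group) %>% 
  mutate(m = map(data, prophet)) %>% 
  mutate(future = map(m, make_future_dataframe, periods = 7)) %>% 
  mutate(forecast = map2(m, future, predict))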

At this point the output looks like this:

d1
# A tibble: 2 × 5
   group              data          m                future
  <fctr>            <list>     <list>                <list>
1      A <tibble [30 × 2]> <S3: list> <data.frame [36 × 1]>
2      B <tibble [30 × 2]> <S3: list> <data.frame [36 × 1]>
# ... with 1 more variables: forecast <list>
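
The forecast step pairs each model with its own future data frame, which is what map2 is for. With map alone, the lambda only receives one argument, so any other name it mentions is looked up outside the lambda, and every group may end up sharing the same object. A toy sketch of the difference, using plain numbers as stand-ins for models and future frames (the names models, futures, and the values are illustrative only):

```r
library(purrr)

# Stand-ins for the real objects: numbers instead of fitted models,
# vectors instead of per-group future data frames (illustrative only).
models  <- list(A = 1, B = 10)
futures <- list(A = c(1, 2), B = c(3, 4))
future  <- 100  # a stray global, like a leftover `future` in the session

# map(): the lambda only receives .x, so `future` resolves to the global.
from_global <- map(models, ~ .x + future)

# map2(): .x and .y are taken pairwise from the two lists.
paired <- map2(models, futures, ~ .x + .y)

str(from_global)
str(paired)
```

In the toy run, from_global uses 100 for both groups, while paired combines each "model" with the matching element of futures.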

I then use unnest() to retrieve the data from the forecast column and select the y-hat values as requested.

d <- d1 %>% 
  unnest(forecast) %>% 
  select(ds, group, yhat)

Here is the output with the new forecast values:

d %>% group_by(group) %>% 
  top_n(7, ds)
Source: local data frame [14 x 3]
Groups: group [2]

           ds  group      yhat
       <date> <fctr>     <dbl>
1  2016-11-30      A 180.53422
2  2016-12-01      A 349.30277
3  2016-12-02      A 288.68215
4  2016-12-03      A 222.33501
5  2016-12-04      A 342.96654
6  2016-12-05      A 203.64625
7  2016-12-06      A 185.37395
8  2016-11-30      B 131.07827
9  2016-12-01      B 222.83703
10 2016-12-02      B 236.33555
11 2016-12-03      B 145.41001
12 2016-12-04      B 228.59687
13 2016-12-05      B 162.49244
14 2016-12-06      B  68.44477

I wasn't sure whether I should use map(m, ~predict(.x, future)) or map2(m, future, ~predict(.x, .y)). They seem to give the same output here. - FlorianGD
I should use map2; I only got the same result because I had a variable named future in my session. - FlorianGD
This works well, thanks. One thing didn't work for me and I had to change it, so I'm noting it for others who see this: predict didn't work for me, so I used prophet:::predict.prophet instead. - nak5120
How would you handle NAs? - nak5120
When I run this on my full dataset I actually get an error: Error in FUN(X[[i]], ...) : Stan does not support NA (in y) in data failed to preprocess the data; optimization not done Error: 'data' must be of a vector type, was 'NULL' - nak5120
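
One possible way to sidestep the Stan error quoted above is to drop rows whose y is missing before nesting, assuming gaps in each group's history are acceptable for your model. The sketch below uses made-up data (df_na is a hypothetical name) and stops before the fitting step, so it only shows the filtering:

```r
library(dplyr)
library(tidyr)

# Toy data with one missing value in group A (made-up numbers).
df_na <- data.frame(
  ds    = rep(seq(as.Date("2016-11-01"), by = "day", length.out = 5), times = 2),
  group = rep(c("A", "B"), each = 5),
  y     = c(15, NA, 18, 19, 20, 45, 23, 12, 10, 21)
)

# Remove rows with missing y first, then nest by group as in the answer.
nested <- df_na %>%
  filter(!is.na(y)) %>%
  nest(data = -group)

# Rows left per group: A keeps 4 of its 5 rows, B keeps all 5.
rows_per_group <- sapply(nested$data, nrow)
rows_per_group
```

Each element of nested$data is then free of NA in y and can be passed on to map(data, prophet) as before.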

9
I was looking for a solution to the same problem. I came up with the following code, which is a bit simpler than the accepted answer.
library(tidyr)
library(dplyr)
library(prophet)

data = df %>%  
       group_by(group) %>%
       do(predict(prophet(.), make_future_dataframe(prophet(.), periods = 7))) %>%
       select(ds, group, yhat)

Here are the predicted values:

data %>% group_by(group) %>% 
         top_n(7, ds)

# A tibble: 14 x 3
# Groups:   group [2]
           ds  group     yhat
       <date> <fctr>    <dbl>
 1 2016-12-01      A 316.9709
 2 2016-12-02      A 258.2153
 3 2016-12-03      A 196.6835
 4 2016-12-04      A 346.2338
 5 2016-12-05      A 208.9083
 6 2016-12-06      A 216.5847
 7 2016-12-07      A 206.3642
 8 2016-12-01      B 230.0424
 9 2016-12-02      B 268.5359
10 2016-12-03      B 190.2903
11 2016-12-04      B 312.9019
12 2016-12-05      B 266.5584
13 2016-12-06      B 189.3556
14 2016-12-07      B 168.9791
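
Note that the do() call above runs prophet(.) twice per group, once for predict and once for make_future_dataframe. A minimal sketch that fits each group only once, assuming dplyr 0.8.1 or later for group_modify; the helper name forecast_group and its horizon argument are hypothetical, and the data is rebuilt from the question so the block runs on its own:

```r
library(dplyr)
library(prophet)

# Rebuild the grouped data frame from the question.
days <- seq(as.Date("2016-11-01"), by = "day", length.out = 30)
df <- data.frame(
  ds    = rep(days, times = 2),
  group = rep(c("A", "B"), each = 30),
  y     = c(15,17,18,19,20,54,67,23,12,34,12,78,34,12,3,45,67,89,12,111,
            123,112,14,566,345,123,567,56,87,90,
            45,23,12,10,21,34,12,45,12,44,87,45,32,67,1,57,87,99,33,234,
            456,123,89,333,411,232,455,55,90,21)
)

# Hypothetical helper: fit one model on one group's rows, then forecast
# `horizon` days past the end of that group's history.
forecast_group <- function(d, horizon = 7) {
  m      <- prophet(d)                                  # fitted once
  future <- make_future_dataframe(m, periods = horizon)
  fc     <- predict(m, future)
  fc[, c("ds", "yhat")]
}

# group_modify() calls the helper once per group and re-attaches `group`.
res <- df %>%
  group_by(group) %>%
  group_modify(~ forecast_group(.x, horizon = 7)) %>%
  ungroup()
```

The result has one row per group and date with columns group, ds, and yhat, so the same top_n(7, ds) check can be applied to it.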

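In programming, shorter doesn't necessarily mean "simpler". While this approach is more concise, I find the do() step much harder to follow than the nest-and-mutate approach. - Ricky
@Ricky I see your point. I just wanted to offer an alternative I had learned, to give people who land here another option. - Eduardo
This is great, @Eduardo. Do you know how to exclude weekends from the future data frame? Here is a link to my question in case you'd rather answer a new question: https://dev59.com/nG0NtIcB2Jgan1znMaNh - nak5120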

Content provided by Stack Overflow.