Prediction model forecasting one day ahead - sliding window

Question


r, machine-learning, statistics, prediction, sliding-window


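I've run into a problem. I'm using SparkR for time-series forecasting, but the scenario also carries over to plain R. Besides an ARIMA model, I'd also like to use regression models (such as random forest regression) to forecast the load one day ahead. I've also read about the sliding-window approach for evaluating the performance of different regressors under different parameter combinations. To make this clearer, here is an example of my dataset's structure: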

Timestamp              UsageCPU     UsageMemory   Indicator  Delay
2014-01-03 21:50:00    3123            1231          1        123
2014-01-03 22:00:00    5123            2355          1        322
2014-01-03 22:10:00    3121            1233          2        321
2014-01-03 22:20:00    2111            1234          2        211
2014-01-03 22:30:00    1000            2222          2         0 
2014-01-03 22:40:00    4754            1599          1         0

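Whichever regressor is used, the next step is to extract features from the timestamp and convert them into a format the model can read, since these regressors cannot handle timestamps directly: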

Year   Month  Day  Hour    Minute    UsageCPU   UsageMemory  Indicator Delay
2014   1      3    21       50        3123        1231          1      123
2014   1      3    22       00        5123        2355          1      322
2014   1      3    22       10        3121        1233          2      321
2014   1      3    22       20        2111        1234          2      211
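A minimal base-R sketch of this feature-extraction step follows. It assumes the timestamps arrive as strings in the format shown above and are interpreted as UTC; the time zone is an assumption. The data frame holds the first four rows of the question's table. In SparkR, the column functions `year`, `month`, `dayofmonth`, `hour` and `minute` serve the same purpose.

```r
# Sample data: the first four rows of the question's table
df <- data.frame(
  Timestamp   = c("2014-01-03 21:50:00", "2014-01-03 22:00:00",
                  "2014-01-03 22:10:00", "2014-01-03 22:20:00"),
  UsageCPU    = c(3123, 5123, 3121, 2111),
  UsageMemory = c(1231, 2355, 1233, 1234),
  Indicator   = c(1, 1, 2, 2),
  Delay       = c(123, 322, 321, 211)
)

# Parse the timestamp strings (UTC is an assumption)
ts <- as.POSIXct(df$Timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Split each timestamp into numeric calendar features
features <- data.frame(
  Year   = as.integer(format(ts, "%Y")),
  Month  = as.integer(format(ts, "%m")),
  Day    = as.integer(format(ts, "%d")),
  Hour   = as.integer(format(ts, "%H")),
  Minute = as.integer(format(ts, "%M"))
)

# Append the original measurement columns
features <- cbind(features, df[, c("UsageCPU", "UsageMemory", "Indicator", "Delay")])
print(features)
```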

The next step is to create a training set and a test set for the model.

trainTest <-randomSplit(SparkDF,c(0.7,0.3), seed=42)
train <- trainTest[[1]]
test <- trainTest[[2]]
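`randomSplit` draws rows at random, so observations from later in time can end up in the training set while earlier ones are tested. For forecasting, a common alternative is to split by time instead. The base-R sketch below uses a hypothetical helper, `chronological_split`, and takes the 70/30 ratio from the call above. In SparkR the same idea would mean filtering on the timestamp column at a cut-off point rather than calling `randomSplit`.

```r
# Time-ordered split: the first `train_frac` of the rows (in time order) train
# the model, and the remaining rows form the test period.
chronological_split <- function(df, time_col, train_frac = 0.7) {
  df <- df[order(df[[time_col]]), , drop = FALSE]  # sort rows by time
  n_train  <- floor(nrow(df) * train_frac)
  in_train <- seq_len(nrow(df)) <= n_train
  list(train = df[in_train, , drop = FALSE],
       test  = df[!in_train, , drop = FALSE])
}

# The six rows from the question's table (10-minute spacing)
df <- data.frame(
  Timestamp = seq(as.POSIXct("2014-01-03 21:50:00", tz = "UTC"),
                  by = "10 min", length.out = 6),
  UsageCPU  = c(3123, 5123, 3121, 2111, 1000, 4754)
)
parts <- chronological_split(df, "Timestamp")
```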

Then the model can be created and used for prediction (the random forest settings are not important for now):

model <- spark.randomForest(train, UsageCPU ~ ., type = "regression", maxDepth = 5, maxBins = 16)
predictions <- predict(model, test)

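I know all these steps, and plotting the predictions against the actual data suggests the model works well. But this regression model is not dynamic, which means I cannot predict one day into the future. Features such as UsageCPU and UsageMemory do not exist yet for the next day, so I want to predict the next day from historical values. As mentioned at the beginning, the sliding-window approach could work here, but I'm not sure how to apply it (to the whole dataset, or only to the training or test set).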
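One common way to make the regressor "dynamic" is to use past values of the series as features. Each row holds the last `window` observations, and the target is the value `horizon` steps later. The base-R sketch below uses `stats::embed()` and a hypothetical helper, `make_lagged`. It is built over the whole time-ordered series before any train/test split, so every row only looks backwards in time.

```r
# Turn a univariate series into a supervised-learning table:
# each row holds `window` consecutive past values (oldest first), and the
# target `y` is the value `horizon` steps after the last value in the window.
make_lagged <- function(x, window, horizon) {
  n_rows <- length(x) - window - horizon + 1
  stopifnot(n_rows > 0)
  # embed() puts the most recent value first; reverse to oldest-first order
  lags <- embed(x, window)[seq_len(n_rows), window:1, drop = FALSE]
  colnames(lags) <- paste0("lag", (window - 1):0)  # lag0 = most recent value
  data.frame(lags, y = x[(window + horizon):length(x)])
}

# UsageCPU values from the question; 3 past values predict the next one
cpu <- c(3123, 5123, 3121, 2111, 1000, 4754)
d <- make_lagged(cpu, window = 3, horizon = 1)
print(d)
```

With 10-minute data, `window = 144, horizon = 144` would use the previous day to predict the value one day later. Other columns, such as UsageMemory or calendar features, could be bound onto the same table before fitting.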

This implementation is from shabbychef and mbq:

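slideMean <- function(x, windowsize = 3, slide = 2) {
  idx1 <- seq(1, length(x), by = slide)          # start index of each window
  idx2 <- idx1 + windowsize                      # one past the end of each window
  idx2[idx2 > (length(x) + 1)] <- length(x) + 1  # clip windows that run past the end of x
  cx <- c(0, cumsum(x))                          # prefix sums: a window sum is a difference
  (cx[idx2] - cx[idx1]) / windowsize             # note: clipped windows are still divided by windowsize
}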
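Calling the function on a short vector shows what it returns. With `windowsize = 3` and `slide = 2`, the windows start at positions 1, 3, 5, 7 and 9. The last window is cut off at the end of the vector but still divided by `windowsize`. Its value is therefore (9 + 10) / 3 rather than the true mean of 9.5. Dividing by `idx2 - idx1` instead would give the true mean of truncated windows.

```r
# slideMean as given above (shabbychef and mbq)
slideMean <- function(x, windowsize = 3, slide = 2) {
  idx1 <- seq(1, length(x), by = slide)
  idx2 <- idx1 + windowsize
  idx2[idx2 > (length(x) + 1)] <- length(x) + 1
  cx <- c(0, cumsum(x))
  (cx[idx2] - cx[idx1]) / windowsize
}

m <- slideMean(1:10, windowsize = 3, slide = 2)
print(m)  # values: 2, 4, 6, 8, 6.333...
```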

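The last question concerns the window size. I want to predict the next day hour by hour (00, 01, 02, 03, ...), but the timestamps are 10 minutes apart, so by my calculation the window size should be 144 (24 * 60 / 10).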
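The window-size arithmetic can be checked directly. A day has 24 * 60 = 1440 minutes, so at one observation every 10 minutes it spans 1440 / 10 = 144 rows, and one hour spans 6 rows. The base-R sketch below turns one day of 10-minute readings into 24 hourly means. It assumes the day starts at 00:00 and has no missing rows; `block_means` is a hypothetical helper, and the input values are placeholders.

```r
minutes_per_day <- 24 * 60
step_minutes    <- 10
steps_per_day   <- minutes_per_day / step_minutes  # 144 rows per day
steps_per_hour  <- 60 / step_minutes               # 6 rows per hour

# Average consecutive, non-overlapping blocks of `block` values
block_means <- function(x, block) {
  stopifnot(length(x) %% block == 0)
  colMeans(matrix(x, nrow = block))  # matrix() fills by column: one block per column
}

# Placeholder day of readings (value = row index), purely for illustration
day    <- seq_len(steps_per_day)
hourly <- block_means(day, steps_per_hour)
```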

It would be great if someone could help. Thanks!

- Daniel

1 Answer

Content provided by Stack Overflow.

- smile · Accepted Answer

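I ran into the same problem when doing time-series forecasting with neural networks. I tried many models, and the best one was a sliding window combined with a neural network.

I also checked this with other researchers in the field. We concluded that predicting 1 day (24 time points) ahead with single-step training is very demanding for the system. Here is what we did: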

1. We used a sliding window of 24 hours; for example, let's use [1,2,3] here.
2. Then use the ML model to predict [4], i.e. use the value 4 as the target:
   # as an illustration we had
   x = [1,2,3]
   # then set the target as
   y = [4]
   # We had a function that returns x = [1,2,3] and y = [4] and
   # shifts the window in the next training step.
3. To the input
   x = [1,2,3]
   we can add further features that are important to the model:
   x = [1,2,3,feature_x]
4. Then we minimise the error and shift the window to get
   x = [2,3,4,feature_x] and y = [5].
5. You could also predict two values ahead, e.g. [4,5].
6. Use a list to collect the outputs and plot them.
7. Make predictions after training.
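The sketch below shows the steps above in R, with two substitutions. A linear model (`lm`) stands in for the neural network, and the hourly series and its extra feature (hour of day as `feature_x`) are synthetic placeholders. Training pairs come from shifting a window of `win_len` values one step at a time; `horizon = 2` gives targets like [4,5]. After training, forecasts are collected in a list by feeding each prediction back into the window. This recursive strategy is one common way to reach 24 steps ahead with a one-step model, but it is an assumption, not necessarily what the answer's authors did. `make_pairs` is a hypothetical helper.

```r
# Steps 1-4: build (x, y) pairs by sliding a window of `window` values one
# step at a time; each x also gets one extra feature (feature_x), taken at the
# time of the first target value. y holds the next `horizon` values.
make_pairs <- function(series, feature, window, horizon = 1) {
  n <- length(series) - window - horizon + 1
  stopifnot(n > 0)
  X <- do.call(rbind, lapply(seq_len(n), function(i)
    c(series[i:(i + window - 1)], feature[i + window])))
  Y <- do.call(rbind, lapply(seq_len(n), function(i)
    series[(i + window):(i + window + horizon - 1)]))
  list(X = X, Y = Y)
}

# Synthetic hourly series: a daily cycle plus noise, 10 days long
set.seed(1)
t_idx  <- seq_len(24 * 10)
hour   <- (t_idx - 1) %% 24                  # hour of day, used as feature_x
series <- 100 + 20 * sin(2 * pi * hour / 24) + rnorm(length(t_idx), sd = 2)

# Train a one-step-ahead model; lm() stands in for the neural network
win_len <- 24
pairs   <- make_pairs(series, hour, win_len, horizon = 1)
train   <- data.frame(pairs$X, y = pairs$Y[, 1])
model   <- lm(y ~ ., data = train)

# Steps 6-7: forecast the next 24 hours recursively, collecting outputs in a list
forecasts   <- list()
last_window <- tail(series, win_len)
for (h in seq_len(24)) {
  future_hour <- (length(series) + h - 1) %% 24
  newdata <- as.data.frame(t(c(last_window, future_hour)))
  names(newdata) <- setdiff(names(train), "y")  # match the training column names
  pred <- unname(predict(model, newdata = newdata))
  forecasts[[h]] <- pred
  last_window <- c(last_window[-1], pred)       # shift the window onto the prediction
}
forecast_vec <- unlist(forecasts)
# plot(forecast_vec, type = "l")  # step 6: plot the collected outputs
```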