Predicting values from sinusoidal noise

Question


Tags: r, plot, prediction, gam


Background

Predicting the next values of a time series in R.

Problem

The following code generates a curve with uniform noise, fits a model to it, and plots the result:

library( mgcv )  # assumed source of gam() and s()
slope = 0.55
offset = -0.5
amplitude = 0.22
frequency = 3
noise = 0.75
x <- seq( 0, 200 )
y <- offset + (slope * x / 100) + (amplitude * sin( frequency * x / 100 ))
yn <- y + (noise * runif( length( x ) ))

gam.object <- gam( yn ~ s( x ) + 0 )
plot( gam.object, col = rgb( 1.0, 0.392, 0.0 ) )
points( x, yn, col = rgb( 0.121, 0.247, 0.506 ) )

The model shows the expected trend. The problem is predicting the subsequent values:

p <- predict( gam.object, data.frame( x=201:210 ) )

When plotted, the predictions look wrong:

df <- data.frame( fit=c( fitted( gam.object ), p ) )
plot( seq( 1:211 ), df[,], col="blue" )
points( yn, col="orange" )

The predicted values (from x = 201 onwards) appear to be too low.

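Questions

Are the values shown really the most accurate predictions?
If not, how can the accuracy be improved?
Is there a better way to join the two data sets (fitted.values( gam.object ) and p)?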

- Dave Jarvis


Predict over x from 1 to 211 instead of using fitted. - hadley

1 Answer


- fabians · Accepted Answer

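The simulated data are odd because every error you add to the "true" y is greater than 0 (runif generates numbers in [0, 1], not [-1, 1]), so the noise has mean noise / 2 = 0.375 rather than 0.
The problem disappears when the model is allowed an intercept term.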
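A quick way to see the offset: with noise = 0.75, the term noise * runif(n) has expected value 0.75 / 2 = 0.375, so every simulated point is pushed upward on average. The sketch below uses base R only. The seed and sample size are arbitrary, and the symmetric alternative runif(n, -1, 1) is an assumption about what the simulation was meant to do.

```r
set.seed(42)       # arbitrary seed, for reproducibility only
noise <- 0.75
n <- 100000        # large n so sample means sit close to their expectations

e_pos <- noise * runif(n)          # as in the question: values in [0, 0.75], expected mean 0.375
e_sym <- noise * runif(n, -1, 1)   # symmetric alternative: values in [-0.75, 0.75], expected mean 0

cat("mean of noise * runif(n):        ", round(mean(e_pos), 3), "\n")
cat("mean of noise * runif(n, -1, 1): ", round(mean(e_sym), 3), "\n")
```

Centering the noise alone would not make the no-intercept model correct, because the true curve in the question does not itself average zero over x = 0:200. Allowing the intercept is the more robust fix.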

For example, refitting with an intercept:

gam.object2 <- gam( yn ~ s( x ))
p2 <- predict( gam.object2, data.frame( x=201:210 ))
points( 1:211, c( fitted( gam.object2 ), p2), col="green")

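The model without an intercept underestimates systematically because gam imposes a sum-to-zero constraint on the estimated smooth functions: without an intercept, the fitted curve is forced to average zero over the data, while the data themselves do not. I think this answers your first and second questions.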
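A minimal sketch of that effect, assuming gam() is mgcv::gam and re-simulating the question's curve with an arbitrary seed. It compares the average fitted value with and without an intercept against the average of the data.

```r
library(mgcv)   # assumption: gam() and s() come from mgcv

set.seed(1)     # arbitrary seed
x  <- 0:200
y  <- -0.5 + 0.55 * x / 100 + 0.22 * sin(3 * x / 100)   # same curve as in the question
yn <- y + 0.75 * runif(length(x))                        # same one-sided noise

fit0 <- gam(yn ~ s(x) + 0)   # no intercept: per the answer, s(x) stays sum-to-zero constrained
fit1 <- gam(yn ~ s(x))       # unpenalised intercept carries the overall level

# With an intercept, the residuals sum to zero, so the fitted values average to mean(yn).
# Without one, the centred smooth has nothing to carry the level, so the fit sits too low.
c(mean_data             = mean(yn),
  mean_fit_no_intercept = mean(fitted(fit0)),
  mean_fit_intercept    = mean(fitted(fit1)))
```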

Your third question needs clarifying: a gam object is not a data.frame, so the two cannot simply be combined.
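hadley's comment points to the simplest route: call predict() once over both the observed and the future x values, so there is nothing to concatenate. The sketch below assumes mgcv. The names grid, part and the "in-sample"/"forecast" labels are illustrative only.

```r
library(mgcv)   # assumption: gam() is mgcv::gam

set.seed(1)     # arbitrary seed
x  <- 0:200
yn <- -0.5 + 0.55 * x / 100 + 0.22 * sin(3 * x / 100) + 0.75 * runif(length(x))
fit <- gam(yn ~ s(x))   # keep the intercept

# A single prediction grid covering observed (0..200) and future (201..210) x.
grid <- data.frame(x = 0:210)
grid$fit  <- as.numeric(predict(fit, newdata = grid))
grid$part <- ifelse(grid$x <= 200, "in-sample", "forecast")   # hypothetical label column
grid$y    <- c(yn, rep(NA, 10))                               # observed values, NA for the future

plot(grid$x, grid$y, col = "grey40", xlab = "x", ylab = "y")
lines(grid$x, grid$fit, col = "orange")
is_fc <- grid$part == "forecast"
points(grid$x[is_fc], grid$fit[is_fc], col = "blue", pch = 19)
```

For the in-sample rows, predict() returns the same values as fitted(), so the grid replaces the c( fitted( ... ), p ) construction entirely.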

A more complete example:

library( mgcv )  # assumed source of gam() and s()
slope = 0.55
amplitude = 0.22
frequency = 3
noise = 0.75
x <- 1:200
y <- (slope * x / 100) + (amplitude * sin( frequency * x / 100 ))
ynoise <- y + (noise * runif( length( x ) ))

gam.object <- gam( ynoise ~ s( x ) )
p <- predict( gam.object, data.frame( x = 1:210 ) )

plot( p, col = rgb( 0, 0.75, 0.2 ) )
points( x, ynoise, col = rgb( 0.121, 0.247, 0.506 ) )
points( fitted( gam.object ), col = rgb( 1.0, 0.392, 0.0 ) )