Prediction and visualization in R

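I have fitted a polynomial to my data and visualized the result. I am trying to extend the plot into the future and predict the x value (date) at which y drops below 70. My data is available here and can be copied. My current code is below.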
library(ggplot2)

data <- read.table("data.txt", sep = "\t", header = TRUE)

data$date <- as.Date(data$date)
data$y <- as.numeric(data$y)

x <- 1:88 # numeric index used in the model formula; I haven't found a way to use dates in the polynomial formula

p <- qplot(date, y, data = data, geom = "line", xlab = "Time", ylab = "y")
p + geom_smooth(method = "lm", formula = y ~ poly(x, 3))


fit <- lm(y ~ poly(x, 3), data = data) # y comes from data, x is the index vector above
summary(fit) # Fit is adequate

The resulting plot looks like this:

enter image description here

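Because I don't know how to use dates as the "axis" of the formula, I used a numeric x vector to build the cubic polynomial. I want to forecast, that is, extend this plot into the future using the fitted formula, and find the date on which y falls below 70.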


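You need to call lm outside of ggplot (as you are now showing), then use the predict function with a "newdata" argument that has 'x' values in a range where you can reasonably expect the predicted value to be around 70. - IRTFM
Yes, I tried that. I created a new numeric vector that I thought would cover the range, g <- 1:120. Then I used predict(fit, newdata=g). But I got this error: Error in eval(predvars, data, env) : numeric 'envir' arg not of length one. Also, I would like to use dates for x, because right now I have to convert the integers to dates. - ELEC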
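The newdata argument needs to be a list with an element named "x" whose values are constructed appropriately. I very much doubt that 1:120 will be the next 4 months after as.numeric(data$y). (Dates really are integers.) - IRTFM
I found that I have to pass a data frame as the newdata argument. I built g <- 1:200 and ran predict(fit, newdata = data.frame(x = g)). The result shows that the 179th index is the first below 70, so I added that many days to my first observation: as.Date("2014-04-06") + 179. The result therefore appears to be 2014-10-02. Is what I did valid (even if not very elegant)? Thanks for your help. If you have a better solution, feel free to post an answer and I will accept it. - ELEC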
I think you probably want to use stat_smooth with `fullrange = TRUE`. - Tyler Rinker
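A minimal sketch of the `fullrange = TRUE` suggestion, assuming ggplot2 is installed. The data frame `toy` and its values are made up for illustration and stand in for the question's data; the 120-day extension is likewise an arbitrary choice.

```r
library(ggplot2)

# Toy data (made-up values): 88 daily observations with an accelerating decline
toy <- data.frame(date = as.Date("2014-04-06") + 0:87)
toy$y <- 95.5 - 1e-5 * (0:87)^3

# Widen the x scale to 120 days past the last observation. With
# fullrange = TRUE the fitted curve is drawn across the whole scale
# instead of stopping at the last data point.
p <- ggplot(toy, aes(date, y)) +
  geom_line() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 3),
              se = FALSE, fullrange = TRUE) +
  scale_x_date(limits = c(min(toy$date), max(toy$date) + 120))
p
```

This only extends the drawn curve; reading off the date at which the curve crosses 70 still needs a separate call to predict.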
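The approach worked out in the comments can be sketched with the date itself as the predictor, which avoids converting an index back to a date by hand. This is an illustration under stated assumptions, not the asker's actual code: the data frame `toy` holds made-up, noise-free values, and the names `future`, `threshold`, and `first_below` are hypothetical.

```r
# Toy data standing in for the question's data.txt (values are made up):
# 88 daily observations that follow an exact cubic decline.
start <- as.Date("2014-04-06")
toy <- data.frame(date = start + 0:87)
toy$y <- 95.5 - 1e-5 * (0:87)^3

# Fit the cubic on the numeric form of the date (days since 1970-01-01),
# so no separate 1:88 index vector is needed.
fit <- lm(y ~ poly(as.numeric(date), 3), data = toy)

# newdata must be a data frame whose column names match the model variables;
# passing a bare vector triggers the "numeric 'envir' arg" error.
future <- data.frame(date = max(toy$date) + 1:120)
future$y <- predict(fit, newdata = future)

# First future date whose predicted y is below the threshold (NA if none).
threshold <- 70
below <- which(future$y < threshold)
first_below <- if (length(below) > 0) future$date[below[1]] else as.Date(NA)
print(first_below)
```

Because the model is fitted on real dates, gaps in the observation dates are handled by the fit itself, whereas an index such as 1:88 treats every observation as exactly one day after the previous one.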
1 Answer

A bit hacky, but it gets the job done:

enter image description here

Code

library(ggplot2)

# Define timeframe to predict (the 120 days after the last observation), convert dates to numeric
days <- as.numeric(seq.Date(max(df$date) + 1, max(df$date) + 120, by = "days"))

# Build model
model <- loess(y ~ as.numeric(date), df, control = loess.control(surface = "direct"))

# Apply model to timeframe
p <- predict(model, days)

# Convert date back to Date format, build result dataframe
result <- data.frame(date = as.Date(days, origin = "1970-01-01"),
                     y = p)

# Plot three elements: original data, model, prediction
ggplot() +  
    geom_line(data = df, aes(date, y)) +
    geom_smooth(data = df, aes(date, y), method = "loess", se = FALSE) + 
    geom_line(data = result, aes(date, y), linetype = "dashed", color = "red", size = 1)
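The code above plots the extrapolation but does not report the date at which y falls below 70. One way to read that date off the predicted values is sketched below in base R. The helper `first_date_below` is hypothetical and not part of the answer, and `df_toy` contains made-up values so that the block runs on its own.

```r
# Toy stand-in for df (made-up values): 88 daily points with an accelerating decline
df_toy <- data.frame(date = as.Date("2014-04-06") + 0:87)
df_toy$y <- 95.5 - 1e-5 * (0:87)^3

# Hypothetical helper: first date whose value is below the threshold,
# or NA if the values never drop that low within the given dates.
first_date_below <- function(dates, values, threshold) {
  idx <- which(values < threshold)
  if (length(idx) == 0) return(as.Date(NA))
  dates[idx[1]]
}

# Same steps as in the answer: future days, loess model, prediction
days <- as.numeric(seq.Date(max(df_toy$date) + 1, max(df_toy$date) + 120, by = "days"))
model <- loess(y ~ as.numeric(date), df_toy,
               control = loess.control(surface = "direct"))
result <- data.frame(date = as.Date(days, origin = "1970-01-01"),
                     y = predict(model, days))

first_date_below(result$date, result$y, 70)
```

An NA result means the curve stays at or above the threshold for the whole prediction window, in which case the window can be lengthened. Extrapolating a loess fit far beyond the observed range should be treated with caution.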

Data

df <- structure(list(date = structure(c(16166, 16167, 16168, 16169, 16170, 16171, 16172, 16173, 16174, 16175, 16176, 16177, 16178, 16179, 16180, 16186, 16187, 16188, 16189, 16190, 16191, 16205, 16206, 16207, 16208, 16209, 16210, 16211, 16212, 16216, 16217, 16218, 16219, 16261, 16262, 16263, 16264, 16265, 16266, 16267, 16268, 16269, 16270, 16271, 16272, 16273, 16274, 16275, 16282, 16283, 16284, 16285, 16286, 16287, 16288, 16289, 16290, 16291, 16292, 16293, 16294, 16295, 16296, 16297, 16298, 16299, 16300, 16301, 16302, 16303, 16304, 16305, 16306, 16307, 16308, 16309, 16310, 16311, 16312, 16313, 16315, 16316, 16317, 16318, 16319, 16320, 16321, 16322), class = "Date"), y = c(95.543962, 95.573412, 95.589183, 95.500536, 95.563371, 95.579541, 94.979131, 95.56979, 95.545374, 95.912162, 95.687874, 95.564335, 95.538733, 95.579036, 95.539545, 94.068515, 94.584192, 95.479851, 95.554502, 95.517236, 95.514891, 95.541116, 95.52134, 95.545067, 95.551372, 95.520105, 95.535395, 95.494109, 95.501609, 95.544039, 95.545912, 95.560667, 95.435162, 94.934045, 95.072639, 95.050748, 94.676876, 94.68793, 95.068279, 95.038642, 94.408982, 94.429949, 94.990296, 94.75853, 95.1649, 95.095966, 93.945934, 93.934546, 92.71179, 92.757176, 93.429478, 93.730306, 93.840446, 93.769516, 93.958374, 93.94293, 93.940904, 93.776711, 93.474757, 92.255233, 92.779808, 92.508432, 92.869858, 92.846158, 93.533357, 93.233847, 93.392017, 93.613915, 93.520494, 93.761786, 93.562945, 93.584771, 93.650417, 93.091347, 92.813293, 92.650896, 92.577961, 92.468491, 93.269589, 93.242729, 91.626408, 91.157243, 90.486782, 90.989062, 91.766393, 91.477911, 90.463049, 91.182974)), row.names = c(NA, -88L), class = "data.frame")
