"累积"回归的矢量化

Question

I have some data:

dat <- data.frame(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

Here t is the time point. At each time point, I want to regress y on x1 and x2 using the data from the preceding time points.
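
Written as a formula (a sketch; it uses rows 1 through t, which is what the loop below does with dat[1:i, ], so the current point is included), each time point t gets its own least-squares estimate of the intercept and the coefficients on x1 and x2, where x_{1,s} and x_{2,s} are the values of x1 and x2 at time s:

$$
\hat{\beta}_t = \left(X_{1:t}^{\top} X_{1:t}\right)^{-1} X_{1:t}^{\top} y_{1:t},
\qquad
X_{1:t} = \begin{pmatrix} 1 & x_{1,1} & x_{2,1} \\ \vdots & \vdots & \vdots \\ 1 & x_{1,t} & x_{2,t} \end{pmatrix},
\qquad
y_{1:t} = \begin{pmatrix} y_1 \\ \vdots \\ y_t \end{pmatrix}
$$

Since X_{1:t}^T X_{1:t} is 3 x 3, it can only be invertible once t >= 3, which is why fits on only one or two rows return NA for some coefficients.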

I can do this with a loop:

# Rows 1-10 are left as NA; row i holds the fit on rows 1..i
reg <- matrix(NA, nrow = nrow(dat), ncol = 3)
for (i in 11:nrow(dat)) {
  reg[i, ] <- coefficients(lm(y ~ x1 + x2, data = dat[1:i, ]))
}

But I'm wondering whether there is a vectorized way to do this, perhaps with data.table.

- bumblebee

Not vectorized, but: reg2 <- t(sapply(11:nrow(dat), function(n) coefficients(lm(y ~ x1 + x2, data=dat[1:n,])))). Note that in your code, reg[1:10, ] is NA. - Rui Barradas
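
A runnable version of the sapply() suggestion in this comment, checked against the question's loop, might look like the following. The seed is an assumption added only so the result is reproducible. Note that sapply() still fits one lm() per time point, so this is tidier than the loop but not faster.

```r
set.seed(1)  # assumed seed, for reproducibility only
dat <- data.frame(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# One coefficient vector per expanding window dat[1:n, ], n = 11..100.
# sapply() returns a 3 x 90 matrix; t() turns it into 90 x 3.
reg2 <- t(sapply(11:nrow(dat), function(n) {
  coefficients(lm(y ~ x1 + x2, data = dat[1:n, ]))
}))

# The question's loop (with its syntax fixed), for comparison.
reg <- matrix(NA_real_, nrow = nrow(dat), ncol = 3)
for (i in 11:nrow(dat)) {
  reg[i, ] <- coefficients(lm(y ~ x1 + x2, data = dat[1:i, ]))
}

# Rows 1:10 of reg stay NA; the remaining rows match reg2.
all.equal(unname(reg2), reg[11:nrow(dat), ])
```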

2 Answers

Try this solution, which applies lapply over a custom regression function:

# Coefficients of y ~ x1 + x2 fitted on the first i rows of dat
f <- function(i, dat) {
  out <- coefficients(lm(y ~ x1 + x2, data = dat[1:i, ]))
  return(out)
}
lapply(seq_len(nrow(dat)), f, dat = dat)
[[1]]
(Intercept)          x1          x2 
  0.4949079          NA          NA 

[[2]]
(Intercept)          x1          x2 
 -0.4552593   2.4497037          NA 

[[3]]
(Intercept)          x1          x2 
  0.1023961   1.6163017  -0.8490789 

[[4]]
(Intercept)          x1          x2 
 -0.9136870   2.1235787   0.9072042

...

[[100]]
(Intercept)          x1          x2 
 0.06118874 -0.02917001  0.15879213
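
lapply() returns a list of named coefficient vectors. A minimal sketch (with an assumed seed for reproducibility; coef_mat is a hypothetical name) of stacking them into one matrix with one row per time point; the first two rows contain NA because those windows have fewer observations than coefficients:

```r
set.seed(1)  # assumed seed, for reproducibility only
dat <- data.frame(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Regression on the first i rows; returns the named coefficient vector.
f <- function(i, dat) {
  coefficients(lm(y ~ x1 + x2, data = dat[1:i, ]))
}

fits <- lapply(seq_len(nrow(dat)), f, dat = dat)

# rbind the 100 length-3 vectors into a 100 x 3 matrix;
# column names come from the coefficient names.
coef_mat <- do.call(rbind, fits)
head(coef_mat, 4)
```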

- Terru_theTerror

- MichaelChirico · Accepted Answer

We can use a non-equi self-join to get the table you want:

library(data.table)
setDT(dat)
# not clear if you wanted points _strictly_ before the present;
#   if so, use t < t and add nomatch = 0L to drop the first row (it has no earlier points)
dat[dat, on = .(t <= t), allow.cartesian = TRUE]
        t           y         x1          x2        i.y       i.x1        i.x2
   1:   1 -0.51729096  0.1765509  1.06562278 -0.5172910  0.1765509  1.06562278
   2:   2 -0.51729096  0.1765509  1.06562278  0.8517368 -0.7801053  0.05249113
   3:   2  0.85173679 -0.7801053  0.05249113  0.8517368 -0.7801053  0.05249113
   4:   3 -0.51729096  0.1765509  1.06562278 -0.5080630 -2.0701757 -1.01573263
   5:   3  0.85173679 -0.7801053  0.05249113 -0.5080630 -2.0701757 -1.01573263
  ---
5046: 100  1.03802913 -2.7042756  2.05639758 -1.2389762  0.9008160  0.61121185
5047: 100 -1.29122593  0.9013410  0.77088748 -1.2389762  0.9008160  0.61121185
5048: 100  0.08262791  0.4135725  0.92694074 -1.2389762  0.9008160  0.61121185
5049: 100 -0.93397320  0.2719790 -0.26097185 -1.2389762  0.9008160  0.61121185
5050: 100 -1.23897617  0.9008160  0.61121185 -1.2389762  0.9008160  0.61121185

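This is a bit confusing, but in t <= t, the t on the left refers to the outer dat (the table being indexed, x in data.table terms) and the t on the right refers to the inner dat passed as i.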
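
To see what this join produces without data.table, here is a base R sketch that builds the same long table by index arithmetic. It assumes, as in the question, that dat is sorted by t with t = 1, ..., n, so "t on the left <= t on the right" simply means rows 1..i; the names grp, rows, joined and coefs are hypothetical. It also explains the 5050 rows in the output above: 1 + 2 + ... + 100 = 100 * 101 / 2 = 5050.

```r
set.seed(1)  # assumed seed, for reproducibility only
dat <- data.frame(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
n <- nrow(dat)

# rep() repeats each i-side time point i times (the group label), and
# sequence() lists the matching x-side rows within each group:
#   rep(1:3, 1:3) -> 1 2 2 3 3 3
#   sequence(1:3) -> 1 1 2 1 2 3
grp  <- rep(seq_len(n), seq_len(n))
rows <- sequence(seq_len(n))

# One row per (i-side time point, earlier-or-equal x-side row) pair.
joined <- data.frame(t = dat$t[grp], y = dat$y[rows],
                     x1 = dat$x1[rows], x2 = dat$x2[rows])
nrow(joined)  # n * (n + 1) / 2 = 5050

# Group by t and fit one regression per group, mirroring the next step.
coefs <- t(sapply(split(joined, joined$t),
                  function(d) coef(lm(y ~ x1 + x2, data = d))))
head(coefs, 3)
```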

From here, we just group by t and run the regression:

dat[dat, on = .(t <= t), allow.cartesian = TRUE
    ][ , as.list(coef(lm(y ~ x1 + x2))), keyby = t
       # (only adding head here to limit output)
       ][ , head(.SD)]
#    t (Intercept)          x1          x2
# 1: 1  -0.5172910          NA          NA
# 2: 2  -0.2646369 -1.43105510          NA
# 3: 3   9.1879448  9.96212179 -10.7580819
# 4: 4  -0.3504059 -0.36654096   0.4523271
# 5: 5  -0.1681879 -0.06670494   0.3553107
# 6: 6   1.2108223  1.04082291  -0.6947567
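
Both answers still call lm() once per time point. A different technique, not taken from either answer, is to keep running sums of the normal-equation terms X'X and X'y, which cumsum() computes for every window at once; only a small 3 x 3 solve() remains per time point. A minimal sketch under the question's setup (the seed and all variable names are assumptions):

```r
set.seed(1)  # assumed seed, for reproducibility only
dat <- data.frame(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

X <- cbind(1, dat$x1, dat$x2)  # n x p design matrix: intercept, x1, x2
y <- dat$y
n <- nrow(X)
p <- ncol(X)

# Row-wise products X[s, i] * X[s, j] for all (i, j) in column-major order,
# so row s holds the flattened p x p outer product of row s with itself.
xx <- X[, rep(seq_len(p), times = p)] * X[, rep(seq_len(p), each = p)]

# Cumulative sums down each column: row k of cum_xx is X'X over rows 1..k,
# and row k of cum_xy is X'y over rows 1..k.
cum_xx <- apply(xx, 2, cumsum)
cum_xy <- apply(X * y, 2, cumsum)

# Solve the p x p normal equations for every window with at least p rows;
# smaller windows are rank-deficient, so their rows stay NA.
beta <- matrix(NA_real_, nrow = n, ncol = p,
               dimnames = list(NULL, c("(Intercept)", "x1", "x2")))
for (k in p:n) {
  beta[k, ] <- solve(matrix(cum_xx[k, ], p, p), cum_xy[k, ])
}
tail(beta, 3)
```

The trade-off: this replaces refitting on k rows at every step with one cumulative pass plus n tiny solves, but solving the normal equations directly is less numerically stable than the QR decomposition lm() uses, so with ill-conditioned predictors the lm() results are the safer choice.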