Using the bootstrap for prediction with a mixed model

library(nlme)
library(bootstrap)
y = Loblolly$height
x = Loblolly
theta.fit = function(x, y){
nlme(height ~ SSasymp(age, Asym, R0, lrc),
            data = x,
            fixed = Asym + R0 + lrc ~ 1,
            random = Asym ~ 1,
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))
}
theta.predict = function(fit, x){
  (fit$fitted)[,1]
}
sq.err <- function(y,yhat) { (y-yhat)^2}                 
results <- bootpred(x,y,20,theta.fit,theta.predict,
                    err.meas=sq.err)

I am using the bootpred function to obtain an estimate of the prediction error. However, when I run the last line, I get the following error:

 Error in model.frame.default(formula = ~height + age, data = c(" 4.51",  : 
  'data' must be a data.frame, not a matrix or an array 

I then tried x = data.frame(x), but that did not solve the problem.

1 Answer

The problem arises because the example dataset is a groupedData object:
library(nlme)
library(bootstrap)
y = Loblolly$height
x = Loblolly

class(x)
[1] "nfnGroupedData" "nfGroupedData"  "groupedData"    "data.frame" 

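Inside bootpred, x is converted to a matrix again, so wrapping it in data.frame() beforehand has no effect. Converting back and forth gets messy, especially when you need factor columns for a linear mixed model.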
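The conversion can be reproduced outside bootpred. The sketch below assumes bootpred calls as.matrix() on its first argument, which is what the c(" 4.51", ...) in the error message suggests; the names m and ii are only illustrative.

```r
# Loblolly ships with base R (datasets package); no extra packages needed.
x <- Loblolly

# Seed is a factor, so as.matrix() has to coerce every column to character.
m <- as.matrix(x)
is.character(m)              # the numeric columns are now strings

# A resampled subset of that matrix is still a matrix, not a data.frame,
# which is what the model-fitting call complains about.
ii <- sample(seq_len(nrow(m)), replace = TRUE)
is.data.frame(m[ii, ])
```

This is why the fitting and prediction functions below take a data.frame directly and never go through a matrix.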

You can write theta.fit and theta.predict so that they accept a data.frame:

theta.fit = function(df){
nlme(height ~ SSasymp(age, Asym, R0, lrc),
            data = df,
            fixed = Asym + R0 + lrc ~ 1,
            random = Asym ~ 1,
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))
}
theta.predict = function(fit, df){
  predict(fit,df)
}

sq.err <- function(y,yhat) { (y-yhat)^2}

Now modify the bootpred function to work with df. I guess you could either supply y again, or specify which column of the data.frame to use:

bootpred_df = function (df,y,nboot, theta.fit, theta.predict, err.meas, ...)
{
    call <- match.call()
    n <- length(y)
    saveii <- NULL
    # fit on the full data to get the apparent error
    fit0 <- theta.fit(df, ...)
    yhat0 <- theta.predict(fit0, df)
    app.err <- mean(err.meas(y, yhat0))
    err1 <- matrix(0, nrow = nboot, ncol = n)
    err2 <- rep(0, nboot)
    for (b in 1:nboot) {
        # resample rows of the data.frame; no conversion to a matrix
        ii <- sample(1:n, replace = TRUE)
        saveii <- cbind(saveii, ii)
        fit <- theta.fit(df[ii, ], ...)
        yhat1 <- theta.predict(fit, df[ii, ])
        yhat2 <- theta.predict(fit, df)
        err1[b, ] <- err.meas(y, yhat2)
        err2[b] <- mean(err.meas(y[ii], yhat1))
    }
    # optimism: error on the original data minus error on the bootstrap sample
    optim <- mean(apply(err1, 1, mean,na.rm=TRUE) - err2)
    # junk() counts how often observation i appears in one bootstrap sample
    junk <- function(x, i) {
        sum(x == i)
    }
    # e0: for each observation, average its error over the bootstrap
    # samples that do not contain it
    e0 <- 0
    for (i in 1:n) {
        o <- apply(saveii, 2, junk, i)
        if (sum(o == 0) == 0) 
            cat("increase nboot for computation of the .632 estimator", 
                fill = TRUE)
        e0 <- e0 + (1/n) * sum(err1[o == 0, i])/sum(o == 0)
    }
    err.632 <- 0.368 * app.err + 0.632 * e0
    return(list(app.err, optim, err.632, call = call))
}

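We can now run it, but because of the nature of this data, the groups (Seed) can end up unevenly represented in a bootstrap sample, which makes some parameters hard to estimate. This is probably best addressed by refining the code. In any case, if you are lucky, it works like this: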
bootpred_df(Loblolly,Loblolly$height,20,theta.fit,theta.predict,err.meas=sq.err)
    [[1]]
    [1] 0.4337236
    
    [[2]]
    [1] 0.1777644
    
    [[3]]
    [1] 0.6532417
    
    $call
    bootpred_df(df = Loblolly, y = Loblolly$height, nboot = 20, theta.fit = theta.fit, 
        theta.predict = theta.predict, err.meas = sq.err)
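To see the uneven group representation mentioned above, it may help to look at how many rows each Seed contributes to a single row-level resample. This is only an illustrative sketch: the seed value and the names ii and counts are arbitrary choices, not part of the original answer.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
n  <- nrow(Loblolly)
ii <- sample(seq_len(n), replace = TRUE)

# Rows per Seed in the resample; the original data has 6 rows for every Seed.
counts <- table(Loblolly$Seed[ii])
counts
range(counts)
```

When a Seed ends up with very few rows, the nlme fit for that resample may fail to converge. Resampling whole Seeds instead of individual rows would be one possible refinement, though the answer above does not go that far.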
