如何在 ggplot2 R 中为零膨胀负二项式函数添加置信区间？

Question

如何在 ggplot2 R 中为零膨胀负二项式函数添加置信区间？

3

我有一个数据集，它被建模为带有混合效应的零膨胀负二项分布。我想从模型预测中获得置信区间，并绘制模型的均值和置信区间。我已经尝试绘制了模型均值，但不知道如何将模型的置信区间绘制到 ggplot2 上。我想绘制数据的预测均值及其置信区间。我的基本图表尝试代码如下：

library(pscl)
library(lmtest)

df <- data.frame(
      fertilizer = c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P"), 
      level = c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low"), 
      repro = c(0,90,2,4,0,80,1,90,2,33,56,0,99,100,66,80,1,0,2,33,0,0,1,2,90,5,2,2,5,8,0,1,90,2,4,66,0,0,0,0,1,2,90,5,2,5,8,55)
    )    

    model <- formula(repro ∼ fertilizer + level | fertilizer * level)
    modelzinb <- zeroinfl(model, dist = "negbin", link = "logit",data =df)
    summary(modelzinb)

    df$predict <- predict(modelzinb)

    ggplot(df, aes(x=fertilizer, y=predict, color = fertilizer)) + theme_bw() +  stat_summary(aes(color = fertilizer),fun.y = mean, geom = "point", size = 4, position = position_dodge(0.1)) +
      scale_x_discrete("Fertlizer") +
      facet_wrap(.~level)

- Biotechgeek

1

zeroinfl 是哪个包里的？请始终包括您使用过的任何非基础 R 包。 - Maurits Evers

抱歉，我现在已经包含它了。 - Rspacer

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Maurits Evers · Accepted Answer

我不确定您想绘制什么。但是我可以向您展示如何计算零膨胀负二项模型的预测置信区间。

请注意，我无法评论拟合的质量或拟合这种模型是否有意义。要回答这些问题需要具备领域特定的知识，而我没有。

Let's start by fitting the model to your data.

library(pscl)
model <- formula(repro ~ fertilizer + level | fertilizer * level)
modelzinb <- zeroinfl(model, dist = "negbin", link = "logit", data = df)

We can use predict to get model estimates for the response.
```
resp <- predict(modelzinb)
```
Note that in your zero-inflated NB model, predict.zeroinfl (by default) returns the estimated mean response on the scale of the observed counts repro.

Concerning confidence intervals (CIs), the "issue" here is that precict.zeroinfl does not have an interval argument that allows to calculate CIs directly. On a side note: It seems that pscl::zeroinfl used to include that functionality, see the documentation for version 0.54. Perhaps it would be worth contacting the package maintainer(s) about this.

The bottom line is, we have to find a different way to calculate prediction CIs. To do so, we use bootstrapping. The R library boot provides all the necessary functions & tools to do that.
To start, we can use the function boot::boot to generate bootstrap replicates of the predicted responses.boot::boot needs a function that generates the quantity of interest (here the predicted response) based on data. So we first define such a function stat. In this particular case stat must take two arguments: the first argument is the original data, the second argument is a vector of row indices (that define the random bootstrap sample of the data). The function will then use bootstrapped data to fit the model, and then use the model to predict the mean response based on the full data.
```
stat <- function(df, inds) {
    model <- formula(repro ~ fertilizer + level | fertilizer * level)
    predict(
        zeroinfl(model, dist = "negbin", link = "logit", data = df[inds, ]),
        newdata = df)
}
```
In order to ensure reproducibility of results, we set a a fixed random seed and generate 100 bootstrap samples for the estimated mean response
```
set.seed(2018)
library(boot)
res <- boot(df, stat, R = 100)
```
The output object res is a list that contains the estimated mean response for the full data in element t0 (verify with all.equal(res$t0, predict(modelzinb))) and the estimated mean response for the bootstrap samples in element t (which is a matrix of dimension R x nrow(df), where R are the number of bootstrap samples).
All that's left to do is to calculate a confidence interval from the estimated mean responses based on the models fitted to the bootstrap samples. To do so we can use the handy function boot::boot.ci. The function allows to calculate different types of CIs (basic, normal, BCa, etc.). Here we use "basic" for demonstration purposes. I do not claim that this is the best choice.

boot::boot.ci takes an index argument which corresponds to the entry of the estimated mean response vector. The actual intervals are stored in the last 2 columns (column 4 and 5) of the matrix stored as a list element of the boot.ci return object.
```
CI <- setNames(as.data.frame(t(sapply(1:nrow(df), function(row)
    boot.ci(res, conf = 0.95, type = "basic", index = row)$basic[, 4:5]))),
    c("lower", "upper"))
```

Now we're done, and we can column-bind the CIs to the original data including predicted mean response values.

df_all <- cbind.data.frame(df, response = predict(modelzinb), CI)
#   fertilizer level repro response      lower    upper
#1           N   low     0 29.72057   5.876731 46.48165
#2           N   low    90 29.72057   5.876731 46.48165
#3           N  high     2 21.99345 -15.228956 38.86421
#4           N  high     4 21.99345 -15.228956 38.86421
#5           N   low     0 29.72057   5.876731 46.48165
#6           N   low    80 29.72057   5.876731 46.48165
#7           N  high     1 21.99345 -15.228956 38.86421
#8           N  high    90 21.99345 -15.228956 38.86421
#9           N   low     2 29.72057   5.876731 46.48165
#10          N   low    33 29.72057   5.876731 46.48165
#11          N  high    56 21.99345 -15.228956 38.86421
#12          N  high     0 21.99345 -15.228956 38.86421
#13          P   low    99 24.22626  -9.225827 46.17656
#14          P   low   100 24.22626  -9.225827 46.17656
#15          P  high    66 22.71826   2.595246 39.88333
#16          P  high    80 22.71826   2.595246 39.88333
#17          P   low     1 24.22626  -9.225827 46.17656
#18          P   low     0 24.22626  -9.225827 46.17656
#19          P  high     2 22.71826   2.595246 39.88333
#20          P  high    33 22.71826   2.595246 39.88333
#21          P   low     0 24.22626  -9.225827 46.17656
#22          P   low     0 24.22626  -9.225827 46.17656
#23          P  high     1 22.71826   2.595246 39.88333
#24          P  high     2 22.71826   2.595246 39.88333
#25          N   low    90 29.72057   5.876731 46.48165
#26          N   low     5 29.72057   5.876731 46.48165
#27          N  high     2 21.99345 -15.228956 38.86421
#28          N  high     2 21.99345 -15.228956 38.86421
#29          N   low     5 29.72057   5.876731 46.48165
#30          N   low     8 29.72057   5.876731 46.48165
#31          N  high     0 21.99345 -15.228956 38.86421
#32          N  high     1 21.99345 -15.228956 38.86421
#33          N   low    90 29.72057   5.876731 46.48165
#34          N   low     2 29.72057   5.876731 46.48165
#35          N  high     4 21.99345 -15.228956 38.86421
#36          N  high    66 21.99345 -15.228956 38.86421
#37          P   low     0 24.22626  -9.225827 46.17656
#38          P   low     0 24.22626  -9.225827 46.17656
#39          P  high     0 22.71826   2.595246 39.88333
#40          P  high     0 22.71826   2.595246 39.88333
#41          P   low     1 24.22626  -9.225827 46.17656
#42          P   low     2 24.22626  -9.225827 46.17656
#43          P  high    90 22.71826   2.595246 39.88333
#44          P  high     5 22.71826   2.595246 39.88333
#45          P   low     2 24.22626  -9.225827 46.17656
#46          P   low     5 24.22626  -9.225827 46.17656
#47          P  high     8 22.71826   2.595246 39.88333
#48          P   low    55 24.22626  -9.225827 46.17656
#...

示例数据

df <- data.frame(
    fertilizer = c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P"),
    level = c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low"),
    repro = c(0,90,2,4,0,80,1,90,2,33,56,0,99,100,66,80,1,0,2,33,0,0,1,2,90,5,2,2,5,8,0,1,90,2,4,66,0,0,0,0,1,2,90,5,2,5,8,55))