使用bayesplot绘制多个模型的后验参数估计

Question

使用bayesplot绘制多个模型的后验参数估计

8

我正在使用优秀的绘图库bayesplot来可视化由rstanarm估计的模型的后验概率区间。我想通过将系数的后验区间绘制到同一图中，对不同模型的抽样结果进行图形化比较。

例如，假设我有三个参数beta1，beta2，beta3从两个不同模型的后验分布中各有1000次抽样：

# load the plotting library
library(bayesplot)
#> This is bayesplot version 1.6.0
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting
library(ggplot2)

# generate fake posterior draws from model1
fdata <- matrix(rnorm(1000 * 3), ncol = 3)
colnames(fdata) <- c('beta1', 'beta2', 'beta3')

# fake posterior draws from model 2
fdata2 <- matrix(rnorm(1000 * 3, 1, 2), ncol = 3)
colnames(fdata2) <- c('beta1', 'beta2', 'beta3')

Bayesplot可以为单个模型绘制出精美的可视化图表，它是基于ggplot2的，因此我可以按照自己的意愿进行定制：

# a nice plot of 1
color_scheme_set("orange")
mcmc_intervals(fdata) + theme_minimal() + ggtitle("Model 1")

# a nice plot of 2
color_scheme_set("blue")
mcmc_intervals(fdata2) + ggtitle("Model 2")

我希望实现的目标是将这两个模型绘制在同一张图上，对于每个系数，我有两个区间，并通过将颜色映射到模型来区分哪个区间是哪个。然而，我无法弄清楚如何做到这一点。以下是一些不起作用的方法：

# doesnt work
mcmc_intervals(fdata) + mcmc_intervals(fdata2)
#> Error: Don't know how to add mcmc_intervals(fdata2) to a plot

# appears to pool
mcmc_intervals(list(fdata, fdata2))

你有什么想法能帮我完成这个任务吗？或者如何手动给出先验抽样矩阵，也可以。

^{该文档由reprex包 (v0.2.1)创建于2018-10-18}

- gfgm

3个回答

1

我花了比想象中更多的时间来编写这篇文章，因此不妨在这里发布一下。以下是一个函数，它结合了上面的建议，并且（目前）适用于rstanarm和brms模型对象。请注意保留HTML标签，但不要添加解释。

compare_posteriors <- function(..., dodge_width = 0.5) {
  dots <- rlang::dots_list(..., .named = TRUE)
  draws <- lapply(dots, function(x) {
    if (class(x)[1] == "stanreg") {
        posterior::subset_draws(posterior::as_draws(x$stanfit),
            variable = names(fixef(x))
        )
    } else if (class(x)[1] == "brmsfit") {
        brm_draws <- posterior::subset_draws(posterior::as_draws(x$fit),
            variable = paste0("b_", rownames(fixef(x)))
        )
        posterior::variables(brm_draws) <- stringr::str_split(posterior::variables(brm_draws), "_", simplify = T)[, 2]
        posterior::rename_variables(brm_draws, `(Intercept)` = Intercept)
    } else {
        stop(paste0(class(x)[1], " objects not supported."))
    }
  })
  intervals <- lapply(draws, bayesplot::mcmc_intervals_data)
  combined <- dplyr::bind_rows(intervals, .id = "model")
  ggplot(combined, aes(x = m, y = parameter, color = model, group = model)) +
    geom_linerange(aes(xmin = l, xmax = h), size = 2, position = position_dodge(dodge_width)) +
    geom_linerange(aes(xmin = ll, xmax = hh), position = position_dodge(dodge_width)) +
    geom_point(color = "black", position = position_dodge(dodge_width)) +
    geom_vline(xintercept = 0, linetype = "dashed")
}

使用方法：

compare_posteriors(mod1, mod2, mod3)

- Ryan

1

我在 GitHub 的 bayesplot 页面上提出了这个问题，并得到了回复 (Issue #232)。

- Emmanuel Teitelbaum

1

你好，欢迎来到SO。很高兴看到您对stan/dev的跟进。您可以编辑并将建议的解决方案放在这里，同时保留问题归属。 - Chris

这太好了。多年来，许多人似乎都看过这个问题。你为什么不把解决方案粘贴进来，我会接受答案的呢？ - gfgm

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Dylan_Gomes · Accepted Answer

为了方便起见，我在@Manny T提供的链接上进行了代码扩展(https://github.com/stan-dev/bayesplot/issues/232)

# simulate having posteriors for two different models each with parameters beta[1],..., beta[4]
posterior_1 <- matrix(rnorm(4000), 1000, 4)
posterior_2 <- matrix(rnorm(4000), 1000, 4)
colnames(posterior_1) <- colnames(posterior_2) <- paste0("beta[", 1:4, "]")

# use bayesplot::mcmc_intervals_data() function to get intervals data in format easy to pass to ggplot
library(bayesplot)
combined <- rbind(mcmc_intervals_data(posterior_1), mcmc_intervals_data(posterior_2))
combined$model <- rep(c("Model 1", "Model 2"), each = ncol(posterior_1))

# make the plot using ggplot 
library(ggplot2)
theme_set(bayesplot::theme_default())
pos <- position_nudge(y = ifelse(combined$model == "Model 2", 0, 0.1))
ggplot(combined, aes(x = m, y = parameter, color = model)) + 
  geom_linerange(aes(xmin = l, xmax = h), position = pos, size=2)+
  geom_linerange(aes(xmin = ll, xmax = hh), position = pos)+
  geom_point(position = pos, color="black")

如果您和我一样，希望内部区间为80％和90％（而不是50％），并可能希望坐标轴翻转，并在0处添加虚线（模型估计没有变化）。您可以像这样操作：

# use bayesplot::mcmc_intervals_data() function to get intervals data in format easy to pass to ggplot
library(bayesplot)
combined <- rbind(mcmc_intervals_data(posterior_1,prob=0.8,prob_outer = 0.9), mcmc_intervals_data(posterior_2,prob=0.8,prob_outer = 0.9))
combined$model <- rep(c("Model 1", "Model 2"), each = ncol(posterior_1))

# make the plot using ggplot 
library(ggplot2)
theme_set(bayesplot::theme_default())
pos <- position_nudge(y = ifelse(combined$model == "Model 2", 0, 0.1))
ggplot(combined, aes(x = m, y = parameter, color = model)) + 
  geom_linerange(aes(xmin = l, xmax = h), position = pos, size=2)+
  geom_linerange(aes(xmin = ll, xmax = hh), position = pos)+
  geom_point(position = pos, color="black")+
  coord_flip()+
  geom_vline(xintercept=0,linetype="dashed")

关于最后一个问题，有几个需要注意的地方。我添加了prob_outer = 0.9，即使这是默认值，只是为了展示您如何更改外部可信区间。虚线是使用geom_vline和xintercept = 创建的，而不是使用geom_hline和yintercept = ，因为使用了coord_flip（一切都被反转了）。因此，如果您不翻转轴，您需要做相反的操作。