How can I create a custom ggplot2 smoothing stat (not just a custom lm or glm model)?


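I have a function that calculates a moving-window median and 90% confidence interval. For each x = seq(xmin, xmax, by = wStep), I return the median and the 5% and 95% quantiles of all y whose x value is less than wSize/2 away. I want to display the result as a line and ribbon with ggplot2 by creating a custom smoothing function, stat_movingwindow(). I can already produce the result I want with geom_smooth(data = ..., stat = "identity"):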

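library(ggplot2)
library(tibble)

moveWin <- function(d, wSize = 0.5, wStep = 0.1, 
  f = function(x) quantile(x, probs = c(0.05, 0.50, 0.95), na.rm = TRUE)
){
  # window centres, spaced wStep apart over the range of x
  x <- seq(min(d$x), max(d$x), by = wStep)
  y <- matrix(NA_real_, ncol = 3, nrow = length(x))
  for(i in seq_along(x)){
    # summarise all y whose x lies less than wSize/2 from the window centre
    y[i, ] <- f(d$y[abs(d$x - x[i]) < wSize/2])
  }
  colnames(y) <- c("ymin", "y", "ymax")
  y <- as_tibble(y)  # as.tibble() is deprecated in favour of as_tibble()
  y$x <- x
  y
}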

set.seed(123)
d <- tibble(
 x= sqrt(seq(0,1,length.out = 50))*10,
 y= rnorm(50)
)

ggplot(data = d) + aes(x = x, y = y) +
  geom_smooth(
    data    = function(d) moveWin(d, wSize = 1, wStep = 0.1), 
    mapping = aes(ymin = ymin, ymax= ymax),
    stat    = "identity") + 
  geom_point() + scale_x_continuous(breaks = 1:10)

[Figure: ggplot with moving window smoothing]

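After following the "Extending ggplot2" vignette, this is the code I have come up with so far. The problem is that it does not display the ribbon. Perhaps I need a way to declare that this custom stat provides the aesthetics ymin and ymax. How can I make the code below produce output similar to the plot above?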

StatMovingWindow <- ggproto("StatMovingWindow", Stat,
  compute_group = function(data, scales, wSize, wStep, fun) {
    moveWin(data, wSize = wSize, wStep = wStep, f = fun)
  },

  required_aes = c("x", "y")
)
stat_movingwindow <- function(mapping = NULL, data = NULL, 
  fun = function(d) quantile(d, probs = c(0.05, 0.50, 0.95), na.rm = TRUE),
  wStep = 0.1, wSize = 1,
  geom = "smooth", position = "identity", show.legend = NA, inherit.aes = TRUE,
  ...
){
  layer(
    stat = StatMovingWindow, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(wStep = wStep, wSize = wSize, fun = fun, ...)
  )
}

ggplot(data = d) + aes(x = x, y = y) +
  stat_movingwindow(wStep = 0.1, wSize = 1) + 
  geom_point() + scale_x_continuous(breaks = 1:10)

[Figure: custom smoothing stat does not show a ribbon]


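Try adding the argument se = TRUE inside stat_movingwindow()? - Z.Lin
@Z.Lin That seems to work O.o ... but I don't understand why. Why does GeomSmooth find this parameter? For example, if I add a parameter se = FALSE to the definition of moveWin, it is not set to TRUE when I call stat_movingwindow(..., se = TRUE). Why does it get the value of wStep, then? Both parameters are listed in layer(... params = ...)? - akraf
See the lengthy (long-winded) explanation below. I don't think I can explain this properly in the comments... - Z.Lin
1 Answer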


In your stat_movingwindow function, the line that specifies the corresponding geom is geom = "smooth":


stat_movingwindow <- function(mapping = NULL, data = NULL, 
  fun = function(d) quantile(d, probs = c(0.05, 0.50, 0.95), na.rm = TRUE),
  wStep = 0.1, wSize = 1,
  geom = "smooth", # <- look here
  position = "identity", show.legend = NA, inherit.aes = TRUE,
  ...
){
  layer(
    stat = StatMovingWindow, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(wStep = wStep, wSize = wSize, fun = fun, ...)
  )
}

Checking the code for geom_smooth, we can see that it includes the parameter se = TRUE and uses GeomSmooth as its geom:
> geom_smooth
function (mapping = NULL, data = NULL, stat = "smooth", position = "identity", 
    ..., method = "auto", formula = y ~ x, se = TRUE, # <- look here
    na.rm = FALSE, 
    show.legend = NA, inherit.aes = TRUE) 
{
    params <- list(na.rm = na.rm, se = se, ...)
    if (identical(stat, "smooth")) {
        params$method <- method
        params$formula <- formula
    }
    layer(data = data, mapping = mapping, stat = stat, geom = GeomSmooth, # <- and here
        position = position, show.legend = show.legend, inherit.aes = inherit.aes, 
        params = params)
}

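Digging into GeomSmooth, we see that its draw_group function (which is responsible for drawing the smoothed line) has se = FALSE as its default.
From the code, if se == FALSE, then has_ribbon will also be FALSE, even though your data contains ymax and ymin thanks to the StatMovingWindow$compute_group function. This in turn means that only the result of GeomLine$draw_panel(path, panel_params, coord) is returned, without GeomRibbon$draw_group(ribbon, panel_params, coord).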
> GeomSmooth$draw_group
<ggproto method>
  <Wrapper function>
    function (...) 
f(...)

  <Inner function (f)>
    function (data, panel_params, coord, se = FALSE) # <- look here
{
    ribbon <- transform(data, colour = NA)
    path <- transform(data, alpha = NA)
    has_ribbon <- se && !is.null(data$ymax) && !is.null(data$ymin) # <- and here
    gList(if (has_ribbon) GeomRibbon$draw_group(ribbon, panel_params, coord), 
          GeomLine$draw_panel(path, panel_params, coord))
}
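This also bears on the question in the comments about why se reaches the geom while wStep reaches the stat. As far as I understand it, layer() sorts the entries of params by name, matching them against the arguments that the geom's draw function and the stat's compute function accept. Adding se to moveWin has no effect because moveWin is an ordinary helper function, and only the argument names of compute_group are consulted. The sketch below inspects those names; StatDemo is a hypothetical stand-in for StatMovingWindow that copies only its argument list.

```r
library(ggplot2)

# Hypothetical stand-in for the question's stat. Only the argument names of
# compute_group matter here, so the body just returns the data unchanged.
StatDemo <- ggproto("StatDemo", Stat,
  compute_group = function(data, scales, wSize, wStep, fun) data,
  required_aes = c("x", "y")
)

# Names of the parameters each ggproto object accepts
geom_takes_se    <- "se" %in% GeomSmooth$parameters(TRUE)
stat_takes_se    <- "se" %in% StatDemo$parameters(TRUE)
stat_takes_wstep <- "wStep" %in% StatDemo$parameters(TRUE)

print(c(geom_takes_se = geom_takes_se,
        stat_takes_se = stat_takes_se,
        stat_takes_wstep = stat_takes_wstep))
```

If compute_group itself had an se argument, the stat would receive the value as well, but the ribbon is still drawn by the geom.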

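In short, the default se = TRUE in geom_smooth is what overrides the default behaviour of GeomSmooth$draw_group (the same goes for stat_smooth), and we should do the same in stat_movingwindow if we want to achieve the same result.
If you expect to want the ribbon most of the time, you can include se = TRUE as a parameter in the definition of stat_movingwindow. If it is only needed occasionally, you can pass it in your code whenever necessary.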
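A minimal sketch of the first option, assuming the moveWin and StatMovingWindow definitions from the question (repeated here with base data frames so the example runs on its own). The only addition is the se argument, which is passed through params so that it reaches GeomSmooth$draw_group. The source listing above reflects the ggplot2 version current when this was written, and other releases may choose a different default for se, so passing it explicitly is the conservative choice.

```r
library(ggplot2)

# Moving-window quantiles as in the question, returned as a plain data frame
moveWin <- function(d, wSize = 0.5, wStep = 0.1,
  f = function(x) quantile(x, probs = c(0.05, 0.50, 0.95), na.rm = TRUE)
){
  x <- seq(min(d$x), max(d$x), by = wStep)
  q <- matrix(NA_real_, ncol = 3, nrow = length(x))
  for (i in seq_along(x)) {
    q[i, ] <- f(d$y[abs(d$x - x[i]) < wSize/2])
  }
  data.frame(x = x, ymin = q[, 1], y = q[, 2], ymax = q[, 3])
}

StatMovingWindow <- ggproto("StatMovingWindow", Stat,
  compute_group = function(data, scales, wSize, wStep, fun) {
    moveWin(data, wSize = wSize, wStep = wStep, f = fun)
  },
  required_aes = c("x", "y")
)

stat_movingwindow <- function(mapping = NULL, data = NULL,
  fun = function(d) quantile(d, probs = c(0.05, 0.50, 0.95), na.rm = TRUE),
  wStep = 0.1, wSize = 1,
  se = TRUE,  # forwarded to the geom; set to FALSE to hide the ribbon
  geom = "smooth", position = "identity", show.legend = NA, inherit.aes = TRUE,
  ...
){
  layer(
    stat = StatMovingWindow, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(wStep = wStep, wSize = wSize, fun = fun, se = se, ...)
  )
}

set.seed(123)
d <- data.frame(x = sqrt(seq(0, 1, length.out = 50)) * 10, y = rnorm(50))

p <- ggplot(d, aes(x = x, y = y)) +
  stat_movingwindow(wStep = 0.1, wSize = 1) +
  geom_point() + scale_x_continuous(breaks = 1:10)
p
```

One trade-off of hard-wiring se into params: if the layer is used with a geom that has no se parameter, layer() will most likely warn about an unknown parameter.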

Content provided by Stack Overflow.