如何识别geom_smooth()所使用的函数

Question

如何识别geom_smooth()所使用的函数

11

我想展示由geom_smooth()创建的图形，但对于我来说，能够描述图形是如何创建的非常重要。

我可以从文档中看到当n >= 1000时，gam被用作平滑函数，但我无法看到使用了多少个节点或者生成平滑的函数是什么。

示例：

library(ggplot2)

set.seed(12345)
n <- 3000
x1 <- seq(0, 4*pi,, n)
x2 <- runif(n)
x3 <- rnorm(n)
lp <- 2*sin(2* x1)+3*x2 + 3*x3
p <- 1/(1+exp(-lp))
y <- ifelse(p > 0.5, 1, 0)

df <- data.frame(x1, x2, x3, y)

# default plot
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth() 

# specify method='gam'
# linear
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth(method = 'gam') 

# specify gam and splines
# Shows non-linearity, but different from default
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth(method = 'gam',
              method.args = list(family = "binomial"),
              formula = y ~ splines::ns(x, 7))

如果我想使用默认参数，有没有一种方法可以确定用于创建平滑曲线的函数，以便在分析的方法部分中准确描述它？

- A Toll

1

可能的答案在这里：https://dev59.com/92kw5IYBdhLWcg3wlbh3 - Nate

2

“gam”的默认公式（根据文档）应该是formula = y ~ s(x, bs = "cs")，因此k被设置为从mgcv::s中获取的默认值。然而，在ggplot2_2.2.0中，“gam”默认使用formula = y ~ x，因此拟合一条直线而不是任何类型的样条曲线。我对此表示怀疑；结果图现在与使用“lm”得到的相同。 - aosmith

1

@Axeman 你可以继续了。谁知道呢，也许一直都是这样的，如果用户选择使用gam，那么他们应该指定公式。 - aosmith

1

@aosmith，我不知道，我降级到v1.0.0，但情况仍然如此。 - Axeman

值得一提的是，Hadley在这里解释了这个问题（https://github.com/tidyverse/ggplot2/issues/1931）：“如果明确说明了`method =“gam”，则默认公式不会从y〜x更改为y〜s（x，bs =“cs”）`，这种行为是正确的；我们报告公式但不自动更改它。这取决于你。” - Z.Lin

显示剩余4条评论

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Z.Lin · Accepted Answer

我写了一个功能来逆向工程 StatSmooth 的 setup_params 函数中使用的步骤，以获取用于绘制的实际方法/公式参数。

该函数期望一个ggplot对象作为其输入，以及一个额外的可选参数，指定对应于 geom_smooth 的层（如果未指定，则默认为1）。它返回一个文本字符串，格式为"Method: [method used], Formula: [formula used]"，并将所有参数打印到控制台上。

设想的用例是双重的：

将文本字符串原样添加到图表中作为图表标题/副标题/说明，以便在分析过程中快速参考；
从控制台输出中读取信息，并在图表中进行注释，手动格式化得体（例如解析的plotmath表达式），用于报告/演示。

功能：

get.params <- function(plot, layer = 1){

  # return empty string if the specified geom layer doesn't use stat = "smooth"
  if(!"StatSmooth" %in% class(plot$layers[[layer]]$stat)){
    message("No smoothing function was used in this geom layer.")
    return("")
  }

  # recreate data used by this layer, in the format expected by StatSmooth
  # (this code chunk takes heavy reference from ggplot2:::ggplot_build.ggplot)
  layer.data <- plot$layers[[layer]]$layer_data(plot$data)
  layout <- ggplot2:::create_layout(plot$facet, plot$coordinates)
  data <- layout$setup(list(layer.data), plot$data, plot$plot_env)
  data[[1]] <- plot$layers[[layer]]$compute_aesthetics(data[[1]], plot)
  scales <- plot$scales
  data[[1]] <- ggplot2:::scales_transform_df(scales = scales, df = data[[1]])
  layout$train_position(data, scales$get_scales("x"), scales$get_scales("y"))
  data <- layout$map_position(data)[[1]]

  # set up stat params (e.g. replace "auto" with actual method / formula)
  stat.params <- suppressMessages(
    plot$layers[[layer]]$stat$setup_params(data = data, 
                                           params = plot$layers[[layer]]$stat_params)
    )

  # reverse the last step in setup_params; we don't need the actual function
  # for mgcv::gam, just the name
  if(identical(stat.params$method, mgcv::gam)) stat.params$method <- "gam"

  print(stat.params)

  return(paste0("Method: ", stat.params$method, ", Formula: ", deparse(stat.params$formula)))
}

演示:

p <- ggplot(df, aes(x = x1, y = y)) # df is the sample dataset in the question

# default plot for 1000+ observations
# (method defaults to gam & formula to 'y ~ s(x, bs = "cs")')
p1 <- p + geom_smooth()
p1 + ggtitle(get.params(p1))

# specify method = 'gam'
# (formula defaults to `y ~ x`)
p2 <- p + geom_smooth(method='gam')
p2 + ggtitle(get.params(p2))

# specify method = 'gam' and splines for formula
p3 <- p + geom_smooth(method='gam',
              method.args = list(family = "binomial"),
              formula = y ~ splines::ns(x, 7))
p3 + ggtitle(get.params(p3))

# specify method = 'glm'
# (formula defaults to `y ~ x`)
p4 <- p + geom_smooth(method='glm')
p4 + ggtitle(get.params(p4))

# default plot for fewer observations
# (method defaults to loess & formula to `y ~ x`)
# observe that function is able to distinguish between plot data 
# & data actually used by the layer
p5 <- p + geom_smooth(data = . %>% slice(1:500))
p5 + ggtitle(get.params(p5))