Can we neatly align the regression equation, R2 and p-value?

Question

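How can I neatly add the regression equation, R2 and p-value (for the equation) to a ggplot plot? Ideally it should work with grouping and facets. In this first plot I use ggpubr to add the regression equation, R2 and p-value by group, but they don't line up. Am I missing something? Could they be combined into one string?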

library(ggplot2)
library(ggpubr)

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_regline_equation()+
  stat_cor(aes(label = paste(..rr.label.., ..p.label.., sep = "*`,`~")),
           label.x.npc = "centre")

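Here is an option using ggpmisc, which gives some odd placement.
EDIT: The odd placement was caused by geom = 'text'. I've commented it out to get better placement, and added `label.x="right"` to stop the overlap. We still have the same misalignment as with ggpubr, which @dc37 identified as a superscript issue.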

#https://dev59.com/xFoU5IYBdhLWcg3wYWEM#37708832
library(ggpmisc)

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_poly_eq(formula = "y~x", 
             aes(label = paste(..eq.label.., ..rr.label.., sep = "*`,`~")), 
             parse = TRUE)+
  stat_fit_glance(method = 'lm',
                  method.args = list(formula = "y~x"),
                  #geom = 'text',
                  label.x = "right", # added to stop overlapping
                  aes(label = paste("P-value = ", signif(..p.value.., digits = 4), sep = "")))

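I've found a nice way to put the relevant statistics together, but it requires creating the regression outside ggplot and a pile of string manipulation. Is this the easiest way? Also, as currently coded it doesn't handle grouping and can't cope with facets.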

#https://dev59.com/vmsz5IYBdhLWcg3w9soQ#51974753
#Solution as one string, equation, R2 and p-value
lm_eqn <- function(df, y, x){
  formula <- as.formula(sprintf('%s ~ %s', y, x))
  m <- lm(formula, data = df)
  # formatting the values into a summary string to print out
  # ~ gives some space, but the equals sign and comma need to be quoted
  eq <- substitute(italic(target) == a + b %.% italic(input)*","~~italic(r)^2~"="~r2*","~~p~"="~italic(pvalue), 
                   list(target = y,
                        input = x,
                        a = format(as.vector(coef(m)[1]), digits = 2), 
                        b = format(as.vector(coef(m)[2]), digits = 2), 
                        r2 = format(summary(m)$r.squared, digits = 3),
                        # getting the pvalue is painful
                        pvalue = format(summary(m)$coefficients[2,'Pr(>|t|)'], digits=1)
                   )
  )
  as.character(as.expression(eq))
}

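ggplot(mtcars, aes(x = wt, y = mpg, group = cyl)) +
  geom_point() +
  # y = 'mpg', x = 'wt' to match the plotted aesthetics; annotate() draws the label once
  annotate("text", x = 3, y = 30, label = lm_eqn(mtcars, 'mpg', 'wt'), colour = 'red', parse = TRUE) +
  geom_smooth(method = 'lm')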
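On the "getting the pvalue is painful" point: for a model with a single predictor, the slope's t-test p-value that lm_eqn() pulls out of summary(m)$coefficients is the same number as the overall F-test p-value, because t^2 = F when there is one numerator degree of freedom. A minimal base-R check on the same mpg ~ wt model (to my understanding, this F-test p-value is also what glance() reports as p.value for an lm fit, and hence what stat_fit_glance() labels):

```r
# Same simple regression as in the plot: mpg explained by wt
m <- lm(mpg ~ wt, data = mtcars)
s <- summary(m)

# p-value of the slope's t-test (what lm_eqn() extracts)
p_slope <- s$coefficients["wt", "Pr(>|t|)"]

# p-value of the overall F-test, from the F statistic kept in summary()
fstat   <- s$fstatistic   # named vector: value, numdf, dendf
p_model <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]], lower.tail = FALSE)

# With one predictor t^2 equals F, so the two p-values coincide
all.equal(p_slope, p_model)
```

With more than one predictor the two numbers differ, so the coefficient-table lookup is still the one to use for a specific slope.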

- Mark Neal

Try using sep = "~~~" in stat_poly_eq(...). - monarque13

Changing `sep` only changes the characters between the equation and the R2. Were you expecting a different result? - Mark Neal

3 Answers

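One possible solution with ggpubr is to place the equation and the R2 value at the top of the plot by passing Inf to label.y, and Inf or -Inf to label.x (depending on whether you want them on the right or the left side of the plot).

The two texts won't line up because of the superscript 2 in R2, so you need some fine-tuning with vjust and hjust to align them.

This then works even for faceted plots with different scales.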

library(ggplot2)
library(ggpubr)

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_regline_equation(label.x = -Inf, label.y = Inf, vjust = 1.5, hjust = -0.1, size = 3)+
  stat_cor(aes(label = paste(..rr.label.., ..p.label.., sep = "*`,`~")),
           label.y= Inf, label.x = Inf, vjust = 1, hjust = 1.1, size = 3)+
  facet_wrap(~cyl, scales = "free")

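Does this answer your question?

EDIT: alternative approach that adds the equation manually

Based on your similar question (Label ggplot groups using equation with ggpmisc), you can add the equation yourself by passing the text to geom_text: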

library(dplyr)

df_mtcars <- mtcars %>% mutate(factor_cyl = as.factor(cyl))

df_label <- df_mtcars %>% group_by(factor_cyl) %>%
  summarise(Inter = lm(mpg~wt)$coefficients[1],
            Coeff = lm(mpg~wt)$coefficients[2],
            pval = summary(lm(mpg~wt))$coefficients[2,4],
            r2 = summary(lm(mpg~wt))$r.squared) %>% ungroup() %>%
  #mutate(ypos = max(df_mtcars$mpg)*(1-0.05*row_number())) %>%
  #mutate(Label2 = paste(factor_cyl,"~Cylinders:~", "italic(y)==",round(Inter,3),ifelse(Coeff <0,"-","+"),round(abs(Coeff),3),"~italic(x)",sep ="")) %>%
  mutate(Label = paste("italic(y)==",round(Inter,3),ifelse(Coeff <0,"-","+"),round(abs(Coeff),3),"~italic(x)",
                       "~~~~italic(R^2)==",round(r2,3),"~~italic(p)==",round(pval,3),sep =""))

# A tibble: 3 x 6
  factor_cyl Inter Coeff   pval    r2 Label                                                                    
  <fct>      <dbl> <dbl>  <dbl> <dbl> <chr>                                                                    
1 4           39.6 -5.65 0.0137 0.509 italic(y)==39.571-5.647~italic(x)~~~~italic(R^2)==0.509~~italic(p)==0.014
2 6           28.4 -2.78 0.0918 0.465 italic(y)==28.409-2.78~italic(x)~~~~italic(R^2)==0.465~~italic(p)==0.092 
3 8           23.9 -2.19 0.0118 0.423 italic(y)==23.868-2.192~italic(x)~~~~italic(R^2)==0.423~~italic(p)==0.012
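The summarise() call above refits lm() four times per group. A minimal sketch that fits each group only once, using base R (split() and lapply() instead of dplyr); the helper name group_label() is hypothetical, and the intent is to reproduce the same Label strings as in the tibble above:

```r
# Hypothetical helper: fit lm(mpg ~ wt) once for one group's rows and return
# the same columns as the dplyr version, including the plotmath Label.
group_label <- function(d) {
  m <- lm(mpg ~ wt, data = d)    # single fit, reused for every statistic
  s <- summary(m)
  inter <- coef(m)[[1]]
  coeff <- coef(m)[[2]]
  pval  <- s$coefficients[2, 4]  # p-value of the slope
  r2    <- s$r.squared
  data.frame(
    cyl   = d$cyl[1],
    Inter = inter, Coeff = coeff, pval = pval, r2 = r2,
    Label = paste0("italic(y)==", round(inter, 3),
                   ifelse(coeff < 0, "-", "+"), round(abs(coeff), 3), "~italic(x)",
                   "~~~~italic(R^2)==", round(r2, 3),
                   "~~italic(p)==", round(pval, 3)),
    stringsAsFactors = FALSE
  )
}

# One data frame per cylinder count, one row of statistics per group
df_label <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), group_label))
rownames(df_label) <- NULL
df_label
```

To use it with the facet_wrap(~factor_cyl) plot, add a matching factor column first, e.g. df_label$factor_cyl <- factor(df_label$cyl), and pass df_label as data to geom_text() as before.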

You can then use it in geom_text as follows:

ggplot(df_mtcars,aes(x = wt, y = mpg, group = factor_cyl, colour= factor_cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  geom_text(data = df_label,
            aes(x = -Inf, y = Inf, 
                label = Label, color = factor_cyl), 
          show.legend = FALSE, parse = TRUE, size = 3,vjust = 1, hjust = 0)+
  facet_wrap(~factor_cyl)

At least this solves the alignment problem caused by the superscript 2 in R2.

- dc37

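Finding that the superscript causes the misalignment is amazing. I wonder whether adding an empty superscript to the equation would fix the alignment without needing hjust? - Mark Neal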

Interesting idea ;) However, I don't yet know how to implement it easily. - dc37
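A minimal sketch of the empty-superscript idea, assuming that plotmath's phantom(), which reserves the space of its argument without drawing it, is enough to give the equation the same height as italic(R)^2. The strings reuse values from the cyl == 4 row of the table above; only that they parse is checked here, not how ggplot2 actually positions them:

```r
# Equation as in the manual labels above, and the R^2 part that sits next to it
eq_plain <- "italic(y)==39.571-5.647~italic(x)"
rr_label <- "italic(R)^2==0.509"

# Same equation, but y carries an invisible superscript the size of a "2",
# intended to give it the same height as italic(R)^2 (assumption, untested visually)
eq_phantom <- "italic(y)^phantom(2)==39.571-5.647~italic(x)"

# All three must be valid plotmath for use with parse = TRUE
exprs <- lapply(c(eq_plain, eq_phantom, rr_label), function(s) parse(text = s))
```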

As you like ;) I think it could be interesting to raise this "issue" with the ggpubr developers. Maybe they know an easier way to fix it. - dc37

I see ;) Thanks for the link. I'll keep it in mind and look into it when I have more time ;) - dc37

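I've modified my answer with a way of building the equation manually so it can be drawn on your plot. The workflow is similar to the one described in https://stackoverflow.com/questions/61357383/label-ggplot-groups-using-equation-with-ggpmisc/61358526#61358526 - dc37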

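Here I use ggpmisc: one call to stat_poly_eq() for the equation (centre top) and one call to stat_fit_glance() for the statistics (p-value and R2). The secret to alignment is to use yhat as the left-hand side of the equation, because the hat roughly matches the text height and so matches the superscript in R2. Thanks to Pedro Aphalo for the yhat, shown here.

Having them as one string would be better: horizontal alignment would then not be an issue, and positioning them conveniently in the plot space would be easier. I've raised issues with both ggpubr and ggpmisc.

I'm happy to accept other, better answers!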

library(ggpmisc)
library(dplyr)

df_mtcars <- mtcars %>% mutate(factor_cyl = as.factor(cyl))

my_formula <- y ~ x

ggplot(df_mtcars, aes(x = wt, y = mpg, group = factor_cyl, colour= factor_cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_poly_eq(formula = my_formula,
               label.x = "centre",
               eq.with.lhs = "italic(hat(y))~`=`~",
               aes(label = paste(..eq.label.., sep = "~~~")), 
               parse = TRUE)+
  stat_fit_glance(method = 'lm',
                  method.args = list(formula = my_formula),
                  #geom = 'text',
                  label.x = "right", #added to prevent overplotting
                  aes(label = paste("~italic(p) ==", round(..p.value.., digits = 3),
                                    "~italic(R)^2 ==", round(..r.squared.., digits = 2),
                                    sep = "~")),
                  parse=TRUE)+
  theme_minimal()

Note that facets also work very nicely: you can use different variables for facets and groups, and everything still works.

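Note: if you use the same variable for grouping and faceting, adding label.y = Inf to each call forces the labels to the top of each facet (thanks to @dc37 for the tip in another answer to this question).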
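When the facet and group variables differ, each label belongs to one (facet, group) cell, because ggplot2 stats are computed per group within each panel. If you build the labels yourself, as in the manual geom_text() approach, the statistics have to be computed per cell too. A minimal base-R sketch, assuming am as the facet variable and cyl as the group variable (both chosen purely for illustration), which skips cells too small to give a p-value:

```r
# Facet by transmission (am) and group by cylinders (cyl): chosen for illustration.
# Each element of `cells` holds the rows of one (am, cyl) combination.
cells <- split(mtcars, list(am = mtcars$am, cyl = mtcars$cyl), drop = TRUE)

cell_stats <- do.call(rbind, lapply(cells, function(d) {
  # With fewer than 3 points lm() has no residual degrees of freedom,
  # so the p-value would be NaN; drop such cells instead of labelling them.
  if (nrow(d) < 3) return(NULL)
  s <- summary(lm(mpg ~ wt, data = d))
  data.frame(am = d$am[1], cyl = d$cyl[1], n = nrow(d),
             r2 = s$r.squared, pval = s$coefficients[2, 4])
}))
rownames(cell_stats) <- NULL
cell_stats
```

In mtcars the 8-cylinder manual cars form a cell of only two points, so that cell gets no label in this sketch.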

- Mark Neal

Also, if you (like me) hate seeing pointlessly tiny p-values displayed, you can use

aes(label = paste("~italic(p) ==", ifelse(..p.value.. < 0.001, " '<0.001' ", round(..p.value.., digits = 3)),
                  "~italic(R)^2 ==", round(..r.squared.., digits = 2),
                  sep = "~")),

in the call to stat_fit_glance(). - Mark Neal
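The ifelse() trick in that comment can be factored into a small helper and tried outside ggplot. A minimal sketch; format_p() is a hypothetical name, the p-values and R2 values are illustrative, and sprintf() is used instead of round() so that trailing zeros are kept (round() would print 0.050 as 0.05):

```r
# Hypothetical helper mirroring the ifelse() above: show tiny p-values as
# '<0.001' (quoted so plotmath prints it literally), otherwise 3 decimals.
format_p <- function(p) {
  ifelse(p < 0.001, "'<0.001'", sprintf("%.3f", p))
}

# Illustrative values for three fits
p  <- c(1e-10, 0.0137, 0.0918)
r2 <- c(0.75, 0.509, 0.465)

# One plotmath label per fit, ready for parse = TRUE
labels <- paste0("italic(p)==", format_p(p), "~~italic(R)^2==", sprintf("%.2f", r2))
labels
```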

Very nice answer ;) I'll take a close look at it ;) - dc37

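If you want to add the labels more directly and remove the legend, the approach here will help. Using ggplot_build() also suggests a possible option: creating a single string by concatenating the two strings and removing the duplicates. - Mark Neal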

Web content provided by Stack Overflow.

- Pedro J. Aphalo · Accepted Answer

I've updated 'ggpmisc' to make this simpler. Version 0.3.4 is now on its way to CRAN; the source package is online, and binaries will be built within a few days.

library(ggpmisc) # version >= 0.3.4 !!

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl)) +
  geom_smooth(method="lm")+
  geom_point()+
  stat_poly_eq(formula = y ~ x, 
               aes(label = paste(..eq.label.., ..rr.label.., ..p.value.label.., sep = "*`,`~")), 
               parse = TRUE,
               label.x.npc = "right",
               vstep = 0.05) # sets vertical spacing
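The sep = "*`,`~" used above (and in the question) is what joins the pieces into one plotmath expression: * juxtaposes two terms with no space, the back-quoted `,` is a symbol that plotmath prints as a literal comma, and ~ adds a space. A minimal sketch with hand-written stand-ins for the three labels (the exact strings ggpmisc generates may differ in digits and spacing):

```r
# Hand-written stand-ins for the equation, R^2 and p-value labels
eq_label <- "italic(y)~`=`~39.6-5.65~italic(x)"
rr_label <- "italic(R)^2~`=`~0.51"
p_label  <- "italic(P)~`=`~0.014"

# "*" joins without space, `,` prints a comma, "~" adds a space
one_line <- paste(eq_label, rr_label, p_label, sep = "*`,`~")
one_line

# The result must parse as a single expression
expr <- parse(text = one_line)
```

Because the result is one expression, it is drawn as a single text element, so the pieces cannot drift out of alignment relative to each other.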