Percentage histogram with facet_wrap

13

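I am trying to combine percentage histograms with facet_wrap, but the percentages are calculated from all of the data rather than per group. I would like each histogram to show the distribution within its group, not relative to the whole sample. I know I could draw several plots and combine them with multiplot.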

library(ggplot2)
library(scales)
library(dplyr)

set.seed(1)
df <- data.frame(age = runif(900, min = 10, max = 100),
                 group = rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), 100))

tmp <- df %>%
  mutate(group = "ALL")

df <- rbind(df, tmp)

ggplot(df, aes(age)) + 
  geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) + 
  scale_y_continuous(labels = percent ) + 
  facet_wrap(~ group, ncol = 5) 

Output: [output plot]
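
The problem can also be seen numerically. The following is only a sketch: it rebuilds the plot above and sums the bar heights per panel with layer_data(). It assumes ggplot2 3.3.0 or later, since it uses after_stat() in place of the `..count..` notation, and the names `bars` and `panel_sums` are just for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(age = runif(900, min = 10, max = 100),
                 group = rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), 100))
df <- rbind(df, transform(df, group = "ALL"))

p <- ggplot(df, aes(age)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 5) +
  facet_wrap(~ group, ncol = 5)

# Bar heights as computed by the stat, one row per bin per panel
bars <- layer_data(p)

# Sum of bar heights within each panel;
# a per-group percentage would give 1 in every panel
panel_sums <- tapply(bars$y, bars$PANEL, sum)
panel_sums

# The heights only add up to 1 across all panels together
sum(panel_sums)
```

Here sum(count) runs over every bin in the layer, so each panel gets only its share of the 1800 rows.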

2 Answers

14

Use y = stat(density) (or y = ..density.. before ggplot2 version 3.0.0) instead of y = (..count..)/sum(..count..):

ggplot(df, aes(age, group = group)) + 
  geom_histogram(aes(y = stat(density) * 5), binwidth = 5) + 
  scale_y_continuous(labels = percent ) +
  facet_wrap(~ group, ncol = 5)


From ?geom_histogram, under "Computed variables":

density: density of points in bin, scaled to integrate to 1

We multiply by 5 (the bin width) because the y-axis is a density (the areas sum to 1), not a percentage (the heights sum to 1); see Hadley's comment (thanks to @MariuszSiatka).
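
The relationship between density and percentage can be checked without ggplot2. This is a minimal sketch using base R's hist() on data like the question's; the break points and the names `binwidth`, `h` and `proportion` are assumptions chosen for illustration, not what geom_histogram uses internally.

```r
set.seed(1)
age <- runif(900, min = 10, max = 100)

binwidth <- 5
breaks <- seq(10, 100, by = binwidth)

# plot = FALSE returns the bin counts and densities without drawing
h <- hist(age, breaks = breaks, plot = FALSE)

# density = count / (n * binwidth), so the bar AREAS sum to 1
area_total <- sum(h$density * binwidth)

# Multiplying by the bin width turns each density into a proportion,
# i.e. the share of observations falling in that bin
proportion <- h$density * binwidth

area_total
all.equal(proportion, h$counts / sum(h$counts))
```

With equal-width bins, density times bin width is the same quantity as count / sum(count), computed within whatever subset the histogram was built from.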


Adding clauswilke's solution, which keeps % on the y-axis (rather than density): geom_histogram(aes(y = stat(width*density))) - Sweepy Dodo
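
A fuller version of the width * density suggestion might look like the sketch below. It assumes ggplot2 3.3.0 or later and uses after_stat(), the newer spelling of stat(); the data are the question's nine groups, and `bars` and `panel_sums` are illustrative names. Using the computed `width` variable avoids hard-coding the bin width of 5.

```r
library(ggplot2)
library(scales)

set.seed(1)
df <- data.frame(age = runif(900, min = 10, max = 100),
                 group = rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), 100))

p <- ggplot(df, aes(age)) +
  # width and density are both computed by stat_bin within each panel
  geom_histogram(aes(y = after_stat(width * density)), binwidth = 5) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~ group, ncol = 5)

# Check: bar heights should sum to 1 (100%) inside every panel
bars <- layer_data(p)
panel_sums <- tapply(bars$y, bars$PANEL, sum)
panel_sums

print(p)
```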

3

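Since facet_wrap does not appear to run the geom_histogram percentage calculation within each subset, consider building the plots separately and arranging them together in a grid.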

Specifically, call by to run your ggplots on subsets of group, then call gridExtra::grid.arrange() (an actual package function) to roughly mimic facet_wrap:

library(ggplot2)
library(scales)
library(gridExtra)

...

grp_plots <- by(df, df$group, function(sub){
  ggplot(sub, aes(age)) + 
    geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) + 
    scale_y_continuous(labels = percent ) + ggtitle(sub$group[[1]]) +
    theme(plot.title = element_text(hjust = 0.5))
})

grid.arrange(grobs = grp_plots, ncol=5)

Plot Output


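However, to avoid the repeated y-axes and x-axes, consider setting theme conditionally inside the by call, assuming you know your groups ahead of time and there are only a handful of them.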

grp_plots <- by(df, df$group, function(sub){

  # BASE GRAPH
  p <- ggplot(sub, aes(age)) + 
    geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) + 
    scale_y_continuous(labels = percent ) + ggtitle(sub$group[[1]])

  # CONDITIONAL theme() CALLS
  if (sub$group[[1]] %in% c("a")) {
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.x = element_blank(), 
                  axis.text.x = element_blank(), axis.ticks.x = element_blank())
  }
  else if (sub$group[[1]] %in% c("f")) {
    p <- p + theme(plot.title = element_text(hjust = 0.5))
  }
  else if (sub$group[[1]] %in% c("b", "c", "d", "e")) {
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.y = element_blank(), 
                   axis.text.y = element_blank(), axis.ticks.y = element_blank(),
                   axis.title.x = element_blank(), axis.text.x = element_blank(), 
                   axis.ticks.x = element_blank())
  }
  else {
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.y = element_blank(), 
                   axis.text.y = element_blank(), axis.ticks.y = element_blank())
  }
  return(p)
})

grid.arrange(grobs=grp_plots, ncol=5)

Plot Output

