ggplot2: Boxplot with points and fill separation

I have some data that can be split by two grouping variables: one is the year, the other is a field characteristic.

box<-as.data.frame(1:36)

box$year <- c(1996,1996,1996,1996,1996,1996,1996,1996,1996,
              1997,1997,1997,1997,1997,1997,1997,1997,1997,
              1996,1996,1996,1996,1996,1996,1996,1996,1996,
              1997,1997,1997,1997,1997,1997,1997,1997,1997)
box$year <- as.character(box$year)

box$case <- c(6.40,6.75,6.11,6.33,5.50,5.40,5.83,4.57,5.80,
              6.00,6.11,6.40,7.00,NA,5.44,6.00,  NA,6.00,
              6.00,6.20,6.40,6.64,6.33,6.60,7.14,6.89,7.10,
              6.73,6.27,6.64,6.41,6.42,6.17,6.05,5.89,5.82)

box$code <- c("L","L","L","L","L","L","L","L","L","L","L","L",
              "L","L","L","L","L","L","M","M","M","M","M","M",
              "M","M","M","M","M","M","M","M","M","M","M","M")

colour <- factor(box$code, labels = c("#F8766D", "#00BFC4"))
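In the boxplot I would like to show the individual points to get an idea of how the data are distributed. This is easy with a single boxplot per year, drawing the points on top of it: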

ggplot(box, aes(x = year, y = case, fill = "#F8766D")) +
  geom_boxplot(alpha = 0.80) +
  geom_point(colour = colour, size = 5) +
  theme(text = element_text(size = 18),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_blank(),
        legend.position = "none")

But things get more complicated once I map fill to code:

ggplot(box, aes(x = year, y = case, fill = code)) +
  geom_boxplot(alpha = 0.80) +
  geom_point(colour = colour, size = 5) +
  theme(text = element_text(size = 18),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_blank(),
        legend.position = "none")

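So my question is: how can I move the points onto the boxplot they belong to, i.e. the blue points onto the blue box and the red points onto the red box?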


geom_dotplot with binaxis = 'y' might also be of interest. - bouncyball
2 Answers

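As Henrik said, use position_jitterdodge() together with shape = 21. You can also simplify your code:

  1. There is no need to create box first and then fill in its columns one by one; build the data frame in a single data.frame() call.
  2. If you like, you can let ggplot choose the colours automatically and skip building the colour factor. If you want to change the defaults, have a look at scale_fill_manual and scale_color_manual.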

box <- data.frame(year = c(1996,1996,1996,1996,1996,1996,1996,1996,1996,
                           1997,1997,1997,1997,1997,1997,1997,1997,1997,
                           1996,1996,1996,1996,1996,1996,1996,1996,1996,
                           1997,1997,1997,1997,1997,1997,1997,1997,1997),
                  case = c(6.40,6.75,6.11,6.33,5.50,5.40,5.83,4.57,5.80,
                           6.00,6.11,6.40,7.00,  NA,5.44,6.00,  NA,6.00,
                           6.00,6.20,6.40,6.64,6.33,6.60,7.14,6.89,7.10,
                           6.73,6.27,6.64,6.41,6.42,6.17,6.05,5.89,5.82),
                  code = c("L","L","L","L","L","L","L","L","L","L","L","L",
                           "L","L","L","L","L","L","M","M","M","M","M","M",
                           "M","M","M","M","M","M","M","M","M","M","M","M"))

ggplot(box, aes(x = factor(year), y = case, fill = code)) +
  geom_boxplot(alpha = 0.80) +
  geom_point(aes(fill = code), size = 5, shape = 21, position = position_jitterdodge()) +
  theme(text = element_text(size = 18),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_blank(),
        legend.position = "none")
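If you would rather not rely on the default palette, here is a minimal sketch with scale_fill_manual(). It assumes the same data as above, rebuilt compactly with rep() (rows 1-18 are code L and rows 19-36 are code M, each block being 9 rows of 1996 followed by 9 rows of 1997). The two hex colours are simply the ones from the question, and fill_colours is a name chosen only for illustration. Because shape = 21 draws the interior of each point with fill, a single fill scale colours both the boxes and the points. Expect warnings about the two NA values in case being removed.

```r
library(ggplot2)

# Same 36 observations as in the question, rebuilt with rep():
# rows 1-18 are code "L", rows 19-36 are code "M",
# and within each code there are 9 rows of 1996 followed by 9 rows of 1997.
box <- data.frame(
  year = rep(rep(c(1996, 1997), each = 9), times = 2),
  case = c(6.40, 6.75, 6.11, 6.33, 5.50, 5.40, 5.83, 4.57, 5.80,
           6.00, 6.11, 6.40, 7.00,   NA, 5.44, 6.00,   NA, 6.00,
           6.00, 6.20, 6.40, 6.64, 6.33, 6.60, 7.14, 6.89, 7.10,
           6.73, 6.27, 6.64, 6.41, 6.42, 6.17, 6.05, 5.89, 5.82),
  code = rep(c("L", "M"), each = 18)
)

# Hypothetical helper: one explicit fill colour per level of `code`.
fill_colours <- c(L = "#F8766D", M = "#00BFC4")

p <- ggplot(box, aes(x = factor(year), y = case, fill = code)) +
  geom_boxplot(alpha = 0.80) +
  # shape 21 is a filled circle, so its interior follows the fill scale
  geom_point(size = 5, shape = 21, position = position_jitterdodge()) +
  # the same manual scale applies to the boxes and the points
  scale_fill_manual(values = fill_colours) +
  theme(legend.position = "none")

print(p)
```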

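What is the black dot to the left of the lowest circle in 1997? - giordano
It is a point/potential outlier drawn by geom_boxplot. - Jake Kaupp
If I add a grouping variable to change the point shape, it no longer seems to work... - Shixiang Wang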
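Since the stray black dot is the outlier that geom_boxplot() draws on its own, that observation ends up plotted twice. One common way around this, sketched below, is to hide the boxplot's outliers with outlier.shape = NA, because every observation is already drawn by geom_point(). The seed argument of position_jitterdodge() makes the jitter reproducible between runs. The values jitter.width = 0.2 and seed = 1 are arbitrary choices for illustration; dodge.width = 0.75 is the function's default, written out only to make it visible. The data are rebuilt with rep() in the same row order as in the question.

```r
library(ggplot2)

# Same data as in the question: 18 "L" rows then 18 "M" rows,
# each block being 9 rows of 1996 followed by 9 rows of 1997.
box <- data.frame(
  year = rep(rep(c(1996, 1997), each = 9), times = 2),
  case = c(6.40, 6.75, 6.11, 6.33, 5.50, 5.40, 5.83, 4.57, 5.80,
           6.00, 6.11, 6.40, 7.00,   NA, 5.44, 6.00,   NA, 6.00,
           6.00, 6.20, 6.40, 6.64, 6.33, 6.60, 7.14, 6.89, 7.10,
           6.73, 6.27, 6.64, 6.41, 6.42, 6.17, 6.05, 5.89, 5.82),
  code = rep(c("L", "M"), each = 18)
)

p <- ggplot(box, aes(x = factor(year), y = case, fill = code)) +
  # hide the boxplot's own outlier markers; geom_point() draws every value
  geom_boxplot(alpha = 0.80, outlier.shape = NA) +
  geom_point(size = 5, shape = 21,
             position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.75,
                                             seed = 1)) +
  theme(legend.position = "none")

print(p)
```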

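I see you have already accepted @JakeKaupp's nice answer, but I would like to suggest a different option using geom_dotplot. You are showing quite a small amount of data, so why not drop the boxplots altogether?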

ggplot(box, aes(x = factor(year), y = case, fill = code))+
    geom_dotplot(binaxis = 'y', stackdir = 'center',
                 position = position_dodge())

Thanks! But my full dataset is large, so I have to use boxplots. The data in this question are only a small part of it. - Kryštof Chytrý

Source: Stack Overflow (translated from the original English post).