Shaded violin plots by group


I am trying to produce a variant of a grouped violin plot in R (preferably with ggplot2), similar to the one below:

Grouped Violin Plot

It was produced by the following reproducible example code:

# Load libraries #
library(tidyverse)

# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100), 
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 Z = rep(c("Za", "Zb"), times = 100),
                 stringsAsFactors = FALSE)

# Grouped violin plot #
df %>% 
  ggplot(., aes(x = X, y = Y, fill = Z)) + 
    geom_violin(draw_quantiles = 0.5) + 
    scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))

What I would like to achieve is for the density above the median to be shaded differently from the density below it, as in the plot below:

Shaded Violin Plot

I produced the (single) violin plot above for the combination X = X1, Z = Za in the data with the following code:

## Shaded violin plot ##
# Calculate limits and median #
df.lim <- df %>% 
            filter(X == "X1", Z == "Za") %>% 
            summarise(Y_min = min(Y),
                      Y_qnt = quantile(Y, 0.5),
                      Y_max = max(Y))

# Calculate density, truncate at limits and assign shade category #
df.dens <- df %>% 
            filter(X == "X1", Z == "Za") %>% 
            do(data.frame(LOC  = density(.$Y)$x,
                          DENS = density(.$Y)$y)) %>%
            filter(LOC >= df.lim$Y_min, LOC <= df.lim$Y_max) %>% 
            mutate(COL = ifelse(LOC > df.lim$Y_qnt, "Empty", "Filled"))

# Find density values at limits #
df.lim.2 <- df.dens %>% 
              filter(LOC == min(LOC) | LOC == max(LOC))

# Produce shaded single violin plot #
df.dens %>% 
  ggplot(aes(x = LOC)) + 
    geom_area(aes(y =  DENS, alpha = COL), fill = "red") +
    geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
    geom_path(aes(y =  DENS)) +
    geom_path(aes(y = -DENS)) +
    geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
    coord_flip() + 
    scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))

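As you can see in the code, I build the violin horizontally with the density function and then flip the axes. The trouble starts when I try to produce the grouped version, because the axis that would show the groups X and Z is already used for the "height" of the density. I tried to reach the same result by repeating all the calculations by group, but I am stuck at the last step:
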
## Shaded grouped violin plot ##
# Calculate limits and median by group #
df.lim <- df %>% 
            group_by(X, Z) %>% 
            summarise(Y_min = min(Y),
                      Y_qnt = quantile(Y, 0.5),
                      Y_max = max(Y))

# Calculate density, truncate at limits and assign shade category by group #
df.dens <- df %>% 
            group_by(X, Z) %>% 
            do(data.frame(LOC  = density(.$Y)$x,
                          DENS = density(.$Y)$y)) %>%
            left_join(., df.lim, by = c("X", "Z")) %>% 
            filter(LOC >= Y_min, LOC <= Y_max) %>% 
            mutate(COL = ifelse(LOC > Y_qnt, "Empty", "Filled"))

# Find density values at limits by group #
df.lim.2 <- df.dens %>%
              group_by(X, Z) %>% 
              filter(LOC == min(LOC) | LOC == max(LOC))

# Produce shaded grouped violin plot #
df.dens %>% 
  ggplot(aes(x = LOC, group = interaction(X, Z))) + 
    # The following two lines don't work when included #
    #geom_area(aes(y =  DENS, alpha = COL), fill = "red") +
    #geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
    geom_path(aes(y =  DENS)) +
    geom_path(aes(y = -DENS)) +
    geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
    coord_flip() + 
    scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))

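Running the code above draws the outline of a violin for each group, with all the violins stacked on top of one another. But as soon as I include the geom_area lines, the code fails.
My intuition is that I need to create the "shaded" violin as a new geom that can then be used within the general structure of a ggplot2 graphic, but I don't know how to do that, as my coding skills are not at that level yet. Any help or pointers, whether along my line of thought or in a different direction, would be much appreciated. Thank you for your time.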

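I don't think geom_area() will solve your problem when the violins are not centred around 0. It would be better to replace it with geom_polygon(). The best guide I have found for creating your own geoms and other objects is here: https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html. - teunbrand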
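
The geom_polygon() suggestion in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the commenter's code: the helper `violin_polygons()`, the `centre` positions and the `half_width` value are hypothetical names and choices introduced here. Each group gets its own numeric centre on the x axis, and the density outline is split at the median into a "lower" and an "upper" polygon so that each part can carry its own alpha.

```r
library(ggplot2)

set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 Z = rep(c("Za", "Zb"), times = 100),
                 stringsAsFactors = FALSE)

# Hypothetical helper: closed outlines for one violin, split at the median.
# `centre` is the numeric x position, `half_width` the maximum half-width.
violin_polygons <- function(y, centre, half_width = 0.18) {
  d <- density(y, from = min(y), to = max(y))  # density truncated at the data range
  w <- half_width * d$y / max(d$y)             # scale the density to a half-width
  part <- ifelse(d$x > median(y), "upper", "lower")
  do.call(rbind, lapply(split(seq_along(d$x), part), function(i) {
    # left edge bottom-to-top, then right edge top-to-bottom
    data.frame(x = c(centre - w[i], rev(centre + w[i])),
               y = c(d$x[i], rev(d$x[i])),
               part = part[i][1])
  }))
}

# One centre per X/Z combination: X1 -> 1, X2 -> 2, shifted left/right by Z
groups <- unique(df[c("X", "Z")])
groups$centre <- as.integer(factor(groups$X)) +
  ifelse(groups$Z == "Za", -0.2, 0.2)

polys <- do.call(rbind, lapply(seq_len(nrow(groups)), function(k) {
  g <- groups[k, ]
  p <- violin_polygons(df$Y[df$X == g$X & df$Z == g$Z], g$centre)
  cbind(p, X = g$X, Z = g$Z)
}))

# The polygon borders that meet at the split double as a median line
p <- ggplot(polys, aes(x = x, y = y, fill = Z, alpha = part,
                       group = interaction(X, Z, part))) +
  geom_polygon(colour = "black") +
  scale_x_continuous(name = "X", breaks = 1:2, labels = c("X1", "X2")) +
  scale_alpha_manual(values = c(lower = 1, upper = 0.1)) +
  scale_fill_manual(values = c(Za = "red", Zb = "blue")) +
  labs(y = "Y")
print(p)
```

Because the violins are placed by hand on a continuous axis, this avoids coord_flip() altogether, at the cost of having to manage the dodging and axis labels manually.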
1 Answer


Idea

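Out of interest, I quickly hacked together a half-violin geom. It is basically a big copy-and-paste from GeomViolin, and to get it running I had to access some internal ggplot2 functions that are not exported, via :::. This means the solution may stop working if the ggplot2 team decides to change their internal functions.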

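Nevertheless, the solution works, and you can specify the alpha values for the upper and lower parts. The geom assumes that you provide only one quantile. The code is only superficially tested, but it should give you an idea of how this can be done. As mentioned, it is largely a plain copy and paste from GeomViolin, to which I added code that works out which values lie below and above the quantile and splits the underlying GeomPolygon in two, because that function uses only a single alpha value. It works with groups and with coord_flip as well.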


Code


library(ggplot2)  # ggproto(), GeomViolin, GeomPolygon, GeomPath, zeroGrob(), layer()
library(grid)     # grobTree()

GeomHalfViolin <- ggproto("GeomHalfViolin", GeomViolin,
  draw_group = function (self, data, ..., draw_quantiles = NULL, 
                         alpha_upper = .5, alpha_lower = 1) {
    data <- transform(data, xminv = x - violinwidth * (x - xmin), 
        xmaxv = x + violinwidth * (xmax - x))
    newdata <- rbind(transform(data, x = xminv)[order(data$y), 
        ], transform(data, x = xmaxv)[order(data$y, decreasing = TRUE), 
        ])
    newdata <- rbind(newdata, newdata[1, ])
    if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
        stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 
            1))
        stopifnot(length(draw_quantiles) <= 1)
        ## ggplot2::: is needed here and there to access unexported ggplot2 internals
        quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
        ###------------------------------------------------
        ## find out where the quantile is supposed to be
        quantile_line <- unique(quantiles$y)
        ## which y values are below this quantile?
        ind <- newdata$y <= quantile_line
        ## set the alpha values accordingly
        newdata$alpha[!ind] <- alpha_upper
        newdata$alpha[ind] <- alpha_lower
        ###------------------------------------------------
        aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), 
            c("x", "y", "group")), drop = FALSE]
        aesthetics$alpha <- rep(1, nrow(quantiles))
        both <- cbind(quantiles, aesthetics)
        both <- both[!is.na(both$group), , drop = FALSE]
        quantile_grob <- if (nrow(both) == 0) {
            zeroGrob()
        }
        else {
            GeomPath$draw_panel(both, ...)
        }
        ###------------------------------------------------
        ## GeomPolygon uses a single alpha value by default
        ## Hence, split the violin in two parts
        ggplot2:::ggname("geom_half_violin",
                         grobTree(GeomPolygon$draw_panel(newdata[ind, ], ...),
                                  GeomPolygon$draw_panel(newdata[!ind, ], ...),
                                  quantile_grob))
        ###------------------------------------------------
    }
    else {
        ggplot2:::ggname("geom_half_violin", GeomPolygon$draw_panel(newdata, 
            ...))
    }
  } 
)

geom_half_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
                             position = "dodge", ..., draw_quantiles = NULL,
                             alpha_upper = .5, alpha_lower = 1, 
                             trim = TRUE, scale = "area", 
                             na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
    layer(data = data, mapping = mapping, stat = stat, geom = GeomHalfViolin, 
        position = position, show.legend = show.legend, inherit.aes = inherit.aes, 
        params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, 
                      alpha_lower = alpha_lower, alpha_upper = alpha_upper,
                      na.rm = na.rm, ...))

}


library(tidyverse)

# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100), 
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 Z = rep(c("Za", "Zb"), times = 100),
                 stringsAsFactors = FALSE)

# Grouped violin plot #
df %>% 
  ggplot(., aes(x = X, y = Y, fill = Z)) + 
    geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1) + 
    scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
# no groups
df %>% filter(Z == "Za") %>% 
  ggplot(., aes(x = X, y = Y)) + 
    geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1, fill = "red") + 
    scale_fill_manual(values = c("Za" = "red", "Zb" = "blue")) + 
    coord_flip()

Plots

Grouped Half Violin Plot, Flipped Half Violin Plot


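This is exactly what I was looking for! I made a small edit, because alpha_lower and alpha_upper were defined the wrong way round, but it seems to work perfectly. I will accept the answer as soon as the edit is approved. Great job! - Constantinos
Really cool! It is a pity that it relies on unexported functions. Maybe you could add the ggplot2 version it works with, so that people can still reproduce it in the future even if the ggplot2 internals change. - asachet
Although these functions are internal, I don't think they will change any time soon. I guess one could easily drop ggplot2:::ggname (it "just" adds a name to the grob), and the workhorse ggplot2:::create_quantile_segment_frame is used by geom_violin too. Looking at its source code, it relies mainly on base R functions, so rebuilding it (without the ggplot2 sugar) should not be too difficult. For the record, this was done with ggplot2 3.2.0. - thothal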
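
The two suggestions in the comments above (noting the ggplot2 version, and rebuilding the quantile helper from base R) might look something like the sketch below. It is not taken from the answer: `quantile_segments()` is a hypothetical stand-in modelled on what the internal helper appears to do, namely interpolating the y position at which the cumulative density reaches the requested probability, then interpolating the left and right violin edges at that position. It assumes the data for one violin has the columns `y`, `density`, `xminv` and `xmaxv`, as in the `draw_group` code above.

```r
# Report the installed ggplot2 version and whether the unexported helper
# used by the answer is still present (the answer was written against 3.2.0).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  message("ggplot2 version: ", as.character(utils::packageVersion("ggplot2")))
  message("internal helper present: ",
          exists("create_quantile_segment_frame",
                 envir = asNamespace("ggplot2"), inherits = FALSE))
}

# Hypothetical base-R stand-in for the internal quantile helper.
# `data` holds one violin: y grid, density, and left/right edges.
quantile_segments <- function(data, probs) {
  # cumulative share of the density along the y grid
  cum_dens <- cumsum(data$density) / sum(data$density)
  # y position at which each requested probability is reached
  y_at <- stats::approxfun(cum_dens, data$y, ties = "ordered")(probs)
  # violin edges at those y positions
  xmin_at <- stats::approxfun(data$y, data$xminv)(y_at)
  xmax_at <- stats::approxfun(data$y, data$xmaxv)(y_at)
  data.frame(x = as.vector(rbind(xmin_at, xmax_at)),  # interleave left/right
             y = rep(y_at, each = 2),
             group = rep(y_at, each = 2))
}

# Toy violin centred at x = 1 with a symmetric density on [0, 1]
toy <- data.frame(y = seq(0, 1, length.out = 101))
toy$density <- dnorm(toy$y, mean = 0.5, sd = 0.15)
toy$xminv <- 1 - 0.4 * toy$density / max(toy$density)
toy$xmaxv <- 1 + 0.4 * toy$density / max(toy$density)

seg <- quantile_segments(toy, probs = 0.5)
print(seg)
```

If the stand-in behaves as intended, it could replace the `ggplot2:::create_quantile_segment_frame` call inside `draw_group`, which would leave `ggplot2:::ggname` as the only remaining internal dependency.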
