ggplot: histogram transparency as a function of stat(count)

Question



I am trying to make a scaled histogram in which the transparency of each "column" (bin?) depends on the number of observations in that x range. Here is my code:

library(ggplot2)

set.seed(1)
test = data.frame(x = rnorm(200, mean = 0, sd = 10),
                  y = as.factor(sample(c(0,1), replace=TRUE, size=100)))
threshold = 20 
ggplot(test,
       aes(x = x))+
  geom_histogram(aes(fill = y, alpha = stat(count) > threshold),
                 position = "fill", bins = 10)

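Basically, the plot I want should look like this:

However, my code applies the transparency based on the per-group counts, which leaves hanging columns, as shown here:

In this example I simply adjusted the threshold to mimic a "correct" plot, but I need alpha to take into account the sum of the counts of both groups in a given "column" (bin).

Update: I would also like this to work with faceted plots, so that the highlighted region in each facet is independent of the other facets. The approach proposed by @Stefan works perfectly for a single plot, but in a faceted plot it highlights the same region in every facet.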

library(ggplot2)

set.seed(1)
test = data.frame(x = rnorm(1000, mean = 0, sd = 10),
                  y = as.factor(sample(c(0,1), replace=TRUE, size=1000)),
                  n = as.factor(sample(c(0,1,2), replace=TRUE, size=1000)),
                  m = as.factor(sample(c(0,1,3,4), replace=TRUE, size=1000)))
f = function(..count.., ..x..) tapply(..count.., factor(..x..), sum)[factor(..x..)]
threshold = 10 
ggplot(test,
       aes(x = x))+
  geom_histogram(aes(fill = y, alpha = f(..count.., ..x..) > threshold),
                 position = "fill", bins = 10)+
  facet_grid(rows = vars(n),
             cols = vars(m))

- Oleg

1 Answer


- stefan · Accepted Answer

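This can be achieved as follows:

Because the count computed by stat_bin (the stat behind geom_histogram) is the number of observations after grouping, we have to aggregate the per-group counts manually to get the total for each bin.
To aggregate the counts per bin I use tapply, with the .. notation giving access to the variables computed by the stat.
As the grouping variable I use the computed variable ..x.., which as far as I know is not documented. Basically, ..x.. contains the midpoint of each bin by default, so it can serve as an identifier for the bin. Since these are continuous values, however, we have to convert them to a factor.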
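To see what the tapply step does in isolation, here is a minimal base R sketch on made-up numbers. The vectors `count` and `x` are hypothetical stand-ins for the per-group counts and bin midpoints that the stat would produce; they are not taken from the question's data.

```r
# Hypothetical stat output: two bins (midpoints -5 and 5), two fill groups each.
count <- c(5, 3, 2, 8)
x     <- c(-5, -5, 5, 5)

# Turn the continuous midpoints into a factor so they can act as bin labels.
key <- factor(x)

# Sum the per-group counts within each bin ...
totals <- tapply(count, key, sum)

# ... and index by the factor to repeat each bin total once per original row.
per_row <- as.vector(totals[key])
per_row
```

Each row now carries the total of its bin rather than the count of its own group, so a comparison such as `per_row > threshold` gives the same answer for every group inside a bin.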

Finally, to make the code more readable I use a helper function to compute the aggregated counts. I also doubled the threshold to 20.

library(ggplot2)

set.seed(1)
test <- data.frame(
  x = rnorm(200, mean = 0, sd = 10),
  y = as.factor(sample(c(0, 1), replace = TRUE, size = 100))
)
threshold <- 20

f <- function(..count.., ..x..) tapply(..count.., factor(..x..), sum)[factor(..x..)]
p <- ggplot(
  test,
  aes(x = x)
) +
  geom_histogram(aes(fill = y, alpha = f(..count.., ..x..) > threshold),
    position = "fill", bins = 10
  )
p
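In more recent ggplot2 releases (3.4.0 and later, as far as I know) the `..count..` notation is deprecated in favour of `after_stat()`. The following is a sketch of the same idea in that style, not the answer's original code: the helper name `bin_total` is my own, and the toy data uses `size = 200` for `y` so that both columns have the same length.

```r
library(ggplot2)

set.seed(1)
test <- data.frame(
  x = rnorm(200, mean = 0, sd = 10),
  y = factor(sample(c(0, 1), size = 200, replace = TRUE))
)
threshold <- 20

# Hypothetical helper: total count per bin, repeated for every row of the
# stat output. as.vector() drops the names and dim that tapply() attaches.
bin_total <- function(count, x) {
  key <- factor(x)
  totals <- tapply(count, key, sum)
  as.vector(totals[as.integer(key)])
}

p <- ggplot(test, aes(x = x)) +
  geom_histogram(
    # after_stat() evaluates the expression on the computed stat data,
    # where `count` and `x` (the bin midpoint) are available as columns.
    aes(fill = y, alpha = after_stat(bin_total(count, x) > threshold)),
    position = "fill", bins = 10
  )
p
```

Because the helper sees the stat output for all groups at once, both fill groups in a bin receive the same alpha value.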

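EDIT: To make this work with facets, we need to pass the ..PANEL.. identifier to the function as an additional argument. Instead of tapply, I now use dplyr::add_count, grouped by bin and panel, to compute the total count per bin and facet panel.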
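As a quick illustration of what the weighted `add_count()` call computes, here is a sketch on a small made-up data frame. `stat_out` is a hypothetical stand-in for the stat output, with one row per bin, fill group and panel.

```r
library(dplyr)

# Hypothetical stat output: panel 1 has bins at -5 and 5, panel 2 only at -5.
stat_out <- data.frame(
  count = c(5, 3, 2, 8, 1, 4),
  x     = c(-5, -5, 5, 5, -5, -5),
  PANEL = factor(c(1, 1, 1, 1, 2, 2))
)

# add_count() with wt = count sums `count` within each (x, PANEL) combination
# and attaches the result as column `n`, keeping the original row order.
totals <- stat_out %>%
  add_count(x, PANEL, wt = count) %>%
  pull(n)
totals
```

The bin at -5 gets a different total in each panel, which is what keeps the highlighting in one facet independent of the others.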

library(ggplot2)
library(dplyr)

set.seed(1)
test <- data.frame(
  x = rnorm(200, mean = 0, sd = 10),
  y = as.factor(sample(c(0, 1), replace = TRUE, size = 100)),
  type = rep(c("A", "B"), each = 100)
)
threshold <- 20

f <- function(count, x, PANEL) {
  data.frame(count, x, PANEL) %>% 
    add_count(x, PANEL, wt = count) %>% 
    pull(n)
}
p <- ggplot(
  test,
  aes(x = x)
) +
  geom_histogram(aes(fill = y, alpha = f(..count.., ..x.., ..PANEL..) > threshold),
                 position = "fill", bins = 10
  ) +
  facet_wrap(~type)
p
#> Warning: Using alpha for a discrete variable is not advised.
#> Warning: Removed 2 rows containing missing values (geom_bar).
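If you would rather avoid the dplyr dependency, the same per-panel aggregation can probably be written with base R's `ave()` and `after_stat()`. This is a sketch under the same assumptions as the code above, namely that the stat output exposes `count`, `x` and `PANEL`; the helper name `panel_bin_total` is hypothetical, and `y` is sampled with `size = 200` to match the number of rows.

```r
library(ggplot2)

set.seed(1)
test <- data.frame(
  x = rnorm(200, mean = 0, sd = 10),
  y = factor(sample(c(0, 1), size = 200, replace = TRUE)),
  type = rep(c("A", "B"), each = 100)
)
threshold <- 20

# Hypothetical helper: ave() replaces each count with the sum of the counts
# that share the same bin midpoint and the same facet panel.
panel_bin_total <- function(count, x, panel) {
  ave(count, x, panel, FUN = sum)
}

p_facet <- ggplot(test, aes(x = x)) +
  geom_histogram(
    aes(
      fill = y,
      alpha = after_stat(panel_bin_total(count, x, PANEL) > threshold)
    ),
    position = "fill", bins = 10
  ) +
  facet_wrap(~type)
p_facet
```

As with the version above, ggplot2 will likely warn that using alpha for a discrete variable is not advised; adding `scale_alpha_manual()` with explicit values for `TRUE` and `FALSE` is one way to control the two transparency levels.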