Stacked bar chart with variable bar widths in ggplot

7
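I am trying to build a stacked bar chart with variable widths, where the width of a bar shows the mean amount of the allocations and the height shows the number of allocations.
Here is my reproducible data: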
procedure = c("method1","method2", "method3", "method4","method1","method2", "method3", "method4","method1","method2", "method3","method4")
sector =c("construction","construction","construction","construction","delivery","delivery","delivery","delivery","service","service","service","service") 
number = c(100,20,10,80,75,80,50,20,20,25,10,4)
amount_mean = c(1,1.2,0.2,0.5,1.3,0.8,1.5,1,0.8,0.6,0.2,0.9) 

data0 = data.frame(procedure, sector, number, amount_mean)

When I use geom_bar and include width in aes, I get the following message:

position_stack requires non-overlapping x intervals

Furthermore, the bars are no longer stacked.
library(ggplot2)

bar <- ggplot(data = data0, aes(x = sector, y = number, fill = procedure, width = amount_mean)) +
  geom_bar(stat = "identity")

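I also looked at the mekko package, but it only seems to work for bar charts.
This is what I would like to end up with (not based on the data above):

[Image: desired outcome]

Is there any way to solve my problem?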

2
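If all else fails, you can precompute everything and use geom_rect. - Richard Telford
How do you want the stacks ordered? I am working on it with geom_tile, which lets you set the center point of a rectangle and then its width and height, but the result looks a bit odd. - camille
1 Answer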

7
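I tried the same thing with geom_col() and ran into the same problem: with position = "stack", it seems we cannot set the width parameter without the bars being unstacked.
But the solution turns out to be quite simple: we can build this kind of plot by hand with geom_rect().
Here is your data: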
df <- data.frame(
  procedure   = rep(paste("method", 1:4), times = 3),
  sector      = rep(c("construction", "delivery", "service"), each = 4),
  amount      = c(100, 20, 10, 80, 75, 80, 50, 20, 20, 25, 10, 4),
  amount_mean = c(1, 1.2, 0.2, 0.5, 1.3, 0.8, 1.5, 1, 0.8, 0.6, 0.2, 0.9)
)

First, I transformed your data set:

library(dplyr)

df <- df |>
  mutate(
    amount_mean = amount_mean / max(amount_mean),  # rescale widths so the largest is 1
    sector_num  = as.numeric(factor(sector))       # character -> 1, 2, 3 (needs factor())
  ) |>
  arrange(desc(amount_mean)) |>                    # widest rectangles go to the bottom
  group_by(sector) |>
  mutate(
    xmin = sector_num - amount_mean / 2,
    xmax = sector_num + amount_mean / 2,
    ymin = cumsum(lag(amount, default = 0)),
    ymax = cumsum(amount)
  ) |>
  ungroup()

What I did:

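  1. I rescaled amount_mean so that 0 <= amount_mean <= 1 (this is better for plotting, since we have no other scale to show the actual values of amount_mean);
  2. I also encoded the sector variable as a number (for plotting, see below);
  3. I sorted the data set by amount_mean in descending order (heavier at the bottom, lighter at the top);
  4. Grouping by sector, I computed xmin and xmax to represent amount_mean, and ymin and ymax to represent amount. The y bounds are the slightly tricky part. ymax is obvious: you just take the cumulative sum of amount, starting from the first row. You also need a cumulative sum for ymin, but starting from 0. So the first rectangle has ymin = 0, the second rectangle has ymin equal to the ymax of the previous rectangle, and so on. All of this is done within each sector group separately.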
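The cumulative-sum step in item 4 can be checked on a small vector. This is only a sketch with made-up amounts for a single sector, written in base R; c(0, head(amount, -1)) is used here as a stand-in for dplyr's lag(amount, default = 0).

```r
# Hypothetical amounts of one sector, already sorted bottom to top
amount <- c(100, 80, 20)

# Upper edge of each rectangle: running total of the amounts
ymax <- cumsum(amount)

# Lower edge: the same running total shifted by one position, starting at 0
# (base R equivalent of cumsum(dplyr::lag(amount, default = 0)))
ymin <- cumsum(c(0, head(amount, -1)))

stack_bounds <- data.frame(amount, ymin, ymax)
print(stack_bounds)
```

Each ymin equals the ymax of the row before it, so the rectangles sit on top of each other without gaps or overlap.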

Plotting the data:

df |>
  ggplot(aes(xmin = xmin, xmax = xmax,
             ymin = ymin, ymax = ymax, 
             fill = procedure
             )
         ) +
  geom_rect() +
  scale_x_continuous(breaks = df$sector_num, labels = df$sector) +
  #ggthemes::theme_tufte() +
  theme_bw() +
  labs(title = "Question 51136471", x = "Sector", y = "Amount") +
  theme(
    axis.ticks.x = element_blank()
  )

Result:

[Image: pyramid_plot]

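Another option is to keep the procedure variable from being reordered, so that all the "reds" are at the bottom, the "greens" above them, and so on. But it looks ugly: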

df <- df |>
  mutate(
    amount_mean = amount_mean / max(amount_mean),
    sector_num  = as.numeric(factor(sector))       # character -> 1, 2, 3 (needs factor())
  ) |>
  arrange(procedure, desc(amount), desc(amount_mean)) |>  # same procedure order in every sector
  group_by(sector) |>
  mutate(
    xmin = sector_num - amount_mean / 2,
    xmax = sector_num + amount_mean / 2,
    ymin = cumsum(lag(amount, default = 0)),
    ymax = cumsum(amount)
  ) |>
  ungroup()

[Image: pyramid_plot_ugly]


1
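Thank you very much for the step-by-step explanation. Rescaling amount_mean gives a lot of flexibility for the design. Perfect. - an_ja
I don't know why, but computing ymin and ymax by group does not give the correct result. I always get the cumulative sum over all values. What could be going wrong? - an_ja
Maybe using dplyr::mutate(...) will help. - utubun
To encode the sector variable as a number, you need to use factor. That is, sector_num = as.numeric(factor(sector)) - YBS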
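
The point about factor can be seen on a short character vector. This is a minimal sketch with made-up values; it relies only on base R, and on the fact that from R 4.0.0 onward data.frame() keeps strings as character by default, so the sector column is not a factor.

```r
# Hypothetical sector labels stored as plain character strings
sector <- c("construction", "delivery", "service", "delivery")

# Character strings that are not numbers convert to NA (with a warning)
direct <- suppressWarnings(as.numeric(sector))

# Going through factor() first returns the integer level codes;
# levels are sorted alphabetically by default
coded <- as.numeric(factor(sector))

print(direct)
print(coded)
```

With NA in sector_num, xmin and xmax would be NA as well, so no rectangles could be drawn.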
