Adding labels to a percent stacked bar chart in ggplot2

5

I'm a beginner and want to use ggplot to visualize my dataset. Here is my code so far:

#create plot
plot <- ggplot(newDoto, aes(y = pid3lean, weight = weight, fill = factor(Q29_1String, levels = c("Strongly disagree","Somewhat disagree", "Neither agree nor disagree", "Somewhat agree", "Strongly agree")))) + geom_bar(position = "fill", width = .732) 
#fix colors
plot <- plot + scale_fill_manual(values = c("Strongly disagree" = "#7D0000", "Somewhat disagree" = "#D70000","Neither agree nor disagree" = "#C0BEB8", "Somewhat agree" = "#008DCA", "Strongly agree" = "#00405B")) 
#fix grid
plot <- plot + guides(fill=guide_legend(title="29")) + theme_bw() + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(panel.border = element_blank()) + theme(axis.ticks = element_blank()) + theme(axis.title.y=element_blank()) + theme(axis.title.x=element_blank()) + theme(axis.text.x=element_blank()) + theme(text=element_text(size=19,  family="serif")) + theme(axis.text.y = element_text(color="black")) + theme(legend.position = "top") + theme(legend.text=element_text(size=12)) 
#plot graph
plot

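This produces a bar chart (image omitted). My problem right now is adding percentage labels to the bars. I want text showing the percentage of each segment, centered and in a white font.
Unfortunately, I've been trying to add geom_text, but it keeps throwing errors because I don't have an x variable, and I don't know how to fix it, since the way I use fill is a bit unusual compared with plots that use both x and y variables. Given that the fill is the percentage of each response type (the different response types are listed in the levels), I don't even know what x variable I would add.
If the dataset matters, I'm happy to answer any questions about it. Below is a sample of what the two relevant columns look like (I didn't use head because the dataset has many variables). Basically, they show which party each respondent belongs to and whether they strongly agree, somewhat agree, and so on. Here is the dput output for the two variables: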
structure(list(pid3lean = structure(c("Democrats", "Democrats", 
"Democrats", "Democrats", "Independents", "Democrats", "Republicans", 
"Independents", "Republicans", "Democrats", "Democrats", "Independents", 
"Democrats", "Republicans", "Democrats", "Democrats", "Democrats", 
"Democrats", "Democrats", "Republicans"), label = "pid3lean", format.spss = "A13", display_width = 15L), 
    Q29_1String = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 5L, 4L, 
    1L, 1L, 2L, 5L, 1L, 5L, 1L, 1L, 1L, 5L, 1L, 3L), .Label = c("Strongly agree", 
    "Somewhat agree", "Neither agree nor disagree", "Somewhat disagree", 
    "Strongly disagree"), class = "factor")), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

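Can you post some sample data? Please edit the question with the output of dput(newDoto). If that is too big, use the output of dput(head(newDoto, 20)). - Rui Barradas
Please add the data using dput so the problem can be reproduced. - Vishal A.
@RuiBarradas Added some sample data. I didn't use the command because I think it would return too many variables (it's a large dataset). Hope this helps! - JeaniousSpelur
OK, but images are not a good way to post data. Please try dput(newDoto[1:20, c("pid3lean", "Q29_1String")]) - Rui Barradas
@RuiBarradas When I enter "dput(newDoto[1:20, c("pid3lean", "Q29_1String")])", I get the error "Error: unexpected symbol in: "var"". - JeaniousSpelur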
3 Answers

7
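To put the percentages in the middle of the bars, use position_fill(vjust = 0.5) and compute the proportions inside geom_text. Note that these are proportions of the overall total, not proportions within each bar.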
library(ggplot2)

colors <- c("#00405b", "#008dca", "#c0beb8", "#d70000", "#7d0000")
colors <- setNames(colors, levels(newDoto$Q29_1String))

ggplot(newDoto, aes(pid3lean, fill = Q29_1String)) +
  geom_bar(position = position_fill()) +
  geom_text(aes(label = paste0(..count../sum(..count..)*100, "%")),
            stat = "count",
            colour = "white",
            position = position_fill(vjust = 0.5)) +
  scale_fill_manual(values = colors) +
  coord_flip()
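
To make the distinction between "share of the total" and "share within each bar" concrete, here is a minimal base R sketch. The counts matrix is made up for illustration and is not taken from the survey data; only the arithmetic is the point.

```r
# Hypothetical counts: rows are bars (parties), columns are answers.
counts <- matrix(c(6, 2, 1, 1,
                   1, 0, 1, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(pid3lean = c("Democrats", "Republicans"),
                                 answer = c("Strongly agree", "Somewhat agree",
                                            "Somewhat disagree", "Strongly disagree")))

# Share of the grand total: every cell divided by the sum of all cells.
share_of_total <- counts / sum(counts)

# Share within each bar: every cell divided by its own row sum.
# A vector of length nrow(counts) is recycled down the columns,
# so element [i, j] is divided by rowSums(counts)[i].
share_within_bar <- counts / rowSums(counts)

rowSums(share_of_total)    # rows sum to each bar's share of all respondents
rowSums(share_within_bar)  # every row sums to 1
```

With labels based on the first matrix, all labels in the whole plot add up to 100%; with the second, the labels in each bar add up to 100%.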



The scales package has functions that format percentages automatically.

ggplot(newDoto, aes(pid3lean, fill = Q29_1String)) +
  geom_bar(position = position_fill()) +
  geom_text(aes(label = scales::percent(..count../sum(..count..))),
            stat = "count",
            colour = "white",
            position = position_fill(vjust = 0.5)) +
  scale_fill_manual(values = colors) +
  coord_flip()
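
If the proportions do not come out as round numbers, the labels may need explicit rounding. The sketch below uses made-up proportions and base R's sprintf. The scales call is guarded so the block still runs where scales is not installed. The accuracy value of 0.1 is just an example choice.

```r
# Hypothetical proportions, chosen so that rounding matters.
p <- c(1/3, 0.125, 0.5)

# Base R: one decimal place, with a literal percent sign ("%%").
labels_base <- sprintf("%.1f%%", 100 * p)
labels_base
# "33.3%" "12.5%" "50.0%"

# scales::percent() takes an accuracy argument that controls rounding.
if (requireNamespace("scales", quietly = TRUE)) {
  print(scales::percent(p, accuracy = 0.1))
}
```

Either expression can be used as the label inside aes() in place of the unrounded version.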



Edit

In the comments below, the proportions were requested per bar, so here is a solution that computes them with base R only.

tbl <- xtabs(~ pid3lean + Q29_1String, newDoto)
proptbl <- proportions(tbl, margin = "pid3lean")
proptbl <- as.data.frame(proptbl)
proptbl <- proptbl[proptbl$Freq != 0, ]

ggplot(proptbl, aes(pid3lean, Freq, fill = Q29_1String)) +
  geom_col(position = position_fill()) +
  geom_text(aes(label = scales::percent(Freq)),
            colour = "white",
            position = position_fill(vjust = 0.5)) +
  scale_fill_manual(values = colors) +
  coord_flip() +
  guides(fill = guide_legend(title = "29")) +
  theme_question_70539767()



Theme to add to the plot

This theme is a copy of the theme defined in TarJae's answer, slightly modified.

theme_question_70539767 <- function(){
  theme_bw() %+replace%
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank(),
          text = element_text(size = 19, family = "serif"),
          axis.ticks = element_blank(),
          axis.title.y = element_blank(),
          axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.text.y = element_text(color = "black"),
          legend.position = "top",
          legend.text = element_text(size = 10),
          legend.key.size = unit(1, "char")
    )
}

1
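Why not use geom_col? geom_col(...) is essentially geom_bar(stat = "identity") - TarJae
1
@TarJae Even without aes(y = .)? I'll check that. Thanks anyway. - Rui Barradas
@RuiBarradas The problem with this code is that it doesn't compute the percentages within each column. I want every column to add up to 100%. - JeaniousSpelur
@JeaniousSpelur See the edit now. - Rui Barradas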

1

Here is an alternative approach:

  1. Do the statistics in the data frame (compute the percentages and convert Q29_1String to a factor)
  2. Use geom_col
  3. Then use coord_flip
  4. Tweak the theme part
library(tidyverse)

df %>% 
  group_by(pid3lean) %>% 
  count(Q29_1String) %>% 
  ungroup() %>% 
  mutate(pct = n/sum(n)) %>% 
  mutate(Q29_1String = as.factor(Q29_1String)) %>% 
  ggplot(aes(x = pid3lean, y = pct, fill = Q29_1String)) +
  geom_col(position = "fill", width = .732) +
  scale_fill_manual(values = c("Strongly disagree" = "#7D0000", "Somewhat disagree" = "#D70000","Neither agree nor disagree" = "#C0BEB8", "Somewhat agree" = "#008DCA", "Strongly agree" = "#00405B")) +
  coord_flip()+
  geom_text(aes(label = scales::percent(pct)),
            position = position_fill(vjust = 0.5), size = 5, color = "white") +
  guides(fill = guide_legend(title = "29")) +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.border = element_blank(), 
        axis.ticks = element_blank(), 
        axis.title.y=element_blank(), 
        axis.title.x=element_blank(), 
        axis.text.x=element_blank(), 
        text=element_text(size=19,  family="serif"), 
        axis.text.y = element_text(color="black"),
        legend.position = "top",
        legend.text=element_text(size=12)
        ) 
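
One thing to watch in a pipeline like this: if the percentage is computed after ungroup(), n / sum(n) is the share of all rows rather than the share within each party. position_fill rescales every bar to full height anyway, so the bars look right while the labels may not add up to 100% per bar. The sketch below shows the difference on a made-up long-format table using base R's ave(). In dplyr, the per-bar version would correspond to running mutate() while the data is still grouped by pid3lean.

```r
# Hypothetical long-format counts, one row per (party, answer) pair.
counts <- data.frame(
  pid3lean    = c("Democrats", "Democrats", "Republicans", "Republicans"),
  Q29_1String = c("Strongly agree", "Somewhat agree",
                  "Strongly agree", "Strongly disagree"),
  n           = c(9, 3, 1, 3)
)

# Share of the grand total (what n / sum(n) gives on ungrouped data).
counts$pct_total <- counts$n / sum(counts$n)

# Share within each party: ave() returns the group sum aligned to each row.
counts$pct_bar <- counts$n / ave(counts$n, counts$pid3lean, FUN = sum)

counts
```

Here pct_bar sums to 1 within each party, which is what labels on a 100% stacked bar usually need.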



0

First, you need to use the dplyr package to compute the percentages:

library(dplyr)
newDoto <- newDoto %>% group_by(pid3lean) %>%
  count(Q29_1String) %>%
  mutate(perc = n/sum(n)) %>%
  select(-n)

With your existing code, you then only need to add the following line at the end:

plot + 
  geom_text(stat = 'count', aes(label = perc), position = position_fill(vjust = 0.5), size = 3, color = "white")
