ggplot2: How to split the legend by color into separate custom legends


I am struggling to get my plot legend to look a specific way. I am using ggplot2 and the tidyverse.

This is my current plot:

[Image: the ggplot legend shows the color of each cell type, with the shape and linetype of the Reagent below. The graph has different values for each reagent and cell type.]

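I would like the legend to have two titles (= names), one for cell type HCT and one for cell type RKO. Then, under both HCT and RKO, I want a legend of the reagents with the corresponding color, linetype and shape. So basically, I want to split the color legend into two separate legends. I just can't figure out how to code it. This is the legend I want (please imagine that the orange squares are filled):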

[Image: the desired legend]

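Do I need to change my geom_line and geom_point code to get the legend style I want, or is there another way to do it? I searched but didn't find anything relevant (maybe I'm just not using the right terms). I tried to follow the steps in "How to merge color, line style and shape legends in ggplot", which combines the color and shape legends into one legend, but I couldn't get it to work. (In other words, I tried adapting scale_shape_manual etc. to my needs without success. I also tried using interaction().)
Note: I decided not to use facet_wrap because I want to show both cell types on the same plot. The plot of the real data looks somewhat different and is not as cluttered. I have already managed to make a "facet_wrap"-style plot with ggpubr.
Note 2: I am also not using stat_summary(), because I need to average over identical reagent concentration, reagent and cell type. With my data, I couldn't find a way to make stat_summary work.
Here is the code I currently have: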
library(dplyr)
library(ggplot2)

mean_mutated <- mutated %>% group_by(Reagent, Reagent.Conc, Cell.type) %>%
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE))
mutated_0 = mutated %>% group_by(Reagent, Reagent.Conc, Cell.type) %>% filter(Reagent=="0") %>% 
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE))
mutated_1 = mutated %>% group_by(Reagent, Reagent.Conc, Cell.type) %>% filter(Reagent=="1") %>% 
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE))
mutated_2 = mutated %>% group_by(Reagent, Reagent.Conc, Cell.type) %>% filter(Reagent=="2") %>% 
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE))

#linetype by reagent
ggplot() +  
  #the scatter plot per cell type -> that way I can color them the way I want to, I believe
  #the mean/average line plot 
  geom_point(mean_mutated, mapping= aes(x = as.factor(Reagent.Conc), y = Avg.Viable.Cells, shape=as.factor(Reagent), color=Cell.type)) +
  geom_line(mutated_1, mapping= aes(x = as.factor(Reagent.Conc),y = Avg.Viable.Cells, group=Cell.type, color=Cell.type, linetype = "1"))+
  geom_line(mutated_2, mapping= aes(x = as.factor(Reagent.Conc),y = Avg.Viable.Cells, group=Cell.type, color=Cell.type, linetype = "2"))+
  geom_line(mutated_0, mapping= aes(x = as.factor(Reagent.Conc),y = Avg.Viable.Cells, group=Cell.type, color=Cell.type, linetype = "0"))+
  
  #making the plot look prettier
  scale_colour_manual(values = c("#999999", "#E69F00")) +
  #scale_linetype_manual(values = c("solid", "dashed", "dotted")) + #for whatever reason, when I add this, the dash in the legend is removed...?
  labs(shape = "Reagent", linetype = "Reagent", color="Cell type")+
  scale_shape_manual(values=c(15,16,4), labels=c("0", "1", "2"))+
  #guides(shape = FALSE)+ #this removes the label that you don't want
  
  #Change the look of the plot and change the axes
  xlab("[Reagent] (nM/ml)")+ #change name of x-axis
  ylab("Relative viability")+ #change name of y-axis
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10))+ #adjust the y-axis so that it has more ticks
  expand_limits(y = 0)+
  theme_bw() + #this and the next line are to remove the background grid and make it look more publication-like
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))

Below is a snapshot of the data frame "mutated", generated with dput(df[9:32, c(1,2,3,4,5)]):

    structure(list(Biological.Replicate = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), Reagent.Conc = c(10000, 2500, 625, 156.3, 39.1, 9.8, 
2.4, 0.6, 10000, 2500, 625, 156.3, 39.1, 9.8, 2.4, 0.6, 10000, 
2500, 625, 156.3, 39.1, 9.8, 2.4, 0.6), Reagent = c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L), Cell.type = c("HCT", "HCT", "HCT", "HCT", 
"HCT", "HCT", "HCT", "HCT", "HCT", "HCT", "HCT", "HCT", "HCT", 
"HCT", "HCT", "HCT", "RKO", "RKO", "RKO", "RKO", "RKO", "RKO", 
"RKO", "RKO"), Mean.Viable.Cells.1 = c(1.014923966, 1.022279854, 
1.00926559, 0.936979842, 0.935565248, 0.966403395, 1.00007073, 
0.978144524, 1.019673384, 0.991595836, 0.977270557, 1.007353643, 
1.111928183, 0.963518289, 0.993028364, 1.027409034, 1.055452733, 
0.953801253, 0.956577449, 0.792568337, 0.797052961, 0.755623576, 
0.838482346, 0.836773918)), row.names = 9:32, class = "data.frame")

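Note 3: Although one column is named "Mean.Viable.Cells.1", it is not the mean I am plotting; it is the mean of the technical replicate measurements, calculated earlier. What I plot is the mean over the biological replicates, taken in mutated_0, mutated_1 and mutated_2.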
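Regarding Note 2, a hedged sketch for reference: stat_summary() computes its summary separately for each x value within each group, so mapping group to the combination of Reagent and Cell.type should yield one mean per (concentration, reagent, cell type), i.e. the same numbers as the summarise() step. The data frame raw and its values are hypothetical (two made-up biological replicates per condition), and `fun = mean` assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical long-format data: two biological replicates per condition
raw <- expand.grid(
  Biological.Replicate = 1:2,
  Reagent.Conc = c(0.6, 9.8, 156.3),
  Reagent = factor(c("0", "1", "2")),
  Cell.type = c("HCT", "RKO")
)
raw$Mean.Viable.Cells.1 <- seq(0.8, 1.1, length.out = nrow(raw))

# stat_summary() averages y at each x *within each group*; grouping by
# Reagent x Cell.type therefore averages over the biological replicates
p <- ggplot(raw, aes(
  x = factor(Reagent.Conc), y = Mean.Viable.Cells.1,
  colour = Cell.type, group = interaction(Reagent, Cell.type)
)) +
  stat_summary(aes(linetype = Reagent), fun = mean, geom = "line") +
  stat_summary(aes(shape = Reagent), fun = mean, geom = "point")
```

This only replaces the averaging; it does not by itself produce the split legend asked about above.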

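I suggest you share your data frame via dput(head(df)); that helps other users recreate it and run your code. Please don't post pictures of data frames. - PesKchan
Thanks for the suggestion @PesKchan! I added a small piece of the data and made sure it contains three reagents and two cell types. - Fio
I hope you've got your answer below. - PesKchan
1 Answer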

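Using the ggnewscale package, this can be achieved as follows:
  1. Convert Cell.type and Reagent to factors before manipulating the dataset.
  2. There is no need for the datasets mutated_0 etc. A single summary dataset is enough; I split it by Cell.type to simplify the code that follows.
  3. To get the desired result, plot the data for each cell type separately. That's why I split the data by cell type.
  4. Use ggnewscale::new_scale to add a second scale and legend, which gives you separate legends for linetype and shape. Also, remove color from the aesthetics and set it as a parameter instead.
  5. At least for your data snippet, you have to add drop=FALSE to both scales to keep unused factor levels.
  6. Finally, to reduce code duplication, I use a helper function that adds the geoms and scales for each Cell.type.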
library(ggplot2)
library(dplyr)

mutated <- mutated %>% 
  mutate(Cell.type = factor(Cell.type, levels = c("HCT", "RKO")),
         Reagent = factor(Reagent, levels = c("0", "1", "2"))
  )

mean_mutated <- mutated %>%
  group_by(Reagent, Reagent.Conc, Cell.type) %>%
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE)) %>% 
  split(.$Cell.type)
#> `summarise()` has grouped output by 'Reagent', 'Reagent.Conc'. You can override using the `.groups` argument.

layer_geom_scale <- function(cell_type, color) {
  list(
    geom_point(mean_mutated[[cell_type]], mapping = aes(shape = Reagent), color = color),
    geom_line(mean_mutated[[cell_type]], mapping = aes(group = Reagent, linetype = Reagent), color = color),
    scale_linetype_manual(name = cell_type, values = c("solid", "dashed", "dotted"), drop=FALSE),
    scale_shape_manual(name = cell_type, values = c(15, 16, 4), labels = c("0", "1", "2"), drop=FALSE) 
  )
}

# linetype by reagent
ggplot(mapping = aes(
  x = as.factor(Reagent.Conc),
  y = Avg.Viable.Cells
)) +
  layer_geom_scale("HCT", "#999999") +
  ggnewscale::new_scale("linetype") +
  ggnewscale::new_scale("shape") +
  layer_geom_scale("RKO", "#E69F00") +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10), limits = c(0, NA)) +
  theme_bw() +
  theme(
    panel.border = element_blank(), panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")
  ) +
  labs(shape = "Reagent", 
       linetype = "Reagent", 
       color = "Cell type",
       x = "[Reagent] (nM/ml)",
       y = "Relative viability")
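Why step 5 needs drop=FALSE can be checked by cross-tabulating the two factor columns. The data frame snippet below is a hypothetical reconstruction of only the Cell.type and Reagent columns of the dput() output in the question (8 concentrations each for HCT/reagent 1, HCT/reagent 2 and RKO/reagent 0). Since each cell type uses only some reagent levels, a per-cell-type discrete scale would otherwise drop the unused levels from its legend.

```r
# Hypothetical reconstruction of the factor columns of the dput() snippet
snippet <- data.frame(
  Cell.type = factor(rep(c("HCT", "RKO"), times = c(16, 8)), levels = c("HCT", "RKO")),
  Reagent   = factor(rep(c("1", "2", "0"), each = 8), levels = c("0", "1", "2"))
)

# Zero counts mark the reagent levels a cell type never uses
counts <- table(snippet$Cell.type, snippet$Reagent)
print(counts)

# split() keeps all factor levels, but each subset only uses some of them:
# HCT uses reagents 1 and 2, RKO only reagent 0
parts <- split(snippet, snippet$Cell.type)
used <- lapply(parts, function(d) levels(droplevels(d$Reagent)))
print(used)
```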


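Hi Stefan, thank you very much for the detailed explanation and answer. I hope it's okay to ask you a few questions about your code, since I'm trying to understand all of it. The layer_geom_scale function is great: each time it is called, it draws the scatter plot and the line plot for the corresponding cell type, right? I didn't know the ggnewscale package, so this is very useful. I read its documentation and want to check: the position of new_scale between the "layers" matters, i.e. if I had a third layer, I would need to call new_scale twice again, right? - Fio
Also, thanks for cleaning up my code. I hadn't realized I was repeating my x and y aes every time, when I could have put them in the ggplot() call. - Fio
Hi @FioDa. You're welcome. Yes, you're right about my helper function and about ggnewscale. If you want to plot more cell types, you need to call new_scale twice for each additional category and add its layers one by one. For two or three categories that is acceptable. For more categories, you could loop over them, e.g. with purrr::reduce, instead of adding them by copy and paste. Best, S. - stefan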
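The purrr::reduce idea from the last comment might look like the sketch below. Everything here is hypothetical: the data frame df, the third cell type "XYZ" and its color, and the helper layers_for (a variant of layer_geom_scale that takes its color from a lookup vector). The first cell type is added as the initial value; every later one first gets fresh linetype and shape scales via ggnewscale::new_scale.

```r
library(ggplot2)
library(purrr)
library(ggnewscale)

# Hypothetical summary data with three cell types
set.seed(1)
df <- expand.grid(
  Reagent.Conc = c(0.6, 9.8, 156.3, 2500),
  Reagent = factor(c("0", "1", "2")),
  Cell.type = factor(c("HCT", "RKO", "XYZ"))
)
df$Avg.Viable.Cells <- runif(nrow(df), 0.7, 1.1)

# Hypothetical color per cell type; the names drive the loop
cell_colors <- c(HCT = "#999999", RKO = "#E69F00", XYZ = "#56B4E9")

# Layers and scales for one cell type (same idea as layer_geom_scale)
layers_for <- function(cell_type) {
  d <- df[df$Cell.type == cell_type, ]
  col <- cell_colors[[cell_type]]
  list(
    geom_point(data = d, aes(shape = Reagent), colour = col),
    geom_line(data = d, aes(group = Reagent, linetype = Reagent), colour = col),
    scale_linetype_manual(name = cell_type, values = c("solid", "dashed", "dotted"), drop = FALSE),
    scale_shape_manual(name = cell_type, values = c(15, 16, 4), drop = FALSE)
  )
}

base <- ggplot(mapping = aes(x = factor(Reagent.Conc), y = Avg.Viable.Cells))
first <- names(cell_colors)[1]

# Each remaining cell type: two new_scale() calls, then its layers
p <- reduce(
  names(cell_colors)[-1],
  function(plot, ct) plot + new_scale("linetype") + new_scale("shape") + layers_for(ct),
  .init = base + layers_for(first)
)
```

Adding a cell type then only means adding an entry to cell_colors.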
