ggplot heatmap gridline formatting: geom_tile and geom_rect

6

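I have been building a heatmap, but for several days I have been unable to get the final gridline formatting right. Please see the code below and the attached plots. I am trying to align the gridlines with the heatmap tiles using geom_tile(), so that each tile fills the inside of a grid cell like a box. I was able to align the gridlines using geom_raster(), but then the y-axis tick labels sit at either the top or the bottom of the tile, whereas I need them to tick at the center (see the red highlight). In addition, I cannot get geom_raster to draw a white line border around the tiles, so the color blocks look somewhat cluttered in my original dataset. Any help with the formatting code would be greatly appreciated. Thanks!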

# The data set in long format
library(dplyr)    # %>%, count(), arrange()
library(forcats)  # fct_reorder()
library(ggplot2)

y <- c("A","A","A","A","B","B","B","B","B","C","C","C","D","D","D")
x <- c("2020-03-01","2020-03-15","2020-03-18","2020-03-18","2020-03-01","2020-03-01","2020-03-01","2020-03-01","2020-03-05","2020-03-06","2020-03-05","2020-03-05","2020-03-20","2020-03-20","2020-03-21")
v <- data.frame(y, x)

# Approach 1: using geom_tile, but the gridlines do not align with the borders of the tiles
v %>%
  count(y, x, .drop = FALSE) %>%  # the argument is .drop (leading dot); it only affects factor columns
  arrange(n) %>%
  ggplot(aes(x = x, y = fct_reorder(y, n, sum))) +
  geom_tile(aes(fill = n), color = "white", size = 0.25)

Need the tile borders to align with the gridlines

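I tried running similar code from another post, but could not get it to work. I think this is because my x variable is a count variable of the y variable, so it cannot be formatted as a factor variable to specify xmin and xmax in geom_rect().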

# Approach 2: using geom_raster, but the y-axis labels cannot tick at the center of the tiles,
# and there is no border around the tiles to differentiate between them
v %>%
  count(y, x, .drop = FALSE) %>%  # the argument is .drop (leading dot); it only affects factor columns
  arrange(n) %>%
  ggplot() +
  geom_raster(aes(x = x, y = fct_reorder(y, n, sum), fill = n), hjust = 0, vjust = 0)

Need the y-axis labels to tick at the center of the tiles, and need a border around the tiles

2 Answers

3

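I think it makes sense to keep the ticks and gridlines where they are. To achieve the effect you want, I would suggest expanding the data to include all possible combinations and setting na.value to a neutral fill color: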

library(tidyr)  # expand()

# all possible combinations
all <- v %>% expand(y, x)

# join with all, n will be NA for obs. in all that are not present in v
v <- v %>%
  count(y, x) %>%
  right_join(all, by = c("y", "x"))

ggplot(data = v,
       aes(x = x, y = fct_reorder(y, n, function(x) sum(x, na.rm = TRUE)))) + # note that you must account for the NA values now
  geom_tile(aes(fill = n), color = "white", size = 0.25) +
  scale_fill_continuous(na.value = "grey90") +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))
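As a possible shortcut for the expand-and-join step, tidyr's complete() can add the missing y/x combinations in a single call. This is only a sketch, assuming the sample data from the question; the names v_demo and v_full are made up for this example so that v is not overwritten.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt under a hypothetical name
v_demo <- data.frame(
  y = c("A","A","A","A","B","B","B","B","B","C","C","C","D","D","D"),
  x = c("2020-03-01","2020-03-15","2020-03-18","2020-03-18","2020-03-01",
        "2020-03-01","2020-03-01","2020-03-01","2020-03-05","2020-03-06",
        "2020-03-05","2020-03-05","2020-03-20","2020-03-20","2020-03-21")
)

# Count the observed pairs, then add every unobserved y/x pair with n = NA
v_full <- v_demo %>%
  count(y, x) %>%
  complete(y, x)

# 4 values of y times 7 distinct dates: one row per grid cell
nrow(v_full)

# Number of cells that were never observed (these get the na.value fill)
sum(is.na(v_full$n))
```

v_full could then be passed to the same ggplot() call as above in place of v.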

1
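Thanks, PinotTiger. The code works well on the sample dataset (v). However, when I run it on my actual dataset, the y-axis gridlines still pass through the centers of the tiles at the tick marks, which is not the plot I am after. - DHR
It is hard to pin down the problem without the actual dataset... - PRZ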

2
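This is a bit of a hack. My approach is to convert the categorical variables to numeric, which adds minor gridlines to the plot that align with the tiles. To get rid of the major gridlines, I simply use theme(). The drawback is that the breaks and labels have to be set manually.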
library(ggplot2)
library(dplyr)
library(forcats)

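v1 <- v %>%
  count(y, x, .drop = FALSE) %>%  # the argument is .drop (leading dot); it only affects factor columns
  arrange(n) %>%
  mutate(y = fct_reorder(y, n, sum),  # factor ordered by total count
         y1 = as.integer(y),          # integer level codes used as tile positions
         x = factor(x),
         x1 = as.integer(x))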

labels_y <- levels(v1$y)
breaks_y <- seq_along(labels_y)

labels_x <- levels(v1$x)
breaks_x <- seq_along(labels_x)

ggplot(v1, aes(x=x1, y=y1))+
  geom_tile(aes(fill=n), color="white", size=0.25) + 
  scale_y_continuous(breaks = breaks_y, labels = labels_y) +
  scale_x_continuous(breaks = breaks_x, labels = labels_x) +
  theme(panel.grid.major = element_blank())

Created on 2020-05-23 by the reprex package (v0.3.0)
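One way to see why the minor gridlines line up: with tile centers at consecutive integers and a tile width and height of 1, the tile edges fall on half-integers, which is also where a continuous scale with breaks at 1, 2, 3, ... puts its default minor gridlines. The sketch below makes this explicit by passing the edges as minor_breaks. The data frame demo and the names edges_x and edges_y are made up for illustration and are not part of the answer's code.

```r
library(ggplot2)

# Made-up data for illustration only: integer tile centers and a count
demo <- data.frame(
  x1 = c(1, 2, 3, 1, 3),
  y1 = c(1, 1, 1, 2, 2),
  n  = c(4, 1, 2, 3, 1)
)

# Tile edges sit halfway between neighboring integer centers
edges_x <- seq(0.5, max(demo$x1) + 0.5, by = 1)
edges_y <- seq(0.5, max(demo$y1) + 0.5, by = 1)

p <- ggplot(demo, aes(x = x1, y = y1)) +
  # width and height are set explicitly so each tile spans center +/- 0.5
  geom_tile(aes(fill = n), color = "white", width = 1, height = 1) +
  scale_x_continuous(breaks = 1:3, minor_breaks = edges_x) +
  scale_y_continuous(breaks = 1:2, minor_breaks = edges_y) +
  theme(panel.grid.major = element_blank())

p
```

Setting minor_breaks by hand is not required for the answer's code to work; it may just be useful when the default minor gridlines do not land where expected.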

Edit: checked with long variable names

y<- c("John Doe","John Doe","John Doe","John Doe","Mary Jane","Mary Jane","Mary Jane","Mary Jane","Mary Jane","C","C","C","D","D","D")
x<- c("2020-03-01","2020-03-15","2020-03-18","2020-03-18","2020-03-01","2020-03-01","2020-03-01","2020-03-01","2020-03-05","2020-03-06","2020-03-05","2020-03-05","2020-03-20","2020-03-20","2020-03-21")
v<-data.frame(y,x)

Created on 2020-05-23 by the reprex package (v0.3.0)


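Thanks, Stefan. However, in my actual dataset the y variable has longer character labels (e.g. "Mary Jane", "John Doe", etc.), so it cannot be converted to an integer and shows up as NA in y1. What would you suggest in this case? Sorry, I should have thought of this when creating the sample dataset. - DHR
Hi @DHR. Did you convert your data to a factor before converting it to numeric? Have a look at my edit. I changed A to John Doe and B to Mary Jane. My code still works in this case. - stefan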
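The difference between the two conversions discussed in these comments can be checked in isolation. A minimal base R sketch, using made-up labels similar to those mentioned above (labels_chr, direct and via_factor are hypothetical names):

```r
# Hypothetical labels, similar to those mentioned in the comment
labels_chr <- c("John Doe", "Mary Jane", "C", "D", "John Doe")

# Converting character labels directly yields NA (R also emits a coercion warning)
direct <- suppressWarnings(as.integer(labels_chr))

# Converting to a factor first yields the integer level codes instead
via_factor <- as.integer(factor(labels_chr))

direct
via_factor
levels(factor(labels_chr))
```

The level codes follow the order of the factor levels, so reordering the factor (for example with fct_reorder()) before calling as.integer() changes which integer each label receives.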
