ggplot2: plotting subset means on facets instead of the global mean

7

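I would like each facet in a ggplot to show the means of its own subset (on both the x and y axes). Instead, I get the mean of the whole data set rather than the mean of each subset. I don't know how to fix this.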

library(ggplot2)
# Load the hsb2 data and turn the female indicator into a factor
hsb2 <- read.table("http://www.ats.ucla.edu/stat/data/hsb2.csv", sep = ",", header = TRUE)
head(hsb2)
hsb2$gender <- as.factor(hsb2$female)

ggplot() +
  geom_point(aes(y = read,x = write,colour = gender),data=hsb2,size = 2.2,alpha = 0.9) +
  scale_colour_brewer(guide = guide_legend(),palette = 'Set1') +
  stat_smooth(aes(x = write,y = read),data=hsb2,colour = '#000000',size = 0.8,method = lm,formula = 'y ~ x') +
  geom_vline(aes(xintercept = mean(write)),data=hsb2,linetype = 3) +
  geom_hline(aes(yintercept = mean(read)),data=hsb2,linetype = 3) +
  facet_wrap(facets = ~gender)


1 Answer

9

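One approach is to compute the means (x and y) for each gender explicitly and store them as new columns in the original data frame. When faceting splits the data by gender, the lines are drawn where you want them.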

Using tapply

#compute the read and write means for each gender 
read_means <- tapply(hsb2$read, hsb2$gender, mean)
write_means <- tapply(hsb2$write, hsb2$gender, mean)

#store it in the data frame
hsb2$read_mean <- ifelse(hsb2$gender==0, read_means[1], read_means[2])
hsb2$write_mean <- ifelse(hsb2$gender==0, write_means[1], write_means[2])
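The ifelse() step above hard-codes two gender levels. As a hedged alternative, base R's ave() returns a per-row group mean directly and works for any number of groups. The sketch below uses a small made-up data frame, `toy`, as a stand-in for hsb2; its values are illustrative only and are not from the original data.

```r
# Toy stand-in for hsb2 (values are made up for illustration only)
toy <- data.frame(
  gender = factor(c(0, 0, 0, 1, 1, 1)),
  read   = c(40, 50, 60, 45, 55, 65),
  write  = c(38, 52, 60, 50, 58, 66)
)

# ave() returns a vector as long as its input, holding each row's group mean
toy$read_mean  <- ave(toy$read,  toy$gender, FUN = mean)
toy$write_mean <- ave(toy$write, toy$gender, FUN = mean)

toy
```

The same two ave() calls should apply to hsb2 unchanged, with `toy` replaced by `hsb2`.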

Another option is to use ddply.

Using ddply from the plyr package

The new columns can be created in a single call.

library(plyr)
# ddply returns a new data frame, so assign the result back to hsb2
hsb2 <- ddply(hsb2, "gender", transform,
              read_mean  = mean(read),
              write_mean = mean(write))

Now pass the two new mean columns to the geom_vline and geom_hline calls in ggplot.
ggplot() +
  geom_point(aes(y = read,x = write,colour = gender),data=hsb2,size = 2.2,alpha = 0.9) +
  scale_colour_brewer(guide = guide_legend(),palette = 'Set1') +
  stat_smooth(aes(x = write,y = read),data=hsb2,colour = '#000000',
              size = 0.8,method = lm,formula = 'y ~ x') +
  geom_vline(aes(xintercept = write_mean),data=hsb2,linetype = 3) +
  geom_hline(aes(yintercept = read_mean),data=hsb2,linetype = 3) +
  facet_wrap(facets = ~gender)
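A variation on the same idea, offered as a sketch rather than as the answer's method: instead of adding mean columns to every row, build a small summary data frame with one row per gender and pass it as `data` to the line layers. Because the summary frame carries the `gender` column, facet_wrap should place each line in the matching panel. The `toy` data frame and the `means` name are assumptions made for illustration.

```r
library(ggplot2)

# Toy stand-in for hsb2 (values are made up for illustration only)
toy <- data.frame(
  gender = factor(c(0, 0, 0, 1, 1, 1)),
  read   = c(40, 50, 60, 45, 55, 65),
  write  = c(38, 52, 60, 50, 58, 66)
)

# One row per gender, holding that group's mean read and write scores
means <- aggregate(cbind(read, write) ~ gender, data = toy, FUN = mean)
names(means) <- c("gender", "read_mean", "write_mean")

# The line layers take the summary frame; its gender column drives the faceting
p <- ggplot(toy, aes(x = write, y = read)) +
  geom_point(aes(colour = gender)) +
  geom_vline(aes(xintercept = write_mean), data = means, linetype = 3) +
  geom_hline(aes(yintercept = read_mean), data = means, linetype = 3) +
  facet_wrap(~ gender)

print(p)
```

This draws one line per facet rather than one overplotted line per data row, which can matter for semi-transparent or thick lines.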

This produces a plot in which the dotted lines in each facet sit at that gender's own means.
