How to fill with different colors between two lines? (originally: How to fill geom_polygon with different colors above and below y = 0 (or any other value)?)


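Edit: I updated the question title to reflect that the problem generalizes to "any two lines" and does not require one of the lines to be fixed at a constant y.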

Consider the following polygon plot:

ggplot(df, aes(x=year,y=afw)) +
  geom_polygon() +
  scale_x_continuous("", expand=c(0,0), breaks=seq(1910,2010,10)) +
  theme_bw()


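However, I would like to fill it with two different colors, e.g. red for the black areas above 0 and blue for the black areas below 0. Unfortunately, using fill=col does not fill the correct areas.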

I tried the following code (I added geom_line to show where the fill boundaries should be):

ggplot(df, aes(x=year,y=afw)) +
  geom_line() +
  geom_polygon(aes(fill=col), alpha=0.5) +
  scale_x_continuous("", expand=c(0,0), breaks=seq(1910,2010,10)) +
  theme_bw()

This results in a plot in which the colored fill covers far more area than it should. How can I fix this?

Data:

df <- structure(list(year = c(1901, 1901, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2013, 2013), afw = c(0, 0, -0.246246074793035, -2.39463317156723, -2.39785897801884, 0.840850699400514, -0.843020268341422, -3.02043962318013, -0.033342848986583, -2.04947188124465, -0.00431059092206709, 2.49568940907793, 1.96988295746503, 2.26665715101342, 0.986011989723095, 1.79568940907793, 2.06665715101342, -0.601084784470454, -3.21076220382529, 2.65052811875535, 0.46988295746503, -1.09140736511562, 0.0505281187553526, 1.41827005423922, -2.80108478447045, 0.611818441335997, -1.83011704253497, -0.30753639737368, -4.43011704253497, -0.897858978018841, 1.98601198972309, -0.965600913502712, 0.0795603768198685, 0.308592634884385, -5.33011704253497, 4.00214102198116, -0.594633171567228, 0.0698829574650297, -1.60753639737368, -2.81398801027691, -2.21398801027691, -2.4365686554382, 1.53439908649729, 1.06665715101342, -1.87205252640594, -0.688181558664002, 0.0569797316585783, -3.51398801027691, 0.979560376819868, 0.289237796174707, 1.24085069940051, -4.39140736511562, 1.13117328004567, -1.72689123608336, 2.20214102198116, 2.27310876391664, 1.46665715101342, 2.18278618327148, -0.23011704253497, 1.50536682843277, 1.17633457036826, -0.0785041393091639, -1.54947188124465, -3.85269768769626, -4.31398801027691, -0.80753639737368, 1.27956037681987, 1.2376248929489, 
0.195689409077933, -3.38172994576078, -4.88172994576078, -0.675278332857551, 2.25375392520697, 0.0924636026263199, -0.446246074793035, 4.06988295746503, 0.350528118755352, -1.48172994576078, 1.81504424778761, -1.42689123608336, 2.22472166714245, 0.376334570368256, -3.88495575221239, 0.211818441335998, 0.586011989723094, 1.14407650585213, 2.55697973165858, 1.92794747359406, 1.20214102198116, 3.83439908649729, 1.64407650585213, 0.986011989723095, 0.753753925206965, 0.508592634884385, 1.911818441336, 2.11504424778761, -4.06560091350271, -2.58495575221239, 1.80859263488438, 1.37956037681987, 1.58923779617471, 1.88601198972309, -0.323665429631744, -0.291407365115615, 0.818270054239223, 0.0569797316585783, 0.795689409077933, 3.32472166714245, 0.595689409077933, -0.733342848986583, -0.955923494147874, -4.32689123608336, 3.29891521552955, 1.85697973165858, 2.74407650585213, 0, 0), col = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("B", "A"), class = "factor")), .Names = c("year", "afw", "col"), class = c("tbl_df", "data.frame"), row.names = c(NA, -117L))

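Note: As you can see in the data, there are three rows each for 1901 and 2013. I added them so that the polygon is filled correctly. Although the black fill is correct, I can't seem to get a working solution with colors.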
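The extra rows work because geom_polygon closes the shape by joining the last vertex back to the first; when both end vertices sit on y = 0, that closing edge runs along the baseline. A minimal base R sketch of adding those two baseline vertices programmatically rather than by hand, assuming a data frame with a numeric x column (year) and y column (afw); the helper close_to_baseline and the toy values are hypothetical, made up for illustration:

```r
# Hypothetical helper: prepend (first x, base) and append (last x, base) so that
# a polygon drawn through the data closes along the baseline.
close_to_baseline <- function(data, x = "year", y = "afw", base = 0) {
  data <- data[order(data[[x]]), , drop = FALSE]  # vertices must follow x order
  first <- data[1, , drop = FALSE]
  last <- data[nrow(data), , drop = FALSE]
  first[[y]] <- base
  last[[y]] <- base
  out <- rbind(first, data, last)
  rownames(out) <- NULL
  out
}

# Toy data shaped like 'orig' (values made up for illustration)
toy <- data.frame(year = 2001:2005, afw = c(-1, 2, 1, -0.5, 1.5))
toy_closed <- close_to_baseline(toy)
toy_closed
```

Applied to orig, ggplot(close_to_baseline(orig), aes(year, afw)) + geom_polygon() should give the same single-colour shape as df above. It does not address the two-colour problem on its own: the polygon still has only one fill.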

Original data set:

orig <- structure(list(year = c(1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013), afw = c(-0.246246074793035, -2.39463317156723, -2.39785897801884, 0.840850699400514, -0.843020268341422, -3.02043962318013, -0.033342848986583, -2.04947188124465, -0.00431059092206709, 2.49568940907793, 1.96988295746503, 2.26665715101342, 0.986011989723095, 1.79568940907793, 2.06665715101342, -0.601084784470454, -3.21076220382529, 2.65052811875535, 0.46988295746503, -1.09140736511562, 0.0505281187553526, 1.41827005423922, -2.80108478447045, 0.611818441335997, -1.83011704253497, -0.30753639737368, -4.43011704253497, -0.897858978018841, 1.98601198972309, -0.965600913502712, 0.0795603768198685, 0.308592634884385, -5.33011704253497, 4.00214102198116, -0.594633171567228, 0.0698829574650297, -1.60753639737368, -2.81398801027691, -2.21398801027691, -2.4365686554382, 1.53439908649729, 1.06665715101342, -1.87205252640594, -0.688181558664002, 0.0569797316585783, -3.51398801027691, 0.979560376819868, 0.289237796174707, 1.24085069940051, -4.39140736511562, 1.13117328004567, -1.72689123608336, 2.20214102198116, 2.27310876391664, 1.46665715101342, 2.18278618327148, -0.23011704253497, 1.50536682843277, 1.17633457036826, -0.0785041393091639, -1.54947188124465, -3.85269768769626, -4.31398801027691, -0.80753639737368, 1.27956037681987, 1.2376248929489, 0.195689409077933, 
-3.38172994576078, -4.88172994576078, -0.675278332857551, 2.25375392520697, 0.0924636026263199, -0.446246074793035, 4.06988295746503, 0.350528118755352, -1.48172994576078, 1.81504424778761, -1.42689123608336, 2.22472166714245, 0.376334570368256, -3.88495575221239, 0.211818441335998, 0.586011989723094, 1.14407650585213, 2.55697973165858, 1.92794747359406, 1.20214102198116, 3.83439908649729, 1.64407650585213, 0.986011989723095, 0.753753925206965, 0.508592634884385, 1.911818441336, 2.11504424778761, -4.06560091350271, -2.58495575221239, 1.80859263488438, 1.37956037681987, 1.58923779617471, 1.88601198972309, -0.323665429631744, -0.291407365115615, 0.818270054239223, 0.0569797316585783, 0.795689409077933, 3.32472166714245, 0.595689409077933, -0.733342848986583, -0.955923494147874, -4.32689123608336, 3.29891521552955, 1.85697973165858, 2.74407650585213)), .Names = c("year", "afw"), class = c("tbl_df", "data.frame"), row.names = c(NA, -113L))
4 Answers

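Get the indices where the sign of y differs between two consecutive time steps. Between these points, use linear interpolation to generate new x values at which y is zero.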
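The interpolation step can also be written out directly. For a segment from (x1, y1) to (x2, y2) whose y values have opposite signs, the line crosses zero at x0 = x1 - y1 * (x2 - x1) / (y2 - y1). A minimal base R sketch of that formula without data.table (the function name zero_crossings is hypothetical), applied to the same toy data as the smaller example:

```r
# Hypothetical helper: x positions where the piecewise-linear curve through
# (x, y) crosses y = 0, by linear interpolation on each sign-changing segment.
zero_crossings <- function(x, y) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  # segments whose endpoints lie strictly on opposite sides of zero
  i <- which(sign(y[-length(y)]) * sign(y[-1]) < 0)
  x[i] - y[i] * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
}

zero_crossings(x = 1:6, y = c(-1, 2, 1, 2, -1, 1))
# [1] 1.333333 4.666667 5.500000
```

Points that are exactly 0 are not counted as crossings (sign(0) is 0) because they already lie on the baseline; this matches the abs(diff(sign(y))) == 2 test used in the answer.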
First, to make the linear interpolation and the points added to the original data easier to follow, let's look at a smaller example.
# original data
d <- data.frame(x = 1:6,
                y = c(-1, 2, 1, 2, -1, 1))

# coerce to data.table (ggplot2 is needed for the plots below)
library(data.table)
library(ggplot2)
setDT(d)

# make sure data is ordered by x
setorder(d, x)

# add a grouping variable
# only to keep track of original and interpolated points in this example
d[ , g := "orig"]

# interpolation
d2 = d[ , {
  ix = .I[c(FALSE, abs(diff(sign(d$y))) == 2)]
  if(length(ix)){
    pred_x = sapply(ix, function(i) approx(x = y[c(i-1, i)], y = x[c(i-1, i)], xout = 0)$y)
    rbindlist(.(.SD, data.table(x = pred_x, y = 0, g = "new")))} else .SD
}]

d2   
#           x  y    g
# 1  1.000000 -1 orig
# 2  2.000000  2 orig
# 3  3.000000  1 orig
# 4  4.000000  2 orig
# 5  5.000000 -1 orig
# 6  6.000000  1 orig
# 13 1.333333  0  new
# 11 4.666667  0  new
# 12 5.500000  0  new

Plot, with the original and the new points shown in different colors:
ggplot(data = d2, aes(x = x, y = y)) +
  geom_area(data = d2[y <= 0], fill = "red", alpha = 0.2) +
  geom_area(data = d2[y >= 0], fill = "blue", alpha = 0.2) +
  geom_point(aes(color = g), size = 4) +
  scale_color_manual(values = c("red", "black")) +
  theme_bw()



Applied to the OP's data:

d = as.data.table(orig)
# setorder(d, year)

d2 = d[ , {
  ix = .I[c(FALSE, abs(diff(sign(d$afw))) == 2)]
  if(length(ix)){
    pred_yr = sapply(ix, function(i) approx(afw[c(i-1, i)], year[c(i-1, i)], xout = 0)$y)
    rbindlist(.(.SD, data.table(year = pred_yr, afw = 0)))} else .SD}]

ggplot(data = d2, aes(x = year, y = afw)) +
  geom_area(data = d2[afw <= 0], fill = "red") +
  geom_area(data = d2[afw >= 0], fill = "blue") +
  theme_bw()



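In response to @Jason Whyte's comment, the above approach can be modified to handle grouped data. The interpolation is done within each group, and the plot is faceted by group: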
# data grouped by 'id' 
d = data.table(
  id = rep(c("a", "b", "c"), c(6, 5, 4)),
  x = as.numeric(c(1:6, 1:5, 1:4)),
  y = c(-1, 2, 1, 2, -1, 1,
        0, -2, 0, -1, -2, 
        2, 1, -1, 1.5))

# again, this variable is just added for illustration 
d[ , g := "orig"]

d2 = d[ , {
  ix = .I[c(FALSE, abs(diff(sign(.SD$y))) == 2)]
  if(length(ix)){
    pred_x = sapply(ix, function(i) approx(x = d$y[c(i-1, i)], y = d$x[c(i-1, i)], xout = 0)$y)
    rbindlist(.(.SD, data.table(x = pred_x, y = 0, g = "new")))} else .SD
}, by = id]

ggplot(data = d2, aes(x = x, y = y)) +
  facet_wrap(~ id) +
  geom_area(data = d2[y <= 0], fill = "red", alpha = 0.2) +
  geom_area(data = d2[y >= 0], fill = "blue", alpha = 0.2) +
  geom_point(aes(color = g), size = 4) +
  scale_color_manual(values = c("red", "black")) +
  theme_bw()



For an alternative base R solution adapted from @kohske's answer (credit to them), see the previous edits.
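That base R solution is not reproduced here. As a separate, minimal base-graphics sketch (not @kohske's code and not the version in the edit history), the same crossing-point idea works for any reference level lev: insert the points where the line crosses lev, then draw one polygon for the part above lev and one for the part below. The helper fill_two_colours and its default colours are hypothetical:

```r
# Hypothetical helper: fill above 'lev' in one colour and below in another,
# adding the linear crossings with 'lev' so both fills meet the line exactly.
fill_two_colours <- function(x, y, lev = 0, above = "blue", below = "red") {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  d <- y - lev                                   # position relative to the level
  i <- which(sign(d[-length(d)]) * sign(d[-1]) < 0)
  x0 <- x[i] - d[i] * (x[i + 1] - x[i]) / (d[i + 1] - d[i])
  xx <- c(x, x0)
  yy <- c(y, rep(lev, length(x0)))
  o <- order(xx)
  xx <- xx[o]
  yy <- yy[o]
  n <- length(xx)
  plot(xx, yy, type = "n", xlab = "", ylab = "")
  # each polygon follows the clipped curve, then returns along y = lev
  polygon(c(xx, rev(xx)), c(pmax(yy, lev), rep(lev, n)), col = above, border = NA)
  polygon(c(xx, rev(xx)), c(pmin(yy, lev), rep(lev, n)), col = below, border = NA)
  lines(xx, yy)
  abline(h = lev)
  invisible(data.frame(x = xx, y = yy))
}

pts <- fill_two_colours(1:6, c(-1, 2, 1, 2, -1, 1), lev = 0.5)
```

On the question's data this would be called as fill_two_colours(orig$year, orig$afw), or with, for example, lev = 1 for a level other than 0.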


See also a base plot solution for y != 0 - jay.sf
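@Henrik, suppose the data set "orig" considered above is just the data for one particular unit (or location, etc.) within a data set new.orig that contains several units. Do you have any idea how to apply your solution in a facet_wrap situation? Would you start with group_by(new.orig, unit) and apply a modified rx function to each group? - Jason Whyte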
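@JasonWhyte Thanks for the feedback. I have added an example that handles grouped data. Note that I have also updated the interpolation procedure. Let me know if anything is unclear. Cheers. - Henrik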


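So this isn't perfect, and I'd love to see what others come up with...

The reason for the "multiple" colored areas is that the individual polygons are bounded by data points that are not actually at zero.

To work around this, we can interpolate with approx(). For a perfect solution, you would need to determine exactly where the line crosses zero.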

interp <- approx(orig$year, orig$afw, n=10000)

orig2 <- data.frame(year=interp$x, afw=interp$y)
orig2$col <- ifelse(orig2$afw >= 0, "pos", "neg")

ggplot(orig2, aes(x=year, y=afw)) +
  geom_area(aes(fill=col)) +
  geom_line() +
  geom_hline(yintercept=0)

[Image: solution]

However, when you zoom in, you can see that there are still problems:

[Image: zoomed-in view]
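One way to remove the remaining mismatch without a dense approx() grid is to insert the exact crossing points into the data and clip the series at zero. A minimal base R sketch, assuming columns named year and afw as in orig (the helper add_zero_crossings is hypothetical and keeps only those two columns):

```r
# Hypothetical helper: return 'data' with one extra row (y = 0) wherever the line
# through consecutive observations crosses zero.
add_zero_crossings <- function(data, x = "year", y = "afw") {
  data <- data[order(data[[x]]), c(x, y)]
  xs <- data[[x]]
  ys <- data[[y]]
  i <- which(sign(ys[-length(ys)]) * sign(ys[-1]) < 0)
  # linearly interpolated x where y = 0 on each sign-changing segment
  x0 <- xs[i] - ys[i] * (xs[i + 1] - xs[i]) / (ys[i + 1] - ys[i])
  new_rows <- data.frame(x0, rep(0, length(x0)))
  names(new_rows) <- c(x, y)
  out <- rbind(data, new_rows)
  out <- out[order(out[[x]]), ]
  rownames(out) <- NULL
  out
}

toy <- data.frame(year = 1:6, afw = c(-1, 2, 1, 2, -1, 1))
toy2 <- add_zero_crossings(toy)
# clip at zero: every sign change now sits on a vertex with afw = 0
toy2$pos <- pmax(toy2$afw, 0)
toy2$neg <- pmin(toy2$afw, 0)
toy2
```

With ggplot2, the result could then be drawn with geom_area(aes(year, pos), fill = "blue") plus geom_area(aes(year, neg), fill = "red"); because every sign change is now an explicit vertex at 0, the two fills should meet the line exactly at any zoom level.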


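To elaborate on my statement above (and to further illustrate the original "problem"), consider what happens when you plot the original positive and negative data separately: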

orig$col <- ifelse(orig$afw >= 0, "pos", "neg")  # 'orig' has no col column, so create it

p1 <- ggplot(subset(orig, col == "neg"), aes(x = year, y = afw)) +
  geom_area(aes(fill=col)) +
  scale_fill_manual(values = c("#FF3030", "#00CC66"))

p2 <- ggplot(subset(orig, col == "pos"), aes(x = year, y = afw)) +
  geom_area(aes(fill=col)) +
  scale_fill_manual(values = c("#00CC66", "#FF3030"))

library(gridExtra)

grid.arrange(p2, p1)

[Image: multiple plots]



Of course, you could get around this by using a different type of visualization:

ggplot(data = orig, aes(x = year, y = afw)) +
  geom_bar(stat = "identity", aes(fill=col), colour = "white")

[Image: alternative]


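I think your final remark about the geom_bar() solution is very sensible. I'm looking at differences between regular time points, so there is no need to interpolate to find where the line crosses the x axis. Also, your geom_bar() solution is easier to implement. - Jason Whyte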

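Viewed as a line plot, this "polygon" plot is really a line with the area above or below it filled. So you can use ggh4x::stat_difference. The advantage is that the code is simple and works on the original data.
Another option is the {ggbraid} package, but (as of February 2023) the CRAN version does not work with R >= 4.2.2, so you need to install the development version.
Yet another option is to use two ribbons, but the result may be less satisfying unless you do fairly involved intersection calculations (see the third option below).

Using the ggh4x package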
orig <- structure(list(year = c(1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013), afw = c(-0.246246074793035, -2.39463317156723, -2.39785897801884, 0.840850699400514, -0.843020268341422, -3.02043962318013, -0.033342848986583, -2.04947188124465, -0.00431059092206709, 2.49568940907793, 1.96988295746503, 2.26665715101342, 0.986011989723095, 1.79568940907793, 2.06665715101342, -0.601084784470454, -3.21076220382529, 2.65052811875535, 0.46988295746503, -1.09140736511562, 0.0505281187553526, 1.41827005423922, -2.80108478447045, 0.611818441335997, -1.83011704253497, -0.30753639737368, -4.43011704253497, -0.897858978018841, 1.98601198972309, -0.965600913502712, 0.0795603768198685, 0.308592634884385, -5.33011704253497, 4.00214102198116, -0.594633171567228, 0.0698829574650297, -1.60753639737368, -2.81398801027691, -2.21398801027691, -2.4365686554382, 1.53439908649729, 1.06665715101342, -1.87205252640594, -0.688181558664002, 0.0569797316585783, -3.51398801027691, 0.979560376819868, 0.289237796174707, 1.24085069940051, -4.39140736511562, 1.13117328004567, -1.72689123608336, 2.20214102198116, 2.27310876391664, 1.46665715101342, 2.18278618327148, -0.23011704253497, 1.50536682843277, 1.17633457036826, -0.0785041393091639, -1.54947188124465, -3.85269768769626, -4.31398801027691, -0.80753639737368, 1.27956037681987, 1.2376248929489, 0.195689409077933, 
-3.38172994576078, -4.88172994576078, -0.675278332857551, 2.25375392520697, 0.0924636026263199, -0.446246074793035, 4.06988295746503, 0.350528118755352, -1.48172994576078, 1.81504424778761, -1.42689123608336, 2.22472166714245, 0.376334570368256, -3.88495575221239, 0.211818441335998, 0.586011989723094, 1.14407650585213, 2.55697973165858, 1.92794747359406, 1.20214102198116, 3.83439908649729, 1.64407650585213, 0.986011989723095, 0.753753925206965, 0.508592634884385, 1.911818441336, 2.11504424778761, -4.06560091350271, -2.58495575221239, 1.80859263488438, 1.37956037681987, 1.58923779617471, 1.88601198972309, -0.323665429631744, -0.291407365115615, 0.818270054239223, 0.0569797316585783, 0.795689409077933, 3.32472166714245, 0.595689409077933, -0.733342848986583, -0.955923494147874, -4.32689123608336, 3.29891521552955, 1.85697973165858, 2.74407650585213)), .Names = c("year", "afw"), class = c("tbl_df", "data.frame"), row.names = c(NA, -113L))

library(ggh4x)
#> Loading required package: ggplot2

ggplot(orig, aes(x = year)) +
  ## ymin can be set to any level of reference
  ggh4x::stat_difference(aes(ymin = 0, ymax = afw)) +
  geom_line(aes(y = afw)) +
  labs(fill = NULL)

Or, using the ggbraid package

## as of Feb 2023, the current CRAN version does not work with R >= 4.2.2
# remotes::install_github("nsgrantham/ggbraid")

library(ggbraid)
library(ggplot2)
ggplot(orig, aes(x = year)) +
  geom_line(aes(y = afw)) +
  geom_braid(aes(ymin = 0, ymax = afw, fill = afw < 0)) 
#> `geom_braid()` using method = 'line'

Or, using two ribbons

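The problem is that the fill does not exactly match the line. If you want to go this route and need the fill to match exactly, you have to calculate the intersection points, as suggested by user Z.Lin (in this answer).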

## using this more than once, thus I like to add this as a variable
my_lev <- 0

ggplot(data = orig, aes(x = year)) +
  geom_ribbon(aes(
    ymin = my_lev, ymax = ifelse(afw > my_lev, afw, my_lev)
  ), fill = "blue") +
  geom_ribbon(aes(
    ymax = my_lev, ymin = ifelse(afw > my_lev, my_lev, afw)
  ), fill = "red") +
  geom_line(aes(y = afw))

Created on 2022-07-14 by the reprex package (v2.0.1)
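The same interpolation generalizes to the "any two lines" case from the question's edit: the fill colour switches wherever the difference y1 - y2 changes sign, so those intersection points are the ones to add. A minimal base R sketch, assuming both series are observed at the same x values (the helper add_intersections is hypothetical and is not Z.Lin's code):

```r
# Hypothetical helper: insert the intersection points of two series so that
# ribbons between them switch colour exactly where the lines cross.
add_intersections <- function(x, y1, y2) {
  y2 <- rep_len(y2, length(x))                  # allows a constant level, e.g. y2 = 0
  o <- order(x)
  x <- x[o]
  y1 <- y1[o]
  y2 <- y2[o]
  d <- y1 - y2
  i <- which(sign(d[-length(d)]) * sign(d[-1]) < 0)
  frac <- d[i] / (d[i] - d[i + 1])              # fraction along each crossing segment
  x0 <- x[i] + frac * (x[i + 1] - x[i])
  y0 <- y1[i] + frac * (y1[i + 1] - y1[i])      # y1 and y2 coincide here
  out <- data.frame(x = c(x, x0), y1 = c(y1, y0), y2 = c(y2, y0))
  out <- out[order(out$x), ]
  rownames(out) <- NULL
  out$hi <- pmax(out$y1, out$y2)                # upper bound where y1 > y2
  out$lo <- pmin(out$y1, out$y2)                # lower bound where y1 < y2
  out
}

res <- add_intersections(x = 1:4, y1 = c(0, 2, 0, 2), y2 = 1)
res
```

The two ribbons could then be geom_ribbon(aes(x, ymin = y2, ymax = hi)) for where y1 is above y2 and geom_ribbon(aes(x, ymin = lo, ymax = y2)) for where it is below; with y2 = 0 this corresponds to the two-ribbon plot above, but with the boundaries meeting at the exact crossings.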


orig 

orig_1 = orig
orig_pos <- ifelse(orig_1$afw <= 0, 0, orig_1$afw) # positive part: afw where afw > 0, else 0

orig_2 = orig
orig_neg <- ifelse(orig_2$afw > 0, 0, orig_2$afw)  # negative part: afw where afw <= 0, else 0


df <- cbind.data.frame(orig, orig_neg, orig_pos) # original data plus the clipped negative and positive series

ggplot(df)+
  geom_area(aes(year, orig_pos), fill = "blue") +
  geom_area(aes(year, orig_neg), fill = "red") +
  theme_bw()+
  scale_x_continuous("", expand=c(0,0), breaks=seq(1910,2010,10))

Welcome to Stack Overflow. Please edit your answer and briefly explain how the code helps solve the problem. - bad_coder

Web page content provided by Stack Overflow.