Area under density plot is not equal to 1

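I am trying to draw a probability density plot with ggplot. The problem is that the area under the curve does not appear to equal one. Please advise.
Sample chart... Below is the code that generated this chart... The y-axis looks like a count for small bins rather than the probability of falling into that bin. I drew on the example code here when preparing this chart.
Sample code... most of it is data... the key code is at the bottom...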
library(ggplot2)
library(reshape)
library(plyr)
library(scales)

Date <- as.Date(
    c("1976-01-16", "1976-02-15", "1976-03-16", "1976-04-15", "1976-05-16",
      "1976-06-15", "1976-07-16", "1976-08-16", "1976-09-15", "1976-10-16",
      "1976-11-15", "1976-12-16", "1977-01-16", "1977-02-14", "1977-03-16",
      "1977-04-15", "1977-05-16", "1977-06-15", "1977-07-16", "1977-08-16",
      "1977-09-15", "1977-10-16", "1977-11-15", "1977-12-16", "1978-01-16",
      "1978-02-14", "1978-03-16", "1978-04-15", "1978-05-16", "1978-06-15",
      "1978-07-16", "1978-08-16", "1978-09-15", "1978-10-16", "1978-11-15",
      "1978-12-16", "1979-01-16", "1979-02-14", "1979-03-16", "1979-04-15",
      "1979-05-16", "1979-06-15", "1979-07-16", "1979-08-16", "1979-09-15",
      "1979-10-16", "1979-11-15", "1979-12-16", "1980-01-16", "1980-02-15",
      "1980-03-16", "1980-04-15", "1980-05-16", "1980-06-15", "1980-07-16",
      "1980-08-16", "1980-09-15", "1980-10-16", "1980-11-15", "1980-12-16",
      "1981-01-16", "1981-02-14", "1981-03-16", "1981-04-15", "1981-05-16",
      "1981-06-15", "1981-07-16", "1981-08-16", "1981-09-15", "1981-10-16",
      "1981-11-15", "1981-12-16", "1982-01-16", "1982-02-14", "1982-03-16",
      "1982-04-15", "1982-05-16", "1982-06-15", "1982-07-16", "1982-08-16",
      "1982-09-15", "1982-10-16", "1982-11-15", "1982-12-16", "1983-01-16",
      "1983-02-14", "1983-03-16", "1983-04-15", "1983-05-16", "1983-06-15",
      "1983-07-16", "1983-08-16", "1983-09-15", "1983-10-16", "1983-11-15",
      "1983-12-16", "1984-01-16", "1984-02-15", "1984-03-16", "1984-04-15",
      "1984-05-16", "1984-06-15", "1984-07-16", "1984-08-16", "1984-09-15",
      "1984-10-16", "1984-11-15", "1984-12-16", "1985-01-16", "1985-02-14",
      "1985-03-16", "1985-04-15", "1985-05-16", "1985-06-15", "1985-07-16",
      "1985-08-16", "1985-09-15", "1985-10-16", "1985-11-15", "1985-12-16"))

GOLD <- c(
  -0.104,  0.051,  0.011, -0.035, -0.008, -0.010, -0.065, -0.067,  0.041,  0.017,
   0.126,  0.023, -0.011,  0.029,  0.087,  0.007, -0.016, -0.044,  0.048, -0.013,
   0.030,  0.062, -0.029,  0.042,  0.078,  0.028,  0.031, -0.045,  0.005,  0.043,
   0.028,  0.090,  0.030,  0.072, -0.094,  0.009,  0.093,  0.080, -0.014, -0.013,
   0.077,  0.084,  0.058,  0.021,  0.184,  0.097,  0.002,  0.169,  0.474, -0.014,
  -0.168, -0.067, -0.007,  0.169,  0.071, -0.025,  0.077, -0.022, -0.059, -0.044,
  -0.063, -0.103, -0.003, -0.008, -0.031, -0.040, -0.113,  0.005,  0.081, -0.014,
  -0.057, -0.009, -0.062, -0.026, -0.117,  0.061, -0.046, -0.058,  0.080,  0.076,
   0.190, -0.031, -0.019,  0.074,  0.079,  0.022, -0.144,  0.030,  0.013, -0.057,
   0.026, -0.017, -0.012, -0.042, -0.030,  0.015, -0.043,  0.041,  0.022, -0.032,
  -0.011,  0.001, -0.083,  0.004, -0.019, -0.002,  0.003, -0.065, -0.063,  0.017,
  -0.044,  0.134, -0.022, -0.014, -0.008,  0.033, -0.014,  0.017, -0.004, -0.023)

df <- data.frame(Date=Date, GOLD=GOLD)

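# Kernel density estimate of the monthly changes; y is a density, not a count
p <- ggplot(data=df, aes(x=GOLD, y=..density..)) +
    stat_density(fill='grey50') +
    xlab('Percent change on previous month') +
    ylab('Density') +
    ggtitle('Change in Gold Price in the US')  # replaces opts(title=...), which no longer works in current ggplot2
ggsave(filename='plot.png', plot=p, width=8, height=4, dpi=125)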
1 Answer


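I think this is less a ggplot problem and more a misunderstanding of what the y-axis in a density plot means. The base plotting functions in R draw the same thing. You can set y = ..scaled.. in the call to get the relative density, but if you use stat_bin() you will see the actual histogram and notice that it is not a count. If you want, you can normalize your data with the following: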

GOLD_N <- (GOLD- mean(GOLD))/sd(GOLD)
df <- data.frame(Date=Date, GOLD=GOLD,GOLD_N=GOLD_N)

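Then run your plot and it should look like this: [image] You should watch this video on how to interpret density functions: http://www.youtube.com/watch?v=Fvi9A_tEmXQ Normalizing your data will give you a more intuitive plot if you are used to staring at probability density functions (PDFs); it will sum to 1. But do not misread the y-axis: it is not the probability that a random draw from the density equals x.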
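To make the effect of standardizing concrete, here is a minimal base R sketch. It assumes a simulated series `x` as a stand-in for GOLD (not the original data) and a hypothetical helper `area_under()` that applies the trapezoid rule to the output of `density()`. It compares the peak height and the area before and after standardizing.

```r
# Simulated stand-in for the GOLD series (an assumption, not the original data)
set.seed(42)
x <- rnorm(120, mean = 0.01, sd = 0.07)

# Standardize as in the answer: subtract the mean, divide by the standard deviation
z <- (x - mean(x)) / sd(x)

# Hypothetical helper: trapezoidal approximation of the area under a density() estimate
area_under <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

d_raw <- density(x)
d_std <- density(z)

peak_raw <- max(d_raw$y)       # above 1, because x spans a narrow range
peak_std <- max(d_std$y)       # below 1, because z has unit standard deviation
area_raw <- area_under(d_raw)  # close to 1
area_std <- area_under(d_std)  # also close to 1

print(c(peak_raw = peak_raw, peak_std = peak_std))
print(c(area_raw = area_raw, area_std = area_std))
```

Under these assumptions both areas come out close to 1. Standardizing only stretches the x-axis, so the curve gets lower without changing the total area.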

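I hate it when you realize you have been a bit silly. Of course the area under the curve sums to one. My (perceived) problem was that the x-axis runs essentially from -0.2 to +0.2, so the y-axis has to rise above one (even to six or so) for the area under the curve to sum to one. Thank you for your comment, which made me see my mistake. Much appreciated. - Mark Graph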
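The same point can be checked on a histogram, where the link between bin probability and bar height is explicit. This is a rough base R sketch on simulated data (again an assumed stand-in for GOLD). It uses `hist(..., plot = FALSE)` to show that bar height is probability divided by bin width, so heights can exceed 1 while the bar areas still sum to 1.

```r
# Simulated stand-in for the GOLD series (an assumption, not the original data)
set.seed(42)
x <- rnorm(120, mean = 0.01, sd = 0.07)

# Compute the histogram without drawing it
h <- hist(x, breaks = 20, plot = FALSE)

bin_width   <- diff(h$breaks)
bin_prob    <- h$counts / sum(h$counts)  # probability of falling in each bin
bin_density <- bin_prob / bin_width      # bar height on the density scale

total_area <- sum(bin_density * bin_width)  # bar areas add up to 1

print(max(bin_density))  # exceeds 1 because the bins are narrow
print(total_area)
```

Each bar's area, not its height, is the probability of landing in that bin. That is why a narrow x-range forces tall bars.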
