How to convert a bar histogram into a line histogram in R

Question


r histogram

5

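I've seen many examples of density plots, but on those plots the y-axis shows probability density. What I want is a line plot (similar to a density plot) whose y-axis shows counts (like a histogram).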

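In Excel I can manually create the bins and frequencies, make a bar histogram, and then change the chart type to line, but I can't find anything similar in R.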

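I've looked through base graphics and ggplot2 but can't seem to find an answer. I know histograms are supposed to be bar charts, but I think representing them as continuous lines is more visually appealing.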

- asangoi

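I'm not sure your terminology is right. To me, a line histogram would be something like plot(..., type = "h"), that is, a histogram with vertical lines instead of bars. From your question, I take it you want a density plot with counts on the y-axis. - Richie Cotton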

Yes, you're right: a density plot with counts on the y-axis. - asangoi

5 Answers

4

This is an old question, but I think it may help to post a solution that addresses your question specifically.

In ggplot2, you can plot a histogram that shows the counts as bars:

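library(ggplot2)
# `data` is your data frame; `value` is a placeholder for its numeric column
ggplot(data, aes(x = value)) +
  geom_histogram()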

You can also plot the histogram as a frequency polygon, which shows the same counts as a line:

# `value` is a placeholder for your numeric column
ggplot(data, aes(x = value)) +
  geom_freqpoly()

For more information, see the ggplot2 reference documentation.
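A minimal sketch that overlays the two layers, assuming hypothetical simulated data in a column named `x`. Giving both layers the same `binwidth` makes the polygon pass through the midpoints of the bar tops; `geom_freqpoly()` also pads an empty bin at each end, so the line drops to zero at the edges.

```r
library(ggplot2)

# Hypothetical example data: 1,000 draws from a standard normal
set.seed(1)
df <- data.frame(x = rnorm(1000))

# Same binwidth in both layers so the bins line up
p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 0.25, fill = "grey80") +
  geom_freqpoly(binwidth = 0.25, colour = "red")
print(p)
```

Both layers count the same 1,000 observations; only the geometry differs.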

- M. Olaru

0

Adapting the example from the ?stat_density help page:

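library(ggplot2)
library(ggplot2movies)  # the movies data set now lives in this package
m <- ggplot(movies, aes(x = rating))
# Standard density plot.
m + geom_density()
# Density plot with y-axis scaled to counts.
# (after_stat(count) replaces the older ..count.. notation in ggplot2 >= 3.3.0)
m + geom_density(aes(y = after_stat(count)))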
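The `count` computed variable is the density estimate multiplied by the number of observations, so the curve has the same shape as the standard density plot, only rescaled. A minimal sketch to check this, assuming ggplot2 >= 3.3.0 (for `after_stat()`) and hypothetical normal data standing in for `movies$rating`:

```r
library(ggplot2)

# Hypothetical stand-in for movies$rating: 500 simulated ratings
set.seed(42)
df <- data.frame(rating = rnorm(500, mean = 6, sd = 1.5))

p <- ggplot(df, aes(x = rating)) +
  geom_density(aes(y = after_stat(count)))
print(p)

# Inspect the computed variables: count should equal density * n
ld <- layer_data(p)
head(ld[, c("x", "density", "count")])
```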

- Richie Cotton

Great. It looks like I missed the ?stat_density page. - asangoi

0

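Although this is old, I think the following may be useful. Suppose you have a dataset of 10,000 data points that you believe follow some distribution, and you want to draw the ideal distribution's probability density as a line over a histogram of the actual data.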

library(fitdistrplus)  # provides fitdist()

noise <- 2
#
# the noise is tacked onto the end using runif
# just to demo issues with real data and fitting;
# the subtraction causes the data to have some
# negative values, which must be addressed in
# the fit later on
#
noisylognorm <- rlnorm(10000,
                       meanlog = 0.25,
                       sdlog = 1) +
                (noise * runif(10000) - noise / 10)
#
# using package fitdistrplus
#
# subset is used to remove the negative values
# as the lognormal distribution needs positive only
#
fitlnorm <- fitdist(subset(noisylognorm,
                           noisylognorm > 0),
                    "lnorm")
fitlnorm_density <- density(rlnorm(10000,
                                   meanlog = fitlnorm$estimate["meanlog"],
                                   sdlog = fitlnorm$estimate["sdlog"]))
hist(subset(noisylognorm,
            noisylognorm < 25),
     breaks = seq(-1, 25, 0.5),
     col = "lightblue",
     xlim = c(0, 25),
     xlab = "value",
     ylab = "frequency",
     main = paste0("Log Normal Distribution\n",
                   "noise = ", noise))

lines(fitlnorm_density$x,
      10000 * fitlnorm_density$y * 0.5,
      col = "red")

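Note the * 0.5 in the lines() call. It accounts for the width of the hist() bars: with breaks every 0.5, the expected count in a bin is roughly n × density × bin width, so the density has to be multiplied by both 10000 and 0.5.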
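To make the n × density × bin-width scaling explicit, here is a minimal sketch that swaps in simulated standard normal data and the known density `dnorm()` in place of the fitted lognormal. Evaluating the density function directly (via `curve()`) also avoids the extra `rlnorm()` sampling step; the variable names are illustrative only.

```r
# Hypothetical data: 10,000 standard normal draws (stand-in for the
# lognormal example above)
set.seed(123)
vals <- rnorm(10000)

binwidth <- 0.5
brks <- seq(floor(min(vals)), ceiling(max(vals)), by = binwidth)
h <- hist(vals, breaks = brks, col = "lightblue",
          xlab = "value", ylab = "frequency", main = "")

n <- sum(h$counts)  # every value falls in some bin, so n == length(vals)

# Expected count in a bin is roughly n * binwidth * f(x), where f is the
# probability density; here f is the known normal density, not a fit
curve(n * binwidth * dnorm(x), add = TRUE, col = "red")

# Expected counts at the bin midpoints, for comparison with h$counts
expected <- n * binwidth * dnorm(h$mids)
round(cbind(mid = h$mids, observed = h$counts, expected = expected))
```

Changing `binwidth` changes both the bar heights and the scale factor, so the curve keeps tracking the bars.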

- eafpres

0

For count data there is a very quick and simple way to do this.

First, let's generate some dummy count data:

my.count.data = rpois(n = 10000, lambda = 3)

Then the plotting command (assuming you have called library(magrittr)):

my.count.data %>% table %>% plot
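The same plot without magrittr, plus a connected-line version closer to what the question asks for. This is a sketch with a seed added; note that `table()` only lists values that actually occur, so the line version sets the factor levels explicitly (an illustrative choice) to keep unobserved values as zero counts.

```r
set.seed(1)
my.count.data <- rpois(n = 10000, lambda = 3)

# Base R equivalent of my.count.data %>% table %>% plot (no magrittr needed).
# plot() on a 1-D table draws one vertical line per value.
tab <- table(my.count.data)
plot(tab)

# Line version. Setting the factor levels to 0:max(...) keeps values that
# never occur as zero counts instead of silently dropping them.
tab_full <- table(factor(my.count.data, levels = 0:max(my.count.data)))
vals <- as.numeric(names(tab_full))
cnts <- as.vector(tab_full)
plot(vals, cnts, type = "b", xlab = "value", ylab = "count")
```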

- DorinPopescu

Content provided by Stack Overflow.

- CnrL · Accepted Answer

8

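With default R graphics (i.e., without installing ggplot2) you can do the following, which may also make it clearer what the density function does: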

# Generate some data
data=rnorm(1000)
# Get the density estimate
dens=density(data)
# Plot y-values scaled by number of observations against x values
plot(dens$x,length(data)*dens$y,type="l",xlab="Value",ylab="Count estimate")
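Why the scaling works: `density()` returns an estimate whose area is about 1, so multiplying by `length(data)` gives a curve whose area is about n, in units of count per unit of x. A quick numeric check, assuming the same simulated data as above with a seed added:

```r
set.seed(1)
data <- rnorm(1000)
dens <- density(data)

# dens$x is an evenly spaced grid, so a simple Riemann sum approximates
# the area under each curve
step <- diff(dens$x)[1]
area_density <- sum(dens$y) * step                 # close to 1
area_count   <- sum(length(data) * dens$y) * step  # close to length(data)
c(area_density = area_density, area_count = area_count)
```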

- CnrL

1

Thanks. It really helped me figure out what the density function does. - asangoi

1

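@asangoi - I think you've started summing the density$y values over the individual bins. A simpler way is hist_list <- hist(data); plot(hist_list$mids, hist_list$counts, type = "b"). Also, if you use plot(hist_list$breaks, c(hist_list$counts, 0), type = "s"), you (sort of) get the outline of the histogram. If needed, hist(data, breaks = ...) lets you specify your own bins. See ?hist for how this works. - Dale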
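A minimal sketch of both suggestions with simulated data. `plot = FALSE` makes `hist()` compute the bins and counts without drawing the bars; the trailing `0` gives one y value per break so the step outline closes on the right.

```r
set.seed(1)
data <- rnorm(1000)

# plot = FALSE computes the bins and counts without drawing bars
hist_list <- hist(data, plot = FALSE)

# Counts at the bin midpoints, drawn as points joined by lines
plot(hist_list$mids, hist_list$counts, type = "b",
     xlab = "value", ylab = "count")

# Step outline of the histogram: pad counts with a trailing 0 so that
# there is one y value per break
plot(hist_list$breaks, c(hist_list$counts, 0), type = "s",
     xlab = "value", ylab = "count")
```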

Wouldn't it be better to use <- instead of =? - quesadagranja

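People often do this, I think to avoid ambiguity between assignment ("=") and comparison ("=="), but I think it's horrible and just moves the problem elsewhere. Say a = 1, and say we want to ask ourselves "is a less than -1?": a < -1 looks far too much like a <- 1 to me. Also, why use two symbols, "<" and "-", instead of one, "="? Assignment is an extremely common pattern in all programming. - CnrL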
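A minimal illustration of the spacing point above: R's parser reads `<` immediately followed by `-` as the assignment operator `<-`, so only the whitespace distinguishes the comparison from the assignment below.

```r
a <- 5
a < -1  # comparison: is a less than -1? returns FALSE
a<-1    # no spaces: parsed as the assignment a <- 1
a       # a is now 1
```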