Python - Integrate 2D kernel density estimation within contour lines

Question

I would like to draw a contour plot of a kernel density estimate in which the KDE is integrated within each filled contour region.

For example, suppose I compute the KDE of some 2D data:

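import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

# the covariance matrix must be symmetric positive semi-definite
data = np.random.multivariate_normal((0, 0), [[1, 0.5], [0.5, 0.7]], 100)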
x = data[:, 0]
y = data[:, 1]
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([xx.ravel(), yy.ravel()])
values = np.vstack([x, y])
kernel = st.gaussian_kde(values)
f = np.reshape(kernel(positions).T, xx.shape)

I know how to draw a contour plot of the KDE.

fig = plt.figure()
ax = fig.gca()
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
cfset = ax.contourf(xx, yy, f, cmap='Blues')
cset = ax.contour(xx, yy, f, colors='k')
plt.show()

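However, this contour plot shows the probability density within each filled region. Instead, I would like the plot to show the total probability that falls within each filled region.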

- Laura

1 Answer

- Paul Panzer · Accepted Answer

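Note that the following is only correct if your contours are "monotonic", i.e. inside a contour line you only find pixel values above the corresponding contour level. Note also that if your density has several peaks, the regions belonging to each of them are lumped together.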

If that is true/acceptable, your problem can be solved by sorting the pixels by value.
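
The caveat about several peaks can be made concrete with a small sketch. The two-bump density, the grid and the half-maximum threshold below are illustrative assumptions, not taken from the question; `scipy.ndimage.label` is used only to count the connected regions above the threshold.

```python
import numpy as np
from scipy import ndimage

# Synthetic density with two well-separated Gaussian bumps (illustrative only).
x, y = np.mgrid[-6:6:200j, -3:3:100j]
density = (np.exp(-((x - 3) ** 2 + y ** 2) / 2)
           + np.exp(-((x + 3) ** 2 + y ** 2) / 2))
density /= density.sum()  # equal-sized pixels, so the pixel values sum to 1

level = 0.5 * density.max()
above = density >= level

# Count the connected regions enclosed by this single contour level.
_, n_regions = ndimage.label(above)

# Thresholding by value cannot tell the regions apart:
# the result is the combined mass of every region above the level.
combined_mass = density[above].sum()
print(n_regions, combined_mass)
```

With a density like this one, the contour at a single level encloses two separate islands, yet the value-based computation reports one number for both together.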

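I don't know what heuristics your plotting program uses to choose its contour levels, but assuming you have them stored (in ascending order), say in a variable called "levels", you could try something like the following: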

ff = f.ravel()
order = np.argsort(ff)
fsorted = ff[order]
F = np.cumsum(fsorted)
# depending on how your density is normalised next line may be superfluous
# also note that this is only correct for equal bins
# and, finally, to be unimpeachably rigorous, this disregards the probability
# mass outside the field of view, so it calculates probability conditional
# on being in the field of view
F /= F[-1]
boundaries = fsorted.searchsorted(levels)
new_levels = F[boundaries]
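
As a sanity check of the sorting idea, here is a minimal sketch on a density whose answer is known in closed form. The helper name `mass_below_levels`, the grid and the level values are hypothetical. It is also a slight variation on the snippet above: it returns the mass strictly below each level and pads the cumulative sum with a leading zero, so a level above the maximum density does not index out of range. For a standard 2D normal, the mass outside the contour at density c is 2πc, which gives something to compare against.

```python
import numpy as np

def mass_below_levels(f, levels):
    """Fraction of total mass in pixels whose density is below each level.

    Assumes equal-sized pixels; the result is conditional on the grid's
    field of view, as in the snippet above.
    """
    fsorted = np.sort(np.ravel(f))
    F = np.cumsum(fsorted)
    F /= F[-1]
    # idx[i] = number of pixels with density strictly below levels[i]
    idx = np.searchsorted(fsorted, levels)
    # Prepend 0 so that idx == 0 maps to "no mass below this level".
    F0 = np.concatenate(([0.0], F))
    return F0[idx]

# Standard 2D normal density on a grid wide enough to hold almost all mass.
xx, yy = np.mgrid[-5:5:400j, -5:5:400j]
f = np.exp(-(xx ** 2 + yy ** 2) / 2) / (2 * np.pi)

levels = np.array([0.02, 0.05, 0.10, 0.15])
inside = 1.0 - mass_below_levels(f, levels)   # mass enclosed by each contour
exact = 1.0 - 2 * np.pi * levels              # closed form for this density
print(np.abs(inside - exact).max())
```

If matplotlib produced the contours, the levels it chose should be available as the `levels` attribute of the object returned by `contour` or `contourf`.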

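Now, to be able to use this, your plotting program must either let you choose the contour labels freely, or at least let you choose the levels at which the contours are placed. In the latter case, assuming there is a kwarg 'levels':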

# make a copy to avoid problems with in-place shuffling
# i.e. overwriting positions whose original values are still to be read out
F[order] = F.copy()
F.shape = f.shape
cset = ax.contour(xx, yy, F, levels=new_levels, colors='k')
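
The assignment `F[order] = F.copy()` scatters the cumulative values back to the pixels they came from. A tiny sketch with a made-up four-element array may make that step easier to follow; writing into a separate array (here the hypothetical `F_pixel`) is an alternative that avoids the copy.

```python
import numpy as np

ff = np.array([0.3, 0.1, 0.4, 0.2])   # flattened density, made-up values
order = np.argsort(ff)                 # indices that sort ff in ascending order
F = np.cumsum(ff[order])               # cumulative mass in sorted order
F /= F[-1]

# Scatter back: the value at sorted rank k belongs to pixel order[k].
F_pixel = np.empty_like(F)
F_pixel[order] = F

# Each pixel now holds the total mass of all pixels no denser than itself.
print(F_pixel)
```

The densest pixel (index 2) ends up with the full mass and the least dense one (index 1) with only its own share, so contouring this array at the remapped levels draws the same lines as contouring the density at the original levels.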

I have copied the following comment here to make it more visible:

Finally, if one wants the actual probability within each filled region, this is a working workaround:

cb = fig.colorbar(cfset, ax=ax)
values = cb.values.copy()
values[1:] -= values[:-1].copy()
cb.set_ticklabels(values)

- Laura
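
The subtraction in that workaround turns cumulative values into per-band values. Here is a minimal sketch of the same arithmetic with NumPy alone, assuming the numbers being differenced are cumulative probabilities at successive contour levels (the values below are made up):

```python
import numpy as np

# Hypothetical cumulative probabilities at successive contour levels.
cumulative = np.array([0.05, 0.20, 0.50, 0.85, 1.00])

# Same arithmetic as `values[1:] -= values[:-1].copy()`:
# each entry becomes the mass added since the previous level.
per_band = cumulative.copy()
per_band[1:] -= cumulative[:-1]

print(per_band, per_band.sum())
```

The per-band values add up to the last cumulative value, which is a quick way to check that no band was dropped or counted twice.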