Extracting means and confidence intervals from Seaborn regplot

Question

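Since regplot already computes the mean within each bin and bootstraps a confidence interval for each one, recomputing these values by hand for further analysis seems wasteful. Hence:

Question: How can I access the means and confidence intervals that regplot computes?

Example: This code produces a nice plot of the bin means with their confidence intervals: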

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# just some random numbers to get started
fig, ax = plt.subplots()
x = np.random.uniform(-2, 2, 1000)
y = np.random.normal(x**2, np.abs(x) + 1)

# Manual binning to retain control
binwidth=4./10
x_bins=np.arange(-2+binwidth/2,2,binwidth)
sns.regplot(x=x, y=y, x_bins=x_bins, fit_reg=False)
plt.show()

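Result: [Figure: regression plot of the binned data with confidence intervals]

Computing the per-bin means is not hard, but the confidence intervals are computed with random numbers. It would be nice to have exactly the numbers that were plotted, so how do I access them? I am probably overlooking some get_* method.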

- Rasmus Mackeprang

1 Answer

- jwalton · Accepted Answer

Setup

Setting up as in your MWE:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Random numbers for plotting
x = np.random.uniform(-2, 2, 1000)
y = np.random.normal(x**2, np.abs(x) + 1)

# Manual binning to retain control
binwidth = 4 / 10
x_bins = np.arange(binwidth/2 - 2, 2, binwidth)
sns.regplot(x=x, y=y, x_bins=x_bins, fit_reg=False)

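This plot is our starting point.

Extracting the confidence intervals

Each confidence interval is drawn as a vertical line, so we can extract the intervals by looping over the plotted lines and taking the minimum and maximum y-value of each (the lower and upper CI bounds, respectively). This assumes the axes contain no other lines, which holds here because the regression fit is turned off: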

ax = plt.gca()
lower = [line.get_ydata().min() for line in ax.lines]
upper = [line.get_ydata().max() for line in ax.lines]
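The list comprehensions above rely on the lines coming back in the same order as x_bins. A minimal sketch that also reads the x-position off each line, so every interval stays paired with its own bin, is given here. It assumes a fresh axes created just for this plot (so ax.lines holds only the CI bars) and that every bin contains at least one point; the names ci_x, lower and upper are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Same data and bin centers as in the setup
x = np.random.uniform(-2, 2, 1000)
y = np.random.normal(x**2, np.abs(x) + 1)
binwidth = 4 / 10
x_bins = np.arange(binwidth / 2 - 2, 2, binwidth)

# Fresh axes: with the regression fit disabled, the only lines are the CI bars
fig, ax = plt.subplots()
sns.regplot(x=x, y=y, x_bins=x_bins, fit_reg=False, ax=ax)

# Each CI bar is a two-point vertical line from (x, lower) to (x, upper)
ci_x = np.array([line.get_xdata()[0] for line in ax.lines])
lower = np.array([np.min(line.get_ydata()) for line in ax.lines])
upper = np.array([np.max(line.get_ydata()) for line in ax.lines])

for cx, lo, hi in zip(ci_x, lower, upper):
    print(f"x = {cx:+.1f}: CI = [{lo:.2f}, {hi:.2f}]")
```

If other artists had already been drawn on the axes, ax.lines would contain them too, so it is safest to run the extraction right after the regplot call.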

As a sanity check, we can plot the extracted points over the original data (shown as red crosses):

plt.scatter(x_bins, lower, marker='x', color='C3', zorder=3)
plt.scatter(x_bins, upper, marker='x', color='C3', zorder=3)

Extracting the means

The means can be extracted from ax.collections:

means = ax.collections[0].get_offsets()[:, 1]
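get_offsets() returns one (x, y) pair per plotted marker, so the first column carries the bin positions alongside the means in the second. A small sketch of reading both columns follows, assuming a fresh axes so that ax.collections[0] is the scatter of bin means; bin_x and offsets are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Same data and bin centers as in the setup
x = np.random.uniform(-2, 2, 1000)
y = np.random.normal(x**2, np.abs(x) + 1)
binwidth = 4 / 10
x_bins = np.arange(binwidth / 2 - 2, 2, binwidth)

fig, ax = plt.subplots()
sns.regplot(x=x, y=y, x_bins=x_bins, fit_reg=False, ax=ax)

# The scatter of bin means is the first (and here only) collection on the axes
offsets = np.asarray(ax.collections[0].get_offsets())
bin_x = offsets[:, 0]   # x-position of each plotted mean
means = offsets[:, 1]   # the bin means themselves

for bx, m in zip(bin_x, means):
    print(f"x = {bx:+.1f}: mean = {m:.2f}")
```

Reading the x column from the plot avoids assuming that every entry of x_bins ended up with a marker.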

As a sanity check, we can overlay the extracted values on the original plot:

plt.scatter(x_bins, means, color='C1', marker='x', zorder=3)
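For an independent cross-check, the same quantities can be recomputed with NumPy alone. This is only a sketch of a percentile bootstrap, not Seaborn's own code: it assumes that each point is assigned to the nearest bin center and that the interval is a 95% percentile interval over 1000 resamples, which to my understanding matches the regplot defaults. Because the resampling is random, the bounds will be close to the plotted ones but not identical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for this sketch
x = rng.uniform(-2, 2, 1000)
y = rng.normal(x**2, np.abs(x) + 1)

binwidth = 4 / 10
centers = np.arange(binwidth / 2 - 2, 2, binwidth)

# Assumption: each point belongs to the bin whose center is nearest
nearest = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)

n_boot = 1000  # assumed number of bootstrap resamples
means, lower, upper = [], [], []
for k in range(len(centers)):
    y_bin = y[nearest == k]
    means.append(y_bin.mean())
    # Resample the bin with replacement and record the mean of each resample
    resamples = rng.choice(y_bin, size=(n_boot, y_bin.size), replace=True)
    boot_means = resamples.mean(axis=1)
    # Assumption: 95% percentile interval of the bootstrapped means
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    lower.append(lo)
    upper.append(hi)

means, lower, upper = map(np.array, (means, lower, upper))
```

Comparing these arrays with the values read from the plot should show agreement in the means and similar, but not bit-identical, interval bounds.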