How to add error bars to a grouped bar plot from a Pandas column

Question

3

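I have a pandas dataframe named df with four columns: Candidate, Sample_Set, Values, and Error. The Candidate column has three unique entries: [X, Y, Z], and there are three sample sets, so Sample_Set also has three unique values: [1, 2, 3]. The df looks roughly like this.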

import pandas as pd

data = {'Candidate': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
        'Sample_Set': [1, 1, 1, 2, 2, 2, 3, 3, 3],
        'Values': [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
        'Error': [5, 2, 3, 30, 30, 30, 10, 10, 10]}
df = pd.DataFrame(data)

# display(df)
  Candidate  Sample_Set  Values  Error
0         X           1      20      5
1         Y           1      10      2
2         Z           1      10      3
3         X           2     200     30
4         Y           2     101     30
5         Z           2      99     30
6         X           3    1999     10
7         Y           3     998     10
8         Z           3    1003     10

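I am using seaborn to create a grouped bar plot with x="Candidate", y="Values", hue="Sample_Set". Everything works fine until I try to add error bars along the y-axis using the values in the Error column. I am using the following code.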

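import seaborn as sns

# factorplot was renamed catplot (and size renamed height) in seaborn 0.9;
# note that it returns a FacetGrid rather than a single Axes
ax = sns.catplot(x="Candidate", y="Values", hue="Sample_Set", data=df,
                 height=8, kind="bar")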

How can I incorporate the errors?

I would appreciate a solution or a more elegant approach to this task.

- EFL

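seaborn is essentially an extension of matplotlib, so whatever you cannot do in seaborn you can do by modifying the output (ax) with matplotlib's tools. Do you have a visual example of what you mean by "error bars"? Do you mean the ones in a bar plot? - Aleksey Bilogur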

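Thanks @ResMar. Yes, the black vertical lines in the bar plot you referred to. I am experimenting with matplotlib functionality. I just need to extract the numeric x and y values of the generated grouped bar plot. - EFL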

2

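Unless someone surprises me with an answer, I don't think there is an "elegant" way to solve this. seaborn generates those error bars in barplot by aggregating many observations, whereas your data is already pre-aggregated. - Aleksey Bilogur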

4 Answers

0

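I suggest extracting the position coordinates from the patches attribute and then drawing the error bars.

import seaborn as sns

ax = sns.barplot(data=df, x="Candidate", y="Values", hue="Sample_Set")
# bar centers and heights; assumes ax.patches follows the row order of df
x_coords = [p.get_x() + 0.5 * p.get_width() for p in ax.patches]
y_coords = [p.get_height() for p in ax.patches]
ax.errorbar(x=x_coords, y=y_coords, yerr=df["Error"], fmt="none", c="k")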

- michael

0

Seaborn plots generate error bars when they aggregate data, but this data is already aggregated and has a dedicated error column.
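
To illustrate the distinction, here is a minimal sketch of what "already aggregated" means. The raw observations and the `obs` column name are hypothetical and not taken from the question; the point is only that a mean/std aggregation produces a table shaped like df, which is the step seaborn would otherwise perform internally.

```python
import pandas as pd

# Hypothetical raw observations (not from the question): three measurements per group
raw = pd.DataFrame({
    "Candidate": ["X", "X", "X", "Y", "Y", "Y"],
    "Sample_Set": [1, 1, 1, 1, 1, 1],
    "obs": [18.0, 20.0, 22.0, 9.0, 10.0, 11.0],
})

# Aggregating by hand yields one row per (Candidate, Sample_Set),
# with a value column and an error column like the question's df
summary = (raw.groupby(["Candidate", "Sample_Set"], as_index=False)
              .agg(Values=("obs", "mean"), Error=("obs", "std")))
print(summary)
```

Whether the error column holds a standard deviation, a standard error, or something else is an assumption here; the question does not say how Error was computed.
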
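The simplest solution is to create the bar plot with pandas, using pandas.DataFrame.plot with kind='bar'.
- matplotlib is used as the default plotting backend, and the plotting API has a yerr parameter that accepts the following:
  - A DataFrame or dict of errors whose column names match the columns attribute of the plotting DataFrame, or match the name attribute of the Series.
  - A str indicating which columns of the plotting DataFrame contain the error values.
  - Raw values (list, tuple, or np.ndarray), which must be the same length as the plotting DataFrame/Series.
This can be done by reshaping the dataframe from long to wide format with pandas.DataFrame.pivot.
See the pandas user guide: Plotting with error bars
Tested in python 3.8.12, pandas 1.3.4, and matplotlib 3.4.3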

# reshape the dataframe into a wide format for Values
vals = df.pivot(index='Candidate', columns='Sample_Set', values='Values')

# reshape the dataframe into a wide format for Errors
yerr = df.pivot(index='Candidate', columns='Sample_Set', values='Error')

# plot vals with yerr
ax = vals.plot(kind='bar', yerr=yerr, logy=True, rot=0, figsize=(6, 5))
_ = ax.legend(title='Sample Set', bbox_to_anchor=(1, 1.02), loc='upper left')

`vals`

Sample_Set   1    2     3
Candidate                
X           20  200  1999
Y           10  101   998
Z           10   99  1003

`yerr`

Sample_Set  1   2   3
Candidate            
X           5  30  10
Y           2  30  10
Z           3  30  10
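
For comparison, the other two yerr forms (a column-name str and raw values) can be sketched as follows. This assumes only a single sample set is plotted, so no pivot is needed; the `one_set` name is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Candidate": ["X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"],
    "Sample_Set": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Values": [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
    "Error": [5, 2, 3, 30, 30, 30, 10, 10, 10],
})

# Illustration only: restrict to one sample set so each candidate has one bar
one_set = df[df["Sample_Set"] == 1]

# yerr as a str naming the error column of the plotted DataFrame
ax1 = one_set.plot(kind="bar", x="Candidate", y="Values", yerr="Error", rot=0)

# yerr as raw values with the same length as the plotted data
ax2 = one_set.plot(kind="bar", x="Candidate", y="Values",
                   yerr=one_set["Error"].to_numpy(), rot=0)
```

Both calls should draw the same three bars with the same error bars; the str form is shorter, while the raw-value form is handy when the errors live outside the DataFrame.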

- Trenton McKinney

-3

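You can get close to what you need with the pandas plotting functionality: see this answer

# yerr="Error" makes each group use its own rows of the Error column
bars = df.groupby("Candidate").plot(kind='bar', x="Sample_Set", y="Values", yerr="Error")

This is not exactly what you asked for, but it is quite close. Unfortunately, ggplot for Python currently cannot render error bars correctly. Personally, I would switch to R's ggplot2 in this case: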

data <- read.csv("~/repos/tmp/test.csv")
data
library(ggplot2)
ggplot(data, aes(x=Candidate, y=Values, fill=factor(Sample_Set))) + 
  geom_bar(position=position_dodge(), stat="identity") +
  geom_errorbar(aes(ymin=Values-Error, ymax=Values+Error), width=.1, position=position_dodge(.9))

- Dima Lituiev

1

There does not seem to be any need to switch to R here. - Jolien

- ImportanceOfBeingErnest · Accepted Answer

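As @ResMar pointed out in the comments, seaborn does not seem to have built-in functionality for easily setting individual error bars.

If you care more about the result than about how you get there, the following (not very elegant) solution, which builds on matplotlib.pyplot.bar, might help. The seaborn import is only used to get the same style.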

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

def grouped_barplot(df, cat,subcat, val , err):
    u = df[cat].unique()
    x = np.arange(len(u))
    subx = df[subcat].unique()
    offsets = (np.arange(len(subx))-np.arange(len(subx)).mean())/(len(subx)+1.)
    width= np.diff(offsets).mean()
    for i,gr in enumerate(subx):
        dfg = df[df[subcat] == gr]
        plt.bar(x+offsets[i], dfg[val].values, width=width, 
                label="{} {}".format(subcat, gr), yerr=dfg[err].values)
    plt.xlabel(cat)
    plt.ylabel(val)
    plt.xticks(x, u)
    plt.legend()
    plt.show()


cat = "Candidate"
subcat = "Sample_Set"
val = "Values"
err = "Error"

# call the function with df from the question
grouped_barplot(df, cat, subcat, val, err )

Note that simply by swapping the category and subcategory

cat = "Sample_Set"
subcat = "Candidate"

you can get a different grouping.
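
A minimal sketch of that swapped call, assuming the df from the question. `grouped_barplot_ax` is a hypothetical variant of the function above (not part of the original answer): it draws on an Axes and returns it instead of calling plt.show(), and it reindexes each subgroup so its rows line up with the tick order.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Candidate": ["X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"],
    "Sample_Set": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Values": [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
    "Error": [5, 2, 3, 30, 30, 30, 10, 10, 10],
})

def grouped_barplot_ax(df, cat, subcat, val, err, ax=None):
    """Hypothetical variant of grouped_barplot that returns the Axes."""
    if ax is None:
        ax = plt.gca()
    u = df[cat].unique()          # tick categories, in order of appearance
    x = np.arange(len(u))
    subx = df[subcat].unique()    # one bar series per subcategory
    offsets = (np.arange(len(subx)) - np.arange(len(subx)).mean()) / (len(subx) + 1.)
    width = np.diff(offsets).mean()
    for i, gr in enumerate(subx):
        # align this subgroup's rows with the tick order in u
        dfg = df[df[subcat] == gr].set_index(cat).reindex(u)
        ax.bar(x + offsets[i], dfg[val].values, width=width,
               label="{} {}".format(subcat, gr), yerr=dfg[err].values)
    ax.set_xlabel(cat)
    ax.set_ylabel(val)
    ax.set_xticks(x)
    ax.set_xticklabels([str(v) for v in u])
    ax.legend()
    return ax

# Sample sets on the x-axis, one bar per candidate within each set
ax = grouped_barplot_ax(df, cat="Sample_Set", subcat="Candidate",
                        val="Values", err="Error")
```

Call plt.show() afterwards to display the figure. Passing cat="Candidate" and subcat="Sample_Set" instead gives the original grouping.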