Pandas boxplot: set color and properties for box, median, and mean

18

I have a DataFrame with a MultiIndex:

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# dataframe with dates
dates = pd.DataFrame()
dates['2016'] = pd.date_range(start='2016', periods=4, freq='60Min')
dates['2017'] = pd.date_range(start='2017', periods=4, freq='60Min')
dates['2018'] = pd.date_range(start='2018', periods=4, freq='60Min')
# flatten the three year columns into a single Series
dates = dates.unstack()

# multi-indexed dataframe
df = pd.DataFrame(np.random.randn(36, 3))
df['concept'] = np.repeat(np.repeat(['A', 'B', 'C'], 3), 4)
df['datetime'] = pd.concat([dates, dates, dates], ignore_index=True)
df.set_index(['concept', 'datetime'], inplace=True)
df.sort_index(inplace=True)
df.columns = ['V1', 'V2', 'V3']
print(df)

which returns:

                                   V1        V2        V3
concept datetime                                         
A       2016-01-01 00:00:00 -0.303428  0.088180 -0.547776
        2016-01-01 01:00:00 -0.893835 -2.226923 -0.181370
        2016-01-01 02:00:00  2.934575  1.515822  0.343609
        2016-01-01 03:00:00 -1.341694  1.681015  0.099759
        2017-01-01 00:00:00  1.515894  0.519595  0.102635
        2017-01-01 01:00:00 -0.266949 -0.035901  0.539084
        2017-01-01 02:00:00  1.336603  0.286928 -0.352078
        2017-01-01 03:00:00  0.480137  0.185785  0.595706
        2018-01-01 00:00:00 -0.385640  1.813604 -0.839973
        2018-01-01 01:00:00  0.568706  1.165257 -1.352020
        2018-01-01 02:00:00  0.498388  0.382034 -1.190599
        2018-01-01 03:00:00  1.897356 -0.293143  0.177787
B       2016-01-01 00:00:00 -1.111196 -1.644588  0.333936
        2016-01-01 01:00:00  0.232206 -0.202987 -0.334564
        2016-01-01 02:00:00  1.264637 -1.472229  0.888451
        2016-01-01 03:00:00  1.033163  0.504090  1.325476
        2017-01-01 00:00:00 -0.199445  0.088792 -0.797965
        2017-01-01 01:00:00 -1.116359  0.574789 -1.055830
        2017-01-01 02:00:00  1.267970  0.287501  0.001420
        2017-01-01 03:00:00  1.554647  2.865833  0.089875
        2018-01-01 00:00:00  0.030871 -1.783524 -1.457190
        2018-01-01 01:00:00  0.073978 -0.735599 -0.420115
        2018-01-01 02:00:00  0.931073 -2.543869 -0.649976
        2018-01-01 03:00:00  0.325443  1.134799  0.445788
C       2016-01-01 00:00:00 -0.489454 -0.646136 -0.111308
        2016-01-01 01:00:00 -0.501965 -0.197183  0.025899
        2016-01-01 02:00:00 -0.714251 -1.846856  0.197658
        2016-01-01 03:00:00  0.609357  0.456263 -0.041581
        2017-01-01 00:00:00 -1.004726 -0.956688 -0.068980
        2017-01-01 01:00:00 -0.036204 -1.236450 -0.895681
        2017-01-01 02:00:00 -0.840374  0.561443  1.401854
        2017-01-01 03:00:00  0.325433  1.406280 -1.033267
        2018-01-01 00:00:00 -0.029315 -1.591510 -0.739032
        2018-01-01 01:00:00 -0.761522 -0.896236  0.537450
        2018-01-01 02:00:00  1.081961  0.126248 -0.911462
        2018-01-01 03:00:00  0.070915 -1.036460  1.187859

and I want to plot one of the columns, grouped by year, in a boxplot:

# demonstrate how to customize the display of different elements:
boxprops = dict(linestyle='-', linewidth=4, color='k')
medianprops = dict(linestyle='-', linewidth=4, color='k')

ax = df.boxplot(column=['V1'],
                by=df.index.get_level_values('datetime').year,
                showfliers=False, showmeans=True,
                boxprops=boxprops,
                medianprops=medianprops)
# get rid of the automatic title
plt.suptitle("")
ax.set_xlabel("")
ax.set_title("Boxplot of V1")

This returns:

[image: the resulting boxplot]

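Obviously, some of the style options for the boxes take effect and others do not.

So my question is:

How can I set the colors of the boxes, the median, and the mean?

Thanks in advance!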

############################ EDIT 1 ############################

I found this answer and adjusted my plot:

bp = data.boxplot(column=['eex_da_price_mean'],
                  by=data.index.get_level_values('date').year,
                  showfliers=False, showmeans=True,
                  return_type='dict')

[[item.set_linewidth(4) for item in bp[key]['boxes']] for key in bp.keys()]
[[item.set_linewidth(4) for item in bp[key]['fliers']] for key in bp.keys()]
[[item.set_linewidth(4) for item in bp[key]['medians']] for key in bp.keys()]
[[item.set_linewidth(4) for item in bp[key]['means']] for key in bp.keys()]
[[item.set_linewidth(4) for item in bp[key]['whiskers']] for key in bp.keys()]
[[item.set_linewidth(4) for item in bp[key]['caps']] for key in bp.keys()]

bp.set_xlabel("")
bp.set_title("Some plot", fontsize=60)
bp.tick_params(axis='y', labelsize=60)
bp.tick_params(axis='x', labelsize=60)
plt.suptitle("")

which returns:

[image: the adjusted boxplot]

But now the axis formatting no longer works, because bp is a dict of artists rather than an Axes, and I get errors like this:
bp.set_xlabel("")
AttributeError: 'OrderedDict' object has no attribute 'set_xlabel'

Any hints?
4 Answers

19

I just found another solution that plots directly from pandas with less code (no need to manipulate the matplotlib objects afterwards):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
ax = df.plot(kind='box',
             color=dict(boxes='r', whiskers='r', medians='r', caps='r'),
             boxprops=dict(linestyle='-', linewidth=1.5),
             flierprops=dict(linestyle='-', linewidth=1.5),
             medianprops=dict(linestyle='-', linewidth=1.5),
             whiskerprops=dict(linestyle='-', linewidth=1.5),
             capprops=dict(linestyle='-', linewidth=1.5),
             showfliers=False, grid=True, rot=0)
ax.set_xlabel('Foo')
ax.set_ylabel('Bar in X')
plt.show()

The result is:

[image: the resulting boxplot]

I have not yet figured out how to adjust the color of the means when showmeans=True, but this should be fine in most cases.

Hope this helps!
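
A possible way to handle the open point about the mean markers: matplotlib's boxplot accepts a meanprops dict, and pandas appears to forward extra keyword arguments to it. The sketch below assumes that your pandas version passes meanprops through unchanged; the marker shape, size, and colors are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 5)), columns=['A', 'B', 'C', 'D', 'E'])

# meanprops is a matplotlib boxplot keyword that styles the mean markers
# (assumption: pandas forwards it to matplotlib untouched)
meanprops = dict(marker='D', markerfacecolor='k',
                 markeredgecolor='k', markersize=6)

ax = df.plot(kind='box',
             color=dict(boxes='r', whiskers='r', medians='r', caps='r'),
             showmeans=True, meanprops=meanprops,
             showfliers=False, grid=True, rot=0)
ax.set_xlabel('Foo')
ax.set_ylabel('Bar in X')
```

Call plt.show() afterwards if you are not in an interactive session.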


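Any idea how to apply separate properties to each box and still draw them on one Axes object, as in your example? If I plot them individually and pass ax=ax so that they all use the same Axes, they of course end up on top of each other, right in the middle. - n1k31t4
To fill the boxes with color, use patch_artist=True - Nav
If people are also trying to color the outliers, you need to add flierprops to the .plot() command, for example: flierprops = dict(marker='o', markerfacecolor='r', markersize=12, linestyle='none', markeredgecolor='g') - Felix Seifert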
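
The three comments above can be combined into one sketch: filled boxes via patch_artist=True, styled outliers via flierprops, and a separate fill color per box on a single Axes. It assumes that return_type='both' hands back a namedtuple with the fields ax and lines, as the pandas boxplot documentation describes; the data and the color list are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((50, 3)), columns=['A', 'B', 'C'])

# outlier styling, adapted from the comment above
flierprops = dict(marker='o', markerfacecolor='r', markersize=8,
                  linestyle='none', markeredgecolor='g')

# return_type='both': .ax is the Axes, .lines is the dict of artists
result = df.plot(kind='box', patch_artist=True, flierprops=flierprops,
                 return_type='both')
ax, artists = result.ax, result.lines

# with patch_artist=True the boxes are patches, so each one can be
# filled with its own color after plotting
for patch, facecolor in zip(artists['boxes'], ['red', 'green', 'blue']):
    patch.set_facecolor(facecolor)
    patch.set_edgecolor('black')
```

Because all boxes come from one plot call, they keep their own x positions instead of overlapping in the middle.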

14

The answer that suggests using plt.gca() works well.

Here is a complete example:

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# dataframe with dates
dates = pd.DataFrame()
dates['2016'] = pd.date_range(start='2016', periods=4, freq='60Min')
dates['2017'] = pd.date_range(start='2017', periods=4, freq='60Min')
dates['2018'] = pd.date_range(start='2018', periods=4, freq='60Min')
# flatten the three year columns into a single Series
dates = dates.unstack()

# multi-indexed dataframe
df = pd.DataFrame(np.random.randn(36, 3))
df['concept'] = np.repeat(np.repeat(['A', 'B', 'C'], 3), 4)
df['datetime'] = pd.concat([dates, dates, dates], ignore_index=True)
df.set_index(['concept', 'datetime'], inplace=True)
df.sort_index(inplace=True)
df.columns = ['V1', 'V2', 'V3']
df.info()


# demonstrate how to customize the display of different elements:
boxprops = dict(linestyle='-', linewidth=4, color='k')
medianprops = dict(linestyle='-', linewidth=4, color='k')

bp = df.boxplot(column=['V1'],
                by=df.index.get_level_values('datetime').year,
                showfliers=False, showmeans=True,
                boxprops=boxprops, medianprops=medianprops,
                return_type='dict')

# boxplot style adjustments (bp holds one dict of artists per plotted column)
for key in bp.keys():
    artists = bp[key]
    for name in ('boxes', 'fliers', 'medians', 'means', 'whiskers', 'caps'):
        for item in artists[name]:
            item.set_linewidth(4)

    for item in artists['boxes']:
        item.set_color('g')
    # seems to have no effect (presumably because showfliers=False
    # leaves no flier artists to style)
    for item in artists['fliers']:
        item.set_color('b')
    for item in artists['medians']:
        item.set_color('m')
    for item in artists['means']:
        item.set_markerfacecolor('k')
    for item in artists['whiskers']:
        item.set_color('c')
    for item in artists['caps']:
        item.set_color('y')

# get rid of "boxplot grouped by" title
plt.suptitle("")

# label adjustment
p = plt.gca()
p.set_xlabel("")
p.set_title("Some plot", fontsize=30)
p.tick_params(axis='y', labelsize=30)
p.tick_params(axis='x', labelsize=30)

Returns: [image: the resulting boxplot]


3
As of pandas version 0.23.4, the following works: [item.set_color('crimson') for item in bp['boxes']]. It sets the color of the box lines in the plot to crimson. - Yakzan

3
Before your bp.set_xlabel("") statement, get the current Axes and call the formatting methods on it instead of on bp:
p = plt.gca()
p.set_xlabel("")
p.set_title("Some plot", fontsize=60)
p.tick_params(axis='y', labelsize=60)
p.tick_params(axis='x', labelsize=60)
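
As an alternative to plt.gca(), you could create the Axes yourself and pass it in, so that you hold both the Axes and the artist dict. This is only a sketch: the small DataFrame and its explicit 'year' column are hypothetical stand-ins for the data in the question, and it assumes that a grouped boxplot with return_type='dict' returns one artist dict per plotted column.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# hypothetical stand-in data: one value column plus an explicit year column
df = pd.DataFrame({'V1': rng.standard_normal(12),
                   'year': np.repeat([2016, 2017, 2018], 4)})

fig, ax = plt.subplots()
# return_type='dict' yields the artists; the Axes is the one created above
bp = df.boxplot(column=['V1'], by='year', ax=ax,
                showfliers=False, showmeans=True, return_type='dict')

# style the medians through the artist dict of each plotted column
for key in bp.keys():
    for median in bp[key]['medians']:
        median.set_color('m')
        median.set_linewidth(4)

fig.suptitle("")   # drop the automatic "Boxplot grouped by ..." title
ax.set_xlabel("")
ax.set_title("Boxplot of V1")
ax.tick_params(axis='both', labelsize=12)
```

This keeps the axis formatting on ax and the artist styling on bp, so the AttributeError from the question does not come up.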

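Thanks for your answer. It works perfectly ;-) I will post a complete example below! - Cord Kaldemeyer
If you like, you can also copy my example into your answer. But I think a complete example would be better! - Cord Kaldemeyer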

0

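Try using seaborn:

# Box plot with seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# In a Jupyter notebook you can additionally run: %matplotlib inline
# `data` is the answerer's own DataFrame, which is not shown here
sns.boxplot(data=data['fixed acidity'])
plt.show()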

[image: seaborn boxplot]


3
I don't know... seaborn's boxplots look like baobab trees. That is not something I would want to use. - Nav
