Drawing a double box plot (box plot on two axes) in Python

5
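I would like to plot the distributions of two variables with box plots along both the x and y axes. Examples of the kind of chart I am after can be found on this website, but they are made with R.
I am wondering whether the same result can be obtained in Python with matplotlib.pyplot or similar tools. However, the boxplot function does not seem to work for this kind of chart.
For two groups, I tried the following: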
import matplotlib.pyplot as plt
x1 = [x11, x12, ..., x1n]
x2 = [x21, x22, ..., x2n]
y1 = [y11, y12, ..., y1n]
y2 = [y21, y22, ..., y2n]

data = [list(zip(x1,y1)), list(zip(x2,y2))]
fig, ax = plt.subplots()
ax.boxplot(data)

This gives the following result: [image] rather than something like this: [image]

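I would suggest considering a colored scatter plot rather than a 2D box plot. Some of those examples make it very hard to show anything clearly. I am not entirely sure how this would be done in matplotlib (though I am fairly sure it can be), but the real question is why? - Andrew
You will not get the desired result with the current boxplot implementation; you will have to write your own. If you can compute the first and third quartiles in the x and y dimensions, you can draw a Rectangle at the appropriate coordinates. - Diziet Asahi
@Andrew I need this kind of chart for a scientific article. We initially made a colored scatter plot, but we think a double box plot shows what we want to show more easily. @DizietAsahi I will try a few things, thanks for the Rectangle hint. If it works, I will post an answer. - Timothée ZARAGORI
I hope you are right. If those examples are the reference, I struggle to see anything in them that could not be expressed more simply. Good luck. - Andrew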
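The Rectangle hint from the comments can be sketched roughly as follows. This is only a minimal illustration, not the commenter's own code: the helper names quartile_box and draw_quartile_box are hypothetical, and the explicit autoscale_view() call is there on the assumption that adding a patch does not rescale the view by itself.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


def quartile_box(x, y):
    """Return (x0, y0, width, height) of the box spanning Q1..Q3 on both axes."""
    x_q1, x_q3 = np.percentile(x, [25, 75])
    y_q1, y_q3 = np.percentile(y, [25, 75])
    return x_q1, y_q1, x_q3 - x_q1, y_q3 - y_q1


def draw_quartile_box(x, y, ax, **rect_kw):
    """Add the interquartile rectangle of (x, y) to ax and return the patch.

    rect_kw is forwarded unchanged to matplotlib.patches.Rectangle.
    """
    x0, y0, width, height = quartile_box(x, y)
    rect = Rectangle((x0, y0), width, height, **rect_kw)
    ax.add_patch(rect)
    # Assumption: request a rescale explicitly so the box is visible
    # even when nothing else has been plotted on the axes.
    ax.autoscale_view()
    return rect
```

Calling draw_quartile_box(x, y, ax, fill=False) on an existing axes would draw only the box; medians, whiskers and outliers still have to be added separately.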
1 Answer

6

Here is a first attempt at solving the problem, using numpy's percentile function together with Rectangle and Line2D for the actual drawing:

from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.lines import Line2D

import numpy as np

def boxplot_2d(x,y, ax, whis=1.5):
    xlimits = [np.percentile(x, q) for q in (25, 50, 75)]
    ylimits = [np.percentile(y, q) for q in (25, 50, 75)]

    ##the box
    box = Rectangle(
        (xlimits[0],ylimits[0]),
        (xlimits[2]-xlimits[0]),
        (ylimits[2]-ylimits[0]),
        ec = 'k',
        zorder=0
    )
    ax.add_patch(box)

    ##the x median
    vline = Line2D(
        [xlimits[1],xlimits[1]],[ylimits[0],ylimits[2]],
        color='k',
        zorder=1
    )
    ax.add_line(vline)

    ##the y median
    hline = Line2D(
        [xlimits[0],xlimits[2]],[ylimits[1],ylimits[1]],
        color='k',
        zorder=1
    )
    ax.add_line(hline)

    ##the central point
    ax.plot([xlimits[1]],[ylimits[1]], color='k', marker='o')

    ##the x-whisker
    ##defined as in matplotlib boxplot:
    ##As a float, determines the reach of the whiskers beyond the
    ##first and third quartiles. In other words, where IQR is the
    ##interquartile range (Q3-Q1), the upper whisker will extend to the
    ##last datum less than Q3 + whis*IQR. Similarly, the lower whisker
    ##will extend to the first datum greater than Q1 - whis*IQR. Beyond
    ##the whiskers, data are considered outliers and are plotted as
    ##individual points. Set this to an unreasonably high value to force
    ##the whiskers to show the min and max values. Alternatively, set this
    ##to an ascending sequence of percentiles (e.g., [5, 95]) to set the
    ##whiskers at specific percentiles of the data. Finally, whis can
    ##be the string 'range' to force the whiskers to the min and max of
    ##the data.
    ##(Only the float form of whis is handled by this function.)
    iqr = xlimits[2]-xlimits[0]

    ##left
    left = np.min(x[x > xlimits[0]-whis*iqr])
    whisker_line = Line2D(
        [left, xlimits[0]], [ylimits[1],ylimits[1]],
        color = 'k',
        zorder = 1
    )
    ax.add_line(whisker_line)
    whisker_bar = Line2D(
        [left, left], [ylimits[0],ylimits[2]],
        color = 'k',
        zorder = 1
    )
    ax.add_line(whisker_bar)

    ##right
    right = np.max(x[x < xlimits[2]+whis*iqr])
    whisker_line = Line2D(
        [right, xlimits[2]], [ylimits[1],ylimits[1]],
        color = 'k',
        zorder = 1
    )
    ax.add_line(whisker_line)
    whisker_bar = Line2D(
        [right, right], [ylimits[0],ylimits[2]],
        color = 'k',
        zorder = 1
    )
    ax.add_line(whisker_bar)

    ##the y-whisker
    iqr = ylimits[2]-ylimits[0]

    ##bottom
    bottom = np.min(y[y > ylimits[0]-whis*iqr])
    whisker_line = Line2D(
        [xlimits[1],xlimits[1]], [bottom, ylimits[0]], 
        color = 'k',
        zorder = 1
    )
    ax.add_line(whisker_line)
    whisker_bar = Line2D(
        [xlimits[0],xlimits[2]], [bottom, bottom], 
        color = 'k',
        zorder = 1
    )
    ax.add_line(whisker_bar)

    ##top
    top = np.max(y[y < ylimits[2]+whis*iqr])
    whisker_line = Line2D(
        [xlimits[1],xlimits[1]], [top, ylimits[2]], 
        color = 'k',
        zorder = 1
    )
    ax.add_line(whisker_line)
    whisker_bar = Line2D(
        [xlimits[0],xlimits[2]], [top, top], 
        color = 'k',
        zorder = 1
    )
    ax.add_line(whisker_bar)

    ##outliers
    mask = (x<left)|(x>right)|(y<bottom)|(y>top)
    ax.scatter(
        x[mask],y[mask],
        facecolors='none', edgecolors='k'
    )

#the figure and axes
fig,(ax1,ax2) = plt.subplots(ncols=2)

#some fake data
x = np.random.rand(1000)**2
y = np.sqrt(np.random.rand(1000))
#x = np.random.rand(1000)
#y = np.random.rand(1000)

#plotting the original data
ax1.scatter(x,y,c='r', s=1)

#doing the box plot
boxplot_2d(x,y,ax=ax2, whis=1)

plt.show()
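The whisker rule quoted in the comments of boxplot_2d can also be looked at in isolation. The sketch below uses a hypothetical helper, whisker_limits, that is not part of the answer. It assumes inclusive comparisons (>= and <=) rather than the strict ones used above, so that constant data does not produce an empty selection.

```python
import numpy as np


def whisker_limits(values, whis=1.5):
    """Return (low, high): the most extreme data points within whis*IQR of the box.

    Points outside [Q1 - whis*IQR, Q3 + whis*IQR] are treated as outliers
    and therefore do not influence the whisker positions.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    # Inclusive bounds (an assumption of this sketch, see the text above).
    inside = values[(values >= q1 - whis * iqr) & (values <= q3 + whis * iqr)]
    return inside.min(), inside.max()
```

For the values 1 to 9 plus a single 100, the upper whisker stops at 9 and the 100 is left to be drawn as an outlier.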

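The boxplot_2d function could of course be made more convenient by accepting keyword arguments and forwarding them to the Rectangle and Line2D calls. The final result looks like this:

[image: result of the above code]

The left panel shows a scatter plot of the actual data, and the right panel shows the resulting 2D box plot. Hope this helps.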
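As a rough sketch of the keyword-argument idea mentioned above, the drawing could be condensed and made stylable per group. This is one possible variant under stated assumptions, not the answer's code: the name boxplot_2d_kw, the parameters box_kw, line_kw and flier_kw, the returned outlier mask, and the inclusive whisker bounds are all hypothetical choices made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.lines import Line2D


def _limits(values, whis):
    """Return (q1, median, q3, low whisker, high whisker) for a 1-D array."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    inside = values[(values >= q1 - whis * iqr) & (values <= q3 + whis * iqr)]
    return q1, med, q3, inside.min(), inside.max()


def boxplot_2d_kw(x, y, ax, whis=1.5, box_kw=None, line_kw=None, flier_kw=None):
    """Draw a 2D box plot on ax and return the boolean outlier mask.

    The *_kw dicts override the defaults below; use full property names
    (e.g. 'edgecolor', not 'ec') so they do not clash with the defaults.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    box_kw = {"edgecolor": "k", "facecolor": "none", "zorder": 0, **(box_kw or {})}
    line_kw = {"color": "k", "zorder": 1, **(line_kw or {})}
    flier_kw = {"facecolors": "none", "edgecolors": "k", **(flier_kw or {})}

    xq1, xmed, xq3, left, right = _limits(x, whis)
    yq1, ymed, yq3, bottom, top = _limits(y, whis)

    # the box
    ax.add_patch(Rectangle((xq1, yq1), xq3 - xq1, yq3 - yq1, **box_kw))

    segments = [
        ([xmed, xmed], [yq1, yq3]),        # x median
        ([xq1, xq3], [ymed, ymed]),        # y median
        ([left, xq1], [ymed, ymed]),       # left whisker
        ([left, left], [yq1, yq3]),        # left whisker bar
        ([xq3, right], [ymed, ymed]),      # right whisker
        ([right, right], [yq1, yq3]),      # right whisker bar
        ([xmed, xmed], [bottom, yq1]),     # bottom whisker
        ([xq1, xq3], [bottom, bottom]),    # bottom whisker bar
        ([xmed, xmed], [yq3, top]),        # top whisker
        ([xq1, xq3], [top, top]),          # top whisker bar
    ]
    for xs, ys in segments:
        ax.add_line(Line2D(xs, ys, **line_kw))

    # outliers: anything beyond a whisker in either dimension
    mask = (x < left) | (x > right) | (y < bottom) | (y > top)
    ax.scatter(x[mask], y[mask], **flier_kw)
    ax.autoscale_view()
    return mask


def demo_two_groups(seed=0):
    """Draw two groups of fake data in different colors on one axes."""
    rng = np.random.default_rng(seed)
    fig, ax = plt.subplots()
    for color, shift in (("C0", 0.0), ("C1", 1.0)):
        x = rng.random(500) ** 2 + shift
        y = np.sqrt(rng.random(500)) + shift
        boxplot_2d_kw(
            x, y, ax, whis=1,
            box_kw={"edgecolor": color},
            line_kw={"color": color},
            flier_kw={"edgecolors": color},
        )
    return fig
```

Calling demo_two_groups() followed by plt.show() should display both groups on the same axes, which is closer to the two-group case in the question.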

1
Thank you very much, I was trying to do it the same way. It seems I was a bit slow :) ! - Timothée ZARAGORI
