使用Matplotlib创建概率/频率轴网格（不规则间距）

Question

使用Matplotlib创建概率/频率轴网格（不规则间距）

pythonmatplotlibplotprobability

11

我正在尝试创建一个频率曲线图，但我在调整坐标轴以获得所需的绘图时遇到了问题。

这是我试图创建的期望网格/绘图的示例：

Frequency Curve Example

这是我用matplotlib创建的结果：

Current Matplotlib Result

为了创建此绘图中的网格，我使用了以下代码：

m1 = pd.np.arange(.2, 1, .1)
m2 = pd.np.arange(1, 2, .2)
m3 = pd.np.arange(2, 10, 2)
m4 = pd.np.arange(2, 20, 1)
m5 = pd.np.arange(20, 80, 2)
m6 = pd.np.arange(80, 98, 1)
xTick_minor = pd.np.concatenate((m1,m2,m3,m4, m5, m6))
xTick_major = pd.np.array([.2,.5,1,2,5,10,20,30,40,50,60,70,80,90,95,98])

m1 = range(0, 250, 50)
m2 = range(250, 500, 10)
m3 = range(500, 1000, 20)
m4 = range(1000, 5000, 100)
m5 = range(5000, 10000, 200)
m6 = range(10000, 50000, 1000)

yTick_minor = pd.np.concatenate((m1,m2,m3,m4,m5,m6))
yTick_major = pd.np.array([250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000])

axes.invert_xaxis()
axes.set_ylabel('Discharge in CFS')
axes.set_xlabel('Exceedance Probability')
axes.xaxis.set_major_formatter(FormatStrFormatter('%3.1f'))
axes.set_xticks(xTick_major)
axes.set_xticks(xTick_minor, minor=True)
axes.grid(which='major', alpha=0.7)
axes.grid(which='minor', alpha=0.4)

axes.set_yticks(yTick_major)
axes.set_yticks(yTick_minor, minor=True)

网格是正确的，但我现在想要做的是确保在显示中，低概率范围和低放电值（y轴）之间的间隔更大。实质上，我想控制刻度之间的间距，而不是刻度间隔本身，使得从0.2到0.5的范围与x轴上40到50之间的范围类似，就像所需网格一样。

这可以在matplotlib中完成吗？我已经阅读了有关tick_params和locators的文档，但是没有一个能够解决这种类型的轴格式化问题。

- Nelz11

y轴采用对数刻度。您可以通过设置“axes.yaxis.set_scale ('log')”轻松实现此目的。而“x轴”比较奇怪，我建议将其线性或对数缩放。其他方式可能会让读者感到困惑。 - hitzg

@hitzg 我考虑过使用对数刻度，这是合理的，但更难管理网格和标签。我同意x轴很奇怪。我本质上是在尝试模拟概率纸。我可以用我提供的代码做到这一点，但它显示的方式不是我想要的。 - Nelz11

此外，即使使用axes.yaxis.set_scale('log')，仍然无法解决在相同“刻度”下使较低范围可见的问题。结果会导致网格紧密排布在底部并稀疏分散在顶部，这并不是期望的结果。 - Nelz11

2个回答

7

多亏了@Amy Teegarden的帮助，我终于得出了正确的解决方案。为了让其他人参考，我想在这里分享最终的解决方案！以下是最终结果：

洪水频率曲线概率图

下面是一个真正的概率轴比例尺，使用正态累积分布函数和其反函数PPF，参数化为mu和sigma。

import numpy as np
from matplotlib import scale as mscale
from matplotlib import transforms as mtransforms
from matplotlib.ticker import FormatStrFormatter, FixedLocator
from scipy.stats import norm

class ProbScale(mscale.ScaleBase):
    """
    Scales data in range 0 to 100 using a non-standard log transform
    This scale attempts to replicate "probability paper" scaling

    The scale function:
        A piecewise combination of exponential, linear, and logarithmic scales

    The inverse scale function:
      piecewise combination of exponential, linear, and logarithmic scales

    Since probabilities at 0 and 100 are not represented,
    there is user-defined upper and lower limit, above and below which nothing
    will be plotted.  This defaults to .1 and 99 for lower and upper, respectively.

    """

    # The scale class must have a member ``name`` that defines the
    # string used to select the scale.  For example,
    # ``gca().set_yscale("mercator")`` would be used to select this
    # scale.
    name = 'prob_scale'


    def __init__(self, axis, **kwargs):
        """
        Any keyword arguments passed to ``set_xscale`` and
        ``set_yscale`` will be passed along to the scale's
        constructor.

        upper: The probability above which to crop the data.
        lower: The probability below which to crop the data.
        """
        mscale.ScaleBase.__init__(self)
        upper = kwargs.pop("upper", 98) #Default to an upper bound of 98%
        if upper <= 0 or upper >= 100:
            raise ValueError("upper must be between 0 and 100.")
        lower = kwargs.pop("lower", 0.2) #Default to a lower bound of .2%
        if lower <= 0 or lower >= 100:
            raise ValueError("lower must be between 0 and 100.")
        if lower >= upper:
            raise ValueError("lower must be strictly less than upper!.")
        self.lower = lower
        self.upper = upper

        #This scale is best described by the CDF of the normal distribution
        #This distribution is paramaterized by mu and sigma, these default vaules
        #are provided to work generally well, but can be adjusted by the user if desired
        mu = kwargs.pop("mu", 15)
        sigma = kwargs.pop("sigma", 40)
        self.mu = mu
        self.sigma = sigma
        #Need to enfore the upper and lower limits on the axes initially
        axis.axes.set_xlim(lower,upper)

    def get_transform(self):
        """
        Override this method to return a new instance that does the
        actual transformation of the data.

        The ProbTransform class is defined below as a
        nested class of this one.
        """
        return self.ProbTransform(self.lower, self.upper, self.mu, self.sigma)

    def set_default_locators_and_formatters(self, axis):
        """
        Override to set up the locators and formatters to use with the
        scale.  This is only required if the scale requires custom
        locators and formatters.  Writing custom locators and
        formatters: many helpful examples in ``ticker.py``.

        In this case, the prob_scale uses a fixed locator from
        0.1 to 99 % and a custom no formatter class

        This builds both the major and minor locators, and cuts off any values
        above or below the user defined thresholds: upper, lower
        """
        #major_ticks = np.asarray([.2,.5,1,2,5,10,20,30,40,50,60,70,80,90,95,98])
        major_ticks = np.asarray([.2,1,2,5,10,20,30,40,50,60,70,80,90,98]) #removed a couple ticks to make it look nicer
        major_ticks = major_ticks[np.where( (major_ticks >= self.lower) & (major_ticks <= self.upper) )]

        minor_ticks = np.concatenate( [np.arange(.2, 1, .1), np.arange(1, 2, .2), np.arange(2,20,1), np.arange(20, 80, 2), np.arange(80, 98, 1)] )
        minor_ticks = minor_ticks[np.where( (minor_ticks >= self.lower) & (minor_ticks <= self.upper) )]
        axis.set_major_locator(FixedLocator(major_ticks))
        axis.set_minor_locator(FixedLocator(minor_ticks))


    def limit_range_for_scale(self, vmin, vmax, minpos):
        """
        Override to limit the bounds of the axis to the domain of the
        transform.  In the case of Probability, the bounds should be
        limited to the user bounds that were passed in.  Unlike the
        autoscaling provided by the tick locators, this range limiting
        will always be adhered to, whether the axis range is set
        manually, determined automatically or changed through panning
        and zooming.
        """
        return max(self.lower, vmin), min(self.upper, vmax)

    class ProbTransform(mtransforms.Transform):
        # There are two value members that must be defined.
        # ``input_dims`` and ``output_dims`` specify number of input
        # dimensions and output dimensions to the transformation.
        # These are used by the transformation framework to do some
        # error checking and prevent incompatible transformations from
        # being connected together.  When defining transforms for a
        # scale, which are, by definition, separable and have only one
        # dimension, these members should always be set to 1.
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, upper, lower, mu, sigma):
            mtransforms.Transform.__init__(self)
            self.upper = upper
            self.lower = lower
            self.mu = mu
            self.sigma = sigma

        def transform_non_affine(self, a):
            """
            This transform takes an Nx1 ``numpy`` array and returns a
            transformed copy.  Since the range of the Probability scale
            is limited by the user-specified threshold, the input
            array must be masked to contain only valid values.
            ``matplotlib`` will handle masked arrays and remove the
            out-of-range data from the plot.  Importantly, the
            ``transform`` method *must* return an array that is the
            same shape as the input array, since these values need to
            remain synchronized with values in the other dimension.
            """

            masked = np.ma.masked_where( (a < self.upper) & (a > self.lower) , a)
            #Get the CDF of the normal distribution located at mu and scaled by sigma
            #Multiply these by 100 to put it into a percent scale
            cdf = norm.cdf(masked, self.mu, self.sigma)*100
            return cdf

        def inverted(self):
            """
            Override this method so matplotlib knows how to get the
            inverse transform for this transform.
            """
            return ProbScale.InvertedProbTransform(self.lower, self.upper, self.mu, self.sigma)

    class InvertedProbTransform(mtransforms.Transform):
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, lower, upper, mu, sigma):
            mtransforms.Transform.__init__(self)
            self.lower = lower
            self.upper = upper
            self.mu = mu
            self.sigma = sigma

        def transform_non_affine(self, a):
            #Need to get the PPF value for a, which is in a percent scale [0,100], so move back to probability range [0,1]
            inverse = norm.ppf(a/100, self.mu, self.sigma)
            return inverse

        def inverted(self):
            return ProbScale.ProbTransform(self.lower, self.upper)

# Now that the Scale class has been defined, it must be registered so
# that ``matplotlib`` can find it.
mscale.register_scale(ProbScale)

另外，为了获得所需的y轴结果，我发现对数刻度确实可以使用一些额外的调整来获得合理的图形。以下是用于强制对数刻度具有适当次要刻度的代码:

axes.set_yscale('log', basey=10, subsy=[2,3,4,5,6,7,8,9])

然后，您可以使用定位器和格式化程序修改标签：

#Adjust the yaxis labels and format
axes.yaxis.set_minor_locator(FixedLocator([200, 500, 1500, 2500, 3500, 4500, 5000, 6000, 7000, 8000, 9000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000]))
axes.yaxis.set_minor_formatter(FormatStrFormatter('%d'))
axes.yaxis.set_major_formatter(FormatStrFormatter('%d'))

所以完整的轴操作如下所示：

axes.set_ylabel('Discharge in CFS')
axes.set_xlabel('Exceedance Probability')
plt.setp(plt.xticks()[1], rotation=45)
#Adjust the scales of the x and y axis
axes.set_yscale('log', basey=10, subsy=[2,3,4,5,6,7,8,9])
axes.set_xscale('prob_scale', upper=98, lower=.2)
#Adjust the yaxis labels and format
axes.yaxis.set_minor_locator(FixedLocator([200, 500, 1500, 2500, 3500, 4500, 5000, 6000, 7000, 8000, 9000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000]))
axes.yaxis.set_minor_formatter(FormatStrFormatter('%d'))
axes.yaxis.set_major_formatter(FormatStrFormatter('%d'))

#Finally set the y-limit of the plot to be reasonable
axes.set_ylim((0, 2*pp['Q'].max()))
#Invert the x-axis
axes.invert_xaxis()
#Turn on major and minor grid lines
axes.grid(which='both', alpha=.9)

这提供了一个半对数概率纸图！其良好特性是任何绘制在这些轴上的东西，如果呈直线状，则表明它来自正态分布！

- Nelz11

在class ProbTransform中，方法transform_non_affine应该使用PPF（并使用0-1比例），而不是CDF。相反，对于Inverted来说，PPF实际上应该是CDF。这样就可以创建正确的正常缩放，如原始问题示例图所示。干得好。 - Q-man

最快的解释是将您展示的图形与您试图模仿的原始扫描图进行比较。考虑两个图中40-30％的箱子宽度与90-80％的宽度。在您的图中，40-30％更大，而在原始图中，40-30％相对于90-80％较小。我们有兴趣“压缩”分布的中间部分并拉伸肩膀，而不是像您展示的那样相反。 - Q-man

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Amy Teegarden · Accepted Answer

您可以定义一个自定义标尺来替代'log'，用于x轴。不幸的是，它很复杂，您需要找出一个函数，让您将给定的x轴数字转换为更线性的形式。请查看http://matplotlib.org/examples/api/custom_scale_example.html。

编辑添加：

问题非常有趣，我决定自己制作自定义轴。我修改了来自链接的代码以使其适用于您的示例。我很想看到它是否按照您想要的方式工作。

编辑：新和改进的代码！间距不如之前均匀，但是当您将点列表传递给plt.gca().set_xscale（请参见代码末尾的示例时），它现在会自动完成。它进行曲线拟合，将这些点拟合到一个逻辑函数上，并使用得到的参数作为转换的基础。当我运行此代码时，我会收到警告（警告：将屏蔽元素转换为nan）。我仍然没有弄清楚发生了什么，但似乎没有造成问题。这是我生成的图像：

import numpy as np
from numpy import ma
from matplotlib import scale as mscale
from matplotlib import transforms as mtransforms
from matplotlib.ticker import Formatter, FixedLocator
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    """Logistic function (s-curve)."""
    return L / (1 + np.exp(-k * (x - x0)))

class ProbabilityScale(mscale.ScaleBase):
    """
    Scales data so that points along a logistic curve become evenly spaced.
    """

    # The scale class must have a member ``name`` that defines the
    # string used to select the scale.  For example,
    # ``gca().set_yscale("probability")`` would be used to select this
    # scale.
    name = 'probability'


    def __init__(self, axis, **kwargs):
        """
        Any keyword arguments passed to ``set_xscale`` and
        ``set_yscale`` will be passed along to the scale's
        constructor.

        lower_bound: Minimum value of x. Defaults to .01.
        upper_bound_dist: L - upper_bound_dist is the maximum value
        of x. Defaults to lower_bound.

        """
        mscale.ScaleBase.__init__(self)
        lower_bound = kwargs.pop("lower_bound", .01)
        if lower_bound <= 0:
            raise ValueError("lower_bound must be greater than 0")
        self.lower_bound = lower_bound
        upper_bound_dist = kwargs.pop("upper_bound_dist", lower_bound)
        self.points = kwargs['points']
        #determine parameters of logistic function with curve fitting
        x = np.linspace(0, 1, len(self.points))
        #initial guess for parameters
        p0 = [max(self.points), 1, .5]
        popt, pcov = curve_fit(logistic, x, self.points, p0 = p0)
        [self.L, self.k, self.x0] = popt
        self.upper_bound = self.L - upper_bound_dist

    def get_transform(self):
        """
        Override this method to return a new instance that does the
        actual transformation of the data.

        The ProbabilityTransform class is defined below as a
        nested class of this one.
        """
        return self.ProbabilityTransform(self.lower_bound, self.upper_bound, self.L, self.k, self.x0)

    def set_default_locators_and_formatters(self, axis):
        """
        Override to set up the locators and formatters to use with the
        scale.  This is only required if the scale requires custom
        locators and formatters.  Writing custom locators and
        formatters is rather outside the scope of this example, but
        there are many helpful examples in ``ticker.py``.
        """


        axis.set_major_locator(FixedLocator(self.points))

    def limit_range_for_scale(self, vmin, vmax, minpos):
        """
        Override to limit the bounds of the axis to the domain of the
        transform.  In this case, the bounds should be
        limited to the threshold that was passed in.  Unlike the
        autoscaling provided by the tick locators, this range limiting
        will always be adhered to, whether the axis range is set
        manually, determined automatically or changed through panning
        and zooming.
        """
        return max(vmin, self.lower_bound), min(vmax, self.upper_bound)

    class ProbabilityTransform(mtransforms.Transform):
        # There are two value members that must be defined.
        # ``input_dims`` and ``output_dims`` specify number of input
        # dimensions and output dimensions to the transformation.
        # These are used by the transformation framework to do some
        # error checking and prevent incompatible transformations from
        # being connected together.  When defining transforms for a
        # scale, which are, by definition, separable and have only one
        # dimension, these members should always be set to 1.
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, lower_bound, upper_bound, L, k, x0):
            mtransforms.Transform.__init__(self)
            self.lower_bound = lower_bound
            self.L = L
            self.k = k
            self.x0 = x0
            self.upper_bound = upper_bound
        def transform_non_affine(self, a):
            """
            This transform takes an Nx1 ``numpy`` array and returns a
            transformed copy.  Since the range of the scale
            is limited by the user-specified threshold, the input
            array must be masked to contain only valid values.
            ``matplotlib`` will handle masked arrays and remove the
            out-of-range data from the plot.  Importantly, the
            ``transform`` method *must* return an array that is the
            same shape as the input array, since these values need to
            remain synchronized with values in the other dimension.
            """
            masked = ma.masked_where((a < self.lower_bound) | (a > self.upper_bound), a)
            return ma.log((self.L - masked) / masked) / -self.k + self.x0

        def inverted(self):
            """
            Override this method so matplotlib knows how to get the
            inverse transform for this transform.
            """
            return ProbabilityScale.InvertedProbabilityTransform(self.lower_bound, self.upper_bound, self.L, self.k, self.x0)

    class InvertedProbabilityTransform(mtransforms.Transform):
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, lower_bound, upper_bound, L, k, x0):
            mtransforms.Transform.__init__(self)
            self.lower_bound = lower_bound
            self.L = L
            self.k = k
            self.x0 = x0
            self.upper_bound = upper_bound

        def transform_non_affine(self, a):
            return self.L / (1 + np.exp(-self.k * (a - self.x0)))
        def inverted(self):
            return ProbabilityScale.ProbabilityTransform(self.lower_bound, self.upper_bound, self.L, self.k, self.x0)

# Now that the Scale class has been defined, it must be registered so
# that ``matplotlib`` can find it.
mscale.register_scale(ProbabilityScale)


if __name__ == '__main__':
    import matplotlib.pyplot as plt
    x = np.linspace(.1, 100, 1000)
    points = np.array([.2,.5,1,2,5,10,20,30,40,50,60,70,80,90,95,98])

    plt.plot(x, x)
    plt.gca().set_xscale('probability', points = points, vmin = .01)
    plt.grid(True)

    plt.show()