How to calculate the cumulative normal distribution?

Question

python numpy scipy statistics

137

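I am looking for a function in NumPy or SciPy (or any rigorous Python library) that gives me the cumulative distribution function of the normal distribution.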

- toma

8 Answers

57

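It may be too late to answer this question, but since Google still leads people here, I decided to write up my solution.

Since Python 2.7, the math library has included the error function math.erf(x).

The erf() function can be used to compute traditional statistical functions such as the cumulative standard normal distribution:

from math import erf, sqrt

def phi(x):
    """Cumulative distribution function for the standard normal distribution."""
    return (1.0 + erf(x / sqrt(2.0))) / 2.0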

References:

https://docs.python.org/2/library/math.html

https://docs.python.org/3/library/math.html

How are the error function and the standard normal distribution function related?
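
The relationship behind the phi() formula can be sketched with a change of variables. This derivation assumes the usual definitions of erf and of the standard normal CDF Φ; it is not taken from the linked page.

```latex
\begin{aligned}
\operatorname{erf}(z) &= \frac{2}{\sqrt{\pi}} \int_0^{z} e^{-t^2}\,dt, \\
\Phi(x) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du
         = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_0^{x} e^{-u^2/2}\,du \\
        &= \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_0^{x/\sqrt{2}} e^{-t^2}\,dt
           \qquad (u = \sqrt{2}\,t) \\
        &= \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right].
\end{aligned}
```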

- WTIFS

3

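This is exactly what I was looking for. If anyone wonders how to use it to calculate the "percentage of data that lies within the standard distribution": 1 - (1 - phi(1)) * 2 = 0.6827 ("68% of the data lies within 1 standard deviation"). - Hannes Landeholm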
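
The arithmetic in the comment above can be checked with a short sketch. The helper name fraction_within is made up for illustration; phi() repeats the formula from the answer.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, same formula as in the answer."""
    return (1.0 + erf(x / sqrt(2.0))) / 2.0

def fraction_within(k):
    """Probability that a standard normal value lies within k standard deviations of the mean."""
    # Equivalent to 1 - (1 - phi(k)) * 2 by symmetry of the distribution.
    return phi(k) - phi(-k)

for k in (1, 2, 3):
    # Rounded to 4 places these are 0.6827, 0.9545 and 0.9973.
    print(k, round(fraction_within(k), 4))
```

For k = 1 this reproduces the 0.6827 quoted in the comment.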

4

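For a general normal distribution, the function would be def phi(x, mu, sigma): return (1 + erf((x - mu) / sigma / sqrt(2))) / 2, where mu is the mean, sigma is the standard deviation, erf is the error function, and sqrt is the square root. - Bernhard Barker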
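
Written out as a runnable function, the one-liner from the comment above might look like the sketch below. The default arguments, the sigma check, and the sample values (mu=10, sigma=2, x=12) are assumptions added for illustration. The cross-check against statistics.NormalDist needs Python 3.8 or later.

```python
from math import erf, sqrt
from statistics import NormalDist  # Python 3.8+

def phi(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (1.0 + erf((x - mu) / (sigma * sqrt(2.0)))) / 2.0

# Illustrative values: x = 12 is one standard deviation above the mean,
# so both lines should print roughly 0.8413.
print(phi(12.0, mu=10.0, sigma=2.0))
print(NormalDist(mu=10.0, sigma=2.0).cdf(12.0))
```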

55

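Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.

It can be used to get the cumulative distribution function (cdf - the probability that a random sample X will be less than or equal to x) for a given mean (mu) and standard deviation (sigma):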

from statistics import NormalDist

NormalDist(mu=0, sigma=1).cdf(1.96)
# 0.9750021048517796

For the standard normal distribution (mu = 0, sigma = 1), this simplifies to:

NormalDist().cdf(1.96)
# 0.9750021048517796

NormalDist().cdf(-1.96)
# 0.024997895148220428

- Xavier Guihot

9

Based on some quick checks, this is substantially faster than norm.cdf from scipy.stats, and also faster than the erf implementations in both scipy and math. - dcl

2

Can this be vectorized? Or, if you need to evaluate the CDF at every point of an array, should you use the SciPy implementation instead? - hasManyStupidQuestions
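
On the vectorization question: NormalDist.cdf takes one number per call, so with the standard library alone the evaluation becomes an explicit loop, as in this minimal sketch (the sample points are made up). For large arrays, scipy.stats.norm.cdf accepts NumPy arrays directly, so the SciPy route is usually the more natural choice.

```python
from statistics import NormalDist  # Python 3.8+

dist = NormalDist(mu=0.0, sigma=1.0)

# Illustrative sample points; cdf() is called once per element.
points = [-1.96, 0.0, 1.96]
cdf_values = [dist.cdf(x) for x in points]

print(cdf_values)
```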

1

Awesome. Do you happen to know how to get the inverse function (normsinv)? Edit: OK, it's inv_cdf(). Thanks! - Juozas
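
A small round-trip sketch of inv_cdf(), the method named in the comment above. The variable names are chosen for illustration only.

```python
from statistics import NormalDist  # Python 3.8+

std_normal = NormalDist()

# Forward: probability that a standard normal value is <= 1.96.
p = std_normal.cdf(1.96)

# Inverse: the point whose cumulative probability is p.
z = std_normal.inv_cdf(p)

print(p, z)  # z comes back as approximately 1.96
```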

19

Adapted from here: http://mail.python.org/pipermail/python-list/2000-June/039873.html

from math import exp
def erfcc(x):
    """Complementary error function."""
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if (x >= 0.):
        return r
    else:
        return 2. - r

def ncdf(x):
    return 1. - 0.5*erfcc(x/(2**0.5))

- Unknown

5

Since the standard library already implements math.erf(), there is no need for a separate implementation. - Marc

1

I couldn't find an answer: where do these numbers come from? - TmSmth

1

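@TmSmth If I had to guess, this looks like some kind of approximation of what is inside the exponential, so you could probably compute the coefficients after massaging the function a bit (change variables, then write r = t * exp(-z**2 - f(t))) and doing a Taylor expansion of f (which can be found numerically). - Nephanth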
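
The sketch below does not settle where the coefficients come from. It only measures how closely the approximation tracks math.erfc from the standard library. The grid from -5 to 5 is an arbitrary choice for illustration.

```python
from math import erfc, exp

def erfcc(x):
    """Approximate complementary error function, copied from the answer."""
    z = abs(x)
    t = 1.0 / (1.0 + 0.5 * z)
    r = t * exp(-z * z - 1.26551223 + t * (1.00002368 + t * (0.37409196 +
        t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 +
        t * (-1.13520398 + t * (1.48851587 + t * (-0.82215223 +
        t * 0.17087277)))))))))
    return r if x >= 0.0 else 2.0 - r

# Compare against the library implementation on an evenly spaced grid.
grid = [i / 100.0 for i in range(-500, 501)]
max_abs_error = max(abs(erfcc(x) - erfc(x)) for x in grid)

print(max_abs_error)
```

On this grid the largest absolute difference stays below one part in a million, which supports the reading that the constants are a fitted approximation and not exact values.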

18

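Building on Unknown's example, a Python equivalent of the normdist() function implemented in many libraries would be:

from math import exp, pi, sqrt

# erfcc() is the complementary error function defined in Unknown's answer.

def normcdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    t = x - mu
    y = 0.5 * erfcc(-t / (sigma * sqrt(2.0)))
    if y > 1.0:
        y = 1.0
    return y

def normpdf(x, mu, sigma):
    """Probability density function of a normal distribution."""
    u = (x - mu) / abs(sigma)
    y = (1 / (sqrt(2 * pi) * abs(sigma))) * exp(-u * u / 2)
    return y

def normdist(x, mu, sigma, f):
    """Return the CDF if f is true, otherwise the PDF."""
    if f:
        y = normcdf(x, mu, sigma)
    else:
        y = normpdf(x, mu, sigma)
    return y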

- Cerin

13

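Alex's answer shows you a solution for the standard normal distribution (mean = 0, standard deviation = 1). If you have a normal distribution with a given mean and std (which is sqrt(var)) and you want to calculate probabilities:

from scipy.stats import norm

# val, v1 and v2 are the points of interest; m is the mean, s the standard deviation.

# P(x < val)
print(norm.cdf(val, m, s))

# P(x > val)
print(1 - norm.cdf(val, m, s))

# P(v1 < x < v2)
print(norm.cdf(v2, m, s) - norm.cdf(v1, m, s))

You can read more about the cumulative distribution function, and about the SciPy implementation of the normal distribution with its many formulas, here.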
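
As a worked illustration of the three probabilities above, here is a sketch with made-up numbers (mean 100, standard deviation 15). It uses statistics.NormalDist from the standard library (Python 3.8+) instead of SciPy so that it runs without extra packages. The norm.cdf(val, m, s) calls from the answer should give the same values.

```python
from statistics import NormalDist  # Python 3.8+

# Hypothetical parameters, chosen only for illustration.
m, s = 100.0, 15.0
val, v1, v2 = 115.0, 85.0, 115.0

dist = NormalDist(mu=m, sigma=s)

p_below = dist.cdf(val)                   # P(x < val)
p_above = 1.0 - dist.cdf(val)             # P(x > val)
p_between = dist.cdf(v2) - dist.cdf(v1)   # P(v1 < x < v2)

# Roughly 0.8413, 0.1587 and 0.6827, since 115 and 85 sit
# one standard deviation above and below the mean.
print(p_below, p_above, p_between)
```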

- Salvador Dali

2

Taken from above:

>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435

For a two-tailed test:

import numpy as np
from scipy.stats import norm

z = 1.96
p_value = 2 * norm.cdf(-np.abs(z))
# 0.04999579029644087
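
The same two-tailed calculation can be sketched with only the standard library (Python 3.8+). The function name two_tailed_p_value is hypothetical. It doubles the lower-tail probability at -|z|, relying on the symmetry of the standard normal distribution.

```python
from statistics import NormalDist  # Python 3.8+

def two_tailed_p_value(z):
    """Two-tailed p-value for a z statistic under the standard normal distribution."""
    # The two tails are symmetric, so double the lower tail at -|z|.
    return 2.0 * NormalDist().cdf(-abs(z))

print(two_tailed_p_value(1.96))   # approximately 0.05
print(two_tailed_p_value(-1.96))  # same value, since only |z| matters
```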

- David Miller

0

It is as simple as this:

import math
def my_cdf(x):
    return 0.5*(1+math.erf(x/math.sqrt(2)))

I found the formula on this page: https://www.danielsoper.com/statcalc/formulas.aspx?id=55

- Samuel Corradi

- Alex Reynolds · Accepted Answer

163

Here's an example:

>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435

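In other words, about 95% of the standard normal distribution lies within roughly two standard deviations (1.96, to be precise) of its mean of zero.

If you need the inverse CDF: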

>>> norm.ppf(norm.cdf(1.96))
array(1.9599999999999991)

- Alex Reynolds

14

Also, you can specify the mean (loc) and variance (scale) as parameters. For example, d = norm(loc=10.0, scale=2.0); d.cdf(12.0). Details here: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.norm.html - Irvan

9

@Irvan, the scale parameter is actually the standard deviation, not the variance. - qkhhly

2

Why does scipy name these "loc" and "scale"? I used help(norm.ppf), but what on earth do "loc" and "scale" mean? I need help understanding the help. - WestCoastProjects

4

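"Location" and "scale" are more general terms in statistics that are used to parameterize a wide range of distributions. For the normal distribution they line up with the mean and the standard deviation, but that is not the case for other distributions. - Michael Ohlrogge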
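
To make the standard-deviation-versus-variance point concrete, here is a sketch using statistics.NormalDist (Python 3.8+) as a stand-in, with loc=10.0 and scale=2.0 taken from the earlier comment. The variable names are illustrative. SciPy's norm(loc=10.0, scale=2.0).cdf(12.0) agrees with the first value, because scale is read as the standard deviation.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+

loc, scale = 10.0, 2.0

# scale interpreted as the standard deviation (what SciPy does).
as_std = NormalDist(mu=loc, sigma=scale).cdf(12.0)

# scale misread as the variance, so the standard deviation would be sqrt(2).
as_var = NormalDist(mu=loc, sigma=sqrt(scale)).cdf(12.0)

print(as_std, as_var)  # roughly 0.84 versus 0.92
```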

1

@MichaelOhlrogge. Thanks! Here is a page from NIST that explains this further: http://www.itl.nist.gov/div898/handbook/eda/section3/eda364.htm - WestCoastProjects
