Statistics: combinations in Python

Question

141

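I need to compute the number of combinations (nCr) in Python, but I cannot find a function for it in the math, numpy or stat libraries. I am looking for something like a function of this type:

comb = calculate_combinations(n, r)

I need the number of possible combinations, not the actual combinations, so itertools.combinations is of no interest to me.

Finally, I want to avoid using factorials, because the numbers I will be computing combinations for can get very large and the factorials would be monstrous.

This seems like a really easy question to answer, but I am drowning in questions about generating all the actual combinations, which is not what I want.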

- Morlock

21 Answers

130

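Why not write it yourself? It is a one-liner or so.

from functools import reduce  # reduce is no longer a builtin in Python 3
from operator import mul      # or mul = lambda x, y: x * y
from fractions import Fraction

def nCk(n, k):
  # Multiply the exact fractions (n-i)/(i+1); the product is always an integer
  return int(reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1))

Test - printing Pascal's triangle:

>>> for n in range(17):
...     print(' '.join('%5d' % nCk(n, k) for k in range(n+1)).center(100))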
...     
                                                   1                                                
                                                1     1                                             
                                             1     2     1                                          
                                          1     3     3     1                                       
                                       1     4     6     4     1                                    
                                    1     5    10    10     5     1                                 
                                 1     6    15    20    15     6     1                              
                              1     7    21    35    35    21     7     1                           
                           1     8    28    56    70    56    28     8     1                        
                        1     9    36    84   126   126    84    36     9     1                     
                     1    10    45   120   210   252   210   120    45    10     1                  
                  1    11    55   165   330   462   462   330   165    55    11     1               
               1    12    66   220   495   792   924   792   495   220    66    12     1            
            1    13    78   286   715  1287  1716  1716  1287   715   286    78    13     1         
         1    14    91   364  1001  2002  3003  3432  3003  2002  1001   364    91    14     1      
      1    15   105   455  1365  3003  5005  6435  6435  5005  3003  1365   455   105    15     1   
    1    16   120   560  1820  4368  8008 11440 12870 11440  8008  4368  1820   560   120    16     1
>>>

PS.
Edited to replace:
int(round(reduce(mul, (float(n-i)/(i+1) for i in range(k)), 1)))
with:
int(reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1))
so that it does not go wrong for large n/k.
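The difference between the two versions in the PS can be checked directly. The following is a minimal sketch, not part of the original answer: it assumes Python 3.8+ so that math.comb can serve as the reference, and the names nck_float, nck_fraction and first_mismatch are made up for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import comb
from operator import mul

def nck_float(n, k):
    """The earlier version: a float product rounded at the end."""
    return int(round(reduce(mul, (float(n - i) / (i + 1) for i in range(k)), 1)))

def nck_fraction(n, k):
    """The edited version: exact rational arithmetic throughout."""
    return int(reduce(mul, (Fraction(n - i, i + 1) for i in range(k)), 1))

def first_mismatch(func, max_n):
    """Return the first (n, k) where func disagrees with math.comb, or None."""
    for n in range(max_n + 1):
        for k in range(n + 1):
            if func(n, k) != comb(n, k):
                return (n, k)
    return None
```

Calling first_mismatch(nck_float, 60) reports whether, and where, the rounded float product first diverges on a given machine; the Fraction version should report None for any bound, since no rounding ever takes place.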

- Nas Banov

28

+1 for suggesting writing something simple, for using reduce, and for the cool demo with Pascal's triangle. - jon_darkstar

7

-1 because this answer is wrong: print(factorial(54)/(factorial(54-27))/factorial(27) == nCk(54, 27)) gives False. - Rusty Rob

3

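@robertking - OK, you were both petty and technically correct. What I did was meant as an illustration of how to write your own function; I knew it was not accurate for large enough N and K because of floating-point precision. But we can fix that - see above, now it should not err for big numbers. - Nas Banov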

10

This might be fast in Haskell, but unfortunately not in Python. It is actually quite slow compared to many of the other answers, e.g. those by @Alex Martelli, J.F. Sebastian, and my own. - Todd Owen

9

For Python 3, I also had to import the reduce function from the functools module. - Velizar Hristov

55

A quick search in Google Code gives the following (it uses the formula from @Mark Byers's answer):

def choose(n, k):
    """
    A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        # range replaces Python 2's xrange; use the smaller of k and n - k
        for t in range(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

If you need an exact answer, choose() is 10 times faster than scipy.misc.comb() (tested on all pairs 0 <= (n,k) < 1e3).

def comb(N, k):  # from scipy.comb(), but MODIFIED!
    if (k > N) or (N < 0) or (k < 0):
        return 0
    N, k = map(int, (N, k))  # Python 3 int is arbitrary precision (was long)
    top = N
    val = 1
    while top > (N - k):
        val *= top
        top -= 1
    n = 1
    while n < k + 1:
        val //= n  # integer division, as "/" behaved on Python 2 longs
        n += 1
    return val
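The comparison described above (all pairs below some bound) can be reproduced roughly as follows. This is only a sketch, not the benchmark that was actually run: it assumes Python 3.8+, uses math.comb as both the correctness reference and the timing baseline, and check_and_time, limit and repeats are made-up names.

```python
import math
import timeit

def choose(n, k):
    """Multiplicative formula, same algorithm as the choose() above."""
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in range(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    return 0

def check_and_time(limit=100, repeats=3):
    """Compare choose() with math.comb on all 0 <= n, k < limit, then time both."""
    pairs = [(n, k) for n in range(limit) for k in range(limit)]
    mismatches = [(n, k) for n, k in pairs if choose(n, k) != math.comb(n, k)]
    timings = {
        f.__name__: min(
            timeit.repeat(lambda: [f(n, k) for n, k in pairs], number=1, repeat=repeats)
        )
        for f in (choose, math.comb)
    }
    return mismatches, timings
```

An empty mismatches list means both functions agree on every pair, including the k > n cases where both return 0. The timings dictionary holds the best wall-clock time in seconds for each function; the numbers depend heavily on the machine and Python version.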

- jfs

A nice solution that does not require any packages. - Edward Newell

2

FYI: the formula mentioned here is at https://en.wikipedia.org/wiki/Binomial_coefficient#Multiplicative_formula - jmiserez

4

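This choose function should have many more upvotes! Python 3.8 has math.comb, but I had to use Python 3.6 for a challenge, and no other implementation gave exact results for very large integers. This one does, and quickly! - reconn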

45

If you want exact results and speed, try gmpy: gmpy.comb does exactly what you ask for, and it is pretty fast (of course, as gmpy's original author, I am biased ;-).

- Alex Martelli

6

In fact, gmpy2.comb() is 10 times faster than choose() from my answer for the code: for k, n in itertools.combinations(range(1000), 2): f(n,k), where f() is either gmpy2.comb() or choose(), both on Python 3. - jfs

Since you are the author of the package, I will let you fix the broken link so that it points to the right place... - Seldom 'Where's Monica' Needy

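@SeldomNeedy, the link to code.google.com is one right place (though the site is in archive mode now). Of course, from there it is easy to find the github location, https://github.com/aleaxit/gmpy, and the PyPI one, https://pypi.python.org/pypi/gmpy2, as it links to both!-) - Alex Martelli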

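@AlexMartelli Sorry for the confusion. The page displays a 404 if JavaScript has been (selectively) disabled. I guess that is to discourage rogue AIs from incorporating archived Google Code project sources quite so easily? - Seldom 'Where's Monica' Needy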

1

Props to you, this was the fastest of the 17 different algorithms I tested in my answer. A shame it does not support fractions/decimals. - reticivis

30

If you want exact results, use sympy.binomial. It seems to be the fastest method, hands down.

x = 1000000
y = 234050

%timeit scipy.misc.comb(x, y, exact=True)
1 loops, best of 3: 1min 27s per loop

%timeit gmpy.comb(x, y)
1 loops, best of 3: 1.97 s per loop

%timeit int(sympy.binomial(x, y))
100000 loops, best of 3: 5.06 µs per loop

- Jim Garrison

1

sympy has a cache that timeit does not clear. In my testing, gmpy was about 264 times faster. - reticivis

28

Starting with Python 3.8, the standard library includes the math.comb function to compute the binomial coefficient:

math.comb(n, k)

It is the number of ways to choose k items from n items without repetition and without order, given by:
n! / (k! (n - k)!)

import math
math.comb(10, 5) # 252
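
A few edge cases are worth knowing before swapping math.comb in for a hand-written function. The snippet below is a small sketch based on the documented behaviour of math.comb in Python 3.8+: it evaluates to zero when k > n and raises ValueError for negative arguments.

```python
import math

print(math.comb(10, 5))  # 252
print(math.comb(5, 7))   # 0, because k > n

# The result is an exact int of arbitrary size, never a float.
big = math.comb(1000, 500)
print(type(big).__name__, len(str(big)), "digits")

# Negative arguments are rejected rather than silently returning 0.
try:
    math.comb(-1, 2)
except ValueError as exc:
    print("ValueError:", exc)
```

This differs from some of the hand-written versions in this thread, which return 0 for negative input instead of raising.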

- Xavier Guihot

28

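In many cases a literal translation of the mathematical definition is quite adequate (remembering that Python automatically uses big-number arithmetic):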

from math import factorial

def calculate_combinations(n, r):
    return factorial(n) // factorial(r) // factorial(n-r)

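For some inputs I tested (e.g. n=1000 r=500) this was more than 10 times faster than the reduce one-liner suggested in another (currently highest-voted) answer. On the other hand, it is outperformed by the snippet provided by @J.F. Sebastian.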

- Todd Owen

10

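Here is another alternative. It was originally written in C++, so it can be ported back to C++ for finite-precision integers (e.g. __int64). The advantages are that (1) it involves only integer operations, and (2) it avoids bloating the integer value by performing successive pairs of multiplication and division.

I have tested the result against Nas Banov's Pascal triangle, and it gets the correct answer: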

def choose(n, r):
  """Computes n! / (r! (n-r)!) exactly. Returns a Python int."""
  assert n >= 0
  assert 0 <= r <= n

  c = 1
  # Pair numerators n, n-1, ..., n-r+1 with denominators 1, 2, ..., r
  for (num, denom) in zip(range(n, n-r, -1), range(1, r+1, 1)):
    c = (c * num) // denom
  return c

Rationale: to minimize the number of multiplications and divisions, we rewrite the expression as

    n!      n(n-1)...(n-r+1)
--------- = ----------------
 r!(n-r)!          r!

To avoid multiplication overflow as much as possible, we evaluate in the following STRICT order, from left to right:

n / 1 * (n-1) / 2 * (n-2) / 3 * ... * (n-r+1) / r

We can show that integer arithmetic performed in this order is exact (i.e. there is no round-off error).
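
The exactness claim follows from the identity C(n, i) * (n - i) = C(n, i + 1) * (i + 1): after i steps the running value equals C(n, i), so the next product is always a multiple of the next divisor. The sketch below makes that checkable; choose_checked is a made-up name, and the function simply adds an assertion to the same evaluation order (multiply, then divide) used in the code above.

```python
def choose_checked(n, r):
    """Same evaluation order as choose(), asserting every division is exact."""
    assert n >= 0
    assert 0 <= r <= n
    c = 1
    for i in range(r):
        # Invariant: c == C(n, i) here, so c * (n - i) == C(n, i + 1) * (i + 1).
        product = c * (n - i)
        assert product % (i + 1) == 0, "division would not be exact"
        c = product // (i + 1)
    return c

# Exercise the invariant over a small grid of inputs.
for n in range(60):
    for r in range(n + 1):
        choose_checked(n, r)
```

If the order were changed so that a division came before its paired multiplication (for example dividing (n-1) by 2 on its own), the intermediate value would not always be an integer, which is why the order is described as strict.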

- Wirawan Purwanto

6

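You can write 2 simple functions that turn out to be about 5-8 times faster than scipy.special.comb. You do not need to import any extra packages, and the functions are quite readable. The trick is to use memoization to store previously computed values, and to use the definition of nCr.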

# create a memoization dictionary
memo = {}
def factorial(n):
    """
    Calculate the factorial of an input using memoization
    :param n: int
    :rtype value: int
    """
    if n in [1,0]:
        return 1
    if n in memo:
        return memo[n]
    value = n*factorial(n-1)
    memo[n] = value
    return value

def ncr(n, k):
    """
    Choose k elements from a set of n elements - n must be larger than or equal to k
    :param n: int
    :param k: int
    :rtype: int
    """
    # Floor division keeps the result an exact int on Python 3
    return factorial(n)//(factorial(k)*factorial(n-k))

If we compare the timings:

from scipy.special import comb
%timeit comb(100,48)
>>> 100000 loops, best of 3: 6.78 µs per loop

%timeit ncr(100,48)
>>> 1000000 loops, best of 3: 1.39 µs per loop

- PyRsquared

These days there is a memoizing decorator in the functools module called lru_cache, which might simplify your code. - demented hedgehog
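
What the comment suggests might look like the sketch below. It is one possible reading of the suggestion rather than code from the answer: functools.lru_cache replaces the hand-maintained memo dictionary, and floor division is used so the result stays an exact int.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def factorial(n):
    """Recursive factorial; lru_cache stores every value computed so far."""
    return 1 if n < 2 else n * factorial(n - 1)

def ncr(n, k):
    """Number of ways to choose k of n elements (requires 0 <= k <= n)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(ncr(10, 5))  # 252
```

As with the dictionary version, the speed-up only shows on repeated calls, and the recursion means a cold call with n near Python's recursion limit can fail with RecursionError.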

5

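If your program has an upper bound on n (say n <= N) and needs to compute nCr repeatedly (preferably many more than N times), using lru_cache can give you a huge performance boost: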

from functools import lru_cache

@lru_cache(maxsize=None)
def nCr(n, r):
    return 1 if r == 0 or r == n else nCr(n - 1, r - 1) + nCr(n - 1, r)

Building the cache (which happens implicitly) takes up to O(N^2) time. Any subsequent call to nCr returns in O(1).
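
One caveat with the recursive form is that a first call with a large n recurses roughly n levels deep, which may run into Python's recursion limit. An iterative alternative with the same O(N^2) build cost is to fill Pascal's triangle row by row. The sketch below assumes that approach; pascal_table is a made-up name.

```python
def pascal_table(max_n):
    """Return rows such that rows[n][r] == nCr for 0 <= r <= n <= max_n."""
    rows = [[1]]
    for n in range(1, max_n + 1):
        prev = rows[-1]
        # Interior entries follow nCr(n, r) = nCr(n-1, r-1) + nCr(n-1, r).
        row = [1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1]
        rows.append(row)
    return rows

table = pascal_table(16)
print(table[10][5])  # 252
print(table[16][8])  # 12870
```

After the table is built, every lookup is a pair of list indexing operations, so it is O(1) just like a cache hit in the lru_cache version.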

- yzn-pku

- Jouni K. Seppänen · Accepted Answer

Updated answer in 2023: use the math.comb function, which has existed since Python 3.8 and became much faster in 3.11.

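Old answer: see scipy.special.comb (scipy.misc.comb in older versions of scipy). When exact is False, it uses the gammaln function to obtain good precision without taking much time. In the exact case it returns an arbitrary-precision integer, which might take a long time to compute.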
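
The log-gamma idea behind the inexact mode can be sketched with the standard library alone. This is not scipy's implementation: it uses math.lgamma as a stand-in for gammaln, relies on n! = gamma(n + 1), and comb_approx is a made-up name.

```python
import math

def comb_approx(n, k):
    """Floating-point estimate of nCk computed through log-gamma."""
    if k < 0 or k > n:
        return 0.0
    # log(nCk) = log(n!) - log(k!) - log((n-k)!), with log(m!) = lgamma(m + 1)
    log_result = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_result)

print(comb_approx(100, 48))  # float estimate
print(math.comb(100, 48))    # exact int, for comparison (Python 3.8+)
```

Working in log space keeps the intermediate values small, so no huge factorials are ever formed. The result is only as precise as a float, and math.exp raises OverflowError once the coefficient exceeds the float range; in that case the log value itself is often the more useful quantity.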