为什么random.randint()比random.getrandbits()慢得多

3

我做了一个测试,比较了Python中的random.randint()random.getrandbits()。结果显示getrandbitsrandint快得多。

from random import randint, getrandbits, seed
from datetime import datetime

def ts():
    return datetime.now().timestamp()

def diff(st, en):
    print(f'{round((en - st) * 1000, 3)} ms')

seed(111)
N = 6

rn = []
st = ts()
for _ in range(10**N):
    rn.append(randint(0, 1023))
en = ts()
diff(st,en)

rn = []
st = ts()
for _ in range(10**N):
    rn.append(getrandbits(10))
en = ts()
diff(st,en)

数字范围完全相同,因为10个随机位的范围是从0(例如0000000000)到1023(例如1111111111)。

代码输出:

590.509 ms
91.01 ms

正如你所看到的,在相同条件下,getrandbits 的速度几乎是 6.5 倍。但是为什么呢?有人能解释一下吗?
4个回答

5

randint使用randrange,后者使用_randbelow,如果存在_randbelow_with_get_randbits则使用它,该函数使用getrandbits。但是,randrange除了调用getrandbits之外,还包括在开始和步骤参数上进行检查的开销。即使有快速路径进行这些检查,它们仍然存在。当使用randint时,肯定还有其他因素导致6.5倍的减速,但至少这个答案告诉你,randint始终比getrandbits慢。


3

getrandbits() 是一个非常简短的函数,几乎立即进入本地代码:

def getrandbits(self, k):
    """getrandbits(k) -> x.  Generates an int with k random bits."""
    if k < 0:
        raise ValueError('number of bits must be non-negative')
    numbytes = (k + 7) // 8                       # bits / 8 and rounded up
    x = int.from_bytes(_urandom(numbytes), 'big')
    return x >> (numbytes * 8 - k)                # trim excess bits

_urandomos.urandom,你甚至在相同文件夹下的 os.py 中都找不到它,因为它是原生代码。详细信息请参见此处:Where can I find the source code of os.urandom()?

randint() 是一个 return self.randrange(a, b+1) 调用,其中 randrange() 是一个长且多才多艺的函数,几乎有100行的Python代码,其 "快速跟踪" 在大约50行的检查和初始化之后以 return istart + self._randbelow(width) 结束。

_randbelow() 其实是一个选择,取决于 getrandbits() 是否存在,它在你的情况下是存在的,所以可能会运行this

def _randbelow_with_getrandbits(self, n):
    "Return a random int in the range [0,n).  Returns 0 if n==0."

    if not n:
        return 0
    getrandbits = self.getrandbits
    k = n.bit_length()  # don't use (n-1) here because n can be 1
    r = getrandbits(k)  # 0 <= r < 2**k
    while r >= n:
        r = getrandbits(k)
    return r

所以,在做了许多其他操作后,randint()的返回值与getrandbits()一样,这就是它比getrandbits()更慢的原因。


1

根本原因是getrandbits调用的函数是用C编写的,在GitHub Python repository的源代码中可以查看。

你会发现,getrandbits在底层调用了os模块中的urandom函数,如果你查看这个函数的实现,你会发现它确实是用C语言编写的。

os_urandom_impl(PyObject *module, Py_ssize_t size)
/*[clinic end generated code: output=42c5cca9d18068e9 input=4067cdb1b6776c29]*/
{
    PyObject *bytes;
    int result;

    if (size < 0)
        return PyErr_Format(PyExc_ValueError,
                            "negative argument not allowed");
    bytes = PyBytes_FromStringAndSize(NULL, size);
    if (bytes == NULL)
        return NULL;

    result = _PyOS_URandom(PyBytes_AS_STRING(bytes), PyBytes_GET_SIZE(bytes));
    if (result == -1) {
        Py_DECREF(bytes);
        return NULL;
    }
    return bytes;
}

谈到randint,如果您查看此函数的源代码,它调用randrange,后者进一步调用_randbelow_with_get_randbits
    def randrange(self, start, stop=None, step=_ONE):
        """Choose a random item from range(start, stop[, step]).
        This fixes the problem with randint() which includes the
        endpoint; in Python this is usually not what you want.
        """

        # This code is a bit messy to make it fast for the
        # common case while still doing adequate error checking.
        try:
            istart = _index(start)
        except TypeError:
            istart = int(start)
            if istart != start:
                _warn('randrange() will raise TypeError in the future',
                      DeprecationWarning, 2)
                raise ValueError("non-integer arg 1 for randrange()")
            _warn('non-integer arguments to randrange() have been deprecated '
                  'since Python 3.10 and will be removed in a subsequent '
                  'version',
                  DeprecationWarning, 2)
        if stop is None:
            # We don't check for "step != 1" because it hasn't been
            # type checked and converted to an integer yet.
            if step is not _ONE:
                raise TypeError('Missing a non-None stop argument')
            if istart > 0:
                return self._randbelow(istart)
            raise ValueError("empty range for randrange()")

        # stop argument supplied.
        try:
            istop = _index(stop)
        except TypeError:
            istop = int(stop)
            if istop != stop:
                _warn('randrange() will raise TypeError in the future',
                      DeprecationWarning, 2)
                raise ValueError("non-integer stop for randrange()")
            _warn('non-integer arguments to randrange() have been deprecated '
                  'since Python 3.10 and will be removed in a subsequent '
                  'version',
                  DeprecationWarning, 2)
        width = istop - istart
        try:
            istep = _index(step)
        except TypeError:
            istep = int(step)
            if istep != step:
                _warn('randrange() will raise TypeError in the future',
                      DeprecationWarning, 2)
                raise ValueError("non-integer step for randrange()")
            _warn('non-integer arguments to randrange() have been deprecated '
                  'since Python 3.10 and will be removed in a subsequent '
                  'version',
                  DeprecationWarning, 2)
        # Fast path.
        if istep == 1:
            if width > 0:
                return istart + self._randbelow(width)
            raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))

        # Non-unit step argument supplied.
        if istep > 0:
            n = (width + istep - 1) // istep
        elif istep < 0:
            n = (width + istep + 1) // istep
        else:
            raise ValueError("zero step for randrange()")
        if n <= 0:
            raise ValueError("empty range for randrange()")
        return istart + istep * self._randbelow(n)

    def randint(self, a, b):
        """Return random integer in range [a, b], including both end points.
        """

        return self.randrange(a, b+1)

正如您所见,randrange函数中的Try/Except块涉及许多操作,由于这些额外开销和控制流程,它在Python中运行速度较慢。请注意,保留了HTML标记。

1
谢谢您的解释。您可以给这个问题投票吗?因为我认为这是一件相当有趣的事情,而且我找不到任何相关内容。也许会对某人有所帮助。 - undefined

0

当您调用randint时执行的代码如下:

#!/usr/bin/python3
import random

random.randint(0, 1023)

$ python3 -m trace -t s.py
zen importlib._bootstrap>(194): s.py(4): random.randint(0, 1023)
 --- modulename: random, funcname: randint
random.py(339):         return self.randrange(a, b+1)
 --- modulename: random, funcname: randrange
random.py(301):         istart = int(start)
random.py(302):         if istart != start:
random.py(304):         if stop is None:
random.py(310):         istop = int(stop)
random.py(311):         if istop != stop:
random.py(313):         width = istop - istart
random.py(314):         if step == 1 and width > 0:
random.py(315):             return istart + self._randbelow(width)
 --- modulename: random, funcname: _randbelow_with_getrandbits
random.py(241):         if not n:
random.py(243):         getrandbits = self.getrandbits
random.py(244):         k = n.bit_length()  # don't use (n-1) here because n can be 1
random.py(245):         r = getrandbits(k)  # 0 <= r < 2**k
random.py(246):         while r >= n:
random.py(248):         return r

所以答案是有一些额外的样板代码最终会调用`getrandbits`,否则你可以直接调用它。请注意,使用2的幂次调用`_randbelow_with_getrandbits`会表现出最坏情况:它将采样`getrandbits(11)`,直到获得具有10位的数字,因此它将丢弃一半的值(这不是它较慢的主要原因,但在调用`getrandbits`时添加拒绝采样逻辑使它们变慢了约两倍)。

一些答案说getrandbits最终会调用os.urandom。这是不正确的。

常规的getrandbits函数是直接在C中实现的_random_Random_getrandbits_impl。调用os.urandom的是来自SystemRandom的函数。

您可以使用trace模块进行检查:

#!/usr/bin/python3
import random

r = random.SystemRandom()

r.getrandbits(5)

random.getrandbits(5)

$ python3 -m trace -t s.py
...
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): s.py(4): r = random.SystemRandom()
 --- modulename: random, funcname: __init__
random.py(123):         self.seed(x)
 --- modulename: random, funcname: seed
random.py(800):         return None
random.py(124):         self.gauss_next = None
s.py(6): r.getrandbits(5)
 --- modulename: random, funcname: getrandbits
random.py(786):         if k < 0:
random.py(788):         numbytes = (k + 7) // 8                       # bits / 8 and rounded up
random.py(789):         x = int.from_bytes(_urandom(numbytes), 'big')
random.py(790):         return x >> (numbytes * 8 - k)                # trim excess bits
s.py(8): random.getrandbits(5)

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接