根据
这个答案,指数运算的实现存在一些开销,而乘法则没有。然而,朴素的乘法会随着指数的增加越来越慢。以下是一个经验性的演示:
In [3]: x = np.random.rand(1e6)
In [15]: %timeit x**2
100 loops, best of 3: 11.9 ms per loop
In [16]: %timeit x*x
100 loops, best of 3: 12.7 ms per loop
In [17]: %timeit x**3
10 loops, best of 3: 132 ms per loop
In [18]: %timeit x*x*x
10 loops, best of 3: 27.2 ms per loop
In [19]: %timeit x**4
10 loops, best of 3: 132 ms per loop
In [20]: %timeit x*x*x*x
10 loops, best of 3: 42.4 ms per loop
In [21]: %timeit x**10
10 loops, best of 3: 132 ms per loop
In [22]: %timeit x*x*x*x*x*x*x*x*x*x
10 loops, best of 3: 137 ms per loop
In [24]: %timeit x**15
10 loops, best of 3: 132 ms per loop
In [25]: %timeit x*x*x*x*x*x*x*x*x*x*x*x*x*x*x
1 loops, best of 3: 212 ms per loop
请注意,指数计算时间基本保持不变,除了x ** 2
这种情况,我怀疑它被特殊处理了,而乘法越来越慢。似乎可以利用这一点来加速整数幂运算...例如:
In [26]: %timeit x**16
10 loops, best of 3: 132 ms per loop
In [27]: %timeit x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x
1 loops, best of 3: 225 ms per loop
In [28]: def tosixteenth(x):
....: x2 = x*x
....: x4 = x2*x2
....: x8 = x4*x4
....: x16 = x8*x8
....: return x16
....:
In [29]: %timeit tosixteenth(x)
10 loops, best of 3: 49.5 ms per loop
似乎您可以将此技术应用于任何整数,通过将其拆分为二的幂的和,针对每个二的幂进行如上计算,并相加:
In [93]: %paste
def smartintexp(x, exp):
result = np.ones(len(x))
curexp = np.array(x)
while True:
if exp%2 == 1:
result *= curexp
exp >>= 1
if not exp: break
curexp *= curexp
return result
In [94]: x
Out[94]:
array([ 0.0163407 , 0.57694587, 0.47336487, ..., 0.70255032,
0.62043303, 0.0796748 ])
In [99]: x**21
Out[99]:
array([ 3.01080670e-38, 9.63466181e-06, 1.51048544e-07, ...,
6.02873388e-04, 4.43193256e-05, 8.46721060e-24])
In [100]: smartintexp(x, 21)
Out[100]:
array([ 3.01080670e-38, 9.63466181e-06, 1.51048544e-07, ...,
6.02873388e-04, 4.43193256e-05, 8.46721060e-24])
In [101]: %timeit x**21
10 loops, best of 3: 132 ms per loop
In [102]: %timeit smartintexp(x, 21)
10 loops, best of 3: 70.7 ms per loop
对于2的小次幂,它速度很快:
In [106]: %timeit x**32
10 loops, best of 3: 131 ms per loop
In [107]: %timeit smartintexp(x, 32)
10 loops, best of 3: 57.4 ms per loop
但随着指数的增大而变得越来越慢:
In [97]: %timeit x**63
10 loops, best of 3: 133 ms per loop
In [98]: %timeit smartintexp(x, 63)
10 loops, best of 3: 110 ms per loop
对于最坏情况并不更快:
In [115]: %timeit x**511
10 loops, best of 3: 135 ms per loop
In [114]: %timeit smartintexp(x, 511)
10 loops, best of 3: 192 ms per loop
x**y
被重写为2**(y*log x)
。在现代处理器上,2**a
和log a
都是单浮点指令。 - Jeffrey Sax