Is Numba @jit slower than pure Python?

Question

I need to improve the execution time of a script I'm working on. I started using Numba's JIT decorator to try parallel computation, but it threw an error:

KeyError: "Does not support option: 'parallel'"

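I then decided to test whether nogil would unlock the full power of my CPU, but it runs slower than pure Python and I don't understand why. If anyone could help or point me in the right direction, I would be very grateful.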

import numpy as np
from numba import *
@jit(['float64[:,:],float64[:,:]'],'(n,m),(n,m)->(n,m)',nogil=True)
def asd(x,y):
    return x+y
u=np.random.random(100)
w=np.random.random(100)

%timeit asd(u,w)
%timeit u+w

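10000 loops, best of 3: 137 µs per loop

The slowest run took 7.13 times longer than the fastest. This could mean 
that an intermediate result is being cached.
1000000 loops, best of 3: 1.75 µs per loop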

- jmparejaz

1 Answer

- JoshAdel · Accepted Answer

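You can't expect Numba to beat NumPy on such a simple vectorized operation. Your comparison also isn't quite fair: calling the Numba function includes the cost of an external function call, while the bare u+w in your second %timeit does not. If you sum larger arrays, you'll see the performance of the two converge; what you're seeing is just overhead on a very fast operation: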

import numpy as np
import numba as nb

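# Compiled by Numba in nopython mode
@nb.njit
def asd(x,y):
    return x+y

# Plain Python wrapper: same NumPy expression, plus a Python function call
def asd2(x, y):
    return x + y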

u=np.random.random(10000)
w=np.random.random(10000)

%timeit asd(u,w)
%timeit asd2(u,w)

The slowest run took 17796.43 times longer than the fastest. This could mean 
that an intermediate result is being cached.
100000 loops, best of 3: 6.06 µs per loop

The slowest run took 29.94 times longer than the fastest. This could mean that 
an intermediate result is being cached.
100000 loops, best of 3: 5.11 µs per loop

As for parallelism, for this simple operation you can use nb.vectorize:

# target='parallel' builds a ufunc that splits the elementwise work across threads
@nb.vectorize([nb.float64(nb.float64, nb.float64)], target='parallel')
def asd3(x, y):
    return x + y

u=np.random.random((100000, 10))
w=np.random.random((100000, 10))

%timeit asd(u,w)
%timeit asd2(u,w)
%timeit asd3(u,w)

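However, if you're working with small arrays, you'll see the overhead of dispatching work to threads. For the array sizes above, I see the parallel version give me about a 2x speedup.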

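Where Numba really shines is in operations that are hard to express with NumPy broadcasting, or in cases where the NumPy version would allocate many temporary intermediate arrays.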