Why is numpy.array so slow?

Question


I am baffled by this:

def main():
    for i in xrange(2560000):
        a = [0.0, 0.0, 0.0]

main()

$ time python test.py

real     0m0.793s

Now let's try it with numpy:

import numpy

def main():
    for i in xrange(2560000):
        a = numpy.array([0.0, 0.0, 0.0])

main()

$ time python test.py

real    0m39.338s

Holy CPU cycles, Batman!

Using numpy.zeros(3) improves things, but it is still not enough in my opinion:

$ time python test.py

real    0m5.610s
user    0m5.449s
sys 0m0.070s
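A common way around the per-array overhead, assuming the algorithm can be restated to work on many vectors at once, is to allocate one large array instead of many tiny ones. A minimal sketch (the function names are illustrative, and no timings are claimed; run it to compare on your own machine):

```python
import timeit

import numpy as np


def one_array_per_vector(n):
    # Mirrors the question: n separate 3-element arrays, one per iteration.
    return [np.zeros(3) for _ in range(n)]


def one_array_for_all(n):
    # A single (n, 3) array; row i plays the role of the i-th small array.
    return np.zeros((n, 3))


if __name__ == "__main__":
    n = 100_000  # smaller than the question's 2,560,000 to keep the run short
    print("per-vector:  ", timeit.timeit(lambda: one_array_per_vector(n), number=1))
    print("single block:", timeit.timeit(lambda: one_array_for_all(n), number=1))
```

Taking a row such as block[i] from the single array gives a view, so it does not copy the underlying data.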

numpy.version.version = '1.5.1'

If you are wondering whether the list creation is optimized away in the first example, it is not:

  5          19 LOAD_CONST               2 (0.0)
             22 LOAD_CONST               2 (0.0)
             25 LOAD_CONST               2 (0.0)
             28 BUILD_LIST               3
             31 STORE_FAST               1 (a)
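The same check can be reproduced on Python 3 with the standard dis module. This is a small sketch; the exact opcodes vary between CPython versions (newer versions may emit BUILD_LIST followed by LIST_EXTEND instead of three LOAD_CONST instructions), so it only checks that a BUILD_LIST instruction is present:

```python
import dis


def main():
    for i in range(2560000):
        a = [0.0, 0.0, 0.0]


# Print the bytecode; the list is rebuilt on every loop iteration.
dis.dis(main)

# Programmatic check: a BUILD_LIST instruction appears in the function body.
opnames = [ins.opname for ins in dis.get_instructions(main)]
print("BUILD_LIST" in opnames)
```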

- Stefano Borini


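A quick thought: numpy.array is actually a more complex structure than a list. In the second snippet you create a list and a numpy array (in the first one, only a list). Whether this is the only reason for such a big difference, I can't say. - Felix Kling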
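To see how much of the gap comes from the list versus the conversion, one can time the two steps separately. A rough sketch using the standard timeit module; absolute numbers depend on the machine and the numpy version, so none are claimed here:

```python
import timeit

REPS = 100_000

# Cost of building the Python list literal alone.
t_list = timeit.timeit("[0.0, 0.0, 0.0]", number=REPS)

# Cost of converting an already-built list into an array.
t_convert = timeit.timeit(
    "np.array(lst)", setup="import numpy as np; lst = [0.0, 0.0, 0.0]", number=REPS
)

# Cost of both steps together, as in the question's loop.
t_both = timeit.timeit(
    "np.array([0.0, 0.0, 0.0])", setup="import numpy as np", number=REPS
)

print(f"list literal:      {t_list:.3f} s")
print(f"np.array(list):    {t_convert:.3f} s")
print(f"np.array(literal): {t_both:.3f} s")
```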


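But consider this: in the kind of complex application you would use NumPy for, creating data is rarely the bottleneck. I don't know what goes on under the hood either, but it obviously makes math-heavy programs faster in the end, so there's no reason to complain ;) - user395760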


@Stefano: did your timings include importing numpy? (Also, Python has a built-in timing module.) - Katriel
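One way to separate the two costs is to time the import and the loop individually inside the script with time.perf_counter. A minimal sketch; the import time is only meaningful in a fresh interpreter, since a module that is already loaded is not imported again, and the loop count is reduced from the question's to keep the run short:

```python
import time

t0 = time.perf_counter()
import numpy  # timed on its own so it is not folded into the loop cost
t1 = time.perf_counter()


def main(n=2560000):
    for i in range(n):
        a = numpy.array([0.0, 0.0, 0.0])


t2 = time.perf_counter()
main(100_000)  # smaller n than the question, to keep the run short
t3 = time.perf_counter()

print(f"import numpy: {t1 - t0:.3f} s")
print(f"loop:         {t3 - t2:.3f} s")
```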


Just a quick tip: you can use python -mtimeit test.py for benchmarking. - igorgue
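For reference, python -m timeit times a statement passed on the command line (setup code goes after -s) rather than a script file. A sketch of the equivalent comparison from inside Python, using timeit.repeat and keeping the minimum of the repeats, as the timeit documentation suggests:

```python
import timeit

NUMBER = 50_000

# Equivalent shell form:  python -m timeit -s "import numpy" "numpy.zeros(3)"
cases = {
    "list": ("[0.0, 0.0, 0.0]", "pass"),
    "numpy.array": ("numpy.array([0.0, 0.0, 0.0])", "import numpy"),
    "numpy.zeros": ("numpy.zeros(3)", "import numpy"),
}

results = {}
for name, (stmt, setup) in cases.items():
    # Keep the fastest of 3 repeats; slower runs mostly reflect system noise.
    results[name] = min(timeit.repeat(stmt, setup=setup, repeat=3, number=NUMBER))
    print(f"{name:12s} {results[name] * 1e6 / NUMBER:.3f} usec per loop")
```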

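Does numpy have a mechanism for reusing unused arrays? Python lists do. Also note that numpy.array requires looking up the array attribute on the numpy object, while the [] constructor does no lookup, although this doesn't really affect performance. - mg.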


4 Answers


Holy CPU cycles indeed!

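But please consider something very fundamental related to numpy: sophisticated linear-algebra-based functionality (like random numbers or singular value decomposition). Now, consider these seemingly simple calculations: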

In []: A= rand(2560000, 3)
In []: %timeit rand(2560000, 3)
1 loops, best of 3: 296 ms per loop
In []: %timeit u, s, v= svd(A, full_matrices= False)
1 loops, best of 3: 571 ms per loop
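The session above appears to use pylab-style names. Assuming rand and svd are numpy.random.rand and numpy.linalg.svd, an equivalent with current numpy might look like this sketch (the seeded generator is an addition for reproducibility, not part of the original):

```python
import numpy as np


def random_svd(n, seed=0):
    # n random 3-vectors stored as the rows of one (n, 3) array.
    rng = np.random.default_rng(seed)
    A = rng.random((n, 3))
    # Thin SVD: u is (n, 3), s is (3,), vt is (3, 3).
    u, s, vt = np.linalg.svd(A, full_matrices=False)
    return A, u, s, vt
```

To reproduce the scale of the original session, time random_svd(2_560_000) with timeit.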

Please trust me: no package currently available is going to beat this kind of performance significantly.

So please describe your real problem, and I'll try to find a decent numpy-based solution for it.

Update:
Here is some simple code for ray-sphere intersection:

import numpy as np

def mag(X):
    # magnitude
    return (X** 2).sum(0)** .5

def closest(R, c):
    # closest point on ray to center and its distance
    P= np.dot(c.T, R)* R
    return P, mag(P- c)

def intersect(R, P, h, r):
    # intersection of rays and sphere
    return P- (h* (2* r- h))** .5* R

# set up
c, r= np.array([10, 10, 10])[:, None], 2. # center, radius
n= int(5e5) # array dimensions must be integers in current numpy
R= np.random.rand(3, n) # some random rays in first octant
R= R/ mag(R) # normalized to unit length

# find rays which will intersect sphere
P, b= closest(R, c)
wi= b<= r

# and for those which will, find the intersection
X= intersect(R[:, wi], P[:, wi], r- b[wi], r)

Apparently our calculations are correct:

In []: allclose(mag(X- c), r)
Out[]: True
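The geometry behind intersect: with h = r - b, the quantity h * (2r - h) equals r**2 - b**2, the squared half-chord. A small sanity check, assuming a single ray aimed straight at the centre, so b should be 0 and the hit point should lie r short of the centre along the ray:

```python
import numpy as np


def mag(X):
    # Column-wise Euclidean length of a (3, n) array.
    return (X ** 2).sum(0) ** 0.5


def closest(R, c):
    # Closest point on each unit ray (from the origin) to the centre, and its distance.
    P = np.dot(c.T, R) * R
    return P, mag(P - c)


def intersect(R, P, h, r):
    # Near intersection point: step back the half-chord sqrt(h * (2r - h)) from P.
    return P - (h * (2 * r - h)) ** 0.5 * R


c, r = np.array([10.0, 10.0, 10.0])[:, None], 2.0
R = np.ones((3, 1)) / np.sqrt(3.0)  # one unit ray pointing straight at c

P, b = closest(R, c)
X = intersect(R, P, r - b, r)

# The ray should hit the sphere at distance |c| - r = sqrt(300) - 2 from the origin.
print(mag(X), np.sqrt(300.0) - r)
```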

And some timings:

In []: %timeit P, b= closest(R, c)
10 loops, best of 3: 93.4 ms per loop
In []: n/ 0.0934
Out[]: 5353319 #=> more than 5 million detections of possible intersections/ s
In []: %timeit X= intersect(R[:, wi], P[:, wi], r- b[wi], r)
10 loops, best of 3: 32.7 ms per loop
In []: X.shape[1]/ 0.0327
Out[]: 874037 #=> almost 1 million actual intersections/ s

These timings were done on a very modest machine. On a modern machine, a significant speedup can still be expected.

Anyway, this is only a short demonstration of how to code with numpy.

- eat

My real problem: https://dev59.com/N2w15IYBdhLWcg3wkcqq - Stefano Borini

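Nice. However, this doesn't work directly with sphere objects. You would need a backend that converts the high-level design into a set of aggregated coordinates and then feeds them to numpy. - Stefano Borini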

+1 for "please consider something very fundamental related to numpy". - doug

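@Stefano Borini: Well, I still don't know what you really want to do, but to use numpy effectively you should process data in reasonable "chunks". Why not keep your object-oriented design, but without storing the coordinates individually in the objects? Setting up a mapping between objects and columns (or rows) is straightforward. Note that numpy makes it easy to write readable code (close to the level of the higher mathematics involved). Thanks - eat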
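The object-to-column mapping suggested here could look roughly like the following sketch. SphereSet and Sphere are hypothetical names: each Sphere object keeps only its column index, while the coordinates live in one shared (3, k) array that can be tested against a ray in a single vectorised pass. Like the answer's closest(), it assumes unit rays from the origin and does not reject spheres lying behind the origin:

```python
import numpy as np


class SphereSet:
    """Hypothetical container: all centres live in one (3, k) array."""

    def __init__(self, centres, radii):
        self.centres = np.asarray(centres, dtype=float).reshape(3, -1)
        self.radii = np.asarray(radii, dtype=float)

    def __getitem__(self, i):
        return Sphere(self, i)

    def hit_mask(self, ray):
        # One unit ray from the origin, tested against every sphere at once.
        ray = np.asarray(ray, dtype=float).reshape(3, 1)
        t = (self.centres * ray).sum(0)  # projection length per sphere, shape (k,)
        closest = t * ray                # closest point on the ray, shape (3, k)
        dist = np.sqrt(((closest - self.centres) ** 2).sum(0))
        return dist <= self.radii


class Sphere:
    """Lightweight object: it stores only its owner and column index."""

    def __init__(self, owner, index):
        self.owner = owner
        self.index = index

    @property
    def centre(self):
        # A view into the shared array, not a freshly allocated copy.
        return self.owner.centres[:, self.index]

    @property
    def radius(self):
        return self.owner.radii[self.index]
```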

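@eat: I want individual geometric objects, such as spheres and planes, and I want these objects to know about themselves (their geometry, for example) and to be able to tell whether they intersect. Most of the operations I perform go through these coordinates, which means that every time I operate on a coordinate I will likely create a numpy array (for temporaries, final results such as intersection points, etc.). Your proposal does not account for the fact that I may have only one sphere, which is just a 3-element array, so I will never have a huge array to operate on. - Stefano Borini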


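@Stefano Borini: As far as I can tell, you do at least seem to have a lot of rays. I'd still suggest keeping all "permanent" points in arrays and writing the code so that numpy handles the temporaries, i.e. minimizing the need to create small numpy arrays. Good luck! Thanks - eat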


Late, but it may be important for other readers.

This problem was also considered in the kwant project. Indeed, numpy is not optimized for small arrays, and small arrays are exactly what you need.

So they created a substitute for small arrays that behaves like numpy arrays (any operation not implemented in the new data type is handled by numpy).

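You should look into this project:
https://pypi.python.org/pypi/tinyarray/1.0.5
Its main purpose is to behave well for small arrays. Of course, some of the fancier things numpy can do are not supported, but numerics seem to be what you need.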
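tinyarray's own API is not reproduced here. To illustrate the general idea it describes (a cheap small-vector type that leaves anything it does not implement to numpy), here is a hypothetical pure-Python Vec3 that exposes __array__, numpy's standard conversion hook:

```python
import math

import numpy as np


class Vec3:
    """Hypothetical small fixed-size vector in pure Python (not tinyarray).

    Cheap to create; numpy can still consume it through __array__.
    """

    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = float(x), float(y), float(z)

    def __add__(self, other):
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def dot(self, other):
        return self.x * other.x + self.y * other.y + self.z * other.z

    def norm(self):
        return math.sqrt(self.dot(self))

    def __array__(self, dtype=None, copy=None):
        # numpy calls this hook when it needs a real ndarray (np.asarray, ufuncs, ...).
        if copy is False:
            # A fresh array is always built, so a no-copy request cannot be honoured.
            raise ValueError("Vec3 cannot be converted to an array without copying")
        return np.array([self.x, self.y, self.z], dtype=dtype)
```

Cheap operations stay in Python, while something like np.linalg.norm(Vec3(1, 2, 2)) still works because numpy converts the object on demand.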

I did some small tests:

python

I added the numpy import so that the load time is comparable across tests.

import numpy

def main():
    for i in xrange(2560000):
        a = [0.0, 0.0, 0.0]

main()

numpy

import numpy

def main():
    for i in xrange(2560000):
        a = numpy.array([0.0, 0.0, 0.0])

main()

numpy_zero

import numpy

def main():
    for i in xrange(2560000):
        a = numpy.zeros((3,1))

main()

tiny

import numpy,tinyarray

def main():
    for i in xrange(2560000):
        a = tinyarray.array([0.0, 0.0, 0.0])

main()

tiny_zero

import numpy,tinyarray

def main():
    for i in xrange(2560000):
        a = tinyarray.zeros((3,1))

main()

我运行了这个：

for f in python numpy numpy_zero tiny tiny_zero ; do 
   echo $f 
   for i in `seq 5` ; do 
      time python ${f}_test.py
   done 
 done

and got:

python
python ${f}_test.py  0.31s user 0.02s system 99% cpu 0.339 total
python ${f}_test.py  0.29s user 0.03s system 98% cpu 0.328 total
python ${f}_test.py  0.33s user 0.01s system 98% cpu 0.345 total
python ${f}_test.py  0.31s user 0.01s system 98% cpu 0.325 total
python ${f}_test.py  0.32s user 0.00s system 98% cpu 0.326 total
numpy
python ${f}_test.py  2.79s user 0.01s system 99% cpu 2.812 total
python ${f}_test.py  2.80s user 0.02s system 99% cpu 2.832 total
python ${f}_test.py  3.01s user 0.02s system 99% cpu 3.033 total
python ${f}_test.py  2.99s user 0.01s system 99% cpu 3.012 total
python ${f}_test.py  3.20s user 0.01s system 99% cpu 3.221 total
numpy_zero
python ${f}_test.py  1.04s user 0.02s system 99% cpu 1.075 total
python ${f}_test.py  1.08s user 0.02s system 99% cpu 1.106 total
python ${f}_test.py  1.04s user 0.02s system 99% cpu 1.065 total
python ${f}_test.py  1.03s user 0.02s system 99% cpu 1.059 total
python ${f}_test.py  1.05s user 0.01s system 99% cpu 1.064 total
tiny
python ${f}_test.py  0.93s user 0.02s system 99% cpu 0.955 total
python ${f}_test.py  0.98s user 0.01s system 99% cpu 0.993 total
python ${f}_test.py  0.93s user 0.02s system 99% cpu 0.953 total
python ${f}_test.py  0.92s user 0.02s system 99% cpu 0.944 total
python ${f}_test.py  0.96s user 0.01s system 99% cpu 0.978 total
tiny_zero
python ${f}_test.py  0.71s user 0.03s system 99% cpu 0.739 total
python ${f}_test.py  0.68s user 0.02s system 99% cpu 0.711 total
python ${f}_test.py  0.70s user 0.01s system 99% cpu 0.721 total
python ${f}_test.py  0.70s user 0.02s system 99% cpu 0.721 total
python ${f}_test.py  0.67s user 0.01s system 99% cpu 0.687 total

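As already pointed out, these are not the best tests. Still, they show that tinyarray is better suited for small arrays.
Moreover, the most common operations should also be faster with tinyarray, so the benefit may go beyond data creation alone.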

I have never tried it in a full project, but the kwant project uses it.

- nickpapior

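As an aside, if some numpy function creates too much overhead, it can sometimes be beneficial to bind it to a local name instead of looking it up in the module every time, e.g. d = numpy.array; a = d([0., 0., 0.]). - nickpapior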
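A sketch of how to measure that suggestion. The local alias only saves the attribute lookup on the numpy module, which is probably small next to the cost of building the array itself, so it is worth timing before relying on it (the function names are illustrative):

```python
import timeit

import numpy

N = 100_000


def with_lookup(n=N):
    # Looks up the "array" attribute on the numpy module every iteration.
    for _ in range(n):
        a = numpy.array([0.0, 0.0, 0.0])
    return a


def with_local_alias(n=N):
    # Binds the function to a local name once, then calls it directly.
    d = numpy.array
    for _ in range(n):
        a = d([0.0, 0.0, 0.0])
    return a


if __name__ == "__main__":
    for fn in (with_lookup, with_local_alias):
        print(fn.__name__, min(timeit.repeat(fn, repeat=3, number=1)))
```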


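Of course NumPy takes more time in this case, because a = np.array([0.0, 0.0, 0.0]) is roughly equivalent to a = [0.0, 0.0, 0.0]; a = np.array(a), i.e. it takes two steps. But NumPy arrays have many advantages, and their speed shows up when you operate on them, not when you create them. Just my personal opinion :).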
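To illustrate the difference between creating arrays and operating on them, here is a sketch that pays the list-to-array conversion once and then compares a sum of squares done in a Python loop with the same computation as one vectorised numpy call. The function names are illustrative, and no particular speedup is claimed:

```python
import timeit

import numpy as np

n = 1_000_000
values = [float(i) for i in range(n)]
arr = np.array(values)  # one-off conversion cost


def sum_squares_list(xs):
    # Pure-Python loop over the list.
    return sum(x * x for x in xs)


def sum_squares_numpy(a):
    # Single vectorised call; the loop runs in compiled code.
    return float(np.dot(a, a))


if __name__ == "__main__":
    print("list :", timeit.timeit(lambda: sum_squares_list(values), number=3))
    print("numpy:", timeit.timeit(lambda: sum_squares_numpy(arr), number=3))
```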

- ZhengPeng

Content provided by Stack Overflow.

- Dunes · Accepted Answer

Numpy is optimised for large amounts of data. Give it a tiny array of length 3 and, unsurprisingly, it performs poorly.

Consider a separate test:

import timeit

reps = 100

pythonTest = timeit.Timer('a = [0.] * 1000000')
numpyTest = timeit.Timer('a = numpy.zeros(1000000)', setup='import numpy')
uninitialised = timeit.Timer('a = numpy.empty(1000000)', setup='import numpy')
# empty simply allocates the memory, so the initial contents of the array
# are arbitrary (whatever happened to be in that memory)

print 'python list:', pythonTest.timeit(reps), 'seconds'
print 'numpy array:', numpyTest.timeit(reps), 'seconds'
print 'uninitialised array:', uninitialised.timeit(reps), 'seconds'

The output is:

python list: 1.22042918205 seconds
numpy array: 1.05412316322 seconds
uninitialised array: 0.0016028881073 seconds

It looks like zeroing the array is what takes most of the time in numpy. So unless you need the array to be initialised, try using empty.
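The trade-off is that np.empty is only appropriate when every element will be written before it is read. A minimal sketch contrasting the two cases (the function names are illustrative):

```python
import numpy as np


def squares_into_empty(n):
    # np.empty skips zero-filling, so its initial contents are arbitrary.
    out = np.empty(n)
    # Safe only because every element is overwritten before being read.
    np.multiply(np.arange(n), np.arange(n), out=out)
    return out


def running_total_needs_zeros(values):
    # Accumulating with += reads the old contents, so the buffer must start at 0.
    acc = np.zeros(3)
    for v in values:
        acc += v
    return acc
```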