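I watched a video about loop speed in Python, which explained that `sum(range(N))` is much faster than manually looping over `range` and adding up the variables. The former runs in C because it uses a built-in function, while in the latter the summation is done in (slow) Python. I was curious what happens when numpy is added to the mix. As I expected, `np.sum(np.arange(N))` is the fastest, but `sum(np.arange(N))` and `np.sum(range(N))` are even slower than a naive for loop.

Why is that?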
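A minimal stdlib-only sketch of the video's explanation (the function names are made up for illustration). The `dis` module shows that the explicit loop compiles to a `FOR_ITER` instruction, which the interpreter executes once per element. `sum(range(n))` contains no Python-level loop at all. Only `FOR_ITER` is checked because the opcode for the addition differs between Python versions (`INPLACE_ADD` on 3.10, `BINARY_OP` on 3.11+).

```python
import dis

def loop_sum(n):
    s = 0
    for i in range(n):  # each iteration runs through the bytecode interpreter
        s += i
    return s

def builtin_sum(n):
    return sum(range(n))  # the iteration happens inside the C implementation of sum

def opnames(func):
    """Return the bytecode instruction names of func's own code object."""
    return [ins.opname for ins in dis.get_instructions(func)]

print("FOR_ITER" in opnames(loop_sum))     # True
print("FOR_ITER" in opnames(builtin_sum))  # False
print(loop_sum(10), builtin_sum(10))       # 45 45
```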
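Here is the script I used for testing. Where I know the cause of a slowdown (mostly from the video), I added a comment. I also included the results I got on my machine (Python 3.10.0, numpy 1.21.2):

Updated script: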
```python
import numpy as np
from timeit import timeit

N = 10_000_000
repetition = 10

def sum0(N=N):
    s = 0
    i = 0
    while i < N:  # condition is checked in Python
        s += i
        i += 1  # both additions are done in Python
    return s

def sum1(N=N):
    s = 0
    for i in range(N):  # increment in C
        s += i  # addition in Python
    return s

def sum2(N=N):
    return sum(range(N))  # everything in C

def sum3(N=N):
    return sum(list(range(N)))

def sum4(N=N):
    return np.sum(range(N))  # very slow np.array conversion

def sum5(N=N):
    # much faster np.array conversion
    return np.sum(np.fromiter(range(N), dtype=int))

def sum5v2_(N=N):
    # much faster np.array conversion
    return np.sum(np.fromiter(range(N), dtype=np.int_))

def sum6(N=N):
    # possibly slow conversion from np.int to Python int (PyLong)
    return sum(np.arange(N))

def sum7(N=N):
    # list() returns a list of NumPy integer scalars
    return sum(list(np.arange(N)))

def sum7v2(N=N):
    # tolist() returns a list of Python ints; this conversion seems faster
    # than the implicit one in sum(list(...))
    return sum(np.arange(N).tolist())

def sum8(N=N):
    return np.sum(np.arange(N))  # everything in NumPy's C code (np.add.reduce), no BLAS

def sum9(N=N):
    return np.arange(N).sum()  # the method call skips np.sum's dispatch overhead

def array_basic(N=N):
    return np.array(range(N))

def array_dtype(N=N):
    return np.array(range(N), dtype=np.int_)

def array_iter(N=N):
    # np.sum's source code recommends fromiter for converting generators
    return np.fromiter(range(N), dtype=np.int_)

print(f"while loop: {timeit(sum0, number=repetition)}")
print(f"for loop: {timeit(sum1, number=repetition)}")
print(f"sum_range: {timeit(sum2, number=repetition)}")
print(f"sum_rangelist: {timeit(sum3, number=repetition)}")
print(f"npsum_range: {timeit(sum4, number=repetition)}")
print(f"npsum_iterrange: {timeit(sum5, number=repetition)}")
print(f"npsum_iterrangev2: {timeit(sum5v2_, number=repetition)}")
print(f"sum_arange: {timeit(sum6, number=repetition)}")
print(f"sum_list_arange: {timeit(sum7, number=repetition)}")
print(f"sum_arange_tolist: {timeit(sum7v2, number=repetition)}")
print(f"npsum_arange: {timeit(sum8, number=repetition)}")
print(f"nparangenpsum: {timeit(sum9, number=repetition)}")
print(f"array_basic: {timeit(array_basic, number=repetition)}")
print(f"array_dtype: {timeit(array_dtype, number=repetition)}")
print(f"array_iter: {timeit(array_iter, number=repetition)}")
# N / 1000 is a float, so np.arange builds a float64 array in these two runs
print(f"npsumarangeREP: {timeit(lambda: sum8(N / 1000), number=100000 * repetition)}")
print(f"nparangenpsumREP: {timeit(lambda: sum9(N / 1000), number=100000 * repetition)}")

# Example output:
#
# while loop: 11.493371912998555
# for loop: 7.385945574002108
# sum_range: 2.4605720699983067
# sum_rangelist: 4.509678105998319
# npsum_range: 11.85120212900074
# npsum_iterrange: 4.464334709002287
# npsum_iterrangev2: 4.498494338993623
# sum_arange: 9.537815956995473
# sum_list_arange: 13.290120724996086
# sum_arange_tolist: 5.231948580003518
# npsum_arange: 0.241889145996538
# nparangenpsum: 0.21876695199898677
# array_basic: 11.736577274998126
# array_dtype: 8.71628468400013
# array_iter: 4.303306431000237
# npsumarangeREP: 21.240833958996518
# nparangenpsumREP: 16.690092379001726
```
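The `array_*` timings suggest that the expensive part of `np.sum(range(N))` is turning the `range` into an array. This is a small correctness sketch, assuming NumPy is installed (the variable names are illustrative). It shows that the conversion routes used in the script all produce the same values as `np.arange(N)`. `np.fromiter` also accepts an optional `count=` argument, which NumPy's documentation says lets it pre-allocate the output instead of resizing it. The sketch only checks equivalence and makes no timing claims.

```python
import numpy as np

N = 1_000  # small size, only for checking equivalence

# The same int array built from a range in three ways, mirroring
# array_basic, array_dtype and array_iter in the script.
a_basic = np.array(range(N))                   # NumPy has to look at the elements to find the dtype
a_dtype = np.array(range(N), dtype=np.int_)    # dtype given, elements still read from the sequence
a_iter = np.fromiter(range(N), dtype=np.int_)  # fills the array straight from the iterator

# With count= the length is known up front, so the output can be allocated once.
a_iter_count = np.fromiter(range(N), dtype=np.int_, count=N)

reference = np.arange(N)  # built directly in C, no Python ints involved
for arr in (a_basic, a_dtype, a_iter, a_iter_count):
    assert np.array_equal(arr, reference)

# Once the data is an ndarray, the reduction itself is a C loop;
# np.sum on an ndarray and np.add.reduce give the same result.
print(np.sum(reference), np.add.reduce(reference))  # 499500 499500
```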
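Isn't `numpy` simply optimized for use with `numpy` objects rather than with Python's built-in functions, which is how it was designed? For example, in the case of `sum(np.arange(N))`, the numpy range must first be converted to a Python data structure and then summed. Similarly, for `np.sum`, the `range` may need to be converted to a type numpy understands, but I'm not sure. - Matiiss

The `sum` implementation is here (https://github.com/python/cpython/blob/79bc5e1dc6f87149240bded3654574b24168f1ac/Python/bltinmodule.c#L2408-L2597), and the numpy function is here (https://github.com/numpy/numpy/blob/b235f9e701e14ed6f6f6dcba885f7986a833743f/numpy/core/fromnumeric.py#L2123-L2260) (although that is a wrapper function). You can view the `dis` output of all the functions on godbolt (https://godbolt.org/z/h5G4Ezx68). I can't pin down the exact reason. It may be because the CPython version (`sum` and `range`) runs entirely in C. - Alex

Following a comment in the source code of `np.sum`, I added a few more tests. My guess is that calling `np.sum` on a `range` implicitly involves a conversion to `np.array`. That conversion seems very inefficient unless numpy is explicitly told it is working with a generator. The conversion times (the three `array_*` lines) and the effect of using `fromiter` on the runtime could explain why `np.sum(range(N))` is slow. Now the only thing I don't understand is why `sum(np.arange(N))` is so slow. - fbence

`sum(np.arange(N))` would be slow because you're creating an array of numpy integers, and `sum` has to convert them from the numpy representation to Python ints (`PyLong`). - Alex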
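This small type check bears on the last comment. It assumes NumPy is installed, and the names are illustrative. It shows that iterating over an ndarray yields NumPy scalars rather than Python ints, and that `sum` keeps them as NumPy scalars instead of converting them. My reading, which has not been verified by profiling, is that the cost comes from two sources: creating one scalar object per element, and `sum` losing its C fast path for exact Python ints. `tolist()` produces real Python ints, which fits `sum_arange_tolist` beating `sum_arange` in the timings. The sketch itself only checks types, not speed.

```python
import numpy as np

arr = np.arange(5)

# Iterating over an ndarray yields NumPy scalar objects, not Python ints.
first = next(iter(arr))
print(type(first))                    # a NumPy integer type such as numpy.int64
print(isinstance(first, int))         # False

# CPython's sum() has a C fast path for exact Python ints; NumPy scalars don't
# qualify, so it falls back to generic addition for every element and the
# result stays a NumPy scalar rather than becoming a Python int.
total = sum(arr)
print(isinstance(total, np.integer))  # True

# tolist() converts every element to a Python int inside NumPy's C code,
# so sum() gets items its fast path can handle.
as_ints = arr.tolist()
print(type(as_ints[0]))               # <class 'int'>
print(sum(as_ints), total)            # 10 10
```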