列表推导式和函数式函数是否比“for循环”更快？

Question

列表推导式和函数式函数是否比“for循环”更快？

pythonperformancefor-looplist-comprehensionmap-function

231

在Python中，列表推导式、map()、filter()和reduce()等函数是否比for循环更快？从技术上讲，它们以C速度运行，而for循环以Python虚拟机速度运行，所以为什么它们更快呢？

假设我正在开发一个需要使用for循环绘制复杂且巨大地图的游戏。这个问题肯定是相关的，因为如果列表推导式确实更快，那么它将是一个更好的选择，以避免卡顿（尽管代码在视觉上更加复杂）。

- Ericson Willians

2

你可以看一下这篇文章。它解释了底层的工作原理 - 这基本上就是解释了何时以及如何更快：https://pythonsimplified.com/comprehensive-guide-to-python-list-comprehensions/#What_happens_under_the_hood - PlagTag

8个回答

30

如果你查看python.org上的信息，你会看到这个摘要：

Version Time (seconds)
Basic loop 3.47
Eliminate dots 2.45
Local variable & no dots 1.79
Using map function 0.54

但你真的应该详细阅读上文以了解性能差异的原因。

我还强烈建议您使用timeit来计时您的代码。最终可能会出现这样一种情况，例如当满足条件时，您可能需要退出for循环。这可能比通过调用map找到结果更快。

- Anthony Kong

41

那页内容很好，部分相关，但仅引用那些数字并不有益，甚至可能会误导。 - user395760

5

这并未说明你正在计时什么，相对性能会因循环/列表推导式/map中包含的内容而大不相同。 - user2357112

@delnan 我同意。我修改了我的回答，督促OP阅读文档以了解性能差异。 - Anthony Kong

@user2357112 你需要阅读我链接的维基页面以了解上下文。我为OP的参考而发布它。 - Anthony Kong

17

我修改了@Alisa的代码，并使用cProfile来展示列表推导式为什么更快：

from functools import reduce
import datetime

def reduce_(numbers):
    return reduce(lambda sum, next: sum + next * next, numbers, 0)

def for_loop(numbers):
    a = []
    for i in numbers:
        a.append(i*2)
    a = sum(a)
    return a

def map_(numbers):
    sqrt = lambda x: x*x
    return sum(map(sqrt, numbers))

def list_comp(numbers):
    return(sum([i*i for i in numbers]))

funcs = [
        reduce_,
        for_loop,
        map_,
        list_comp
        ]

if __name__ == "__main__":
    # [1, 2, 5, 3, 1, 2, 5, 3]
    import cProfile
    for f in funcs:
        print('=' * 25)
        print("Profiling:", f.__name__)
        print('=' * 25)
        pr = cProfile.Profile()
        for i in range(10**6):
            pr.runcall(f, [1, 2, 5, 3, 1, 2, 5, 3])
        pr.create_stats()
        pr.print_stats()

以下是结果:

=========================
Profiling: reduce_
=========================
         11000000 function calls in 1.501 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.162    0.000    1.473    0.000 profiling.py:4(reduce_)
  8000000    0.461    0.000    0.461    0.000 profiling.py:5(<lambda>)
  1000000    0.850    0.000    1.311    0.000 {built-in method _functools.reduce}
  1000000    0.028    0.000    0.028    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: for_loop
=========================
         11000000 function calls in 1.372 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.879    0.000    1.344    0.000 profiling.py:7(for_loop)
  1000000    0.145    0.000    0.145    0.000 {built-in method builtins.sum}
  8000000    0.320    0.000    0.320    0.000 {method 'append' of 'list' objects}
  1000000    0.027    0.000    0.027    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: map_
=========================
         11000000 function calls in 1.470 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.264    0.000    1.442    0.000 profiling.py:14(map_)
  8000000    0.387    0.000    0.387    0.000 profiling.py:15(<lambda>)
  1000000    0.791    0.000    1.178    0.000 {built-in method builtins.sum}
  1000000    0.028    0.000    0.028    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: list_comp
=========================
         4000000 function calls in 0.737 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.318    0.000    0.709    0.000 profiling.py:18(list_comp)
  1000000    0.261    0.000    0.261    0.000 profiling.py:19(<listcomp>)
  1000000    0.131    0.000    0.131    0.000 {built-in method builtins.sum}
  1000000    0.027    0.000    0.027    0.000 {method 'disable' of '_lsprof.Profiler' objects}

在我看来：

reduce和map通常相当慢。而且，对于map返回的迭代器使用sum来说是慢的，与在列表上求和相比
for_loop使用的是append，它在某种程度上显然很慢
列表推导不仅在构建列表上花费了最少的时间，而且与map相比，还使sum更快

- tjysdsg

15

您明确询问了map()，filter()和reduce()，但我假设您想了解函数式编程的一般情况。我自己在计算一组点内所有点之间距离的问题上进行了测试，使用内置的itertools模块中的starmap函数进行函数式编程稍微比使用for循环慢（实际上需要1.25倍的时间）。以下是我使用的示例代码：

import itertools, time, math, random

class Point:
    def __init__(self,x,y):
        self.x, self.y = x, y

point_set = (Point(0, 0), Point(0, 1), Point(0, 2), Point(0, 3))
n_points = 100
pick_val = lambda : 10 * random.random() - 5
large_set = [Point(pick_val(), pick_val()) for _ in range(n_points)]
    # the distance function
f_dist = lambda x0, x1, y0, y1: math.sqrt((x0 - x1) ** 2 + (y0 - y1) ** 2)
    # go through each point, get its distance from all remaining points 
f_pos = lambda p1, p2: (p1.x, p2.x, p1.y, p2.y)

extract_dists = lambda x: itertools.starmap(f_dist, 
                          itertools.starmap(f_pos, 
                          itertools.combinations(x, 2)))

print('Distances:', list(extract_dists(point_set)))

t0_f = time.time()
list(extract_dists(large_set))
dt_f = time.time() - t0_f

函数式版本是否比过程式版本更快？

def extract_dists_procedural(pts):
    n_pts = len(pts)
    l = []    
    for k_p1 in range(n_pts - 1):
        for k_p2 in range(k_p1, n_pts):
            l.append((pts[k_p1].x - pts[k_p2].x) ** 2 +
                     (pts[k_p1].y - pts[k_p2].y) ** 2)
    return l

t0_p = time.time()
list(extract_dists_procedural(large_set)) 
    # using list() on the assumption that
    # it eats up as much time as in the functional version

dt_p = time.time() - t0_p

f_vs_p = dt_p / dt_f
if f_vs_p >= 1.0:
    print('Time benefit of functional progamming:', f_vs_p, 
          'times as fast for', n_points, 'points')
else:
    print('Time penalty of functional programming:', 1 / f_vs_p, 
          'times as slow for', n_points, 'points')

- andreipmbcn

4

看起来回答这个问题的方式有些复杂。你能简化一下，让它更容易理解吗？ - Russia Must Remove Putin

3

@AaronHall，我觉得andreipmbcn的答案相当有趣，因为它是一个非常规的示例。我们可以尝试运行一下代码。 - Anthony Kong

@AaronHall，你想让我编辑文本段落，使其更加清晰明了，还是让我编辑代码？ - andreipmbcn

11

我编写了一个简单的脚本来测试速度，这是我的发现。实际上，在我的情况下，for循环最快。这真让我惊讶，看看下面的结果吧（我正在计算平方和）。

from functools import reduce
import datetime


def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        i = i**2
        a += i
    return a

def square_sum3(numbers):
    sqrt = lambda x: x**2
    return sum(map(sqrt, numbers))

def square_sum4(numbers):
    return(sum([int(i)**2 for i in numbers]))


time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

0:00:00.302000 #Reduce
0:00:00.144000 #For loop
0:00:00.318000 #Map
0:00:00.390000 #List comprehension

- alphiii

使用Python 3.6.1版本，性能差异不是很大；Reduce和Map的性能下降至0.24，列表推导式为0.29。而for循环的性能较高，达到了0.18。 - jjmerelo

1

在 square_sum4 中消除 int 也使其比 for 循环快得多，稍微慢一点。 - jjmerelo

3

这个例子不好，用来展示速度。for循环胜出是因为你让它相对于其他方法不浪费资源。每次使用map和reduce都会重新创建函数对象，这样会浪费资源 - 提取函数。在列表推导中，你做了无意义的事情，创建了一个一次性的list，然后将其传递给sum - 去掉括号。你还使用了自己实现的计时函数，而没有使用更准确的timeit模块。 - WloHu

谁计时自定义的time_it？ - mirekphd

7

我成功修改了一些@alpiii的代码，并发现列表推导式比for循环稍微快一些。这可能是由于int()，在列表推导式和for循环之间不太公平。

from functools import reduce
import datetime

def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next*next, numbers, 0)

def square_sum2(numbers):
    a = []
    for i in numbers:
        a.append(i*2)
    a = sum(a)
    return a

def square_sum3(numbers):
    sqrt = lambda x: x*x
    return sum(map(sqrt, numbers))

def square_sum4(numbers):
    return(sum([i*i for i in numbers]))

time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

0:00:00.101122 #Reduce

0:00:00.089216 #For loop

0:00:00.101532 #Map

0:00:00.068916 #List comprehension

- Alisca Chen

6

在 Alphii 的回答上做一些改进，实际上使用 for 循环是第二好的选择，而且比使用 map 慢大约 6 倍。

from functools import reduce
import datetime


def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        a += i**2
    return a

def square_sum3(numbers):
    a = 0
    map(lambda x: a+x**2, numbers)
    return a

def square_sum4(numbers):
    a = 0
    return [a+i**2 for i in numbers]

time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

主要的更改是消除缓慢的sum调用，以及在最后一个案例中可能不必要的int()。将for循环和map放在同样的条件下会使它变得相当简单。请记住，lambda表达式是函数概念，理论上不应该具有副作用，但是，它们确实可以产生副作用，例如添加到a。

这种情况在Python 3.6.1、Ubuntu 14.04、Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz下测试结果如下：

0:00:00.257703 #Reduce
0:00:00.184898 #For loop
0:00:00.031718 #Map
0:00:00.212699 #List comprehension

- jjmerelo

4

square_sum3和square_sum4是不正确的。它们将不会给出总和。@alisca chen提供的答案是正确的。 - ShikharDua

2

我正在寻找关于'for'循环和'列表推导式'的性能信息，偶然发现了这个主题。 Python 3.11发布已经有几个月了（2022年10月），Python 3.11的主要特性之一是速度提升。 https://www.python.org/downloads/release/python-3110/

更快的CPython项目已经取得了一些令人兴奋的成果。 Python 3.11比Python 3.10快10-60％。平均而言，我们在标准基准测试套件上测量到了1.22倍的加速。有关详细信息，请参见更快的CPython。

我运行了最初由Alphi发布的相同代码，然后由jjmerelo“扭曲”过的代码。以下是Python3.10和Python3.11的结果：

    from functools import reduce
    import datetime
    
    def time_it(func, numbers, *args):
        start_t = datetime.datetime.now()
        for i in range(numbers):
            func(args[0])
        print(datetime.datetime.now()-start_t)
    
    def square_sum1(numbers):
        return reduce(lambda sum, next: sum+next**2, numbers, 0)
    
    
    def square_sum2(numbers):
        a = 0
        for i in numbers:
            a += i**2
        return a
    
    
    def square_sum3(numbers):
        a = 0
        map(lambda x: a+x**2, numbers)
        return a
    
    
    def square_sum4(numbers):
        a = 0
        return [a+i**2 for i in numbers]
    
    
    time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
    time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
    time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
    time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

我尚未计算准确的百分比改进，但很明显，在这个特定实例中，性能提升似乎非常显著（快了3到4倍），唯一例外的是'map'函数，其性能改进微不足道。

#Python 3.10
0:00:00.221134  #Reduce
0:00:00.186307  #For
0:00:00.024311  #Map
0:00:00.206454  #List comprehension

#python3.11
0:00:00.072550  #Reduce
0:00:00.037168  #For
0:00:00.021702  #Map
0:00:00.058655  #List Comprehension

注意：我在运行WSL下的Windows 11上的Kali Linux虚拟机中运行了这个代码。我不确定如果在Linux实例上原生运行（裸机）是否可以获得更好的性能。

我的Kali Linux虚拟机规格如下：

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   39 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          8
On-line CPU(s) list:             0-7
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Core(TM) i7-6700T CPU @ 2.80GHz
CPU family:                      6
Model:                           94
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
Stepping:                        3
BogoMIPS:                        5615.99
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi ept vpid ept_ad fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
Virtualization:                  VT-x
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       128 KiB (4 instances)
L1i cache:                       128 KiB (4 instances)
L2 cache:                        1 MiB (4 instances)
L3 cache:                        8 MiB (1 instance)
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Unknown: Dependent on hypervisor status
Vulnerability Tsx async abort:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown

- user14752375

你的四个函数都有不同的功能。4构建一个列表。3返回与4中相同值的迭代器。2和1执行相同的操作，但我猜想它们是为了展开循环，因为列表很短且静态。 - Josh from Qaribou

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- user395760 · Accepted Answer

以下是基于经验的粗略指南和猜测。您应该使用timeit或分析您具体的用例以获得硬数据，这些数据有时可能与以下内容不符。

列表推导通常比完全等效的for循环（实际上构建了一个列表）快一点，最可能是因为它不必在每次迭代中查找列表及其append方法。但是，列表推导仍然执行字节码级别的循环：

>>> dis.dis(<the code object for `[x for x in range(10)]`>)
 1           0 BUILD_LIST               0
             3 LOAD_FAST                0 (.0)
       >>    6 FOR_ITER                12 (to 21)
             9 STORE_FAST               1 (x)
            12 LOAD_FAST                1 (x)
            15 LIST_APPEND              2
            18 JUMP_ABSOLUTE            6
       >>   21 RETURN_VALUE

使用列表推导式代替不构建列表的循环，无意义地累积一系列无意义的值，然后将其丢弃，通常会更慢，因为需要创建和扩展列表。列表推导式并不是比传统循环更快的魔法。

至于函数式列表处理函数：虽然它们是用 C 写的，可能比用 Python 编写的等效函数表现更好，但它们不一定是最快的选项。如果函数也是用 C 写的话，可以期望有所加速。但在大多数情况下，使用 lambda（或其他 Python 函数）会反复设置 Python 栈帧等开销，从而消耗掉任何节省。简单地在行内执行相同的工作，而不是调用函数（例如使用列表推导式代替 map 或 filter），通常会稍微快一些。

假设我正在开发一个需要使用 for 循环绘制复杂和巨大地图的游戏。这个问题肯定是相关的，因为如果列表推导式确实更快，那么它将是一个更好的选择，以避免卡顿（尽管代码视觉上更复杂）。

如果像这样的代码在良好的非“优化”Python中编写时仍然不够快，那么任何Python级别的微小优化都不足以使其足够快，您应该开始考虑转向C。尽管广泛的微观优化通常可以显着加速Python代码，但是这方面存在一个低（绝对意义上的）限制。此外，在达到这个限制之前，更具成本效益（15％的速度提升与相同努力下的300％的速度提升）的方法是咬紧牙关并编写一些C。