奇怪的性能结果--循环 vs 列表推导式和 zip()

Question

奇怪的性能结果--循环 vs 列表推导式和 zip()

pythonperformancelist-comprehension

8

我有一个非常简单的问题，但当我试图找出哪个解决方案更快时，却得到了一些奇怪的结果。

原始问题：给定两个列表ListA、ListB和一个常量k，删除所有两个列表之和为k的条目。

我用两种方法解决了这个问题：首先我尝试使用循环，然后我使用列表推导式和zip()来压缩和解压缩两个列表。

使用循环的版本。

def Remove_entries_simple(listA, listB, k):
    """ removes entries that sum to k """
    new_listA = []
    new_listB = []
    for index in range(len(listA)):
        if listA[index] + listB[index] == k:
            pass
        else:
            new_listA.append(listA[index])
            new_listB.append(listB[index])
    return(new_listA, new_listB)

使用列表推导和zip()函数的版本

def Remove_entries_zip(listA, listB, k):
    """ removes entries that sum to k using zip"""
    zip_lists = [(a, b) for (a, b) in zip(listA, listB) if not (a+b) == k]

    # unzip the lists
    new_listA, new_listB = zip(*zip_lists)
    return(list(new_listA), list(new_listB))

然后我尝试确定哪种方法更快。但是我得到了下面图表所示的结果（x轴：列表大小，y轴：运行它的平均时间，10 ** 3次重复）。由于某种原因，使用zip()版本总是在大约相同的位置跳跃 - 我在不同的机器上多次运行它。有人能解释一下可能导致这种奇怪行为的原因吗？

更新：我用来生成图表的代码。我使用了一个函数装饰器来运行每个问题1000次。

导入语句：

import random
import time
import matplotlib.pyplot as plt

函数装饰器：

def Repetition_Decorator(fun, Rep=10**2):
    ''' returns the average over Rep repetitions'''
    def Return_function(*args, **kwargs):
        Start_time = time.clock()
        for _ in range(Rep):
            fun(*args, **kwargs)
        return (time.clock() - Start_time)/Rep

return Return_function

创建图表的代码：

Zippedizip = []
Loops = []
The_Number = 10
Size_list = list(range(10, 1000, 10))

Repeated_remove_loop = Repetition_Decorator(Remove_entries_simple, Rep=10**3)
Repeated_remove_zip = Repetition_Decorator(Remove_entries_zip, Rep=10**3)

for size in Size_list:
    ListA = [random.choice(range(10)) for _ in range(size)]
    ListB = [random.choice(range(10)) for _ in range(size)]

    Loops.append(Repeated_remove_loop(ListA, ListB, The_Number))
    Zippedizip.append(Repeated_remove_zip(ListA, ListB, The_Number))

plt.xlabel('Size of List')
plt.ylabel('Averaged time in seconds')
plt.plot(Size_list, Loops, label="Using Loop")
plt.plot(Size_list, Zippedizip, label="Zip")
plt.legend(loc='upper left', shadow=False, fontsize='x-large')
plt.show()

更新-更新：感谢kaya3指出timeit模块。

为了尽可能接近我的原始代码，同时使用timeit模块，我创建了一个新的函数装饰器来计时代码。

新的装饰器：

def Repetition_Decorator_timeit(fun, Rep=10**2):                                                                                   
"""returns average over Rep repetitions with timeit"""                                                                         
    def Return_function(*args, **kwargs):                                                                                          
        partial_fun = lambda: fun(*args, **kwargs)                                                                                 
        return timeit.timeit(partial_fun, number=Rep) / Rep                                                                        
return Return_function

当我使用新的装饰器时，使用for循环的版本不受影响，但使用zip的版本不再跳跃。

到目前为止，我非常确定跳跃是由于我测量函数的方式而不是函数本身引起的。但是这个跳跃非常明显 - 在不同的机器上总是在相同的列表大小处发生 - 因此它不可能是巧合。有什么想法为什么会发生这种跳跃？

更新-更新-更新：

这与垃圾收集器有关，因为如果我使用 gc.disable() 禁用垃圾收集器，则两种测量方法都会给出相同的结果。

我在这里学到了什么：不要只靠自己测量执行时间。使用timeit模块来测量代码片段的性能。

- oldmansaur

有趣；请问您能否提供您用来分析它的代码？ - kaya3

我自己无法复现这种行为；使用 zip 的版本在大小从100到10,000的情况下始终快约30%。 - kaya3

你是在使用Python 2还是Python 3？这很重要，因为整个过程最好都使用迭代器完成。这个问题中没有任何需要创建临时列表的地方。你真的想要通过一些处理函数（根据你的标准进行删除）将两个并行列表传递到另一端。如果你在这里使用py2，那么你的当前实现会创建相当多的临时列表。 - Bill Huneke

我正在使用Python 3.6.8。 - oldmansaur

@kaya3 添加了生成图表的代码。感谢您的建议。 - oldmansaur

我看到了类似的模式，但zip仍然更快。Python 3.7在macOS上：https://i.imgur.com/WpNXsTI.png - AKX

2个回答

0

使用zip并行迭代列表，同时使用append进行求和的版本似乎更加一致。

我认为我们看到的是zip(*...)将可迭代对象展开成参数列表时存在某个阈值（可能是768个参数），超过这个阈值后会采用较慢的方法。

def Remove_entries_halfzip(listA, listB, k):
    """ removes entries that sum to k using zip"""
    new_listA = []
    new_listB = []
    for a, b in zip(listA, listB):
        if a + b != k:
            new_listA.append(a)
            new_listB.append(b)

    return (new_listA, new_listB)

你可以通过将附加函数设置为本地变量进一步微调优化，每次迭代保存属性查找：

def Remove_entries_halfzip_micro_opt(listA, listB, k):
    """ removes entries that sum to k using zip"""
    new_listA = []
    new_listB = []
    append_a = new_listA.append
    append_b = new_listB.append
    for a, b in zip(listA, listB):
        if a + b != k:
            append_a(a)
            append_b(b)

    return (new_listA, new_listB)

我也尝试了一个Numpy实现，但由于需要进行数据转换（每次调用都需要进行np.array()强制转换），所以速度较慢。如果你一直在使用Numpy数组，那么它会更快。

- AKX

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- kaya3 · Accepted Answer

这似乎是由于您测量运行时间的方式而导致的人为因素。我不知道什么原因会导致您的计时代码产生这种效果，但当我使用 timeit 来测量运行时间时，这种效果就会消失。我正在使用 Python 3.6.2。

我可以通过使用您的计时代码来一致地重现这种效果；我发现在我的机器上，zip 版本的运行时间在大约相同的阈值处跳跃，尽管它仍然比另一个版本稍快。

然而，当我使用 timeit 进行计时时，这种效果完全消失了：

这是使用timeit的代码；我尽可能地少更改了您的分析代码。

import timeit

Zippedizip = []
Loops = []
The_Number = 10
Size_list = list(range(10, 1000, 10))
Reps = 1000

for size in Size_list:
    ListA = [random.choice(range(10)) for _ in range(size)]
    ListB = [random.choice(range(10)) for _ in range(size)]

    remove_loop = lambda: Remove_entries_simple(ListA, ListB, The_Number)
    remove_zip = lambda: Remove_entries_zip(ListA, ListB, The_Number)

    Loops.append(timeit.timeit(remove_loop, number=Reps) / Reps)
    Zippedizip.append(timeit.timeit(remove_zip, number=Reps) / Reps)

# ...

所以我认为这是一个虚假的结果。话虽如此，我不明白你的计时代码中是什么导致了这个结果。我尝试简化你的计时代码，不使用装饰器或可变参数，并用更精确的time.perf_counter()替换了time.clock()，但这并没有改变任何事情。