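You can use the heapq module, in particular its nlargest or nsmallest functions.
Alternatively, just build a heap and call heappop() repeatedly. Building the heap should take O(n) time, and retrieving k elements should take O(k*log(n)) time.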
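As a minimal sketch of the two approaches described above (the helper name k_smallest_by_heappop is hypothetical, chosen only for illustration), the following compares heapify plus heappop against heapq.nsmallest and a full sort on a small random list:

```python
import heapq
import random


def k_smallest_by_heappop(seq, k):
    """Return the k smallest items of seq in ascending order."""
    heap = list(seq)      # copy, so the caller's sequence is left untouched
    heapq.heapify(heap)   # O(n) heap construction
    # Each heappop costs O(log n); pop at most len(heap) items.
    return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]


random.seed(0)  # fixed seed only so the demo is repeatable
data = [random.randint(-5000, 5000) for _ in range(1000)]

# All three approaches should agree on the five smallest values.
assert k_smallest_by_heappop(data, 5) == heapq.nsmallest(5, data) == sorted(data)[:5]

# nlargest mirrors nsmallest, returning the largest items in descending order.
assert heapq.nlargest(5, data) == sorted(data, reverse=True)[:5]
```

The copy keeps the input intact, at the cost of O(n) extra memory. If mutating the input is acceptable, heapify can be applied to it directly.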
Here is a very simple, small benchmark:
In [1]: import random, heapq
In [2]: seq = [random.randint(-5000, 5000) for _ in range(35000)]
In [3]: %timeit sorted(seq)[:75]
100 loops, best of 3: 14.5 ms per loop
In [4]: %%timeit
...: s = seq[:]
...: heapq.nsmallest(75, s)
...:
100 loops, best of 3: 4.05 ms per loop
In [5]: %%timeit
...: s = seq[:]
...: heapq.heapify(s)
...: for _ in range(75): heapq.heappop(s)
...:
100 loops, best of 3: 2.41 ms per loop
I'm not sure why nsmallest is so much slower than calling heappop directly. In fact, I should have timed it without copying seq:
In [6]: %%timeit
...: heapq.nsmallest(75, seq)
...:
100 loops, best of 3: 3.82 ms per loop
Increasing the length by a factor of 100:
In [12]: %timeit sorted(seq)[:75]
1 loops, best of 3: 1.9 s per loop
In [13]: %%timeit
...: heapq.nsmallest(75, seq)
...:
1 loops, best of 3: 352 ms per loop
In [14]: %%timeit
...: s = seq[:]
...: heapq.heapify(s)
...: for _ in range(75): heapq.heappop(s)
...:
1 loops, best of 3: 264 ms per loop
Note: to counter the bias in F.J's comparison:
In [13]: a = list(range(1000000))
In [14]: random.shuffle(a)
In [15]: %timeit sorted(a)
1 loops, best of 3: 985 ms per loop
In [16]: %%timeit
...: s = a[:]
...: heapq.heapify(s)
...:
1 loops, best of 3: 284 ms per loop
As you can see, heapify is considerably faster than sorting for a list of 1,000,000 elements.
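For anyone without IPython, the measurements above could be reproduced roughly with the standard timeit module. This is only a sketch: the function names pop_k and benchmark are hypothetical, the default sizes mirror the session above, and absolute timings will differ by machine and Python version.

```python
import heapq
import random
import timeit


def pop_k(seq, k):
    """Copy seq, heapify the copy, and pop the k smallest items (assumes k <= len(seq))."""
    s = seq[:]
    heapq.heapify(s)
    return [heapq.heappop(s) for _ in range(k)]


def benchmark(n=35000, k=75, number=10):
    """Return the best per-call time in seconds for each approach."""
    seq = [random.randint(-5000, 5000) for _ in range(n)]
    candidates = {
        "sorted": lambda: sorted(seq)[:k],
        "nsmallest": lambda: heapq.nsmallest(k, seq),
        "heapify+heappop": lambda: pop_k(seq, k),
    }
    # Sanity check: every approach must produce the same result.
    expected = sorted(seq)[:k]
    for name, fn in candidates.items():
        assert fn() == expected, name
    # Best of 3 repeats, divided by the number of calls per repeat.
    return {
        name: min(timeit.repeat(fn, number=number, repeat=3)) / number
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    for name, secs in benchmark().items():
        print(f"{name:>16}: {secs * 1e3:.2f} ms per call")
```

Taking the minimum over the repeats follows the same "best of 3" convention that %timeit reports in the session above.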