在Python3中，使用“not in”比使用“in”更快吗？

Question

在Python3中，使用“not in”比使用“in”更快吗？

5

假设我们正在解决一个简单的单词计数问题。有一个列表，我们正在尝试找到列表中每个单词出现的次数。在此情况下哪种模式更快？

book_title =  ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_count_dict = {}

模式 1

for word in book_title:
    if word in word_count_dict:
        word_count_dict[word] += 1
    else:
        word_count_dict[word] = 1

模式 2

for word in book_title:
    if word not in word_counter:
        word_counter[word] = 1
    else:
        word_counter[word] += 1

- IronMan

1

应该是一样的。两者都需要在哈希映射中进行搜索，对吧？ - Haris

1

不，它们应该是相同的成本。 - Haleemur Ali

我相信使用try-except会更快，因为我之前在某个地方读到过。不过我不确定。 - Gareth Ma

5个回答

4

六个一半打，两者应该大致相当-在计算方面，not操作几乎可以忽略不计（字面上是最便宜的操作），而像字典这样的哈希表中的in操作在常数时间内运行（要么哈希表中存在，要么不存在）。如果我们处理的是列表，则会以线性时间运行，但仍然在in和notin之间。另请参见python数据结构的计算复杂度。所以基本上，请使用使您的代码更易于理解的任何一个。

话虽如此，您是否考虑使用 collections.Counter，这是一个专为此目的设计的数据结构？

import collections
book_title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_counts = collections.Counter(book_title)
print(word_counts)
# Counter({'great': 2, 'the': 2, 'adventures': 2, 'of': 2, 'expectations': 1, 'sherlock': 1, 'holmes': 1, 'gasby': 1, 'hamlet': 1, 'huckleberry': 1, 'fin': 1})

你可以将 collections.Counter 强制转换为 dict，如果需要的话，实际上 collections.Counter 是 dict 的子类。它甚至有一个 .update() 方法，专门用于与其他计数器一起使用 - 如果你添加了另一本书的标题，只需将其输入到 Counter 中，然后使用 .update() 更新原始计数器即可。

- Green Cloak Guy

1

它们的成本大致相同。将not in运算符视为首先应用in运算符，然后对该结果应用逻辑not(几乎可以忽略不计)。

为了确认，这里有一个小实验，您可以执行以测量执行时间。

from time import time
book_title =  ['hi']*100000+['there']*10000
word_count_dict1 = {}
word_count_dict2 = {}
start = time()
for word in book_title:
    if word in word_count_dict1:
        word_count_dict1[word] += 1
    else:
        word_count_dict1[word] = 1
print(time()-start)
start = time()
for word in book_title:
    if word not in word_count_dict2:
        word_count_dict2[word] = 1
    else:
        word_count_dict2[word] += 1
print(time()-start)

输出（你的输出可能会有所不同）

0.021044015884399414
0.02713179588317871

- Ajay Dabas

0

基本上，它们的成本相同。来自Python 3 参考手册

not in 运算符被定义为具有 in 的反向真值。

- Iraj Hedayati

0

您可以检查模式和比较的操作时间。

import timeit

def pattern1():
  title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
  counts = {}
  for word in title:
    if word in counts:
      counts[word] += 1
    else:
      counts[word] = 1

def pattern2():
  title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
  counts = {}
  for word in title:
    if word not in counts:
      counts[word] = 1
    else:
      counts[word] += 1

sample1 = [timeit.timeit(pattern1, number=10000) for _ in range(10)]
sample2 = [timeit.timeit(pattern2, number=10000) for _ in range(10)]

print(sum(sample1) / len(sample1))
# 0.01713230140048836
print(sum(sample2) / len(sample2))
# 0.017954919600015273

正如我们所看到的，差异是微不足道的。

- PaxPrz

你能否给出一个适用于这个问题和样本输出的 pattern1 和 pattern2 的例子？否则，这个答案过于笼统，对 OP 没有太大帮助。 - jayelm

我本可以这样做@jayelm。但是他的pattern2有一个变量名为word_counter，而问题中没有定义它。因此，我不确定word_counter是否为空字典。 - PaxPrz

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Christopher Janzon · Accepted Answer

使用dis，您可以查看每个方法生成的字节码。

通过反汇编器运行您的代码后，我唯一能够看到的区别是在in和not in处，它们的字节码差异为：

COMPARE_OP 7 (not in) 或 COMPARE_OP 6 (in)

然后是POP_JUMP_IF_FALSE（即继续执行此条件的下一条指令）。

总之，不管比较返回true还是false，这两种方法似乎都具有相同数量的指令，因此很可能具有相同的执行速度。

然而，根据CPU指令更接近的潜在优化可能导致其中一种方法更快，但我认为这超出了此问题的范围。如果是这种情况，那么我相信在更大的列表上测量执行时间将证明哪种方法更快。

Python字节码之下，这两个指令的执行速度可能会因Python版本、构建、操作系统或体系结构而有所不同。您可能能够用Python源代码进行小改动，以使其中一种指令执行得更快。