Does Python optimize dictionary lookups under the hood?

Question

Does Python optimize dictionary lookups under the hood?

5

For example:

d = {"John": "Doe", "Paul": "Allen", "Bill": "Gates"}

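Imagine there are tens of millions of such names, each uniquely identified by first name.

If I want to check whether the key "Paul" exists, what happens under the hood?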

- AureliusPhi

2

Python dicts are hash tables. - shx2

2 Answers

2

Python dictionaries are implemented as hash tables, so a lookup takes O(1) time on average, provided the hash function spreads the keys well.
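As a small illustration of the check the question asks about, here is roughly what it looks like in practice. This is only a sketch: the key "Steve" is made up for the example, and the comments describe the general hash-table idea rather than CPython's exact internals.

```python
d = {"John": "Doe", "Paul": "Allen", "Bill": "Gates"}

# Membership test: the key is hashed, then the table is probed.
has_paul = "Paul" in d

# Lookup that returns None for a missing key instead of raising KeyError.
last_name = d.get("Paul")
missing = d.get("Steve")  # "Steve" is a hypothetical key that is not in d

# Equal keys must produce equal hashes within one process,
# otherwise a hash-based lookup could not find them again.
same_hash = hash("Paul") == hash("Pa" + "ul")
```

None of these operations scans the other keys, which is why the cost stays roughly constant on average as the dictionary grows.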

Reference:

Is a Python dictionary an example of a hash table?

- Bill Lynch

1

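O(1) is the average time complexity, not the worst case. - shx2

Generally, if you're doing enough lookups for it to matter, the average case is what dominates, unless you're exposed to a denial-of-service attack. - Mark Ransom

@MarkRansom: Besides the worst case, you could also be stuck with a poor hashing algorithm. - Bill Lynch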

1

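@BillLynch: Hashing isn't the problem, collisions are. Python adds a random seed to the hashing used for dictionaries to defend against a class of DoS attacks that exploit the worst case by crafting specific string values that always collide. Sending 10000 such requests, for example, could tie up a web server. - Martijn Pieters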
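To make the random seed visible, here is a rough sketch that computes a string hash in fresh interpreter processes. It assumes CPython 3.7 or later and relies on the documented PYTHONHASHSEED environment variable; the helper name hash_in_fresh_interpreter is made up for this example.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a newly started Python process.

    The seed is handed over through the PYTHONHASHSEED environment variable.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same seed: the hash is reproducible across processes.
first = hash_in_fresh_interpreter("Paul", 1)
second = hash_in_fresh_interpreter("Paul", 1)

# Different seed: the same string hashes to a different value.
other = hash_in_fresh_interpreter("Paul", 2)
```

Because an attacker cannot know the seed of a running server, they cannot precompute strings that are guaranteed to collide.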

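@BillLynch, do you mean a hashing algorithm like this random number generator? http://dilbert.com/strips/comic/2001-10-25/ - Mark Ransom

Yes, because when all keys produce the same hash value, they all end up in the same bucket. To find a particular key-value pair you then have to check every pair in that bucket, which is a linear search again. - JNevens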
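The degradation to a linear search can be observed by counting equality comparisons. The sketch below uses a hypothetical key class, CountingKey, whose hash can be forced to a constant; the class and the helper function are invented for illustration only.

```python
class CountingKey:
    """Hypothetical key type that counts how often == is invoked."""

    eq_calls = 0

    def __init__(self, name, constant_hash):
        self.name = name
        self.constant_hash = constant_hash

    def __hash__(self):
        # A deliberately bad hash returns the same value for every key.
        return 1 if self.constant_hash else hash(self.name)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.name == other.name


def comparisons_for_missing_key(n, constant_hash):
    """Build a dict of n keys, then count == calls for one failed lookup."""
    d = {CountingKey(f"name{i}", constant_hash): i for i in range(n)}
    CountingKey.eq_calls = 0  # ignore comparisons made while building d
    found = CountingKey("missing", constant_hash) in d
    return found, CountingKey.eq_calls


found_bad, bad_calls = comparisons_for_missing_key(500, constant_hash=True)
found_good, good_calls = comparisons_for_missing_key(500, constant_hash=False)
```

With the constant hash, the failed lookup has to compare against every stored key, so bad_calls is at least 500, while good_calls should stay far smaller.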

- JNevens · Accepted Answer

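Python's dictionary implementation reduces the average complexity of dictionary lookups to O(1) by requiring that key objects provide a "hash" function. Such a hash function takes the information in a key object and uses it to produce an integer, called a hash value. This hash value is then used to determine which "bucket" the (key, value) pair should be placed into. Pseudocode for this lookup function might look something like: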

def lookup(d, key):
    '''dictionary lookup is done in three steps:
       1. A hash value of the key is computed using a hash function.

       2. The hash value addresses a location in d.data which is
          supposed to be an array of "buckets" or "collision lists"
          which contain the (key,value) pairs.

       3. The collision list addressed by the hash value is searched
          sequentially until a pair is found with pair[0] == key. The
          return value of the lookup is then pair[1].
    '''
    h = hash(key)                  # step 1
    cl = d.data[h]                 # step 2 (pseudocode: assumes h is a valid index into d.data)
    for pair in cl:                # step 3
        if key == pair[0]:
            return pair[1]
    else:
        # for/else: reached only when the loop finishes without returning
        raise KeyError("Key %s not found." % key)

From the Python Wiki
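For anyone who wants to run the idea, here is a toy version of the pseudocode above. It is only a sketch of the collision-list scheme the wiki describes; CPython's real dict uses open addressing rather than per-bucket lists. The class name ChainedTable and the bucket count of 8 are arbitrary choices made for this example.

```python
class ChainedTable:
    """Toy hash table using collision lists, mirroring the pseudocode."""

    def __init__(self, num_buckets=8):
        self.data = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Reduce the (possibly huge or negative) hash to a valid index.
        return self.data[hash(key) % len(self.data)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def lookup(self, key):
        # Only the one bucket selected by the hash is searched.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedTable()
for first, last in {"John": "Doe", "Paul": "Allen", "Bill": "Gates"}.items():
    table.insert(first, last)

paul = table.lookup("Paul")
```

A real implementation would also grow the bucket array as entries are added, so that each collision list stays short and lookups remain O(1) on average.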