Python字典查询性能：get vs in

Question

Python字典查询性能：get vs in

9

这不是过早的优化。我的用例在最内层循环中一直运行，需要对字典进行反复检查。此外，这种方式在智力上也令人不安（见结果）。

这些方法中哪个更快？

mydict = { 'hello': 'yes', 'goodbye': 'no' }
key = 'hello'

# (A)
if key in mydict:
    a = mydict[key]
    do_things(a)
else:
    handle_an_error()

# vs (B)
a = mydict.get(key,None)
if a is not None:
    do_things(a)
else:
    handle_an_error()

编辑：这两个速度是相同的。常识告诉我（B）应该明显更快，因为它只有一个字典查找而不是2个，但结果是不同的。我正在思考。

基准测试结果平均值，共进行12次运行，其中一半是命中，另一半是未命中。

doing in
switching to get
total time for IN:  0.532250006994
total time for GET:  0.480916659037
times found: 12000000
times not found: 12000000

当运行类似的代码(*10个循环)，却从未找到密钥时，

doing in
switching to get
total time for IN:  2.35899998744
total time for GET:  4.13858334223

为什么！？

正确的代码

import time
smalldict = {}
for i in range(10):
    smalldict[str(i*4)] = str(i*18)

smalldict["8"] = "hello"

bigdict = {}
for i in range(10000):
    bigdict[str(i*100)] = str(i*4123)
bigdict["hello"] = "yes!"

timetotal = 0
totalin = 0
totalget = 0
key = "hello"
found= 0
notfound = 0

ddo = bigdict # change to smalldict for small dict gets
print 'doing in'


for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(1000000):
        if a == 0:
            if str(key) in ddo:
                found = found + 1
                foo = ddo[str(key)]
            else:
                notfound = notfound + 1
                foo = "nooo"
        else:
            if 'yo' in ddo:
                found = found + 1
                foo = ddo['yo']
            else:
                notfound = notfound + 1
                foo = "nooo"
    timetotal = timetotal + (time.time() - start)

totalin = timetotal / 12.0 

print 'switching to get'
timetotal = 0
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(1000000):
        if a == 0:
            foo = ddo.get(key,None)
            if foo is not None:
                found = found + 1
            else:
                notfound = notfound + 1
                foo = "nooo"
        else:
            foo = ddo.get('yo',None)
            if foo is not None:
                found = found + 1
                notfound = notfound + 1
            else:
                notfound = notfound + 1
                foo = "oooo"
    timetotal = timetotal + (time.time() - start)

totalget = timetotal / 12

print "total time for IN: ", totalin
print 'total time for GET: ', totalget
print 'times found:', found
print 'times not found:', notfound

(translated) 代码 import time smalldict = {} for i in range(10): smalldict[str(i*4)] = str(i*18)

smalldict["8"] = "hello"

bigdict = {}
for i in range(10000):
    bigdict[str(i*100)] = str(i*4123)
bigdict["8000"] = "hello"

timetotal = 0
totalin = 0
totalget = 0
key = "hello"
found= 0
notfound = 0

ddo = bigdict # change to smalldict for small dict gets
print 'doing in'
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(10000000):
        if a == 0:
            if key in ddo:
                foo = ddo[key]
            else:
                foo = "nooo"
        else:
            if 'yo' in ddo:
                foo = ddo['yo']
            else:
                foo = "nooo"
    timetotal = timetotal + (time.time() - start)

totalin = timetotal / 12.0 

print 'switching to get'
timetotal = 0
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(10000000):
        if a == 0:
            foo = ddo.get(key,None)
            if foo is not None:
                # yaaay
                pass
            else:
                foo = "nooo"
        else:
            foo = ddo.get('yo',None)
            if foo is not None:
                #yaaay
                pass
            else:
                foo = "oooo"
    timetotal = timetotal + (time.time() - start)

totalget = timetotal / 12

print "total time for IN: ", totalin
print 'total time for GET: ', totalget

- std''OrgnlDave

1

看起来你并没有计时字典中键存在的情况，所以 in 代码实际上从未进行双重查找。 - user2357112

@user2357112：观察得很好。我想成员函数查找为“get”完成了其余部分。通过预先查看“get”来减少了10％的时间：g = ddo.get 然后调用 foo = g('yo',None) - Jean-François Fabre

1

查找属性并调用必须返回对象或默认值的方法永远不会像只检查键是否存在的单个操作码那样快。但是，请使用 timeit 模块进行正确的时间测试。 - Martijn Pieters

你的测试似乎有不同的缺陷。为什么要使用 if，而 dict.get() 可以处理真假两种情况呢？只需使用 foo = ddo.get(key, 'nooo')，根本不需要使用 if / else 测试。这就是 dict.get() 的设计目的，而进行 in 测试加上 if var = dict[key] / else var = 'default' 分支将会失去意义。 - Martijn Pieters

1

还要注意，你可能会更好地将内部循环从range切换到xrange...你很可能会花费大量时间来构建具有1000万个元素的列表... - mgilson

@MartijnPieters 因为 foo = 'nooo' 并不是在失败时实际执行的工作。事实上，它可能包括使用异常终止具有约100个这些方法的一个方法。 - std''OrgnlDave

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- mgilson · Accepted Answer

我们可以做一些更好的时间安排：

import timeit

d = dict.fromkeys(range(10000))

def d_get_has(d):
    return d.get(1)

def d_get_not_has(d):
    return d.get(-1)

def d_in_has(d):
    if 1 in d:
        return d[1]

def d_in_not_has(d):
    if -1 in d:
        return d[-1]


print timeit.timeit('d_get_has(d)', 'from __main__ import d, d_get_has')
print timeit.timeit('d_get_not_has(d)', 'from __main__ import d, d_get_not_has')
print timeit.timeit('d_in_has(d)', 'from __main__ import d, d_in_has')
print timeit.timeit('d_in_not_has(d)', 'from __main__ import d, d_in_not_has')

在我的电脑上，“in”变体比使用.get的方式更快。这可能是因为.get是对字典进行属性查找，而属性查找很可能和字典上的成员测试一样昂贵。请注意，使用in和dict[x]进行项目查找可以直接在字节码中完成，因此正常的方法查找可以被绕过...

还值得指出的是，如果我只是使用pypy，我可以获得巨大的优化：-）

$ python ~/sandbox/test.py
0.169840812683
0.1732609272
0.122044086456
0.0991759300232

$ pypy ~/sandbox/test.py
0.00974893569946
0.00752687454224
0.00812077522278
0.00686597824097