What is the most efficient way to get a list or set of a dictionary's keys in Python?

Question


To quickly compare the keys of two dictionaries, I create sets of the keys like this:

dict_1 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_2 = {"file_1":10, "file_2":20, "file_3":30}
set_1 = {file for file in dict_1}
set_2 = {file for file in dict_2}

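I then use diff_set = set_1 - set_2 to see which keys are missing from set_2.

Is there a faster way? I found that set(dict.keys()) is a better approach, so I will switch to it, but is it actually more efficient?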

- Asi

diff_set = set(dict1) - set(dict2) - deceze

You don't need to call keys(). set(dict) is enough. - Barmar


dict_1.keys() - dict_2.keys() treats the keys of both dictionaries as sets. - Chris Charley

@ChrisCharley Correct, but it won't be very fast. It takes about 1.06e-5 sec. - Abhyuday Vaish
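
The comments above rely on two facts: iterating a dict yields its keys, so set(d) needs no keys() call, and the view returned by keys() is set-like and supports the - operator directly. A minimal sketch using the dictionaries from the question (the names via_set, via_views and via_mixed are illustrative only):

```python
dict_1 = {"file_1": 10, "file_2": 20, "file_3": 30, "file_4": 40}
dict_2 = {"file_1": 10, "file_2": 20, "file_3": 30}

# Iterating a dict yields its keys, so set(d) and set(d.keys()) are equal.
assert set(dict_1) == set(dict_1.keys())

# Explicit sets on both sides.
via_set = set(dict_1) - set(dict_2)

# Key views are set-like: subtraction works without building sets first
# and the result is a plain set.
via_views = dict_1.keys() - dict_2.keys()

# The right-hand operand of a view operation may be any iterable, e.g. a dict.
via_mixed = dict_1.keys() - dict_2

print(via_set, via_views, via_mixed)  # {'file_4'} {'file_4'} {'file_4'}
```

All three spellings give the same result; which one is fastest is a separate question that only a measurement can answer.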

2 Answers


The fastest and most efficient way is:

diff_set = {*dict_1} - {*dict_2}

Output:

{'file_4'}

Proof (execution time comparison):

import timeit
    
dict_1 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_2 = {"file_1":10, "file_2":20, "file_3":30}

def method1():
    return {file for file in dict_1} - {file for file in dict_2}

def method2():
    return set(dict_1) - set(dict_2)

def method3():
    return set(dict_1.keys()) - set(dict_2.keys())

def method4():
    return dict_1.keys() - dict_2.keys()

def method5():
    return {*dict_1} - {*dict_2}


print(method1())
print(method2())
print(method3())
print(method4())
print(method5())

# Average time per call over 10000 runs of each method
number = 10000
for i, method in enumerate((method1, method2, method3, method4, method5), start=1):
    t = timeit.timeit(stmt=method, number=number) / number
    print(f"It took {t} sec for method {i}")

Output:

It took 1.6434900000149355e-06 sec for method 1
It took 8.317999999690073e-07 sec for method 2
It took 1.1994899999990594e-06 sec for method 3
It took 9.747700000389159e-07 sec for method 4
It took 8.049199999732082e-07 sec for method 5

- Abhyuday Vaish


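A difference of one microsecond could come from system processes doing something in the background. Without more testing, I would conclude that the methods are equivalent. That's just my opinion. Other than that, great answer. - netskink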


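Well, as I mentioned: it's better not to include the setup in the timing, and better not to measure just a single execution (I did 10000 executions, repeated five times, and then repeated all of that 20 times). With those three imports, it looks like you didn't run it as a single script but ran three scripts separately, in which case different salts for the string hashes may affect the results (granted, that's a flaw in mine as well, but I ran it multiple times with similar results). Not sure what to explain about my code. Is anything unclear? - Kelly Bundy

@KellyBundy Thank you for your suggestions and comments. I have edited my answer. Could you take a look and tell me whether it is correct now? - Abhyuday Vaish

Yes, much better. A few more suggestions, though: 1) Better result formatting: don't use scientific notation, which forces us to check the exponents, and the many digits are distracting. 2) Run multiple rounds, to guard against cases such as the process not getting enough CPU share at the beginning and more at the end, or randomly getting less CPU share during one of the methods. See also the note under repeat. 3) I wouldn't declare it "the fastest". There may be something even faster. - Kelly Bundy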
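
The "different salts" remark in the comments refers to Python's hash randomization: string hashes are salted per process unless the PYTHONHASHSEED environment variable fixes the seed. The sketch below is an illustration only and not part of either answer; the helper name string_hash_in_fresh_process is hypothetical. It starts fresh interpreters with a chosen seed and compares the hash of one of the question's keys.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_process(seed: str) -> int:
    """Return hash('file_1') as computed by a new interpreter using the given seed."""
    # Copy the current environment and pin the hash seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('file_1'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same seed in two separate processes: the hash is reproducible.
same_a = string_hash_in_fresh_process("1")
same_b = string_hash_in_fresh_process("1")

# A different seed: the hash is expected to differ.
other = string_hash_in_fresh_process("2")

print(same_a == same_b, same_a == other)
```

Pinning the seed in this way is one option for making benchmark runs in separate processes more comparable; running all candidates inside a single script, as suggested in the comment, avoids the issue without it.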


- Kelly Bundy · Accepted Answer

Let's measure better (not just a single execution, and not including the setup) and also include faster solutions:

300 ns  300 ns  300 ns  {*dict_1} - {*dict_2}
388 ns  389 ns  389 ns  {file for file in dict_1 if file not in dict_2}
389 ns  390 ns  390 ns  dict_1.keys() - dict_2
458 ns  458 ns  458 ns  set(dict_1) - set(dict_2)
472 ns  472 ns  472 ns  dict_1.keys() - dict_2.keys()
665 ns  665 ns  668 ns  set(dict_1.keys()) - set(dict_2.keys())
716 ns  716 ns  716 ns  {file for file in dict_1} - {file for file in dict_2}

Benchmark code (Try it online!):

import timeit

setup = '''
dict_1 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_2 = {"file_1":10, "file_2":20, "file_3":30}
'''

codes = [
    '{file for file in dict_1} - {file for file in dict_2}',
    'set(dict_1) - set(dict_2)',
    'set(dict_1.keys()) - set(dict_2.keys())',
    'dict_1.keys() - dict_2',
    'dict_1.keys() - dict_2.keys()',
    '{*dict_1} - {*dict_2}',
    '{file for file in dict_1 if file not in dict_2}',
]

exec(setup)
for code in codes:
    print(eval(code))

tss = [[] for _ in codes]
for _ in range(20):
    print()
    for code, ts in zip(codes, tss):
        number = 10000
        t = min(timeit.repeat(code, setup, number=number)) / number
        ts.append(t)
    for code, ts in sorted(zip(codes, tss), key=lambda cs: sorted(cs[1])):
        print(*('%3d ns ' % (t * 1e9) for t in sorted(ts)[:3]), code)
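
The timings above are for dictionaries with three and four keys, and the ranking is not guaranteed to hold at other sizes. A minimal sketch for rerunning a subset of the expressions on larger inputs follows. The helper benchmark, its parameters and the generated file_{i} keys are assumptions made for illustration and are not part of the original answer. No timings are quoted here because they depend on the machine and the Python version.

```python
import timeit


def benchmark(n_keys: int, n_missing: int = 1, number: int = 1000) -> dict:
    """Time a few of the expressions above on generated dicts; returns ns per call."""
    # Hypothetical test data: dict_2 lacks the last n_missing keys of dict_1.
    dict_1 = {f"file_{i}": i for i in range(n_keys)}
    dict_2 = {f"file_{i}": i for i in range(n_keys - n_missing)}
    namespace = {"dict_1": dict_1, "dict_2": dict_2}
    codes = [
        "{*dict_1} - {*dict_2}",
        "dict_1.keys() - dict_2",
        "set(dict_1) - set(dict_2)",
        "{file for file in dict_1 if file not in dict_2}",
    ]
    # All candidates must agree before their timings are worth comparing.
    expected = set(dict_1) - set(dict_2)
    results = {}
    for code in codes:
        assert eval(code, namespace) == expected
        # Best of five rounds, converted to nanoseconds per single call.
        best = min(timeit.repeat(code, globals=namespace, number=number, repeat=5))
        results[code] = best / number * 1e9
    return results


for n in (4, 1000):
    print(f"n_keys = {n}")
    for code, ns in sorted(benchmark(n).items(), key=lambda item: item[1]):
        print(f"  {ns:10.0f} ns  {code}")
```

Taking the minimum over several rounds follows the same idea as the benchmark above: the smallest value is the one least disturbed by other activity on the machine.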