TL;DR: Best ways to sort by key or by value (respectively) in CPython 3.7:
{k: d[k] for k in sorted(d)}
{k: v for k,v in sorted(d.items(), key=itemgetter(1))}
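As a minimal, self-contained sketch of the two one-liners above (the sample dict is hypothetical and is not the benchmark data used below), relying on dicts preserving insertion order in Python 3.7+:

```python
from operator import itemgetter

# Hypothetical sample data, not taken from the benchmarks in this answer.
d = {"pear": 3, "apple": 7, "fig": 1}

# Sort by key: iterate the sorted keys and look each value up again.
by_key = {k: d[k] for k in sorted(d)}

# Sort by value: sort the (key, value) pairs on the second element.
by_value = {k: v for k, v in sorted(d.items(), key=itemgetter(1))}

print(by_key)    # {'apple': 7, 'fig': 1, 'pear': 3}
print(by_value)  # {'fig': 1, 'pear': 3, 'apple': 7}
```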
Tested on a MacBook, with sys.version:
3.7.0b4 (v3.7.0b4:eb96c37699, May 2 2018, 04:13:13)
[Clang 6.0 (clang-600.0.57)]
One-time setup with a dict of 1000 floats:
>>> import random
>>> from operator import itemgetter
>>> random.seed(123)
>>> d = {random.random(): random.random() for i in range(1000)}
Sorting numbers by key (from best to worst):
>>> %timeit {k: d[k] for k in sorted(d)}
# 296 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit {k: d[k] for k in sorted(d.keys())}
# 306 µs ± 9.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit dict(sorted(d.items(), key=itemgetter(0)))
# 345 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit {k: v for k,v in sorted(d.items(), key=itemgetter(0))}
# 359 µs ± 2.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit dict(sorted(d.items(), key=lambda kv: kv[0]))
# 391 µs ± 8.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit dict(sorted(d.items()))
# 409 µs ± 9.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit {k: v for k,v in sorted(d.items())}
# 420 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit {k: v for k,v in sorted(d.items(), key=lambda kv: kv[0])}
# 432 µs ± 39.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
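The variants timed above differ only in speed, not in result. A quick sanity check, sketched here with the same seed as the setup (the names key_variants, orders and all_same are only for illustration), can confirm that before comparing timings:

```python
import random
from operator import itemgetter

random.seed(123)
d = {random.random(): random.random() for i in range(1000)}

# Each callable builds a new dict ordered by key, using one of the variants above.
key_variants = [
    lambda: {k: d[k] for k in sorted(d)},
    lambda: {k: d[k] for k in sorted(d.keys())},
    lambda: dict(sorted(d.items(), key=itemgetter(0))),
    lambda: dict(sorted(d.items(), key=lambda kv: kv[0])),
    lambda: dict(sorted(d.items())),
    lambda: {k: v for k, v in sorted(d.items())},
]

# Compare as lists of items, because dict equality ignores order.
orders = [list(build().items()) for build in key_variants]
all_same = all(order == orders[0] for order in orders)
print(all_same)  # True
```

Note that sorted(d.items()) without a key function compares whole (key, value) tuples, which may be part of why those variants land lower in the list.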
Sorting numbers by value (from best to worst):
>>> %timeit {k: v for k,v in sorted(d.items(), key=itemgetter(1))}
# 355 µs ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit dict(sorted(d.items(), key=itemgetter(1)))
>>> %timeit {k: v for k,v in sorted(d.items(), key=lambda kv: kv[1])}
# 393 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit dict(sorted(d.items(), key=lambda kv: kv[1]))
>>> %timeit {k: d[k] for k in sorted(d, key=d.get)}
# 404 µs ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit {k: d[k] for k in sorted(d, key=d.__getitem__)}
>>> %timeit {k: d[k] for k in sorted(d, key=lambda k: d[k])}
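Because sorted() is stable, all of the by-value variants above should order tied values the same way: ties keep the dict's original insertion order. A small sketch with hypothetical data containing a tie:

```python
from operator import itemgetter

# Hypothetical data with a tie: "b" and "a" share the value 1.
d = {"b": 1, "a": 1, "c": 0}

results = [
    {k: v for k, v in sorted(d.items(), key=itemgetter(1))},
    dict(sorted(d.items(), key=lambda kv: kv[1])),
    {k: d[k] for k in sorted(d, key=d.get)},
    {k: d[k] for k in sorted(d, key=d.__getitem__)},
]

for r in results:
    print(list(r))  # ['c', 'b', 'a'] each time
```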
One-time setup with a large dict of strings:
>>> import random
>>> from pathlib import Path
>>> from operator import itemgetter
>>> random.seed(456)
>>> words = Path('/usr/share/dict/words').read_text().splitlines()
>>> random.shuffle(words)
>>> keys = words.copy()
>>> random.shuffle(words)
>>> values = words.copy()
>>> d = dict(zip(keys, values))
>>> list(d.items())[:5]
[('ragman', 'polemoscope'),
('fenite', 'anaesthetically'),
('pycnidiophore', 'Colubridae'),
('propagate', 'premiss'),
('postponable', 'Eriglossa')]
>>> len(d)
235886
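The setup above depends on /usr/share/dict/words, which is not present on every platform. A possible fallback, sketched under the assumption that random lowercase strings are an acceptable stand-in (load_words and its parameters are hypothetical helpers, and the shuffling here uses a local random.Random rather than the global generator used above):

```python
import random
import string
from pathlib import Path

def load_words(path="/usr/share/dict/words", fallback_count=100_000, seed=456):
    """Return the lines of `path`, or unique random lowercase strings if it is missing."""
    p = Path(path)
    if p.is_file():
        return p.read_text().splitlines()
    rng = random.Random(seed)
    words = set()  # a set keeps the generated words unique
    while len(words) < fallback_count:
        length = rng.randint(3, 12)
        words.add("".join(rng.choices(string.ascii_lowercase, k=length)))
    return sorted(words)  # sorted so the result does not depend on hash order

rng = random.Random(456)
words = load_words()
keys = words.copy()
rng.shuffle(keys)
values = words.copy()
rng.shuffle(values)
d = dict(zip(keys, values))
print(len(d))
```

The timings will not be comparable with the ones in this answer if the fallback is used, since the strings differ in length and distribution from real words.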
Sorting a dict of strings by key:
>>> %timeit {k: d[k] for k in sorted(d)}
>>> %timeit {k: d[k] for k in sorted(d.keys())}
>>> %timeit dict(sorted(d.items(), key=itemgetter(0)))
>>> %timeit dict(sorted(d.items(), key=lambda kv: kv[0]))
>>> %timeit {k: v for k,v in sorted(d.items(), key=itemgetter(0))}
>>> %timeit {k: v for k,v in sorted(d.items(), key=lambda kv: kv[0])}
>>> %timeit dict(sorted(d.items()))
>>> %timeit {k: v for k,v in sorted(d.items())}
Sorting a dict of strings by value:
>>> %timeit {k: v for k,v in sorted(d.items(), key=itemgetter(1))}
>>> %timeit dict(sorted(d.items(), key=itemgetter(1)))
>>> %timeit dict(sorted(d.items(), key=lambda kv: kv[1]))
>>> %timeit {k: v for k,v in sorted(d.items(), key=lambda kv: kv[1])}
>>> %timeit {k: d[k] for k in sorted(d, key=d.__getitem__)}
>>> %timeit {k: d[k] for k in sorted(d, key=d.get)}
>>> %timeit {k: d[k] for k in sorted(d, key=lambda k: d[k])}
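Note: Real-world data often contains long runs of already sorted sequences, which the Timsort algorithm can exploit. If sorting a dict lies on your fast path, benchmark on your own platform with your own typical data before drawing any conclusions about the best approach. I have prefixed each timeit result with a comment character (#) so that IPython users can copy/paste the entire code block to re-run all the tests on their own platform.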
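Outside IPython, the same kind of comparison can be sketched with the standard timeit module. The following is only an illustration of the advice above, not the harness used for the numbers in this answer; best_time, shuffled and presorted are hypothetical names, and the "presorted" dict stands in for real data that already contains sorted runs:

```python
import random
import timeit
from operator import itemgetter

def best_time(func, number=100, repeat=3):
    """Return the best per-call time in seconds over `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

random.seed(123)
shuffled = {random.random(): random.random() for _ in range(1000)}
# Same items, but inserted in key order: one long sorted run for Timsort.
presorted = {k: shuffled[k] for k in sorted(shuffled)}

results = {}
for label, d in (("shuffled", shuffled), ("presorted", presorted)):
    results[label, "by key"] = best_time(lambda: {k: d[k] for k in sorted(d)})
    results[label, "by value"] = best_time(
        lambda: {k: v for k, v in sorted(d.items(), key=itemgetter(1))}
    )

for (label, method), seconds in results.items():
    print(f"{label:>9} {method:<8} {seconds * 1e6:8.1f} µs per call")
```

Replace the two dicts with samples of your own data; the absolute numbers will depend on the machine and the Python build.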
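... {k: dct[k] ... rather than {k: v, and use items() in place of keys(). Sorting by value then only needs operator.itemgetter(1) as the key. - g.d.d.c
... something like def sort(byValues = False), so that it sorts by key by default, but a call such as sort(True) sorts by value (or something along those lines). - g.d.d.c
... dict(sorted(dct.items())). - kindall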