Combine two lists into a dictionary, summing the elements of the second list

Question

If I have two lists (of the same length):

ls1 = ['a','b','c','a','d','c']
ls2 = [1,2,3,5,1,2]

I want to get the following dictionary (when a key repeats, its values are summed):

d = {'a':6,'b':2,'c':5,'d':1}

I did the following:

import numpy as np

ls1 = np.array(ls1)
ls2 = np.array(ls2)
unique_vals = list(set(ls1))
d = {}
for u in unique_vals:
    # np.where scans the whole array once for every unique key
    ind = np.where(ls1 == u)[0]
    d[u] = ls2[ind].sum()

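It works fine for small data, but processing the full data set (my lists have about 5 million elements) takes far too long. Do you have any suggestions for a more efficient approach?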

- Hadas

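I'm not really sure how efficient NumPy arrays and np.where are, but without NumPy I would make d a defaultdict(int), then simply iterate over izip(ls1, ls2) and add each value from ls2 to the dictionary entry keyed by the corresponding element of ls1. This is just a guess, and it may well be much less efficient than your solution. - L3viathan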

3 answers

You can try the following:

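import numpy as np

# uni: the sorted unique keys; i: for each element of ls1, its index in uni
uni, i = np.unique(ls1, return_inverse=True)
# Sum the ls2 values that fall into each index (the result is float64)
vals = np.bincount(i, weights=ls2)
dict(zip(uni, vals))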
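Applied to the sample lists from the question, the same idea can be sketched as follows. np.bincount returns float64 sums when weights are given, so the values come back as floats; the cast to int and to plain str below is an addition for illustration (not part of the answer above) and assumes ls2 holds integers.

```python
import numpy as np

ls1 = ['a', 'b', 'c', 'a', 'd', 'c']
ls2 = [1, 2, 3, 5, 1, 2]

# uni: sorted unique keys; inv: position in uni of every element of ls1
uni, inv = np.unique(ls1, return_inverse=True)

# One vectorized pass: add each ls2 value into the bin of its key
sums = np.bincount(inv, weights=ls2)

# bincount yields float64 here; cast back to int (assumes integer ls2)
# and turn NumPy string scalars into plain Python str keys
d = {str(k): int(v) for k, v in zip(uni, sums)}
print(d)  # {'a': 6, 'b': 2, 'c': 5, 'd': 1}
```

Because the work happens in two whole-array NumPy calls instead of one np.where scan per key, the cost no longer grows with the number of distinct keys times the list length.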

- tillsten

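Since you asked how to make it more efficient, I used my shell's time command to compare the time taken by your original solution with the version suggested in my comment (equivalent to Juergen's second solution). The test data was 5 million random characters from a-z as keys and 5 million random values from 0-20: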

~/test $ time python defdict.py
defaultdict(<type 'int'>, {'a': 381956, 'c': 383815, 'b': 378277, 'e': 384629, 'd': 383557, 'g': 381139, 'f': 386268, 'i': 383902, 'h': 385809, 'k': 385138, 'j': 384690, 'm': 388552, 'l': 384393, 'o': 384533, 'n': 385011, 'q': 385685, 'p': 386188, 's': 387132, 'r': 383886, 'u': 386176, 't': 387144, 'w': 386371, 'v': 388263, 'y': 381337, 'x': 385281, 'z': 384048})
python defdict.py  13,24s user 0,35s system 96% cpu 14,045 total

~/test $ time python original.py
{'a': 386316, 'c': 383596, 'b': 383424, 'e': 385598, 'd': 383324, 'g': 382233, 'f': 385435, 'i': 386761, 'h': 384047, 'k': 386640, 'j': 386313, 'm': 381032, 'l': 383035, 'o': 389142, 'n': 385000, 'q': 386088, 'p': 387435, 's': 385429, 'r': 384260, 'u': 385442, 't': 384793, 'w': 385052, 'v': 380830, 'y': 386500, 'x': 386871, 'z': 379870}
python original.py  14,68s user 0,38s system 96% cpu 15,529 total

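So there is some difference, though not a large one (about 14.0 s versus 15.5 s total). To make the comparison fairer, numpy was also imported in defdict.py.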
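A reproducible variant of this comparison can be sketched with the standard timeit module. The data size, the function names, and the inclusion of the bincount answer are assumptions for illustration; no timings are given here because they depend on the machine and on the Python and NumPy versions.

```python
import random
import string
import timeit
from collections import defaultdict

import numpy as np

# Hypothetical test data, much smaller than the 5 million elements above
# so the script finishes quickly: random keys a-z, random values 0-20
N = 100_000
ls1 = [random.choice(string.ascii_lowercase) for _ in range(N)]
ls2 = [random.randint(0, 20) for _ in range(N)]


def original(keys, values):
    # The question's approach: one np.where scan per unique key
    a1, a2 = np.array(keys), np.array(values)
    return {u: int(a2[np.where(a1 == u)[0]].sum()) for u in set(keys)}


def with_defaultdict(keys, values):
    # Single pass in pure Python
    d = defaultdict(int)
    for k, v in zip(keys, values):
        d[k] += v
    return dict(d)


def with_bincount(keys, values):
    # Single vectorized pass in NumPy
    uni, inv = np.unique(keys, return_inverse=True)
    sums = np.bincount(inv, weights=values)
    return {str(k): int(v) for k, v in zip(uni, sums)}


if __name__ == "__main__":
    # The three versions must agree before their timings mean anything
    assert original(ls1, ls2) == with_defaultdict(ls1, ls2) == with_bincount(ls1, ls2)
    for fn in (original, with_defaultdict, with_bincount):
        t = timeit.timeit(lambda: fn(ls1, ls2), number=3)
        print(f"{fn.__name__}: {t / 3:.3f} s per run")
```

Timing inside Python with timeit, rather than the whole process with the shell's time, keeps interpreter startup and data generation out of the measurement.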

- L3viathan

- Juergen · Accepted Answer

Also using defaultdict, but in a different and simpler way:

from collections import defaultdict

d = defaultdict(int)
for n, v in zip(ls1, ls2):
    d[n] += v

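Alternatively, as suggested in the comments, use izip (on Python 2 it avoids building the full list of pairs that zip would create; Python 3 removed izip because zip itself is already lazy):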

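from collections import defaultdict
try:
    from itertools import izip  # Python 2: lazy version of zip
except ImportError:
    izip = zip  # Python 3: izip was removed; zip is already lazy

d = defaultdict(int)
for n, v in izip(ls1, ls2):
    d[n] += v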
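A minimal usage sketch of the defaultdict loop on the sample lists from the question. The final conversion to a plain dict is an optional addition, useful when later lookups of missing keys should raise KeyError instead of silently inserting 0.

```python
from collections import defaultdict

ls1 = ['a', 'b', 'c', 'a', 'd', 'c']
ls2 = [1, 2, 3, 5, 1, 2]

d = defaultdict(int)  # a missing key starts at int() == 0
for n, v in zip(ls1, ls2):
    d[n] += v  # accumulate each value under its key

# defaultdict is a dict subclass; convert if a plain dict is wanted
result = dict(d)
print(result)  # {'a': 6, 'b': 2, 'c': 5, 'd': 1}
```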