A more Pythonic way to handle this kind of problem is {{link1:
collections.defaultdict
}}:
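>>> from collections import defaultdict
>>> a = ['1CDABCABDA', '1CDABCABDB', '1CDABCABDD', '1BCABCCCAA', '1DDAABBBBA', '1BCABCCCAD']
>>> new = [i[:-1] for i in a]  # drop the last character of each string
>>> d = defaultdict(list)      # missing keys start out as an empty list
>>> for i, j in enumerate(new):
...     d[j].append(i)         # record the index of each prefix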
...
>>> d
defaultdict(<type 'list'>, {'1CDABCABD': [0, 1, 2], '1DDAABBBB': [4], '1BCABCCCA': [3, 5]})
>>> d.items()
[('1CDABCABD', [0, 1, 2]), ('1DDAABBBB', [4]), ('1BCABCCCA', [3, 5])]
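The session above is Python 2 (hence `<type 'list'>` and the list returned by `d.items()`). A minimal Python 3 sketch of the same idea follows, assuming Python 3.7+ so that dicts keep insertion order and the groups come out in order of first appearance. `group_indices` is a hypothetical helper name, and it slices each string inline instead of building the intermediate `new` list.

```python
from collections import defaultdict

# Sample input, the same list used in the benchmark of this answer
a = ['1CDABCABDA', '1CDABCABDB', '1CDABCABDD', '1BCABCCCAA', '1DDAABBBBA', '1BCABCCCAD']


def group_indices(items):
    """Hypothetical helper: map each string minus its last character to its indexes."""
    groups = defaultdict(list)  # missing keys start out as an empty list
    for index, item in enumerate(items):
        groups[item[:-1]].append(index)
    return dict(groups)  # plain dict; insertion order is kept on Python 3.7+


print(group_indices(a))
# {'1CDABCABD': [0, 1, 2], '1BCABCCCA': [3, 5], '1DDAABBBB': [4]}
```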
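Note that the defaultdict approach is linear: it makes a single O(n) pass over the data, whereas itertools.groupby only groups adjacent items and so needs the input sorted first, which costs O(n log n).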
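To illustrate why the groupby version has to sort first, here is a small sketch on the same sample list: without sorting, `groupby` splits the two `'1BCABCCCA'` indexes into separate groups because they are not adjacent. The `key` function mirrors the lambda used in the benchmark.

```python
from itertools import groupby

a = ['1CDABCABDA', '1CDABCABDB', '1CDABCABDD', '1BCABCCCAA', '1DDAABBBBA', '1BCABCCCAD']


def key(i):
    # group indexes by their string minus the last character
    return a[i][:-1]


# groupby only merges runs of *adjacent* equal keys, so indexes 3 and 5 stay apart
unsorted_groups = [[k, list(g)] for k, g in groupby(range(len(a)), key=key)]
print(unsorted_groups)
# [['1CDABCABD', [0, 1, 2]], ['1BCABCCCA', [3]], ['1DDAABBBB', [4]], ['1BCABCCCA', [5]]]

# Sorting first makes equal keys adjacent, at O(n log n) cost
sorted_groups = [[k, list(g)] for k, g in groupby(sorted(range(len(a)), key=key), key=key)]
print(sorted_groups)
# [['1BCABCCCA', [3, 5]], ['1CDABCABD', [0, 1, 2]], ['1DDAABBBB', [4]]]
```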
Alternatively, you can use the dict.setdefault method:
>>> d={}
>>> for i,j in enumerate(new):
...     d.setdefault(j, []).append(i)
...
>>> d
{'1CDABCABD': [0, 1, 2], '1DDAABBBB': [4], '1BCABCCCA': [3, 5]}
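If you want the same `[key, indexes]` pair shape that the groupby version produces, a sketch along these lines should work on Python 3.7+ (first-seen order rather than sorted order). `group_pairs` is a hypothetical name used only for illustration.

```python
a = ['1CDABCABDA', '1CDABCABDB', '1CDABCABDD', '1BCABCCCAA', '1DDAABBBBA', '1BCABCCCAD']


def group_pairs(items):
    """Hypothetical helper: return [key, indexes] pairs in order of first appearance."""
    groups = {}
    for index, item in enumerate(items):
        # setdefault stores [] only when the key is missing, then returns the stored list
        groups.setdefault(item[:-1], []).append(index)
    return [[key, indexes] for key, indexes in groups.items()]


print(group_pairs(a))
# [['1CDABCABD', [0, 1, 2]], ['1BCABCCCA', [3, 5]], ['1DDAABBBB', [4]]]
```

One design note: the `[]` argument to `setdefault` is evaluated on every call, so a new empty list is built even when the key already exists; `defaultdict` only creates the list when a key is actually missing.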
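The benchmark below compares the sorted + groupby approach (first) with the dict.setdefault approach (second); the setdefault version is about 4 times faster: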
s1="""
from itertools import groupby
a = ['1CDABCABDA', '1CDABCABDB', '1CDABCABDD', '1BCABCCCAA', '1DDAABBBBA', '1BCABCCCAD']
key = lambda i: a[i][:-1]
indexes = sorted(range(len(a)), key=key)
result = [[x, list(y)] for x, y in groupby(indexes, key=key)]
"""
s2="""
a = ['1CDABCABDA', '1CDABCABDB', '1CDABCABDD', '1BCABCCCAA', '1DDAABBBBA', '1BCABCCCAD']
new=[i[:-1] for i in a]
d={}
for i,j in enumerate(new):
d.setdefault(j,[]).append(i)
d.items()
"""
print ' first: ' ,timeit(stmt=s1, number=100000)
print 'second : ',timeit(stmt=s2, number=100000)
Results (total seconds for 100,000 runs of each snippet):
first: 0.949549913406
second : 0.250894069672
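For readers on Python 3, a rough port of the benchmark is sketched below. It times functions instead of source strings and first checks that both approaches produce the same grouping, so the comparison is like for like. The function names are hypothetical, and timings will vary by machine and interpreter, so no numbers are claimed here.

```python
from itertools import groupby
from timeit import timeit

a = ['1CDABCABDA', '1CDABCABDB', '1CDABCABDD', '1BCABCCCAA', '1DDAABBBBA', '1BCABCCCAD']


def with_groupby(items):
    """Hypothetical: sort the indexes by prefix, then group adjacent equal prefixes."""
    def key(i):
        return items[i][:-1]
    indexes = sorted(range(len(items)), key=key)
    return {k: list(g) for k, g in groupby(indexes, key=key)}


def with_setdefault(items):
    """Hypothetical: one pass, collecting indexes per prefix with dict.setdefault."""
    groups = {}
    for index, item in enumerate(items):
        groups.setdefault(item[:-1], []).append(index)
    return groups


# Both must agree before their timings are worth comparing (dict equality ignores order)
assert with_groupby(a) == with_setdefault(a)

if __name__ == '__main__':
    for func in (with_groupby, with_setdefault):
        seconds = timeit(lambda: func(a), number=100000)
        print(f'{func.__name__}: {seconds:.3f} s')
```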
What does the 4 in ['1DDAABBBB',[4]] stand for? - Mazdak
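The numbers are indexes: `enumerate` pairs each prefix with its position in the list, so `[4]` means the element at index 4 of `a`, `'1DDAABBBBA'`. A short sketch that maps the indexes back to the original strings (the name `originals` is only illustrative):

```python
from collections import defaultdict

a = ['1CDABCABDA', '1CDABCABDB', '1CDABCABDD', '1BCABCCCAA', '1DDAABBBBA', '1BCABCCCAD']

groups = defaultdict(list)
for index, item in enumerate(a):
    groups[item[:-1]].append(index)

# Each stored number is a position in `a`, so indexing `a` recovers the full strings
originals = {key: [a[i] for i in idxs] for key, idxs in groups.items()}
print(originals['1DDAABBBB'])  # ['1DDAABBBBA']
print(originals['1BCABCCCA'])  # ['1BCABCCCAA', '1BCABCCCAD']
```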