Suppose I have a list of strings:
a = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
I want to list the items that appear at least twice in a row:
result = ['a', 'c']
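I know I have to use a for loop, but I can't figure out how to pick out items that repeat consecutively. How can I do this?
Edit: what if the same item forms more than one consecutive run in a? Then set() no longer works: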
a = ['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']
result = ['a', 'a', 'd']
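Since the question asks about a for loop specifically, here is a minimal Python 3 sketch of that approach. It remembers the previous item and whether the current run has already been recorded, so each run contributes exactly one entry and the edited example with several runs of 'a' is handled. The function name consecutive_repeats and the _MISSING sentinel are hypothetical names chosen for illustration.

```python
_MISSING = object()  # hypothetical sentinel meaning "no previous item yet"


def consecutive_repeats(items):
    """Return one entry for every run of two or more equal adjacent items."""
    result = []
    previous = _MISSING
    in_recorded_run = False  # True once the current run has been appended
    for item in items:
        if item == previous:
            # Second (or later) element of a run: record the run only once.
            if not in_recorded_run:
                result.append(item)
                in_recorded_run = True
        else:
            # A new run starts here.
            in_recorded_run = False
        previous = item
    return result


print(consecutive_repeats(['a', 'a', 'b', 'c', 'c', 'c', 'd']))  # ['a', 'c']
print(consecutive_repeats(['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']))  # ['a', 'a', 'd']
```

Using a private sentinel instead of None means the function still behaves correctly if the list itself contains None.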
Try itertools.groupby() here:
>>> from itertools import groupby,islice
>>> a = ['a', 'a', 'b', 'c', 'c', 'c', 'b']
>>> [list(g) for k,g in groupby(a)]
[['a', 'a'], ['b'], ['c', 'c', 'c'], ['b']]
>>> [k for k,g in groupby(a) if len(list(g))>=2]
['a', 'c']
Using islice():
>>> [k for k,g in groupby(a) if len(list(islice(g,0,2)))==2]
['a', 'c']
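The islice() trick generalizes naturally to a minimum run length. The sketch below (Python 3; the function name runs_of_at_least and the parameter n are hypothetical) counts at most n elements of each group, so a very long run is never fully consumed, and it keeps one key per run, which also satisfies the edited question.

```python
from itertools import groupby, islice


def runs_of_at_least(items, n=2):
    """Return the key of every run whose length is at least n (one entry per run)."""
    # islice(group, n) stops after n elements, so long runs are not counted in full.
    return [key for key, group in groupby(items)
            if sum(1 for _ in islice(group, n)) == n]


a = ['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']
print(runs_of_at_least(a))     # ['a', 'a', 'd']
print(runs_of_at_least(a, 3))  # ['a']
```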
Using zip() and izip():
In [198]: set(x[0] for x in izip(a,a[1:]) if x[0]==x[1])
Out[198]: set(['a', 'c'])
In [199]: set(x[0] for x in zip(a,a[1:]) if x[0]==x[1])
Out[199]: set(['a', 'c'])
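In Python 3, itertools.izip no longer exists because zip itself is lazy. Wrapping the result in set() also discards order and merges separate runs of the same item, which is why it fails the edited question. A sketch that stays with zip but keeps one entry per run compares each element with both neighbours; it assumes the input supports slicing, and the name run_starts is hypothetical.

```python
def run_starts(seq):
    """For a sliceable sequence, return one element per run of two or more equal neighbours."""
    sentinel = object()  # never equal to a real element
    previous = [sentinel] + list(seq[:-1])
    # Each triple is (item before, current item, item after); a run is reported
    # at its first element: equal to the next item but different from the previous one.
    return [cur for prev, cur, nxt in zip(previous, seq, seq[1:])
            if cur == nxt and prev != cur]


print(run_starts(['a', 'a', 'b', 'c', 'c', 'c', 'd']))  # ['a', 'c']
print(run_starts(['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']))  # ['a', 'a', 'd']
```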
The timeit benchmark:
# Python 2 code: uses the print statement, izip and xrange
from itertools import *
a='aaaabbbccccddddefgggghhhhhiiiiiijjjkkklllmnooooooppppppppqqqqqqsssstuuvv'
def grp_isl():
    [k for k,g in groupby(a) if len(list(islice(g,0,2)))==2]
def grpby():
    [k for k,g in groupby(a) if len(list(g))>=2]
def chn():
    set(x[1] for x in chain(izip(*([iter(a)] * 2)), izip(*([iter(a[1:])] * 2))) if x[0] == x[1])
def dread():
    set(a[i] for i in range(1, len(a)) if a[i] == a[i-1])
def xdread():
    set(a[i] for i in xrange(1, len(a)) if a[i] == a[i-1])
def inrow():
    inRow = []
    last = None
    for x in a:
        if last == x and (len(inRow) == 0 or inRow[-1] != x):
            inRow.append(last)
        last = x
def zipp():
    set(x[0] for x in zip(a,a[1:]) if x[0]==x[1])
def izipp():
    set(x[0] for x in izip(a,a[1:]) if x[0]==x[1])
if __name__=="__main__":
    import timeit
    print "islice",timeit.timeit("grp_isl()", setup="from __main__ import grp_isl")
    print "grpby",timeit.timeit("grpby()", setup="from __main__ import grpby")
    print "dread",timeit.timeit("dread()", setup="from __main__ import dread")
    print "xdread",timeit.timeit("xdread()", setup="from __main__ import xdread")
    print "chain",timeit.timeit("chn()", setup="from __main__ import chn")
    print "inrow",timeit.timeit("inrow()", setup="from __main__ import inrow")
    print "zip",timeit.timeit("zipp()", setup="from __main__ import zipp")
    print "izip",timeit.timeit("izipp()", setup="from __main__ import izipp")
Output:
islice 39.9123107277
grpby 30.1204478987
dread 17.8041124706
xdread 15.3691785568
chain 17.4777339702
inrow 11.8577565327
zip 16.6348844045
izip 15.1468557105
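For readers on Python 3, where print is a function and izip/xrange are gone, here is a minimal sketch of the same timeit harness for two of the functions above. It is an adaptation, not the original benchmark: the functions return their results so they can be checked, the timed callable is passed directly to timeit.timeit, and number=10_000 is an arbitrary choice (the default is 1,000,000). No timings are shown here because they depend on the machine and interpreter.

```python
import timeit
from itertools import groupby

a = 'aaaabbbccccddddefgggghhhhhiiiiiijjjkkklllmnooooooppppppppqqqqqqsssstuuvv'


def grpby():
    # One key per run of length >= 2.
    return [k for k, g in groupby(a) if len(list(g)) >= 2]


def inrow():
    # Poke's loop: append an item when it equals its predecessor and
    # was not the last item appended.
    in_row = []
    last = None
    for x in a:
        if last == x and (not in_row or in_row[-1] != x):
            in_row.append(last)
        last = x
    return in_row


if __name__ == "__main__":
    for name, func in [("grpby", grpby), ("inrow", inrow)]:
        print(name, timeit.timeit(func, number=10_000))
```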
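Conclusion:
Based on this comparison, Poke's solution is the fastest option.
1. Iterate over a, but keep an index variable for each element; enumerate() is useful for this.
2. Inside the for loop, start a while loop from the current item's index.
3. Compare the following items with the current one and count the matches; as soon as an item differs, break.
4. If the count is >=2, append the item to result.
>>> a = ['a', 'a', 'b', 'c', 'c', 'c', 'd']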
>>> inRow = []
>>> last = None
>>> for x in a:
...     if last == x and (len(inRow) == 0 or inRow[-1] != x):
...         inRow.append(last)
...     last = x
...
>>> inRow
['a', 'c']
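The index-based steps listed earlier (enumerate plus an inner while loop that counts matches) might look like the following Python 3 sketch. The skip_until variable is an addition that is not part of the described steps: without it, the tail of a run such as 'c', 'c', 'c' would be counted again starting from its second element. The function name repeated_runs is hypothetical.

```python
def repeated_runs(items):
    """Return one entry per run of >= 2 equal adjacent items, using enumerate and a while loop."""
    result = []
    skip_until = 0  # index of the first element not covered by an already scanned run
    for index, item in enumerate(items):
        if index < skip_until:
            continue  # still inside a run that was already counted
        count = 1
        j = index + 1
        # Walk forward while the following items equal the current one.
        while j < len(items):
            if items[j] != item:
                break
            count += 1
            j += 1
        if count >= 2:
            result.append(item)
        skip_until = j
    return result


print(repeated_runs(['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']))  # ['a', 'a', 'd']
```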
set([a[i] for i in range(1, len(a)) if a[i] == a[i-1]])
Use xrange instead of range; that way you save some time and a lot of memory. - kreativitea
Here is a Python one-liner that does what you want. It uses the itertools package:
from itertools import chain, izip
a = "aabbbdeefggh"
set(x[1] for x in chain(izip(*([iter(a)] * 2)), izip(*([iter(a[1:])] * 2))) if x[0] == x[1])
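This gives the wrong output for a="aabbbdeefggh": I expected {'a', 'b', 'e', 'g'}, but only got {'a', 'b', 'e'}. - Ashwini Chaudhary
… much faster than groupby(); I posted the timeit results in my answer. - Ashwini Chaudhary
The edited question asks to avoid set(), which rules out most of the answers.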
I wanted to compare @poke's good old loop with another fancy one I came up with:
from itertools import *
a = 'aaaabbbccccaaaaefgggghhhhhiiiiiijjjkkklllmnooooooaaaaaaaaqqqqqqsssstuuvv'
def izipp():
    return set(x[0] for x in izip(a, a[1:]) if x[0] == x[1])
def grpby():
    return [k for k,g in groupby(a) if len(list(g))>=2]
def poke():
    inRow = []
    last = None
    for x in a:
        if last == x and (len(inRow) == 0 or inRow[-1] != x):
            inRow.append(last)
        last = x
    return inRow
def dread2():
    repeated_chars = []
    previous_char = ''
    for char in a:
        if repeated_chars and char == repeated_chars[-1]:
            continue
        if char == previous_char:
            repeated_chars.append(char)
        else:
            previous_char = char
    return repeated_chars
if __name__=="__main__":
    import timeit
    print "izip",timeit.timeit("izipp()", setup="from __main__ import izipp"),''.join(izipp())
    print "grpby",timeit.timeit("grpby()", setup="from __main__ import grpby"),''.join(grpby())
    print "poke",timeit.timeit("poke()", setup="from __main__ import poke"),''.join(poke())
    print "dread2",timeit.timeit("dread2()", setup="from __main__ import dread2"),''.join(dread2())
This gives me:
izip 13.2173779011 acbgihkjloqsuv
grpby 18.1190848351 abcaghijkloaqsuv
poke 11.8500328064 abcaghijkloaqsuv
dread2 9.0088801384 abcaghijkloaqsuv
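So the basic loops seem to be faster than all of the comprehension-based versions, up to twice as fast as groupby(). (The izip version also returns an unordered set, so its output differs from the others.) However, the basic loops are more complex and harder to read and write, so in most cases I would probably stick with groupby().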
>>> import re
>>> mylist = ['a', 'a', 'b', 'c', 'c', 'c', 'd', 'a', 'a']
>>> results = [match[0][0] for match in re.findall(r'((\w)\2{1,})', ''.join(mylist))]
>>> results
['a', 'c', 'a']
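A slightly simpler variant of the regex approach, assuming every item is a single word character (otherwise joining the list into one string can merge or split items): re.finditer with the pattern (\w)\1+ yields one non-overlapping match per run, and group(1) is the repeated character. The function name repeated_runs_regex is hypothetical.

```python
import re


def repeated_runs_regex(items):
    """Return one entry per run of >= 2 equal adjacent single-character items."""
    # Assumes each item is exactly one word character, so ''.join loses nothing.
    text = ''.join(items)
    # (\w)\1+ matches a character followed by one or more copies of itself;
    # the greedy \1+ consumes the whole run, so each run yields one match.
    return [m.group(1) for m in re.finditer(r'(\w)\1+', text)]


print(repeated_runs_regex(['a', 'a', 'b', 'c', 'c', 'c', 'd', 'a', 'a']))  # ['a', 'c', 'a']
```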
Sorry, too lazy to time it.
a = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
res = []
for i in a:
    # Note: a.count(i) counts every occurrence of i, not only adjacent ones
    if a.count(i) > 1 and i not in res:
        res.append(i)
print(res)
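Use enumerate to check whether two items in a row are equal:
def repetitives(long_list):
    repeaters = []
    for counter, item in enumerate(long_list):
        # counter > 0 keeps long_list[-1] from wrapping around to the last element
        if counter > 0 and item == long_list[counter-1] and item not in repeaters:
            repeaters.append(item)
    return repeaters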
f(a a a b c c c c b) would return only a c. - Blender