Removing specific consecutive duplicates from a list

Question

4

I have a list of strings, something like this:

['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']

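I want to replace '**', '**' with a single '**', but keep 'bar', 'bar'. In other words, any number of consecutive '**' should be replaced with a single one. My current code looks like this: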

p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
np = [p[0]]
for pi in range(1,len(p)):
  if p[pi] == '**' and np[-1] == '**':
    continue
  np.append(p[pi])

Is there a more Pythonic way to do this?

- user408952

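What happens if there are three ** in a row? Should they be reduced to two ** or to one? - Daniel Stutzbach

@Daniel: "That is, replace any number of consecutive '**' with a single '**'." - senderle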

6 Answers

3

In my opinion, this is Pythonic.

result = [v for i, v in enumerate(L) if L[i:i+2] != ["**", "**"]]

The only "trick" is that when i == len(L)-1, L[i:i+2] is a one-element list.

Note that the same expression can of course be used as a generator.
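As a minimal sketch of that slicing trick (written for Python 3; `items`, `last` and `result` are illustrative names, not taken from the answer), the same condition can drive a generator expression. At the last index the slice has only one element, so it can never equal the two-element list:

```python
items = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']

# At the last index, items[i:i+2] is a one-element slice, so comparing it
# with the two-element list ['**', '**'] is simply False there.
last = len(items) - 1
assert items[last:last + 2] == ['baz']

# Same condition as the list comprehension, but evaluated lazily.
squeezed = (v for i, v in enumerate(items) if items[i:i + 2] != ['**', '**'])
result = list(squeezed)
print(result)
```

Note that the slice-based test still needs an indexable sequence, so this form would not work on a one-shot iterator.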

- 6502

It would be more Pythonic if you used enumerate: [v for i, v in enumerate(L) if L[i:i+2] != ['**', '**']]. - John Machin

1

This works. I'm not sure how Pythonic it is.

import itertools

p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']

q = []
for key, iter in itertools.groupby(p):
    q.extend([key] * (1 if key == '**' else len(list(iter))))

print(q)

- Tom Zych

1

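Using groupby is a good choice, but couldn't you just do this: q.extend([key] if key == '**' else list(iter))? - senderle

Oh, good point. I'm not very familiar with itertools yet, so I adapted a recipe I found. Thanks! - Tom Zych

@eyquem, I encouraged Tom Zych to edit his answer, but I think the general groupby-based idea is the best solution, as you have essentially replicated it in your own answer. - senderle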

1

from itertools import groupby

p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
keep = set(['foo',  'bar', 'baz'])
result = []

for k, g in groupby(p):
    if k in keep:
        result.extend(list(g))
    else:
        result.append(k)

- dugres

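It's a pity you didn't use a useful test, "k == '**'", instead of a useless set. - eyquem

@eyquem: Using a set makes the solution extensible and maintainable; you can easily decide what to "keep" and what to "compress". - dugres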

1

A solution that does not use itertools.groupby():

p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**',
     'foo', '*','*', 'bar', 'bar','bar', '**', '**','foo','bar',]

def treat(A):
    prec = A[0]; yield prec
    for x in A[1:]:
        if (prec,x)!=('**','**'):  yield x
        prec = x

print p
print
print list(treat(p))

Result

['**', 'foo', '*', 'bar', 'bar', '**', '**', '**',  
 'baz', '**', '**',
 'foo', '*', '*', 'bar', 'bar','bar', '**', '**',
 'foo', 'bar']


['**', 'foo', '*', 'bar', 'bar', '**',
 'baz', '**',
 'foo', '*', '*', 'bar', 'bar', 'bar', '**',
 'foo', 'bar']

Another solution, inspired by dugres

from itertools import groupby

p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**',
     'foo', '*','*', 'bar', 'bar','bar', '**', '**','foo','bar',]

res = []
for k, g in groupby(p):
    res.extend(  ['**'] if k=='**' else list(g) )    
print res

This is like Tom Zych's solution, but simpler

.

Edit

p = ['**','**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**',
     'foo', '*','*', 'bar', 'bar','bar', '**', '**','foo','bar', '**', '**', '**']


q= ['**',12,'**',45, 'foo',78, '*',751, 'bar',4789, 'bar',3, '**', 5,'**',7, '**',
    73,'baz',4, '**',8, '**',20,'foo', 8,'*',36,'*', 36,'bar', 11,'bar',0,'bar',9,
    '**', 78,'**',21,'foo',27,'bar',355, '**',33, '**',37, '**','end']

def treat(B,dedupl):
    B = iter(B)
    prec = B.next(); yield prec
    for x in B:
        if not(prec==x==dedupl):  yield x
        prec = x

print 'gen = ( x for x in q[::2])'
gen = ( x for x in q[::2])
print 'list(gen)==p is ',list(gen)==p
gen = ( x for x in q[::2])
print 'list(treat(gen)==',list(treat(gen,'**'))

ch = '??h4i4???4t4y?45l????hmo4j5???'
print '\nch==',ch
print "''.join(treat(ch,'?'))==",''.join(treat(ch,'?'))

print "\nlist(treat([],'%%'))==",list(treat([],'%%'))

Result

gen = ( x for x in q[::2])
list(gen)==p is  True
list(treat(gen)== ['**', 'foo', '*', 'bar', 'bar', '**', 'baz', '**', 'foo', '*', '*', 'bar', 'bar', 'bar', '**', 'foo', 'bar', '**']

ch== ??h4i4???4t4y?45l????hmo4j5???
''.join(treat(ch,'?'))== ?h4i4?4t4y?45l?hmo4j5?

list(treat([],'%%'))== []

.

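Note: with a generator function, the output can be adapted to the input type by writing code around the call to the generator; the internal code of the generator function does not need to change;

whereas Tom Zych's solution cannot be adapted to the input type so easily.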
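To make that note concrete, here is a minimal Python 3 sketch of treat() and of three callers that each pick their own output type. It is an adaptation, not the code from the answer: B.next() becomes next(it), and the try/except is an addition, since under Python 3.7+ a StopIteration escaping from a generator body is turned into a RuntimeError.

```python
def treat(iterable, dedupl):
    """Yield items, collapsing consecutive runs of `dedupl` into one."""
    it = iter(iterable)
    try:
        prec = next(it)      # Python 3 spelling of B.next()
    except StopIteration:
        return               # empty input: yield nothing
    yield prec
    for x in it:
        if not (prec == x == dedupl):
            yield x
        prec = x

# The caller chooses the output type by wrapping the generator call;
# the generator function itself stays unchanged.
as_list = list(treat(['**', '**', 'foo', '**'], '**'))
as_tuple = tuple(treat(('**', '**', 'foo', '**'), '**'))
as_str = ''.join(treat('??h4i4???', '?'))
print(as_list, as_tuple, as_str)
```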

.

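Edit 2

I looked for a one-line approach using a list comprehension or a generator expression.

I found two ways to do it, and I think it isn't possible without groupby().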

from itertools import groupby
from operator import concat

p = ['**', '**','foo', '*', 'bar', 'bar', '**', '**', '**',
     'bar','**','foo','sun','sun','sun']
print 'p==',p,'\n'

dedupl = ("**",'sun')
print 'dedupl==',repr(dedupl)

print [ x for k, g in groupby(p) for x in ((k,) if k in dedupl else g) ]

# or

print reduce(concat,( [k] if k in dedupl else list(g) for k, g in groupby(p)),[])

Based on the same principle, it is very easy to turn dugres's function into a generator function:

from itertools import groupby

def compress(iterable, to_compress):
    for k, g in groupby(iterable):
        if k in to_compress:
            yield k
        else:
            for x in g: yield x

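However, this generator function has two drawbacks:

it uses the function groupby(), which is not easy to understand for someone unfamiliar with Python
its execution time is longer than that of my generator function treat() and of John Machin's generator function, neither of which uses groupby().

I slightly modified them so that they accept a sequence of items to deduplicate, and measured the execution times: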

from time import clock
from itertools import groupby

def squeeze(iterable, victims, _dummy=object()):
    if hasattr(iterable, '__iter__') and not hasattr(victims, '__iter__'):
        victims = (victims,)
    previous = _dummy
    for item in iterable:
        if item in victims and item==previous:
            continue
        previous = item
        yield item

def treat(B,victims):
    if hasattr(B, '__iter__') and not hasattr(victims, '__iter__'):
        victims = (victims,)
    B = iter(B)
    prec = B.next(); yield prec
    for x in B:
        if x  not in victims or x!=prec:  yield x
        prec = x

def compress(iterable, to_compress):
    if hasattr(iterable, '__iter__') and not hasattr(to_compress, '__iter__'):
        to_compress = (to_compress,)
    for k, g in groupby(iterable):
        if k in to_compress:
            yield k
        else:
            for x in g: yield x

p = ['**', '**','su','foo', '*', 'bar', 'bar', '**', '**', '**',
     'su','su','**','bin', '*','*','bar','bar','su','su','su']

n = 10000

te = clock()
for i in xrange(n):
    a = list(compress(p,('**','sun')))
print clock()-te,'  generator function with groupby()'

te = clock()
for i in xrange(n):
    b = list(treat(p,('**','sun')))
print clock()-te,'  generator function eyquem'


te = clock()
for i in xrange(n):
    c = list(squeeze(p,('**','sun')))
print clock()-te,'  generator function John Machin'

print p
print 'a==b==c is ',a==b==c
print a
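time.clock() was removed in Python 3.8, so a rough Python 3 equivalent of this measurement could use timeit instead. The sketch below is an assumption about how the comparison might be redone, not the code that produced the timings reported in this answer; compress_py3 and squeeze_py3 are hypothetical names, and no timings are claimed.

```python
from itertools import groupby
from timeit import timeit

def compress_py3(iterable, to_compress):
    # groupby-based variant: one item per run of a compressible key
    for k, g in groupby(iterable):
        if k in to_compress:
            yield k
        else:
            yield from g

def squeeze_py3(iterable, victims):
    # sentinel-based variant: remember the previous item
    previous = object()
    for item in iterable:
        if item in victims and item == previous:
            continue
        previous = item
        yield item

p = ['**', '**', 'su', 'foo', '*', 'bar', 'bar', '**', '**', '**',
     'su', 'su', '**', 'bin', '*', '*', 'bar', 'bar', 'su', 'su', 'su']
victims = ('**', 'sun')

a = list(compress_py3(p, victims))
b = list(squeeze_py3(p, victims))
print('a == b is', a == b)

# Time 10000 full passes of each generator over p.
for func in (compress_py3, squeeze_py3):
    elapsed = timeit(lambda: list(func(p, victims)), number=10000)
    print(f'{elapsed:.3f} s  {func.__name__}')
```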

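The instructions

if hasattr(iterable, '__iter__') and not hasattr(to_compress, '__iter__'):
    to_compress = (to_compress,)

are necessary to avoid an error when the iterable argument is a sequence and the other argument is just a single string: the latter then needs to be wrapped in a container, provided the iterable argument is not itself a string.

This relies on the fact that containers such as tuples, lists and sets have an __iter__ method, whereas strings (in Python 2) do not. The following code shows the problem: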

def compress(iterable, to_compress):
    if hasattr(iterable, '__iter__') and not hasattr( to_compress, '__iter__'):
        to_compress = (to_compress,)
    print 't_compress==',repr(to_compress)
    for k, g in groupby(iterable):
        if k in to_compress:
            yield k
        else:
            for x in g: yield x


def compress_bof(iterable, to_compress):
    if not hasattr(to_compress, '__iter__'): # to_compress is a string
        to_compress = (to_compress,)
    print 't_compress==',repr(to_compress)
    for k, g in groupby(iterable):
        if k in to_compress:
            yield k
        else:
            for x in g: yield x


def compress_bug(iterable, to_compress_bug):
    print 't_compress==',repr(to_compress_bug)
    for k, g in groupby(iterable):
        #print 'k==',k,k in to_compress_bug
        if k in to_compress_bug:
            yield k
        else:
            for x in g: yield x


q = ';;;htr56;but78;;;;$$$$;ios4!'
print 'q==',q
dedupl = ";$"
print 'dedupl==',repr(dedupl)
print

print "''.join(compress    (q,"+repr(dedupl)+")) :\n",''.join(compress    (q,dedupl))+\
      ' <-CORRECT ONE'
print
print "''.join(compress_bof(q,"+repr(dedupl)+")) :\n",''.join(compress_bof(q,dedupl))+\
      '  <====== error ===='
print
print "''.join(compress_bug(q,"+repr(dedupl)+")) :\n",''.join(compress_bug(q,dedupl))

print '\n\n\n'


q = [';$', ';$',';$','foo', ';', 'bar','bar',';',';',';','$','$','foo',';$12',';$12']
print 'q==',q
dedupl = ";$12"
print 'dedupl==',repr(dedupl)
print
print 'list(compress    (q,'+repr(dedupl)+')) :\n',list(compress    (q,dedupl)),\
      ' <-CORRECT ONE'
print
print 'list(compress_bof(q,'+repr(dedupl)+')) :\n',list(compress_bof(q,dedupl))
print
print 'list(compress_bug(q,'+repr(dedupl)+')) :\n',list(compress_bug(q,dedupl)),\
      '  <====== error ===='
print

Result

q== ;;;htr56;but78;;;;$$$$;ios4!
dedupl== ';$'

''.join(compress    (q,';$')) :
t_compress== ';$'
;htr56;but78;$;ios4! <-CORRECT ONE

''.join(compress_bof(q,';$')) :
t_compress== (';$',)
;;;htr56;but78;;;;$$$$;ios4!  <====== error ====

''.join(compress_bug(q,';$')) :
t_compress== ';$'
;htr56;but78;$;ios4!




q== [';$', ';$', ';$', 'foo', ';', 'bar', 'bar', ';', ';', ';', '$', '$', 'foo', ';$12', ';$12']
dedupl== ';$12'

list(compress    (q,';$12')) :
t_compress== (';$12',)
[';$', ';$', ';$', 'foo', ';', 'bar', 'bar', ';', ';', ';', '$', '$', 'foo', ';$12']  <-CORRECT ONE

list(compress_bof(q,';$12')) :
t_compress== (';$12',)
[';$', ';$', ';$', 'foo', ';', 'bar', 'bar', ';', ';', ';', '$', '$', 'foo', ';$12']

list(compress_bug(q,';$12')) :
t_compress== ';$12'
[';$', 'foo', ';', 'bar', 'bar', ';', '$', 'foo', ';$12']   <====== error ====
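The hasattr(..., '__iter__') test depends on Python 2, where str has no __iter__ method. In Python 3 strings do have one, so that check would never wrap a lone string. A minimal Python 3 sketch of the same idea, assuming the intent is "wrap a lone string unless the iterable is itself a string" (compress_str_safe is a hypothetical name):

```python
from itertools import groupby

def compress_str_safe(iterable, to_compress):
    # Wrap a lone string so that `k in to_compress` tests equality with the
    # whole string rather than substring membership. When the iterable is
    # itself a string, its items are single characters, and treating
    # to_compress as a collection of characters is the desired behaviour.
    if isinstance(to_compress, str) and not isinstance(iterable, str):
        to_compress = (to_compress,)
    for k, g in groupby(iterable):
        if k in to_compress:
            yield k
        else:
            yield from g

chars = ''.join(compress_str_safe(';;;htr56;but78;;;;$$$$;ios4!', ';$'))
words = list(compress_str_safe([';$', ';$', 'foo', ';', ';', ';$12', ';$12'], ';$12'))
print(chars)
print(words)
```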

I obtained the following execution times:

0.390163274941   generator function with groupby()
0.324547114228   generator function eyquem
0.310176572721   generator function John Machin
['**', '**', 'su', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'su', 'su', '**', 'bin', '*', '*', 'bar', 'bar', 'su', 'su', 'su']
a==b==c is  True
['**', 'su', 'foo', '*', 'bar', 'bar', '**', 'su', 'su', '**', 'bin', '*', '*', 'bar', 'bar', 'su', 'su', 'su']

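I prefer John Machin's solution because it has no instruction B = iter(B) as mine does.

However, the instruction previous = _dummy, together with _dummy = object(), looks odd to me. So in the end I think a better solution is the following code, which works even with a string as the iterable argument, and in which the first object bound to previous is not a fake one: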

def squeeze(iterable, victims):
    if hasattr(iterable, '__iter__') and not hasattr(victims, '__iter__'):
        victims = (victims,)
    for item in iterable:
        previous = item
        break
    for item in iterable:
        if item in victims and item==previous:
            continue
        previous = item
        yield item
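One caveat, offered as an observation rather than as part of the original answer: when iterable is a list, the second for loop starts again from the first element, which then equals previous and is skipped if it is a victim; when iterable is a one-shot iterator, the first element is consumed by the first loop and never yielded. A sketch that keeps the "real first element" idea while avoiding both effects might look like this (Python 3; squeeze_first is a hypothetical name, and the isinstance test stands in for the Python 2 hasattr check):

```python
def squeeze_first(iterable, victims):
    if isinstance(victims, str) and not isinstance(iterable, str):
        victims = (victims,)
    it = iter(iterable)          # a single iterator shared by both loops
    for previous in it:
        yield previous           # the first element is always kept
        break
    for item in it:              # continues from the second element
        if item in victims and item == previous:
            continue
        previous = item
        yield item

from_list = list(squeeze_first(['**', '**', 'foo', '**'], '**'))
from_iterator = list(squeeze_first(iter(['**', '**']), '**'))
from_string = ''.join(squeeze_first('??a??', '?'))
print(from_list, from_iterator, from_string)
```

The price is the explicit iter() call that the answer wanted to avoid; without it, the two loops do not share a position in the input.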

.

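Edit 3

I had understood that object() was being used as a sentinel.

But I was puzzled by the fact that object is called. Yesterday, I thought that object was so special that it could not possibly appear in any iterable passed to squeeze(). So I wondered why you called it, John Machin, and that made me doubt its nature; this is why I asked you to confirm that object is the super-metaclass.

But today, I think I understand why object is called in your code.

In fact, object may very well appear in an iterable, why not? The super-metaclass object is itself an object, so it may have been put into an iterable before that iterable is deduplicated. Using object itself as a sentinel is therefore incorrect.

.

So you did not use object, but an instance object(), as the sentinel.

But I wondered why you chose this mysterious thing: what is the return value of a call to object?

My reflection went on, and I noticed something that may be the reason for this call:

calling object creates an instance, since object is the most basic class in Python, and every time an instance is created, it is an object distinct from any previously created instance, with a value that always differs from the value of any earlier instance of object: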

a = object()
b = object()
c = object()
d = object()

print id(a),'\n',id(b),'\n',id(c),'\n',id(d)

print a==b,a==c,a==d
print b==c,b==d,c==d

Result

10818752 
10818760 
10818768 
10818776
False False False
False False False

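So it is certain that _dummy=object() is a unique object, with a unique id and a unique value. By the way, I wonder what the value of an object instance is. In any case, the following code shows the problem that arises with _dummy=object and does not arise with _dummy=object().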

def imperfect_squeeze(iterable, victim, _dummy=object):
    previous = _dummy
    print 'id(previous)   ==',id(previous)
    print 'id(iterable[0])==',id(iterable[0])
    for item in iterable:
        if item in victim and item==previous:  continue
        previous = item; yield item

def squeeze(iterable, victim, _dummy=object()):
    previous = _dummy
    print 'id(previous)   ==',id(previous)
    print 'id(iterable[0])==',id(iterable[0])
    for item in iterable:
        if item in victim and item==previous:  continue
        previous = item; yield item

wat = object
li = [wat,'**','**','foo',wat,wat]
print 'imperfect_squeeze\n''li before ==',li
print map(id,li)
li = list(imperfect_squeeze(li,[wat,'**']))
print 'li after  ==',li
print


wat = object()
li = [wat,'**','**','foo',wat,wat]
print 'squeeze\n''li before ==',li
print map(id,li)
li = list(squeeze(li,[wat,'**']))
print 'li after  ==',li
print


li = [object(),'**','**','foo',object(),object()]
print 'squeeze\n''li before ==',li
print map(id,li)
li = list(squeeze(li,[li[0],'**']))
print 'li after  ==',li

Result

imperfect_squeeze
li before == [<type 'object'>, '**', '**', 'foo', <type 'object'>, <type 'object'>]
[505317320, 18578968, 18578968, 13208848, 505317320, 505317320]
id(previous)   == 505317320
id(iterable[0])== 505317320
li after  == ['**', 'foo', <type 'object'>]

squeeze
li before == [<object object at 0x00A514C8>, '**', '**', 'foo', <object object at 0x00A514C8>, <object object at 0x00A514C8>]
[10818760, 18578968, 18578968, 13208848, 10818760, 10818760]
id(previous)   == 10818752
id(iterable[0])== 10818760
li after  == [<object object at 0x00A514C8>, '**', 'foo', <object object at 0x00A514C8>]

squeeze
li before == [<object object at 0x00A514D0>, '**', '**', 'foo', <object object at 0x00A514D8>, <object object at 0x00A514E0>]
[10818768, 18578968, 18578968, 13208848, 10818776, 10818784]
id(previous)   == 10818752
id(iterable[0])== 10818768
li after  == [<object object at 0x00A514D0>, '**', 'foo', <object object at 0x00A514D8>, <object object at 0x00A514E0>]

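The problem is that, after processing by imperfect_squeeze(), <type 'object'> is missing as the first element of the list.

However, we must note that the "problem" can only occur with a list whose first element is object: that is a lot of reflection for such a tiny probability... but a rigorous coder takes all cases into account.

If we use list instead of object, the results are different: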

def imperfect_sqlize(iterable, victim, _dummy=list):
    previous = _dummy
    print 'id(previous)   ==',id(previous)
    print 'id(iterable[0])==',id(iterable[0])
    for item in iterable:
        if item in victim and item==previous:  continue
        previous = item; yield item

def sqlize(iterable, victim, _dummy=list()):
    previous = _dummy
    print 'id(previous)   ==',id(previous)
    print 'id(iterable[0])==',id(iterable[0])
    for item in iterable:
        if item in victim and item==previous:  continue
        previous = item; yield item

wat = list
li = [wat,'**','**','foo',wat,wat]
print 'imperfect_sqlize\n''li before ==',li
print map(id,li)
li = list(imperfect_sqlize(li,[wat,'**']))
print 'li after  ==',li
print

wat = list()
li = [wat,'**','**','foo',wat,wat]
print 'sqlize\n''li before ==',li
print map(id,li)
li = list(sqlize(li,[wat,'**']))
print 'li after  ==',li
print

li = [list(),'**','**','foo',list(),list()]
print 'sqlize\n''li before ==',li
print map(id,li)
li = list(sqlize(li,[li[0],'**']))
print 'li after  ==',li

Result

imperfect_sqlize
li before == [<type 'list'>, '**', '**', 'foo', <type 'list'>, <type 'list'>]
[505343304, 18578968, 18578968, 13208848, 505343304, 505343304]
id(previous)   == 505343304
id(iterable[0])== 505343304
li after  == ['**', 'foo', <type 'list'>]

sqlize
li before == [[], '**', '**', 'foo', [], []]
[18734936, 18578968, 18578968, 13208848, 18734936, 18734936]
id(previous)   == 18734656
id(iterable[0])== 18734936
li after  == ['**', 'foo', []]

sqlize
li before == [[], '**', '**', 'foo', [], []]
[18734696, 18578968, 18578968, 13208848, 18735016, 18734816]
id(previous)   == 18734656
id(iterable[0])== 18734696
li after  == ['**', 'foo', []]
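The differing results come down to how == behaves. As a minimal Python 3 sketch (Marker is a hypothetical class used only for illustration): instances of object, or of any class that does not define __eq__, compare equal only to themselves, whereas two empty lists compare equal by value, so a list() sentinel can be mistaken for a genuine [] item.

```python
sentinel = object()
other = object()

# object() instances define no value-based equality: == falls back to identity.
assert sentinel == sentinel
assert sentinel != other

# Lists compare by value, so a fresh empty list equals any other empty list,
# even though the two are distinct objects.
first, second = [], []
assert first == second
assert first is not second

# Any class without its own __eq__ inherits the identity-based comparison.
class Marker:
    pass

assert Marker() != Marker()
print('all checks passed')
```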

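Are there other objects in Python, besides object, that have this peculiarity?

John Machin, why did you choose an instance of object as the sentinel in your generator function? Did you already know about the peculiarity described above?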

- eyquem

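@eyquem: Not a good generator: it needs a sequence as input; it fails on an empty sequence; it copies (A[1:]). - John Machin

@John Machin I suppose you mean that it is a generator, but not a very good one. I have edited to take your kind comment into account. - eyquem

@John Machin (1) Why is it not Pythonic? I added a check on the argument that defines the elements to deduplicate because I ran into an error: with "iterable = ['','','sun','','','']" and "to_compress = ''", the result was "['','sun','']" instead of "['**','sun','','','*']". - eyquem

@John Machin (3) Yes, I will upvote some day; I think you deserve it more than the other answerers. But you know: programmers are lazy people... - eyquem

@John Machin Regarding (2), I would like to write something as an edit, because a comment is too small a space to write in. By the way, "Knuth calls the value placed at the end of the data a dummy value rather than a sentinel." (http://en.wikipedia.org/wiki/Sentinel_value) - eyquem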

1

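A general "Pythonic" solution that works with any iterable (no backup, copying, indexing or slicing, and it does not fail when the iterable is empty) and can squeeze anything (including None).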

>>> test = ['**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**',
...      'foo', '*','*', 'bar', 'bar','bar', '**', '**','foo','bar',]
>>>
>>> def squeeze(iterable, victim, _dummy=object()):
...     previous = _dummy
...     for item in iterable:
...         if item == victim == previous: continue
...         previous = item
...         yield item
...
>>> print test
['**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**', 'foo', '*'
, '*', 'bar', 'bar', 'bar', '**', '**', 'foo', 'bar']
>>> print list(squeeze(test, "**"))
['**', 'foo', '*', 'bar', 'bar', '**', 'baz', '**', 'foo', '*', '*', 'bar', 'bar
', 'bar', '**', 'foo', 'bar']
>>> print list(squeeze(["**"], "**"))
['**']
>>> print list(squeeze(["**", "**"], "**"))
['**']
>>> print list(squeeze([], "**"))
[]
>>>
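To illustrate the "including None" claim: if previous started out as None instead of a private object() instance, a leading None in the data would look like a repeat and be dropped whenever None is the victim. A minimal Python 3 sketch (squeeze_none_start is a hypothetical name for the flawed variant):

```python
def squeeze(iterable, victim, _dummy=object()):
    previous = _dummy
    for item in iterable:
        if item == victim == previous:
            continue
        previous = item
        yield item

def squeeze_none_start(iterable, victim):
    previous = None          # flawed: None may be a legitimate item
    for item in iterable:
        if item == victim == previous:
            continue
        previous = item
        yield item

data = [None, None, 'foo', None]
good = list(squeeze(data, None))
bad = list(squeeze_none_start(data, None))
print(good)  # [None, 'foo', None]
print(bad)   # ['foo', None]
```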

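Update, for the enlightenment of @eyquem, who asserts that victim cannot be a sequence (or presumably a set).

Having a container of victims means that there are two possible semantics: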

>>> def squeeze2(iterable, victims, _dummy=object()):
...     previous = _dummy
...     for item in iterable:
...         if item == previous in victims: continue
...         previous = item
...         yield item
...
>>> def squeeze3(iterable, victims, _dummy=object()):
...     previous = _dummy
...     for item in iterable:
...         if item in victims and previous in victims: continue
...         previous = item
...         yield item
...
>>> guff = "c...d..e.f,,,g,,h,i.,.,.,.j"
>>> print "".join(squeeze2(guff, ".,"))
c.d.e.f,g,h,i.,.,.,.j
>>> print "".join(squeeze3(guff, ".,"))
c.d.e.f,g,h,i.j
>>>
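The difference between the two hinges on Python's comparison chaining: item == previous in victims means (item == previous) and (previous in victims). So squeeze2 only collapses runs of one and the same victim, while squeeze3 collapses any run of victims, mixed or not. A small Python 3 sketch of the two conditions applied to a single pair of neighbours (same_victim_run and any_victim_run are hypothetical helper names):

```python
def same_victim_run(item, previous, victims):
    # Chained comparison: (item == previous) and (previous in victims)
    return item == previous in victims

def any_victim_run(item, previous, victims):
    # Both neighbours are victims, whether or not they are equal
    return item in victims and previous in victims

victims = '.,'
assert same_victim_run('.', '.', victims)       # '..' collapses under both
assert not same_victim_run(',', '.', victims)   # '.,' survives squeeze2
assert any_victim_run(',', '.', victims)        # '.,' collapses under squeeze3
assert not any_victim_run('c', '.', victims)    # non-victims are always kept
print('all checks passed')
```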

- John Machin

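@John Machin I like the use of continue: when item == victim == previous is True, the instruction previous = item is not executed, unlike in my code. However, there is one drawback: victim cannot be a sequence, as it can in dugres's answer. - eyquem

@John Machin May I ask why you did not define previous = object() directly in the code of squeeze(), and instead chose to define a _dummy parameter with the default argument object()? By the way, what is object()? Is the object here the super-metaclass used to define new-style classes? - eyquem

@John Machin, with the instruction if item == victim == previous:, the object victim cannot be ('juju','10$',513). The argument passed to the parameter iterable can be anything iterable, but that is not true of the argument passed to the parameter victim; that is what I meant. - eyquem

@John Machin Why introduce squeeze3()? Introducing squeeze2() means you agree that squeeze() cannot accept an iterable as the victim argument, doesn't it? By the way, I did not know the form if item == previous in victims; Python is great! - eyquem

@John Machin: Sorry, I'm a newbie, but why not initialize "previous" to "None"? - Mat M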

- senderle · Accepted Answer

I'm not sure whether this counts as Pythonic, but it should work and it is more concise:

star_list = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
star_list = [i for i, next_i in zip(star_list, star_list[1:] + [None]) 
             if (i, next_i) != ('**', '**')]

The code above copies the list twice; if you want to avoid that, consider Tom Zych's approach. Alternatively, you could do the following:

from itertools import islice, izip, chain

star_list = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
sl_shift = chain(islice(star_list, 1, None), [None])
star_list = [i for i, next_i in izip(star_list, sl_shift) 
             if (i, next_i) != ('**', '**')]

This can be generalized, made iterator-friendly and made more readable by using a variation on the pairwise recipe from the itertools documentation:

from itertools import islice, izip, chain, tee
def compress(seq, x):
    seq, shift = tee(seq)
    shift = chain(islice(shift, 1, None), (object(),))
    return (i for i, j in izip(seq, shift) if (i, j) != (x, x))

Test:

>>> list(compress(star_list, '**'))
['**', 'foo', '*', 'bar', 'bar', '**', 'baz']
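izip no longer exists in Python 3, where the built-in zip is already lazy. A sketch of the same pairwise idea for Python 3, assuming nothing else needs to change (compress_runs is a hypothetical name, chosen to avoid shadowing itertools.compress):

```python
from itertools import chain, islice, tee

def compress_runs(seq, x):
    # Pair every item with its successor; the final item is paired with a
    # fresh object() that cannot compare equal to x.
    seq, shift = tee(seq)
    shift = chain(islice(shift, 1, None), (object(),))
    return (i for i, j in zip(seq, shift) if (i, j) != (x, x))

star_list = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
result = list(compress_runs(star_list, '**'))
print(result)

# Because of tee(), a one-shot iterator works as well as a list.
from_iterator = list(compress_runs(iter(['**', '**', '**']), '**'))
print(from_iterator)
```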