我有一些代码,类似于:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据条件将mylist
的内容拆分成另外两个列表。
我该如何更加优雅地完成这个操作?能否避免对mylist
进行两次独立迭代?这样做可以提高性能吗?
from itertools import tee
def unpack_args(fn):
return lambda t: fn(*t)
def separate(fn, lx):
return map(
unpack_args(
lambda i, ly: filter(
lambda el: bool(i) == fn(el),
ly)),
enumerate(tee(lx, 2)))
[even, odd] = separate(
lambda x: bool(x % 2),
[1, 2, 3, 4, 5])
print(list(even) == [2, 4])
print(list(odd) == [1, 3, 5])
def split(items, p):
groups = [[]]
for i in items:
if p(i):
groups.append([])
groups[-1].append(i)
return groups
使用方法:
split(range(1,11), lambda x: x % 3 == 0)
# gives [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
>>> images, anims = [[i for i in files if t ^ (i[2].lower() in IMAGE_TYPES) ] for t in (0, 1)]
>>> images
[('file1.jpg', 33, '.jpg')]
>>> anims
[('file2.avi', 999, '.avi')]
不确定这是否是一个好的方法,但也可以用这种方式来完成
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi')]
images, anims = reduce(lambda (i, a), f: (i + [f], a) if f[2] in IMAGE_TYPES else (i, a + [f]), files, ([], []))
简易生成器版本,尽可能少地在内存中保存值并仅调用pred
一次:
from collections import deque
from typing import Callable, TypeVar, Iterable
_T = TypeVar('_T')
def iter_split(pred: Callable[[_T], bool],
iterable: Iterable[_T]) -> tuple[Iterable[_T], Iterable[_T]]:
"""Split an iterable into two iterables based on a predicate.
The predicate will only be called once per element.
Returns:
A tuple of two iterables, the first containing all elements for which
the predicate returned True, the second containing all elements for
which the predicate returned False.
"""
iterator = iter(iterable)
true_values: deque[_T] = deque()
false_values: deque[_T] = deque()
def true_generator():
while True:
while true_values:
yield true_values.popleft()
for item in iterator:
if pred(item):
yield item
break
false_values.append(item)
else:
break
def false_generator():
while True:
while false_values:
yield false_values.popleft()
for item in iterator:
if not pred(item):
yield item
break
true_values.append(item)
else:
break
return true_generator(), false_generator()
我使用numpy
来解决这个问题,将其限制在几行内并转化为简单的函数。
我能够通过使用np.where
将列表分成两部分来满足条件。这适用于数字,但我认为可以扩展到字符串和列表。
这就是代码...
from numpy import where as wh, array as arr
midz = lambda a, mid: (a[wh(a > mid)], a[wh((a =< mid))])
p_ = arr([i for i in [75, 50, 403, 453, 0, 25, 428] if i])
high,low = midz(p_, p_.mean())
import sys
from itertools import ifilter
trustedPeople = sys.argv[1].split(',')
newName = sys.argv[2]
myFriends = ifilter(lambda x: x.startswith('Shi'), trustedPeople)
print '%s is %smy friend.' % (newName, newName not in myFriends 'not ' or '')
如果你坚持要聪明一点,你可以采用Winden的解决方案,只需要稍微有些狡猾就行了:
def splay(l, f, d=None):
d = d or {}
for x in l: d.setdefault(f(x), []).append(x)
return d
def partition(pred, iterable):
xs = list(zip(map(pred, iterable), iterable))
return [x[1] for x in xs if x[0]], [x[1] for x in xs if not x[0]]
从性能上来说,这种方法的好处在于(除了对iterable
的每个成员只评估一次pred
),它将大量逻辑移出解释器并移到高度优化的迭代和映射代码中。如此答案所述,这可以加速对长迭代对象的迭代。
从表达能力上来说,它利用了像理解和映射这样的表达能力丰富的习语。
这里已经有很多解决方案了,但另一种方法是 -
anims = []
images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
只对列表进行一次迭代,看起来更像Python,因此更易读。
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file1.bmp', 33L, '.bmp')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> anims = []
>>> images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
>>> print '\n'.join([str(anims), str(images)])
[('file2.avi', 999L, '.avi')]
[('file1.jpg', 33L, '.jpg'), ('file1.bmp', 33L, '.bmp')]
>>>
str.split()
相当的列表等效方法,以将列表分割成一组连续的子列表。例如,split([1,2,3,4,5,3,6], 3) -> ([1,2],[4,5],[6])
,而不是按类别划分列表元素。 - StewIMAGE_TYPES = set('.jpg','.jpeg','.gif','.bmp','.png')
。在可读性上几乎没有区别,应该使用 n(1) 而不是 n(o/2)。 - ChaimG