Flattening a shallow list in Python

Question

431

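Is there a simple way to flatten a list of iterables with a list comprehension? Failing that, what would you all consider the best way to flatten a shallow list like this, balancing performance and readability?

I tried to flatten such a list with a nested list comprehension, like this: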

[image for image in menuitem for menuitem in list_of_menuitems]

But that raises a NameError, because name 'menuitem' is not defined. After googling and looking around on Stack Overflow, I got the desired result with a reduce statement:

reduce(list.__add__, map(lambda x: list(x), list_of_menuitems))

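But this method is fairly hard to read, because I need the list(x) call there since x is a Django QuerySet object.

Conclusion:

Thanks to everyone who contributed to this question. Here is a summary of what I learned. I'm also making this a community wiki so that others can add to or correct these observations.

My original reduce statement is redundant and is better written this way: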

>>> reduce(list.__add__, (list(mi) for mi in list_of_menuitems))

This is the correct syntax for a nested list comprehension (brilliant summary, dF!):

>>> [image for mi in list_of_menuitems for image in mi]

But neither of these methods is as efficient as using itertools.chain:

>>> from itertools import chain
>>> list(chain(*list_of_menuitems))

And as @cdleary notes, it is probably better style to avoid the * operator magic by using chain.from_iterable, like so:

>>> import itertools
>>> chain = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
>>> print(list(chain))
[1, 2, 3, 5, 89, 6]

- Prairie Dogg

31

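Why is everyone using map(lambda x: list(x), other)? Isn't that equivalent to map(list, other)? The list builtin is callable... - cdleary

2

@recursive: Yeah, I definitely blushed after you pointed out how much of my reduce statement was redundant. I learned a lot from this question, so big thanks to everyone! - prairiedogg

I've been learning Ruby for a while and recently ran into a similar problem that Ruby has an idiom for. FWIW: [[1,2],[3],[5,89],[],[6]].flatten -> [1, 2, 3, 5, 89, 6] - prairiedogg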

1

reduce(list.__add__, (list(mi.image_set.all()) for mi in list_of_menuitems)) is not correct for the case where all the lists are empty. It should be reduce(list.__add__, (list(mi.image_set.all()) for mi in list_of_menuitems), []). - Daira Hopwood
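
To illustrate the comment above, here is a minimal sketch (not from the original thread) of reduce with an initial value. The helper name reduce_flatten is hypothetical. The failure it guards against shows up when the outer sequence itself is empty, in which case reduce has nothing to return unless an initial value is given.

```python
from functools import reduce  # also a builtin on Python 2

def reduce_flatten(iterables):
    """Concatenate the iterables into one list, starting from []."""
    return reduce(list.__add__, (list(it) for it in iterables), [])

print(reduce_flatten([[1, 2], [3], []]))  # [1, 2, 3]
print(reduce_flatten([]))                 # []

# Without the initial value, an empty outer sequence raises TypeError.
try:
    reduce(list.__add__, [])
except TypeError as exc:
    print('TypeError:', exc)
```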

1

This question caused https://dev59.com/qnNA5IYBdhLWcg3wdtld to be closed as a duplicate. However, it is much less clear because of all the Django-specific material. Should it be rewritten? - Juh_

23 Answers

300

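You almost have it! The way to write a nested list comprehension is to put the for clauses in the same order as they would appear in regular nested for statements.

Thus, this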

for inner_list in outer_list:
    for item in inner_list:
        ...

corresponds to

[... for inner_list in outer_list for item in inner_list]

So you want

[image for menuitem in list_of_menuitems for image in menuitem]
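
As an illustrative sketch (not part of the original answer), the two forms can be run side by side. The sample data below is a hypothetical stand-in for list_of_menuitems.

```python
# Hypothetical sample data standing in for list_of_menuitems.
list_of_menuitems = [['image00', 'image01'], ['image10'], []]

# Regular nested loops: outer loop first, inner loop second.
flat_loop = []
for menuitem in list_of_menuitems:
    for image in menuitem:
        flat_loop.append(image)

# Equivalent comprehension: the for clauses appear in the same order.
flat_comp = [image for menuitem in list_of_menuitems for image in menuitem]

print(flat_comp)  # ['image00', 'image01', 'image10']
```

Writing the clauses the other way round refers to menuitem before any clause has bound it, which is where the NameError in the question comes from.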

- dF

46

+1, I've looked this up so many times, and this is the only answer that makes the ordering explicit... Maybe now I can remember it! - Izkata

11

I wish I could upvote this again, because this way of thinking makes nested list comprehensions much easier to understand. - Derek Litz

2

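This ordering is really weird. If you change for i in list: ... to ... for i in list, then why not also change the order of the for loops? - naught101

Yes, it's an old thread, but this is the best simple explanation of nested comprehensions. Thanks... It may still feel a bit awkward to read, but now I get it. - Andrew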

1

Ha! I forgot this again. I guess Guido's brain and mine just disagree on what is intuitive. - clacke

133

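@S.Lott: You inspired me to write a timeit app.

I figured the results would also vary with the number of partitions (the number of iterators within the container list); your comment didn't mention how many partitions the thirty items were split into. The plot flattens a thousand items in every run, with a varying number of partitions. The items are evenly distributed among the partitions.

[Figure: Flattening Comparison]

Code (Python 2.6):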

#!/usr/bin/env python2.6

"""Usage: %prog item_count"""

from __future__ import print_function

import collections
import itertools
import operator
from timeit import Timer
import sys

import matplotlib.pyplot as pyplot

def itertools_flatten(iter_lst):
    return list(itertools.chain(*iter_lst))

def itertools_iterable_flatten(iter_iter):
    return list(itertools.chain.from_iterable(iter_iter))

def reduce_flatten(iter_lst):
    return reduce(operator.add, map(list, iter_lst))

def reduce_lambda_flatten(iter_lst):
    return reduce(operator.add, map(lambda x: list(x), [i for i in iter_lst]))

def comprehension_flatten(iter_lst):
    return list(item for iter_ in iter_lst for item in iter_)

METHODS = ['itertools', 'itertools_iterable', 'reduce', 'reduce_lambda',
           'comprehension']

def _time_test_assert(iter_lst):
    """Make sure all methods produce an equivalent value.
    :raise AssertionError: On any non-equivalent value."""
    callables = (globals()[method + '_flatten'] for method in METHODS)
    results = [callable(iter_lst) for callable in callables]
    if not all(result == results[0] for result in results[1:]):
        raise AssertionError

def time_test(partition_count, item_count_per_partition, test_count=10000):
    """Run flatten methods on a list of :param:`partition_count` iterables.
    Normalize results over :param:`test_count` runs.
    :return: Mapping from method to (normalized) microseconds per pass.
    """
    iter_lst = [[dict()] * item_count_per_partition] * partition_count
    print('Partition count:    ', partition_count)
    print('Items per partition:', item_count_per_partition)
    _time_test_assert(iter_lst)
    test_str = 'flatten(%r)' % iter_lst
    result_by_method = {}
    for method in METHODS:
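        # Assumes this script is saved as test.py so that it can import itself.
        setup_str = 'from test import %s_flatten as flatten' % method
        t = Timer(test_str, setup_str)
        # timeit() returns total seconds; convert to microseconds per pass.
        per_pass = 1000000 * t.timeit(number=test_count) / test_count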
        print('%20s: %.2f usec/pass' % (method, per_pass))
        result_by_method[method] = per_pass
    return result_by_method

if __name__ == '__main__':
    if len(sys.argv) != 2:
        raise ValueError('Need a number of items to flatten')
    item_count = int(sys.argv[1])
    partition_counts = []
    pass_times_by_method = collections.defaultdict(list)
    for partition_count in xrange(1, item_count):
        if item_count % partition_count != 0:
            continue
        items_per_partition = item_count / partition_count
        result_by_method = time_test(partition_count, items_per_partition)
        partition_counts.append(partition_count)
        for method, result in result_by_method.iteritems():
            pass_times_by_method[method].append(result)
    for method, pass_times in pass_times_by_method.iteritems():
        pyplot.plot(partition_counts, pass_times, label=method)
    pyplot.legend()
    pyplot.title('Flattening Comparison for %d Items' % item_count)
    pyplot.xlabel('Number of Partitions')
    pyplot.ylabel('Microseconds')
    pyplot.show()

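Edit: Decided to make it community wiki.

Note: METHODS should probably be accumulated with a decorator, but I figure it is easier for people to read this way.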
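
For readers curious about the decorator approach mentioned in the note, here is a minimal sketch of one way it might look. It is an assumption about the intent, not code from the answer, and the decorator name flatten_method is hypothetical. Each decorated function registers itself, so the list of methods cannot drift out of sync with the definitions.

```python
import itertools

METHODS = {}  # maps a short method name to its flatten function

def flatten_method(func):
    """Register func under its name with the '_flatten' suffix removed."""
    name = func.__name__
    if name.endswith('_flatten'):
        name = name[:-len('_flatten')]
    METHODS[name] = func
    return func

@flatten_method
def itertools_flatten(iter_lst):
    return list(itertools.chain(*iter_lst))

@flatten_method
def comprehension_flatten(iter_lst):
    return [item for iter_ in iter_lst for item in iter_]

print(sorted(METHODS))  # ['comprehension', 'itertools']
```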

- cdleary

Try sum_flatten = lambda iter_lst: sum(map(list, iter_lst), []) - jfs

20

To flatten the list, you can just use sum(list, []). - hoju

@EnTerr suggested reduce(operator.iadd https://dev59.com/eE7Sa4cB1Zd3GeqP3W6o#3041450 which is the fastest so far (code: http://ideone.com/NWThp pic: http://i403.photobucket.com/albums/pp111/uber_ulrich/p1000.png) - jfs
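
The linked code is not reproduced in this thread, so the following is only a sketch of how a reduce with operator.iadd might be written. The function name iadd_flatten and the empty-list initial value are assumptions. The initial value matters here: without it, the in-place += would extend the first sublist of the caller's data.

```python
import operator
from functools import reduce  # also a builtin on Python 2

def iadd_flatten(iterables):
    # Start from a fresh list so that += never mutates the caller's data.
    return reduce(operator.iadd, iterables, [])

data = [[1, 2], [3], [5, 89], [], [6]]
print(iadd_flatten(data))  # [1, 2, 3, 5, 89, 6]
print(data[0])             # [1, 2], the input is left unchanged
```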

2

chain.from_iterable() is slightly faster if there are many partitions. http://i403.photobucket.com/albums/pp111/uber_ulrich/p10000.png - jfs

3

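I know this is an old thread, but I added a method I got from here that uses list.extend, which has been shown to be the fastest across the board. Graph, updated gist - Mike S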
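
The gist itself is not included here. As a rough sketch of what a list.extend based flatten usually looks like (the name extend_flatten is hypothetical, and no timing claim is made):

```python
def extend_flatten(iterables):
    """Flatten one level by extending a single result list in place."""
    result = []
    for it in iterables:
        result.extend(it)  # accepts any iterable, not only lists
    return result

print(extend_flatten([[1, 2], (3,), [], 'ab']))  # [1, 2, 3, 'a', 'b']
```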

55

sum(list_of_lists, []) would flatten it.

l = [['image00', 'image01'], ['image10'], []]
print(sum(l, []))  # prints ['image00', 'image01', 'image10']

- Prem Anand

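I like it! It reminds me of using iter[::-1] instead of sorted(iter, reverse=True). I wonder if this is one of those things that will come under scrutiny over the years as "bad Python". It strikes me as a very TIMTOWTDI solution. - yurisich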

43

This solution works for arbitrary nesting depths, not just the "list of lists" depth that the other solutions are limited to:

def flatten(x):
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

It is the recursion that allows for arbitrary depth nesting - until you hit the maximum recursion depth, of course...

- jamesbrady

1

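It might be worth adding hasattr(el, '__getitem__') for compatibility with the iter() function and the builtin for-in loop (though all Python sequences (objects with __getitem__) are also iterable (objects with __iter__)). - jfs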

1

I was expecting something like this to exist already in itertools. Is there a similar solution using comprehensions? - Josep Valls

3

This was very useful to me because it does not split strings apart. - Chris Hagmann

1

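@JosepValls nice solution! For Python 3 you need to use str instead of basestring: "The builtin basestring abstract type was removed. Use str instead. The str and bytes types don't have functionality enough in common to warrant a shared base class. The 2to3 tool (see below) replaces every occurrence of basestring with str." (https://docs.python.org/3.0/whatsnew/3.0.html) - Anu

@JosepValls, also, could you explain why a method similar to yours raises a RECURSION ERROR on the input A = ['str1', [[[['str2']]]], [['str3'], 'str4'], 'str5'] and on the input A = [1.0, 2, 'a', (4,), ((6,), (8,)), (((8,),(9,)), ((12,),(10)))], but works fine with your solution! - Anu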
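
A Python 3 adaptation of the recursive flatten might look like the sketch below. It is not from the thread. The failing code in the comment above is not shown, so the following explanation is an assumption: in Python 3, str objects have __iter__, and iterating a one-character string yields that same string again, so a version that does not exclude strings recurses forever.

```python
from collections.abc import Iterable

def flatten(x):
    """Recursively flatten nested iterables, keeping str and bytes whole."""
    result = []
    for el in x:
        # Excluding str and bytes stops the descent into single characters.
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

print(flatten(['str1', [[[['str2']]]], [['str3'], 'str4'], 'str5']))
# ['str1', 'str2', 'str3', 'str4', 'str5']
print(flatten([1.0, 2, 'a', (4,), ((6,), (8,)), (((8,), (9,)), ((12,), (10)))]))
# [1.0, 2, 'a', 4, 6, 8, 8, 9, 12, 10]
```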

25

In Python 2.6, using chain.from_iterable():

>>> from itertools import chain
>>> list(chain.from_iterable(mi.image_set.all() for mi in h.get_image_menu()))

It avoids creating an intermediate list.

- J.F. Sebastian

24

Performance results. Revised.

import itertools
def itertools_flatten( aList ):
    return list( itertools.chain(*aList) )

from operator import add
def reduce_flatten1( aList ):
    return reduce(add, map(lambda x: list(x), [mi for mi in aList]))

def reduce_flatten2( aList ):
    return reduce(list.__add__, map(list, aList))

def comprehension_flatten( aList ):
    return list(y for x in aList for y in x)

I flattened a 2-level list of 30 items 1000 times.

itertools_flatten     0.00554
comprehension_flatten 0.00815
reduce_flatten2       0.01103
reduce_flatten1       0.01404

Using reduce is usually a poor choice.

- S.Lott

5

map(lambda x: list(x), [mi for mi in aList]) is equivalent to map(list, aList). - jfs

reduce_flatten = lambda list_of_iters: reduce(list.__add__, map(list, list_of_iters)) - jfs

itertools_flatten2 = lambda aList: list(itertools.chain.from_iterable(aList)) - jfs

Sorry, chain.from_iterable is not available in 2.5.2, so I can't compare it with the other solutions. - S.Lott

@recursive's version: sum_flatten = lambda aList: sum(map(list, aList), []) - jfs

16

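There seems to be some confusion about operator.add! When you add two lists together, the correct term is concat, not add. operator.concat is what you need to use.

If you're thinking functionally, it is as easy as this: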

>>> from functools import reduce
>>> import operator
>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

You can see that reduce respects the sequence type of its input: when you supply tuples, you get back a tuple. Let's try with lists:

>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Aha, you get back a list.

How about performance:

>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

from_iterable is pretty fast! But it is no match for reduce with concat.

>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop
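
The timings above use a very small input. Repeated concatenation copies the accumulated result at every step, so its cost can grow faster than linearly as the number of sublists increases. The sketch below (not from the answer) lets you measure both approaches on a larger, hypothetical input; the helper names are assumptions and no particular figures are claimed, since results depend on the machine and Python version.

```python
import operator
import timeit
from functools import reduce
from itertools import chain

# Hypothetical larger input: 500 sublists of 10 integers each.
list2d = [list(range(10)) for _ in range(500)]

def chain_flatten(lists):
    return list(chain.from_iterable(lists))

def concat_flatten(lists):
    return reduce(operator.concat, lists)

for func in (chain_flatten, concat_flatten):
    seconds = timeit.timeit(lambda: func(list2d), number=5)
    print('%-15s %.4f s for 5 runs' % (func.__name__, seconds))
```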

- Meitham

This is probably the best solution for one level of nesting. But that may be too restrictive a constraint. YMMV. - LBarret

9

Here is the correct solution using list comprehensions (the for clauses are backward in the question):

>>> join = lambda it: (y for x in it for y in x)
>>> list(join([[1,2],[3,4,5],[]]))
[1, 2, 3, 4, 5]

In your case, it would be:

[image for menuitem in list_of_menuitems for image in menuitem.image_set.all()]

Or you could use join, like this:

join(menuitem.image_set.all() for menuitem in list_of_menuitems)

In either case, the gotcha is the nesting order of the for loops.
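
To try both forms without a Django project, the sketch below uses hypothetical stand-in classes (FakeImageSet and FakeMenuItem are invented for illustration and only mimic the image_set.all() call used in the question).

```python
class FakeImageSet(object):
    """Hypothetical stand-in for a Django related manager."""
    def __init__(self, images):
        self._images = list(images)

    def all(self):
        # A real QuerySet is iterable; an iterator is enough for this demo.
        return iter(self._images)

class FakeMenuItem(object):
    """Hypothetical stand-in for a model with an image_set relation."""
    def __init__(self, images):
        self.image_set = FakeImageSet(images)

def join(it):
    return (y for x in it for y in x)

list_of_menuitems = [FakeMenuItem(['image00', 'image01']),
                     FakeMenuItem(['image10']),
                     FakeMenuItem([])]

via_comprehension = [image for menuitem in list_of_menuitems
                     for image in menuitem.image_set.all()]
via_join = list(join(menuitem.image_set.all() for menuitem in list_of_menuitems))
print(via_comprehension)  # ['image00', 'image01', 'image10']
```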

- jleedev

8

Off the top of my head, you can eliminate the lambda:

reduce(list.__add__, map(list, [mi.image_set.all() for mi in list_of_menuitems]))

Or even eliminate the map, since you already have a list comprehension:

reduce(list.__add__, [list(mi.image_set.all()) for mi in list_of_menuitems])

You can also express this as a sum of lists:

sum([list(mi.image_set.all()) for mi in list_of_menuitems], [])

- recursive

You could just use add, and I believe the second argument to sum is redundant. - daniel

2

It is not redundant. The default is zero, which yields TypeError: unsupported operand type(s) for +: 'int' and 'list'. IMO sum() is more direct than reduce(add, ...). - recursive
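
A short sketch (with hypothetical sample data) that shows both behaviors described in the comment above:

```python
lists = [[1, 2], [3], []]

# With start=[] the sum is [] + [1, 2] + [3] + [].
flat = sum(lists, [])
print(flat)  # [1, 2, 3]

# Without it, start defaults to 0, so the first step is 0 + [1, 2].
try:
    sum(lists)
except TypeError as exc:
    print('TypeError:', exc)
```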

- cdleary · Accepted Answer

If you just want to iterate over a flattened version of the data structure and don't need an indexable sequence, consider itertools.chain and company.

>>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
>>> import itertools
>>> chain = itertools.chain(*list_of_menuitems)
>>> print(list(chain))
['image00', 'image01', 'image10']

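It will work on anything that is iterable, which should include Django's iterable QuerySets, which you appear to be using in the question.

Edit: This is probably as good as a reduce anyway, because reduce has the same overhead of copying the items into the list that is being extended. chain only incurs this (same) overhead if you run list(chain) at the end.

Meta-Edit: Actually, it is less overhead than the question's proposed solution, because that solution throws away the temporary lists it creates when it extends the original with the temporary.

Edit: As J.F. Sebastian says, itertools.chain.from_iterable avoids the unpacking and you should use it to avoid * magic, but the timeit app shows a negligible performance difference.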
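
Beyond style, there is one behavioral difference that a small sketch can show (the generator sublists and the counter list are hypothetical, added for illustration): unpacking with * has to exhaust the outer iterable before chain is even called, while chain.from_iterable pulls sublists only as they are needed.

```python
from itertools import chain

produced = []  # records which sublists the generator has yielded so far

def sublists():
    for n in range(3):
        produced.append(n)
        yield [n, n]

# from_iterable pulls one sublist at a time.
lazy = chain.from_iterable(sublists())
first = next(lazy)
lazy_count = len(produced)   # 1: only the first sublist was requested

# Unpacking with * consumes the whole generator up front.
del produced[:]
eager = chain(*sublists())
eager_count = len(produced)  # 3: all sublists were produced immediately

print(first, lazy_count, eager_count)  # 0 1 3
```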