在Python中展开一个浅层列表

Question

在Python中展开一个浅层列表

431

有没有一种简单的方式可以使用列表推导来展开可迭代对象的列表，或者如果无法做到这一点，你们认为最好的展开浅层次列表的方法是什么，需要在性能和可读性之间取得平衡？

我尝试使用嵌套的列表推导来展开这样的列表，像这样：

[image for image in menuitem for menuitem in list_of_menuitems]

但是在那里我遇到了NameError错误，因为name 'menuitem' is not defined。在谷歌搜索和Stack Overflow上查看后，我使用了一个reduce语句来得到所需的结果:

reduce(list.__add__, map(lambda x: list(x), list_of_menuitems))

但是这种方法相当难以阅读，因为我需要在那里调用list(x)，因为x是Django QuerySet对象。

结论：

感谢所有为这个问题做出贡献的人。以下是我所学到的内容总结。我还将此作为社区维基，以便其他人可以添加或更正这些观察结果。

我的原始reduce语句是多余的，最好改成这样：

>>> reduce(list.__add__, (list(mi) for mi in list_of_menuitems))

这是嵌套列表推导式的正确语法（dF总结得非常好！）：

>>> [image for mi in list_of_menuitems for image in mi]

但是，这两种方法都不如使用 itertools.chain 高效：

>>> from itertools import chain
>>> list(chain(*list_of_menuitems))

正如@cdleary所指出的，使用chain.from_iterable 可能更好地避免了*操作符的魔法:

>>> chain = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
>>> print(list(chain))
>>> [1, 2, 3, 5, 89, 6]

- Prairie Dogg

31

为什么所有人都在使用map(lambda x: list(x), other)？这不等同于map(list, other)吗？list是个可调用函数... - cdleary

2

@recursive：是的，当你指出我的reduce语句有多少冗余之后，我肯定脸红了。我从这个问题中学到了很多，所以非常感谢大家！ - prairiedogg

我已经学习Ruby有一段时间了，最近遇到了一个类似的问题，刚好可以使用Ruby的惯用语法。顺便说一下：[[1,2],[3],[5,89],[],[6]].flatten -> [1, 2, 3, 5, 89, 6] - prairiedogg

1

对于所有列表都为空的情况，reduce(list.add, (list(mi.image_set.all()) for mi in list_of_menuitems)) 是不正确的。应该是 reduce(list.add, (list(mi.image_set.all()) for mi in list_of_menuitems), [])。 - Daira Hopwood

1

这个问题导致了https://dev59.com/qnNA5IYBdhLWcg3wdtld被关闭为重复。然而，由于所有与Django无关的内容，它变得不太清晰。它应该重新编写吗？ - Juh_

显示剩余3条评论

23个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- DartLenin · Answer 1

如果您需要展平一个更复杂的列表，其中包含不可迭代元素或深度超过2层，则可以使用以下函数：

def flat_list(list_to_flat):
    if not isinstance(list_to_flat, list):
        yield list_to_flat
    else:
        for item in list_to_flat:
            yield from flat_list(item)

它将返回一个生成器对象，您可以使用list()函数将其转换为列表。请注意，yield from语法从python3.3开始可用，但您也可以使用显式迭代。
例如：

>>> a = [1, [2, 3], [1, [2, 3, [1, [2, 3]]]]]
>>> print(list(flat_list(a)))
[1, 2, 3, 1, 2, 3, 1, 2, 3]

- devsaw · Answer 2

这个版本是一个生成器。如果您需要一个列表，请进行微调。

def list_or_tuple(l):
    return isinstance(l,(list,tuple))
## predicate will select the container  to be flattened
## write your own as required
## this one flattens every list/tuple


def flatten(seq,predicate=list_or_tuple):        
    ## recursive generator 
    for i in seq:
        if predicate(seq):
            for j in flatten(i):
                yield j
        else:
            yield i

如果想要展平满足某个条件的内容，可以添加谓词。

摘自Python食谱。

- Pierre Thibault · Answer 3

4

这是一个适用于多级列表的版本，使用collectons.Iterable实现：

import collections

def flatten(o, flatten_condition=lambda i: isinstance(i,
               collections.Iterable) and not isinstance(i, str)):
    result = []
    for i in o:
        if flatten_condition(i):
            result.extend(flatten(i, flatten_condition))
        else:
            result.append(i)
    return result

- Pierre Thibault

请问为什么您的解决方案在输入 A = ['image1', [[[['image2']]]], [['image3'], 'image4'], 'image5'] 时会出现 RecursionError: maximum recursion depth exceeded in comparison 错误，而在处理输入 A = [1,[2,3],[4,5,[6,[7,8],9]]] 时却能正常运行并展开呢？ - Anu

1

这是一个与展平条件有关的问题。由于字符串是可迭代的，因此它们被展平为字符，这些字符本身是长度为一的字符串，因为它们是字符串，所以相同的逻辑再次应用，从而创建了一个无限循环。因此，我创建了一个新版本，具有更多控制的展平条件。 - Pierre Thibault

太好了！非常感谢您的解释，现在它正常工作了。我有点理解您的推理，但无法完全消化它。您能否指点我一些网上文章或任何帖子，以帮助我理解它的问题！我所理解的是['image1'] --> ['i'，'m'，'a'，'g'，'e'，'1']即长度为一的字符串！现在它将如何进入无限循环，是什么使它进入无限循环？这部分我还没有理解！你能在某种程度上提供帮助吗？ - Anu

1

对于函数flatten的结束，如果它进入for循环，则需要在某个时刻进入else语句。如果进入else语句，则会开始展开调用堆栈并返回结果。基于先前的版本，因为'image1'是可迭代的，所以o将等于'image1'，而i将等于'i'。'i'也是可迭代的，因此在下一次调用中，o将等于'i'，而i也将等于'i'。函数将再次被调用，导致完全相同的状态和无限循环，只能通过堆栈溢出来打破。 - Pierre Thibault

最好使用 yield 通过 result 列表生成项目序列。迭代器可以惰性地评估，使用此功能的 fn 可以根据需要消耗序列。 - kopos

- kopos · Answer 4

3

def is_iterable(item):
   return isinstance(item, list) or isinstance(item, tuple)


def flatten(items):
    for i in items:
        if is_iterable(item):
            for m in flatten(i):
                yield m
        else:
            yield i

测试：

print list(flatten2([1.0, 2, 'a', (4,), ((6,), (8,)), (((8,),(9,)), ((12,),(10)))]))

- kopos

1

这可能会将字符串展开为单个字符，这可能不是预期的行为？ - ericmjl

1

是的，我没有考虑到那个条件。谢谢。 - kopos

@kopos，感谢您的解决方案，但是在您提供的输入A = [1.0, 2, 'a', (4,), ((6,), (8,)), (((8,),(9,)), ((12,),(10)))]和A = ['str1', [[[['str2']]]], [['str3'], 'str4'], 'str5']上我遇到了这个错误for m in flatten(i):[Previous line repeated 996 more times]RecursionError: maximum recursion depth exceeded，但是在这个输入A = [1, [[[[2]]]], [[3], 4], 5]上它可以正常工作。您知道失败的原因吗？如何修复？有什么建议吗？ - Anu

@kopos，我现在有一个解决方法了！你需要在if语句中添加一个条件and not isinstance(i,str)来防止在展开列表时出现字符串。 - Anu

1

@anu：是的，那个修复方法有效！但问题在于，我们是基于hasattr和isinstance来识别集合类型的。如果我们知道集合节点的类型，该函数就可以进行自定义。如果集合是一个“set”，我们可能还需要调整函数的行为。 - kopos

@kopos，你是对的，有一种方法可以在集合上进行泛化，就是使用typing.Sequence，并使用Sequence代替固定的集合类型，例如list、strings、sets、range等！此外，为了防止字符串，我们添加了之前提到的条件这就是我所说的。 - Anu

- Juh_ · Answer 5

根据我的经验，将列表压平的最有效方法是：

flat_list = []
map(flat_list.extend, list_of_list)

以下是一些与其他提议方法的timeit比较:

list_of_list = [range(10)]*1000
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 119 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#1000 loops, best of 3: 210 µs per loop
%timeit flat_list=[i for sublist in list_of_list for i in sublist]
#1000 loops, best of 3: 525 µs per loop
%timeit flat_list=reduce(list.__add__,list_of_list)
#100 loops, best of 3: 18.1 ms per loop

现在，当处理更长的子列表时，效率提高似乎更好：

list_of_list = [range(1000)]*10
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 60.7 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#10000 loops, best of 3: 176 µs per loop

这个方法也适用于任何可迭代的对象:

class SquaredRange(object):
    def __init__(self, n): 
        self.range = range(n)
    def __iter__(self):
        for i in self.range: 
            yield i**2

list_of_list = [SquaredRange(5)]*3
flat_list = []
map(flat_list.extend, list_of_list)
print flat_list
#[0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16]

- user2290820 · Answer 6

你尝试过使用flatten函数吗？来源于matplotlib.cbook.flatten(seq, scalarp=)？

l=[[1,2,3],[4,5,6], [7], [8,9]]*33

run("list(flatten(l))")
         3732 function calls (3303 primitive calls) in 0.007 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 <string>:1(<module>)
      429    0.001    0.000    0.001    0.000 cbook.py:475(iterable)
      429    0.002    0.000    0.003    0.000 cbook.py:484(is_string_like)
      429    0.002    0.000    0.006    0.000 cbook.py:565(is_scalar_or_string)
  727/298    0.001    0.000    0.007    0.000 cbook.py:605(flatten)
      429    0.000    0.000    0.001    0.000 core.py:5641(isMaskedArray)
      858    0.001    0.000    0.001    0.000 {isinstance}
      429    0.000    0.000    0.000    0.000 {iter}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*66

run("list(flatten(l))")
         7461 function calls (6603 primitive calls) in 0.007 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 <string>:1(<module>)
      858    0.001    0.000    0.001    0.000 cbook.py:475(iterable)
      858    0.002    0.000    0.003    0.000 cbook.py:484(is_string_like)
      858    0.002    0.000    0.006    0.000 cbook.py:565(is_scalar_or_string)
 1453/595    0.001    0.000    0.007    0.000 cbook.py:605(flatten)
      858    0.000    0.000    0.001    0.000 core.py:5641(isMaskedArray)
     1716    0.001    0.000    0.001    0.000 {isinstance}
      858    0.000    0.000    0.000    0.000 {iter}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*99

run("list(flatten(l))")
         11190 function calls (9903 primitive calls) in 0.010 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.010    0.010 <string>:1(<module>)
     1287    0.002    0.000    0.002    0.000 cbook.py:475(iterable)
     1287    0.003    0.000    0.004    0.000 cbook.py:484(is_string_like)
     1287    0.002    0.000    0.009    0.000 cbook.py:565(is_scalar_or_string)
 2179/892    0.001    0.000    0.010    0.000 cbook.py:605(flatten)
     1287    0.001    0.000    0.001    0.000 core.py:5641(isMaskedArray)
     2574    0.001    0.000    0.001    0.000 {isinstance}
     1287    0.000    0.000    0.000    0.000 {iter}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*132

run("list(flatten(l))")
         14919 function calls (13203 primitive calls) in 0.013 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.013    0.013 <string>:1(<module>)
     1716    0.002    0.000    0.002    0.000 cbook.py:475(iterable)
     1716    0.004    0.000    0.006    0.000 cbook.py:484(is_string_like)
     1716    0.003    0.000    0.011    0.000 cbook.py:565(is_scalar_or_string)
2905/1189    0.002    0.000    0.013    0.000 cbook.py:605(flatten)
     1716    0.001    0.000    0.001    0.000 core.py:5641(isMaskedArray)
     3432    0.001    0.000    0.001    0.000 {isinstance}
     1716    0.001    0.000    0.001    0.000 {iter}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler'

更新这给了我另一个想法:

l=[[1,2,3],[4,5,6], [7], [8,9]]*33

run("flattenlist(l)")
         564 function calls (432 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    133/1    0.000    0.000    0.000    0.000 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
      429    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*66

run("flattenlist(l)")
         1125 function calls (861 primitive calls) in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    265/1    0.001    0.000    0.001    0.001 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
      858    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*99

run("flattenlist(l)")
         1686 function calls (1290 primitive calls) in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    397/1    0.001    0.000    0.001    0.001 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
     1287    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*132

run("flattenlist(l)")
         2247 function calls (1719 primitive calls) in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    529/1    0.001    0.000    0.002    0.002 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
     1716    0.001    0.000    0.001    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



l=[[1,2,3],[4,5,6], [7], [8,9]]*1320

run("flattenlist(l)")
         22443 function calls (17163 primitive calls) in 0.016 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   5281/1    0.011    0.000    0.016    0.016 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.000    0.000    0.016    0.016 <string>:1(<module>)
    17160    0.005    0.000    0.005    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

因此，为了测试递归深度越深时它的有效性：有多深?

l=[[1,2,3],[4,5,6], [7], [8,9]]*1320

new=[l]*33

run("flattenlist(new)")
         740589 function calls (566316 primitive calls) in 0.418 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 174274/1    0.281    0.000    0.417    0.417 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.001    0.001    0.418    0.418 <string>:1(<module>)
   566313    0.136    0.000    0.136    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



new=[l]*66

run("flattenlist(new)")
         1481175 function calls (1132629 primitive calls) in 0.809 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 348547/1    0.542    0.000    0.807    0.807 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.002    0.002    0.809    0.809 <string>:1(<module>)
  1132626    0.266    0.000    0.266    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



new=[l]*99

run("flattenlist(new)")
         2221761 function calls (1698942 primitive calls) in 1.211 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 522820/1    0.815    0.000    1.208    1.208 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.002    0.002    1.211    1.211 <string>:1(<module>)
  1698939    0.393    0.000    0.393    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



new=[l]*132

run("flattenlist(new)")
         2962347 function calls (2265255 primitive calls) in 1.630 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 697093/1    1.091    0.000    1.627    1.627 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.003    0.003    1.630    1.630 <string>:1(<module>)
  2265252    0.536    0.000    0.536    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



new=[l]*1320

run("flattenlist(new)")
         29623443 function calls (22652523 primitive calls) in 16.103 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
6970921/1   10.842    0.000   16.069   16.069 <ipython-input-55-39b139bad497>:4(flattenlist)
        1    0.034    0.034   16.103   16.103 <string>:1(<module>)
 22652520    5.227    0.000    5.227    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

我打赌“flattenlist”，我将长期使用它，除非我需要一个类似于matploblib.cbook中“flatten”使用的yield生成器和快速结果。

这很快。

这里是代码：

:

typ=(list,tuple)


def flattenlist(d):
    thelist = []
    for x in d:
        if not isinstance(x,typ):
            thelist += [x]
        else:
            thelist += flattenlist(x)
    return thelist

- Bottersnike · Answer 7

如果您正在寻找内置的、简单的一行代码，您可以使用以下方法:

a = [[1, 2, 3], [4, 5, 6]
b = [i[x] for i in a for x in range(len(i))]
print b

返回值

[1, 2, 3, 4, 5, 6]

- ntg · Answer 8

pylab提供了一个flatten函数: numpy flatten的链接

- daniel · Answer 9

那么关于这个问题：

from operator import add
reduce(add, map(lambda x: list(x.image_set.all()), [mi for mi in list_of_menuitems]))

然而，Guido建议不要在单行代码中执行太多操作，因为这会降低可读性。与使用多行相比，在单行中执行所需操作几乎没有任何性能提升。

- YPCrumble · Answer 10

1

在Python 2或3中实现这一点最简单的方法是使用morph库，使用pip install morph即可。

代码如下：

import morph

list = [[1,2],[3],[5,89],[],[6]]
flattened_list = morph.flatten(list)  # returns [1, 2, 3, 5, 89, 6]

- YPCrumble

1

"easiest" 是个非常强烈的词语。 - cfi

@cfi 你提出的答案在Python 2中不起作用，并且从评论中看来，甚至在Python 3中也不是一个可接受的答案。morph库是一个简单的一函数解决方案，就像JavaScript中的lodash一样。无论如何，我编辑了我的答案，以澄清它是跨Python 2和3工作的最简单的解决方案。 - YPCrumble

非常抱歉，我的评论有点懒惰，特别是因为你指出了我在另一篇帖子中的评论。我想要表达的观点是，“最简单”的超级级别很难实现。你的建议需要一个外部库，这可能对某些人来说安装起来很困难（即使使用 venv 等）。由于问题涉及“浅”列表和“平衡性能和可读性”，你的答案可能会在可读性方面胜出。但是这个答案在性能上胜出，并且更容易，因为它不需要任何依赖项。 - cfi

@cfi 是的 - 我的方法可能是“懒人的方法”。对我来说，看到所有这些展平的方法让我想要找到一个像我在morph中找到的快速库命令。这个库的好处是它比numpy小得多（我必须使用swapfile在小型服务器实例上安装numpy）。它基本上使用了你在第二条评论中描述的函数；另一个选项是让我将其作为我的代码中的辅助函数使用。完全没有问题，感谢指出这些选项 :)。 - YPCrumble