将一个不规则（任意嵌套）的列表展开为一维列表

Question

将一个不规则（任意嵌套）的列表展开为一维列表

pythonlistoptimizationnested-listsflatten

553

是的，我知道这个主题以前已经被讨论过：

但据我所知，除了一个解决方案外，所有的解决方案都不能处理像[[[1, 2, 3], [4, 5]], 6]这样的列表，期望的输出是[1, 2, 3, 4, 5, 6]（或者更好的方式是迭代器）。

我看到的唯一一个可以处理任意嵌套的解决方案在这个问题中找到：

def flatten(x):
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

这是最好的方法吗？我有遗漏了什么吗？有什么问题吗？

- telliott99

32

这么多答案和讨论表明这个问题应该被内置为一个函数，是吧？尤其遗憾的是，在Python 3.0中移除了compiler.ast。 - Mittenchops

3

我认为Python真正需要的是不受限制的递归，而不是另一个内置函数。 - clay

4

@Mittenchops: 完全不同意，事实上，与明显糟糕的API或过于复杂的数据结构一起工作（仅说明一下：list 应该是同质的）并不意味着这是Python的问题，我们需要为这样的任务建立一个内置功能。 - Azat Ibrakov

7

如果您的项目可以添加软件包，我建议使用more_itertools.collapse解决方案。这个答案来自这里：https://dev59.com/qnNA5IYBdhLWcg3wdtld#40938883 - viddik13

@viddik13：请考虑将其作为此问题的答案，这样我一定会点赞。（我同意Mittenchops的观点。）事实上，它不是一个内置函数没关系（根据Azat Ibrakov），但是有（并且显然有）一个库例程来完成这个操作。（因为：不是所有的不规则性都是“糟糕的”/“过于复杂的”。有时，它只是...不“规则”，这没关系。在我看来，只要它是明确定义的，并且它可以被定义为不规则的（例如，“整数的任意嵌套列表（列表（列表...））”是明确定义的）。） - lindes

使用递归函数遍历列表树的列表 https://stackabuse.com/python-how-to-flatten-list-of-lists/ - Golden Lion

52个回答

67

我的解决方案：

import collections


def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

更为简洁，但基本相同。

- Josh Lee

7

如果你只是想测试一个对象是否可迭代，可以使用try: iter(x)，这样就不需要导入任何模块。但是我认为没有必要避免导入标准库模块。 - abarnert

8

值得注意的是，此解决方案仅适用于所有项目均为“int”类型。 - Nir Alfasi

3

可以让它更简洁一些，def flatten(x): return [a for i in x for a in flatten(i)] if isinstance(x, collections.Iterable) else [x] - 但是可读性可能因人而异。 - Zero

5

这不适用于字符串，因为字符串也是可迭代的。请用条件if isinstance(x, collections.Iterable) and not isinstance(x, basestring)来替换。 - aandis

递归错误：在比较中超出了最大递归深度。 - Eduardo Luis Santos

显示剩余2条评论

49

使用递归和鸭子类型的生成器（已更新为Python 3）：

def flatten(L):
    for item in L:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

list(flatten([[[1, 2, 3], [4, 5]], 6]))
>>>[1, 2, 3, 4, 5, 6]

- dansalmo

1

谢谢，这对Python 3很有效。对于2.x版本，需要使用以下代码：for i in flatten(item): yield i - dansalmo

list(flatten([['X'], 'Y'])) 在2.X版本上失败。 - sten

@user1019129，请看一下我在你上面的评论。 - dansalmo

是的，它在循环中失败了。我认为这是因为字符串也是“字符数组”。 - sten

45

这是我的递归展平函数版本，可以处理元组和列表，并允许您投入任何混合的位置参数。返回一个生成器，按顺序产生整个序列，逐个参数进行：

flatten = lambda *n: (e for a in n
    for e in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))

使用方法：

l1 = ['a', ['b', ('c', 'd')]]
l2 = [0, 1, (2, 3), [[4, 5, (6, 7, (8,), [9]), 10]], (11,)]
print list(flatten(l1, -2, -1, l2))
['a', 'b', 'c', 'd', -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

- samplebias

3

好的，我可以进行翻译。请问这段话的上下文是什么？ - Kristof Pal

3

尝试使用args代替n，使用intermediate(或更短的mid或您可能更喜欢element)代替a，并且使用result代替e，因此：

flatten = lambda *args: (result for mid in args for result in (flatten(*mid) if isinstance(mid, (tuple, list)) else (mid,)))

。 - Dennis Williamson

这比 compiler.ast.flatten 快得多。非常棒的、紧凑的代码，适用于任何（我认为）对象类型。 - bcdan

这是我在适度的谷歌搜索中找到的唯一解决方案，对于嵌套深度超过一级的列表，在任何实际可行的网站上。 - Alexej Magura

1

这是一件艺术品。如此少的字符，却几乎不可能理解。10/10 最好的 Python 代码高尔夫我见过了️‍♂️️‍♀️⛳️。有这么短的东西几乎弥补了 Python 没有内置展平函数的事实。 - Jeff Hykin

36

根据@ Andrew在评论中的请求，这是@ unutbu的非递归解决方案的生成器版本：

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        yield l[i]
        i += 1

这个生成器的稍微简化版本：

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    while l:
        while l and isinstance(l[0], ltypes):
            l[0:1] = l[0]
        if l: yield l.pop(0)

- Alex Martelli

2

这是一个通过嵌套列表形成的树的前序遍历。只返回叶子节点。请注意，这种实现将消耗原始数据结构，好坏各有千秋。写一个既保留原始树又不必复制列表条目的实现可能会很有趣。 - Andrew Wagner

7

我认为你需要测试字符串 - 例如按照Cristian的解决方案添加"and not isinstance(l[0], basestring)"。否则，你将在l[0:1] = l[0]周围得到一个无限循环。 - c-urchin

这是一个很好的生成器示例，但正如c-urchin所提到的，当序列包含字符串时，算法本身会失败。 - Daniel 'Dang' Griffith

33

此版本的flatten避免了Python的递归限制（因此可以处理任意深度、嵌套的可迭代对象）。它是一个生成器，可以处理字符串和任意可迭代对象（甚至是无限迭代对象）。

import itertools
import collections

def flatten(iterable, ltypes=collections.Iterable):
    remainder = iter(iterable)
    while True:
        try:
            first = next(remainder)
        except StopIteration:
            break
        if isinstance(first, ltypes) and not isinstance(first, (str, bytes)):
            remainder = itertools.chain(first, remainder)
        else:
            yield first

以下是一些示例，展示了它的使用方法：

print(list(itertools.islice(flatten(itertools.repeat(1)),10)))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(list(itertools.islice(flatten(itertools.chain(itertools.repeat(2,3),
                                       {10,20,30},
                                       'foo bar'.split(),
                                       itertools.repeat(1),)),10)))
# [2, 2, 2, 10, 20, 30, 'foo', 'bar', 1, 1]

print(list(flatten([[1,2,[3,4]]])))
# [1, 2, 3, 4]

seq = ([[chr(i),chr(i-32)] for i in range(ord('a'), ord('z')+1)] + list(range(0,9)))
print(list(flatten(seq)))
# ['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H',
# 'i', 'I', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'N', 'o', 'O', 'p', 'P',
# 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X',
# 'y', 'Y', 'z', 'Z', 0, 1, 2, 3, 4, 5, 6, 7, 8]

虽然 `flatten` 函数可以处理无限生成器，但它无法处理无限嵌套的情况。

def infinitely_nested():
    while True:
        yield itertools.chain(infinitely_nested(), itertools.repeat(1))

print(list(itertools.islice(flatten(infinitely_nested()), 10)))
# hangs

- unutbu

1

是否有关于使用ABC Iterable还是ABC Sequence的共识？ - wim

sets、dicts、deques、listiterators、generators、文件句柄以及定义了__iter__的自定义类都是collections.Iterable的实例，但不是collections.Sequence的实例。将dict展平的结果有点棘手，但除此之外，我认为collections.Iterable比collections.Sequence更好作为默认值。它绝对更自由。 - unutbu

@wim：使用collections.Iterable的一个问题是它包括无限生成器。我已经更改了我的答案来处理这种情况。 - unutbu

2

这似乎对第三个和第四个例子不起作用。它会抛出“StopIteration”异常。此外，看起来可以将“while True：first = next（remainder）”替换为“for first in remainder：”。 - Georgy

1

@Georgy，这可以通过在try-except StopIteration块中封装flatten的主体来解决。 - baduker

18

def flatten(xs):
    res = []
    def loop(ys):
        for i in ys:
            if isinstance(i, list):
                loop(i)
            else:
                res.append(i)
    loop(xs)
    return res

- kev

1

这看起来非常优雅和简单。为什么它没有更多的赞？这个解决方案有什么问题吗？ - mattino

1

这是一个非常棒的解决方案！ - matrixloading

优雅的解决方案 - Rolf of Saxony

18

Pandas有一个函数可以做到这一点。它会像你提到的那样返回一个迭代器。

In [1]: import pandas
In [2]: pandas.core.common.flatten([[[1, 2, 3], [4, 5]], 6])
Out[2]: <generator object flatten at 0x7f12ade66200>
In [3]: list(pandas.core.common.flatten([[[1, 2, 3], [4, 5]], 6]))
Out[3]: [1, 2, 3, 4, 5, 6]

- Mark Harrison

1

太棒了！对于像我这样已经在使用pandas的人来说，这是一个非常简单而美妙的方法。 - rvrvrv

使用pydantic对象列表时出现意外结果，属性也被展平（最终列表中不再有对象）。 - Madaray

11

这里有另一个更有趣的答案...

import re

def Flatten(TheList):
    a = str(TheList)
    b,_Anon = re.subn(r'[\[,\]]', ' ', a)
    c = b.split()
    d = [int(x) for x in c]

    return(d)

这个函数将嵌套列表转换为字符串，使用正则表达式删除嵌套语法，最后将结果转换回（已平铺的）列表。

- clay

如果你试图将这个推广到除了int值之外的其他东西，那么它会变得有趣，例如[['C=64', 'APPLE ]['], ['Amiga', 'Mac', 'ST']] :) 另一方面，如果给定一个包含自身的列表，它会比其他答案做得更好，抛出一个异常而不是仅仅循环直到内存耗尽/递归直到堆栈耗尽... - abarnert

原始提示是关于将整数列表扁平化的。如果你只是将列表推导式改为d=[x for x in c]，它应该可以很好地处理你的样本。 - clay

首先，[x for x in c] 只是一种缓慢而冗长的复制 c 的方式，那么你为什么要这样做呢？其次，你的代码显然会将 'APPLE ][' 转换为 'APPLE '，因为它没有处理引号，只是假设任何括号都是列表括号。 - abarnert

哈！以我的电脑上的格式化方式，我甚至没有意识到那应该是旧电脑上显示的Apple II。无论如何，对于你的两个问题，我的回答是，这个练习对我来说只是一个实验，旨在找到将列表展平的创造性解决方案。我不确定是否可以将其推广到所有列表的展平。 - clay

你只需要执行 arr_str = str(arr)，然后运行 [int(s) for s in re.findall(r'\d+', arr_str)] 即可。详见 https://github.com/jorgeorpinel/flatten_nested_lists/blob/master/flatten.py#L20-L26 - Jorge Orpinel Pérez

10

您可以使用第三方软件包iteration_utilities中的deepflatten：

>>> from iteration_utilities import deepflatten
>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(deepflatten(L))
[1, 2, 3, 4, 5, 6]

>>> list(deepflatten(L, types=list))  # only flatten "inner" lists
[1, 2, 3, 4, 5, 6]

这是一个迭代器，所以您需要对其进行迭代（例如通过使用list将其包装或在循环中使用它）。它在内部使用迭代方法而不是递归方法，并且编写为C扩展，因此可以比纯Python方法更快：

>>> %timeit list(deepflatten(L))
12.6 µs ± 298 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit list(deepflatten(L, types=list))
8.7 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit list(flatten(L))   # Cristian - Python 3.x approach from https://dev59.com/THI95IYBdhLWcg3wyBCc#2158532
86.4 µs ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(flatten(L))   # Josh Lee - https://dev59.com/THI95IYBdhLWcg3wyBCc#2158522
107 µs ± 2.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(genflat(L, list))  # Alex Martelli - https://dev59.com/THI95IYBdhLWcg3wyBCc#2159079
23.1 µs ± 710 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

我是 iteration_utilities 库的作者。

- MSeifert

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Cristian · Accepted Answer

使用生成器函数可以使您的示例更易于阅读并提高性能。

Python 2

在2.6中添加的Iterable ABC可用:

from collections import Iterable

def flatten(xs):
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, basestring):
            for item in flatten(x):
                yield item
        else:
            yield x

Python 3

Python 3中，basestring已经不存在，但是元组(str, bytes)可以达到相同的效果。另外，yield from操作符会逐个从生成器中返回项。

from collections.abc import Iterable

def flatten(xs):
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x