Split a list by value and keep the separator

Question

8

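I have a list called list_of_strings that looks like this:

['a', 'b', 'c', 'a', 'd', 'c', 'e']

I want to split this list by a value (in this case c), and I also want to keep c in the resulting pieces.

So the expected result is:

[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

Is there a simple way to do this?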

- ScientiaEtVeritas

@ScientiaEtVeritas Thanks for the reply. You're right, I just noticed the key difference. I'll delete it. - idjaw

You may want to look at this solution: https://dev59.com/P2855IYBdhLWcg3weUMQ - mikea

8 Answers

6

stuff = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

You can find the indices of 'c' like this, adding 1 because you will split after it rather than at its index:

indices = [i + 1 for i, x in enumerate(stuff) if x == 'c']

Then extract the slices like this:

split_stuff = [stuff[i:j] for i, j in zip([0] + indices, indices + [None])]

zip pairs up the elements at corresponding positions of its arguments into tuples, here (indices[i], indices[i + 1]). Prepending [0] supplies the start of the first slice, and appending [None] yields the final slice (stuff[i:]).
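The two steps above can be wrapped in a small helper. This is only a sketch: the name split_after_value is made up for illustration, and dropping the empty trailing slice (which appears when the list ends with the separator) is an assumption about what is wanted, not part of the answer.

```python
def split_after_value(items, value):
    """Split items after each occurrence of value, keeping the separator."""
    # Positions just past each separator; these are the cut points.
    cuts = [i + 1 for i, x in enumerate(items) if x == value]
    # Pair each start with the next cut; None means "up to the end of the list".
    pieces = [items[i:j] for i, j in zip([0] + cuts, cuts + [None])]
    # Assumption: an empty final slice (list ends with the separator) is dropped.
    return [p for p in pieces if p]


stuff = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(split_after_value(stuff, 'c'))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```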

- j4nw

Please provide some context and explanation with your answer. Code alone does not make a good answer. - Ben Visness

Understood, I've added an explanation. - j4nw

3

You could try something like the following:

list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

output = [[]]

for x in list_of_strings:
    output[-1].append(x)
    if x == 'c':
        output.append([])

Note that if the last element of your input is 'c', this appends an empty list to your output.

- asongtoruin

Just move the append of the empty list to the top of the for loop body, guarded by a flag. I just posted a similar answer to a similar question (link). - gboffi
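One way to read the flag suggestion is sketched below. The function name split_keep and the flag name start_new are hypothetical, and this is an interpretation of the comment rather than the commenter's own code.

```python
def split_keep(items, sep):
    """Split after each sep, never leaving an empty trailing group."""
    output = []
    start_new = True  # flag: the next item should open a new group
    for x in items:
        if start_new:
            # A group is created only once there is an item to put in it.
            output.append([])
            start_new = False
        output[-1].append(x)
        if x == sep:
            start_new = True
    return output


print(split_keep(['a', 'b', 'c', 'a', 'd', 'c'], 'c'))
# [['a', 'b', 'c'], ['a', 'd', 'c']]
```

Because a new group is opened lazily, an input that ends with the separator no longer produces an empty final list, and an empty input yields [].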

1

How about this? It makes only one pass over the input, and part of that pass happens inside the index method, which runs as native code.

def splitkeep(v, c):

    curr = 0
    try:
        nex = v.index(c)
        while True:
            yield v[curr: (nex + 1)]
            curr = nex + 1
            nex += v[curr:].index(c) + 1

    except ValueError:
        if v[curr:]: yield v[curr:]

print(list(splitkeep( ['a', 'b', 'c', 'a', 'd', 'c', 'e'], 'c')))

Result

[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

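I wasn't sure whether you would want an empty list at the end of the result when the final value is the one you are splitting on. I assumed you wouldn't, so I added a condition that skips an empty final piece.

As a consequence, the input [] produces just [], although [[]] would arguably be a possible result.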

- Paul Rooney

1

def spliter(value, array):
    res = []
    while value in array:
        index = array.index(value)
        res.append(array[:index + 1])
        array = array[index + 1:]
    if array:
        # Append last elements
        res.append(array)
    return res

a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(spliter('b',a))
# [['a', 'b'], ['c', 'a', 'd', 'c', 'e']]
print(spliter('c',a))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

- Artem Kryvonis

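It looks better now. You may have edited something. I can't remember. Anyway, I wasn't the one who downvoted. - Ma0

I tried this solution and it seems a bit slow, but it behaves as expected. - ScientiaEtVeritas

It will be slow. The value in array part basically makes it quadratic time. Instead, you could call array.index on progressively shrinking slices, which would make it linear time. - Paul Rooney

1

@PaulRooney Yes, you could improve that part of the code with try/except/else, as someone describes here https://dev59.com/b2sz5IYBdhLWcg3w6MXn but is it really necessary? - Artem Kryvonis

On my dataset this code takes 5 minutes, while the accepted answer takes only 2 seconds. - ScientiaEtVeritas

@ScientiaEtVeritas You're right, I wasn't thinking about large datasets when I wrote this snippet :) I agree that the accepted answer is the better solution. - Artem Kryvonis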
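The performance point raised in these comments can be sketched as follows. This is not code from the thread: the name split_after_linear is hypothetical, and it relies on the optional start argument of list.index so that the membership test and the repeated re-slicing of the remaining list are both avoided.

```python
def split_after_linear(array, value):
    """Split after each occurrence of value without rescanning the list."""
    res = []
    start = 0
    while True:
        try:
            # Search only from `start` onward; each element is examined once.
            index = array.index(value, start)
        except ValueError:
            break  # no further separators
        res.append(array[start:index + 1])
        start = index + 1
    if start < len(array):
        # Append the remaining elements after the last separator.
        res.append(array[start:])
    return res


a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(split_after_linear(a, 'c'))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```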

0

How about this script? It's a bit playful.

a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

b = ''.join(a).split('c')  # ['ab', 'ad', 'e']

c = [x + 'c' if i < len(b)-1 else x for i, x in enumerate(b)]  # ['abc', 'adc', 'e']

d = [list(x) for x in c if x]
print(d)  # [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

It also handles input that starts and ends with "c":

a = ['c', 'a', 'b', 'c', 'a', 'd', 'c', 'e', 'c']
d -> [['c'], ['a', 'b', 'c'], ['a', 'd', 'c'], ['e', 'c']]

- Ma0

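The problem with this solution is that, because it joins the strings, it only works for single characters and not for strings in general. - ScientiaEtVeritas

Can you give an example where it fails? - Ma0

1

['a', 'cb', 'c', 'ca', 'aad', 'c', 'ccc', 'e'] is the kind of example I mean. - ScientiaEtVeritas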
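To make the objection concrete, here is the join-based approach wrapped in a function (the name split_via_join is made up for illustration) and run on the example from the comment. Joining discards the element boundaries, so multi-character strings cannot be recovered.

```python
def split_via_join(items, sep):
    # Same steps as the join-based answer, wrapped in a function.
    b = ''.join(items).split(sep)
    c = [x + sep if i < len(b) - 1 else x for i, x in enumerate(b)]
    return [list(x) for x in c if x]


items = ['a', 'cb', 'c', 'ca', 'aad', 'c', 'ccc', 'e']
wanted = [['a', 'cb', 'c'], ['ca', 'aad', 'c'], ['ccc', 'e']]
print(split_via_join(items, 'c') == wanted)  # False
```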

0

list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

value = 'c'
new_list = []
temp_list = []
for item in list_of_strings:
    if item == value:  # compare by equality; `is` tests object identity
        temp_list.append(item)
        new_list.append(temp_list[:])
        temp_list.clear()
    else:
        temp_list.append(item)

if temp_list:
    new_list.append(temp_list)

print(new_list)

- YuryChu

0

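You can try the following snippet, which uses the more_itertools library. Note that sliced cuts the list into fixed-length pieces (here of length l.index('c') + 1), so this only works when the separators are evenly spaced.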

>>> l = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
>>> from more_itertools import sliced
>>> list(sliced(l,l.index('c')+1))

Output:

[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
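The snippet above gives the expected result for this particular list because every 'c' happens to sit at a multiple of the chunk length. The sketch below uses a plain-Python stand-in for fixed-length slicing (fixed_chunks is a hypothetical helper, assumed to mirror how sliced cuts a sequence) to show what happens when the separators are not evenly spaced.

```python
def fixed_chunks(seq, n):
    # Plain-Python stand-in for fixed-length slicing: pieces of length n.
    return [seq[i:i + n] for i in range(0, len(seq), n)]


l = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(fixed_chunks(l, l.index('c') + 1))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

m = ['a', 'c', 'b', 'b', 'c', 'e']
print(fixed_chunks(m, m.index('c') + 1))
# [['a', 'c'], ['b', 'b'], ['c', 'e']]
```

For the second list, splitting after each 'c' would instead give [['a', 'c'], ['b', 'b', 'c'], ['e']].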

- Ajay2588

- pylang · Accepted Answer

You can do this easily and explicitly with more_itertools⁺:

from more_itertools import split_after


lst = ["a", "b", "c", "a", "d", "c", "e"]
list(split_after(lst, lambda x: x == "c"))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

As another example, we can split words by changing the predicate:

lst = ["ant", "bat", "cat", "asp", "dog", "carp", "eel"]
list(split_after(lst, lambda x: x.startswith("c")))
# [['ant', 'bat', 'cat'], ['asp', 'dog', 'carp'], ['eel']]

⁺ A third-party library that implements the itertools recipes and more. > pip install more_itertools
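If installing a third-party package is not an option, the same predicate-based behaviour can be approximated with a short generator. This is a minimal sketch and not the library's implementation; the name split_after_pred is hypothetical.

```python
def split_after_pred(iterable, pred):
    """Yield lists of items, ending a list after each item where pred is true."""
    buf = []
    for item in iterable:
        buf.append(item)
        if pred(item):
            yield buf
            buf = []
    if buf:
        # Leftover items after the last match form the final group.
        yield buf


lst = ["a", "b", "c", "a", "d", "c", "e"]
print(list(split_after_pred(lst, lambda x: x == "c")))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```

Since it consumes any iterable in a single pass, it also works on generators and files, not just lists.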