How can I elegantly cut a string at multiple positions in Python?

Question

How can I elegantly cut a string at multiple positions in Python?

3

Suppose I have a string, say "The quick brown fox jumps over the lazy dog", and a list [1, 8, 14, 18, 27] that indicates where to cut the string.

What I want is a list containing the pieces of the cut string. For this example, the output should be:

['T', 'he quic', 'k brow', 'n fo', 'x jumps o', 'ver the lazy dog']

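My intuitive and naive approach is to write a for loop that remembers the previous index, slices the string, and appends each slice to the output.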

_str="The quick brown fox jumps over the lazy dog"
cut=[1, 8, 14, 18, 27]
prev=0
out=[]
for i in cut:
    out.append(_str[prev:i])
    prev=i
out.append(_str[prev:])

Is there a better way?

- YiFei

2

Show us your code and we will help you. ;) - idjaw

Try slicing: see, for example, https://dev59.com/D3RB5IYBdhLWcg3wyqOo. - B. M.

3 Answers

1

A recursive approach:

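def split(cut, s):
    # Note: cut.pop() consumes the caller's list; pass list(cut) to keep the original.
    if cut:
        b = cut.pop()  # last (largest) cut position
        return split(cut, s[:b]) + [s[b:]]
    return [s]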
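
One thing to watch for in the recursive version is that cut.pop() empties the list that was passed in. A minimal sketch of a non-mutating variant follows, assuming the cut positions are sorted in ascending order. The name split_at_recursive is only illustrative and does not come from the answer.

```python
def split_at_recursive(cuts, s):
    """Split s at the sorted positions in cuts without modifying cuts."""
    if not cuts:
        return [s]
    *rest, last = cuts  # peel off the largest cut position; cuts itself is untouched
    return split_at_recursive(rest, s[:last]) + [s[last:]]


text = "The quick brown fox jumps over the lazy dog"
cuts = [1, 8, 14, 18, 27]
parts = split_at_recursive(cuts, text)
print(parts)
print(cuts)  # the input list is still intact
```

Each cut adds one level of recursion, so a very long list of cut positions may run into Python's recursion limit; the loop-based answers do not have that constraint.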

- B. M.

1

You can do it with a generator function:

def sli(s, inds):
    it = iter(inds)
    p = next(it)
    yield s[:p]
    for i in it:
        yield s[p:i]
        p = i
    yield s[p:]

print(list(sli(_str, cut)))
['T', 'he quic', 'k brow', 'n fo', 'x jumps o', 'ver the lazy dog']

This builds a single list of slices, and the generator can also be evaluated lazily.

You also need to consider the case where an empty string is passed, unless you want a list of empty strings:

def sli(s, inds):
    if not s:
        return
    it = iter(inds)
    p = next(it)
    yield s[:p]
    for i in it:
        yield s[p:i]
        p = i
    yield s[p:]
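
Both versions of sli call next(it) unconditionally, so an empty index list is another edge case: on Python 3.7 and later, a StopIteration raised inside a generator surfaces as a RuntimeError. Below is a minimal sketch of one way to handle it, assuming that an empty index list should yield the whole string. That choice and the name sli_safe are assumptions, not part of the answer.

```python
def sli_safe(s, inds):
    """Yield the pieces of s cut at the sorted positions in inds."""
    if not s:
        return  # empty string: yield nothing, as in the version above
    p = 0  # start of the current piece
    for i in inds:
        yield s[p:i]
        p = i
    yield s[p:]  # tail; this is all of s when inds is empty


text = "The quick brown fox jumps over the lazy dog"
print(list(sli_safe(text, [1, 8, 14, 18, 27])))
print(list(sli_safe(text, [])))
```

Starting p at 0 removes the need for the initial next(it) call, which is what makes the empty case fall out naturally.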

Besides being more robust and using less memory, it is also faster:

Python 3:

l = sorted(random.sample(list(range(5000)), 1000))

%%timeit
_l = [0] + l + [len(s)]
[s[x:y] for x,y in zip(_l, _l[1:])]

1000 loops, best of 3: 368 µs per loop

In [39]: timeit list(sli(s, l))
1000 loops, best of 3: 311 µs per loop

Python 2:

In [8]: s = "The quick brown fox jumps over the lazy dog"

In [9]: s *= 1000

In [10]: l = sorted(random.sample(list(range(5000)), 1000))

In [11]: %%timeit

_l = [0] + l + [len(s)]
[s[x:y] for x,y in zip(_l, _l[1:])]
....: 
1000 loops, best of 3: 321 µs per loop

In [12]: timeit list(sli(s, l))
1000 loops, best of 3: 204 µs per loop
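
The IPython sessions above do not show their imports. To repeat the comparison as a plain script, something like the following sketch could be used. It relies on the standard timeit module; the helper name zip_slices and the repetition count are arbitrary choices made here, and the numbers will differ by machine and Python version.

```python
import random
import timeit


def sli(s, inds):
    # Generator version from the answer above.
    it = iter(inds)
    p = next(it)
    yield s[:p]
    for i in it:
        yield s[p:i]
        p = i
    yield s[p:]


def zip_slices(s, inds):
    # List-comprehension version being compared against.
    _l = [0] + inds + [len(s)]
    return [s[x:y] for x, y in zip(_l, _l[1:])]


s = "The quick brown fox jumps over the lazy dog" * 1000
l = sorted(random.sample(range(5000), 1000))

# Check that both approaches produce identical pieces before timing them.
same = list(sli(s, l)) == zip_slices(s, l)

runs = 200  # arbitrary repetition count
t_gen = timeit.timeit(lambda: list(sli(s, l)), number=runs)
t_zip = timeit.timeit(lambda: zip_slices(s, l), number=runs)
print(f"generator: {t_gen / runs * 1e6:.0f} µs per loop")
print(f"zip:       {t_zip / runs * 1e6:.0f} µs per loop")
```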

Writing your own function is perfectly Pythonic, and in this case it is more efficient than trying to cram the code into a couple of lines.

- Padraic Cunningham

- timgeb · Accepted Answer

Here is how I would do it:

>>> s = "The quick brown fox jumps over the lazy dog"
>>> l = [1, 8, 14, 18, 27]
>>> l = [0] + l + [len(s)]
>>> [s[x:y] for x,y in zip(l, l[1:])]
['T', 'he quic', 'k brow', 'n fo', 'x jumps o', 'ver the lazy dog']

Some explanation:

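I prepend 0 to the list and append len(s) to the end, so that

>>> list(zip(l, l[1:]))
[(0, 1), (1, 8), (8, 14), (14, 18), (18, 27), (27, 43)]

gives me a sequence of tuples of slice indices. All that is left to do is unpack those indices in the list comprehension and generate the slices you want.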
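
Packaged as a function, the same idea might look like the sketch below. The name split_at is hypothetical, and building a separate bounds list (rather than reassigning l) is a choice made here so that the caller's list is left alone.

```python
def split_at(s, cuts):
    """Return the pieces of s cut at the sorted positions in cuts."""
    bounds = [0, *cuts, len(s)]  # new list; cuts is not modified
    return [s[start:end] for start, end in zip(bounds, bounds[1:])]


s = "The quick brown fox jumps over the lazy dog"
cuts = [1, 8, 14, 18, 27]
print(split_at(s, cuts))
# ['T', 'he quic', 'k brow', 'n fo', 'x jumps o', 'ver the lazy dog']
```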

Edit:

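If you really care about the memory footprint of this operation, because you routinely deal with very large strings and lists, use generators all the way and build your list l so that it already includes 0 and len(s).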

For Python 2:

>>> from itertools import izip, tee
>>> s = "The quick brown fox jumps over the lazy dog"
>>> l = [0, 1, 8, 14, 18, 27, 43]
>>> 
>>> def get_slices(s, l):
...     it1, it2 = tee(l)
...     next(it2)
...     for start, end in izip(it1, it2):
...         yield s[start:end]
... 
>>> list(get_slices(s,l))
['T', 'he quic', 'k brow', 'n fo', 'x jumps o', 'ver the lazy dog']

For Python 3:
zip does what izip did in Python 2 (see the Python 3.3 version).

For Python 3.3+, using the yield from syntax:

>>> from itertools import tee
>>> s = "The quick brown fox jumps over the lazy dog"
>>> l = [0, 1, 8, 14, 18, 27, 43]
>>> 
>>> def get_slices(s, l):
...     it1, it2 = tee(l)
...     next(it2)
...     yield from (s[start:end] for start, end in zip(it1, it2))
...     
>>> list(get_slices(s,l))
['T', 'he quic', 'k brow', 'n fo', 'x jumps o', 'ver the lazy dog']
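
On Python 3.10 and later, itertools.pairwise performs the tee/next pairing shown above, and itertools.chain can supply the 0 and len(s) bounds lazily so that l does not have to contain them in advance. A sketch under those assumptions follows; the name iter_slices is illustrative, and the fallback branch is only there for older interpreters.

```python
from itertools import chain, tee

try:
    from itertools import pairwise  # available since Python 3.10
except ImportError:
    def pairwise(iterable):
        # Fallback using the same tee/next idea as get_slices.
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)


def iter_slices(s, cuts):
    """Lazily yield the pieces of s cut at the sorted positions in cuts."""
    bounds = chain([0], cuts, [len(s)])  # no intermediate list of bounds
    for start, end in pairwise(bounds):
        yield s[start:end]


s = "The quick brown fox jumps over the lazy dog"
print(list(iter_slices(s, [1, 8, 14, 18, 27])))
# ['T', 'he quic', 'k brow', 'n fo', 'x jumps o', 'ver the lazy dog']
```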