Split string every nth character

Question

601

How do I split a string every nth character?

'1234567890'   →   ['12', '34', '56', '78', '90']

For the same question with a list, see How do I split a list into equally-sized chunks?.

- Brandon L Burnett

19 Answers

347

Just to be complete, you can do this with a regular expression:

>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']

For an odd number of characters, you can do this:

>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']

You can also do the following, which keeps the regex simple for longer chunks:

>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']

If the string is long, you can use re.finditer to generate it chunk by chunk.
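The re.finditer suggestion could look roughly like the generator below. This is only a sketch: the helper name iter_chunks is made up for illustration, and passing re.DOTALL is an added assumption so that newlines count as ordinary characters.

```python
import re

def iter_chunks(text, n):
    """Lazily yield consecutive pieces of text, each at most n characters long."""
    # .{1,n} is greedy, so every piece except possibly the last has n characters.
    # re.DOTALL (an assumption, not part of the answer) lets '.' match '\n' too.
    pattern = re.compile('.{1,%d}' % n, flags=re.DOTALL)
    for match in pattern.finditer(text):
        yield match.group(0)

print(list(iter_chunks('123456789', 2)))  # ['12', '34', '56', '78', '9']
```

Because the pieces are produced one at a time, the full list of chunks never has to exist in memory unless the caller builds it.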

- the wolf

16

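This is by far the best answer here and deserves to be on top. You could even write '.'*n to make it clearer. No joining, no zipping, no loops, no list comprehension; just find the next two characters next to each other, which is exactly how a human brain thinks about it. If Monty Python were still alive, he'd love this method! - SO_fix_the_vote_sorting_bug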

2

This is also the fastest method for reasonably long strings: https://gitlab.com/snippets/1908857 - Ralph Bolton

10

This won't work if the string contains newlines. It needs flags=re.S. - Aran-Fey
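A small demonstration of the newline pitfall, using a made-up sample string. The variable names are illustrative only.

```python
import re

s = '12\n345'

# Without the flag, '.' refuses to match '\n', so the newline is silently dropped
# and the chunk boundaries shift.
without_flag = re.findall('.{1,2}', s)
print(without_flag)   # ['12', '34', '5']

# With re.S (an alias of re.DOTALL), the newline is treated like any other character.
with_flag = re.findall('.{1,2}', s, flags=re.S)
print(with_flag)      # ['12', '\n3', '45']
```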

1

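Yeah, this is not a good answer. Regexes have so many gotchas (as Aran-Fey found!) that you should use them very sparingly. You definitely don't need them here. They're only faster because they're implemented in C and Python is very slow. - Timmmm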

This is fast, but more_itertools.sliced seems more efficient. - FifthAxiom

292

There is already a built-in function in Python for this.

>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']

This is what the docstring for wrap says:

>>> help(wrap)
'''
Help on function wrap in module textwrap:

wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
'''

- Diptangsu Goswami

4

print(wrap('12345678', 3)) splits the string into groups of 3 digits, but starts from the front rather than the back. Result: ['123', '456', '78'] - Atalanttore

5

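It is interesting to learn about "wrap", yet it does not do exactly what was asked above. It is more oriented towards displaying text than towards splitting a string into a fixed number of characters. - Oren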

14

wrap may not return what is asked for if the string contains spaces. For example, wrap('0 1 2 3 4 5', 2) returns ['0', '1', '2', '3', '4', '5'] (the elements are stripped). - satomacoto
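To make the difference concrete, here is a side-by-side sketch of wrap() and plain slicing on the same input. The variable names are illustrative only.

```python
from textwrap import wrap

s = '0 1 2 3 4 5'

# wrap() treats the width as a maximum and breaks at whitespace, dropping the spaces.
wrapped = wrap(s, 2)
# Plain slicing keeps every character, spaces included.
by_slicing = [s[i:i + 2] for i in range(0, len(s), 2)]

print(wrapped)      # ['0', '1', '2', '3', '4', '5']
print(by_slicing)   # ['0 ', '1 ', '2 ', '3 ', '4 ', '5']
```

Only the sliced version can be joined back into the original string.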

3

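This does indeed answer the question, but what happens if there are spaces and you want them kept in the split groups? wrap() removes spaces that fall straight after a split group of characters. - Iron Attorney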

2

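This works poorly if you want to split text containing hyphens (the number given as the argument is actually the maximum number of characters, not the exact number, and it breaks on hyphens and whitespace). - MrVocabulary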

101

Another common way of grouping elements into n-length groups:

>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))  # list() is needed on Python 3, where map() is lazy
['12', '34', '56', '78', '90']

This method comes straight from the docs for zip().

- Andrew Clark

2

In [19]: a = "hello world"; list(map("".join, zip(*[iter(a)]*4))) gives the result ['hell', 'o wo']. - truease.com

21

If someone finds zip(*[iter(s)]*2) tricky to understand, read How does zip(*[iter(s)]*n) work in Python?. - Grijesh Chauhan
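In short, the idiom relies on the same iterator object appearing twice in the argument list. A step-by-step sketch, with illustrative variable names:

```python
s = '1234567890'

it = iter(s)
# [it] * 2 is a list holding the SAME iterator twice, not two independent copies.
pair_sources = [it] * 2
print(pair_sources[0] is pair_sources[1])   # True

# zip() pulls one item from each argument in turn; since both arguments are the
# same iterator, each output tuple consumes two consecutive characters.
pairs = list(zip(*pair_sources))
print(pairs[:2])                            # [('1', '2'), ('3', '4')]

chunks = [''.join(p) for p in pairs]
print(chunks)                               # ['12', '34', '56', '78', '90']
```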

19

This does not account for an odd number of characters; it simply drops the leftover characters: >>> map(''.join, zip(*[iter('01234567')]*5)) -> ['01234']. - Bjorn

4

To also handle an odd number of characters, just replace zip() with itertools.zip_longest(): map(''.join, zip_longest(*[iter(s)]*2, fillvalue='')). - Paulo Freitas
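The zip_longest variant from this comment, written out as a runnable sketch. The function name split_keep_tail is hypothetical.

```python
from itertools import zip_longest

def split_keep_tail(s, n):
    """Split s into n-character pieces, keeping a shorter final piece."""
    # The same iterator is repeated n times; fillvalue='' pads the last tuple
    # with empty strings, which vanish when the tuple is joined.
    groups = zip_longest(*[iter(s)] * n, fillvalue='')
    return [''.join(group) for group in groups]

print(split_keep_tail('01234567', 5))   # ['01234', '567']
```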

Also useful: the docs for map(). - winklerrr

I hope I never find this in production. It is very hard to read for something that should be fairly simple. - Neuron

77

I think this is shorter and more readable than the itertools version:

def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]

print(list(split_by_n('1234567890', 2)))

- Russell Borogove

8

But not really efficient: when applied to strings, it makes too many copies. - Eric

1

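It also doesn't work if seq is a generator, which is what the itertools version is for. Not that the OP asked for that, but it's not fair to criticize the itertools version for not being as simple. - mikenerone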
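For comparison, a generator-friendly variant can be sketched with itertools.islice. This is not the answer's code; the name chunks_from_iterable is hypothetical, and the sketch assumes the input yields single-character strings.

```python
from itertools import islice

def chunks_from_iterable(iterable, n):
    """Yield strings of up to n characters from any iterable of characters."""
    it = iter(iterable)
    while True:
        # islice consumes at most n items without needing len() or slicing,
        # so this also works for generators and other one-shot iterables.
        piece = list(islice(it, n))
        if not piece:
            return
        yield ''.join(piece)

gen = (c for c in '1234567')
print(list(chunks_from_iterable(gen, 2)))   # ['12', '34', '56', '7']
```

Unlike repeated seq = seq[n:], this never copies the remainder of the input.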

42

Using more-itertools from PyPI:

>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']

- Tim Diels

36

I like this solution:

s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]

- vlk

In Python, a for loop would be faster, especially when you need to iterate many times. - Kaleba KB Keitshokile

19

You can use the grouper() recipe from itertools:

Python 2.x:

from itertools import izip_longest    

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

Python 3.x:

from itertools import zip_longest

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')

These functions are memory-efficient and work with any iterable.

- Eugene Yarmash

This throws an overflow exception when used with very large strings (len=2*2240). - FifthAxiom

@FifthAxiom Which version of Python and what kind of overflow are you talking about? - Eugene Yarmash

16

This can be achieved with a simple for loop.

a = '1234567890a'
result = []

for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)

The output looks like ['12', '34', '56', '78', '90', 'a']

- Kasem777

4

While this code may answer the question, providing additional context about why or how it answers the question improves its long-term value. - β.εηοιτ.βε

4

This is the same solution as the one here: https://dev59.com/hGox5IYBdhLWcg3wCQAq#59091507 - Georgy

1

This solution is the same as the top-voted answer; the only difference is that the top answer uses a list comprehension. - Leonardus Chen

13

I was stuck in the same scenario.

This worked for me:

x = "1234567890"
n = 2
my_list = []
for i in range(0, len(x), n):
    my_list.append(x[i:i+n])
print(my_list)

Output:

['12', '34', '56', '78', '90']

- Strick

2

list is the name of a built-in type in Python, so you should change the variable name to something else, such as my_list. - Justin Hammond

- satomacoto · Accepted Answer

785

>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']

- satomacoto
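If the comprehension is needed in more than one place, it could be wrapped in a small helper. This is a sketch, not part of the accepted answer: the name split_every and the explicit check on n are additions for illustration.

```python
def split_every(text, n):
    """Return the pieces of text taken n characters at a time."""
    if n <= 0:
        # range() already rejects a step of 0, but a negative step would
        # silently produce an empty list, so both cases are rejected here.
        raise ValueError('n must be a positive integer')
    return [text[i:i + n] for i in range(0, len(text), n)]

print(split_every('1234567890', 2))   # ['12', '34', '56', '78', '90']
print(split_every('123456789', 2))    # ['12', '34', '56', '78', '9']
```

Slicing past the end of a string is safe in Python, which is why the last piece can be shorter without any special handling.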

1

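@TrevorRudolph It only does exactly what you tell it. The above answer is really just a for loop, but expressed pythonically. Also, if you need to remember a "simplistic" answer, there are at least hundreds of thousands of ways to remember them: starring the page on stackoverflow; copying and then pasting it into an email; keeping a "helpful" file with stuff you want to remember; simply using a modern search engine whenever you need something; using bookmarks in (probably) every web browser; etc. - dylnmc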

4

Great for breaking up long lines of text when printing, e.g.: for i in range(0, len(string), n): print(string[i:i+n]) - PatrickT

Following the philosophy of simplicity; this is the elegance of Python! - MinhajulAnwar

2

For newbies like me who don't understand list comprehensions, the following may be easier to follow in place of the last line:

substrings = []
for i in range(0, len(line), n):
    substring = line[i:i+n]
    substrings.append(substring)

- ArduinoBen
