Splitting a string with repeated characters into a list

Question

23

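I'm not very familiar with regular expressions, but I've been reading up on them. Suppose I have a string s = '111234' and I want to split it into the list L = ['111', '2', '3', '4']. My approach was to write a regex with a group that checks for a digit, and then checks whether that group repeats. Something like this: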

L = re.findall('\d[\1+]', s)

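I thought \d[\1+] would basically match either a single digit or a digit followed by repeats of that same digit. I thought this might do what I want.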

- Mathews_M_J
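
One possible reading of why the pattern in the question finds nothing: inside square brackets, `\1` is not a backreference and `+` is not a quantifier. The sketch below is an editorial illustration, assuming Python 3's `re` module, and compares that pattern with one that keeps the backreference outside any character class.

```python
import re

s = '111234'

# Inside [...], \1 is read as a numeric (octal) character escape and + as a
# literal plus sign, so this looks for a digit followed by one of those two
# characters. Nothing in '111234' fits.
attempt = re.findall(r'\d[\1+]', s)
print(attempt)  # []

# The same pattern does match a digit followed by a literal plus sign.
literal_plus = re.findall(r'\d[\1+]', '1+2')
print(literal_plus)  # ['1+']

# With the backreference outside a class, (\d)\1* matches a digit followed by
# zero or more copies of that same digit. re.search returns the first run.
first_run = re.search(r'(\d)\1*', s).group(0)
print(first_run)  # 111
```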

Do you know whether the string contains only digits? - thefourtheye

@thefourtheye: No; assume it may contain non-digit characters as well. - Mathews_M_J

I have the impression you are looking for r_e = "(1*)(2*)(3*)(4*)", which gives ('111', '2', '3', '4') via re.findall(r_e, s)[0]. - Grijesh Chauhan

By the way, a list is an ordered collection: if you don't need the order, you can use r_e = "((?P<o>1+)|(?P<to>2+)|(?P<th>3+)|(?P<f>4+))*" and then re.search(r_e, s).group('o', 'to', 'th', 'f'). - Grijesh Chauhan

4 Answers

20

If you want to group all the repeated characters, you can also use itertools.groupby, like this:

from itertools import groupby
print(["".join(grp) for num, grp in groupby('111234')])
# ['111', '2', '3', '4']

If you want to make sure you keep only the digits, use this:

print(["".join(grp) for num, grp in groupby('111aaa234') if num.isdigit()])
# ['111', '2', '3', '4']
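
To make the grouping step easier to follow, here is a small sketch that wraps the same idea in a function. The helper name `split_runs` and its `digits_only` flag are made up for illustration and are not part of the answer; Python 3 is assumed.

```python
from itertools import groupby

def split_runs(text, digits_only=False):
    """Split text into runs of identical consecutive characters."""
    runs = []
    # groupby yields (key, iterator) pairs; with no key function the key is
    # the character itself, and a new group starts whenever it changes.
    for char, group in groupby(text):
        if digits_only and not char.isdigit():
            continue  # skip runs of non-digit characters
        runs.append("".join(group))
    return runs

print(split_runs('111234'))                       # ['111', '2', '3', '4']
print(split_runs('111aaa234'))                    # ['111', 'aaa', '2', '3', '4']
print(split_runs('111aaa234', digits_only=True))  # ['111', '2', '3', '4']
```

Because `groupby` only compares neighbours, a character that reappears later starts a new run instead of being merged with the earlier one.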

- thefourtheye

8

Try this:

import re

s = '111234'

# findall returns one (outer group, inner group) tuple per run
l = re.findall(r'((.)\2*)', s)
## at this stage l is [('111', '1'), ('2', '2'), ('3', '3'), ('4', '4')]

## keep only the first value (the whole run) from each tuple
lst = [x[0] for x in l]

print(lst)

Output:

['111', '2', '3', '4']

- Sabuj Hassan

Why is a tuple created? Is it because there are two groups to find? - Mathews_M_J

Yes, it is because of the two groups. - Sabuj Hassan
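
The tuple behaviour discussed in these comments follows from how `re.findall` reports capturing groups. A brief sketch of the three cases, assuming Python 3 (the variable names are only illustrative):

```python
import re

s = '111234'

# No capturing group: findall returns each whole match.
no_group = re.findall(r'\d', s)
print(no_group)    # ['1', '1', '1', '2', '3', '4']

# One capturing group: findall returns only that group's text,
# so the repeats matched by \1* are lost.
one_group = re.findall(r'(.)\1*', s)
print(one_group)   # ['1', '2', '3', '4']

# Two capturing groups: findall returns one tuple per match,
# here (whole run, single character).
two_groups = re.findall(r'((.)\2*)', s)
print(two_groups)  # [('111', '1'), ('2', '2'), ('3', '3'), ('4', '4')]
```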

0

If you don't want to use any libraries, here is the code:

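s = "AACBCAAB"
L = []
temp = s[:1]  # current run; stays empty when s is empty
for i in range(1, len(s)):
    if s[i] == s[i-1]:
        temp += s[i]      # same character: extend the current run
    else:
        L.append(temp)    # character changed: close the run
        temp = s[i]
if temp:
    L.append(temp)        # flush the final run (also covers a one-character string)
print(L)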

Output:

['AA', 'C', 'B', 'C', 'AA', 'B']

- artpods56


- devnull · Accepted Answer

Use the re.finditer() function:

>>> import re
>>> s='111234'
>>> [m.group(0) for m in re.finditer(r"(\d)\1*", s)]
['111', '2', '3', '4']
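
Since the asker mentions in the comments that the input may contain non-digit characters, a variant of the same idea can match runs of any character. This is a sketch rather than part of the accepted answer; the function name `runs` is hypothetical and Python 3 is assumed.

```python
import re

def runs(text):
    """Return runs of identical consecutive characters using a backreference."""
    # (.) captures one character and \1* extends the match over its repeats.
    # re.DOTALL lets '.' match newlines too, so no character is skipped.
    pattern = re.compile(r'(.)\1*', re.DOTALL)
    # group(0) is the whole match; group(1) would be the single character.
    return [m.group(0) for m in pattern.finditer(text)]

print(runs('111234'))     # ['111', '2', '3', '4']
print(runs('111aaa234'))  # ['111', 'aaa', '2', '3', '4']
print(runs(''))           # []
```

Filtering the result with `str.isdigit` afterwards would drop the non-digit runs if only digits are wanted.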