Find all comma-separated numbers in a string in Python

Question

I have a column of string data that has no particular format. I need to find all the numbers that are separated by commas.

For example:

string = "There are 5 people in the class and their heights 3,9,6,7,4".

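I want to extract the numbers 3, 9, 6, 7 and 4, but not the number 5. In the end, I want to join the word that precedes the first number with each number, e.g. heights3, heights9, heights6, heights7, heights4.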

import re

ExampleString = "There are 5 people in the class and their heights are 3,9,6,7,4"
temp = re.findall(r'\s\d+\b', ExampleString)

With this I also get the number 5.
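A quick way to see why the 5 shows up is to print what the pattern actually returns. `\s\d+\b` asks for a whitespace character followed by digits, so it matches " 5" and also the " 3" after "are", while 9, 6, 7 and 4 are skipped because a comma, not whitespace, precedes them. This is a small sketch run on the string from the question:

```python
import re

ExampleString = "There are 5 people in the class and their heights are 3,9,6,7,4"
# \s\d+\b = one whitespace character, then one or more digits, then a word boundary.
# " 5" and " 3" qualify; 9, 6, 7 and 4 are preceded by commas, so \s cannot match before them.
temp = re.findall(r'\s\d+\b', ExampleString)
print(temp)  # [' 5', ' 3']
```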

- Ritika

The "4" isn't followed by a number. - Tony Tuttle

The word before the numbers is "are"; should the strings be are3,are9,are6,are7,are4, or do you want the second word before the numbers? - Kirk Broadhurst

4 Answers

Extract the comma-separated number sequences from any string:

    import re

    # some random text just for testing
    string = "azrazer 5,6,4 qsfdqdf 5,,1,2,!,88,9,44,aa,2"
    # r1: every sequence of numbers separated by ','
    r1 = r'(?:\d+,)+\d+'
    # r2: the same sequences, but capture everything except the last number
    r2 = r'((?:\d+,)+)(?:\d+)'
    # r3, r4: patterns from the best answers so far, for comparison
    r3 = r'[\d,]+[,\d]+[^a-z]'
    r4 = r'[\d,]+[,\d]'

    print('findall r1:', re.findall(r1, string))
    print('findall r3:', re.findall(r3, string))
    print('findall r4:', re.findall(r4, string))
    print('-----------------------------------------')
    print('findall r2:', re.findall(r2, string))

Output:

findall r1: ['5,6,4', '1,2', '88,9,44']  --> correct
findall r3: ['5,6,4 ', '5,,1,2,!', ',88,9,44,']  --> wrong
findall r4: ['5,6,4', '5,,1,2,', ',88,9,44,', ',2']  --> wrong
-----------------------------------------
findall r2: ['5,6,', '1,', '88,9,']  --> correct, excludes the last element
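To get individual integers out of the r1 matches, each matched sequence can be split on commas. A minimal sketch, assuming r1's rule (at least two digit runs joined by single commas) is what "comma-separated" should mean; the helper name `comma_numbers` is hypothetical:

```python
import re

def comma_numbers(text):
    """Hypothetical helper: return every integer that belongs to a comma-separated run."""
    # r1 from above: two or more digit runs joined by single commas
    sequences = re.findall(r'(?:\d+,)+\d+', text)
    # split each run on ',' and convert every piece to int
    return [int(n) for seq in sequences for n in seq.split(',')]

print(comma_numbers("azrazer 5,6,4 qsfdqdf 5,,1,2,!,88,9,44,aa,2"))
# [5, 6, 4, 1, 2, 88, 9, 44]
print(comma_numbers("There are 5 people in the class and their heights are 3,9,6,7,4"))
# [3, 9, 6, 7, 4]
```

On the question's string the standalone 5 is left out because it is not joined to another number by a comma.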

- Charif DZ

This should work. \d matches a decimal digit (0-9 for ASCII text), and + means one or more times.

import re

test_string = "There are 2 apples for 4 persons 4 helasdf 4 23 "
print("The original string : " + test_string)

# \d+ matches each run of one or more digits
temp = re.findall(r'\d+', test_string)
# convert the matched strings to integers
res = list(map(int, temp))

print("The numbers list is : " + str(res))
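For comparison, this sketch runs the same `\d+` pattern on the string from the question. Because the pattern ignores commas entirely, the standalone 5 comes back along with the heights:

```python
import re

op_string = "There are 5 people in the class and their heights are 3,9,6,7,4"
# \d+ grabs every run of digits, whether or not a comma is next to it
numbers = [int(n) for n in re.findall(r'\d+', op_string)]
print(numbers)  # [5, 3, 9, 6, 7, 4]
```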

- Sahil

This fails the test case the OP gave. - Tony Tuttle

Right, I just saw the OP's string; let me give it a try. - Sahil

As the comment points out, the 4 is not followed by any number (so it can be left out):

>>> import re
>>> t = "There are 5 people in the class and their heights are 3,9,6,7,4"
>>> 'heights'+'heights'.join(re.findall(r'\d+,', t)).rstrip(',')
'heights3,heights9,heights6,heights7'

If you want to include it, you can do:

>>> 'heights'+'heights'.join(re.findall(r'\d+,|(?<=,)\d+', t))
'heights3,heights9,heights6,heights7,heights4'

- dcg

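t = "There are 5 people in the class and their heights are 3,what about this!!,9,6,7,4". This returns the same result, so you need to return the numbers that follow commas, not just the numbers themselves. - Charif DZ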

That's true, but my answer is based on the text provided. - dcg
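One way to handle the string from the comment above, assuming a number should only count when a comma joins it directly to another number, is to require a digit on the other side of the comma with a lookahead or lookbehind. This is a sketch, not taken from either answer; the helper name `joined_numbers` is hypothetical:

```python
import re

def joined_numbers(text):
    # Hypothetical helper: a digit run counts only if a comma links it to another digit run.
    # (?<=\d,)\d+  -> digits preceded by "<digit>,"
    # \d+(?=,\d)   -> digits followed by ",<digit>"
    return re.findall(r'(?<=\d,)\d+|\d+(?=,\d)', text)

t = "There are 5 people in the class and their heights are 3,9,6,7,4"
t2 = "There are 5 people in the class and their heights are 3,what about this!!,9,6,7,4"
print(','.join('heights' + n for n in joined_numbers(t)))
# heights3,heights9,heights6,heights7,heights4
print(','.join('heights' + n for n in joined_numbers(t2)))
# heights9,heights6,heights7,heights4
```

Under this rule the 3 in the second string is dropped, because the comma after it is followed by text rather than a digit.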

Page content provided by Stack Overflow.

- Massifox · Accepted Answer

Regular expressions are your friend. You can solve your problem in one line:

[int(n) for n in sum([l.split(',') for l in re.findall(r'[\d,]+[,\d]', test_string)], []) if n.isdigit()]

OK, let's explain it step by step:

The following code produces a list of comma-separated number strings:

import re

test_string = "There are 5 people in the class and their heights are 3,9,6,7,4 and this 55,66, 77"
list_of_comma = re.findall(r'[\d,]+[,\d]', test_string)
# output: ['3,9,6,7,4', '55,66,', '77']
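One thing worth checking: `[\d,]+[,\d]` only needs two characters from the set, so it skips the lone 5 in the question because it is a single digit, not because it lacks a comma. With a hypothetical two-digit count in the sentence, the standalone number is matched too:

```python
import re

# hypothetical input: "15" is not part of any comma-separated list
sample = "There are 15 people in the class and their heights are 3,9,6,7,4"
matches = re.findall(r'[\d,]+[,\d]', sample)
print(matches)  # ['15', '3,9,6,7,4']
```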

Split each item of list_of_comma to produce a list of lists of strings:

list_of_list = [l.split(',') for l in list_of_comma]
# output: [['3', '9', '6', '7', '4'], ['55', '66', ''], ['77']]

I used a trick to flatten the list of lists:

lst = sum(list_of_list, [])
# output: ['3', '9', '6', '7', '4', '55', '66', '', '77']
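`sum(list_of_list, [])` builds a new list at every addition, so its cost grows quadratically with the number of sublists. A standard-library alternative is `itertools.chain.from_iterable`; this sketch produces the same flattened list:

```python
from itertools import chain

list_of_list = [['3', '9', '6', '7', '4'], ['55', '66', ''], ['77']]
# chain.from_iterable walks the sublists one after another; list() collects the items once
lst = list(chain.from_iterable(list_of_list))
print(lst)  # ['3', '9', '6', '7', '4', '55', '66', '', '77']
```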

Convert each element to an integer, skipping anything that is not made of digits (such as the empty string):

int_list = [int(n) for n in lst if n.isdigit()]
# output: [3, 9, 6, 7, 4, 55, 66, 77]

Edit: if you want to format the list of numbers as requested:

keyword = ',heights'
formatted_res = keyword[1:] + keyword.join(map(str, int_list))
# output: 'heights3,heights9,heights6,heights7,heights4,heights55,heights66,heights77'
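The same string can also be built without slicing the leading comma off the separator: prepend the word to each number first, then join the pieces with commas. A sketch assuming the word is hard-coded as "heights", as in the answer:

```python
int_list = [3, 9, 6, 7, 4, 55, 66, 77]
prefix = 'heights'  # assumption: the word to prepend is known in advance
# prepend the prefix to each number, then join the pieces with commas
formatted_res = ','.join(f'{prefix}{n}' for n in int_list)
print(formatted_res)
# heights3,heights9,heights6,heights7,heights4,heights55,heights66,heights77
```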