Delete digits in Python (Regex)

Question

40

I'm trying to delete all digits from a string. However, the code below also deletes digits contained in any word, which is obviously not what I want. I've tried many regular expressions with no success.

Thanks!

import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
s = re.sub(r"\d+", "", s)
print(s)

Result:

This must not b deletd, but the number at the end yes

- Menda

11 Answers

21

Try this:

"\b\d+\b"

This matches only runs of digits that are not part of another word.
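A minimal sketch of how this pattern behaves, assuming Python 3's `re` module and a test string with numbers at the start, middle, and end. The digits inside `b3` and `delet3d` are kept because there is no word boundary between a letter and a digit; the spaces around the removed numbers are left behind.

```python
import re

s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"

# \b\d+\b only matches digit runs with a word boundary on both sides.
result = re.sub(r"\b\d+\b", "", s)
print(repr(result))
```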

- jrcalzada

This will not remove the first or last number for s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411" - oneporter

2

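I just tested it with your string and got the expected result. \b matches the beginning of the string, the end, or the position next to any character that is not a word character ([A-Za-z0-9_]). I tested in IronPython, so I don't know whether Python handles word boundaries differently. - jrcalzada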

I haven't tried this, but you could try something like: [^\b]\d+[$\b] - Bill Lynch

1

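sharth: that's basically the same thing. \b already matches at the beginning or end of the string. It is an "empty pattern" that matches the position between a word character and a non-word character. So re.sub(r'\b', '!', 'one two') gives '!one! !two!'. - dwc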

7

Using \s isn't very good, since it doesn't handle tabs and the like. A better solution is:

re.sub(r"\b\d+\b", "", s)

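Note that the pattern is a raw string because \b is normally the backspace escape in string literals, and we want the regex word-boundary escape instead. A slightly fancier version is:

re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)

This tries to remove leading/trailing whitespace when there are digits at the beginning/end of the string. I say "tries" because if there are multiple numbers at the end, some spaces will still remain.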
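One way to deal with the leftover spaces, sketched here as an assumption rather than as part of the answer above: remove the standalone numbers with the simple pattern, then collapse whitespace with `split()` and `join()`. This only suits text where runs of whitespace do not need to be preserved.

```python
import re

s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"

# Remove standalone digit runs; digits inside words such as "b3" are untouched.
without_numbers = re.sub(r"\b\d+\b", "", s)

# Collapse the whitespace left behind and trim both ends.
cleaned = " ".join(without_numbers.split())
print(cleaned)
```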

- dwc

6

To also handle digit strings at the beginning of a line:

s = re.sub(r"(^|\W)\d+", "", s)

- Lance Richardson

4

You could try this:

s = "This must not b3 delet3d, but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

Result:

'This must not b3 delet3d, but the number at the end yes'

The same rule also applies to:

s = "This must not b3 delet3d, 4566 but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

Result:

'This must not b3 delet3d, but the number at the end yes'

- adesst

4

To match only pure integers in a string:

\b(?<![0-9-])(\d+)(?![0-9-])\b

It does the right thing with this input, matching only what comes after "million":

max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 
8m9n o0p2     million     0 22 333  4444

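The other 8 regex answers on this page all fail on that input in various ways.

The hyphen after the first 0-9 ... [0-9-] ... preserves -007, and the hyphen in the second set preserves 8-.

Or use \d in place of 0-9 if you prefer.

Tested on regex101.

Can it be simplified?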
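A minimal sketch that checks the pattern against the sample input, assuming Python 3's `re` module. The name `PURE_INT` is only illustrative, and the two sample lines are joined into one string here.

```python
import re

# Lookbehind and lookahead reject digits that touch another digit or a hyphen;
# \b rejects digits that touch a letter or underscore.
PURE_INT = re.compile(r"\b(?<![0-9-])(\d+)(?![0-9-])\b")

sample = ("max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 "
          "8m9n o0p2     million     0 22 333  4444")

matches = PURE_INT.findall(sample)
print(matches)
```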

- gseattle

The parentheses around \d+ can be omitted, but they also serve to capture just the pure number. - gseattle

2

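If your number is always at the end of your strings, try:

re.sub(r"\d+$", "", s)

Otherwise, you may try:

re.sub(r"(\s)\d+(\s)", r"\1\2", s)

You can adjust the back-references to keep only one or both of the spaces (\s matches any whitespace separator).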
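A short sketch of why the replacement string needs the `r` prefix for the back-references to work, assuming Python 3. In an ordinary string literal, `"\1\2"` is read as two octal escapes before the regex engine ever sees it; the sample string is made up for illustration.

```python
import re

s = "keep 123 this"

# Without the r prefix, "\1\2" is two control characters, not backreferences.
assert "\1\2" == "\x01\x02"

# With raw strings, \1 and \2 reach the regex engine as backreferences,
# so both captured whitespace characters are put back.
result = re.sub(r"(\s)\d+(\s)", r"\1\2", s)
print(repr(result))
```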

- Raoul Supercopter

\W is probably better than \s for this. Also, a better variant would be "\b\d+\b", but it's not working for me! - dwc

2

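I don't know what your real situation looks like, but most of the answers look like they won't handle negative numbers or decimals.

A fuller pattern should also handle inputs such as: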

"This must not b3 delet3d, but the number at the end yes -134.411"

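But this is still incomplete. You probably need a more complete definition of what you can expect to find in the files you need to parse.

Also worth noting: '\b' changes depending on the locale/character set in use, so be careful with it.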
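As a rough sketch of what handling signs and decimals could look like, assuming Python 3: the pattern, the names `NUMBER` and `strip_numbers`, and the choice of lookarounds are all assumptions made for illustration. It does not cover exponents, thousands separators, or a leading plus sign.

```python
import re

# Optional minus sign, digits, optional decimal part. The lookarounds encode
# one guess at "standalone": not glued to a word character, dot, or hyphen
# on the left, and not followed by a word character or more decimals.
NUMBER = re.compile(r"(?<![\w.-])-?\d+(?:\.\d+)?(?!\w|\.\d)")

def strip_numbers(text):
    """Remove standalone numbers, then collapse leftover whitespace."""
    return " ".join(NUMBER.sub("", text).split())

s = "This must not b3 delet3d, but the number at the end yes -134.411"
print(strip_numbers(s))
```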

- si28719e

1

I had a light-bulb moment, tried this, and it works:

sol = re.sub(r'[~^0-9]', '', 'aas30dsa20')

Output:

aasdsa

- ryhn

1

Non-regex solution:

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> " ".join([x for x in s.split(" ") if not x.isdigit()])
'This must not b3 delet3d, but the number at the end yes'

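This splits the string on " ", checks whether each chunk is a number with str().isdigit(), then joins the remaining chunks back together. More verbosely (without a list comprehension):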

words = s.split(" ")
non_digits = []
for word in words:
    if not word.isdigit():
        non_digits.append(word)

" ".join(non_digits)
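The same idea can be wrapped in a small helper. This is a sketch with a hypothetical name, `drop_number_tokens`; it assumes that splitting on any whitespace (not just single spaces) is acceptable and that collapsing runs of whitespace in the output is fine. A token such as "134411," with attached punctuation is not all digits, so it would be kept.

```python
def drop_number_tokens(text):
    """Return text without whitespace-separated tokens made only of digits."""
    # split() with no argument splits on any run of spaces, tabs, or newlines.
    return " ".join(token for token in text.split() if not token.isdigit())

print(drop_number_tokens("12\tkeep b3  34 x9 56"))
```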

- dbr


- oneporter · Accepted Answer

Add a space before the \d+.

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r" \d+", " ", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

Edit: After looking at the comments, I decided to write a more complete answer. I think this accounts for all the cases.

s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", s)
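A quick sketch of how this pattern behaves, assuming Python 3 and a test string with numbers at the start, middle, and end. The second call illustrates one case worth knowing about: when numbers are separated by a single space, each match consumes the space its neighbour would need, so every other number can survive. The name `PATTERN` and the string `"a 1 2 3 b"` are made up for illustration.

```python
import re

PATTERN = r"^\d+\s|\s\d+\s|\s\d+$"

s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"
result = re.sub(PATTERN, " ", s)
print(repr(result))

# Adjacent numbers share a separating space, so the middle one is skipped.
adjacent = re.sub(PATTERN, " ", "a 1 2 3 b")
print(repr(adjacent))
```

Calling `.strip()` on the result removes the single leading and trailing space if they are not wanted.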