Finding content that includes non-alphanumeric characters before any alphanumeric character

Question


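I'm writing some code that takes a string such as "This$#is% Matrix# %!" and removes every run of non-alphanumeric symbols (replacing it with a single space) whenever there are alphanumeric characters both before and after it. I have this working, except for strings that start with non-alphanumeric symbols. I wanted to solve that with a variable-length lookbehind, but that is not possible. Is there a workaround? Here is the code and some examples: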

decodedString = re.sub(r"[^0-9,a-z,A-Z](?=.+[0-9,a-z,A-Z])", " ",decodedString)
print("1st regex: " + decodedString)
decodedString = re.sub(r" (?= .+[0-9,a-z,A-Z])", "", decodedString)
print("2nd regex: " + decodedString)
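
To reproduce the behavior, the two substitutions above can be wrapped in a small function and run over the sample strings. This is only a sketch: the name `two_step_clean` and the `samples` list are illustrative additions and do not appear in the original post.

```python
import re

def two_step_clean(text: str) -> str:
    """Apply the two substitutions from the question, in order (hypothetical helper)."""
    # Step 1: turn a symbol into a space when at least one more character
    # and then an "alphanumeric" follow. The commas inside the brackets are
    # literal, so ',' is treated like an alphanumeric character here.
    text = re.sub(r"[^0-9,a-z,A-Z](?=.+[0-9,a-z,A-Z])", " ", text)
    # Step 2: drop a space that is directly followed by another space,
    # again only when an "alphanumeric" appears later in the string.
    return re.sub(r" (?= .+[0-9,a-z,A-Z])", "", text)

samples = ["This%%is$Matrix%%$script", "This$#is% Matrix# %!", "# @i##U"]
for s in samples:
    print(repr(s), "->", repr(two_step_clean(s)))
```

The first two samples come out as desired. The third does not: step 1 has no way of knowing whether an alphanumeric character came earlier, so the leading symbols are replaced as well.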

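The second regex removes consecutive spaces, but only when an alphanumeric character appears somewhere after them.

"# @i##U" should become "# @i U". This is the only string that is not handled correctly, because the non-alphanumeric characters at the start are removed (it returns "i #U").

"This%%is$Matrix%%$script" should become "This is Matrix script".

"This$#is% Matrix# %!" should become "This is Matrix# %!".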

Thanks a lot for your help!

- Anes

1 Answer


- Wiktor Stribiżew · Accepted Answer

You can use

re.sub(r'(?<=[^\W_])[\W_]+(?=[^\W_])', ' ', text)

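Details:

(?<=[^\W_]) - a letter or digit must appear immediately to the left
[\W_]+ - one or more characters that are not letters or digits
(?=[^\W_]) - a letter or digit must appear immediately to the right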
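
The class `[^\W_]` can look cryptic: it reads as "a word character that is not an underscore", which leaves letters and digits. The sketch below checks a few single characters against it; the names `ALNUM`, `SYMBOL` and `is_alnum` are hypothetical and used only for this illustration.

```python
import re

# Hypothetical names, used only for this illustration.
ALNUM = re.compile(r'[^\W_]')   # word character minus the underscore
SYMBOL = re.compile(r'[\W_]')   # everything else, underscore included

def is_alnum(ch: str) -> bool:
    """Return True if the single character ch is a letter or digit."""
    return ALNUM.fullmatch(ch) is not None

for ch in ['a', 'Z', '7', '\u00e9', '_', '#', ' ']:
    print(repr(ch), is_alnum(ch), SYMBOL.fullmatch(ch) is not None)

# For str patterns, \w is Unicode-aware by default, so '\u00e9' counts as a letter.
# With re.ASCII, \w shrinks to [a-zA-Z0-9_] and the same character no longer matches.
print(re.fullmatch(r'[^\W_]', '\u00e9', flags=re.ASCII))
```

If only ASCII letters and digits should count, passing `flags=re.ASCII` to `re.sub` is one way to get that behavior.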

See the regex demo.

See the Python demo:

import re
texts = ['This%%is$Matrix%%$script', 'This$#is% Matrix# %!']
for text in texts:
    print(re.sub(r'(?<=[^\W_])[\W_]+(?=[^\W_])', ' ', text))

Output:

This is Matrix script
This is Matrix# %!
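
The string that caused trouble in the question, "# @i##U", is not in the demo list above. As a quick check, the sketch below wraps the same pattern in a function (the name `collapse_inner_symbols` is an illustrative choice). It also adds a variant without a lookbehind, which captures the left-hand alphanumeric and puts it back. That variant is shown only for comparison and is not part of the original answer.

```python
import re

def collapse_inner_symbols(text: str) -> str:
    """Replace each run of symbols that sits between two alphanumerics with one space."""
    # The lookbehind checks exactly one character, so it is fixed-width
    # and Python's re module accepts it.
    return re.sub(r'(?<=[^\W_])[\W_]+(?=[^\W_])', ' ', text)

def collapse_inner_symbols_no_lookbehind(text: str) -> str:
    """Same result, but the left alphanumeric is captured and re-inserted via \\1."""
    return re.sub(r'([^\W_])[\W_]+(?=[^\W_])', r'\1 ', text)

samples = ['# @i##U', 'This%%is$Matrix%%$script', 'This$#is% Matrix# %!']
for s in samples:
    print(collapse_inner_symbols(s))
    print(collapse_inner_symbols_no_lookbehind(s))
```

Leading symbols such as "# @" are left alone because no alphanumeric precedes them, and trailing symbols such as "# %!" are left alone because none follows.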