Get a list of what a regular expression did not match?

Question

3

import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r'\w+', DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

I would like to get a separate list of what lies between the matched words:

[", ", " - ", " ", " ", " ", " ", "!?"]

How would I go about doing that?

- jedierikb

3 Answers

3

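How about using the complementary regex, matching non-word characters (\W) instead of word characters (\w)? Also, rather than fetching the two lists separately, it is probably more efficient to get everything in one pass. (Of course, that depends on what you intend to do with the result.)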

>>> re.findall(r'(\w+)(\W+)', DATA)
[('Hey', ', '), ('you', ' - '), ('what', ' '), ('are', ' '), ('you', ' '), ('doing', ' '), ('here', '!?')]

If you really want separate lists, just zip it:

>>> list(zip(*re.findall(r'(\w+)(\W+)', DATA)))
[('Hey', 'you', 'what', 'are', 'you', 'doing', 'here'), (', ', ' - ', ' ', ' ', ' ', ' ', '!?')]
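
One caveat with the paired pattern: `(\w+)(\W+)` only reports a word that is followed by at least one non-word character, so a string ending in a word would lose its last word. Below is a minimal sketch of a variant that makes the separator optional. The sample string `TEXT` and the names `strict` and `pairs` are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical input that ends with a word instead of punctuation.
TEXT = "Hey, you - what"

# (\W+) requires a trailing separator, so the final word is dropped.
strict = re.findall(r'(\w+)(\W+)', TEXT)

# (\W*) allows an empty separator, so the final word is kept.
pairs = re.findall(r'(\w+)(\W*)', TEXT)

print(strict)  # [('Hey', ', '), ('you', ' - ')]
print(pairs)   # [('Hey', ', '), ('you', ' - '), ('what', '')]
```

Whether the empty trailing separator is wanted depends on how the pairs are used afterwards.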

- kojiro

0

re.split

import re
DATA = "Hey, you - what are you doing here!?"
print(re.split(r'\w+', DATA))
#prints ['', ', ', ' - ', ' ', ' ', ' ', ' ', '!?']

You will probably also want to filter out the empty string to match exactly what you asked for.
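
A short sketch of that filtering step, assuming a list comprehension is acceptable (the names `parts` and `separators` are illustrative):

```python
import re

DATA = "Hey, you - what are you doing here!?"

# re.split yields a leading '' because DATA starts with a word.
parts = re.split(r'\w+', DATA)

# Keep only the non-empty pieces.
separators = [p for p in parts if p]

print(separators)  # [', ', ' - ', ' ', ' ', ' ', ' ', '!?']
```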

- Steven

Content provided by Stack Overflow.

- Levon · Accepted Answer

print(re.findall(r'\W+', DATA))  # note, UPPER-case "W"

yields the list you are looking for:

[', ', ' - ', ' ', ' ', ' ', ' ', '!?']

I used \W+ rather than the \w+ you used; the upper-case form negates the character class you were using.

   \w  Matches word characters, i.e., letters, digits, and underscores.
   \W  Matches non-word characters, i.e., the negated version of \w
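
To make the relationship between the two classes concrete, here is a small sketch (variable names are illustrative). It checks that `\W` behaves like the explicit negation `[^\w]`, and that alternating the two classes splits the string into pieces that join back into the original.

```python
import re

DATA = "Hey, you - what are you doing here!?"

words = re.findall(r'\w+', DATA)   # runs of word characters
gaps = re.findall(r'\W+', DATA)    # runs of everything else

# [^\w] is the explicit spelling of \W, so both give the same result.
assert re.findall(r'[^\w]+', DATA) == gaps

# Matching either class tokenizes the whole string with nothing left over.
tokens = re.findall(r'\w+|\W+', DATA)
assert ''.join(tokens) == DATA

print(gaps)  # [', ', ' - ', ' ', ' ', ' ', ' ', '!?']
```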

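This regex reference may help you choose the best character classes and metacharacters for your regex searches and matches. Also, see this tutorial for more information (especially the reference section toward the bottom of the page).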