Regex to match words that start or end with a hyphen

Question


3

I am trying to create a regex that removes any word that starts or ends with a hyphen (but not both). word1- -> remove, -word2 -> remove, sub-word -> keep

My attempt is shown below:

import re

def begin_end_hyphen_removal(line):
    return re.sub(r"((\s+|^)(-[A-Za-z]+)(\s+|$))|((\s+|^)([A-Za-z]+-)(\s+|$))","",line)

However, when I apply it to the following lines:

here are some word sub-words -word1 word2- sub-word2 word3- -word4
-word5 example
word6-
word7-
another one -word8
-word9

I get output that is identical to the input.

- M.A.G

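The only unclear part is how to handle "-some-". As I understand it, it should not be matched ("*removes any word that starts or ends with a hyphen **(but not both)***"). - Wiktor Stribiżew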

1

What about -sub1-sub2? - dawg

@WiktorStribiżew Yes, if it is "-some-", I want to keep it. Thanks! - M.A.G

3 Answers

1

import re

pattern = r"(?=\S*['-])([a-zA-Z'-]+)"
test_string = '''here are some word sub-words -word1 word2- sub-word2 word3- -word4
-word5 example
word6-
word7-
another one -word8
-word9'''
result = re.findall(pattern, test_string)
print(result)

- R.Vijayakumar

1

Adding some explanation would make this answer more valuable. - Andronicus

1

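You can repeat matching word characters that are preceded or followed by a -.

This also covers hyphen-separated words that end with a hyphen, such as sugar-free-, if you want to remove those as well: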

(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)

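The pattern matches:

(?<!\S) A whitespace boundary to the left
(?: Non-capturing group
- -\w+(?:-\w+)* Match - and word characters, optionally followed by repeated - and word characters
- | Or
- \w+(?:-\w+)*- Match word characters, optionally followed by repeated - and word characters, then a trailing -
) Close the non-capturing group
(?!\S) A whitespace boundary to the right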
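
The breakdown above can be sketched as a commented pattern using re.VERBOSE. This is only an illustration: the names PATTERN and sample are made up for the example, and the sample string is not from the question.

```python
import re

# The same pattern, spread over several lines so each part can carry a comment.
# PATTERN and sample are illustrative names, not part of the original answer.
PATTERN = re.compile(r"""
    (?<!\S)              # whitespace boundary to the left
    (?:
        -\w+(?:-\w+)*    # leading hyphen: -word1, -sub1-sub2
      |
        \w+(?:-\w+)*-    # trailing hyphen: word2-, sugar-free-
    )
    (?!\S)               # whitespace boundary to the right
""", re.VERBOSE)

sample = "sub-word -word1 word2- sugar-free- -some- -sub1-sub2"
print(PATTERN.findall(sample))
# ['-word1', 'word2-', 'sugar-free-', '-sub1-sub2']
```

Note that sub-word and -some- are left alone, while -sub1-sub2 is treated as a word that starts with a hyphen.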

See a regex demo and a Python demo.

Note that the pattern you tried uses \s, which can also match a newline.

If you do not want to remove newlines, you can use [^\S\n]* instead of \s*.
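
A small sketch of the difference between the two prefixes, assuming the same word pattern as above. The helper names remove_all_ws and remove_keep_newlines and the sample text are hypothetical and exist only for this comparison.

```python
import re

# Word pattern from this answer; the two helpers differ only in the prefix.
WORD = r"(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)"

def remove_all_ws(text):
    # \s* also swallows a newline that precedes the removed word
    return re.sub(r"\s*" + WORD, "", text)

def remove_keep_newlines(text):
    # [^\S\n]* matches whitespace except the newline, so line breaks survive
    return re.sub(r"[^\S\n]*" + WORD, "", text)

text = "keep -drop\nword- stay"
print(repr(remove_all_ws(text)))         # 'keep stay'
print(repr(remove_keep_newlines(text)))  # 'keep\n stay'
```

With the second variant the line structure is preserved, but a line can be left with leading whitespace, which may need a separate cleanup step.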

Example

import re

def begin_end_hyphen_removal(line):
    return re.sub(r"\s*(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)", "", line)


s = ("here are some word sub-words -word1 word2- sub-word2 word3- -word4\n"
     "-word5 example\n"
     "word6-\n"
     "word7-\n"
     "another one -word8\n"
     "-word9")
print(begin_end_hyphen_removal(s))

Output

here are some word sub-words sub-word2 example
another one

- The fourth bird


- Wiktor Stribiżew · Accepted Answer

You can use

r'\b(?<!-)[A-Za-z0-9]+-\B|\B-[A-Za-z0-9]+\b(?!-)'
r'\b(?<!-)\w+-\B|\B-\w+\b(?!-)'
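
The two variants differ only in the character class. As a rough illustration, assuming Python 3 str patterns (where \w also matches the underscore and non-ASCII letters), the sketch below compares them on a few made-up tokens. ASCII_RX and WORD_RX are illustrative names.

```python
import re

# The two variants listed above, compiled side by side.
ASCII_RX = re.compile(r"\b(?<!-)[A-Za-z0-9]+-\B|\B-[A-Za-z0-9]+\b(?!-)")
WORD_RX = re.compile(r"\b(?<!-)\w+-\B|\B-\w+\b(?!-)")

# Hypothetical tokens: only the first is plain ASCII letters and digits.
for token in ["word1-", "snake_case-", "-café"]:
    print(token, bool(ASCII_RX.search(token)), bool(WORD_RX.search(token)))
```

Both variants match word1-, but only the \w variant matches tokens that contain an underscore or an accented letter.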

See the regex demo. Details:

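\b(?<!-)\w+-\B - one or more word characters that are not preceded by -, followed by a - that is either at the end of the string or before a non-word character
| - or
\B-\w+\b(?!-) - a - that is either at the start of the string or after a non-word character, followed by one or more word characters that are not followed by -.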
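
To check the details above against the cases from the question and the comments, here is a minimal sketch that tests each token on its own. The names PATTERN, tokens and results are illustrative.

```python
import re

# The \w variant of the pattern described above.
PATTERN = re.compile(r"\b(?<!-)\w+-\B|\B-\w+\b(?!-)")

# Tokens taken from the question and the comment thread.
tokens = ["word1-", "-word2", "sub-word", "-some-"]
results = {token: bool(PATTERN.search(token)) for token in tokens}
print(results)
# {'word1-': True, '-word2': True, 'sub-word': False, '-some-': False}
```

The lookarounds (?<!-) and (?!-) are what keep -some- from matching, and the \B checks are what keep sub-word intact.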

See the Python demo:

import re
rx = re.compile( r' *(?:\b(?<!-)\w+-\B|\B-\w+\b(?!-))' )
text = 'here are -some- word sub-words -word1 word2- sub-word2 word3- -word4\n-word5 example\nword6-\nword7-\nanother one -word8\n-word9'
print( rx.sub('', text) )

Output:

here are -some- word sub-words sub-word2
 example


another one