How can I use a regular expression in R to remove all words before a given word?

Question

3

I want to remove the words before "not". When I try the snippet below, I do not get the result I expect.

test <- c("this will not work.", "'' is not one of ['A', 'B', 'C'].", "This one does not use period ending!")
gsub(".*(not .*)\\.", "\\1", test)

But if I replace \\. with [[:punct:]], it works. Can anyone tell me why the first version does not work? I may need to keep punctuation other than the period.

> not work
> not one of ['A', 'B', 'C']
> not use period ending!

Thank you!

- WenliL

3 Answers

1

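Here is how your original pattern reads: any characters, then a captured group that starts with "not " and runs on, then a literal period.

If a string does not fit this pattern, period included, there is no match and gsub() returns the string unchanged. That is why adding [[:punct:]] makes sense: you are saying "match everything in this pattern, then match any punctuation character, not just a period."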
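A quick check makes this concrete. The sketch below is an illustration added here, not part of the original answer. It uses base R's grepl() with the pattern from the question to show which strings match at all; the variable names `matched` and `result` are only for this example.

```r
test <- c("this will not work.",
          "'' is not one of ['A', 'B', 'C'].",
          "This one does not use period ending!")

# Does the question's pattern, which requires a literal period, match?
matched <- grepl(".*(not .*)\\.", test)
matched
# TRUE TRUE FALSE

# gsub() returns non-matching strings unchanged, so the third one is untouched.
result <- gsub(".*(not .*)\\.", "\\1", test)
result[3]
# "This one does not use period ending!"
```

The third string contains no period, so the pattern cannot match it and no replacement happens.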

If you do not want to use [[:punct:]], you can use the following:

(?:.*(not\\s+.*)\\.?).+?$

Here, (?: ... ) is a non-capturing group.

This regex produces the following output:

[1] "not work"                   "not one of ['A', 'B', 'C']"
[3] "not use period ending"
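The pattern above is shown without the call around it. As a sketch, one way to apply it is with gsub(), assuming perl = TRUE and "\\1" as the replacement. Both are my assumptions, since the answer does not state how the pattern was run.

```r
test <- c("this will not work.",
          "'' is not one of ['A', 'B', 'C'].",
          "This one does not use period ending!")

# Assumption: PCRE engine (perl = TRUE) and the first capture group as the
# replacement. The trailing .+? must consume at least one character, so the
# last character of each string is always dropped.
out <- gsub("(?:.*(not\\s+.*)\\.?).+?$", "\\1", test, perl = TRUE)
out
# "not work"  "not one of ['A', 'B', 'C']"  "not use period ending"
```

Because of the trailing .+?, the final character is removed whether it is a period or not, which is why the "!" disappears from the third string.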

The example above does remove the "!". If you want to keep it, just use [[:punct:]], or you can match any one of these punctuation characters explicitly:

[!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]

But that is really tedious. This site should help you understand it better. Hope this helps!

- loverde

1

You can use a lookahead regex to remove everything before "not" and also remove the trailing period.

gsub('.*(?=not)|\\.$', '', test, perl = TRUE)
#[1] "not work"     "not one of ['A', 'B', 'C']" "not use period ending!"
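To see what each alternative in that pattern contributes, the two halves can be run one at a time. This is only an illustrative breakdown added here; the names `step1` and `step2` are not from the answer.

```r
test <- c("this will not work.",
          "'' is not one of ['A', 'B', 'C'].",
          "This one does not use period ending!")

# Step 1: drop everything before the last "not". The lookahead (?=not)
# checks that "not" follows without consuming it, so "not" is kept.
step1 <- sub(".*(?=not)", "", test, perl = TRUE)
# step1[1] is "not work." and step1[3] is "not use period ending!"

# Step 2: drop a single period at the very end of the string, if present.
step2 <- sub("\\.$", "", step1)
# step2[1] is "not work"; step2[3] still ends in "!"
```

Lookahead needs the PCRE engine, which is why perl = TRUE is required in the first step.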

- Ronak Shah

Content provided by Stack Overflow.

- Onyambu · Accepted Answer

sub('.*(not.*?)\\.?$', '\\1', test)

[1] "not work"                   "not one of ['A', 'B', 'C']"
[3] "not use period ending!"
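If the same cleanup is needed for words other than "not", the pattern could be wrapped in a small function. The helper below is hypothetical (the name `drop_before_word` and its arguments are mine, not from the answer). It assumes `word` contains no regex metacharacters and uses perl = TRUE, my choice so that the lazy quantifier follows PCRE semantics; the answer itself uses the default engine.

```r
# Hypothetical helper, not part of the original answer: keep the text from
# the last occurrence of `word` onward and drop one trailing period, if any.
# Assumption: `word` contains no regex metacharacters.
drop_before_word <- function(x, word) {
  pattern <- paste0(".*(", word, ".*?)\\.?$")
  sub(pattern, "\\1", x, perl = TRUE)
}

test <- c("this will not work.",
          "'' is not one of ['A', 'B', 'C'].",
          "This one does not use period ending!")

cleaned <- drop_before_word(test, "not")
# cleaned[1] is "not work"; cleaned[3] is "not use period ending!"
```

Strings that do not contain `word` are returned unchanged, because sub() leaves non-matching input as it is.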