Negation in R: how can I replace the words that follow a negation?

Question

r regex nlp pcre

4

I'm following up on a question, asked here, about how to add the prefix "not_" to the word that follows a negation.

In the comments, MrFlick proposed a solution using the regular expression gsub("(?<=(?:\\bnot|n't) )(\\w+)\\b", "not_\\1", x, perl=T).
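For context, here is a minimal sketch of what that expression does on a sample sentence. The sentence and the variable name first_only are chosen for illustration only.

```r
# Sample sentence, chosen here for illustration
x <- "They didn't sell the company, and it went bankrupt"

# The lookbehind (?<=(?:\bnot|n't) ) requires "not" or "n't" plus one space
# immediately before the word, so only the first word after the negation matches
first_only <- gsub("(?<=(?:\\bnot|n't) )(\\w+)\\b", "not_\\1", x, perl = TRUE)
first_only
# [1] "They didn't not_sell the company, and it went bankrupt"
```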

I would like to edit this regex so that the not_ prefix is added to every word that follows "not" or "n't", up to the next punctuation mark.

Taking cptn's example, given:

x <- "They didn't sell the company, and it went bankrupt"

I would like to get:

"They didn't not_sell not_the not_company, and it went bankrupt"

Would a backreference still work here? If so, any example would be much appreciated. Thanks!

- Ben

Why use the perl tag? - Flying_whale

@Flying_whale, they meant [tag:pcre], which R can be told to use (the perl=T or perl=TRUE above). Fixed. - ikegami

3 Answers

0

First, split the string on the punctuation marks you care about. For example:

x <- "They didn't sell the company, and it went bankrupt. Then something else"
x_split <- strsplit(x, split = "[,.]")
x_split
# [[1]]
# [1] "They didn't sell the company" " and it went bankrupt"        " Then something else"

Next, apply the regex to each element of the list x_split. Finally, merge the pieces back together (if needed).
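One possible way to put those three steps together is sketched below. It makes several assumptions that are not part of this answer: the clauses are extracted with regmatches() instead of strsplit() so that the punctuation is kept for merging, negate_clause() is a hypothetical helper, and every word after the first negation in a clause receives the prefix.

```r
x <- "They didn't sell the company, and it went bankrupt. Then something else"

# Step 1: extract clauses, each ending with its own comma or period (if any)
clauses <- regmatches(x, gregexpr("[^,.]+[,.]?", x))[[1]]

# Step 2: hypothetical helper that prefixes every word after the first negation
negate_clause <- function(clause) {
  m <- regexpr("\\bnot\\b|n't", clause, perl = TRUE)
  if (m == -1) return(clause)                 # no negation: leave unchanged
  cut <- m + attr(m, "match.length") - 1      # index of the negation's last character
  before <- substr(clause, 1, cut)
  after <- substr(clause, cut + 1, nchar(clause))
  paste0(before, gsub("(\\w+)", "not_\\1", after, perl = TRUE))
}

# Step 3: process each clause and merge the pieces back together
result <- paste(vapply(clauses, negate_clause, character(1), USE.NAMES = FALSE),
                collapse = "")
result
# [1] "They didn't not_sell not_the not_company, and it went bankrupt. Then something else"
```

Words containing an apostrophe after the negation (for example a second "n't") are not handled specially by this sketch.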

- Batman

0

This is not ideal, but it gets the job done:

x <- "They didn't sell the company, and it did not go bankrupt. That's it" 

gsub("((^|[[:punct:]]).*?(not|n't)|[[:punct:]].*?((?<=\\s)[[:punct:]]|$))(*SKIP)(*FAIL)|\\s", 
     " not_", x, 
     perl = TRUE)

# [1] "They didn't not_sell not_the not_company, and it did not not_go not_bankrupt. That's it"

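Notes:

This uses the (*SKIP)(*FAIL) trick to keep the regex from matching any pattern you don't want. Basically, it replaces every whitespace character with " not_", except those that fall between:

the start of the string or a punctuation mark and "not" or "n't", or
a punctuation mark and either the next punctuation mark that is preceded by whitespace or the end of the string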
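The (*SKIP)(*FAIL) idiom may be easier to see in isolation. The following is a minimal sketch on an unrelated, made-up input (the names s and out are illustrative): whitespace is replaced everywhere except inside a double-quoted run.

```r
# Illustrative input: spaces inside the double-quoted part should be left alone
s <- 'keep "a b c" but change these'

# Left alternative: match a whole quoted run, then (*SKIP)(*FAIL) discards it and
# makes the engine resume searching after it. Right alternative: any whitespace
# that was not skipped gets replaced.
out <- gsub("\"[^\"]*\"(*SKIP)(*FAIL)|\\s", "_", s, perl = TRUE)
out
# [1] "keep_\"a b c\"_but_change_these"
```

The pattern in this answer follows the same shape: everything to be protected goes to the left of (*SKIP)(*FAIL), and the thing to replace goes to the right of the final |.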

- acylam

- Wiktor Stribiżew · Accepted Answer

You can use

(?:\bnot|n't|\G(?!\A))\s+\K(\w+)\b

and replace with not_\1. See the regex demo.

Details

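(?:\bnot|n't|\G(?!\A)) - any one of three alternatives:
- \bnot - the whole word not
- n't - the literal text n't
- \G(?!\A) - the end of the previous successful match (the (?!\A) lookahead excludes the start of the string, where \G also matches)
\s+ - 1 or more whitespace characters
\K - the match reset operator, which discards the text matched so far from the match value
(\w+) - Group 1 (referenced with \1 in the replacement pattern): 1 or more word characters (letters, digits or _)
\b - a word boundary.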
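To see what \K and \G contribute, it can help to list the matches themselves rather than the substituted string. This is a small sketch using gregexpr() and regmatches(); the variable names pattern and matched are illustrative.

```r
x <- "They didn't sell the company, and it went bankrupt"
pattern <- "(?:\\bnot|n't|\\G(?!\\A))\\s+\\K(\\w+)\\b"

# Each match is only the word after \K; the negation and whitespace before it
# are consumed but not reported. \G lets "the" and "company" chain on from "sell";
# the comma breaks the chain because \s+ cannot match it.
matched <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
matched
# [1] "sell"    "the"     "company"
```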

R demo:

x <- "They didn't sell the company, and it went bankrupt"
gsub("(?:\\bnot|n't|\\G(?!\\A))\\s+\\K(\\w+)\\b", "not_\\1", x, perl=TRUE)
## => [1] "They didn't not_sell not_the not_company, and it went bankrupt"