Numbered backreferences to groups that contain subgroups


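I want to replace the word "fan(s)" with "fanatic(s)" when it appears shortly after one of these pronoun-verb combinations: he is/he's, she is/she's, you/they/we are (or 're), I am/I'm.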

gsub(
    "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)", 
    '\\1\\2atic\\3', 
    'He\'s the bigest fan I know.', 
    perl = TRUE, ignore.case = TRUE
)

## [1] "He's the bigest He'saticHe's I know."

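I know the numbered backreferences are picking up the inner parentheses of the first group. Is there a way to make them refer only to the three outer groups, i.e. (stuff before fan)(fan)(s\\b) (pseudocode)?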
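The numbering can be checked directly. The minimal sketch below (base R only, using `regexec()` and `regmatches()`) lists every capture group of the question's pattern. Groups are numbered by the position of their opening parenthesis, nested ones included, so the three groups of interest end up as 1, 10 and 11. Since the `replacement` argument of `gsub()` only recognizes `\\1` through `\\9`, groups 10 and 11 cannot be referenced at all.

```r
# The question's original pattern: 11 capturing groups in total
pattern <- "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)"
x <- "He's the bigest fan I know."

# regexec() returns the full match first, then every capture group in order
# of its opening parenthesis, so capture group k sits at index k + 1
groups <- regmatches(x, regexec(pattern, x, perl = TRUE, ignore.case = TRUE))[[1]]

length(groups) - 1  # number of capture groups: 11
groups[2]           # group 1:  "He's the bigest "
groups[3]           # group 2:  "He's" (what \\2 inserted)
groups[4]           # group 3:  "He's" (what \\3 inserted)
groups[11]          # group 10: "fan"
groups[12]          # group 11: ""     (no trailing s here)
```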

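I know the regex itself matches correctly, because replacing the whole match with an empty string works as expected. Only the backreference part is wrong: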

gsub(
    "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)", 
    '', 
    'He\'s the bigest fan I know.', 
    perl = TRUE, ignore.case = TRUE
)

## [1] " I know."

Desired output:

## [1] "He's the bigest fanatic I know."

Test cases:

inputs <- c(
    "He's the bigest fan I know.",
    "I am a huge fan of his.",
    "I know she has lots of fans in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)


outputs <- c(
    "He's the bigest fanatic I know.",
    "I am a huge fanatic of his.",
    "I know she has lots of fanatics in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)
1 Answer


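I see you are having trouble with too many capturing groups. Turn the groups you are not interested in into non-capturing ones, or remove the ones that are clearly redundant: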

((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\b(Fan)(s?)\b

See the regex demo.
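Running the same kind of check on the non-capturing version (again a sketch with `regexec()` and `regmatches()`) reports exactly three groups, which is why `\\1\\2atic\\3` now lines up with (text before fan)(fan)(s):

```r
# The answer's pattern: everything except the three wanted groups is (?:...)
pattern <- "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b"
x <- "He's the bigest fan I know."

groups <- regmatches(x, regexec(pattern, x, perl = TRUE, ignore.case = TRUE))[[1]]

length(groups) - 1  # 3 capture groups
groups[2]           # \\1: "He's the bigest "
groups[3]           # \\2: "fan"
groups[4]           # \\3: ""
```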

Note that since you are using the ignore.case = TRUE argument, [Ff] can be simplified to just f (or F).
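A side effect of capturing `(fan)` instead of hard-coding the word in the replacement is that the case of the matched text survives. A small sketch; the two input strings are made up for illustration and are not from the question:

```r
pattern <- "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b"

# Hypothetical inputs: a capitalised "Fan" and a plural "fans"
x <- c("You're a big Fan!", "We're all fans of jazz.")

# \\2 re-inserts "Fan" or "fan" exactly as matched; \\3 keeps the optional "s"
result <- gsub(pattern, "\\1\\2atic\\3", x, perl = TRUE, ignore.case = TRUE)
result
```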

R demo:

gsub(
    "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b", 
    '\\1\\2atic\\3', 
    inputs, 
    perl = TRUE, ignore.case = TRUE
)

Output:

[1] "He's the bigest fanatic I know."                     
[2] "I am a huge fanatic of his."                         
[3] "I know she has lots of fans in his club"             
[4] "I was cold and turned on the fan"                    
[5] "An air conditioner is better than 2 fans at cooling."
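To check the result against the question's test cases programmatically, it can be compared with the `outputs` vector. In this minimal sketch only case 3 is flagged: "she has" is not among the pronoun-verb combinations in the pattern (the `s?he(?: i|')s` branch only accepts "she is"/"she's"), so that string is left unchanged, exactly as in the output above.

```r
pattern <- "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b"

inputs <- c(
    "He's the bigest fan I know.",
    "I am a huge fan of his.",
    "I know she has lots of fans in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)
outputs <- c(
    "He's the bigest fanatic I know.",
    "I am a huge fanatic of his.",
    "I know she has lots of fanatics in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)

result <- gsub(pattern, "\\1\\2atic\\3", inputs, perl = TRUE, ignore.case = TRUE)

# Indices where the pattern disagrees with the desired outputs
which(result != outputs)  # 3
```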

Thanks... I learned something new and very simple about regex... much appreciated. - Tyler Rinker
