Find and replace text between two strings in R

Question


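I have written some R tutorials as R scripts. I need a Handout Set (HS) and a Coding Set (CS) without answers, for students to write their code in. I need a regex that finds the answer sections in the HO so that I can remove them from the CS.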

In the HS, each answer is wrapped in a start flag (#'YOUR_ANSWER) and an end flag (#'END_ANSWER). To create the HO set I need to replace

#'YOUR_ANSWER
As_samp2 = 36
As_samp3 = 38
#'END_ANSWER

with

"space for answer".

So if my text is stored in the variable a:

a = "#'YOUR_ANSWER
       As_samp2 = 36
       As_samp3 = 38

       #'END_ANSWER"

I tried a regex, but nothing gets replaced.

b <-gsub(pattern = "YOUR_ANSWER(.*\n*)*#'END_ANSWER", a, replace="space for answer" )

If I skip the regex and search only for "YOUR_ANSWER", the replacement works.

c <-gsub(pattern = "YOUR_ANSWER", a, replace="space for answer" )

If I use only the regex part, all of the text is replaced.

d <- gsub(pattern = "(.*\n*)*", a, replace="space for answer" )

But the combination does not work. The regex itself should work, see:

https://regex101.com/r/USvzLF/1

So there must be some R subtlety I have not grasped.

    b <- gsub(pattern = "YOUR_ANSWER(.*\n*)*END_ANSWER", a, replace="space for answer" )
    c <- gsub(pattern = "YOUR_ANSWER", a, replace="space for answer" )
    d <- gsub(pattern = "(.*\n*)*", a, replace="space for answer" )

I expected everything between YOUR_ANSWER and END_ANSWER to be replaced with "space for answer", but nothing happens. Any ideas?

UPDATE: @r2evans has shown me a working regex. The R script I am trying to change is https://pastebin.com/mnjpkUFk (i.e. myfile). The code I am using (in a separate R script) is:

FileM <- readLines(myfile)
FileMedit <- gsub(pattern = "YOUR_ANSWER", FileM, replace="space for answer" )
FileMedit <- gsub(pattern = "YOUR_ANSWER.*END_ANSWER", FileM, replace="space for answer" )
writeLines(FileMedit,file = "outputfileM.R")

- WickHerd

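I am having a hard time understanding what your current text looks like and what you want it to become. Please update your question with a simplified "before" and "after" example. - MonkeyZeus

Is this what you want? https://regex101.com/r/FfunIi/1 - MonkeyZeus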


Not sure, but doesn't gsub("#'YOUR_ANSWER.*END_ANSWER", "(space for answer)", a) work well enough? That is effectively your b ... and it works for me. - r2evans

@MonkeyZeus Thanks for your help, but there is a problem with escape characters in R; the error message is: Error: '\s' is an unrecognized escape in character string starting ""YOUR_ANSWER\s". So something that works in Regex 101 may not work in R. - WickHerd

Do you mean like this? https://ideone.com/NVBo4e - The fourth bird
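The escape error quoted in the comments comes from R's string parser rather than from the regex engine: inside an ordinary R string literal, a backslash has to be doubled before the regex engine ever sees it. A minimal sketch, assuming R 4.0 or later for the raw-string line (the variable names here are illustrative only):

```r
# "\\s" in R source is the two characters \s once the string is parsed
pattern <- "YOUR_ANSWER\\s"
n_chars <- nchar(pattern)   # counts the backslash once

# The regex engine therefore receives YOUR_ANSWER\s and matches the newline
hit <- grepl(pattern, "#'YOUR_ANSWER\nAs_samp2 = 36", perl = TRUE)

# Raw strings (assumption: R >= 4.0.0) avoid the doubling entirely
pattern_raw <- r"(YOUR_ANSWER\s)"
same <- identical(pattern, pattern_raw)
```

This is why a pattern copied verbatim from regex101 usually needs every backslash doubled when pasted into an R string.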


2 Answers


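To get a more specific match, you can match the first line, then match every following line that does not consist solely of optional leading horizontal whitespace followed by #'END_ANSWER.

Then match the last line and replace the whole match with space for answer.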

#'YOUR_ANSWER.*(?:\R(?!\h*#'END_ANSWER$).*)*\R\h*#'END_ANSWER$

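Regex demo | R demo

For example

b <- gsub(pattern = "^#'YOUR_ANSWER.*(?:\\R(?!\\h*#'END_ANSWER$).*)*\\R\\h*#'END_ANSWER$", replacement = "space for answer", x = a, perl = TRUE)

If you want to replace only what is between YOUR_ANSWER and END_ANSWER, you can use two capture groups and refer to them in the replacement.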

^(#'YOUR_ANSWER.*)(?:\R(?!\h*#'END_ANSWER$).*)*(\R\h*#'END_ANSWER)$

Regex demo | R demo
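As a rough sketch of the two-capture-group variant in R, the following reuses the sample string a from the question. The replacement string, which puts the placeholder on its own line between the two kept markers, is an assumption about the desired layout and not something the answer specifies.

```r
# Sample text from the question, written with explicit \n line breaks
a <- "#'YOUR_ANSWER\n       As_samp2 = 36\n       As_samp3 = 38\n\n       #'END_ANSWER"

# Group 1 keeps the start marker line, group 2 keeps the end marker line
pattern <- "^(#'YOUR_ANSWER.*)(?:\\R(?!\\h*#'END_ANSWER$).*)*(\\R\\h*#'END_ANSWER)$"

# \\1 and \\2 put the markers back; only the lines in between are replaced
b <- gsub(pattern, "\\1\nspace for answer\\2", a, perl = TRUE)
cat(b)
```

Note that without a multiline flag, ^ and $ anchor to the whole string, so as written this handles a string holding a single answer block such as a.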

- The fourth bird


- Wiktor Stribiżew · Accepted Answer

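The problem is that readLines gives you a character vector with one element per line, and you then apply a regex that expects its input to be a single multiline string.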

> FileM
 [1] "#'Rstudio environment"                                                             "#'==="                                                                            
 [3] " "                                                                                 "#'Top Left - scripts"                                                             
 [5] "#+"                                                                                "myfirstvariable = \"Hello R\"  #press control enter with cursor on line  "        
 [7] "myfirstvariable"                                                                   "As_samp1 = 34"                                                                    
 [9] " "                                                                                 "#'practical: create variables for arsenic concentration in 2 more samples"        
[11] "#+"                                                                                "#'YOUR_ANSWER"                                                                    
[13] "As_samp2 = 36"                                                                     "As_samp3 = 38"                                                                    
[15] " "                                                                                 "#'END_ANSWER"                                                                     
[17] "#+"                                                                                "#'Bottom Left - console"                                                          
[19] "#+"                                                                                "2+2"                                                                              
[21] " "                                                                                 "#'practical: calculate average As concentration, store result in variable As_mean"
[23] "#+"                                                                                "#'YOUR_ANSWER"                                                                    
[25] "As_mean<- (As_samp1 + As_samp2 + As_samp3)/3"                                      "#'END_ANSWER"                                                                     
[27] "#+"                                                                                "#'A word on comments"                                                             
[29] "#This is a comment"                                                                "#ignore #' and #+ <br/><br/>"
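A minimal sketch of that behaviour, using a three-line stand-in for the file (the vector name lines is illustrative): gsub is applied to each element separately, so a pattern that needs both markers never finds them in the same element.

```r
# One element per line, as readLines would return
lines <- c("#'YOUR_ANSWER", "As_samp2 = 36", "#'END_ANSWER")

# Each element is searched on its own; none contains both markers
per_line <- gsub("YOUR_ANSWER.*END_ANSWER", "space for answer", lines)

unchanged <- identical(per_line, lines)   # nothing was replaced
```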

So, before running the regex, you should join the lines into one string:

FileM <- paste(FileM, collapse="\n")

Then, use

FileMedit <- gsub("YOUR_ANSWER.*?END_ANSWER", "space for answer", FileM)

Now, cat(FileMedit) shows the following:

#'Rstudio environment
#'===
 
#'Top Left - scripts
#+
myfirstvariable = "Hello R"  #press control enter with cursor on line  
myfirstvariable
As_samp1 = 34
 
#'practical: create variables for arsenic concentration in 2 more samples
#+
#'space for answer
#+
#'Bottom Left - console
#+
2+2
 
#'practical: calculate average As concentration, store result in variable As_mean
#+
#'space for answer
#+
#'A word on comments
#This is a comment
#ignore #' and #+ <br/><br/>

Now, save it:

cat(FileMedit, file = "outputfileM.R")
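Putting the steps of this answer together, here is a self-contained sketch that can be run without the original pastebin file. The temporary file names and the shortened sample script are assumptions made for illustration, and the pattern uses perl = TRUE with an explicit (?s) flag as a variation on the call shown above, so that the dot matching newlines does not depend on the default regex engine.

```r
# Hypothetical stand-in for myfile: a short script with two answer blocks
myfile <- tempfile(fileext = ".R")
writeLines(c("As_samp1 = 34",
             "#'YOUR_ANSWER", "As_samp2 = 36", "#'END_ANSWER",
             "#+",
             "#'YOUR_ANSWER", "As_mean <- 1", "#'END_ANSWER",
             "2+2"), myfile)

FileM <- readLines(myfile)               # character vector, one element per line
FileM <- paste(FileM, collapse = "\n")   # one multiline string

# (?s) lets . match newlines; .*? is lazy, so each block is replaced on its own
FileMedit <- gsub("(?s)YOUR_ANSWER.*?END_ANSWER", "space for answer",
                  FileM, perl = TRUE)

outfile <- tempfile(fileext = ".R")      # hypothetical output path
writeLines(FileMedit, outfile)
result <- readLines(outfile)
```

The lazy quantifier matters when a file has several answer blocks: a greedy .* would swallow everything from the first YOUR_ANSWER to the last END_ANSWER.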