How to delete or replace multiple lines of text between two patterns


I want to add some customer flags to a few scripts so that they can be parsed before the scripts are packaged by a shell script.

For example, delete all the text, possibly spanning several lines, between

^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\n

and

^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\n

I want it to tolerate any number of underscores (which is why I used a regular expression).

For example:

before.foo

i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again

This should become after.foo:

i want this
and this
and this again

I would prefer sed, but any clever solution is welcome :)

Something like this:

cat before.foo |  tr '\n' '\a' | sed -r 's/([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\a.*\a([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\a/\a/g' | tr '\a' '\n' > after.foo

Which tool/programming language? - Jan
Shell script, thanks. - Guillaume D
Not shell, but ^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\s.+)*?\R(?:#|//)_+NOT_FOR_CUSTOMER_END_+\s*. https://regex101.com/r/Qj2T59/1 - The fourth bird
It does work, but how do I call it? - Guillaume D
4 Answers


sed is the simplest tool for this kind of task, because it can delete every line from a start pattern through an end pattern:

sed -E '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/d' file

i want this
and this
and this again

If you are looking for an awk solution, here is an equally simple one using the same range syntax:

awk '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/{next} 1' file
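
To make the range address concrete, here is a minimal Python sketch of what `/start/,/end/d` does. It is an illustration, not a replacement for sed, and the function name `delete_range` is hypothetical. It also mirrors one sed behaviour worth knowing: if the END marker never appears, the range runs to the end of the input.

```python
import re

# Unanchored, like the sed and awk patterns above.
START = re.compile(r"_+NOT_FOR_CUSTOMER_BEGIN_+")
END = re.compile(r"_+NOT_FOR_CUSTOMER_END_+")


def delete_range(lines):
    """Yield only the lines outside START..END ranges (hypothetical helper)."""
    in_range = False
    for line in lines:
        if in_range:
            # Inside a range: drop the line, and leave the range on END.
            if END.search(line):
                in_range = False
            continue
        if START.search(line):
            # The START line itself is dropped too.
            in_range = True
            continue
        yield line


sample = [
    "i want this",
    "#____NOT_FOR_CUSTOMER_BEGIN________",
    "not this",
    "#________NOT_FOR_CUSTOMER_END____",
    "and this",
    "//____NOT_FOR_CUSTOMER_BEGIN__",
    "never closed",
]
kept = list(delete_range(sample))
print(kept)  # ['i want this', 'and this']
```

The last block in `sample` has no END marker, so everything after its BEGIN line is dropped. An unbalanced marker in a real script would silently remove the rest of the file in the same way.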

The most beautiful solution. I knew sed could do it :) - Guillaume D


Here is an awk solution, written and tested against the samples you showed.

awk '
/^([#]|[/][/])__+NOT_FOR_CUSTOMER_BEGIN/{ found=1       }
/^([#]|[/][/])__+NOT_FOR_CUSTOMER_END/  { found=""; next}
!found
'  Input_file

With the samples you provided, the output is as follows.

i want this
and this
and this again

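Explanation: put simply, when a line matches the start marker (checked with a regex), the flag found is set so that lines stop being printed. When a line matches the end marker, the flag is cleared and next skips the marker line itself, so printing resumes from the following line.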
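
The same flag logic can be sketched in Python. This is an illustration under assumptions: the function name `strip_customer_blocks` is hypothetical, and the markers are anchored at the start of the line with an optional `#` or `//` prefix, as in the question.

```python
import io
import re

# Optional "#" or "//" prefix, then underscores around the marker name.
BEGIN = re.compile(r"^(?:#|//)?_+NOT_FOR_CUSTOMER_BEGIN_+\s*$")
END = re.compile(r"^(?:#|//)?_+NOT_FOR_CUSTOMER_END_+\s*$")


def strip_customer_blocks(src, dst):
    """Copy lines from src to dst, skipping BEGIN..END blocks (hypothetical name)."""
    found = False  # True while inside a block, like the awk flag
    for line in src:
        if BEGIN.match(line):
            found = True
        elif END.match(line):
            found = False
            continue  # skip the END marker itself, like awk's `next`
        if not found:
            dst.write(line)


demo = io.StringIO(
    "i want this\n"
    "#____NOT_FOR_CUSTOMER_BEGIN________\n"
    "not this\n"
    "#________NOT_FOR_CUSTOMER_END____\n"
    "and this\n"
)
out = io.StringIO()
strip_customer_blocks(demo, out)
print(out.getvalue(), end="")
```

Because the function only needs an iterable of lines and an object with `write`, passing `sys.stdin` and `sys.stdout` instead of the `StringIO` objects would turn it into a filter for a shell pipeline.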


You could use a Python script:
import re

data = """
i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again
"""

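# Note: this keys on the comment prefix (# or //) only, not on the marker names.
# It removes everything from a prefixed line through the last following line
# with the same prefix, provided no empty line comes in between.
rx = re.compile(r'^(#|//)(?:.+\n)+^\1.+\n?', re.MULTILINE)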
data = rx.sub('', data)
print(data)

This produces:

i want this
and this
and this again

See the demo on regex101.com



You can match as few lines as possible, from NOT_FOR_CUSTOMER_BEGIN_ to NOT_FOR_CUSTOMER_END_.

Note that [//] matches only a single / and not //.

^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*
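  • ^ Start of the string (or of a line, when the multiline flag is enabled)
  • (?:#|//) Match either # or //
  • _+NOT_FOR_CUSTOMER_BEGIN_+ Match NOT_FOR_CUSTOMER_BEGIN with one or more underscores on each side
  • (?:\n.*)*? Match a newline followed by the rest of the line, zero or more times, as few times as possible
  • \n(?:#|//)_+NOT_FOR_CUSTOMER_END_+ Match a newline, then # or //, then NOT_FOR_CUSTOMER_END with one or more underscores on each side
  • \n* Remove optional trailing newlines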
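
The note about `[//]` is easy to check. The short Python sketch below (illustrative only, with made-up variable names) shows that the character class consumes one slash, whereas the alternation `(?:#|//)` requires both.

```python
import re

char_class = re.compile(r"^[//]_+NOT_FOR_CUSTOMER_BEGIN_+$")
alternation = re.compile(r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+$")

one_slash = "/__NOT_FOR_CUSTOMER_BEGIN__"
two_slashes = "//__NOT_FOR_CUSTOMER_BEGIN__"

# [//] is a set containing just "/", so it consumes exactly one character.
print(bool(char_class.match(one_slash)))     # True
print(bool(char_class.match(two_slashes)))   # False: the second "/" is not "_"
print(bool(alternation.match(one_slash)))    # False
print(bool(alternation.match(two_slashes)))  # True
```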

Regex demo

Another way, using Python:

import re

# .* rather than .+ so that empty lines inside a block are skipped as well
regex = r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*"

s = ("i want this\n"
     "#____NOT_FOR_CUSTOMER_BEGIN________\n"
     "not this\n"
     "nor this\n"
     "#________NOT_FOR_CUSTOMER_END____\n"
     "and this\n"
     "//____NOT_FOR_CUSTOMER_BEGIN__\n"
     "not this again\n"
     "nor this again\n"
     "//__________NOT_FOR_CUSTOMER_END____\n"
     "and this again")

subst = ""
# Pass flags by keyword: the fourth positional argument of re.sub is count
result = re.sub(regex, subst, s, flags=re.MULTILINE)

if result:
    print(result)

Output

i want this
and this
and this again
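
For the packaging step described in the question, the substitution can be wrapped in a small function that reads one file and writes another. This is a sketch under assumptions: the function name `strip_file` is hypothetical, the file names are the ones from the question, and the pattern is the one explained above.

```python
import re
import tempfile
from pathlib import Path

PATTERN = re.compile(
    r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*",
    re.MULTILINE,
)


def strip_file(src, dst):
    """Write src to dst with every marked block removed (hypothetical helper)."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(PATTERN.sub("", text), encoding="utf-8")


# Demonstration in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    before = Path(tmp) / "before.foo"
    after = Path(tmp) / "after.foo"
    before.write_text(
        "i want this\n"
        "//____NOT_FOR_CUSTOMER_BEGIN__\n"
        "not this\n"
        "//__________NOT_FOR_CUSTOMER_END____\n"
        "and this\n",
        encoding="utf-8",
    )
    strip_file(before, after)
    result = after.read_text(encoding="utf-8")

print(result, end="")
```

Reading the whole file into memory keeps the multi-line regex simple. For very large inputs a line-by-line approach would probably be a better fit.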
