How to delete or replace multiple lines of text between two patterns

Question

python shell awk sed substitution

3

I want to add customer markers to some scripts so that they can be parsed before the scripts are packaged by a shell script.

For example, delete all the lines of text that lie between

^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\n

and

^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\n

I want it to tolerate any number of underscores (which is why I used a regular expression).

For example:

before.foo

i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again

will become:

after.foo

i want this
and this
and this again

I would prefer to use sed, but any clever solution is welcome :)

Something like this:

cat before.foo |  tr '\n' '\a' | sed -r 's/([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\a.*\a([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\a/\a/g' | tr '\a' '\n' > after.foo

- Guillaume D

Which tool or programming language? - Jan

Shell script, thanks. - Guillaume D

Not shell, but this regex would match it: ^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\s.+)*?\R(?:#|//)_+NOT_FOR_CUSTOMER_END_+\s*。https://regex101.com/r/Qj2T59/1 - The fourth bird

It does work, but how do I call it? - Guillaume D

4 Answers

4

I wrote and tested an awk solution against the samples you showed.

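awk '
# Start marker: set the flag so that printing stops from this line on.
/^([#]|[/][/])_+NOT_FOR_CUSTOMER_BEGIN/{ found=1       }
# End marker: clear the flag and skip the marker line itself.
/^([#]|[/][/])_+NOT_FOR_CUSTOMER_END/  { found=""; next}
# Print every line read while the flag is unset.
!found
'  Input_file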

With the samples you provided, the output is as follows.

i want this
and this
and this again

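Explanation: simply put, when the start string is found (by regex), a flag is set to true so that nothing is printed. When the end string is found (again checked by regex), the flag is reset to false, so printing resumes from the next line.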
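For readers who would rather call this from Python, the same flag logic can be sketched as below. This is only an illustration of the technique: the names `strip_blocks`, `BEGIN`, `END`, and `sample` are hypothetical, and the patterns use `_+` so that a single underscore is also accepted.

```python
import re

# Anchored marker patterns, modeled on the awk regexes above.
BEGIN = re.compile(r"^(#|//)_+NOT_FOR_CUSTOMER_BEGIN")
END = re.compile(r"^(#|//)_+NOT_FOR_CUSTOMER_END")


def strip_blocks(text):
    """Drop every line from a BEGIN marker through the next END marker."""
    kept = []
    found = False  # True while inside a NOT_FOR_CUSTOMER block
    for line in text.splitlines():
        if BEGIN.match(line):
            found = True
        if END.match(line):
            found = False
            continue  # like awk's `next`: the END marker line is not kept
        if not found:
            kept.append(line)
    return "\n".join(kept)


sample = "\n".join([
    "i want this",
    "#____NOT_FOR_CUSTOMER_BEGIN________",
    "not this",
    "nor this",
    "#________NOT_FOR_CUSTOMER_END____",
    "and this",
    "//____NOT_FOR_CUSTOMER_BEGIN__",
    "not this again",
    "nor this again",
    "//__________NOT_FOR_CUSTOMER_END____",
    "and this again",
])
print(strip_blocks(sample))
```

Processing line by line keeps memory use flat on large files, at the cost of not being a single substitution.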

- RavinderSingh13

3

You can use a Python script:

import re

data = """
i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again
"""

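# Match a line starting with # or //, the non-empty lines after it, and a
# closing line that starts with the same prefix (\1).
# Note: this keys on the comment prefix only, not on the NOT_FOR_CUSTOMER names.
rx = re.compile(r'^(#|//)(?:.+\n)+^\1.+\n?', re.MULTILINE)
data = rx.sub('', data)
print(data)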

This produces:

i want this
and this
and this again

See the demo on regex101.com.

- Jan

3

You can match as few lines as possible, from NOT_FOR_CUSTOMER_BEGIN_ to NOT_FOR_CUSTOMER_END_

Note that [//] matches only a single /, not //
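The point about `[//]` can be checked quickly in Python. The snippet below is a small sketch; the variable names `bracket`, `alternation`, and `line` are made up for the illustration.

```python
import re

# [//] is a character class that lists "/" twice, so it matches ONE slash.
bracket = re.compile(r"^[//]_+NOT_FOR_CUSTOMER_BEGIN")
# (?:#|//) is an alternation, so it matches "#" or the two-character "//".
alternation = re.compile(r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN")

line = "//____NOT_FOR_CUSTOMER_BEGIN__"

# After the class consumes one "/", the next character is "/" rather than "_".
print(bool(bracket.match(line)))
print(bool(alternation.match(line)))
```

The first pattern fails on a `//` marker line while the second one matches it.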

^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*

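^ Start of the string (or of a line in multiline mode)
(?:#|//) Match either # or //
_+NOT_FOR_CUSTOMER_BEGIN_+ Match NOT_FOR_CUSTOMER_BEGIN with one or more underscores on each side
(?:\n.*)*? Match a newline followed by the rest of that line, zero or more times, as few times as possible
\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+ Match a newline, then # or //, then NOT_FOR_CUSTOMER_END with one or more underscores on each side
\n* Match optional trailing newlines so that they are removed too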
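To see why the lazy quantifier `*?` matters in the breakdown above, the following sketch compares it with a greedy `*` on a shortened input. The names `text`, `head`, `tail`, `lazy`, and `greedy` are assumptions made for this example only.

```python
import re

text = (
    "keep 1\n"
    "#__NOT_FOR_CUSTOMER_BEGIN__\n"
    "drop 1\n"
    "#__NOT_FOR_CUSTOMER_END__\n"
    "keep 2\n"
    "//__NOT_FOR_CUSTOMER_BEGIN__\n"
    "drop 2\n"
    "//__NOT_FOR_CUSTOMER_END__\n"
    "keep 3"
)

head = r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+"
tail = r"\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*"

# Lazy: stop at the first END marker after each BEGIN marker.
lazy = re.compile(head + r"(?:\n.*)*?" + tail, re.MULTILINE)
# Greedy: run on to the LAST END marker in the text.
greedy = re.compile(head + r"(?:\n.*)*" + tail, re.MULTILINE)

print(repr(lazy.sub("", text)))
print(repr(greedy.sub("", text)))
```

With the greedy version, "keep 2" is swallowed because the match extends from the first BEGIN marker to the final END marker.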

Regex demo

Another way, using Python:

import re

regex = r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.+)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*"

s = ("i want this\n"
            "#____NOT_FOR_CUSTOMER_BEGIN________\n"
            "not this\n"
            "nor this\n"
            "#________NOT_FOR_CUSTOMER_END____\n"
            "and this\n"
            "//____NOT_FOR_CUSTOMER_BEGIN__\n"
            "not this again\n"
            "nor this again\n"
            "//__________NOT_FOR_CUSTOMER_END____\n"
            "and this again")

# Replace every matched block with an empty string.
result = re.sub(regex, "", s, flags=re.MULTILINE)

if result:
    print(result)

Output

i want this
and this
and this again
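Since the question works on files rather than an inline string, the same substitution could be wrapped in a small helper. This is a sketch under assumptions: `strip_file` and `MARKED_BLOCK` are hypothetical names, and UTF-8 input is assumed.

```python
import re
from pathlib import Path

# Same pattern as in the explanation above, compiled once in multiline mode.
MARKED_BLOCK = re.compile(
    r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*",
    re.MULTILINE,
)


def strip_file(src, dst):
    """Copy src to dst without the NOT_FOR_CUSTOMER blocks (hypothetical helper)."""
    text = Path(src).read_text(encoding="utf-8")  # assumes UTF-8 input
    Path(dst).write_text(MARKED_BLOCK.sub("", text), encoding="utf-8")
```

With the file names from the question, the call would be `strip_file("before.foo", "after.foo")`. The whole file is read into memory, which should be fine for scripts but may not suit very large inputs.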

- The fourth bird


- anubhava · Accepted Answer

sed is the simplest tool for this kind of task, because it can delete the lines from a start pattern through an end pattern:

sed -E '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/d' file

i want this
and this
and this again

If you are looking for an awk solution, here is a simpler awk:

awk '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/{next} 1' file