Sed recipe: how to do things between two patterns that can be either on one line or on two lines?

Question

sed

Say we want to make some substitutions only between certain patterns, let them be <a> and </a> for clarity... (all right, all right, they are start and end! Jeez!)

If start and end always occur on the same line, I know what to do: just design a proper regex.
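For the single-line case, a minimal sketch might look like the following. It assumes GNU sed (the `\U` case conversion in the replacement is a GNU extension) and takes { and } as the start and end patterns; the sample text and the uppercase transformation are made up for illustration.

```shell
# Illustrative input: one line containing a single {...} block (hypothetical data).
line='this { this that } this'

# Match from "{" to the nearest "}" and uppercase only the matched text.
# \U in the replacement is a GNU sed extension, not POSIX.
result=$(printf '%s\n' "$line" | sed 's/{[^}]*}/\U&/g')

printf '%s\n' "$result"
```

Text outside the braces is left alone because the regex itself delimits the region.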

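If they are guaranteed to be on different lines, and I don't care about anything on the line containing end, and I'm also fine with applying all the commands to the part of the start line that comes before start, then I also know what to do: just specify the address range as /start/,/end/.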
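A small sketch of the address-range approach and its limitation, using invented sample lines and an invented foo-to-BAR substitution. The range selects whole lines, so text before start and after end on the boundary lines is changed as well.

```shell
# Hypothetical sample: "foo" appears inside and outside the start..end region.
input='before foo
foo start foo
foo middle
foo end foo
after foo'

# The substitution runs on every line from the first /start/ match
# through the next /end/ match, boundary lines included.
result=$(printf '%s\n' "$input" | sed '/start/,/end/ s/foo/BAR/g')

printf '%s\n' "$result"
```

The first and last lines stay untouched, but the "foo" before start and the "foo" after end are replaced too, which is why the range alone is only good enough when that does not matter.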

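This, however, doesn't sound very useful. What if I need to do something smarter, for instance, make changes inside a {...} block?

One thing I can think of is splitting the input at { and } before processing and putting it back together afterwards: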

sed 's/{\|}/\n&\n/g' input | sed 'main stuff' | sed ':a $!{N;ba}; s/\n\(}\|{\)\n/\1/g'

Another option is the opposite:

cat input | tr '\n' '#' | sed 'whatever; s/#/\n/g'

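Both are ugly, mainly because the operations are not confined to a single command. The second one is even worse, because some character or substring has to serve as a "newline holder", on the assumption that it does not occur in the original text.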
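To make the second pipeline concrete, here is a hedged sketch with the placeholder 'whatever' filled in by an invented uppercase substitution. It assumes GNU sed (for `\U` and for `\n` in the replacement) and assumes that `#` never occurs in the input, which is exactly the weakness described above.

```shell
# Hypothetical sample with a block that spans two lines.
input='a this
b { this
that } c'

# 1. tr joins all lines, using "#" as the newline holder.
# 2. sed uppercases each {...} block, then turns "#" back into newlines.
result=$(printf '%s\n' "$input" | tr '\n' '#' | sed 's/{[^}]*}/\U&/g; s/#/\n/g')

printf '%s\n' "$result"
```

If the input did contain a literal `#`, it would come back as a newline, so the holder character has to be chosen with care.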

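So the question is: are there better ways, or can the ones above be optimized? Judging from what I have read in recent SO questions, this is quite a common task, so I'd like to settle on the best practice once and for all.

P.S. I'm mostly interested in pure sed solutions: can the job be done with a single invocation of sed and nothing else? No awk, Perl, etc., please: this is more of a theoretical question than a "need the job done ASAP" one.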

- Lev Levitsky

1 Answer

- potong · Accepted Answer

This might work for you:

# create multiline test data
cat <<\! >/tmp/a
> this
> this { this needs
> changing to
> that } that
> that
> !
sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/a
this
this { THIS needs
changing to
THAT } that
that
# convert multiline test data to a single line
tr '\n' ' ' </tmp/a >/tmp/b
sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/b
this this { THIS needs changing to THAT } that that

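Explanation:

Slurp the data into the pattern space (PS). /{/!b;:a;/}/!{$q;N;ba}
Copy the data into the hold space (HS). h
Strip the non-data from the front and back of the string. s/[^{]*{//;s/}.*//
Convert the data, e.g. s/this\|that/\U&/g
Swap to the HS and append the converted data. x;G
Replace the old data with the converted data. s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/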
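The same one-liner can be laid out as a commented multi-line script, which may make the steps easier to follow. This is only a restatement of the command above under the same assumption of GNU sed (`\|`, `\U` and `\n` inside a bracket expression); the variable names `data` and `result` are introduced here for illustration.

```shell
# Same test data as /tmp/a above, kept in a variable.
data='this
this { this needs
changing to
that } that
that'

result=$(printf '%s\n' "$data" | sed '
# Lines without an opening brace are printed unchanged.
/{/!b
# Keep appending lines until a closing brace shows up (quit at end of input).
:a
/}/!{
$q
N
ba
}
# Save the untouched text in the hold space.
h
# Strip everything up to and including "{" and everything from "}" onwards.
s/[^{]*{//
s/}.*//
# The actual block processing goes here.
s/this\|that/\U&/g
# Bring back the original text and append the processed block after a newline.
x
G
# Splice the processed block (\2) in place of the old block contents.
s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/
')

printf '%s\n' "$result"
```

Only the text between the first { and the first } of each accumulated chunk is processed, which is the limitation the edit below addresses.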

EDIT:

A more complicated answer which, I think, caters for more than one block per line.

# slurp file into pattern space (PS)
:a
$! {
N
ba
}
# check for presence of \v if so quit with exit value 1
/\v/q1
# replace original newlines with \v's
y/\n/\v/
# append a newline to PS as a delimiter
G
# copy PS to hold space (HS)
h
# starting from right to left delete everything but blocks
:b
s/\(.*\)\({.*}\).*\n/\1\n\2/
tb
# delete any non-block details from the start of the file
s/.*\n//
# PS contains only block details
# do any block processing here e.g. uppercase this and that
s/th\(is\|at\)/\U&/g
# append ps to hs
H
# swap to HS
x
# replace each original block with its processed one from right to left
:c
s/\(.*\){.*}\(.*\)\n\n\(.*\)\({.*}\)/\1\n\n\4\2\3/
tc
# delete newlines
s/\n//g
# restore original newlines
y/\v/\n/
# done!

N.B. This uses GNU-specific options, but it could be tweaked to work with generic seds.
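One way to try the longer script is to save it to a file and run it with `sed -f`. The sketch below assumes GNU sed; the temporary file, the sample input and the variable names are invented for illustration, and the script body is the one above with the comments removed.

```shell
# Write the script to a temporary file (quoted heredoc: no shell expansion).
script=$(mktemp)
cat >"$script" <<'EOF'
:a
$!{
N
ba
}
/\v/q1
y/\n/\v/
G
h
:b
s/\(.*\)\({.*}\).*\n/\1\n\2/
tb
s/.*\n//
s/th\(is\|at\)/\U&/g
H
x
:c
s/\(.*\){.*}\(.*\)\n\n\(.*\)\({.*}\)/\1\n\n\4\2\3/
tc
s/\n//g
y/\v/\n/
EOF

# Hypothetical input: two blocks on the first line, the second spanning lines.
input='a { this } b { that
spans } this'
result=$(printf '%s\n' "$input" | sed -f "$script")
printf '%s\n' "$result"

# Input that already contains a vertical tab makes the script quit with status 1.
if printf 'a\vb\n' | sed -f "$script" >/dev/null; then
    status=0
else
    status=$?
fi
printf 'exit status with \\v in the input: %s\n' "$status"

rm -f "$script"
```

Both blocks come back uppercased while the trailing "this" outside the braces is unchanged. Checking the exit status matters because the script relies on `\v` as its own newline holder.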