"sed" special character handling

Question

Our script contains a sed command that replaces file content with a value taken from a variable.

For example...

export value="dba01upc\Fusion_test"
sed -i "s%{"sara_ftp_username"}%$value%g" /home_ldap/user1/placeholder/Sara.xml

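The sed command ignores special characters such as '\' and substitutes the string "dba01upcFusion_test", without the '\'. It works if I export the value with the backslash doubled, as in export value='dba01upc\\Fusion_test'. Unfortunately, our client wants to export the original text dba01upc\Fusion_test, in single or double quotes, and does not want to add any extra characters to the text. Does anyone know how to make sed insert text that contains special characters?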

Before replacement: Sara.xml

<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account >
<ser:description/>
<ser:static-account>
<con:username>{sara_ftp_username}</con:username>
</ser:static-account>
</ser:service-account>

After replacement: Sara.xml

<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account>
<ser:description/>
<ser:static-account>
<con:username>dba01upcFusion_test</con:username>
</ser:static-account>
</ser:service-account>

Thanks in advance

- Tech Tech

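FYI, BashFAQ #21 contains an awk script for reliable replacement; see http://mywiki.wooledge.org/BashFAQ/021 - Charles Duffy

There's no need to export variables that don't have to be accessible from child processes... and for variables holding passwords/credentials, exporting them is simply a bad idea (this has been fixed on most very recent UNIX-like systems, but some older ones expose environment variables the same way they expose command lines). - Charles Duffy

Regarding the referenced awk script: you don't need to escape backslashes if you pass the strings in the argument list rather than assigning them from variables. I'm not convinced that blindly escaping every backslash is safe, but I could be wrong; some day, when I have nothing else to do, I'll try to find a counterexample. - Ed Morton

@CharlesDuffy: In the awk script you link to, besides the need to escape backslashes, actual newlines in the input strings also break the command with BSD awk (a behavior that, from what I can tell, may be POSIX-compliant). The trick used in @Ed's answer of passing the strings as pseudo-filenames avoids this problem and also eliminates the need for escaping. - mklement0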

3 Answers

Update: Based on later insights, here are the options:

- Update 2: If you're intent on using sed, see the - somewhat cumbersome, but now robust and generic - solution below.
- If you want a robust, self-contained awk solution that also properly handles both arbitrary search and replacement strings (but cannot incorporate regex features such as word-boundary assertions), see Ed Morton's answer.
- If you want a pure bash solution, your input files are small, and preserving multiple trailing newlines is not important, see Charles Duffy's answer.
- If you want a full-fledged third-party templating solution, consider, for instance, j2cli, a templating CLI for Jinja2; if you have Python and pip, install it with sudo pip install j2cli.

Simple example (note that since the replacement string is provided via a file, this may not be appropriate for sensitive data; note the double braces ({{...}})):
```
value='dba01upc\Fusion_test'
echo "sara_ftp_username=$value" >data.env
echo '<con:username>{{sara_ftp_username}}</con:username>' >tmpl.xml
j2 tmpl.xml data.env # -> <con:username>dba01upc\Fusion_test</con:username>
```

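If you use sed, you need to escape both the search string and the replacement string carefully, because:

1. As Ed Morton points out in a comment elsewhere, sed does not support using a literal string as the replacement string; it always interprets special characters and sequences in the replacement.
2. Similarly, the search-string literal must be escaped so that none of its characters is mistaken for a special regex character.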
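As a minimal sketch of the first point, the snippet below shows how sed treats & and \ in the replacement part of an s command. The template string and the {name} placeholder are made up for illustration and are not taken from the question.

```bash
#!/usr/bin/env bash

template='<u>{name}</u>'   # hypothetical one-line template

# Unescaped: & in the replacement stands for the text that was matched.
unescaped=$(printf '%s\n' "$template" | sed 's/{name}/a&b/')

# Escaped: \& yields a literal ampersand and \\ a literal backslash.
escaped=$(printf '%s\n' "$template" | sed 's/{name}/a\&b\\c/')

printf '%s\n' "$unescaped" "$escaped"
# -> <u>a{name}b</u>
# -> <u>a&b\c</u>
```

What sed does with a backslash followed by some other ordinary character is less clear-cut; in the question's case the backslash was silently dropped, which is why a value taken from a variable has to be escaped before it is placed in the replacement.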

The following uses two generic helper functions to perform this escaping (quoting), applying the techniques explained in "Is it possible to escape regex characters reliably with sed?".

#!/usr/bin/env bash

# SYNOPSIS
#   quoteRe <text>
# DESCRIPTION
#   Quotes (escapes) the specified literal text for use in a regular expression,
#   whether basic or extended - should work with all common flavors.
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }

# '

# SYNOPSIS
#  quoteSubst <text>
# DESCRIPTION
#  Quotes (escapes) the specified literal string for safe use as the substitution string (the 'new' in `s/old/new/`).
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"    
}

# The search string.
search='{sara_ftp_username}'

# The replacement string; a demo value with characters that need escaping.
value='&\1%"'\'';<>/|dba01upc\Fusion_test'

# Use the appropriately escaped versions of both strings.
sed "s/$(quoteRe "$search")/$(quoteSubst "$value")/g" <<<'<el>{sara_ftp_username}</el>'

# -> <el>&\1%"';<>/|dba01upc\Fusion_test</el>

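Both quoteRe() and quoteSubst() handle multi-line strings correctly.
- Note, however, that since sed reads one line at a time by default, using quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
quoteRe() is always safe to use with a command substitution ($(...)), because it always returns a single-line string (newlines in the input are encoded as '\n').
By contrast, if you use quoteSubst() with a string that has trailing newlines, you must not use $(...), because the latter removes the final newline and thereby breaks the encoding (since quoteSubst() escapes each actual newline as a backslash followed by the newline, the returned string would end in a dangling \).

Thus, for strings with trailing newlines, first read the escaped value into a separate variable with IFS= read -d '' -r escapedValue < <(quoteSubst "$value"), then use that variable in the sed command.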
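The trailing-newline case can be sketched as follows. The two helper functions are repeated from this answer so that the snippet runs on its own; the `|| true` guards are an addition (read returns a non-zero status when it hits end of input without finding its delimiter), and the two-line value is a made-up demo string.

```bash
#!/usr/bin/env bash

# Helpers as defined in this answer; "|| true" is added only so that read's
# non-zero status at end of input is harmless if errexit is in effect.
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1") || true
  printf %s "${REPLY%$'\n'}"
}

search='{sara_ftp_username}'
value=$'line1\nline2\n'   # hypothetical demo value ending in a newline

# Read the escaped value directly instead of via $(...), which would strip
# the final newline and leave a dangling backslash.
IFS= read -d '' -r escapedValue < <(quoteSubst "$value") || true

result=$(sed "s/$(quoteRe "$search")/$escapedValue/g" <<<'<el>{sara_ftp_username}</el>')
printf '%s\n' "$result"
```

With this input, the placeholder is replaced by both lines including the trailing newline, so the closing </el> tag ends up on a line of its own.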

- mklement0

Much more robust than my (now deleted) solution. +1 - anubhava

This can be done using only bash builtins; no sed, awk, or the like is needed.

orig='{sara_ftp_username}'               # put the original value into a variable
new='dba01upc\Fusion_test'               # ...no need to 'export'!

contents=$(<Sara.xml)                    # read the file's content into a variable
new_contents=${contents//"$orig"/$new}   # use parameter expansion to replace
printf '%s' "$new_contents" >Sara.xml    # write new content to disk

See the relevant section of BashFAQ #100 for information on using parameter expansion for string replacement.
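To illustrate why the quoting inside the expansion matters, here is a small sketch that wraps the same parameter expansion in a function. The name bash_replace_literal is hypothetical, and the function works on a string rather than on Sara.xml so that it can be tried without touching any file.

```bash
#!/usr/bin/env bash

# bash_replace_literal CONTENTS ORIG NEW   (hypothetical helper name)
# Prints CONTENTS with every literal occurrence of ORIG replaced by NEW.
bash_replace_literal() {
  local contents=$1 orig=$2 new=$3 result
  # "$orig" is quoted so that glob characters in it match literally;
  # "$new" is quoted as a precaution so nothing in it is treated specially.
  result=${contents//"$orig"/"$new"}
  printf '%s\n' "$result"
}

# The value from the question keeps its backslash.
bash_replace_literal '<con:username>{sara_ftp_username}</con:username>' \
  '{sara_ftp_username}' 'dba01upc\Fusion_test'
# -> <con:username>dba01upc\Fusion_test</con:username>

# A search string that is also a glob character is still matched literally.
bash_replace_literal 'a*b' '*' 'X'
# -> aXb
```

Without the quotes around $orig, the second call would treat * as a pattern that matches the whole string and print just X.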

- Charles Duffy

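True, but the general caveat is that it only works for smallish files, because the entire input file is read at once. Also, if you want the content of $orig to be treated as a literal, you must enclose it in double quotes: new_contents=${contents//"$orig"/$new} (whereas you may, but need not, double-quote $new). - mklement0

@mklement0, thanks - I learned something from that; I hadn't known that $orig would be treated as a pattern in that context. - Charles Duffy

Note that this approach removes any blank lines at the end of the file, which may not be desirable. - Ed Morton

@EdMorton: Good point; to clarify, since "blank" could be read as including whitespace-only lines: we're talking about trailing empty lines (a run of one or more newlines at the end of the file). With that in mind, it's best to output the modified string with a trailing newline: printf '%s\n' ... - mklement0

@mklement0 Agreed. My awk script adds a newline if one is missing, but at least in gawk you can always choose whether it gets added by setting ORS=RT. In other awks you need to get creative! - Ed Morton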


- Ed Morton - SO stop bullying · Accepted Answer

You can't solve this problem robustly with sed. Just use awk instead:

awk -v old="string1" -v new="string2" '
idx = index($0,old) {
    $0 = substr($0,1,idx-1) new substr($0,idx+length(old))
}
1' file

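Ah, @mklement0 makes a good point: to stop escape sequences from being interpreted, you need to pass the values in the argument list along with the file names and assign the variables from there, rather than assigning values to the variables with -v (see the summary I wrote a long time ago for the comp.unix.shell FAQ at http://cfajohnson.com/shell/cus-faq-2.html#Q24, which I had apparently forgotten!).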
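The difference between the two ways of passing a string can be seen in a short sketch; the variable names here are illustrative only.

```bash
#!/usr/bin/env bash

str='a\tb'   # a backslash followed by the letter t, not a tab

# Assigned with -v: awk processes escape sequences, so \t becomes a tab.
via_v=$(awk -v s="$str" 'BEGIN { print s }')

# Taken from the argument list: the string reaches the script unmodified.
via_argv=$(awk 'BEGIN { s = ARGV[1]; print s }' "$str")

printf '%s\n' "$via_v" "$via_argv"
```

The first line printed contains a real tab character, while the second still contains the two characters \t.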

The following robustly makes the desired substitution (a\ta -> e\tf) for every occurrence of the search string on every line:

$ cat tst.awk
BEGIN {
    old=ARGV[1]; delete ARGV[1]
    new=ARGV[2]; delete ARGV[2]
    lgthOld = length(old)
}
{
    head = ""; tail = $0
    while ( idx = index(tail,old) ) {
        head = head substr(tail,1,idx-1) new
        tail = substr(tail,idx+lgthOld)
    }
    print head tail
}

$ cat file
a\ta    a       a       a\ta

$ awk -f tst.awk 'a\ta' 'e\tf' file
e\tf    a       a       e\tf

The white space in file is tabs. If you like, you can shift ARGV[3] down and adjust ARGC, but in most cases this is unnecessary.
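As a sketch of that variant applied to the question's placeholder, the same loop can be wrapped in a shell function that shifts the remaining arguments down instead of deleting the first two. The function name awk_replace_literal is hypothetical, and the search string is assumed to be non-empty.

```bash
#!/usr/bin/env bash

# awk_replace_literal OLD NEW [FILE...]   (hypothetical wrapper name)
# Replaces every literal occurrence of OLD with NEW; reads stdin if no FILE.
awk_replace_literal() {
  awk '
    BEGIN {
      old = ARGV[1]; new = ARGV[2]
      # Shift the file operands down by two and shrink ARGC to match.
      for (i = 3; i < ARGC; i++) ARGV[i-2] = ARGV[i]
      ARGC -= 2
      lgthOld = length(old)
    }
    {
      head = ""; tail = $0
      while ( idx = index(tail, old) ) {
        head = head substr(tail, 1, idx-1) new
        tail = substr(tail, idx + lgthOld)
      }
      print head tail
    }
  ' "$@"
}

value='dba01upc\Fusion_test'
printf '%s\n' '<con:username>{sara_ftp_username}</con:username>' |
  awk_replace_literal '{sara_ftp_username}' "$value"
# -> <con:username>dba01upc\Fusion_test</con:username>
```

To update a file such as Sara.xml, the output would have to be written to a temporary file and moved into place afterwards, since this sketch only writes to standard output.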