Replace newlines inside quotes with \n

Question

4

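I need to write a quick (due by tomorrow) filter script that replaces line breaks (LF or CRLF) inside double-quoted strings with the escaped newline \n. The content is a (broken) JavaScript program, so I need to allow escape sequences inside the strings, such as "ab\"cd" and "ab\\"cd"ef".

I understand that sed is not a good fit for this task because it works line by line, so I turned to perl, of which I know nothing :)

I've written this regex: "(((\\.)|[^"\\\n])*\n?)*" and tested it with http://regex.powertoy.org. It does match quoted strings that contain line breaks, but perl -p -e 's/"(((\\.)|[^"\\\n])*(\n)?)*"/TEST/g' does not replace anything.

So my questions are:

How do I make perl match the line breaks?
How do I write the "replace-by" part so that it keeps the original string and replaces only the line breaks?

There is an awk solution to a similar question, but it is not quite what I need.

NOTE: I usually don't ask "please do this for me" questions, but I really don't feel like learning perl/awk by tomorrow... :)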

EDIT: sample data

"abc\"def" - matches as one string
"abc\\"def"xy" - matches "abc\\" and "xy"
"ab
cd
ef" - is replaced by "ab\ncd\nef"

- davka

OK, it's JavaScript, but I don't think that is relevant. I don't need full parsing, only to recognise string literals. - davka

1

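Handling \" and \\" can mean either that you expect the string to be expanded twice, or that you want to preserve a backslash that sits right before the closing ". Since you give no expected output beyond "handle correctly", I can only guess what "handle correctly" means to you. - TLP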

@TLP, @Joel, I see, I'll edit. - davka

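Now I see what you mean, and what you have been saying all along. You want the replacement to happen only inside quotes and nowhere else. That is why we have to take escaped quotes into account. - TLP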

@TLP: I see that you got confused. I'll rephrase. - davka

4 Answers

1

#!/usr/bin/perl
use warnings;
use strict;
use Regexp::Common;

$_ = '"abc\"def"' . '"abc\\\\"def"xy"' . qq("ab\ncd\nef");

print "before: {{$_}}\n";
s{($RE{quoted})}
 {  (my $x=$1) =~ s/\n/\\n/g;
    $x
 }ge;
print "after: {{$_}}\n";

- tadmc

Can't locate Regexp/Common.pm - I guess it's an add-on? - davka

1

With Perl 5.14.0 (installable via perlbrew) you can do this:

#!/usr/bin/env perl

use strict;
use warnings;

use 5.14.0;

use Regexp::Common qw/delimited/;

my $data = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

my $output = $data =~ s/$RE{delimited}{-delim=>'"'}{-keep}/$1=~s!\n!\\n!rg/egr;

print $output;

I need 5.14.0 for the /r flag on the inner substitution. If anyone knows how to avoid that, please let me know.
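One way to sidestep /r on older perls is the copy-then-modify idiom that tadmc's answer already uses. The sketch below is only an illustration under stated assumptions: the hand-written $quoted pattern stands in for $RE{delimited} so that the example runs without Regexp::Common, and the helper names escape_newlines and fix_strings are made up here. The possessive quantifiers in the pattern need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Assumption: a hand-rolled double-quoted-string pattern used in place of
# $RE{delimited}{-delim=>'"'}; a backslash always consumes the next character.
my $quoted = qr/("(?:[^"\\]++|\\.)*+")/;

# Hypothetical helper: returns what s/\n/\\n/gr would return, without /r.
sub escape_newlines {
    my ($s) = @_;          # the argument is copied, so the caller's value is untouched
    $s =~ s/\n/\\n/g;
    return $s;
}

# Hypothetical helper: the outer substitution, also done on a copy instead of /r.
sub fix_strings {
    my ($data) = @_;
    (my $output = $data) =~ s/$quoted/escape_newlines($1)/ge;
    return $output;
}

print fix_strings(qq{var s = "ab\ncd\nef";\n});
```

The parenthesised `(my $output = $data) =~ s/.../.../` form assigns first and then substitutes on the new variable, which is the part /r was added to shorten.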

- Joel Berger

While I was working on this, both tadmc and Qtax got there before me! %$^# - Joel Berger

1

Until the OP posts some sample content to test against, try adding the "m" (and possibly "s") flags to the end of the regex; from perldoc perlreref (reference):

m  Multiline mode - ^ and $ match internal lines
s  match as a Single line - . matches \n
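To make the two flag descriptions concrete, here is a small sketch on a made-up two-line string. It only illustrates what /m and /s change; it is not a fix for the question's substitution.

```perl
use strict;
use warnings;

my $text = "first\nsecond";

# Without /m, ^ and $ only anchor at the ends of the whole string.
my @plain = $text =~ /^(\w+)$/g;
# With /m they also anchor at every internal line boundary.
my @multi = $text =~ /^(\w+)$/mg;

# Without /s, "." refuses to match a newline; with /s it matches one.
my $dot_plain  = $text =~ /first.second/  ? 1 : 0;
my $dot_single = $text =~ /first.second/s ? 1 : 0;

printf "plain=%d multi=%d dot=%d dot/s=%d\n",
    scalar @plain, scalar @multi, $dot_plain, $dot_single;
# prints: plain=0 multi=2 dot=0 dot/s=1
```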

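For testing, you may also find it useful to add the command-line switch "-i.bak", so that you keep a backup of the original file (now with the extension ".bak").

Note that if you want to group something without capturing it, you can use (?:PATTERN) instead of (PATTERN). Once you have captured something, use $1 through $9 to access the stored matches.

For more information see the links, as well as perldoc perlretut (tutorial) and perldoc perlre (full documentation).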
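A point that may matter more than the flags: perl -p hands the substitution one line at a time, so a pattern can never see both the opening and the closing quote of a multi-line string. The question's pattern uses no ^ or $, so /m has nothing to act on there, and /s only changes whether a backslash followed by a newline counts as an escape. The sketch below compares line-at-a-time matching with matching on the whole text; the $quoted pattern and the count_quoted helper are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $text   = qq{var s = "ab\ncd";\n};
my $quoted = qr/"(?:[^"\\]++|\\.)*+"/;   # [^"\\] matches a newline like any other character

# Hypothetical helper: number of complete quoted strings found in $s.
sub count_quoted {
    my ($s) = @_;
    my $n = () = $s =~ /$quoted/g;
    return $n;
}

# What perl -p sees by default: one line per iteration.
my $per_line = 0;
$per_line += count_quoted($_) for split /^/, $text;

# What it sees once the input is slurped in one piece.
my $slurped = count_quoted($text);

print "line by line: $per_line, slurped: $slurped\n";
# prints: line by line: 0, slurped: 1
```

On the command line the slurping is usually requested with the -0777 switch, for example `perl -0777 -pe '...' broken.js > fixed.js` (file names are placeholders); inside a script, `local $/;` before reading has the same effect.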

- Joel Berger

- Qtax · Accepted Answer

Here's a simple Perl solution:

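s{
    \G                      # match from the beginning of the string or the end of the last match
    ([^"]*+)                # everything up to the next quote
    "((?:[^"\\]++|\\.)*+)"  # the whole quoted string, escapes included
}{
    my ($before, $inside) = ($1, $2);  # lexicals instead of the sort variables $a and $b
    $inside =~ s/\r?\n/\\n/g;          # replace what you want inside the quotes
    qq($before"$inside");
}gex;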
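The snippet above is a bare substitution on $_, so it needs the whole input in one string before it can work. As a rough sketch of how it might be packaged, the block below wraps the same \G pattern in a sub (the name escape_quoted_newlines is made up here) and runs it on the sample data from the question.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the \G-based substitution.
sub escape_quoted_newlines {
    my ($text) = @_;
    $text =~ s{
        \G ([^"]*+)               # text outside quotes, kept as is
        " ((?:[^"\\]++|\\.)*+) "  # one complete quoted string
    }{
        my ($before, $inside) = ($1, $2);   # save before the inner s/// resets them
        $inside =~ s/\r?\n/\\n/g;           # LF or CRLF becomes a literal \n
        $before . '"' . $inside . '"';
    }gex;
    return $text;
}

my $sample = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

print escape_quoted_newlines($sample);
```

To use it as a filter, the input would have to be read in one piece first, for instance with `local $/; my $all = <STDIN>;`, since the pattern must see a string's opening and closing quotes together.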

If you don't want to use /e and want just a single regex, here's another solution:

use strict;

$_=<<'_quote_';
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x
_quote_

print "Original:\n", $_, "\n";

s/
(
    (?:
        # at the beginning of the string match till inside the quotes
        ^(?&outside_quote) "
        # or continue from last match which always stops inside quotes
        | (?!^)\G
    )
    (?&inside_quote)  # eat things up till we find what we want
)
x   # the thing we want to replace
(
    (?&inside_quote)  # eat more possibly till end of quote
    # if going out of quote make sure the match stops inside them
    # or at the end of string
    (?: " (?&outside_quote) (?:"|\z) )?
)

(?(DEFINE)
    (?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
    (?<inside_quote> (?:[^"\\x]++|\\.)*+ ) # handle escapes
)
/$1Y$2/xg;

print "Replaced:\n", $_, "\n";

Output:

Original:
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x

Replaced:
hai xtest "aa YY aaY" baix "YY"
x "aYa\"Y\\" xa "Y\\\\\"Y" ax
xbai!x

To use newlines instead of x, just replace it in the regex like so:

s/
(
    (?:
        # at the beginning of the string match till inside the quotes
        ^(?&outside_quote) "
        # or continue from last match which always stops inside quotes
        | (?!^)\G
    )
    (?&inside_quote)  # eat things up till we find what we want
)
\r?\n # the thing we want to replace
(
    (?&inside_quote)  # eat more possibly till end of quote
    # if going out of quote make sure the match stops inside them
    # or at the end of string
    (?: " (?&outside_quote) (?:"|\z) )?
)

(?(DEFINE)
    (?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
    (?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ ) # handle escapes
)
/$1\\n$2/xg;
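For completeness, here is a sketch of the single-regex variant wrapped in a sub so that it can be tried on a string directly. The sub name escape_with_single_regex is made up for the example; the pattern itself is the one shown above with the comments trimmed.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the single-regex substitution.
sub escape_with_single_regex {
    my ($text) = @_;
    $text =~ s/
    (
        (?: ^(?&outside_quote) " | (?!^)\G )   # enter the first quote, or resume inside one
        (?&inside_quote)
    )
    \r?\n                                      # the line break to replace
    (
        (?&inside_quote)
        (?: " (?&outside_quote) (?:"|\z) )?    # optionally hop into the next quote
    )
    (?(DEFINE)
        (?<outside_quote> [^"]*+ )
        (?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ )
    )
    /$1\\n$2/xg;
    return $text;
}

print escape_with_single_regex(qq{s = "ab\ncd\r\nef"; t = "gh\nij";\n});
```

One thing worth checking on real input: every match has to end inside a quote so that \G can resume there, so a quoted string that contains no line break looks as though it would stop the chain and leave later strings unprocessed. The /e version does not depend on that, which may make it the safer choice for a broken JavaScript file.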