Regular expressions in Perl: matching a newline and the first word of the next line

Question

3

I have a file that looks like this:

title="title1"  
artist="artist1"  
title="title2"  
artist="artis2"  
title="title3"  
artist="artist3"

and so on.

This command
perl -pe 's/title="(.*?)"\n//ig' list.txt

does not do what I hoped. With this command alone I get only the artist lines, but if I do this

perl -pe 's/title="(.*?)"\nartist//ig' list.txt

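it does not match at all.
I tried with /g, without /g, and with /m. I looked at the file in nano, and there are no other characters between the last " on each line and the "artist" on the next line.

Does anyone know what I am doing wrong? (I am using perl rather than sed because the regex that generates this list uses a negative lookahead.)

My goal is to be able to use the following line: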
perl -pe 's/title="(.*?)"\nartist="(.*?)"(?:\n|$)/\2 - \1/ig' list.txt

which would produce the following output

artist1 - title1  
artist2 - title2  
artist3 - title3

- Trel

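What output do you expect? - vmachan

I will edit that into the post as a separate section. - Trel

Open it in vim and run :set list to see whether there are other non-printing characters, such as Windows-style \r\n line endings. - Andrew Cheong

Try slurping the file with "-0777". - mpapec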

4 Answers

2

If you want a "slurp" approach, you can use this regex:

(^title="([^"]+)")\s*\R(^artist="([^"]+)")\s*(?:\R|\z)

Demo

Then, given your example:

$ echo "$art" 
title="title1"  
artist="artist1"  
title="title2"  
artist="artis2"  
title="title3"  
artist="artist3"

Just slurp the file with -0777, then print $2 and $4:

$ echo "$art" | perl -0777 -lne 'while (/(^title="([^"]+)")\s*\R(^artist="([^"]+)")\s*(?:\R|\z)/gm) { print "$4 - $2\n"}'
artist1 - title1
artis2 - title2
artist3 - title3

- dawg

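Slurp mode looks like it does what I need, and my original regex appears to work with it too. I used this one: 's/title="(.*?)"\nartist="(.*?)"(?:\n|$)/\2 - \1\n/ig'. - Trel

To add to my previous comment: I did not need the change you made to negate the ", because I used ? on .* to make it lazy. - Trel

Good. You may want to consider two changes: 1) use \R or $ in the regex rather than \n; \R is a metacharacter for any line-ending sequence (Windows, etc.), and 2) you may want to add \h* or \s* after the closing quote to absorb the invisible trailing whitespace present in your example. So something like: ^title="(.*?)"\h*\R^artist="(.*?)"\h*$ - dawg

"I did not need the change you made to negate the "": non-greedy matching can be tricky, and I strongly recommend sticking with "([^"]*)". There are many Stack Overflow posts along the lines of "Non-greedy regex acts greedily", where people have misunderstood what non-greedy matching does. - Borodin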

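@Borodin Normally I would agree, but in my case lazy matching works, because the file follows a fixed format and there will never be a case where it fails, since I generate the data it processes. (dawg, that wasn't me; I don't know why he agreed with your original comment.) - Trel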

1

You did not say what you want to do. If you want to extract the title and artist information, you need something like this:

our $s = q|
title="title1"
artist="artist1"
title="title2"
artist="artis2"
title="title3"
artist="artist3"
|;

my @matches = $s =~ /^title="(.*?)".*?^artist="(.*?)"/smg;

print join(';', @matches);

This prints

title1;artist1;title2;artis2;title3;artist3

- Gene

Sorry, I fixed that a minute ago. You may not have been able to see it yet. I just missed the line endings when I copied the text. - Gene

1

If your file is exactly as described, you can use this command to read two lines at a time. This avoids slurp mode:

perl -pe '$_.=<>;s/.*?"(.*?)".*?"(.*?)"/$2 - $1/s' file

If you need something more explicit, you can use:

perl -pe 'if (/^title="/){$_.=<>;s/^.*?"(.*?)"\h*\Rartist="(.*?)"\h*/$2 - $1/}' file

- Casimir et Hippolyte

- Borodin · Accepted Answer

Your substitution

s/title="(.*?)"\n//ig

removes every line that looks like title="xxx".

It is not clear what you need, but if you want to remove title= and the quotes, use this:

perl -pe 's/title="(.*?)"/$1/i' myfile

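The /g modifier is superfluous unless you expect several titles on a single line of the file.

Update:

If you want to pair titles with artists, then you really need a script file. This should do what you need. The data is taken directly from your question.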

use strict;
use warnings 'all';
use feature 'say';

my $title;

while ( <DATA> ) {

    if ( /title="([^"]*)"/ ) {
        $title = $1;
    }
    elsif ( /artist="([^"]*)"/ ) {
        say "$1 - $title";
    }
}


__DATA__
title="title1"
artist="artist1"
title="title2"
artist="artis2"
title="title3"
artist="artist3"

output

artist1 - title1
artis2 - title2
artist3 - title3