Multi-line regex search

Question


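After a lot of searching on Stack Overflow and Google, I have to post a new question. I am using TextWrangler and trying to write a regular expression that gives the shortest match for a multi-line pattern.

Basically,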

ہے\tVM

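is the string I am looking for (an Arabic-script word and its part-of-speech tag, separated by a tab). The difficulty is that I want to find every individual sentence that contains this string. Here is what I have so far: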

/(<Sentence id='\d+'>(?:[^<]|<(?!\/Sentence>))*ہے\tVM(?:[^<]|<(?!\/Sentence>))*<\/Sentence>)/
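To make the intent of this pattern easier to check, here is a minimal sketch that runs it against a small in-memory sample. The sample text and its one-token-per-line layout are assumptions for illustration only; the real .posn files may look different. The word is also wrapped in \Q...\E here, which is an addition to the original pattern.

```perl
use strict;
use warnings;
use utf8;                              # string literals below are UTF-8 text
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical sample data; the real file layout may differ.
my $text = join "\n",
    "<Sentence id='1'>",
    "یہ\tPRP",
    "ہے\tVM",
    "</Sentence>",
    "<Sentence id='2'>",
    "کتاب\tNN",
    "</Sentence>",
    "<Sentence id='3'>",
    "وہ\tPRP",
    "ہے\tVM",
    "</Sentence>";

my $word = "ہے";

# [^<] consumes any character except '<' (newlines included).
# <(?!\/Sentence>) accepts a '<' only if it does not start the closing tag.
# Together they can never run past the end of the current sentence.
my $inside  = qr/(?:[^<]|<(?!\/Sentence>))*/;
my $pattern = qr/<Sentence id='\d+'>$inside\Q$word\E\tVM$inside<\/Sentence>/;

# In list context, /g returns every non-overlapping match.
my @hits = $text =~ /$pattern/g;

print scalar(@hits), " matching sentence(s)\n";
print "$_\n\n" for @hits;
```

With this sample, sentences 1 and 3 are returned and sentence 2 is skipped, because the search for the word cannot cross a closing </Sentence> tag.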

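The file I am looking at is encoded in CML, so part of my question is whether anyone knows of a CML parser for the Mac.

The other obvious option is to write a Perl script; I would appreciate any pointers to a simple solution there.

My current script is: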

use open ':encoding(utf8)';
use Encode;
binmode(STDOUT, ":utf8");
binmode(STDIN, ":utf8");

my $word = Encode::decode_utf8("ہے");

my @files = glob("*.posn");

foreach my $file (@files) {
    open FILE, "<$file" or die "Error opening file $file ($!)";
    my $file = do {local $/; <FILE>};
    close FILE or die $!;
    if ($file =~ /(<Sentence id='\d+'>(?:[^<]|<(?!\/Sentence>))*$word\tVM(?:[^<]|<(?!\/Sentence>))*<\/Sentence>)/g) {
            print STDOUT "$1\n\n\n\n";
            push(@matches, "$1\n\n");
            }
}

open(OUTPUT, ">matches.txt");
print OUTPUT "@matches";
close(OUTPUT);

- Sebastian Sulger

You probably want to use while rather than if; your current code can only report one match per file. - ruakh
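The difference between the two can be seen in a small sketch. The <s> tags and the sample string are made up for illustration and are not taken from the question's data.

```perl
use strict;
use warnings;

# Hypothetical sample with three tagged items.
my $text = "<s>a</s> <s>b</s> <s>c</s>";

# With `if`, the /g match is evaluated once, so only the first hit is seen.
my @with_if;
if ($text =~ /<s>(.*?)<\/s>/g) {
    push @with_if, $1;
}

# A scalar-context /g match leaves a search position on $text; reset it
# so the loop below starts from the beginning of the string again.
pos($text) = 0;

# With `while`, each iteration resumes where the previous match ended.
my @with_while;
while ($text =~ /<s>(.*?)<\/s>/g) {
    push @with_while, $1;
}

print "if:    @with_if\n";
print "while: @with_while\n";
```

The if version collects only "a", while the while version collects "a", "b" and "c".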

1 Answer

- Ωmega · Answer 1

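Your input probably contains more than one occurrence of the string, so you need to search for all of them...

I think your code should look like this >>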

use strict;
use warnings;
use open ':encoding(utf8)';
use Encode;

binmode(STDOUT, ":utf8");
binmode(STDIN,  ":utf8");

my $word = Encode::decode_utf8("ہے");
my @files = glob("*.posn");
my @matches = ();

foreach my $file (@files) {
  # Slurp the whole file so the pattern can span several lines
  open my $fh, '<', $file or die "Error opening file $file ($!)";
  my $content = do {local $/; <$fh>};
  close $fh or die $!;
  # In list context, /g returns every match in the file, not just the first
  my @occurrences = $content =~ /<Sentence id='\d+'>(?:[^<]|<(?!\/Sentence>))*$word\tVM(?:[^<]|<(?!\/Sentence>))*<\/Sentence>/g;
  print STDOUT "$_\n\n\n\n" for (@occurrences);
  push (@matches, "$_\n\n") for (@occurrences);
}

open my $out, '>', 'matches.txt' or die "Error opening matches.txt ($!)";
print $out "@matches";
close $out or die $!;

Learn more about regular expressions here.
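If the single large pattern becomes hard to maintain, one possible alternative is to work in two steps: first extract every sentence block with a non-greedy match, then keep only the blocks that contain the word and tag. This is only a sketch. It assumes that <Sentence> elements are never nested, and the sample data is invented for illustration.

```perl
use strict;
use warnings;
use utf8;                              # string literals below are UTF-8 text
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical sample data; the real file layout may differ.
my $text = join "\n",
    "<Sentence id='1'>", "یہ\tPRP", "ہے\tVM", "</Sentence>",
    "<Sentence id='2'>", "کتاب\tNN",          "</Sentence>",
    "<Sentence id='3'>", "وہ\tPRP", "ہے\tVM", "</Sentence>";

my $word = "ہے";

# Step 1: extract every sentence block. /s lets '.' match newlines and
# the non-greedy .*? stops at the first closing tag it meets.
my @sentences = $text =~ /(<Sentence id='\d+'>.*?<\/Sentence>)/gs;

# Step 2: keep only the sentences containing the word, a tab, and the VM tag.
my @matches = grep { /\Q$word\E\tVM/ } @sentences;

print scalar(@sentences), " sentence(s), ", scalar(@matches), " match(es)\n";
print "$_\n\n" for @matches;
```

Splitting the work this way keeps each pattern short, at the cost of holding all sentence blocks in memory before filtering.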