How can I find a substring within a string using Perl?

Question

9

I have a string from which I want to extract a single word with a number appended to it. The number may be different on each line:

This is string1 this is string
This is string11 
This is string6 and it is in this line

I want to parse this file and get the "stringXXX" values, where XXX ranges from 0 to 100.

# suppose ABC.txt contains the above lines
FH1 = open "Abc.txt"; 
@abcFile = <FH1>;

foreach $line(@abcFile) {
    if ($pattern =~ s/string.(d{0}d{100});
        print $pattern;

The code above prints the whole line, but I want to get only stringXXX.

- gagneet

4 Answers

5

Abc.pl:

#!/usr/bin/perl -w
while (<>) {
    # $1 is the whole token (e.g. "string11"), $2 is only the digits
    while (/(string(\d{1,3}))/g) {
        print "$1\n" if $2 <= 100;
    }
}

Example:

$ cat Abc.txt 
This is string1 this is string
This is string11 
This is string6 and it is in this line
string1 asdfa string2
string101 string3 string100 string1000
string9999 string001 string0001

$ perl Abc.pl Abc.txt
string1
string11
string6
string1
string2
string3
string100
string100
string001
string000

$ perl -nE 'say $1 while /(string(?:100|\d{1,2}(?!\d)))/g' Abc.txt
string1
string11
string6
string1
string2
string3
string100
string100

Note the differences between the two outputs. Which one is preferable depends on your needs.

- J.F. Sebastian

-1

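Don't over-specify. Just use (\d+) to capture the numeric part. It captures a number of any length, so you are covered when whoever supplies this file decides to extend the range to 999. It is simpler to write now and simpler to maintain later.

Be strict in what you emit, but liberal in what you accept.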
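
A minimal sketch of this looser approach, assuming the 0 to 100 limit still matters and is enforced as a numeric check after the match instead of inside the regex. The helper name `extract_strings` and its `$max` parameter are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every "stringNNN" token in $line whose
# numeric suffix is at most $max. The regex accepts any number of digits;
# the range limit is applied afterwards as a plain numeric comparison.
sub extract_strings {
    my ($line, $max) = @_;
    my @found;
    while ($line =~ /(string(\d+))/g) {
        push @found, $1 if $2 <= $max;
    }
    return @found;
}

# Sample line taken from the example input in the answer above.
print "$_\n" for extract_strings('string101 string3 string100 string1000', 100);
```

Because `\d+` is greedy, `string1000` is read as 1000 and dropped, not truncated to `string100`. Raising the limit later only means changing the number passed as `$max`.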

- skiphoppy

Actually, it depends on the spec you are given. If you are writing a one-off script to capture these numbers, you don't want to use (\d+). - Nathan Fellman

I don't follow, Nathan... why? If I'm only writing a one-off script, I don't want to spend extra time making the regex more complicated. - skiphoppy

-2

Just change print $pattern to print $&, which prints the text that was matched.
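
For reference, a sketch of what that change could look like once the regex is also repaired. The pattern and the sample line here are assumptions, not the asker's code. The second form uses the `/p` modifier with `${^MATCH}`, available in Perl 5.10 and later, which yields the same text without referring to `$&`:

```perl
use strict;
use warnings;

my $line = 'This is string6 and it is in this line';   # sample input

# $& holds the text matched by the last successful match, so no
# capture group is needed to print just "string6".
if ($line =~ /string\d+/) {
    print $&, "\n";
}

# Same result using /p and ${^MATCH} (Perl 5.10+).
if ($line =~ /string\d+/p) {
    print ${^MATCH}, "\n";
}
```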

- ididak

Also, $& has a negative effect on the performance of the whole program. See http://search.cpan.org/perldoc?Devel::SawAmpersand. - mpeters

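Yes, the regex is wrong, but using $& is the shortest code that prints the correct result.
This is not library code, and the performance impact is the same as using $1.
The global PL_sawampersand hack is an internal implementation issue in perl and should be fixed in perl.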

- ididak

- Nathan Fellman · Accepted Answer

You need to capture it:

while ($pattern =~/(string(100|\d{1,2}))/g) {
    print $1;
}

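Explanation:

The parentheses capture whatever is inside them into $1. If you have more than one set of parentheses, the first set is captured into $1, the second into $2, and so on. In this example, $2 will contain the actual number.
\d{1,2} matches one or two digits, which covers the numbers 0 through 99. The extra 100 is there to match the number 100 explicitly, since it is the only three-digit number you want to match.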
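
To make the capture groups concrete, here is a small sketch that prints both `$1` and `$2` for each match. The sample line is made up, and the trailing `(?!\d)` is an addition that is not part of the pattern above. It is included on the assumption that a value such as `string101` should be rejected outright, since without it the pattern would match the `string10` prefix:

```perl
use strict;
use warnings;

my $line = 'string7 string42 string100 string101';   # hypothetical input

# Group 1 is the whole token, group 2 is only the number.
# (?!\d) refuses a match that is immediately followed by another digit.
while ($line =~ /(string(100|\d{1,2})(?!\d))/g) {
    print "$1 => $2\n";
}
```

With this input the loop reports `string7`, `string42` and `string100`, and skips `string101`.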

Edit: fixed the order in which the numbers are captured.