如果我有很多匹配项,比如在多行模式下,我想用匹配的一部分和一个递增的计数器号码来替换它们。
我想知道是否有任何正则表达式格式支持这种变量。我找不到这样的格式,但好像记得类似的格式存在...
我所说的不是脚本语言,在那些语言中你可以使用回调函数进行替换。这是关于能否在工具中进行这样的操作,例如RegexBuddy、sublime text、gskinner.com/RegExr等,就像你可以使用 \1 或 $1 引用捕获的子字符串一样。
如果我有很多匹配项,比如在多行模式下,我想用匹配的一部分和一个递增的计数器号码来替换它们。
我想知道是否有任何正则表达式格式支持这种变量。我找不到这样的格式,但好像记得类似的格式存在...
我所说的不是脚本语言,在那些语言中你可以使用回调函数进行替换。这是关于能否在工具中进行这样的操作,例如RegexBuddy、sublime text、gskinner.com/RegExr等,就像你可以使用 \1 或 $1 引用捕获的子字符串一样。
好的,我将从简单到复杂地讲解。享受吧!
考虑以下内容:
#!/usr/bin/perl
$_ = <<"End_of_G&S";
This particularly rapid,
unintelligible patter
isn't generally heard,
and if it is it doesn't matter!
End_of_G&S
my $count = 0;
那么这个:
s{
\b ( [\w']+ ) \b
}{
sprintf "(%s)[%d]", $1, ++$count;
}gsex;
生成这个
(This)[1] (particularly)[2] (rapid)[3],
(unintelligible)[4] (patter)[5]
(isn't)[6] (generally)[7] (heard)[8],
(and)[9] (if)[10] (it)[11] (is)[12] (it)[13] (doesn't)[14] (matter)[15]!
相比之下:
s/\b([\w']+)\b/#@{[++$count]}=$1/g;
生成以下内容:
#1=This #2=particularly #3=rapid,
#4=unintelligible #5=patter
#6=isn't #7=generally #8=heard,
#9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
这将把增量放在匹配本身内部:
s/ \b ( [\w']+ ) \b (?{ $count++ }) /#$count=$1/gx;
#1=This #2=particularly #3=rapid,
#4=unintelligible #5=patter
#6=isn't #7=generally #8=heard,
#9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
这个
s{ \b ( [\w'] + ) \b }
{ join " " => ($1) x ++$count }gsex;
This particularly particularly rapid rapid rapid,
unintelligible unintelligible unintelligible unintelligible patter patter patter patter patter
isn't isn't isn't isn't isn't isn't generally generally generally generally generally generally generally heard heard heard heard heard heard heard heard,
and and and and and and and and and if if if if if if if if if if it it it it it it it it it it it is is is is is is is is is is is is it it it it it it it it it it it it it doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't matter matter matter matter matter matter matter matter matter matter matter matter matter matter matter!
有更强大的方法来处理复数所有格(之前的方法不适用),但我怀疑您的问题在于让 ++$count
起作用,而不是涉及 \b
行为的微妙差别。
我真的希望人们理解 \b
不是他们想象中的那样。他们总是认为它意味着那里有空格或字符串的边缘。他们从未想过它作为 \w\W
或 \W\w
的转换。
# same as using a \b before:
(?(?=\w) (?<!\w) | (?<!\W) )
# same as using a \b after:
(?(?<=\w) (?!\w) | (?!\W) )
如您所见,它的条件性取决于它所接触的内容。这就是(?(COND)THEN|ELSE)
子句的作用。
当涉及到以下内容时,这会成为一个问题:
$_ = qq('Tis Paul's parents' summer-house, isn't it?\n);
my $count = 0;
s{
(?(?=[\-\w']) (?<![\-\w']) | (?<![^\-\w']) )
( [\-\w'] + )
(?(?<=[\-\w']) (?![\-\w']) | (?![^\-\w']) )
}{
sprintf "(%s)[%d]", $1, ++$count
}gsex;
print;
正确打印的代码
('Tis)[1] (Paul's)[2] (parents')[3] (summer-house)[4], (isn't)[5] (it)[6]?
上世纪60年代的ASCII编码已经过时了50年。就像每当你看到有人写[a-z]
时,它几乎总是错误的一样,事实证明,破折号和引号在模式中也不应该显示为字面值。顺便说一下,你可能不想使用\w
,因为它还包括数字和下划线,而不仅仅是字母。
想象一下这个字符串:
$_ = qq(\x{2019}Tis Ren\x{E9}e\x{2019}s great\x{2010}grandparents\x{2019} summer\x{2010}house, isn\x{2019}t it?\n);
你可以通过使用use utf8
来将其表示为文字:
use utf8;
$_ = qq(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?\n);
#!/usr/bin/perl -l
use 5.10.0;
use utf8;
use open qw< :std :utf8 >;
use strict;
use warnings qw< FATAL all >;
use autodie;
$_ = q(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?);
my $count = 0;
s{ (?<WORD> (?&full_word) )
# the rest is just definition
(?(DEFINE)
(?<word_char> [\p{Alphabetic}\p{Quotation_Mark}] )
(?<full_word>
# next line won't compile cause
# fears variable-width lookbehind
#### (?<! (?&word_char) ) )
# so must inline it
(?<! [\p{Alphabetic}\p{Quotation_Mark}] )
(?&word_char)
(?:
\p{Dash}
| (?&word_char)
) *
(?! (?&word_char) )
)
) # end DEFINE declaration block
}{
sprintf "(%s)[%d]", $+{WORD}, ++$count;
}gsex;
print;
当运行该代码时,会产生以下结果:
(’Tis)[1] (Renée’s)[2] (great‐grandparents’)[3] (summer‐house)[4], (isn’t)[5] (it)[6]?
好的,那可能是关于复杂正则表达式的文本,但你难道不高兴问吗?☺
就我所知,在普通的正则表达式中没有这个功能。
另一方面,有几个工具将其作为扩展提供,例如grepWin。在该工具的帮助文档中(按下F1):
在内部,它使用Boost的Perl正则表达式引擎,但${count}
是在其中实现的(与其他扩展一样)。
var i=0; "foobar".replace(/o/g, function(match) { return match+"("+(i++)+")";})
。 - Gumbo