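I'm trying to find a regular expression that matches URLs with three or more repeating sections (a section can include any number of directories), for example: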
s1 = 'http://www.foo.com/bar/bar/bar/'
s2 = 'http://www.foo.com/baz/biz/baz/biz/baz/biz/etc'
s3 = '/foo/bar/foo/bar/foo/bar/'
and that does not match URLs such as:
s4 = '/foo/bar/foo/bar/foo/barbaz'
First, I tried the following regex:
re1 = /((.+\/)+)\1\1/
This works:
re1 === s1 #=> true
re1 === s2 #=> true
However, as the number of segments grows, the time the regex takes to match increases exponentially:
require 'benchmark'
Benchmark.bm do |b|
  (10..15).each do |num|
    str = '/foo/bar' * num
    puts str
    b.report("#{num} repeats:") { /((.+\/)+)\1\1/ === str }
  end
end
user system total real
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
10 repeats: 0.060000 0.000000 0.060000 ( 0.054839)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
11 repeats: 0.210000 0.000000 0.210000 ( 0.213492)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
12 repeats: 0.870000 0.000000 0.870000 ( 0.871879)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
13 repeats: 3.370000 0.010000 3.380000 ( 3.399224)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
14 repeats: 13.580000 0.110000 13.690000 ( 13.790675)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
15 repeats: 54.090000 0.210000 54.300000 ( 54.562672)
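To put a number on that growth, here is a small sketch that computes the ratio between consecutive "real" timings from the table above. The variable names are only illustrative, and the timings are copied from this one run, so treat the result as a rough indication rather than a measurement.

```ruby
# Wall-clock ("real") timings copied from the benchmark output above,
# keyed by the number of '/foo/bar' repeats.
timings = {
  10 => 0.054839,
  11 => 0.213492,
  12 => 0.871879,
  13 => 3.399224,
  14 => 13.790675,
  15 => 54.562672
}

# Ratio of each timing to the previous one.
ratios = timings.values.each_cons(2).map { |prev, curr| (curr / prev).round(2) }

ratios.each_with_index do |ratio, i|
  puts "#{timings.keys[i]} -> #{timings.keys[i + 1]} repeats: x#{ratio}"
end
```

Every ratio comes out close to 4, so each extra '/foo/bar' (two path segments) roughly quadruples the matching time, which is consistent with exponential backtracking.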
Then I tried a regex similar to the one given here:
re2 = /(\/.+)(?=.*\1)\1\1/
It performs well and matches the strings I want:
re2 === s3 #=> true
But it also matches strings that I don't want it to match, such as:
re2 === s4 #=> true, but should be false
I'm close with the second regex. What am I missing?
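A minimal sketch of one possible direction, not a confirmed answer: s4 slips through because nothing forces the third repetition to end on a segment boundary, so '/foo/bar' can match the start of '/foo/barbaz'. The hypothetical re3 below assumes it is enough to (a) build the repeated group only from whole '/segment' pieces and (b) require a '/' or the end of the string right after the third copy. I have not benchmarked it.

```ruby
s1 = 'http://www.foo.com/bar/bar/bar/'
s2 = 'http://www.foo.com/baz/biz/baz/biz/baz/biz/etc'
s3 = '/foo/bar/foo/bar/foo/bar/'
s4 = '/foo/bar/foo/bar/foo/barbaz'

# re3 is a hypothetical name, not from the original post.
# ((?:\/[^\/]+)+)  one or more whole "/segment" pieces, captured as group 1
# \1\1             the same text two more times, back to back
# (?=\/|\z)        the third copy must be followed by "/" or end of string
re3 = /((?:\/[^\/]+)+)\1\1(?=\/|\z)/

results = [s1, s2, s3, s4].map { |s| re3 === s }
p results
```

Because `[^\/]+` cannot cross a slash, there is only one way to split a given stretch of text into segments, which should avoid the nested `.+` ambiguity behind the slowdown in re1.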