Using `uniq` from `List::Util`, this can be done in a single line:
my $newpaths = join ';', uniq split /;/, $paths;
How does it work? `split` creates a list of paths, splitting around the `;` character; `uniq` makes sure there are no duplicates; `join` builds a single `;`-separated string of paths again.
If the case of the paths does not matter:
my $newpaths = join ';', uniq split /;/, lc $paths;
A complete program might be:
use strict;
use warnings;
use List::Util qw( uniq );
my $paths = 'C:\Users\user\Desktop\TESTING\path1;C:\Users\user\Desktop\TESTING\path5;C:\Users\user\Desktop\TESTING\path1;C:\Users\user\Desktop\TESTING\path6;C:\Users\user\Desktop\TESTING\path1;C:\Users\user\Desktop\TESTING\path3;C:\Users\user\Desktop\TESTING\path1;C:\Users\user\Desktop\TESTING\path3;';
my $newpaths = join ';', uniq split /;/, $paths;
print $newpaths, "\n";
To make things interesting, let's compare the timing of this solution against the suggested one that uses a temporary hash. Here is the timing program:
use strict;
use warnings;
use List::Util qw( uniq );
use Time::HiRes qw( time );
my @p;
for( my $i = 0; $i < 1000000; $i++ ) {
push @p, 'C:\This\is\a\random\path' . int(rand(250000));
}
my $paths = join ';', @p;
my $t = time();
my $newpaths = join ';', uniq split /;/, $paths;
$t = time() - $t;
print 'Time with uniq: ', $t, "\n";
$t = time();
my %temp = map { $_ => 1 } split /;/, $paths;
$newpaths = join ';', keys %temp;
$t = time() - $t;
print 'Time with temporary hash: ', $t, "\n";
It generates one million random paths with roughly a 4:1 duplication ratio (each distinct path appears about four times on average, since 1,000,000 paths are drawn from 250,000 possible values). The timings on the server I tested were:
Time with uniq: 0.849196910858154
Time with temporary hash: 1.29486703872681
That makes `uniq` faster than the temporary hash. With a 100:1 duplication ratio:
Time with uniq: 0.526581048965454
Time with temporary hash: 0.823433876037598
And with a 10000:1 duplication ratio:
Time with uniq: 0.423808097839355
Time with temporary hash: 0.736939907073975
Both algorithms do less work the more duplicates they find, and `uniq` keeps its edge as the number of duplicates grows.
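One difference the timing code glosses over: `keys %temp` returns the paths in arbitrary hash order, while `uniq` preserves the input order. If you want a hash-based solution that keeps the original order, the usual idiom is a `%seen` filter with `grep`. A minimal sketch, using a made-up input string for illustration:

```perl
use strict;
use warnings;

# Hypothetical input for illustration; 'b' appears twice.
my $paths = 'b;a;b;c';

# %seen records paths already emitted; the grep keeps only the
# first occurrence of each path, so the input order is preserved.
my %seen;
my $ordered = join ';', grep { !$seen{$_}++ } split /;/, $paths;

print $ordered, "\n";    # b;a;c
```

This is essentially what `uniq` does internally, so for order-sensitive output the two approaches become equivalent in behavior.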
Feel free to experiment with the numbers fed to the random generator.