Performance difference between while(each) and foreach in Perl

3

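The hash I tested contains about 70,000 colleges, each with about 20 students. I ran each version 5 times; the results are below. There is a considerable performance difference between the foreach loop and the while(each) loop. Why is that?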

Code using the while loop:

while ( my ($college_code, $college_info_hr) = each (%{$college_data_hr}) ) {
    while ( my ($student_num, $student_info_hr) = each (%{$college_info_hr->{'students'}}) ) {
        if($student_num < 104000) { ## Delete the info of students before 2004.
            delete $college_info_hr->{'students'}{$student_num};
        }
    }
}

Code using the foreach loop:

foreach my $college_code (keys %{$college_data_hr}) {
    foreach my $student_num (keys %{$college_data_hr->{$college_code}{'students'}}) {
        if($student_num < 104000) { ## Delete the info of students before 2004.
            delete $college_data_hr->{$college_code}{'students'}{$student_num};
        }
    }
}

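With 70,000 colleges, the execution times were as follows.

Code using the while loop (elapsed time in seconds):

Elapsed time: 2.186621
Elapsed time: 2.058644
Elapsed time: 2.055645
Elapsed time: 2.101637
Elapsed time: 2.124632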

Code using the foreach loop (elapsed time in seconds):

Elapsed time: 1.341768
Elapsed time: 1.436751
Elapsed time: 1.346529
Elapsed time: 1.302775
Elapsed time: 1.356765

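With 248,000 colleges, the execution times were as follows.

Code using the while loop (elapsed time in seconds):

Elapsed time: 9.084427
Elapsed time: 8.438684
Elapsed time: 9.329338
Elapsed time: 9.169687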

Code using the foreach loop (elapsed time in seconds):

Elapsed time: 5.502048
Elapsed time: 6.386692
Elapsed time: 5.596032
Elapsed time: 5.620144


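More important than the execution time itself is how it changes with input size. Test the same code with double and triple the input size and watch how the time grows. - perreal
@mob, sorry. That was a mistake. It has been edited. - vinod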
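First, the while/each code generates far more temporary values than the foreach code. I can't see where $student_info_hr is used, yet the while/each code still populates it. A tool like Devel::NYTProf can help you break down the time spent line by line. - Joe Z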
Also, I don't know what effect delete has on the iterator that each generates. In the foreach loop, keys runs once up front to build the list of keys. - Joe Z
@perreal, I've updated the post to include the execution times. - vinod
3 Answers

7
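The foreach version dereferences the $college_data_hr->{$college_code}{'students'} hashref only once per college when building the key list, so it is quicker than the while version, which has to dereference it for every student. The foreach version will probably use more memory, because it needs to build temporary lists holding the keys of each hash. Data::Alias could help you speed up the while solution. I haven't benchmarked this, but it should be pretty quick: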
use Data::Alias;

while ( my ($college_code, $college_info_hr) = each %$college_data_hr ) {
    alias ( my %students = %{$college_info_hr->{'students'}} );
    while ( my ($student_num, $student_info_hr) = each %students ) {
        if ($student_num < 104000) { ## Delete the info of students before 2004.
            delete $students{$student_num};
        }
    }
}

By the way, the inner loop can be rewritten as delete @students{grep $_ < 104000, keys %students}. So all of this code can be reduced to for my $ci (values %$college_data_hr){ my $s = $ci->{students}; delete @$s{grep $_<104000, keys %$s}; } - Hynek -Pichi- Vychodil
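
The slice form suggested in the comment can be checked against the explicit loop on a small hash. This is only a sketch: the sample student numbers and the helper names prune_loop and prune_slice are made up for illustration and do not appear in the original post.

```perl
use strict;
use warnings;

# Hypothetical sample data: 20 student numbers straddling the cutoff.
my %template = map { $_ => undef } 103_991 .. 104_010;

# Explicit loop, mirroring the inner foreach from the question.
sub prune_loop {
    my ($students) = @_;
    for my $student_num (keys %$students) {
        delete $students->{$student_num} if $student_num < 104_000;
    }
    return $students;
}

# One hash-slice delete, as suggested in the comment.
sub prune_slice {
    my ($students) = @_;
    delete @$students{ grep $_ < 104_000, keys %$students };
    return $students;
}

# Each helper gets its own shallow copy of the sample data.
my $by_loop  = prune_loop(  { %template } );
my $by_slice = prune_slice( { %template } );

my $loop_keys  = join ',', sort { $a <=> $b } keys %$by_loop;
my $slice_keys = join ',', sort { $a <=> $b } keys %$by_slice;

print "same result\n" if $loop_keys eq $slice_keys;
print scalar(keys %$by_slice), " students remain\n";   # 11 students remain
```

Both helpers remove the nine keys below 104000, so the slice is a drop-in replacement for the loop here; whether it is faster on real data is what the benchmark further down measures.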

2

A number of ops have to be executed on every pass through the while loop (everything below except enter and leave applies to your code).

>perl -MO=Concise,-exec -e"my ($college_code, $college_info_hr) = each (%{$college_data_hr})"
1  <0> enter
2  <;> nextstate(main 2 -e:1) v:{
3  <0> pushmark s
4  <#> gv[*college_data_hr] s
5  <1> rv2sv sKM/DREFHV,1
6  <1> rv2hv[t4] lKRM/1
7  <1> each lK/1
8  <0> pushmark sRM*/128
9  <0> padsv[$college_code:2,3] lRM*/LVINTRO
a  <0> padsv[$college_info_hr:2,3] lRM*/LVINTRO
b  <2> aassign[t5] vKS
c  <@> leave[1 ref] vKP/REFC
-e syntax OK

That includes copying the values into $college_code and $college_info_hr. The good thing is that they are not strings.

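Your foreach loop does nothing of the sort. Each pass merely changes what $college_code is aliased to, which is very fast. The downside, of course, is that it uses more memory.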
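
The copy-versus-alias distinction can be seen directly on a tiny hash. A minimal sketch, with %score as a made-up example hash: the list assignment from each copies the value into a fresh variable, while foreach aliases its loop variable to the list elements (and values returns the hash's actual values, so changes show up in the hash).

```perl
use strict;
use warnings;

my %score = (a => 1, b => 2);

# each() hands back copies: changing $v leaves the hash untouched.
while (my ($k, $v) = each %score) {
    $v *= 10;
}
my $after_each = join ',', @score{qw(a b)};

# foreach aliases $v to each element of the list, and values() yields
# the hash's own values, so this loop modifies the hash in place.
for my $v (values %score) {
    $v *= 10;
}
my $after_foreach = join ',', @score{qw(a b)};

print "$after_each\n";      # 1,2
print "$after_foreach\n";   # 10,20
```

This only illustrates the semantics; it says nothing by itself about how large the speed difference is.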


An alternative:

for my $college_code (keys %$college_data_hr) {
    my $students = $college_data_hr->{$college_code}{students};
    delete @$students{ grep $_ < 104000, keys %$students };
}

@Hynek -Pichi- Vychodil, I hadn't noticed that you had also mentioned the delete slice! - ikegami

2
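The problem is that Perl cannot perform many of the optimizations that are common in compiled languages. As ikegami pointed out, on every pass through the while loop you copy data out of the hash, and you also do many unnecessary hash lookups. Here is some benchmark code you can test and tinker with.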
#!/usr/bin/env perl

use 5.10.0;
use strict;
use warnings;
use Benchmark qw(:hireswallclock :all);
use Clone qw(clone);

my $data = {
    map +( $_ => { students => { map +( $_ => undef ), 103991 .. 104010 } } ),
    1 .. 70000
};
my $college_data_hr;

sub sum_time {
    my $t = shift;
    $t = timesum( $t, $_ ) for @_;
    return $t;
}

sub my_cmp_these {
    my %bench = @_;
    my %times;
    for ( 1 .. 10 ) {
        push @{ $times{$_} }, do {
            $college_data_hr = clone($data);
            timeit( 1, $bench{$_} );
            }
            for keys %bench;
    }
    $_ = sum_time(@$_) for values %times;
    cmpthese( \%times );
}

my_cmp_these(
    orig_while => sub {
        while ( my ( $college_code, $college_info_hr )
            = each( %{$college_data_hr} ) )
        {
            while ( my ( $student_num, $student_info_hr )
                = each( %{ $college_info_hr->{'students'} } ) )
            {
                if ( $student_num < 104000 )
                {    ## Delete the info of students before 2004.
                    delete $college_info_hr->{'students'}{$student_num};
                }
            }
        }
    },
    new_while => sub {
        while ( my ( undef, $college_info_hr ) = each( %{$college_data_hr} ) )
        {
            my $s = $college_info_hr->{'students'};
            while ( my ( $student_num, undef ) = each(%$s) ) {
                if ( $student_num < 104000 )
                {    ## Delete the info of students before 2004.
                    delete $s->{$student_num};
                }
            }
        }
    },
    orig_foreach => sub {
        foreach my $college_code ( keys %$college_data_hr ) {
            foreach my $student_num (
                keys %{ $college_data_hr->{$college_code}{'students'} } )
            {
                if ( $student_num < 104000 )
                {    ## Delete the info of students before 2004.
                    delete $college_data_hr->{$college_code}{'students'}
                        {$student_num};
                }
            }
        }
    },
    new_foreach => sub {
        foreach my $college_info ( values %$college_data_hr ) {
            my $students = $college_info->{'students'};
            delete @$students{ grep $_ < 104000, keys %$students };
        }
    },
    ikegami_foreach => sub {
        for my $college_code ( keys %$college_data_hr ) {
            my $students = $college_data_hr->{$college_code}{students};
            delete @$students{ grep $_ < 104000, keys %$students };
        }
    },
);

Results on my laptop:

                s/iter orig_while new_while orig_foreach ikegami_foreach new_foreach
orig_while        1.56         --      -25%         -31%            -35%        -40%
new_while         1.17        33%        --          -8%            -14%        -21%
orig_foreach      1.08        44%        8%           --             -6%        -14%
ikegami_foreach   1.01        54%       16%           7%              --         -8%
new_foreach      0.927        68%       26%          16%              9%          --

Results for 248,000 colleges:

                s/iter orig_while new_while orig_foreach ikegami_foreach new_foreach
orig_while        6.19         --      -27%         -30%            -33%        -38%
new_while         4.54        36%        --          -5%             -8%        -16%
orig_foreach      4.31        44%        5%           --             -4%        -11%
ikegami_foreach   4.16        49%        9%           4%              --         -8%
new_foreach       3.83        62%       19%          13%              9%          --
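
Following perreal's comment about growth with input size, the two tables can be compared directly. The sketch below only does arithmetic on the s/iter figures copied from the tables above; the hash names %small and %large are made up for illustration.

```perl
use strict;
use warnings;

# s/iter values copied from the 70,000-college table.
my %small = (
    orig_while      => 1.56,
    new_while       => 1.17,
    orig_foreach    => 1.08,
    ikegami_foreach => 1.01,
    new_foreach     => 0.927,
);

# s/iter values copied from the 248,000-college table.
my %large = (
    orig_while      => 6.19,
    new_while       => 4.54,
    orig_foreach    => 4.31,
    ikegami_foreach => 4.16,
    new_foreach     => 3.83,
);

my $size_ratio = 248_000 / 70_000;
printf "input grew by a factor of %.2f\n", $size_ratio;

# Growth factor of the run time for each variant, slowest variant first.
my %growth = map { $_ => $large{$_} / $small{$_} } keys %small;
for my $name (sort { $small{$b} <=> $small{$a} } keys %small) {
    printf "%-16s time grew by a factor of %.2f\n", $name, $growth{$name};
}
```

On these figures the input grows by about 3.5x and every variant's time grows by roughly 4x, so the ranking of the variants stays the same at both sizes.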
