Array bounds check optimization in a for loop

4
        var ar = new int[500000000];

        var sw = new Stopwatch();
        sw.Start();

        var length = ar.Length;
        for (var i = 0; i < length; i++)
        {
            if (ar[i] == 0);
        }

        sw.Stop();

sw.ElapsedMilliseconds: ~2930ms

        var ar = new int[500000000];

        var sw = new Stopwatch();
        sw.Start();

        for (var i = 0; i < ar.Length; i++)
        {
            if (ar[i] == 0);
        }

        sw.Stop();

sw.ElapsedMilliseconds: ~3520ms

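Win8 x64, VS12, .NET 4.5, Release build, "Optimize code" on.

As far as I know, the second approach should be faster because of the array bounds check optimization. Am I missing something?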


Try iterating backwards; that will be faster because it doesn't even need to compare against the array length, it can just branch if not zero. - Ian Mercer
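For reference, a minimal sketch of the backwards loop the comment above suggests. The class and method names are hypothetical, and the loop counts zeros (rather than using an empty `if`) only so the result can be printed. Whether this is actually faster on a given JIT is the commenter's claim, not something this sketch demonstrates.

```csharp
using System;

static class ReverseLoop
{
    // Hypothetical helper: walks the array from the last element down to index 0.
    static int CountZerosBackwards(int[] ar)
    {
        int zeros = 0;
        // The loop test compares i against 0 instead of against ar.Length.
        for (int i = ar.Length - 1; i >= 0; i--)
        {
            if (ar[i] == 0) zeros++;
        }
        return zeros;
    }

    static void Main()
    {
        var ar = new int[] { 0, 5, 0, 7 };
        Console.WriteLine(CountZerosBackwards(ar)); // prints 2
    }
}
```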
4 Answers

6

I'm also on Win8 x64, .NET 4.5, Release build, and not running in the debugger (this is important); I get:

0: 813ms vs 421ms
1: 439ms vs 420ms
2: 440ms vs 420ms
3: 431ms vs 429ms
4: 433ms vs 427ms
5: 424ms vs 437ms
6: 427ms vs 434ms
7: 430ms vs 432ms
8: 432ms vs 435ms
9: 430ms vs 430ms
10: 427ms vs 418ms
11: 422ms vs 421ms
12: 434ms vs 420ms
13: 439ms vs 425ms
14: 426ms vs 429ms
15: 426ms vs 426ms
16: 417ms vs 432ms
17: 442ms vs 425ms
18: 420ms vs 429ms
19: 420ms vs 422ms

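The first iteration pays the JIT / "fusion" costs, but overall the two are about the same (some entries in each column look faster, but overall there is not much of a difference).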
using System;
using System.Diagnostics;
static class Program
{
    static void Main()
    {
        var ar = new int[500000000];

        for (int j = 0; j < 20; j++)
        {
            var sw = Stopwatch.StartNew();
            var length = ar.Length;
            for (var i = 0; i < length; i++)
            {
                if (ar[i] == 0) ;
            }

            sw.Stop();
            long hoisted = sw.ElapsedMilliseconds;

            sw = Stopwatch.StartNew();
            for (var i = 0; i < ar.Length; i++)
            {
                if (ar[i] == 0) ;
            }
            sw.Stop();
            long direct = sw.ElapsedMilliseconds;

            Console.WriteLine("{0}: {1}ms vs {2}ms", j, hoisted, direct);
        }
    }
}

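I got 627ms and 352ms respectively, so very similar results. - Matthew Watson
Note: some additional runs showed quite a lot of variation in terms of "which one is faster"; basically, the conclusion we should draw is that they are "about the same": neither suffers a significant penalty for its implementation choice. - Marc Gravell
It looks like "outside the debugger" was the key for me; now I get similar results. Thanks! - aush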
1
@aush Yes, that trips up a lot of people. You can't measure performance with the debugger attached - it doesn't work. - Marc Gravell
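As a small illustration of the point in these comments, a benchmark can check for an attached debugger before reporting numbers. This is only a sketch: `BenchmarkGuard` and `LooksReliable` are hypothetical names, while `Debugger.IsAttached` and the `DEBUG` compilation symbol are standard .NET facilities.

```csharp
using System;
using System.Diagnostics;

static class BenchmarkGuard
{
    // Hypothetical helper (not from the original posts): warns when timings are likely skewed.
    public static bool LooksReliable()
    {
        bool ok = true;
        if (Debugger.IsAttached)
        {
            // With a debugger attached, JIT optimizations may be suppressed.
            Console.WriteLine("Warning: debugger attached; timings are not representative.");
            ok = false;
        }
#if DEBUG
        // DEBUG is normally defined only in Debug builds.
        Console.WriteLine("Warning: Debug build; measure a Release build instead.");
        ok = false;
#endif
        return ok;
    }

    static void Main()
    {
        Console.WriteLine("Timings trustworthy: " + LooksReliable());
    }
}
```

Running the Release executable directly (or with Ctrl+F5 in Visual Studio) avoids both warnings.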

5
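I investigated this a bit further and found that it is very hard to build a benchmark that actually shows the effect of the bounds check elimination optimization.
First, the old benchmark has some problems:
- The disassembly shows that the JIT compiler was able to optimize the first version as well. That surprised me, but the disassembly doesn't lie. Of course, this completely defeats the purpose of the benchmark. Fix: take the array length as a function argument.
- The array is too big, which means cache misses, and those add a lot of noise to our signal. Fix: use a short array, but loop over it many times.
But now the real problem: the JIT is being overly clever. There is no array bounds check in the inner loop, even when the loop length comes from a function argument. The generated code differs, but the inner loop is essentially the same. Not exactly the same (different registers and so on), but it follows the same pattern: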
_loop: mov eax, [somewhere + index]
       add index, 4
       cmp index, end
       jl _loop

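There is no significant difference in execution time, because there is no significant difference in the most important part of the generated code.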
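The two fixes described in this answer (loop limit passed as a function argument, and a short array traversed many times) could be sketched roughly as follows. This is not the answerer's actual benchmark: the method names, the array size of 1024, the pass count, and the use of a running sum (so the loop has an observable result) are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class BoundsCheckBench
{
    // The loop limit arrives as a parameter, so it is not syntactically ar.Length.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int SumWithParameterLimit(int[] ar, int length)
    {
        int sum = 0;
        for (int i = 0; i < length; i++)
            sum += ar[i];
        return sum;
    }

    // The loop limit is ar.Length itself, the pattern the JIT is known to recognize.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int SumWithArrayLength(int[] ar)
    {
        int sum = 0;
        for (int i = 0; i < ar.Length; i++)
            sum += ar[i];
        return sum;
    }

    static void Main()
    {
        var ar = new int[1024];        // short array, intended to stay in cache
        const int passes = 1000000;    // many passes instead of one huge array
        int sink = 0;                  // keeps the results observable

        // Warm-up calls so JIT compilation is not part of the timed region.
        sink += SumWithParameterLimit(ar, ar.Length);
        sink += SumWithArrayLength(ar);

        var sw = Stopwatch.StartNew();
        for (int p = 0; p < passes; p++) sink += SumWithParameterLimit(ar, ar.Length);
        sw.Stop();
        long parameterMs = sw.ElapsedMilliseconds;

        sw = Stopwatch.StartNew();
        for (int p = 0; p < passes; p++) sink += SumWithArrayLength(ar);
        sw.Stop();
        long lengthMs = sw.ElapsedMilliseconds;

        Console.WriteLine("parameter limit: {0}ms, ar.Length: {1}ms (sink={2})",
            parameterMs, lengthMs, sink);
    }
}
```

Per the answer above, the two timings should be expected to come out close to each other, since the inner loops end up following the same pattern.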


1
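I think the answer is that the garbage collector is running and altering your timings.

Disclaimer: I can't see the whole context of the OP's code because you didn't post a compilable example; I'm assuming that you are reallocating the array rather than reusing it. If that's not the case, then this is not the right answer!

Consider the following code: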

using System;
using System.Diagnostics;

namespace Demo
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            var ar = new int[500000000];
            test1(ar);
            //ar = new int[500000000]; // Uncomment this line.
            test2(ar);
        }

        private static void test1(int[] ar)
        {
            var sw = new Stopwatch();
            sw.Start();

            var length = ar.Length;
            for (var i = 0; i < length; i++)
            {
                if (ar[i] == 0);
            }

            sw.Stop();                
            Console.WriteLine("test1 took " + sw.Elapsed);
        }

        private static void test2(int[] ar)
        {
            var sw = new Stopwatch();
            sw.Start();

            for (var i = 0; i < ar.Length; i++)
            {
                if (ar[i] == 0);
            }

            sw.Stop();
            Console.WriteLine("test2 took " + sw.Elapsed);
        }
    }
}

On my system, it prints:

test1 took 00:00:00.6643788
test2 took 00:00:00.3516378

If I uncomment the line marked // Uncomment this line., the timings change to:
test1 took 00:00:00.6615819
test2 took 00:00:00.6806489

This is because the garbage collector is collecting the previous array.
[EDIT] To avoid the JIT startup costs, I put the whole test into a loop:
for (int i = 0; i < 8; ++i)
{
    test1(ar);
    ar = new int[500000000]; // Uncomment this line.
    test2(ar);
}

My results with the second array allocation commented out are:

test1 took 00:00:00.6437912
test2 took 00:00:00.3534027
test1 took 00:00:00.3401437
test2 took 00:00:00.3486296
test1 took 00:00:00.3470775
test2 took 00:00:00.3675475
test1 took 00:00:00.3501221
test2 took 00:00:00.3549338
test1 took 00:00:00.3427057
test2 took 00:00:00.3574063
test1 took 00:00:00.3566458
test2 took 00:00:00.3462722
test1 took 00:00:00.3430952
test2 took 00:00:00.3464017
test1 took 00:00:00.3449196
test2 took 00:00:00.3438316

With the second array allocation enabled:

test1 took 00:00:00.6572665
test2 took 00:00:00.6565778
test1 took 00:00:00.3576911
test2 took 00:00:00.6910897
test1 took 00:00:00.3464013
test2 took 00:00:00.6638542
test1 took 00:00:00.3548638
test2 took 00:00:00.6897472
test1 took 00:00:00.4464020
test2 took 00:00:00.7739877
test1 took 00:00:00.3835624
test2 took 00:00:00.8432918
test1 took 00:00:00.3496910
test2 took 00:00:00.6471341
test1 took 00:00:00.3486505
test2 took 00:00:00.6527160

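Note that test2 usually takes longer because of the garbage collection.
Unfortunately, the garbage collection makes the timing results pretty much meaningless.
For example, if I change the test code to the following: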
for (int i = 0; i < 8; ++i)
{
    var ar = new int[500000000];
    GC.Collect();
    test1(ar);
    //ar = new int[500000000]; // Uncomment this line.
    test2(ar);
}

With the line commented out, I get:

test1 took 00:00:00.6354278
test2 took 00:00:00.3464486
test1 took 00:00:00.6672933
test2 took 00:00:00.3413958
test1 took 00:00:00.6724916
test2 took 00:00:00.3530412
test1 took 00:00:00.6606178
test2 took 00:00:00.3413083
test1 took 00:00:00.6439316
test2 took 00:00:00.3404499
test1 took 00:00:00.6559153
test2 took 00:00:00.3413563
test1 took 00:00:00.6955377
test2 took 00:00:00.3364670
test1 took 00:00:00.6580798
test2 took 00:00:00.3378203

And with it uncommented:

test1 took 00:00:00.6340203
test2 took 00:00:00.6276153
test1 took 00:00:00.6813719
test2 took 00:00:00.6264782
test1 took 00:00:00.6927222
test2 took 00:00:00.6269447
test1 took 00:00:00.7010559
test2 took 00:00:00.6262000
test1 took 00:00:00.6975080
test2 took 00:00:00.6457846
test1 took 00:00:00.6796235
test2 took 00:00:00.6341214
test1 took 00:00:00.6823508
test2 took 00:00:00.6455403
test1 took 00:00:00.6856985
test2 took 00:00:00.6430923

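I think the moral of this test is: the GC overhead for this particular test is so large compared with the rest of the code that it completely distorts the timing results, and they can't be trusted to mean anything.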
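One way to check whether a collection actually ran inside a timed region is to compare `GC.CollectionCount` before and after it. The sketch below is an illustration under assumptions, not part of the original answer: `Measure` and `CountZeros` are hypothetical names, the loop counts zeros so it has a result, and the array is much smaller than the original 500,000,000 elements to keep the example light.

```csharp
using System;
using System.Diagnostics;

static class GcAwareTiming
{
    static long CountZeros(int[] ar)
    {
        long zeros = 0;
        for (var i = 0; i < ar.Length; i++)
        {
            if (ar[i] == 0) zeros++;
        }
        return zeros;
    }

    // Hypothetical helper: times one pass and reports gen-2 collections seen during it.
    static void Measure(string name, int[] ar)
    {
        // Settle the heap first so earlier garbage is less likely to be collected mid-measurement.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int gen2Before = GC.CollectionCount(2);
        var sw = Stopwatch.StartNew();
        long zeros = CountZeros(ar);
        sw.Stop();
        int gen2During = GC.CollectionCount(2) - gen2Before;

        Console.WriteLine("{0} took {1}, zeros={2}, gen-2 collections during run: {3}",
            name, sw.Elapsed, zeros, gen2During);
    }

    static void Main()
    {
        var ar = new int[10000000];
        Measure("first array", ar);
        ar = new int[10000000];    // the previous array becomes garbage here
        Measure("fresh array", ar);
    }
}
```

If the reported collection count is zero and the timings still differ, the difference is coming from something other than a collection running inside the measured loop.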

0

You are calling a property (ar.Length) in the second one, so it will be slower.


1
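No, the JIT can recognize this common pattern and optimize it. - Marc Gravell
If it is optimized, surely the execution times would be the same? - Dreamwalker
Just saw your update, so it is what I expected to happen: it has been optimized. - Dreamwalker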

Content provided by Stack Overflow.