Why does arithmetic on doubles appear to be faster than arithmetic on long integers in C#?

Question

c# performance floating-point double long-integer

6

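The surprising output of the code below shows that arithmetic using doubles is almost 100% faster than arithmetic using long integers:

Test_DivOperator Float arithmetic measured time: 15974.5024 ms.

Test_DivOperator Integer arithmetic measured time: 28548.183 ms.

The build settings used are .NET 4.5, C# 5.0 (platform target: x64).

The hardware used is an Intel Core i5-2520M (running Windows 7 64-bit). Note: the operator used (division here) affects the results; division maximizes the observed difference.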

// Body of Main; needs `using System;` and `using System.Diagnostics;` (for Process).
const int numOfIterations = 1; // a single element takes memory access out of the game
const int numOfRepetitions = 500000000; //CPU bound application
Random rand = new Random();
double[] Operand1 = new double[numOfIterations];
double[] Operand2 = new double[numOfIterations];
double[] Operand3 = new double[numOfIterations];

long[] Int64Operand1 = new long[numOfIterations];
long[] Int64Operand2 = new long[numOfIterations];
long[] Int64Operand3 = new long[numOfIterations];

for (int i = 0; i < numOfIterations; i++)
{
    Operand1[i]=(rand.NextDouble() * 100);
    Operand2[i]=(rand.NextDouble() * 80);
    Operand3[i]=(rand.NextDouble() * 17);
    Int64Operand1[i] = (long)Operand1[i];
    Int64Operand2[i] = (long)Operand2[i]+1;
    Int64Operand3[i] = (long)Operand3[i]+1;
}

double[] StdResult = new double[numOfIterations];
long[] NewResult = new long[numOfIterations];

TimeSpan begin = Process.GetCurrentProcess().TotalProcessorTime;

for (int j = 0; j < numOfRepetitions; j++)
{
    for (int i = 0; i < numOfIterations; i++)
    {
        double result = Operand1[i] / Operand2[i];
        result = result / Operand3[i];
        StdResult[i]=(result);
    }

}

TimeSpan end = Process.GetCurrentProcess().TotalProcessorTime;
Console.WriteLine("Test_DivOperator Float arithmetic measured time: " + (end - begin).TotalMilliseconds + " ms.");

begin = Process.GetCurrentProcess().TotalProcessorTime;

for (int j = 0; j < numOfRepetitions; j++)
{
    for (int i = 0; i < numOfIterations; i++)
    {
        long result = Int64Operand1[i] / Int64Operand2[i];
        result = result / Int64Operand3[i];
        NewResult[i]=(result);
    }

}

end = Process.GetCurrentProcess().TotalProcessorTime;
Console.WriteLine("Test_DivOperator Integer arithmetic measured time: " + (end - begin).TotalMilliseconds + " ms.");

- Ahmed Khalaf

Storing items in an array of undefined length may also introduce reallocation overhead. That adds an extra variable to your test. - GolezTrol

1

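Related reading: optimizing integer division. - GolezTrol

@GolezTrol The allocation does not happen inside the measured time, and all arrays are fixed-size. Also, the two loops are identical. - Ahmed Khalaf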

1 Answer

- harold · Accepted Answer

7

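This is not unexpected. 64-bit integer division is just that slow.

Your processor is a Sandy Bridge. Looking at the latency and throughput tables, 64-bit idiv has much higher latency and much worse throughput than divsd.

Other microarchitectures show a similar difference.

Doing the actual math: 2.8548183E10 ns / 500000000 = 57 ns per iteration, which at 3.2 GHz is about 183 cycles. There are two divisions plus some extra overhead, so this is not strange.

For doubles it works out to 32 ns, or 102 cycles, which is actually more than I expected.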

- harold
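The per-iteration arithmetic in the answer can be reproduced with a few lines of C#. This is only a sketch: the class and method names (`DivisionTiming`, `NanosecondsPerIteration`, `Cycles`) are hypothetical, the two timings are the ones reported in the question, and the 3.2 GHz clock is the figure assumed in the answer rather than a measured value.

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class DivisionTiming
{
    // Average time of one outer-loop iteration, in nanoseconds.
    public static double NanosecondsPerIteration(double totalMilliseconds, long iterations)
    {
        return totalMilliseconds * 1e6 / iterations;
    }

    // Converts nanoseconds to clock cycles (1 GHz = 1 cycle per nanosecond).
    public static double Cycles(double nanoseconds, double clockGHz)
    {
        return nanoseconds * clockGHz;
    }

    public static void Main()
    {
        const long iterations = 500000000; // numOfRepetitions in the question
        const double clockGHz = 3.2;       // assumption taken from the answer

        double longNs = NanosecondsPerIteration(28548.183, iterations);
        double doubleNs = NanosecondsPerIteration(15974.5024, iterations);

        Console.WriteLine("long:   {0:F1} ns, {1:F0} cycles", longNs, Cycles(longNs, clockGHz));
        Console.WriteLine("double: {0:F1} ns, {1:F0} cycles", doubleNs, Cycles(doubleNs, clockGHz));
    }
}
```

Each iteration contains two dependent divisions, so halving the cycle counts gives a rough upper bound on the cost of a single division including loop and JIT overhead.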

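When compiled for x86 (32-bit), both results are better, but integer division is still worse. Does that make sense too? Test_DivOperator Float arithmetic measured time: 8283.6531 ms. Test_DivOperator Integer arithmetic measured time: 13384.8858 ms. The common belief is that integer arithmetic is simpler than floating-point arithmetic and therefore faster, but at least on this microarchitecture that assumption no longer holds. Do you have more information on this? - Ahmed Khalaf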

1

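@AhmedKhalaf It is all in the tables I linked. Integer arithmetic is usually fast; only division is slow. 32-bit division is not as bad as 64-bit division, but float division is also faster than double division, so floating point still wins. - harold

What I mean is: why is integer division slower than floating-point division (for the same width, that is, 64-bit integer versus double)? In theory, integer division is simpler than floating-point division. Also, running the code above with addition instead of division gives 5740.8368 ms for doubles and 6957.6446 ms for 64-bit integers (compiled as x86, not x64). - Ahmed Khalaf

@AhmedKhalaf An integer has more bits to divide than a float of the same size (64-bit division actually divides a 128-bit number by a 64-bit number, whereas a double has only 53 bits to divide), and division is mostly a sequential process. Floating-point division also seems to be favoured: it has its own dedicated µops, while integer division uses a whole bunch of µops. On top of that, there is the overhead added by the JIT. - harold

OK, I will try C++/assembly and see how it goes :D - Ahmed Khalaf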
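The "53 bits" remark can be checked directly: a double carries a 53-bit significand, so not every 64-bit integer survives a round trip through double. The sketch below only illustrates that width difference and says nothing about how the hardware divider works; `SignificandWidth` and `RoundTripsThroughDouble` are hypothetical names introduced for the example.

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class SignificandWidth
{
    // True if the long value is unchanged after converting to double and back.
    public static bool RoundTripsThroughDouble(long value)
    {
        return (long)(double)value == value;
    }

    public static void Main()
    {
        long twoTo53 = 1L << 53;

        // 2^53 fits in the 53-bit significand and is represented exactly.
        Console.WriteLine(RoundTripsThroughDouble(twoTo53));

        // 2^53 + 1 needs 54 significant bits, so the conversion rounds it.
        Console.WriteLine(RoundTripsThroughDouble(twoTo53 + 1));
    }
}
```

A long division has to produce all 64 quotient bits, while a double division only has to produce a 53-bit significand, which is consistent with the comment's point that the integer case simply has more work to do.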
