为什么“快速反平方根算法”在极大的浮点数下比1/sqrt()慢？

Question

为什么“快速反平方根算法”在极大的浮点数下比1/sqrt()慢？

3

以下完整代码可以比较快速反平方根算法与1/sqrt()的速度。根据维基百科中的一句话，即该算法大约比使用其他方法计算平方根并通过浮点除法计算倒数的速度快四倍。

但是我在这里的原因是：它比1/sqrt()慢。我的代码有问题吗？请帮忙看看。

    #include <stdio.h>
    #include <time.h>
    #include <math.h>

    float FastInvSqrt (float number);

    int
    main ()
    {
      float x = 1.0e+100;

      int N = 100000000;
      int i = 0;

      clock_t start2 = clock (); 
      do  
        {   
          float z = 1.0 / sqrt (x);
          i++;
        }   
      while (i < N); 
      clock_t end2 = clock (); 

      double time2 = (end2 - start2) / (double) CLOCKS_PER_SEC;

      printf ("1/sqrt() spends %13f sec.\n\n", time2);

      i = 0;
      clock_t start1 = clock (); 
      do  
        {   
          float y = FastInvSqrt (x);
          i++;
        }   
      while (i < N); 
      clock_t end1 = clock (); 

      double time1 = (end1 - start1) / (double) CLOCKS_PER_SEC;



      printf ("FastInvSqrt() spends %f sec.\n\n", time1);


      printf ("fast inverse square root is faster %f times than 1/sqrt().\n", time2/time1);

      return 0;
}

float
FastInvSqrt (float x)
{
  float xhalf = 0.5F * x;
  int i = *(int *) &x;  // store floating-point bits in integer
  i = 0x5f3759df - (i >> 1);    // initial guess for Newton's method
  x = *(float *) &i;            // convert new bits into float
  x = x * (1.5 - xhalf * x * x);        // One round of Newton's method
  //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
  //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
  //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
  return x;
}

结果如下：

1/sqrt() spends      0.850000 sec.

FastInvSqrt() spends 0.960000 sec.

fast inverse square root is faster 0.885417 times than 1/sqrt().

- San Tseng

1

你在测试函数的哪些值？可能在运行代码时会有一个初始的常数惩罚，例如输入1。因此，可以尝试一些非常大的值，并在其上进行基准测试。 - Tim Biegeleisen

1

你的代码很烂，因为循环内部没有使用平方根算法的结果。重新编写代码，使循环每次计算不同的倒数平方根，并打印结果的总和并启用优化。另外你的倒数平方根算法至少需要3个迭代才能达到单精度精度，应该一次计算8个以便编译器可以使用ymm寄存器向量化操作。 - user5713492

1

@San Tseng 维基百科声称FastInvSqrt比某个替代方案快四倍的参考文献是一篇发表于15年前的论文。没有特别的理由认为2003年适用于硬件平台、工具链和库仍然适用于今天，因为所有这些都已经发生了重大变化（例如，x86处理器的平方根硬件指令自那时以来已经变得更加快速）。 - njuffa

1

@DillonDavis 禁用优化进行时间测试是徒劳的。测试必须计算N个不同的倒数平方根，并且必须在输出中使用所有结果，否则优化器可以跳过循环。对于原始问题，您的循环执行了100000001次迭代，而不仅仅是1次。如果计算一个倒数平方根需要大约一秒钟的时间，那将是可悲的，这也不是正在发生的事情。 - user5713492

2

@user5713492 浮点单元整体变得更宽了，但这同样适用于平方根和乘法。据我所知，平方根与加/乘/fma的吞吐量比在十五年前已经显著缩小，从1:25降至1:6左右。 - njuffa

显示剩余10条评论

2个回答

0

我按以下方式更正我的代码： 1. 计算随机数，而不是固定的数字。 2. 在while循环内计算时间消耗并求和。

#include <stdio.h>
#include <time.h>
#include <math.h>
#include <stdlib.h>

float FastInvSqrt (float number);

int
main ()
{
  float x=0;
  time_t t;

  srand((unsigned) time(&t));

  int N = 1000000;
  int i = 0;

  double sum_time2=0.0;

  do  
    {   
      x=(float)(rand() % 10000)*0.22158;
  clock_t start2 = clock (); 
      float z = 1.0 / sqrt (x);
  clock_t end2 = clock (); 
        sum_time2=sum_time2+(end2-start2);
      i++;
    }   
  while (i < N); 


  printf ("1/sqrt() spends %13f sec.\n\n", sum_time2/(double)CLOCKS_PER_SEC);

  double sum_time1=0.0;

  i = 0;
  do  

    {
      x=(float)(rand() % 10000)*0.22158;
  clock_t start1 = clock ();
      float y = FastInvSqrt (x);
  clock_t end1 = clock ();
        sum_time1=sum_time1+(end1-start1);
      i++;
    }
  while (i < N);

  printf ("FastInvSqrt() spends %f sec.\n\n", sum_time1/(double)CLOCKS_PER_SEC);

  printf ("fast inverse square root is faster %f times than 1/sqrt().\n", sum_time2/sum_time1);

  return 0;
}

float
FastInvSqrt (float x)
{
  float xhalf = 0.5F * x;
  int i = *(int *) &x;  // store floating-point bits in integer
  i = 0x5f3759df - (i >> 1);    // initial guess for Newton's method
  x = *(float *) &i;            // convert new bits into float
  x = x * (1.5 - xhalf * x * x);        // One round of Newton's method
  //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
  //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
  //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
  return x;
}

但是快速反平方根仍然比1/sqrt()慢。

1/sqrt() spends      0.530000 sec.

FastInvSqrt() spends 0.540000 sec.

fast inverse square root is faster 0.981481 times than 1/sqrt().

- San Tseng

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Efin Fon · Accepted Answer

一个缩小计算精度范围的函数将具有更小的计算复杂度（意味着它可以更快地计算）。这可以被认为是优化计算函数形状以适用于其定义子集的过程，或者像搜索算法一样，每个算法都最适合特定类型的输入（没有免费午餐定理）。

因此，在区间[0,1]之外使用此函数（我想它是为之优化/设计的）意味着在其复杂度较高的输入子集中使用它，而不是其他可能专门计算平方根函数变体的输入子集。

你正在使用库中的sqrt()函数本身也被优化了（很可能），因为它在一种类似LUT（作为进一步近似的初始猜测）的预先计算值中。使用这样一个更“通用的函数”（意味着它涵盖了更多的域并尝试通过预计算使其更有效率，例如消除冗余计算，但限制较大，或在运行时最大化数据重用）有其复杂度限制，因为在多个预计算选项之间进行选择会增加决策开销。因此，知道在编译时所有对sqrt的输入都在区间[0,1]内将有助于减少运行时的决策开销，因为您将提前知道要使用哪个专门的近似函数（或者您可以在编译时为每个感兴趣的区间生成专门的函数，参见元编程）。