Denormalized floating point in Objective-C?

7

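How does the Stack Overflow question "Why does changing 0.1f to 0 slow down performance by 10x?" relate to Objective-C? If it is relevant, how should I change my coding habits? Is there some way to turn off denormalized floats on Mac OS X?

It looks like this is completely irrelevant on iOS. Is that correct?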

1 Answer

16

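As I said in reply to your comment:

This is more a CPU issue than a language issue, so it is probably relevant to Objective-C on x86. (The iPhone's ARMv7 does not seem to support denormalized floats, at least not with the default runtime/build settings.)

UPDATE

I just tested. On Mac OS X on x86 the slowdown is observed; on iOS on ARMv7 it is not (default build settings).

And, as expected, when running in the iOS simulator (on x86), denormalized floats appear again.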

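Interestingly, FLT_MIN and DBL_MIN are each defined as the smallest non-denormalized (normal) number (on iOS, Mac OS X and Linux). Strange things happen when you use

DBL_MIN/2.0

in your code: the compiler happily emits a denormalized constant, but as soon as the (ARM) CPU touches it, it is set to zero: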

double test = DBL_MIN/2.0;
printf("test      == 0.0 %d\n",test==0.0);
printf("DBL_MIN/2 == 0.0 %d\n",DBL_MIN/2.0==0.0);

Output:

test      == 0.0 1  // computer says YES
DBL_MIN/2 == 0.0 0  // compiler says NO

So a quick runtime check for whether denormalized values are supported can be:

#define SUPPORT_DENORMALIZATION ({volatile double t=DBL_MIN/2.0;t!=0.0;})

("given without even the implied warranty of fitness for any purpose")
This is what ARM has to say about flush-to-zero mode: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204h/Bcfheche.html

UPDATE<<1

This is how you disable flush-to-zero mode on ARMv7:
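int x;
/* Read FPSCR, clear bit 24 (FZ, flush-to-zero; 16777216 == 1 << 24), write it back. */
asm(
    "vmrs %[result],FPSCR \r\n"
    "bic %[result],%[result],#16777216 \r\n"
    "vmsr FPSCR,%[result]"
    :[result] "=r" (x) : :
);
printf("ARM FPSCR: %08x\n",x);  /* x holds the updated FPSCR value */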

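With the following surprising result.
  • Column 1: a float, divided by 2 on every iteration
  • Column 2: the binary representation of this float
  • Column 3: the time taken to sum this float 1e7 times

You can clearly see that denormalization comes at zero cost. (That is on an iPad 2. On an iPhone 4 it costs a 10% slowdown.)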

0.000000000000000000000000000000000100000004670110: 10111100001101110010000011100000 110 ms
0.000000000000000000000000000000000050000002335055: 10111100001101110010000101100000 110 ms
0.000000000000000000000000000000000025000001167528: 10111100001101110010000001100000 110 ms
0.000000000000000000000000000000000012500000583764: 10111100001101110010000110100000 110 ms
0.000000000000000000000000000000000006250000291882: 10111100001101110010000010100000 111 ms
0.000000000000000000000000000000000003125000145941: 10111100001101110010000100100000 110 ms
0.000000000000000000000000000000000001562500072970: 10111100001101110010000000100000 110 ms
0.000000000000000000000000000000000000781250036485: 10111100001101110010000111000000 110 ms
0.000000000000000000000000000000000000390625018243: 10111100001101110010000011000000 110 ms
0.000000000000000000000000000000000000195312509121: 10111100001101110010000101000000 110 ms
0.000000000000000000000000000000000000097656254561: 10111100001101110010000001000000 110 ms
0.000000000000000000000000000000000000048828127280: 10111100001101110010000110000000 110 ms
0.000000000000000000000000000000000000024414063640: 10111100001101110010000010000000 110 ms
0.000000000000000000000000000000000000012207031820: 10111100001101110010000100000000 111 ms
0.000000000000000000000000000000000000006103515209: 01111000011011100100001000000000 110 ms
0.000000000000000000000000000000000000003051757605: 11110000110111001000010000000000 110 ms
0.000000000000000000000000000000000000001525879503: 00010001101110010000100000000000 110 ms
0.000000000000000000000000000000000000000762939751: 00100011011100100001000000000000 110 ms
0.000000000000000000000000000000000000000381469876: 01000110111001000010000000000000 112 ms
0.000000000000000000000000000000000000000190734938: 10001101110010000100000000000000 110 ms
0.000000000000000000000000000000000000000095366768: 00011011100100001000000000000000 110 ms
0.000000000000000000000000000000000000000047683384: 00110111001000010000000000000000 110 ms
0.000000000000000000000000000000000000000023841692: 01101110010000100000000000000000 111 ms
0.000000000000000000000000000000000000000011920846: 11011100100001000000000000000000 110 ms
0.000000000000000000000000000000000000000005961124: 01111001000010000000000000000000 110 ms
0.000000000000000000000000000000000000000002980562: 11110010000100000000000000000000 110 ms
0.000000000000000000000000000000000000000001490982: 00010100001000000000000000000000 110 ms
0.000000000000000000000000000000000000000000745491: 00101000010000000000000000000000 110 ms
0.000000000000000000000000000000000000000000372745: 01010000100000000000000000000000 110 ms
0.000000000000000000000000000000000000000000186373: 10100001000000000000000000000000 110 ms
0.000000000000000000000000000000000000000000092486: 01000010000000000000000000000000 110 ms
0.000000000000000000000000000000000000000000046243: 10000100000000000000000000000000 111 ms
0.000000000000000000000000000000000000000000022421: 00001000000000000000000000000000 110 ms
0.000000000000000000000000000000000000000000011210: 00010000000000000000000000000000 110 ms
0.000000000000000000000000000000000000000000005605: 00100000000000000000000000000000 111 ms
0.000000000000000000000000000000000000000000002803: 01000000000000000000000000000000 110 ms
0.000000000000000000000000000000000000000000001401: 10000000000000000000000000000000 110 ms
0.000000000000000000000000000000000000000000000000: 00000000000000000000000000000000 110 ms
0.000000000000000000000000000000000000000000000000: 00000000000000000000000000000000 110 ms
0.000000000000000000000000000000000000000000000000: 00000000000000000000000000000000 110 ms

1
@Yar: I would have suggested compiling with "-ffast-math", but whatever flags I set, it refuses to flush to zero. - mvds
1
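@Yar: Not really sure. You could start from some small value and divide by 2 each round. Then, if you find an interesting threshold, you can compare it with the predefined constants. - mvds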
1
@Yar: "In particular, the Java programming language requires support of IEEE 754 denormalized floating-point numbers and gradual underflow." (source: http://java.sun.com/docs/books/jls/second_edition/html/typesValues.doc.html) - mvds
1
@Yar: For Java I get exactly the same results. As soon as your float drops below Float.MIN_NORMAL (~1E-38), it gets 50 times slower. - mvds
1
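Oh, I get it now. Here is my port (any improvements?): http://pastebin.com/2ZDvdCDv. So just by avoiding numbers between MIN_NORMAL and MIN_VALUE you can speed up your code by more than 20x, even in Java. Next I'm going to try Ruby :) - Dan Rosenstark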