Counting the number of possible floating-point values in a given range in C++

5
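I am developing a cryptographic application that uses Crypto++. As an obscure part of that application, I need to determine the maximum number of unique floating-point values that can exist within a given numeric range.
Obviously there are infinitely many real numbers between 0 and 1, but not all of them can be represented by a unique floating-point value.
I have a minimum floating-point value and a maximum floating-point value. I need to determine the number of possible floating-point values within this range.
This is tricky because floating-point values are spaced farther apart the farther they are from 0.
For example, the number of possible floating-point values between 0 and 1 is very different from the number between 100,000 and 100,001.
For my purposes, I want the count to include both the minimum and the maximum. However, an algorithm that produces an "exclusive" count would be just as useful, since I can simply add 1 or 2 as needed.
Another concern: what if 0 is within the range? For example, if the minimum is -2.0 and the maximum is +2.0, I do not want 0 counted twice (once for 0 and once for -0). Also, what problems arise if the minimum or maximum is +/-infinity? (I would probably just throw an exception if the minimum or maximum is NaN.)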
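To make the uneven spacing concrete, here is a minimal sketch that measures the gap between a `float` and its next representable neighbor using `std::nextafter`. The helper name `GapAbove` is made up for this illustration, and the numbers quoted below assume `float` is IEEE 754 binary32.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: distance from x to the next representable float
// above it. Assumes x is finite.
static float GapAbove(float x)
{
    assert(std::isfinite(x));
    // INFINITY has type float, so the float overload of nextafter is used.
    return std::nextafter(x, INFINITY) - x;
}
```

Under that assumption, `GapAbove(1.0f)` is 2^-23 (about 1.19e-07), while `GapAbove(100000.0f)` is 2^-7 (0.0078125). So there are only 129 floats in [100000, 100001], counting both endpoints.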
uint32_t RangeValueCount ( float fMin , float fMax )
{
    if ( fMin > fMax )
        swap ( fMin , fMax ) ;  // Ensure fMin <= fMax

    // Calculate the number of possible floating-point values between fMin and fMax.

    return ( *reinterpret_cast < uint32_t* > ( &fMax ) -
             *reinterpret_cast < uint32_t* > ( &fMin ) ) + 1 ;

    // This algorithm is obviously unsafe, assumes IEEE 754
    // How should I account for -0 or infinity?
}

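If this problem can be solved, I assume the solution would apply equally to double values (and possibly to long double values, though that may be more complicated because of things like 80-bit integer values).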

5
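Do you want to "cheat" and use the internal representation of the numbers (look at their bytes), or do you want to avoid that? Do you want to assume the numbers are IEEE 754 floating-point? This is a near duplicate, but the answers there are not very good. - Eric Postpischil
Please read this post: https://randomascii.wordpress.com/2012/01/11/tricks-with-the-floating-point-format/, and the other posts on the same blog tagged "floating point". - Richard Critten
An interesting route (not necessarily the best, but I don't know enough to reason about it) might be formal-verification tools based on SAT/SMT solvers. SAT solving in particular has a lot of research on model counting (in general: trying to be smarter than exhaustive enumeration). (Edit: but model counting would surely count both +0 and -0.) - sascha
2 Answers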

7

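Here is code that handles all finite numbers. It expects IEEE 754 arithmetic. I have replaced my previous version with simpler and cleaner code. This version has two implementations of converting a floating-point number to its encoding (one copies the bits, the other manipulates the number mathematically). After that, the distance calculation is fairly simple: negative values have to be adjusted, and then the distance is just a subtraction.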

#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <limits>


typedef double Float;       //  The floating-point type to use.
typedef std::uint64_t UInt; //  Unsigned integer of same size as Float.


/*  Define a value with only the high bit of a UInt set.  This is also the
    encoding of floating-point -0.
*/
static constexpr UInt HighBit
    = std::numeric_limits<UInt>::max() ^ std::numeric_limits<UInt>::max() >> 1;


//  Return the encoding of a floating-point number by copying its bits.
static UInt EncodingBits(Float x)
{
    UInt result;
    std::memcpy(&result, &x, sizeof result);
    return result;
}


//  Return the encoding of a floating-point number by using math.
static UInt EncodingMath(Float x)
{
    static constexpr int SignificandBits = std::numeric_limits<Float>::digits;
    static constexpr int MinimumExponent = std::numeric_limits<Float>::min_exponent;

    //  Encode the high bit.
    UInt result = std::signbit(x) ? HighBit : 0;

    //  If the value is zero, the remaining bits are zero, so we are done.
    if (x == 0) return result;

    /*  The C library provides a little-known routine to split a floating-point
        number into a significand and an exponent.  Note that this produces a
        normalized significand, not the actual significand encoding.  Notably,
        it brings significands of subnormals up to at least 1/2.  We will
        adjust for that below.  Also, this routine normalizes to [1/2, 1),
        whereas IEEE 754 is usually expressed with [1, 2), but that does not
        bother us.
    */
    int xe;
    Float xf = std::frexp(std::fabs(x), &xe);

    //  Test whether the number is subnormal.
    if (xe < MinimumExponent)
    {
        /*  For a subnormal value, the exponent encoding is zero, so we only
            have to insert the significand bits.  This scales the significand
            so that its low bit is scaled to the 1 position and then inserts it
            into the encoding.
        */
        result |= (UInt) std::ldexp(xf, xe - MinimumExponent + SignificandBits);
    }
    else
    {
        /*  For a normal value, the significand is encoded without its leading
            bit.  So we subtract .5 to remove that bit and then scale the
            significand so its low bit is scaled to the 1 position.
        */
        result |= (UInt) std::ldexp(xf - .5, SignificandBits);

        /*  The exponent is encoded with a bias of (in C++'s terminology)
            MinimumExponent - 1.  So we subtract that to get the exponent
            encoding and then shift it to the position of the exponent field.
            Then we insert it into the encoding.
        */
        result |= ((UInt) xe - MinimumExponent + 1) << (SignificandBits-1);
    }

    return result;
}


/*  Return the encoding of a floating-point number.  For illustration, we
    get the encoding with two different methods and compare the results.
*/
static UInt Encoding(Float x)
{
    UInt xb = EncodingBits(x);
    UInt xm = EncodingMath(x);

    if (xb != xm)
    {
        std::cerr << "Internal error encoding " << x << ".\n";
        std::cerr << "\tEncodingBits says " << xb << ".\n";
        std::cerr << "\tEncodingMath says " << xm << ".\n";
        std::exit(EXIT_FAILURE);
    }

    return xb;
}


/*  Return the distance from a to b as the number of values representable in
    Float from one to the other.  b must be greater than or equal to a.  0 is
    counted only once.
*/
static UInt Distance(Float a, Float b)
{
    UInt ae = Encoding(a);
    UInt be = Encoding(b);

    /*  For represented values from +0 to infinity, the IEEE 754 binary
        floating-points are in ascending order and are consecutive.  So we can
        simply subtract two encodings to get the number of representable values
        between them (including one endpoint but not the other).

        Unfortunately, the negative numbers are not adjacent and run the other
        direction.  To deal with this, if the number is negative, we transform
        its encoding by subtracting from the encoding of -0.  This gives us a
        consecutive sequence of encodings from the greatest magnitude finite
        negative number to the greatest finite number, in ascending order
        except for wrapping at the maximum UInt value.

        Note that this also maps the encoding of -0 to 0 (the encoding of +0),
        so the two zeroes become one point, so they are counted only once.
    */
    if (HighBit & ae) ae = HighBit - ae;
    if (HighBit & be) be = HighBit - be;

    //  Return the distance between the two transformed encodings.
    return be - ae;
}


static void Try(Float a, Float b)
{
    std::cout << "[" << a << ", " << b << "] contains "
        << Distance(a,b) + 1 << " representable values.\n";
}


int main(void)
{
    if (sizeof(Float) != sizeof(UInt))
    {
        std::cerr << "Error, UInt must be an unsigned integer the same size as Float.\n";
        std::exit(EXIT_FAILURE);
    }

    /*  Prepare some test values:  smallest positive (subnormal) value, largest
        subnormal value, smallest normal value.
    */
    Float S1 = std::numeric_limits<Float>::denorm_min();
    Float N1 = std::numeric_limits<Float>::min();
    Float S2 = N1 - S1;

    //  Test 0 <= a <= b.
    Try( 0,  0);
    Try( 0, S1);
    Try( 0, S2);
    Try( 0, N1);
    Try( 0, 1./3);
    Try(S1, S1);
    Try(S1, S2);
    Try(S1, N1);
    Try(S1, 1./3);
    Try(S2, S2);
    Try(S2, N1);
    Try(S2, 1./3);
    Try(N1, N1);
    Try(N1, 1./3);

    //  Test a <= b <= 0.
    Try(-0., -0.);
    Try(-S1, -0.);
    Try(-S2, -0.);
    Try(-N1, -0.);
    Try(-1./3, -0.);
    Try(-S1, -S1);
    Try(-S2, -S1);
    Try(-N1, -S1);
    Try(-1./3, -S1);
    Try(-S2, -S2);
    Try(-N1, -S2);
    Try(-1./3, -S2);
    Try(-N1, -N1);
    Try(-1./3, -N1);

    //  Test a <= 0 <= b.
    Try(-0., +0.);
    Try(-0., S1);
    Try(-0., S2);
    Try(-0., N1);
    Try(-0., 1./3);
    Try(-S1, +0.);
    Try(-S1, S1);
    Try(-S1, S2);
    Try(-S1, N1);
    Try(-S1, 1./3);
    Try(-S2, +0.);
    Try(-S2, S1);
    Try(-S2, S2);
    Try(-S2, N1);
    Try(-S2, 1./3);
    Try(-N1, +0.);
    Try(-N1, S1);
    Try(-N1, S2);
    Try(-N1, N1);
    Try(-N1, 1./3);
    Try(-1./3, +0.);
    Try(-1./3, S1);
    Try(-1./3, S2);
    Try(-1./3, N1);
    Try(-1./3, 1./3);

    return 0;
}
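The code above is described as handling finite values only, while the question also asks about infinities and NaN. As a rough sketch of one possible way to cover those cases for `float`, the same sign adjustment can be done in a wider signed integer. The names `OrderedKey` and `CountInclusive` are invented for this illustration, and the sketch assumes `float` is IEEE 754 binary32.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>

// Map a float to a signed key so that larger floats get larger keys and
// adjacent floats get adjacent keys. Both -0 and +0 map to key 0.
// Assumption: float is IEEE 754 binary32.
static std::int64_t OrderedKey(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    std::int64_t magnitude = bits & 0x7FFFFFFFu;   // encoding without the sign bit
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}

// Hypothetical helper: number of representable values in [lo, hi],
// endpoints included. An infinite endpoint counts as one value.
static std::uint64_t CountInclusive(float lo, float hi)
{
    if (std::isnan(lo) || std::isnan(hi))
        throw std::invalid_argument("NaN has no position in the ordering");
    if (lo > hi) std::swap(lo, hi);
    assert(lo <= hi);
    return static_cast<std::uint64_t>(OrderedKey(hi) - OrderedKey(lo)) + 1;
}
```

With these assumptions, `CountInclusive(0.0f, 1.0f)` gives 1065353217 and `CountInclusive(-0.0f, 0.0f)` gives 1. The encoding of infinity is one past the largest finite value, so an infinite endpoint simply adds one more value to the count.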

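Thanks for the answer; I look forward to the finished product. To answer your earlier question, I don't mind "cheating" by looking at the bytes. In general, though, an encoding-independent solution is always preferable, if not required. - Giffyguy
No problem, thanks for your help. I'm studying your code, and it is very educational. - Giffyguy
@Giffyguy: I added comments and parameterized it so you can change the floating-point type (change "UInt" to match). - Eric Postpischil
if (a == 0) ae = MinimumExponent; Why not for b as well? Because a <= b, a != b, and a >= 0. - Yakk - Adam Nevraumont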

-1

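This is a bit tricky. A potential approach is to call std::nexttoward(from_starting, to_end); in a loop and count until you reach the end. I haven't tried this myself, and it could take a very long time to finish. If you do this, make sure to check the error flags; see: http://en.cppreference.com/w/cpp/numeric/math/nextafter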
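The loop described above might look roughly like the following sketch. It uses `std::nextafter` rather than `std::nexttoward` so that both arguments are `float`, and the function name `CountBySteppingInclusive` is made up for this illustration. It assumes `lo <= hi`, neither argument is NaN, and subnormals are not flushed to zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Count the representable floats in [lo, hi] by visiting them one at a time.
// -0 and +0 compare equal, and nextafter steps from either zero straight to
// the smallest subnormal, so zero is counted once.
static std::uint64_t CountBySteppingInclusive(float lo, float hi)
{
    assert(lo <= hi);            // also rejects NaN, which compares false
    std::uint64_t count = 1;     // counts lo itself
    for (float x = lo; x < hi; x = std::nextafter(x, hi))
        ++count;                 // counts each value after lo, up to hi
    return count;
}
```

This is only workable for small ranges of `float`, since it visits every value in the range.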


2
This is not a practical answer; it is certainly not feasible for 64-bit floating-point. - Eric Postpischil
2
Completely agree that it's not practical. Not sure how to do it efficiently and accurately. - AdvSphere
4
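To give you a rough idea of how long it takes: I tried counting from 0 to 1.0 using a float type mapped to IEEE-754 binary32, and it took 7.6 seconds on a 3.5 GHz x86 processor (compiled with Intel C/C++ 13.x, using the options /Ox /QxHOST /fp:strict). - njuffa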
