Rounding a floating-point value to the nearest power of two

Question

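There are many ways to find the next power of two of an integer or floating-point number, but there are not many options for finding the *nearest* power of two of a number.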

I have implemented the following:

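#include <cmath>        // std::ceil, std::floor, std::abs
#include <type_traits>  // std::is_floating_point_v

template <typename T>
static constexpr T round_pow2(T v) {
    if constexpr (std::is_floating_point_v<T>) {
        // Bracket v between its neighbouring integers and round each of them.
        auto high = static_cast<unsigned long>(std::ceil(v));
        auto low  = static_cast<unsigned long>(std::floor(v));

        if (high == low) {
            return round_pow2<unsigned long>(high);
        } else {
            T a = static_cast<T>(round_pow2<unsigned long>(low));
            T b = static_cast<T>(round_pow2<unsigned long>(high));

            // Pick the candidate closer to v; ties go to the lower one (a).
            return std::abs(a - v) <= std::abs(b - v) ? a : b;
        }
    } else {
        // Smear the highest set bit of v-1 into every lower bit so that
        // high+1 is the next power of two >= v. The loop has to cover the
        // full bit width, sizeof(T) * 8, not just sizeof(T).
        T high = v - 1;

        for (T i = 1; i < static_cast<T>(sizeof(T) * 8); i *= 2) {
            high |= high >> i;
        }

        high += 1;
        T low = high >> 1;

        return (high - v) < (v - low) ? high : low;
    }
}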

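This should work for any non-negative floating-point value less than ULLONG_MAX. But it does not seem optimal. Is there a better (more efficient) way to implement this function?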

Edit: @Eric For my application, getting 0 when v == 0 is acceptable. For others this may be a problem, though, since 0 is not a power of 2.

@Blixodus Thanks for your answer, it pointed me in the right direction. Based on your idea I wrote the following function:


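#include <cstdint>      // std::uint32_t, std::uint64_t
#include <cstring>      // std::memcpy
#include <tuple>        // std::make_tuple
#include <type_traits>  // std::conditional_t, std::is_same_v, std::make_unsigned_t

template <typename T>
constexpr T round_p2(T v) {
    if constexpr (std::is_floating_point_v<T>) {
        static_assert(std::is_same_v<T, float> || std::is_same_v<T, double>,
                      "round_p2: only float and double are supported");

        // Unsigned integer type with the same size as T.
        using R = std::conditional_t<std::is_same_v<T, double>, std::uint64_t, std::uint32_t>;

        // mlen: mantissa length, es: exponent bias + 1, em: exponent mask.
        auto [mlen, es, em] = std::is_same_v<T, double> ? std::make_tuple(52, 1024, 0x7FF)
                                                       : std::make_tuple(23, 128, 0xFF);

        // Copy the bits; *reinterpret_cast<R*>(&v) would violate strict aliasing.
        R y = 0;
        std::memcpy(&y, &v, sizeof y);

        // sign factor * (2 << k), with k = exponent field - es + top mantissa bit.
        // Caveat: 2 << k is int arithmetic, so it is only defined for 0 <= k <= 29,
        // i.e. 1.5 <= |v| < 1.5 * 2^30. Values below 1.5 (including 0 and 1.0),
        // subnormals, inf and NaN fall outside that range.
        return (T(y >> (sizeof(R) * 8 - 1)) * -2 + 1)
             * (2 << (((y >> mlen) & em) - es + ((y >> (mlen - 1)) & 0x1)));
    } else {
        // Integers: work on the magnitude, then reapply the sign.
        using R = std::make_unsigned_t<T>;
        R rv = static_cast<R>(v);
        T sign = 1;

        if constexpr (std::is_signed_v<T>) {
            if (v < 0) {
                rv = static_cast<R>(-v);
                sign = -1;
            }
        }

        // Smear the top bit of rv-1 downward; the loop must cover all
        // sizeof(R) * 8 bits, not just sizeof(R).
        R high = rv - 1;

        for (R i = 1; i < static_cast<R>(sizeof(R) * 8); i *= 2) {
            high |= high >> i;
        }

        high += 1;
        R low = high >> 1;

        // Ties round up here (<=), unlike the first version.
        return sign * static_cast<T>((high - rv) <= (rv - low) ? high : low);
    }
}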

For my application it seems to work very well, and the generated assembly is much nicer than that of my first implementation.

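Explanation: first, depending on whether v is a float or a double, I get three magic values: the length of the mantissa, the amount to subtract from the exponent field (the exponent bias plus one), and the mask used to extract the exponent from the fp representation.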
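The three magic values do not have to be hard-coded. A minimal sketch, assuming IEEE-754 `float` and `double` (the `fp_magic` struct is a hypothetical name, not part of the original code), derives them from `std::numeric_limits`:

```cpp
#include <limits>

// Hypothetical helper: derives mlen, es and em from std::numeric_limits.
// Assumes T is an IEEE-754 binary format (float or double).
template <typename T>
struct fp_magic {
    static_assert(std::numeric_limits<T>::is_iec559, "IEEE-754 type expected");
    // digits counts the implicit leading 1, so subtract it to get the stored bits.
    static constexpr int mlen = std::numeric_limits<T>::digits - 1;
    // max_exponent is bias + 1 (128 for float, 1024 for double): the "es" value.
    static constexpr int es = std::numeric_limits<T>::max_exponent;
    // A w-bit exponent field has es = 2^(w-1), so its mask 2^w - 1 is 2 * es - 1.
    static constexpr int em = 2 * es - 1;
};

static_assert(fp_magic<float>::mlen == 23 && fp_magic<float>::es == 128 && fp_magic<float>::em == 0xFF);
static_assert(fp_magic<double>::mlen == 52 && fp_magic<double>::es == 1024 && fp_magic<double>::em == 0x7FF);
```

The `static_assert`s document the expected values and fail at compile time on a platform where they would differ.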

Then I reinterpret the fp value as an unsigned integer of the same size so that I can operate on its bits.
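Reinterpreting the bits through a pointer cast is undefined behaviour under the strict-aliasing rule; copying them with `std::memcpy` is the portable C++17 way (C++20 adds `std::bit_cast`). A small sketch, with hypothetical helper names `float_bits` and `double_bits`, assuming 32-bit `float` and 64-bit `double`:

```cpp
#include <cstdint>
#include <cstring>

// Copies the object representation of a float into a same-sized integer.
inline std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float expected");
    std::uint32_t u = 0;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Same for double -> uint64_t.
inline std::uint64_t double_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double expected");
    std::uint64_t u = 0;
    std::memcpy(&u, &d, sizeof u);
    return u;
}
```

With IEEE-754 formats, `float_bits(1.0f)` is `0x3F800000` and `double_bits(1.0)` is `0x3FF0000000000000`; compilers typically reduce the `memcpy` to a plain register move.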

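Next, I use (T(y >> (sizeof(R) * 8 - 1)) * -2 + 1) to extract the factor that gives the final result its sign. It takes the highest bit of the fp value (the sign bit: 0 for positive, 1 for negative) and applies f(x) = x * -2 + 1 to it, so that x = 0 gives 1 and x = 1 gives -1.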
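The sign trick in isolation, next to the standard-library equivalent `std::copysign(1.0f, v)`. This is a sketch for `float` only; `sign_factor` and `sign_factor_std` are hypothetical names:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Sign factor as described above: top bit s (0 or 1) mapped through s * -2 + 1.
inline float sign_factor(float v) {
    std::uint32_t y = 0;
    std::memcpy(&y, &v, sizeof y);
    float s = static_cast<float>(y >> 31);  // 0 for +, 1 for -
    return s * -2 + 1;
}

// Standard-library equivalent: copy v's sign bit onto 1.0f.
inline float sign_factor_std(float v) { return std::copysign(1.0f, v); }
```

Both read the sign bit directly, so `-0.0f` yields -1 as well.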

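Finally, I use Blixodus's formula to compute the nearest unsigned power of two of the given fp value. Because I use a bit shift of the value 2 instead of std::pow (we are dealing with powers of two), I have to compensate by subtracting 1 from the shift amount, which is why es is one more than you might expect.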
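Because `2 << k` is evaluated in `int`, it is undefined for negative k (any |v| below 1.5, including 1.0) and overflows for k ≥ 30. A hedged sketch that keeps the same exponent formula but computes 2 · 2^k with `std::ldexp`, which accepts negative and large exponents. `round_p2_ldexp` is a hypothetical name, and the sketch assumes IEEE-754 formats and a normal (nonzero, finite, non-subnormal) input:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Same k = exponent field - es + top mantissa bit as round_p2,
// but 2 * 2^k comes from std::ldexp instead of the int shift 2 << k.
template <typename T>
T round_p2_ldexp(T v) {
    static_assert(std::is_same_v<T, float> || std::is_same_v<T, double>);
    using R = std::conditional_t<std::is_same_v<T, double>, std::uint64_t, std::uint32_t>;
    constexpr int mlen = std::numeric_limits<T>::digits - 1;  // 23 or 52
    constexpr int es = std::numeric_limits<T>::max_exponent;  // 128 or 1024
    constexpr R em = 2 * es - 1;                               // 0xFF or 0x7FF

    R y = 0;
    std::memcpy(&y, &v, sizeof y);

    // Signed int arithmetic, so a negative k is fine here.
    int k = static_cast<int>((y >> mlen) & em) - es + static_cast<int>((y >> (mlen - 1)) & 1);
    T sign = (y >> (sizeof(R) * 8 - 1)) ? T(-1) : T(1);
    return sign * std::ldexp(T(2), k);
}
```

For example, `round_p2_ldexp(1.0f)` has k = -1 and returns 1, and `round_p2_ldexp(6.0f)` (significand 1.5) rounds up to 8.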

- Sadiinso

You can do this with a lookup table and some indexing expression such as x ^ (x-1); I don't remember exactly. No loop is needed. - user207421

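@user207421: x ^ (x-1) is for finding the least significant set bit. It is not useful for finding the nearest power of two, which is determined by the most significant bits. - Eric Postpischil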
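A small illustration of that point (a sketch; `low_mask` and `high_bit` are hypothetical helper names): 9 and 13 give the same x ^ (x-1), yet their nearest powers of two are 8 and 16. What decides it is the highest set bit and the bit just below it, with the remaining bits mattering only to detect an exact tie.

```cpp
#include <cstdint>

// x ^ (x - 1) sets the lowest set bit of x and every bit below it.
inline std::uint32_t low_mask(std::uint32_t x) { return x ^ (x - 1); }

// Highest set bit of x, found with a simple loop. Precondition: x > 0.
inline std::uint32_t high_bit(std::uint32_t x) {
    std::uint32_t h = 1;
    while (x >>= 1) h <<= 1;
    return h;
}
```

Here `low_mask(9)` and `low_mask(13)` are both 1 and `high_bit` is 8 for both, so neither value alone distinguishes "round to 8" from "round to 16".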

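Your code produces 0 for 0.5, but 0 is not a power of 2. What is the desired result for numbers less than 3/4? For numbers exactly halfway between two powers of 2, your code rounds down. Is that intended? - Eric Postpischil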

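@EricPostpischil I would consider 0 to be 2 to the power of negative infinity, even though no other floating-point value rounds to it. In any case, since the floating-point domain is discrete, 0 is a valid floating-point power of two. But yes, for 0.5 the code should return 0.5. - Nelfeal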

2 Answers

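A floating-point number uses 1 bit for the sign, n bits for the exponent and m bits for the mantissa. For example, a 32-bit float is encoded with 1 sign bit, 8 exponent bits and 23 mantissa bits. The value of a (normal) 32-bit float can be computed with the following equation: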

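$$\text{value} = (-1)^{\text{sign}} \cdot 2^{E-127} \cdot \left(1 + \sum_{i=1}^{23} b_{23-i} \, 2^{-i}\right)$$

where E is the 8-bit exponent field (1 ≤ E ≤ 254 for normal numbers) and b_j is bit j of the mantissa, b_22 being its most significant bit.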

See the notes here for details.
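To make the formula concrete, a sketch that rebuilds a normal `float` from its fields exactly as written (`decode_normal_float` is a hypothetical name; it assumes IEEE-754 binary32 and 1 ≤ E ≤ 254):

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Evaluates (-1)^sign * 2^(E-127) * (1 + sum_{i=1}^{23} b_{23-i} * 2^-i).
inline double decode_normal_float(float f) {
    std::uint32_t y = 0;
    std::memcpy(&y, &f, sizeof y);
    unsigned sign = y >> 31;
    int E = static_cast<int>((y >> 23) & 0xFF);
    double significand = 1.0;
    for (int i = 1; i <= 23; ++i) {
        unsigned b = (y >> (23 - i)) & 1u;  // mantissa bit b_{23-i}
        significand += b * std::ldexp(1.0, -i);
    }
    return (sign ? -1.0 : 1.0) * std::ldexp(significand, E - 127);
}
```

For 6.0f the fields are sign = 0, E = 129 and only b_22 set, so the result is 2^2 · 1.5 = 6. Every step is exact in `double`, so the decoded value matches the original `float` exactly.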

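So if the significand (the factor 1 + Σ…) is less than 1.5, the value is closer to (-1)^sign * 2^(E-127); if it is greater than or equal to 1.5, the value is closer to (-1)^sign * 2^(E-126). At exactly 1.5 both are equally close, and this rule rounds up.

The first (most significant) bit of the mantissa tells you whether the significand is below 1.5 or at least 1.5, because it contributes 0.5 to the sum.

So you only need to look at the first bit of the mantissa (bit 22 in the 32-bit case): if it is 0, the value is closer to (-1)^sign * 2^(E-127); if it is 1, the value is closer to (-1)^sign * 2^(E-126).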
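A hedged sketch of this rule working purely on the bits of an IEEE-754 binary32 value: keep the sign, add bit 22 to the exponent field, and clear the mantissa. `nearest_pow2_bits` is a hypothetical name, and the sketch assumes a normal input:

```cpp
#include <cstdint>
#include <cstring>

// Nearest power of two of a normal float; ties (significand exactly 1.5) round up.
// If E + b22 reaches 255 the resulting pattern is infinity (2^128 overflowed).
inline float nearest_pow2_bits(float v) {
    std::uint32_t y = 0;
    std::memcpy(&y, &v, sizeof y);
    std::uint32_t sign = y & 0x80000000u;
    std::uint32_t E = (y >> 23) & 0xFFu;
    std::uint32_t b22 = (y >> 22) & 1u;           // 1 => significand >= 1.5
    std::uint32_t r = sign | ((E + b22) << 23);   // mantissa bits all zero
    float out = 0.0f;
    std::memcpy(&out, &r, sizeof out);
    return out;
}
```

Because the result is assembled directly as a bit pattern, no shift of an `int` by the unbiased exponent is needed, so values below 1 (such as 0.3f, which rounds to 0.25f) work the same way as large ones.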

- Blixodus

Where is the code? - Phil1970

Page content provided by Stack Overflow.

- Eric Postpischil · Accepted Answer

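Prior to C++ 2020, there is no way to round integers to a power of two without using loops, compiler extensions, or shifts that assume a limit on the width of the type. There are several Stack Overflow questions about this. The code below shows a solution using C++'s std::countl_zero, with an alternative using GCC's count-leading-zeros builtin, that works for any integer type up to unsigned long long. Floating-point types are handled with the standard frexp and ldexp functions.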

#include <cmath>
#include <type_traits>   // std::is_floating_point_v is used unconditionally

#if __has_include(<bit>)
    #include <bit>       // std::countl_zero and __cpp_lib_bitops (C++20)
#endif


template <typename T> static constexpr T round_pow2(T v)
{
    /*  Since one is the smallest power of two, all numbers less than or equal
        to one round to one.
    */
    if (v <= 1) return 1;

    /*  For floating-point, the standard frexp function gives us the fraction
        and exponent, and ldexp applies an exponent.  The fraction is scaled to
        [.5, 1), so, if it is less than or equal to .75, we round down.
    */
    if constexpr (std::is_floating_point_v<T>)
    {
        int exponent = 0;
        T fraction = std::frexp(v, &exponent);
        return std::ldexp(static_cast<T>(.5), exponent + (.75 < fraction));
    }

    /*  Here we handle integer types.  The midpoints for rounding to powers of
        two are at 3*2^n.  That is, the transition between rounding to one
        power of two and another occurs at a number that has the form 3*2^n.
        To find which interval v is in, we can divide it by three and then
        find the next lower (instead of nearest) power of two.  To get the
        desired rounding at the midpoint, we use v-1.  So the general algorithm
        is to round (v-1)/3 down to the nearest power of two, then quadruple
        that.  For example:

            v   v-1   (v-1)/3   rounded down   quadrupled
            11   10     3            2             8
            12   11     3            2             8
            13   12     4            4            16

        Note that (v-1)/3 is not quite right for v=2, as the subtraction of 1
        jumps a full power of two, from 2 to 1.  (v-.01)/3 would work, but we
        want to stick with integer arithmetic.

        For the general case, we want 4 * 2**floor(log2((v-1)/3)).  To include
        v=2, we will use 2 * 2**f((v-1)/3), where f(x) is floor(log2(x))+1 but
        clamped to produce at least zero.

        If the C++ 2020 std::countl_zero function is available, we use that.
        Otherwise, we use the GCC builtin __builtin_clzll.  In either case, the
        function returns the number of leading zero bits, which depends on the
        width of the type rather than the operand value alone.  To calculate
        the power of two, we get the bit count for a fixed value (zero or one)
        as a reference point.
    */
    else
    {
        #if __cpp_lib_bitops
            /*  std::countl_zero is only provided for unsigned types, so
                define UT to be the unsigned type corresponding to T.
            */
            using UT  = std::make_unsigned_t<T>;

            return static_cast<UT>(2) << (std::countl_zero<UT>(0) - std::countl_zero<UT>((v-1)/3));
        #else
            /*  Since __builtin_clzll is not defined for zero operands, we need
                to ensure its operand is at least 1.  To do this, we change
                (v-1)/3 to (v-1)/3*2+1.  The doubling increases the power of
                two by one, so we change the reference point from zero to one,
                decreasing the number of bits for it by one.
            */
            return 2ull << __builtin_clzll(1) - __builtin_clzll((v-1)/3*2+1);
        #endif
    }
}
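As a quick cross-check of the two rules described in this answer (a sketch, not part of it; `bit_width_u64`, `nearest_pow2_int`, `nearest_pow2_ref` and `nearest_pow2_fp` are hypothetical names), the integer formula 2 << bit_width((v-1)/3) can be compared against a brute-force search. A loop stands in for C++20 `std::bit_width` so the sketch compiles as C++17:

```cpp
#include <cmath>
#include <cstdint>

// Number of bits needed to represent x (what C++20 std::bit_width returns).
inline int bit_width_u64(std::uint64_t x) {
    int w = 0;
    while (x) { ++w; x >>= 1; }
    return w;
}

// The answer's integer rule, for v >= 2: 2 << bit_width((v-1)/3).
inline std::uint64_t nearest_pow2_int(std::uint64_t v) {
    return std::uint64_t{2} << bit_width_u64((v - 1) / 3);
}

// Brute-force reference: the closer of the two neighbouring powers of two,
// with ties going to the smaller one, as in the answer's code.
inline std::uint64_t nearest_pow2_ref(std::uint64_t v) {
    std::uint64_t lo = 1;
    while (lo * 2 <= v) lo *= 2;   // largest power of two <= v
    std::uint64_t hi = lo * 2;     // next power of two above it
    return (hi - v < v - lo) ? hi : lo;
}

// The answer's floating-point rule: v = f * 2^e with f in [0.5, 1);
// round up to 2^e only when f > 0.75, otherwise down to 2^(e-1).
inline double nearest_pow2_fp(double v) {
    int e = 0;
    double f = std::frexp(v, &e);
    return std::ldexp(0.5, e + (0.75 < f));
}
```

For instance, v = 13 gives (13-1)/3 = 4, whose bit width is 3, so the result is 2 << 3 = 16, while v = 12 gives 3, bit width 2, and 8 (the tie rounds down).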