Implementing banker's rounding for integer division

Question

c embedded rounding integer-division bankers-rounding

5

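What is the simplest way to implement int divround(int a, int b) {...} in C, where the output is a/b with banker's rounding (round half to even)?

For example, divround(3,2) and divround(5,2) both return 2.

I am writing embedded code, so I cannot rely on libraries. I want the code to work on both ARM and RISC-V, so I cannot use assembly. I am trying to mimic the behavior of np.around(a/b) in NumPy (which performs banker's rounding), so that I can directly compare output test vectors from Python and from my embedded application.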
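For host-side checking of test vectors, the intended behavior can be pinned down with a wider intermediate type. The sketch below is only an illustration: the name divround_wide is hypothetical, it assumes long long is wider than int, and it is meant for a test bench rather than for the embedded target. Like a/b itself, it does not handle b == 0 or INT_MIN / -1.

```c
/* Hypothetical host-side reference: round-half-to-even division.
   Assumes long long is wider than int, so doubling |r| cannot overflow. */
int divround_wide(int a, int b)
{
    long long q  = (long long)a / b;            /* truncated quotient */
    long long r  = (long long)a % b;            /* remainder, sign follows a */
    long long ar = r < 0 ? -r : r;              /* |r| */
    long long ab = b < 0 ? -(long long)b : b;   /* |b| */
    long long step = ((a < 0) != (b < 0)) ? -1 : 1;  /* direction away from zero */

    /* Step away from zero when the fraction is above one half,
       or exactly one half with an odd truncated quotient. */
    if (2 * ar > ab || (2 * ar == ab && q % 2 != 0))
        q += step;
    return (int)q;
}
```

With this reference, divround_wide(3, 2) and divround_wide(5, 2) both give 2, matching the examples above.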

- Abarajithan

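@klutt Yes, normal floating-point numbers can represent one half exactly. But the question itself is about integers, so I don't see how your comment is relevant.

@MarkRansom I misunderstood.

Sorry, I've edited the question to clarify. I need to implement int divround(int a, int b) {...}.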

4

@Fe2O3 It's almost as if users are different people with their own judgment.

I disagree with the reason for closing.

5 Answers

1

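Below is an ISO C99 implementation of divround() that, to the best of my knowledge, has no spurious signed integer overflow in intermediate computations. It may contain code that relies on the two's complement representation of signed integers. On the target architectures the asker cares about, int uses two's complement, so this should not be a restriction in practice.

The division first computes a preliminary integer quotient q and the corresponding remainder r. Depending on the magnitude of r relative to the divisor b, a correction of magnitude 1 may need to be applied to q, where the sign of the correction is the sign of the mathematical quotient. Rounding to nearest-or-even is implemented with three status bits that correspond directly to the conditions of that rounding mode, as is well known from floating-point computation: a round bit, a sticky bit, and the least significant bit of the unrounded result (the truncated quotient).

The round bit and the sticky bit require some care to compute. Because the magnitude of r can be as large as |b|-1, 2*r may overflow and cannot be used to check whether the fractional part of the quotient is greater than or equal to one half. Instead, we compare the magnitude of the remainder r with b/2, while separately accounting for the least significant bit of the divisor b that is discarded by the halving.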
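To illustrate why the comparison is made against b/2 instead of doubling r, here is a minimal sketch restricted to 0 <= r < b. The helper names round_bit and sticky_bit are hypothetical and are not part of the answer's code; they only mirror the two conditions described above.

```c
#include <limits.h>

/* Round bit: 1 when r/b >= 1/2. Equivalent to 2*r >= b, but without doubling r.
   For odd b, b/2 drops a bit, so the threshold is b/2 + 1. Assumes 0 <= r < b. */
int round_bit(int r, int b)
{
    return r >= b / 2 + (b & 1);
}

/* Sticky bit: 1 when r/b is not exactly 1/2.
   An odd b can never produce an exact half. Assumes 0 <= r < b. */
int sticky_bit(int r, int b)
{
    return (b & 1) | (r != b / 2);
}
```

With b = INT_MAX and r = INT_MAX - 1, the expression 2*r would overflow, while round_bit(r, b) evaluates safely. The quotient is bumped only when the round bit is set and either the sticky bit or the quotient's low bit is set.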

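The test scaffold below uses an x86-specific reference function. Computing in extended precision with 64 significand bits yields an exact result and avoids double-rounding problems: Samuel A. Figueroa, "When is Double Rounding Innocuous?" SIGNUM Newsl., 30(3):21–26, July 1995. Integer division typically fails and raises an exception when (1) the divisor is zero, or (2) the dividend is INT_MIN and the divisor is -1. My divround() function does not check for these cases, and the test scaffold avoids them.

The divround() implementation below should compile to branchless code with any commonly used compiler at full optimization. In my experience, a division operator / and a remainder operator % that share the same operands do not necessarily result in a single idiv on architectures where integer division returns both results, so where efficiency matters the generated assembly code should be checked carefully.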

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>

int divround (int a, int b) 
{
    int q = a / b;
    int r = a % b;
    int abs_b_half = abs (b / 2);
    int quot_sign = ((a < 0) ^ (b < 0)) ? (-1) : 1;
    int abs_r = abs (r);
    int b_is_odd = b & 1;
    int q_is_odd = q & 1;
    int q_round  = abs_r >= (abs_b_half + b_is_odd);
    int q_sticky = b_is_odd | (abs_r != abs_b_half);
    q = q + quot_sign * (q_round & (q_sticky | q_is_odd));
    return q;
}

/* x86 specific! Computing in extended precision guarantees correct result */
int divround_ref (int a, int b) 
{
    uint16_t origcw;
    uint16_t newcw = 0x37f; // PC = extended precision, RC = to nearest or even
    int r;

    __asm fstcw  [origcw];
    __asm fldcw  [newcw];
    __asm fild   dword ptr [a];
    __asm fild   dword ptr [b];
    __asm fdivp  st(1), st;
    __asm frndint;
    __asm fistp  dword ptr [r];
    __asm fldcw  [origcw];
    
    return r;
}

// George Marsaglia's KISS PRNG, period 2**123. Newsgroup sci.math, 21 Jan 1999
// Bug fix: Greg Rose, "KISS: A Bit Too Simple" http://eprint.iacr.org/2011/007
static uint32_t kiss_z=362436069, kiss_w=521288629;
static uint32_t kiss_jsr=123456789, kiss_jcong=380116160;
#define znew (kiss_z=36969*(kiss_z&65535)+(kiss_z>>16))
#define wnew (kiss_w=18000*(kiss_w&65535)+(kiss_w>>16))
#define MWC  ((znew<<16)+wnew )
#define SHR3 (kiss_jsr^=(kiss_jsr<<13),kiss_jsr^=(kiss_jsr>>17), \
              kiss_jsr^=(kiss_jsr<<5))
#define CONG (kiss_jcong=69069*kiss_jcong+1234567)
#define KISS ((MWC^CONG)+SHR3)

int main (void)
{
    int a, b, res, ref;
    unsigned long long int c = 0;
    unsigned long long int m1 = 0x5555;
    unsigned long long int m2 = 0xaaaa;

    do {
        c++;

        a = (int)KISS;
        if ((c & m1) == m1) a = INT_MIN; // increase likelihood minimum dividend
        do {
            b = (int)KISS;
            if ((c & m2) == m2) b = INT_MIN; // incr. likelihood minimum divisor
        } while ((b == 0) ||                     // division by zero
                 ((a == INT_MIN) && (b == -1))); // integer division overflow

        res = divround (a, b);
        ref = divround_ref (a, b);
        if (res != ref) {
            printf ("mismatch: a=% 11d b=% 11d res=% 11d ref=% 11d %.16f\n", a, b, res, ref, (double)a/(double)b);
            return EXIT_FAILURE;
        }

        if ((c & 0xffffff) == 0) printf ("\r%llx ", c);
    } while (c);
    return EXIT_SUCCESS;
}

- njuffa

1

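This is a branchless version of @dbush's answer (with test cases) that handles positive or negative a and b. If you only have non-negative a and positive b, the sign handling can be removed (or made branchy, if negative inputs are rare in most uses).

This has the same limitations as dbush's version, in particular requiring INT_MIN/2 < b < INT_MAX/2 (or a similar range). With clang or GCC for ARM, this is somewhat fewer instructions than @njuffa's fully general version, but it is still a fair number of instructions to handle signs as well as rounding.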

#include <stdio.h>

int divround( int a, int b ) {
    printf( "%3d / %2d", a, b );
    int sign = ((a<0)^(b<0))*-2+1; // if 'a' and 'b' opposite signs then -1, else 1

  // This can have signed overflow; making both negative also works, see below
    a = a<0 ? -a : a;    // abs(a)
    b = b<0 ? -b : b;    // abs(b)

    int q = a / b; // quotient (will be +'ve)
    int r = a % b * 2; // remainder ... doubled!!
    return sign * ( q + (r==b)*(q%2) + (r>b) ); // see comments below...
}

int main( void ) {
    typedef int P[2];
    P arr[] = { { 12, 4 }, { 13, 4 }, { 14, 4 }, { 15, 4 } };

    for( P *p = arr; p < arr + sizeof arr/sizeof arr[0]; p++ ) {
        printf( " = %d\n",   divround( +(*p)[0], +(*p)[1] ) );
        printf( " = %d\n",   divround( +(*p)[0], -(*p)[1] ) );
        printf( " = %d\n",   divround( -(*p)[0], +(*p)[1] ) );
        printf( " = %d\n\n", divround( -(*p)[0], -(*p)[1] ) );
    }

    return 0;
}

Output:

 12 /  4 = 3
 12 / -4 = -3
-12 /  4 = -3
-12 / -4 = 3

 13 /  4 = 3
 13 / -4 = -3
-13 /  4 = -3
-13 / -4 = 3

 14 /  4 = 4
 14 / -4 = -4
-14 /  4 = -4
-14 / -4 = 4

 15 /  4 = 4
 15 / -4 = -4
-15 /  4 = -4
-15 / -4 = 4

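About return sign * ( q + (r==b)*(q%2) + (r>b) );:

sign multiplies the computed quotient to restore a possibly negative result.
q is the computed (integer division) quotient (it will be non-negative).
(r==b)*(q%2): if twice the remainder equals the divisor, the fraction is exactly 0.5; in that case 1 is added to q when q is odd.
(r>b): otherwise, 1 is added to q when twice the remainder is greater than the divisor.

Worried about trying to make INT_MIN positive? The following changes address that. Instead of dividing positives, divide negatives...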

    a = a*(a<0) + -a*(a>=0); // if 'a' +'ve, then make -'ve
    b = b*(b<0) + -b*(b>=0); // same for 'b'
    /* ... */
    return sign * ( q + (r==b)*(q%2) + (r<b) ); // Now 'r<b'
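Put together, the "divide negatives" idea might look like the sketch below. This is an illustration under stated assumptions, not the answer's exact code: the name divround_neg is hypothetical, the negation is written with ternaries so that -a is only evaluated when a is positive, and INT_MIN/2 < b < INT_MAX/2 is still assumed because the remainder is doubled. The quotient is held as a non-negative int, so a == INT_MIN with b == 1 or b == -1 remains unsupported.

```c
#include <limits.h>

/* Hypothetical full version of the "divide negatives" variant.
   Assumes b != 0, INT_MIN/2 < b < INT_MAX/2, and not (a == INT_MIN && |b| == 1). */
int divround_neg(int a, int b)
{
    int sign = ((a < 0) ^ (b < 0)) ? -1 : 1;  /* -1 if signs differ, else 1 */
    int na = a > 0 ? -a : a;                  /* -|a|: negating a positive never overflows */
    int nb = b > 0 ? -b : b;                  /* -|b| */

    int q = na / nb;                          /* truncated quotient, >= 0 */
    int r = na % nb * 2;                      /* doubled remainder, <= 0 */

    /* r == nb: exact half, add 1 only if q is odd; r < nb: more than half, add 1. */
    return sign * (q + (r == nb) * (q % 2) + (r < nb));
}
```

For example, divround_neg(INT_MIN, 2) returns INT_MIN / 2 without ever forming -INT_MIN.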

An earlier version of this answer did the absolute-value part with the following code:

    a = -a*(a<0) + a*(a>=0); // change sign of 'a' if negative
    b = -b*(b<0) + b*(b>=0); // same for 'b'

But that doesn't compile as well with current GCC or clang for some ISAs. For example, GCC for x86-64 was using two CMOV instructions for each. And clang for ARM was using a mul and multiple other instructions, instead of a simple cmp / rsbmi reg, #0 to conditionally subtract from 0. See it on Godbolt.

Another way to avoid signed overflow UB is to use unsigned math in generating the abs result, like unsigned ua = a<0 ? 0U - a : a;. This relies on the well-defined range-reduction semantics for conversion of negative integer types to unsigned, so is well-defined everywhere, not relying on 2's complement int.
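A minimal sketch of that unsigned absolute value follows; the function name uabs is hypothetical. The subtraction happens entirely in unsigned arithmetic, which wraps instead of overflowing.

```c
#include <limits.h>

/* Hypothetical helper: |a| as an unsigned value, well-defined even for INT_MIN.
   Converting a negative int to unsigned is defined as reduction modulo UINT_MAX + 1,
   and 0U - x wraps around in unsigned arithmetic, so no signed overflow can occur. */
unsigned uabs(int a)
{
    return a < 0 ? 0U - (unsigned)a : (unsigned)a;
}
```

Here uabs(INT_MIN) yields (unsigned)INT_MAX + 1, a magnitude that plain int cannot hold.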

Compilers generally make branchless asm from ternary operators when the expressions in both halves are simple and have no side-effects, but technically only the chosen side of a ternary is evaluated in the abstract machine. (If that's a problem for more complex stuff, assign inputs to temporary variables and select between them with a ternary.)

Compilers will sometimes make branchy asm from C source like return q + ( (r==b)*(q%2) + (r>b) );, as shown for the bankers_div_Fe2O3 function in the Godbolt link. (That version doesn't have sign handling; it assumes both are non-negative.)

Handling of sign could be optimized some if you're willing to assume 2's complement and that >> is an arithmetic right shift, as described in comments below.

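Get 0 or -1 from (a^b) >> (CHAR_BIT * sizeof(a) - 1), assuming a 2's complement system and arithmetic right shift. (The shift count must be one less than the width of the type; shifting by the full width is undefined.)

Alternatives include (a^b) < 0 ? -1 : 0 (the parentheses matter, since < binds tighter than ^). Use it with a 2's complement identity bithack like you would for abs: (untested) return mask ^ (x + mask); which is either ~(x-1) or (x+0)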

Compilers do see through ((a<0)^(b<0))*-2+1 and actually compile it to ((a^b)>>31) | 1 (for 32-bit int). But if you're assuming 2's complement, it simplifies nicely to just XORing and checking the sign bit of the result to see if they're different.

It saves one instruction to just get 0 or -1 (leave out the OR of 1), but it costs ADD/XOR at the end instead of imul. So same number of total instructions, but no multiply. This is a bigger gain on embedded systems with slow multipliers.
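The mask-based sign handling described above could be sketched as follows. The function name apply_quotient_sign is hypothetical, and the code assumes a 2's complement int and that >> on a negative int is an arithmetic shift, which is implementation-defined in C but is what GCC and clang do on ARM, RISC-V and x86.

```c
#include <limits.h>

/* Hypothetical helper: negate the non-negative magnitude x when a and b have
   opposite signs. Assumes 2's complement and arithmetic right shift of int. */
int apply_quotient_sign(int x, int a, int b)
{
    /* The sign bit of a ^ b is set exactly when the signs differ;
       shifting it across the word gives 0 or -1. */
    int mask = (a ^ b) >> (int)(CHAR_BIT * sizeof(int) - 1);

    /* mask == 0: (x + 0) ^ 0 == x.  mask == -1: (x - 1) ^ -1 == ~(x - 1) == -x. */
    return (x + mask) ^ mask;
}
```

This replaces the final multiply by sign with an add and an XOR.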

- Fe2O3

1

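Compilers do see through that ((a<0)^(b<0))*-2+1 mess and compile it to ((a^b)>>31) | 1 (for 32-bit int), which is interesting. My version saves the OR with 1, but costs an ADD and an XOR at the end instead of a multiply. So the total instruction count is the same, but without a multiply.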

1

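Clever idea to use -abs(a) (without intermediate UB) so the result always fits. Still, writing it as a >= 0 ? -a : a would be more readable and more efficient. Also, a more compact way to say "if (not both) negative" is "if a and b have opposite signs", which also corresponds directly to the XOR you're using.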

1

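One thing I noticed is that the answer still divides by b, but I think that's fine, since it matches the domain of a/b: well-defined for every input except b==0 and INT_MIN/-1. For this function, there is no desire for extra checks that try to do something in the cases where a/b itself is undefined. It is just a replacement for C's truncating integer division.

OK, I found time to make an edit that incorporates the suggestions made earlier in the comments.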

1

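@PeterCordes Wonderful! Thank you... Anyone who doesn't dig into these comments could well believe that "I" wrote your addendum (which is beyond my abilities...). Thank you for your contribution, but I again invite you to take credit for the additional information at the key points of the answer. Happy to share the credit (if possible)... Thanks again. :-)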

0

For b > 0 and non-negative a, the following works:

return (a+(b/2) - (~(b|a/b) &1))/ b;

Logic:

- Do ordinary round-half-up: (a+(b/2))/b
- If b is even and a/b is even, round down instead by subtracting (~(b|a/b) & 1)
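The formula can be checked against a straightforward branchy version over a small grid of inputs. The sketch below assumes a >= 0, b > 0 and that a + b/2 does not overflow; the function names divround_oneline, divround_check and count_mismatches are hypothetical.

```c
/* The one-line formula, wrapped in a hypothetical function name.
   Assumes a >= 0, b > 0 and that a + b/2 does not overflow. */
int divround_oneline(int a, int b)
{
    return (a + (b / 2) - (~(b | a / b) & 1)) / b;
}

/* Branchy reference for the same domain, used only for comparison.
   Assumes 2*r does not overflow, which holds for the small inputs used here. */
int divround_check(int a, int b)
{
    int q = a / b, r = a % b;
    if (2 * r > b) return q + 1;
    if (2 * r < b) return q;
    return q + (q & 1);   /* exact half: move to the even neighbour */
}

/* Counts disagreements for 0 <= a <= max_a and 1 <= b <= max_b. */
int count_mismatches(int max_a, int max_b)
{
    int bad = 0;
    for (int b = 1; b <= max_b; b++)
        for (int a = 0; a <= max_a; a++)
            bad += divround_oneline(a, b) != divround_check(a, b);
    return bad;
}
```

The restriction to non-negative a matters: with truncating division, divround_oneline(-3, 2) evaluates to -1, whereas rounding -1.5 half to even gives -2.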

- Abarajithan

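Not for every a: signed overflow occurs when a and b are large. For example, INT_MAX / 100 computes INT_MAX + 50, which is signed-overflow undefined behavior. If your inputs are known to be non-negative, you can use unsigned to avoid this.

As for efficiency, this performs two divisions, so on an embedded CPU it is probably less efficient than branching unless b is a compile-time constant. That is especially true on CPUs other than Ice Lake or Zen, where branches are cheaper and integer division is slower. Even on modern x86, avoiding an extra div or idiv is worth it.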

0

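If the result is non-negative (i.e. a and b have the same sign: both non-negative or both negative), the division result with usual rounding is

  x = (a + b/2) / b;

If the result is negative, the division result with usual rounding is:

  x = (a - b/2) / b;

Here "usual rounding" means that exact halves are rounded away from zero.

This is the same as the banker's rounding result, except when x is odd and was rounded from an exact half. An exact half occurs when a/b = x - ½ (in the "non-negative" case) or a/b = x + ½ (in the "negative" case), or equivalently when b * x = a + b/2 or b * x = a - b/2 respectively. When that happens, x must be decremented or incremented by 1, respectively, to get the correct result.

Note that if b is odd and a and b are integers, a/b cannot have a fractional part of exactly ½. So the intermediate result may need correction only when b is even, in which case the conditions above involve only integer terms.

In summary, we have: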

int divround(int a, int b)
{
    int x;
    if((a<0) == (b<0))
    {   // a and b have same sign
        x = (a + b/2) / b;
        if((b%2 == 0) && (x%2 != 0) && (b*x == a + b/2))  // b even, x odd and a/b=x-1/2
        {
            x--;
        }
    }
    else
    {   // a and b have different sign
        x = (a - b/2) / b;
        if((b%2 == 0) && (x%2 != 0) && (b*x == a - b/2))  // b even, x odd and a/b=x+1/2
        {
            x++;
        }
    }
    return x;
}

Of course, if a and b are both assumed non-negative, only the first case of the if-else needs to be considered.

Some test cases:

divround(3, 2) == 2
divround(-3, 2) == -2
divround(3, -2) == -2
divround(-3, -2) == 2
divround(5, 2) == 2
divround(-5, 2) == -2
divround(5, -2) == -2
divround(-5, -2) == 2
divround(5, 1) == 5
divround(5, -1) == -5
divround(5, 3) == 2
divround(5, -3) == -2
divround(0, 2) == 0
divround(0, -2) == 0

Note: this solution relies on a+b/2 and a-b/2 not overflowing, so don't go too close to the limits of int.
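What "too close to the limits" means can be made concrete with a pre-check. The following sketch is an assumption-laden illustration, not part of the answer: the name divround_args_ok is hypothetical, and it mirrors the same-sign test of the function above to decide whether a + b/2 or a - b/2 would be formed.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical pre-check: true when the rounding division above can be
   evaluated for (a, b) without division by zero or signed overflow. */
bool divround_args_ok(int a, int b)
{
    if (b == 0) return false;                    /* division by zero */
    if (a == INT_MIN && b == -1) return false;   /* quotient not representable */

    int h = b / 2;
    if ((a < 0) == (b < 0)) {
        /* same sign: the code forms a + h */
        return h >= 0 ? a <= INT_MAX - h : a >= INT_MIN - h;
    } else {
        /* different sign: the code forms a - h */
        return h >= 0 ? a >= INT_MIN + h : a <= INT_MAX + h;
    }
}
```

For instance, divround_args_ok(INT_MAX, 100) is false because INT_MAX + 50 would overflow, while divround_args_ok(5, 2) is true.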

- nielsen

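@njuffa Right, I have added a note about that. I wasn't thinking about overflow when I wrote this; my main aim was to handle negative numbers (which no other answer did at the time) and to offer an alternative formulation that may be easier for some to understand. In any case, thanks for the observation.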

- dbush · Accepted Answer

9

Assuming a and b are non-negative, and b is less than INT_MAX / 2, here is a simple implementation:

int divround(int a, int b)
{
    int q = a / b;
    int r = a % b;

    if (2*r > b) {
        return q + 1;
    } else if (2*r < b) {
        return q;
    } else if (q % 2 == 0) {
        return q;
    } else {
        return q + 1;
    }
}

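By way of explanation:

- If the remainder of the division is more than half of b, round up; if it is less than half, round down.
- If the remainder is exactly b/2, check whether the quotient is even: if so, round down; otherwise round up.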
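The bound on b comes from doubling the remainder. As a variation (not the answer's own code), the same three-way comparison can be made between r and b - r, which never overflows for non-negative inputs. The function name divround_nonneg is hypothetical, and a >= 0, b > 0 are assumed.

```c
#include <limits.h>

/* Hypothetical variation: compares r with b - r instead of 2*r with b,
   so no upper bound on b is needed. Assumes a >= 0 and b > 0. */
int divround_nonneg(int a, int b)
{
    int q = a / b;
    int r = a % b;
    int rest = b - r;            /* b - r > 0 because 0 <= r < b */

    if (r > rest) return q + 1;  /* more than half: round up */
    if (r < rest) return q;      /* less than half: round down */
    return q + (q & 1);          /* exact half: round to even */
}
```

Since r > b - r is the same condition as 2*r > b, the results match the implementation above wherever both are defined.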

- dbush

2

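Branchless: int q = a / b; int r = a % b * 2; return q + ( (r==b)*(q%2) + (r>b) ); :-)

@Fe2O3: You could post that as an answer. Branch mispredictions are expensive on high-end CPUs, so this could be more efficient than the other answers. On x86-64 it is still just one idiv, which returns both quotient and remainder, with no branches and a reasonable number of instructions afterwards. https://godbolt.org/z/oM8EjePvj - But note that when compiling your boolean multiply, clang reinvents branching on some ISAs (such as ARM and x86-64, but not AArch64, where csel and cinc (conditional increment) are a great fit for this).

@PeterCordes Not long after I started visiting SO, I posted a solution to a problem that involved branchless code. (Yes, I was a bit flippant in my answer.) The uproar that caused has made me reluctant to raise my profile in that area again. Here, dbush has even stayed silent on a second suggestion that uses fewer variables... So be it...

@Fe2O3: If you posted a readable answer with a branchless version of dbush's code, I would upvote it. (Especially if you point out any advantages in this case, such as that it compiles very efficiently on AArch64.) I don't always agree that less code is better, though; sometimes more code helps show the steps and gives room to explain each one separately, with comments or meaningful variable names. q + ( (r==b)*(q%2) + (r>b) ); is what some people call "write-only code": once written, it is hard to see a year later, or for anyone else, why it should be that way.

For the non-negative implementation, b < INT_MAX/2 is fine. For the version that handles negatives, INT_MIN/2 < b < INT_MAX/2. I've updated the comment.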
