如何正确地找到两个值的平均值？

Question

如何正确地找到两个值的平均值？

cundefined-behaviorinteger-overflow

21

我最近学到在C语言中，整数溢出是一种未定义行为（副问题-在C++中也是吗？）

在C编程中，通常需要找到两个值a和b的平均值。然而，使用(a+b)/2可能会导致溢出和未定义行为。

那么我的问题是 - 在C中找到两个值a和b的平均值的正确方法是什么？

- bodacydo

10

扩展这个表达式。是的，在C和C++中，有符号整数溢出都是未定义行为（UB）（虽然我不确定C是否也是如此，但根据C++标准，无符号算术不会溢出）。 - chris

7

例如，a/2 + b/2 不会导致整数溢出。 - BlackDwarf

4

如果符号相反，(a + b)/2 可以避免溢出。如果符号相同，则 a + (b - a) / 2 是安全的。 - Jonathan Leffler

10

@DeepBlackDwarf: 但是(1+3)/2=2，而1/2+3/2=1。 - Jonathan Leffler

9

除非您希望得到像1和3的平均值为1这样的结果，否则应避免使用"a/2 + b/2"。您可能可以尝试一些花哨的方法，例如"a/2 + b/2 + (a%2 + b%2)/2"，我认为您会得到正确的答案，但是... - Jonathan Leffler

显示剩余18条评论

8个回答

10

(a >> 1) + (b >> 1) + (((a & 1) + (b & 1)) >> 1)

c语言中的移位运算符(x >> i)相当于将x除以2的i次幂。因此，表达式(a >> 1) + (b >> 1)等同于a/2 + b/2。然而需要加上数字截断部分的平均值。这个值可以通过掩码(a & 1)、加法((a & 1) + (b & 1))和除法(((a & 1) + (b & 1)) >> 1)得到。平均值为(a >> 1) + (b >> 1) + (((a & 1) + (b & 1)) >> 1)。

注意：使用移位运算符>>和按位与&代替除法/和取模%的原因是效率问题。

- user1837841

3

注意：使用位运算符 >> 和 & 而不是 / 和 % 作为除法和取模运算符的原因是效率问题。你的编译器已经很聪明了。请参阅 Is multiplication and division using shift operators in C actually faster?因此，请在某些情况下使用 / 和 %，以提高代码的可读性。 - Frank Kusters

我尝试编写我的代码，使其按照我期望的方式执行，即使在我知道所有优化都是自动完成的情况下。 - user1837841

我编辑花费了太长时间：我尝试编写我的代码，使其按照我打算执行的方式运行，即使在知道所有优化都是自动完成的情况下也是如此。（尽管可能存在更复杂的情况，例如扩展环形缓冲区。） - user1837841

3

将负数向右移位的结果是由实现定义的。使用除法会更好，因为它保证得到的结果是负数或0。我也不确定 a & 1 是否保证等于 abs(a % 2)。 - dyp

3

使用 a=-2 和 b=1，这个解法得出的结果是 -1。但在 C（c99/c11）中，(-2 + 1) / 2 的结果是 0。这个解法和 Vlad 的解法有同样的问题；c99/c11 的除法会向 0 截断，而不是向 -inf 截断。 - ouah

4

一个简单的方法是以下内容：

int c = a / 2 + ( b + a % 2 ) / 2;

例如，a和b可以表示为：

a = 2 * n + r1;
b = 2 * m + r2;

然后。

( a + b ) / 2 => ( 2 * n + r1 + 2 * m + r2 ) / 2 => 2 * n / 2 + ( b + r1 ) / 2

而最后一个表达式会给你

=> a / 2 + ( b + a % 2 ) / 2

更正确的表达应该是以下内容。

int c = a / 2 + b / 2 + ( a % 2 + b % 2 ) / 2;

例如，如果我们有以下内容：

int a = INT_MAX;
int b = INT_MAX;

然后，c被计算为：

int c = a / 2 + b / 2 + ( a % 2 + b % 2 ) / 2;

会给出c == INT_MAX

编辑：发现计算机运算符的效果与数学运算符的效果之间存在有趣的差异。例如，根据数学，-1可以表示为

-1 = -1 * 2 + 1

根据这个公式来计算。

a = 2 * n + r1

2 * n 应当是一个小于等于 a 的整数。

因此，比它小的整数是减去1的结果，例如：若 2 * n = 5，那么比它小的整数为 4 = 5 - 1。

我认为我的公式是通用的，对于奇数负数，需要考虑比该奇数负数更小的偶数负数。

正确的公式似乎应该是：

int c = ( a < 0 ? a & ~1 : a ) / 2 + 
        ( b < 0 ? b & ~1 : b ) / 2 + 
        ( ( a & 1 ) + ( b & 1 ) ) / 2;

需要注意的是，从数学角度来看，-1 和 -2 的平均值应该等于 -2，而这个公式给出了正确的结果。:)

- Vlad from Moscow

1

@@bodacydo 当我将a表示为2 * n + r1时，r1等于a % 2。r1只能是0或1。 - Vlad from Moscow

2

那么，如果 a == b == INT_MAX 呢？INT_MAX % 2 通常是非零的。 "简单方法" 仍会产生溢出。 - dyp

@dyp 请重新阅读我的评论和原始帖子。它们已经足够清晰明了。 - Vlad from Moscow

1

正如我所说，“更正确的表达式”并不受此问题影响。然而，第一个表达式：int c = a / 2 + ( b + a % 2 ) / 2; 受到此问题的影响。-- 我不明白为什么你要发布那个第一种方法。 - dyp

2

使用 a = 2 和 b = -1，你的方法得到的结果是 1，但是 (2 - 1) / 2 的结果是 0。 - ouah

显示剩余19条评论

4

如果您担心溢出问题，可以将值转换为更大的类型进行数学计算，然后进行边界检查。

- user3100381

6

夺走了其中的乐趣。 - ryyker

1

如果没有“更大的类型”呢？ - Thanatos

@Thanatos 如果是这样，你已经有了无符号整数，可以使用 (a >> 1) + (b >> 1) + (((a & 1) + (b & 1)) >> 1)。 - ST3

同意在可用的情况下将类型转换为更宽的类型是最好的。然而，我认为没有必要进行“边界检查”。 - chux - Reinstate Monica

3

这是来自在单个指令周期内计算两个整数向零舍入的平均值：

(a >> 1) + (b >> 1) + (a & b & 0x1)

你必须考虑到：

it's implementation defined whether right shifting a negative integer shifts zeros or ones into the high order bits. Many CPUs often have two different instructions: an arithmetic shift right (preserves the sign bit) and a logical shift right (doesn't preserve the sign bit). The compiler is allowed to choose either (most compilers choose an arithmetic shift instruction).

ISO/IEC 9899:2011 §6.5.7 Bitwise shift operators

¶5 The result of E1 >> E2is E1 right-shifted E2 bit positions. [CUT] If E1 has a signed type and a negative value, the resulting value is implementation-defined.

Changing the expression to:
```
a / 2 + b / 2 + (a & b & 0x1)
```
isn't a solution since logical right shifts are equivalent to division by a power of 2 only for positive or unsigned numbers.
also (a & b & 0x1) isn't well defined. This term should be non-zero when both a and b are odd. But it fails with one's complement representation and ISO C, section 6.2.6.2/2, states that an implementation can choose one of three different representations for integral data types:
- two's complement
- one's complement
- sign/magnitude
(usually the two's complement far outweigh the others).

- manlio

注意，使用典型的二进制补码机器上，(a >> 1) + (b >> 1) + (a & b & 0x1) 的答案与 (a+b)/2 的答案有时会相差1，大约四分之一的时间。换句话说，答案有时会向远离0的方向舍入。第二种方法 a / 2 + b / 2 + (a & b & 0x1) 经常失败，例如 -3，-3。 - chux - Reinstate Monica

@chux 谢谢您的观察，我已经修改了答案的一部分。 - manlio

尽管ISO C允许实现使用任何这些整数格式，但它还要求支持一种数据类型（至少64位的无符号长长整型），除非其本机整数类型为65位或更长，否则在任何不能有效处理二进制补码数学的平台上都是不切实际的。据我所知，迄今为止，从未有过也永远不会有任何C99或语言的任何未来版本的生产实现不使用二进制补码数学。 - supercat

@chux: 我认为 (a>>1)+(b>>1)+(a & b & 1) 会有一致的向下舍入 (而不是向最小值舍入)。用这种方式来描述答案似乎比说它的行为有 1/4 的时间差异更有用。虽然有些应用程序可能希望采用向最小值舍入的行为，但对许多应用程序而言，如果没有溢出，avg(x,y)+z == avg(x+z,y+z) 更重要。如果 avg(-1,0)==-1 和 avg(0,1)==0，那么这个等式就成立。但是，如果 avg(-1,0) 和 avg(0,1) 都是零，这个等式将不成立。 - supercat

2

平均两个整数 int 在整个范围 [INT_MIN...INT_MAX] 上的最简单（通常也是最快）方法是采用更宽的整数类型。（由 @user3100381 建议。）我们称之为 int2x。

int average_int(int a, int b) {
  return ((int2x) a + b)/2;
}

当然，这需要存在更广泛的类型 - 让我们看看不需要更广泛类型的解决方案。

挑战：

Q：当一个整数是奇数，另一个是偶数时，应该向哪边舍入？

A：按照上面的average_int()函数并朝着0舍入（截断）。

Q：代码可以使用%吗？

A：在C99之前的代码中，a % 2的结果可能会在a<0时产生不同的结果。因此，让我们不使用%。

Q：int需要具有大致对称的正负数范围吗？

A：自C99以来，负数的数量与正数的数量相同（或多1）。让我们尝试不要求这个。

解决方案：

执行测试以确定是否可能发生溢出。如果没有，简单地使用(a + b) / 2。否则，将差的一半（签名与答案相同）添加到较小的值中。

以下代码提供了与average_int()相同的答案，而不需要使用更广泛的整数类型。它防止int溢出，并且不需要 INT_MIN + INT_MAX 为0或-1。它不依赖于编码方式是2的补码，1的补码还是符号-数量。

int avgC2(int a, int b) {
  if (a >= 0) {
    if (b > (INT_MAX - a)) {
      // (a+b) > INT_MAX
      if (a >= b) {
        return (a - b) / 2 + b;
      } else {
        return (b - a) / 2 + a;
      }
    }
  } else {
    if (b < (INT_MIN - a)) {
      // (a+b) < INT_MIN
      if (a <= b) {
        return (a - b) / 2 + b;
      } else {
        return (b - a) / 2 + a;
      }
    }
  }
  return (a + b) / 2;
}

最多，任意一对整数之间会发生3个if()。

- chux - Reinstate Monica

1

如果您只需要处理无符号整数类型（并且能够使用二进制思维），则可以将加法分解为“数字”和“进位”。我们可以将无限精度的a+b写成(a^b) + ((a&b)<<1))，因此(a+b)/2 简单地等于 ((a^b)>>1) + (a&b)。这个表达式适用于a和b的常见类型，因此您可以在代码中使用它。

unsigned semisum(unsigned a, unsigned b)
{
    return ((a^b)>>1) + (a&b);
}

- Toby Speight

-1

如果只有两个元素需要避免溢出，最简单的答案是：

(a/2) + (b/2) = average

如果需要更多元素，您可以使用：

(a/x) + (b/x) + (c/x) + (d/x) ..... = average //x = amount of elements

从数学角度来看，如果原始值中没有一个达到了溢出，那么在计算时就不会产生溢出，因为你并没有真正地将它们加在一起，而是在相加之前进行了除法运算。因此，在计算过程中进行的任何操作的结果（包括结果本身）都不会比最大的初始元素更大（假设你只使用实数与之协作）。

所以，请按照以下步骤操作：

确定'C'中元素的数量，我们称之为total。
声明一个变量来存储平均值，我们称之为average。
声明一个变量来存储余数，我们称之为remainder。
遍历它们并：
- 将当前元素除以总数total。
- 将结果加到平均值average中。
- 将除法后的余数相加，存储在remainder中。
将余数也除以总数并加到平均值average中。
根据需要对平均值进行处理。

这将给出一个最大误差为1（十进制数字系统[基数10]）的答案。我还不会C ++，所以我只能给你提供C＃的示例。

C＃伪代码（仅提供思路）：

int[] C = new int[20];            //The array of elements.
int total = C.Length;             //The total amount of elements.
int average = 0;                  //The variable to contain the result.
int remainder = 0;                //The variable to contain all the smaller bits.
foreach (int element in C)        //Iteration
{
    int temp = (element / total); //Divide the current element by the total.
    average = average + temp;     //Add the result to the average.
    temp = (temp % total);        //Get the remainder (number that was not divided)
    remainder = remainder + temp; //Add remainder to the other remainders.
}
average = average + (remainder / total); // Adds the divided remainders to the average.

压缩的C#示例：

int[] C = new int[20];               //The array of elements.
int total = C.Length;                //The total amount of elements.
int average = 0;                     //The variable to contain the result.
int remainder = 0;                   //The variable to contain all the smaller bits.
foreach (int element in C)           //Iteration
{
    average += (element / total);    //Add the current element divided by the total to the average.
    remainder += ( element % total); //Add the remainders together.
}
average += (remainder / total); //Adds the divided remainders to the total.

- EternalWulf

2

但是（请参见Jonathan Leffler的评论）：(1 + 3)/2 = 2 而 1/2 + 3/2 = 1。 - manlio

当所有的n个int都满足value%n == n-1时，这种方法的一个弱点就会出现。在这种情况下，remainder变成了大约n*n。因此，当n > sqrt(INT_MAX) > sqrt(32767)时，这种方法会遇到麻烦。为了避免这种限制，代码可以使用remainder = remainder + temp; average += remainder/total; remainder %= total. - 但是当n很小时，这似乎是昂贵且不必要的。 - chux - Reinstate Monica

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Doug Currie · Accepted Answer

在安全编码的帮助下

if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
    ((si_b < 0) && (si_a < (INT_MIN - si_b))))
{
  /* will overflow, so use difference method */
  return si_b + (si_a - si_b) / 2;
} 
else
{
 /* the addition will not overflow */
  return (si_a + si_b) / 2;
}

补充说明

感谢 @chux 指出舍入问题。这是一个经过测试的正确舍入版本...

int avgnoov (int si_a, int si_b)
{
    if ((si_b > 0) && (si_a > (INT_MAX - si_b)))
    {
      /* will overflow, so use difference method */
      /* both si_a and si_b > 0; 
          we want difference also > 0
          so rounding works correctly */
      if (si_a >= si_b)
        return si_b + (si_a - si_b) / 2;
      else
        return si_a + (si_b - si_a) / 2;
    } 
    else if ((si_b < 0) && (si_a < (INT_MIN - si_b)))
    {
      /* will overflow, so use difference method */
      /* both si_a and si_b < 0; 
          we want difference also < 0
          so rounding works correctly */
      if (si_a <= si_b)
        return si_b + (si_a - si_b) / 2;
      else
        return si_a + (si_b - si_a) / 2;
    }
    else
    {
     /* the addition will not overflow */
      return (si_a + si_b) / 2;
    }
}