Are negative array indexes allowed in C?

Question

162

I was just reading some code and found that the author was using arr[-2] to access the second element before arr, like so:

|a|b|c|d|e|f|g|
       ^------------ arr[0]
         ^---------- arr[1]
   ^---------------- arr[-2]

Is that allowed?

I know that arr[x] is the same as *(arr + x). So arr[-2] is *(arr - 2), which seems OK. What do you think?

- bodacydo

9 Answers

87

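This is only valid if arr is a pointer that points to the third element of an array (index 2) or a later element. Otherwise it is not valid, because you would be accessing memory outside the bounds of the array. So, for example, this would be wrong: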

int arr[10];

int x = arr[-2]; // invalid; out of range

But this would be okay:

int arr[10];
int* p = &arr[2];

int x = p[-2]; // valid:  accesses arr[0]

It is, however, unusual to use a negative subscript.

- James McNellis

21

@Matt: The code in the first example yields undefined behavior. - James McNellis

7

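It is not valid. By the C standard, it explicitly has undefined behavior. On the other hand, if int arr[10]; were part of a structure with other members before it, arr[-2] could potentially be well-defined, and you could determine whether it is by using offsetof and the like. - R.. GitHub STOP HELPING ICE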

7

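Found this at the end of section 5.3 of K&R: "If one is sure that the elements exist, it is also possible to index backwards in an array; p[-1], p[-2], and so on are syntactically legal, and refer to the elements that immediately precede p[0]. Of course, it is illegal to refer to objects that are not within the array bounds." Still, your example helped me understand it better. Thanks! - Qiang Xu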

4

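Sorry for reviving an old thread, but I love how ambiguous K&R are about what "illegal" means. The last sentence makes it sound as if an out-of-bounds access produces a compilation error. That book does beginners more harm than good. - Martin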

4

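To be fair, the book was written early in our industry's history, when it was reasonable to expect "illegal" to be read as "don't do this, you are not permitted to" rather than "you will be prevented from doing this". - mtraceur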

16

Sounds fine to me. It would be a rare case that you legitimately need it, however.

- Matt Joiner

13

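It's not that rare: it is very useful in, for example, neighbourhood operations, particularly in image processing. - Paul R

I just needed to use this because I'm creating a memory pool with a stack and a heap [structure/design]. The stack grows towards higher memory addresses, the heap grows towards lower memory addresses, and they meet in the middle. - KANJICODER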
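As a minimal sketch of the neighbourhood case, assuming a 1-D signal instead of an image and a made-up helper name `smooth3` (neither comes from the thread): the loop places a pointer on each interior sample and reads its neighbours as p[-1] and p[1]. Because the pointer never sits on the first or last sample, both subscripts stay inside the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 3-tap box filter over the interior of src.
   The two edge samples are copied unchanged. Requires n >= 1. */
void smooth3(const int *src, int *dst, size_t n)
{
    assert(n >= 1);
    dst[0] = src[0];
    dst[n - 1] = src[n - 1];
    for (size_t i = 1; i + 1 < n; ++i) {
        const int *p = &src[i];               /* p points at an interior sample */
        dst[i] = (p[-1] + p[0] + p[1]) / 3;   /* p[-1] is src[i - 1], in bounds */
    }
}
```

Calling `smooth3(src, dst, 5)` on a five-element array filters indexes 1 to 3 and leaves indexes 0 and 4 as copies of the input.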

9

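What is probably happening is that arr points into the middle of an array, so arr[-2] refers to an element of the original array and does not go out of bounds.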

- Igor Zevaka

7

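I'm not sure how reliable this is, but I just read the following caveat about negative array indices on 64-bit systems (LP64): http://www.devx.com/tips/Tip/41349

The author seems to be saying that 32-bit int array indices combined with 64-bit addressing can result in bad address calculations unless the array index is explicitly promoted to 64 bits (e.g. via a ptrdiff_t cast). I have actually seen a bug of this nature with the PowerPC version of gcc 4.1.0, but I don't know whether it is a compiler bug (i.e. the code should work according to the C99 standard) or correct behaviour (i.e. the index needs a cast to 64 bits for correct behaviour).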
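A small sketch of the two spellings in question, assuming a conforming compiler; the helper name `read_relative` is made up for illustration. The standard defines pointer plus integer by the value of the integer, so a negative int index and the same index cast to ptrdiff_t should reach the same element. This does not reproduce the gcc 4.1.0 behaviour described above; it only shows what the two forms are expected to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read the element `offset` places away from p.
   The caller must guarantee that p + offset stays inside the same array. */
int read_relative(const int *p, int offset)
{
    int direct = p[offset];                /* plain int index, may be negative */
    int widened = p[(ptrdiff_t)offset];    /* same index, explicitly widened   */
    assert(direct == widened);             /* both name the same element       */
    return direct;
}
```

For example, with `int a[4] = {10, 20, 30, 40};` the call `read_relative(&a[3], -2)` reads a[1].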

- Paul R

3

That sounds like a compiler bug. - tbleher

4

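I know the question has already been answered, but I couldn't resist sharing my explanation.

I remember an example from Principles of Compiler Design: suppose a is an int array, the size of int is 2, and the base address of a is 1000.

How does a[5] work?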

Base address of array a + (index * sizeof(element type of a))
Base address of array a + (5 * sizeof(element type of a))
i.e. 1000 + (5 * 2) = 1010

This explanation is also the reason why negative indexes work in C; for example, if I access a[-5], it gives me:

Base address of array a + (index * sizeof(element type of a))
Base address of array a + (-5 * sizeof(element type of a))
i.e. 1000 + (-5 * 2) = 990

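This returns the object at location 990. So, by this logic, we can access negative indexes of an array in C, provided that the resulting address still lies inside the same array object; otherwise the behavior is undefined.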
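The same arithmetic can be checked in code. This is a minimal sketch with a made-up helper name, `byte_offset`, and it assumes the pointer passed in sits far enough inside an array that the indexed element exists. It measures the distance in bytes between p[0] and p[index], which should equal index * sizeof(int) for negative indexes as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte distance from element p[0] to element p[index].
   Both elements must lie inside the same array. */
ptrdiff_t byte_offset(const int *p, ptrdiff_t index)
{
    const char *from = (const char *)&p[0];
    const char *to = (const char *)&p[index];
    return to - from;   /* equals index * sizeof(int) */
}
```

With `int a[10];` and `p = &a[5]`, `byte_offset(p, -5)` is the distance back to a[0], mirroring the 1000 to 990 step in the worked example (there with a 2-byte int).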

- Ajinkya Patil

2

As for why someone would want to use negative indexes, I have used them in two contexts:

Having a table of combinatorial numbers that tells you comb[1][-1] = 0; you can always check indexes before accessing the table, but this way the code looks cleaner and executes faster.
Putting a sentinel at the beginning of a table. For instance, you want to use something like
```
 while (x < a[i]) i--;
```

But then you would also have to check that i is not negative.
Solution: make a[-1] equal to -DBL_MAX, so that x < a[-1] is always false.
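A minimal sketch of the sentinel idea, under assumptions that are not in the answer: the table is sorted in ascending order, its storage has one extra slot in front of the first real element, x is a finite number, and the helper name `find_floor` is made up. DBL_MAX comes from <float.h>.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: a points at the first real element of a sorted
   table whose storage has one extra slot in front, so a[-1] exists.
   Returns the largest index i in [-1, n - 1] with a[i] <= x. */
ptrdiff_t find_floor(const double *a, size_t n, double x)
{
    assert(a[-1] == -DBL_MAX);   /* sentinel must be in place */
    assert(x >= -DBL_MAX);       /* rejects NaN and -infinity */
    ptrdiff_t i = (ptrdiff_t)n - 1;
    while (x < a[i])
        i--;                     /* stops at i == -1 at the latest */
    return i;
}
```

A caller would declare something like `double storage[5] = {-DBL_MAX, 1.0, 2.0, 4.0, 8.0};` and pass `&storage[1]` with n = 4. The loop then needs no separate test on i, which is the point of the sentinel.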

- Santiago Egido Arteaga

1

#include <stdio.h>

int main(void) // negative index
{
    int i = 1, a[5] = {10, 20, 30, 40, 50};
    int *end = &a[5]; // legal: one-past-the-end address; must not be dereferenced
    for (; i < 6; ++i)
        printf(" end[ %d ] = %d;", -i, end[-i]); // end[-1] is a[4], ..., end[-5] is a[0]
    printf("\n");
    return 0;
}

- Rathinavelu Muthaliar

2

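While this code may answer the question, providing additional context about why and/or how it answers the question would improve its long-term value. - β.εηοιτ.βε

Python and Groovy both support this feature. A simple use case is accessing the last element of an array without knowing its size, which is a very real need in many projects. Many DSLs benefit from it as well. - Rathinavelu Muthaliar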

-3

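I would like to share an example:

The GNU C++ library's basic_string.h

[Note: someone pointed out that this is a "C++" example, so it may not fit the "C" topic. I wrote a piece of "C" code that uses the same concept as the example. At least, the GNU gcc compiler did not complain about anything.]

It uses [-1] to move the pointer back from the user string to the management information block. This works because the memory is allocated once, with enough room for both.

It says: "This approach has the enormous advantage that a string object requires only one allocation. All the ugliness is confined within a single pair of inline functions, which each compile to a single add instruction: _Rep::_M_data(), and string::_M_rep(); and the allocation function which gets a block of raw bytes and with room enough and constructs a _Rep object at the front."

Source code: https://gcc.gnu.org/onlinedocs/gcc-10.3.0/libstdc++/api/a00332_source.html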

   struct _Rep_base
   {
     size_type               _M_length;
     size_type               _M_capacity;
     _Atomic_word            _M_refcount;
   };

   struct _Rep : _Rep_base
   {
      ...
   }

  _Rep*
   _M_rep() const _GLIBCXX_NOEXCEPT
   { return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }

It explains:

*  A string looks like this:
*
*  @code
*                                        [_Rep]
*                                        _M_length
*   [basic_string<char_type>]            _M_capacity
*   _M_dataplus                          _M_refcount
*   _M_p ---------------->               unnamed array of char_type
*  @endcode
*
*  Where the _M_p points to the first character in the string, and
*  you cast it to a pointer-to-_Rep and subtract 1 to get a
*  pointer to the header.
*
*  This approach has the enormous advantage that a string object
*  requires only one allocation.  All the ugliness is confined
*  within a single %pair of inline functions, which each compile to
*  a single @a add instruction: _Rep::_M_data(), and
*  string::_M_rep(); and the allocation function which gets a
*  block of raw bytes and with room enough and constructs a _Rep
*  object at the front.
*
*  The reason you want _M_data pointing to the character %array and
*  not the _Rep is so that the debugger can see the string
*  contents. (Probably we should add a non-inline member to get
*  the _Rep for the debugger to use, so users can check the actual
*  string length.)
*
*  Note that the _Rep object is a POD so that you can have a
*  static <em>empty string</em> _Rep object already @a constructed before
*  static constructors have run.  The reference-count encoding is
*  chosen so that a 0 indicates one reference, so you never try to
*  destroy the empty-string _Rep object.
*
*  All but the last paragraph is considered pretty conventional
*  for a C++ string implementation.

// A sample C program built on the same concept as above

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct HEAD {
    int f1;
    int f2;
}S_HEAD;

int main(int argc, char* argv[]) {
    int sz = sizeof(S_HEAD) + 20;

    S_HEAD* ha = (S_HEAD*)malloc(sz);
    if (ha == NULL)
      return -1;

    printf("&ha=%p\n", (void *)ha); // %p is the conversion for pointers; %x expects unsigned int

    memset(ha, 0, sz);

    ha[0].f1 = 100;
    ha[0].f2 = 200;

    // move to user data, can be converted to any type
    ha++;
    printf("&ha=%p\n", (void *)ha);

    *(int*)ha = 399;

    printf("head.f1=%i head.f2=%i user data=%i\n", ha[-1].f1, ha[-1].f2, *(int*)ha);

    --ha;
    printf("&ha=%p\n", (void *)ha);

    free(ha);

    return 0;
}



$ gcc c1.c -o c1.o -w
(no warning)
$ ./c1.o 
&ha=0x13ec010
&ha=0x13ec018
head.f1=100 head.f2=200 user data=399
&ha=0x13ec010

The authors of this library use it. I hope it helps.

- Jian Wang

1

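The question is tagged C, not C++, so this example is completely off topic. Please read the tags first. - Haseeb Mir

I think [-1] array indexing is common in both C and C++. Even so, I wrote a C sample. At least, GCC does not complain about using [-1] to index the array. - Jian Wang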

- Matthew Flaschen · Accepted Answer

219

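That is correct. From C99 §6.5.2.1/2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

There is no magic here. It is a one-to-one equivalence. As always when dereferencing a pointer (*), you need to be sure it points to a valid address.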
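The equivalence can be checked directly. This is a minimal sketch; the helper name `spellings_agree` is made up, and it assumes p points at index 2 or later of its array so that p - 2 is a valid element. It compares addresses, so it shows that all three spellings name the same object and not merely equal values.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the three spellings of the same
   access designate the same element. p must point at index 2 or later. */
int spellings_agree(const int *p)
{
    const int *by_subscript = &p[-2];
    const int *by_arithmetic = p - 2;
    const int *commuted = &(-2)[p];   /* E1[E2] is *((E1)+(E2)), and + commutes */
    return by_subscript == by_arithmetic && by_arithmetic == commuted;
}
```

With `int a[5];`, `spellings_agree(&a[2])` returns 1; passing `a` itself would violate the stated precondition.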

- Matthew Flaschen

2

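Note that you do not even have to dereference the pointer to get UB. Merely computing somearray-2 is undefined unless the result lies in the range from the start of somearray to one past its end. - RBerteig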
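One way to respect that rule is to validate the index with integer arithmetic before any pointer is formed. A minimal sketch, assuming the caller tracks the array length and current position itself; the helper name `read_at_offset` and its signature are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: pos is the current index into an array of n elements.
   Returns true and stores the element at pos + offset if it exists. */
bool read_at_offset(const int *base, ptrdiff_t n, ptrdiff_t pos,
                    ptrdiff_t offset, int *out)
{
    assert(n > 0 && pos >= 0 && pos < n);
    /* Compare indexes, not pointers: base + pos + offset is only formed
       once it is known to name an existing element. */
    if (offset < -pos || offset >= n - pos)
        return false;
    *out = base[pos + offset];
    return true;
}
```

For a four-element array at position 1, an offset of -2 is rejected without ever computing the out-of-range address.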

45

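In older books, [] was described as "syntactic sugar" for pointer arithmetic. A favourite way to confuse beginners is to write 1[arr] instead of arr[1] and watch them guess what it is supposed to mean. - Dummy00001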

5

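What happens on a 64-bit system (LP64) when a 32-bit int index is negative? Should the index be promoted to a 64-bit signed integer before the address calculation? - Paul R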

4

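According to §6.5.6/8 (Additive operators), "When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression." So I think it will be promoted, and ((E1)+(E2)) will be a (64-bit) pointer with the expected value. - Matthew Flaschen

@Matthew: Thanks for that. It sounds like it should work as one would reasonably expect. - Paul R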

1

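A curious side note: since the subscript operator [] is defined so that E1[E2] is identical to (*((E1)+(E2))) (see Matthew Flaschen's answer), it is actually valid C to write 2[arr] instead of arr[2]. I admit, though, that doing so deliberately obfuscates the code. - Christian Borgelt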