What is a null-terminated string?

Question

c++ null-terminated

43

How is it different from std::string?

- lhj7362

8

What is a "usual" string? - Mehrdad Afshari

2

std::string - lhj7362

7 Answers

53

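A null-terminated string is a contiguous sequence of characters, the last of which has the binary bit pattern all zeros. I'm not sure what you mean by a "usual string", but if you mean std::string, then before C++11 a std::string was not required to be contiguous and was not required to have a terminator. Also, a std::string's character data is always allocated and managed by the std::string object that contains it; for a null-terminated string there is no such container, and you typically refer to and manage such strings through bare pointers.

All of this should be covered in any decent C++ textbook. I recommend getting hold of Accelerated C++, one of the best of them.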
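To make the ownership difference concrete, here is a minimal sketch. The helper names `duplicate_c_string` and `duplicate_std_string` are made up for illustration and are not part of any library.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies a null-terminated string into a freshly
// allocated buffer. The caller owns the result and must delete[] it.
char* duplicate_c_string(const char* source) {
    std::size_t length = std::strlen(source);  // scans for '\0'
    char* copy = new char[length + 1];         // +1 for the terminator
    std::memcpy(copy, source, length + 1);     // copies the '\0' as well
    return copy;
}

// Hypothetical helper: the std::string version. The returned object
// allocates and releases its own storage.
std::string duplicate_std_string(const std::string& source) {
    return source;
}
```

With the bare-pointer version, forgetting the `delete[]` leaks memory; with the std::string version there is nothing for the caller to free.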

- anon

5

In C, the length of a character string is basically determined by the null byte (i.e. 0) it contains. - Costique

3

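I don't know of any platform where that distinction exists. Note that null (or perhaps more formally NUL) used as a string terminator is not the same thing as the NULL pointer. - anon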

If the null terminator isn't already there, c_str() is guaranteed to append it. See http://www.cplusplus.com/reference/string/string/c_str/. - Reunanen

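3.9.1:7 implies both: "The representations of integral types shall define values by use of a pure binary numeration system", with a footnote that defines this further. So I really shouldn't have stated them separately. I also found in 3.9.1:1, "For character types, all bits of the object representation participate in the value representation", which settles it. I had wrongly thought this applied only to unsigned char (hence "narrower"), but it applies to signed char as well. - Steve Jessop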

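Steve, thank you. I corrected my mistake; I had missed 3.9.1/7. I don't think 3.9.1/1 is relevant, though: "all bits of the object representation participate in the value representation" only means there are no padding bits; it says nothing about the actual bit pattern. In any case, the answer is now definitely correct and I'll upvote it. :) Sorry for the long discussion. - avakar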


16

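There are two main ways to represent a string:

1) A sequence of characters terminated by the ASCII nul character, 0. You can tell how long it is by searching for the terminator. This is called a null-terminated string, or sometimes nul-terminated.

2) A sequence of characters, plus a separate field (either an integer length or a pointer to the end of the string) that tells you how long it is.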
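The two representations can be sketched roughly as follows. `terminated_length`, `CountedString` and `counted_length` are hypothetical names chosen for this illustration; the first function does by hand what std::strlen does.

```cpp
#include <cstddef>

// Type 1: the length is found by scanning for the terminator, which
// takes time proportional to the length of the string.
std::size_t terminated_length(const char* text) {
    std::size_t count = 0;
    while (text[count] != '\0') {
        ++count;
    }
    return count;
}

// Type 2: a hypothetical type that stores the length next to the data.
struct CountedString {
    const char* data;
    std::size_t length;
};

// Reading the stored field takes constant time.
std::size_t counted_length(const CountedString& text) {
    return text.length;
}
```

For example, `terminated_length("abc")` walks three characters before it finds the terminator, while `counted_length` only reads a field.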

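I'm not sure what "usual string" means, but what often happens is that when talking about a particular language, the word "string" is used to mean that language's standard representation. So in Java, java.lang.String is a type 2 string, and that is what "string" means. In C, "string" probably means a type 1 string. The standard is quite verbose in order to be precise, but people always want to leave out what's "obvious".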

In C++, unfortunately, both types are standard. std::string is a type 2 string[*], but the standard library functions inherited from C operate on type 1 strings.

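[*] Actually, std::string is often implemented as an array of characters with a separate length field and a nul terminator. That way, c_str() can be implemented without ever copying or reallocating the string data. I can't remember offhand whether it's legal to implement std::string without storing a length field: the question is what complexity guarantees the standard requires. For containers in general, size() is recommended to be O(1) but isn't actually required to be. So even if it is legal, a std::string implementation that relies only on a nul terminator would be surprising.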
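One observable consequence of std::string being a type 2 string is that it can hold embedded nul characters, while C functions reading it through c_str() stop at the first one. A small sketch, with `stored_length` and `scanned_length` as made-up helper names:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length as std::string reports it, taken from the stored size.
std::size_t stored_length(const std::string& text) {
    return text.size();
}

// Length as a C function sees it through c_str(): std::strlen stops
// at the first '\0' it meets.
std::size_t scanned_length(const std::string& text) {
    return std::strlen(text.c_str());
}
```

For a string built as `std::string("ab\0cd", 5)`, the first helper reports 5 and the second reports 2.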

- Steve Jessop

9

'\0'

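The null character is the character with ASCII code 0, also known as the null terminator, null character, or NUL. In C it serves as a reserved character that marks the end of a string. Many standard functions (such as strcpy, strlen, strcmp and others) rely on it. Without NUL, some other way of signalling the end of a string would have to be used: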

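This allows the string to be any length with only the overhead of one byte; the alternative of storing a count requires either a string length limit of 255 or an overhead of more than one byte.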

From Wikipedia

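std::string in C++ follows this other convention (it stores a count), and in the implementation quoted here its data is represented by a structure called _Rep: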

// _Rep: string representation
      //   Invariants:
      //   1. String really contains _M_length + 1 characters: due to 21.3.4
      //      must be kept null-terminated.
      //   2. _M_capacity >= _M_length
      //      Allocated memory is always (_M_capacity + 1) * sizeof(_CharT).
      //   3. _M_refcount has three states:
      //      -1: leaked, one reference, no ref-copies allowed, non-const.
      //       0: one reference, non-const.
      //     n>0: n + 1 references, operations require a lock, const.
      //   4. All fields==0 is an empty string, given the extra storage
      //      beyond-the-end for a null terminator; thus, the shared
      //      empty string representation needs no constructor.

      struct _Rep_base
      {
    size_type       _M_length;
    size_type       _M_capacity;
    _Atomic_word        _M_refcount;
      };

struct _Rep : _Rep_base
      {
    // Types:
    typedef typename _Alloc::template rebind<char>::other _Raw_bytes_alloc;

    // (Public) Data members:

    // The maximum number of individual char_type elements of an
    // individual string is determined by _S_max_size. This is the
    // value that will be returned by max_size().  (Whereas npos
    // is the maximum number of bytes the allocator can allocate.)
    // If one was to divvy up the theoretical largest size string,
    // with a terminating character and m _CharT elements, it'd
    // look like this:
    // npos = sizeof(_Rep) + (m * sizeof(_CharT)) + sizeof(_CharT)
    // Solving for m:
    // m = ((npos - sizeof(_Rep))/sizeof(CharT)) - 1
    // In addition, this implementation quarters this amount.
    static const size_type  _S_max_size;
    static const _CharT _S_terminal;

    // The following storage is init'd to 0 by the linker, resulting
        // (carefully) in an empty string with one reference.
        static size_type _S_empty_rep_storage[];

        static _Rep&
        _S_empty_rep()
        { 
      // NB: Mild hack to avoid strict-aliasing warnings.  Note that
      // _S_empty_rep_storage is never modified and the punning should
      // be reasonably safe in this case.
      void* __p = reinterpret_cast<void*>(&_S_empty_rep_storage);
      return *reinterpret_cast<_Rep*>(__p);
    }

        bool
    _M_is_leaked() const
        { return this->_M_refcount < 0; }

        bool
    _M_is_shared() const
        { return this->_M_refcount > 0; }

        void
    _M_set_leaked()
        { this->_M_refcount = -1; }

        void
    _M_set_sharable()
        { this->_M_refcount = 0; }

    void
    _M_set_length_and_sharable(size_type __n)
    {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
      if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
        {
          this->_M_set_sharable();  // One reference.
          this->_M_length = __n;
          traits_type::assign(this->_M_refdata()[__n], _S_terminal);
          // grrr. (per 21.3.4)
          // You cannot leave those LWG people alone for a second.
        }
    }

    _CharT*
    _M_refdata() throw()
    { return reinterpret_cast<_CharT*>(this + 1); }

    _CharT*
    _M_grab(const _Alloc& __alloc1, const _Alloc& __alloc2)
    {
      return (!_M_is_leaked() && __alloc1 == __alloc2)
              ? _M_refcopy() : _M_clone(__alloc1);
    }

    // Create & Destroy
    static _Rep*
    _S_create(size_type, size_type, const _Alloc&);

    void
    _M_dispose(const _Alloc& __a)
    {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
      if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
        if (__gnu_cxx::__exchange_and_add_dispatch(&this->_M_refcount,
                               -1) <= 0)
          _M_destroy(__a);
    }  // XXX MT

    void
    _M_destroy(const _Alloc&) throw();

    _CharT*
    _M_refcopy() throw()
    {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
      if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
            __gnu_cxx::__atomic_add_dispatch(&this->_M_refcount, 1);
      return _M_refdata();
    }  // XXX MT

    _CharT*
    _M_clone(const _Alloc&, size_type __res = 0);
      };

The actual data can be obtained with:

_Rep* _M_rep() const
      { return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }
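The pointer arithmetic in _M_refdata() and _M_rep() relies on the header sitting directly in front of the characters in a single allocation. A much simplified sketch of that layout follows; `Header`, `header_of`, `create_string` and `destroy_string` are hypothetical names, and the reference counting of the real _Rep is left out.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical, simplified stand-in for _Rep: a header placed directly
// in front of the character data in one allocation.
struct Header {
    std::size_t length;
    std::size_t capacity;

    // Mirrors _M_refdata(): the characters start right after the header.
    char* characters() { return reinterpret_cast<char*>(this + 1); }
};

// Mirrors _M_rep(): step back one header from the character pointer.
Header* header_of(char* characters) {
    return reinterpret_cast<Header*>(characters) - 1;
}

// Allocates header and characters together and returns the character
// pointer, which is all the string object itself needs to keep.
char* create_string(const char* text) {
    std::size_t length = std::strlen(text);
    void* block = ::operator new(sizeof(Header) + length + 1);
    Header* header = new (block) Header{length, length};
    std::memcpy(header->characters(), text, length + 1);  // keeps the '\0'
    return header->characters();
}

void destroy_string(char* characters) {
    ::operator delete(header_of(characters));
}
```

The pointer returned by `create_string` is usable as an ordinary null-terminated string, and the stored length is found one header behind it.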

This snippet comes from the file basic_string.h, which on my machine is located at usr/include/c++/4.4/bits/basic_string.h.

So, as you can see, the difference is significant.

- 4pie0

2

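A null-terminated string means that the end of the string is defined by the occurrence of a null character (all bits are zero).

"Other strings", for example, have to store their own length.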

- Dario

0

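Null-terminated strings are a native string format in C. String literals, for example, are implemented as null-terminated. As a result, a lot of code (the C run-time library to begin with) assumes that strings are null-terminated.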
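Because of that assumption, a character buffer without a terminator must not be handed to functions such as strlen or strcmp. One way around it, sketched here with the made-up helpers `from_buffer` and `same_c_string`, is to pass the length explicitly and let std::string supply the terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Safe for any buffer: the length is passed explicitly, so no
// terminator is needed in the source.
std::string from_buffer(const char* buffer, std::size_t length) {
    return std::string(buffer, length);
}

// Only safe when both arguments really are null-terminated, because
// std::strcmp keeps reading until it finds '\0'.
bool same_c_string(const char* left, const char* right) {
    return std::strcmp(left, right) == 0;
}
```

A two-element array holding 'h' and 'i' can go through `from_buffer`, and the result's c_str() can then be compared with "hi" safely.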

- Seva Alekseyev

-1

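A null-terminated string (C string) is an array of chars in which the last element is a 0x0 value. std::string is essentially a vector, in that it is an auto-resizing container for values. It does not need a null terminator, since it has to keep track of its size anyway to know when a resize is needed.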

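Honestly, I prefer C strings over std strings: they have more applications in the basic libraries, the ones with minimal code and allocations, even though they are harder to use.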

- phyrrus9

Content provided by Stack Overflow.

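- Ricket · Accepted Answer

A "string" is really just an array of chars; a null-terminated string is one in which a null character '\0' marks the end of the string (not necessarily the end of the array). All strings written in code (delimited by double quotes "") are automatically null-terminated by the compiler.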

So, for example, "hi" is the same as {'h', 'i', '\0'}.
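This can be checked with a short sketch; the function names `literal_matches_array` and `length_in_larger_array` are invented for the example.

```cpp
#include <cstddef>
#include <cstring>

// Returns true when the literal "hi" and the spelled-out array hold
// the same three bytes, terminator included.
bool literal_matches_array() {
    const char literal[] = "hi";
    const char spelled_out[] = {'h', 'i', '\0'};
    return sizeof(literal) == 3 &&
           sizeof(spelled_out) == 3 &&
           std::memcmp(literal, spelled_out, sizeof(literal)) == 0;
}

// The terminator need not be the last element of the array: this
// array has 8 elements, but the string ends after 2 characters.
std::size_t length_in_larger_array() {
    const char padded[8] = "hi";
    return std::strlen(padded);
}
```

The second function shows the "not necessarily the end of the array" point: the remaining elements of `padded` are zero-initialized, and strlen stops at the first of them.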