在他的回归基础文章中,Joel Spolsky描述了使用strcat
进行字符串拼接的低效问题,称之为Shlemiel the painter's algorithm(读一下这篇文章,非常好)。作为低效代码的例子,他给出了以下运行时间为O(n2)的示例:
char bigString[1000]; /* I never know how much to allocate... */
bigString[0] = '\0';
strcat(bigString,"John, ");
strcat(bigString,"Paul, ");
strcat(bigString,"George, ");
strcat(bigString,"Joel ");
如果我们已经遍历了第二个字符串,那么第一次遍历第一个字符串并不是真正的问题;因为我们已经得到了第二个字符串的遍历结果,所以单个 strcat
的运行时间与结果长度成线性关系。然而,多个 strcat
就会有问题,因为我们一遍又一遍地遍历之前连接好的结果。他提供了以下替代方案:
How do we fix this? A few smart C programmers implemented their own
mystrcat
as follows:
char* mystrcat( char* dest, char* src )
{
while (*dest) dest++;
while (*dest++ = *src++);
return --dest;
}
What have we done here? At very little extra cost we're returning a
pointer to the end of the new, longer string. That way the code that
calls this function can decide to append further without rescanning
the string:
char bigString[1000]; /* I never know how much to allocate... */
char *p = bigString;
bigString[0] = '\0';
p = mystrcat(p,"John, ");
p = mystrcat(p,"Paul, ");
p = mystrcat(p,"George, ");
p = mystrcat(p,"Joel ");
This is, of course, linear in performance, not n-squared, so it
doesn't suffer from degradation when you have a lot of stuff to
concatenate.
当然,如果您想使用标准的C字符串,这就是您可以做的。您所描述的另一种选择是缓存字符串的长度并使用特殊的连接函数(例如,使用略有不同的参数调用
strcat
),这有点像Pascal字符串的变体,Joel也提到了这点:
The designers of Pascal were aware of this problem and "fixed" it by
storing a byte count in the first byte of the string. These are called
Pascal Strings. They can contain zeros and are not null terminated.
Because a byte can only store numbers between 0 and 255, Pascal
strings are limited to 255 bytes in length, but because they are not
null terminated they occupy the same amount of memory as ASCIZ
strings. The great thing about Pascal strings is that you never have
to have a loop just to figure out the length of your string. Finding
the length of a string in Pascal is one assembly instruction instead
of a whole loop. It is monumentally faster.
…
For a long time, if you wanted to put a Pascal string literal in your
C code, you had to write:
char* str = "\006Hello!";
Yep, you had to count the bytes by hand, yourself, and hardcode it
into the first byte of your string. Lazy programmers would do this,
and have slow programs:
char* str = "*Hello!";
str[0] = strlen(str) - 1;
snprintf(buf, len, "%s%s%s", str1, str2, str3)
。 - Grijesh Chauhanstrcat()
和维护字符串长度的方法不同,前者需要遍历目标字符串(O(n)),以寻找字符串结尾,而后者可以在常数时间内完成相同操作(O(1))。 - Andreas Fester