字符串和字符串生成器在内存使用方面的比较

Question

字符串和字符串生成器在内存使用方面的比较

4

根据KathySierra的SCJP Study Guide：

当您需要修改字符串时，应使用java.lang.StringBuffer和java.lang.StringBuilder类。正如我们所讨论的那样，String对象是不可变的，因此如果您选择对String对象进行大量操作，则最终会在String池中留下许多放弃的String对象。

为了澄清这一点，我已经查看了String类和StringBuilder source here的代码。

String的简化代码如下：

public final class String(){
     private final char [] value; //Final helps in immutability, I guess.
     public String (String original){
         value = original.value;
      }
}

而 StringBuilder 的简化版本看起来像这样：

public final class StringBuilder{
    char [] value;
    public StringBuilder(String str) {
        value = new Char[(str.length() + 16)]; // 16 here is implementation dependent.
    append(str);
}

public StringBuilder append(String str){

            //Add 'str' characters in value array if its size allows,
        else
            // Create new array of size newCapacity and copy contents of 'value' in that.
            //value = Arrays.copyOf(value, newCapacity);// here old array object is lost.

        return this;
    }
}

我会尽力帮忙翻译。以下是需要翻译的内容：

假设我们有一个如下所示的案例：

使用 String 类：

String s1 = "abc"; // Creates one object on String pool.
s1 = s1+"def"; // Creates two objects - "def " on String pool
// and "abcdef" on the heap.

如果我使用StringBuilder，代码将变为：

If I use StringBuilder, the code becomes:

 StringBuilder s1 = StringBuilder("abc");

 // Creates one String object "abc " on String pool.
 // And one StringBuilder object "abc" on the heap.
 s1.append("def");
 // Creates one string object "def" on String pool.
 // And changes the char [] inside StringBuilder to hold "def" also.

在StringBuilder中，s2 = s1.append("def");有同等的机会使持有字符串的字符数组被替换为新的字符数组。旧数组现在是无引用的，将被垃圾回收。

我的问题是：

使用简单的字符串连接和StringBuilder append()方法，进入字符串池的String对象数量相同。

根据上面列出的代码，StringBuilder在首次使用更大的字符数组，而String对象包含与其所持有的字符串长度相同的字符数组。

1. StringBuilder比普通的String类更有效地进行字符串操作的原因是什么？ 2. SCJP Guide中给出的陈述是否错误？

- Aman Arora

4个回答

3

只有在循环创建字符串时才更有效率。如果您有一个循环执行以下操作：

String[] strings = { "a", "b", "c", "d" };
String result = "";
for( String s : strings) {
    result += s;
}

StringBuilder版本会生成较少的对象，并且引起较少的GC：

String[] strings = { "a", "b", "c", "d" };
StringBuilder builder = new StringBuilder();
for( String s : strings) {
    builder.append(s);
}

第一种方式会导致每次循环运行时都有一个对象被发送到垃圾回收，而第二种方式则不会。

最终，由于字符串构建器数组的大小增加了两倍，因此不会发生太多的分配。

- Maurício Linhares

谢谢回复。您能否评论关于字符串池对象的SCJP指南声明？ - Aman Arora

1

该语句本身并没有太多意义，只有声明为文字的字符串才会进入池中。连接两个字符串的结果不包括在字符串池中。 - Maurício Linhares

在这个上下文中，我猜测语句是 String s1="abc";s1=s1+"def";。现在 abc 已经丢失并留在“存储池”中。但是当我们运行 new StringBuilder("").append("abc"); 时，该语句不会计算在 String Pool 中创建的 abc 对象。 - Aman Arora

2

操作不仅仅是连接字符串。想象一下，您想在字符串中间插入一个字符。由于字符串是不可变的，所以该怎么做呢？您需要创建一个新的字符串。使用 StringBuilder 您可以使用 insert(int offset, c) 方法。

请参见StringBuilder javadoc

您还有如下方法：

delete(int start, int end)
// Removes the characters in a substring of this sequence.

replace(int start, int end, String str)
// Replaces the characters in a substring of this sequence with characters in the specified String.

reverse()
// Causes this character sequence to be replaced by the reverse of the sequence.

insert(int dstOffset, CharSequence s)
// Inserts the specified CharSequence into this sequence.

- Paul Samsotha

1

使用StringBuilder进行字符串操作比普通的String类更加高效。

如果在循环中执行多次操作，那么使用StringBuilder会大大提高效率。考虑任何需要迭代单个字符的字符串转换或替换函数，例如下面这个用于将XML或HTML中的<, >, &, ", '字符进行转义的函数：

public static String xmlEscape(String s) {
    StringBuilder sb = new StringBuilder(
        (int)Math.min(Integer.MAX_VALUE, s.length() * 5L / 4));
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '<') sb.append("&lt;");
        else if (c == '>') sb.append("&gt;");
        else if (c == '&') sb.append("&amp;");
        else if (c == '"') sb.append("&quot;");
        else if (c == '\'') sb.append("&#039;");
        else sb.append(c);
    }
    return sb.toString();
}

StringBuilder数组最初的大小是比输入字符串稍微大一点，以容纳原始文本和可能的替换。输出文本在预先分配的缓冲区中累积，循环期间它很可能不需要任何额外的内存分配。

如果上述函数将输出累积到String而不是StringBuilder中，那么每处理一个字符就会再次复制整个输出，使性能降至二次方（即，非常糟糕！）。

至于第二个问题：

引用块中的内容：“SCJP Guide中给出的陈述是错误的吗？”

直白地说，是的。说“String池中会有被遗弃的String对象”非常误导人。据我所知，“String池”的术语仅指像String.intern()方法使用的内部池。字符串自动放入内部池的唯一时间是当ClassLoader加载类并将字符串文字常量从源代码加载到内存中时。

在运行时操作字符串对象肯定不会在intern池中增加额外的对象(除非你有意调用.intern())。

SCJP指南应该说的是：

字符串对象是不可变的，因此如果您选择对字符串对象进行大量操作，就会在堆中留下许多废弃的字符串对象。

堆中的废弃对象并不是最大的问题，因为垃圾回收器会迅速清理它们。使用StringBuilder进行多次操作的真正原因是避免不必要的字符复制。这对性能产生了巨大的影响，如@jmiserez的基准测试所示。

- Boann

每次调用 append("<") 都会在字符串池中创建一个 < 对象吗？我的问题更多关注于字符串池对象的创建。谢谢。 - Aman Arora

1

@AmanArora 不，绝对不是这样的，无论您是将其附加到String还是StringBuilder。字符串字面量在类加载时仅加载到内存中一次。因为它们是常量，所以没有必要每次使用或调用时创建新的字符串。 - Boann

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- jmiserez · Accepted Answer

关键是 expandCapacity 函数：

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2; //important part here
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }
    value = Arrays.copyOf(value, newCapacity);
}

通过每次需要调整大小时将基础数组的大小增加2倍，可以最小化添加1个字符所需的摊销时间。摊销时间的解释，请参考维基百科。

随着插入n个元素，容量形成几何级数。通过任何常数比例扩展数组可以确保插入n个元素总共需要O(n)的时间，这意味着每次插入需要摊销常数时间。这个比例a的值导致了时间和空间的权衡：每次插入操作的平均时间大约为a/(a-1)，而浪费的单元格数量上限为(a-1)n。a的选择取决于库或应用程序：一些教科书使用a = 2，但Java的ArrayList实现使用a = 3/2，Python的列表数据结构的C实现使用a = 9/8。

许多动态数组在其大小低于某个阈值（例如容量的30%）时也会释放一些底层存储。为了支持插入和删除混合序列的摊销常数成本，此阈值必须严格小于1/a。

动态数组是教授摊销分析的常见示例。

public static void main(String[] args){
    int numAppends = 200000;
    int numRepetitions = 3;
    long[] time1 = new long[numRepetitions];
    long[] time2 = new long[numRepetitions];
    long now;
    for (int k = 0; k < numRepetitions; k++){
        String s = "";
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            s = s + "a";
        }
        time1[k] = (System.nanoTime() - now) / 1000000;
        StringBuilder sb = new StringBuilder();
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            sb.append("a");     
        }
        time2[k] = (System.nanoTime() - now) / 1000000;
        System.out.println("Rep "+k+", time1: "+time1[k]+ " ms, time2: " + time2[k] + " ms");
    }
}

输出：

Rep 0, time1: 13509 ms, time2: 7 ms
Rep 1, time1: 13086 ms, time2: 1 ms
Rep 2, time1: 13167 ms, time2: 1 ms

此外，我在基准测试中统计了 extendCapacity() 方法内部调用 Arrays.copyOf() 方法的次数：第一次迭代调用了 49 次，但是第二次和第三次迭代仅调用了 15 次。输出如下：

newCapacity: 34
newCapacity: 70
newCapacity: 142
newCapacity: 286
newCapacity: 574
newCapacity: 1150
newCapacity: 2302
newCapacity: 4606
newCapacity: 9214
newCapacity: 18430
newCapacity: 36862
newCapacity: 73726
newCapacity: 147454
newCapacity: 294910
newCapacity: 42
Rep 2, time1: 12955 ms, time2: 134 ms