How can I trim a string in Java without creating a new object?

Question

How can I trim a string in Java without creating a new object?

3

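I have a large text file (about 20 million lines) in which every line has the following format:

<string1>, <string2>

These strings may have leading or trailing whitespace, which I want to remove while reading the file.

I am currently using trim() for this, but because String is immutable in Java, every trim operation creates a new object. This wastes too much memory.

Is there a better way to do this?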

- Abhishek Kaushik

3

Please show how you are reading the file and splitting the strings. - Andy Turner

1

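You should realize that any strings you no longer use are garbage collected, so no memory is really wasted; new objects are merely created (and the GC reclaims them efficiently). - Kayaman

I'm not entirely sure, but I think sed could solve this. - Imran Ali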

1

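Show the code you are using to read the file; it is almost certain that trim() is not the main memory bottleneck. - tucuxi

Split your string on the comma delimiter, then append each piece using a StringBuilder. That way, as you said, a new String is not created every time. - Chetan Joshi

I checked the logs and found frequent full garbage collections, which bring the application to a halt. The application also uses the singleton pattern. So I want to narrow down the possible causes of the excessive memory use. - Abhishek Kaushik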

7 Answers

0

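You can read the strings as a character stream and record the start and end positions of each token you want to parse.

This still creates an object for every token, but if your tokens are relatively long, an object holding two int fields is much smaller than the corresponding string would be.

But before embarking on this journey, make sure you are not holding on to the trimmed strings for longer than necessary.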
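The idea above could be sketched roughly as follows. This is an illustration only, not code from the answer: the class names `Span` and `SpanScanner` are hypothetical, and it assumes the file content has already been read into a shared `char[]` buffer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a token described by two offsets into a shared buffer. */
final class Span {
    final int start; // inclusive
    final int end;   // exclusive

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Materializes the token as a String only when one is really needed. */
    String toString(char[] buf) {
        return new String(buf, start, end - start);
    }
}

final class SpanScanner {
    /** Returns the span of buf[from, to) with surrounding whitespace excluded. */
    static Span trimmed(char[] buf, int from, int to) {
        while (from < to && Character.isWhitespace(buf[from])) {
            from++;
        }
        while (from < to && Character.isWhitespace(buf[to - 1])) {
            to--;
        }
        return new Span(from, to);
    }

    /** Splits one "<string1>, <string2>" line at its first comma. */
    static List<Span> scanLine(char[] buf, int lineStart, int lineEnd) {
        int comma = lineStart;
        while (comma < lineEnd && buf[comma] != ',') {
            comma++;
        }
        List<Span> spans = new ArrayList<>(2);
        spans.add(trimmed(buf, lineStart, comma));
        if (comma < lineEnd) {
            spans.add(trimmed(buf, comma + 1, lineEnd));
        }
        return spans;
    }
}
```

Each `Span` is still an object, as noted above, but no character data is copied until `toString(buf)` is called.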

- biziclop

0

Assuming you have a String containing <string1>, <string2>, and you simply want to split it without a separate trim() call on each piece:

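String trimmedBetween(String str, int start, int end) {
  // Advance start past any leading whitespace.
  while (start < end && Character.isWhitespace(str.charAt(start))) {
    ++start;
  }

  // Move end back past any trailing whitespace.
  while (start < end && Character.isWhitespace(str.charAt(end - 1))) {
    --end;
  }

  // The only allocation is this one substring.
  return str.substring(start, end);
}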

(Note that this is basically how String.trim() is implemented, except that it uses start and end in place of 0 and length.)

(Then call it like this:)

int commaPos = str.indexOf(',');
String firstString = trimmedBetween(str, 0, commaPos);
String secondString = trimmedBetween(str, commaPos + 1, str.length());

- Andy Turner

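I do want the trimming part, i.e. for the individual strings. - Abhishek Kaushik

Why would I use this trim instead of the default one? The goal is to avoid wasting memory, yet you use the same extra memory as the built-in trim() (= returning a new string). - tucuxi

Because String.trim() only trims from the start and end of the string. To use it, you have to split the string (creating an array and two strings) and then trim the pieces (up to two more strings). This approach creates only two strings, rather than 4 strings and an array. - Andy Turner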
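To make the comparison in the comment above concrete, here is a small sketch that puts the two approaches side by side. The class name `SplitVersusRange` and both method names are made up for this illustration; `rangeTrim` returns an array only so that the two results can be compared, which the original call site does not need.

```java
final class SplitVersusRange {
    // Approach A: split first, then trim each piece.
    // Allocates the array, the two split pieces, and a further String
    // for every piece that actually has whitespace to remove.
    static String[] splitThenTrim(String line) {
        String[] parts = line.split(",", 2);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    // Approach B: find the comma, then trim each range directly.
    // Only the final substrings are allocated (plus the array used
    // here purely to return both values for comparison).
    static String[] rangeTrim(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            return new String[] { trimmedBetween(line, 0, line.length()) };
        }
        return new String[] {
            trimmedBetween(line, 0, comma),
            trimmedBetween(line, comma + 1, line.length())
        };
    }

    static String trimmedBetween(String str, int start, int end) {
        while (start < end && Character.isWhitespace(str.charAt(start))) {
            ++start;
        }
        while (start < end && Character.isWhitespace(str.charAt(end - 1))) {
            --end;
        }
        return str.substring(start, end);
    }
}
```

For lines padded with ordinary spaces or tabs, both methods return the same strings; the difference lies only in how many intermediate objects are created along the way.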

0

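As you have already noted, strings are immutable. So the solution is not to use String, but something mutable instead. StringBuffer is a suitable class.

However, StringBuffer does not include a trim method, so you could use something like the following: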

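void trim(StringBuffer sb) {
    // Count the leading whitespace characters, then remove them in one call.
    int start = 0;
    while (start < sb.length() && Character.isWhitespace(sb.charAt(start))) {
        start++;
    }
    sb.delete(0, start); // the end index of delete() is exclusive

    // Count the trailing whitespace characters in what remains, then remove them.
    int end = 0;
    while (end < sb.length() && Character.isWhitespace(sb.charAt(sb.length() - 1 - end))) {
        end++;
    }
    sb.delete(sb.length() - end, sb.length());
}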

- Simon Farshid

0

If you want to avoid String, then you have to handle it yourself using char and StringBuilder, like this:

import java.io.FileInputStream;
import java.io.InputStreamReader;

public class Test {
    public static void main(String... args) throws Exception {
        InputStreamReader in = new InputStreamReader(new FileInputStream("<testfile>"), "UTF-8");

        char[] buffer = new char[32768];
        int read = -1;
        int index;
        StringBuilder content = new StringBuilder();
        while ((read = in.read(buffer)) > -1) {
            content.append(buffer, 0, read);
            index = 0;
            while (index > -1) {
                index = content.indexOf("\n");
                if (index > -1) {
                    char[] temp = new char[index];
                    content.getChars(0, index, temp, 0);
                    handleLine(temp);
                    content.replace(0, index + 1, "");
                }
            }
        }

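        // Handle a final line that has no trailing newline.
        if (content.length() > 0) {
            char[] temp = new char[content.length()];
            content.getChars(0, content.length(), temp, 0);
            handleLine(temp);
        }

        in.close();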
    }

    private static void handleLine(char[] line) {
        StringBuilder content = new StringBuilder().append(line);
        int start = 0;
        int end = content.length();
        if (end > 0) {
            // Skip leading whitespace.
            while (Character.isWhitespace(content.charAt(start))) {
                start++;
                if (end <= start) {
                    break;
                }
            }
            if (start < end) {
                while (Character.isWhitespace(content.charAt(end - 1))) {
                    end--;
                    if (end <= start) {
                        break;
                    }
                }
            }
        }

        System.out.println("***" + content.subSequence(start, end) + "***");
    }
}

- markbernard

0

We can handle this with a regular expression.

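   {
    // Requires: import java.util.Arrays;
    String str = "abcd, efgh";
    // Split on a comma followed by one whitespace character, or on a bare comma.
    String[] result = str.split("(,\\s)|,");
    Arrays.asList(result).forEach(s -> System.out.println(s));
   }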
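The pattern above only absorbs a single whitespace character after the comma. A possible variation, shown here as a sketch rather than as part of the original answer, uses a precompiled pattern with two capture groups so that whitespace on either side of each field is dropped. The class name `RegexFields` and the method `parse` are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexFields {
    // Compiled once and reused for every line.
    // Group 1 and group 2 capture the two fields without surrounding whitespace.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(.*?)\\s*,\\s*(.*?)\\s*$");

    /** Returns the two trimmed fields, or null if the line contains no comma. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

This still allocates a Matcher and two strings per line, so it is a convenience rather than a memory optimization.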

- Vinod

-1

I think you could write the resulting data directly to a new file.

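// NewFileOutPutStream stands for an output stream opened elsewhere; it is not defined in this snippet.
String originStr = "   xxxxyyyy";
for (int i = 0; i < originStr.length(); i++) {
    // Skip space characters (note: this skips every space, not only leading or trailing ones).
    if (' ' == originStr.charAt(i)) {
        continue;
    }
    NewFileOutPutStream.write(originStr.charAt(i));
}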

- Axl

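If you use a multi-threaded model, you can split the file into several smaller files of logical chunks, and the approach above will work well too. - Axl

Writing one character at a time will take a long time; you need to buffer. - markbernard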
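A buffered variant of the idea might look like the sketch below. It is not taken from the thread: the class `TrimCopy` and its methods are hypothetical, and the output format (fields joined by a bare comma, one line per record) is an assumption. It relies on `Writer.write(String, int, int)` to emit a range of each line, so no trimmed intermediate strings are created.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

final class TrimCopy {
    /** Writes s[start, end) to w with surrounding whitespace left out. */
    static void writeTrimmed(Writer w, String s, int start, int end) throws IOException {
        while (start < end && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        while (start < end && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        w.write(s, start, end - start);
    }

    /** Copies every line from in to out, trimming both comma-separated fields. */
    static void copyTrimmed(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                writeTrimmed(writer, line, 0, line.length());
            } else {
                writeTrimmed(writer, line, 0, comma);
                writer.write(',');
                writeTrimmed(writer, line, comma + 1, line.length());
            }
            writer.write('\n');
        }
        writer.flush(); // push any buffered characters to the underlying writer
    }
}
```

In a real program the Reader and Writer would typically come from `Files.newBufferedReader` and `Files.newBufferedWriter`; they are passed in here so the method can be exercised with in-memory streams.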


- sdgfsdh · Accepted Answer

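I would be surprised if the immutable String class were causing the problem; the JVM is very efficient, the result of many years of engineering work.

That said, Java does provide a mutable class for manipulating strings, called StringBuilder. You can read more in its documentation.

If you are working across threads, consider using StringBuffer.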
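As a rough illustration of how a single StringBuilder could be reused across fields and lines, consider the sketch below. The class `ReusableTrim` and the method `trimInto` are hypothetical names, and whether this actually reduces GC pressure in the asker's application is not something the thread establishes.

```java
final class ReusableTrim {
    /**
     * Replaces the contents of sb with the trimmed range s[start, end).
     * The builder keeps its allocated capacity between calls, so
     * repeated use does not grow a new buffer for every field.
     */
    static void trimInto(StringBuilder sb, CharSequence s, int start, int end) {
        while (start < end && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        while (start < end && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        sb.setLength(0);          // clear without releasing the backing array
        sb.append(s, start, end); // copy only the trimmed characters
    }
}
```

A caller would create one StringBuilder up front and pass it to `trimInto` for each field, calling `toString()` only for values that need to be kept.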