I have a wrapper around a BufferedReader that reads files in sequence, to create an uninterrupted stream across multiple files:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;
/**
 * Reads in a whole bunch of files such that when one ends it moves to the
 * next file.
 *
 * @author isaak
 */
class LogFileStream implements FileStreamInterface {

    private ArrayList<String> fileNames;
    private BufferedReader br;
    private boolean done = false;

    /**
     * @param files an array list of files to read from, order matters.
     * @throws IOException
     */
    public LogFileStream(ArrayList<String> files) throws IOException {
        fileNames = new ArrayList<String>();
        for (int i = 0; i < files.size(); i++) {
            fileNames.add(files.get(i));
        }
        setFile();
    }

    /**
     * Advances the file that this class is reading from.
     *
     * @throws IOException
     */
    private void setFile() throws IOException {
        if (fileNames.size() == 0) {
            this.done = true;
            return;
        }
        if (br != null) {
            br.close();
        }
        // If the file is a .gz file do a little extra work;
        // otherwise read it in with a standard FileReader.
        // In either case, set the buffer size to 128kb.
        if (fileNames.get(0).endsWith(".gz")) {
            InputStream fileStream = new FileInputStream(fileNames.get(0));
            InputStream gzipStream = new GZIPInputStream(fileStream);
            // TODO this probably needs to be modified to work well on any
            // platform, UTF-8 is standard for debian/novastar though.
            Reader decoder = new InputStreamReader(gzipStream, "UTF-8");
            // Note that the buffer size is set to 128kb instead of the
            // standard 8kb.
            br = new BufferedReader(decoder, 131072);
            fileNames.remove(0);
        } else {
            FileReader fileReader = new FileReader(fileNames.get(0));
            br = new BufferedReader(fileReader, 131072);
            fileNames.remove(0);
        }
    }

    /**
     * Returns true if there are more lines available to read.
     *
     * @return true if there are more lines available to read.
     */
    public boolean hasMore() {
        return !done;
    }

    /**
     * Gets the next line from the correct file.
     *
     * @return the next line from the files; if there isn't one it returns null
     * @throws IOException
     */
    public String nextLine() throws IOException {
        if (done == true) {
            return null;
        }
        String line = br.readLine();
        if (line == null) {
            setFile();
            return nextLine();
        }
        return line;
    }
}
If I build this object over a large list of files (300MB worth of files) and then print nextLine() over and over in a while loop, performance steadily degrades until there is no more RAM available. This happens even when the files I read in are around 500kb and I run in a VM with 32MB of memory. I want this code to be able to run over extremely large data sets (hundreds of GB worth of files), and it is one component of a program that needs to run in 32MB of memory or less.
The files being used are mostly labeled as CSV files and are gzip-compressed on disk, which is why the reader needs to handle both gzipped and uncompressed files.
If I understand correctly, once a file has been read through and its lines emitted, the objects associated with that file and everything else tied to it should be eligible for garbage collection, right?
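One way I've thought about checking that assumption is with a `WeakReference`, which gets cleared once its referent has no strong references left (a sketch; `System.gc()` is only a hint to the JVM, so the loop retries a few times):

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.lang.ref.WeakReference;

public class GcCheck {
    public static void main(String[] args) throws Exception {
        // A reader with the same oversized 128kb buffer as in setFile().
        BufferedReader old = new BufferedReader(new StringReader("some data"), 131072);
        WeakReference<BufferedReader> ref = new WeakReference<BufferedReader>(old);

        old.close();
        old = null; // drop the only strong reference, as setFile() effectively does when it reassigns br

        // System.gc() is only a hint, so retry until the reference clears.
        for (int i = 0; i < 50 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(10);
        }
        System.out.println(ref.get() == null ? "collected" : "still reachable");
    }
}
```

If the old reader and its buffer really do become unreachable after reassignment, this should report that they were collected.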
How is C++ relevant here? – Galik

You can replace that constructor loop with fileNames.addAll(files);. – Kayaman