Java: Read the last n lines of a huge file

41

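I want to read the last n lines of a very large file in Java without reading the whole file into a buffer or any other memory area.

I looked through the JDK API and Apache Commons I/O and could not find a method suited to this purpose.

I was thinking of the approach that tail or less use on UNIX. I don't think they load the entire file and then show its last few lines. There should be a similar way to do the same in Java.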


15 Answers

1
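A RandomAccessFile allows seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html). The File.length method returns the size of the file. The problem is determining the number of lines. For that, you can seek to the end of the file and read backwards until you have found the right number of lines.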
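The seek-and-read-backwards idea can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the class name `TailSketch`, the method `lastLines`, and the 8 KiB block size are made up for the example, and it assumes UTF-8 text (where the byte `0x0A` only ever means a newline), `\n` or `\r\n` line endings, and a tail that fits comfortably in memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the JDK or Commons IO.
class TailSketch {

    /** Returns up to the last n lines of the file, scanning backwards in blocks. */
    static List<String> lastLines(File file, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            if (n <= 0 || length == 0) {
                return new ArrayList<>();
            }

            // Ignore one trailing line terminator ("\n" or "\r\n"), if present.
            long end = length;
            raf.seek(end - 1);
            if (raf.read() == '\n') {
                end--;
                if (end > 0) {
                    raf.seek(end - 1);
                    if (raf.read() == '\r') {
                        end--;
                    }
                }
            }

            byte[] buf = new byte[8192]; // block size is an arbitrary choice
            long pos = end;              // bytes in [pos, end) have been scanned
            long start = 0;              // offset at which the tail begins
            int newlines = 0;

            scan:
            while (pos > 0) {
                int chunk = (int) Math.min(buf.length, pos);
                pos -= chunk;
                raf.seek(pos);
                raf.readFully(buf, 0, chunk);
                for (int i = chunk - 1; i >= 0; i--) {
                    // The n-th newline from the end marks where the tail starts.
                    if (buf[i] == '\n' && ++newlines == n) {
                        start = pos + i + 1;
                        break scan;
                    }
                }
            }

            // Read only the tail and decode it; assumes it is smaller than 2 GiB.
            byte[] tail = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(tail);
            String text = new String(tail, StandardCharsets.UTF_8);
            return new ArrayList<>(Arrays.asList(text.split("\r?\n", -1)));
        }
    }
}
```

Calling `TailSketch.lastLines(new File("big.log"), 10)` (with `big.log` as a placeholder path) would then touch only the blocks near the end of the file. Reading in blocks rather than one byte at a time keeps the number of seek and read calls low.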

0
This is the best way I've found. It is simple, reasonably fast, and uses little memory.
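public static void tail(File src, OutputStream out, int maxLines) throws FileNotFoundException, IOException {
    if (maxLines <= 0) {
        return;
    }
    String[] lines = new String[maxLines]; // ring buffer holding the most recent lines
    long count = 0;                        // total number of lines read so far
    // Note: FileReader decodes with the platform default charset.
    try (BufferedReader reader = new BufferedReader(new FileReader(src))) {
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            lines[(int) (count++ % maxLines)] = line;
        }
    }

    OutputStreamWriter writer = new OutputStreamWriter(out);
    // Start at the oldest retained line and stop at the line count. The earlier
    // condition (ndx != lastNdx-1) never terminated for empty or short files.
    for (long ndx = Math.max(0, count - maxLines); ndx < count; ndx++) {
        writer.write(lines[(int) (ndx % maxLines)]);
        writer.write("\n");
    }

    writer.flush();
}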

7
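Since this reads through the whole file, it won't scale well to larger files. - ChristopheD
Also, this function goes into an infinite loop for empty files. - shak
Why would it loop for an empty file? - The Coordinator
The condition of the second loop never terminates if there are no lines, or fewer than maxLines lines. - user207421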

0

(See the comments)

public String readFromLast(File file, int howMany) throws IOException {
    int numLinesRead = 0;
    StringBuilder builder = new StringBuilder();
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            long fileLength = file.length() - 1;
            /*
             * Set the pointer at the end of the file. If the file is empty, an IOException
             * will be thrown
             */
            randomAccessFile.seek(fileLength);

            for (long pointer = fileLength; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                byte b = (byte) randomAccessFile.read();
                if (b == '\n') {
                    numLinesRead++;
                    // (Last line often terminated with a line separator)
                    if (numLinesRead == (howMany + 1))
                        break;
                }
                baos.write(b);
            }
            /*
             * The bytes were collected from the end of the file, so they are in
             * reverse order. Reverse the array in place to restore the original order.
             */
            byte[] a = baos.toByteArray();
            int start = 0;
            int mid = a.length / 2;
            int end = a.length - 1;

            while (start < mid) {
                byte temp = a[end];
                a[end] = a[start];
                a[start] = temp;
                start++;
                end--;
            }// End while
            return new String(a).trim();
        } // End inner try-with-resources
    } // End outer try-with-resources

} // End method

0

It takes only two lines of code:

     // Please specify correct Charset
     ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);

     // read last 2 lines
     System.out.println(rlf.toString(2));

Gradle:

implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'

Maven:

   <dependency>
        <groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.11.0</version>
   </dependency>

0

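I tried RandomAccessFile first, but reading the file backwards and repositioning the file pointer on every read operation was tedious. So I tried @Luca's solution instead, and with just two lines of code I had the last few lines of the file as a string within minutes.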

    InputStream inputStream = Runtime.getRuntime().exec(new String[] {"tail", path.toString()}).getInputStream();
    String tail = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(System.lineSeparator()));
