Quickly read the last line of a text file?

Question

java file io

68

What is the quickest and most efficient way to read the last line of text from a [very, very large] file in Java?

- Jake

10 Answers

38

Apache Commons has an implementation that uses RandomAccessFile.

It is called ReversedLinesFileReader.

- jaco0646

I think this is the fastest way to read a file in reverse order. - Chathurika Sandarenu

2

@JuanToroMarty You can call readLine() in a loop. - Stephan

1

This seems like the most elegant approach to me. - Rauni Lillemets

21

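Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.

Basically, this isn't a terribly easy thing to do in general. As MSalters points out, UTF-8 does make it easy to spot \r or \n, because the UTF-8 representation of those characters is the same as in ASCII, and those bytes never occur inside a multi-byte character.

So basically, take a buffer of (say) 2K and progressively read backwards (skip to 2K before where you were, read the next 2K), checking for a line terminator. Then skip to exactly the right place in the stream, create an InputStreamReader on top of it, and a BufferedReader on top of that. Then just call BufferedReader.readLine().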
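The steps above can be sketched roughly as follows. This is a minimal illustration, not Jon Skeet's actual code: the class name LastLineReader, the method readLastLine, and the helper byteAt are made up for the example, and it assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, where the bytes 0x0A and 0x0D can only ever be line terminators.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.Charset;

class LastLineReader {
    private static final int CHUNK = 2048; // the "2K" buffer from the answer

    /** Returns the last line of the file, or null if the file is empty. */
    static String readLastLine(File file, Charset charset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long end = raf.length();
            // Ignore one trailing line terminator (\n, \r, or \r\n).
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;

            long lineStart = 0; // if no terminator is found, the whole file is one line
            byte[] buf = new byte[CHUNK];
            long pos = end;
            search:
            while (pos > 0) {
                // Step back one chunk and read it in a single call.
                int len = (int) Math.min(CHUNK, pos);
                pos -= len;
                raf.seek(pos);
                raf.readFully(buf, 0, len);
                // Scan the chunk from its end toward its start.
                for (int i = len - 1; i >= 0; i--) {
                    if (buf[i] == '\n' || buf[i] == '\r') {
                        lineStart = pos + i + 1;
                        break search;
                    }
                }
            }

            // Jump to the start of the last line and let the reader do the decoding.
            raf.seek(lineStart);
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(Channels.newInputStream(raf.getChannel()), charset));
            return reader.readLine();
        }
    }

    /** Hypothetical helper: reads the single byte at the given offset. */
    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```

A call such as LastLineReader.readLastLine(new File(path), StandardCharsets.UTF_8) touches only the tail of the file, however large the file is. The approach would not work unchanged for UTF-16, where 0x0A can appear as one half of an unrelated character.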

- Jon Skeet

2

UTF-8 doesn't matter - you need the last CR or LF character, which is a single byte in both ASCII and UTF-8. - MSalters

6

Using FileReader or FileInputStream won't work - you have to use FileChannel or RandomAccessFile to loop through the file backwards from the end. Encodings can be a problem though, as Jon said.
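For illustration, a FileChannel version might look something like the sketch below. The names ChannelTail and lastNonBlankLine are invented for this example. It sidesteps part of the encoding problem by collecting raw bytes and decoding them once at the end, but it still assumes an ASCII-compatible charset, and it skips any trailing blank lines.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelTail {
    /** Returns the last non-blank line without its terminator, or "" if there is none. */
    static String lastNonBlankLine(Path path, Charset charset) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer chunk = ByteBuffer.allocate(4096);
            // Bytes of the last line, collected back to front.
            ByteArrayOutputStream reversed = new ByteArrayOutputStream();
            long pos = ch.size();
            boolean done = false;
            while (pos > 0 && !done) {
                int len = (int) Math.min(chunk.capacity(), pos);
                pos -= len;
                chunk.clear();
                chunk.limit(len);
                // Positional reads leave the channel's own position untouched.
                while (chunk.hasRemaining()) {
                    if (ch.read(chunk, pos + chunk.position()) < 0) {
                        throw new EOFException("file shrank while reading");
                    }
                }
                for (int i = len - 1; i >= 0; i--) {
                    byte b = chunk.get(i);
                    if (b == '\n' || b == '\r') {
                        if (reversed.size() == 0) continue; // trailing terminator, skip it
                        done = true;                        // terminator of the previous line
                        break;
                    }
                    reversed.write(b);
                }
            }
            // Put the bytes back into file order, then decode them in one go.
            byte[] bytes = reversed.toByteArray();
            for (int i = 0, j = bytes.length - 1; i < j; i++, j--) {
                byte t = bytes[i];
                bytes[i] = bytes[j];
                bytes[j] = t;
            }
            return new String(bytes, charset);
        }
    }
}
```

Because the bytes are decoded only after they are back in file order, multi-byte UTF-8 sequences survive intact.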

- Michael Borgwardt

1

Note that RandomAccessFile performs poorly for individual operations - so read reasonably sized blocks into a buffer. - Tom Hawtin - tackline

4

You can easily change the code below to print only the last line.

Printing the last 5 lines with a memory-mapped file:

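// Requires java.io.File, java.io.FileInputStream, java.io.IOException,
// java.nio.MappedByteBuffer and java.nio.channels.FileChannel.
// A single mapping cannot exceed Integer.MAX_VALUE bytes (about 2 GB), and each
// byte is treated as one char, so only ASCII/Latin-1 text comes out intact.
private static void printByMemoryMappedFile(File file) throws IOException {
    try (FileInputStream fileInputStream = new FileInputStream(file);
         FileChannel channel = fileInputStream.getChannel()) {
        MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        int count = 0;
        StringBuilder builder = new StringBuilder();
        for (int i = buffer.limit() - 1; i >= 0 && count < 5; i--) {
            char c = (char) (buffer.get(i) & 0xFF);
            if (c == '\n') {
                // Skip the newline that ends the file; otherwise print the line just collected.
                if (i != buffer.limit() - 1) {
                    System.out.println(builder.reverse());
                    builder.setLength(0);
                    count++;
                }
            } else if (c != '\r') {
                builder.append(c);
            }
        }
        // Start of file reached with fewer than 5 lines printed: print the first line too.
        if (count < 5 && builder.length() > 0) {
            System.out.println(builder.reverse());
        }
    }
}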

Printing the last 5 lines with RandomAccessFile:

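// Requires java.io.File, java.io.IOException and java.io.RandomAccessFile.
// One seek and one read per byte is simple but slow on large inputs; bytes are
// treated as single chars, so only ASCII/Latin-1 text comes out intact.
private static void printByRandomAcessFile(File file) throws IOException {
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
        int lines = 0;
        StringBuilder builder = new StringBuilder();
        long last = randomAccessFile.length() - 1;
        for (long seek = last; seek >= 0 && lines < 5; --seek) {
            randomAccessFile.seek(seek);
            char c = (char) randomAccessFile.read();
            if (c == '\n') {
                // Skip the newline that ends the file; otherwise print the line just collected.
                if (seek != last) {
                    System.out.println(builder.reverse());
                    builder.setLength(0);
                    lines++;
                }
            } else if (c != '\r') {
                builder.append(c);
            }
        }
        // Start of file reached with fewer than 5 lines printed: print the first line too.
        if (lines < 5 && builder.length() > 0) {
            System.out.println(builder.reverse());
        }
    }
}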

- Trying

Works for me. Thanks. Are there any drawbacks to this approach? - Omar B.

2

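As far as I know, the fastest way to read a text file is the Apache FileUtils class in "org.apache.commons.io". I have a file with two million lines, and with this class it took less than a second to find the last line. Here is my code:

LineIterator lineIterator = FileUtils.lineIterator(new File(filePath), "UTF-8");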
String lastLine="";
while (lineIterator.hasNext()){
 lastLine=  lineIterator.nextLine();
}

- arash nadali

1

Lorenzo's comment above applies here as well: this works, but it is probably not the most efficient solution. - martin_wun

1

try(BufferedReader reader = new BufferedReader(new FileReader(reqFile))) {

    String line = null;

    System.out.println("======================================");

    line = reader.readLine();       //Read Line ONE
    line = reader.readLine();       //Read Line TWO
    System.out.println("first line : " + line);

    //Length of one line if lines are of even length
    int len = line.length();       

    //skip to the end - 3 lines
    reader.skip((reqFile.length() - (len*3)));

    //Searched to the last line for the date I was looking for.

    while((line = reader.readLine()) != null){

        System.out.println("FROM LINE : " + line);
        String date = line.substring(0,line.indexOf(","));

        System.out.println("DATE : " + date);      //BAM!!!!!!!!!!!!!!
    }

    System.out.println(reqFile.getName() + " Read(" + reqFile.length()/(1000) + "KB)");
    System.out.println("======================================");
} catch (IOException x) {
    x.printStackTrace();
}

- Ajai Singh

1

The code is just two lines:

     // Please specify correct Charset
     ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);

     // read last 2 lines
     System.out.println(rlf.toString(2));

Gradle:

implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'

Maven:

   <dependency>
        <groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.11.0</version>
   </dependency>

- grep

0

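To avoid the Unicode problems related to reversing a String (or StringBuilder), as discussed in Eric Leschinski's excellent answer, you can read from the end of the file into a list of bytes, reverse it into a byte array, and then create the String from the byte array.

Below are the changes to the code from Eric Leschinski's answer so that it works with a byte array. Each changed line follows the commented-out line it replaces: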

static public String tail2(File file, int lines) {
    java.io.RandomAccessFile fileHandler = null;
    try {
        fileHandler = new java.io.RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        //StringBuilder sb = new StringBuilder();
        List<Byte> sb = new ArrayList<>();
        int line = 0;

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

            if( readByte == 0xA ) {
                if (filePointer < fileLength) {
                    line = line + 1;
                }
            } else if( readByte == 0xD ) {
                if (filePointer < fileLength-1) {
                    line = line + 1;
                }
            }
            if (line >= lines) {
                break;
            }
            //sb.add( (char) readByte );
            sb.add( (byte) readByte );
        }

        //String lastLine = sb.reverse().toString();
        //Revert byte array and create String
        byte[] bytes = new byte[sb.size()];
        for (int i=0; i<sb.size(); i++) bytes[sb.size()-1-i] = sb.get(i);
        String lastLine = new String(bytes);
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    }
    finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
            }
    }
}

- Helder Daniel

0

In C#, you should be able to set the stream's position:

From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file

using(FileStream fs = File.OpenRead("c:\\file.dat"))
{
    using(StreamReader sr = new StreamReader(fs))
    {
        sr.BaseStream.Position = fs.Length - 4;
        if(sr.ReadToEnd() == "DONE")
            // match
    }
}

- rball

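In Java's FileInputStream (which FileReader is based on), you can't set the position; you can only skip forward, which may not actually read the part you skip, but it is still a one-way operation and therefore unsuitable for finding a line break at an unknown offset from the end. - Michael Borgwardt

You can use mark() to get around this, depending on what the stream's markLimit() is. - James Schek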

- Eric Leschinski · Accepted Answer

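Below are two functions: one returns the last non-blank line of a file without loading or stepping through the entire file, and the other returns the last N lines of the file without stepping through the entire file.

What tail does is zip straight to the last character of the file, then step backward character by character, recording what it sees until it finds a line break. Once it finds a line break, it breaks out of the loop, reverses what was recorded, puts it into a string, and returns it. 0xA is the new line and 0xD is the carriage return.

If your line endings are \r\n (CRLF) or some other "double newline style newline", then you will have to specify n * 2 lines to get the last n lines, because it counts 2 lines for every line.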

public String tail( File file ) {
    RandomAccessFile fileHandler = null;
    try {
        fileHandler = new RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        StringBuilder sb = new StringBuilder();

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

            if( readByte == 0xA ) {
                if( filePointer == fileLength ) {
                    continue;
                }
                break;
                
            } else if( readByte == 0xD ) {
                if( filePointer == fileLength - 1 ) {
                    continue;
                }
                break;
            }

            sb.append( ( char ) readByte );
        }

        String lastLine = sb.reverse().toString();
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    } finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
                /* ignore */
            }
    }
}

But you probably don't want just the last line; you want the last N lines, so use this instead:

public String tail2( File file, int lines) {
    java.io.RandomAccessFile fileHandler = null;
    try {
        fileHandler = 
            new java.io.RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        StringBuilder sb = new StringBuilder();
        int line = 0;

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

             if( readByte == 0xA ) {
                if (filePointer < fileLength) {
                    line = line + 1;
                }
            } else if( readByte == 0xD ) {
                if (filePointer < fileLength-1) {
                    line = line + 1;
                }
            }
            if (line >= lines) {
                break;
            }
            sb.append( ( char ) readByte );
        }

        String lastLine = sb.reverse().toString();
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    }
    finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
            }
    }
}

Invoke the above methods like this:

File file = new File("D:\\stuff\\huge.log");
System.out.println(tail(file));
System.out.println(tail2(file, 10));

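Warning: In the wild west of Unicode, this code can cause the output of this function to come out wrong. For example, "Mary?s" instead of "Mary's". Characters with hats, accents, Chinese characters, and so on may cause the output to be wrong, because accents are added as modifiers after the character. Reversing compound characters changes the identity of the character on reversal. You will need to test thoroughly on every language you plan to use this with.

For more information about this Unicode reversal problem, read this: https://codeblog.jonskeet.uk/2009/11/02/omg-ponies-aka-humanity-epic-fail/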
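To make the warning concrete, here is a small self-contained demonstration. The class ByteCastDemo and both of its methods are made up for the example, and it assumes the file is UTF-8. It imitates the tail functions on an in-memory byte array: one method walks the bytes backward and casts each one to a char, the other keeps the bytes together and decodes them once. It isolates only one way the output can go wrong, namely that a multi-byte character is split into several unrelated chars.

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {
    /** Mimics the tail functions: walk the bytes backward, cast each to char, then reverse. */
    static String perByteCast(byte[] utf8) {
        StringBuilder sb = new StringBuilder();
        for (int i = utf8.length - 1; i >= 0; i--) {
            sb.append((char) utf8[i]); // every byte becomes its own char
        }
        return sb.reverse().toString();
    }

    /** Keeps the bytes as bytes and decodes them once, in file order. */
    static String decodeWhole(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Mary's café" with a typographic apostrophe (U+2019) and an accented e (U+00E9).
        String original = "Mary\u2019s caf\u00e9";
        byte[] line = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(perByteCast(line).equals(original)); // prints false
        System.out.println(decodeWhole(line).equals(original)); // prints true
    }
}
```

For pure ASCII input both methods return the same string, which is why the problem tends to surface only once non-English text shows up in the file.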