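I want to read the last n lines of a very large file in Java without reading the whole file into a buffer or any other memory area.
I looked through the JDK API and Apache Commons I/O and could not find a method suited to this purpose.
I was thinking of the approach that tail or less use on UNIX. I don't think they load the entire file and then show its last few lines. There should be a similar way to do this in Java.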
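
RandomAccessFile allows seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html), and the File.length method returns the size of the file. The difficulty is finding where the last n lines begin. To do that, seek to the end of the file and read backwards until you have passed the right number of line breaks.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

public static void tail(File src, OutputStream out, int maxLines) throws FileNotFoundException, IOException {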
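    // Streams through the whole file once, keeping only the most recent
    // maxLines lines in a circular buffer.
    if (maxLines <= 0) {
        return;
    }
    String[] lines = new String[maxLines];
    long count = 0; // total number of lines read so far
    try (BufferedReader reader = new BufferedReader(new FileReader(src))) {
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            lines[(int) (count++ % maxLines)] = line;
        }
    }
    OutputStreamWriter writer = new OutputStreamWriter(out);
    // Start at the oldest retained line; this is also correct when the
    // file has fewer than maxLines lines.
    for (long ndx = Math.max(0, count - maxLines); ndx < count; ndx++) {
        writer.write(lines[(int) (ndx % maxLines)]);
        writer.write("\n");
    }
    writer.flush();
}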
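... maxLines lines, the condition of the second loop does not terminate. - user207421

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public String readFromLast(File file, int howMany) throws IOException {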
    if (howMany <= 0) {
        return "";
    }
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
         ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        long lastIndex = randomAccessFile.length() - 1;
        int numLinesRead = 0;
        // Walk backwards one byte at a time. For an empty file the loop
        // body never runs and an empty string is returned.
        for (long pointer = lastIndex; pointer >= 0; pointer--) {
            randomAccessFile.seek(pointer);
            int b = randomAccessFile.read();
            // A line separator at the very end of the file only terminates
            // the last line, so it is not counted.
            if (b == '\n' && pointer != lastIndex) {
                numLinesRead++;
                if (numLinesRead == howMany) {
                    break;
                }
            }
            baos.write(b);
        }
        // The bytes were collected back to front, so reverse them in place.
        byte[] a = baos.toByteArray();
        for (int start = 0, end = a.length - 1; start < end; start++, end--) {
            byte temp = a[end];
            a[end] = a[start];
            a[start] = temp;
        }
        // Assumes UTF-8; pass the file's actual charset if it differs.
        // trim() removes the trailing line separator.
        return new String(a, StandardCharsets.UTF_8).trim();
    }
}
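
The byte-at-a-time loop above issues one seek and one read for every byte it inspects. A possible variation, given here only as a sketch, reads fixed-size blocks from the end of the file, counts line breaks inside each block, and then reads forward once from the offset where the last n lines begin. The names ChunkedTail, lastLines and CHUNK_SIZE are made up for illustration. The sketch assumes an encoding in which the byte 0x0A only ever means a line break (UTF-8 or ASCII, for example, but not UTF-16), and that the requested lines fit in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.file.Path;

public class ChunkedTail {
    // Hypothetical block size; any positive value works.
    private static final int CHUNK_SIZE = 8192;

    /** Returns the last howMany lines of the file, without the final line break. */
    public static String lastLines(Path file, int howMany, Charset charset) throws IOException {
        if (howMany <= 0) {
            return "";
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            long start = 0;    // offset of the first byte to return
            long pos = length; // everything from pos to the end has been scanned
            int newlines = 0;
            byte[] chunk = new byte[CHUNK_SIZE];
            scan:
            while (pos > 0) {
                int n = (int) Math.min(CHUNK_SIZE, pos);
                pos -= n;
                raf.seek(pos);
                raf.readFully(chunk, 0, n);
                for (int i = n - 1; i >= 0; i--) {
                    long offset = pos + i;
                    // A line break in the very last byte only ends the last line,
                    // so it is not counted.
                    if (chunk[i] == '\n' && offset != length - 1 && ++newlines == howMany) {
                        start = offset + 1;
                        break scan;
                    }
                }
            }
            // Read forward once from the start of the requested lines.
            // Assumes the result is small enough for a single byte array.
            byte[] tail = new byte[(int) (length - start)];
            raf.seek(start);
            raf.readFully(tail);
            // Drop one trailing "\n" or "\r\n", if present.
            int end = tail.length;
            if (end > 0 && tail[end - 1] == '\n') {
                end--;
                if (end > 0 && tail[end - 1] == '\r') {
                    end--;
                }
            }
            return new String(tail, 0, end, charset);
        }
    }
}
```

A call such as ChunkedTail.lastLines(path, 10, StandardCharsets.UTF_8) touches only the blocks that contain the last ten lines, so the cost depends on the length of those lines and not on the size of the file.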

It takes only two lines of code:
// Please specify correct Charset
ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);
// read last 2 lines
System.out.println(rlf.toString(2));
Gradle:
implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'
Maven:
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.11.0</version>
</dependency>
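I first tried RandomAccessFile, but reading the file backwards and repositioning the file pointer on every read was tedious. So I tried @Luca's solution and, with just two lines of code, had the last few lines of the file as a string within minutes.
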
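// Relies on the external UNIX tail command, which prints the last 10 lines by default.
// Passing the arguments as an array keeps paths that contain spaces intact.
InputStream inputStream = Runtime.getRuntime().exec(new String[] {"tail", path.toString()}).getInputStream();
String tail = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(System.lineSeparator()));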