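So my question is: do you know of any pointers to an open-source implementation of such a class, or can you share your own?
It would be great if this question turned into a collection of useful links and code. I am sure many people share this problem, and Sun has never addressed it properly.
Please do not suggest memory mapping, because the files may be much larger than Integer.MAX_VALUE.
You can create a BufferedInputStream from a RandomAccessFile with the following code: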
RandomAccessFile raf = ...
FileInputStream fis = new FileInputStream(raf.getFD());
BufferedInputStream bis = new BufferedInputStream(fis);
Some things to note:
The way you would probably want to use it is something like:
RandomAccessFile raf = ...
FileInputStream fis = new FileInputStream(raf.getFD());
BufferedInputStream bis = new BufferedInputStream(fis);
//do some reads with buffer
bis.read(...);
bis.read(...);
//seek to a different section of the file, so discard the previous buffer
raf.seek(...);
bis = new BufferedInputStream(fis);
bis.read(...);
bis.read(...);
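The seek-then-rebuffer pattern above can be wrapped in a small helper. This is only a sketch: the class name SeekableBufferedReads and the method readAt are made up for illustration, and it assumes the FileInputStream was created from raf.getFD(), so both objects share one file position.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

// Hypothetical helper, not part of any library.
final class SeekableBufferedReads {

    /**
     * Seeks to offset, then reads up to len bytes through a fresh buffer.
     * fis is assumed to have been created with new FileInputStream(raf.getFD()).
     * Returns fewer than len bytes if the end of the file is reached.
     */
    static byte[] readAt(RandomAccessFile raf, FileInputStream fis, long offset, int len)
            throws IOException {
        raf.seek(offset);
        // A new BufferedInputStream discards whatever the previous one had read ahead.
        // It is deliberately not closed: closing it would close the shared descriptor.
        BufferedInputStream bis = new BufferedInputStream(fis);
        byte[] out = new byte[len];
        int total = 0;
        while (total < len) {
            int n = bis.read(out, total, len - total);
            if (n < 0) {
                break; // end of file
            }
            total += n;
        }
        return Arrays.copyOf(out, total);
    }
}
```

Because the buffered stream reads ahead, the underlying file position after a call is generally past offset + len, so every logical read should start with its own seek, as this helper does.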
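getFD method. Rather than building a BufferedInputStream, though, I first built a FileReader and then a BufferedReader on top of it. That gives access to a readLine method that is faster (and possibly more UTF-friendly?) than the one RandomAccessFile provides. - Jeff Terrell Ph.D.
Even if the file is larger than Integer.MAX_VALUE, I see no reason not to use java.nio.MappedByteBuffer.
Obviously you cannot define a single MappedByteBuffer for the whole file. But you can have several MappedByteBuffers that access different regions of the file.
In FileChannel.map, the position and size parameters are of type long, so the position can be larger than Integer.MAX_VALUE. The only restriction is that the size of a single buffer cannot exceed Integer.MAX_VALUE.
You can therefore define several mappings like this: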
buffer[0] = fileChannel.map(FileChannel.MapMode.READ_WRITE,0,2147483647L);
buffer[1] = fileChannel.map(FileChannel.MapMode.READ_WRITE,2147483647L, Integer.MAX_VALUE);
buffer[2] = fileChannel.map(FileChannel.MapMode.READ_WRITE, 4294967294L, Integer.MAX_VALUE);
...
In short, the size cannot exceed Integer.MAX_VALUE, but the start position can be anywhere in the file.
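As a rough sketch of how such a set of mappings could be hidden behind a single long-indexed accessor, consider the class below. The name ChunkedMappedFile and its methods are invented for this example, it maps the file read-only, and the chunk size is a constructor parameter so that it can be tried on small files; in practice a chunk could be as large as Integer.MAX_VALUE.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical wrapper around several MappedByteBuffers, not a library class.
final class ChunkedMappedFile {
    private final MappedByteBuffer[] chunks;
    private final long chunkSize;
    private final long length;

    ChunkedMappedFile(Path path, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            length = channel.size();
            int count = (int) ((length + chunkSize - 1) / chunkSize);
            chunks = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = (long) i * chunkSize;
                // The last chunk may be shorter than the others.
                long size = Math.min(chunkSize, length - start);
                chunks[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
        }
        // The mappings stay valid after the channel is closed.
    }

    /** Returns the byte at the given absolute file position. */
    byte get(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        return chunks[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
    }

    long length() {
        return length;
    }
}
```

This version only reads single bytes. A multi-byte value that straddles two chunks would need extra handling, for example by making the chunks overlap.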
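In the book Java NIO, author Ron Hitchens states:
Accessing a file through the memory-mapping mechanism can be far more efficient than reading or writing data by conventional means, even when using channels. No explicit system calls need to be made, which can be time-consuming. More importantly, the virtual memory system of the operating system automatically caches memory pages. These pages will be cached using system memory and will not consume space from the JVM's memory heap.
Once a memory page has been made valid (brought in from disk), it can be accessed again at full hardware speed without the need to make another system call to get the data. Large, structured files that contain indexes or other sections that are referenced or updated frequently can benefit tremendously from memory mapping. When combined with file locking to protect critical sections and control transactional atomicity, you begin to see how memory-mapped buffers can be put to good use.
I really doubt you will find a third-party API that does better than this. Perhaps you can find an API built on top of this architecture that simplifies the work.
Do you think this approach would work for you?
The Apache PDFBox project has a nice, tested BufferedRandomAccessFile class.
It is an optimized version of the java.io.RandomAccessFile class, as described by Nick Zhang on JavaWorld.com. It is based on the jmzreader implementation and adds support for handling unsigned bytes.
See the following link for the source code:
If you are running on a 64-bit machine, memory-mapped files are the best approach. Simply map the whole file into an array of equally sized buffers, then pick a buffer for each record as needed (that is, edalorzo's answer, except that you want overlapping buffers so that no record spans a boundary).
If you are running on a 32-bit JVM, you are stuck with RandomAccessFile. However, you can use it to read a byte[] that contains an entire record, then use a ByteBuffer to retrieve individual values from that array. In the worst case you should need two file accesses: one to retrieve the position and size of the record, and one to retrieve the record itself.
Be aware, however, that creating a lot of byte[] arrays can start to put pressure on the garbage collector, and that you will still be IO-bound if you jump around the file a lot.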
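A minimal sketch of the RandomAccessFile plus ByteBuffer approach follows. The class name RecordReader and the record layout in the usage note (an int followed by a long) are assumptions made purely for illustration; the offset and size of a record would come from whatever index your file format provides.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of any library.
final class RecordReader {

    /**
     * Reads one whole record with a single seek and a single bulk read,
     * then wraps it so individual fields can be decoded from memory.
     */
    static ByteBuffer readRecord(RandomAccessFile raf, long offset, int size)
            throws IOException {
        byte[] record = new byte[size];
        raf.seek(offset);
        // readFully throws EOFException if the file ends before size bytes are read.
        raf.readFully(record);
        // ByteBuffer defaults to big-endian, matching RandomAccessFile.writeInt/writeLong.
        return ByteBuffer.wrap(record);
    }
}
```

With a record made of an int followed by a long, the caller would do ByteBuffer rec = RecordReader.readRecord(raf, offset, 12); and then read the fields with rec.getInt() and rec.getLong(). Reusing one byte[] per thread instead of allocating a new one per record would reduce the garbage-collector pressure mentioned above.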
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
/**
* Adds caching to a random access file.
*
* Rather than writing straight through to disk or to the operating system, which
* appears to be what RandomAccessFile and FileChannel do, this class keeps a small
* buffer and reads from or writes to it whenever possible. Only a single buffer is
* kept, so reads and writes that fall close to each other are sped up, while reads
* and writes outside the cached block are not.
*/
public class BufferedRandomAccessFile implements AutoCloseable {
private static final int DEFAULT_BUFSIZE = 4096;
/**
* The wrapped random access file, we will hold a cache around it.
*/
private final RandomAccessFile raf;
/**
* The size of the buffer
*/
private final int bufsize;
/**
* The buffer.
*/
private final byte buf[];
/**
* Current position in the file.
*/
private long pos = 0;
/**
* When the buffer has been read, this tells us where in the file the buffer
* starts at.
*/
private long bufBlockStart = Long.MAX_VALUE;
// Must be updated on write to the file
private long actualFileLength = -1;
boolean changeMadeToBuffer = false;
// Must be updated as we write to the buffer.
private long virtualFileLength = -1;
public BufferedRandomAccessFile(File name, String mode) throws FileNotFoundException {
this(name, mode, DEFAULT_BUFSIZE);
}
/**
*
* @param file
* @param mode how to open the random access file.
* @param b size of the buffer
* @throws FileNotFoundException
*/
public BufferedRandomAccessFile(File file, String mode, int b) throws FileNotFoundException {
this(new RandomAccessFile(file, mode), b);
}
public BufferedRandomAccessFile(RandomAccessFile raf) throws FileNotFoundException {
this(raf, DEFAULT_BUFSIZE);
}
public BufferedRandomAccessFile(RandomAccessFile raf, int b) {
this.raf = raf;
try {
this.actualFileLength = raf.length();
} catch (IOException e) {
throw new RuntimeException(e);
}
this.virtualFileLength = actualFileLength;
this.bufsize = b;
this.buf = new byte[bufsize];
}
/**
* Sets the position of the byte at which the next read/write should occur.
*
* @param pos
* @throws IOException
*/
public void seek(long pos) throws IOException{
this.pos = pos;
}
/**
* Sets the length of the file.
*/
public void setLength(long fileLength) throws IOException {
this.raf.setLength(fileLength);
if(fileLength < virtualFileLength) {
virtualFileLength = fileLength;
}
}
/**
* Writes the entire buffer to disk, if needed.
*/
private void writeBufferToDisk() throws IOException {
if(!changeMadeToBuffer) return;
int amountOfBufferToWrite = (int) Math.min((long) bufsize, virtualFileLength - bufBlockStart);
if(amountOfBufferToWrite > 0) {
raf.seek(bufBlockStart);
raf.write(buf, 0, amountOfBufferToWrite);
this.actualFileLength = virtualFileLength;
}
changeMadeToBuffer = false;
}
/**
* Flush the buffer to disk and force a sync.
*/
public void flush() throws IOException {
writeBufferToDisk();
this.raf.getChannel().force(false);
}
/**
* Based on pos, ensures that the buffer is the block that contains pos.
*
* After this call it is always safe to write to the buffer to update the byte at pos.
* If this returns true, reading the byte at pos is also valid, because a previous write
* or setLength call has made the file large enough to have a byte at pos.
*
* @return true if the buffer contains data that may be read at pos, that is, if a write
* or setLength call has made the file length greater than the current position.
*/
private boolean readyBuffer() throws IOException {
boolean isPosOutSideOfBuffer = pos < bufBlockStart || bufBlockStart + bufsize <= pos;
if (isPosOutSideOfBuffer) {
writeBufferToDisk();
// The buffer is always positioned to start at a multiple of a bufsize offset.
// e.g. for a buf size of 4 the starting positions of buffers can be at 0, 4, 8, 12..
// Work out where the buffer block should start for the given position.
long bufferBlockStart = (pos / bufsize) * bufsize;
assert bufferBlockStart >= 0;
// If the file is large enough, read it into the buffer.
// if the file is not large enough we have nothing to read into the buffer,
// In both cases the buffer will be ready to have writes made to it.
if(bufferBlockStart < actualFileLength) {
raf.seek(bufferBlockStart);
raf.read(buf);
}
bufBlockStart = bufferBlockStart;
}
return pos < virtualFileLength;
}
/**
* Reads a byte from the file, returning an integer of 0-255, or -1 if it has reached the end of the file.
*
* @return
* @throws IOException
*/
public int read() throws IOException {
if(readyBuffer() == false) {
return -1;
}
try {
return (buf[(int)(pos - bufBlockStart)]) & 0x000000ff ;
} finally {
pos++;
}
}
/**
* Write a single byte to the file.
*
* @param b
* @throws IOException
*/
public void write(byte b) throws IOException {
readyBuffer(); // ignore result we don't care.
buf[(int)(pos - bufBlockStart)] = b;
changeMadeToBuffer = true;
pos++;
if(pos > virtualFileLength) {
virtualFileLength = pos;
}
}
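/**
* Writes all given bytes to the random access file at the current position.
*
* The first part goes into the buffer; anything that does not fit into the
* current buffer block is written directly to the underlying file.
*/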
public void write(byte[] bytes) throws IOException {
int writen = 0;
int bytesToWrite = bytes.length;
{
readyBuffer();
int startPositionInBuffer = (int)(pos - bufBlockStart);
int lengthToWriteToBuffer = Math.min(bytesToWrite - writen, bufsize - startPositionInBuffer);
assert startPositionInBuffer + lengthToWriteToBuffer <= bufsize;
System.arraycopy(bytes, writen,
buf, startPositionInBuffer,
lengthToWriteToBuffer);
pos += lengthToWriteToBuffer;
if(pos > virtualFileLength) {
virtualFileLength = pos;
}
writen += lengthToWriteToBuffer;
this.changeMadeToBuffer = true;
}
// Just write the rest to the random access file
if(writen < bytesToWrite) {
writeBufferToDisk();
int toWrite = bytesToWrite - writen;
raf.write(bytes, writen, toWrite);
pos += toWrite;
if(pos > virtualFileLength) {
virtualFileLength = pos;
actualFileLength = virtualFileLength;
}
}
}
/**
* Reads up to bytes.length bytes into the given array.
*
* @return the number of bytes read, which is less than bytes.length if the
* end of the file is reached.
*/
public int read(byte[] bytes) throws IOException {
int read = 0;
int bytesToRead = bytes.length;
while(read < bytesToRead) {
//First see if we need to fill the cache
if(readyBuffer() == false) {
//No more to read;
return read;
}
//Now read as much as we can (or need) from the cache and place it
//in the given byte[], taking care not to read past the end of the file.
int startPositionInBuffer = (int)(pos - bufBlockStart);
int lengthToReadFromBuffer = (int) Math.min(
Math.min(bytesToRead - read, bufsize - startPositionInBuffer),
virtualFileLength - pos);
System.arraycopy(buf, startPositionInBuffer, bytes, read, lengthToReadFromBuffer);
pos += lengthToReadFromBuffer;
read += lengthToReadFromBuffer;
}
return read;
}
public void close() throws IOException {
try {
this.writeBufferToDisk();
} finally {
raf.close();
}
}
/**
* Gets the length of the file.
*
* @return
* @throws IOException
*/
public long length() throws IOException{
return virtualFileLength;
}
}