What is the fastest way to extract one file from a zip file that contains many files?

Question

java unzip compression

11

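I tried the java.util.zip package, but it was too slow.

Then I found the LZMA SDK and 7z jbinding, but both have shortcomings.

The LZMA SDK comes with no tutorial or documentation on how to use it, which is very frustrating. There is no javadoc.

7z jbinding does not offer a simple way to extract just one file; it only offers a way to extract the entire contents of the zip file. It also gives no way to specify where the extracted files should go.

Any good suggestions?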

- lamwaiman1988

4 Answers

13

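I haven't benchmarked the speed, but with Java 7 or later I extract a file as follows.
I would imagine it is faster than the ZipFile API:

A short example that extracts META-INF/MANIFEST.MF from the zip file test.zip: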

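import java.io.File;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// file to extract from zip file
String file = "MANIFEST.MF";
// location to extract the file to
File outputLocation = new File("D:/temp/", file);
// path to the zip file
Path zipFile = Paths.get("D:/temp/test.zip");

// load zip file as filesystem; the (ClassLoader) cast selects the overload
// available since Java 7 and avoids an ambiguous call on Java 13+
try (FileSystem fileSystem = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
    // copy file from zip file to output location
    // (throws FileAlreadyExistsException if the output file already exists)
    Path source = fileSystem.getPath("META-INF/" + file);
    Files.copy(source, outputLocation.toPath());
}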

- flavio.donze

1

Works well and is extremely fast... this should be the accepted answer (assuming Java 7 or later). - leo

5

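Use ZipFile rather than ZipInputStream.

Although the documentation does not say so explicitly (it is in the docs for JarFile), it should use random-access file operations to read the file. Since a ZIP file contains a directory at a known location, far less IO is needed to find a particular file.

Some caveats: to the best of my knowledge, the Sun implementation uses a memory-mapped file. This means that your virtual address space has to be large enough to hold the file as well as everything else in your JVM, which may be a problem on a 32-bit server. On the other hand, it may be smart enough to avoid memory-mapping on 32-bit, or to map only the directory; I haven't tried.

Also, if you are working with multiple files, be sure to use try/finally so that each file is closed after use.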
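For illustration, the ZipFile approach might look like the sketch below. The class name `SingleEntryExtractor` and its `extract` method are made up for this example (they are not part of any library), and it uses try-with-resources, the Java 7+ equivalent of the try/finally mentioned above.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, named for illustration only.
public final class SingleEntryExtractor {

    private SingleEntryExtractor() {}

    /** Copies one named entry out of a zip file without iterating over the other entries. */
    public static void extract(Path zip, String entryName, Path target) throws IOException {
        // try-with-resources closes the ZipFile even if the lookup or the copy throws.
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            // getEntry looks the name up directly; it returns null when there is no such entry.
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not found in " + zip);
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                // Overwrites the target if it already exists.
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

Entry names inside a zip use forward slashes, so a call would look like `SingleEntryExtractor.extract(Paths.get("test.zip"), "META-INF/MANIFEST.MF", Paths.get("MANIFEST.MF"))`.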

- kdgregory

0

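The snippet below assumes you already know the path of the target zip file and the path of the target entry inside it.

There is no need to iterate over the files, because ZipFile provides a getEntry method to retrieve an entry directly, and getInputStream to read that entry's contents as an InputStream.

In this example, reading a protobuf binary file from a zip file of about 340 KB takes about 11 ms. You can use a similar approach to read any other file type.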


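    /* Relevant imports */
    import com.google.protobuf.Message;
    import com.google.protobuf.Parser;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.nio.file.Path;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public final class ZipFileUtils {

        ...

        // Looks up one entry directly (no iteration) and parses it as a protobuf message.
        public static <T extends Message> T readMessageFromZip(
                final Path zipPath,
                final Path entryPath,
                final Parser<T> messageParser) throws IOException {
            try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
                // Zip entry names use '/', whatever the platform's path separator is.
                String entryName = entryPath.toString().replace(File.separatorChar, '/');
                ZipEntry zipEntry = zipFile.getEntry(entryName);
                if (zipEntry == null) {
                    throw new FileNotFoundException(entryName + " not found in " + zipPath);
                }
                // Closing the ZipFile also closes the stream returned by getInputStream.
                return messageParser.parseFrom(zipFile.getInputStream(zipEntry));
            }
        }
    }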

- rbento

- WhiteFang34 · Accepted Answer

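What does your java.util.zip code look like, and how big a zip file are you dealing with?

With the following code I can extract a 4 MB entry from a 200 MB zip file containing 1,800 entries in roughly a second: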

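import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// try-with-resources closes the zip stream even if the entry is missing or an I/O error occurs
try (ZipInputStream zin = new ZipInputStream(
        new BufferedInputStream(new FileInputStream("your.zip")))) {
    ZipEntry ze;
    while ((ze = zin.getNextEntry()) != null) {
        if (ze.getName().equals("your.file")) {
            // open the output only once the entry is found, so a miss leaves no empty file
            try (OutputStream out = new FileOutputStream("your.file")) {
                byte[] buffer = new byte[8192];
                int len;
                while ((len = zin.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
            }
            break;
        }
    }
}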
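To check on your own archive whether the sequential scan is what makes things slow, a rough comparison along these lines may help. It is only a sketch: the class `ZipLookupComparison` and its method names are made up for illustration, and a single `System.nanoTime` measurement is noisy (JIT warm-up, disk cache), so treat the printed numbers as indicative at best.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical comparison helper, named for illustration only.
public final class ZipLookupComparison {

    private ZipLookupComparison() {}

    /** Sequential scan: walks the entries in archive order until the name matches. */
    public static byte[] readByScanning(Path zip, String name) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(
                new BufferedInputStream(Files.newInputStream(zip)))) {
            ZipEntry ze;
            while ((ze = zin.getNextEntry()) != null) {
                if (ze.getName().equals(name)) {
                    return drain(zin);
                }
            }
        }
        return null; // entry not present
    }

    /** Direct lookup: ZipFile resolves the entry by name, then reads only that entry. */
    public static byte[] readByLookup(Path zip, String name) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            ZipEntry ze = zf.getEntry(name);
            if (ze == null) {
                return null; // entry not present
            }
            try (InputStream in = zf.getInputStream(ze)) {
                return drain(in);
            }
        }
    }

    /** Prints a single, unwarmed timing of both approaches. */
    public static void report(Path zip, String name) throws IOException {
        long t0 = System.nanoTime();
        readByScanning(zip, name);
        long t1 = System.nanoTime();
        readByLookup(zip, name);
        long t2 = System.nanoTime();
        System.out.printf("scan: %d ms, lookup: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }

    private static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        return out.toByteArray();
    }
}
```

Calling `ZipLookupComparison.report(Paths.get("your.zip"), "your.file")` a few times and ignoring the first run gives a fairer picture than a single measurement.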