NullPointerException when loading a docx file with docx4j

3

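I want to convert a docx to HTML. I started with the same code as the sample on GitHub. This is only the loading part, but that is already where I run into a problem.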

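import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class Main {

    public static void main(String[] args) throws Docx4JException,
            FileNotFoundException {
        String inputfilepath = "myfilepathhere";

        // Output stream for the HTML file that will be generated later
        OutputStream os = new FileOutputStream(inputfilepath + ".html");

        // Loading the package is the step that throws the exception
        WordprocessingMLPackage wordMLPackage = Docx4J
                .load(new FileInputStream(inputfilepath));

    }
}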

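I get a NullPointerException. After reading the stack trace and browsing the source on GitHub, I suspect it is JAXB-related and connected to this class: https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/jaxb/Context.java
The docx4j source code is available at https://github.com/plutext/docx4j.
Stack trace: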
Exception in thread "main" org.docx4j.openpackaging.exceptions.Docx4JException: Couldn't get [Content_Types].xml from ZipFile
    at org.docx4j.openpackaging.io3.Load3.get(Load3.java:134)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:454)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:371)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:337)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:302)
    at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:170)
    at org.docx4j.Docx4J.load(Docx4J.java:195)
    at Main.main(Main.java:29)
Caused by: org.docx4j.openpackaging.exceptions.InvalidFormatException: Bad [Content_Types].xml
    at org.docx4j.openpackaging.contenttype.ContentTypeManager.parseContentTypesFile(ContentTypeManager.java:713)
    at org.docx4j.openpackaging.io3.Load3.get(Load3.java:132)
    ... 7 more
Caused by: java.lang.NullPointerException
    at org.docx4j.openpackaging.contenttype.ContentTypeManager.parseContentTypesFile(ContentTypeManager.java:679)
    ... 8 more

The docx document itself is fine (it was created with Word 2010). I even unzipped it to check whether [Content_Types].xml exists. It does.
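
The manual unzip check described above can also be done in code. The following is only a minimal sketch that uses java.util.zip from the JDK and no docx4j classes; the class name ContentTypesCheck and the method hasContentTypes are made up for illustration. It only confirms that the entry exists, not that docx4j can parse it.

```java
import java.io.IOException;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of docx4j.
class ContentTypesCheck {

    /** Returns true if the zip archive at docxPath has a [Content_Types].xml entry at its root. */
    static boolean hasContentTypes(String docxPath) throws IOException {
        // A docx file is a zip archive; try-with-resources closes it again.
        try (ZipFile zip = new ZipFile(docxPath)) {
            return zip.getEntry("[Content_Types].xml") != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // "myfilepathhere" is the placeholder path used in the question.
        String path = args.length > 0 ? args[0] : "myfilepathhere";
        System.out.println("[Content_Types].xml present: " + hasContentTypes(path));
    }
}
```

If this reports that the entry is present and docx4j still fails with "Bad [Content_Types].xml", the file is probably not the cause, which is consistent with the JAXB suspicion above.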

I am using Eclipse and Java SE 7. I have added all the required jar files to the Java Build Path in the project properties.

Please help me.

Update:

To check whether this is the problem, I added the following line from Context.java to my own class:

     JAXBContext.newInstance("org.docx4j.openpackaging.contenttype");

I then see the following exception in the console:

    Exception in thread "main" javax.xml.bind.JAXBException: Provider org.eclipse.persistence.jaxb.JAXBContextFactory not found
 - with linked exception:
[java.lang.ClassNotFoundException: org.eclipse.persistence.jaxb.JAXBContextFactory]
    at javax.xml.bind.ContextFinder.newInstance(Unknown Source)
    at javax.xml.bind.ContextFinder.find(Unknown Source)
    at javax.xml.bind.JAXBContext.newInstance(Unknown Source)
    at javax.xml.bind.JAXBContext.newInstance(Unknown Source)
    at javax.xml.bind.JAXBContext.newInstance(Unknown Source)
    at Main.main(Main.java:26)
Caused by: java.lang.ClassNotFoundException: org.eclipse.persistence.jaxb.JAXBContextFactory
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at javax.xml.bind.ContextFinder.safeLoadClass(Unknown Source)
    ... 6 more

If you upload the docx to the docx4j web app (or download the Word AddIn), can those docx4j instances load your docx successfully? - JasonPlutext
@JasonPlutext I uploaded the document to http://webapp.docx4java.org/OnlineDemo/docx_to_pdf.html, clicked process, and got the PDF file successfully. - pinkpanther
Please try http://webapp.docx4java.org/OnlineDemo/PartsList.html. - JasonPlutext
@JasonPlutext I tried it; the page shows [Content_Types].xml and the parts information. - pinkpanther
If you turn on logging, what does docx4j output before the stack trace? Something like http://stackoverflow.com/questions/12363169/docx4j-no-suitable-jaxb-implementation-available-runtime-error-java-1-5 or http://www.docx4java.org/forums/docx-java-f6/invalidformatexception-by-using-docx4j-with-eclipse-t807.html? In both of those the root cause was that no JAXB implementation was available. - JasonPlutext
@JasonPlutext I am not sure how to enable logging, so I did what I describe in the update above. The exception is java.lang.ClassNotFoundException: org.eclipse.persistence.jaxb.JAXBContextFactory. Please see my update. Thanks. - pinkpanther
3 Answers
3个回答

2

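docx4j supports several different JAXB implementations:

  • the reference implementation
  • the implementation Sun/Oracle include in Java 6/7/8
  • EclipseLink MOXy

If you want to use MOXy, you need:

  1. the relevant EclipseLink jars
  2. docx4j-MOXy-JAXBContext-3.0.0.jar (which contains only the jaxb.properties file)

The jaxb.properties file simply says: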

javax.xml.bind.context.factory=org.eclipse.persistence.jaxb.JAXBContextFactory

If you are using Maven, you only need to add:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-MOXy-JAXBContext</artifactId>
    <version>3.0.0</version>
</dependency>
<dependency>
    <groupId>org.eclipse.persistence</groupId>
    <artifactId>org.eclipse.persistence.moxy</artifactId>
    <version>2.5.1</version>
</dependency>

Do you have the docx4j-MOXy-JAXBContext jar on your classpath? Either remove it, or add the relevant EclipseLink jars.
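
One way to check for this mismatch from inside the project is sketched below. It is a rough diagnostic, not part of docx4j: the class JaxbProviderCheck and its two methods are hypothetical names, and it assumes that JAXB looks for jaxb.properties in the directory of the package named in the context path (here org.docx4j.openpackaging.contenttype, the package from the question's update). It does not read the contents of the properties file.

```java
import java.net.URL;

// Hypothetical diagnostic helper; uses only JDK classes.
class JaxbProviderCheck {

    /** True if the named class can be located by this class's class loader (without initializing it). */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, JaxbProviderCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** URL of the jaxb.properties resource in the given package, or null if none is on the classpath. */
    static URL findJaxbProperties(String packageName) {
        String resource = packageName.replace('.', '/') + "/jaxb.properties";
        return JaxbProviderCheck.class.getClassLoader().getResource(resource);
    }

    public static void main(String[] args) {
        String pkg = "org.docx4j.openpackaging.contenttype";
        String factory = "org.eclipse.persistence.jaxb.JAXBContextFactory";

        URL props = findJaxbProperties(pkg);
        boolean moxyPresent = isClassPresent(factory);

        System.out.println("jaxb.properties for " + pkg + ": "
                + (props == null ? "not found" : props));
        System.out.println(factory + " loadable: " + moxyPresent);

        if (props != null && !moxyPresent) {
            // The URL printed above shows which jar supplies jaxb.properties.
            System.out.println("A jaxb.properties file is present but EclipseLink MOXy is not: "
                    + "remove the jar that provides it, or add the EclipseLink jars.");
        }
    }
}
```

Run it with the same classpath as the failing program. If a jaxb.properties file is found while the factory class is not loadable, the printed URL points at the jar to remove.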


Yes... thanks... I had actually included all the libraries shipped with docx4j-3.2.1, including the optional ones... that was the problem... - pinkpanther

0

This works for me; give it a try:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.Docx4jProperties;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class Test {
    public static void main(String[] args) throws Docx4JException,
            FileNotFoundException {
        String inputfilepath = "c:/file.docx";


        WordprocessingMLPackage wordMLPackage = Docx4J
                .load(new FileInputStream(inputfilepath));

        // HTML exporter setup (required)
        //.. the HTMLSettings object
        HTMLSettings htmlSettings = Docx4J.createHTMLSettings();

        htmlSettings.setImageDirPath(inputfilepath + "_files");
        htmlSettings.setImageTargetUri(inputfilepath.substring(inputfilepath
                .lastIndexOf("/") + 1) + "_files");
        htmlSettings.setWmlPackage(wordMLPackage);

        OutputStream os = new FileOutputStream(inputfilepath + ".html");

        // If you want XHTML output
        Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);

        //Prefer the exporter, that uses a xsl transformation
        Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);

    }

}

I don't see any difference... can you tell me how you configured your project? - pinkpanther
I just added docx4j-3.2.1 and all of its dependency jars. - Joe Doe

0

Make sure you have all the correct dependencies (including an appropriate JAXB runtime):

implementation 'org.docx4j:docx4j-core:11.4.7'
implementation 'org.docx4j:docx4j-MOXy-JAXBContext:6.0.0'
implementation 'org.docx4j:docx4j-export-fo:11.4.7'
implementation 'org.docx4j:docx4j-JAXB-Internal:8.3.8'
implementation 'org.docx4j:docx4j-JAXB-ReferenceImpl:11.4.7'
implementation 'org.docx4j:docx4j-JAXB-MOXy:11.4.7'
implementation 'jakarta.xml.bind:jakarta.xml.bind-api:4.0.0'
implementation 'org.glassfish.jaxb:jaxb-runtime:4.0.0'
implementation 'jakarta.activation:jakarta.activation-api:2.1.0'
