How do I turn off validation when parsing well-formed XML with DocumentBuilder.parse?

8

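I'm using Java 6. I want to parse XHTML documents that are known to be well-formed, so I don't want any validation against the DTD or other schemas the document references. However, I can't figure out how to turn validation off. I have the following code: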

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(false);
    final DocumentBuilder b = factory.newDocumentBuilder();
    final InputSource s = new InputSource(new StringReader(str));
    org.w3c.dom.Document result = b.parse(s);

But the last line still throws an exception:

java.net.SocketException: Unexpected end of file from server
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:777)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:774)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:677)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1282)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:283)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1194)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1090)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1003)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
    at com.myco.myproj.util.XmlUtilities.getStringAsDocument(XmlUtilities.java:130)
    at com.myco.myproj.util.NetUtilities.getUrlAsDocument(NetUtilities.java:30)
    at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.parsePage(AbstractChicagoReaderParser.java:144)
    at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.getEvents(AbstractChicagoReaderParser.java:112)
    at com.myco.myproj.parsers.impl.ChicagoReaderParserTest.testParser(ChicagoReaderParserTest.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

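I don't want my parser to connect to the internet. How can I disable this? Thanks. - Dave

Edit: Following Traroth's suggestion, I tried the code below, but I still get the same exception.
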
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(false);
    final DocumentBuilder builder = factory.newDocumentBuilder();
    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return null;
        }
    });
    final InputSource s = new InputSource(new StringReader(str));
    org.w3c.dom.Document result = builder.parse(s);

What happens if you print or log the DocumentBuilderFactory's isValidating()? - Alexis Dufrenoy

2 Answers

5

Here is how to create a DocumentBuilder that ignores all externally referenced entities, including DTDs:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    final DocumentBuilder builder = factory.newDocumentBuilder();
    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // It might be a good idea to add trace logging here to record
            // which publicId/systemId is being ignored.
            return new InputSource(new StringReader("")); // Returns a valid dummy source
        }
    });
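
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The class name `OfflineXhtmlParser`, the `parse` helper, and the sample document in `main` are made up for illustration; only the `EntityResolver` trick comes from the answer above. One caveat I would expect: because the DTD is replaced with an empty source, named entities declared only in the XHTML DTD (such as `&nbsp;`) are no longer declared, so documents that use them may fail to parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
public class OfflineXhtmlParser {

    /** Parses the given XML string, resolving every external entity to an empty source. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Returning a non-null, empty source stops the parser from
                // opening the URL named by systemId.
                return new InputSource(new StringReader(""));
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Illustrative input: an XHTML document whose DOCTYPE names a remote DTD.
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>t</title></head><body><p>hello</p></body></html>";
        Document doc = parse(xhtml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The resolver deliberately ignores `publicId` and `systemId`, so it applies to every external reference, not only the DTD.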

3

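I assume you're talking about the accepted answer in that post. I tried the suggestion (my code is included in the edit), but I still get the same exception. Am I missing something? - Dave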
1
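@Dave - Yes. Returning null is the same as not applying an entity resolver at all. You need to follow the other path, the one that returns an InputSource. Just change "foo.dtd" to the system ID of your XML file's DTD. - Alohci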
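Yes, if you look at the sample code in the answer, you will see that the resolveEntity() method returns an actual, non-null InputSource if the XML references the DTD, and null otherwise. - Alexis Dufrenoy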
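I understand what you're saying with the example. But my problem is that I don't want any validation or DTD parsing at all; I just want to get a bunch of elements. Can I make the parser ignore any DTD reference? - Dave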
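
On the "ignore any DTD reference" question, one option that may be worth trying is the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`. This is a sketch under the assumption that the underlying parser is Xerces-based, as the `com.sun.org.apache.xerces.internal` frames in the stack trace suggest. The feature is specific to Xerces and not part of the JAXP standard, so other parsers may reject it with a `ParserConfigurationException`. The class name `NoExternalDtdParser` and the `parse` helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original thread.
public class NoExternalDtdParser {

    // Xerces-specific feature name; assumed to be recognized by the parser in use.
    private static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the given XML string without loading the external DTD. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // With validation off, this asks the parser not to fetch the external DTD.
        // Parsers that do not know the feature throw ParserConfigurationException.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Compared with an EntityResolver that returns an empty source, this keeps the configuration on the factory, at the cost of depending on one parser implementation.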
