org.xml.sax.SAXParseException: Content is not allowed in prolog

Question

207

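I have a Java web service client that connects to a Java-based web service (implemented on the Axis1 framework).

I am getting the following exception in my log file: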

Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
    at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
    at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
    at org.apache.ws.axis.security.WSDoAllReceiver.invoke(WSDoAllReceiver.java:114)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.client.AxisClient.invoke(AxisClient.java:198)
    at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
    at org.apache.axis.client.Call.invoke(Call.java:2767)
    at org.apache.axis.client.Call.invoke(Call.java:2443)
    at org.apache.axis.client.Call.invoke(Call.java:2366)
    at org.apache.axis.client.Call.invoke(Call.java:1812)

- ag112

12

Please show us the XML file you are trying to parse. Just the first few lines will do, thanks. - Stephen C

1

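Thanks Stephen, I am trying to retrieve the XML request from the AXIS framework so I can paste it here. So the general understanding of the above error is that the XML is not well-formed. - ag112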

1

I ran into this because I was trying to transform the string name of the XML file instead of the XML file as a string! :P - Gaʀʀʏ

1

Notepad++ and changing the encoding worked fine for me! - Guilherme

32 Answers

40

Actually, in addition to Yuriy Zubarev's post:

This also happens when you pass a nonexistent XML file to the parser. For example, you pass

new File("C:/temp/abc")

when only C:/temp/abc.xml exists on your file system.

In either case,

builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
document = builder.parse(new File("C:/temp/abc"));

or

DOMParser parser = new DOMParser();
parser.parse("file:C:/temp/abc");

all of this code produces the same error message.

This is a very disappointing bug, because the following trace:

javax.servlet.ServletException
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
...
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
... 40 more

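says nothing about "the file name is incorrect" or "no such file exists". In my case the XML file was perfectly correct, and it still took me two days to pin down the real problem.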
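One way to avoid chasing a misleading parser message is to validate the path before the parser ever sees it. The following is a minimal sketch, not part of the original answer: the class name `CheckedXmlLoader` and the method `parseExisting` are hypothetical, and it assumes the input is a local file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: fail with a clear message before the XML parser runs.
class CheckedXmlLoader {

    /** Parses the file only after confirming it is an existing, readable regular file. */
    static Document parseExisting(File file)
            throws IOException, SAXException, ParserConfigurationException {
        if (!file.isFile()) {
            // Covers both a missing path and a directory passed by mistake.
            throw new FileNotFoundException("Not a regular file: " + file.getAbsolutePath());
        }
        if (!file.canRead()) {
            throw new IOException("File is not readable: " + file.getAbsolutePath());
        }
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseExisting(new File(args[0]));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

With a check like this, a wrong path such as C:/temp/abc should surface as a FileNotFoundException naming the path rather than as a prolog error.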

- Egor

1

The same thing happens if you try to parse a directory instead of a file name, FWIW. - rogerdpack

1

Totally agree @Gewure :) That was an ancient post from 2012 that I had even forgotten about, but it is true. - Egor

4

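This also happens when your file path contains special characters, e.g. C:#MyFolder\My.XML. The file exists, but the "#" character causes problems for the XML parser... Java and Microsoft Windows themselves have no problem with that folder name... The exception message handling is very poor... - Alex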

1

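I ran into a similar problem. I spent hours trying to understand what was wrong, and it never occurred to me that the parameter might be malformed. - Balázs Börcsök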

1

I had to build the project. Apparently my file location was not picking up the file from src, but was looking in the target folder. - Viktor Reinok

28

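Try adding a space between the prolog at the start of the document and its terminating ?> string. In XML, the prolog is the element at the start of the document that is delimited by angle brackets and question marks (whereas the prolog tag on Stack Overflow refers to the programming language).

Added: Is that dash in front of your prolog part of the document? That would be the error, since it is data in front of the prolog: -<?xml version="1.0" encoding="UTF-8"?>.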

- hardmath

2

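I have found that some XML parsers throw this exception even when the XML prolog merely contains whitespace, so I think it is worth checking whether anything comes before <?xml ver... - user206428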

15

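I had the same problem (and solved it) while trying to parse an XML document with freemarker.

There was no whitespace before the XML header.

The problem occurs only when the file encoding and the XML encoding attribute differ (for example, a UTF-8 file with a UTF-16 attribute in the header).

So I had two ways to solve the problem:

change the encoding of the file itself
change UTF-16 in the header to UTF-8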
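When neither the file nor its header can be changed, a third option may be to decode the bytes yourself. The SAX `InputSource` documentation states that when a character stream is supplied, the parser reads it directly and disregards the encoding declaration. The sketch below assumes you already know the real encoding; `KnownCharsetParse` and `parseAs` are hypothetical names, not anything from the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parse bytes using a charset chosen by the caller.
class KnownCharsetParse {

    static Document parseAs(byte[] xmlBytes, Charset actual) throws Exception {
        // Decode the bytes ourselves. Given a character stream, the parser
        // does not re-decode the input based on the encoding="..." declaration.
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), actual);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // The header claims UTF-16, but the bytes are really UTF-8.
        byte[] bytes = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        Document doc = parseAs(bytes, StandardCharsets.UTF_8);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

This only works around the mismatch on the reading side; fixing the file or the header, as listed above, remains the cleaner repair.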

- user2575850

2

I guess that, in general, a parser that receives conflicting information about the character encoding can run into this problem. - Raedwald

1

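This answer is a while old, but it worked for me in 2021. I run Pester tests in a Jenkins pipeline and kept getting the "content in prolog" error. I found that the JUnit result file was UTF-16, while out of habit I was using UTF8 with Out-File. When I changed to UTF-16, it worked.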

Invoke-Pester -Script resources/*.Tests.ps1 -PassThru | ConvertTo-JUnitReport -AsString | Out-File -Encoding utf-16 .\results.xml

- Max Cascone

13

It means that the XML is malformed or that the response body is not an XML document at all.

- Yuriy Zubarev

I checked, and the XML looks well-formed. Here is a snapshot: -<?xml version="1.0" encoding="UTF-8"?> <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <soapenv:Header> <wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" soapenv:mustUnderstand="1">.... </wsse:Security> </soapenv:Header><soapenv:Body>.XX..</soapenv:Body></soapenv:Envelope> - ag112

2

Yes, if the dash is in front, it breaks the XML. - Yuriy Zubarev

Yes, I accidentally added a letter, which made the XML invalid and caused the error. Thanks! - Edwin Krause

9

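I just spent 4 hours tracking down a similar problem in a WSDL. It turned out the WSDL used an XSD that imports an XSD from another namespace. The imported XSD contained the following: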

<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://www.xyz.com/Services/CommonTypes" elementFormDefault="qualified"
    xmlns="http://www.w3.org/2001/XMLSchema" 
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:CommonTypes="http://www.xyz.com/Services/CommonTypes">

 <include schemaLocation=""></include>  
    <complexType name="RequestType">
        <....

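Note the empty include element! That was the root of my woes. I guess this is a variation on Egor's file-not-found problem above.

+1 to disappointing error reporting.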

- colin_froggatt

6

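My answer may not help you directly, but it can help with this problem in general.

When you see this kind of exception, try opening your XML file in any hex editor. Sometimes you will see extra bytes at the beginning of the file that a text editor does not show.

Delete them, and your XML will parse.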
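If no hex editor is at hand, the same inspection can be done from Java. This is a small sketch under the assumption that the XML lives in a local file; `LeadingBytes` and `hexPrefix` are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: show the first bytes of a file the way a hex editor would.
class LeadingBytes {

    /** Returns up to count leading bytes of the file as space-separated hex pairs. */
    static String hexPrefix(Path file, int count) throws IOException {
        byte[] buf = new byte[count];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() may return fewer bytes than requested, so loop until done.
            while (total < count && (n = in.read(buf, total, count - total)) != -1) {
                total += n;
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < total; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", buf[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hexPrefix(Paths.get(args[0]), 16));
    }
}
```

A clean UTF-8 file that starts with an XML declaration begins with 3C 3F 78 6D 6C (the characters `<?xml`). A leading EF BB BF is a UTF-8 BOM, and anything else in front of 3C is the stray content the parser is complaining about.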

- Igor Kustov

5

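In my case, removing the 'encoding="UTF-8"' attribute altogether fixed the problem.

It looks like a character set encoding issue, possibly because your file is not really UTF-8 encoded.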

- Jerome Louvel

4

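To fix the BOM issue on Unix / Linux systems:

1. Check whether there is an unwanted BOM character: hexdump -C myfile.xml | more. If there is one, it shows up at the start of the file as ...<?xml>
2. Alternatively, run file myfile.xml. A file with a BOM character is reported as: myfile.xml: XML 1.0 document text, UTF-8 Unicode (with BOM) text
3. Fix a single file with: tail -c +4 myfile.xml > temp.xml && mv temp.xml myfile.xml
4. Repeat step 1 or 2 to check that the file has been sanitised. You should also run view myfile.xml to check that the contents are unchanged.

Here is a Bash script that sanitises a whole folder of XML files: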

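#!/usr/bin/env bash

# Sanitise the XML files in the current directory by removing a leading UTF-8 BOM.

# Succeeds if the first three bytes of the file are EF BB BF (the UTF-8 BOM).
# $'...' makes bash expand the \x escapes; grep itself does not interpret them.
has_bom() { head -c3 "$1" | LC_ALL=C grep -q $'\xef\xbb\xbf'; }

for filename in *.xml ; do
  if has_bom "${filename}"; then
    # Copy everything from the fourth byte onward, then replace the original.
    tail -c +4 "${filename}" > temp.xml
    mv temp.xml "${filename}"
  fi
done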
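For a Java code base, the same clean-up can be sketched without shell tools. This is an illustrative equivalent of the script above for a single file, not part of the original answer; `Utf8Bom`, `hasBom` and `stripBom` are hypothetical names, and the sketch reads the whole file into memory, so it assumes the files are reasonably small.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: detect and remove a leading UTF-8 BOM (EF BB BF).
class Utf8Bom {

    static boolean hasBom(byte[] data) {
        return data.length >= 3
                && data[0] == (byte) 0xEF
                && data[1] == (byte) 0xBB
                && data[2] == (byte) 0xBF;
    }

    /** Returns the data without its leading BOM, or the same array if there is none. */
    static byte[] stripBom(byte[] data) {
        return hasBom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        byte[] original = Files.readAllBytes(file);
        if (hasBom(original)) {
            // Rewrite the file in place only when something actually changes.
            Files.write(file, stripBom(original));
        }
    }
}
```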

- Lydia Ralph

4

For the same problem, I removed the following lines:

  File file = new File("c:\\file.xml");
  InputStream inputStream= new FileInputStream(file);
  Reader reader = new InputStreamReader(inputStream,"UTF-8");
  InputSource is = new InputSource(reader);
  is.setEncoding("UTF-8");

It works fine. I am not sure why UTF-8 causes a problem. To my surprise, it also works fine with UTF-8.

I am using Windows 7 32-bit and the NetBeans IDE with Java *jdk1.6.0_13*. I do not know how it works.

- Dineshkumar

- Mike Sokolov · Accepted Answer

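This is often caused by whitespace before the XML declaration, but it could be any text, such as a dash or any other character. I say it is often whitespace because people assume whitespace is always ignorable, and that is not the case here.

Another frequent cause is a UTF-8 BOM (byte order mark). A BOM is allowed before the XML declaration, but if the document is handed to the XML parser as a stream of characters rather than a stream of bytes, the BOM can be treated as whitespace, that is, as content in front of the declaration.

The same thing can happen if schema files (.xsd) are used to validate the XML file and one of the schema files has a UTF-8 BOM.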
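When the XML arrives as a String (for example, a response body that has already been decoded), one defensive option is to drop a leading BOM character and leading whitespace before parsing. The following is only a sketch under that assumption; `PrologCleaner`, `stripLeadingJunk` and `parse` are hypothetical names. It deliberately does not remove other stray text such as a dash, because that usually points to a real data problem that should be fixed at the source.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: tolerate a BOM or whitespace in front of the XML declaration.
class PrologCleaner {

    /** Drops any leading U+FEFF characters and whitespace; leaves everything else alone. */
    static String stripLeadingJunk(String xml) {
        int i = 0;
        // U+FEFF is checked explicitly: Character.isWhitespace does not cover it.
        while (i < xml.length()
                && (xml.charAt(i) == '\uFEFF' || Character.isWhitespace(xml.charAt(i)))) {
            i++;
        }
        return xml.substring(i);
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripLeadingJunk(xml))));
    }

    public static void main(String[] args) throws Exception {
        String xml = "\uFEFF \n<?xml version=\"1.0\"?><root/>";
        System.out.println(parse(xml).getDocumentElement().getNodeName());
    }
}
```

Where you control how the document is read, handing the parser the raw byte stream (an InputStream or File) instead of a decoded String avoids the BOM case entirely, in line with the explanation above.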