What is causing these ParseError exceptions when reading off an AWS SQS queue in my Storm cluster?

24

I'm using Storm 0.8.1 to read incoming messages off an Amazon SQS queue, and I consistently get the following exception when doing so:

2013-12-02 02:21:38 executor [ERROR] 
java.lang.RuntimeException: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:219)
        at REDACTED.spouts.SqsQueueSpout.nextTuple(SqsQueueSpout.java:88)
        at backtype.storm.daemon.executor$fn__3976$fn__4017$fn__4018.invoke(executor.clj:447)
        at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:701)
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:524)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:298)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:812)
        at com.amazonaws.services.sqs.AmazonSQSClient.receiveMessage(AmazonSQSClient.java:575)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:191)
        ... 5 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:277)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:129)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLEventReader(XMLInputFactoryImpl.java:78)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:85)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:41)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:503)
        ... 10 more

I've debugged the data on the queue and everything looks fine. I can't figure out why the API's XML response would cause these problems. Any ideas?

1 Answer

49

Answering my own question here, for the record.

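There is currently a bug in the XML expansion limit handling of both Oracle's and OpenJDK's Java: when multiple XML documents are parsed, a shared counter accumulates across them and eventually hits the default upper bound.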

  1. https://blogs.oracle.com/joew/entry/jdk_7u45_aws_issue_123
  2. https://bugs.openjdk.java.net/browse/JDK-8028111
  3. https://github.com/aws/aws-sdk-java/issues/123

Although I thought our version (6b27-1.12.6-1ubuntu0.12.04.4) wasn't affected, running the sample code given in the OpenJDK bug report confirmed that we were susceptible to the bug.
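
For readers who want a quick check of their own JVM, here is a rough sketch of that kind of test. It is not the sample code from the bug report. The class name EntityCounterCheck, the tiny test document, and the choice to reuse a single XMLInputFactory are all assumptions made for illustration. The idea is that each document needs only one entity expansion, so a correct parser should never get near the limit. If parsing starts failing after tens of thousands of documents, the counter is probably being shared across documents.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical helper, not taken from the OpenJDK bug report.
class EntityCounterCheck {
    // Tiny document that declares one internal entity and expands it once.
    static final String DOC =
        "<?xml version=\"1.0\"?><!DOCTYPE r [<!ENTITY e \"x\">]><r>&e;</r>";

    /** Parses DOC up to 'docs' times with one shared factory; returns how many parses succeeded. */
    static int parseMany(int docs) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        int parsed = 0;
        try {
            for (int i = 0; i < docs; i++) {
                XMLEventReader reader = factory.createXMLEventReader(new StringReader(DOC));
                while (reader.hasNext()) {
                    reader.nextEvent(); // drain the document so the entity is actually expanded
                }
                reader.close();
                parsed++;
            }
        } catch (XMLStreamException e) {
            // An affected JVM is expected to end up here with the JAXP00010001 message.
            System.out.println("Failed after " + parsed + " documents: " + e.getMessage());
        }
        return parsed;
    }

    public static void main(String[] args) {
        int docs = 70000; // more documents than the default limit of 64000 expansions
        int parsed = parseMany(docs);
        System.out.println(parsed == docs
            ? "All " + docs + " documents parsed; no shared-counter symptom seen."
            : "Stopped early; this JVM may be affected.");
    }
}
```

A clean run does not prove the JVM is unaffected, since the exact trigger may differ from what this sketch assumes, but an early failure is a strong hint.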

To work around the issue, I needed to pass jdk.xml.entityExpansionLimit=0 to the Storm workers. Adding the following to storm.yaml across my cluster mitigated the problem.

supervisor.childopts: "-Djdk.xml.entityExpansionLimit=0"
worker.childopts: "-Djdk.xml.entityExpansionLimit=0"

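I should note that this technically opens you up to a Denial of Service (DoS) attack, but since our XML documents come only from SQS, I'm not worried about someone forging malicious XML to kill our workers.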


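There may be another issue as well. I get the same error with Java 6, and I don't have Java 7 installed on my machine. - BrianC
By the way, this is an excellent write-up. - BrianC
Never mind. I found that this also affects specific versions of Java 5, 6, 7, and 8. See the details here: https://bugs.openjdk.java.net/browse/JDK-8028111 - BrianC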
2
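Hi everyone, I solved this by adding one line of code to my program: 'System.setProperty("jdk.xml.entityExpansionLimit", "0");'. Strangely, the problem persisted when the property was passed to the program as a command-line argument. - Kros
According to the Java documentation, setting the "jdk.xml.entityExpansionLimit" property to "0" means no limit. You should use a negative value to disable entity expansion. https://docs.oracle.com/javase/tutorial/jaxp/limits/limits.html - wallE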
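
The programmatic approach from Kros's comment could look roughly like the sketch below. The class name EntityLimitSetup and the helper ensureLimit are hypothetical. The sketch also assumes that the property has to be set before the first XML parser or factory is created, for example early in main or in a static initializer of the spout class. It deliberately leaves the value alone if a -D flag already supplied one, and it prints the effective value, which may help show whether the childopts setting actually reached the worker JVM.

```java
// Hypothetical helper; the property name comes from the answer above.
class EntityLimitSetup {
    static final String LIMIT_PROPERTY = "jdk.xml.entityExpansionLimit";

    /** Sets the limit only if no value is present yet, then returns the effective value. */
    static String ensureLimit(String value) {
        if (System.getProperty(LIMIT_PROPERTY) == null) {
            System.setProperty(LIMIT_PROPERTY, value);
        }
        return System.getProperty(LIMIT_PROPERTY);
    }

    public static void main(String[] args) {
        // Call this before any XML parsing happens (assumption: the JDK reads the
        // property when a parser or factory is created).
        String effective = ensureLimit("0");
        System.out.println(LIMIT_PROPERTY + " = " + effective);
    }
}
```

The same Denial of Service caveat from the answer applies here: a value of 0 removes the limit for every XML document the JVM parses, not just SQS responses.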
