Parsing large XML files with lxml.etree in Python


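I want to parse a large XML file (over 200 MB) with lxml.etree in Python. I tried loading it with etree.parse, but that doesn't work, apparently because of the file size: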

>>> etree.parse('file.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)
lxml.etree.XMLSyntaxError: Excessive depth in document: 256 use XML_PARSE_HUGE option, line 1276, column 7

Because I want to use XPath expressions, I have to parse the whole file first. So how can I parse this XML file? How do I enable XML_PARSE_HUGE when using lxml.etree?

Thanks!



Try creating a custom XMLParser instance:

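from lxml.etree import XMLParser, parse

# huge_tree=True turns on libxml2's XML_PARSE_HUGE option, lifting the
# default limits on tree depth and text size
p = XMLParser(huge_tree=True)
tree = parse('file.xml', parser=p)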
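Since the question is ultimately about running XPath on the parsed tree, here is a minimal, self-contained sketch that reproduces the "excessive depth" situation and then queries the result. It assumes nothing beyond lxml itself: the helper names `write_deep_xml` and `parse_huge`, the element names and the nesting depth of 300 are all hypothetical, chosen only to exceed the 256-level limit quoted in the error message.

```python
import os
import tempfile

from lxml import etree


def write_deep_xml(path, depth):
    """Write a hypothetical test file whose <n> elements nest `depth` levels deep."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("<root>")
        f.write("<n>" * depth)
        f.write("<leaf id='x'/>")
        f.write("</n>" * depth)
        f.write("</root>")


def parse_huge(path):
    """Parse `path` with huge_tree=True, lxml's switch for XML_PARSE_HUGE."""
    parser = etree.XMLParser(huge_tree=True)
    return etree.parse(path, parser=parser)


if __name__ == "__main__":
    depth = 300  # hypothetical value, deeper than the 256 levels in the error message
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "deep.xml")
        write_deep_xml(path, depth)
        tree = parse_huge(path)
        # XPath works on the resulting tree exactly as on any other tree
        print(len(tree.xpath("//leaf[@id='x']")), len(tree.xpath("//n")))
```

Note that XML_PARSE_HUGE relaxes libxml2's limits rather than removing every one of them, so extremely deep documents may still be rejected depending on the libxml2 version lxml was built against.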

If you run into the error "python XMLSyntaxError: internal error: Huge input lookup", this solution works as well! - ospider
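One way very large input can trip libxml2's limits is a single text node longer than its default maximum (XML_MAX_TEXT_LENGTH, 10,000,000 bytes). The sketch below assumes that case; the helper name `parse_bytes_huge` and the 12,000,000-byte size are hypothetical. It shows only that the same `huge_tree=True` parser accepts such a node.

```python
from lxml import etree


def parse_bytes_huge(data: bytes):
    """Parse an in-memory document with huge_tree=True (XML_PARSE_HUGE)."""
    # huge_tree=True raises libxml2's default size limits, including the
    # 10,000,000-byte cap on a single text node
    parser = etree.XMLParser(huge_tree=True)
    return etree.fromstring(data, parser=parser)


if __name__ == "__main__":
    size = 12_000_000  # hypothetical size, just above the default text-node limit
    data = b"<doc>" + b"a" * size + b"</doc>"
    root = parse_bytes_huge(data)
    print(len(root.text))
```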

Content provided by Stack Overflow.