How to read data from an XML file in Python

Question


3

I have the following XML file:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
    <BodyWeight>331.40</BodyWeight>
    <UnitID>1</UnitID>
    <Plant>239</Plant>
    <pieces>
      <TExportCarcassPiece index="0">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
      <TExportCarcassPiece index="1">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
    <BodyWeight>334.40</BodyWeight>
    <UnitID>1</UnitID>
    <Plant>278</Plant>
    <pieces>
      <TExportCarcassPiece index="0">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
      <TExportCarcassPiece index="1">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
</rootnode>

I am using Python's lxml module to read data from this XML file:

from lxml import etree

doc = etree.parse('file.xml')

memoryElem = doc.find('BodyNum')
print(memoryElem)

But it only prints None instead of 6168. What am I doing wrong here?

- S Andrew

5 Answers

2

1 - Use / to specify the level of the tree at which the element you want to extract sits.

2 - Use .text to extract the element's text content.

from lxml import etree

doc = etree.parse('file.xml')
memoryElem = doc.find("*/BodyNum")  # '*' matches any child of <rootnode>; BodyNum sits one level below it
print(memoryElem.text)  # .text returns the element's text content: 6168
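The two steps above can be sketched with the standard library's xml.etree.ElementTree, whose find() accepts the same simple path syntax as lxml's. This is a minimal sketch that embeds a trimmed copy of the question's XML instead of reading file.xml; the variable names are illustrative only. It also shows findtext(), which avoids the AttributeError you would get from calling .text on a None result.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, embedded so the sketch runs on its own.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
    <BodyWeight>331.40</BodyWeight>
    <pieces>
      <TExportCarcassPiece index="0"><Bruising>0</Bruising></TExportCarcassPiece>
      <TExportCarcassPiece index="1"><Bruising>0</Bruising></TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
    <BodyWeight>334.40</BodyWeight>
    <pieces>
      <TExportCarcassPiece index="0"><Bruising>0</Bruising></TExportCarcassPiece>
      <TExportCarcassPiece index="1"><Bruising>0</Bruising></TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
</rootnode>
"""

# ET.parse accepts a file object; with a real file you would pass 'file.xml'.
doc = ET.parse(io.BytesIO(SAMPLE))

# '*' matches any child of <rootnode>, so '*/BodyNum' looks exactly one level below it.
first = doc.find("*/BodyNum").text
# Naming the intermediate tag instead of '*' selects the same element.
same = doc.find("TExportCarcass/BodyNum").text
# find() returns None when nothing matches, so .text on it would raise
# AttributeError; findtext() returns the given default instead.
fallback = doc.findtext("BodyNum", default="not found")

print(first, same, fallback)  # 6168 6168 not found
```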

- O Yahya

2

You need to iterate over each TExportCarcass element and then use find to access its BodyNum. Example:

from lxml import etree

doc = etree.parse('file.xml')
for elem in doc.findall('TExportCarcass'):
    print(elem.find("BodyNum").text)

Output:

6168
6169

or

print([i.text for i in doc.findall('TExportCarcass/BodyNum')]) #-->['6168', '6169']
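Extending the loop above, each TExportCarcass can be turned into a dictionary, with paths resolved relative to the current carcass. A minimal sketch using the standard library's ElementTree (lxml's find/findall accept the same paths); the dictionary keys, the float conversion of BodyWeight, and the collection of the index attributes are illustrative assumptions, not part of the original answer.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (assumed to match file.xml's structure).
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
    <BodyWeight>331.40</BodyWeight>
    <pieces>
      <TExportCarcassPiece index="0"><Bruising>0</Bruising></TExportCarcassPiece>
      <TExportCarcassPiece index="1"><Bruising>0</Bruising></TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
    <BodyWeight>334.40</BodyWeight>
    <pieces>
      <TExportCarcassPiece index="0"><Bruising>0</Bruising></TExportCarcassPiece>
      <TExportCarcassPiece index="1"><Bruising>0</Bruising></TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
</rootnode>
"""

doc = ET.parse(io.BytesIO(SAMPLE))

records = []
for carcass in doc.findall("TExportCarcass"):
    # Every path below is relative to the current <TExportCarcass> element.
    pieces = carcass.findall("pieces/TExportCarcassPiece")
    records.append({
        "BodyNum": carcass.findtext("BodyNum"),
        "BodyWeight": float(carcass.findtext("BodyWeight")),
        # Attribute values are read with .get(), not .text.
        "piece_indexes": [p.get("index") for p in pieces],
    })

for r in records:
    print(r)
# {'BodyNum': '6168', 'BodyWeight': 331.4, 'piece_indexes': ['0', '1']}
# {'BodyNum': '6169', 'BodyWeight': 334.4, 'piece_indexes': ['0', '1']}
```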

- Rakesh

0

Just use Python's built-in xml.etree.ElementTree module.

https://docs.python.org/3/library/xml.etree.elementtree.html
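For comparison, the question's task with the built-in module might look like this. It is a minimal sketch: the temporary file only stands in for file.xml so the example is self-contained, and Element.iter(), which walks all descendants, is used here as one alternative to a path query.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, written to a temporary stand-in for file.xml.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
    <BodyWeight>331.40</BodyWeight>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
    <BodyWeight>334.40</BodyWeight>
  </TExportCarcass>
</rootnode>
"""

with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as f:
    f.write(SAMPLE)
    path = f.name

try:
    doc = ET.parse(path)     # reads the file and honours its iso-8859-1 declaration
    root = doc.getroot()     # the <rootnode> element
    # iter() walks the root and all of its descendants, at any depth.
    body_nums = [e.text for e in root.iter("BodyNum")]
finally:
    os.remove(path)

print(body_nums)  # ['6168', '6169']
```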

- Faizan Naseer

0

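Your document contains multiple BodyNum elements.
If you only need the first one, you have to put an explicit limit in the query.

Use the following flexible approach based on an XPath query: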

from lxml import etree

doc = etree.parse('file.xml')
memoryElem = doc.xpath('(//BodyNum)[1]/text()')
print(memoryElem)   # ['6168']
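The parentheses in that query matter. In XPath, //BodyNum[1] means "every BodyNum that is the first BodyNum child of its parent", so it would still return both values here; (//BodyNum)[1] restricts the result to the first match in document order. The standard library's ElementTree does not support the parenthesized form, but its positional predicates show the per-parent behaviour. A minimal sketch, assuming the question's XML structure:

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
  </TExportCarcass>
</rootnode>
"""

doc = ET.parse(io.BytesIO(SAMPLE))

# [1] is evaluated per parent: each carcass has exactly one BodyNum,
# so every BodyNum is "the first" and both are returned.
per_parent = [e.text for e in doc.iterfind(".//BodyNum[1]")]
# Putting the position on TExportCarcass limits the result to one carcass.
first_only = [e.text for e in doc.iterfind("TExportCarcass[1]/BodyNum")]
last_only = [e.text for e in doc.iterfind("TExportCarcass[last()]/BodyNum")]

print(per_parent)  # ['6168', '6169']
print(first_only)  # ['6168']
print(last_only)   # ['6169']
```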

- RomanPerekhrest

Is it possible to get the number of TExportCarcass elements? - S Andrew

Sure, thanks. I think we can use the comments section to ask for additional information. - S Andrew

@SAndrew, are you sure this approach deserved a mindless downvote? - RomanPerekhrest

1

This is also a valid answer. Not sure why it was downvoted. - moebius
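On the comment asking for the number of TExportCarcass elements: with lxml, doc.xpath('count(//TExportCarcass)') evaluates the XPath count() function and returns a float (2.0 for the sample). A portable alternative is to count the matches of a path. A minimal sketch with the standard library, assuming the question's structure:

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
    <pieces>
      <TExportCarcassPiece index="0"><Bruising>0</Bruising></TExportCarcassPiece>
      <TExportCarcassPiece index="1"><Bruising>0</Bruising></TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
    <pieces>
      <TExportCarcassPiece index="0"><Bruising>0</Bruising></TExportCarcassPiece>
      <TExportCarcassPiece index="1"><Bruising>0</Bruising></TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
</rootnode>
"""

doc = ET.parse(io.BytesIO(SAMPLE))

# Direct children of <rootnode> named TExportCarcass.
carcass_count = len(doc.findall("TExportCarcass"))
# './/' matches pieces at any depth; counting the iterator avoids building a list.
piece_count = sum(1 for _ in doc.iterfind(".//TExportCarcassPiece"))

print(carcass_count, piece_count)  # 2 4
```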

Page content provided by Stack Overflow.

- moebius · Accepted Answer

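When you call find on the parsed document, it only searches the direct children of the root element. You can use an XPath-like path such as .//BodyNum inside find to search for elements anywhere in the document: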

To get only the first element:

from lxml import etree
doc = etree.parse('file.xml')

memoryElem = doc.find('.//BodyNum')
print(memoryElem.text)  # 6168

To get all elements:


print([b.text for b in doc.iterfind('.//BodyNum')])  # ['6168', '6169']
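To see why the question's doc.find('BodyNum') printed None, it helps to list what the root's direct children actually are. A minimal diagnostic sketch using the standard library's ElementTree, which follows the same find() rules as lxml for these paths; the embedded XML is a trimmed copy of the question's file.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
  </TExportCarcass>
</rootnode>
"""

doc = ET.parse(io.BytesIO(SAMPLE))
root = doc.getroot()

# A bare tag name in doc.find() only matches direct children of the root.
child_tags = [child.tag for child in root]
direct = doc.find("BodyNum")        # None: BodyNum is a grandchild of <rootnode>
anywhere = doc.find(".//BodyNum")   # './/' searches all descendants, first match wins

print(child_tags)     # ['TExportCarcass', 'TExportCarcass']
print(direct)         # None
print(anywhere.text)  # 6168
```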