Get all parent nodes of an XML node using Python

6

Given this XML:

<Departments orgID="123" name="xmllist">
    <Department>
        <orgID>124</orgID>
        <name>A</name>
        <type>type a</type>
        <status>Active</status>
            <Department>
                <orgID>125</orgID>
                <name>B</name>
                <type>type b</type>
                <status>Active</status>
                <Department>
                    <orgID>126</orgID>
                    <name>C</name>
                    <type>type c</type>
                    <status>Active</status>
                </Department>
            </Department>
    </Department>
    <Department>
        <orgID>109449</orgID>
        <name>D</name>
        <type>type d</type>
        <status>Active</status>
    </Department>
</Departments>

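How can I get all the parent nodes of a node using Python's lxml etree?

Expected output: given orgID=126, return that node and all of its parent nodes, for example: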

{'A':124,'B':125,'C':126}
2 Answers

7

Using lxml and XPath:

>>> s = '''
... <Departments orgID="123" name="xmllist">
...     <Department>
...         <orgID>124</orgID>
...         <name>A</name>
...         <type>type a</type>
...         <status>Active</status>
...             <Department>
...                 <orgID>125</orgID>
...                 <name>B</name>
...                 <type>type b</type>
...                 <status>Active</status>
...                 <Department>
...                     <orgID>126</orgID>
...                     <name>C</name>
...                     <type>type c</type>
...                     <status>Active</status>
...                 </Department>
...             </Department>
...     </Department>
...     <Department>
...         <orgID>109449</orgID>
...         <name>D</name>
...         <type>type d</type>
...         <status>Active</status>
...     </Department>
... </Departments>
... '''

With the ancestor-or-self axis, you can find the node itself, its parent, its grandparent, and so on:
>>> import lxml.etree as ET
>>> root = ET.fromstring(s)
>>> for target in root.xpath('.//Department/orgID[text()="126"]'):
...     d = {
...         dept.find('name').text: int(dept.find('orgID').text)
...         for dept in target.xpath('ancestor-or-self::Department')
...     }
...     print(d)
...
{'A': 124, 'C': 126, 'B': 125}
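The lookup above can also be wrapped in a reusable function. The following is a minimal sketch, not part of the original answer: the function name `departments_for` and the shortened `SAMPLE` document are hypothetical, and it passes the ID through an lxml XPath variable (`$oid`) rather than embedding it in the expression string.

```python
import lxml.etree as ET

# Shortened copy of the question's document (type/status elements omitted).
SAMPLE = """
<Departments orgID="123" name="xmllist">
    <Department>
        <orgID>124</orgID>
        <name>A</name>
        <Department>
            <orgID>125</orgID>
            <name>B</name>
            <Department>
                <orgID>126</orgID>
                <name>C</name>
            </Department>
        </Department>
    </Department>
    <Department>
        <orgID>109449</orgID>
        <name>D</name>
    </Department>
</Departments>
"""


def departments_for(root, org_id):
    """Map name -> orgID for the matching Department and its Department ancestors."""
    result = {}
    # $oid is an XPath variable; lxml binds it from the keyword argument.
    for target in root.xpath('.//Department/orgID[text()=$oid]', oid=str(org_id)):
        # target is the <orgID> element, so the axis yields its enclosing Departments.
        for dept in target.xpath('ancestor-or-self::Department'):
            result[dept.findtext('name')] = int(dept.findtext('orgID'))
    return result


root = ET.fromstring(SAMPLE)
print(departments_for(root, 126))
print(departments_for(root, 109449))
```

An ID that does not occur in the document simply produces an empty dict, since the outer loop never runs.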

Thanks. What if I want d to also include orgID=123 and name=xmllist? - Nishant Nawarkhede
1
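@Nishant, add for depts in target.xpath('ancestor-or-self::Departments'): d[depts.get('name')] = depts.get('orgID') to the code, before the print statement. - falsetru
Thanks, but the output seems to be unordered. Is there any way to make it ordered? Here we get {'A': 124, 'C': 126, 'B': 125}; can we get something like {'A': 124, 'B': 125, 'C': 126} instead? - Nishant Nawarkhede
@Nishant, dict itself is an unordered data structure. If you want to preserve order, use collections.OrderedDict. Or, if you don't need a dict-like container, you can use a list... - falsetru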

5
Use lxml's iterancestors() method.
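from lxml import etree

# xml is assumed to hold the XML document from the question as a string.
doc = etree.fromstring(xml)
rval = {}
for org in doc.xpath('//orgID[text()="126"]'):
    # iterancestors() walks upward, starting from the nearest ancestor.
    for ancestor in org.iterancestors('Department'):
        org_id = ancestor.find('./orgID').text
        name = ancestor.find('./name').text
        rval[name] = org_id

print(rval)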

Output:

{'A': '124', 'C': '126', 'B': '125'}

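If you really want to preserve the order of the elements, you can't rely on a plain dict, because you have no control over the order of its keys (this holds for Python versions before 3.7; from 3.7 on, dict preserves insertion order). You'll need an OrderedDict or simply a list of tuples: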

# Reuses etree and xml from the previous snippet.
doc = etree.fromstring(xml)
a = []
for org in doc.xpath('//orgID[text()="126"]'):
    for ancestor in org.iterancestors():
        if ancestor.find('./orgID') is not None:
            # A <Department>: ID and name are child elements.
            org_id = ancestor.find('./orgID').text
            name = ancestor.find('./name').text
        elif ancestor.get('orgID'):
            # The root <Departments>: ID and name are attributes.
            org_id = ancestor.get('orgID')
            name = ancestor.get('name')
        else:
            continue

        print("{} {}".format(org_id, name))
        a.append((name, org_id))

print("In order of discovery:\n     {}".format(a))
print("From root to child\n     {}".format(list(reversed(a))))
print("dict keys are not sorted\n     {}".format(dict(a)))

Output:

126 C
125 B
124 A
123 xmllist
In order of discovery:
     [('C', '126'), ('B', '125'), ('A', '124'), ('xmllist', '123')]
From root to child
     [('xmllist', '123'), ('A', '124'), ('B', '125'), ('C', '126')]
dict keys are not sorted
     {'A': '124', 'xmllist': '123', 'C': '126', 'B': '125'}
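The OrderedDict option mentioned above could look roughly like this. It is a sketch under a few assumptions: the function name `ordered_path` and the shortened `SAMPLE` document are hypothetical, and each orgID is assumed to occur only once in the document. Since iterancestors() yields the nearest ancestor first, the collected pairs are reversed to get root-to-child order.

```python
from collections import OrderedDict

from lxml import etree

# Shortened copy of the question's document (type/status elements omitted).
SAMPLE = """
<Departments orgID="123" name="xmllist">
    <Department>
        <orgID>124</orgID>
        <name>A</name>
        <Department>
            <orgID>125</orgID>
            <name>B</name>
            <Department>
                <orgID>126</orgID>
                <name>C</name>
            </Department>
        </Department>
    </Department>
</Departments>
"""


def ordered_path(doc, org_id):
    """Return name -> orgID from the root element down to the matching Department."""
    pairs = []
    for org in doc.xpath('//orgID[text()=$oid]', oid=str(org_id)):
        for ancestor in org.iterancestors():
            if ancestor.find('orgID') is not None:
                # A <Department>: ID and name are child elements.
                pairs.append((ancestor.findtext('name'), ancestor.findtext('orgID')))
            elif ancestor.get('orgID'):
                # The root <Departments>: ID and name are attributes.
                pairs.append((ancestor.get('name'), ancestor.get('orgID')))
    # pairs were collected nearest-ancestor first; reverse for root-to-child order.
    return OrderedDict(reversed(pairs))


doc = etree.fromstring(SAMPLE)
path = ordered_path(doc, 126)
print(list(path.items()))
# [('xmllist', '123'), ('A', '124'), ('B', '125'), ('C', '126')]
```

The values stay strings here, as in the answer above; wrap them in int() if numeric IDs are needed.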

Page content provided by Stack Overflow.