Python XML absolute path

3

How can I use Python to print/dump the "absolute path" and value of each element in an XML document?

For example:

<A>
  <B>foo</B>
  <C>
    <D>On</D>
  </C>
  <E>Auto</E>
  <F>
    <G>
      <H>shoo</H>
      <I>Off</I>
    </G>
  </F>
</A>

to

/A/B, foo
/A/C/D, On
/A/E, Auto
/A/F/G/H, shoo
/A/F/G/I, Off

1
Do you want to iterate over all text nodes and print their ancestors and values? - Chris Morgan
Yes, that is probably a better way of putting it :) - kristus
4 Answers

2
from lxml import etree
root = etree.XML(your_xml_string)

def print_path_of_elems(elem, elem_path=""):
    for child in elem:
        if len(child) == 0 and child.text:
            # leaf node with text => print
            print("%s/%s, %s" % (elem_path, child.tag, child.text))
        else:
            # node with child elements => recurse
            print_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

# start with a leading slash so the output reads /A/B rather than A/B
print_path_of_elems(root, "/" + root.tag)

2
Another approach looks like this:
from lxml import etree

XMLDoc = etree.parse('file.xml')

for Node in XMLDoc.xpath('//*'):
    # leaf element with text => print its path and value
    if len(Node) == 0 and Node.text:
        print("%s, %s" % (XMLDoc.getpath(Node), Node.text))

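Depending on the structure of your document, the XPath returned may contain positional indexes (for example /A/B[1] when sibling elements share a tag), which you may need to strip.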
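
One way to strip those positional indexes is a small regular expression applied to the path string. This is only a sketch: the helper name strip_positions is hypothetical, and it assumes the only predicates in the path are numeric ones such as [2].

```python
import re

# Matches a positional predicate such as [1] or [27]
_POSITION = re.compile(r"\[\d+\]")

def strip_positions(xpath):
    """Return the path with numeric predicates like [2] removed."""
    return _POSITION.sub("", xpath)

print(strip_positions("/A/F/G[2]/H[1]"))  # /A/F/G/H
```

Paths without indexes pass through unchanged, so the helper can be applied to every result.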


0

You could try something like this:

from xml.etree.ElementTree import ElementTree

tree = ElementTree()
tree.parse('file.xml')
root = tree.getroot()

def print_abs_path(root, path=None):
    if path is None:
        path = [root.tag]

    for child in root:
        # child.text is None for empty elements such as <X/>
        text = (child.text or '').strip()
        new_path = path[:]
        new_path.append(child.tag)
        if text:
            print('/{0}, {1}'.format('/'.join(new_path), text))
        print_abs_path(child, new_path)

print_abs_path(root)

0

A thoroughly inefficient XPath solution:

>>> from lxml import etree
>>> tree = etree.fromstring("""
... <A>
...   <B>foo</B>
...   <C>
...     <D>On</D>
...   </C>
...   <E>Auto</E>
...   <F>
...     <G>
...       <H>shoo</H>
...       <I>Off</I>
...     </G>
...   </F>
... </A>
... """)
>>> for node in tree.xpath('//*[normalize-space(text())]'):
...     print('/%s, %s' % (
...         '/'.join(a.tag for a in node.xpath('ancestor-or-self::*')), node.text))
... 
/A/B, foo
/A/C/D, On
/A/E, Auto
/A/F/G/H, shoo
/A/F/G/I, Off
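
For comparison, a single-pass variant that uses only the standard library might look like the sketch below. It is not taken from any of the answers above. The function name iter_leaf_paths is hypothetical, and it assumes that only childless elements with non-blank text should be reported.

```python
import io
import xml.etree.ElementTree as ET

def iter_leaf_paths(source):
    """Yield (path, text) for each childless element with non-blank text."""
    stack = []  # tags of the currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            # On "end" the element is complete, so len(elem) is reliable
            text = (elem.text or "").strip()
            if len(elem) == 0 and text:
                yield "/" + "/".join(stack), text
            stack.pop()

xml_text = b"""
<A>
  <B>foo</B>
  <C>
    <D>On</D>
  </C>
  <E>Auto</E>
  <F>
    <G>
      <H>shoo</H>
      <I>Off</I>
    </G>
  </F>
</A>
"""

for path, text in iter_leaf_paths(io.BytesIO(xml_text)):
    print("%s, %s" % (path, text))
```

Keeping a stack of open tags avoids running an ancestor query for every node, which is where the XPath version above spends most of its time.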
