Registering namespaces in a Python XPath query

Question

This is the XML document I have:

<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>
  <Product Id="2">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberTwo</Attribute>
    </Attributes>
  </Product>
</products>

I am trying to use XPath to select a Product by the value of its child element Attributes.Attribute[Name=Identifier] (for example, "NumberOne"). So in this case I expect the following result:

<Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
</Product>

Following this explanation, I tried to implement the query in Python using the lxml library:

found_products = xml_tree_from_string.xpath('//products//Product[c:Attributes[Attribute[@Name="Identifier" and text()="NumberOne"]]]', namespaces={"c": "http://some/path/to/entity/def"})

Unfortunately, this never returns any results, because of the namespace definition on the Attributes element.

What am I missing?

- user2549803

2 Answers

First, you need to define a namespace map that declares a prefix for any namespace that has none (as is the case here), and then apply the XPath:

from lxml import etree

prods = """[your xml above]"""
doc = etree.XML(prods)  # parse first: the namespace map below is built from doc
# Build a prefix -> URI map; the default (prefix-less) namespace gets the arbitrary prefix "xx"
ns = {(k if k else "xx"): v for k, v in doc.xpath('//namespace::*')}
for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    print(etree.tostring(product).decode())

Output:

<Product xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"/>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>

To suppress the namespace attributes, change the for loop to:

for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    etree.cleanup_namespaces(doc) #note: the parameter is "doc", not "product"
    print(etree.tostring(product).decode())

Output:

<Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"/>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>

- Jack Fleeting

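Thank you very much. Is it possible to get the top-level Product element without the xmlns definitions, as in the original XML file? Interestingly, my initial approach also works if I add the c: prefix to Attribute as well. - user2549803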

@user2549803 Yes, that is possible. See the edit. - Jack Fleeting

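@JackFleeting: Dynamically creating a namespace prefix map like this is overkill in most cases, including this one, and it would need a more sophisticated approach to account for the possibility of different default namespaces at different levels of the XML hierarchy. - kjhughes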

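Oh, I wasn't suggesting that you defeat namespaces, just that you use the namespace prefix mechanism directly instead of your existing partially generic generation code. At the very least, state its limitations so that future readers aren't surprised that it isn't as generic as it looks. But really, I'd just drop the generality and fix the OP's XPath by adding c: to the Attribute descendant element, and call it done. Feel free to borrow from my answer below and elaborate as needed. - kjhughes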

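@JackFleeting: Do I understand correctly that you are saying there is an equivalent solution based on an XPath query using local-name()? Out of curiosity, what would it look like? It might come in handy someday :) - user2549803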

- kjhughes · Accepted Answer

What am I missing?

You overlooked that Attribute is in the same namespace too, because a default namespace declaration is inherited by descendant XML elements.
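
One way to see this inheritance for yourself is to print the fully qualified tag names. The following is only a small sketch using the standard library's xml.etree.ElementTree rather than lxml, on a trimmed-down copy of the document from the question; the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the question's document (an assumption for brevity).
xml_text = """<products>
  <Product Id="1">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberOne</Attribute>
    </Attributes>
  </Product>
</products>"""

root = ET.fromstring(xml_text)

# ElementTree reports namespaced tags as "{namespace-uri}local-name".
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)

# The Name attribute itself is NOT in the default namespace:
# default namespace declarations apply to elements only.
attribute_el = root.find(".//{http://some/path/to/entity/def}Attribute")
print(attribute_el.attrib)
```

Both Attributes and Attribute come out with the `{http://some/path/to/entity/def}` qualifier, while products and Product do not. The Name attribute stays unqualified, which is why `@Name` needs no prefix in the XPath.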

So add c: to Attribute in your XPath, and it should work, as you observed in your comment on Jack's answer.
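
For completeness, here is a minimal sketch of the question's query with that one change applied. It assumes lxml is installed and reuses the XML and the `c` prefix from the question; the names `xml_text`, `query` and `found_ids` are only illustrative.

```python
from lxml import etree

xml_text = """<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>
  <Product Id="2">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberTwo</Attribute>
    </Attributes>
  </Product>
</products>"""

doc = etree.XML(xml_text)

# The prefix "c" is arbitrary; only the URI has to match the document.
ns = {"c": "http://some/path/to/entity/def"}

# Same query as in the question, but with c: on Attribute as well.
query = ('//products//Product[c:Attributes[c:Attribute'
         '[@Name="Identifier" and text()="NumberOne"]]]')
found_products = doc.xpath(query, namespaces=ns)

found_ids = [product.get("Id") for product in found_products]
print(found_ids)
```

Only the Product with Id="1" should match. The nested Product Id="1_1" has an Attribute child in no namespace and no c:Attributes child, so it is not selected.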