How do I remove namespaces from XML in Python?

Question

I have an XML document like this:

<?xml version="1.0" encoding="UTF-8"?>
<ns0:epp xmlns:ns0="urn:ietf:params:xml:ns:epp-1.0" 
 xmlns:ns1="http://epp.nic.ir/ns/contact-1.0">
   <ns0:command>
      <ns0:check>
         <ns1:check>
            <ns1:id>ex61-irnic</ns1:id>
            <ns1:id>ex999-irnic</ns1:id>
            <ns1:authInfo>
               <ns1:pw>1487441516170712</ns1:pw>
            </ns1:authInfo>
         </ns1:check>
      </ns0:check>
      <ns0:clTRID>TEST-12345</ns0:clTRID>
   </ns0:command>
</ns0:epp>

I want to change it to this using Python 3:

<?xml version="1.0" encoding="UTF-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
   <command>
      <check>
         <check>
            <id>ex61-irnic</id>
            <id>ex999-irnic</id>
            <authInfo>
               <pw>1487441516170712</pw>
            </authInfo>
         </check>
      </check>
      <clTRID>TEST-12345</clTRID>
   </command>
</epp>

I tried using objectify.deannotate from the lxml module to remove the namespace prefixes, but it didn't work. Can you help me achieve this?

- mahshid.r

Just do a simple find-and-replace (or delete) on the strings "ns1:" and "ns0:". - AK47

That will of course break things if the string "ns1:" appears anywhere other than as a namespace prefix. - larsks
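The pitfall raised in the comment above can be illustrated with a short sketch. The sample document and the helper name `naive_strip` are hypothetical, chosen only to show text content that happens to contain the prefix string.

```python
# Hypothetical sample: the text content happens to contain "ns1:".
SAMPLE = '<ns1:note xmlns:ns1="http://example.com/ns">see ns1:id for details</ns1:note>'


def naive_strip(xml_text, prefixes=("ns0:", "ns1:")):
    """Delete every occurrence of the given prefix strings (plain string replace)."""
    for prefix in prefixes:
        xml_text = xml_text.replace(prefix, "")
    return xml_text


stripped = naive_strip(SAMPLE)
print(stripped)
# The tag prefixes are gone, but so is the "ns1:" inside the text content:
# <note xmlns:ns1="http://example.com/ns">see id for details</note>
```

Note that the elements also end up in no namespace at all, rather than in a default namespace as the question asks.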

2 Answers

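This is a combination of "Remove namespace and prefix from xml in python using lxml", which shows how to modify the namespace of an element, and "lxml: add namespace to input file", which shows how to reset the top-level namespace map.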

The code is a little hacky (I'm particularly unsure whether it's acceptable to use the _setroot method), but it seems to work:

from lxml import etree

inputfile = 'data.xml'
target_ns = 'urn:ietf:params:xml:ns:epp-1.0'
nsmap = {None: target_ns}

tree = etree.parse(inputfile)
root = tree.getroot()

# here we set the namespace of all elements to target_ns
for elem in root.iter():
    tag = etree.QName(elem.tag)
    elem.tag = '{%s}%s' % (target_ns, tag.localname)

# create a new root element and set the namespace map, then
# copy over all the child elements    
new_root = etree.Element(root.tag, nsmap=nsmap)
new_root[:] = root[:]

# create a new elementtree with new_root so that we can use the
# .write method.
tree = etree.ElementTree()
tree._setroot(new_root)

tree.write('done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')

Given your sample input, this produces the following in done.xml:

<?xml version='1.0' encoding='UTF-8'?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0"><command>
      <check>
         <check>
            <id>ex61-irnic</id>
            <id>ex999-irnic</id>
            <authInfo>
               <pw>1487441516170712</pw>
            </authInfo>
         </check>
      </check>
      <clTRID>TEST-12345</clTRID>
   </command>
</epp>

- larsks
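For comparison, the same retagging idea can be sketched with only the standard library's xml.etree.ElementTree, which avoids the private _setroot call. This is a minimal sketch, not the answer's own method: the function name `unify_namespace` and the shortened sample are assumptions made for illustration, and `register_namespace` changes a module-wide prefix map.

```python
import xml.etree.ElementTree as ET

TARGET_NS = "urn:ietf:params:xml:ns:epp-1.0"

# Shortened version of the question's input (illustrative only).
SAMPLE = (
    '<ns0:epp xmlns:ns0="urn:ietf:params:xml:ns:epp-1.0" '
    'xmlns:ns1="http://epp.nic.ir/ns/contact-1.0">'
    '<ns0:command><ns0:check><ns1:check>'
    '<ns1:id>ex61-irnic</ns1:id>'
    '</ns1:check></ns0:check></ns0:command>'
    '</ns0:epp>'
)


def unify_namespace(xml_text, target_ns=TARGET_NS):
    """Move every element into target_ns and serialize it as the default namespace."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree stores tags as '{uri}local'; keep only the local part.
        local_name = elem.tag.rsplit("}", 1)[-1]
        elem.tag = "{%s}%s" % (target_ns, local_name)
    # An empty prefix makes the serializer emit xmlns="..." without prefixes.
    # Caveat: this registration is global to the ElementTree module.
    ET.register_namespace("", target_ns)
    return ET.tostring(root, encoding="unicode")


unified = unify_namespace(SAMPLE)
print(unified)
```

Attributes are left untouched here; only element tags are moved into the target namespace.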


- Parfait · Accepted Answer

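Consider XSLT, the special-purpose language designed to transform XML files, for example to remove namespaces. Python's third-party module lxml can run XSLT 1.0 scripts. Because XSLT scripts are themselves XML files, you can parse them from a file or a string like any other XML. No loops or conditional if logic are needed in the Python code. You can also use this XSLT script from other languages (PHP, Java, C#, etc.).

XSLT (save as an .xsl file to be referenced in Python)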

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM: COPY DOC AS IS -->
  <xsl:template match="@*|node()">
    <xsl:copy>    
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- REMOVE NAMESPACE PREFIXES, ADD DOC NAMESPACE -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="urn:ietf:params:xml:ns:epp-1.0">    
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL
doc = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# CONFIGURE AND RUN TRANSFORMER
transform = et.XSLT(xsl)    
result = transform(doc)

# OUTPUT RESULT TREE TO FILE
with open('Output.xml', 'wb') as f:
    f.write(result)

Output

<?xml version="1.0"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
  <command>
    <check>
      <check>
        <id>ex61-irnic</id>
        <id>ex999-irnic</id>
        <authInfo>
          <pw>1487441516170712</pw>
        </authInfo>
      </check>
    </check>
    <clTRID>TEST-12345</clTRID>
  </command>
</epp>
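Since the stylesheet can be parsed from a string as well as from a file, the following is a minimal self-contained sketch of the same transformation with everything held in memory. It assumes lxml is installed; the function name `strip_prefixes` and the shortened sample input are illustrative and not part of the answer above.

```python
from lxml import etree

# Same stylesheet as above, embedded as bytes instead of read from a .xsl file.
XSLT_TEXT = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="urn:ietf:params:xml:ns:epp-1.0">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>"""

# Shortened version of the question's input (illustrative only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<ns0:epp xmlns:ns0="urn:ietf:params:xml:ns:epp-1.0"
         xmlns:ns1="http://epp.nic.ir/ns/contact-1.0">
  <ns0:command>
    <ns0:check>
      <ns1:check>
        <ns1:id>ex61-irnic</ns1:id>
      </ns1:check>
    </ns0:check>
  </ns0:command>
</ns0:epp>"""


def strip_prefixes(xml_bytes):
    """Run the stylesheet on xml_bytes and return the serialized result as bytes."""
    transform = etree.XSLT(etree.fromstring(XSLT_TEXT))
    result = transform(etree.fromstring(xml_bytes))
    return etree.tostring(result, pretty_print=True,
                          xml_declaration=True, encoding="UTF-8")


print(strip_prefixes(SAMPLE).decode("utf-8"))
```

Keeping the stylesheet in a string is convenient for small scripts; a separate .xsl file, as in the answer, is easier to reuse from other languages.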