Emitting namespace specifications with ElementTree in Python

44

I am trying to use ElementTree to generate an XML file that contains an XML declaration and namespaces. Here is my sample code:

from xml.etree import ElementTree as ET
ET.register_namespace('com',"http://www.company.com") #some name

# build a tree structure
root = ET.Element("STUFF")
body = ET.SubElement(root, "MORE_STUFF")
body.text = "STUFF EVERYWHERE!"

# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)

tree.write("page.xml",
           xml_declaration=True,
           method="xml" )

However, neither the <?xml declaration nor any namespace/prefix information shows up in the output. I'm somewhat confused here.

2 Answers

51
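Although the documentation says it is not required, the only way I could get a declaration was to specify both xml_declaration and encoding.
You have to create the nodes in the namespace you registered in order to get that namespace on the nodes in the file. Here is a corrected version of your code: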
from xml.etree import ElementTree as ET
ET.register_namespace('com',"http://www.company.com") #some name

# build a tree structure
root = ET.Element("{http://www.company.com}STUFF")
body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF")
body.text = "STUFF EVERYWHERE!"

# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)

tree.write("page.xml",
           xml_declaration=True,encoding='utf-8',
           method="xml")

Output (page.xml)

<?xml version='1.0' encoding='utf-8'?><com:STUFF xmlns:com="http://www.company.com"><com:MORE_STUFF>STUFF EVERYWHERE!</com:MORE_STUFF></com:STUFF>

ElementTree's write() does not pretty-print its output either. Here is the same output, pretty-printed by hand:

<?xml version='1.0' encoding='utf-8'?>
<com:STUFF xmlns:com="http://www.company.com">
    <com:MORE_STUFF>STUFF EVERYWHERE!</com:MORE_STUFF>
</com:STUFF>
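
On Python 3.9 or later, the standard library's ET.indent() can add this whitespace for you. The following is a minimal sketch, assuming Python 3.9+ and reusing the element names from the code above; it serializes to a string instead of a file so the result is easy to inspect.

```python
from xml.etree import ElementTree as ET

ET.register_namespace('com', "http://www.company.com")

root = ET.Element("{http://www.company.com}STUFF")
body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF")
body.text = "STUFF EVERYWHERE!"

# indent() (Python 3.9+) modifies the tree in place by inserting
# whitespace into the .text and .tail of the elements
ET.indent(root, space="    ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because indent() changes the tree itself, a later tree.write() call would produce the indented layout as well.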
You can also declare a default namespace, in which case you don't need to register one:
from xml.etree import ElementTree as ET

# build a tree structure
root = ET.Element("{http://www.company.com}STUFF")
body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF")
body.text = "STUFF EVERYWHERE!"

# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)

tree.write("page.xml",
           xml_declaration=True,encoding='utf-8',
           method="xml",default_namespace='http://www.company.com')

Output (the pretty-print spacing is mine)

<?xml version='1.0' encoding='utf-8'?>
<STUFF xmlns="http://www.company.com">
    <MORE_STUFF>STUFF EVERYWHERE!</MORE_STUFF>
</STUFF>
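
The prefixed form and the default-namespace form describe the same elements; only the serialization differs. As a quick check, the sketch below parses both outputs back in and compares the tags. The two XML strings are copied from the outputs above; the variable names are just for illustration.

```python
from xml.etree import ElementTree as ET

prefixed = (
    '<com:STUFF xmlns:com="http://www.company.com">'
    '<com:MORE_STUFF>STUFF EVERYWHERE!</com:MORE_STUFF>'
    '</com:STUFF>'
)
defaulted = (
    '<STUFF xmlns="http://www.company.com">'
    '<MORE_STUFF>STUFF EVERYWHERE!</MORE_STUFF>'
    '</STUFF>'
)

a = ET.fromstring(prefixed)
b = ET.fromstring(defaulted)

# Both parse to the same {uri}local tag names
print(a.tag, b.tag)
print(a[0].tag, b[0].tag)
```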

How would you use string formatting here? For example, f-strings? Or even .format()? - Kenan
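
One possible way to handle the string formatting asked about in this comment is to double the braces, since `{{` and `}}` are literal braces in both f-strings and str.format(). This is only a sketch: the helper name `qualified` and the constant `NS` are made up for illustration and are not part of ElementTree. ET.QName is the standard-library alternative that avoids the brace escaping entirely.

```python
from xml.etree import ElementTree as ET

NS = "http://www.company.com"  # namespace URI from the question


def qualified(name, ns=NS):
    """Hypothetical helper: return name in ElementTree's {uri}local form."""
    # Doubled braces are literal braces inside an f-string
    return f"{{{ns}}}{name}"


ET.register_namespace("com", NS)
root = ET.Element(qualified("STUFF"))
body = ET.SubElement(root, qualified("MORE_STUFF"))
body.text = "STUFF EVERYWHERE!"

# The same tag built with str.format() and with ET.QName
same_tag_format = "{{{}}}{}".format(NS, "STUFF")
same_tag_qname = ET.QName(NS, "STUFF").text
```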

9
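I have never managed to get the ElementTree libraries to emit the <?xml declaration programmatically, so I suggest you try something like the following.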
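from xml.etree import ElementTree as ET

root = ET.Element("STUFF")
# write the namespace declaration by hand as a plain attribute
root.set('xmlns:com', 'http://www.company.com')
body = ET.SubElement(root, "MORE_STUFF")
body.text = "STUFF EVERYWHERE!"

# encoding='unicode' makes tostring() return str rather than bytes (Python 3)
with open('page.xml', 'w', encoding='utf-8') as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(root, encoding='unicode'))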

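ElementTree implementations outside the Python standard library may have different ways of specifying namespaces, so if you decide to move to lxml, the way you declare them will be different.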

You can use ET.tostring(ET.fromstring(xml)).decode() to drop the encoding tag. - Nehemias Herrera
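
A small sketch of what this comment describes, assuming Python 3 and a made-up sample document: tostring() called without an encoding argument returns bytes with no XML declaration, so round-tripping the document through fromstring() drops it.

```python
from xml.etree import ElementTree as ET

# Sample input (made up for illustration) that starts with an XML declaration
xml = (
    b"<?xml version='1.0' encoding='utf-8'?>"
    b"<STUFF><MORE_STUFF>STUFF EVERYWHERE!</MORE_STUFF></STUFF>"
)

# Parse, then serialize again; the default serialization omits the declaration
stripped = ET.tostring(ET.fromstring(xml)).decode()
print(stripped)
```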
