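What is the best way (or what are the various ways) to pretty-print XML in Python?
If you don't want to re-parse the document, the get_pprint() function from the xmlpp.py library is an alternative. It worked smoothly for my use case, with no need to re-parse into an lxml ElementTree object.
Convert an entire XML document to a pretty-printed XML document
(For example: suppose you have extracted [unzipped] a LibreOffice Writer .odt or .ods file and want to convert the ugly "content.xml" file into a pretty one, for automated git version control and git difftool-ing of .odt/.ods files, as I implemented here.)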
import xml.dom.minidom

# Read the original (unformatted) XML.
with open("./content.xml", "r") as file:
    xml_string = file.read()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

# Write the pretty-printed result to a new file.
with open("./content_new.xml", "w") as file:
    file.write(pretty_xml_as_string)
References:
- Thanks to Ben Noland's answer on this page, which got me most of the way there.
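One caveat with toprettyxml() is that, when the input is already indented, minidom keeps the existing whitespace-only text nodes and the output gains blank lines. Below is a minimal sketch of one common workaround. The helper name pretty_xml and the two-space default are illustrative assumptions, not part of the answer above.

```python
import xml.dom.minidom


def pretty_xml(xml_string, indent="  "):
    """Pretty-print XML with minidom and drop whitespace-only lines."""
    parsed = xml.dom.minidom.parseString(xml_string)
    pretty = parsed.toprettyxml(indent=indent)
    # Re-indenting already-indented input leaves lines that contain only
    # whitespace; filtering them keeps repeated runs stable.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


if __name__ == "__main__":
    print(pretty_xml("<a><b>1</b><c/></a>"))
```

Note that the filter also removes blank lines inside multi-line text content, so it suits diff-friendly output better than strictly lossless round-tripping.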
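import subprocess

def makePretty(filepath):
    # Pass the arguments as a list so the path is never interpreted by a shell.
    prettyXML = subprocess.check_output(["xmllint", "--format", filepath])
    # check_output returns bytes, so write the file in binary mode.
    with open(filepath, "wb") as outfile:
        outfile.write(prettyXML)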
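Because the function above depends on an external program, it may help to fail with a clear message when xmllint is not installed. The following is a rough sketch under that assumption. The function name format_with_xmllint and its executable parameter are hypothetical, introduced only for illustration.

```python
import shutil
import subprocess


def format_with_xmllint(filepath, executable="xmllint"):
    """Return the formatted XML produced by `xmllint --format` as bytes."""
    # shutil.which returns None when the program is not on PATH.
    if shutil.which(executable) is None:
        raise RuntimeError(f"{executable} not found on PATH")
    # check=True raises CalledProcessError if xmllint rejects the input.
    completed = subprocess.run(
        [executable, "--format", filepath],
        check=True,
        capture_output=True,
    )
    return completed.stdout
```

Returning the bytes instead of overwriting the file in place leaves the original untouched if formatting fails.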
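Note: this is for Unix-based systems with the xmllint package.
[...] check_output, because you don't need to do error checking. - FriskySaga
I ran into this problem and solved it like this: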
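from lxml import etree
class XMLParser:  # the answer's own helper class, not lxml.etree.XMLParser
    def write_xml_file(self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
        pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
        # lxml indents with two spaces per level; swap each pair for the requested indent (this also touches double spaces in text).
        if pretty_print:
            pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
        file.write(pretty_printed_xml)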
try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')
        # create some xml content using etree ...
        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')
except IOError:
    print("Error while writing in log file!")
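This only works because etree indents with two spaces by default, which I find doesn't make the indentation stand out and so isn't very pretty. I couldn't find any etree setting or function parameter that changes the standard indentation. I like how easy etree is to use, but this really annoyed me.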
Use etree.indent and etree.tostring.
import lxml.etree as etree

root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
etree.indent(root, space="  ")
xml_string = etree.tostring(root, pretty_print=True).decode()
print(xml_string)
Output
<html>
  <head/>
  <body>
    <h1>Welcome</h1>
  </body>
</html>
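For readers who would rather not depend on lxml, the standard library's xml.etree.ElementTree has offered a similar indent() function since Python 3.9. Here is a minimal sketch. The helper name indent_with_stdlib and the tab default are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def indent_with_stdlib(xml_text, space="\t"):
    """Indent XML using xml.etree.ElementTree.indent (Python 3.9+)."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)  # modifies the tree in place
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(indent_with_stdlib(
        "<html><head></head><body><h1>Welcome</h1></body></html>"
    ))
```

Since the indentation string is a parameter, tabs or any number of spaces can be chosen without post-processing the serialized text.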
Removing namespaces and prefixes
import lxml.etree as etree

def dump_xml(element):
    # Replace each qualified tag with its local name, dropping the namespace.
    for item in element.iter():
        item.tag = etree.QName(item).localname
    etree.cleanup_namespaces(element)
    etree.indent(element, space="  ")
    result = etree.tostring(element, pretty_print=True).decode()
    return result

root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
xml_string = dump_xml(root)
print(xml_string)
Output
<document>
  <name>hello world</name>
</document>
from lxml import etree
import xml.dom.minidom as mmd

# remove_blank_text drops the old indentation while parsing. Collapsing all
# whitespace in the serialized string would also fuse attributes and text.
xml_root = etree.parse(xml_file_path, etree.XMLParser(remove_blank_text=True))

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root, encoding='unicode')
    good_xml = mmd.parseString(plain_xml)
    print(good_xml.toprettyxml(indent=' '))
It works well for XML that contains Chinese characters!
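When the pretty-printed result is written to a file rather than printed, non-ASCII text is safest if the encoding is stated explicitly. Here is a small standard-library sketch. The helper name pretty_utf8_bytes is an assumption for illustration.

```python
import xml.dom.minidom


def pretty_utf8_bytes(xml_text, indent="  "):
    """Pretty-print XML and return UTF-8 bytes with a matching declaration."""
    document = xml.dom.minidom.parseString(xml_text)
    # With encoding set, toprettyxml returns bytes instead of str.
    return document.toprettyxml(indent=indent, encoding="utf-8")
```

The returned bytes can be written with open(path, "wb"), which avoids relying on the platform's default text encoding.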
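I solved this with a few lines of code: open the file, walk through it adding indentation, then save it again. I was working with small XML files and didn't want to add dependencies or more libraries for the user to install. Anyway, here is what I ended up with: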
with open(file_name, 'r') as f:
    xml = f.read()

# Remove old indentation (assumes no tag or text node spans several lines)
xml = ''.join(line.strip() for line in xml.splitlines())

new_xml = ''
indent = ' '
deepness = 0  # number of currently open elements

for i in range(len(xml)):
    new_xml += xml[i]
    if i < len(xml) - 3:
        simpleSplit = xml[i:i+2] == '><'
        advancSplit = xml[i:i+3] == '></'
        end = xml[i:i+2] == '/>'
        closing = xml[i:i+2] == '</'
        # Only opening tags deepen the nesting; skip '</', '<?' and '<!'
        start = xml[i] == '<' and xml[i+1] not in '/?!'

        if advancSplit:
            # A closing tag lines up with its opening tag, one level out
            new_xml += '\n' + indent * (deepness - 1)
        elif simpleSplit:
            new_xml += '\n' + indent * deepness
        if start:
            deepness += 1
        if end or closing:
            deepness -= 1

with open(file_name, 'w') as f:
    f.write(new_xml)
This worked for me; maybe it will be useful to someone else :)
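A hand-rolled indenter can silently change a document, so a quick check that the output still parses to the same tree may be worthwhile. The sketch below uses only the standard library. The names same_structure and _same_element are hypothetical, and the comparison deliberately ignores leading and trailing whitespace in text.

```python
import xml.etree.ElementTree as ET


def _same_element(a, b):
    """Compare two elements, ignoring surrounding whitespace in text."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    if len(a) != len(b):
        return False
    # Recurse into the children pairwise, in document order.
    return all(_same_element(x, y) for x, y in zip(a, b))


def same_structure(xml_before, xml_after):
    """Return True if both strings parse to equivalent element trees."""
    return _same_element(ET.fromstring(xml_before), ET.fromstring(xml_after))
```

Calling same_structure(original_text, new_xml) before overwriting the file would catch a formatter that broke the markup.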