Pretty printing XML in Python

Question


python xml pretty-print


What is the best way (or what are the various ways) to pretty print XML in Python?

- Hortitude

27 answers

Content provided by Stack Overflow.

- gaborous · Answer 1


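If you don't want to reparse, you can use the get_pprint() function from the xmlpp.py library instead. It worked nicely and smoothly for my use cases, without having to reparse into an lxml ElementTree object.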

- gaborous


I tried minidom and lxml and didn't get properly formatted and indented XML. This solution worked exactly as expected. - david-hoze


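This fails for tag names that have a namespace prefix and contain a hyphen (e.g. <ns:hyphenated-tag/>): the part starting with the hyphen is simply dropped, giving e.g. <ns:hyphenated/>. - Endre Both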

@EndreBoth Good catch, I didn't test that, but maybe it would be easy to fix in the xmlpp.py code? - gaborous

- Gabriel Staples · Answer 2

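Convert an entire XML file into a pretty XML file
(For example, assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file and want to convert its ugly "content.xml" file into a pretty one for automated git version control and git difftool diffs of .odt/.ods files, as I'm implementing here.)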

import xml.dom.minidom

# Read the whole XML document as one string
with open("./content.xml", 'r', encoding='utf-8') as file:
    xml_string = file.read()

# Parse it and serialize it again with indentation
parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

with open("./content_new.xml", 'w', encoding='utf-8') as file:
    file.write(pretty_xml_as_string)

References:
- Thanks to Ben Noland's answer on this page, which got me most of the way there.
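One caveat with this approach: if the input already contains line breaks and indentation, toprettyxml() keeps those whitespace-only text nodes and indents them as well, so the output gains lines that contain nothing but whitespace. A minimal sketch that filters those lines out, assuming you don't need to preserve whitespace-only text in the document (the helper name pretty_xml is hypothetical):

```python
import xml.dom.minidom


def pretty_xml(xml_string, indent="  "):
    """Pretty print with minidom, dropping whitespace-only lines (hypothetical helper)."""
    pretty = xml.dom.minidom.parseString(xml_string).toprettyxml(indent=indent)
    # toprettyxml() also re-indents the old whitespace text nodes,
    # which shows up as lines containing only spaces
    return "\n".join(line for line in pretty.splitlines() if line.strip())


already_indented = "<root>\n  <child>text</child>\n</root>"
print(pretty_xml(already_indented))
```

Filtering lines is a blunt tool: it would also drop whitespace-only lines inside text content, which is usually acceptable for files like content.xml.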

- FriskySaga · Answer 3

If for some reason you can't get hold of any of the Python modules other users mentioned, I suggest the following solution for Python 2.7:

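import subprocess

def makePretty(filepath):
    # Pass the arguments as a list so the path is not interpreted by a shell
    prettyXML = subprocess.check_output(["xmllint", "--format", filepath])
    # check_output returns bytes on Python 3, so write in binary mode
    with open(filepath, "wb") as outfile:
        outfile.write(prettyXML)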

As far as I know, this solution works on Unix-based systems that have the xmllint package installed.

- Zelphir Kaltstahl · Answer 4

I had this problem and solved it like this:

from lxml import etree

def write_xml_file(self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print:
        # lxml indents with two spaces; note this also replaces double spaces inside text
        pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)

In my code, this method is called like this:

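try:
    with open(file_path, 'w', encoding='utf-8') as file:
        # The declaration promises UTF-8, so the file is opened as UTF-8
        file.write('<?xml version="1.0" encoding="utf-8" ?>\n')

        # create some xml content using etree ...

        # XMLParser is the author's own class that defines write_xml_file,
        # not lxml.etree.XMLParser
        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')

except IOError:
    print("Error while writing in log file!")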

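This only works because etree indents with two spaces by default, which I don't think emphasizes the indentation enough, so the result isn't very pretty. I couldn't find any etree setting or function parameter to change the standard etree indentation. I like how easy etree is to use, but this really annoyed me.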
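For what it's worth, newer releases do offer this: lxml 4.5 added etree.indent(tree, space=...), and Python 3.9 added xml.etree.ElementTree.indent(tree, space=...) to the standard library. A minimal sketch using the stdlib version to indent with tabs (the helper name to_tab_indented_xml is hypothetical). Unlike replace('  ', indent), indent() only rewrites empty or whitespace-only text between elements, so double spaces inside text content survive:

```python
import xml.etree.ElementTree as ET


def to_tab_indented_xml(xml_text):
    """Pretty print with tabs using the stdlib ElementTree (Python 3.9+)."""
    root = ET.fromstring(xml_text)
    # indent() rewrites the whitespace text/tail of each element in place
    ET.indent(root, space="\t")
    return ET.tostring(root, encoding="unicode")


print(to_tab_indented_xml("<log><entry>a  b</entry><entry/></log>"))
```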

- Jossef Harush Kadouri · Answer 5

Use etree.indent and etree.tostring.

import lxml.etree as etree

root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
etree.indent(root, space="  ")
xml_string = etree.tostring(root, pretty_print=True).decode()
print(xml_string)

Output

<html>
  <head/>
  <body>
    <h1>Welcome</h1>
  </body>
</html>

Removing namespaces and prefixes

import lxml.etree as etree


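def dump_xml(element):
    # Strip the namespace from every element tag; passing etree.Element
    # skips comments and processing instructions, which have no QName
    for item in element.iter(etree.Element):
        item.tag = etree.QName(item).localname

    # Drop namespace declarations that are no longer used
    etree.cleanup_namespaces(element)
    etree.indent(element, space="  ")
    result = etree.tostring(element, pretty_print=True).decode()
    return result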


root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
xml_string = dump_xml(root)
print(xml_string)

Output

<document>
  <name>hello world</name>
</document>
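If lxml isn't available, a similar result can be sketched with the standard library's xml.etree.ElementTree (indent() needs Python 3.9+). This assumes only element tags need their namespace removed; attribute names are left alone. ElementTree stores namespaced tags as {uri}localname, so once that prefix is stripped, no xmlns declaration is written back out. The function name dump_xml_stdlib is hypothetical:

```python
import xml.etree.ElementTree as ET


def dump_xml_stdlib(element):
    # ElementTree stores namespaced tags as "{uri}localname"
    for item in element.iter():
        if isinstance(item.tag, str) and item.tag.startswith("{"):
            item.tag = item.tag.split("}", 1)[1]
    # Add two-space indentation in place (Python 3.9+)
    ET.indent(element, space="  ")
    return ET.tostring(element, encoding="unicode")


root = ET.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
print(dump_xml_stdlib(root))
```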

- Reed_Xia · Answer 6

from lxml import etree
import xml.dom.minidom as mmd

# xml_file_path is a placeholder for the path of the XML file to print;
# remove_blank_text=True drops the old indentation but keeps text content
xml_root = etree.parse(xml_file_path, etree.XMLParser(remove_blank_text=True))

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root).decode('utf-8')
    good_xml = mmd.parseString(plain_xml)
    print(good_xml.toprettyxml(indent='    '))

It works well for XML containing Chinese characters!
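If the pretty-printed result has to be saved, one option is to let minidom do the encoding: passing encoding to toprettyxml() returns bytes and puts that encoding into the XML declaration, so Chinese text is written correctly regardless of the platform's default encoding. A small sketch, assuming UTF-8 output and a hypothetical output file name pretty.xml:

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString("<root><名字>中文</名字></root>")
# With encoding set, toprettyxml() returns bytes instead of str
data = doc.toprettyxml(indent="    ", encoding="utf-8")

# "pretty.xml" is a hypothetical output path; write in binary mode
with open("pretty.xml", "wb") as f:
    f.write(data)
```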

- Petter TB · Answer 7

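I solved this with a few lines of code: open the file, walk through it adding indentation, and save it again. I was working with small XML files and didn't want to add dependencies or extra libraries for the user to install. Anyway, this is what I ended up with: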

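import re

def make_pretty(file_name, indent='    '):
    with open(file_name, 'r') as f:
        xml = f.read()

    # Remove old indentation: drop whitespace-only runs between tags
    xml = re.sub(r'>\s+<', '><', xml.strip())

    # Split the document into tags and the text between them
    # (simple heuristic: assumes no '>' inside attribute values or comments)
    tokens = re.findall(r'<[^>]+>|[^<]+', xml)

    lines = []
    deepness = 0
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith('</'):
            # Closing tag: dedent first
            deepness -= 1
            lines.append(indent * deepness + token)
        elif token.startswith(('<?', '<!')) or token.endswith('/>'):
            # Declarations, comments and self-closing tags keep the depth
            lines.append(indent * deepness + token)
        elif (token.startswith('<') and i + 2 < len(tokens)
              and not tokens[i + 1].startswith('<')
              and tokens[i + 2].startswith('</')):
            # <tag>text</tag> stays on one line
            lines.append(indent * deepness + token + tokens[i + 1] + tokens[i + 2])
            i += 2
        elif token.startswith('<'):
            # Opening tag: its children go one level deeper
            lines.append(indent * deepness + token)
            deepness += 1
        else:
            # Text in mixed content goes on its own line
            lines.append(indent * deepness + token.strip())
        i += 1

    with open(file_name, 'w') as f:
        f.write('\n'.join(lines) + '\n')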

This works for me; maybe someone else will find it useful :)