Newline problems when using the toprettyxml() function

Question

17

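I'm currently using the toprettyxml() function of the xml.dom module in a Python script, and I'm having some trouble with newlines. If I don't use the newl parameter, or if I use toprettyxml(newl='\n'), it outputs several newlines instead of just one.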

For instance:

f = open(filename, 'w')
f.write(dom1.toprettyxml(encoding='UTF-8'))
f.close()

produces:

<params>


    <param name="Level" value="#LEVEL#"/>


    <param name="Code" value="281"/>


</params>

Does anyone know where the problem comes from and how I can fix it? FYI, I'm using Python 2.6.1.

- pierroz

8 Answers

14

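toprettyxml() is quite awful. This is not a matter of Windows and '\r\n'. Passing any string as the newl parameter shows that too many lines are being added. On top of that, other whitespace is added as well, which may cause problems when a machine reads the XML.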

Some workarounds are available at
http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace

- xverges

2

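Thanks a lot, Xv! For now I'm using toprettyxml() as little as possible, but it's good to know there are workarounds for this annoying problem. And the article is very clear. - pierroz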

from xml.dom.ext import PrettyPrint
from StringIO import StringIO

def toprettyxml_fixed(node, encoding='utf-8'):
    tmpStream = StringIO()
    PrettyPrint(node, stream=tmpStream, encoding=encoding)
    return tmpStream.getvalue()

- Jay-Pi

Posting the linked code here is very helpful in case the site goes down, etc. - Jay-Pi

xml.dom.ext was never added to the Python standard library. It is part of a custom distribution, probably one meant to fix this awful problem. - Jay-Pi

5

toprettyxml(newl='') works for me on Windows.

- OndrejC

Also works on Ubuntu 16.04 (bash). - renedet

3

This works for every line except the first one; it seems to strip the newline and merge the first and second lines... - Josh Correia

4

This is a pretty old question, but I think I know what the problem is:

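Minidom's pretty print works in a very straightforward way. It simply writes the characters you pass as arguments. That means it duplicates those characters if they are already present.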

For example, if you parse an XML file that looks like this:

<parent>
   <child>
      Some text
   </child>
</parent>

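then the DOM already contains newlines and indentation. Minidom treats these as text nodes, and they are still there once the file has been parsed into a DOM object.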

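If you now convert the DOM object back into an XML string, those text nodes are still there, so the newlines and indentation remain. Pretty printing at this point only adds more newlines and more tabs. That is why, in this case, either not pretty printing at all or specifying newl='' gives the wanted output.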
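To make the point about leftover text nodes concrete, here is a minimal sketch. It assumes the sample document from the question, and strip_whitespace_nodes is a hypothetical helper name, not part of minidom. It also drops every whitespace-only text node, which is only safe when that whitespace carries no meaning in your document.

```python
import xml.dom.minidom

SAMPLE = """<params>
    <param name="Level" value="#LEVEL#"/>
    <param name="Code" value="281"/>
</params>"""


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


dom = xml.dom.minidom.parseString(SAMPLE)

# The parsed root has five children: whitespace text nodes sit before,
# between and after the two <param> elements.
children_before = [child.nodeName for child in dom.documentElement.childNodes]

# Pretty printing the tree as parsed keeps those text nodes, hence blank lines.
raw_pretty = dom.toprettyxml(indent="    ")

# After removing them, pretty printing gives one line per element.
strip_whitespace_nodes(dom)
clean_pretty = dom.toprettyxml(indent="    ")

print(children_before)
print(clean_pretty)
```

Cleaning the tree before printing avoids post-processing the output string, at the cost of modifying the DOM in place.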

However, if you generate the DOM in your script, those text nodes are not present, so pretty printing with newl='\r\n' and/or addindent='\t' turns out quite pretty.
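As a rough illustration of the generated case, the sketch below builds the question's document in code instead of parsing it. The element and attribute names are taken from the question; everything else is just one way to set it up.

```python
import xml.dom.minidom

# Build the tree programmatically, so it contains no whitespace text nodes.
doc = xml.dom.minidom.Document()
params = doc.createElement("params")
doc.appendChild(params)

for name, value in [("Level", "#LEVEL#"), ("Code", "281")]:
    param = doc.createElement("param")
    param.setAttribute("name", name)
    param.setAttribute("value", value)
    params.appendChild(param)

# With nothing left over from parsing, indent and newl are applied once each.
generated_pretty = doc.toprettyxml(indent="\t", newl="\n")
print(generated_pretty)
```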

TL;DR: indentation and newlines survive parsing, and pretty printing just adds more on top.

- Link64

2

If you don't mind installing a new package, give beautifulsoup a try. I have had very good experiences with its XML prettifier.

- felixhummel

0

The following function solved my problem. I had to use Python 2.7 and was not allowed to install any third-party packages.

The key steps of the implementation are:

Use dom.toprettyxml()
Strip out the whitespace
Add new lines and tabs as per your requirement.

import os
import re
import xml.dom.minidom

class XmlTag:
    opening = 0
    closing = 1
    self_closing = 2
    closing_tag = "</"
    self_closing_tag = "/>"
    opening_tag = "<"

def to_pretty_xml(xml_file_path):
    pretty_xml = ""
    space_or_tab_count = "  " # Add spaces or use \t
    tab_count = 0
    last_tag = -1

    dom = xml.dom.minidom.parse(xml_file_path)

    # get pretty-printed version of input file
    string_xml = dom.toprettyxml(' ', os.linesep)

    # remove version tag
    string_xml = string_xml.replace("<?xml version=\"1.0\" ?>", '')

    # remove line breaks and the whitespace around them, leaving the
    # spaces between attributes inside each tag untouched
    string_xml = re.sub(r'\s*\n\s*', '', string_xml)

    # move each tag to new line
    string_xml = string_xml.replace('>', '>\n')

    for line in string_xml.split('\n'):
        if XmlTag.closing_tag in line:

            # For consecutive closing tags decrease the indentation
            if last_tag == XmlTag.closing:
                tab_count = tab_count - 1

            # Move closing element to next line
            if last_tag == XmlTag.closing or last_tag == XmlTag.self_closing:
                pretty_xml = pretty_xml + '\n' + (space_or_tab_count * tab_count)

            pretty_xml = pretty_xml + line
            last_tag = XmlTag.closing

        elif XmlTag.self_closing_tag in line:

            # Print self closing on next line with one indentation from parent node
            pretty_xml = pretty_xml + '\n' + (space_or_tab_count * (tab_count+1)) + line
            last_tag = XmlTag.self_closing

        elif XmlTag.opening_tag in line:

            # For consecutive opening tags increase the indentation
            if last_tag == XmlTag.opening:
                tab_count = tab_count + 1

            # Move opening element to next line
            if last_tag == XmlTag.opening or last_tag == XmlTag.closing:
                pretty_xml = pretty_xml + '\n' + (space_or_tab_count * tab_count)

            pretty_xml = pretty_xml + line
            last_tag = XmlTag.opening

    return pretty_xml

pretty_xml = to_pretty_xml("simple.xml")

with open("pretty.xml", 'w') as f:
    f.write(pretty_xml)

- Naveed Rasheed

0

This gives me nice XML on Python 3.6, but I haven't tried it on Windows:

dom = xml.dom.minidom.parseString(xml_string)

pretty_xml_as_string = dom.toprettyxml(newl='').replace("\n\n", "\n")

- n-a-t-e

-1

Are you viewing the resulting file on Windows? If so, try using toprettyxml(newl='\r\n').

- Will McCutchen

- dganesh2002 · Accepted Answer

I found another great solution:

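import os

# Written for Python 2; on Python 3, toprettyxml(encoding=...) returns bytes.
f = open(filename, 'w')
dom_string = dom1.toprettyxml(encoding='UTF-8')
# keep only the lines that contain something other than whitespace
dom_string = os.linesep.join([s for s in dom_string.splitlines() if s.strip()])
f.write(dom_string)
f.close()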

The above solution basically removes the unwanted newlines that toprettyxml() generates from dom_string.

Idea taken from -> What's a quick one-liner to remove empty lines from a Python string?
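For anyone on Python 3, here is a hedged sketch of the same line-filtering idea. It assumes you want a str back, so it calls toprettyxml() without the encoding argument (with encoding set, Python 3 returns bytes). pretty_without_blank_lines is a hypothetical helper name, and the sample is the document from the question.

```python
import xml.dom.minidom


def pretty_without_blank_lines(xml_text, indent="    "):
    """Pretty print xml_text and drop the whitespace-only lines."""
    dom = xml.dom.minidom.parseString(xml_text)
    pretty = dom.toprettyxml(indent=indent)  # str, because no encoding is given
    # Keep only the lines that contain something other than whitespace.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


sample = """<params>
    <param name="Level" value="#LEVEL#"/>
    <param name="Code" value="281"/>
</params>"""

result = pretty_without_blank_lines(sample)
print(result)
```

When writing the result to a file, opening it with an explicit encoding (for example open(filename, 'w', encoding='utf-8')) keeps the bytes on disk in line with what you intend.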