如何从Java中漂亮地打印XML?

499

我有一个包含XML代码的Java字符串,其中没有换行或缩进。我想将其转换为格式良好的XML字符串。如何做到这一点?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);
注意:我的输入是一个字符串(String)。我的输出也是一个字符串(String)。
(基础)模拟结果:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

请查看这个问题:https://dev59.com/lXM_5IYBdhLWcg3wslfs - dfa
10
只是好奇,你是将这个输出发送到XML文件或其他需要缩进很重要的地方吗?之前我非常关注格式化我的XML以便正确显示它...但在花费了大量时间后,我意识到我必须将输出发送到一个Web浏览器,并且任何相对现代的Web浏览器都可以以漂亮的树形结构显示XML,所以我可以忘记这个问题并继续前进。我提到这个只是为了防止你(或其他有同样问题的用户)可能会忽略同样的细节。 - Abel Morelos
4
@Abel,将数据保存到文本文件中,插入到HTML文本区域中,并将其倒出到控制台以进行调试。 - Steve McLeod
6
“put on hold as too broad” 的意思是“因问题过于广泛而被搁置”,目前很难比问题更加精确明了! - Steve McLeod
34个回答

4
使用jdom2:http://www.jdom.org/
import java.io.StringReader;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

String prettyXml = new XMLOutputter(Format.getPrettyFormat()).
                         outputString(new SAXBuilder().build(new StringReader(uglyXml)));

太棒了,它解决了我的问题。谢谢! - Prateek Mehta

3
作为maxcodeskrapsDavid Easleymilosmns的答案的替代方案,看一下我轻量级、高性能的漂亮打印库:xml-formatter。请保留HTML标签。
// construct lightweight, threadsafe, instance
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().build();

StringBuilder buffer = new StringBuilder();
String xml = ..; // also works with char[] or Reader

if(prettyPrinter.process(xml, buffer)) {
     // valid XML, print buffer
} else {
     // invalid XML, print xml
}

有时候,比如直接从文件运行模拟的SOAP服务时,拥有一个能够处理已经进行漂亮打印的XML的漂亮打印机是很好的。
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

正如一些人所评论的那样,漂亮打印只是以一种更易读的形式呈现XML的方式 - 空格严格来说不属于您的XML数据。

该库旨在用于记录目的的漂亮打印,并包括用于过滤(子树删除/匿名化)和在CDATA和文本节点中漂亮打印XML的函数。


2

我曾经遇到过同样的问题,而使用 JTidy (http://jtidy.sourceforge.net/index.html) 后获得了很好的效果。

例如:

Tidy t = new Tidy();
t.setIndentContent(true);
Document d = t.parseDOM(
    new ByteArrayInputStream("HTML goes here", null);

OutputStream out = new ByteArrayOutputStream();
t.pprint(d, out);
String html = out.toString();

2

我总是使用以下函数:

最初的回答:

public static String prettyPrintXml(String xmlStringToBeFormatted) {
    String formattedXmlString = null;
    try {
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        documentBuilderFactory.setValidating(true);
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        InputSource inputSource = new InputSource(new StringReader(xmlStringToBeFormatted));
        Document document = documentBuilder.parse(inputSource);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

        StreamResult streamResult = new StreamResult(new StringWriter());
        DOMSource dOMSource = new DOMSource(document);
        transformer.transform(dOMSource, streamResult);
        formattedXmlString = streamResult.getWriter().toString().trim();
    } catch (Exception ex) {
        StringWriter sw = new StringWriter();
        ex.printStackTrace(new PrintWriter(sw));
        System.err.println(sw.toString());
    }
    return formattedXmlString;
}

这很重要:transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2"); 如果缺失,所有元素都将左对齐。 - Hank Lapidez
@HankLapidez 这是真的。 - Benson Githinji
在这里构建DOM是不必要且低效的。 - Michael Kay

2

我发现在Java 1.6.0_32中,使用Transformer和null或identity xslt来pretty print一个XML字符串的正常方法,如果标签仅由空格分隔而不是没有分隔文本,则无法按照我想要的方式工作。我尝试在我的模板中使用<xsl:strip-space elements="*"/>,但没有效果。我找到的最简单的解决方案是使用SAXSource和XML过滤器来删除我想要的空格。由于我的解决方案是用于日志记录,我还将其扩展为与不完整的XML片段一起使用。请注意,如果使用DOMSource,则正常方法似乎可以正常工作,但由于不完整性和内存开销,我不想使用它。

public static class WhitespaceIgnoreFilter extends XMLFilterImpl
{

    @Override
    public void ignorableWhitespace(char[] arg0,
                                    int arg1,
                                    int arg2) throws SAXException
    {
        //Ignore it then...
    }

    @Override
    public void characters( char[] ch,
                            int start,
                            int length) throws SAXException
    {
        if (!new String(ch, start, length).trim().equals("")) 
               super.characters(ch, start, length); 
    }
}

public static String prettyXML(String logMsg, boolean allowBadlyFormedFragments) throws SAXException, IOException, TransformerException
    {
        TransformerFactory transFactory = TransformerFactory.newInstance();
        transFactory.setAttribute("indent-number", new Integer(2));
        Transformer transformer = transFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter out = new StringWriter();
        XMLReader masterParser = SAXHelper.getSAXParser(true);
        XMLFilter parser = new WhitespaceIgnoreFilter();
        parser.setParent(masterParser);

        if(allowBadlyFormedFragments)
        {
            transformer.setErrorListener(new ErrorListener()
            {
                @Override
                public void warning(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void fatalError(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void error(TransformerException exception) throws TransformerException
                {
                }
            });
        }

        try
        {
            transformer.transform(new SAXSource(parser, new InputSource(new StringReader(logMsg))), new StreamResult(out));
        }
        catch (TransformerException e)
        {
            if(e.getCause() != null && e.getCause() instanceof SAXParseException)
            {
                if(!allowBadlyFormedFragments || !"XML document structures must start and end within the same entity.".equals(e.getCause().getMessage()))
                {
                    throw e;
                }
            }
            else
            {
                throw e;
            }
        }
        out.flush();
        return out.toString();
    }

1

我在这里找到的适用于Java 1.6+的解决方案如果代码已经格式化,则不会重新格式化。对我起作用的(可以重新格式化已经格式化的代码)是以下内容。

import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import java.io.IOException;
import java.io.StringReader;

public class XmlUtils {
    public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
        Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
        byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
        return new String(canonXmlBytes);
    }

    public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
        InputSource src = new InputSource(new StringReader(input));
        Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
        Boolean keepDeclaration = input.startsWith("<?xml");
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
        return writer.writeToString(document);
    }
}

这是一个很好的工具,可用于单元测试中进行完整字符串xml比较。

private void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
    String canonicalExpected = prettyFormat(toCanonicalXml(expected));
    String canonicalActual = prettyFormat(toCanonicalXml(actual));
    assertEquals(canonicalExpected, canonicalActual);
}

1

我看到一个回答使用Scala,所以这里提供另一种使用Groovy的方法,以防有人觉得有趣。默认缩进是2步,XmlNodePrinter构造函数也可以传递另一个值。

def xml = "<tag><nested>hello</nested></tag>"
def stringWriter = new StringWriter()
def node = new XmlParser().parseText(xml);
new XmlNodePrinter(new PrintWriter(stringWriter)).print(node)
println stringWriter.toString()

如果Groovy jar包在类路径中,可以从Java中使用

  String xml = "<tag><nested>hello</nested></tag>";
  StringWriter stringWriter = new StringWriter();
  Node node = new XmlParser().parseText(xml);
  new XmlNodePrinter(new PrintWriter(stringWriter)).print(node);
  System.out.println(stringWriter.toString());

1

1

对于那些寻求快速且不需要XML 100%有效的解决方案的人 - 例如在REST / SOAP日志记录的情况下(您永远不知道其他人发送了什么;-)

我找到并改进了在线发现的代码片段,我认为这仍然是一种有效的可能方法:

public static String prettyPrintXMLAsString(String xmlString) {
    /* Remove new lines */
    final String LINE_BREAK = "\n";
    xmlString = xmlString.replaceAll(LINE_BREAK, "");
    StringBuffer prettyPrintXml = new StringBuffer();
    /* Group the xml tags */
    Pattern pattern = Pattern.compile("(<[^/][^>]+>)?([^<]*)(</[^>]+>)?(<[^/][^>]+/>)?");
    Matcher matcher = pattern.matcher(xmlString);
    int tabCount = 0;
    while (matcher.find()) {
        String str1 = (null == matcher.group(1) || "null".equals(matcher.group())) ? "" : matcher.group(1);
        String str2 = (null == matcher.group(2) || "null".equals(matcher.group())) ? "" : matcher.group(2);
        String str3 = (null == matcher.group(3) || "null".equals(matcher.group())) ? "" : matcher.group(3);
        String str4 = (null == matcher.group(4) || "null".equals(matcher.group())) ? "" : matcher.group(4);

        if (matcher.group() != null && !matcher.group().trim().equals("")) {
            printTabs(tabCount, prettyPrintXml);
            if (!str1.equals("") && str3.equals("")) {
                ++tabCount;
            }
            if (str1.equals("") && !str3.equals("")) {
                --tabCount;
                prettyPrintXml.deleteCharAt(prettyPrintXml.length() - 1);
            }

            prettyPrintXml.append(str1);
            prettyPrintXml.append(str2);
            prettyPrintXml.append(str3);
            if (!str4.equals("")) {
                prettyPrintXml.append(LINE_BREAK);
                printTabs(tabCount, prettyPrintXml);
                prettyPrintXml.append(str4);
            }
            prettyPrintXml.append(LINE_BREAK);
        }
    }
    return prettyPrintXml.toString();
}

private static void printTabs(int count, StringBuffer stringBuffer) {
    for (int i = 0; i < count; i++) {
        stringBuffer.append("\t");
    }
}

public static void main(String[] args) {
    String x = new String(
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"><soap:Body><soap:Fault><faultcode>soap:Client</faultcode><faultstring>INVALID_MESSAGE</faultstring><detail><ns3:XcbSoapFault xmlns=\"\" xmlns:ns3=\"http://www.someapp.eu/xcb/types/xcb/v1\"><CauseCode>20007</CauseCode><CauseText>INVALID_MESSAGE</CauseText><DebugInfo>Problems creating SAAJ object model</DebugInfo></ns3:XcbSoapFault></detail></soap:Fault></soap:Body></soap:Envelope>");
    System.out.println(prettyPrintXMLAsString(x));
}

这里是输出:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
        <faultcode>soap:Client</faultcode>
        <faultstring>INVALID_MESSAGE</faultstring>
        <detail>
            <ns3:XcbSoapFault xmlns="" xmlns:ns3="http://www.someapp.eu/xcb/types/xcb/v1">
                <CauseCode>20007</CauseCode>
                <CauseText>INVALID_MESSAGE</CauseText>
                <DebugInfo>Problems creating SAAJ object model</DebugInfo>
            </ns3:XcbSoapFault>
        </detail>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>

“快速而不精确”可以很好地描述它。 - Michael Kay

1

Underscore-java有一个静态方法U.formatXml(string)实时示例

import com.github.underscore.U;

public class MyClass {
    public static void main(String args[]) {
        String xml = "<tag><nested>hello</nested></tag>";

        System.out.println(U.formatXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + xml + "</root>"));
    }
}

输出:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <tag>
      <nested>hello</nested>
   </tag>
</root>

这太棒了! - senyor

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接