Best way to compare two XML documents in Java

Question

java xml testing parsing comparison

223

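I'm trying to write an automated test for an application that translates a custom message format into an XML message and sends it out the other end. I have a good set of input/output message pairs, so all I need to do is send the input messages in and wait for the XML message to come out the other end.

When it comes to comparing the actual output with the expected output, I'm running into some problems. My first thought was to do a plain string comparison of the expected and actual messages. That doesn't work well, because the example data we have isn't formatted consistently and the XML namespaces often use different aliases (and sometimes namespaces aren't used at all).

I know I can parse both strings and then walk through and compare them element by element, and that wouldn't be too difficult, but I get the feeling there's a better way or a library I could leverage.

So, boiled down, the question is:

Given two Java Strings that both contain valid XML, how would you determine whether they are semantically equivalent? Bonus points if you have a way to determine what the differences are.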

- Mike Deck

15 Answers

41

The following code checks whether the documents are equal using only the standard JDK libraries.

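import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.junit.Assert;
import org.w3c.dom.Document;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);                    // expose namespace URIs and local names
dbf.setCoalescing(true);                        // merge CDATA sections into adjacent text nodes
dbf.setIgnoringElementContentWhitespace(true);  // per the Javadoc, only effective when the parser validates
dbf.setIgnoringComments(true);                  // leave comments out of the parsed tree
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();
Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();
Assert.assertTrue(doc1.isEqualNode(doc2));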

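The normalizeDocument() call is there to make sure there are no cycles (technically there wouldn't be any).

The above code does require the whitespace inside elements to be the same, though, because it preserves and evaluates it. The standard XML parser that ships with Java does not let you set a feature to produce a canonical version or to understand xml:space. If that is a problem, you may need a replacement XML parser such as Xerces, or use JDOM.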
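One way to work around the whitespace sensitivity with the JDK alone is to trim the text nodes yourself before calling isEqualNode(). What follows is a minimal sketch, not part of the answer above: the class and method names (WhitespaceTolerantCompare, parse, stripWhitespace, equalIgnoringWhitespace) are made up for illustration, and it assumes that leading and trailing whitespace in text content is never significant in your messages.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class WhitespaceTolerantCompare {

    // Parses the XML string and normalizes whitespace in the resulting tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setCoalescing(true);
        dbf.setIgnoringComments(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        doc.normalizeDocument();
        stripWhitespace(doc);
        return doc;
    }

    // Trims every text node and removes the ones that become empty.
    // Assumption: surrounding whitespace never carries meaning.
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE) {
                String trimmed = child.getNodeValue().trim();
                if (trimmed.isEmpty()) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(trimmed);
                }
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    public static boolean equalIgnoringWhitespace(String xml1, String xml2) throws Exception {
        return parse(xml1).isEqualNode(parse(xml2));
    }
}
```

With this sketch, equalIgnoringWhitespace("<root>name</root>", "<root> name </root>") is expected to return true. Element order, attribute values and namespace prefixes are still compared strictly, because the final check is still isEqualNode().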

- Archimedes Trajano

4

This works perfectly for XML files without namespaces or with "normalized" namespace prefixes. I doubt it works if one XML is <ns1:a xmlns:ns1="ns" /> and the other is <ns2:a xmlns:ns2="ns" />. - koppor
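That suspicion matches the DOM Level 3 definition of isEqualNode(), which, as far as I can tell, also requires the prefix to be equal. If prefixes should not matter, one option is to compare namespace URI plus local name yourself. The following is only a sketch under that assumption. PrefixInsensitiveCompare and its methods are hypothetical names. It compares children strictly in document order, does not normalize whitespace, and does not handle prefixes that appear inside text or attribute values (for example xsi:type).

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class PrefixInsensitiveCompare {

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
        dbf.setCoalescing(true);
        dbf.setIgnoringComments(true);
        return dbf.newDocumentBuilder()
                  .parse(new InputSource(new StringReader(xml)))
                  .getDocumentElement();
    }

    // Collects attributes keyed by "{namespaceURI}localName", skipping xmlns declarations.
    static Map<String, String> attributes(Node element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                continue; // a namespace declaration, not data
            }
            result.put("{" + attr.getNamespaceURI() + "}" + attr.getLocalName(), attr.getNodeValue());
        }
        return result;
    }

    static boolean sameNode(Node a, Node b) {
        if (a.getNodeType() != b.getNodeType()) {
            return false;
        }
        if (a.getNodeType() != Node.ELEMENT_NODE) {
            // Text and other non-element nodes: compare by value only.
            return Objects.equals(a.getNodeValue(), b.getNodeValue());
        }
        if (!Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                || !Objects.equals(a.getLocalName(), b.getLocalName())
                || !attributes(a).equals(attributes(b))) {
            return false;
        }
        // Children are compared pairwise, in document order.
        Node ca = a.getFirstChild();
        Node cb = b.getFirstChild();
        while (ca != null && cb != null) {
            if (!sameNode(ca, cb)) {
                return false;
            }
            ca = ca.getNextSibling();
            cb = cb.getNextSibling();
        }
        return ca == null && cb == null;
    }

    public static boolean equalIgnoringPrefixes(String xml1, String xml2) throws Exception {
        return sameNode(parseRoot(xml1), parseRoot(xml2));
    }
}
```

Because attributes are collected into a sorted map, attribute order does not matter either. A different namespace URI behind the same prefix is reported as a difference.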

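dbf.setIgnoringElementContentWhitespace(true) does not give the result I would expect: with this solution <root>name</root> is not equal to <root> name </root> (padded with two spaces), whereas XMLUnit reports them as equal in this case (JDK8). - Miklos Krivan

For me it does not ignore line breaks, which is a problem. - Flyout91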

setIgnoringElementContentWhitespace(false) - Archimedes Trajano

28

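XOM has a Canonicalizer utility that turns your DOMs into a regular form, which you can then stringify and compare. So regardless of whitespace irregularities or attribute ordering, you get regular, predictable comparisons of your documents.

This works especially well in IDEs that have dedicated visual String comparators, such as Eclipse. You get a visual representation of the semantic differences between the documents.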

- skaffman

28

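The latest version of XMLUnit can help with asserting that two XML documents are equal. XMLUnit.setIgnoreWhitespace() and XMLUnit.setIgnoreAttributeOrder() may also be necessary for the case in question.

Below is a simple working example of XMLUnit usage.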

import java.util.List;

import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.XMLUnit;
import org.junit.Assert;

public class TestXml {

    public static void main(String[] args) throws Exception {
        String result = "<abc             attr=\"value1\"                title=\"something\">            </abc>";
        // will be ok
        assertXMLEquals("<abc attr=\"value1\" title=\"something\"></abc>", result);
    }

    public static void assertXMLEquals(String expectedXML, String actualXML) throws Exception {
        XMLUnit.setIgnoreWhitespace(true);
        XMLUnit.setIgnoreAttributeOrder(true);

        DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(expectedXML, actualXML));

        List<?> allDifferences = diff.getAllDifferences();
        Assert.assertEquals("Differences found: "+ diff.toString(), 0, allDifferences.size());
    }

}

If you use Maven, add this to your pom.xml:

<dependency>
    <groupId>xmlunit</groupId>
    <artifactId>xmlunit</artifactId>
    <version>1.4</version>
</dependency>

- acdcjunior

This is perfect for people who need to compare from a static method. - Andy B

1

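XMLUnit.setIgnoreAttributeOrder(true); does not work. If some nodes are in a different order, the comparison fails. - Bevor

You do realize that "IgnoreAttributeOrder" means ignoring attribute order, not node order, right? - acdcjunior

Will this work if the nodes are in different positions? - Abhijit Bashetti

@AbhijitBashetti Hmm... that's bad... Sorry, I have nothing else. - acdcjunior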

10

Building on Tom's answer, here is an example using XMLUnit v2.

It uses these Maven dependencies:

    <dependency>
        <groupId>org.xmlunit</groupId>
        <artifactId>xmlunit-core</artifactId>
        <version>2.0.0</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.xmlunit</groupId>
        <artifactId>xmlunit-matchers</artifactId>
        <version>2.0.0</version>
        <scope>test</scope>
    </dependency>

And here is the test code:

import static org.junit.Assert.assertThat;
import static org.xmlunit.matchers.CompareMatcher.isIdenticalTo;
import org.junit.Test;
import org.xmlunit.builder.Input;
import org.xmlunit.input.WhitespaceStrippedSource;

// Plain JUnit 4 test class; the XMLUnit 1.x base class (XMLTestCase) is not needed with the matchers
public class SomeTest {
    @Test
    public void test() {
        String result = "<root></root>";
        String expected = "<root>  </root>";

        // ignore whitespace differences
        // https://github.com/xmlunit/user-guide/wiki/Providing-Input-to-XMLUnit#whitespacestrippedsource
        assertThat(result, isIdenticalTo(new WhitespaceStrippedSource(Input.from(expected).build())));

        assertThat(result, isIdenticalTo(Input.from(expected).build())); // will fail due to whitespace differences
    }
}

The documentation that outlines this is at https://github.com/xmlunit/xmlunit#comparing-two-documents

- Tom Saleeba

8

Thanks, I extended this. Try this...

import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class XmlDiff 
{
    private boolean nodeTypeDiff = true;
    private boolean nodeValueDiff = true;

    public boolean diff( String xml1, String xml2, List<String> diffs ) throws Exception
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setCoalescing(true);
        dbf.setIgnoringElementContentWhitespace(true);
        dbf.setIgnoringComments(true);
        DocumentBuilder db = dbf.newDocumentBuilder();


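        // Parse from the String's characters; getBytes() would depend on the platform default charset
        Document doc1 = db.parse(new org.xml.sax.InputSource(new java.io.StringReader(xml1)));
        Document doc2 = db.parse(new org.xml.sax.InputSource(new java.io.StringReader(xml2)));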

        doc1.normalizeDocument();
        doc2.normalizeDocument();

        return diff( doc1, doc2, diffs );

    }

    /**
     * Diff 2 nodes and put the diffs in the list 
     */
    public boolean diff( Node node1, Node node2, List<String> diffs ) throws Exception
    {
        if( diffNodeExists( node1, node2, diffs ) )
        {
            return true;
        }

        if( nodeTypeDiff )
        {
            diffNodeType(node1, node2, diffs );
        }

        if( nodeValueDiff )
        {
            diffNodeValue(node1, node2, diffs );
        }


        System.out.println(node1.getNodeName() + "/" + node2.getNodeName());

        diffAttributes( node1, node2, diffs );
        diffNodes( node1, node2, diffs );

        return diffs.size() > 0;
    }

    /**
     * Diff the nodes
     */
    public boolean diffNodes( Node node1, Node node2, List<String> diffs ) throws Exception
    {
        //Index children by name (a repeated name overwrites the earlier child)
        Map<String,Node> children1 = new LinkedHashMap<String,Node>();      
        for( Node child1 = node1.getFirstChild(); child1 != null; child1 = child1.getNextSibling() )
        {
            children1.put( child1.getNodeName(), child1 );
        }

        //Index children by name (a repeated name overwrites the earlier child)
        Map<String,Node> children2 = new LinkedHashMap<String,Node>();      
        for( Node child2 = node2.getFirstChild(); child2!= null; child2 = child2.getNextSibling() )
        {
            children2.put( child2.getNodeName(), child2 );
        }

        //Diff all the children1
        for( Node child1 : children1.values() )
        {
            Node child2 = children2.remove( child1.getNodeName() );
            diff( child1, child2, diffs );
        }

        //Diff all the children2 left over
        for( Node child2 : children2.values() )
        {
            Node child1 = children1.get( child2.getNodeName() );
            diff( child1, child2, diffs );
        }

        return diffs.size() > 0;
    }


    /**
     * Diff the attributes
     */
    public boolean diffAttributes( Node node1, Node node2, List<String> diffs ) throws Exception
    {        
        //Index attributes by name
        NamedNodeMap nodeMap1 = node1.getAttributes();
        Map<String,Node> attributes1 = new LinkedHashMap<String,Node>();        
        for( int index = 0; nodeMap1 != null && index < nodeMap1.getLength(); index++ )
        {
            attributes1.put( nodeMap1.item(index).getNodeName(), nodeMap1.item(index) );
        }

        //Index attributes by name
        NamedNodeMap nodeMap2 = node2.getAttributes();
        Map<String,Node> attributes2 = new LinkedHashMap<String,Node>();        
        for( int index = 0; nodeMap2 != null && index < nodeMap2.getLength(); index++ )
        {
            attributes2.put( nodeMap2.item(index).getNodeName(), nodeMap2.item(index) );

        }

        //Diff all the attributes1
        for( Node attribute1 : attributes1.values() )
        {
            Node attribute2 = attributes2.remove( attribute1.getNodeName() );
            diff( attribute1, attribute2, diffs );
        }

        //Diff all the attributes2 left over
        for( Node attribute2 : attributes2.values() )
        {
            Node attribute1 = attributes1.get( attribute2.getNodeName() );
            diff( attribute1, attribute2, diffs );
        }

        return diffs.size() > 0;
    }
    /**
     * Check that the nodes exist
     */
    public boolean diffNodeExists( Node node1, Node node2, List<String> diffs ) throws Exception
    {
        if( node1 == null && node2 == null )
        {
            // Both missing: nothing to compare, and getPath(null) would throw a NullPointerException
            return true;
        }

        if( node1 == null && node2 != null )
        {
            diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2.getNodeName() );
            return true;
        }

        if( node1 != null && node2 == null )
        {
            diffs.add( getPath(node1) + ":node " + node1.getNodeName() + "!=" + node2 );
            return true;
        }

        return false;
    }

    /**
     * Diff the Node Type
     */
    public boolean diffNodeType( Node node1, Node node2, List<String> diffs ) throws Exception
    {       
        if( node1.getNodeType() != node2.getNodeType() ) 
        {
            diffs.add( getPath(node1) + ":type " + node1.getNodeType() + "!=" + node2.getNodeType() );
            return true;
        }

        return false;
    }

    /**
     * Diff the Node Value
     */
    public boolean diffNodeValue( Node node1, Node node2, List<String> diffs ) throws Exception
    {       
        if( node1.getNodeValue() == null && node2.getNodeValue() == null )
        {
            return false;
        }

        if( node1.getNodeValue() == null && node2.getNodeValue() != null )
        {
            diffs.add( getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
            return true;
        }

        if( node1.getNodeValue() != null && node2.getNodeValue() == null )
        {
            diffs.add( getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
            return true;
        }

        if( !node1.getNodeValue().equals( node2.getNodeValue() ) )
        {
            diffs.add( getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
            return true;
        }

        return false;
    }


    /**
     * Get the node path
     */
    public String getPath( Node node )
    {
        StringBuilder path = new StringBuilder();

        do
        {           
            path.insert(0, node.getNodeName() );
            path.insert( 0, "/" );
        }
        while( ( node = node.getParentNode() ) != null );

        return path.toString();
    }
}

- Javelin

3

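A bit late, but I wanted to point out that this code has a bug: in diffNodes(), node2 was not referenced, and the second loop incorrectly reused node1 (I edited the code to fix this). It also has one limitation: because of the way the child maps are keyed, this diff does not support element names that are not unique, i.e. elements containing repeatable child elements. - aberrant80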
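One way around the repeated-element limitation is to pair children up by position instead of by name. The sketch below is an assumption-laden illustration, not a fix to the class above: RepeatedChildDiff and its helpers are hypothetical names, attributes are left out for brevity, and children must appear in the same order in both documents.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class RepeatedChildDiff {

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder()
                  .parse(new InputSource(new StringReader(xml)))
                  .getDocumentElement();
    }

    // Element children of a node, in document order (repeated names are all kept).
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) c);
            }
        }
        return result;
    }

    // Trimmed text directly inside the element (text of descendants is not included).
    static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE || c.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    static void diff(Element e1, Element e2, String path, List<String> diffs) {
        if (!e1.getNodeName().equals(e2.getNodeName())) {
            diffs.add(path + ": name " + e1.getNodeName() + " != " + e2.getNodeName());
            return;
        }
        if (!ownText(e1).equals(ownText(e2))) {
            diffs.add(path + ": text " + ownText(e1) + " != " + ownText(e2));
        }
        List<Element> c1 = elementChildren(e1);
        List<Element> c2 = elementChildren(e2);
        if (c1.size() != c2.size()) {
            diffs.add(path + ": child count " + c1.size() + " != " + c2.size());
        }
        // Pair children by position; the index is the position among element children.
        int common = Math.min(c1.size(), c2.size());
        for (int i = 0; i < common; i++) {
            String childPath = path + "/" + c1.get(i).getNodeName() + "[" + (i + 1) + "]";
            diff(c1.get(i), c2.get(i), childPath, diffs);
        }
    }

    public static List<String> diff(String xml1, String xml2) throws Exception {
        List<String> diffs = new ArrayList<>();
        Element root1 = parseRoot(xml1);
        diff(root1, parseRoot(xml2), "/" + root1.getNodeName(), diffs);
        return diffs;
    }
}
```

For example, diffing <list><item>1</item><item>2</item></list> against <list><item>1</item><item>3</item></list> yields the single entry "/list/item[2]: text 2 != 3". If the order of repeated elements is not significant in your data, positional pairing is too strict and a matching step would be needed instead.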

7

AssertJ 1.4+ has specific assertions for comparing XML content:

String expectedXml = "<foo />";
String actualXml = "<bar />";
assertThat(actualXml).isXmlEqualTo(expectedXml);

Here is the documentation.

- Gian Marco

However, a trivial namespace prefix difference between two documents makes AssertJ fail. AssertJ is a great tool, but this job is really one for XMLUnit. - Alexander Vasiljev

4

The following code works for me:

String xml1 = ...
String xml2 = ...
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreAttributeOrder(true);
XMLAssert.assertXMLEqual(xml1, xml2);

- arunkumar sambu

3

Any context? A reference to the library? - Ben

3

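skaffman seems to be giving a good answer.

Another way is probably to format both XML strings with a command-line utility such as xmlstarlet (http://xmlstar.sourceforge.net/) and then use any diff utility (or library) to compare the resulting output. I don't know whether this is a good solution when the issues involve namespaces.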
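The same format-then-diff idea can be approximated inside Java with the JDK's own Transformer instead of an external tool. This is a rough sketch, not xmlstarlet's behavior: XmlPrettyPrinter, dropBlankText and format are hypothetical names, it only normalizes indentation (whitespace-only text nodes), and it makes no attempt to normalize namespace prefixes or attribute order.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class XmlPrettyPrinter {

    // Removes text nodes that contain only whitespace (typically indentation).
    static void dropBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                dropBlankText(child);
            }
            child = next;
        }
    }

    // Re-serializes the XML with the serializer's own indentation.
    public static String format(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        dropBlankText(doc);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Two documents that differ only in indentation should then produce the same string from format(), and any text diff tool or IDE comparison view can be pointed at the two outputs when they differ. The exact layout of the output depends on the JDK's serializer, so it is best treated as an intermediate form rather than a stable format.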

- anjanb

3

I'm using Altova DiffDog, which has options to compare XML files structurally (ignoring string data).

This means that (with the "ignore text" option checked):

<foo a="xxx" b="xxx">xxx</foo>

and

<foo b="yyy" a="yyy">yyy</foo>

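are equal in the sense that they are structurally identical. This is handy if you have example files that differ in data but not in structure!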

- Pimin Konstantin Kefaloukos

4

The only downside is that it is not free (a pro license costs €99), though there is a 30-day trial. - Pimin Konstantin Kefaloukos

2

I have only found the utility (http://www.altova.com/diffdog/diff-merge-tool.html); it would be nicer to have a library. - dma_k

- Tom · Accepted Answer

211

Sounds like a job for XMLUnit

Example:

public class SomeTest extends XMLTestCase {
  @Test
  public void test() {
    String xml1 = ...
    String xml2 = ...

    XMLUnit.setIgnoreWhitespace(true); // ignore whitespace differences

    // can also compare xml Documents, InputSources, Readers, Diffs
    assertXMLEqual(xml1, xml2);  // assertXMLEqual comes from XMLTestCase
  }
}

- Tom

1

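I've had problems with XMLUnit in the past: it was hyper-sensitive to XML API versions and didn't prove reliable. It's been a while since I dropped it in favor of XOM, though, so maybe it has improved since. - skaffman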

68

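For beginners with XMLUnit: note that, by default, myDiff.similar() returns false if the control and test documents differ in indentation or line breaks. I expected this behavior from myDiff.identical(), not from myDiff.similar(). Call XMLUnit.setIgnoreWhitespace(true) in your setUp method to change the behavior for all tests in the test class, or call it in an individual test method to change the behavior for that test only. - Stew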

1

@Stew Thanks for your comment. I'm just starting out with XMLUnit and would certainly have run into this problem. +1 - Jay

3

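If you are trying this with XMLUnit 2 from GitHub, note that version 2 is a complete rewrite, so this example applies to XMLUnit 1 on SourceForge. Also, the SourceForge page states: "XMLUnit for Java 1.x will still be maintained". - Yngvar Kristiansen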

1

The method is assertXMLEqual, which comes from XMLAssert.java. - user2818782