How do I ignore comments when reading an XML file into an XmlDocument?

26

I am trying to read an XML document with C#, and this is how I do it:

XmlDocument myData = new XmlDocument();
myData.Load("datafile.xml");

However, when I read XmlNode.ChildNodes, I sometimes get comment nodes.

For the benefit of anyone with the same requirement, here is the approach I ended up with:

/// <summary>Validate a file and return an XmlDocument without comments.</summary>
private XmlDocument LoadAndValidate(string fileName)
{
    // Create XML reader settings
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;                // Exclude comments
    settings.ProhibitDtd = false;                  // Allow DTDs (obsolete since .NET 4.0, where DtdProcessing = DtdProcessing.Parse replaces it)
    settings.ValidationType = ValidationType.DTD;  // Validate against the DTD

    // Create a reader based on the settings and dispose it when done
    using (XmlReader reader = XmlReader.Create(fileName, settings))
    {
        try {
            // Load throws XmlSchemaException if the document is invalid
            XmlDocument document = new XmlDocument();
            document.Load(reader);
            return document;
        } catch (XmlSchemaException) {
            return null;
        }
    }
}
Thank you,
Tommaso
6 Answers

45
You can use an XmlReader with XmlReaderSettings.IgnoreComments set to true:
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.IgnoreComments = true;
using (XmlReader reader = XmlReader.Create("input.xml", readerSettings))
{
    XmlDocument myData = new XmlDocument();
    myData.Load(reader);
    // etc...
}

(Found by searching here for "XmlDocument ignore comments")
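
To see the effect without a file on disk, here is a minimal sketch that parses an in-memory string once with default settings and once with IgnoreComments. The sample XML, the class name IgnoreCommentsDemo, and the helper LoadWithoutComments are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.IO;
using System.Xml;

static class IgnoreCommentsDemo
{
    // Hypothetical helper: parse an XML string and skip comment nodes while reading.
    public static XmlDocument LoadWithoutComments(string xml)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true };
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            var document = new XmlDocument();
            document.Load(reader);
            return document;
        }
    }

    static void Main()
    {
        // Illustrative input: two comments and two elements directly under the root.
        const string xml =
            "<root><!-- note --><item>a</item><!-- another --><item>b</item></root>";

        var plain = new XmlDocument();
        plain.LoadXml(xml);                          // default load keeps the comments
        var filtered = LoadWithoutComments(xml);     // reader settings drop them

        Console.WriteLine(plain.DocumentElement.ChildNodes.Count);    // 4
        Console.WriteLine(filtered.DocumentElement.ChildNodes.Count); // 2
    }
}
```

Because the comments are dropped by the reader, they never become nodes in the document, so code that later walks ChildNodes needs no extra filtering.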


This excludes <!-- --> comments, but /* */ also passes the W3C validator as valid XML (as ordinary text rather than as a comment)... - bkovacic

26
foreach(XmlNode node in nodeList)
  if(node.NodeType != XmlNodeType.Comment)
     ...

3
For readability, you could pull the filter out of the loop body: foreach(XmlNode node in nodeList.Cast<XmlNode>().Where(n => n.NodeType != XmlNodeType.Comment)) ... - Curt Nichols

5
You can apply a filter directly to ChildNodes. For example:
var children = myNode.ChildNodes.Cast<XmlNode>().Where(n => n.NodeType != XmlNodeType.Comment);

Alternatively, you can load the XmlDocument from an XmlReader whose settings have XmlReaderSettings.IgnoreComments set to true.

using (var file = File.OpenRead("datafile.xml"))
{
    var settings = new XmlReaderSettings() { IgnoreComments = true, IgnoreWhitespace = true };
    using (var xmlReader = XmlReader.Create(file, settings))
    {
        var document = new XmlDocument();
        document.Load(xmlReader);

        // Process document nodes...
    }
}

4

Use XmlReaderSettings:

XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.IgnoreComments = true;
XmlReader reader = XmlReader.Create(sourceFilePath, readerSettings);
XmlDocument myXmlDoc = new XmlDocument();
myXmlDoc.Load(reader);

2

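If you want to work with the XmlDocument directly instead of configuring an XmlReader, it is better to refer to child nodes by name or to use XPath.

That way you do not have to worry about added comments, other nodes, or a changed order.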

XmlDocument myData = new XmlDocument();
myData.Load("datafile.xml");

XmlNode DocNode = myData.DocumentElement;

XmlNode Child = DocNode ["SomeChildNode"];

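This selects "SomeChildNode", a child element of the root element.
The next example loops over all books in books.xml and prints each author. It uses the string indexer and XPath, so comments and the like should not affect it.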
XmlDocument myData = new XmlDocument();
myData.Load("books.xml");

XmlNode DocNode = myData.DocumentElement;

XmlNodeList BookNodeList = DocNode.SelectNodes("./book");

foreach (XmlNode Book in BookNodeList)
{
    Console.WriteLine(Book["author"].InnerText);
}

Note that with XPath you can just as easily search for all book elements anywhere in the document, for example with ".//book".
books.xml:
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>
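
As a rough check that selecting by name or XPath is unaffected by comments, the sketch below runs "./book" and ".//book" against a small in-memory catalog with comments sprinkled in. The inline XML (including the extra shelf element), the class name XPathDemo, and the helper Authors are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XPathDemo
{
    // Hypothetical helper: return the author text of every book matched by the XPath,
    // evaluated relative to the root element.
    public static List<string> Authors(XmlDocument doc, string xpath)
    {
        var authors = new List<string>();
        foreach (XmlNode book in doc.DocumentElement.SelectNodes(xpath))
        {
            // The string indexer returns the first child element with that name,
            // so a comment placed before <author> is simply skipped.
            authors.Add(book["author"].InnerText);
        }
        return authors;
    }

    static void Main()
    {
        // Illustrative data: one book directly under the root, one nested deeper.
        const string xml =
            "<catalog><!-- first -->" +
            "<book id=\"bk101\"><!-- inside --><author>Gambardella, Matthew</author></book>" +
            "<shelf><book id=\"bk102\"><author>Ralls, Kim</author></book></shelf>" +
            "</catalog>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);

        Console.WriteLine(Authors(doc, "./book").Count);   // 1: direct children only
        Console.WriteLine(Authors(doc, ".//book").Count);  // 2: books at any depth
    }
}
```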

References:

XmlNode.Item Property (String): http://msdn.microsoft.com/zh-cn/library/sss31aas.aspx
XmlNode.SelectNodes Method (String): http://msdn.microsoft.com/zh-cn/library/hcebdtae.aspx
XmlNode.SelectSingleNode Method (String): http://msdn.microsoft.com/zh-cn/library/fb63z0tw.aspx


2
Dim pattern As String = String.Empty
Dim xDoc As XmlDocument = New XmlDocument()

xDoc.Load(path)

' Pattern that matches XML comments
pattern = "(<!--.*?--\>)"
xDoc.InnerXml = Regex.Replace(xDoc.InnerXml, pattern, String.Empty, RegexOptions.Singleline)

<!-- After this, run your code -->
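
The regular expression above works on the serialized text, so it would also strip anything that merely looks like a comment (for example inside a CDATA section), and assigning InnerXml re-parses the whole document. A possible alternative, sketched here in C# rather than VB and not taken from the answer, is to select the comment nodes with the XPath expression "//comment()" and remove them from the tree. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml;

static class RemoveCommentsDemo
{
    // Hypothetical helper: delete every comment node from an already loaded document.
    public static void RemoveComments(XmlDocument doc)
    {
        // Copy the matches into a list first so the document is not modified
        // while the XPath result is still being enumerated.
        var comments = doc.SelectNodes("//comment()").Cast<XmlNode>().ToList();
        foreach (XmlNode comment in comments)
        {
            comment.ParentNode.RemoveChild(comment);
        }
    }

    static void Main()
    {
        // Illustrative input with comments at the top level and inside elements.
        var doc = new XmlDocument();
        doc.LoadXml("<!-- top --><root><!-- a --><item>x<!-- b --></item></root>");

        RemoveComments(doc);

        Console.WriteLine(doc.OuterXml); // <root><item>x</item></root>
    }
}
```

This is only useful when the document has already been loaded with comments; when you control the loading step, setting IgnoreComments on the reader avoids the extra pass.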
