How to access child nodes from a node in HTML Agility Pack

21
<html>
    <body>
        <div class="main">
            <div class="submain"><h2></h2><p></p><ul></ul>
            </div>
            <div class="submain"><h2></h2><p></p><ul></ul>
            </div>
        </div>
    </body>
</html>
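I loaded this HTML into an HtmlDocument object and selected the submain divs with XPath. But I don't know how to access each tag inside them individually, for example h2 and p.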
HtmlAgilityPack.HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class=\"submain\"]");
foreach (HtmlAgilityPack.HtmlNode node in nodes) {}
If I use node.InnerText, I get all of the text at once, and InnerHtml doesn't help either. How do I select the individual tags?
3 Answers

44
The following may help:
HtmlAgilityPack.HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class=\"submain\"]");
foreach (HtmlAgilityPack.HtmlNode node in nodes) {
    // To reach the <h2> or <p> inside each div, query relative to the current node:
    HtmlAgilityPack.HtmlNode h2Node = node.SelectSingleNode("./h2"); // first direct-child <h2>, or null if there is none
    HtmlAgilityPack.HtmlNodeCollection allH2Nodes = node.SelectNodes(".//h2"); // every <h2> at any depth, or null if there are none

    // You can also look at the children without XPath (like walking a tree):
    HtmlAgilityPack.HtmlNode firstH2Child = node.ChildNodes["h2"];
}

Thanks for the node.SelectSingleNode("./h2") syntax - Cyclion

6
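You are looking for Descendants.
// Requires "using System.Linq;".
// GetAttributeValue avoids a NullReferenceException on nodes that have no class attribute.
var firstSubmainText = doc
   .DocumentNode
   .Descendants()
   .Where(n => n.GetAttributeValue("class", "") == "submain")
   .First()
   .InnerText;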
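
To get at the individual tags rather than the combined text, the same Descendants idea can be applied a second time, relative to each submain node. The following is only a minimal sketch: the sample text inside the tags is invented for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class DescendantsExample
{
    static void Main()
    {
        // Markup modeled on the question; the text content is a made-up placeholder.
        const string html = @"<html><body><div class=""main"">
            <div class=""submain""><h2>Title 1</h2><p>Text 1</p><ul></ul></div>
            <div class=""submain""><h2>Title 2</h2><p>Text 2</p><ul></ul></div>
        </div></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Find every <div> whose class attribute is exactly "submain".
        // GetAttributeValue returns the default ("") when the attribute is missing.
        var submains = doc.DocumentNode
            .Descendants("div")
            .Where(n => n.GetAttributeValue("class", "") == "submain");

        foreach (var submain in submains)
        {
            // Descendants(name) searches only below the current node.
            HtmlNode h2 = submain.Descendants("h2").FirstOrDefault();
            HtmlNode p = submain.Descendants("p").FirstOrDefault();

            // Either node may be null if the tag is absent, hence the ?. operator.
            Console.WriteLine($"h2: {h2?.InnerText}, p: {p?.InnerText}");
        }
    }
}
```

FirstOrDefault is used instead of First so that a div without an h2 or p yields null rather than throwing.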

2

As far as I remember, every node has its own ChildNodes collection, so inside your foreach block you should be able to inspect node.ChildNodes.
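
A rough sketch of what walking ChildNodes could look like is shown below. The sample text is invented, and the check on NodeType is an addition here: ChildNodes may also contain whitespace text nodes between tags, which this loop skips.

```csharp
using System;
using HtmlAgilityPack;

class ChildNodesExample
{
    static void Main()
    {
        // Markup modeled on the question; the text content is a made-up placeholder.
        const string html = @"<div class=""main"">
            <div class=""submain""><h2>Title 1</h2><p>Text 1</p><ul></ul>
            </div>
        </div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='submain']");
        if (nodes == null) return;

        foreach (HtmlNode node in nodes)
        {
            foreach (HtmlNode child in node.ChildNodes)
            {
                // Skip text and comment nodes; keep only element children.
                if (child.NodeType != HtmlNodeType.Element) continue;

                Console.WriteLine($"{child.Name}: {child.InnerText}");
            }
        }
    }
}
```

Unlike the XPath and Descendants approaches, this visits only direct children, so tags nested more deeply would need a recursive walk.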

