Can I split an HTML document at specific tags using HtmlAgilityPack?

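For example, I have a bunch of <tr> tags that I need to collect. I need to split each one out into its own element so that it is easier to parse.
Is this possible?
Here is some sample markup: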
<tr class="first-in-year">
  <td class="year">2011</td>

  <td class="img"><a href="/battlefield-3/61-27006/"><img src=
  "http://media.giantbomb.com/uploads/6/63038/1700748-bf3_thumb.jpg" alt=""></a></td>

  <td class="title">
    <a href="/battlefield-3/61-27006/">Battlefield 3</a>

    <p class="deck">Battlefield 3 is DICE's next installment in the franchise and
    will be on PC, PS3 and Xbox 360. The game will feature jets, prone, a
    single-player and co-op campaign, and 64-player multiplayer (on PC). It's due out
    in Fall of 2011.</p>
  </td>

  <td class="date">Expected: Q4 2011</td>

  <td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/xbox-360/60-20/" class=
  "X360">X360</a>, <a href="/playstation-3/60-35/" class="PS3">PS3</a></td>
</tr>

<tr>
  <td class="year"></td>

  <td class="img"><a href="/forza-motorsport-4/61-33400/"><img src=
  "http://media.giantbomb.com/uploads/0/1992/1654849-forza4_thumb.jpg" alt=
  ""></a></td>

  <td class="title">
    <a href="/forza-motorsport-4/61-33400/">Forza Motorsport 4</a>

    <p class="deck">The next installment of Turn 10's racing franchise slated for
    release in Fall 2011. It is set to feature 16 player online races, dynamic race
    conditions, cars from over 80 manufacturers, and compatibility with Kinect, both
    on and off the racetrack.</p>
  </td>

  <td class="date">Expected: Oct 2011</td>

  <td><a href="/xbox-360/60-20/" class="X360">X360</a></td>
</tr>

<tr>
  <td class="year"></td>

  <td class="img"><a href="/max-payne-3/61-23398/"><img src=
  "http://media.giantbomb.com/uploads/0/1400/938434-custom_1237811317319_mp3_poster_thumb.jpg"
  alt=""></a></td>

  <td class="title">
    <a href="/max-payne-3/61-23398/">Max Payne 3</a>

    <p class="deck">The long awaited third instalment in Remedy's beloved series, in
    which an aging Max Payne faces one final chance to redeem himself.</p>
  </td>

  <td class="date">Expected: 2011</td>

  <td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/playstation-3/60-35/" class=
  "PS3">PS3</a>, <a href="/xbox-360/60-20/" class="X360">X360</a></td>
</tr>

So, for this example, I would end up with three elements. :)
1 Answer

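If you mean splitting it into multiple HTML documents, you can't do that at a tag. You can, however, select the individual TD elements and parse them separately.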

The XPath selector //td selects every td element in the document; you can pass the resulting nodes to your parsing method.

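// LoadHtmlHowever() is a placeholder for however you load the document.
HtmlAgilityPack.HtmlDocument doc = LoadHtmlHowever();
// SelectNodes returns null when nothing matches, so check before iterating.
HtmlAgilityPack.HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//td");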
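
Since the question asks for one element per <tr> rather than per <td>, the same approach can be applied one level up. The following is only a sketch under a few assumptions: the class name RowSplitter and the methods SplitRows and GetTitle are hypothetical names made up for illustration, and the XPath for the title assumes the markup looks like the sample in the question. Note that //tr would also match rows of nested tables, if there are any.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RowSplitter
{
    // Hypothetical helper: returns every <tr> node in the input as its own element.
    public static List<HtmlNode> SplitRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<HtmlNode>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection trNodes = doc.DocumentNode.SelectNodes("//tr");
        if (trNodes == null)
            return rows;

        foreach (HtmlNode tr in trNodes)
            rows.Add(tr);
        return rows;
    }

    // Hypothetical helper: reads the link text inside <td class="title"> of one row.
    // The leading "." keeps the XPath relative to the row; a bare "//" would
    // search the whole document again.
    public static string GetTitle(HtmlNode row)
    {
        HtmlNode link = row.SelectSingleNode(".//td[@class='title']/a");
        return link == null ? null : HtmlEntity.DeEntitize(link.InnerText).Trim();
    }
}
```

Each returned HtmlNode can be parsed on its own, and row.OuterHtml gives the markup of a single row as a string if separate fragments are really needed.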
