Getting HTML table data in C# with HtmlAgilityPack

I'm trying to use HtmlAgilityPack to parse an HTML table and extract information from it. Here is a sample of the HTML:
...
...
...
<tbody>
                    <tr>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_18">AA00857</div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div></div>
                            <div class="style_20">TPRCF</div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_21"></div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_21">16908/2</div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_18">&nbsp;ETG_C</div>
                        </td>
                    </tr>
                    <tr>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_18">AA01231</div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div></div>
                            <div class="style_20">TPRCF</div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_21"></div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_21">16909/19</div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_18">&nbsp;ETG_C</div>
                        </td>
                    </tr>
                    <tr>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_18">AA01233</div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div></div>
                            <div class="style_20">TPRCF</div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_21"></div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_21">16907/7</div>
                        </td>
                        <td class="style_19" style="vertical-align: baseline;">
                            <div class="style_18">&nbsp;ETG_C</div>
                        </td>
                    </tr>
...
...

I need to extract the following values from the above:
AA00857, TPRCF, 16908/2, ETG_C

So far, I only have this:
HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument htmlDoc = hw.Load(@"http://www.some123123site.com/index");

if (htmlDoc.DocumentNode != null)
{
    HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//tbody");

    if (bodyNode != null)
    {
        // Do something with bodyNode
    }
}

Please help!

1 Answer

Try this:

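// Requires: using System; using HtmlAgilityPack;
HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument htmlDoc = hw.Load(@"http://www.some123123site.com/index");
if (htmlDoc.DocumentNode != null)
{
    // SelectNodes returns null when nothing matches, so guard before iterating.
    HtmlNodeCollection textNodes = htmlDoc.DocumentNode.SelectNodes("//tr/td/div/text()");
    if (textNodes != null)
    {
        foreach (HtmlNode text in textNodes)
        {
            // Each text node holds one cell value, e.g. "AA00857".
            Console.WriteLine(text.InnerText);
        }
    }
}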
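
The loop above prints each value on its own line and leaves entities such as &nbsp; undecoded. A minimal sketch of grouping the values by row instead, assuming the markup matches the sample in the question. The TableExtractor class and ExtractRows method are hypothetical names, and the HTML is passed in as a string so the sketch can be tried without the real site:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableExtractor
{
    // Hypothetical helper: returns one comma-separated line per <tr>.
    public static List<string> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection divs = row.SelectNodes("./td/div");
            if (divs == null)
            {
                continue;
            }

            List<string> values = divs
                // Decode entities such as &nbsp;, then trim the resulting whitespace.
                .Select(div => HtmlEntity.DeEntitize(div.InnerText).Replace('\u00A0', ' ').Trim())
                // Drop the empty placeholder <div> elements.
                .Where(value => value.Length > 0)
                .ToList();

            // Skip rows that contain no text at all.
            if (values.Count > 0)
            {
                result.Add(string.Join(", ", values));
            }
        }

        return result;
    }
}
```

For the first sample row this should yield the line AA00857, TPRCF, 16908/2, ETG_C. With the live page, the string passed to ExtractRows could be htmlDoc.DocumentNode.OuterHtml from the code above.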

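Error 1: 'HtmlAgilityPack.HtmlDocument' does not contain a definition for 'DocumentElement' and no extension method 'DocumentElement' accepting a first argument of type 'HtmlAgilityPack.HtmlDocument' could be found. - JOE SKEET
@cybernate Thank you. For some reason it doesn't like this line: HtmlAgilityPack.HtmlDocument htmlDoc = hw.Load(@"http://www.some123123site.com/index"); When I run it, it tries to save a file. - JOE SKEET
I tested it with a URL on my localhost and could see the results. Are you using the same code, or did you modify it? - Chandu
@cybernate: That's my problem. The URL I'm trying to open is restricted; I first need to log in on a different page. What should I do? - JOE SKEET
@spark, do you know how to solve this? - JOE SKEET
Is this Windows authentication or custom authentication? - Chandu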
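
If the site turns out to use Windows (integrated) authentication, one possible approach is to download the page with an HttpClient that sends the current user's credentials and then pass the markup to HtmlDocument.LoadHtml. This is only a sketch under that assumption: the AuthenticatedLoader class and its method names are hypothetical, and a site with a custom login form would instead need the login request and its cookies replayed.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class AuthenticatedLoader
{
    // Hypothetical helper: builds a handler that sends the
    // logged-in Windows user's credentials with each request.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler { UseDefaultCredentials = true };
    }

    // Downloads the page as a string, then parses it with HtmlAgilityPack.
    public static async Task<HtmlDocument> LoadAsync(string url)
    {
        using (var handler = CreateHandler())
        using (var client = new HttpClient(handler))
        {
            string html = await client.GetStringAsync(url);

            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            return doc;
        }
    }
}
```

From an async method this could be called as HtmlDocument htmlDoc = await AuthenticatedLoader.LoadAsync(@"http://www.some123123site.com/index"); and the resulting document queried with SelectNodes as in the answer.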
