C# HtmlAgilityPack: removing <img> nodes

3

I am new to HtmlAgilityPack. I have the following HTML document:

<a href="https://twitter.com/RedGiantNews" target="_blank"><img 
src="http://image.e.redgiant.com/lib/998.png" width="24" border="0" 
alt="Twitter" title="Twitter" class="smImage"></a><a 
href="https://www.facebook.com/RedGiantSoftware" target="_blank"><img 
src="http://image.e.redgiant.com/lib/db5.png" width="24" border="0" 
alt="Facebook" title="Facebook" class="smImage"></a>
http://click.e.redgiant.com/?qs=d2ad061f
<a href="https://www.instagram.com/redgiantnews/" target="_blank"><img 
src="http://image.e.redgiant.com/aa10-f8747e56f06d.png" width="24" 
border="0" alt="Instagram" title="Instagram" class="smImage"></a>

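I am trying to remove all images from the HTML file, that is, delete every <img ...> node. I tried the following code from another StackOverflow solution, but it fails: it returns the same HTML as above.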

var sb = new StringBuilder();
doc.LoadHtml(inputHTml);

foreach (var node in doc.DocumentNode.ChildNodes)
{
 if (node.Name != "img" && node.Name!="a")
  {
    sb.Append(node.InnerHtml);
  }
}
1 Answer

4
static string OutputHtml = @"<a href=""https://twitter.com/RedGiantNews"" target=""_blank""><img 
                                    src=""http://image.e.redgiant.com/lib/998.png"" width=""24"" border=""0"" 
                                    alt=""Twitter"" title=""Twitter"" class=""smImage""></a><a
                                    href = ""https://www.facebook.com/RedGiantSoftware"" target=""_blank""><img
                                    src = ""http://image.e.redgiant.com/lib/db5.png"" width=""24"" border=""0"" 
                                    alt=""Facebook"" title=""Facebook"" class=""smImage""></a>
                                    <a href = ""https://www.instagram.com/redgiantnews/"" target=""_blank""><img
                                    src = ""http://image.e.redgiant.com/aa10-f8747e56f06d.png"" width=""24"" 
                                    border=""0"" alt=""Instagram"" title=""Instagram"" class=""smImage""></a>";

I removed the stray link (http://click.e.redgiant.com/?qs=d2ad061f) from the original HTML.
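A note on why the loop in the question never reaches the images: DocumentNode.ChildNodes only lists the direct children of the document, and in this HTML every <img> sits inside an <a>. The following is a minimal sketch that makes the difference visible; the class name ChildNodesDemo and the method CountImages are made up for illustration and are not part of HtmlAgilityPack.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class ChildNodesDemo
{
    // Counts <img> elements among the top-level nodes and at any depth.
    public static (int TopLevel, int AnyDepth) CountImages(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // ChildNodes: direct children of the document only.
        int topLevel = doc.DocumentNode.ChildNodes.Count(n => n.Name == "img");

        // Descendants: walks the whole tree, so nested <img> nodes are found.
        int anyDepth = doc.DocumentNode.Descendants("img").Count();

        return (topLevel, anyDepth);
    }
}
```

For markup shaped like the question's, where each image is wrapped in a link, the first count should be zero while the second matches the number of images, which is why the methods below search the whole tree instead.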
Method 1:
public static string RemoveAllImageNodes(string html)
    {
        HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = document.DocumentNode.SelectNodes("//img");
        if (nodes == null)
        {
            return html;
        }

        foreach (var node in nodes)
        {
            node.Remove();
            //node.Attributes.Remove("src"); //Use this instead to remove only the src attribute from each <img> tag
        }

        return document.DocumentNode.OuterHtml;
    }

Method 2:

public static string RemoveAllImageNodes(string html)
    {
        HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
        document.LoadHtml(html);

        // Case-insensitive check so that <IMG ...> is not skipped.
        if (document.DocumentNode.InnerHtml.IndexOf("<img", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            // SelectNodes returns null when nothing matches, e.g. if "<img" only appears in text.
            var nodes = document.DocumentNode.SelectNodes("//img");
            if (nodes != null)
            {
                foreach (var eachNode in nodes)
                {
                    eachNode.Remove();
                    //eachNode.Attributes.Remove("src"); //Use this instead to remove only the src attribute from each <img> tag
                }
            }
        }

        return document.DocumentNode.OuterHtml;
    }

Output HTML:

<a href="https://twitter.com/RedGiantNews" target="_blank"></a>
<a href="https://www.facebook.com/RedGiantSoftware" target="_blank"></a>
<a href="https://www.instagram.com/redgiantnews/" target="_blank"></a>

HTML output after removing only the "src" attribute from the <img> tags:

<a href="https://twitter.com/RedGiantNews" target="_blank"><img width="24" border="0" alt="Twitter" title="Twitter" class="smImage"></a>
<a href="https://www.facebook.com/RedGiantSoftware" target="_blank"><img width="24" border="0" alt="Facebook" title="Facebook" class="smImage"></a>
<a href="https://www.instagram.com/redgiantnews/" target="_blank"><img width="24" border="0" alt="Instagram" title="Instagram" class="smImage"></a>
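The two outputs above (whole node removed versus only "src" removed) can also be produced by one helper. This is a rough sketch rather than a drop-in replacement: the class name ImageStripper, the method StripImages and the srcOnly flag are my own names, and it uses Descendants("img") in place of the XPath query, which yields an empty sequence rather than null when the document has no images.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class ImageStripper
{
    // srcOnly = false: remove every <img> node.
    // srcOnly = true:  keep the <img> nodes but drop their src attribute.
    public static string StripImages(string html, bool srcOnly = false)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // ToList() takes a snapshot first, because the loop modifies the tree.
        var images = document.DocumentNode.Descendants("img").ToList();

        foreach (var img in images)
        {
            if (srcOnly)
            {
                img.Attributes.Remove("src");
            }
            else
            {
                img.Remove();
            }
        }

        return document.DocumentNode.OuterHtml;
    }
}
```

Calling StripImages(html) corresponds to the first output, and StripImages(html, srcOnly: true) to the second.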

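I tried your code... it works... but the problem is that my HTML is much larger than what I posted..... your code removed some of the <img> tags, but not all of them... - user8697090
Thanks, it helped me a lot :) - user8697090
Hey, can I remove only the "src" from the <img> nodes? - user8697090
@nabil3342. Of course. In each method, inside the "foreach" loop, just remove the "src" attribute instead of the whole node. I have updated the answer with the code. - Kishore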
