我能使用Html Agility Pack使输出呈现出漂亮的缩进,同时去除不必要的空白吗?
我能使用Html Agility Pack使输出呈现出漂亮的缩进,同时去除不必要的空白吗?
HAP无法给您想要的结果。
尝试使用类似于这里找到的HtmlTidy的.net包装器。
using System;
using System.IO;
using System.Net;
using Mark.Tidy;
namespace CleanupHtml
{
/// <summary>
/// http://markbeaton.com/SoftwareInfo.aspx?ID=81a0ecd0-c41c-48da-8a39-f10c8aa3f931
/// </summary>
internal class Program
{
private static void Main(string[] args)
{
string html =
new WebClient().DownloadString(
"https://dev59.com/kHE85IYBdhLWcg3w1HGF#2610903");
using (Document doc = new Document(html))
{
doc.ShowWarnings = false;
doc.Quiet = true;
doc.OutputXhtml = true;
doc.OutputXml = true;
doc.IndentBlockElements = AutoBool.Yes;
doc.IndentAttributes = false;
doc.IndentCdata = true;
doc.AddVerticalSpace = false;
doc.WrapAt = 120;
doc.CleanAndRepair();
string output = doc.Save();
Console.WriteLine(output);
File.WriteAllText("output.htm", output);
}
}
}
}
结果:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy for Windows (vers 14 October 2008), see www.w3.org" />
<title>
Html Agility Pack: make code look neat - Stack Overflow
</title>
<link rel="stylesheet" href="http://sstatic.net/so/all.css?v=6638" type="text/css" />
<link rel="shortcut icon" href="http://sstatic.net/so/favicon.ico" />
<link rel="apple-touch-icon" href="http://sstatic.net/so/apple-touch-icon.png" />
<link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href=
"http://sstatic.net/so/opensearch.xml" />
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js">
</script>
<script type="text/javascript" src="http://sstatic.net/so/js/master.js?v=6523">
</script>
<script type="text/javascript"></script>
<link rel="canonical" href="https://dev59.com/kHE85IYBdhLWcg3w1HGF" />
<link rel="alternate" type="application/atom+xml" title=
"Feed for question 'Html Agility Pack: make code look neat'" href="/feeds/question/2593147" />
<script src="http://sstatic.net/so/js/question.js?v=6714" type="text/javascript">
</script>
<script type="text/javascript"></script>
</head>
<body>
<noscript>
<div id="noscript-padding"></div></noscript>
<div id="notify-container"></div><script type="text/javascript"></script>
<div class="container">
<div id="header">
<div id="topbar">
<div id="hlinks">
<a href=
"/users/login?returnurl=%2fquestions%2f2593147%2fhtml-agility-pack-make-code-look-neat%2f2610903">login</a>
<span class="lsep">|</span> <a href="http://careers.stackoverflow.com/">careers</a> <span class=
"lsep">|</span> <a href="/about">about</a> <span class="lsep">|</span> <a href="/faq">faq</a>
</div>
<div id="hsearch">
<form id="search" action="/search" method="get" name="search">
<div>
<input name="q" class="textbox" tabindex="1" onfocus="if (this.value=='search') this.value = ''" type=
"text" maxlength="80" size="28" value="search" />
</div>
</form>
</div>
</div><br class="cbt" />
<div id="hlogo">
<a href="/"><img src="http://sstatic.net/so/img/logo.png" width="250" height="61" alt="Stack Overflow" /></a>
</div>
<div id="hmenus">
<div class="nav">
<ul>
<li class="youarehere">
<a href="/questions">Questions</a>
</li>
<li>
<a href="/tags">Tags</a>
</li>
<li>
<a href="/users">Users</a>
</li>
<li>
<a href="/badges">Badges</a>
</li>
<li>
<a href="/unanswered">Unanswered</a>
</li>
</ul>
</div>
<div class="nav" style="float:right">
<ul>
<li style="margin-right:0px">
<a href="/questions/ask">Ask Question</a>
</li>
</ul>
</div>
</div>
</div>
<div id="content">
<div id="question-header">
<h2>
<a href="/questions/2593147/html-agility-pack-make-code-look-neat" class="question-hyperlink">Html Agility
Pack: make code look neat</a>
</h2>
</div>
<div id="mainbar">
<div id="question" class="">
<div class="everyonelovesstackoverflow">
<script type="text/javascript">
//<![CDATA[
document.write('<s'+'cript lang' + 'uage="jav' + 'ascript" src="http://ads.stackoverflow.com/a.aspx?ZoneID=3&Task=Get&IFR=False&PageID=52405&SiteID=1&Random=' + (+new Date()) + '&Keywords=htmlagilitypack">');
document.write('</'+'scr'+'ipt>');
//]]>
</script> <noscript>
<div>
<a href=
"http://ads.stackoverflow.com/a.aspx?ZoneID=3&Task=Click&Mode=HTML&SiteID=1&PageID=52405">
<img src=
"http://ads.stackoverflow.com/a.aspx?ZoneID=3&Task=Get&Mode=HTML&SiteID=1&PageID=52405"
alt="" /></a>
</div></noscript>
</div>
<table>
<tr>
<td class="votecell">
<div class="vote">
<input type="hidden" value="2593147" /> <img class="vote-up" src=
"http://sstatic.net/so/img/vote-arrow-up.png" width="40" height="25" alt="vote up" title=
"This question is useful and clear (click again to undo)" /> <span class="vote-count-post">1</span>
<img class="vote-down" src="http://sstatic.net/so/img/vote-arrow-down.png" width="40" height="25"
alt="vote down" title="This question is unclear or not useful (click again to undo)" /> <img class=
"vote-favorite" src="http://sstatic.net/so/img/vote-favorite-off.png" width="32" height="31" alt=
"star" title="This is a favorite question (click again to undo)" />
<div class="favoritecount"></div>
</div>
</td>
<td>
<div>
<div class="post-text">
<p>
Can I use Html Agility Pack to make the output look nicely indented, unnecessary white space
stripped?
</p>
</div>
<div class="post-taglist">
<a href="/questions/tagged/htmlagilitypack" class="post-tag" title=
"show questions tagged 'htmlagilitypack'" rel="tag">htmlagilitypack</a>
</div>
<table class="fw">
<tr>
<td class="vt">
<div class="post-menu">
<a id="flag-post-2593147" title="flag this post for serious problems" name=
"flag-post-2593147">flag</a>
</div>
</td>
<td class="post-signature owner">
<div class="user-info">
<div class="user-action-time">
asked <span title="2010-04-07 14:13:47Z" class="relativetime">2 days ago</span>
</div>
<div class="user-gravatar32">
<a href="/users/51795/illdev"><img src=
"http://www.gravatar.com/avatar/52dc0db2cdacc6e9769d074a37466317?s=32&d=identicon&r=PG"
height="32" width="32" alt="" /></a>
</div>
<div class="user-details">
<a href="/users/51795/illdev">illdev</a><br />
<span class="reputation-score" title="reputation score">53</span><span title=
"5 bronze badges"><span class="badge3">●</span><span class=
"badgecount">5</span></span>
</div>
</div><br class="cbt" />
<div class="accept-rate cool" title=
"this user has accepted an answer for 2 of 4 eligible questions">
50% accept rate
</div>
</td>
</tr>
</table>
</div>
</td>
</tr>
<tr>
<td class="votecell"></td>
<td>
<div id="comments-2593147" class="comments">
<table>
<tbody>
<tr id="comment-2600849" class="comment">
<td></td>
<td class="comment-text">
<div>
what output? From where? some more details perhaps? – <a href=
"/users/97614/sam-holder" title="1868" class="comment-user">Sam Holder</a> <span class=
"comment-date"><span title="2010-04-07 14:16:41Z">2 days ago</span></span>
</div>
</td>
</tr>
<tr id="comment-2600851" class="comment">
<td></td>
<td class="comment-text">
<div>
<i>(reference)</i> <a href="http://htmlagilitypack.codeplex.com/Wikipage" rel=
"nofollow">htmlagilitypack.codeplex.com/Wikipage</a> – <a href=
"/users/208809/gordon" title="16497" class="comment-user">Gordon</a> <span class=
"comment-date"><span title="2010-04-07 14:16:55Z">2 days ago</span></span>
</div>
</td>
</tr>
<tr id="comment-2624419" class="comment">
<td></td>
<td class="comment-text">
<div>
output = html code output – <a href="/users/51795/illdev" title="53" class=
"comment-user owner">illdev</a> <span class="comment-date"><span title=
"2010-04-10 13:14:42Z">12 secs ago</span></span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</table>
</div>
<div id="answers">
<a name="tab-top" id="tab-top"></a>
<div id="answers-header">
<div id="subheader">
<h2>
2 Answers
</h2>
<div id="tabs">
<a href="/questions/2593147?tab=oldest#tab-top" title=
"Answers in the order they were given">oldest</a> <a href="/questions/2593147?tab=newest#tab-top"
title="Most recent answers first">newest</a> <a class="youarehere" href=
"/questions/2593147?tab=votes#tab-top" title="Answers with the most votes first">votes</a>
</div>
</div>
</div><a name="2610845"></a>
<div id="answer-2610845" class="answer">
<table>
<tr>
<td class="votecell">
<div class="vote">
<input type="hidden" value="2610845" /> <img class="vote-up" src=
"http://sstatic.net/so/img/vote-arrow-up.png" width="40" height="25" alt="vote up" title=
"This answer is useful (click again to undo)" /> <span class="vote-count-post">0</span>
<img class="vote-down" src="http://sstatic.net/so/img/vote-arrow-down.png" width="40" height="25"
alt="vote down" title="This answer is not useful (click again to undo)" />
</div>
</td>
<td>
<div class="post-text">
<p>
A variation of this question has been answered recently
</p>
<ul>
<li>
<a href=
"http://stackoverflow.com/questions/2490765/which-is-the-best-html-tidy-pack-is-there-any-option-in-html-agility-pack-to-mak/2507673#2507673">
http://stackoverflow.com/questions/2490765/which-is-the-best-html-tidy-pack-is-there-any-option-in-html-agility-pack-to-mak/2507673#2507673</a>
</li>
</ul>
<p>
Basically the outcome of this was that while you <strong>can</strong> use HtmlAgilityPack to
clean it up a bit by using the fix nested tags.
</p>
<p>
The best solution is to use something called Tidy which is an application that was originally
created by some developers at w3c and then made open source. Its the engine that powers the w3c
validator as well.
</p>
<p>
This article covers how to use it but you had to sign up (free) to view it:
</p>
<ul>
<li>
<a href="http://www.devx.com/dotnet/Article/20505/1763/" rel=
"nofollow">http://www.devx.com/dotnet/Article/20505/1763/</a>
</li>
</ul>
<p>
It seems like a legit article but its funny because nobody else seems to have covered this
topic in the last six years...
</p>
</div>
<table class="fw">
<tr>
<td class="vt">
<div class="post-menu">
<a href="/questions/2593147/html-agility-pack-make-code-look-neat/2610845#2610845" title=
"permalink to this answer">link</a><span class="lsep">|</span><a id="flag-post-2610845"
title="flag this post for serious problems" name="flag-post-2610845">flag</a>
</div>
</td>
<td align="right" class="post-signature">
<div class="user-info">
<div class="user-action-time">
answered <span title="2010-04-09 20:55:18Z" class="relativetime">16 hours ago</span>
</div>
<div class="user-gravatar32">
<a href="/users/156388/rtpharry"><img src=
"http://www.gravatar.com/avatar/6811db2b37e824fdf6c5c4fcdddd4146?s=32&d=identicon&r=PG"
height="32" width="32" alt="" /></a>
</div>
<div class="user-details">
<a href="/users/156388/rtpharry">rtpHarry</a><br />
<span class="reputation-score" title="reputation score">88</span><span title=
"6 bronze badges"><span class="badge3">●</span><span class=
"badgecount">6</span></span>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="votecell"></td>
<td>
<div id="comments-2610845" class="comments dno">
<table>
<tbody>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</table>
</div>
<div class="everyonelovesstackoverflow">
<script type="text/javascript">
//<![CDATA[
document.write('<s'+'cript lang' + 'uage="jav' + 'ascript" src="http://ads.stackoverflow.com/a.aspx?ZoneID=14&Task=Get&IFR=False&PageID=52405&SiteID=1&Random=' + (+new Date()) + '&Keywords=htmlagilitypack">');
document.write('</'+'scr'+'ipt>');
//]]>
</script> <noscript>
<div>
<a href=
"http://ads.stackoverflow.com/a.aspx?ZoneID=14&Task=Click&Mode=HTML&SiteID=1&PageID=52405">
<img src=
"http://ads.stackoverflow.com/a.aspx?ZoneID=14&Task=Get&Mode=HTML&SiteID=1&PageID=52405"
alt="" /></a>
</div></noscript>
</div><a name="2610903"></a>
<div id="answer-2610903" class="answer">
<table>
<tr>
<td class="votecell">
<div class="vote">
<input type="hidden" value="2610903" /> <img class="vote-up" src=
"http://sstatic.net/so/img/vote-arrow-up.png" width="40" height="25" alt="vote up" title=
"This answer is useful (click again to undo)" /> <span class="vote-count-post">0</span>
<img class="vote-down" src="http://sstatic.net/so/img/vote-arrow-down.png" width="40" height="25"
alt="vote down" title="This answer is not useful (click again to undo)" />
</div>
</td>
<td>
<div class="post-text">
<p>
Output as XHTML and run that through an <a href=
"http://msdn.microsoft.com/en-us/library/system.xml.xmltextwriter.indentation.aspx" rel=
"nofollow">XmlTextWriter</a>
</p>
</div>
<table class="fw">
<tr>
<td class="vt">
<div class="post-menu">
<a href="/questions/2593147/html-agility-pack-make-code-look-neat/2610903#2610903" title=
"permalink to this answer">link</a><span class="lsep">|</span><a id="flag-post-2610903"
title="flag this post for serious problems" name="flag-post-2610903">flag</a>
</div>
</td>
<td align="right" class="post-signature">
<div class="user-info">
<div class="user-action-time">
answered <span title="2010-04-09 21:02:34Z" class="relativetime">16 hours ago</span>
</div>
<div class="user-gravatar32">
<a href="/users/242897/sky-sanders"><img src=
"http://www.gravatar.com/avatar/df4a7fbd8a054fd6193ca0ee62952f1f?s=32&d=identicon&r=PG"
height="32" width="32" alt="" /></a>
</div>
<div class="user-details">
<a href="/users/242897/sky-sanders">Sky Sanders</a><br />
<span class="reputation-score" title="reputation score">4,014</span><span title=
"2 silver badges"><span class="badge2">●</span><span class=
"badgecount">2</span></span><span title="14 bronze badges"><span class=
"badge3">●</span><span class="badgecount">14</span></span>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="votecell"></td>
<td>
<div id="comments-2610903" class="comments dno">
<table>
<tbody>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</table>
</div>
<form id="post-form" action="/questions/2593147/answer/submit" method="post" name="post-form">
<h2 class="space">
Your Answer
</h2><script src="http://sstatic.net/so/Js/wmd.js?v=6016" type="text/javascript">
</script> <script type="text/javascript"></script>
<div id="post-editor">
<div id="wmd-container">
<div id="wmd-button-bar"></div>
<textarea id="wmd-input" name="post-text" cols="92" rows="15" tabindex="101">
</textarea>
</div>
<div class="community-option">
<input id="communitymode" name="communitymode" type="checkbox" /> <label for="communitymode" title=
"community owned posts do not generate any reputation for the owner, have a lower reputation barrier for collaborative editing, and show only a revision history instead of a signature block">
community wiki</label>
</div>
<div id="wmd-preview"></div>
<div id="edit-block">
<input id="fkey" name="fkey" type="hidden" value="b00609a1a5f2966a687eca3f84e4dd64" /> <input id=
"author" name="author" type="text" />
</div>
</div>
<div class="form-item">
<table>
<tr>
<td class="vm">
<label for="openid_identifier">OpenID Login</label> <input id="openid_identifier" name=
"openid_identifier" class="openid-identifer" type="text" size="40" maxlength="200" value=""
tabindex="104" />
<div class="form-item-info">
Get an <a href="http://openid.net/get/" target="_blank">OpenID</a>
</div>
</td>
<td class="orcell">
<div class="orword">
or
</div>
<div class="orline"></div>
</td>
<td class="vm">
<div>
<label for="display-name">Name</label> <input id="display-name" name="display-name" type="text"
size="30" maxlength="30" value="" tabindex="105" />
</div>
<div>
<label for="m-address">Email</label> <input id="m-address" name="m-address" type="text" size=
"40" maxlength="100" value="" tabindex="106" /> <span class="edit-field-overlay" style=
"color:#999; font-weight:normal">never shown</span>
</div>
<div>
<label for="home-page">Home Page</label> <input id="home-page" name="home-page" type="text"
size="40" maxlength="200" value="" tabindex="107" />
</div>
</td>
</tr>
</table>
</div>
<div class="form-submit cbt">
<input id="submit-button" type="submit" value="Post Your Answer" tabindex="110" />
</div>
</form>
<h2 class="space">
Not the answer you're looking for? Browse other questions tagged <a href=
"/questions/tagged/htmlagilitypack" class="post-tag" title="show questions tagged 'htmlagilitypack'" rel=
"tag">htmlagilitypack</a> or <a href="/questions/ask">ask your own question</a>.
</h2>
</div><img src="/posts/2593147/ivc/1707" class="dno" alt="" />
</div>
这个问题的变体最近已经得到了回答
基本上,结果是虽然您可以使用HtmlAgilityPack通过使用修复嵌套标记来稍微清理一下。
最好的解决方案是使用一个叫做Tidy的东西,它是一个应用程序,最初由w3c的一些开发人员创建,然后成为开源。它也是驱动w3c验证器的引擎。
本文介绍了如何使用它,但您需要注册(免费)才能查看:
这篇文章看起来很正经,但有趣的是,在过去的六年里似乎没有其他人涉及过这个话题...
在这里可以看到一个类似的问题:HtmlAgilityPack: how to create indented HTML?,以及我的答案:
不行,这是“按设计”选择。 XML(或XHTML,它是XML而不是HTML) 在大多数情况下, 空格没有特定的含义, 而HTML则不同。
这不是一个小改进,因为 更改空格可能会更改某些浏览器呈现给定HTML块的方式, 特别是格式不正确的HTML(通常由库很好地处理)。 Html Agility Pack旨在最小化HTML的呈现方式,而不是标记的编写方式。
我并不是说这是不可行的或根本不可能的。 显然,您可以转换为XML,然后就完成了(您可以编写扩展方法使其更容易), 但在一般情况下,呈现的输出可能会有所不同。