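I can parse the document and generate output, but because of a p tag the output cannot be parsed into an XElement. Everything else in the string is parsed correctly.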
My input:
var input = "<p> Not sure why is is null for some wierd reason!<br><br>I have implemented the auto save feature, but does it really work after 100s?<br></p> <p> <i>Autosave?? </i> </p> <p>we are talking...</p><p></p><hr><p><br class=\"GENTICS_ephemera\"></p>";
My code:
public static XElement CleanupHtml(string input)
{
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.OptionOutputAsXml = true;
    //htmlDoc.OptionWriteEmptyNodes = true;
    //htmlDoc.OptionAutoCloseOnEnd = true;
    htmlDoc.OptionFixNestedTags = true;
    htmlDoc.LoadHtml(input);

    // ParseErrors contains any errors reported by the LoadHtml call
    if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count() > 0)
    {
        // Parse errors were reported: fall through and return an empty <body>
    }
    else
    {
        if (htmlDoc.DocumentNode != null)
        {
            var ndoc = new HtmlDocument(); // second HTML doc instance, default options
            HtmlNode p = ndoc.CreateElement("body");
            p.InnerHtml = htmlDoc.DocumentNode.InnerHtml;

            // Patch void elements by hand so the string looks like XML
            var result = p.OuterHtml.Replace("<br>", "<br/>");
            result = result.Replace("<br class=\"special_class\">", "<br/>");
            result = result.Replace("<hr>", "<hr/>");
            return XElement.Parse(result, LoadOptions.PreserveWhitespace);
        }
    }
    return new XElement("body");
}
My output:
<body>
<p> Not sure why is is null for some wierd reason chappy!
<br/>
<br/>I have implemented the auto save feature, but does it really work after 100s?
<br/>
</p>
<p>
<i>Autosave?? </i>
</p>
<p>we are talking...</p>
**<p>**
<hr/>
<p>
<br/>
</p>
</body>
The p tag shown in bold is not output correctly. Is there any way to fix this? Am I doing something wrong in my code?
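One possible cause, offered as a guess rather than a confirmed diagnosis: OptionOutputAsXml is set on htmlDoc, but the string that gets parsed comes from p.OuterHtml, and p was created by ndoc, which still uses the default HTML output mode. In that mode an empty <p></p> may be written as a bare <p>, which XElement.Parse rejects. The untested sketch below keeps serialization on the document that has OptionOutputAsXml enabled. The names HtmlCleanup and CleanupHtmlAsXml are hypothetical, and wrapping the input in <body> is an assumption made to get a single root element; behavior may differ between HtmlAgilityPack versions.

```csharp
using System.Xml.Linq;
using HtmlAgilityPack;

public static class HtmlCleanup
{
    // Sketch only (hypothetical helper): wraps the fragment in <body> so there
    // is a single root, then serializes that node from the XML-mode document.
    public static XElement CleanupHtmlAsXml(string input)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.OptionOutputAsXml = true;   // serialize as XML rather than HTML
        htmlDoc.OptionFixNestedTags = true;
        htmlDoc.LoadHtml("<body>" + input + "</body>");

        // The <body> node is owned by htmlDoc, so its OuterHtml should follow
        // htmlDoc's output options (assumption: no second document involved).
        HtmlNode body = htmlDoc.DocumentNode.Element("body");
        if (body == null)
        {
            return new XElement("body");
        }

        return XElement.Parse(body.OuterHtml, LoadOptions.PreserveWhitespace);
    }
}
```

If this works as assumed, the manual Replace calls for <br> and <hr> should no longer be needed, because the XML output mode is expected to close empty elements itself.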
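… OptionOutputAsXml (and its use case) exists. - BrokenGlass
… '', hexadecimal value 0x03, is an invalid character. Line 2081, position 822. - Bent Rasmussen
StringWriter and StringReader add too much overhead. Just use a MemoryStream and reset its position. That is better than allocating a temporary string with ToString(). - Baccata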
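The MemoryStream suggestion in the last comment can be sketched roughly as follows. StreamRoundTrip and LoadFromBuffer are hypothetical names, and the lambda stands in for whatever produces the XML; in the question's setting that would presumably be a call such as htmlDoc.Save(stream), assuming it flushes its output and leaves the stream open.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class StreamRoundTrip
{
    // Hypothetical helper: lets a producer write XML into an in-memory buffer,
    // rewinds the buffer, and parses it without building an intermediate string.
    public static XElement LoadFromBuffer(Action<Stream> writeXml)
    {
        using (var stream = new MemoryStream())
        {
            writeXml(stream);     // producer fills the buffer
            stream.Position = 0;  // rewind; otherwise the reader starts at the end
            return XElement.Load(stream, LoadOptions.PreserveWhitespace);
        }
    }

    public static void Main()
    {
        XElement body = LoadFromBuffer(s =>
        {
            // Stand-in for a real producer such as an HTML-to-XML serializer
            byte[] bytes = Encoding.UTF8.GetBytes("<body><p>we are talking...</p><hr/></body>");
            s.Write(bytes, 0, bytes.Length);
        });

        Console.WriteLine(body.Name);               // prints "body"
        Console.WriteLine(body.Elements().Count()); // prints 2
    }
}
```

The key step is resetting Position to 0 before reading. Whether this is measurably faster than a StringWriter for inputs of this size is not something the sketch demonstrates.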