I have the following HTML content that I want to parse with Cheerio.
var $ = cheerio.load('<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>This works well.</div><div><br clear="none"/></div><div>So I have been doing this for several hours. How come the space does not split? Thinking that this could be an issue.</div><div>Testing next paragraph.</div><div><br clear="none"/></div><div>Im testing with another post. This post should work.</div><div><br clear="none"/></div><h1>This is for test server.</h1></body></html>', {
    normalizeWhitespace: true,
});
// trying to parse the html
// the goals are to
// 1. remove all the 'div'
// 2. clean up <br clear="none"/> into <br>
// 3. Have all the new 'empty' element added with 'p'
var testData = $('div').map(function(i, elem) {
    var test = $(elem);
    if ($(elem).has('br')) {
        console.log('spaceme');
        var test2 = $(elem).removeAttr('br');
    } else {
        var test2 = $(elem).removeAttr('div').add('p');
    }
    console.log(i + ' ' + test2.html());
    return test2.html();
});
res.send(test2.html());
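My end goal is to parse the HTML and:
- Remove all the div elements
- Clean up <br clear="none"/> and change it to <br>
- Finally, remove all the now-empty 'element' wrappers (that is, the sentences that were inside a 'div') and replace them with <p>sentence</p>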
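I tried to start with a smaller goal, using the $('div').map(...) code shown earlier. I tried removing all the 'div' elements (which worked), but I cannot find the 'br'. I have been trying for several days without getting anywhere.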
So I am writing here to ask for some help and hints on how to reach my end goal.
Thanks :D