Replacing the text in all text nodes of a tree with Jsoup

Question

For example, suppose we need to convert all of the text inside some HTML tags to upper case. We can do it like this:

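import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

String htmlText = "<h1>Apollo 11</h1><p><strong>Apollo 11</strong> "
        + "was the spaceflight that landed the first humans, Americans <strong>"
        + "<a href=\"http://en.wikipedia.org/wiki/Neil_Armstrong\">Neil Armstrong</a></strong> and... </p>";
Document document = Jsoup.parse(htmlText);
Elements textElements = document.select("h1, p");

for (Element element : textElements) {
    // textNodes() returns only the element's direct text children,
    // so text nested inside <strong> or <a> is never visited
    List<TextNode> textNodes = element.textNodes();
    for (TextNode textNode : textNodes) {
        textNode.text(textNode.text().toUpperCase());
    }
}
System.out.println(document.html());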

Result:

<html><head></head><body><h1>APOLLO 11</h1><p><strong>Apollo 11</strong> WAS THE SPACEFLIGHT THAT LANDED THE FIRST HUMANS, AMERICANS <strong><a href="http://en.wikipedia.org/wiki/Neil_Armstrong">Neil Armstrong</a></strong> AND... </p></body></html>

As a result, the text inside child elements is not upper-cased (e.g. <strong>Apollo 11</strong>).

I could loop over each element's child nodes and handle text nodes and child elements separately, like this:

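for (Node node : element.childNodes()) {
    if (node instanceof TextNode) {
        // Direct text child: upper-case it in place
        String nodeText = ((TextNode) node).text();
        nodeText = nodeText.toUpperCase();
        ((TextNode) node).text(nodeText);
    } else if (node instanceof Element) {
        // Child element: Element.text(String) clears ALL of the element's
        // children and replaces them with a single text node, so nested
        // tags such as <a> are lost
        String nodeText = ((Element) node).text();
        nodeText = nodeText.toUpperCase();
        ((Element) node).text(nodeText);
    }
}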

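But writing the result back with ((Element) node).text(nodeText) wipes out all of the child tags, and we get: <html><head></head><body><h1>APOLLO 11</h1><p><strong>APOLLO 11</strong> WAS THE SPACEFLIGHT THAT LANDED THE FIRST HUMANS, AMERICANS <strong>NEIL ARMSTRONG</strong> AND... </p></body></html>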

Note that the link tag around "Neil Armstrong" is gone.

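We could add another inner loop that again checks for TextNode and Element, but that only handles one more level of nesting, so I don't think it is a real solution.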
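The inner-loop idea does work if the loop calls itself for every child element, since recursion reaches any depth. A minimal sketch, not code from the original thread: the class and method names `RecursiveUpperCase`, `upperCaseText` and `process` are hypothetical, and it uses `TextNode.getWholeText()` instead of `text()` because `text()` returns whitespace-normalised text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class RecursiveUpperCase {

    // Hypothetical helper: upper-cases every TextNode below `node`, however deep,
    // without touching the element structure (tags and attributes stay as they are).
    static void upperCaseText(Node node) {
        for (Node child : node.childNodes()) {
            if (child instanceof TextNode) {
                TextNode tn = (TextNode) child;
                // getWholeText() keeps the original whitespace; text() would normalise it
                tn.text(tn.getWholeText().toUpperCase());
            } else if (child instanceof Element) {
                upperCaseText(child); // descend into <strong>, <a>, ...
            }
            // Comments and other node types are left alone
        }
    }

    // Hypothetical driver: parse, process the selected elements, return the body HTML
    static String process(String html) {
        Document doc = Jsoup.parse(html);
        for (Element e : doc.select("h1, p")) {
            upperCaseText(e);
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(process(
                "<p>Americans <strong><a href=\"#\">Neil Armstrong</a></strong> and...</p>"));
    }
}
```

Only text content changes; only the `instanceof Element` branch recurses, so the `<a>` tag and its `href` survive.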

So my question is: how can I operate on the text of all Elements/TextNodes in an HTML tree while leaving all child tags intact?

- Ivan

Why don't you just convert the whole <p> tag to upper case? What are you trying to achieve? - Scary Wombat

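Upper case is just an example. In the real application I need to replace some text inside text tags wherever it sits (in a parent, a child, a child of a child, and so on). Using element.text().replace("one", "another") and writing the result back drops all child tags, while element.html().replace("one", "another") modifies the tags as well as the content. - Ivan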
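For the real use case (replacing "one" with "another" only in text), one option is jsoup's `NodeTraversor`, which visits every node depth-first and calls a `NodeVisitor`. A hedged sketch, assuming jsoup 1.11.1 or later (where the static `NodeTraversor.traverse(NodeVisitor, Node)` exists); `ReplaceInTextNodes`, `replaceText` and `process` are hypothetical names. Because each TextNode is edited separately, a match split across tags (e.g. `o<b>ne</b>`) is not found, and the replace is a plain substring replace.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

public class ReplaceInTextNodes {

    // Hypothetical helper: replaces `target` with `replacement` in every TextNode
    // under `root`. Tags and attribute values (e.g. href) are never modified.
    static void replaceText(Node root, String target, String replacement) {
        NodeTraversor.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    TextNode tn = (TextNode) node;
                    // Edit the raw text; the tree structure is left unchanged
                    tn.text(tn.getWholeText().replace(target, replacement));
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do after a node's children have been visited
            }
        }, root);
    }

    // Hypothetical driver: parse, replace in the body, return the body HTML
    static String process(String html, String target, String replacement) {
        Document doc = Jsoup.parse(html);
        replaceText(doc.body(), target, replacement);
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(process("<p>one <a href=\"one.html\">one</a></p>", "one", "another"));
    }
}
```

Note that the link text changes but the `href="one.html"` attribute does not, which is the difference from running `replace` on `element.html()`.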

1 Answer

- fabian · Accepted Answer

Just apply your first approach to all of the selected elements and their descendants:

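Elements textElements = document.select("h1, p");

// select("*") on the result matches the selected elements themselves
// plus all of their descendants (<strong>, <a>, ...)
for (Element e : textElements.select("*")) {
    // textNodes() returns only e's direct text children
    for (TextNode tn : e.textNodes()) {
        tn.text(tn.text().toUpperCase());
    }
}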
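For reference, here is the accepted approach wrapped into a self-contained program and run on the question's HTML. It is a sketch: the class name `AcceptedAnswerDemo` and the method `upperCaseAll` are hypothetical, and it swaps `tn.text()` for `tn.getWholeText()` so that the original whitespace inside each text node is preserved.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class AcceptedAnswerDemo {

    // Hypothetical wrapper around the accepted answer's loop
    static String upperCaseAll(String html) {
        Document document = Jsoup.parse(html);
        // The selected <h1>/<p> elements plus every element nested inside them
        for (Element e : document.select("h1, p").select("*")) {
            // Only e's direct text children; deeper text is handled when the
            // loop reaches the nested element itself
            for (TextNode tn : e.textNodes()) {
                tn.text(tn.getWholeText().toUpperCase());
            }
        }
        return document.body().html();
    }

    public static void main(String[] args) {
        String htmlText = "<h1>Apollo 11</h1><p><strong>Apollo 11</strong> "
                + "was the spaceflight that landed the first humans, Americans <strong>"
                + "<a href=\"http://en.wikipedia.org/wiki/Neil_Armstrong\">Neil Armstrong</a></strong> and... </p>";
        System.out.println(upperCaseAll(htmlText));
    }
}
```

All text, including "Apollo 11" inside `<strong>` and "Neil Armstrong" inside the link, ends up upper-cased while the `<a href=...>` tag is kept.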