XPath: Select text after a specific tag and before the next tag of the same kind

Question

7

I have html code like this:

<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Second term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text

I need to get the text nodes between a tag containing the text "Term" and the next tag of the same kind:

Some text
More text
Some more lines of text
Some text
More text
Some more lines of text
Some text
More text
Some more lines of text

The condition could be that the preceding tag must contain the text "Term", but I don't know how to write an XPath selector like that.

- Stephan Olmer

1

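Hi, I don't think the question is very clear. Could you post the desired output? That might help me understand exactly what you want. - Ravish

I've updated the question. Sorry for my poor English. - Stephan Olmer

You changed the input, so please update the desired output as well. Also, add meaningful text to distinguish the child nodes. From your description it is still hard to understand what you need. - Emiliano Poggi

I've extended my answer, although I'm still unsure about your requirements. See whether it works for you. - Emiliano Poggi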

2 Answers

2

Your question is still unclear, and your input document is not well-formed. Please check the following:

root/text()[preceding::strong[1][contains(text(),'Term')]]

Applied to:

<root>
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Term:</strong>
Some text2<br />
More text2<br />
Some more lines of text2
<strong>Second term:</strong>
Some text3<br />
More text3<br />
Some more lines of text3
<strong>Term:</strong>
Some text4<br />
More text4<br />
Some more lines of text4
</root>

Produces:

Some text
More text
Some more lines of text

Some text2
More text2
Some more lines of text2

Some text4
More text4
Some more lines of text4
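
For readers without an XPath 1.0 engine at hand, the same selection can be sketched in plain Python. This is only an illustration of the logic, not the XPath itself: the standard library's ElementTree supports a limited XPath subset without the preceding axis, so the sketch walks the children of <root> in document order and remembers whether the most recent <strong> contained 'Term'. The function name text_after_matching_strong and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Term:</strong>
Some text2<br />
More text2<br />
Some more lines of text2
<strong>Second term:</strong>
Some text3<br />
More text3<br />
Some more lines of text3
<strong>Term:</strong>
Some text4<br />
More text4<br />
Some more lines of text4
</root>"""


def text_after_matching_strong(root, needle="Term"):
    """Collect root-level text whose nearest preceding <strong> contains needle."""
    selected = []
    keep = False  # did the most recent <strong> match?
    for child in root:
        if child.tag == "strong":
            # Case-sensitive substring test, like XPath contains()
            keep = needle in (child.text or "")
        # ElementTree stores the text that follows an element in its .tail
        tail = (child.tail or "").strip()
        if keep and tail:
            selected.append(tail)
    return selected


root = ET.fromstring(DOC)
lines = text_after_matching_strong(root)
print("\n".join(lines))
```

The block under "Second term:" is skipped because the substring test is case-sensitive: 'Term' does not occur in 'Second term:'.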

This XPath selects all text nodes between an element containing the string "Term:" and an element containing any string.

//text()[preceding::*[contains(text(),'Term:')] and following::*[text()]]

Applied to:

<root>
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Second term:</strong>
Some text2<br />
More text2<br />
Some more lines of text2
</root>

Returns:

Some text
More text
Some more lines of text

- Emiliano Poggi

- Ravish · Accepted Answer

//text()[preceding::*[contains(text(),'Term:')] and following::*[contains(text(),'Term:')]]

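This is the same as what empo suggested, except that I look for nodes containing the term and return all text nodes between them.

However, it only works correctly if you don't have any other set of "Term" markers. If you do, let me know, because then this XPath will also return some unwanted values.

Since you have now updated the input, I have just added one more condition to the previous XPath.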

//text()[preceding::*[contains(text(),'Term:')] and following::*[contains(text(),'Term:')] and not(contains(., 'Term:'))]

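@empo's solution also works, but with it we need to take the <strong> tag into account. The XPath I wrote only checks for the string 'Term:' and outputs all text nodes between its occurrences.

Let me know if this works for you.

Thanks.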
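
To see how the two conditions differ, here is a rough Python sketch that applies both rules to a four-block input wrapped in <root>. The choice of input is an assumption, since this answer does not say which document it was tested on. The sketch is a simplification: it only looks at text nodes that are direct children of <root>, whereas //text() also visits the text inside each <strong>, which is not modelled here. The names nearest_marker and between_markers are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Term:</strong>
Some text2<br />
More text2<br />
Some more lines of text2
<strong>Second term:</strong>
Some text3<br />
More text3<br />
Some more lines of text3
<strong>Term:</strong>
Some text4<br />
More text4<br />
Some more lines of text4
</root>"""


def nearest_marker(root, needle="Term"):
    """Keep text whose nearest preceding <strong> contains needle."""
    out, keep = [], False
    for el in root:
        if el.tag == "strong":
            keep = needle in (el.text or "")
        tail = (el.tail or "").strip()
        if keep and tail:
            out.append(tail)
    return out


def between_markers(root, needle="Term:"):
    """Keep text with a matching element somewhere before AND somewhere after."""
    children = list(root)
    # Indexes of elements whose own text contains the needle
    marks = [i for i, el in enumerate(children) if needle in (el.text or "")]
    if not marks:
        return []
    first, last = marks[0], marks[-1]
    out = []
    for i, el in enumerate(children):
        tail = (el.tail or "").strip()
        # The tail of children[i] comes after children[0..i] and before children[i+1..]
        if tail and first <= i < last and needle not in tail:
            out.append(tail)
    return out


root = ET.fromstring(DOC)
nearest = nearest_marker(root)
between = between_markers(root)
print("nearest preceding <strong>:", nearest)
print("between any two 'Term:' elements:", between)
```

On this input the nearest-marker rule keeps blocks 1, 2 and 4, while the between-markers rule keeps blocks 1, 2 and 3: it includes the text under "Second term:" because a 'Term:' element occurs both before and after it, and drops the last block because no 'Term:' element follows it.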