Convert XPath to a Jsoup query

Question

20

Does anyone know of a tool that converts XPath to a Jsoup selector? I copied the following XPath from Chrome:

 //*[@id="docs"]/div[1]/h4/a

I want to convert it to a Jsoup query. The path points to an element whose href I am trying to reference.

- Josh

7 Answers

18

I'm using Google Chrome version 47.0.2526.73 m (64-bit), which can now copy the selector path directly, and that path is compatible with Jsoup.

The copied selector for the span.com element in the screenshot is:
#question > table > tbody > tr:nth-child(1) > td.postcell > div > div.post-text > pre > code > span.com

- zackygaurav

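I ran the selector path on a table, but it only returned the element's #id. Could you help check this? - Murtaza Haji

Is there a way to do the same thing in Firefox? - AhmedRana

@AhmedRana I haven't tried it in Firefox, but the option should exist in Firefox's developer tools. - zackygaurav

It doesn't work well for syntax like @. - mhstnsc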

4

You don't necessarily need to convert the XPath to a Jsoup-specific selector.

Instead, you can use Xsoup, which is built on Jsoup and supports XPath.

https://github.com/code4craft/xsoup

Here is an example from the Xsoup documentation.

@Test
public void testSelect() {

    String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
            "<table><tr><td>a</td><td>b</td></tr></table></html>";

    Document document = Jsoup.parse(html);

    String result = Xsoup.compile("//a/@href").evaluate(document).get();
    Assert.assertEquals("https://github.com", result);

    List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
    Assert.assertEquals("a", list.get(0));
    Assert.assertEquals("b", list.get(1));
}

- Kotlinboy

Nice, this improved performance to some extent. - sandeep P

2

Here is a standalone snippet that uses Xsoup and Jsoup:

import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import us.codecraft.xsoup.Xsoup;

public class TestXsoup {
    public static void main(String[] args) {
        String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
                "<table><tr><td>a</td><td>b</td></tr></table></html>";

        Document document = Jsoup.parse(html);

        List<String> filasFiltradas = Xsoup.compile("//tr/td/text()").evaluate(document).list();
        System.out.println(filasFiltradas);
    }
}

Output:

[a, b]

Libraries included:

xsoup-0.3.1.jar jsoup-1.10.3.jar

- Om Sao

2

I have tested the following XPath expressions and their Jsoup equivalents, and they work.

Example 1:

[XPath]

//*[@id="docs"]/div[1]/h4/a

[JSoup]

document.select("#docs > div > h4 > a").attr("href");

Example 2:

[XPath]

//*[@id="action-bar-container"]/div/div[2]/a[2]

[JSoup]

document.select("#action-bar-container > div > div:eq(1) > a:eq(1)").attr("href");

- Sivitry
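
For simple paths like the two above, the hand conversion can be sketched as a small helper. This is only a rough sketch under stated assumptions, not an existing tool and not part of Jsoup: the class name XPathToCss and its convert method are hypothetical, and it handles only child (/) and descendant (//) steps with an optional position or @id predicate. It maps XPath positions to :nth-of-type rather than :eq, because XPath's div[2] counts only div siblings, while :eq counts every sibling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Jsoup): converts a small subset of XPath to a CSS selector.
final class XPathToCss {
    // One location step: a tag name or '*', optionally followed by a single
    // predicate that is either a position ([2]) or an id test ([@id="x"]).
    private static final Pattern STEP = Pattern.compile(
            "([A-Za-z][A-Za-z0-9_-]*|\\*)(?:\\[(?:(\\d+)|@id=[\"']([^\"']+)[\"'])\\])?");

    static String convert(String xpath) {
        if (xpath.isEmpty()) {
            throw new IllegalArgumentException("Empty XPath");
        }
        StringBuilder css = new StringBuilder();
        int pos = 0;
        while (pos < xpath.length()) {
            // "//" selects any descendant, "/" selects a direct child.
            boolean descendant = xpath.startsWith("//", pos);
            if (!descendant && xpath.charAt(pos) != '/') {
                throw new IllegalArgumentException("Expected '/' at index " + pos);
            }
            pos += descendant ? 2 : 1;

            Matcher m = STEP.matcher(xpath);
            m.region(pos, xpath.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unsupported step at index " + pos);
            }
            pos = m.end();

            // No combinator is needed before the first step.
            if (css.length() > 0) {
                css.append(descendant ? " " : " > ");
            }
            css.append(toCss(m.group(1), m.group(2), m.group(3)));
        }
        return css.toString();
    }

    private static String toCss(String name, String index, String id) {
        boolean anyTag = name.equals("*");
        if (id != null) {
            // Assumes the id needs no CSS escaping (no spaces, dots, and so on).
            return (anyTag ? "" : name) + "#" + id;
        }
        if (index != null) {
            // XPath positions are 1-based and count same-named siblings only,
            // like :nth-of-type; for '*' every element sibling counts.
            return anyTag ? "*:nth-child(" + index + ")" : name + ":nth-of-type(" + index + ")";
        }
        return name;
    }

    public static void main(String[] args) {
        // prints: #docs > div:nth-of-type(1) > h4 > a
        System.out.println(convert("//*[@id=\"docs\"]/div[1]/h4/a"));
        // prints: #action-bar-container > div > div:nth-of-type(2) > a:nth-of-type(2)
        System.out.println(convert("//*[@id=\"action-bar-container\"]/div/div[2]/a[2]"));
    }
}
```

The resulting string can be passed to document.select(...) as in the examples above. Anything outside this subset, such as /@href or text(), is rejected with an IllegalArgumentException rather than guessed at.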

1

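Although this question is quite old, I just want to mention that recent Jsoup releases include a beta feature that does what this question asks for.

Release 1.14.3 added native XPath selectors. See for yourself: https://jsoup.org/news/release-1.14.3 Now you can use Jsoup's native method: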

    File downloadedPage = new File("/path/to/your/page.html");
    String xPathSelector = "//*[@id=\"docs\"]/div[1]/h4/a";
    Document document = Jsoup.parse(downloadedPage, "UTF-8");
    Elements elements = document.selectXpath(xPathSelector);

You can then iterate over the returned elements!

- camikiller

0

It depends on what you want.

Document doc = Jsoup.connect(googleURL).get(); // fetch and parse the page at googleURL
doc.select("cite"); // all <cite> elements in the page

doc.select("li > cite"); // <cite> elements that are direct children of an <li>

doc.select("li.g cite"); // <cite> elements anywhere under an <li class="g">


public static void main(String[] args) throws IOException {
    String html = getHTML();
    Document doc = Jsoup.parse(html);
    Elements elems = doc.select("li.g > cite");
    for(Element elem: elems){
        System.out.println(elem.toString());
    }
}

- Jose Manuel Vega

- MariuszS · Accepted Answer

This is easy to convert by hand.

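Something like this (untested; Jsoup's :eq(n) uses a zero-based sibling index, so XPath's div[1] maps to div:eq(0) when that div is its parent's first child):

document.select("#docs > div:eq(0) > h4 > a").attr("href");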

Documentation:

http://jsoup.org/cookbook/extracting-data/selector-syntax
