Scrapy: getting href links inside a div

Question

11

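I started using Scrapy for a small project, but I cannot extract the links. Each time the class is found, I get "[]" instead of the URL. Am I missing something obvious?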

sel = Selector(response)
for entry in sel.xpath("//div[@class='recipe-description']"):
    print entry.xpath('href').extract()

Sample of the site's HTML:

<div class="recipe-description">
    <a href="http://www.url.com/">
        <h2 class="rows-2"><span>SomeText</span></h2>
    </a>
</div>

- Trollbrot

I think your XPath query is wrong. You have to select the link first and then get its href attribute. Something like this: //a[@href] - narko

1 Answer

- akhter wahab · Accepted Answer

Your XPath query is wrong.

for entry in sel.xpath("//div[@class='recipe-description']"):

In this line you are iterating over the div elements, and a div has no href attribute.

To make it correct, select the anchor (a) elements inside the div:

for entry in sel.xpath("//div[@class='recipe-description']/a"):
    # '@href' reads the attribute; a bare 'href' would look for a child <href> element
    print(entry.xpath('@href').extract())

The best solution is to extract the href attribute directly in the for loop.

for href in sel.xpath("//div[@class='recipe-description']/a/@href").extract():
    print(href)

For simplicity, you can also use CSS selectors.

for href in sel.css("div.recipe-description a::attr(href)").extract():
    print(href)