I am trying to crawl a website using Scrapy.
Below is the code I wrote based on http://thuongnh.com/building-a-web-crawler-with-scrapy/. (The original code did not work at all, so I tried to rebuild it.)
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Spider
from scrapy.selector import HtmlXPathSelector
from nettuts.items import NettutsItem
from scrapy.http import Request


class MySpider(Spider):
    name = "nettuts"
    allowed_domains = ["net.tutsplus.com"]
    start_urls = ["http://code.tutsplus.com/posts?"]
    rules = [Rule(LinkExtractor(allow = ('')), callback = 'parse', follow = True)]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item = []
        titles = hxs.xpath('//li[@class="posts__post"]/a/text()').extract()
        for title in titles:
            item = NettutsItem()
            item["title"] = title
            yield item
        return
The problem is that the crawler reaches the start page but does not scrape any pages after that.
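
One thing that may be worth checking is the domain filter: the spider lists net.tutsplus.com in allowed_domains, while the start URL is on code.tutsplus.com. The snippet below is only a simplified, standard-library model of a "domain or subdomain" check, not Scrapy's actual implementation. The helper name is_allowed and the page=2 URL are hypothetical and exist only for illustration.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host equals an allowed domain or is a subdomain of one.

    Simplified stand-in for an offsite filter; not Scrapy's actual code.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


# Values copied from the spider above.
allowed_domains = ["net.tutsplus.com"]
start_url = "http://code.tutsplus.com/posts?"

# Hypothetical follow-up link, used only for illustration.
next_page = "http://code.tutsplus.com/posts?page=2"

print(is_allowed(start_url, allowed_domains))   # False
print(is_allowed(next_page, allowed_domains))   # False
print(is_allowed(next_page, ["tutsplus.com"]))  # True
```

Under this model, every link on code.tutsplus.com would be rejected unless allowed_domains were widened to something like tutsplus.com. Separately, rules are only processed by CrawlSpider; a plain Spider ignores them, and the Scrapy documentation advises against using parse as a rule callback because CrawlSpider relies on that method internally.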