How to pass two user-defined arguments to a Scrapy spider

Question

import scrapy

class Funda1Spider(scrapy.Spider):
    name = "funda1"
    allowed_domains = ["funda.nl"]

    def __init__(self, place='amsterdam'):
        self.start_urls = ["http://www.funda.nl/koop/%s/" % place]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)

This seems to work; for example, if I run it from the command line like so:

scrapy crawl funda1 -a place=rotterdam

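It produces a file named "rotterdam.html" which looks similar to http://www.funda.nl/koop/rotterdam/. Next, I would like to extend this so that a subpage can be specified, for instance http://www.funda.nl/koop/rotterdam/p2/. I tried the following: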

import scrapy

class Funda1Spider(scrapy.Spider):
    name = "funda1"
    allowed_domains = ["funda.nl"]

    def __init__(self, place='amsterdam', page=''):
        self.start_urls = ["http://www.funda.nl/koop/%s/p%s/" % (place, page)]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)

However, if I try to run it with

scrapy crawl funda1 -a place=rotterdam page=2

I get the following error:

crawl: error: running 'scrapy crawl' with more than one spider is no longer supported

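I don't really understand this error message, since I'm not trying to crawl two spiders, only trying to pass two keyword arguments to modify start_urls. How can I make this work?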

- Kurt Peek

1 Answer

- Granitosaurus · Accepted Answer

When providing multiple arguments, you need to prefix each one with -a.

In your case, the correct command is:

scrapy crawl funda1 -a place=rotterdam -a page=2
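
To see why the original command was read as two spiders, here is a rough sketch of how the tokens get sorted. This is a simplified model and not Scrapy's actual parser; split_crawl_args is a hypothetical helper written only for illustration. The idea is that each -a consumes exactly one NAME=VALUE token, and anything left over is treated as a positional argument, i.e. a spider name.

```python
def split_crawl_args(argv):
    """Split tokens into positional names and -a NAME=VALUE pairs.

    Simplified illustration only (hypothetical helper, not Scrapy code).
    """
    positionals, spider_kwargs = [], {}
    tokens = iter(argv)
    for token in tokens:
        if token == "-a":
            # -a consumes exactly one following NAME=VALUE token
            name, _, value = next(tokens).partition("=")
            spider_kwargs[name] = value
        else:
            # anything not claimed by -a counts as a positional argument
            positionals.append(token)
    return positionals, spider_kwargs


# Original command: page=2 is not claimed by any -a, so it ends up
# next to funda1 in the positional list.
wrong = split_crawl_args(["funda1", "-a", "place=rotterdam", "page=2"])

# Corrected command: one positional, two keyword arguments.
right = split_crawl_args(["funda1", "-a", "place=rotterdam", "-a", "page=2"])
```

In this model the first call yields two positionals (funda1 and page=2), which matches the "more than one spider" complaint, while the second yields one spider name plus both keyword arguments. Note that the values arrive as strings, so page is '2' rather than 2, which is fine for the %s formatting used in start_urls.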