Scrapy: how to run two crawlers one after another?

Question


python scrapy


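I have two spiders in the same project. One of them depends on the other having run first. They use different pipelines. How can I make sure they run sequentially?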

- yayu

Could whoever downvoted explain what you think is wrong with this question? - yayu

Possible duplicate of the question about running multiple spiders sequentially. - Qiang Zhang

1 Answer


- foolcage · Accepted Answer

From the docs: https://doc.scrapy.org/en/1.2/topics/request-response.html

The same example, but running the spiders sequentially by chaining the deferreds:

import scrapy  # needed for the scrapy.Spider base class used below
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

configure_logging()
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(MySpider1)
    yield runner.crawl(MySpider2)
    reactor.stop()

crawl()
reactor.run() # the script will block here until the last crawl call is finished
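
For more than two spiders, the same chaining can be written as a small helper instead of one `yield` per spider. The following is only a sketch under a few assumptions: `crawl_in_order` and `main` are hypothetical names, not part of Scrapy, and the setup mirrors the Scrapy 1.x-era script above (newer Scrapy releases may need different reactor setup). The helper relies only on `runner.crawl()` returning a Deferred, which is what `CrawlerRunner.crawl` does.

```python
def crawl_in_order(runner, spider_classes, on_done):
    """Start each crawl only after the previous crawl's Deferred has fired."""
    remaining = list(spider_classes)

    def start_next(_result=None):
        if not remaining:
            on_done()  # every crawl has finished
            return
        # addBoth fires on success and on failure, so a failed crawl does not
        # leave the reactor running forever. The failure is swallowed here;
        # log _result if you need to see it.
        runner.crawl(remaining.pop(0)).addBoth(start_next)

    start_next()


def main(spider_classes):
    # Imported lazily so crawl_in_order() can be imported without Scrapy.
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging

    configure_logging()
    runner = CrawlerRunner()
    # Start the chain once the reactor is running, so that reactor.stop()
    # is never called before reactor.run().
    reactor.callWhenRunning(crawl_in_order, runner, spider_classes, reactor.stop)
    reactor.run()  # blocks here until on_done calls reactor.stop()
```

With the spider classes from the script above, calling `main([MySpider1, MySpider2])` should run `MySpider1` to completion before `MySpider2` starts.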