Exporting data to JSON with Scrapy's process.crawl()

Question

7

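This may be a sub-question of "Passing arguments to process.crawl in Scrapy python", but the author of that question accepted an answer that does not address the sub-question I am asking here.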

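Here is my problem: I cannot use scrapy crawl mySpider -a start_urls(myUrl) -o myData.json. Instead, I want (and need) to use crawlerProcess.crawl(spider). I have already found several ways to pass the arguments (and that part is answered in the question I linked anyway), but I cannot work out how to tell it to dump the data into myData.json, that is, the -o myData.json part. Does anyone have a suggestion? Or am I just misunderstanding how this is supposed to work?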

Here is the code:

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

spider = challenges(start_urls=["http://www.myUrl.html"])
crawlerProcess.crawl(spider)
# For now I am just trying to get this bit of code to work, but it will obviously become a loop later.

dispatcher.connect(handleSpiderIdle, signals.spider_idle)

log.start()
print "Starting crawler."
crawlerProcess.start()
print "Crawler stopped."

- Carele

1 Answer

- eLRuLL · Accepted Answer

8

You need to specify it in the settings:

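from scrapy.crawler import CrawlerProcess

# MySpider is your own spider class, imported or defined elsewhere.
process = CrawlerProcess({
    # Where the exported feed is written (the in-script counterpart of -o).
    'FEED_URI': 'file:///tmp/export.json',
})

process.crawl(MySpider)
process.start()  # blocks until the crawl finishes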

- eLRuLL

1

@hAcKnRoCk Perhaps it could be written like this: "FEEDS": { "items.json": {"format": "json"}, } Source: https://docs.scrapy.org/en/latest/topics/practices.html?highlight=run%20from%20script#run-scrapy-from-a-script
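
To make the FEEDS suggestion concrete, here is a minimal sketch of running a spider from a script with a JSON feed. It assumes Scrapy 2.4 or later (FEEDS replaced FEED_URI/FEED_FORMAT in 2.1, and the "overwrite" option arrived in 2.4). The helper names feed_settings and run_spider are hypothetical and only exist for this illustration; they are not part of Scrapy.

```python
def feed_settings(output_path, fmt="json"):
    """Build a settings dict that plays the role of `-o output_path` on the CLI.

    Hypothetical helper for illustration; only the "FEEDS" key is Scrapy's.
    """
    return {
        "FEEDS": {
            # Key: where to write the feed. Value: per-feed export options.
            str(output_path): {"format": fmt, "overwrite": True},
        },
    }


def run_spider(spider_cls, output_path, **spider_kwargs):
    """Run one spider in-process and export its items to output_path."""
    # Imported lazily so feed_settings() stays usable without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=feed_settings(output_path))
    # Keyword arguments are forwarded to the spider's constructor,
    # which is the in-script counterpart of `-a name=value`.
    process.crawl(spider_cls, **spider_kwargs)
    process.start()  # blocks until the crawl finishes
```

With the spider from the question, a call might look like run_spider(challenges, "myData.json", start_urls=["http://www.myUrl.html"]). Note that process.crawl() takes the spider class rather than an instance, and that process.start() can only be called once per Python process, so a later loop over several URLs would need to queue every crawl() call before a single start().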