Writing Scrapy (Python) output to a JSON file

Question

4

I'm new to Python and web scraping. In this program, I want to write the final output (the product names and prices from all 3 links) to a JSON file. Please help!

    import scrapy
    from time import sleep
    import csv, os, json
    import random


    class spider1(scrapy.Spider):
        name = "spider1"

        def start_requests(self):
            list = [
                "https://www.example.com/item1",
                "https://www.example.com/item2",
                "https://www.example.com/item3"]

            for i in list:
                yield scrapy.Request(i, callback=self.parse)
                sleep(random.randint(0, 5))

        def parse(self, response):
            product_name = response.css('#pd-h1-cartridge::text')[0].extract()
            product_price = response.css(
                '.product-price .is-current, .product-price_total .is-current, .product-price_total ins, .product-price ins').css(
                '::text')[3].extract()

            name = str(product_name).strip()
            price = str(product_price).replace('\n', "")

            data = {name, price}

            yield data

    extracted_data = []
    while i < len(data):

        extracted_data.append()
        sleep(5)
    f = open('data.json', 'w')
    json.dump(extracted_data, f, indent=4)

- amal

3 Answers

7

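You don't need to create the file by hand; Scrapy can do this for you. First create an ItemLoader and an Item, and return the Item from the final parse callback. If you need the data saved as JSON, add the -o option when you start the spider.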

For example:

    scrapy crawl <spidername> -o <filename>.json
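
For `-o` to have something to export, the callback has to yield items that can be serialized, such as dicts. The sketch below is only an illustration: `build_item` is a hypothetical helper (not part of Scrapy) and the sample strings are made up. It also shows why the `{name, price}` literal from the question is a problem: it builds a set, which the `json` module cannot serialize.

```python
import json


def build_item(raw_name, raw_price):
    """Hypothetical helper: clean the scraped strings and return a dict item."""
    return {
        "name": str(raw_name).strip(),
        "price": str(raw_price).replace("\n", "").strip(),
    }


# {name, price} is a set literal, and json cannot serialize sets.
try:
    json.dumps({"Widget", "9.99"})
    set_is_serializable = True
except TypeError:
    set_is_serializable = False

# Made-up raw values standing in for what the CSS selectors return.
item = build_item("  Widget \n", "\n9.99\n")
print(set_is_serializable)
print(json.dumps(item))
```

Inside `parse`, a line such as `yield build_item(product_name, product_price)` would then hand one dict per page to Scrapy.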

- Justo

0

You never close the data.json file, so the data is still sitting in the buffer and has not been written out.

Either add a close() call:

    f = open('data.json', 'w')
    json.dump(extracted_data, f, indent=4)
    f.close()

Or use a with statement, which closes the file for you automatically:

    with open('data.json', 'w') as f:
        json.dump(extracted_data, f, indent=4)

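Opening a file with the 'w' flag overwrites it every time, so make sure that is really what you want. If it is not, use the 'a' (append) flag instead.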
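
The difference between the two flags is easy to check outside Scrapy. The sketch below uses a temporary file and made-up records, so none of the names come from the question. It also points at a caveat: appending several `json.dump` results produces JSON documents back to back, which `json.loads` does not accept as a single document.

```python
import json
import os
import tempfile

records = [{"name": "item1", "price": "1.00"},
           {"name": "item2", "price": "2.00"}]  # made-up sample data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")

    # 'w' truncates the file on every open, so only the last record survives.
    for record in records:
        with open(path, "w") as f:
            json.dump(record, f)
    with open(path) as f:
        after_w = json.load(f)

    os.remove(path)

    # 'a' keeps earlier writes, but the result is two JSON documents in a row.
    for record in records:
        with open(path, "a") as f:
            json.dump(record, f)
    with open(path) as f:
        raw_after_a = f.read()
    try:
        json.loads(raw_after_a)
        appended_is_valid = True
    except json.JSONDecodeError:
        appended_is_valid = False

print(after_w)
print(raw_after_a)
print(appended_is_valid)
```

If the file has to stay loadable as one JSON document, collecting the records in a list and dumping that list once is the safer pattern.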

- Basile

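Thanks. But the JSON output only shows the name and price from the last link. I want to add the names and prices from all 3 links to the extracted data and then dump that to the JSON file. - amal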

The 'a' append flag is your friend. - Basile


- Mehrdad · Accepted Answer

Actually, there is a Scrapy command that does this (see the documentation):

    scrapy crawl <spidername> -o <outputname>.<format>
    scrapy crawl quotes -o quotes.json

But since you asked for Python code, I came up with the following:

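    # Needs `import json` at the top of the spider module.
    def parse(self, response):
        # Select the quote blocks once instead of re-querying on every iteration.
        quotes = response.css('div.quote')
        # Mode "w" truncates the file each time parse() is called.
        with open("data_file.json", "w") as filee:
            filee.write('[')
            for index, quote in enumerate(quotes):
                json.dump({
                    'text': quote.css('span.text::text').get(),
                    'author': quote.css('.author::text').get(),
                    'tags': quote.css('.tag::text').getall()
                }, filee)
                # Separate entries with commas, but not after the last one.
                if index < len(quotes) - 1:
                    filee.write(',')
            filee.write(']')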

This is just a simple way to do what the Scrapy output command does for JSON files.
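
Because `parse` runs once per response, opening the file with "w" inside it rewrites the file for every page, which matters when several links are crawled. One way around that is to gather the items in a list and dump them once at the end. The sketch below assumes a hypothetical `JsonCollector` class standing in for the spider, with made-up page data; its `closed` method only mirrors the name of the `closed(reason)` hook that Scrapy calls on a spider when it finishes, and no Scrapy code is involved.

```python
import json
import os
import tempfile


class JsonCollector:
    """Hypothetical stand-in for a spider: collect items, write them once."""

    def __init__(self, path):
        self.path = path
        self.items = []

    def parse(self, page):
        # In a real spider this would read from a Scrapy response;
        # here `page` is just a dict of already-extracted values.
        self.items.append({"name": page["name"], "price": page["price"]})

    def closed(self, reason):
        # Write the whole list as a single JSON array.
        with open(self.path, "w") as f:
            json.dump(self.items, f, indent=4)


# Made-up data for three pages.
pages = [{"name": f"item{i}", "price": f"{i}.00"} for i in (1, 2, 3)]

with tempfile.TemporaryDirectory() as tmp:
    collector = JsonCollector(os.path.join(tmp, "data.json"))
    for page in pages:
        collector.parse(page)
    collector.closed("finished")
    with open(collector.path) as f:
        saved = json.load(f)

print(len(saved))
```

The file is opened exactly once, so all three records end up in the same array.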