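Following the Scrapy tutorial, I made a simple image crawler (it scrapes images of Bugatti cars). The code is shown in the EXAMPLE below.
However, following the guide left me with a crawler that does not work! It finds all the URLs, but it does not download the images.
I found a workaround: replace ITEM_PIPELINES and IMAGES_STORE as follows:
ITEM_PIPELINES['scrapy.pipelines.images.FilesPipeline'] = 1
and
IMAGES_STORE -> FILES_STORE
But I don't know why that works. I want to use the ImagesPipeline as documented.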
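For reference, here is the workaround written out as a complete settings.py fragment. This is a minimal sketch, assuming the rest of settings.py is unchanged from the EXAMPLE; the pipeline is spelled with its canonical import path, scrapy.pipelines.files.FilesPipeline, and the output directory is the one from the EXAMPLE.

```python
# settings.py, workaround variant (sketch; only these two settings change)

ITEM_PIPELINES = {
    # FilesPipeline in place of ImagesPipeline
    "scrapy.pipelines.files.FilesPipeline": 1,
}

# FilesPipeline reads its target directory from FILES_STORE, not IMAGES_STORE
FILES_STORE = "/home/user/Desktop/imagespider/output"
```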
EXAMPLE
settings.py
BOT_NAME = 'imagespider'
SPIDER_MODULES = ['imagespider.spiders']
NEWSPIDER_MODULE = 'imagespider.spiders'
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = "/home/user/Desktop/imagespider/output"
items.py
import scrapy


class ImageItem(scrapy.Item):
    file_urls = scrapy.Field()
    files = scrapy.Field()
imagespider.py
from imagespider.items import ImageItem
import scrapy


class ImageSpider(scrapy.Spider):
    name = "imagespider"
    start_urls = (
        "https://www.find.com/search=bugatti+veyron",
    )

    def parse(self, response):
        for elem in response.xpath("//img"):
            img_url = elem.xpath("@src").extract_first()
            yield ImageItem(file_urls=[img_url])
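One thing that may be worth checking here: by default, Scrapy's ImagesPipeline reads URLs from an item field named image_urls and writes results to images, whereas FilesPipeline uses file_urls and files, which are the names ImageItem defines. The fragment below is an untested sketch that keeps the item as it is and points ImagesPipeline at those fields through the IMAGES_URLS_FIELD and IMAGES_RESULT_FIELD settings. Whether this is the actual cause in this project is an assumption; ImagesPipeline also needs Pillow installed.

```python
# settings.py, hypothetical variant (sketch, not verified against this project)

ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}
IMAGES_STORE = "/home/user/Desktop/imagespider/output"

# Point ImagesPipeline at the field names that ImageItem already defines,
# instead of its defaults ("image_urls" and "images").
IMAGES_URLS_FIELD = "file_urls"
IMAGES_RESULT_FIELD = "files"
```

The alternative would be to rename the two fields on ImageItem to image_urls and images and yield ImageItem(image_urls=[img_url]) from the spider.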
__main__
Stub code? How do we call these functions? - Nathan majicvr.com