Scrapy tutorial raises an exception

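I'm working through the Scrapy tutorial (http://media.readthedocs.org/pdf/scrapy/0.14/scrapy.pdf), and I have checked that my items.py and dmoz_spider.py code is correct (not copy-pasted).
The first thing that confused me was this instruction: "This is the code for our first Spider; save it in a file named dmoz_spider.py under the dmoz/spiders directory." I'm on the latest version of Ubuntu and no dmoz folder was created, so I put the code in ~/tutorial/tutorial/spiders. (Was that my first mistake?)
Here is my dmoz_spider.py script: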
from scrapy.spider import BaseSpider

class DmozSpider(BaseSpider):
   name = "dmoz"
   allowed_domains = ["dmoz.org"]
   start_urls = [
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
   ]

def parse(self, response):
   filename = response.url.split("/")[-2]
   open(filename, 'wb').write(response.body)

In my terminal, I ran:
scrapy crawl dmoz

And I got the following output:

2012-10-08 13:20:22-0700 [scrapy] INFO: Scrapy 0.12.0.2546 started (bot: tutorial)
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, DownloaderStats
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled item pipelines: 
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2012-10-08 13:20:22-0700 [dmoz] INFO: Spider opened
2012-10-08 13:20:22-0700 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2012-10-08 13:20:22-0700 [dmoz] ERROR: Spider error processing <http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: <None>)
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/scrapy/spider.py", line 62, in parse
    raise NotImplementedError
exceptions.NotImplementedError: 

2012-10-08 13:20:22-0700 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2012-10-08 13:20:22-0700 [dmoz] ERROR: Spider error processing <http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: <None>)
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/scrapy/spider.py", line 62, in parse
    raise NotImplementedError
exceptions.NotImplementedError: 

2012-10-08 13:20:22-0700 [dmoz] INFO: Closing spider (finished)
2012-10-08 13:20:22-0700 [dmoz] INFO: Spider closed (finished)

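While searching, I saw someone suggest that Twisted might not be installed... but if I installed Scrapy with the Ubuntu package installer, shouldn't Twisted have been installed as well?
Thanks in advance!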

Why not check whether it's installed first? Don't trust your guesses :) - Alfabravo
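
One way to check, as a minimal sketch: try to import the module from the same Python interpreter that runs Scrapy. The helper name `is_installed` is hypothetical, and this only tells you whether the module is importable from this interpreter, not which version is installed.

```python
import importlib


def is_installed(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# "twisted" is the module in question; the result depends on your system.
print("twisted importable:", is_installed("twisted"))
```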
1 Answer


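You are not actually overriding the parse method, so the parse method of BaseSpider is called instead of your own. Your indentation is wrong, so parse is declared as a function outside the DmozSpider class. Welcome to Python :)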

This has nothing to do with Twisted. Twisted appears in the traceback, so it is clearly installed.
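
To illustrate the indentation point without needing Scrapy, here is a minimal sketch. `FakeBaseSpider` is a hypothetical stand-in, not Scrapy's real class; it only mimics the one behavior visible in the traceback, a default parse that raises NotImplementedError. `BrokenSpider`, `FixedSpider`, and `try_parse` are also made-up names.

```python
# Hypothetical stand-in for BaseSpider: the default parse() raises
# NotImplementedError, which is what the traceback in the question shows.
class FakeBaseSpider(object):
    def parse(self, response):
        raise NotImplementedError


class BrokenSpider(FakeBaseSpider):
    name = "broken"

# Dedented to column 0: this is a module-level function, NOT a method
# of BrokenSpider, so the base class parse() is still the one in effect.
def parse(self, response):
    return "module-level parse"


class FixedSpider(FakeBaseSpider):
    name = "fixed"

    # Indented under the class: this overrides FakeBaseSpider.parse.
    def parse(self, response):
        return "overridden parse: %s" % response


def try_parse(spider, response):
    """Return the parse result, or the exception class name if it raises."""
    try:
        return spider.parse(response)
    except NotImplementedError as exc:
        return type(exc).__name__


broken_result = try_parse(BrokenSpider(), "page")
fixed_result = try_parse(FixedSpider(), "page")
print(broken_result)
print(fixed_result)
```

Applied to the spider in the question, the same fix means indenting `def parse` and its body so they sit inside `class DmozSpider`, at the same level as `name` and `start_urls`.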


Ah, that was it. Thanks! After indenting the "def parse" line, everything worked. Welcome to Python indeed :) - user1729889
