Passing an argument to a callback function

Question

37

def parse(self, response):
    for sel in response.xpath('//tbody/tr'):
        item = HeroItem()
        item['hclass'] = response.request.url.split("/")[8].split('-')[-1]
        item['server'] = response.request.url.split('/')[2].split('.')[0]
        item['hardcore'] = len(response.request.url.split("/")[8].split('-')) == 3
        item['seasonal'] = response.request.url.split("/")[6] == 'season'
        item['rank'] = sel.xpath('td[@class="cell-Rank"]/text()').extract()[0].strip()
        item['battle_tag'] = sel.xpath('td[@class="cell-BattleTag"]//a/text()').extract()[1].strip()
        item['grift'] = sel.xpath('td[@class="cell-RiftLevel"]/text()').extract()[0].strip()
        item['time'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()
        item['date'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()
        url = 'https://' + item['server'] + '.battle.net/' + sel.xpath('td[@class="cell-BattleTag"]//a/@href').extract()[0].strip()

        yield Request(url, callback=self.parse_profile)

def parse_profile(self, response):
    sel = Selector(response)
    item = HeroItem()
    item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    return item

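OK, in the main parse method I'm scraping an entire table and pulling several fields from it. One of those fields is a URL, which I want to follow to collect a further batch of fields. How can I pass the ITEM object I've already created to the callback function, so that the final item keeps all the fields?

As the code above shows, I can save either the fields found at the URL (the current code) or only the fields from the table (by simply writing yield item), but I can't yield a single object that holds all the fields together.

I tried the following, but it obviously doesn't work.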

yield Request(url, callback=self.parse_profile(item))

def parse_profile(self, response, item):
    sel = Selector(response)
    item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    return item

- vic

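Try taking a look at decorators, for example: http://thecodeship.com/patterns/guide-to-python-function-decorators/ - Overclover

So the fields returned from the URL aren't in "item", and you want to add those fields to "item" and return it? - Michael S Priz

For the general Python approach, see "callback - Python, how to pass an argument to a function pointer parameter? - Stack Overflow". In this case, though, there is a Scrapy-specific (and probably better) way. - user202729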
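As a rough illustration of the general Python approach mentioned in the comment above: the attempt in the question, callback=self.parse_profile(item), calls the method right away and hands its return value to Request, instead of handing over something that can be called later. A minimal sketch using functools.partial follows. The fetch function is a hypothetical stand-in for a framework that later invokes callback(response); it is not part of Scrapy, and plain dicts stand in for the response and the item.

```python
from functools import partial


def parse_profile(response, item):
    """Hypothetical callback: copy the item and add one field from the response."""
    merged = dict(item)
    merged["weapon"] = response["weapon"]
    return merged


def fetch(url, callback):
    """Stand-in for a framework that builds a response and later calls callback(response)."""
    response = {"url": url, "weapon": "bow"}  # made-up response data
    return callback(response)


item = {"rank": "1"}

# partial() returns a new callable with item already bound;
# nothing is executed until fetch() invokes it with the response.
callback = partial(parse_profile, item=item)
result = fetch("https://example.com/profile", callback)
print(result)
```

The same idea carries over to any API that expects a one-argument callback, although for Scrapy the framework-specific mechanisms shown in the answers are probably the more natural choice.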

4 Answers

16

Here is a better way to pass arguments to a callback function:

def parse(self, response):
    request = scrapy.Request('http://www.example.com/index.html',
                             callback=self.parse_page2,
                             cb_kwargs=dict(main_url=response.url))
    request.cb_kwargs['foo'] = 'bar'  # add more arguments for the callback
    yield request

def parse_page2(self, response, main_url, foo):
    yield dict(
        main_url=main_url,
        other_url=response.url,
        foo=foo,
    )

Source: https://docs.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-callback-arguments

- penduDev

-1

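I had a similar problem passing extra arguments with Tkinter and found that this solution works (described here: http://infohost.nmt.edu/tcc/help/pubs/tkinter/web/extra-args.html). Translated to your problem, it looks like this: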

def parse(self, response):
    item = HeroItem()
    [...]
    def handler(self = self, response = response, item = item):
        """ passing as default argument values """
        return self.parse_profile(response, item)
    yield Request(url, callback=handler)

- rolika

1

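This is dangerous advice. He is looping over all the "items" found in response.xpath('//tbody/tr'). Since the request will never supply item as an argument to the callback, the handler will always use the default value of item. Unfortunately, item will be in whatever state it is in when the callback is called, not the state it was in when the request was yielded. The data you collect will be unreliable and inconsistent. - Rejected

@Rejected No. By assigning the variables in the function header (self=self...), it saves the values the variables have at the time the "handler" function is defined. As long as the definition of "handler" is inside the loop, "parse_profile" gets the value of each item iterated over. - Alan Hoover

This is a very elegant solution. - Alan Hoover

@AlanHoover My understanding was that, because the request's callback may happen later, the function itself gets redefined and the redefined function is the one called when the callback executes. I recall running into this myself, and I'm fairly sure I wasn't late-binding any arguments. I'll run some tests! - Rejected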
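The disagreement in these comments comes down to when Python looks up a variable used inside a nested function. The following plain-Python sketch (no Scrapy involved; the function names and the "resp" placeholder are made up for illustration) contrasts a closure, which reads item when the handler finally runs, with a default argument, which is evaluated once when the handler is defined. It assumes the framework invokes the callback as callback(response), so response is the first parameter and only item is bound as a default.

```python
def schedule_late_bound(items):
    """Build one handler per item; each handler reads `item` from the enclosing scope."""
    callbacks = []
    for item in items:
        def handler(response):
            # `item` is looked up when handler runs, after the loop has finished
            return (response, item)
        callbacks.append(handler)
    return callbacks


def schedule_default_bound(items):
    """Build one handler per item; each handler stores its own `item` as a default."""
    callbacks = []
    for item in items:
        def handler(response, item=item):
            # the default value was evaluated when this handler was defined
            return (response, item)
        callbacks.append(handler)
    return callbacks


items = ["barbarian", "wizard", "monk"]

# Simulate the callbacks being invoked later, once the loop is over.
late = [cb("resp") for cb in schedule_late_bound(items)]
bound = [cb("resp") for cb in schedule_default_bound(items)]

print(late)   # every handler sees the last item
print(bound)  # every handler sees its own item
```

Under these assumptions the default-argument form keeps each handler tied to its own item, which matches the behaviour Alan Hoover describes, while the plain closure shows the kind of inconsistency Rejected was worried about.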

-2

@penduDev

I tried your approach, but it failed with an unexpected keyword argument.

scrapy_req = scrapy.Request(url=url,
                            callback=self.parseDetailPage,
                            cb_kwargs=dict(participant_id=nParticipantId))


def parseDetailPage(self, response, participant_id ):
    .. Some code here..
    yield MyParseResult (
        .. some code here ..
        participant_id = participant_id
    )

Error reported
, cb_kwargs=dict(participant_id=nParticipantId)
TypeError: __init__() got an unexpected keyword argument 'cb_kwargs'

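Other than an outdated version of Scrapy, do you have any idea what could cause the unexpected keyword argument?

Yes, that was it. I verified my own suggestion, and after upgrading everything works as expected.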

sudo pip install --upgrade scrapy

- Jan

Answers should be answers, not more questions. - Steven Almeroth


- Rejected · Accepted Answer

This is what you'd use the meta keyword for.

def parse(self, response):
    for sel in response.xpath('//tbody/tr'):
        item = HeroItem()
        # Item assignment here
        url = 'https://' + item['server'] + '.battle.net/' + sel.xpath('td[@class="cell-BattleTag"]//a/@href').extract()[0].strip()

        yield Request(url, callback=self.parse_profile, meta={'hero_item': item})

def parse_profile(self, response):
    item = response.meta.get('hero_item')
    item['weapon'] = response.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    yield item

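Note that doing sel = Selector(response) is a waste of resources and differs from what you did earlier, so I changed it. The selector is automatically available on the response as response.selector, and response.xpath is a convenient shortcut to it.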