I'm using the Kimurai Ruby gem for web scraping. I have a script that works well:
require 'kimurai'

class SimpleSpider < Kimurai::Base
  @name = "simple_spider"
  @engine = :selenium_chrome
  @start_urls = ["https://apply.workable.com/taxjar/"]

  def parse(response, url:, data: {})
    # Update response to current response after interaction with a browser
    count = 0
    # browser.click_button "Show more"
    doc = browser.current_response
    returned_jobs = doc.css('.careers-jobs-list-styles__jobsList--3_v12')
    returned_jobs.css('li').each do |char_element|
      # puts char_element
      title = char_element.css('a')[0]['aria-label']
      link = "https://apply.workable.com" + char_element.css('a')[0]['href']
      # Click on the job link and get the description
      browser.visit(link)
      job_page = browser.current_response
      description = job_page.xpath('/html/body/div[1]/div/div[1]/div[2]/div[2]/div[2]').text
      puts '*******'
      puts title
      puts link
      puts description
      puts count += 1
    end
    puts "There are #{count} jobs total"
  end
end

SimpleSpider.crawl!
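However, instead of printing everything, I'd like all of this to return an array of objects, or in this case, jobs. I'd like to create a jobs array in the parse method and, inside the returned_jobs loop, do something like
jobs << [title, link, description, company]
and have that array returned when I call SimpleSpider.crawl!, but that doesn't work. Any help would be appreciated.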
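
To make the goal concrete, here is a rough sketch of the shape of the result I'm after, written in plain Ruby without Kimurai. The Job struct, the collect_jobs helper, and the sample rows are all hypothetical names and placeholder data made up for illustration; they are not part of Kimurai or of my real script, and this is not a claim about how crawl! should be made to return anything.

```ruby
# Hypothetical container for one scraped job; not part of Kimurai.
Job = Struct.new(:title, :link, :description, :company, keyword_init: true)

# Hypothetical helper standing in for the loop inside parse: it takes rows
# that have already been extracted and returns an array of Job objects.
def collect_jobs(rows, company:)
  rows.map do |row|
    Job.new(
      title: row[:title],
      link: "https://apply.workable.com" + row[:href],
      description: row[:description],
      company: company
    )
  end
end

# Placeholder rows, made up for illustration only.
rows = [
  { title: "Example role A", href: "/taxjar/j/AAA/", description: "Placeholder text" },
  { title: "Example role B", href: "/taxjar/j/BBB/", description: "Placeholder text" }
]

jobs = collect_jobs(rows, company: "taxjar")
puts "There are #{jobs.size} jobs total"
```

In other words, I want the caller to end up holding something like jobs above, rather than only seeing the values printed from inside parse.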