Is Ruby 2.1.2's Timeout still not thread-safe?

Question


ruby-on-rails ruby multithreading timeout sidekiq


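I have 50 Sidekiq threads crawling the web. A few weeks ago these threads started hanging after running for about 20 minutes. When I dump backtraces, most of the threads are stuck in net/http `initialize`: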

/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `initialize'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `open'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:878:in `connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:858:in `start'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:631:in `connection_for'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:257:in `fetch'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:974:in `response_redirect'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:298:in `fetch'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize.rb:432:in `get'
/app/app/workers/crawl_page.rb:24:in `block in perform'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:91:in `block in timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `block in catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:106:in `timeout'

I assumed that wrapping the whole call in a timeout would keep Sidekiq from getting stuck in net/http, e.g.: Timeout::timeout(APP_CONFIG['crawl_page_timeout']) { @page = agent.get(url) }

But then I came across some old posts saying that Ruby's Timeout is not thread-safe: http://blog.headius.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html

Is Ruby's Timeout still unsafe today?

I know lots of people write web crawlers in Ruby. If Timeout isn't safe, how do people deal with net/http getting stuck?

Update:

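I have replaced Mechanize with HTTPClient (which explicitly states that it is thread-safe), but threads still seem to get stuck in `initialize`. This could be because Ruby's Timeout isn't working properly, or it could be a Sidekiq issue. Here is the stack trace from a Sidekiq thread that hung recently: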

/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `initialize'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `new'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `create_socket'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:752:in `block in connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:91:in `block in timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:101:in `call'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:101:in `timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:127:in `timeout'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:751:in `connect'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:609:in `query'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:164:in `query'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:1087:in `do_get_block'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/instrumentation/httpclient.rb:34:in `block in do_get_block_with_newrelic'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/cross_app_tracing.rb:43:in `tl_trace_http_request'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/instrumentation/httpclient.rb:33:in `do_get_block_with_newrelic'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:891:in `block in do_request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:985:in `protect_keep_alive_disconnected'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:890:in `do_request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:963:in `follow_redirect'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:776:in `request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:677:in `get'
/app/app/ohm_models/queued_page.rb:20:in `run_crawl'

- Josh

Guess what, I ran into exactly the same problem this morning. All of our crawlers are stuck in HTTP methods. - Hendrik

Wow, interesting. Were any of your gems just updated? - Josh

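Nope. We run into this every couple of months. The problem is that all of our other workers get stuck in HTTP calls too. That's really bad, because we have some mission-critical workers that really can't afford to get stuck. - Hendrik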

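You run into this every couple of months? Does that mean your crawler has hit this before? Did you find a fix back then, or did it just start working again at some point and then break again two months later? - Josh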

I can't find anything more recent than 2012, but the problem might be in Mechanize: https://github.com/sparklemotion/mechanize/issues/256 @Hendrik, are you using Mechanize too? - Josh

Sorry, no, we've been using Polipus as our crawler. It just happens now and then; I really can't say more :(. - Hendrik

1 Answer

- Xavier Shay · Accepted Answer

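Correct: unless you know exactly what is happening inside the block (including whatever any C code in it may be doing), it is still not safe to use Timeout in Ruby code. I have personally seen connection pools end up in disastrous states because of it.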
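A minimal sketch of the failure mode described above, assuming a slow cleanup step inside an `ensure` clause (the sleeps stand in for something like returning a connection to a pool; no real pool is involved). `Timeout.timeout` delivers its exception wherever the thread happens to be, so the cleanup can be cut off halfway. The masked variant uses `Thread.handle_interrupt` (Ruby 2.0+), a mitigation the answer does not mention; it only protects code you can wrap yourself, not code buried inside a library. `cleanup_completed?` is a hypothetical helper.

```ruby
require 'timeout'

# Returns true if the `ensure` cleanup ran to completion under Timeout.
# The sleeps stand in for a slow cleanup step (hypothetical pool check-in).
def cleanup_completed?(masked:)
  released = false
  begin
    Timeout.timeout(0.1) do
      begin
        # The "work" finishes immediately; only the cleanup is slow.
      ensure
        if masked
          # Defer asynchronous exceptions (including the one Timeout raises)
          # until this block has finished.
          Thread.handle_interrupt(Exception => :never) do
            sleep 0.5
            released = true
          end
        else
          sleep 0.5        # Timeout fires here, in the middle of cleanup...
          released = true  # ...so this line is never reached.
        end
      end
    end
  rescue Timeout::Error
    # The 0.1 s deadline passes during the 0.5 s cleanup. In the masked case
    # the error is raised only after the handle_interrupt block returns.
  end
  released
end
```

Calling `cleanup_completed?(masked: false)` reports that the cleanup never finished, while `masked: true` lets it finish before the deferred timeout is raised.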

You can try to work around this by rescuing the error and retrying, but if you're unlucky your process may get stuck and need to be restarted.
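A minimal retry wrapper along the lines the answer suggests; `with_retries` and its keyword arguments are hypothetical names. As the answer warns, retrying only helps when the failure surfaces as a clean exception: it does nothing for a thread that is already wedged, or for shared state that an interrupted block left half-updated.

```ruby
require 'timeout'

# Calls the block, retrying when it raises one of `errors`, for at most
# `attempts` calls in total. The block receives the 1-based attempt number.
def with_retries(attempts:, errors: [Timeout::Error])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors
    retry if tries < attempts
    raise  # out of attempts: re-raise the last error
  end
end
```

In the crawler this might wrap the original call, e.g. `with_retries(attempts: 3) { Timeout::timeout(APP_CONFIG['crawl_page_timeout']) { @page = agent.get(url) } }`, with the caveats above still applying.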

If you fork new processes, it is safe to kill long-running ones (or to use timeout(1)), because they have no way to corrupt the parent process.
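A sketch of the fork-based approach, assuming MRI on a platform that supports `fork` (not Windows or JRuby). `run_in_child` is a hypothetical helper: it runs the work in a child process and sends `SIGKILL` if the child overruns the deadline, so nothing the child was doing can leave the parent's connection pools or locks in a bad state. Passing results back to the parent (through a pipe, Redis, etc.) is left out.

```ruby
# Runs the block in a forked child and returns :ok, :failed or :timed_out.
# A child that overruns `deadline` seconds is killed with SIGKILL; because it
# is a separate process, killing it cannot corrupt the parent's state.
def run_in_child(deadline)
  pid = fork do
    begin
      yield
      exit!(0)  # exit! skips at_exit handlers inherited from the parent
    rescue Exception
      exit!(1)
    end
  end
  stop_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline
  loop do
    _, status = Process.waitpid2(pid, Process::WNOHANG)  # nil while still running
    return(status.success? ? :ok : :failed) if status
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= stop_at
      Process.kill(:KILL, pid)
      Process.wait(pid)  # reap the killed child so it does not linger as a zombie
      return :timed_out
    end
    sleep 0.05  # poll interval
  end
end
```

Polling with `WNOHANG` keeps the parent in control of the deadline; the parent never calls `Timeout` at all.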

"I know lots of people write web crawlers in Ruby. If Timeout isn't thread-safe, how do people deal with net/http getting stuck?"

Do you have a specific, runnable example?
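Both traces in the question are stuck while opening the TCP connection. One way to bound that step without `Timeout` is a non-blocking connect followed by `IO.select` with a deadline. This technique is not mentioned in the thread, and `connect_with_deadline` is a hypothetical helper. DNS resolution in `Addrinfo.tcp` is still unbounded, and handing the resulting socket to Mechanize or HTTPClient is not shown here.

```ruby
require 'socket'

# Opens a TCP connection to host:port, raising Errno::ETIMEDOUT if the
# handshake has not completed within `deadline` seconds. The wait happens in
# IO.select, so no exception is injected into the thread from outside.
def connect_with_deadline(host, port, deadline)
  ai = Addrinfo.tcp(host, port)  # DNS lookup: NOT covered by the deadline
  sock = Socket.new(ai.afamily, Socket::SOCK_STREAM, 0)
  begin
    sock.connect_nonblock(ai.to_sockaddr)
  rescue IO::WaitWritable
    # Handshake in progress: wait until the socket is writable or time runs out.
    unless IO.select(nil, [sock], nil, deadline)
      raise Errno::ETIMEDOUT, "connect to #{host}:#{port}"
    end
    begin
      sock.connect_nonblock(ai.to_sockaddr)  # surfaces ECONNREFUSED and similar
    rescue Errno::EISCONN
      # Already connected: nothing to do.
    end
  end
  sock
rescue StandardError
  sock.close if sock && !sock.closed?
  raise
end
```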