How to track down a deadlock in Ruby

10

I use BrB to share a data source between several worker processes in Ruby 1.9, which I fork with Process#fork:

Thread.abort_on_exception = true

fork do
  puts "Initializing data source process... (PID: #{Process.pid})"
  data = DataSource.new(files)

  BrB::Service.start_service(:object => data, :verbose => false, :host => host, :port => port)
  EM.reactor_thread.join
end

The workers are forked as follows:
8.times do |t|
  fork do
    data = BrB::Tunnel.create(nil, "brb://#{host}:#{port}", :verbose => false)

    puts "Launching #{threads_num} worker threads... (PID: #{Process.pid})"    

    threads = []
    threads_num.times { |i|
      threads << Thread.new {
        while true
          begin
            worker = Worker.new(data, config)

          rescue OutOfTargetsError
            break

          rescue Exception => e
            puts "An unexpected exception was caught: #{e.class} => #{e}"
            sleep 5

          end
        end
      }
    }
    threads.each { |t| t.join }

    data.stop_service
    EM.stop
  end
end

This works perfectly for the most part, but after about 10 minutes of running I get the following error:

bootstrap.rb:47:in `join': deadlock detected (fatal)
    from bootstrap.rb:47:in `block in <main>'
    from bootstrap.rb:39:in `fork'
    from bootstrap.rb:39:in `<main>'

The error doesn't tell me where the deadlock actually happens; it only points to the join on the EventMachine thread.

How can I trace where the program actually gets stuck?


Have you tried adding Thread.exit before the end of the block? - glebm
2 Answers

5
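It is blocking on the join in the parent thread, so that information is accurate. To trace where the child threads are stuck, try wrapping each thread's work in a timeout. You will need to temporarily remove the catch-all rescue so that the timeout exception can propagate.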
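A minimal sketch of the timeout idea, assuming the standard library's Timeout module is acceptable here. The names WorkerStuckError and run_with_deadline are hypothetical and exist only for this illustration, and the sleep calls stand in for the real Worker.new(data, config). A dedicated exception class is passed to Timeout.timeout so that, as far as I know, it is raised inside the waiting block and its backtrace points at whatever the thread was doing when time ran out.

```ruby
require "timeout"

# Hypothetical exception class for this sketch; it is not part of BrB or
# of the original code.
class WorkerStuckError < StandardError; end

# Runs the given block with a time limit. Returns the block's value, or
# the WorkerStuckError if the limit was reached.
def run_with_deadline(seconds)
  Timeout.timeout(seconds, WorkerStuckError) { yield }
rescue WorkerStuckError => e
  # Log which thread gave up and the backtrace carried by the exception.
  warn "Thread #{Thread.current.object_id} gave up after #{seconds}s"
  warn Array(e.backtrace).join("\n")
  e
end

# Stand-ins for the real work; in the question this would be
# Worker.new(data, config).
stuck = Thread.new { run_with_deadline(0.2) { sleep 5 } }.value
fine  = Thread.new { run_with_deadline(1) { :done } }.value

puts stuck.class
puts fine.inspect
```

Because the wrapper rescues only its own exception class, it can log the timeout itself, but a surrounding rescue Exception would still swallow anything re-raised from it, which is why the catch-all has to go while debugging.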
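Currently the parent thread joins all of the threads in order, and each thread only exits (and so only becomes joinable) when it hits an OutOfTargetsError. Using short-lived threads and moving the while loop into the parent may avoid the deadlock. No guarantees, but maybe something like this would work?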
8.times do |t|  
  fork do
    running = true
    Signal.trap("INT") do
      puts "Interrupt signal received, waiting for threads to finish..."
      running = false
    end

    data = BrB::Tunnel.create(nil, "brb://#{host}:#{port}", :verbose => false)

    puts "Launching max #{threads_num} worker threads... (PID: #{Process.pid})"    

    threads = []
    while running
      # Start new threads until we have threads_num running
      until threads.length >= threads_num do
        threads << Thread.new {
          begin
            worker = Worker.new(data, config)
          rescue OutOfTargetsError
          rescue Exception => e
            puts "An unexpected exception was caught: #{e.class} => #{e}"
            sleep 5
          end
        }
      end

      # Make sure the parent process doesn't spin too much
      sleep 1

      # Join finished threads (Thread#status is false or nil once a thread has ended)
      finished_threads = threads.reject(&:status)
      threads -= finished_threads
      finished_threads.each(&:join)
    end

    data.stop_service
    EM.stop
  end
end

Any luck with this approach, man? - captainpete

3

I ran into the same problem and solved it with the following snippet:

# Wait for all threads (other than the current thread and
# main thread) to stop running.
# Assumes that no new threads are started while waiting
def join_all
  main     = Thread.main       # The main thread
  current  = Thread.current    # The current thread
  all      = Thread.list       # All threads still running
  # Now call join on each thread
  all.each{|t| t.join unless t == current or t == main }
end

Source: The Ruby Programming Language, O'Reilly (2008)
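A small, hedged way to try join_all in isolation. The demo_join_all helper and its three sleeping threads are hypothetical stand-ins for real workers. Note that join_all waits on every thread in Thread.list, which presumably includes threads started by libraries (such as an EventMachine reactor thread), so in the original program it may block until those stop as well.

```ruby
# The snippet from the answer, with the same behaviour.
def join_all
  main    = Thread.main
  current = Thread.current
  Thread.list.each { |t| t.join unless t == current || t == main }
end

# Hypothetical demo: three short-lived threads standing in for real workers.
# Returns true if none of them is still alive after join_all returns.
def demo_join_all
  workers = 3.times.map { |i| Thread.new { sleep 0.1 * (i + 1) } }
  join_all
  workers.none?(&:alive?)
end

puts demo_join_all
```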
