Is it possible to avoid ActiveRecord::ConnectionTimeoutError on Heroku?

9

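I run a Rails application on Heroku with several web dynos and one worker dyno. I run thousands of worker jobs a day through Sidekiq, but occasionally I get an ActiveRecord::ConnectionTimeoutError (roughly 50 times a day). I have set up my Unicorn server as follows: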

worker_processes 4
timeout 30
preload_app true

before_fork do |server, worker|
    # As suggested here: https://devcenter.heroku.com/articles/rails-unicorn
    Signal.trap 'TERM' do
        puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
        Process.kill 'QUIT', Process.pid
    end

    if defined?(ActiveRecord::Base)
        ActiveRecord::Base.connection.disconnect!
    end
end

after_fork do |server,worker|
    if defined?(ActiveRecord::Base)
        config = Rails.application.config.database_configuration[Rails.env]
        config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
        config['pool']            = ENV['DB_POOL'] || 10
        ActiveRecord::Base.establish_connection(config)
    end

    Sidekiq.configure_client do |config|
        config.redis = { :size => 1 }
    end

    Sidekiq.configure_server do |config|
        config = Rails.application.config.database_configuration[Rails.env]
        config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
        config['pool']            = ENV['DB_POOL'] || 10
        ActiveRecord::Base.establish_connection(config)
    end
end

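On Heroku I have set the DB_POOL config variable to 2, as Heroku recommends. Are these errors really unavoidable? That seems strange. Do you have any suggestions?

2 Answers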

14

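A Sidekiq server (the process running on your server that actually executes the delayed jobs) will by default spin up as many as 25 threads to work through its queue. Each of those threads may request a connection to your primary database through ActiveRecord if your jobs need one.

If your connection pool holds only 5 connections but 25 threads are trying to connect, any thread that cannot get an available connection from the pool will give up after 5 seconds and raise a connection timeout error.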
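
To make the failure mode concrete, here is a minimal sketch in plain Ruby. It is a toy model and not ActiveRecord's actual pool implementation; the `ToyPool` class and the `simulate` helper are hypothetical names made up for this illustration. It only shows the general mechanism: more threads than connections, plus jobs that hold a connection longer than the checkout timeout, produces timeouts.

```ruby
# Toy connection pool (hypothetical, for illustration only).
# It hands out at most `size` "connections" and raises if a caller
# waits longer than `checkout_timeout` seconds for one.
class ToyPool
  class TimeoutError < StandardError; end

  def initialize(size:, checkout_timeout:)
    @available = size
    @checkout_timeout = checkout_timeout
    @lock = Mutex.new
    @freed = ConditionVariable.new
  end

  # Check out a connection, run the block, and always check it back in.
  def with_connection
    checkout
    begin
      yield
    ensure
      checkin
    end
  end

  private

  def checkout
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @checkout_timeout
    @lock.synchronize do
      while @available.zero?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        raise TimeoutError, "could not obtain a connection in time" if remaining <= 0
        @freed.wait(@lock, remaining) # releases the lock while waiting
      end
      @available -= 1
    end
  end

  def checkin
    @lock.synchronize do
      @available += 1
      @freed.signal
    end
  end
end

# Run `threads` jobs against a pool and count successes and timeouts.
def simulate(threads:, pool_size:, job_seconds:, checkout_timeout:)
  pool = ToyPool.new(size: pool_size, checkout_timeout: checkout_timeout)
  results = Queue.new
  Array.new(threads) do
    Thread.new do
      begin
        pool.with_connection { sleep job_seconds }
        results << :ok
      rescue ToyPool::TimeoutError
        results << :timeout
      end
    end
  end.each(&:join)
  Array.new(results.size) { results.pop }.tally
end

# Three threads share one connection; each job outlasts the checkout timeout.
p simulate(threads: 3, pool_size: 1, job_seconds: 0.3, checkout_timeout: 0.05)
```

With these made-up numbers, one thread should finish its job and the other two should time out. Raising `pool_size` to match `threads` makes the timeouts disappear, which is the idea behind sizing the pool to the concurrency level.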

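Setting the pool size for your Sidekiq server to something close to its concurrency level (set with the -c flag when you start the process) will help alleviate this problem, at the cost of potentially opening many more connections to your database. If you are using Postgres on Heroku, for example, some plans are limited to 20 connections while others allow 500.

If you are running a multi-process server environment such as Unicorn, you also need to watch the number of connections made by each forked process. With 4 Unicorn processes and a default connection pool size of 5, your Unicorn environment could have up to 20 live connections at any given time. You can read more about that in Heroku's documentation. Note also that the DB pool size does not mean each dyno will now hold that many open connections; a new connection is created only when one is needed, up to that maximum.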
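
The arithmetic above can be sketched as a small worst-case calculation. The helper name `max_db_connections` and the count of two web dynos are assumptions made for illustration (the question only says "several" web dynos); the other figures come from the question and from the pool sizes discussed here.

```ruby
# Hypothetical helper: worst-case number of open database connections,
# assuming every process fills its pool completely.
def max_db_connections(web_dynos:, unicorn_workers:, web_pool:, sidekiq_processes:, sidekiq_pool:)
  web = web_dynos * unicorn_workers * web_pool   # each forked worker has its own pool
  background = sidekiq_processes * sidekiq_pool  # one pool per Sidekiq process
  { web: web, background: background, total: web + background }
end

budget = max_db_connections(
  web_dynos: 2,         # assumption: the question does not give the exact number
  unicorn_workers: 4,   # worker_processes 4
  web_pool: 2,          # DB_POOL=2
  sidekiq_processes: 1, # one worker dyno
  sidekiq_pool: 20      # a pool sized for a concurrency of about 20
)

plan_limit = 20 # the smaller connection limit mentioned above
puts budget.inspect
puts "over the plan limit by #{budget[:total] - plan_limit}" if budget[:total] > plan_limit
```

Because connections are opened lazily, the real count is usually lower than this ceiling, but the ceiling is the number to compare against the plan limit.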

That said, here is what I do.

# config/initializers/unicorn.rb

if ENV['RACK_ENV'] == 'development'
  worker_processes 1
  listen "#{ENV['BOXEN_SOCKET_DIR']}/rails_app"
  timeout 120
else
  worker_processes Integer(ENV["WEB_CONCURRENCY"] || 2)
  timeout 29
end

# The timeout mechanism in Unicorn is an extreme solution that should be avoided whenever possible. 
# It will help catch bugs in your application where and when your application forgets to use timeouts,
# but it is expensive as it kills and respawns a worker process.
# see http://unicorn.bogomips.org/Application_Timeouts.html

# Heroku recommends a timeout of 15 seconds. With a 15 second timeout, the master process will send a 
# SIGKILL to the worker process if processing a request takes longer than 15 seconds. This will 
# generate a H13 error code and you’ll see it in your logs. Note, this will not generate any stacktraces 
# to assist in debugging. Using Rack::Timeout, we can get a stacktrace in the logs that can be used for
# future debugging, so we set that value to something less than this one

preload_app true # for new relic

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end

end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end

  Rails.logger.info("Done forking unicorn processes")

  #https://devcenter.heroku.com/articles/concurrency-and-database-connections
  if defined?(ActiveRecord::Base)

    db_pool_size = if ENV["DB_POOL"]
      ENV["DB_POOL"]
    else
      ENV["WEB_CONCURRENCY"] || 2
    end

    config = Rails.application.config.database_configuration[Rails.env]
    config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
    config['pool']              = db_pool_size
    ActiveRecord::Base.establish_connection(config)

    # Turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction
    ActiveRecord::Base.connection.execute "update pg_settings set setting='off' where name = 'synchronous_commit';"    

    Rails.logger.info("Connection pool size for unicorn is now: #{ActiveRecord::Base.connection.pool.instance_variable_get('@size')}")
  end

end

And for Sidekiq:

# config/initializers/sidekiq.rb

Sidekiq.configure_server do |config|

  sidekiq_pool = ENV['SIDEKIQ_DB_POOL'] || 20

  if defined?(ActiveRecord::Base)
    Rails.logger.debug("Setting custom connection pool size of #{sidekiq_pool} for Sidekiq Server")
    db_config = Rails.application.config.database_configuration[Rails.env]
    db_config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
    db_config['pool']              = sidekiq_pool
    ActiveRecord::Base.establish_connection(db_config)

    Rails.logger.info("Connection pool size for Sidekiq Server is now: #{ActiveRecord::Base.connection.pool.instance_variable_get('@size')}")
  end
end
If all goes well, when you start your processes you will see something like this in the logs:
Setting custom connection pool size of 20 for Sidekiq Server
Connection pool size for Sidekiq Server is now: 20
Done forking unicorn processes
   (1.4ms)  update pg_settings set setting='off' where name = 'synchronous_commit';
Connection pool size for unicorn is now: 2


1
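This is an excellent explanation of configuring both servers. I already have something very close and will try your suggestions. The only comment I would offer for clarity is that Unicorn and Sidekiq are separate servers and are each started from their respective config.rb files. This is definitely the right answer to the question. - nmott
In sidekiq.rb you are shadowing the block variable config with a local variable. If you want to call anything on Sidekiq's config, you should rename the local variable to db_config or something else. - mpoisot
@mpoisot: Thanks for the comment. We use this configuration in our application, and I just checked: it turns out we had already renamed the local config to db_config. I have just updated the answer to reflect that. Thanks again. - stereoscott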

0

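For the Sidekiq server configuration, it is recommended that the db_pool number match your concurrency setting, which I assume you have set higher than 2.

Assuming db_pool is being set in unicorn.rb (I have no experience doing it that way), one possible solution is to set another environment variable that directly controls Sidekiq's db_pool.

If your Sidekiq concurrency is 20, you could use the following:

Config variable - SIDEKIQ_DB_POOL = 20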

Sidekiq.configure_server do |config|
  config = Rails.application.config.database_configuration[Rails.env]
  config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
  config['pool']            = ENV['SIDEKIQ_DB_POOL'] || 10
  ActiveRecord::Base.establish_connection(config)
end

This ensures you have two separate pools, each tuned for its purpose: DB_POOL for your web workers and SIDEKIQ_DB_POOL for your background workers.
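
One way to keep the two sizes in a single place is a small helper that picks the pool size by process type. This is only a sketch: `db_pool_size` is a hypothetical helper, `SIDEKIQ_CONCURRENCY` is an assumed environment variable that does not appear in the answers, and the headroom of 2 follows the "concurrency + 2" suggestion in the comments on this answer.

```ruby
# Hypothetical helper: choose a connection pool size for the current process.
# `env` can be ENV or any Hash with String keys and values.
def db_pool_size(env, sidekiq_server:, default_web: 2, default_concurrency: 25, headroom: 2)
  if sidekiq_server
    # An explicit SIDEKIQ_DB_POOL wins.
    return Integer(env["SIDEKIQ_DB_POOL"]) if env["SIDEKIQ_DB_POOL"]
    # Otherwise use the concurrency (SIDEKIQ_CONCURRENCY is an assumed variable
    # name) plus a little headroom for dead connections.
    Integer(env.fetch("SIDEKIQ_CONCURRENCY", default_concurrency)) + headroom
  else
    # Web processes use DB_POOL, falling back to a small default.
    Integer(env.fetch("DB_POOL", default_web))
  end
end

puts db_pool_size({ "SIDEKIQ_DB_POOL" => "20" }, sidekiq_server: true)
puts db_pool_size({ "DB_POOL" => "2" }, sidekiq_server: false)
```

In an initializer, the helper could be called with `ENV` and `Sidekiq.server?`, which Sidekiq provides to tell whether the current process is the job-running server.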


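Why would I set Sidekiq's database pool to 20? It is still running on Unicorn. - Nick ONeill
Sidekiq.configure_server --> this is the process that actually performs the work, and by default it can spin up 25 threads. Sidekiq.configure_client --> your Rails processes (and your Sidekiq workers too, since they can also enqueue jobs!). So you would configure the Sidekiq 'server' (the process doing the work) with a db pool close to the size of your Sidekiq concurrency level. - stereoscott
"The db_pool number should be the same as your concurrency." To guard against dead connections, the db_pool should actually be slightly higher than the concurrency, e.g. concurrency + 2 (https://github.com/mperham/sidekiq/issues/503#issuecomment-33547209) - mahemoff
You are shadowing the block variable config with a local variable. If you want to call anything on Sidekiq's config, you really should rename the local variable to db_config or something else. - mpoisot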
