Monitoring Ruby on Rails Apps

At Clarisights, we instrument our Ruby on Rails workload. Figuring this out took a bit of work, so I decided to share it with the world.

We monitor our Puma (web server) workers, database connections, Sidekiq workers, and Ruby process stats.

These code snippets show how to collect monitoring stats. I am not sharing how to send them to a monitoring service, because that will differ based on the service you use.

We use Graphite (as the protocol) for our monitoring and Metrictank for storage, but these snippets will work with any service, given some glue code for shipping the stats.
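To give a rough idea of what that glue code can look like, here is a minimal sketch that turns a nested stats hash into Graphite plaintext lines ("path value timestamp"). The helper name to_graphite_lines and the "rails.sidekiq" prefix are made up for illustration and are not part of our setup; actually sending the lines over a socket is left out.

```ruby
# Flatten a nested stats hash into Graphite plaintext lines:
#   "<metric.path> <value> <unix_timestamp>"
# NOTE: to_graphite_lines is a hypothetical helper, not a library API.
def to_graphite_lines(stats, prefix, timestamp = Time.now.to_i)
  stats.flat_map do |key, value|
    path = "#{prefix}.#{key}"
    if value.is_a?(Hash)
      # recurse into nested hashes, extending the metric path
      to_graphite_lines(value, path, timestamp)
    elsif value.is_a?(Numeric)
      ["#{path} #{value} #{timestamp}"]
    else
      [] # skip non-numeric values such as strings or nil
    end
  end
end

lines = to_graphite_lines(
  { queues: { default: { backlog: 3 } }, busy: 1 },
  "rails.sidekiq",
  1_700_000_000
)
# lines => ["rails.sidekiq.queues.default.backlog 3 1700000000",
#           "rails.sidekiq.busy 1 1700000000"]
```

Keys that contain dots (some queue names do) would create extra path segments in Graphite, so you may want to sanitize them first.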

Collecting ActiveRecord Connection Info

Get details of the database connections used by ActiveRecord.

# get db pool stats
db_stats = ActiveRecord::Base.connection_pool.stat
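On recent Rails versions, stat returns a plain hash with keys such as :size, :connections, :busy, :dead, :idle, :waiting and :checkout_timeout. As a sketch of what you might derive from it, the helper below (db_pool_metrics is a name I made up, and the sample hash is invented data) picks a few gauges and computes pool utilization.

```ruby
# Derive a few gauges from the hash returned by connection_pool.stat.
# NOTE: db_pool_metrics is a hypothetical helper; the sample values are invented.
def db_pool_metrics(db_stats)
  size = db_stats.fetch(:size, 0)
  busy = db_stats.fetch(:busy, 0)
  {
    size: size,
    busy: busy,
    idle: db_stats.fetch(:idle, 0),
    waiting: db_stats.fetch(:waiting, 0), # threads waiting for a connection
    utilization: size.zero? ? 0.0 : busy.to_f / size
  }
end

sample = { size: 5, connections: 3, busy: 2, dead: 0, idle: 1, waiting: 0, checkout_timeout: 5 }
pool_metrics = db_pool_metrics(sample)
```

A waiting count that stays above zero is usually the first thing worth alerting on, since it means threads are blocked on the pool.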

Collecting Web Server Stats (puma)

We use Puma as our web server. Puma.stats returns stats about Puma as a JSON string; we parse it and ship it.

We run a simple Puma plugin that runs in the background and ships stats to our monitoring system.

Snippet to get Puma stats (works in normal and cluster mode)
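The background part boils down to a loop on a separate thread. The sketch below shows that idea with a plain Ruby thread; it does not use Puma's plugin API, and start_stats_reporter, collector and shipper are names I invented for illustration.

```ruby
# Runs `collector` every `interval` seconds on a background thread and
# hands each result to `shipper`. Returns the thread so the caller can stop it.
# NOTE: a generic sketch, not Puma's plugin API; all names are hypothetical.
def start_stats_reporter(interval:, collector:, shipper:)
  Thread.new do
    loop do
      begin
        shipper.call(collector.call)
      rescue StandardError => e
        # a failure in monitoring should never take the loop down
        warn "stats reporter error: #{e.message}"
      end
      sleep interval
    end
  end
end
```

You would call it with something like collector: -> { JSON.parse(Puma.stats, symbolize_names: true) } and a shipper that writes to your monitoring service, then kill the returned thread on shutdown.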

# Puma.stats: {
#        started_at: @started_at.utc.iso8601,
#        workers: @workers.size,
#        phase: @phase,
#        booted_workers: worker_status.count { |w| w[:booted] },
#        old_workers: old_worker_count,
#        worker_status: worker_status,
#      }
puma_stats = []
stats = JSON.parse(Puma.stats, symbolize_names: true)
# get puma stats
if stats[:worker_status].present?
  # in cluster mode, get stats from each worker
  # stats[:worker_status]: List of workers with their corresponding statuses
  stats[:worker_status].each do |worker|
    # worker[:last_status]: { "backlog":0,"running":2, "pool_capacity":2,"max_threads":2 }
    stat = worker[:last_status]
    puma_stats << stat
  end
else
  puma_stats << stats
end
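
If you would rather ship one set of totals than one series per worker, the collected puma_stats array can be summed. This is a minimal sketch: summarize_puma_stats is a hypothetical helper, and it assumes each entry uses the backlog, running, pool_capacity and max_threads keys shown in the comment above.

```ruby
# Sum per-worker Puma thread-pool stats into one set of totals.
# NOTE: summarize_puma_stats is a hypothetical helper, not part of Puma.
def summarize_puma_stats(puma_stats)
  keys = %i[backlog running pool_capacity max_threads]
  totals = keys.map { |key| [key, 0] }.to_h
  puma_stats.each do |stat|
    # a worker that has not reported yet may have an empty or missing status
    keys.each { |key| totals[key] += (stat || {})[key].to_i }
  end
  totals
end

totals = summarize_puma_stats([
  { backlog: 0, running: 2, pool_capacity: 2, max_threads: 2 },
  { backlog: 1, running: 1, pool_capacity: 0, max_threads: 2 }
])
# totals => { backlog: 1, running: 3, pool_capacity: 2, max_threads: 4 }
```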

PS: Someday I will turn this into a gem with support for Graphite and Prometheus. If you want to do it, please go ahead and ping me @electron0zero, and I will link it here.

Collecting Sidekiq Queue and Worker Stats

Sidekiq::ProcessSet.new contains stats about the running Sidekiq worker processes, and Sidekiq::Queue.all gives us the size and latency of each queue.

Snippet to get Sidekiq stats

# overall and per-queue sidekiq stats
def compute_sidekiq_stats
  workers = Sidekiq::ProcessSet.new
  # compute per queue workers
  total_capacity = 0
  total_busy = 0
  queue_workers = {}
  workers.each do |worker|
    total_capacity += worker['concurrency'].to_i
    total_busy += worker['busy'].to_i
    worker['queues'].each do |queue|
      queue_workers[queue] = queue_workers[queue].to_i + 1
    end
  end

  # compute queue metrics
  queue_metrics = {}
  Sidekiq::Queue.all.each do |queue|
    queue_metrics[queue.name] = {
      backlog: queue.size,
      latency: queue.latency.to_i,
      workers: queue_workers[queue.name].to_i # 0 when no process listens on this queue
    }
  end

  return {
    queues: queue_metrics,
    workers: {
      total_size: workers.size,
      total_capacity: total_capacity,
      total_busy: total_busy
    }
  }
end
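
One thing the returned hash makes easy to spot is a queue that has jobs waiting but no process listening on it. A small sketch, assuming the hash shape returned by compute_sidekiq_stats above; starved_queues is a name I made up and the sample data is invented.

```ruby
# Names of queues that have jobs waiting but no process listening on them.
# NOTE: starved_queues is a hypothetical helper; the sample data is invented.
def starved_queues(sidekiq_stats)
  sidekiq_stats[:queues]
    .select { |_name, m| m[:backlog].to_i > 0 && m[:workers].to_i.zero? }
    .keys
end

sample_stats = {
  queues: {
    "default" => { backlog: 10, latency: 2, workers: 3 },
    "mailers" => { backlog: 4, latency: 90, workers: nil },
    "low"     => { backlog: 0, latency: 0, workers: 0 }
  },
  workers: { total_size: 3, total_capacity: 30, total_busy: 12 }
}
starved = starved_queues(sample_stats)
# starved => ["mailers"]
```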

Ruby VM Stats

We use the ObjectSpace and GC modules to collect stats and then ship them to our monitoring system.

Stats about object counts and the memory used by those objects:

require 'objspace' # needed for ObjectSpace.count_objects_size
counts = ObjectSpace.count_objects
sizes = ObjectSpace.count_objects_size

GC stats tell us about (you guessed it) GC:

stats = GC.stat
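
These hashes are fairly large, so you will probably ship only a subset. Here is a minimal sketch that picks a few values; which keys matter is a judgment call on my part, ruby_vm_stats is a hypothetical helper, and the keys assume MRI (other Ruby implementations may report different ones).

```ruby
require 'objspace' # provides ObjectSpace.count_objects_size

# Pick a handful of GC and object stats worth graphing.
# NOTE: ruby_vm_stats is a hypothetical helper; key names assume MRI.
def ruby_vm_stats
  gc = GC.stat
  {
    gc: {
      count: gc[:count],                     # total GC runs
      major_gc_count: gc[:major_gc_count],
      minor_gc_count: gc[:minor_gc_count],
      heap_live_slots: gc[:heap_live_slots],
      total_allocated_objects: gc[:total_allocated_objects]
    },
    # object counts by internal type
    objects: ObjectSpace.count_objects.slice(:TOTAL, :FREE, :T_OBJECT, :T_STRING, :T_ARRAY, :T_HASH),
    # total bytes used by live objects
    object_bytes: ObjectSpace.count_objects_size[:TOTAL]
  }
end

vm_stats = ruby_vm_stats
```

Counters such as count and total_allocated_objects only ever grow, so graph their rate of change rather than the raw value.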

Closing Note

These are very minimal, basic snippets for collecting stats; with some glue code you can ship them to the monitoring system of your choice. Hope it is helpful.

Side note: This is just the monitoring for systems that are part of our web service; we have other monitoring in place for the other systems that we run. In Rails, we monitor typical HTTP stuff, for example duration, response code, and other metrics for each route, using influxdb-rails.

