Monitoring Ruby on Rails Apps
At Clarisights, we instrument our Ruby on Rails workload. Figuring this out took a bit of work, so I decided to share it with the world.
We monitor our Puma (web server) workers, database connections, Sidekiq workers, and Ruby process stats.
These are code snippets that show how to collect monitoring stats. I am not sharing how to send them to a monitoring service, because that will differ based on the service you use.
We use Graphite (as the protocol) for our monitoring and Metrictank for storage, but these snippets will work with any service, given some glue code for shipping the stats.
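To give a rough idea of what such glue code could look like for Graphite, here is a minimal sketch. It is not our actual shipping code: the helper names `graphite_lines` and `ship_to_graphite`, the metric prefix, and the host are hypothetical. The only thing it relies on is the Graphite plaintext protocol, where each line is `<metric.path> <value> <unix_timestamp>`, conventionally sent to port 2003.

```ruby
require 'socket'

# Flatten a (possibly nested) stats hash into Graphite plaintext lines:
#   "<metric.path> <value> <unix_timestamp>"
# Non-numeric values (strings, nil, arrays) are skipped.
# `graphite_lines` is a hypothetical helper name, not part of any library.
def graphite_lines(prefix, stats, timestamp = Time.now.to_i)
  stats.flat_map do |key, value|
    path = "#{prefix}.#{key}"
    case value
    when Hash    then graphite_lines(path, value, timestamp)
    when Numeric then ["#{path} #{value} #{timestamp}"]
    else []
    end
  end
end

# Send the lines over TCP. Host and port are placeholders (assumptions);
# 2003 is the conventional port for the Graphite plaintext protocol.
def ship_to_graphite(lines, host: 'localhost', port: 2003)
  return if lines.empty?

  TCPSocket.open(host, port) do |socket|
    socket.write(lines.join("\n") + "\n")
  end
end
```

Any of the stats hashes collected in this post could be passed through `graphite_lines` with a prefix of your choice, for example `graphite_lines('myapp.web1.db_pool', db_stats)`. For another monitoring service, only the formatting and shipping parts would change.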
Collecting ActiveRecord Connection Info
Get details of the database connections used by ActiveRecord:
# get db pool stats
db_stats = ActiveRecord::Base.connection_pool.stat
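On Rails 5.1 and later, `connection_pool.stat` returns a hash with the keys `size`, `connections`, `busy`, `dead`, `idle`, `waiting`, and `checkout_timeout`. As a sketch of how that hash could be turned into metrics, the hypothetical helper below (the name `db_pool_metrics` and the derived `utilization` value are my assumptions, not part of ActiveRecord) keeps the numeric counters and adds the fraction of the pool that is busy.

```ruby
# pool_stat is the hash returned by ActiveRecord::Base.connection_pool.stat, e.g.
#   { size: 5, connections: 3, busy: 1, dead: 0, idle: 2, waiting: 0, checkout_timeout: 5 }
# `db_pool_metrics` is a hypothetical helper, not an ActiveRecord API.
def db_pool_metrics(pool_stat)
  size = pool_stat[:size].to_i
  busy = pool_stat[:busy].to_i

  {
    size: size,
    connections: pool_stat[:connections].to_i,
    busy: busy,
    idle: pool_stat[:idle].to_i,
    dead: pool_stat[:dead].to_i,
    waiting: pool_stat[:waiting].to_i,
    # Derived metric: share of the pool currently in use (0.0 when size is 0).
    utilization: size.zero? ? 0.0 : (busy.to_f / size).round(3)
  }
end
```

In an app this would be called as `db_pool_metrics(ActiveRecord::Base.connection_pool.stat)`. A `waiting` value that stays above zero usually means threads are queuing for a connection, so that is probably the one to alert on.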
Collecting Web Server Stats (puma)
We use Puma as our web server. Puma.stats returns stats about Puma as a JSON string; we parse it and ship it.
We run a simple Puma plugin that runs in the background and ships the stats to our monitoring system.
Snippet to get Puma stats (works in both single and cluster mode):
# Puma.stats: {
#   started_at: @started_at.utc.iso8601,
#   workers: @workers.size,
#   phase: @phase,
#   booted_workers: worker_status.count { |w| w[:booted] },
#   old_workers: old_worker_count,
#   worker_status: worker_status,
# }
puma_stats = []
stats = JSON.parse(Puma.stats, symbolize_names: true)
# get puma stats
if stats[:worker_status].present?
  # in cluster mode, get stats from each worker
  # stats[:worker_status]: List of workers with their corresponding statuses
  stats[:worker_status].each do |worker|
    # worker[:last_status]: { "backlog":0,"running":2, "pool_capacity":2,"max_threads":2 }
    stat = worker[:last_status]
    puma_stats << stat
  end
else
  puma_stats << stats
end
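The snippet above leaves one hash per worker in `puma_stats`. If you only want one set of numbers per host, a sketch like the following could sum them up. The helper name `aggregate_puma_stats` is hypothetical, and it assumes each entry carries the `backlog`, `running`, `pool_capacity`, and `max_threads` keys shown in the comment above. Missing keys (for example from a worker that has not reported yet) count as 0.

```ruby
# Sum per-worker Puma stats into one hash.
# puma_stats is an array of hashes such as
#   { backlog: 0, running: 2, pool_capacity: 2, max_threads: 2 }
# `aggregate_puma_stats` is a hypothetical helper, not part of Puma.
def aggregate_puma_stats(puma_stats)
  keys = %i[backlog running pool_capacity max_threads]
  totals = keys.map { |key| [key, 0] }.to_h

  puma_stats.each do |stat|
    # nil.to_i is 0, so entries without a given key are simply ignored.
    keys.each { |key| totals[key] += stat[key].to_i }
  end

  # Number of entries seen (1 in single mode, one per worker in cluster mode).
  totals[:workers] = puma_stats.size
  totals
end
```

A `backlog` total that keeps growing while `pool_capacity` sits at 0 suggests all threads are busy and requests are queuing.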
PS: Someday I will turn this into a gem with support for Graphite and Prometheus. If you want to do it, please go ahead and ping me @electron0zero, and I will link it here.
Collecting Sidekiq Queue and Worker Stats
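Sidekiq::ProcessSet.new lists the running Sidekiq processes along with their concurrency, busy count, and queues; Sidekiq::Queue.all gives the size and latency of each queue.
Snippet to get Sidekiq stats: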
# the Sidekiq API classes (ProcessSet, Queue) live in sidekiq/api
require 'sidekiq/api'

# overall and per-queue sidekiq stats
def compute_sidekiq_stats
  workers = Sidekiq::ProcessSet.new
  # count how many Sidekiq processes listen on each queue
  total_capacity = 0
  total_busy = 0
  queue_workers = {}
  workers.each do |worker|
    total_capacity += worker['concurrency'].to_i
    total_busy += worker['busy'].to_i
    worker['queues'].each do |queue|
      queue_workers[queue] = queue_workers[queue].to_i + 1
    end
  end
  # compute queue metrics
  queue_metrics = {}
  Sidekiq::Queue.all.each do |queue|
    queue_metrics[queue.name] = {
      backlog: queue.size,
      latency: queue.latency.to_i,
      # 0 when no running process listens on this queue
      workers: queue_workers[queue.name].to_i
    }
  end
  {
    queues: queue_metrics,
    workers: {
      total_size: workers.size,
      total_capacity: total_capacity,
      total_busy: total_busy
    }
  }
end
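The hash returned by `compute_sidekiq_stats` can be shipped as is, but a few derived numbers are often handier on a dashboard. The sketch below is one way to compute them. The helper name `sidekiq_summary` and the choice of derived metrics are my assumptions, and it only depends on the shape of the hash returned above.

```ruby
# Derive a few summary numbers from the hash returned by compute_sidekiq_stats.
# `sidekiq_summary` is a hypothetical helper, not part of Sidekiq.
def sidekiq_summary(stats)
  capacity = stats[:workers][:total_capacity].to_i
  busy = stats[:workers][:total_busy].to_i
  queues = stats[:queues]

  {
    # Fraction of all worker threads that are currently busy.
    utilization: capacity.zero? ? 0.0 : (busy.to_f / capacity).round(3),
    # Jobs waiting across all queues.
    total_backlog: queues.values.sum { |q| q[:backlog].to_i },
    # Worst queue latency in seconds (0 when there are no queues).
    max_latency: queues.values.map { |q| q[:latency].to_i }.max || 0,
    # Queues that have jobs waiting but no process listening on them.
    unattended_queues: queues.select { |_name, q|
      q[:backlog].to_i > 0 && q[:workers].to_i.zero?
    }.keys
  }
end
```

Usage would be `sidekiq_summary(compute_sidekiq_stats)`. The `unattended_queues` list is useful for catching a queue name that no process was configured to consume.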
Ruby VM Stats
We use the ObjectSpace and GC modules to collect stats and then ship them to our monitoring system.
Stats about object counts and the memory size of those objects:
# count_objects_size is provided by the objspace extension
require 'objspace'
counts = ObjectSpace.count_objects
sizes = ObjectSpace.count_objects_size
GC.stat tells us about (you guessed it) GC:
stats = GC.stat
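`GC.stat` and `ObjectSpace.count_objects` return fairly large hashes, and you may not want to ship every key. Here is a minimal sketch that picks a small subset. The helper name `ruby_vm_stats` and the selection of keys are my assumptions; the keys themselves exist on MRI (CRuby) and may differ on other Ruby implementations.

```ruby
require 'objspace' # needed for ObjectSpace.count_objects_size

# Collect a small subset of Ruby VM stats (MRI/CRuby key names).
# `ruby_vm_stats` is a hypothetical helper; the key selection is an assumption.
def ruby_vm_stats
  gc = GC.stat
  objects = ObjectSpace.count_objects

  {
    gc: {
      count: gc[:count],                       # total GC runs
      minor_gc_count: gc[:minor_gc_count],
      major_gc_count: gc[:major_gc_count],
      heap_live_slots: gc[:heap_live_slots],
      heap_free_slots: gc[:heap_free_slots],
      total_allocated_objects: gc[:total_allocated_objects]
    },
    objects: {
      total: objects[:TOTAL],                  # all object slots
      free: objects[:FREE]                     # unused object slots
    },
    # Total memory size of all live objects, in bytes.
    memsize_bytes: ObjectSpace.count_objects_size[:TOTAL]
  }
end
```

Note that `ObjectSpace.count_objects_size` walks all live objects, so it is worth calling it on a modest interval rather than on every request.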
Closing Note
These are very minimal, basic snippets for collecting stats; with some glue code you can ship them to the monitoring system of your choice. Hope it is helpful.
Sidenote: This is just the monitoring for systems that are part of our web service; we have other monitoring in place for the other systems that we run. In Rails, we monitor the typical HTTP stuff, for example duration, response code, and other metrics for each route, using influxdb-rails.