Lately I have been using bluepill to monitor long-running processes on my application servers. The guys at serious business wrote bluepill out of their frustrations with god and monit, which gradually leak memory over long periods in certain conditions. bluepill is a simple piece of code with a small feature set, but it does all you need to keep your processes alive. It even has parent/child monitoring for the likes of Unicorn master/worker processes.
Right now I am using it in Bugle production to monitor the delayed_job master process. It will also be useful if (or when) I get a chance to try Unicorn. Delayed Job is used in Bugle for two things right now: processing uploads (storing to and deleting from S3) and delivering all application emails asynchronously. Here is the bluepill monitoring configuration script for it:
Bluepill.application("bugle") do |app|
  app.process("delayed_job") do |process|
    process.start_command = "/apps/bugle/current/script/delayed_job start -eproduction"
    process.pid_file = "/apps/bugle/current/tmp/pids/delayed_job.pid"
    process.uid = "bugle"
    process.gid = "bugle"
  end
end
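The same block could be extended with an explicit stop command and a memory check. This is only a sketch, not what Bugle runs: it assumes the stop_command, grace time, and checks :mem_usage options as described in the bluepill README, and it assumes the delayed_job script accepts stop as well as start. The 30 second and 150 megabyte values are illustrative guesses.

```ruby
Bluepill.application("bugle") do |app|
  app.process("delayed_job") do |process|
    process.start_command = "/apps/bugle/current/script/delayed_job start -eproduction"
    # Assumption: the delayed_job script also understands `stop`.
    process.stop_command = "/apps/bugle/current/script/delayed_job stop"
    process.pid_file = "/apps/bugle/current/tmp/pids/delayed_job.pid"
    process.uid = "bugle"
    process.gid = "bugle"

    # Illustrative values: give the Rails environment time to boot and shut down
    # before bluepill decides the command failed.
    process.start_grace_time = 30.seconds
    process.stop_grace_time = 30.seconds

    # Illustrative threshold: restart the worker if it is above 150 MB
    # in 3 of the last 5 samples.
    process.checks :mem_usage, :every => 30.seconds, :below => 150.megabytes, :times => [3, 5]
  end
end
```

Requiring 3 failures out of 5 samples rather than a single reading should keep a short memory spike during a large upload from triggering a restart.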
4 comments so far
matt Feb 17, 2010
Plataforma just did a nice write up of this same task. They also go into detail on setting up a capistrano hook to start/stop/restart bluepill.
Kenny Johnston May 27, 2010
I have spent quite a bit of time on this topic. I was fed up with not having a good solution for it so I wrote the delayed_job_tracer plugin that specifically addresses monitoring of delayed_job and its jobs.
Here is an article I’ve written about it: Monitoring delayed_job and its jobs
This plugin will monitor your delayed_job process and send you an e-mail if delayed_job crashes or if one of its jobs fails.
Guest Sep 27, 2010
I get this error in the Bluepill log:
W, [2010-09-27T12:06:17.746270 #25621] WARN -- : [fsg_distro:delayed_job] pid_file /srv/my_app/current/tmp/pids/delayed_job.pid does not exist or cannot be read
Any idea how to fix?
Matt Sep 28, 2010
Check the permissions for writing to that file/directory, and check that delayed_job is running under the expected Rails environment.
If that fails, maybe try posting an issue (or searching) on the delayed_job git repo.
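The permission checks above can be sketched as a small Ruby script. diagnose_pid_file is a hypothetical helper, not part of bluepill or delayed_job, and it only tells you something useful when run as the same user the monitored process runs as (the uid in the bluepill config).

```ruby
# Hypothetical helper: reports the first problem found with a pid file.
def diagnose_pid_file(path)
  dir = File.dirname(path)
  return "directory #{dir} does not exist" unless File.directory?(dir)
  return "directory #{dir} is not writable by this user" unless File.writable?(dir)
  return "pid file #{path} does not exist" unless File.exist?(path)
  return "pid file #{path} is not readable by this user" unless File.readable?(path)

  pid = File.read(path).to_i
  return "pid file #{path} does not contain a pid" unless pid > 0

  begin
    # Signal 0 sends nothing; it only checks that the process exists.
    Process.kill(0, pid)
    "ok: process #{pid} is running"
  rescue Errno::ESRCH
    "stale pid file: no process with pid #{pid}"
  rescue Errno::EPERM
    "ok: process #{pid} is running, but as another user"
  end
end
```

For example, puts diagnose_pid_file("/srv/my_app/current/tmp/pids/delayed_job.pid") prints which of these checks fails first. A missing file usually means delayed_job never started or wrote its pid somewhere else.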