Rabbitmq is running on the gearman boxes (or at least one of them). We should send stats to munin so we can monitor it over time: http://github.com/ask/rabbitmq-munin/ (I'm assuming nagios is already making sure it's running; if it isn't, please set that up too.)
Installed the munin plugins. What should I be monitoring for rabbitmq in nagios: just a process check or a TCP check?
(In reply to comment #1)
> Installed munin plugins. What about rabbitmq should I be monitoring, just a
> process or tcp check?

I think we should be monitoring the same stuff munin is. Munin is just execing rabbitmqctl, and the scripts it runs are almost good enough for nagios; they even have warn/crit thresholds. They are all at http://github.com/ask/rabbitmq-munin
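A minimal sketch of a nagios check in the same spirit. It assumes the check runs `rabbitmqctl list_queues -p <vhost> name messages` and alerts on the deepest queue. It is not taken from rabbitmq-munin: the function names and the 1000/5000 thresholds are hypothetical, and the parser assumes rabbitmqctl's tab-separated output, skipping the "Listing queues ..." and "...done." lines.

```python
import subprocess

# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_list_queues(output):
    """Parse `rabbitmqctl list_queues name messages` output into {name: count}.

    Only lines with two tab-separated fields and a numeric count are kept, so
    "Listing queues ...", "...done." and column-header lines are skipped.
    """
    queues = {}
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[1].strip().isdigit():
            queues[fields[0]] = int(fields[1])
    return queues


def evaluate(queues, warn, crit):
    """Compare the deepest queue against the thresholds; return (exit_code, message)."""
    if not queues:
        # Nothing parsed: report UNKNOWN rather than a silent OK.
        return UNKNOWN, "UNKNOWN - no queues reported"
    name, depth = max(queues.items(), key=lambda item: item[1])
    if depth >= crit:
        return CRITICAL, f"CRITICAL - {name} has {depth} messages"
    if depth >= warn:
        return WARNING, f"WARNING - {name} has {depth} messages"
    return OK, f"OK - deepest queue {name} has {depth} messages"


def main(vhost="/", warn=1000, crit=5000):
    """Run rabbitmqctl and print a one-line nagios status; return the exit code.

    The default thresholds are hypothetical placeholders.
    """
    try:
        result = subprocess.run(
            ["rabbitmqctl", "list_queues", "-p", vhost, "name", "messages"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        print("UNKNOWN - rabbitmqctl not found")
        return UNKNOWN
    if result.returncode != 0:
        print(f"UNKNOWN - rabbitmqctl failed: {result.stderr.strip()}")
        return UNKNOWN
    code, message = evaluate(parse_list_queues(result.stdout), warn, crit)
    print(message)
    return code
```

A plugin entry point would call `sys.exit(main(vhost="/"))`. The exit code then follows the usual nagios convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.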
What action should be taken when nagios alerts? I need to write docs for the other admins.
If the worker count drops below the threshold, start more workers and then figure out why they disappeared. If the queue gets too long, order more hardware, I guess. Also let webdev know so we can throttle back unimportant jobs. I think amo-developers should get these pages too.
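For the admin docs, the worker case could be checked in a similar way. This sketch assumes "workers" maps to the consumer count that `rabbitmqctl list_queues name consumers` reports for celery's queue, assumed here to be named `celery`. The queue name, thresholds, and function names are all hypothetical. Unlike queue depth, a lower number is worse, so the comparisons run the other way. The alert text carries the action from the comment above so whoever gets paged sees it.

```python
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
ACTION = "start more workers, then figure out why they disappeared"


def consumer_count(output, queue):
    """Return the consumer count for `queue` from `list_queues name consumers` output."""
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[0] == queue and fields[1].strip().isdigit():
            return int(fields[1])
    return None  # queue not listed


def evaluate_workers(count, warn, crit):
    """Fewer consumers is worse, so alert when the count drops below a threshold."""
    if count is None:
        return UNKNOWN, "UNKNOWN - queue not found"
    if count < crit:
        return CRITICAL, f"CRITICAL - {count} workers; {ACTION}"
    if count < warn:
        return WARNING, f"WARNING - {count} workers; {ACTION}"
    return OK, f"OK - {count} workers"


def main(queue="celery", vhost="/", warn=4, crit=2):
    """Hypothetical defaults: the queue name and thresholds must match the real setup."""
    try:
        result = subprocess.run(
            ["rabbitmqctl", "list_queues", "-p", vhost, "name", "consumers"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        print("UNKNOWN - rabbitmqctl not found")
        return UNKNOWN
    # A failed run leaves stdout empty, which evaluates to UNKNOWN below.
    code, message = evaluate_workers(consumer_count(result.stdout, queue), warn, crit)
    print(message)
    return code
```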
Only the connections graph is working in munin; all the rest are blank. Do the commands execute if you run them manually? If you're running them through sudo, don't forget to add rabbitmqctl to the commands sudo is allowed to run.
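One way to check whether the commands execute: run rabbitmqctl the way a plugin would and surface the exit code and stderr instead of a blank graph. Running it through `sudo -n` is an assumption about how the plugins are invoked. The `-n` flag makes sudo exit with an error instead of waiting for a password, so a missing sudoers entry shows up right away. The helper names are hypothetical.

```python
import subprocess


def build_command(args, use_sudo=False):
    """Build a rabbitmqctl command line, optionally wrapped in non-interactive sudo."""
    command = ["rabbitmqctl", *args]
    if use_sudo:
        # -n: fail with an error instead of prompting for a password, so a
        # missing sudoers entry is reported rather than hanging the plugin.
        command = ["sudo", "-n", *command]
    return command


def run_rabbitmqctl(args, use_sudo=False):
    """Run rabbitmqctl and return its stdout; raise with stderr if it fails.

    If the executable itself (rabbitmqctl, or sudo) isn't on PATH,
    subprocess raises FileNotFoundError.
    """
    result = subprocess.run(build_command(args, use_sudo), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{' '.join(result.args)} exited {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

For example, `print(run_rabbitmqctl(["list_queues", "name", "messages"], use_sudo=True))`, run as the same user the plugin runs as, should print the queue table. If it raises instead, the exception message points at the sudo or PATH problem.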
I didn't have "env.vhost vhostname" set. I was hoping it would graph all vhosts by default. Kind of lame that it only handles one.
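If one vhost per plugin is too limiting, a plugin could loop over `rabbitmqctl list_vhosts` itself. This minimal sketch graphs total queued messages per vhost using the standard munin plugin protocol: with `config`, the plugin prints labels, and on a plain run it prints `field.value` lines. It is not part of rabbitmq-munin. The field-name scheme and function names are hypothetical, and the parsing assumes the older rabbitmqctl output format, with "Listing ..." and "...done." lines around tab-separated rows.

```python
import re
import subprocess


def fieldname(vhost):
    """Map a vhost name such as "/" or "/amo" to a valid munin field name."""
    # Munin field names may only contain letters, digits and underscores;
    # the fixed "vhost_" prefix also guarantees a valid first character.
    return "vhost_" + re.sub(r"[^A-Za-z0-9_]", "_", vhost)


def data_lines(output):
    """Drop the "Listing ..." and "...done." lines older rabbitmqctl versions print."""
    return [line for line in output.splitlines()
            if line and not line.startswith("Listing ") and line != "...done."]


def rabbitmqctl(*args):
    """Run rabbitmqctl and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(["rabbitmqctl", *args], capture_output=True,
                          text=True, check=True).stdout


def collect():
    """Return {vhost: total queued messages} for every vhost."""
    totals = {}
    for vhost in data_lines(rabbitmqctl("list_vhosts")):
        rows = data_lines(rabbitmqctl("list_queues", "-p", vhost, "name", "messages"))
        totals[vhost] = sum(int(row.split("\t")[1]) for row in rows)
    return totals


def render(totals, config=False):
    """Produce munin plugin output: labels for `config`, otherwise values."""
    lines = []
    if config:
        lines += ["graph_title RabbitMQ queued messages per vhost",
                  "graph_vlabel messages",
                  "graph_category rabbitmq"]
        lines += [f"{fieldname(v)}.label {v}" for v in sorted(totals)]
    else:
        lines += [f"{fieldname(v)}.value {totals[v]}" for v in sorted(totals)]
    return "\n".join(lines)
```

The plugin body would then be `print(render(collect(), config=sys.argv[1:] == ["config"]))`. Prefixing every field with `vhost_` keeps names like `/` usable as munin field names.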
Turns out these plugins don't work with celery at all. They expect just a couple of queues to exist, and celery has created over 7,000 queues.
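A plugin that emits one field per queue can't cope with thousands of queues. A possible workaround is to graph a fixed set of aggregates instead. This sketch assumes the `{queue_name: messages}` counts come from `rabbitmqctl list_queues name messages`. It also assumes that celery's result queues are named after task UUIDs, with or without dashes. Both the regex and the field names are assumptions for illustration.

```python
import re

# Assumption: celery result queues are named after task UUIDs,
# with or without the dashes.
RESULT_QUEUE = re.compile(
    r"^[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}$"
)


def summarize(queues):
    """Collapse {queue_name: messages} into a fixed set of munin fields."""
    summary = {"queues": 0, "messages": 0, "result_queues": 0, "result_messages": 0}
    for name, messages in queues.items():
        summary["queues"] += 1
        summary["messages"] += messages
        if RESULT_QUEUE.match(name):
            summary["result_queues"] += 1
            summary["result_messages"] += messages
    return summary


def render(summary):
    """Emit one munin `field.value N` line per aggregate field."""
    return "\n".join(f"{field}.value {value}" for field, value in summary.items())
```

The number of graphed fields stays at four no matter how many queues celery creates. The cost is that per-queue detail is lost.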
Can we try these again now that we're not creating tons of result queues? (bug 567932)