It would be incredibly valuable to get graphs of RabbitMQ's status in ganglia, specifically the number of messages in the celery queue. An example of using rabbitmqctl for this is:

$ sudo rabbitmqctl list_queues -p kitsune
Listing queues ...
celeryctl_localhost.localdomain 0
celery 0
...done.

(In production, it's probably not called 'kitsune'.) We care about the "celery 0" line. It's not always 0.

This would help us keep an eye on and understand the throughput rate of tasks in production, and diagnose problems.
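For illustration, pulling the counts out of that output could look roughly like the Python sketch below. This is only a sketch under assumptions: the function name parse_queue_depths is made up here, and it assumes every data row is a queue name followed by an integer, as in the sample above.

```python
def parse_queue_depths(output):
    """Return {queue_name: message_count} from `rabbitmqctl list_queues` text.

    Assumption: each data row is "<queue name> <count>". Banner lines such as
    "Listing queues ..." and "...done." do not fit that shape and are skipped.
    """
    depths = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            depths[parts[0]] = int(parts[1])
    return depths
```

Looking up "celery" in the returned dictionary gives the number we want to graph.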
I've created this script and have it graphing on sumocelery01, with one caveat. The monitoring software runs as the 'nobody' user, which doesn't have access to sbin/rabbitmqctl, so I've created a sudoers line that allows it to execute a single command. I'm not aware of a way to let a sudoers entry take arguments (especially safely), so a new sudoers line should be added for each queue to monitor. If we're only monitoring "celery", that makes it easy.

The graphs for this can be found here:
https://ganglia.mozilla.org/phx1/?c=sumo&h=rabbit-sumo&m=load_one&r=hour&s=descending&hc=4&mc=2

Please let me know if any more hosts require these graphs.
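For anyone who needs to set this up on another host, here is a rough Python sketch of what such a collector might look like. It is not the script actually deployed. Assumptions: the vhost name 'kitsune' is carried over from the example in the report, the metric name and the helper names (queue_depth, gmetric_command, report) are made up for illustration, and the sudoers entry is assumed to permit exactly the command in LIST_QUEUES for the monitoring user.

```python
import subprocess

# Assumed fixed command line; it has to match whatever sudoers permits.
# "-n" makes sudo fail instead of prompting for a password.
LIST_QUEUES = ["sudo", "-n", "rabbitmqctl", "list_queues", "-p", "kitsune"]


def queue_depth(output, queue):
    """Return the message count for `queue`, or None if it is not listed."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == queue and parts[1].isdigit():
            return int(parts[1])
    return None


def gmetric_command(queue, depth):
    """Build the gmetric call that publishes one queue depth to Ganglia."""
    return [
        "gmetric",
        "--name", "rabbitmq_%s_messages" % queue,  # hypothetical metric name
        "--value", str(depth),
        "--type", "uint32",
        "--units", "messages",
    ]


def report(queue="celery"):
    """Read the queue depth via sudo and hand it to gmetric."""
    output = subprocess.check_output(LIST_QUEUES).decode("utf-8")
    depth = queue_depth(output, queue)
    if depth is not None:
        subprocess.check_call(gmetric_command(queue, depth))
```

Calling report() from cron once a minute, as the monitoring user, would be one way to feed the graph. Keeping the rabbitmqctl arguments fixed in one constant means a single sudoers line can cover the script.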
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard