Closed Bug 652615 Opened 13 years ago Closed 13 years ago

[sumo] Ganglia report of RabbitMQ status

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jsocol, Assigned: bkero)

Details

It would be incredibly valuable to get graphs of RabbitMQ's status in ganglia, specifically the number of messages in the celery queue. An example of using rabbitmqctl for this is:

$ sudo rabbitmqctl list_queues -p kitsune
Listing queues ...
celeryctl_localhost.localdomain	0
celery	0
...done.

(In production, the vhost is probably not called 'kitsune'.) We care about the "celery 0" line; the count is not always 0.

This would help us keep an eye on and understand the throughput rate of tasks in production, and diagnose problems.
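
To make the request concrete, here is a minimal Python sketch of pulling the per-queue message counts out of output shaped like the example above. The function name `parse_queue_depths` and the `SAMPLE` constant are hypothetical, and it assumes each queue line ends with a whitespace-separated integer count, as in that example.

```python
def parse_queue_depths(output):
    """Map queue name -> message count from `rabbitmqctl list_queues` text.

    Assumes each queue line looks like "<name><whitespace><count>".
    Banner lines such as "Listing queues ..." and "...done." are skipped
    because they do not end in an integer.
    """
    depths = {}
    for line in output.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        depths[parts[0]] = int(parts[1])
    return depths


# Sample text copied from the example in this bug.
SAMPLE = (
    "Listing queues ...\n"
    "celeryctl_localhost.localdomain\t0\n"
    "celery\t0\n"
    "...done.\n"
)

if __name__ == "__main__":
    print(parse_queue_depths(SAMPLE).get("celery"))
```

The value for the `celery` key is the number that would be graphed.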
Assignee: server-ops → bkero
I've created this script and have it graphing on sumocelery01, with one caveat.

The monitoring software runs as the 'nobody' user, which doesn't have access to sbin/rabbitmqctl, so I've created a sudoers line that allows it to execute a single command. I'm unaware of a way to let a sudoers entry accept arbitrary arguments (especially safely), so a new sudoers line should be added for each queue to monitor. If we're only monitoring "celery", that makes it easy.
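
For illustration only (this is not the script deployed on sumocelery01), a collector along these lines could run the single sudo-permitted command and hand the count to Ganglia through `gmetric`. The metric name, the `kitsune` vhost, the function names, and the sudoers line in the comment are all assumptions; the real sudoers entry and rabbitmqctl path are not shown in this bug.

```python
import subprocess

# Hypothetical sudoers entry pinning one exact command for 'nobody'
# (path and vhost are assumptions, not taken from this bug):
#   nobody ALL=(root) NOPASSWD: /usr/sbin/rabbitmqctl list_queues -p kitsune


def list_queues_command(vhost):
    # -n makes sudo fail instead of prompting if no sudoers rule matches.
    return ["sudo", "-n", "rabbitmqctl", "list_queues", "-p", vhost]


def gmetric_command(queue, depth):
    # Metric name is an illustrative choice.
    return [
        "gmetric",
        "--name", f"rabbitmq_{queue}_messages",
        "--value", str(depth),
        "--type", "uint32",
        "--units", "messages",
    ]


def queue_depth(output, queue):
    """Return the message count for `queue`, or None if it is not listed."""
    for line in output.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2 and parts[0] == queue and parts[1].isdigit():
            return int(parts[1])
    return None


def report(queue, vhost, run=subprocess.run):
    """Read one queue's depth and publish it; `run` is injectable for testing."""
    result = run(list_queues_command(vhost),
                 capture_output=True, text=True, check=True)
    depth = queue_depth(result.stdout, queue)
    if depth is None:
        raise LookupError(f"queue {queue!r} not found in vhost {vhost!r}")
    run(gmetric_command(queue, depth), check=True)
    return depth


if __name__ == "__main__":
    report("celery", "kitsune")
```

Run periodically (for example from cron as 'nobody'), this would produce one data point per run for the `celery` queue.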

The graphs for this can be found here: https://ganglia.mozilla.org/phx1/?c=sumo&h=rabbit-sumo&m=load_one&r=hour&s=descending&hc=4&mc=2

Please let me know if any more hosts require these graphs.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard