Status

Infrastructure & Operations
WebOps: Other
RESOLVED FIXED
8 years ago
4 years ago

People

(Reporter: clouserw, Assigned: oremj)

(Reporter)

Description

8 years ago
RabbitMQ is running on the gearman boxes (or at least one of them). We should send stats to munin so we can monitor it over time:

http://github.com/ask/rabbitmq-munin/

(I'm assuming nagios is already making sure it's running, but if that's not the case please do that also)
(Assignee)

Updated

8 years ago
Assignee: server-ops → jeremy.orem+bugs
(Assignee)

Comment 1

8 years ago
Installed munin plugins.  What about rabbitmq should I be monitoring, just a process or tcp check?
(Reporter)

Comment 2

8 years ago
(In reply to comment #1)
> Installed munin plugins.  What about rabbitmq should I be monitoring, just a
> process or tcp check?

I think we should be monitoring the same stuff munin is. Munin is just execing rabbitmqctl, and the scripts it's running are almost good enough for nagios; they even have warn/crit thresholds. They are all at http://github.com/ask/rabbitmq-munin
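
For illustration only, here is a rough sketch of what a nagios-style check built on the same rabbitmqctl call might look like. It is not taken from the rabbitmq-munin scripts. The function names, the default thresholds, and the assumption that `rabbitmqctl list_queues` prints tab-separated `name<TAB>messages` lines between a "Listing queues ..." banner and a "...done." trailer are all hypothetical and should be checked against the installed version.

```python
import subprocess

# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_queue_depths(output):
    """Parse `name<TAB>messages` lines, skipping banner/trailer lines."""
    depths = {}
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip().isdigit():
            depths[parts[0]] = int(parts[1])
    return depths


def evaluate(depths, warn, crit):
    """Map the deepest queue to a nagios status code and message."""
    if not depths:
        return UNKNOWN, "UNKNOWN: no queues reported"
    name, worst = max(depths.items(), key=lambda item: item[1])
    if worst >= crit:
        return CRITICAL, "CRITICAL: %s has %d messages" % (name, worst)
    if worst >= warn:
        return WARNING, "WARNING: %s has %d messages" % (name, worst)
    return OK, "OK: deepest queue %s has %d messages" % (name, worst)


def main(vhost="/", warn=10000, crit=20000):
    """Run rabbitmqctl and return a nagios exit code (thresholds are made up)."""
    try:
        result = subprocess.run(
            ["rabbitmqctl", "list_queues", "-p", vhost, "name", "messages"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print("UNKNOWN: rabbitmqctl failed: %s" % exc)
        return UNKNOWN
    status, message = evaluate(parse_queue_depths(result.stdout), warn, crit)
    print(message)
    return status
```

A wrapper script would call `sys.exit(main())` so nagios sees the exit code. Keeping the parsing and threshold logic separate from the subprocess call means they can be exercised without a broker.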
(Assignee)

Comment 3

8 years ago
What will the action be if nagios goes off? Need to make docs for the other admins.
(Reporter)

Comment 4

8 years ago
If it's below threshold for workers, start more workers and then figure out why they disappeared.  

If the queue is too high, order more hardware I guess.  Also let webdev know so we can throttle back unimportant jobs.

I think amo-developers should get these pages too.
(Reporter)

Comment 5

8 years ago
Only the connections graph is working in munin; all the rest are blank. If you run the commands manually, do they execute? If you're running them through sudo, don't forget to add rabbitmqctl to what it can run.
(Assignee)

Comment 6

8 years ago
I didn't have "env.vhost vhostname" set. I was hoping by default it would just graph all vhosts. Kind of lame that it will only do 1.
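
One possible workaround for the single-vhost limitation, sketched here as an assumption rather than anything the plugins actually do: enumerate vhosts with `rabbitmqctl list_vhosts` and sum queue depth per vhost. The helper names are hypothetical, and the parsing assumes the older output format with "Listing ..." and "...done." lines around the data.

```python
import subprocess


def run_rabbitmqctl(*args):
    """Run rabbitmqctl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["rabbitmqctl", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_vhosts(output):
    """Return vhost names, one per line, skipping banner/trailer lines."""
    vhosts = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("Listing vhosts") or line == "...done.":
            continue
        vhosts.append(line)
    return vhosts


def messages_per_vhost(run=run_rabbitmqctl):
    """Sum the `messages` column across all queues in each vhost."""
    totals = {}
    for vhost in parse_vhosts(run("list_vhosts")):
        total = 0
        for line in run("list_queues", "-p", vhost, "messages").splitlines():
            # Only purely numeric lines are message counts.
            if line.strip().isdigit():
                total += int(line)
        totals[vhost] = total
    return totals
```

The `run` parameter exists only so the aggregation can be tried with canned output instead of a live broker.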
(Assignee)

Comment 7

8 years ago
Turns out these plugins don't work with celery at all. They expect just a couple of queues to exist, and celery has created over 7,000 queues.
Can we try these again now that we're not creating tons of result queues? (bug 567932)
(Assignee)

Comment 9

8 years ago
Graphs are up: http://munin.mozilla.org/munin/gearman/pm-gearman-amo01.mozilla.org/
(Assignee)

Updated

8 years ago
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations