Closed Bug 1233424 Opened 9 years ago Closed 8 years ago

rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (4370) messages_ready CRITICAL (4317), messages_unacknowledged

Categories

(Tree Management :: Treeherder: Infrastructure, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mlankford, Unassigned)

Details

Queue messages are building up

5:01 AM <@nagios-scl3> (IRC) Thu 05:01:55 PST [5057] treeherder-rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (4370) messages_ready CRITICAL (4317), messages_unacknowledged OK (53) (http://m.mozilla.org/Rabbit+Unread+Messages)
7:01 AM <@nagios-scl3> (IRC) Thu 07:01:57 PST [5094] treeherder-rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (5171) messages_ready CRITICAL (5119), messages_unacknowledged OK (52) (http://m.mozilla.org/Rabbit+Unread+Messages)
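The three numbers in these alerts are related: RabbitMQ reports messages as the sum of messages_ready (waiting for a consumer) and messages_unacknowledged (delivered to a worker but not yet acked). Below is a minimal Python sketch that pulls the counts out of an alert line and checks that sum. It assumes the alert text keeps the exact `name STATE (count)` layout shown above. The function names are hypothetical and not part of any nagios or Treeherder tooling.

```python
import re

# Matches "name STATE (count)" triples such as "messages_ready CRITICAL (4317)".
# The layout is assumed from the alert lines in this bug.
_METRIC = re.compile(
    r"\b(messages(?:_ready|_unacknowledged)?) (OK|WARNING|CRITICAL|UNKNOWN) \((\d+)\)"
)


def parse_overview_alert(line: str) -> dict[str, tuple[str, int]]:
    """Return {metric: (state, count)} for every metric found in the alert line."""
    return {name: (state, int(count)) for name, state, count in _METRIC.findall(line)}


def is_consistent(metrics: dict[str, tuple[str, int]]) -> bool:
    """Check that total messages == ready + unacknowledged, as RabbitMQ defines it."""
    total = metrics["messages"][1]
    ready = metrics["messages_ready"][1]
    unacked = metrics["messages_unacknowledged"][1]
    return total == ready + unacked
```

For both alerts above the sum holds (4317 + 53 = 4370 and 5119 + 52 = 5171). Nearly all of the backlog sits in messages_ready. This suggests the workers were not consuming fast enough, rather than messages being stuck mid-processing.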
The PR in bug 1162947 will help with this. In the meantime all tasks in the fetch_missing_pushlogs queue can be deleted (on stage+prod), to reduce the spam on the API.

Will/Cameron, could you do the latter?
Flags: needinfo?(wlachance)
Flags: needinfo?(cdawson)
(In reply to Ed Morley (Away 15th Dec-4th Jan) [:emorley] from comment #2)
> The PR in bug 1162947 will help with this. In the meantime all tasks in the
> fetch_missing_pushlogs queue can be deleted (on stage+prod), to reduce the
> spam on the API.
> 
> Will/Cameron, could you do the latter?

I don't have the admin rights to do this, as far as I know. I may well be missing something, but it looks like rabbitmqctl requires root.
Flags: needinfo?(wlachance)
Everyone in the vpn_treeherder group has sudoers access as of bug 1193942.
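With sudo access, `rabbitmqctl list_queues` can show which queue holds the backlog before anything is purged. The following is a minimal Python sketch that runs it and sorts queues by depth. It assumes the default "/" vhost, since this bug does not say which vhost Treeherder uses. The helper names are hypothetical.

```python
import subprocess


def parse_list_queues(output: str) -> list[tuple[str, int]]:
    """Turn `rabbitmqctl list_queues name messages` output into (name, depth)
    pairs, deepest queue first. Banner and header lines are skipped."""
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        # Data rows look like "<queue name>\t<message count>"; anything else is chatter.
        if len(parts) == 2 and parts[1].strip().isdigit():
            rows.append((parts[0], int(parts[1])))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def queue_depths(vhost: str = "/") -> list[tuple[str, int]]:
    """Run rabbitmqctl under sudo (it needs root) and return sorted queue depths.
    The "/" default vhost is an assumption, not taken from Treeherder's config."""
    result = subprocess.run(
        ["sudo", "rabbitmqctl", "list_queues", "-p", vhost, "name", "messages"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_list_queues(result.stdout)
```

Listing before purging avoids clearing the wrong queue.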

The queues can also be cleared using celery from any treeherder node:
http://stackoverflow.com/a/33531638

Or via the rabbitmq control panel (though you need the fixed password for this):
http://treeherder-rabbitmq1.private.scl3.mozilla.com:15672/#/queues
http://treeherder-rabbitmq1.stage.private.scl3.mozilla.com:15672/#/queues
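The control panel is the RabbitMQ management plugin, whose HTTP API on the same port can also purge a queue with `DELETE /api/queues/{vhost}/{name}/contents`. The following is a minimal Python sketch using only the standard library. The "/" vhost and the credentials are placeholders, not values from this bug, and the function names are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def purge_request(base_url: str, vhost: str, queue: str,
                  user: str, password: str) -> urllib.request.Request:
    """Build a DELETE for /api/queues/{vhost}/{name}/contents, which purges
    (drops all ready messages from) the named queue."""
    # The vhost "/" has to be sent percent-encoded as "%2F".
    path = "/api/queues/{}/{}/contents".format(
        urllib.parse.quote(vhost, safe=""), urllib.parse.quote(queue, safe="")
    )
    request = urllib.request.Request(base_url.rstrip("/") + path, method="DELETE")
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request


def purge_queue(base_url: str, vhost: str, queue: str,
                user: str, password: str) -> int:
    """Send the purge request and return the HTTP status code."""
    request = purge_request(base_url, vhost, queue, user, password)
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `purge_queue("http://treeherder-rabbitmq1.stage.private.scl3.mozilla.com:15672", "/", "fetch_missing_pushlogs", USER, PASSWORD)` would purge that queue on stage. A purge only removes messages that are not awaiting acknowledgement, so tasks already handed to a worker are unaffected.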
Ok, after a bit of coaching from Mauro, I just ran `../venv/bin/python ../venv/bin/celery -A treeherder worker -Q fetch_missing_pushlogs --purge`. I don't know if it helped anything.
Still alerting, as the count has increased to 6000+:

<@nagios-scl3> (IRC) Thu 11:11:58 PST [5196] treeherder-rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (6662) messages_ready CRITICAL (6622), messages_unacknowledged OK (40) (http://m.mozilla.org/Rabbit+Unread+Messages)
So the large queue is the "pushlog" queue, not the "fetch_missing_pushlogs" queue, at this point. I'm watching it and it's at least going in the right direction now. It was over 6700, and is now below 6500. I've pinged fubar, and he and I are keeping an eye on it. The count is gradually going down as I type. 6385 now...

I could purge that queue, but that would make the "fetch_missing_pushlogs" queue go berserk, and I fear we might lose some pushes.

For now, TH looks like it's keeping up with the latest pushes, so no interruption in service.  I'm going to keep an eye on the queue size and focus on getting Ed's PR reviewed.
Flags: needinfo?(cdawson)
<@nagios-scl3> (IRC) Thu 17:50:42 PST [5344] treeherder-rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (7419) messages_ready CRITICAL (7314), messages_unacknowledged OK (105) (http://m.mozilla.org/Rabbit+Unread+Messages)
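The four nagios readings in this bug give a rough idea of how fast the backlog was growing. The following is a minimal Python sketch that computes average growth in messages per hour. It assumes all four readings are from the same Thursday (PST) and uses only the total messages count. SAMPLES and the function names are illustrative.

```python
from datetime import datetime

# (time, total "messages" count) copied from the nagios alerts in this bug.
# Assumes every reading is from the same day, so a bare clock time suffices.
SAMPLES = [
    ("05:01:55", 4370),
    ("07:01:57", 5171),
    ("11:11:58", 6662),
    ("17:50:42", 7419),
]
_FMT = "%H:%M:%S"


def net_growth_per_hour(samples: list[tuple[str, int]]) -> float:
    """Average change in queue depth per hour between the first and last sample."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    elapsed = (datetime.strptime(t1, _FMT) - datetime.strptime(t0, _FMT)).total_seconds()
    return (n1 - n0) / elapsed * 3600


def interval_rates(samples: list[tuple[str, int]]) -> list[float]:
    """Growth rate (messages/hour) for each pair of consecutive samples."""
    return [net_growth_per_hour([a, b]) for a, b in zip(samples, samples[1:])]
```

`net_growth_per_hour(SAMPLES)` comes out at roughly 238 messages per hour, and every interval rate is positive. The drop noted in the comment above falls between two readings and is averaged away, so over the day consumers were still not keeping up with producers.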
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED