treeherder-rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (5031) messages_ready CRITICAL (4919),

RESOLVED FIXED

Status

Tree Management
Treeherder: Infrastructure
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: Marlena, Assigned: wlach)

(Reporter)

Description

2 years ago
7:17 AM <@nagios-scl3> (IRC) Thu 07:17:31 PST [5479] treeherder-rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (5031) messages_ready CRITICAL (4919), messages_unacknowledged OK (112) (http://m.mozilla.org/Rabbit+Unread+Messages)
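
For reading alerts like the one above, here is a minimal Python sketch that pulls the three queue counters out of the check output. The regex and the `parse_overview` name are illustrative assumptions, not part of the actual Nagios plugin. The figures in the alert are consistent with RabbitMQ reporting `messages` as the sum of `messages_ready` and `messages_unacknowledged` (4919 + 112 = 5031), so the backlog here is almost entirely messages waiting for a consumer.

```python
import re

# Hypothetical pattern for the "<counter> <STATUS> (<count>)" fields in the
# RABBITMQ_OVERVIEW check output; not taken from the plugin itself.
ALERT_FIELD = re.compile(r"\b(messages(?:_\w+)?) (OK|WARNING|CRITICAL) \((\d+)\)")


def parse_overview(line):
    """Return {counter_name: (status, count)} for each field found in the line."""
    return {
        name: (status, int(count))
        for name, status, count in ALERT_FIELD.findall(line)
    }


alert = (
    "RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (5031) "
    "messages_ready CRITICAL (4919), messages_unacknowledged OK (112)"
)
counters = parse_overview(alert)

# Ready plus unacknowledged should add up to the total queue depth.
ready = counters["messages_ready"][1]
unacked = counters["messages_unacknowledged"][1]
assert ready + unacked == counters["messages"][1]
```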

Comment 1

2 years ago
Will, could you take a look? The generate-alerts task is still maxing out the prod rabbitmq node. I'm wondering if we can modify the task so it doesn't try to generate alerts for data older than N weeks? (There's not much point generating alerts for data from a year ago, etc.) Either way, should we back this out or disable it until the perf issues are sorted?

https://rpm.newrelic.com/accounts/677903/servers/5575925?tw%5Bend%5D=1449159210&tw%5Bstart%5D=1449137610

https://rpm.newrelic.com/accounts/677903/applications/4180461?tw%5Bend%5D=1449159234&tw%5Bstart%5D=1449137634 (switch to non-web if not already)

https://rpm.newrelic.com/accounts/677903/applications/4180461_h5411945/transactions?tw%5Bend%5D=1449159307&tw%5Bstart%5D=1449137707
Flags: needinfo?(wlachance)
Priority: -- → P1

Comment 2

Yeah, we shouldn't generate alerts for really old data. Filed bug 1230188 to take care of that.

I'd prefer not to revert prod if it's not necessary. It seems to be stabilizing: can we see where we are in an hour or so?
Flags: needinfo?(wlachance)
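
As a rough illustration of the age cutoff discussed above, the following Python sketch drops datapoints older than a configurable window before any alert generation runs. It is only a sketch under stated assumptions: `MAX_ALERT_AGE`, `recent_datapoints`, and the `(timestamp, value)` tuple shape are hypothetical and do not reflect the actual Treeherder code or the fix in bug 1230188, and the two-week value is a placeholder for the unspecified "N weeks".

```python
from datetime import datetime, timedelta

# Hypothetical cutoff; the thread only says "N weeks".
MAX_ALERT_AGE = timedelta(weeks=2)


def recent_datapoints(datapoints, now, max_age=MAX_ALERT_AGE):
    """Keep only (timestamp, value) pairs recent enough to be worth alerting on."""
    cutoff = now - max_age
    return [(ts, value) for ts, value in datapoints if ts >= cutoff]


now = datetime(2015, 12, 3)
points = [
    (datetime(2014, 12, 3), 1.0),  # a year old: skipped
    (datetime(2015, 12, 1), 2.0),  # two days old: kept
]
to_analyze = recent_datapoints(points, now)
```

Filtering before queuing work, rather than inside each task, would keep old data from producing messages at all, which is the load the alert above is measuring.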

Updated

2 years ago
Depends on: 1230188
(Reporter)

Comment 3

2 years ago
New alerts are appearing again this morning:


9:42 AM <@nagios-scl3> (IRC) Fri 09:42:46 PST [5136] treeherder-rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (7370) messages_ready CRITICAL (7338), messages_unacknowledged OK (32) (http://m.mozilla.org/Rabbit+Unread+Messages)

Comment 4

Yeah, this was still expensive even after the initial run. Bug 1230188 (now deployed) helps a ton. Load on rabbitmq is way down now.

https://rpm.newrelic.com/accounts/677903/servers/5575925
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED

Comment 5

2 years ago
Thanks for sorting this :-)
Assignee: nobody → wlachance