7:17 AM <@nagios-scl3> (IRC) Thu 07:17:31 PST  treeherder-rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (5031) messages_ready CRITICAL (4919), messages_unacknowledged OK (112) (http://m.mozilla.org/Rabbit+Unread+Messages)
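For context, a check like the one above reads the queue totals that RabbitMQ exposes via its management HTTP API (`/api/overview`, `queue_totals`) and compares them against configured limits. A minimal sketch of that evaluation, with hypothetical thresholds (the real plugin's limits may differ):

```python
# Sketch of how a RABBITMQ_OVERVIEW-style check might evaluate queue totals.
# Thresholds here are illustrative assumptions, not the plugin's actual config.

CRIT_MESSAGES = 2000    # assumed critical threshold for total messages
CRIT_READY = 2000       # assumed critical threshold for messages_ready
CRIT_UNACKED = 1000     # assumed critical threshold for messages_unacknowledged

def evaluate_overview(queue_totals):
    """Return (state, detail) given the queue_totals dict from /api/overview."""
    checks = [
        ("messages", queue_totals["messages"], CRIT_MESSAGES),
        ("messages_ready", queue_totals["messages_ready"], CRIT_READY),
        ("messages_unacknowledged", queue_totals["messages_unacknowledged"], CRIT_UNACKED),
    ]
    state = "OK"
    parts = []
    for name, value, limit in checks:
        level = "CRITICAL" if value >= limit else "OK"
        if level == "CRITICAL":
            state = "CRITICAL"
        parts.append(f"{name} {level} ({value})")
    return state, ", ".join(parts)

# The counts from the alert above reproduce its CRITICAL/OK breakdown:
state, detail = evaluate_overview(
    {"messages": 5031, "messages_ready": 4919, "messages_unacknowledged": 112}
)
print(state, "-", detail)
```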
Will, could you take a look? The generate-alerts task is still maxing out the prod rabbitmq node. I'm wondering if we can modify the task so it doesn't try to generate alerts for data older than N weeks? (There's not much point generating alerts for data from a year ago, etc.) Either way, should we back this out/disable it until the perf issues are sorted? https://rpm.newrelic.com/accounts/677903/servers/5575925?tw%5Bend%5D=1449159210&tw%5Bstart%5D=1449137610 https://rpm.newrelic.com/accounts/677903/applications/4180461?tw%5Bend%5D=1449159234&tw%5Bstart%5D=1449137634 (switch to non-web if not already) https://rpm.newrelic.com/accounts/677903/applications/4180461_h5411945/transactions?tw%5Bend%5D=1449159307&tw%5Bstart%5D=1449137707
Priority: -- → P1
Yeah, we shouldn't generate alerts for really old data. Filed bug 1230188 to take care of that. I'd prefer not to revert prod if it's not necessary. It seems to be stabilizing: can we see where we are in an hour or so?
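The fix tracked in bug 1230188 amounts to filtering out old datapoints before the alert analysis runs. A rough sketch of that idea, assuming a hypothetical `push_timestamp` field and cutoff (the names and the two-week value are illustrative, not Treeherder's actual API):

```python
# Hypothetical sketch: skip datapoints older than a cutoff so the
# generate-alerts task doesn't churn through a year of history.
from datetime import datetime, timedelta

ALERT_MAX_AGE = timedelta(weeks=2)  # assumed value for "N weeks" from the bug

def datapoints_worth_alerting(datapoints, now=None):
    """Drop datapoints older than the cutoff before running alert analysis."""
    now = now or datetime.utcnow()
    cutoff = now - ALERT_MAX_AGE
    return [d for d in datapoints if d["push_timestamp"] >= cutoff]

# Example: a year-old datapoint is skipped, a recent one is kept.
now = datetime(2015, 12, 4)
data = [
    {"value": 1.0, "push_timestamp": now - timedelta(days=365)},  # too old
    {"value": 2.0, "push_timestamp": now - timedelta(days=3)},    # recent
]
recent = datapoints_worth_alerting(data, now=now)
print(len(recent))  # → 1
```

Filtering up front means the expensive alert-generation work only ever touches the recent window, which is what keeps the rabbitmq queues from backing up on the initial run.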
New alerts are appearing again this morning: 9:42 AM <@nagios-scl3> (IRC) Fri 09:42:46 PST  treeherder-rabbitmq1.private.scl3.mozilla.com:Rabbit Unread Messages is CRITICAL: RABBITMQ_OVERVIEW CRITICAL - messages CRITICAL (7370) messages_ready CRITICAL (7338), messages_unacknowledged OK (32) (http://m.mozilla.org/Rabbit+Unread+Messages)
Yeah, this was still expensive even after the initial run. Bug 1230188 (now deployed) helps a ton; load on rabbitmq is way down now. https://rpm.newrelic.com/accounts/677903/servers/5575925
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Thanks for sorting this :-)
Assignee: nobody → wlachance