Closed Bug 1141993 Opened 9 years ago Closed 9 years ago

Pick & set thresholds for alerts using the rabbitmq New Relic plugin

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

Now that bug 1093757 has installed the rabbitmq plugin, we can set up alerts for a number of rabbitmq-related metrics:
* Messages Available (# messages)
* Publish Rate (messages/sec)
* Delivery Rate (messages/sec)
* Open Channels
* Consumers

We'll need to leave the plugin running for a while to figure out sensible thresholds for these for both prod and stage.

https://rpm.newrelic.com/accounts/677903/dashboard/6293241/page/4
Done today as part of dealing with bug 1152681.

Prod + stage:
* Messages Available (# messages): warn=50, crit=100
* Publish Rate (messages/sec): warn=60, crit=100
* Delivery Rate (messages/sec): warn=60, crit=100
* Open Channels: warn=80
* Consumers: warn=60

The "Messages Available" stat is the most critical.

To change them later, use the settings cog on:
https://rpm.newrelic.com/accounts/677903/plugins/16138

The email alerts for plugins don't use the new alerts section, so they have to be set up here:
https://rpm.newrelic.com/accounts/677903/integrations?page=alerting#tab-integrations=_mobile_amp_plugin_alert_email

I've added my email to start with. Once I know how noisy it will be, I can try adding others (the help text on that page contradicts itself a few times, so it's not clear what other non-plugin notifications will also be sent to the email address set there).
Assignee: nobody → emorley
Blocks: 1152681
Status: NEW → RESOLVED
Closed: 9 years ago
Priority: P3 → P2
Resolution: --- → FIXED
But the alerts/thresholds themselves seem to be working well, eg see the red shaded areas here:
https://rpm.newrelic.com/accounts/677903/dashboard/6293241?tw[dur]=last_6_hours&tw[end]=1428587506
We were getting alerts on stage, since the recent new queue additions have bumped up the totals. As such I've updated stage/prod to the following:

Note: higher values=worse (ie: the alert goes off when the actual value goes above the alert value).

(In reply to Ed Morley [:emorley] from comment #1)
> * Open Channels: warn=80

warn=110, crit=130

> * Consumers: warn=60

warn=65, crit=75

These values had already been made stricter at an earlier point; including them here for reference:

> * Publish Rate (messages/sec): warn=60, crit=100

warn=50, crit=80

> * Delivery Rate (messages/sec): warn=60, crit=100

warn=50, crit=80
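
For reference, the thresholds after the updates above can be summarised in a small sketch. This is only an illustration of the "higher values=worse" rule, assuming a strict greater-than comparison; the names `THRESHOLDS` and `alert_level` are hypothetical and are not part of New Relic's API or of the plugin.

```python
# Hypothetical summary of the current prod + stage thresholds from this bug.
# Each entry is (warn, crit); this is not how New Relic stores or evaluates them.
THRESHOLDS = {
    "Messages Available": (50, 100),   # messages
    "Publish Rate": (50, 80),          # messages/sec
    "Delivery Rate": (50, 80),         # messages/sec
    "Open Channels": (110, 130),
    "Consumers": (65, 75),
}


def alert_level(metric, value):
    """Return "ok", "warning" or "critical" for a metric reading.

    Assumes the alert fires when the actual value goes above the
    threshold, as described in the note in this comment.
    """
    warn, crit = THRESHOLDS[metric]
    if value > crit:
        return "critical"
    if value > warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for metric, reading in [("Messages Available", 75), ("Open Channels", 90)]:
        print(metric, reading, alert_level(metric, reading))
```

With these values, a reading of 75 for "Messages Available" would count as a warning, while 90 open channels would be fine under the updated thresholds (it would have been above the original warn=80).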