Adjust nagios alert levels for git1.dmz.scl3.mozilla.com:Load

RESOLVED FIXED

Status

Infrastructure & Operations
MOC: Service Requests

People

(Reporter: hwine, Assigned: w0ts0n)

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
git spawns many processes for small tasks, so git servers operate just fine with much higher load numbers than usual.

Please set the current "critical" level to be the new warning level, and make the critical level be 150.

Today's event alerted at 113 (140 seen upon login). A previous alerting "non-event" was also around 104 (bug 1087597).

An event that did require intervention (but not a restart) had load numbers > 500 for more than 4 hours (see bug 1087640, attachment 8510038).

The proposed values would only have alerted for the case where action was needed.
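
To make that claim easy to check, here is a minimal Python sketch that classifies the load values quoted above against the proposed thresholds. It is only an illustration, not the Nagios configuration itself: `classify_load` is a hypothetical helper, the "alert at or above threshold" comparison is an assumption, and `ASSUMED_OLD_CRITICAL` is a placeholder because this bug does not state the current critical value.

```python
from typing import Literal

State = Literal["OK", "WARNING", "CRITICAL"]


def classify_load(load: float, warning: float, critical: float) -> State:
    """Map a load value to an alert state (hypothetical helper).

    Assumes a value at or above a threshold triggers that state.
    """
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if load >= critical:
        return "CRITICAL"
    if load >= warning:
        return "WARNING"
    return "OK"


# Placeholder only: the bug does not say what the current critical level is.
ASSUMED_OLD_CRITICAL = 100.0
# Requested in this bug: the old critical becomes the new warning level,
# and the new critical level is 150.
PROPOSED_WARNING = ASSUMED_OLD_CRITICAL
PROPOSED_CRITICAL = 150.0

# Load values quoted in the description (500 is the lower bound of "> 500").
observed = {
    "bug 1087597 non-event": 104,
    "today's alert": 113,
    "today's load at login": 140,
    "bug 1087640 event": 500,
}

for label, load in observed.items():
    state = classify_load(load, PROPOSED_WARNING, PROPOSED_CRITICAL)
    print(f"{label}: load {load} -> {state}")
```

Under these assumptions only the bug 1087640 event reaches CRITICAL; the other three values would raise at most a warning.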

(We do expect to see higher load on git.mozilla.org as the FxOS release engineering team starts taking more of the build load. We may need to adjust further.)
(Reporter)

Comment 1

3 years ago
Created attachment 8546312
graphite-scl3.mozilla.org.png

Maximum process load on git1 over the past year
(Assignee)

Updated

3 years ago
Assignee: nobody → rwatson
(Assignee)

Comment 2

3 years ago
Worked with Ashish on this. The adjustments have been made as requested:

"Please set the current 'critical' level to be the new warning level, and make the critical level be 150."
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED