Closed Bug 914686 Opened 11 years ago Closed 11 years ago

The builds-4hr.js.gz nagios http file age check didn't notify after 900s

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: rbryce)


Bug 914574 caused the nagios 'builds-4hr.js.gz http file age' check to fire; however, the email that came through said:

{
***** Nagios  *****

Notification Type: PROBLEM

Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL

Date/Time: 09-10-2013 04:24:09

Additional Info:
HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:24:40 ago - 1367826 bytes in 0.051 second response time
}

The alert threshold is set to 900s, but the email wasn't sent until the file was 1480s old (the "Last modified 0:24:40 ago" above).

I'm presuming the check only runs periodically, so it took a while to notice the stale file and fire?
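To illustrate why a periodic check can fire well after the threshold, here is a rough sketch of the arithmetic. It assumes the check alerts on the first run that sees the file age strictly above the threshold, and it ignores retries, soft states and notification delays. The function names and the interval values are hypothetical; the check interval in use at the time isn't stated in this bug.

```python
def first_alert_age(threshold_s: int, interval_s: int, offset_s: int) -> int:
    """File age (seconds) at the first check run that sees the file as stale.

    Check runs are assumed to happen at offset_s, offset_s + interval_s, ...
    seconds after the file was last updated.
    """
    if not 0 <= offset_s < interval_s:
        raise ValueError("offset_s must be in [0, interval_s)")
    # Smallest k >= 0 such that offset_s + k * interval_s > threshold_s.
    k = max(0, (threshold_s - offset_s) // interval_s + 1)
    return offset_s + k * interval_s


def worst_case_alert_age(threshold_s: int, interval_s: int) -> int:
    """Largest first-alert age over every possible phase of the check schedule."""
    return max(
        first_alert_age(threshold_s, interval_s, offset)
        for offset in range(interval_s)
    )


# Hypothetical interval of 300s with the 900s threshold from this bug:
print(worst_case_alert_age(900, 300))
# Same hypothetical interval with the proposed 300s threshold:
print(worst_case_alert_age(300, 300))
```

Under these assumptions the worst case is roughly the threshold plus one check interval, so lowering either value shortens the delay.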

Please can:
a) The check run more frequently.
b) The age threshold applied when the check runs be lowered from 900s to 300s. I'm pretty sure we won't get any false positives with this value: the cron runs every 60s (subject to a lock) and shouldn't take anywhere near 5 mins to complete. In any case, I'd rather get the odd false positive than have delays when it is actually broken.

Cheers! :-)
Changed the check_interval to 5 mins (300 secs). Also changed the file age threshold in the check script from 900 secs to 300 secs.
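For anyone unfamiliar with what an "http file age" check does, the following is a minimal sketch of the idea, not the actual check script used here (which isn't shown in this bug). It assumes the check compares the HTTP Last-Modified header against the current time; the function names, the 300s default and the timestamps in the usage lines are illustrative assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import urllib.request

# Conventional Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def fetch_last_modified(url: str) -> str:
    """Return the Last-Modified header of url via a HEAD request (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers["Last-Modified"]


def check_age(last_modified: str, now: datetime, max_age_s: int = 300):
    """Classify a file as OK or CRITICAL based on its Last-Modified header."""
    modified = parsedate_to_datetime(last_modified)
    # Clamp at zero so clock skew between client and server can't give a negative age.
    age = max(0, int((now - modified).total_seconds()))
    hours, remainder = divmod(age, 3600)
    minutes, seconds = divmod(remainder, 60)
    message = f"Last modified {hours}:{minutes:02d}:{seconds:02d} ago"
    return (CRITICAL if age > max_age_s else OK), message


# Illustrative timestamps only, chosen to give the 0:24:40 age from the alert.
now = datetime(2013, 9, 10, 11, 24, 40, tzinfo=timezone.utc)
state, message = check_age("Tue, 10 Sep 2013 11:00:00 GMT", now)
print(state, message)
```

Keeping the age comparison separate from the HTTP request makes the threshold logic easy to test without touching the network.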
Assignee: server-ops → rbryce
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Blocks: 926246
Product: mozilla.org → mozilla.org Graveyard