Closed Bug 838925 Opened 11 years ago Closed 11 years ago

Add monitoring for stuck timeout loops

Categories

(Testing Graveyard :: Mozpool, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

Attachments

(3 files)

Bug 817762 seems to be back again, at least (thankfully) in staging.

We should monitor for this in production, too.

I think the easiest way is for mozpool to touch a file every time it runs the timeout loop. Nagios can then check the age of that file and alert if it goes stale.
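A minimal sketch of the heartbeat idea, not the attached patch: the loop touches a file after each pass, so the file's mtime records the last time the loop actually ran. The names `touch_heartbeat`, `run_timeout_loop`, `handle_timeouts`, `heartbeat_path`, `interval` and `iterations` are all hypothetical and do not come from mozpool.

```python
import os
import time


def touch_heartbeat(path):
    """Create the file if it is missing and set its mtime to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def run_timeout_loop(handle_timeouts, heartbeat_path, interval=10.0, iterations=None):
    """Run handle_timeouts repeatedly, touching heartbeat_path after each pass.

    iterations=None loops forever; a number bounds the loop (useful for tests).
    An empty heartbeat_path disables the heartbeat.
    """
    count = 0
    while iterations is None or count < iterations:
        handle_timeouts()
        # Touch only after the pass completes: a loop that hangs inside
        # handle_timeouts stops updating the file, which is what gets detected.
        if heartbeat_path:
            touch_heartbeat(heartbeat_path)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
```

Touching the file from inside the loop, rather than from a separate thread or cron job, is the point of the design: the file only stays fresh while the loop itself keeps making progress.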
Attached patch bug838925.patch
Easy.  I'll add this config item via puppet, and then monitor it with nagios.
Attachment #711943 - Flags: review?(mcote)
Corresponding patch for puppet.
Attachment #711949 - Flags: review?(jwatkins)
And the change to add the monitoring in nagios.  check_file_age already exists on the imaging servers and in nrpe.cfg.
Attachment #711954 - Flags: review?(ashish)
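For illustration, the kind of test a file-age check performs can be sketched in Python. This is not the nagios plugin or the attached patch: the function name `heartbeat_status`, its parameters, and the thresholds used with it are assumptions. Only the 0/1/2 return codes follow the standard nagios plugin convention for OK, WARNING and CRITICAL.

```python
import os
import time

# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def heartbeat_status(path, warn_seconds, crit_seconds, now=None):
    """Return (status, message) based on how old the file's mtime is.

    A missing file is treated as CRITICAL here (an assumption of this sketch).
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return CRITICAL, "heartbeat file not found: %s" % path
    if age > crit_seconds:
        return CRITICAL, "heartbeat is %d seconds old" % age
    if age > warn_seconds:
        return WARNING, "heartbeat is %d seconds old" % age
    return OK, "heartbeat is %d seconds old" % age
```

Whatever thresholds are chosen need to sit comfortably above the timeout loop's normal interval, otherwise a healthy but briefly busy loop would raise alerts.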
Comment on attachment 711954 (infrapuppet.patch)

Looks good!
Attachment #711954 - Flags: review?(ashish) → review+
Attachment #711949 - Flags: review?(jwatkins) → review+
Comment on attachment 711943 (bug838925.patch)

D'oh! I just landed this one instead of the puppet patch.  I will back out if it's not OK.
Comment on attachment 711943 (bug838925.patch)

Looks good.
Attachment #711943 - Flags: review?(mcote) → review+
The puppetagain and mozpool patches have landed.  I'll land the infra puppet patch when the others are in production.
Infra puppet patch landed, although it had at least three bugs in it (wrong nagios server, wrong hostgroup, and wrong file)!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard