Closed Bug 661522 Opened 13 years ago Closed 11 years ago

nagios checks for monitoring master uptime

Categories

(Release Engineering :: General, defect, P3)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: catlee, Unassigned)

Details

(Whiteboard: [nagios][buildbotmaster])

Masters shouldn't be up and running for months at a time; they should be restarted periodically (that's another bug). We need nagios checks to assert that master processes are no more than X days old. I'm thinking 14 is a good number to start with.
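A minimal sketch of what such a check could look like, written as a Nagios-style plugin in Python. Everything here is an assumption for illustration, not an agreed design: the function names are hypothetical, the master's PID is assumed to be readable from a pidfile, and process age is read with `ps -o etimes=`, which needs a Linux procps-ng `ps`. The exit codes are the standard Nagios plugin ones (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

SECONDS_PER_DAY = 86400


def process_age_seconds(pid):
    """Return elapsed seconds since `pid` started, via `ps -o etimes=`.

    Assumption: a procps-ng `ps` that supports the `etimes` field.
    """
    out = subprocess.check_output(["ps", "-o", "etimes=", "-p", str(pid)])
    return int(out.decode().strip())


def evaluate_age(age_seconds, max_days=14):
    """Map a process age onto a Nagios (exit_code, message) pair."""
    age_days = age_seconds / SECONDS_PER_DAY
    if age_days > max_days:
        return CRITICAL, "CRITICAL: master up %.1f days (limit %d)" % (age_days, max_days)
    return OK, "OK: master up %.1f days (limit %d)" % (age_days, max_days)


def check_master(pidfile, max_days=14):
    """Read the master's PID from `pidfile` (hypothetical path) and check its age."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
        return evaluate_age(process_age_seconds(pid), max_days)
    except (OSError, ValueError, subprocess.CalledProcessError) as exc:
        # Missing pidfile, bad PID, or a dead process all end up here.
        return UNKNOWN, "UNKNOWN: could not determine master age (%s)" % exc
```

To use it as a plugin, a wrapper would call `check_master()` with the master's pidfile, print the message, and exit with the returned code. Keeping the 14-day limit as a `max_days` parameter means the threshold can be changed per master without touching the check itself.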

Leaving in releng until we figure out exactly what we want here.
Blocks: 661523
Priority: -- → P3
(In reply to Chris AtLee [:catlee] from comment #0)
> masters shouldn't be up and running for months at a time, they should be
> restarted periodically (that's another bug). We need nagios checks to assert
> that master processes are no more than X days old. I'm thinking 14 is a good
> number to start with.
An alert for any master with more than 14 days of uptime sounds good to me too. This will help us keep masters "fresh" by rebooting them. We can always adjust the 14-day threshold up or down later based on how this feels after trying it.

> Leaving in releng until we figure out exactly what we want here.

I think that's all we need. Anything else before we push this over to IT for the nagios setup?
Component: Release Engineering → Release Engineering: Developer Tools
QA Contact: hwine
Product: mozilla.org → Release Engineering
Do we still think masters shouldn't be running for so long?
Flags: needinfo?(catlee)
Doesn't seem to be an issue lately.
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(catlee)
Resolution: --- → WORKSFORME
Component: Tools → General