Closed Bug 661523 Opened 13 years ago Closed 9 years ago

automatically gracefully restart buildbot masters every week

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P5)

x86_64
Linux

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: catlee, Unassigned)

References

Details

(Whiteboard: [buildmasters])

The idea here is to avoid long-running processes which suffer from things like memory leaks. It'd be great to fix those, but we don't always have time to.

I'm thinking of something like this: on Sunday night, for each pool of masters, gracefully restart the masters in that pool one at a time (in series). Pools can be processed in parallel.
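The proposed schedule (pools in parallel, masters within a pool in series) can be sketched as below. This is a minimal illustration, not an existing tool. `restart_pools` and the `graceful_restart` callable are hypothetical names, and `graceful_restart` is assumed to block until the master is back up and to return whether the restart succeeded.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def restart_pools(pools: Dict[str, List[str]],
                  graceful_restart: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Restart every master: pools run in parallel, masters in a pool run in series.

    `graceful_restart` is a hypothetical callable that blocks until the given
    master has restarted and returns True on success. The result maps each pool
    name to the masters whose restart failed, for buildduty to follow up on.
    """
    def restart_pool(masters: List[str]) -> List[str]:
        failed = []
        for master in masters:
            # Series within a pool: only one master of the pool is restarting at a time.
            if not graceful_restart(master):
                failed.append(master)
        return failed

    # One worker per pool so the pools proceed in parallel (at least one worker).
    with ThreadPoolExecutor(max_workers=max(1, len(pools))) as executor:
        futures = {name: executor.submit(restart_pool, masters)
                   for name, masters in pools.items()}
        return {name: future.result() for name, future in futures.items()}
```

This version keeps going after a failed restart; a stricter variant could stop processing a pool at its first failure so one broken master doesn't lead to several being down at once.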

On Monday morning, buildduty (with help from nagios!) checks to see if the restarts completed successfully, or if they're hung somewhere.

This sounds pretty difficult.  Let's get periodic *manual* restarts going first.  I'd suggest doing this with slavealloc - bring a new instance up, disable the old in slavealloc, and graceful it.  Then we don't lose capacity at any point.
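The manual rotation suggested above can be sketched as an ordered sequence of steps. All function names here are hypothetical placeholders for whatever tooling actually starts masters, toggles them in slavealloc, and performs the graceful shutdown; enabling the new master in slavealloc is an assumed step not spelled out in the comment.

```python
from typing import Callable


def rotate_master(old: str,
                  start_replacement: Callable[[], str],
                  set_enabled: Callable[[str, bool], None],
                  graceful_shutdown: Callable[[str], None]) -> str:
    """Replace `old` with a fresh master without dropping capacity (hypothetical helpers)."""
    # 1. Bring the new instance up first, so capacity never drops.
    new = start_replacement()
    # 2. Assumed step: make the new master available in slavealloc.
    set_enabled(new, True)
    # 3. Disable the old master in slavealloc so no further slaves are allocated to it.
    set_enabled(old, False)
    # 4. Gracefully shut down the old master; running jobs finish before it exits.
    graceful_shutdown(old)
    return new
```

The ordering is the point: the old master is only taken out of slavealloc once its replacement is running.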

Anyway, this is futuristic for now.
Severity: normal → enhancement
Priority: -- → P5
Whiteboard: [buildmasters]
Product: mozilla.org → Release Engineering
Found in triage.
Component: Other → Platform Support
Do we still care about this now that we're moving most things to Taskcluster?
Flags: needinfo?(catlee)
QA Contact: coop
We have nagios alerts for buildbot process age, and we restart them manually periodically.
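A process-age alert of this kind boils down to comparing a master's uptime against thresholds and returning a Nagios status code (0 = OK, 1 = WARNING, 2 = CRITICAL). The sketch below is illustrative only: `check_process_age` is a hypothetical name, the 7- and 14-day thresholds are assumptions, and it is not the actual check that was deployed.

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_process_age(age_seconds: float,
                      warn_days: float = 7,
                      crit_days: float = 14) -> Tuple[int, str]:
    """Map a buildbot master's uptime to a Nagios status (thresholds are assumed)."""
    age_days = age_seconds / 86400
    if age_days >= crit_days:
        return CRITICAL, f"CRITICAL: master running for {age_days:.1f} days"
    if age_days >= warn_days:
        return WARNING, f"WARNING: master running for {age_days:.1f} days"
    return OK, f"OK: master running for {age_days:.1f} days"
```

As a plugin, the returned code would be used as the process exit status and the message printed on stdout.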
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(catlee)
Resolution: --- → WORKSFORME
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard