Comment on attachment 8681493 [details] [diff] [review]
master-age.diff

Review of attachment 8681493:
-----------------------------------------------------------------

FYI, we just got done increasing this number because people only wanted to reboot during a TCW every 6+ weeks. We need an accompanying process change to go along with the nagios change.

Attachment #8681493 - Flags: review?(arich) → review+
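
For readers without access to master-age.diff, an age-based check of this kind can be sketched roughly as below. This is not the contents of the patch: the function name `classify_master_age` and the thresholds used in the example are assumptions for illustration only. The exit codes 0, 1 and 2 are the standard Nagios plugin values for OK, WARNING and CRITICAL.

```python
from datetime import timedelta

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_master_age(age, warn_after, crit_after):
    """Map a master's uptime to a Nagios status code and a message.

    Hypothetical helper: the thresholds are passed in by the caller and
    are not taken from the real nagios configuration.
    """
    if crit_after < warn_after:
        raise ValueError("crit_after must not be smaller than warn_after")
    days = age.total_seconds() / 86400
    if age >= crit_after:
        return CRITICAL, f"CRITICAL: master up {days:.1f} days"
    if age >= warn_after:
        return WARNING, f"WARNING: master up {days:.1f} days"
    return OK, f"OK: master up {days:.1f} days"


if __name__ == "__main__":
    # Example thresholds only; pick values that match the restart cadence.
    status, message = classify_master_age(
        timedelta(days=10), timedelta(days=7), timedelta(days=14)
    )
    print(status, message)
```

Lowering the thresholds only makes sense if restarts actually happen on the matching schedule; otherwise the check just alerts more often.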

Chris AtLee [:catlee]

Assignee

Comment 2

•

10 years ago

That's true...but I don't think there's a particular need to do this inside a TCW. Slow rolling restarts should be ok.
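
A slow rolling restart could look roughly like the sketch below: one master at a time, a pause between masters, and a stop at the first failure so the remaining masters keep serving. The names `rolling_restart` and `restart_one` are hypothetical, and `restart_one` stands in for whatever actually restarts a master; none of this is taken from existing tooling.

```python
import time


def rolling_restart(masters, restart_one, pause_seconds=0, sleep=time.sleep):
    """Restart masters one at a time, stopping at the first failure.

    restart_one is a caller-supplied callable (an assumption of this
    sketch) that returns True when the master came back healthy.
    Returns (restarted, failed), where failed is None on full success.
    """
    restarted = []
    for index, master in enumerate(masters):
        if not restart_one(master):
            # Leave the remaining masters untouched so capacity is not
            # reduced further while someone investigates.
            return restarted, master
        restarted.append(master)
        if index < len(masters) - 1:
            sleep(pause_seconds)  # give the pool time to rebalance
    return restarted, None


if __name__ == "__main__":
    done, failed = rolling_restart(["m1", "m2", "m3"], lambda m: m != "m2")
    print(done, failed)
```

Stopping at the first failure is a design choice; a variant could skip the failed master and continue, at the cost of running with less capacity.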

hwine

Comment 3

•

10 years ago

Any timing adjustments should be reflected in bug 1197853. FWIW, we cancelled October's restart based on a report that it wasn't needed (and would have been on TCW activity).

Updated

•

10 years ago

Attachment #8681493 - Flags: checked-in+

Chris AtLee [:catlee]

Assignee

Updated

•

10 years ago

Assignee: nobody → catlee

Nick Thomas [:nthomas] (UTC+12)

Comment 4

•

10 years ago

I think I did the October restart anyway as nagios was alerting. +1 to restarting more frequently. I think we need to take assorted tools we have and productionise them. Pretty sure coop has something, and I've kept catlee's fabric enhancements going at https://github.com/nthomas-mozilla/build-tools/tree/fabric (this got flaky on the last few masters when I used it, for unknown reasons).

Chris AtLee [:catlee]

Assignee

Updated

•

10 years ago

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Armen [:armenzg]

Comment 5

•

10 years ago

Do we have a check to detect when the masters start getting into this bad state? I would assume it won't be hard to sell rebooting the masters more often if it helps us keep Windows throughput up, even if there is an increased risk of a master rebooting and getting into a bad state for a short time.

Chris Cooper [:coop] (he/him)

Comment 6

•

10 years ago

My script is here: https://github.com/ccooper/build-tools/blob/master/buildfarm/maintenance/restart_masters.py I ran it this weekend to restart all the masters. It's not perfect -- we hung on two masters requiring manual intervention -- but we could certainly dig into those issues and fix them. We could schedule the script to trigger restarts every weekend without much issue.
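
One way to keep a hung master from blocking an unattended weekend run is to put a timeout around each restart command, sketched below. This is not how restart_masters.py works; the helper name `run_with_timeout` and the commands in the example are assumptions for illustration.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run one restart command and report 'ok', 'failed', or 'hung'.

    Hypothetical helper: command is whatever argv list restarts a master.
    """
    try:
        result = subprocess.run(
            command, timeout=timeout_seconds, capture_output=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising, so the
        # caller can move on and flag this master for manual follow-up.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in commands; a real run would call the actual restart tool.
    print(run_with_timeout([sys.executable, "-c", "pass"], 10))
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1))
```

A master reported as hung would still need a person to look at it, but the rest of the run could continue.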

BMO Automation

Updated

•

7 years ago

Product: Release Engineering → Infrastructure & Operations

BMO Automation

Updated

•

6 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard


Restart buildbot masters more frequently

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Tracking

(Not tracked)

People

(Reporter: catlee, Assigned: catlee)


Security

(public)


Attachments

(1 file)
