Closed Bug 1010126 Opened 10 years ago Closed 9 years ago

Buildbot masters should only be able to run a single reconfig at once

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Platform: x86, All
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: pmoore, Assigned: coop)

We ran into an issue where a reconfig (based on more than one bbot cset) was initiated up to three times on certain masters before any of the runs had finished. See bug 1009880.

This was partly due to human error, but as we move to an automatic, cron-based method of reconfigs, there is a danger of this happening again: for example, a cron reconfig starts and a human-initiated reconfig is triggered before the cron run finishes.

We should have some way of ensuring that a reconfig can only start on a master when no other reconfig is already in progress on that master (for example, by using a lock file).
Component: Buildduty → Platform Support
QA Contact: bugspam.Callek → coop
I think the easiest thing to do here is to have manage_masters.py generate and use lockfiles. The question becomes: which manage_masters actions do we want to protect with lockfiles? Just reconfig? All actions?
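
A minimal sketch of what such a lockfile could look like, assuming Python 3 and one lockfile per master directory. The names `reconfig_lock` and `ReconfigInProgress` are hypothetical and are not taken from manage_masters.py; the default file name is only illustrative.

```python
import os
from contextlib import contextmanager


class ReconfigInProgress(Exception):
    """Raised when another process already holds the lock (hypothetical name)."""


@contextmanager
def reconfig_lock(master_dir, name="reconfig.lock"):
    """Hold an exclusive lockfile in master_dir for the duration of the block."""
    path = os.path.join(master_dir, name)
    try:
        # O_CREAT | O_EXCL makes creation atomic: the call fails if the
        # file already exists, so only one process can take the lock.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        raise ReconfigInProgress(path) from None
    # Record the owning PID so a human can tell who holds the lock.
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    try:
        yield path
    finally:
        # Release the lock even if the protected action raised.
        os.remove(path)
```

Any action we decide to protect would then run inside `with reconfig_lock(master_dir):`, and a second invocation on the same master would fail fast with `ReconfigInProgress` instead of starting a concurrent reconfig. If the process is killed hard, the file is left behind, which is why a stale-lock check is still needed.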

We'd also need to add a nagios alert or similar check for stale lockfiles.
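
As a rough illustration of the stale-lockfile check, the sketch below reports on a lockfile's age using the standard Nagios plugin status codes (0 OK, 1 WARNING, 2 CRITICAL). It assumes Python 3; the function name `check_lock_age` and the 30 and 60 minute thresholds are hypothetical placeholders, not agreed values.

```python
import os
import time

# Standard Nagios plugin exit statuses.
OK, WARNING, CRITICAL = 0, 1, 2


def check_lock_age(path, warn_secs=1800, crit_secs=3600, now=None):
    """Return (status, message) describing the age of the lockfile at path.

    The thresholds are illustrative assumptions. `now` can be passed in
    to make the check deterministic in tests.
    """
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # No lockfile means no reconfig is in progress.
        return OK, "OK: no lockfile present"
    age = (time.time() if now is None else now) - mtime
    if age >= crit_secs:
        return CRITICAL, "CRITICAL: lockfile is %d seconds old" % age
    if age >= warn_secs:
        return WARNING, "WARNING: lockfile is %d seconds old" % age
    return OK, "OK: lockfile is %d seconds old" % age
```

A wrapper script would print the message and exit with the returned status so that Nagios can pick it up.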
See Also: → 1040013
Bug 978928 introduced the reconfig.lock file, and made the other tools respect it.
Assignee: nobody → coop
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard