Closed Bug 768042 Opened 13 years ago Closed 10 years ago

builders should not take jobs when there are down infrastructure services

Categories

(Release Engineering :: General, enhancement, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: k0scist, Unassigned)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1929] [schedulers])

Currently, slaves take jobs regardless of the state of the infrastructure. Since we have, e.g., status.mozilla, the (buildbot) scheduler could check it before dispatching jobs. The check could be blanket (easiest: keep a list of all services that builders/testers need and dispatch no jobs unless they are all up, since otherwise we know something will fail) or fine-grained (each class of test/build has a list of the services it needs). Pulse messages could also be used to this end, or a RESTful HTTP client could poll to determine the state. Obviously not a trivial task. I wasn't sure whether to file this here or under Testing:General, but it would be nice to aspire to this.
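To make the blanket vs. fine-grained distinction concrete, here is a rough sketch of the gating decision only. It is not buildbot code: the service names, the job classes, and the shape of the status map (service name to up/down boolean) are all assumptions made for illustration, and how the status would actually be obtained (status.mozilla, Pulse, HTTP polling) is left out.

```python
from typing import Dict, Iterable, Mapping, Set

# Hypothetical job classes and service names, for illustration only.
REQUIRED_BY_JOB_CLASS: Dict[str, Set[str]] = {
    "build": {"hg", "ftp"},
    "test": {"hg", "ftp", "pypi-mirror"},
}


def down_services(status: Mapping[str, bool], required: Iterable[str]) -> Set[str]:
    """Return the required services that are not known to be up.

    A service missing from `status` is treated as down (assumption:
    it is safer to hold a job than to run it against unknown state).
    """
    return {name for name in required if not status.get(name, False)}


def may_dispatch_blanket(
    status: Mapping[str, bool],
    required_by_class: Mapping[str, Set[str]] = REQUIRED_BY_JOB_CLASS,
) -> bool:
    """Blanket policy: dispatch nothing unless every listed service is up."""
    all_required: Set[str] = set().union(*required_by_class.values())
    return not down_services(status, all_required)


def may_dispatch(
    job_class: str,
    status: Mapping[str, bool],
    required_by_class: Mapping[str, Set[str]] = REQUIRED_BY_JOB_CLASS,
) -> bool:
    """Fine-grained policy: only the services this job class needs must be up.

    Raises KeyError for a job class with no declared service list.
    """
    return not down_services(status, required_by_class[job_class])


if __name__ == "__main__":
    # Hypothetical snapshot: one service that only tests depend on is down.
    snapshot = {"hg": True, "ftp": True, "pypi-mirror": False}
    print(may_dispatch("build", snapshot))   # builds can still go out
    print(may_dispatch("test", snapshot))    # tests are held back
    print(may_dispatch_blanket(snapshot))    # blanket policy holds everything
```

With that snapshot, the fine-grained check still lets builds through while holding tests, whereas the blanket check holds everything. That is the trade-off: the blanket list is simpler to maintain but stops more work than strictly necessary.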
Severity: normal → enhancement
Priority: -- → P3
Whiteboard: [schedulers]
Product: mozilla.org → Release Engineering
I wonder how useful this is now that we have pretty good RETRYing?
Whiteboard: [schedulers] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1929] [schedulers]
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Component: General Automation → General