Closed Bug 644402 Opened 14 years ago Closed 12 years ago

Try harder to get a fast slave

Categories

(Release Engineering :: General, defect, P5)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Unassigned)

Details

(Whiteboard: [automation])

When there are lots of checkins we are often getting VMs for win32 compiles, which are much slower to complete the job. We need to try harder to find a fast slave before falling back to a VM.

I think what is happening is:
* one of the masters polls the db for pending work (every minute)
* finds some
* checks slave availability
* doesn't have any fast builders, and so gives work to a VM

That makes sense if we lose a colo or something similarly disastrous, but not for normal operation. I think we need to add some sort of delay, so that work is only assigned to VMs after all the masters have had a chance to poll and assign it to an available fast slave.
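To make the failure mode concrete, here is a toy illustration of that polling behaviour (the class, method, and slave names are invented for this sketch and do not correspond to real buildbot code): each master only sees its own idle slaves, so the first master to poll may hand the request to a VM even though another master has a fast slave free.

    # Toy illustration of the behaviour described above; none of these names
    # correspond to real buildbot code.
    class Master:
        def __init__(self, name, fast_slaves, vm_slaves):
            self.name = name
            self.fast_slaves = list(fast_slaves)  # idle fast hardware slaves
            self.vm_slaves = list(vm_slaves)      # idle win32 VMs

        def poll(self, pending):
            # Runs roughly once a minute on each master.
            for request in list(pending):
                if self.fast_slaves:
                    slave = self.fast_slaves.pop()
                elif self.vm_slaves:
                    # No fast slave attached to *this* master, so the request
                    # goes to a VM right away, even though another master could
                    # have served it from a fast slave on its next poll.
                    slave = self.vm_slaves.pop()
                else:
                    continue
                pending.remove(request)
                print("%s: assigned %r to %s" % (self.name, request, slave))

    # Hypothetical example: the VM-only master polls first, so the build lands on a VM.
    master1 = Master("bm01", fast_slaves=[], vm_slaves=["win32-vm-42"])
    master2 = Master("bm02", fast_slaves=["w32-ix-slave03"], vm_slaves=[])
    pending = ["win32 build of rev abc123"]
    master1.poll(pending)   # assigned to win32-vm-42
    master2.poll(pending)   # nothing left to assign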
This is a dupe of another bug that catlee's working on, but I can't find it right now.
Perhaps you're thinking of bug 636101, but that's a bit different.
Priority: -- → P5
Whiteboard: [automation][scheduler]
Whiteboard: [automation][scheduler] → [automation]
From discussion with catlee, here's a sketch of how this might work. In _nextFastSlave (http://mxr.mozilla.org/build/source/buildbotcustom/misc.py#286) we can get the list of build requests for the builder (but not the actual request(s) looking for a slave). If we have a fast slave there's no change. If we only have slow slaves, then we only return one if the oldest request is older than some threshold (say 5 minutes). Otherwise we claim there are no slaves available.
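A minimal sketch of that idea in Python (not the actual buildbotcustom code; the function name, the slave classification, and the way the oldest request's submit time is obtained are all stand-ins for illustration):

    import random
    import time

    # Seconds a request may sit pending before a slow slave (VM) is acceptable;
    # the 5-minute figure is the threshold suggested above.
    SLOW_SLAVE_THRESHOLD = 5 * 60

    def _is_fast(slave):
        # Stand-in classification; the real code keys off lists of slave names
        # in buildbotcustom/misc.py. Here we assume the objects expose a
        # .slavename attribute and that VMs follow a naming convention.
        return not slave.slavename.startswith("win32-vm")

    def _nextFastSlaveWithDelay(builder, available_slaves, oldest_request_time=None):
        """Prefer a fast slave; fall back to a slow slave only once the oldest
        pending request for this builder has waited past SLOW_SLAVE_THRESHOLD.

        oldest_request_time is the submit time (epoch seconds) of the oldest
        pending request, looked up however the caller prefers (e.g. from the
        scheduler database); it is a hypothetical parameter for this sketch.
        """
        fast = [s for s in available_slaves if _is_fast(s)]
        if fast:
            # No behaviour change when a fast slave is free.
            return random.choice(fast)

        slow = [s for s in available_slaves if not _is_fast(s)]
        if not slow:
            return None

        if oldest_request_time is not None and \
                time.time() - oldest_request_time > SLOW_SLAVE_THRESHOLD:
            # The work has waited long enough; a VM is better than nothing.
            return random.choice(slow)

        # Claim nothing is available, so another master (or a fast slave
        # freeing up here) gets a chance to take the request on a later poll.
        return None

The key design point is the last branch: by returning None instead of a slow slave, the request stays pending and every master gets at least one more polling cycle to offer it a fast slave before a VM is used.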
Is this still an issue?
Product: mozilla.org → Release Engineering
We don't use Windows VMs any more; we use AWS instances for Linux instead. Bug 936222 is the most recent bug to handle this issue.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED