Closed Bug 415298 Opened 17 years ago Closed 17 years ago

qm-mini-xp01 is freaked out

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
Windows XP
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: stanshebs, Unassigned)

Details

One of the Talos machines is burning for unclear reasons: see http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1201909140.1201911987.19938.gz .
Actually, both qm-mini-xp01 and qm-mini-vista01 are burning. Unless they have a counter and quit trying after n failures, this looks from the outside rather like the fatal flaw in the "Talos is robust because there are multiple machines" plan. They are both burning very quickly, and apparently when multiple machines are waiting for a build to test, the 01 machines get the first build available. Since fx-win32-tbox can't possibly crank out builds faster than the 01 brothers can unzip them, hit permission denied errors, and fail, I'd expect that without outside intervention the 02 and 03 boxes will never get a chance to show whether they are robust and unaffected.
These Talos bugs where one box takes all the builds, burns, basically closes the tree, and then just magically fixes itself after seven hours, are really rather annoying.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
If it's any consolation, I know what causes the bug (our current top contender for worst buildbot-failure-meets-talos-stupidity). I'm going to get a bug filed next week and I'll try and start putting something together ASAP.
(In reply to comment #1)
> they are both burning very quickly, and apparently when multiple machines are
> waiting for a build to test, the 01 machines get the first build available
For what it's worth, there is a patch to solve this. It's in Buildbot 0.7.6, but Talos runs 0.7.5. If you want the patch without upgrading to 0.7.6, you can get it from here: http://buildbot.net/trac/changeset?new=buildbot%2Fprocess%2Fbuilder.py%40349&old=buildbot%2Fprocess%2Fbuilder.py%40310 (click 'unified diff' at the bottom).
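To make the starvation described in comment #1 concrete, here is a toy simulation. It is only a sketch and does not reproduce Buildbot's code or the linked changeset. The `simulate` function, the timings, and the machine names qm-mini-xp02 and qm-mini-xp03 are all assumptions made for illustration, and random selection is just one plausible alternative policy, not necessarily what the patch does.

```python
import random

def simulate(slaves, durations, n_builds, build_interval, pick):
    """Hand out n_builds, one every build_interval minutes, to idle slaves.

    durations maps slave name -> minutes one run keeps that slave busy
    (a broken slave fails fast, so its duration is short).
    pick(idle) chooses one slave from the list of currently idle slaves.
    Returns a dict of slave name -> number of builds it received.
    """
    busy_until = {name: 0 for name in slaves}
    taken = {name: 0 for name in slaves}
    for i in range(n_builds):
        now = i * build_interval
        idle = [s for s in slaves if busy_until[s] <= now]
        if not idle:
            continue  # nobody is free; the toy model just drops this build
        chosen = pick(idle)
        busy_until[chosen] = now + durations[chosen]
        taken[chosen] += 1
    return taken

# Hypothetical setup: 01 is broken and fails within a minute, while the
# (assumed) 02 and 03 boxes would need 30 minutes for a real test run.
SLAVES = ["qm-mini-xp01", "qm-mini-xp02", "qm-mini-xp03"]
DURATIONS = {"qm-mini-xp01": 1, "qm-mini-xp02": 30, "qm-mini-xp03": 30}

# Policy 1: always take the first idle slave in list order.
first_available = simulate(SLAVES, DURATIONS, 50, 10, lambda idle: idle[0])
print(first_available)
# {'qm-mini-xp01': 50, 'qm-mini-xp02': 0, 'qm-mini-xp03': 0}

# Policy 2: pick any idle slave at random (seeded for repeatability).
rng = random.Random(0)
random_pick = simulate(SLAVES, DURATIONS, 50, 10, rng.choice)
print(random_pick)
```

Under the first-available policy the broken box is idle again before every new build arrives, so it takes all of them. With a random pick, the healthy boxes get a share of the builds and can at least report a result.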
You can follow the possible talos fix in bug 419492.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations