Vista talos boxes reporting as busted

RESOLVED WORKSFORME

Status

11 years ago
6 years ago

People

(Reporter: rflint, Assigned: anodelman)

qm-mini-vista01-03 are completing their runs but are reporting in as busted; the logs don't seem to indicate any real issues either.
As far as either I or Justin know, we aren't handling Talos boxes yet...
Assignee: server-ops → anodelman
(Assignee)

Comment 2

11 years ago
These machines got confused when an internal change forced me to switch from cvs.mozilla.org to cvs-mirror.mozilla.org (see bug 410830). I think that they should be fine after the next cycle. If they don't right themselves I'll take a closer look.
They have indeed taken care of themselves, thanks Alice!
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → WORKSFORME
Maybe just a red herring, but I notice that the Vista boxes that claimed failure for their first successful run all had a "rm: cannot lstat `*.zip': Invalid argument" line in the log (from not having a previous build to remove, I assume), while the XP boxes that didn't report their first successful runs as failed didn't squawk about not rm'ing something that wasn't there. I don't have much faith in it as a theory, though, since ep_unix.pl doesn't look like it would care about anything in that line.
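For what it's worth, the rm complaint presumably comes from the unmatched `*.zip` pattern being handed to rm as a literal name when no previous build exists. A minimal sketch of a cleanup step that stays quiet in that case is below. The function name `remove_old_builds` and the directory layout are hypothetical, and this is not the actual Talos or buildbot code:

```python
import glob
import os


def remove_old_builds(directory, pattern="*.zip"):
    """Delete files in `directory` matching `pattern`.

    Hypothetical helper: returns the list of removed paths, and simply
    returns an empty list (no error, no log noise) when nothing matches.
    """
    removed = []
    # glob.glob returns an empty list for an unmatched pattern, so the
    # loop body is skipped instead of passing a literal "*.zip" along.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        os.remove(path)
        removed.append(path)
    return removed
```

Calling it on a directory with no previous build returns an empty list rather than producing an error line in the log.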
(Assignee)

Comment 5

11 years ago
This is my going theory as well (the rm error message), though I'm not quite sure why it results in a red tree. I'll admit to not looking into it too deeply as it always seems to right itself after a cycle or two...
Not that I have any ideas other than maybe blaming it on a newer MozillaBuild, but institutionalizing "just ignore the red Vista Talos boxes" doesn't sound like a great thing. Slightly better than keeping the tree closed an extra two to four hours every time, but still...
(Assignee)

Comment 7

11 years ago
I think that there is a possibility of a fix on the buildbot side of the world to correctly ignore this sort of 'error'.  It's on my list of things to work on, but it's pretty low considering that there has to be some major upset in the everyday goings on of the talos boxes to hit this problem.
Phil: FWIW, tinderbox only calls a build busted if the status email has "status: busted" in it, not on log output, so the buildbot obviously believed for some reason that it had failed.
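The rule described in the comment above (status comes from the mail, not from log output) can be sketched roughly as follows. This assumes the status mail body is available as a plain string; `is_busted` and the sample mail bodies are hypothetical and are not tinderbox's actual code or mail format:

```python
def is_busted(status_mail: str) -> bool:
    """Return True only if the mail explicitly reports a busted status.

    Hypothetical sketch: log noise elsewhere in the text is ignored.
    """
    return "status: busted" in status_mail


# Hypothetical mail bodies, for illustration only.
noisy_but_ok = "status: success\nrm: cannot lstat `*.zip': Invalid argument\n"
reported_failure = "status: busted\n"
```

Under this rule the rm line alone would not turn a box red; something upstream has to put "status: busted" in the mail.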
That makes me zero for two, then, since the rm thing wouldn't be "every time" but only "every time there's a connectivity problem that stopped it from grabbing a build the time before," a pretty small percentage of bustages.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations