Closed Bug 398460 Opened 17 years ago Closed 16 years ago

Intermittent slave failures on qm-pxp0*

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rcampbell, Unassigned)

Details

Seen on the qm-pxp0n "blades": occasionally a Talos run turns red because of download or zip file failures. Download failures could probably be fixed by re-polling the download site a few times. Zip failures are not really "fixable" and should probably continue to fail.
If you are being served corrupted zips, is there someone in build who could look at it?
Yes, absolutely. And they should!
I've been watching for these lately and I'm 99% sure the packages get corrupted during the transfer. On a few occasions I've manually tested a package that the Talos machine had trouble with and each time I had no issues with it.
So, we could try something simple like adding a step to test the zip and looping to re-download, with some failsafe to break out if it looks like there really is something wrong with the copy on the server.
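The test-and-retry idea above could look roughly like this in Python. This is a minimal sketch, not the Talos code: `fetch_with_retries`, `zip_is_valid`, the `fetch` callback and the `max_attempts` failsafe are hypothetical names, and the zip test uses the standard-library `zipfile` module instead of calling `unzip`.

```python
import os
import zipfile
import zlib
from typing import Callable


def zip_is_valid(path: str) -> bool:
    """Return True if `path` opens as a zip and every member reads back with a good CRC."""
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        # Truncated/garbled archive, corrupt deflate stream, or the file is missing.
        return False


def fetch_with_retries(fetch: Callable[[str], None], dest: str,
                       max_attempts: int = 3) -> int:
    """Call `fetch(dest)` until `dest` passes the zip test; return the attempt count.

    Raises RuntimeError after `max_attempts` failures: the failsafe for the
    case where the copy on the server really is broken.
    """
    for attempt in range(1, max_attempts + 1):
        if os.path.exists(dest):
            os.remove(dest)  # never re-test a stale or partial file
        try:
            fetch(dest)
        except OSError:
            continue  # download error (urllib errors subclass OSError): try again
        if zip_is_valid(dest):
            return attempt
    raise RuntimeError(f"{dest} still bad after {max_attempts} download attempts")
```

In use, `fetch` might be something like `lambda d: urllib.request.urlretrieve(url, d)`, where `url` is whatever build URL the slave is polling. Keeping the download behind a callback makes the loop testable without a network.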

unzip comes with a test feature: you can check whether a zip is okay with unzip -tq. I'd assume other unzippers have similar options we could use.
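For comparison, a rough Python analogue of unzip -tq using the standard-library `zipfile` module (`check_zip` is a hypothetical helper name). A download cut off partway usually loses the central directory stored at the end of the archive, so opening it raises `BadZipFile`; damage inside a member is caught when `testzip()` reads it back and checks its CRC.

```python
import zipfile
import zlib
from typing import Optional


def check_zip(path: str) -> Optional[str]:
    """Return None if the archive tests clean, else a short reason string."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first member that fails its
            # header/CRC check, or None if every member reads back cleanly.
            bad_member = zf.testzip()
    except zipfile.BadZipFile as exc:
        return f"not a readable zip: {exc}"
    except (zlib.error, EOFError) as exc:
        return f"corrupt compressed data: {exc}"
    if bad_member is not None:
        return f"bad member: {bad_member}"
    return None
```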
It's going to be a bit more complicated than that: Buildbot doesn't have a "looping" concept for BuildSteps.

We may be able to string some shell commands together to accomplish this: "wget ... && unzip -tq firefox.zip", or something like that.
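One way to get a loop inside a single shell step is to put the loop in the command string itself. The sketch below only builds that string; `retry_download_command`, the default attempt count and the 30-second pause are hypothetical, and the URL is a placeholder. It relies on wget -O writing to a named file and on unzip -tq exiting non-zero for a bad archive, so the whole command exits 0 on the first good download and 1 if every attempt fails.

```python
import shlex


def retry_download_command(url: str, filename: str, attempts: int = 3) -> str:
    """Build one POSIX sh command that retries wget + unzip -tq up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    q_url, q_file = shlex.quote(url), shlex.quote(filename)
    # Literal list of iteration numbers; avoids depending on `seq` being installed.
    counter = " ".join(str(i) for i in range(1, attempts + 1))
    body = (
        f"rm -f {q_file}; "
        f"wget -O {q_file} {q_url} && unzip -tq {q_file} && exit 0; "
        "sleep 30"
    )
    # Falling out of the loop means every attempt failed: exit 1 keeps the step red.
    return f"for i in {counter}; do {body}; done; exit 1"
```

The resulting string would be handed to one shell BuildStep, so Buildbot itself never needs a looping concept.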

Another idea that popped into my head is simply doing a TinderboxPrint when unpacking the zip file fails. Showing "bad build" on the main page may mitigate the red tree.
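A sketch of the TinderboxPrint idea, assuming the usual convention that the Tinderbox log scraper picks up stdout lines of the form "TinderboxPrint:<text>" and shows the text on the tree page. The helper name `unpack_or_flag` and the exact message are hypothetical; whether the step should also go red is left to the caller.

```python
import sys
import zipfile
import zlib


def unpack_or_flag(zip_path: str, dest_dir: str) -> bool:
    """Unpack `zip_path` into `dest_dir`; on a bad archive, print a TinderboxPrint line."""
    try:
        with zipfile.ZipFile(zip_path) as zf:
            # extractall() verifies each member's CRC while reading it.
            zf.extractall(dest_dir)
        return True
    except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
        # Assumed Tinderbox convention: "TinderboxPrint:" prefix on stdout.
        print("TinderboxPrint:bad build")
        sys.stdout.flush()
        print(f"unpacking {zip_path} failed: {exc}", file=sys.stderr)
        return False
```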
Can't Buildbot have a BuildStep that runs a script which loops? It has to be running the Talos code somehow...
Yeah, I think that's what I was trying to say in my second paragraph (but did a poor job of it).
This was resolved with having talos pull build zips from dated directories.  The redness was due to talos attempting to download a build while a given build machine was dropping a new build with the same name in the same directory.  Having builds go into unique, dated directories means that we no longer get any collisions.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
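The fix can be illustrated with a small path helper: each build is published under a directory named for its build time, so a Talos machine downloading one build never reads a file that a later build is overwriting under the same name. The layout "<base>/<YYYY-MM-DD-HH-MM-SS>/<file>" and the name `dated_build_path` are hypothetical, not the actual layout used on the build server.

```python
import posixpath
from datetime import datetime


def dated_build_path(base: str, built_at: datetime, filename: str = "firefox.zip") -> str:
    """Return a per-build path; two builds collide only if stamped in the same second."""
    stamp = built_at.strftime("%Y-%m-%d-%H-%M-%S")
    return posixpath.join(base, stamp, filename)
```

For example, `dated_build_path("/builds", datetime(2008, 1, 3, 4, 5, 6))` gives "/builds/2008-01-03-04-05-06/firefox.zip". With the old single shared name, a reader and a writer could race on the same file; with unique directories, each download targets a file that is never rewritten.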
(In reply to comment #8)
> This was resolved with having talos pull build zips from dated directories. 
> The redness was due to talos attempting to download a build while a given build
> machine was dropping a new build with the same name in the same directory. 
> Having builds go into unique, dated directories means that we no longer get any
> collisions.
 
...and we really like that! :-D
Mass move of Core:Testing bugs to mozilla.org:ReleaseEngineering. Filter on RelEngMassMove to ignore.
Component: Testing → Release Engineering
Product: Core → mozilla.org
QA Contact: testing → release
Version: Trunk → other
Product: mozilla.org → Release Engineering