Closed
Bug 1028304
Opened 11 years ago
Closed 10 years ago
Test jobs on OSX on Cedar are busted.
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jgraham, Unassigned)
Details
Lots of errors of the form:
3 not in success codes: [0, 11]
Halting on failure while running ['unzip', '-q', '-o', '/builds/slave/talos-slave/test/build/firefox-33.0a1.en-US.mac.tests.zip']
I can't see anything obvious in the mozharness changes from production.
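For context on the numbers in that log line: Info-ZIP unzip exits with 11 when no matching files were found and with 3 when it detects a severe error in the zipfile format. Below is a minimal sketch of the "halt unless the exit status is in success_codes" check. `check_exit_status`, `CommandFailed` and `run_unzip` are hypothetical helpers for illustration, not mozharness's actual API.

```python
import subprocess

# Info-ZIP unzip exit statuses relevant to this bug:
#   0  = success
#   11 = no matching files were found
#   3  = a severe error in the zipfile format was detected
UNZIP_SUCCESS_CODES = (0, 11)


class CommandFailed(RuntimeError):
    """Hypothetical error raised when a command exits with an unexpected status."""


def check_exit_status(cmd, returncode, success_codes=UNZIP_SUCCESS_CODES):
    """Return returncode if accepted, otherwise raise with a log-style message."""
    if returncode not in success_codes:
        raise CommandFailed(
            "%d not in success codes: %s\nHalting on failure while running %r"
            % (returncode, list(success_codes), cmd))
    return returncode


def run_unzip(archive, dest="."):
    """Run the system unzip quietly, overwriting existing files, into dest."""
    cmd = ["unzip", "-q", "-o", archive, "-d", dest]
    return check_exit_status(cmd, subprocess.call(cmd))
```

With a corrupt or unreadable archive, `run_unzip` raises `CommandFailed` carrying the same "3 not in success codes: [0, 11]" wording as the job log.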
Comment 1•11 years ago
It's possible the upload was borked. Try a new osx build.
Reporter
Comment 2•11 years ago
A build failed yesterday and another one today in the same way, so unless something changed it seems unlikely retrying will work (it also appears that debug builds were broken in this way as early as Monday).
Comment 3•11 years ago
Then I think the bug here is probably that osx *builds* are broken on cedar.
Comment 4•11 years ago
This might hint at something being wrong with OSX builds on cedar: https://tbpl.mozilla.org/php/getParsedLog.php?id=42094331&full=1&branch=cedar
although upload and sendchange appear 'normal'.
Comment 5•11 years ago
Actually, this "unexpected args for log" error spans more than one platform and more than one day/rev?
Reporter
Comment 6•11 years ago
I grabbed the build and tests package from one of the problematic builds and, assuming I got the right one, they seemed to work (or at least decompress) fine.
Comment 7•11 years ago
Assuming you're downloading the same installer+test zip that the test is, I would then suspect potential disk space related bustage. I would suspect proxxy but that seems to not be live.
Reporter
Comment 8•11 years ago
Do we have some ETA on fixing this? Unfortunately it is the OSX builds that I'm most interested in :(
Comment 9•11 years ago
Have you asked your team or checked Cedar's changelog to see if someone changed something that broke them?
I don't think any other tree is having issues.
Comment 10•11 years ago
We could reset cedar again if needed...
Reporter
Comment 11•11 years ago
https://hg.mozilla.org/projects/cedar/rev/48cb1e27d9a3 is a reasonable guess, as it seems to be the only cedar-specific change implicated in this breakage. That said, other than making the zipfiles larger, it isn't obvious to me why it should have broken anything. It was also backed out from cedar, although later reintroduced in a merge. But it wasn't in the merge on 16th June when the debug tests started failing.
The error message on tbpl is:
13:15:10 INFO - error [/builds/slave/talos-slave/test/build/firefox-33.0a1.en-US.mac64.tests.zip]: reported length of central directory is
13:15:10 INFO - -76 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1
13:15:10 INFO - zipfile?). Compensating...
13:15:16 INFO - error: expected central file header signature not found (file #14081).
13:15:16 INFO - (please check that you have transferred or created the zipfile in the
13:15:16 INFO - appropriate BINARY mode and that you have compiled UnZip properly)
Which does suggest that the problem is that the zip doesn't get created correctly. But the file seemed OK when I tried it. Maybe I need to test OSX's command line unzip.
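A quick way to check an archive outside the test harness, assuming Python is at hand, is to count its entries and let `zipfile` verify every member's CRC. One caveat: Python's `zipfile` understands Zip64, so a clean result here does not prove that an older command-line unzip can read the same file. `inspect_zip` and `demo_zip` are hypothetical helpers.

```python
import io
import zipfile


def inspect_zip(source):
    """Return (entry_count, first_bad_member) for a zip archive.

    `source` may be a path or a binary file object. first_bad_member is
    None when every member decompresses and its CRC-32 checks out.
    """
    with zipfile.ZipFile(source) as zf:
        count = len(zf.infolist())
        bad = zf.testzip()  # reads each member and verifies its CRC-32
    return count, bad


def demo_zip(names):
    """Build a small in-memory zip, handy for trying inspect_zip out."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.writestr(name, "contents of " + name)
    buf.seek(0)
    return buf
```

Against a real package you would call `inspect_zip("firefox-33.0a1.en-US.mac64.tests.zip")` and compare the count with what the system unzip manages to extract.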
Comment 12•11 years ago
I agree that testing an OSX unzip problem on non-OSX is not a valid test.
Comment 13•11 years ago
So... I found: Bug 971687
with the code at: http://mxr.mozilla.org/build/source/buildbot-configs/mozilla-tests/config.py#1646
Which is "enable mozbase unit tests on cedar"
I notice this failure in the log that was shared with me today:
mozversion.mozversion.LocalB2GVersion WARNING | Error pulling gaia file
ok
======================================================================
ERROR: test_save_path (test.TestCrash)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/builds/slave/ced-osx64-00000000000000000000/build/testing/mozbase/mozcrash/tests/test.py", line 97, in test_save_path
quiet=True))
File "/builds/slave/ced-osx64-00000000000000000000/build/testing/mozbase/mozcrash/mozcrash/mozcrash.py", line 92, in check_for_crashes
save_dump_file(dump_save_path, dump["minidump_path"], dump["minidump_extra"])
File "/builds/slave/ced-osx64-00000000000000000000/build/testing/mozbase/mozcrash/mozcrash/mozcrash.py", line 109, in save_dump_file
os.path.join(dump_save_path, os.path.basename(dump_path)))
TypeError: log() takes exactly 2 arguments (3 given)
That's a failure of "log" in a mozbase test.
Is this anything to worry about, Dan?
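One plausible way to get this exact message, sketched with a hypothetical stand-in rather than mozcrash's real logger: in Python 2 the reported argument count includes `self`, so "takes exactly 2 arguments (3 given)" means a method expecting a single message received two explicit arguments, for example a format string plus a value. Python 3 words the same error differently.

```python
class HypotheticalLogger:
    """Stand-in for a logger whose log() accepts exactly one message."""

    def __init__(self):
        self.messages = []

    def log(self, message):
        # Python 2 counts `self`, so this method "takes exactly 2 arguments".
        self.messages.append(message)


logger = HypotheticalLogger()

# A format string plus a separate value supplies three arguments
# (self, fmt, value), so the call fails before log() body runs.
try:
    logger.log("Saved dump as %s", "/tmp/dump.dmp")
except TypeError as exc:
    print("TypeError:", exc)

# Formatting first keeps the call to a single message argument.
logger.log("Saved dump as %s" % "/tmp/dump.dmp")
```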
Comment 14•11 years ago
To be clear, we're also failing on android atm with:
File "/builds/tegra-129/test/build/tests/mozbase/mozlog/mozlog/structured/commandline.py", line 5, in <module>
import argparse
ImportError: No module named argparse
Which, again, is the in-tree mozlog.
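argparse only joined the standard library in Python 2.7 (and 3.2), so this ImportError suggests the interpreter on that device is older. That is an assumption, not something confirmed in this bug. Below is a minimal version check a harness could run before importing the in-tree mozlog, so it fails with a clearer message. The helper names are hypothetical.

```python
import sys


def has_stdlib_argparse(version_info=sys.version_info):
    """argparse is in the standard library from Python 2.7 and 3.2 onward."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    return (major, minor) >= (3, 2)


def require_argparse():
    """Raise a readable error instead of a bare ImportError deep in mozlog."""
    if not has_stdlib_argparse():
        raise RuntimeError(
            "Python %d.%d has no argparse; use 2.7+ or install the "
            "argparse backport" % tuple(sys.version_info[:2]))
```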
Comment 15•11 years ago
taking http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1403318743/firefox-33.0a1.en-US.mac.tests.zip
I hit a similar error:
snippet:
inflating: xpcshell/tests/xpcom/tests/unit/TestStringAPI
inflating: xpcshell/tests/xpcom/tests/unit/TestTArray
inflating: xpcshell/tests/xpcom/tests/unit/TestTextFormatter
inflating: xpcshell/tests/xpcom/tests/unit/TestThreadPoolListener
error: expected central file header signature not found (file #66185).
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
inflating: xpcshell/tests/xpcom/tests/unit/TestThreadUtils
inflating: xpcshell/tests/xpcom/tests/unit/TestTimers
inflating: xpcshell/tests/xpcom/tests/unit/TestUnicodeArguments
inflating: xpcshell/tests/xpcom/tests/unit/xpcshell.ini
inflating: xpcshell/tests/xpcshell.ini
I am not sure whether the problem is in xpcshell or whether the error just made it to the output at that point. Either way, I think this suggests that the zip is corrupt, i.e. there is something wrong with the 'build/compile' job. Looking at the build job that uploaded the zip, 'make upload' appears to have uploaded fine and the SHAs match, so I'm not sure where the corruption is happening.
Also, isn't it a sign the cedar tree is unhealthy that 'make -k check' is failing on all our cedar 'build' jobs?
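Comparing checksums of the uploaded and the downloaded archive, as done for this build job, rules out corruption in transfer. Below is a minimal sketch using `hashlib`. SHA-512 is an assumption, so substitute whatever digest the upload step actually publishes. The helper names are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large test archives need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digests_match(local_path, expected_hex, algorithm="sha512"):
    """Compare a local file against a published hex digest (case-insensitive)."""
    return file_digest(local_path, algorithm) == expected_hex.lower()
```

A match on both ends, as reported here, points away from the upload/download path and toward how the archive is built or read.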
Comment 16•11 years ago
(tested on an os x machine)
Comment 17•11 years ago
There are mozbuild and mozlog differences between m-c and cedar. I'm going to resolve the mozbuild changes and see if that helps.
Comment 18•11 years ago
Additionally, based on Bug 989583 and https://tbpl.mozilla.org/?tree=Cedar&rev=48cb1e27d9a3 with https://tbpl.mozilla.org/?tree=Cedar&rev=bc4d904c46b8, it looks possible that the number of files in the zip is at fault.
I think I've exhausted what I can do as buildduty. My suggestions:
* Do a try run of cedar tip with a handful of tests disabled (in a way that doesn't add them to the zip)
* Reset cedar again[?]
* Request an OSX build and test loaner and have someone figure out what needs to change in order to fix this
I'm not sure of the relative priorities here, so which option we choose will likely depend on them.
Comment 19•11 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #17)
> There are mozbuild and mozlog differences between m-c and cedar. I'm going
> to resolve the mozbuild changes and see if that helps.
Let's see how this fares: https://tbpl.mozilla.org/?tree=Cedar&showall=1&rev=f747fc6077ea
Reporter
Comment 20•11 years ago
There are 66185 files in the archive; this seems suspiciously close to the Zip (not Zip64) limit of 65535. A working build had 63909 files.
I strongly suspect this is a file number limit and that resetting cedar won't help. Other trees don't have web-platform-tests, so they are presumably well inside the limit. A little Googling suggests that the shipped unzip with OSX might not support Zip64 (although the shipped zip does). So I guess we need a newer version of unzip on these machines; presumably it's only a matter of time before we hit this limit in other places.
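The arithmetic behind that suspicion: the classic (pre-Zip64) end-of-central-directory record stores the total entry count in a 16-bit field, so it can describe at most 2^16 - 1 = 65535 entries. 63909 fits; 66185 does not, which forces a Zip64 record that older unzip builds cannot read. Below is a small sketch that reads that 16-bit field directly. The helper names are hypothetical, and the signature search is simplified (an archive comment containing the same four bytes could fool it).

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # end of central directory record
EOCD_FIXED_SIZE = 22            # record size excluding the trailing comment
MAX_CLASSIC_ENTRIES = 0xFFFF    # 65535: largest value a 16-bit field holds
MAX_COMMENT_SIZE = 0xFFFF


def classic_total_entries(data):
    """Return the 16-bit 'total entries' field of a zip's classic EOCD record.

    A value of 0xFFFF means the real count lives in a Zip64 record that
    pre-Zip64 tools cannot read.
    """
    search_from = max(0, len(data) - EOCD_FIXED_SIZE - MAX_COMMENT_SIZE)
    start = data.rfind(EOCD_SIGNATURE, search_from)
    if start < 0 or len(data) - start < EOCD_FIXED_SIZE:
        raise ValueError("no end of central directory record found")
    fields = struct.unpack("<4s4H2LH", data[start:start + EOCD_FIXED_SIZE])
    # fields: signature, this disk, central-directory disk, entries on this
    # disk, total entries, central-directory size, offset, comment length
    return fields[4]


def needs_zip64(entry_count):
    """True when an archive has too many entries for the classic format."""
    return entry_count > MAX_CLASSIC_ENTRIES
```

With the counts reported above, `needs_zip64(66185)` is true and `needs_zip64(63909)` is false.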
Comment 21•11 years ago
(In reply to James Graham [:jgraham] from comment #20)
> There are 66185 files in the archive; this seems suspiciously close to the
> Zip (not Zip64) limit of 65535. A working build had 63909 files.
>
> I strongly suspect this is a file number limit and that resetting cedar
> won't help. Other trees don't have web-platform-tests, so they are
> presumably well inside the limit. A little Googling suggests that the
> shipped unzip with OSX might not support Zip64 (although the shipped zip
> does). So I guess we need a newer version of unzip on these machines;
> presumably it's only a matter of time before we hit this limit in other
> places.
Sounds logical.
This is somewhat related: http://serverfault.com/questions/454935/zip-3-0-not-backwardly-compatible-with-zip-2-3-1
although unlike that situation, I believe we zip and unzip our tests.zip against version 5.52 on our build and test machines respectively. The key thing here is, we are using < 6.0 (6.0 supports zip64)
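To check which UnZip a machine actually has, look at the first line of `unzip -v` output, which carries the version. Below is a minimal sketch, assuming that banner format and taking 6.0 as the first Zip64-capable release, as stated above. The helper names are hypothetical.

```python
import re
import subprocess


def parse_unzip_version(banner):
    """Extract (major, minor) from `unzip -v` output, e.g. 'UnZip 6.00 of ...'."""
    match = re.search(r"UnZip (\d+)\.(\d+)", banner)
    if not match:
        raise ValueError("unrecognised unzip banner: %r" % banner[:80])
    return int(match.group(1)), int(match.group(2))


def supports_zip64(version):
    # Per the comment above: UnZip 6.0 supports Zip64, 5.52 does not.
    return version >= (6, 0)


def installed_unzip_version(unzip="unzip"):
    """Ask the unzip binary on this machine for its version."""
    out = subprocess.run([unzip, "-v"], capture_output=True, text=True).stdout
    return parse_unzip_version(out)
```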
Comment 22•11 years ago
I imagine backing out bug 989583 on Cedar will help get things going in the short term.
Reporter
Comment 23•11 years ago
Yeah, that works for me if there isn't a short term solution. I don't know if anyone else has plans for cedar that particularly depend on the work in that bug. Getting a proper fix here is, at least, a prerequisite for landing web-platform-tests on m-c, and not doing backouts will make keeping Cedar largely consistent with m-c that much easier, so it would be great if we could find some better solution relatively quickly (more quickly than we can implement the real solution of not shipping every test for every test job, for example).
Comment 24•11 years ago
(In reply to Jordan Lund (:jlund) from comment #21)
> Sounds logical.
>
> This is somewhat related:
> http://serverfault.com/questions/454935/zip-3-0-not-backwardly-compatible-
> with-zip-2-3-1
>
> although unlike that situation, I believe we zip and unzip our tests.zip
> against version 5.52 on our build and test machines respectively. The key
> thing here is, we are using < 6.0 (6.0 supports zip64)
How big a task is it to update the slaves with a new version of zip?
Comment 25•11 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #24)
> How big a task is it to update the slaves with a new version of zip?
I see we have things like http://mxr.mozilla.org/build/source/puppet/modules/toplevel/manifests/slave/releng/build/standard.pp#13 but I'm not sure if we manage unzip itself with puppet. My guess is we use the default one that's installed with the machine.
We should upgrade both our build and test osx slaves if we go this route.
dustin, any ideas or thoughts on how to upgrade unzip on OSX, and on how feasible that is?
Flags: needinfo?(dustin)
Comment 26•11 years ago
If updating zip on the OSX slaves is non-trivial, we're going to try to switch tests.zip to tar files, which should get around the 64k limitation.
Comment 27•11 years ago
At least on OS X, yes, we're using the zip/unzip that ship with OS X:
[root@bld-lion-r5-068.build.releng.scl3.mozilla.com ~]# which zip
/usr/bin/zip
[root@bld-lion-r5-068.build.releng.scl3.mozilla.com ~]# which unzip
/usr/bin/unzip
Updating that just means building a PKG/DMG and installing it. You'd need to make sure the updated version is earlier in the PATH (which should be as easy as putting the results in /usr/local/bin).
It does seem like tar is a better choice, though.
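If the upgrade route is taken, making sure the new binary wins over /usr/bin/unzip is a PATH-ordering question. Below is a minimal sketch, assuming the new build is installed into /usr/local/bin as suggested. `path_with_priority` and `resolve_tool` are hypothetical helpers, while `shutil.which` is the standard-library lookup.

```python
import os
import shutil


def path_with_priority(directory, path=None):
    """Return a PATH string with `directory` first and no duplicate of it."""
    path = os.environ.get("PATH", "") if path is None else path
    entries = [p for p in path.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + entries)


def resolve_tool(name, directory="/usr/local/bin"):
    """Report which binary would run once `directory` is preferred."""
    return shutil.which(name, path=path_with_priority(directory))
```

On a test slave, `resolve_tool("unzip")` should report /usr/local/bin/unzip after the upgrade, and /usr/bin/unzip before it.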
Reporter
Comment 28•11 years ago
I did a backout of the package-all-tests patch for now, so hopefully that will be enough to unblock me. How hard is it going to be to switch to tar?
Comment 29•11 years ago
This is a partial list of where we have a tests.zip hardcoded:
http://mxr.mozilla.org/build/search?string=tests.zip
There may be others where we detect if it endswith('zip') and has 'tests' in the name, or go by regex.
We probably need to change all of these to allow for either a tarball or zip, reconfig, make the change on one branch, and roll it out.
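During such a transition, each consumer would have to accept either format. Below is a minimal sketch of format-agnostic extraction using only the standard library. It sniffs the content instead of relying on `endswith('zip')`-style checks. The tarball suffixes and helper names are hypothetical, not existing Mozilla naming.

```python
import os
import tarfile
import zipfile

# Hypothetical names a tarball-based tests package might use.
TESTS_ARCHIVE_SUFFIXES = (".tests.zip", ".tests.tar.gz", ".tests.tar.bz2")


def is_tests_archive(filename):
    return filename.endswith(TESTS_ARCHIVE_SUFFIXES)


def extract_tests_archive(archive, dest):
    """Unpack a tests archive that may be a zip or a (compressed) tarball."""
    os.makedirs(dest, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    elif tarfile.is_tarfile(archive):
        # "r:*" lets tarfile detect gzip/bzip2/xz compression itself.
        with tarfile.open(archive, "r:*") as tf:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons can reject absolute paths and ".." members.
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)
    else:
        raise ValueError("unsupported archive format: %s" % archive)
    return dest
```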
Comment 30•11 years ago
As Aki noted, this change would have to ride the trains, so for several releases we'd have to have logic in many places that could deal with a zip or a tarball.
As Callek pointed out on irc, switching to tar files may have implications for projects like Seamonkey and Thunderbird which do not use mozharness.
As Catlee pointed out, we don't use tarballs currently because of problems handling these on Windows; these problems were a couple of years ago and we don't know if they exist today.
We also do not have any visibility into how this change could impact downstream consumers that aren't part of buildbot, if any.
So, I don't think this option is particularly well-scoped, and I think it is likely to be at least a moderate pain, with lots of room to break things due to all the moving parts. Upgrading zip on OSX seems like a less painful option in the short term, although we may want to investigate switching to tar files for other reasons.
Reporter
Comment 31•11 years ago
Note that even with the backout of Armen's patch, this is still breaking all OSX debug jobs on cedar.
Comment 32•11 years ago
I am currently trying to wrap up my Q2 work, so I won't be able to get to this this week. But to help move things along, I filed bug 1032391. There are arguments both ways, but comment 30 here sums up why this might be best for the short term.
Flags: needinfo?(dustin)
Comment 33•10 years ago
It looks like the zip upgrade on mac was successful in bug 1032391 - can you confirm this fixed the underlying problem, and this bug can now be closed?
Thanks,
Pete
Flags: needinfo?(jlund)
Flags: needinfo?(james)
Reporter
Comment 34•10 years ago
Yes this works fine now.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(james)
Resolution: --- → FIXED
Comment 35•10 years ago
Thanks James!
Updated•10 years ago
Flags: needinfo?(jlund)