<philor> oh, bummer, before you get to push you'll have to file the "10.6 times out linking" bug OS X 10.6.2 mozilla-central leak test build [busted] Started 19:29, finished 21:08, took 99min http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1272410986.1272416871.6556.gz&fulltext=1
The log snippet (shortened) is:

rm -f XUL
Making symlinks to the original object files in the archive libraries
../../toolkit/xre/libxulapp_s.a
../../staticlib/components/libxpconnect.a
...
g++-4.2 -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-variadic-macros -Werror=return-type -Wno-long-long -fno-strict-aliasing -fpascal-strings -fno-common -fshort-wchar -pthread -DNO_X11 -pipe -DDEBUG -D_DEBUG -DDEBUG_cltbld -DTRACING -gdwarf-2 -fPIC -o XUL nsStaticXULComponents.o nsUnicharUtils.o nsCompressedCharMap.o ...
command timed out: 5400 seconds without output, killing pid 1713

-----

There are 4 more occurrences of timeouts on m-c in the last few days, so including the one in comment #0 we have:
* moz2-darwin10-slave18, Mon Apr 26 00:15:24 2010, 2968d19b0165
* moz2-darwin10-slave09, Tue Apr 27 02:20:07 2010, 590a7be0e60f
* moz2-darwin10-slave18, Tue Apr 27 06:59:51 2010, c685e50cf4b6
* moz2-darwin10-slave18, Tue Apr 27 16:29:46 2010, 87ceb85b85df
* moz2-darwin10-slave09, Tue Apr 27 23:32:23 2010, cdacbfe66545

The timestamps are when the build started. Overall that's 5 timeouts in 100 builds.

The strange thing is that the same machines will do a debug build after failing and be fine, then later still fail again. So a clobber probably won't help.

We have slightly different gcc's on the command line on 10.5 vs 10.6:

moz2-darwin9-slave18:~ cltbld$ g++-4.2 --version
i686-apple-darwin9-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5564)
moz2-darwin10-slave18:~ cltbld$ g++-4.2 --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1)

and use different SDKs.

Bug 558496 for other 10.6 flakiness - alive test 5 (aka trace malloc) hanging.
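For anyone who wants to keep tallying these as more logs come in, here is a rough Python sketch that counts the timeouts listed in this comment per slave and works out the failure rate. The data is copied from the list in this comment; the function name `summarize` and the dictionary layout are just illustrative choices, not anything from the build tooling.

```python
from collections import Counter

# Timeouts reported in this bug so far: (slave, build start time, changeset).
TIMEOUTS = [
    ("moz2-darwin10-slave18", "Mon Apr 26 00:15:24 2010", "2968d19b0165"),
    ("moz2-darwin10-slave09", "Tue Apr 27 02:20:07 2010", "590a7be0e60f"),
    ("moz2-darwin10-slave18", "Tue Apr 27 06:59:51 2010", "c685e50cf4b6"),
    ("moz2-darwin10-slave18", "Tue Apr 27 16:29:46 2010", "87ceb85b85df"),
    ("moz2-darwin10-slave09", "Tue Apr 27 23:32:23 2010", "cdacbfe66545"),
]
TOTAL_BUILDS = 100  # "5 timeouts in 100 builds"


def summarize(timeouts, total_builds):
    """Count timeouts per slave and compute the overall failure rate."""
    per_slave = Counter(slave for slave, _start, _rev in timeouts)
    return {
        "per_slave": dict(per_slave),
        "failure_rate": len(timeouts) / total_builds,
    }


summary = summarize(TIMEOUTS, TOTAL_BUILDS)
for slave, count in sorted(summary["per_slave"].items()):
    print(f"{slave}: {count} timeout(s)")
print(f"overall: {summary['failure_rate']:.0%} of builds timed out")
```

Only two slaves show up in the list, so per-slave counts alone can't tell a bad machine apart from a general 10.6 problem.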
Summary: 10.6 times out linking → 10.6 debug build times out linking XUL
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1272771435.1272777188.5902.gz OS X 10.6.2 mozilla-central leak test build on 2010/05/01 20:37:15 s: moz2-darwin10-slave09
Summary: 10.6 debug build times out linking XUL → 10.6 debug build times out linking XUL | command timed out: 5400 seconds without output
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1272893210.1272900071.26217.gz OS X 10.6.2 mozilla-central leak test build on 2010/05/03 06:26:50 Also, edited the summary to make it easier to find.
Summary: 10.6 debug build times out linking XUL | command timed out: 5400 seconds without output → 10.6 debug build times out linking XUL (timeout when linking libxul) | command timed out: 5400 seconds without output
The 10.6 builds are happening on Mac Minis with 1 GB of RAM, which I think explains the timeouts. Will bump up the timeout for now, and look into our other options.
Created attachment 443122 [details] [diff] [review] Bump timeout on builds to 2 hours
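For context on what this timeout actually measures: the "5400 seconds without output" message is an idle timeout, so the step is killed when it goes quiet for too long, not when its total runtime exceeds the limit. Below is a minimal Python sketch of that behaviour. It is not the buildbot implementation or the contents of the attached patch; `run_with_idle_timeout` and both constants are hypothetical names used only for illustration.

```python
import queue
import subprocess
import threading

OLD_IDLE_TIMEOUT = 5400         # seconds, from the "command timed out" log line
NEW_IDLE_TIMEOUT = 2 * 60 * 60  # seconds, the 2 hours this patch bumps it to


def run_with_idle_timeout(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Returns (returncode, output_lines, timed_out).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward each output line to the main thread; None marks end of output.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()

    output = []
    timed_out = False
    while True:
        try:
            # Every line received restarts the idle clock.
            line = lines.get(timeout=idle_timeout)
        except queue.Empty:
            timed_out = True
            proc.kill()
            break
        if line is None:
            break
        output.append(line)
    proc.wait()
    return proc.returncode, output, timed_out
```

A link step that swaps heavily prints nothing while it works, so it trips this kind of timeout even though it would eventually finish. Raising the limit buys time but doesn't make the link any faster.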
Attachment #443122 - Flags: review?(lsblakk)
Attachment #443122 - Flags: review?(lsblakk) → review+
Comment on attachment 443122 [details] [diff] [review] Bump timeout on builds to 2 hours changeset: 711:9a76896c8c10
Attachment #443122 - Flags: checked-in+
Nothing more for me to do here. We're looking at taking one or two of our 10.5 xserves and putting 10.6 on them to do the 10.6 builds. This will impact Leopard build and test times though.
Assignee: catlee → nobody
I will need an xserve assigned to me by bhearsum to research this
Status: NEW → ASSIGNED
ok, bhearsum has assigned me bm-xserve07
In #developers, cjones pointed out that this article may help with any XUL linking memory issues, though it may not work on OS X. http://blog.mozilla.com/cjones/2009/11/25/helping-ld-link-libxul-more-quickly/
Over the weekend I ran a batch file that did nothing but "make clean; make" in a loop, with all output logged and each run wrapped with GNU time. Not once during the 70 full builds did it fail on the libxul link step. Rail asks in IRC whether we can increase the swap to effectively increase the memory available without having to change hardware.
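For anyone wanting to repeat this kind of soak test, here is a rough Python equivalent of the loop. It is a sketch rather than the batch file that was actually run: `repeat_build`, the log file name, and the use of Python's own timer in place of GNU time are all assumptions made for illustration.

```python
import subprocess
import time


def repeat_build(steps, iterations, log_path):
    """Run every command in steps, iterations times, logging output and timing.

    Returns a list of (iteration, elapsed_seconds, succeeded) tuples.
    """
    results = []
    with open(log_path, "a") as log:
        for i in range(1, iterations + 1):
            start = time.monotonic()
            ok = True
            for step in steps:
                log.write(f"=== iteration {i}: {' '.join(step)}\n")
                log.flush()  # keep our header ahead of the child's output
                proc = subprocess.run(step, stdout=log, stderr=subprocess.STDOUT)
                if proc.returncode != 0:
                    ok = False
                    break  # skip the remaining steps of a failed iteration
            elapsed = time.monotonic() - start
            status = "ok" if ok else "FAILED"
            log.write(f"=== iteration {i}: {status} in {elapsed:.1f}s\n")
            log.flush()
            results.append((i, elapsed, ok))
    return results


if __name__ == "__main__":
    # Hypothetical invocation mirroring "make clean; make", run from the objdir.
    runs = repeat_build([["make", "clean"], ["make"]], 70, "build-loop.log")
    failures = [i for i, _elapsed, ok in runs if not ok]
    print(f"{len(runs)} builds, {len(failures)} failures")
```

Keeping the per-iteration wall time makes a slow-but-successful link visible in the log, which matters here because the original failures were timeouts rather than link errors.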
Was that on a 1G mini or the 4G xserve we had imaged up with 10.6? AIUI, Mac OS X handles swap dynamically, grabbing free disk space as it needs it. If possible we should avoid swapping at all.
The test was run on the 4G xserve.
I have now let the test run for 5 days without hitting any linking issues. Given the failure frequency reported above, the only realistic way I can see to continue testing would be to constrain the memory (and disk) to force the issue. Unless the priority of the bug is raised, I don't think that would be a good use of my time with other Q2 goals at hand. The solution, such as it is, seems to be to make sure that disk space on the smaller-RAM minis is large enough to allow swap, or to target only the larger-RAM Macs.
What's the status of the xserve used in this test? If it's not doing anything else it needs to be re-imaged back to a 10.5 one.
it's idle afaik - just logged in to it and it's not running anything buildbot related
Status: ASSIGNED → NEW
The font of all knowledge (aka philor) hasn't seen this in ages, so WFM since we don't know what fixed it. More recently we jumped from 1G to 3G of RAM.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → Release Engineering