Closed Bug 417045 Opened 12 years ago Closed 10 years ago

Tracking bug for migrating build machines from 10.4 to 10.5

Categories

(Release Engineering :: General, defect, P3)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sayrer, Unassigned)

References

(Blocks 1 open bug)

Attachments

(5 files, 1 obsolete file)

This improves SunSpider performance and allows us to use profile-guided optimization (PGO) on the Mac. GCC has conformance bugs when using PGO with the 10.4u SDK.
(reassigning to correct component)

I assume this is for the mac machines on trunk/1.9 and also Moz2?
Component: Build Config → Build & Release
Product: Firefox → mozilla.org
QA Contact: build.config → build
Version: unspecified → other
I assume we need these landed before beta4?
I tested using "-mmacosx-version-min=10.4", so I don't think I was leaning on Leopard-only enhancements.
trunk/1.9 should be ready for beta4.
Yep
Flags: blocking1.9+
Priority: -- → P1
It's OK to set the SDK to 10.5 (--with-macos-sdk) and leave the deployment target at 10.4 (--enable-macos-target).  When doing that, it'll be possible to use 10.5-only APIs, and there won't be any warnings.  If used unchecked, they'll result in run-time (not load-time) crashes when run on Tiger.
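To make the SDK versus deployment-target split concrete, here is a minimal shell sketch of the compiler flags that combination is expected to boil down to. The `-isysroot` and `-mmacosx-version-min` flags both appear elsewhere in this bug; the 10.5 SDK path and the helper name `sdk_flags` are assumptions for illustration and are not taken from any checked-in mozconfig.

```sh
#!/bin/sh
# Hypothetical helper: print the flags for compiling against one SDK
# while still targeting an older OS release at run time.
#   $1 = SDK directory (headers and stub libraries to build against)
#   $2 = deployment target (oldest OS version the binary should run on)
sdk_flags() {
    printf '%s\n' "-isysroot $1 -mmacosx-version-min=$2"
}

# Assumed install location of the 10.5 SDK; adjust for the local machine.
sdk_flags /Developer/SDKs/MacOSX10.5.sdk 10.4
```

The deployment target only declares the oldest OS the binary is meant to run on. It does not stop code from calling 10.5-only APIs, which is why such calls still need a run-time check before use on Tiger.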
If I understand correctly, we do *not* need to upgrade the OS from Tiger (10.4) to Leopard (10.5). Instead we just need to provide 10.5 SDK headers on the 10.4 machines. Adjusting summary to match.

If this is incorrect, please let us know.
Summary: Switch Mac build to Leopard machines, 10.5 SDK → Add 10.5SDK to existing Mac (tiger) build machines
add -fprofile-generate to CFLAGS
(In reply to comment #7)
> 
> If this is incorrect, please let us know.

Is that the same gcc version? I only tested with the gcc version available on Leopard.

Who owns this?
John O unless he finds someone else..
Assignee: nobody → joduinn
I do not believe simply updating the SDK is enough to allow us to do PGO (bug 419344).  We actually need the newer version of gcc, which comes with XCode 3, which requires Leopard AFAIK.  Perhaps mento or another more mac-savvy person can correct me if I'm wrong.
(In reply to comment #11)
> I do not believe simply updating the SDK is enough to allow us to do PGO (bug  
> 419344).  We actually need the newer version of gcc, which comes with XCode 3,
> which requires Leopard AFAIK.  Perhaps mento or another more mac-savvy person
> can correct me if I'm wrong.
Who could confirm this? (Rob Sayre - when you tried it on Mac in comment #3 and comment #8, were you running on 10.4 or 10.5?)

It's one thing to add extra headers to a machine. It's another to upgrade the OS, and try to confirm there's no broken downward compatibility.
10.5
ugh, not what I was hoping to hear. Has anyone actually tried this on 10.4 with the 10.5 headers? 

Is there anyone else we should cc on this bug to help confirm/deny needing to upgrade OS?
(In reply to comment #14)
> ugh, not what I was hoping to hear. Has anyone actually tried this on 10.4 with
> the 10.5 headers? 
> 
> Is there anyone else we should cc on this bug to help confirm/deny needing to
> upgrade OS?
> 

The bug was opened with the title "Switch Mac build to Leopard machines, 10.5 SDK". Is 10.4 with Leopard SDK a supported configuration?
Depends on: 414434
From the release notes (http://developer.apple.com/releasenotes/DeveloperTools/RN-Xcode/index.html):

Xcode 3.0 will run on Mac OS X 10.5 (Leopard) on a Macintosh with either a PowerPC or an Intel processor. It will not install or run on earlier versions of Mac OS X. Xcode supports development for Mac OS X 10.3 (Panther) and Universal development for Mac OS X 10.4 (Tiger) and Mac OS X 10.5 (Leopard) using the Mac OS X SDK support.

The immediately previous version is Xcode 2.5 (http://developer.apple.com/releasenotes/DeveloperTools/RN-XcodePrevious/index.html#//apple_ref/doc/uid/TP40001436):

Xcode 2.5 will run on Mac OS X 10.4 (Tiger) or Mac OS X 10.5 (Leopard) on a Macintosh with either a PowerPC or an Intel processor. It will not install or run on earlier versions of Mac OS X. Xcode supports development for Mac OS X 10.2 (Jaguar), Mac OS X 10.3 (Panther), or Mac OS X 10.4 (Tiger) (both PowerPC and Intel) using the Mac OS X SDK support.

So if the requirement is the 10.5 SDK for the later version of gcc, then we do require 10.5/Leopard.

(IANA OS X development guru, and this is the result of a quick search)
From conf call with damon, rsayre, ted just now:

1) This *does* require upgrading the OS on trunk/mac nightly and release automation machines from 10.4 to 10.5. It is not enough to just add the 10.5 headers to a 10.4 machine. Comment #7 is confirmed wrong; updating summary to match reality.

2) This requires changing trunk-nightly and trunk-release-automation machines. Nothing to do in unittests, even though that does do recompiles, as we already have machines running 10.4 and machines running 10.5.

3) This requires changing mozconfig to have additional backward-compatible flags, so code compiled on 10.5 will still run on 10.4.
Summary: Add 10.5SDK to existing Mac (tiger) build machines → Update Mac trunk nightly and release build machines from 10.4 to 10.5
I've been doing some work to set up bm-xserve16; it's in progress.
This is a diff against the trunk, destined for the test_pgo branch.
Assignee: joduinn → nrthomas
Status: NEW → ASSIGNED
Attachment #306315 - Flags: review?(rhelmer)
Attachment #306315 - Flags: review?(rhelmer) → review+
Comment on attachment 306315 [details] [diff] [review]
[checked in] Config changes for running two tinderboxes at once

Checking in mozconfig;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/macosx/mozconfig,v  <--  mozconfig
new revision: 1.16.4.1; previous revision: 1.16
done
Checking in tinder-config.pl;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/macosx/tinder-config.pl,v  <--  tinder-config.pl
new revision: 1.39.2.1; previous revision: 1.39
done
Attachment #306315 - Attachment description: Config changes for running two tinderboxes at once → [checked in] Config changes for running two tinderboxes at once
Documentation for the ref platform is going into:
  http://wiki.mozilla.org/ReferencePlatforms/Mac-10.5
including any changes from the 10.4 setup.
Box is up and building, although we're still finalising some software and testing before creating an image.

A clobber build should pop out here in about an hour
 http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/experimental/leopard/
Depend builds are at
 http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/bm-xserve16-trunk/
XCode 3 defaults to DWARF style symbols instead of stabs, so to get a debug build in the usual sense for mozilla, you need to add

 export CFLAGS="-gstabs+ -gfull"
 export CXXFLAGS="-gstabs+ -gfull"

to the mozconfig (thanks Ted). If you do that, then you get an error
 
c++ -o nsDependentString.o -c -I../../../dist/include/system_wrappers -include /builds/tinderbox/Fx-Trunk-test_mem/Darwin_9.2.0_Depend/mozilla/config/gcc_hidden.h -DMOZILLA_INTERNAL_API -DOSTYPE=\"Darwin9.2.0\" -DOSARCH=Darwin -D_IMPL_NS_COM  -I/builds/tinderbox/Fx-Trunk-test_mem/Darwin_9.2.0_Depend/mozilla/xpcom/string/src -I. -I../../../dist/include/xpcom -I../../../dist/include   -I../../../dist/include/string -I../../../dist/include/nspr     -I/usr/X11/include   -fPIC  -I/usr/X11/include -fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-long-long -gstabs+ -gfull -fno-strict-aliasing -fpascal-strings -fno-common -fshort-wchar -pthread -I/Developer/Headers/FlatCarbon -pipe  -DDEBUG -D_DEBUG -DDEBUG_cltbld -DTRACING -g  -I/usr/X11/include -DMOZILLA_CLIENT -include ../../../mozilla-config.h -Wp,-MD,.deps/nsDependentString.pp /builds/tinderbox/Fx-Trunk-test_mem/Darwin_9.2.0_Depend/mozilla/xpcom/string/src/nsDependentString.cpp
nsDependentSubstring.cpp
In file included from /builds/tinderbox/Fx-Trunk-test_mem/Darwin_9.2.0_Depend/mozilla/xpcom/string/src/nsDependentString.cpp:40:
../../../dist/include/string/nsDependentString.h:1: internal compiler error: Bus error

Same result for just -gstabs+, and with -j4 disabled. Compiles fine when not using CFLAGS or CXXFLAGS. This is a plain XCode3 + CHUD 4.5 install as documented per comment #21, so no gcc-select trickery applied.

With DWARF symbols, dist/bin is 70-80MB instead of the 600+MB on the 10.4 tinderbox. Shark builds give 22MB dmgs instead of 200MB.

Need input on how to proceed.

If you have -j4, then you get similar errors. One each for
* nsDependentSubstring.cpp
* nsDependentString.cpp
* nsPrintfCString.cpp
* nsPromiseFlatString.cpp
each barfing when getting the associated .h file.

Also says
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See <URL:http://developer.apple.com/bugreporter> for instructions.
  {standard input}:37:FATAL:.abort  detected.  Assembly stopping.
for each.
Yeah, I hit that first error myself, I thought maybe it was a problem with my machine. I tried to get a reproducible testcase with -save-temps, but that made the problem go away.
Google finds other people having problems after adding the gstabs option, but no solutions. What's the fallout if we have to switch from stabs to DWARF?
Shebs is already working on code to make Breakpad handle DWARF, so we may be able to just switch over if need be.
Adding -save-temps to the flags works here too, with warnings about -pipe being ignored. dist/bin is then 489MB, in a 7.8GB objdir; for comparison, we get 607MB and 3.1GB on the 10.4 box.

Perhaps we can add a make rule to remove the .ii files immediately after their creation.
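As a rough illustration of that cleanup idea, the shell sketch below removes the preprocessed C++ output (`.ii`) that `-save-temps` leaves behind. It is a sweep over an objdir rather than the per-object make rule suggested above, and the function name `remove_save_temps` and the convention of passing the objdir as an argument are assumptions.

```sh
#!/bin/sh
# Hypothetical helper: delete the .ii intermediates under an objdir.
#   $1 = top of the object directory to sweep
remove_save_temps() {
    # Only regular files ending in .ii are removed; objects and sources stay.
    find "$1" -type f -name '*.ii' -exec rm -f {} +
}

# Small self-contained demonstration in a scratch directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/xpcom/string/src"
touch "$scratch/xpcom/string/src/nsDependentString.ii" \
      "$scratch/xpcom/string/src/nsDependentString.o"
remove_save_temps "$scratch"
ls "$scratch/xpcom/string/src"
rm -rf "$scratch"
```

Note that `-save-temps` also writes `.i` files for C sources and `.s` assembly files, so a real cleanup would probably need to cover those as well.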
Depends on: 421534
Depends on: 421923
This box is up and running [1] and dogfooding a nightly-ish build on 10.4 is going fine. Config on the (misnamed) test_pgo branch of 
  mozilla/tools/tinderbox-configs/firefox/macosx
and builds are at [2].

There's the stabs issue blocking debug builds, and some concern about sorting out the unittest failures on Leopard (bug 411999 and dependents), but we should move forward on getting bm-xserve16 cloned for the release automation and a moz2 machine. That's bug 421923.

[1] http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaExperimental
[2] http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/experimental/leopard/latest-trunk/
I'm not sure if bug 411999 should be blocking or not, but I'd like to get some more eyes on those mochitest failures. Will have those filed RSN.
Depends on: 411999
I realised I hadn't tried a -gstabs+ -gfull -save-temps build with a nightly config. The major differences from the debug build are that there are no debug flags, optimisation is enabled, it's a universal build, and -j4 was disabled on the debug build.

Blew up pretty rapidly at mozilla/xpcom/typelib/xpt/src/xpt_arena.c [1]:
gcc-4.0 -arch ppc -o xpt_arena.o -c -I../../../../dist/include/system_wrappers -include /builds/tinderbox/Fx-Trunk-PGO/Darwin_9.2.0_Depend/mozilla/config/gcc_hidden.h -DOSTYPE=\"Darwin\" -DOSARCH=Darwin -DEXPORT_XPT_API  -I/builds/tinderbox/Fx-Trunk-PGO/Darwin_9.2.0_Depend/mozilla/xpcom/typelib/xpt/src -I.  -I../../../../dist/include   -I../../../../dist/include/xpcom -I../../../../dist/include/nspr     -I../../../../dist/sdk/include -I/usr/X11/include   -fPIC -I/usr/X11/include -Wall -W -Wno-unused -Wpointer-arith -Wcast-align -Wno-long-long -gstabs+ -gfull -save-temps -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -fpascal-strings -fno-common -pthread -I/Developer/SDKs/MacOSX10.4u.sdk/Developer/Headers/FlatCarbon  -DNDEBUG -DTRIMMED -O2  -I/usr/X11/include -include ../../../../mozilla-config.h -DMOZILLA_CLIENT -Wp,-MD,.deps/xpt_arena.pp /builds/tinderbox/Fx-Trunk-PGO/Darwin_9.2.0_Depend/mozilla/xpcom/typelib/xpt/src/xpt_arena.c
line-map.c: file "/usr/include/sys/wait.h" left but not entered
In file included from ../../../../dist/include/nspr/obsolete/protypes.h:87,
                 from ../../../../dist/include/nspr/prtypes.h:561,
                 from ../../../../dist/include/system_wrappers/prtypes.h:4,
                 from ../../../../dist/include/xpcom/xpt_arena.h:46,
                 from /builds/tinderbox/Fx-Trunk-PGO/Darwin_9.2.0_Depend/mozilla/xpcom/typelib/xpt/src/xpt_arena.c:47:
/usr/include/sys/wait.h:258: error: syntax error before ‘id_t’
line-map.c: file "/usr/include/stdlib.h" left but not entered
line-map.c: file "/usr/include/stdlib.h" left but not entered
line-map.c: file "../../../../dist/include/xpcom/xpt_arena.h" left but not entered
line-map.c: file "/builds/tinderbox/Fx-Trunk-PGO/Darwin_9.2.0_Depend/mozilla/xpcom/typelib/xpt/src/xpt_arena.c" left but not entered
line-map.c: file "/builds/tinderbox/Fx-Trunk-PGO/Darwin_9.2.0_Depend/mozilla/xpcom/typelib/xpt/src/xpt_arena.c" left but not entered
line-map.c: file "/builds/tinderbox/Fx-Trunk-PGO/Darwin_9.2.0_Depend/mozilla/xpcom/typelib/xpt/src/xpt_arena.c" left but not entered
make[9]: *** [host_xpt_arena.o] Error 1

After some experimentation, -j2 or higher was found to cause errors (the above is -j4). Adding .NOTPARALLEL: to the Makefile.in allows the build to progress slightly further, failing on mozilla/xpcom/typelib/xpt/tools/xpt_link.c [2]. This is feeling a bit whack-a-mole. Is the combination of -save-temps and -jN a good idea for N>1?

The box is running -j1 now, and I'll report cycle times when they're in.
Forgot to say above that the aim was to verify that valid crash reports could be generated, which seems like a blocker to deploying for nightlies/releases. 

Supposing the -j1 build can do that, then we'll have to look at the build time. Including tests, -j4 takes 1hr for a clobber, and 27 mins for depend build, so a factor of 2 might be OK. If -j1 is much slower than this, and we don't reach a rapid conclusion for bug 421534 (DWARF symbols), this bug is at risk for b5.
Machine   Clobber       Depend
10.4      1 hr          27 min
10.5      1 hr 45 min   32 min

including the test time which will go away with bug 413695.
http://crash-stats.mozilla.com/report/index/e8d9232b-f40c-11dc-877f-001a4bd43ed6
using Crash Me Now with a built-on-Leopard build, running on 10.4.11.

For comparison, this is from a recent built-on-Tiger nightly:
http://crash-stats.mozilla.com/report/index/295b8175-ef6a-11dc-9b4c-001a4bd43ef6

There's no difference in the Mozilla parts of the stack, but we can resolve 10.4.11 symbols since Friday. 

So, looks like we can get valid stabs symbols out of it, at the cost of the increased build time from -save-temps.
Flags: tracking1.9+ → blocking1.9+
Nick said he's trying the .NOTPARALLEL bits from comment 32. I think if that works right we should get a patch in to do that conditionally, and then switch to this box for the nightly builds.
This fixes the errors in comment 32 and comment 37.

Machine                   Clobber       Depend
10.4 (-j4)                1 hr          27 min
10.5 (-j1, -save-temps)   1 hr 45 min   32 min
10.5 (-j4, -save-temps)   1 hr 15 min   28 min

Disabling most of the tinderbox tests (bug 413695) saved 18 minutes on a 10.4 depend run.
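For a quick sense of scale, the timings in the table above can be expressed relative to the 10.4 baseline. This is only a back-of-the-envelope shell sketch; the helper name `relative_percent` is made up for illustration, and the inputs are the table's times converted to minutes.

```sh
#!/bin/sh
# Hypothetical helper: build time relative to a baseline, as a whole percent.
#   $1 = baseline minutes, $2 = candidate minutes
relative_percent() {
    # Integer arithmetic only, so the result is truncated, not rounded.
    echo $(( $2 * 100 / $1 ))
}

# Clobber builds against the 10.4 (-j4) baseline of 60 minutes.
relative_percent 60 105   # 10.5 with -j1 and -save-temps
relative_percent 60 75    # 10.5 with -j4 and -save-temps
# Depend builds against the 10.4 (-j4) baseline of 27 minutes.
relative_percent 27 32    # 10.5 with -j1 and -save-temps
relative_percent 27 28    # 10.5 with -j4 and -save-temps
```

On these numbers the -j4 build with -save-temps costs about a quarter more on a clobber and only a few percent on a depend build, well inside the factor of 2 mentioned earlier.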
Attachment #310002 - Flags: review?(ted.mielczarek)
Comment on attachment 310002 [details] [diff] [review]
Workaround for -j4 bustage with -save-temps

Looks reasonable to me, but needs an XXX comment pointing back to this bug, since ideally we'll remove this later. I guess we could go even further and filter in MAKEFLAGS to see if -jsomething is specified, but it might not make much difference.
Attachment #310002 - Flags: review?(ted.mielczarek) → review+
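The MAKEFLAGS check floated in the review could look roughly like the following, sketched here in shell rather than in make syntax. The function name `has_parallel_jobs` is hypothetical, and the way make records -j in MAKEFLAGS differs between versions, so treat the patterns as an assumption rather than a complete parser.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the given make flags ask for parallel jobs.
#   $1 = a flags string, such as the contents of MAKEFLAGS
has_parallel_jobs() {
    for word in $1; do
        case "$word" in
            -j1|--jobs=1) ;;                          # explicitly serial
            -j|-j[0-9]*|--jobs|--jobs=*) return 0 ;;  # parallel requested
        esac
    done
    return 1
}

if has_parallel_jobs "-s -j4"; then
    echo "parallel build: serialize the affected directories"
fi
```

A check like this would only be worth the trouble if unconditionally serializing the three affected directories turned out to cost measurable build time.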
Checking in xpcom/typelib/xpt/src/Makefile.in;
/cvsroot/mozilla/xpcom/typelib/xpt/src/Makefile.in,v  <--  Makefile.in
new revision: 1.34; previous revision: 1.33
done
Checking in xpcom/typelib/xpt/tools/Makefile.in;
/cvsroot/mozilla/xpcom/typelib/xpt/tools/Makefile.in,v  <--  Makefile.in
new revision: 1.32; previous revision: 1.31
done
Checking in toolkit/crashreporter/google-breakpad/src/common/Makefile.in;
/cvsroot/mozilla/toolkit/crashreporter/google-breakpad/src/common/Makefile.in,v  <--  Makefile.in
new revision: 1.6; previous revision: 1.5
done
(In reply to comment #39)
> (From update of attachment 310002 [details] [diff] [review])
> Looks reasonable to me, but needs an XXX comment pointing back to this bug,
> since ideally we'll remove this later.

Better form in our experience (tip from Nat Friedman years ago): FIXME: nnnnnn where nnnnnn is not this bug's number, but a fresh followup bug. You'll get more action by front-loading for fixage. ;-)

/be
We realized that this box was building with the 10.4u SDK (since the universal mozconfig specifies that). Switching to the 10.5 SDK revealed a number of problems:
1) sqlite winds up linked with some $UNIX2003 variant symbols which don't work on 10.4. I think there's a compiler or linker bug involved here, as we explicitly specify 10.4 as our MacOSX target. We were able to work around this by adding this to the sqlite makefile:
ifeq ($(OS_ARCH),Darwin)
DEFINES += -D_NONSTD_SOURCE -D__DARWIN__
endif

2) nsMacShellService.cpp uses an undocumented API that appears to have changed somehow in the 10.5 SDK, as it causes a dynamic linker error when running on 10.4:
dyld: lazy symbol binding failed: Symbol not found: __LSCopyDefaultSchemeHandlerURL
Referenced from: /Users/nrthomas/Desktop/Minefield.app/Contents/MacOS/components/libbrowsercomps.dylib
Expected in: /System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices

Web searches on this have proven fruitless, so we're stuck on that one.
Filed bug 423672 on that second issue (thanks Mark!)
I think it is time to call it quits on this and pick it back up for .next...
Yeah, there's just too much to account for right now. Sucks.
Flags: wanted-next+
Flags: blocking1.9-
Flags: blocking1.9+
Ted: do you have a 10.5 mac that you can continue experimenting on or do you need us to keep aside one of these xserves? They were all earmarked for projects, but if you *need* one, we'll try to hold one aside.

I'd like to re-image the new xserves back to running 10.4, so we get them into production asap... 
My mac is on 10.5, so you do whatever you need with those xserves.
Assignee: nrthomas → nobody
Status: ASSIGNED → NEW
Component: Build & Release → Release Engineering: Projects
Priority: P1 → P3
QA Contact: build → release
Summary: Update Mac trunk nightly and release build machines from 10.4 to 10.5 → Tracking bug for migrating build machines from 10.4 to 10.5
Depends on: 464093
Attachment #348966 - Flags: review?(ted.mielczarek) → review+
Don't know how to request the required approval (1.9.1b2); attachment details doesn't offer it.
Keywords: checkin-needed
ause@sun.com, can I suggest you add a name or alias in addition to your email address?
Attachment #310015 - Attachment is obsolete: true
Attachment #310015 - Attachment is obsolete: false
Attachment #310002 - Attachment is obsolete: true
Attachment #306315 - Attachment description: [checked in] Config changes for running two tinderboxes at once → Config changes for running two tinderboxes at once [Checkin: Comment 20]
Comment on attachment 306315 [details] [diff] [review]
[checked in] Config changes for running two tinderboxes at once

Serge, please mess around with checkin comments in RelEng components. We have a specific system which we stick to.
Attachment #306315 - Attachment description: Config changes for running two tinderboxes at once [Checkin: Comment 20] → [checked in] Config changes for running two tinderboxes at once
Argh, that should read "Please don't mess with..."
Attachment #310015 - Attachment description: [as checked in] Workaround for -j4 bustage with -save-temps → Workaround for -j4 bustage with -save-temps [Checkin: Comment 40]
Attachment #310015 - Attachment description: Workaround for -j4 bustage with -save-temps [Checkin: Comment 40] → [checked in] Workaround for -j4 bustage with -save-temps
This is FIXED, given that:
* All build machines for 1.9.1, 1.9.2 and all project branches are running 10.5
* mozilla-central and all project branches that track it are building with the 10.5 SDK.

The dependent bug still seems to be valid, but it's not something that has blocked us on migrating the build machines.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
The dep bug assumed we'd want to continue running on 10.4, which we've decided we don't need on trunk.
Moving closed Future bugs into Release Engineering in preparation for removing the Future component.
Component: Release Engineering: Future → Release Engineering
Product: mozilla.org → Release Engineering