Closed Bug 571367 Opened 14 years ago Closed 14 years ago

turn production x86_64 builds into i386/x86_64 universal binaries

Categories

(Release Engineering :: General, defect, P2)

All
macOS
defect

Tracking

(blocking2.0 beta7+)

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta7+

People

(Reporter: jaas, Assigned: nthomas)

References

Details

(Whiteboard: [10.6][automation])

Attachments

(8 files, 7 obsolete files)

15.38 KB, patch
catlee
: review+
nthomas
: checked-in+
Details | Diff | Splinter Review
15.42 KB, patch
catlee
: review+
nthomas
: checked-in+
Details | Diff | Splinter Review
4.60 KB, patch
catlee
: review+
nthomas
: checked-in+
Details | Diff | Splinter Review
2.45 KB, patch
ted
: review+
nthomas
: checked-in+
Details | Diff | Splinter Review
9.84 KB, patch
catlee
: review+
nthomas
: checked-in+
Details | Diff | Splinter Review
5.17 KB, text/plain
Details
6.91 KB, patch
nthomas
: review+
nthomas
: checked-in+
Details | Diff | Splinter Review
799 bytes, patch
bhearsum
: review+
nthomas
: checked-in+
Details | Diff | Splinter Review
We should turn production x86_64 builds into i386/x86_64 universal binaries. If we ship an x86_64 build we'll want this, and because it doesn't interfere with our existing PPC/i386 builds we can start on this immediately.

We should add plist data to make sure i386 is always preferred at this point.
I've heard that this depends on at least a couple of things in the build system:
* Universal build support for fat binaries other than ppc+i386
* Changing the OS in the update pings (would currently be set to Darwin_Universal-gcc3, the same as ppc+i386)
I believe we have universal build support for fat binaries other than ppc+i386.
It's close, but it doesn't quite work yet. bug 411588 covers the remaining build system work. We'll also need to fix bug 552924 before turning these on.
Depends on: 411588, 552924
Priority: -- → P4
We'll need to fix the 64-bit issues with Breakpad as well, since a 32-bit build of Breakpad on a 64-bit OS X will still try to build dump_syms as a host (64-bit) binary, and that currently fails.
Depends on: 567424
Whiteboard: [10.6][automation]
(In reply to comment #0)
Could you fill in the motivation for this change ? Is there a migration strategy to drop ppc ?
Our strategy is to move forward with the i386/x86_64 universal binary without pulling the plug on our existing setup quite yet. At some point in the near future we'll probably end the PPC/i386 builds and consider the i386/x86_64 builds to be our official builds for Firefox 4. We need to start work on this strategy now in order to avoid delays later.
Depends on: 576053
No longer depends on: 580375
Depends on: 559142
Depends on: 570286
blocking2.0: --- → beta6+
Requirements from Josh: 

1. Switch our x86_64-only build on Mac OS X 10.6 to an i386/x86_64 universal binary. This will run in x86_64 mode by default on Mac OS X 10.6.
2. Stop producing builds on Mac OS X 10.5.
3. Test the i386/x86_64 universal binary on Mac OS X 10.5 test machines where it will run in i386 mode.

Bug 559142 (for plugin support) isn't far off, and we should not block on bug 583671 (automated update support). I don't know how important bug 570286 is, Engineering should find an owner if it truly blocks.
Assignee: nobody → nrthomas
Priority: P4 → P2
Josh, I've run a i386+x86_64 universal build in our staging system, using this mozconfig
 http://hg.mozilla.org/users/nthomas_mozilla.com/buildbot-configs/file/7dbceccea10d/mozilla2/macosx64/mozilla-central/nightly/mozconfig
It adds only 
 . $topsrcdir/build/macosx/universal/mozconfig-64
to the existing mac64 build. The build succeeded, I was able to update a mac64-only build to that without problems (using a partial update), and it seems to work OK. It started in 64bit mode on my 10.6 machine.

But there were a few issues in the build process:
1, 'make buildsymbols' didn't extract x64_64 symbols because we hardcode 'ppc i386' at 
 http://hg.mozilla.org/mozilla-central/file/default/Makefile.in#l150
so we won't get crash stacks/stats for 64bit crashes

2, 'make package' produced files named firefox-4.0b6pre.en-US.mac.dmg/etc instead of mac64. This is a regression from the mac64 builds we have right now and will break nightly update generation. The call to 'make package' was made in objdir/i386, if that matters.

3, we didn't get chk files for libfreebl3 libnssdmb3, or libsoftokn3 because the build system expects shlibsign for i386
 run-mozilla.sh: Cannot execute ../../../i386/dist/bin/shlibsign.
This means FIPS mode doesn't work for NSS. There is a shlibsign in objdir/x84_64/dist/bin/; calling make package in objdir/x84_64 doesn't help and fails in other ways.

4, there are some errors packaging tests, like
 tar: crashreporter_test.xpt: Cannot stat: No such file or directory
This might be 'normal', haven't checked.

I think the first two are blockers to making the new universal binaries. The build log is at
  http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTest/1284355946.1284365579.27796.gz
The binary is copied to
  http://people.mozilla.com/~nthomas/misc/firefox-4.0b6pre.en-US.mac.dmg

Still need to look at getting testings on 10.5 and 10.6 for these new builds. Will consult the talos gurus on this tomorrow. In principle it's possible because we do the same thing for win7 and xp, just need to figure out how to configure it to do so. Probably still a day away on our side.
Attached patch Symbol and packaging fixes (obsolete) — Splinter Review
Updates the hardcoding of the architectures for symbols (since MOZ_BUILD_PROJECTS isn't available to the topmost Makefile), and hardcodes mac64 as the platform name in packaging for universal builds to keep buildbot happy.

This would land just after switching the mozconfig to the universal build.
Attachment #475054 - Flags: review?(ted.mielczarek)
Adjusting the configs for the tests & talos is proving to be difficult.
Comment on attachment 475054 [details] [diff] [review]
Symbol and packaging fixes

I'm a little uncomfortable with these changes since we're:
a) swapping one hard-coded set of platforms for another (not terrible, certainly)
b) Hard-coding all universal binary packages as mac64, which doesn't seem right.
No longer depends on: 570286
Status update
* Spent most of today on release work for 4.0b6 and 3.6.10
* Ted suggested that removing the ppc+i386 mozconfig would make the build config changes more palatable. I'll update the patch to do that.
* catlee wrote a patch that does most of what we need for talos/tests. Need to figure out how to not change the scheduler branch on 1.9.1 & 1.9.2 (getting mozilla-1.9.1-macosx64-talos instead of mozilla-1.9.1-macosx-talos), and confirm it works in practice
Attachment #475054 - Flags: review?(ted.mielczarek)
Attached patch test master changes (obsolete) — Splinter Review
Untested except for using dump_master.py and writing the config out.

Notes:
* moves leopard to macosx64 for Talos and creates a fictional leopard0 platform, otherwise we get incorrect name in scheduler branches when it matches on macosx64 first
* duplicates platform-dependent symbol download into Talos config definition, to not use unit prefs in talos factories
* create OLD_BRANCH_{OLD_PLATFORMS,NO_WIN,NO_MAC} for 1.9.1, 1.9.2 and addontester, dropping the Linux64 and Snow Leopard builds which have been idle for ages
* remove a 'copy block' which entangles talos and unit prefs, which made the changes here difficult.
Attached patch test master changes (obsolete) — Splinter Review
Overriding the platform is unnecessary.
Attachment #475842 - Attachment is obsolete: true
Nick, got a SWAG ETA for us?
Attached patch test master changes, v3 (obsolete) — Splinter Review
Attached patch buildbotcustom changes, v2 (obsolete) — Splinter Review
Attachment #475843 - Attachment is obsolete: true
Attachment #475844 - Attachment is obsolete: true
(In reply to comment #16)
> Nick, got a SWAG ETA for us?

I'm aiming for the Sunday tree closure to deploy this. The road there is have  the patches tested in our staging by the end of my Friday (today), reviewed by Toronto folks while I sleep & travel to MV on Saturday, deploy Sunday.
Josh, what are the requirements for debug ? Right now we have:
* i386 debug build on 10.5, unit tests run on 10.5
* x86_64 debug build on 10.6, unit tests run on 10.6

The i386 builds and tests can be removed ?
No, we need to test both i386 and x86_64 build still...
Josh says that the software installed on the 10.5 builder is not sufficient to build m-c these days. See also bug 584919. 

So do we need to
* build debug i386 on 10.6, test on 10.5
* build debug x86_64 on 10.6, test on 10.6
or
* build debug i386+x86_64 universal, test on 10.5 & 10.6 (arch selection via plist)
I think building debug i386 on 10.6 is fine. I don't have a strong preference for whether we test it on 10.5 or 10.6. I really don't think we need to test it on both.

It seems to me that it would be faster to build separate i386 and x86_64 builds, rather than a single universal build.
OK, I think that is too ambitious to shoot for that in the first rollout and should be addressed by a followup. I can leave the existing i386 debug & tests in the meantime, if you wish.
Yes, we're not removing anything we already have in that case, at least.
The intent here is to disable all 10.5 builds except for debug (excluding 1.9.1 and 1.9.2). In the meantime we will have

* i386+x86_64 opt universal, tested on 10.5 & 10.6 minis for talos and opt unit
* x64_64 debug on 10.6, unit tested on 10.6 minis
* i386 debug on 10.6, unit tested on 10.6 minis
* shark moves from 10.5 to 10.6, and is known to be broken (missing chud, need to set mozconfig to build 32bit)
* xulrunner is known broken, compiles both arches but fails on unify
* the update dir for snippets isn't changing here; handling that with snippets for now
Attachment #476213 - Flags: review?(catlee)
(In reply to comment #26)
> * the update dir for snippets isn't changing here; handling that with snippets
> for now

That's '... handling that with symlinks for now.'
* Creates a leopard-old slave_platfrom which is just leopard by another name, for legacy branches
* Adds leopard to the slave_platform list for macosx64 builds (which is the new universal)
* Makes symbol download configurable by slave_platform for talos (unit already has this)
* removes macosx as default platform, except for 1.9.1 & 1.9.2
* removes linux64 and mac64 tests from 1.9.1, 1.9.2, and addontester. The linux ones are perma-red on failed symbol download and hidden; the mac64 ones are idle. We aren't sendchanging addon for linux64 or macosx64.
* PLATFORM_UNITTEST_VARS is a bit funky to meet the requirements for matching builds with testers (see comment 26).
Attachment #476082 - Attachment is obsolete: true
Attachment #476216 - Flags: review?(catlee)
Supports building shark on macosx64 as well as macosx, using a mozconfig in sensible locations. Hacks off the '-old' suffix when creating the talos factory to avoid changing the mac special casing inside there.
Attachment #476085 - Attachment is obsolete: true
Attachment #476217 - Flags: review?(catlee)
Hardcodes a i386 + x64_64 universal build instead of ppc + i386. Removes mozconfig for the latter. Uses 'mac64' for any package that is solely or partly 64bit, otherwise 'mac'.
Attachment #476219 - Flags: review?(ted.mielczarek)
* Sprinkles
    . $topsrcdir/build/macosx/universal/mozconfig
  around the macosx64 nightly & xulrunner mozconfigs to make them universal
* sync the release config for mozilla-2.0 while we're here
* break the symlinks for electrolysis and tracemonkey, so we can set a different update channel in the nightly config (otherwise symlink back to m-c). This got missed when I did this the first time
* these shark configs are known to fail, need more testing with fix for 'missing required architecture x86_64 in file' from https://developer.mozilla.org/en/Profiling_JavaScript_with_Shark
Attachment #475054 - Attachment is obsolete: true
Attachment #476223 - Flags: review?(catlee)
And the status with all of the above is
* universal build is green, packages have mac64 in the name and generate symbols for both architectures
* debug builds are still green
* schedulers verified by inspection of dump_master.py to be connected to the right builders, but didn't have time for much testing on a staging test master. Would not have caught any failures or new oranges, since the environment was chaotic with other production and staging sendchanges coming in
Comment on attachment 476219 [details] [diff] [review]
[mozilla-central] (partly) drop ppc support from build system

Sensible, thanks. This does make me feel better about changing the hardcoded values for some reason. :)
Attachment #476219 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 476216 [details] [diff] [review]
[buildbot-configs] test master changes

Note to self (and other readers): we don't do packaged unittests on macosx on 1.9.1, and 1.9.2, so removing opt_unittest_suites from 'macosx'/'leopard' is ok.
Comment on attachment 476217 [details] [diff] [review]
[buildbotcustom] accomodations

A comment about why you're doing slave_platform.split('-')[0] would not go unappreciated.
Attachment #476217 - Flags: review?(catlee) → review+
Attachment #476223 - Flags: review?(catlee) → review+
Attachment #476213 - Flags: review?(catlee) → review+
Comment on attachment 476213 [details] [diff] [review]
[buildbot configs] build master changes

I needed to patch preproduction_config.py to remove the references to 'macosx'
Comment on attachment 476213 [details] [diff] [review]
[buildbot configs] build master changes

How does this give us 32-bit builds on 10.6 for testing?
(In reply to comment #37)
> How does this give us 32-bit builds on 10.6 for testing?

Sorry, typo in comment #26. Should read
* i386 debug on 10.6, unit tested on 10.5 minis

The existing debug builds on 10.5, and their testing, should be unchanged from current production.
And that's wrong too, how about
* i386 debug on 10.5, unit tested on 10.5 minis
Attached file sendchange testing
This is from a local master running all the buildbot patches, doing the sendchanges listed to make sure the right builds are scheduled. Looks like comment #26 with the correction in comment #39.
Whiteboard: [10.6][automation] → [10.6][automation][ETA: 9/23 unless oranges are out of control]
Depends on: 598507
Depends on: 561350
I'm not going to block this bug on fixing bug 561350, which has existed for some time. I will promise to look at it after we get the universals swapped over.

Bug 583671 is about update support, primarily for releases. It doesn't block changing the nightly builds, or even a release, when we have a symlink workaround.
No longer depends on: 561350, 583671
Attachment #476216 - Flags: review?(catlee) → review+
Blocks: 599524
I've verified that xpcshell changes work OK by modifying a test package but the Makefile.in isn't checked. I hoping the ordering is not important there.
Attachment #478441 - Flags: review?(ted.mielczarek)
Whiteboard: [10.6][automation][ETA: 9/23 unless oranges are out of control] → [10.6][automation][ETA: 9/26]
Comment on attachment 478441 [details] [diff] [review]
Disable tests that need extra harness help

>diff --git a/toolkit/mozapps/plugins/tests/Makefile.in b/toolkit/mozapps/plugins/tests/Makefile.in
>--- a/toolkit/mozapps/plugins/tests/Makefile.in
>+++ b/toolkit/mozapps/plugins/tests/Makefile.in
>@@ -44,22 +44,26 @@ include $(DEPTH)/config/autoconf.mk
> 
> MODULE = test_plugins
> relativesrcdir  = toolkit/mozapps/plugins/tests
> TESTROOT = $(DEPTH)/_tests/testing/mochitest/browser/$(relativesrcdir)
> USE_STATIC_LIBS = 1
> 
> ifneq (mobile,$(MOZ_BUILD_APP))
> _BROWSER_FILES = \
>-  browser_bug435788.js \
>   pfs_bug435788_1.rdf \
>   pfs_bug435788_2.rdf \
>   GoodExtension.xpi \
>   BadExtension.xpi \
>   $(NULL)
>+ifneq ($(OS_ARCH),Darwin)
>+_BROWSER_FILES += \
>+  browser_bug435788.js \
>+endif

You either need a $(NULL) on a line after the line continuation here, or drop the line continuation at the end of the line. (And no, the order doesn't matter.)
Attachment #478441 - Flags: review?(ted.mielczarek) → review+
Updated patch, carrying ted's review.
Attachment #478441 - Attachment is obsolete: true
Attachment #478448 - Flags: review+
Comment on attachment 476213 [details] [diff] [review]
[buildbot configs] build master changes

http://hg.mozilla.org/build/buildbot-configs/c885072aeb84
Attachment #476213 - Flags: checked-in+
Comment on attachment 476223 [details] [diff] [review]
[buildbot-configs] mozconfig changes

http://hg.mozilla.org/build/buildbot-configs/66753e2e8e5d

Had to recreate the tracemonkey and electrolysis parts as hg could not import. Got same hg diff before checkin.
Attachment #476223 - Flags: checked-in+
Comment on attachment 476219 [details] [diff] [review]
[mozilla-central] (partly) drop ppc support from build system

http://hg.mozilla.org/mozilla-central/3cd770c5b5db
Attachment #476219 - Flags: checked-in+
Comment on attachment 478448 [details] [diff] [review]
Disable tests that need extra harness help, v2

http://hg.mozilla.org/mozilla-central/6eb42326798b
Attachment #478448 - Flags: checked-in+
All our leopard minis are on test-master02, which needs to enable the macosx64 platform so that it creates the builders for opt tests and talos against the new universal binary.
Attachment #478701 - Flags: review?(bhearsum)
Attachment #478701 - Flags: review?(bhearsum) → review+
Comment on attachment 478701 [details] [diff] [review]
[buildbot-configs] do tests on leopard

http://hg.mozilla.org/build/buildbot-configs/rev/5266cd2e3e87

talos-master02 reconfig'd.
Attachment #478701 - Flags: checked-in+
Depends on: 599795
Depends on: 599796
Done.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 599797
Blocks: 599862
I just pulled my nightly update on 10.5.8 (32-bit) and now Firefox is busted. Likely to be related to this work?

If I pull a fresh build from ftp, it launches just fine, but the version updated through the updater (which was a 30MB download, so I'm thinking not a partial) dies immediately on startup with a dyld error (before our crash reporter, so I just have the mac report). dbolter confirms the same behaviour for him.

Speculatively re-opening this bug. If the problem is elsewhere, we can file a different (blocking b7) bug, but we need to answer this before we push b7 updates.

Process:         firefox-bin [28407]
Path:            /Applications/Minefield.app/Contents/MacOS/firefox-bin
Identifier:      org.mozilla.minefield
Version:         ??? (???)
Code Type:       X86-64 (Native)
Parent Process:  launchd [151]

Interval Since Last Report:          6950433 sec
Crashes Since Last Report:           61
Per-App Interval Since Last Report:  0 sec
Per-App Crashes Since Last Report:   7

Date/Time:       2010-09-27 10:55:22.318 -0400
OS Version:      Mac OS X 10.5.8 (9L31a)
Report Version:  6
Anonymous UUID:  5A6858D5-E0E2-40F5-B43A-DF075584EEDD

Exception Type:  EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000002, 0x0000000000000000
Crashed Thread:  0

Dyld Error Message:
  unknown required load command 0x80000022
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Sounds like the updater is launching the wrong architecture. :-/
Filed bug 599871 to turn off nightly updates on mac until we know what's happening, here.
(In reply to comment #55)
> Sounds like the updater is launching the wrong architecture. :-/

That would imply that we never tested the updates on 32-bit OSes ... did we perform that test before making this change?
For what it's worth, 64-bit -> i386+64-bit worked fine on my 10.6 machine.
Who needs what to dig into and test this? We're now in a situation where our nightly users on OSX are no longer receiving updates. It's the right short term tradeoff, but not long-term tenable. As Johnath said, if we need to file a different bug for this, just say the word and we'll happily do so.
Whiteboard: [10.6][automation][ETA: 9/26] → [10.6][automation]
I believe Josh is looking into it. I can help him if he needs information from the AUS side or anything like that.
(In reply to comment #57)
> (In reply to comment #55)
> > Sounds like the updater is launching the wrong architecture. :-/
> 
> That would imply that we never tested the updates on 32-bit OSes ... did we
> perform that test before making this change?
We do have tests that run the updater directly via xpcshell tests but they were disabled on Mac OS X in bug 599477 and these were working with the i386 / ppc builds. For testing the actual applying of an update mozmill is used since none of our test harnesses besides mozmill support restarting and it isn't possible to update a build that has just been created to a mar that requires a future build to create the mar.
(In reply to comment #57)
> That would imply that we never tested the updates on 32-bit OSes ... did we
> perform that test before making this change?

No, it was a gap in my testing. I did check on 10.6 that the existing universal and 64bit builds worked properly when updated to the new universal. To the level of diffing the updated .app with the downloaded dmg.

To make sure I understand this right, if I'm on (old enough) 10.5 I get an update to the new universal, the updater runs OK but fails to start Firefox afterwards. Does Firefox start OK from the dock/applications list ?
(In reply to comment #62)
> To make sure I understand this right, if I'm on (old enough) 10.5 I get an
> update to the new universal, the updater runs OK but fails to start Firefox
> afterwards. Does Firefox start OK from the dock/applications list ?

If you're on 10.5 and you pull the update, you are boned thereafter. Restart fails, dock launching fails, ditto command line launching. Josh is working on a fix, and a pave-over install would work, and there's a quick-but-non-obvious workaround -- nevertheless, yeah, post-update: boned.
I will be looking into this today. My plan is to:

1. Make the updater "touch" the top-level application bundle directory so that the OS knows to update its launch cache. This will fix normal launching to use the correct architecture.

2. Make our app-initiated restart code launch the bundle rather than the firefox-bin binary so that Info.plist and architecture prefs are taken into account.

I'll file separate bugs on these things.
Depends on: 600098
re-closing as bug 600098 blocks beta7
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Depends on: 600362
Depends on: 600433
Depends on: 600434
Depends on: 600435
Depends on: 602049
No longer depends on: 602049
Depends on: 624860
No longer depends on: 624860
Depends on: 634908
No longer depends on: 600434
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: