Closed Bug 733905 Opened 12 years ago Closed 12 years ago

switch OS X builds to clang

Categories

(Firefox Build System :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(firefox16 wontfix, firefox17 fixed)

RESOLVED FIXED
mozilla17
Tracking Status
firefox16 --- wontfix
firefox17 --- fixed

People

(Reporter: espindola, Assigned: espindola)

References

Details

Attachments

(1 file, 3 obsolete files)

      No description provided.
Attached patch switch OS X builds to clang (obsolete) — Splinter Review
I am still debugging a failure on linux 64, but I know it is related to the old linker we use on centos 5, so it should not block a clang move. A try push switching us to clang is at

https://tbpl.mozilla.org/?tree=Try&rev=af60b4debfaf

and a dummy push for comparison is at

https://tbpl.mozilla.org/?tree=Try&rev=b361228ce8f8

I am r? jp so he can proxy this as appropriate.
Assignee: nobody → respindola
Status: NEW → ASSIGNED
Attachment #603901 - Flags: review?(jpr)
Comment on attachment 603901 [details] [diff] [review]
switch OS X builds to clang

Review of attachment 603901 [details] [diff] [review]:
-----------------------------------------------------------------

\o/
Attachment #603901 - Flags: review?(jpr) → review+
\o/
Should we wait one week in order to go through a full 6 week release cycle before landing this?
(In reply to JP Rosevear [:jpr] from comment #6)
> Should we wait one week in order to go through a full 6 week release cycle
> before landing this?

There were some network problems when running talos, so I pushed again 

https://tbpl.mozilla.org/?tree=Try&rev=a28ccea64488 (clang)
https://tbpl.mozilla.org/?tree=Try&rev=1f8988ce003c (gcc))

What is the best way to compare the talos results of two runs?

I think we should push as soon as possible. Waiting would force us to keep gcc-4.2 for another full release cycle, and since the switch to msvc 2010 it is the oldest compiler we have.

While it is possible that clang has a bug that we don't see on try, we know that gcc 4.2 has bugs that cause problems for using new APIs.  See bugs 682445, 681694 and 678607.

Also, if we do find a problem, reverting to gcc-4.2 is really easy (this patch changes 2 lines).

Last but not least, maintaining firefox building with an unsupported compiler is not trivial, as it is easy for people to check in non standard conformant code: http://clang.llvm.org/compatibility.html#cxx
This is a better Talos compare: <http://perf.snarkfest.net/compare-talos/index.html?oldRevs=2f6368ca605e,3dcb40ebd487,8ef88a69f861,fca77ac7cdf9,78e56fd22f2a,75fcd465d506&newRev=a28ccea64488&tests=a11y,tdhtml,tdhtml_nochrome,a11y_paint,tdhtml_paint,tdhtml_nochrome_paint,tp4,tp4_memset,tp4_pbytes,tp4_rss,tp4_shutdown,tp4_xres,tp4m,tp4m_content_rss,tp4m_main_rss,tp4m_main_rss_nochrome,tp4m_nochrome,tp4m_shutdown,tp4m_shutdown_nochrome,tp5r,tp5r_memset,tp5r_pbytes,tp5r_rss,tp5r_shutdown,tp5r_xres,tp5r_paint,tp5r_memset_paint,tp5r_pbytes_paint,tp5r_%cpu_paint,tp5r_modlistbytes_paint,tp5r_rss_paint,tp5r_shutdown_paint,tp5r_xres_paint,dromaeo_basics,dromaeo_css,dromaeo_dom,dromaeo_jslib,dromaeo_sunspider,dromaeo_v8,tsspider,tsspider_nochrome,tsspider_paint,tsspider_nochrome_paint,v8,tgfx,tgfx_nochrome,tgfx_paint,tgfx_nochrome_paint,tscroll,tsvg,tsvg_opacity,tzoom,ts,ts_paint,ts_cold,ts_cold_generated_max,ts_cold_generated_max_shutdown,ts_cold_generated_med,ts_cold_generated_med_shutdown,ts_cold_generated_min,ts_cold_generated_min_shutdown,ts_cold_shutdown,ts_places_generated_max,ts_places_generated_max_shutdown,ts_places_generated_med,ts_places_generated_med_shutdown,ts_places_generated_min,ts_places_generated_min_shutdown,ts_shutdown,twinopen,tpaint&submit=true>

Here's what's happening.  dromaeo_css is regressed by about 4%.  dromaeo_dom by about 3%.  dromaeo_jslib by about 4%.  dromaeo_v8 by about 2%.  I'm not sure if I believe the rest of the numbers.

CCing Boris to see if he has ideas here.
Do we have instructions on how to run these tests in a profiler?
For dromaeo, you can run a particular subtest with shark hook by adding &shark to the url if you're using a --enable-shark build on 10.6 and shark is in programmatic mode....

The other thing worth doing is looking at the old and new numbers side-by-side to see whether it's an across-the-board regression or just a big slowdown on a few tests.
(In reply to Boris Zbarsky (:bz) from comment #11)
> For dromaeo, you can run a particular subtest with shark hook by adding
> &shark to the url if you're using a --enable-shark build on 10.6 and shark
> is in programmatic mode....

I have done that back in firefox 4 with dromaeo.com. Is that the same version we run on talus? If not, where do I find the correct version?
 
> The other thing worth doing is looking at the old and new numbers
> side-by-side to see whether it's an across-the-board regression or just a
> big slowdown on a few tests.

In the data ehsan collected yui.html has a -8.8, so it is probably o good candidate. I will check if that reproduces with http://dromaeo.com/?yui.
> Is that the same version we run on talus?

It's not quite identical, but close enough.  I'd start with debugging/profiling against dromaeo.com if the regression can be reproduced there.  If it can't, then you'd need to ask our infra/QA folks where to get the exact thing talos is running; I don't have that information...
Rafael, did you get a chance to profile this?
(In reply to Ehsan Akhgari [:ehsan] from comment #14)
> Rafael, did you get a chance to profile this?

Yes, it found http://llvm.org/bugs/show_bug.cgi?id=12251

I will work on it as soon as I block on 735697 on other places work.
I'm planing to rebuild clang based on revision 155417 and test the performance again.
Depends on: 748208
(In reply to Ehsan Akhgari [:ehsan] from comment #16)
> I'm planing to rebuild clang based on revision 155417 and test the
> performance again.

Awesome!
I pushed a new try build to test the new version of clang:

https://tbpl.mozilla.org/?tree=Try&rev=bf7ed18e78c9

I also wrote down the instructions for this test on the wiki: https://wiki.mozilla.org/Testing_Clang_Builds
Target Milestone: --- → mozilla14
Version: unspecified → Trunk
Target Milestone: mozilla14 → ---
(In reply to Ehsan Akhgari [:ehsan] from comment #18)
> I pushed a new try build to test the new version of clang:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=bf7ed18e78c9
> 
> I also wrote down the instructions for this test on the wiki:
> https://wiki.mozilla.org/Testing_Clang_Builds

Apparently my patch was not enough to make us use clang.  Rafael, what am I missing?
A port of what I have been using is at

https://tbpl.mozilla.org/?tree=Try&rev=344602b4d649

I am using "export CC=.." to avoid the libffi bug, but I think that breaks the universal build. Lets see.
Try run for bf7ed18e78c9 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=bf7ed18e78c9
Results (out of 7 total builds):
    exception: 7
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/eakhgari@mozilla.com-bf7ed18e78c9
It looks like we have to debug:

* mochitest-4 on linux32 debug and linux64 debug
* crashtest on linux32 debug
Here's the performance comparison: http://bit.ly/Km8892

We have regressions on dromaeo_css and dromaeo_jslib.
Target Milestone: --- → mozilla14
Target Milestone: mozilla14 → ---
Here's the comparison between multiple runs on the try job and on the m-c base changeset for Dromaeo: http://bit.ly/IzHrHF

Looks like the regression is real.
I did another try push based on the same revision with -O3, to see if it can buy back the lost performance above.  https://tbpl.mozilla.org/?tree=Try&rev=5b49fd91dc16
(In reply to Ehsan Akhgari [:ehsan] from comment #28)
> I did another try push based on the same revision with -O3, to see if it can
> buy back the lost performance above. 
> https://tbpl.mozilla.org/?tree=Try&rev=5b49fd91dc16

I realized that I pushed this with -t none by mistake; here's a version with -t all: https://tbpl.mozilla.org/?tree=Try&rev=e9d0834b5d29
(In reply to Ehsan Akhgari [:ehsan] from comment #29)
> (In reply to Ehsan Akhgari [:ehsan] from comment #28)
> > I did another try push based on the same revision with -O3, to see if it can
> > buy back the lost performance above. 
> > https://tbpl.mozilla.org/?tree=Try&rev=5b49fd91dc16
> 
> I realized that I pushed this with -t none by mistake; here's a version with
> -t all: https://tbpl.mozilla.org/?tree=Try&rev=e9d0834b5d29

The regressions are still there with a -O3 build, although there are a number of other improvements in other tests.
(In reply to Ehsan Akhgari [:ehsan] from comment #30)
> (In reply to Ehsan Akhgari [:ehsan] from comment #29)
> > (In reply to Ehsan Akhgari [:ehsan] from comment #28)
> > > I did another try push based on the same revision with -O3, to see if it can
> > > buy back the lost performance above. 
> > > https://tbpl.mozilla.org/?tree=Try&rev=5b49fd91dc16
> > 
> > I realized that I pushed this with -t none by mistake; here's a version with
> > -t all: https://tbpl.mozilla.org/?tree=Try&rev=e9d0834b5d29
> 
> The regressions are still there with a -O3 build, although there are a
> number of other improvements in other tests.

http://bit.ly/K8LBdx
Depends on: 762071
I have tried running dromaeo locally and it looks like we are really close:

http://dromaeo.com/?id=171519,171522

I will take a look at the "Convert pixels to canvas" benchmark.
I did a new performance comparison. It is at

http://bit.ly/LQqC0D

I intentionally ran the win opt tests multiple times to have an idea on how reliable they are. I wonder if I doing something wrong, but windows 7 shows regressions (like 5.16% in dromaeo_dom) without me changing anything in the windows build :-(

Ironically the linux builds show improvements in dromaeo_dom when building with clang and the os X one shows regression, which is also unexpected given that the linux one is building with a much newer gcc.
This is the latest comparison: http://bit.ly/Pf6abL

This is now all pure win on Linux (non-PGO of course).  We still regress two tests (dromaeo_css and dromaeo_dom) on Mac, but we think that it's time to switch to clang now, and possibly fix these regressions in the future.

I'll write to dev-platform about this.
Depends on: 774462
Attached patch switch os x build (obsolete) — Splinter Review
This is a patch on top of m-i.

https://tbpl.mozilla.org/?tree=Try&pusher=respindola@mozilla.com

Since the current tooltool files are empty, I just made them a symlink to the clang file. Let me know if you would prefer some other organization.
Attachment #603901 - Attachment is obsolete: true
Attachment #643234 - Flags: review?(ehsan)
Attachment #643234 - Flags: feedback?(rail)
Comment on attachment 643234 [details] [diff] [review]
switch os x build

Symlinks are not a good idea, Windows will not be able to handle them correctly.  I would suggest copying them (hg cp).  (Yeah, it sucks, I know, but Windows sucks more...)
Attachment #643234 - Flags: review?(ehsan) → review-
Comment on attachment 643234 [details] [diff] [review]
switch os x build

I agree with Ehsan, the dumber file type you use the less troubles you have. Let's go with regular files instead symlinks.
Attachment #643234 - Flags: feedback?(rail) → feedback-
Attached patch copying the files (obsolete) — Splinter Review
Attachment #643234 - Attachment is obsolete: true
Attachment #643346 - Flags: review?(ehsan)
Attachment #643346 - Flags: feedback?(rail)
Attachment #643346 - Attachment is obsolete: true
Attachment #643346 - Flags: review?(ehsan)
Attachment #643346 - Flags: feedback?(rail)
Attachment #643354 - Flags: review?(ehsan)
Attachment #643354 - Flags: feedback?(rail)
Comment on attachment 643354 [details] [diff] [review]
change only os x :-)

lgtm!
Attachment #643354 - Flags: feedback?(rail) → feedback+
Comment on attachment 643354 [details] [diff] [review]
change only os x :-)

Review of attachment 643354 [details] [diff] [review]:
-----------------------------------------------------------------

\o/
Attachment #643354 - Flags: review?(ehsan) → review+
Comment on attachment 643354 [details] [diff] [review]
change only os x :-)

Checked it in to inbound. Waiting for one build before checking it in to m-mc.
Attachment #643354 - Flags: checkin+
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
As best I can tell, this patch broke OS X builds on all versions of OS X -- even those that have clang installed and in the path.

Releng builds were fixed by bug 775509.  But that fix depends on tooltool, which isn't part of the "standard" tree.

Bug 775509 seems to be about moving tooltool to the "standard" tree.  But in the meantime I think we should have some other way to build using clang.  How about just having autoconf look for "clang" and "clang++"?  Or do older versions of clang not work properly?

(We can still have the bundled clang (built using tooltool) override whatever autoconf finds, if the bundled clang has actually been extracted.)
> Bug 775509 seems to be about moving tooltool to the "standard" tree.

Oops, that's bug 768879.
(In reply to comment #45)
> As best I can tell, this patch broke OS X builds on all versions of OS X --
> even those that have clang installed and in the path.

Do you set CC and CXX to clang/clang++ in your mozconfig?
> Do you set CC and CXX to clang/clang++ in your mozconfig?

I've just tried that, but it doesn't work (at least not with the version of clang that comes with OS X 10.6 snd/or XCode 3.2.6).

But in any case you shouldn't have to do that.
(In reply to comment #48)
> > Do you set CC and CXX to clang/clang++ in your mozconfig?
> 
> I've just tried that, but it doesn't work (at least not with the version of
> clang that comes with OS X 10.6 snd/or XCode 3.2.6).

I think those versions of clang are too old.  You need to have the open source 3.1 release at a minimum IIRC.

> But in any case you shouldn't have to do that.

Do what?
(Following up comment #48)

So apparently old versions of clang *don't* work, and we'll have to wait until tooltool is moved to the standard tree.

Anyone know how to extract the bundled clang "by hand"?  I.e. to do what tooltool does, but "manually"?

> But in any case you shouldn't have to do that.

Do what?

You shouldn't have to set CC and CXX in your mozconfig file.  These values should be found "automatically" (as they always have been previously).  Though of course whatever you set in your mozconfig file should override the "automatic" values.
> Do what?
> 
> You shouldn't have to set CC and CXX in your mozconfig file.  These values
> should be found "automatically" (as they always have been previously). 
> Though of course whatever you set in your mozconfig file should override the
> "automatic" values.

The automatic value is to select in order
* gcc-4.2
* clang in your $PATH

which looks like a reasonable default. The mozconfigs we have in m-c are specif to how releng does its builds.

In summary, I don't think OS X is special in this case. If you have a working compiler in your PATH it will be used by default.

It *is* really bad that we don't have a publicly readable tooltool repo and I hope 768879 gets fixed, but before that you can upgrade xcode, build the open source clang, ask releng for a particular binary or, if it was me that did the upgrade you want, get it from
http://people.mozilla.org/~respindola/.
> If you have a working compiler in your PATH it will be used by default.

That isn't true -- not in my tests.
(In reply to Steven Michaud from comment #52)
> > If you have a working compiler in your PATH it will be used by default.
> 
> That isn't true -- not in my tests.

Please check what is going wrong in config.log and report then. The autoconf macro responsible for selecting the compiler is MOZ_DEFAULT_COMPILER (and it was not modified in this bug).
I'll shortly (tomorrow?) post a patch that restores the behavior you describe.

In the meantime, try yourself to build a universal build (with a simple mozconfig like I'll paste in below).  I think you'll find that it doesn't work on any version of OS X.

export CFLAGS="-g -gfull"
export CXXFLAGS="-g -gfull"
. $topsrcdir/browser/config/mozconfig
. $topsrcdir/build/macosx/universal/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-firefox-univ
mk_add_options MOZ_MAKE_FLAGS=-j4
mk_add_options AUTOCONF=autoconf213
ac_add_options --enable-application=browser
ac_add_options --disable-optimize
ac_add_options --disable-debug
ac_add_options --enable-tests
ac_add_options --disable-strip
ac_add_options --disable-install-strip
# For NSS symbols
export MOZ_DEBUG_SYMBOLS=1
ac_add_options --enable-debug-symbols="-g -gfull"
> . $topsrcdir/browser/config/mozconfig
> . $topsrcdir/build/macosx/universal/mozconfig

As I mentioned on comment 51 "The mozconfigs we have in m-c are specif to how releng does its builds." They set the CC and CXX to the values that releng uses and that correctly cuts off the compiler detection.

If you don't set CC or CXX configure will detect and use a compiler in your PATH.
Rafael, please try to build with the mozconfig from comment #54, or with the one I'll paste in below.  I think you'll find that the builds fail on all versions of OS X.

export CFLAGS="-gdwarf-2"
export CXXFLAGS="-gdwarf-2"
. $topsrcdir/build/macosx/universal/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-firefox-nightly
mk_add_options MOZ_MAKE_FLAGS=-j4
mk_add_options AUTOCONF=autoconf213
ac_add_options --enable-application=browser
ac_add_options --enable-tests
ac_add_options --enable-codesighs
ac_add_options --disable-install-strip
ac_add_options --enable-js-diagnostics
ac_add_options --with-macbundlename-prefix=Firefox
# For NSS symbols
export MOZ_DEBUG_SYMBOLS=1
ac_add_options --enable-debug-symbols="-gdwarf-2"
# Needed to enable breakpad in application.ini
export MOZILLA_OFFICIAL=1
They *should* fail if you don't have the same setup as releng does. Your mozconfig includes  $topsrcdir/build/macosx/universal/mozconfig which includes build/macosx/common which sets CC and CXX to the setup that releng uses.

This was true before. The releng setup was to use /usr/bin/gcc-4.2. If you had a system without that, the build would fail. The current releng setup is to use clang from a directory in the source tree. If you don't have that it should fail.

Your build will also fail if you don't have ccache installed for example, since that is another tool that releng uses.
So you're suggesting not using a mozconfig at all?

Do you use a mozconfig successfully?  If so please paste it in.
(In reply to Steven Michaud from comment #58)
> So you're suggesting not using a mozconfig at all?

You have two options
*) Don't use the .mozconfigs that we have in m-c
*) Set up your system to look like a releng bot.
 
> Do you use a mozconfig successfully?  If so please paste it in.

This one correctly selects the clang I have on my path (/usr/bin/clang):

ac_add_options --target=x86_64-apple-darwin10
ac_add_options --enable-application=browser

ac_add_options --enable-update-channel=${MOZ_UPDATE_CHANNEL}
ac_add_options --enable-update-packaging
ac_add_options --enable-codesighs
ac_add_options --disable-install-strip
ac_add_options --enable-signmar
ac_add_options --enable-profiling

# Nightlies only since this has a cost in performance
ac_add_options --enable-js-diagnostics

# Needed to enable breakpad in application.ini
export MOZILLA_OFFICIAL=1

export MOZ_TELEMETRY_REPORTING=1
mk_add_options MOZ_MAKE_FLAGS="-j12"

ac_add_options --with-macbundlename-prefix=Firefox

# Treat warnings as errors in directories with FAIL_ON_WARNINGS.
ac_add_options --enable-warnings-as-errors

# Package js shell.
export MOZ_PACKAGE_JSSHELL=1
I find that I can build successfully, using either of my mozconfig files (from comment #54 or comment #56, if I remove the following line:

. $topsrcdir/build/macosx/universal/mozconfig

But if I do that I don't get a universal build.

So once again, Rafael, I challenge you to do a successful universal build in current trunk, without having an environment identical to that used by releng.
You "found" that? From comment 57:

 Your mozconfig includes  $topsrcdir/build/macosx/universal/mozconfig which includes build/macosx/common which sets CC and CXX to the setup that releng uses.

I can take you challenge tomorrow, but I think you can do it. Give it a try.
For universal you have (and had back in the gcc 4.2 days) to provide an explicit CC and CXX since our build system has no direct support for universal binaries and so has to be provided with a CC and CXX that produce binaries for the correct architecture.

So far this .mozconfig is working for me:

mk_add_options MOZ_BUILD_PROJECTS="i386 x86_64"
mk_add_options MOZ_UNIFY_BDATE=1
mk_add_options MOZ_POSTFLIGHT_ALL+=build/macosx/universal/flight.mk

CC=/usr/bin/clang
CXX=/usr/bin/clang++

ac_add_app_options i386 --target=i386-apple-darwin10
ac_add_app_options x86_64 --target=x86_64-apple-darwin10

if test -n "$MOZ_BUILD_APP" ; then
  TARGET_CPU=$MOZ_BUILD_APP
  HOST_CC=$CC
  HOST_CXX=$CXX
  NATIVE_CPU=`$topsrcdir/build/autoconf/config.guess | cut -f1 -d-`
  CC="$CC -arch $TARGET_CPU"
  CXX="$CXX -arch $TARGET_CPU"
  RANLIB=ranlib
  AR=ar
  AS=$CC
  LD=ld
  STRIP="strip -x -S"
  if test "$NATIVE_CPU" != "$TARGET_CPU" ; then
    CROSS_COMPILE=1
  fi
  UNIVERSAL_BINARY=1
  export CC CXX HOST_CC HOST_CXX RANLIB AR AS LD STRIP
fi

ac_add_options --enable-application=browser

ac_add_options --enable-update-channel=${MOZ_UPDATE_CHANNEL}
ac_add_options --enable-update-packaging
ac_add_options --enable-codesighs
ac_add_options --disable-install-strip
ac_add_options --enable-signmar
ac_add_options --enable-profiling
mk_add_options MOZ_MAKE_FLAGS="-j12"
We are still targeting 10.5 for 32 bits, so you will have to add

ac_add_options --enable-stdcxx-compat 

to that list.
And you also need an sdk:

ac_add_options --with-macos-sdk=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk

btw, note how this is an example where the mozconfig files in m-c would not work on my machine (xcode 4.4) even if we had decided to stay with gcc: The SDK is not installed on the same directory.
Target Milestone: --- → mozilla17
Thanks, Rafael.

I'll try your mozconfig on several different versions of OS X and report back.
Please do not that at the very least you have to update

ac_add_options --with-macos-sdk=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk

to point to where you have the SDK on those systems. This path is the new default for xcode 4.4 (and 4.3 maybe?). Older versions install in /Developer.
(Following up comment #65)

Also with and without the built-in clang (the one that comes with XCode, to see which platforms have a recent enough version of clang).

(In reply to comment #66)

For newer versions of XCode (those without installers), I've just been making /Developer/SDKs a soft link to the new location.
Depends on: 779333
Depends on: 778364
No longer depends on: 778364
Depends on: 787302
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.