Closed Bug 870173 Opened 11 years ago Closed 11 years ago

Upgrade to clang 3.3

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ehsan.akhgari, Assigned: rail)

References

Details

Attachments

(2 files, 7 obsolete files)

clang 3.3rc1 is out, it would be great if we can start testing it out as soon as possible.  Rail, is this something you'd like to do?
By "start testing" do you mean just ability to test on try or switching to it on mozilla-central?
Assignee: nobody → rail
(In reply to comment #1)
> By "start testing" do you mean just ability to test on try or switching to it
> on mozilla-central?

Well, initially I think we should do try pushes to make sure everything is green and in good order, and that there are no performance regressions, so that when 3.3 is released (should happen very shortly) we can upgrade to it on mozilla-central.  That puts us at the beginning of the 24 cycle which is a nice coincidence!
FWIW, please make sure that the backout of bug 860867 is in the tree that you're testing (it is on central now) as I think we should see if we can benefit from loop vectorizer optimizations in 3.3.
Attached patch build-clang.py (obsolete) — Splinter Review
* I'm not going to land this patch unless we decide to proceed with this version
* I grabbed the latest version on the release_33 branch (it's not the same now, thought)
* I had to recreate build/unix/build-clang/no-sse-on-linux.patch, it would be great to check if I did it properly and if it's still makes sense to use it
* I'll upload manifests once the binaries are ready (probably later today)
Attachment #747601 - Flags: review?(ehsan)
Attachment #747601 - Flags: review?(ehsan) → review+
Attached patch clang manifests (obsolete) — Splinter Review
These are manifests that be used for try pushes. Feel free to test the new version. Let me know when we are ready for another round of version bump.
(In reply to comment #5)
> Created attachment 748030 [details] [diff] [review]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=748030&action=edit
> clang manifests
> 
> These are manifests that be used for try pushes. Feel free to test the new
> version. Let me know when we are ready for another round of version bump.

I'm afraid I don't have the cycles to do this myself.  I will ask for help on dev.platform I guess.
Status update:

First try run is at: https://tbpl.mozilla.org/?tree=Try&rev=3bb5addd12e1 (the Linux builds were probably unnecessary, shame on me)

I have a fix in hand for the OS X opt build failure; hopefully upstream will approve.  Should I hand fixes directly to Rail for him to build new binaries, or just wait until they get accepted upstream and then ping for a new build then?

The reftest crashes look like real crashes and will probably take a bit of time to debug.
Flags: needinfo?(rail)
(In reply to comment #7)
> Status update:
> 
> First try run is at: https://tbpl.mozilla.org/?tree=Try&rev=3bb5addd12e1 (the
> Linux builds were probably unnecessary, shame on me)
> 
> I have a fix in hand for the OS X opt build failure; hopefully upstream will
> approve.  Should I hand fixes directly to Rail for him to build new binaries,
> or just wait until they get accepted upstream and then ping for a new build
> then?

We usually do the latter.  It's usually quite easy to land llvm/clang patches upstream.  But IIRC we did have the mechanism to apply our local patch es to clang at some point...
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #8)
> (In reply to comment #7)
> > Status update:
> > 
> > First try run is at: https://tbpl.mozilla.org/?tree=Try&rev=3bb5addd12e1 (the
> > Linux builds were probably unnecessary, shame on me)
> > 
> > I have a fix in hand for the OS X opt build failure; hopefully upstream will
> > approve.  Should I hand fixes directly to Rail for him to build new binaries,
> > or just wait until they get accepted upstream and then ping for a new build
> > then?
> 
> We usually do the latter.  It's usually quite easy to land llvm/clang
> patches upstream.  But IIRC we did have the mechanism to apply our local
> patch es to clang at some point...

We still do, we just would land those patches on m-c directly.

Turns out that the "fix" I made interferes with a similar fix landed several weeks ago:

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130401/077557.html

and sounds like it was made explicitly to help out with the case that comes up in Firefox.  However, the fix is obviously failing on our builds, so there's some more digging required...  I've been working all day to try and get a 10.6 SDK put together so I can do a universal build on my machine, but it has not been easy. :(
An alternate fix would just be to fix our xpct* files to do what clang expects for those versions that we care about.  That might be easier in the long run.
It will be much better to get the changes upstreamed.
Flags: needinfo?(rail) → needinfo?
needinfo? y u no gone?
Flags: needinfo?
Depends on: 872031
(In reply to Nathan Froyd (:froydnj) from comment #9)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #8)
> > (In reply to comment #7)
> > > Status update:
> > > 
> > > First try run is at: https://tbpl.mozilla.org/?tree=Try&rev=3bb5addd12e1 (the
> > > Linux builds were probably unnecessary, shame on me)
> > > 
> > > I have a fix in hand for the OS X opt build failure; hopefully upstream will
> > > approve.  Should I hand fixes directly to Rail for him to build new binaries,
> > > or just wait until they get accepted upstream and then ping for a new build
> > > then?
> > 
> > We usually do the latter.  It's usually quite easy to land llvm/clang
> > patches upstream.  But IIRC we did have the mechanism to apply our local
> > patch es to clang at some point...
> 
> We still do, we just would land those patches on m-c directly.

The relevant patch was actually committed to trunk last week and is waiting for a backport.  I will ping the backport request with a note that it's needed to build Firefox later this week if necessary.

The reftest crash doesn't manifest on my machine with homebuilt clang 3.3, but it does manifest if I use the build from try.  I will be looking at that tomorrow.  It's possible that something has been fixed on 3.3 since Rail built binaries; there've been quite a lot of patches nominated for backports.
(In reply to Nathan Froyd (:froydnj) from comment #13)
> The relevant patch was actually committed to trunk last week and is waiting
> for a backport.  I will ping the backport request with a note that it's
> needed to build Firefox later this week if necessary.

This backport has been committed.

Rail, can you build and upload new binaries based on a version later than r181909, please?
Flags: needinfo?(rail)
Sure. On it.
Flags: needinfo?(rail)
Do we need to update https://wiki.mozilla.org/ReleaseEngineering/How_To/Clang_update for where and how we're doing the compiling now ?
Attached patch clang manifests (obsolete) — Splinter Review
r181915
Attachment #748030 - Attachment is obsolete: true
(In reply to Nick Thomas [:nthomas] from comment #16)
> Do we need to update
> https://wiki.mozilla.org/ReleaseEngineering/How_To/Clang_update for where
> and how we're doing the compiling now ?

I'll update the docs.
Still seeing reftest crashes, but not with my self-compiled binaries:

https://tbpl.mozilla.org/?tree=Try&rev=40507fc4e2a6
Comment on attachment 750210 [details] [diff] [review]
clang manifests

Can we fix the whitespace on end of lines, or at least be consistent with it? Having a mixture of an additional space or not, is confusing the diffs.
The 3.3 branch has been updated with (at least) a fix to the reftest crash problem.  Rail, can you build new binaries, please?
Flags: needinfo?(rail)
Nathan, have you tried getting some perf numbers yet?
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #22)
> Nathan, have you tried getting some perf numbers yet?

No.  Correctness first. ;)
I will update the binaries today/tomorrow.
Flags: needinfo?(rail)
(In reply to Rail Aliiev [:rail] from comment #25)
> I will update the binaries today/tomorrow.

Thanks!
Attached patch clang manifests (obsolete) — Splinter Review
rev 182605
Attachment #750210 - Attachment is obsolete: true
We cannot deploy this yet, the ASan builds fail with a startup crash with this (and the tip version) or llvm/clang. I am bisecting in the LLVM repo right now to find out what patch caused this and I hope we can find a quick workaround/solution to get this upgrade working.
So I think we figured out the startup crash. It crashes because we call clock_gettime in the main binary but it is not linked to librt (instead, libxul.so is linked to it and that is dlopened). ASan now intercepts clock_gettime and calls a NULL ptr when calling the real function.

ASan devs will solve this by linking librt to their runtime and will provide a patch. Meanwhile, I'll do another try push with LDFLAGS="-lrt" which works around the bug, so we can see other failures.
Revision r182645 in the clang repository fixes this by forcing -lrt when clang links the ASan runtime. The patch is easy and should also be easy to apply/backport to the revision we're aiming for.
Depends on: 876717
The necessary ASan bits have landed.  Rail, can you build new binaries, please?
Flags: needinfo?(rail)
On it.
Flags: needinfo?(rail)
Attached patch clang manifests (r182998) (obsolete) — Splinter Review
fresh meat!
Attachment #753543 - Attachment is obsolete: true
Talos comparison; baseline is present day, clang 3.3 is the new hotness:

http://perf.snarkfest.net/compare-talos/index.html?newRev=b60117cd5d2e&oldRevs=09acee1711ba&submit=true

We seem to lose a little bit on the JS tests (dromaeo, kraken, v8), slight improvements on other tests.  Not sure about the tp5o_shutdown_paint tests.  Will try retriggering some tests.
We might have found one more ASan bug that needs fixing on the ASan side before upgrading (see bug 864008 for more information). The problem is that this bug causes mochitest-1 to abort and also triggers on certain images. I will try to get a quick fix from the ASan devs as soon as possible. Sorry for the delay.
Depends on: 864008
No worries. I'd prefer to switch to a stable and tested version. :)
Upstream fixed this on trunk r183647 and I confirmed their fix. The change is here:

http://llvm.org/viewvc/llvm-project?view=revision&revision=183647

Do you guys think we can get this backported to the 3.3 branch or should we try to apply this as a patch on our own?
(In reply to Christian Holler (:decoder) from comment #37)
> Upstream fixed this on trunk r183647 and I confirmed their fix. The change
> is here:
> 
> http://llvm.org/viewvc/llvm-project?view=revision&revision=183647
> 
> Do you guys think we can get this backported to the 3.3 branch or should we
> try to apply this as a patch on our own?

We will have to apply the patches ourselves: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130610/177353.html
Can someone attach the patch we want to apply?
(In reply to Rail Aliiev [:rail] from comment #39)
> Can someone attach the patch we want to apply?

The one in http://llvm.org/viewvc/llvm-project?view=revision&revision=183647 doesn't apply properly. Could someone backport the patch?
(In reply to Rail Aliiev [:rail] from comment #40)
> (In reply to Rail Aliiev [:rail] from comment #39)
> > Can someone attach the patch we want to apply?
> 
> The one in http://llvm.org/viewvc/llvm-project?view=revision&revision=183647
> doesn't apply properly. Could someone backport the patch?

I will do this today.
Refresh build-clang.py with the 3.3 release branch and include the necessary
bits to build ASan so posix_memalign works properly.  Tested by running the
ASan tests and sanity-checking build-clang.py locally.
Attachment #747601 - Attachment is obsolete: true
Attachment #760961 - Flags: review+
Pinging rail to build new binaries.
Flags: needinfo?(rail)
On it.
Flags: needinfo?(rail)
Attached patch clang manifests (obsolete) — Splinter Review
here we go
Attachment #756623 - Attachment is obsolete: true
Is this version any better?
Flags: needinfo?(nfroyd)
(In reply to Rail Aliiev [:rail] from comment #46)
> Is this version any better?

Not sure, the only problem with the previous one was with asan, so I assumed decoder would look at it.  I think he was on PTO, though...

Christian, did you try out the new binaries?
Flags: needinfo?(nfroyd) → needinfo?(choller)
Doing a try run now, leaving the needinfo until the results are there :)
No rush. Just wanted to ping because it's so easy to miss a comment on a CC'ed bug.
Small update: Seeing a startup crash now which makes all tests orange. But I'll have to investigate first if this is something introduced by us, or related to Clang.
(In reply to Christian Holler (:decoder) from comment #50)
> Small update: Seeing a startup crash now which makes all tests orange. But
> I'll have to investigate first if this is something introduced by us, or
> related to Clang.

It's entirely possible that my backport is buggy.  But it did pass clang's internal tests...at least, I *think* I was running them correctly. :)
Depends on: 886842
FWIW, with bug 886842, we should no longer be blocked on the ASan issues.
Summary: Upgrade to clang 3.3rc1 → Upgrade to clang 3.3
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #52)
> FWIW, with bug 886842, we should no longer be blocked on the ASan issues.

Well, that bug blocks this bug. We cannot upgrade to 3.3 until we have at least one Clang version that can still build and run ASan properly, otherwise we break all the automated builds. If we keep the old version around, then the 3.3 upgrade can be done right now, but if not, then I'd like to have trunk clang available before we upgrade.
(In reply to comment #53)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #52)
> > FWIW, with bug 886842, we should no longer be blocked on the ASan issues.
> 
> Well, that bug blocks this bug. We cannot upgrade to 3.3 until we have at least
> one Clang version that can still build and run ASan properly, otherwise we
> break all the automated builds. If we keep the old version around, then the 3.3
> upgrade can be done right now, but if not, then I'd like to have trunk clang
> available before we upgrade.

We will keep the old compiler around, we just won't use it for anything other than ASan builds.  Does that sound good?
Sounds good to me :) You should probably still backout the backported patch that doesn't work.
Flags: needinfo?(choller)
What about this plan:

1) rail preps a new patch using r183744 and dropping asan-posix-memalign-fix.patch
2) rail provides new binaries and manifests
3) ??? tests the binaries
4) someone pushes the patch from 1) and manifests to mozilla-inbound
Flags: needinfo?(ehsan)
Sounds like a plan!  I believe that Nathan will be the person testing the binaries.
Flags: needinfo?(ehsan)
Attached patch build-clang.pySplinter Review
This is attachment 747601 [details] [diff] [review] with a fixed version.
Attachment #760961 - Attachment is obsolete: true
Attachment #770635 - Flags: review+
Comment on attachment 770635 [details] [diff] [review]
build-clang.py

Could we get a version of the patch without gratuitous whitespace/formatting changes?
Nathan, can you verify that these binaries work fine?
Attachment #761206 - Attachment is obsolete: true
Attachment #770734 - Flags: feedback?(nfroyd)
(In reply to Nathan Froyd (:froydnj) from comment #59)
> Comment on attachment 770635 [details] [diff] [review]
> build-clang.py
> 
> Could we get a version of the patch without gratuitous whitespace/formatting
> changes?

Basically it does the same as https://bugzilla.mozilla.org/attachment.cgi?id=760961 without asan-posix-memalign-fix.patch. The patch I attached is based on attachment 747601 [details] [diff] [review] reviewed by Ehsan with one liner change.
Try push with new manifests: https://tbpl.mozilla.org/?tree=Try&rev=2271ee27d652

Assuming everything looks ok with that, I will push the updated script and manifests.
I retriggered one of the orange tests and it passed.
Ping. What are the next steps/actions here?
(In reply to Rail Aliiev [:rail] from comment #64)
> Ping. What are the next steps/actions here?

My understanding is that committing the manifests that we have here will break the ASan builds we are currently running, so we're waiting for ASan builds to get their own separate manifests (that can just be copies of the current ones for now) before we can safely fix this bug.  That is what bug 886842 is all about.
Comment on attachment 770734 [details] [diff] [review]
clang manifests (r183744)

Yup, these work.
Attachment #770734 - Flags: feedback?(nfroyd) → feedback+
Should we block this bug on bug 886842 since it breaks ASan builds (AFAIK)?
(In reply to Rail Aliiev [:rail] from comment #67)
> Should we block this bug on bug 886842 since it breaks ASan builds (AFAIK)?

It's already blocking on that.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #69)
> (In reply to Rail Aliiev [:rail] from comment #67)
> > Should we block this bug on bug 886842 since it breaks ASan builds (AFAIK)?
> 
> It's already blocking on that.

Bah! I need moar coffee.
https://hg.mozilla.org/mozilla-central/rev/bc6a4518a973
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
No longer blocks: 872565
forgot the magic words
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [leave open]
Attachment #770734 - Flags: review?(ehsan)
Attachment #770734 - Flags: review?(ehsan) → review+
Whiteboard: [leave open]
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
This might have caused a 3% regression on dromaeo-dom...
(In reply to comment #74)
> This might have caused a 3% regression on dromaeo-dom...

:(  I thought that Nathan had looked at those numbers.  Nathan, can you please take a look?
Flags: needinfo?(nfroyd)
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #76)
> (In reply to comment #74)
> > This might have caused a 3% regression on dromaeo-dom...
> 
> :(  I thought that Nathan had looked at those numbers.  Nathan, can you
> please take a look?

Comment 34 has some compare-talos bits; dromaeo-dom had dropped a bit there, but not quite 3%.  The original Try revisions have been blown away, though, so no retriggering possible there, and I don't remember whether I retriggered and saw the numbers even out or not.

We can back out the manifest changes for now, since ASan has separate manifests and we're not (AFAIK) relying on any 3.3 features.  But it would take me a while to figure out why we dropped so much and fix clang to Not Do That.
Flags: needinfo?(nfroyd)
(In reply to comment #77)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #76)
> > (In reply to comment #74)
> > > This might have caused a 3% regression on dromaeo-dom...
> > 
> > :(  I thought that Nathan had looked at those numbers.  Nathan, can you
> > please take a look?
> 
> Comment 34 has some compare-talos bits; dromaeo-dom had dropped a bit there,
> but not quite 3%.  The original Try revisions have been blown away, though, so
> no retriggering possible there, and I don't remember whether I retriggered and
> saw the numbers even out or not.
> 
> We can back out the manifest changes for now, since ASan has separate manifests
> and we're not (AFAIK) relying on any 3.3 features.  But it would take me a
> while to figure out why we dropped so much and fix clang to Not Do That.

If you don't expect to get to fix this before the next uplift, let's back it out.  :(
Product: mozilla.org → Release Engineering
Component: Other → Platform Support
Depends on: 1087002
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: