Closed
Bug 870173
Opened 11 years ago
Closed 11 years ago
Upgrade to clang 3.3
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ehsan.akhgari, Assigned: rail)
References
Details
Attachments
(2 files, 7 obsolete files)
7.58 KB,
patch
|
rail
:
review+
rail
:
checked-in+
|
Details | Diff | Splinter Review |
3.45 KB,
patch
|
ehsan.akhgari
:
review+
froydnj
:
feedback+
|
Details | Diff | Splinter Review |
clang 3.3rc1 is out, it would be great if we can start testing it out as soon as possible. Rail, is this something you'd like to do?
Assignee | ||
Comment 1•11 years ago
|
||
By "start testing" do you mean just ability to test on try or switching to it on mozilla-central?
Assignee: nobody → rail
Reporter | ||
Comment 2•11 years ago
|
||
(In reply to comment #1) > By "start testing" do you mean just ability to test on try or switching to it > on mozilla-central? Well, initially I think we should do try pushes to make sure everything is green and in good order, and that there are no performance regressions, so that when 3.3 is released (should happen very shortly) we can upgrade to it on mozilla-central. That puts us at the beginning of the 24 cycle which is a nice coincidence!
Reporter | ||
Comment 3•11 years ago
|
||
FWIW, please make sure that the backout of bug 860867 is in the tree that you're testing (it is on central now) as I think we should see if we can benefit from loop vectorizer optimizations in 3.3.
Assignee | ||
Comment 4•11 years ago
|
||
* I'm not going to land this patch unless we decide to proceed with this version * I grabbed the latest version on the release_33 branch (it's not the same now, thought) * I had to recreate build/unix/build-clang/no-sse-on-linux.patch, it would be great to check if I did it properly and if it's still makes sense to use it * I'll upload manifests once the binaries are ready (probably later today)
Attachment #747601 -
Flags: review?(ehsan)
Reporter | ||
Updated•11 years ago
|
Attachment #747601 -
Flags: review?(ehsan) → review+
Assignee | ||
Comment 5•11 years ago
|
||
These are manifests that be used for try pushes. Feel free to test the new version. Let me know when we are ready for another round of version bump.
Reporter | ||
Comment 6•11 years ago
|
||
(In reply to comment #5) > Created attachment 748030 [details] [diff] [review] > --> https://bugzilla.mozilla.org/attachment.cgi?id=748030&action=edit > clang manifests > > These are manifests that be used for try pushes. Feel free to test the new > version. Let me know when we are ready for another round of version bump. I'm afraid I don't have the cycles to do this myself. I will ask for help on dev.platform I guess.
Comment 7•11 years ago
|
||
Status update: First try run is at: https://tbpl.mozilla.org/?tree=Try&rev=3bb5addd12e1 (the Linux builds were probably unnecessary, shame on me) I have a fix in hand for the OS X opt build failure; hopefully upstream will approve. Should I hand fixes directly to Rail for him to build new binaries, or just wait until they get accepted upstream and then ping for a new build then? The reftest crashes look like real crashes and will probably take a bit of time to debug.
Flags: needinfo?(rail)
Reporter | ||
Comment 8•11 years ago
|
||
(In reply to comment #7) > Status update: > > First try run is at: https://tbpl.mozilla.org/?tree=Try&rev=3bb5addd12e1 (the > Linux builds were probably unnecessary, shame on me) > > I have a fix in hand for the OS X opt build failure; hopefully upstream will > approve. Should I hand fixes directly to Rail for him to build new binaries, > or just wait until they get accepted upstream and then ping for a new build > then? We usually do the latter. It's usually quite easy to land llvm/clang patches upstream. But IIRC we did have the mechanism to apply our local patch es to clang at some point...
Comment 9•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #8) > (In reply to comment #7) > > Status update: > > > > First try run is at: https://tbpl.mozilla.org/?tree=Try&rev=3bb5addd12e1 (the > > Linux builds were probably unnecessary, shame on me) > > > > I have a fix in hand for the OS X opt build failure; hopefully upstream will > > approve. Should I hand fixes directly to Rail for him to build new binaries, > > or just wait until they get accepted upstream and then ping for a new build > > then? > > We usually do the latter. It's usually quite easy to land llvm/clang > patches upstream. But IIRC we did have the mechanism to apply our local > patch es to clang at some point... We still do, we just would land those patches on m-c directly. Turns out that the "fix" I made interferes with a similar fix landed several weeks ago: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130401/077557.html and sounds like it was made explicitly to help out with the case that comes up in Firefox. However, the fix is obviously failing on our builds, so there's some more digging required... I've been working all day to try and get a 10.6 SDK put together so I can do a universal build on my machine, but it has not been easy. :(
Comment 10•11 years ago
|
||
An alternate fix would just be to fix our xpct* files to do what clang expects for those versions that we care about. That might be easier in the long run.
Assignee | ||
Comment 11•11 years ago
|
||
It will be much better to get the changes upstreamed.
Flags: needinfo?(rail) → needinfo?
Comment 13•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #9) > (In reply to :Ehsan Akhgari (needinfo? me!) from comment #8) > > (In reply to comment #7) > > > Status update: > > > > > > First try run is at: https://tbpl.mozilla.org/?tree=Try&rev=3bb5addd12e1 (the > > > Linux builds were probably unnecessary, shame on me) > > > > > > I have a fix in hand for the OS X opt build failure; hopefully upstream will > > > approve. Should I hand fixes directly to Rail for him to build new binaries, > > > or just wait until they get accepted upstream and then ping for a new build > > > then? > > > > We usually do the latter. It's usually quite easy to land llvm/clang > > patches upstream. But IIRC we did have the mechanism to apply our local > > patch es to clang at some point... > > We still do, we just would land those patches on m-c directly. The relevant patch was actually committed to trunk last week and is waiting for a backport. I will ping the backport request with a note that it's needed to build Firefox later this week if necessary. The reftest crash doesn't manifest on my machine with homebuilt clang 3.3, but it does manifest if I use the build from try. I will be looking at that tomorrow. It's possible that something has been fixed on 3.3 since Rail built binaries; there've been quite a lot of patches nominated for backports.
Comment 14•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #13) > The relevant patch was actually committed to trunk last week and is waiting > for a backport. I will ping the backport request with a note that it's > needed to build Firefox later this week if necessary. This backport has been committed. Rail, can you build and upload new binaries based on a version later than r181909, please?
Flags: needinfo?(rail)
Comment 16•11 years ago
|
||
Do we need to update https://wiki.mozilla.org/ReleaseEngineering/How_To/Clang_update for where and how we're doing the compiling now ?
Assignee | ||
Comment 18•11 years ago
|
||
(In reply to Nick Thomas [:nthomas] from comment #16) > Do we need to update > https://wiki.mozilla.org/ReleaseEngineering/How_To/Clang_update for where > and how we're doing the compiling now ? I'll update the docs.
Comment 19•11 years ago
|
||
Still seeing reftest crashes, but not with my self-compiled binaries: https://tbpl.mozilla.org/?tree=Try&rev=40507fc4e2a6
Comment 20•11 years ago
|
||
Comment on attachment 750210 [details] [diff] [review] clang manifests Can we fix the whitespace on end of lines, or at least be consistent with it? Having a mixture of an additional space or not, is confusing the diffs.
Comment 21•11 years ago
|
||
The 3.3 branch has been updated with (at least) a fix to the reftest crash problem. Rail, can you build new binaries, please?
Flags: needinfo?(rail)
Reporter | ||
Comment 22•11 years ago
|
||
Nathan, have you tried getting some perf numbers yet?
Comment 23•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #22) > Nathan, have you tried getting some perf numbers yet? No. Correctness first. ;)
Comment 26•11 years ago
|
||
(In reply to Rail Aliiev [:rail] from comment #25) > I will update the binaries today/tomorrow. Thanks!
Comment 28•11 years ago
|
||
We cannot deploy this yet, the ASan builds fail with a startup crash with this (and the tip version) or llvm/clang. I am bisecting in the LLVM repo right now to find out what patch caused this and I hope we can find a quick workaround/solution to get this upgrade working.
Comment 29•11 years ago
|
||
So I think we figured out the startup crash. It crashes because we call clock_gettime in the main binary but it is not linked to librt (instead, libxul.so is linked to it and that is dlopened). ASan now intercepts clock_gettime and calls a NULL ptr when calling the real function. ASan devs will solve this by linking librt to their runtime and will provide a patch. Meanwhile, I'll do another try push with LDFLAGS="-lrt" which works around the bug, so we can see other failures.
Comment 30•11 years ago
|
||
Revision r182645 in the clang repository fixes this by forcing -lrt when clang links the ASan runtime. The patch is easy and should also be easy to apply/backport to the revision we're aiming for.
Comment 31•11 years ago
|
||
The necessary ASan bits have landed. Rail, can you build new binaries, please?
Flags: needinfo?(rail)
Comment 34•11 years ago
|
||
Talos comparison; baseline is present day, clang 3.3 is the new hotness: http://perf.snarkfest.net/compare-talos/index.html?newRev=b60117cd5d2e&oldRevs=09acee1711ba&submit=true We seem to lose a little bit on the JS tests (dromaeo, kraken, v8), slight improvements on other tests. Not sure about the tp5o_shutdown_paint tests. Will try retriggering some tests.
Comment 35•11 years ago
|
||
We might have found one more ASan bug that needs fixing on the ASan side before upgrading (see bug 864008 for more information). The problem is that this bug causes mochitest-1 to abort and also triggers on certain images. I will try to get a quick fix from the ASan devs as soon as possible. Sorry for the delay.
Assignee | ||
Comment 36•11 years ago
|
||
No worries. I'd prefer to switch to a stable and tested version. :)
Comment 37•11 years ago
|
||
Upstream fixed this on trunk r183647 and I confirmed their fix. The change is here: http://llvm.org/viewvc/llvm-project?view=revision&revision=183647 Do you guys think we can get this backported to the 3.3 branch or should we try to apply this as a patch on our own?
Comment 38•11 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #37) > Upstream fixed this on trunk r183647 and I confirmed their fix. The change > is here: > > http://llvm.org/viewvc/llvm-project?view=revision&revision=183647 > > Do you guys think we can get this backported to the 3.3 branch or should we > try to apply this as a patch on our own? We will have to apply the patches ourselves: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130610/177353.html
Assignee | ||
Comment 39•11 years ago
|
||
Can someone attach the patch we want to apply?
Assignee | ||
Comment 40•11 years ago
|
||
(In reply to Rail Aliiev [:rail] from comment #39) > Can someone attach the patch we want to apply? The one in http://llvm.org/viewvc/llvm-project?view=revision&revision=183647 doesn't apply properly. Could someone backport the patch?
Comment 41•11 years ago
|
||
(In reply to Rail Aliiev [:rail] from comment #40) > (In reply to Rail Aliiev [:rail] from comment #39) > > Can someone attach the patch we want to apply? > > The one in http://llvm.org/viewvc/llvm-project?view=revision&revision=183647 > doesn't apply properly. Could someone backport the patch? I will do this today.
Comment 42•11 years ago
|
||
Refresh build-clang.py with the 3.3 release branch and include the necessary bits to build ASan so posix_memalign works properly. Tested by running the ASan tests and sanity-checking build-clang.py locally.
Attachment #747601 -
Attachment is obsolete: true
Attachment #760961 -
Flags: review+
Comment 47•11 years ago
|
||
(In reply to Rail Aliiev [:rail] from comment #46) > Is this version any better? Not sure, the only problem with the previous one was with asan, so I assumed decoder would look at it. I think he was on PTO, though... Christian, did you try out the new binaries?
Flags: needinfo?(nfroyd) → needinfo?(choller)
Comment 48•11 years ago
|
||
Doing a try run now, leaving the needinfo until the results are there :)
Assignee | ||
Comment 49•11 years ago
|
||
No rush. Just wanted to ping because it's so easy to miss a comment on a CC'ed bug.
Comment 50•11 years ago
|
||
Small update: Seeing a startup crash now which makes all tests orange. But I'll have to investigate first if this is something introduced by us, or related to Clang.
Comment 51•11 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #50) > Small update: Seeing a startup crash now which makes all tests orange. But > I'll have to investigate first if this is something introduced by us, or > related to Clang. It's entirely possible that my backport is buggy. But it did pass clang's internal tests...at least, I *think* I was running them correctly. :)
Reporter | ||
Comment 52•11 years ago
|
||
FWIW, with bug 886842, we should no longer be blocked on the ASan issues.
Reporter | ||
Updated•11 years ago
|
Summary: Upgrade to clang 3.3rc1 → Upgrade to clang 3.3
Comment 53•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #52) > FWIW, with bug 886842, we should no longer be blocked on the ASan issues. Well, that bug blocks this bug. We cannot upgrade to 3.3 until we have at least one Clang version that can still build and run ASan properly, otherwise we break all the automated builds. If we keep the old version around, then the 3.3 upgrade can be done right now, but if not, then I'd like to have trunk clang available before we upgrade.
Reporter | ||
Comment 54•11 years ago
|
||
(In reply to comment #53) > (In reply to :Ehsan Akhgari (needinfo? me!) from comment #52) > > FWIW, with bug 886842, we should no longer be blocked on the ASan issues. > > Well, that bug blocks this bug. We cannot upgrade to 3.3 until we have at least > one Clang version that can still build and run ASan properly, otherwise we > break all the automated builds. If we keep the old version around, then the 3.3 > upgrade can be done right now, but if not, then I'd like to have trunk clang > available before we upgrade. We will keep the old compiler around, we just won't use it for anything other than ASan builds. Does that sound good?
Comment 55•11 years ago
|
||
Sounds good to me :) You should probably still backout the backported patch that doesn't work.
Flags: needinfo?(choller)
Assignee | ||
Comment 56•11 years ago
|
||
What about this plan: 1) rail preps a new patch using r183744 and dropping asan-posix-memalign-fix.patch 2) rail provides new binaries and manifests 3) ??? tests the binaries 4) someone pushes the patch from 1) and manifests to mozilla-inbound
Flags: needinfo?(ehsan)
Reporter | ||
Comment 57•11 years ago
|
||
Sounds like a plan! I believe that Nathan will be the person testing the binaries.
Flags: needinfo?(ehsan)
Assignee | ||
Comment 58•11 years ago
|
||
This is attachment 747601 [details] [diff] [review] with a fixed version.
Attachment #760961 -
Attachment is obsolete: true
Attachment #770635 -
Flags: review+
Comment 59•11 years ago
|
||
Comment on attachment 770635 [details] [diff] [review] build-clang.py Could we get a version of the patch without gratuitous whitespace/formatting changes?
Assignee | ||
Comment 60•11 years ago
|
||
Nathan, can you verify that these binaries work fine?
Attachment #761206 -
Attachment is obsolete: true
Attachment #770734 -
Flags: feedback?(nfroyd)
Assignee | ||
Comment 61•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #59) > Comment on attachment 770635 [details] [diff] [review] > build-clang.py > > Could we get a version of the patch without gratuitous whitespace/formatting > changes? Basically it does the same as https://bugzilla.mozilla.org/attachment.cgi?id=760961 without asan-posix-memalign-fix.patch. The patch I attached is based on attachment 747601 [details] [diff] [review] reviewed by Ehsan with one liner change.
Comment 62•11 years ago
|
||
Try push with new manifests: https://tbpl.mozilla.org/?tree=Try&rev=2271ee27d652 Assuming everything looks ok with that, I will push the updated script and manifests.
Assignee | ||
Comment 63•11 years ago
|
||
I retriggered one of the orange tests and it passed.
Assignee | ||
Comment 64•11 years ago
|
||
Ping. What are the next steps/actions here?
Comment 65•11 years ago
|
||
(In reply to Rail Aliiev [:rail] from comment #64) > Ping. What are the next steps/actions here? My understanding is that committing the manifests that we have here will break the ASan builds we are currently running, so we're waiting for ASan builds to get their own separate manifests (that can just be copies of the current ones for now) before we can safely fix this bug. That is what bug 886842 is all about.
Comment 66•11 years ago
|
||
Comment on attachment 770734 [details] [diff] [review] clang manifests (r183744) Yup, these work.
Attachment #770734 -
Flags: feedback?(nfroyd) → feedback+
Assignee | ||
Comment 67•11 years ago
|
||
Should we block this bug on bug 886842 since it breaks ASan builds (AFAIK)?
Assignee | ||
Comment 68•11 years ago
|
||
Comment on attachment 770635 [details] [diff] [review] build-clang.py https://hg.mozilla.org/integration/mozilla-inbound/rev/bc6a4518a973
Attachment #770635 -
Flags: checked-in+
Reporter | ||
Comment 69•11 years ago
|
||
(In reply to Rail Aliiev [:rail] from comment #67) > Should we block this bug on bug 886842 since it breaks ASan builds (AFAIK)? It's already blocking on that.
Assignee | ||
Comment 70•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #69) > (In reply to Rail Aliiev [:rail] from comment #67) > > Should we block this bug on bug 886842 since it breaks ASan builds (AFAIK)? > > It's already blocking on that. Bah! I need moar coffee.
Comment 71•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/bc6a4518a973
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 72•11 years ago
|
||
forgot the magic words
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [leave open]
Assignee | ||
Updated•11 years ago
|
Attachment #770734 -
Flags: review?(ehsan)
Reporter | ||
Updated•11 years ago
|
Attachment #770734 -
Flags: review?(ehsan) → review+
Assignee | ||
Comment 73•11 years ago
|
||
Comment on attachment 770734 [details] [diff] [review] clang manifests (r183744) https://hg.mozilla.org/integration/mozilla-inbound/rev/e7ed186c565c
Assignee | ||
Updated•11 years ago
|
Whiteboard: [leave open]
Assignee | ||
Updated•11 years ago
|
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 74•11 years ago
|
||
This might have caused a 3% regression on dromaeo-dom...
Reporter | ||
Comment 76•11 years ago
|
||
(In reply to comment #74) > This might have caused a 3% regression on dromaeo-dom... :( I thought that Nathan had looked at those numbers. Nathan, can you please take a look?
Flags: needinfo?(nfroyd)
Comment 77•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #76) > (In reply to comment #74) > > This might have caused a 3% regression on dromaeo-dom... > > :( I thought that Nathan had looked at those numbers. Nathan, can you > please take a look? Comment 34 has some compare-talos bits; dromaeo-dom had dropped a bit there, but not quite 3%. The original Try revisions have been blown away, though, so no retriggering possible there, and I don't remember whether I retriggered and saw the numbers even out or not. We can back out the manifest changes for now, since ASan has separate manifests and we're not (AFAIK) relying on any 3.3 features. But it would take me a while to figure out why we dropped so much and fix clang to Not Do That.
Flags: needinfo?(nfroyd)
Reporter | ||
Comment 78•11 years ago
|
||
(In reply to comment #77) > (In reply to :Ehsan Akhgari (needinfo? me!) from comment #76) > > (In reply to comment #74) > > > This might have caused a 3% regression on dromaeo-dom... > > > > :( I thought that Nathan had looked at those numbers. Nathan, can you > > please take a look? > > Comment 34 has some compare-talos bits; dromaeo-dom had dropped a bit there, > but not quite 3%. The original Try revisions have been blown away, though, so > no retriggering possible there, and I don't remember whether I retriggered and > saw the numbers even out or not. > > We can back out the manifest changes for now, since ASan has separate manifests > and we're not (AFAIK) relying on any 3.3 features. But it would take me a > while to figure out why we dropped so much and fix clang to Not Do That. If you don't expect to get to fix this before the next uplift, let's back it out. :(
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Updated•10 years ago
|
Component: Other → Platform Support
Updated•6 years ago
|
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•