Closed Bug 1270664 Opened 3 years ago Closed 3 years ago

Backout MSVC 2015 from aurora

Categories

(Firefox Build System :: General, defect)

defect
Not set

Tracking

(firefox48 fixed)

RESOLVED FIXED
Tracking Status
firefox48 --- fixed

People

(Reporter: glandium, Assigned: gps)

References

Details

Attachments

(4 files)

With luck, aurora doesn't need changes to build with 2013...
VS2015 appears to be adding SSE instructions into generated code even
though it is told not to. The details are in bug 1265615. This is
causing crashes on machines without SSE. Sadly this means we can't ship
VS2015 builds to users :/

Review commit: https://reviewboard.mozilla.org/r/50969/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/50969/
Attachment #8749464 - Flags: review?(mh+mozilla)
Comment on attachment 8749464 [details]
MozReview Request: Bug 1270664 - Downgrade to VS2013 due to unwanted SSE instruction insertion; r=glandium

https://reviewboard.mozilla.org/r/50969/#review47649
Attachment #8749464 - Flags: review?(mh+mozilla) → review+
This is a graft of 082bd84583c3.

Review commit: https://reviewboard.mozilla.org/r/50999/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/50999/
Attachment #8749464 - Attachment description: MozReview Request: Bug 1270664 - Downgrade to VS2013 due to unwanted SSE instruction insertion; r?glandium → MozReview Request: Bug 1270664 - Downgrade to VS2013 due to unwanted SSE instruction insertion; r=glandium
Attachment #8749502 - Flags: review?(dholbert)
Comment on attachment 8749464 [details]
MozReview Request: Bug 1270664 - Downgrade to VS2013 due to unwanted SSE instruction insertion; r=glandium

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/50969/diff/1-2/
Comment on attachment 8749502 [details]
MozReview Request: Bug 1270664 - Use MOZ_ARRAY_LENGTH to avoid MSVC 2013 compiler error; r=dholbert

I marked it r=dholbert because dholbert landed it as a fixup commit on inbound and the patch is trivial.
Attachment #8749502 - Flags: review?(dholbert) → review+
Attachment #8749502 - Flags: approval-mozilla-aurora?
Comment on attachment 8749464 [details]
MozReview Request: Bug 1270664 - Downgrade to VS2013 due to unwanted SSE instruction insertion; r=glandium

Approval Request Comment
[Feature/regressing bug #]: stability
[User impact if declined]: crashes on machines not supporting SSE due to a MSVC 2015 compiler bug
[Describe test coverage new/current, TreeHerder]: This reverts to the toolchain currently used on Firefox <48
[Risks and why]: Should be small. This is what we're shipping on Firefox 48
[String/UUID change made/needed]: None
Attachment #8749464 - Flags: approval-mozilla-aurora?
Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=379df6d8d5e62b31d43b19cc3b6988da282e38b9 is looking promising so far.
Status: NEW → ASSIGNED
Blocks: 1270714
(In reply to Gregory Szorc [:gps] from comment #5)
> I marked it r=dholbert because dholbert landed it as a fixup commit on
> inbound and the patch is trivial.

LGTM.
Attachment #8749502 - Flags: approval-mozilla-aurora?
Comment on attachment 8749464 [details]
MozReview Request: Bug 1270664 - Downgrade to VS2013 due to unwanted SSE instruction insertion; r=glandium

Cancelling uplift request pending decision to drop support for ~0.01% of our users by requiring SSE.
Attachment #8749464 - Flags: approval-mozilla-aurora?
Blocks: 1271755
Comment on attachment 8749464 [details]
MozReview Request: Bug 1270664 - Downgrade to VS2013 due to unwanted SSE instruction insertion; r=glandium

Approval Request Comment
[User impact if declined]: Should be none
[Describe test coverage new/current, TreeHerder]: Automation will revert to what it was doing before
[Risks and why]: Should be low risk
[String/UUID change made/needed]: None

We decided to revert to VS2013 on Firefox 48 to buy us more time to handle requiring SSE to run Firefox.

This patch simply restores the old automation configs that are currently being used on <48. Should be no risk to end users.
Attachment #8749464 - Flags: approval-mozilla-aurora?
Comment on attachment 8749502 [details]
MozReview Request: Bug 1270664 - Use MOZ_ARRAY_LENGTH to avoid MSVC 2013 compiler error; r=dholbert

Approval Request Comment

Ride-along patch to unbust VS2013 on Aurora.
Attachment #8749502 - Flags: approval-mozilla-aurora?
Comment on attachment 8749464 [details]
MozReview Request: Bug 1270664 - Downgrade to VS2013 due to unwanted SSE instruction insertion; r=glandium

We want people to be able to use VS, let's uplift to aurora
Attachment #8749464 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8749502 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [checkin-needed-aurora]
This should be fun: the permanent failures on your try push weren't coincidence, and with VS2103 Win8 fails some asm.js tests, https://treeherder.mozilla.org/logviewer.html#?job_id=2596188&repo=mozilla-aurora and all three versions of Windows fail a web-platform-test, https://treeherder.mozilla.org/logviewer.html#?job_id=2593304&repo=mozilla-aurora

Meaningless test failures that should just be wiped away by disabling, or a sign of an actual broken browser?

I closed mozilla-aurora while you figure out who will know which.
The WPT failure requires a revert of https://hg.mozilla.org/mozilla-central/rev/f231263334d4.

The jit-test\tests\asm.js\testMathLib.js failure is new AFAICT and will need to be triaged by a SM dev.

bbouvier: can you (or someone you delegate to) please look at that testMathLib.js failure at https://treeherder.mozilla.org/logviewer.html#?job_id=2596188&repo=mozilla-aurora ?
Flags: needinfo?(bbouvier)
The test appears to be exercising a compiler bug that no longer
reproduced in VS2015. Since we reverted to VS2013, we'll need
to re-add the workaround.
Comment on attachment 8752918 [details] [diff] [review]
Backed out changeset f231263334d4 (bug 1255656)

Approval Request Comment
[Feature/regressing bug #]: This bug

There is a permafail wpt test on Aurora is a result of the revert to VS2013. This patch backs out a changeset that was part of the VS2015 switch. It restores the code as it was before we switched to VS2015.
Attachment #8752918 - Flags: approval-mozilla-aurora?
Comment on attachment 8752918 [details] [diff] [review]
Backed out changeset f231263334d4 (bug 1255656)

Landed this a=testonly:
https://hg.mozilla.org/releases/mozilla-aurora/rev/796d757bceee
Attachment #8752918 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/releases/comm-aurora/rev/4c7d9cb42d6a33628a818e0eb3ed54a6f25dca06
Port Bug 1270664 - Downgrade to VS2013 due to unwanted SSE instruction insertion for c-a. r=glandium a=aleth
No idea what happens there, there doesn't seem to be a reason why this test fails. Namely this test ensures that sin(-0) called from within asm.js is equal to sin(-0) === -0 called from within JS.

Potential culprits:
- ShadowStackSize not taken into account? (this is an win x64 only issue, and it is a usual suspect)
- PGO? :(

Any other idea or obvious error you can find in the aurora sources, luke? Otherwise I'll ask for a loaner, SSH into it and try to reproduce the issue.
Flags: needinfo?(bbouvier) → needinfo?(luke)
Didn't mean to clear the needinfo from myself.
Flags: needinfo?(bbouvier)
The failure in testMathLib.js is for a sin call and the impl of sin has some special logic for handling -0 on _WIN64:
  http://hg.mozilla.org/releases/mozilla-aurora/file/tip/js/src/jsmath.cpp#l808
Seems like _WIN64 isn't defined in this file?  Or maybe IsNegativeZero() isn't working?  Seems unrelated to asm.js per se, since it simply calls js::math_sin_uncached.
Flags: needinfo?(luke)
(In reply to Luke Wagner [:luke] from comment #22)
> The failure in testMathLib.js is for a sin call and the impl of sin has some
> special logic for handling -0 on _WIN64:
>  
> http://hg.mozilla.org/releases/mozilla-aurora/file/tip/js/src/jsmath.cpp#l808

This workaround should be in js::math_sin_uncached rather than in js::math_sin_impl. Otherwise the asm.js code path will not pass this workaround, no?  I have no idea why the changeset <https://hg.mozilla.org/mozilla-central/rev/518aafb5a98a> "fixed" the VS2013 Win64 jit-test failure.
Whiteboard: [checkin-needed-aurora]
(In reply to Masatoshi Kimura [:emk] from comment #23)
> This workaround should be in js::math_sin_uncached rather than in
> js::math_sin_impl. Otherwise the asm.js code path will not pass this
> workaround, no?  I have no idea why the changeset
> <https://hg.mozilla.org/mozilla-central/rev/518aafb5a98a> "fixed" the VS2013
> Win64 jit-test failure.

Eh, when I landed that, the MSVC workaround was in math_sin_uncached so it was correct. Then the following patch moved that code to math_sin_impl and it broke:

https://hg.mozilla.org/releases/mozilla-aurora/rev/f1f3e847cdb3

And yeah, this landed in 48.
Attached patch aurora.patchSplinter Review
r=arai from irc.

Try build running: https://treeherder.mozilla.org/#/jobs?repo=try&revision=efd84e597177
Flags: needinfo?(bbouvier)
Attachment #8753783 - Flags: review+
Plz look at: https://groups.google.com/forum/#!topic/mozilla.dev.platform/SCYbdGwxhrI



[...] Processor is that what I'm was looking for...

So I had a look at...
https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=SkPath%3A%3AisRectContour

...then "View ALL products and versions for this signature."...
https://crash-stats.mozilla.com/report/list?date=2016-05-19&range_unit=days&range_value=28&signature=SkPath%3A%3AisRectContour

...and then had on this huge amount of two reports a look... ;-)
https://crash-stats.mozilla.com/report/list?date=2016-05-19&range_unit=days&range_value=28&signature=SkPath%3A%3AisRectContour#tab-reports

I see:
Build Architecture Info = AuthenticAMD family 6 model 4 stepping 2 | 1
Build Architecture Info = GenuineIntel family 6 model 15 stepping 13 | 2

...what IMHO means that the user have just deactivated SSE2-Support in BIOS...



Question is:
If Mozilla will really "Backout MSVC 2015 from aurora" because 2 people are not able to configure there PCs right in BIOS ???
The action of reverting to MSVC 2013 on aurora-48 appears to have caused the regression of a top crasher bug 1158189. It returned on 48 after having been 'cured' on nightly-48 when MSVC 2015 landed. nightly-49 is still using MSVC-2015 and the crash is not present there.

restoring aurora to msvc-2015 would likely fix 1158189 on that channel.

Many details are here: https://groups.google.com/forum/#!topic/mozilla.dev.platform/A9_zQFgXhn8

Its tough to decide what to do. IMO crash rate is our top priority and this is a significant crash, so its worth the pain of making the SSE change on 48 rather than 49 - but its a close call. NeedInfo of all my release-management friends and Nathan too, as he had some interesting infra comments to make on a related dev-platform thread.
Flags: needinfo?(rkothari)
Flags: needinfo?(nfroyd)
Flags: needinfo?(lmandel)
Flags: needinfo?(lhenry)
Flags: needinfo?(dougt)
(In reply to Patrick McManus [:mcmanus] from comment #28)
> The action of reverting to MSVC 2013 on aurora-48 appears to have caused the
> regression of a top crasher bug 1158189. It returned on 48 after having been
> 'cured' on nightly-48 when MSVC 2015 landed. nightly-49 is still using
> MSVC-2015 and the crash is not present there.

That is just wacky that a toolchain change fixes an issue like that.  Maybe there were some atomicness changes in there somewhere.

> Its tough to decide what to do. IMO crash rate is our top priority and this
> is a significant crash, so its worth the pain of making the SSE change on 48
> rather than 49 - but its a close call. NeedInfo of all my release-management
> friends and Nathan too, as he had some interesting infra comments to make on
> a related dev-platform thread.

Over in bug 1186064, we're discussing dropping support for MSVC 2013 entirely; we technically still support building with MSVC 2013, even though we're not running builds on infrastructure with MSVC 2013.  I said I wanted to be pretty sure we had all the infrastructure support in places before we did that, so we didn't have to scramble to back things out.

bug 1186064 comment 22 says we have the app update (bug 1271761) and installer bits (bug 1271759) to ensure we don't install SSE-requiring Firefox on non-SSE machines.  IIUC, we're going to uplift the app update bits to 48 (so 48, as a non-SSE-requiring release, can get properly notified of whether it can upgrade to an SSE-requiring 49), but the installer changes are staying on 49 (so the installer can refuse to install on non-SSE machines for an SSE-requiring 49).  We could bring those forward to 47 and 48 (I think?), but we'd probably want to length the 47 cycle (a couple weeks?  a whole 'nother 6 week cycle?) to ensure everything's going to work properly before releasing 47.

Uplifting all that so late in the cycle makes me nervous, but fixing a topcrash and letting things stabilize a bit more might be worth it?
Flags: needinfo?(nfroyd)
ni Sylvestre as he's on point for 48 for relman. Also clearing the other people in relman as I think Sylvestre can take this and they've already been notified.

To try and summarize, we're talking about trading off an inconsistently reported, but relatively high in some cases, crash with the risk of introducing new issues with the changes required to change our requirements wrt SSE2 one cycle earlier. In order to take this change it has been suggested that we push the release cycle out a couple of weeks. This is possible but has other ramifications around release timing and the all hands and the fall cycle as well as our user base (including ESR). So, not an easy decision. Stability is a top focus right now but fixing a crash that has already been released can probably wait an additional cycle if need be. I'm going to leave the decision to Sylvestre as he's responsible for this release. Sylvestre, happy to talk through this with Patrick and you if need be.
Flags: needinfo?(sledru)
Flags: needinfo?(rkothari)
Flags: needinfo?(lmandel)
Flags: needinfo?(lhenry)
Flags: needinfo?(dougt)
its definitely true the crash/hange isn't new at all. It likely has been there for years and never detected until the shutdown hang detector was added. Nonetheless, its relatively severe.
fwiw I'm actually OK with waiting for 49 - but I want to make sure we make that tradeoff consciously.
To me, having a way to inform our users without SSE2 support is the priority. We cannot update these people and have Firefox crashes on startup without any warning or workaround (ESR for example).

Changing our plan to require from 49 to 48 is adding too much risks to the release (we are touching the installer and the update process).
Therefor, even if I understand we are fixing a top crash (but """only""" a shutdown hang), I would like to delay it to 49.
Flags: needinfo?(sledru)
I believe my work here is done. Unassigning myself.

Actually, I'm not sure why this bug is still open: we've done what the bug summary requested. We have other bugs tracking all the other work around requiring SSE2.
Assignee: gps → nobody
Status: ASSIGNED → NEW
Assignee: nobody → gps
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.