Closed
Bug 977538
Opened 11 years ago
Closed 11 years ago
MSVC with PGO still miscompiles/nops CanonicalizeNaN
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
mozilla30
Release | Tracking | Status |
---|---|---|
firefox28 | + | fixed |
firefox29 | + | fixed |
firefox30 | + | verified |
firefox-esr24 | 28+ | fixed |
b2g18 | --- | unaffected |
b2g-v1.1hd | --- | unaffected |
b2g-v1.2 | --- | unaffected |
b2g-v1.3 | --- | unaffected |
b2g-v1.4 | --- | unaffected |
People
(Reporter: jandem, Assigned: jandem)
References
Details
(Keywords: sec-critical, Whiteboard: [adv-main28+][adv-esr24.4+])
Attachments
(2 files)
355 bytes, text/html | Details |
866 bytes, patch | luke: review+ | abillings: approval-mozilla-aurora+, approval-mozilla-beta+, approval-mozilla-esr24+, sec-approval+ | Details | Diff | Splinter Review |
Remember bug 859892, MSVC miscompiling the CanonicalizeNaN call in DataView.getFloat32? I fixed that bug but MSVC is still miscompiling CanonicalizeNaN.
Bug 939562 enables the JITs for more chrome code and this Win32 PGO bug is causing Jetpack crashes, but the reduced testcase also crashes a normal Nightly.
MSVC turns the CanonicalizeNaN call for DataView.getFloat64 into a no-op, so JS code can create arbitrary Values and this is sec-critical.
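To illustrate why a skipped canonicalization matters, here is a minimal C++ sketch of a NaN-boxing scheme. The layout, the tag constants (kTagBase, kTagObject) and every function name are hypothetical and chosen for illustration only; they are not taken from SpiderMonkey's actual Value representation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical layout: a 64-bit pattern whose upper 32 bits are at or above
// kTagBase is read as (tag, payload) instead of as a double.
constexpr uint32_t kTagBase   = 0xFFFFFF80u;      // hypothetical
constexpr uint32_t kTagObject = kTagBase | 0x7u;  // hypothetical

struct BoxedValue {
    uint64_t bits;
};

// Builds a double from raw bits, much as reading 8 bytes out of a buffer does.
inline double doubleFromBits(uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Stores a double WITHOUT canonicalizing NaNs: the failure mode described above.
inline BoxedValue boxDoubleUnchecked(double d) {
    BoxedValue v;
    std::memcpy(&v.bits, &d, sizeof v.bits);
    return v;
}

inline uint32_t tagOf(BoxedValue v)       { return static_cast<uint32_t>(v.bits >> 32); }
inline uint32_t payloadOf(BoxedValue v)   { return static_cast<uint32_t>(v.bits & 0xFFFFFFFFu); }
inline bool isBoxedDouble(BoxedValue v)   { return tagOf(v) < kTagBase; }
inline bool looksLikeObject(BoxedValue v) { return tagOf(v) == kTagObject; }

// Usage: a NaN with an attacker-chosen payload is misread as an object reference.
inline void demoForgedValue() {
    BoxedValue v = boxDoubleUnchecked(doubleFromBits(0xFFFFFF8741414141ULL));
    assert(looksLikeObject(v));
    assert(payloadOf(v) == 0x41414141u);
}
```

In a scheme like this, only a canonical NaN is safe to box as a double, which is why a canonicalization call that silently does nothing lets script-controlled bytes masquerade as tagged values.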
For CanonicalizeNaN, MSVC with PGO generates the following code, annotated:
// Prologue.
mozjs!JS::CanonicalizeNaN:
680ef5f0 55 push ebp
680ef5f1 8bec mov ebp,esp
680ef5f3 83ec0c sub esp,0Ch
// Move the double argument to ebp-8. Also save esi and set it to 0.
680ef5f6 dd4508 fld qword ptr [ebp+8]
680ef5f9 56 push esi
680ef5fa 33f6 xor esi,esi
680ef5fc dd5df8 fstp qword ptr [ebp-8]
// Compare esi with itself and jump to "Done." if the operands differ.
// A register always equals itself, so this branch is never taken.
680ef5ff 3bf6 cmp esi,esi
680ef601 7513 jne mozjs!JS::CanonicalizeNaN+0x26 (680ef616)
// mozilla::IsNaN does (bits & DoubleExponentBits) == DoubleExponentBits,
// so this looks reasonable. If this test fails, we're done.
680ef603 8b45fc mov eax,dword ptr [ebp-4]
680ef606 250000f07f and eax,7FF00000h
680ef60b 3d0000f07f cmp eax,7FF00000h
680ef610 0f849b911900 je mozjs!JS::CanonicalizeNaN+0x1991c1 (682887b1)
// Done, return the double and restore esi.
680ef616 dd4508 fld qword ptr [ebp+8]
680ef619 5e pop esi
680ef61a c9 leave
680ef61b c3 ret
The branch that's never taken is a bit weird for an opt build, but so far so good. Here's what happens when we have a NaN value and jump to 682887b1:
// Load the high word in edx, low word in eax.
682887b1 8b55fc mov edx,dword ptr [ebp-4]
682887b4 8b45f8 mov eax,dword ptr [ebp-8]
// mozilla::IsNaN does: (bits & DoubleSignificandBits) != 0
// DoubleSignificandBits == 0x000fffff ffffffff, so the and instruction below
// makes some sense.
682887b7 81e2ffff0f00 and edx,0FFFFFh
// The code below is totally bogus, we "or" both words, but whatever
// happens we jump to "Done." and return the original input.
682887bd 0bc2 or eax,edx
682887bf 0f84516ee6ff je mozjs!JS::CanonicalizeNaN+0x26 (680ef616)
682887c5 e94c6ee6ff jmp mozjs!JS::CanonicalizeNaN+0x26 (680ef616)
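For comparison with the disassembly above, this is a rough C++ sketch of the logic the annotations describe: both bit tests from mozilla::IsNaN, followed by the replacement that the generated code never performs. The constant and function names are hypothetical stand-ins, and using std::numeric_limits<double>::quiet_NaN() as the canonical NaN is an assumption rather than the engine's actual choice.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical stand-ins for the constants named in the annotations.
constexpr uint64_t kExponentBits    = 0x7FF0000000000000ULL;
constexpr uint64_t kSignificandBits = 0x000FFFFFFFFFFFFFULL;

// Reinterpret a double as its 64-bit pattern without aliasing violations.
inline uint64_t bitsOf(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

inline double fromBits(uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// A NaN has all exponent bits set AND a non-zero significand.
inline bool isNaNBits(double d) {
    uint64_t bits = bitsOf(d);
    return (bits & kExponentBits) == kExponentBits &&
           (bits & kSignificandBits) != 0;
}

// Expected behavior: every NaN input maps to one canonical NaN.
inline double canonicalizeNaNSketch(double d) {
    const double canonical = std::numeric_limits<double>::quiet_NaN();
    double result = isNaNBits(d) ? canonical : d;
    // Postcondition: a NaN result always carries the canonical bit pattern.
    assert(!isNaNBits(result) || bitsOf(result) == bitsOf(canonical));
    return result;
}
```

In the miscompiled version, both outcomes of the significand test jump back to "Done.", so the replacement on the NaN path is never executed and the input bits are returned unchanged.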
Assignee | ||
Updated•11 years ago
|
Summary: MSVC PGO builds still miscompiles/nops CanonicalizeNaN → MSVC with PGO still miscompiles/nops CanonicalizeNaN
Assignee | ||
Updated•11 years ago
|
Keywords: sec-critical
Assignee | ||
Comment 1•11 years ago
|
||
My current plan of attack is to disable PGO for JS::CanonicalizeNaN and see if that helps.
But we should also consider disabling PGO completely for (big parts of) JS. The perf win from PGO should be a lot less than in the interpreter days, and even if we lose a few % on the benchmarks we can make up for that elsewhere. Somebody should measure.
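One way to exempt a single function from the optimizer with MSVC is the `#pragma optimize` directive. The sketch below is an assumption about how that could look, not a copy of the attached patch: the function name is hypothetical, std::isnan stands in for mozilla::IsNaN, and whether switching optimizations off is enough to keep PGO from mangling the function is exactly what would need testing.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

#ifdef _MSC_VER
// Assumption: turning optimizations off for this function also keeps the
// PGO-driven transformation away from it.
#  pragma optimize("", off)
#endif
double canonicalizeNaNUnoptimized(double d) {
    if (std::isnan(d))
        return std::numeric_limits<double>::quiet_NaN();
    return d;
}
#ifdef _MSC_VER
// Restore the optimization settings given on the command line.
#  pragma optimize("", on)
#endif

// Usage: NaN inputs stay NaN, ordinary doubles pass through untouched.
inline void demoUnoptimized() {
    assert(std::isnan(canonicalizeNaNUnoptimized(std::nan("1"))));
    assert(canonicalizeNaNUnoptimized(2.5) == 2.5);
}
```

The pragma must sit outside the function body and only affects functions defined after it, so the off/on pair brackets just the one definition. Other compilers skip the guarded lines entirely.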
Comment 2•11 years ago
|
||
As of two years ago PGO on Windows was still good for a 10% improvement on Sunspider:
https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/HzAIVijRXUE That was from bug 641325.
Assignee | ||
Comment 3•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #2)
> As of two years ago PGO on Windows was still good for a 10% improvement on
> Sunspider:
With our new JITs we should spend less time in the interpreter/VM though, so I expect this to be less nowadays. I'll get some numbers.
How can I get a Try build with --disable-profiling --disable-js-diagnostics? I assume --enable-profiling affects both PGO and non-PGO builds, but ideally we'd compare without it.
Assignee | ||
Comment 4•11 years ago
|
||
I downloaded PGO and non-PGO inbound builds, created a new profile and ran some benchmarks.
On Sunspider, PGO still helps about 10%. Sunspider is kind of a best case for PGO though, because it's short-running, so we spend more time in the interpreter/VM than in other benchmarks. Most of the win comes from a few tests; it would be interesting to see where PGO is helping us and if we can add JIT/C++ optimizations to get there without PGO.
On Kraken, PGO helps about 3-4%. Kraken spends more time in JIT code. Octane is a bit more noisy, but it looks like PGO is a ~5% win.
So PGO is still a measurable perf win. Question is if we really need PGO for all of JS or just a small number of files (Interpreter.cpp, jsobj.cpp, etc).
It's really unfortunate that our shell fuzzers are not testing the code we run in the browser. Is this something we can easily fix?
Assignee | ||
Comment 5•11 years ago
|
||
While I was stepping through the code, I noticed that MSVC with PGO was not inlining many trivial functions like Value::toObject(), CallArgs::rval() etc. Also note that CanonicalizeNaN in comment 0 is not inlined.
With a non-PGO build, all these methods *are* inlined. So I created a silly micro-benchmark to see which one is faster:
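function f() {
    // 8-byte buffer, enough to hold a single float64.
    var buffer = new Uint8Array(8);
    var view = new DataView(buffer.buffer);
    var t = new Date;
    // Call getFloat64 ten million times and report the elapsed milliseconds.
    for (var i = 0; i < 10000000; i++)
        view.getFloat64(0);
    alert(new Date - t);
}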
And indeed, PGO builds are much slower (663 ms with PGO, 369 ms without PGO).
This suggests that PGO builds don't just optimize hot code, they also deoptimize cold code. If this is true, disabling PGO for code not exercised in our profile run could actually be a win...
Comment 6•11 years ago
|
||
In general, PGO not inlining cold code is one of its most important features: hot code is optimized for speed and cold code is optimized for size because overall that produces the fastest result (because of cache miss rates etc).
And I don't think this s-s bug is the right place to discuss our overall PGO strategy. Let's fix the bug at hand by removing this particular function from PGO in the simplest way possible (or figuring out why it's miscompiling and working around it, though that seems harder).
Assignee | ||
Comment 7•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #6)
> In general, PGO not inlining cold code is one of its most important
> features: hot code is optimized for speed and cold code is optimized for
> size because overall that produces the fastest result (because of cache miss
> rates etc).
I get that, but if you decide what's hot based on an outdated benchmark like Sunspider and the compiler deoptimizes the other 80% of the code you'll lose on (real-world) workloads.
> And I don't think this s-s bug is the right place to discuss our overall PGO
> strategy.
Agreed. Sorry, I'll take this elsewhere.
Assignee | ||
Comment 8•11 years ago
|
||
Disable PGO for CanonicalizeNaN. Jetpack tests are green now (on top of bug 939562):
https://tbpl.mozilla.org/?tree=Try&rev=42153df0d9d1
Attachment #8383165 -
Flags: review?(luke)
Assignee | ||
Comment 9•11 years ago
|
||
I tested Firefox 27 and 29 and they don't crash, so this seems to only affect Nightly. That makes it a lot less scary. I'll backport the patch because it's trivial and in case other callers have the same problem.
status-firefox30:
--- → affected
tracking-firefox30:
--- → ?
Comment 10•11 years ago
|
||
Comment on attachment 8383165 [details] [diff] [review]
Patch
Nice job tracking this down Jan!
Attachment #8383165 -
Flags: review?(luke) → review+
Assignee | ||
Comment 11•11 years ago
|
||
Comment on attachment 8383165 [details] [diff] [review]
Patch
AFAIK this only affects m-c. Asking for sec-approval though because I don't know when this was introduced and it *may* affect older branches somehow, so I'd like to backport the patch.
[Security approval request comment]
> How easily could an exploit be constructed based on the patch?
Not very easy. There are multiple callers of this function and not all of them are affected.
> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No.
> Which older supported branches are affected by this flaw?
It only affects Nightly. However, this may cause similar problems in older versions so I'd like to backport it to be safe.
> If not all supported branches, which bug introduced the flaw?
Unknown.
> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
Should apply.
> How likely is this patch to cause regressions; how much testing does it need?
Unlikely.
Attachment #8383165 -
Flags: sec-approval?
Comment 12•11 years ago
|
||
FWIW, 27.0.1 has two copies of this function, one of which (called from js::ctypes::ConvertToJS, and possibly elsewhere) has exactly the same disassembly as comment 0. The other copy (called from js::DataViewObject::getFloat64Impl, and possibly elsewhere) looks OK at first glance.
Comment 13•11 years ago
|
||
Comment on attachment 8383165 [details] [diff] [review]
Patch
sec-approval+ for trunk.
We'll need discussion with Release Management about taking it on Beta but if you make an Aurora patch, I can approve that as well.
Attachment #8383165 -
Flags: sec-approval? → sec-approval+
Updated•11 years ago
|
Assignee | ||
Comment 14•11 years ago
|
||
Assignee | ||
Comment 15•11 years ago
|
||
Comment on attachment 8383165 [details] [diff] [review]
Patch
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Unknown.
User impact if declined: Possible crashes or security issues.
Testing completed (on m-c, etc.): On m-i.
Risk to taking this patch (and alternatives if risky): Low.
String or IDL/UUID changes made by this patch: None.
Attachment #8383165 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 16•11 years ago
|
||
Comment on attachment 8383165 [details] [diff] [review]
Patch
Patch also applies to beta.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Unknown.
User impact if declined: Possible crashes or security issues.
Testing completed (on m-c, etc.): On m-i.
Risk to taking this patch (and alternatives if risky): Low.
String or IDL/UUID changes made by this patch: None.
[Approval Request Comment]
User impact if declined: Possible crashes and/or security issues.
Fix Landed on Version: m-c, but will be backported.
Risk to taking this patch (and alternatives if risky): Low.
String or UUID changes made by this patch: None.
Attachment #8383165 -
Flags: approval-mozilla-esr24?
Attachment #8383165 -
Flags: approval-mozilla-beta?
Updated•11 years ago
|
Attachment #8383165 -
Flags: approval-mozilla-esr24?
Attachment #8383165 -
Flags: approval-mozilla-esr24+
Attachment #8383165 -
Flags: approval-mozilla-beta?
Attachment #8383165 -
Flags: approval-mozilla-beta+
Attachment #8383165 -
Flags: approval-mozilla-aurora?
Attachment #8383165 -
Flags: approval-mozilla-aurora+
Updated•11 years ago
|
status-firefox28:
--- → affected
status-firefox29:
--- → affected
status-firefox-esr24:
--- → affected
tracking-firefox28:
--- → +
tracking-firefox29:
--- → +
tracking-firefox-esr24:
--- → +
Comment 17•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #6)
> Let's fix the bug at hand by removing this particular function
> from PGO in the simplest way possible (or figuring out why it's miscompiling
> and working around it, though that seems harder).
Isn't figuring out why it's being miscompiled a requirement to prevent this from happening again with other functions? If MSVC's PGO can cause such critical problems, maybe this is not the only case.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Comment 19•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/1150740733cb
https://hg.mozilla.org/releases/mozilla-beta/rev/7bf309f95730
https://hg.mozilla.org/releases/mozilla-esr24/rev/cd943018fd5b
status-b2g18:
--- → unaffected
status-b2g-v1.1hd:
--- → unaffected
status-b2g-v1.2:
--- → unaffected
status-b2g-v1.3:
--- → unaffected
status-b2g-v1.4:
--- → unaffected
Comment 20•11 years ago
|
||
Updated•11 years ago
|
Whiteboard: [adv-main28+][adv-esr24.4+]
Comment 22•11 years ago
|
||
Confirmed crash in Fx30, 2014-02-14.
Verified fix in Fx30, 2014-03-12.
I never saw a crash in other branches, and based on comment 9, it appears to have been backported only for good measure. So, no QA verification on 24esr/28/29 will be done.
Status: RESOLVED → VERIFIED
Updated•11 years ago
|
Updated•11 years ago
|
Group: core-security
Comment 23•7 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/2edc56eddf55
Land the attached testcase as a crashtest. r=me
Updated•7 years ago
|
Flags: in-testsuite? → in-testsuite+
Comment 24•7 years ago
|
||
bugherder |