Last Comment Bug 720070 - Crash when producing IEEE754 S-NaN values using Typed Arrays
: Crash when producing IEEE754 S-NaN values using Typed Arrays
Status: VERIFIED FIXED
[sg:critical][qa!] likely regressed b...
: crash, regression, testcase
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla12
Assigned To: Jeff Walden [:Waldo] (remove +bmo to email)
:
Mentors:
Depends on: 720094
Blocks:
  Show dependency treegraph
 
Reported: 2012-01-20 20:15 PST by Joshua Bell
Modified: 2016-02-19 19:25 PST (History)
12 users (show)
choller: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
wontfix
+
verified
+
verified
+
verified
10+
verified
unaffected


Attachments
nancrash.html (647 bytes, text/html)
2012-01-20 20:15 PST, Joshua Bell
no flags Details
Branch patch originally from bug 720094, applies to aurora/beta both (701 bytes, patch)
2012-01-23 17:49 PST, Jeff Walden [:Waldo] (remove +bmo to email)
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
Details | Diff | Review

Description Joshua Bell 2012-01-20 20:15:05 PST
Created attachment 590421 [details]
nancrash.html

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.52.7 (KHTML, like Gecko) Version/5.1.2 Safari/534.52.7

Steps to reproduce:

If you produce an IEEE754 "S-NaN" value using Float64 or Float32 arrays and the appropriate bit pattern, it behaves partly like a JavaScript NaN but has strange side effects. 

var bytes = [0xff, 0xf7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff]; // big-endian
bytes.reverse(); // if testing on little-endian
var result = new Float64Array(new Uint8Array(bytes).buffer)[0];

String(result) ==> "NaN"
typeof result ==> "number"
isNaN(result) ==> false

Repros reliably on MacOS 10.6, and in Firefox 9 (stable), 10 (beta) and 11 (aurora).


Actual results:

Performing arithmetic with the value will cause a crash after a short delay. An immediate crash occurs when doing isNaN(value + 0)


Expected results:

All varieties of IEEE754 NaN (S, Q, +, -) should become proper JS NaN values
Comment 1 Thomas Ahlblom 2012-01-20 22:45:12 PST
Thanks for reporting!

Reproduced:
Mozilla/5.0 (X11; Linux x86_64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (X11; Linux x86_64; rv:11.0a2) Gecko/20120120 Firefox/11.0a2
Mozilla/5.0 (X11; Linux x86_64; rv:12.0a1) Gecko/20120120 Firefox/12.0a1

bp-77d1d1eb-3c0e-4583-883f-4a1432120121
bp-6ffda805-0070-4556-90e4-5b1262120121
bp-d5b951b5-929f-4b36-9bce-528be2120121
bp-795ac814-4230-4f11-8833-2f82f2120121
Comment 2 Thomas Ahlblom 2012-01-20 23:02:12 PST
Last good nightly: 2011-09-23
First bad nightly: 2011-09-24

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=259d1556c221&tochange=c71229984353
Comment 3 Jeff Walden [:Waldo] (remove +bmo to email) 2012-01-20 23:18:59 PST
Out of an abundance of caution I'm going to mark this private.  The usual suspects just CC'd can correct me on this if I'm wrong.

This code should be converting NaNs of all flavors in typed arrays into the canonical NaN (there's a similar version for float):

template<>
void
TypedArrayTemplate<double>::copyIndexToValue(JSContext *cx, JSObject *tarray, uint32_t index, Value *vp)
{
    double val = getIndex(tarray, index);

    /*
     * Doubles in typed arrays could be typed-punned arrays of integers. This
     * could allow user code to break the engine-wide invariant that only
     * canonical nans are stored into jsvals, which means user code could
     * confuse the engine into interpreting a double-typed jsval as an
     * object-typed jsval.
     */
    if (JS_UNLIKELY(JSDOUBLE_IS_NaN(val)))
        val = js_NaN;

    vp->setDouble(val);
}

Yet it's not; JSDOUBLE_IS_NaN says the value's not a NaN.  It looks like the value of JSDOUBLE_HI32_NAN is wrong -- it's 0x7ff80000 when it should be 0x7ff00000.  IEEE-754 says NaN values have unconstrained sign bit (bit 63), all-ones exponent bits (bits 62-52), and unconstrained significand bits (bits 51-0).  0x7ff00000 matches those all-ones bits where 0x7ff80000 doesn't.   That this value could actually be wrong seems absurd, of course.  But it looks like it is to me.

I guess I'll take this, at this point.

The offending changeset seems likely to be <http://hg.mozilla.org/mozilla-central/rev/41e4d29fb76d>, and bug 653056.  I'm not marking a dependency for the moment in case someone malicious happens to notice the dependency on a security-sensitive bug and reverse-engineers the problem.
Comment 4 Jeff Walden [:Waldo] (remove +bmo to email) 2012-01-21 00:26:08 PST
Comment 3's analysis is slightly wrong because I was looking at the wrong part of IEEE-754 and would have treated ±Infinity as a NaN value, but it's otherwise on target.

The MIPS people were the most recent people to touch JSDOUBLE_IS_NaN, and their touching introduced a corrected algorithm for MIPS alone.  I would dearly love to know if they found the method didn't work for them and fixed it without realizing the mistake affected all platforms, or if the change was for a MIPS compiler bug of some sort, or something.

On the plus side, having the extra MIPS implementation provided a convenient excuse for a shell bug.  I've posted the fix for this bug in the public bug 720094, so let's track work on this there.  (And, again, not marking a dependency in case of malicious onlookers.)
Comment 5 Mike Hommey [:glandium] 2012-01-21 00:33:50 PST
Note the MIPS implementation is what we had before for all platforms, but that is miscompiled by MSVC. See bug 653056.
Comment 6 Mike Hommey [:glandium] 2012-01-21 00:39:50 PST
And now I realize you pointed to that bug already.
Comment 7 Jeff Walden [:Waldo] (remove +bmo to email) 2012-01-21 00:44:26 PST
Hmm, sigh.  Nothing's ever easy.  I'll try some other equivalent variants and see if I can finagle Windows into accepting something.  Do you remember which version of MSVC was falling over, and do you have a testcase I could use to check, by any chance?
Comment 8 Mike Hommey [:glandium] 2012-01-21 00:52:52 PST
See bug 640494 comment 35 onwards, there are some information on implementation that *don't* work. ISTR it's triggered by PGO.
Comment 9 Mike Hommey [:glandium] 2012-01-21 00:57:19 PST
(In reply to Mike Hommey [:glandium] from comment #5)
> Note the MIPS implementation is what we had before for all platforms, but
> that is miscompiled by MSVC. See bug 653056.

And I misremembered, we were using isnan, before, but I landed the same as the MIPS implementation for all platforms except windows. Then I changed it again to have something similar on all platforms, which is the current version that is wrong :(.
Comment 10 Joshua Bell 2012-01-21 12:11:18 PST
By the way, I found this running unit tests of a Typed Array "polyfill". The tests include all the funky IEEE754 floating point cases for 32 and 64-bit floats. Please feel free to appropriate any for your test suite.

http://calormen.com/polyfill/typedarray_tests.js

Look for the "IEEE754 single precision parsing" and "IEEE754 double precision parsing" tests.
Comment 11 Jeff Walden [:Waldo] (remove +bmo to email) 2012-01-23 17:38:19 PST
Thanks for that.  Until this is fixed in released builds, I think we're probably going to hold off on too much testing of this in public test suites.

I did take that script, tho, and add the necessary bits to make it run against a fixed build (and skip the tests that require DataView, which is bug 575688, being worked on now).  It passes now, after the patch included in bug 720094.  A fix will be in the next nightly, or possibly the one after that, depending.
Comment 12 Jeff Walden [:Waldo] (remove +bmo to email) 2012-01-23 17:49:46 PST
Created attachment 590954 [details] [diff] [review]
Branch patch originally from bug 720094, applies to aurora/beta both

[Approval Request Comment]
Regression caused by: bug 653056

User impact if declined: Leaving this open should enable arbitrary code execution, with enough effort.

Testing completed (on m-c, etc.): The original testcase no longer crashes opt builds or asserts in debug builds.  I also ran a lightly-munged version of the tests noted in comment 10, and those pass for me.

I'm happy to let this bake a bit, just feel like I should post this sooner rather than later in case there's a chance of getting it into the forthcoming Firefox release.

Risk to taking this patch (and alternatives if risky): NaN-testing is pretty important to the integrity of our JS value representation, so getting this right is important.  There's considerable risk involved in getting it wrong -- which we now face.  Fortunately, the code to evaluate to get this right is extremely minimal, and a comparison of IEEE-754 to the version in this patch should simply verify its correctness.  So while it touches high-risk code, the patch itself should be fairly easy to verify, and the risk shouldn't be more than is involved in touching any code like this.

As usual, I have no idea which trees are really open to fixes right now, this close to another release, so I'm requesting aurora/beta and will let branch triage sort things out.
Comment 13 Alex Keybl [:akeybl] 2012-01-24 16:23:59 PST
Comment on attachment 590954 [details] [diff] [review]
Branch patch originally from bug 720094, applies to aurora/beta both

[Triage Comment]
Curtis, Jeff, and I met to discuss this bug. Jeff was uncomfortable leaving this code execution sg:crit bug in the build for any more than 6 weeks (so we agreed to aurora+), and felt that it would be very easy for a deviant to understand what the patch was attempting to fix. The issue is externally known and found through normal test cases, and we agreed that the bug is sufficiently low-risk. This code is also called significantly in normal browser usage, so we decided to see how it fares on Aurora/m-c and then land between FF10 Beta 6 and our RC build if all goes well.
Comment 14 Jeff Walden [:Waldo] (remove +bmo to email) 2012-01-25 16:32:14 PST
Landed in aurora:

https://hg.mozilla.org/releases/mozilla-aurora/rev/e7312c8640ff

The change still links to the old bug.  I hand-waved a bit about reducing developer complexity considering different implementations on trunk and branches to justify discussing landing, and then landing, it in aurora.  It probably wouldn't throw anyone looking hard off the scent, but since we're planning to take this in beta shortly anyway, it's probably not *too* important, and it might be enough of a headfake anyway.

And I guess since the change landed in m-c, this should be marked fixed now.
Comment 15 Alex Keybl [:akeybl] 2012-01-26 15:38:15 PST
Comment on attachment 590954 [details] [diff] [review]
Branch patch originally from bug 720094, applies to aurora/beta both

[Triage Comment]
Approved for Beta 10 since we have not seen any fallout from m-c/Aurora.
Comment 16 Jeff Walden [:Waldo] (remove +bmo to email) 2012-01-26 15:50:13 PST
https://hg.mozilla.org/releases/mozilla-beta/rev/217edd58f5e5
Comment 17 Jason Smith [:jsmith] 2012-03-07 18:27:19 PST
Verified fixed on FF 10.0.2.
Comment 18 Jason Smith [:jsmith] 2012-03-07 18:32:15 PST
Verified on FF 11 Beta 6.
Comment 19 Jason Smith [:jsmith] 2012-03-07 18:35:56 PST
Verified on FF 12 Aurora.
Comment 20 Jason Smith [:jsmith] 2012-03-07 18:38:40 PST
Verified on Nightly.
Comment 21 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-21 12:33:24 PDT
Verified fixed in Firefox 10.0.6esrpre 2012-06-21.
Comment 22 Ed Morley [:emorley] 2013-03-20 04:49:35 PDT
https://hg.mozilla.org/mozilla-central/rev/893a57ee94bf

Note You need to log in before you can comment on or make changes to this bug.