Closed Bug 720070 Opened 12 years ago Closed 12 years ago

Crash when producing IEEE754 S-NaN values using Typed Arrays

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla12
Tracking Status
firefox9 - wontfix
firefox10 + verified
firefox11 + verified
firefox12 + verified
firefox-esr10 10+ verified
status1.9.2 --- unaffected

People

(Reporter: inexorabletash, Assigned: Waldo)

References

Details

(Keywords: crash, regression, testcase, Whiteboard: [sg:critical][qa!] likely regressed by 653056)

Crash Data

Attachments

(2 files)

Attached file nancrash.html
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.52.7 (KHTML, like Gecko) Version/5.1.2 Safari/534.52.7

Steps to reproduce:

If you produce an IEEE754 "S-NaN" value using Float64 or Float32 arrays and the appropriate bit pattern, it behaves partly like a JavaScript NaN but has strange side effects. 

var bytes = [0xff, 0xf7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff]; // big-endian
bytes.reverse(); // if testing on little-endian
var result = new Float64Array(new Uint8Array(bytes).buffer)[0];

String(result) ==> "NaN"
typeof result ==> "number"
isNaN(result) ==> false

Repros reliably on MacOS 10.6, and in Firefox 9 (stable), 10 (beta) and 11 (aurora).


Actual results:

Performing arithmetic with the value will cause a crash after a short delay. An immediate crash occurs when doing isNaN(value + 0)


Expected results:

All varieties of IEEE754 NaN (S, Q, +, -) should become proper JS NaN values
Thanks for reporting!

Reproduced:
Mozilla/5.0 (X11; Linux x86_64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (X11; Linux x86_64; rv:11.0a2) Gecko/20120120 Firefox/11.0a2
Mozilla/5.0 (X11; Linux x86_64; rv:12.0a1) Gecko/20120120 Firefox/12.0a1

bp-77d1d1eb-3c0e-4583-883f-4a1432120121
bp-6ffda805-0070-4556-90e4-5b1262120121
bp-d5b951b5-929f-4b36-9bce-528be2120121
bp-795ac814-4230-4f11-8833-2f82f2120121
Severity: normal → critical
Crash Signature: [@ js::ToNumberSlow ] [@ JSObject::defaultValue ]
Keywords: crash
OS: Mac OS X → All
Hardware: x86 → All
Version: 11 Branch → Trunk
Assignee: nobody → general
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
QA Contact: untriaged → general
Last good nightly: 2011-09-23
First bad nightly: 2011-09-24

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=259d1556c221&tochange=c71229984353
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
Out of an abundance of caution I'm going to mark this private.  The usual suspects just CC'd can correct me on this if I'm wrong.

This code should be converting NaNs of all flavors in typed arrays into the canonical NaN (there's a similar version for float):

template<>
void
TypedArrayTemplate<double>::copyIndexToValue(JSContext *cx, JSObject *tarray, uint32_t index, Value *vp)
{
    double val = getIndex(tarray, index);

    /*
     * Doubles in typed arrays could be typed-punned arrays of integers. This
     * could allow user code to break the engine-wide invariant that only
     * canonical nans are stored into jsvals, which means user code could
     * confuse the engine into interpreting a double-typed jsval as an
     * object-typed jsval.
     */
    if (JS_UNLIKELY(JSDOUBLE_IS_NaN(val)))
        val = js_NaN;

    vp->setDouble(val);
}

Yet it's not; JSDOUBLE_IS_NaN says the value's not a NaN.  It looks like the value of JSDOUBLE_HI32_NAN is wrong -- it's 0x7ff80000 when it should be 0x7ff00000.  IEEE-754 says NaN values have unconstrained sign bit (bit 63), all-ones exponent bits (bits 62-52), and unconstrained significand bits (bits 51-0).  0x7ff00000 matches those all-ones bits where 0x7ff80000 doesn't.   That this value could actually be wrong seems absurd, of course.  But it looks like it is to me.

I guess I'll take this, at this point.

The offending changeset seems likely to be <http://hg.mozilla.org/mozilla-central/rev/41e4d29fb76d>, and bug 653056.  I'm not marking a dependency for the moment in case someone malicious happens to notice the dependency on a security-sensitive bug and reverse-engineers the problem.
Assignee: general → jwalden+bmo
Group: core-security
Status: NEW → ASSIGNED
Comment 3's analysis is slightly wrong because I was looking at the wrong part of IEEE-754 and would have treated ±Infinity as a NaN value, but it's otherwise on target.

The MIPS people were the most recent people to touch JSDOUBLE_IS_NaN, and their touching introduced a corrected algorithm for MIPS alone.  I would dearly love to know if they found the method didn't work for them and fixed it without realizing the mistake affected all platforms, or if the change was for a MIPS compiler bug of some sort, or something.

On the plus side, having the extra MIPS implementation provided a convenient excuse for a shell bug.  I've posted the fix for this bug in the public bug 720094, so let's track work on this there.  (And, again, not marking a dependency in case of malicious onlookers.)
Note the MIPS implementation is what we had before for all platforms, but that is miscompiled by MSVC. See bug 653056.
And now I realize you pointed to that bug already.
Hmm, sigh.  Nothing's ever easy.  I'll try some other equivalent variants and see if I can finagle Windows into accepting something.  Do you remember which version of MSVC was falling over, and do you have a testcase I could use to check, by any chance?
See bug 640494 comment 35 onwards, there are some information on implementation that *don't* work. ISTR it's triggered by PGO.
(In reply to Mike Hommey [:glandium] from comment #5)
> Note the MIPS implementation is what we had before for all platforms, but
> that is miscompiled by MSVC. See bug 653056.

And I misremembered, we were using isnan, before, but I landed the same as the MIPS implementation for all platforms except windows. Then I changed it again to have something similar on all platforms, which is the current version that is wrong :(.
By the way, I found this running unit tests of a Typed Array "polyfill". The tests include all the funky IEEE754 floating point cases for 32 and 64-bit floats. Please feel free to appropriate any for your test suite.

http://calormen.com/polyfill/typedarray_tests.js

Look for the "IEEE754 single precision parsing" and "IEEE754 double precision parsing" tests.
Thanks for that.  Until this is fixed in released builds, I think we're probably going to hold off on too much testing of this in public test suites.

I did take that script, tho, and add the necessary bits to make it run against a fixed build (and skip the tests that require DataView, which is bug 575688, being worked on now).  It passes now, after the patch included in bug 720094.  A fix will be in the next nightly, or possibly the one after that, depending.
Whiteboard: [sg:critical]
[Approval Request Comment]
Regression caused by: bug 653056

User impact if declined: Leaving this open should enable arbitrary code execution, with enough effort.

Testing completed (on m-c, etc.): The original testcase no longer crashes opt builds or asserts in debug builds.  I also ran a lightly-munged version of the tests noted in comment 10, and those pass for me.

I'm happy to let this bake a bit, just feel like I should post this sooner rather than later in case there's a chance of getting it into the forthcoming Firefox release.

Risk to taking this patch (and alternatives if risky): NaN-testing is pretty important to the integrity of our JS value representation, so getting this right is important.  There's considerable risk involved in getting it wrong -- which we now face.  Fortunately, the code to evaluate to get this right is extremely minimal, and a comparison of IEEE-754 to the version in this patch should simply verify its correctness.  So while it touches high-risk code, the patch itself should be fairly easy to verify, and the risk shouldn't be more than is involved in touching any code like this.

As usual, I have no idea which trees are really open to fixes right now, this close to another release, so I'm requesting aurora/beta and will let branch triage sort things out.
Attachment #590954 - Flags: approval-mozilla-beta?
Attachment #590954 - Flags: approval-mozilla-aurora?
Comment on attachment 590954 [details] [diff] [review]
Branch patch originally from bug 720094, applies to aurora/beta both

[Triage Comment]
Curtis, Jeff, and I met to discuss this bug. Jeff was uncomfortable leaving this code execution sg:crit bug in the build for any more than 6 weeks (so we agreed to aurora+), and felt that it would be very easy for a deviant to understand what the patch was attempting to fix. The issue is externally known and found through normal test cases, and we agreed that the bug is sufficiently low-risk. This code is also called significantly in normal browser usage, so we decided to see how it fares on Aurora/m-c and then land between FF10 Beta 6 and our RC build if all goes well.
Attachment #590954 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Landed in aurora:

https://hg.mozilla.org/releases/mozilla-aurora/rev/e7312c8640ff

The change still links to the old bug.  I hand-waved a bit about reducing developer complexity considering different implementations on trunk and branches to justify discussing landing, and then landing, it in aurora.  It probably wouldn't throw anyone looking hard off the scent, but since we're planning to take this in beta shortly anyway, it's probably not *too* important, and it might be enough of a headfake anyway.

And I guess since the change landed in m-c, this should be marked fixed now.
Attachment #590421 - Attachment mime type: text/plain → text/html
Target Milestone: --- → mozilla12
Whiteboard: [sg:critical] → [sg:critical] likely regressed by 653056
Comment on attachment 590954 [details] [diff] [review]
Branch patch originally from bug 720094, applies to aurora/beta both

[Triage Comment]
Approved for Beta 10 since we have not seen any fallout from m-c/Aurora.
Attachment #590954 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/217edd58f5e5
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [sg:critical] likely regressed by 653056 → [sg:critical][qa+] likely regressed by 653056
Verified fixed on FF 10.0.2.
Verified on FF 11 Beta 6.
Verified on FF 12 Aurora.
Verified on Nightly.
Status: RESOLVED → VERIFIED
Whiteboard: [sg:critical][qa+] likely regressed by 653056 → [sg:critical][qa!] likely regressed by 653056
Group: core-security
Keywords: testcase
Verified fixed in Firefox 10.0.6esrpre 2012-06-21.
Flags: in-testsuite+
Depends on: 720094
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: