Closed Bug 1558179 Opened 5 years ago Closed 5 years ago

JS::Value::isDouble doesn't handle NaN correctly - SPARC issue

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: petr.sumbera, Assigned: iain)

References

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

IEEE 754 defines NaN representation (e.g. here: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_math.html):

'NaN: f not equal 0 (at least one bit in f is nonzero)'

Following test:

https://searchfox.org/mozilla-central/source/js/public/Value.h#598

works correctly when NaN is represented by: 7ff8000000000000

but sometimes on SPARC I get NaN as: 7fffffffffffffff (which is correct according to IEEE 754). But Firefox says it's not double.

Following seems to resolve the issue (see 'if __sparc' part):

  void setDouble(double d) {
    // Don't assign to asDouble_ to fix a miscompilation with GCC 5.2.1 and
    // 5.3.1. See bug 1312488.
    *this = Value(d);
#if __sparc
    // NaN (Not-a-Number): f not eq 0 (at least one bit in f is nonzero)
    if (isnan(d)) asBits_ = 0x7ff8000000000000;
#endif
    MOZ_ASSERT(isDouble());
  }
Component: Untriaged → JavaScript Engine
Product: Firefox → Core

Iain, could you take a look at this.

Flags: needinfo?(iireland)
Priority: -- → P2

This is an unfortunate interaction between SpiderMonkey's value representation and SPARC's unique choice of canonical hardware NaN.

SpiderMonkey uses a NaN-boxing scheme to represent JavaScript values. We take advantage of the fact that there are ~2^52 distinct NaN patterns to encode non-double values (objects, strings, ints, bools, etc...) inside the NaN space. To make this work, we pick a canonical NaN pattern to represent an actual NaN. Almost all architectures have standardized on either 0x7ff8000000000000 or 0xfff8000000000000 (the same pattern, but with the sign bit set) as the canonical NaN generated by hardware operations like Inf-Inf, so SpiderMonkey uses 0x7ff8000000000000 as canonical. To the best of my knowledge, SPARC is the only remaining architecture using a different NaN. (MIPS used to use a different payload, but after IEEE 754-2008 formally recommended that the most significant bit should be is_quiet, not is_signalling, MIPS moved towards the more common NaN.)

(Sidenote: the details of our NaN-boxing scheme will be changing in the very near future when bug 1401624 lands, but the general principle remains: SpiderMonkey implicitly assumes that a NaN generated by hardware will be either 0x7ff8000000000000 or 0xfff8000000000000.)

In a variety of places throughout SpiderMonkey, we call canonicalizeNaN to coerce non-canonical NaN values into a canonical form. Unfortunately, because SpiderMonkey is never tested on hardware with a non-canonical native NaN, it's very hard to notice if we miss a case.

If SPARC had JIT support, we'd have to go fix the JITs too. Since SPARC is interpreter-only, a fix in Value.h should be enough, and the performance impact of the extra branch will be dwarfed by the overall slowness of the interpreter.

I've already been cleaning a bunch of this code up for bug 1401624 and bug 1548908. I'll pull a patch together to fix this in Nightly.

Petr: if you are building from source, your proposed fix should work for now, although I would suggest "d = CanonicalizeNaN(d); *this = Value(d);" instead of reimplementing the logic yourself.

Assignee: nobody → iireland
Flags: needinfo?(iireland)
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

There is one more probably related issue I see with js test atomics/basic-tests.js:

Assertion failure: (asBits_ >> 47) <= JSVAL_TAG_OBJECT, at /builds/psumbera/FIREFOX/obj-sparc64-sun-solaris2.11/dist/include/js/Value.h:629

It's coming from isObject() where asBits_ is 0xffffffffffffffff.

asBits_ being 0xffffffffffffffff almost certainly implies that a raw uncanonicalized NaN made it into a Value somehow. I suspect that it was via JS::Value::fromDouble, which I missed when I was writing my previous comment (but fixed in my patch). If you call canonicalizeNaN in fromDouble, does that fix your bug?

Flags: needinfo?(petr.sumbera)

static Value fromDouble(double d) { return Value(CanonicalizeNaN(d)); }

doesn't help. It's not called at all in the test.

Following is minimal test version (note 255, values 128-255 will cause the assertion):

load(libdir + "asserts.js");

var DEBUG = false;              // Set to true for useful printouts


function testInt8Extremes(a) {

    a[10] = 255;
    assertEq(Atomics.load(a, 10), -1);

}

function runTests() {
    // Currently the SharedArrayBuffer needs to be a multiple of 4K bytes in size.
    var sab = new SharedArrayBuffer(12);

    // Test extreme values
    testInt8Extremes(new Int8Array(sab));

    assertEq(Atomics[Symbol.toStringTag], "Atomics");
}

runTests();
Flags: needinfo?(petr.sumbera)

On second thought, I was wrong in my previous post. The canonical hardware NaN on SPARC is 0x7fff_ffff_ffff_ffff, with a sign bit of 0. So the value that you're looking at is not a non-canonicalized hardware NaN; it must come from somewhere else.

There's a good chance this is a bug in the atomics code. Tier-3 hardware like SPARC uses the implementations in Atomic-Operations-feeling-lucky.h, which are unsafe, but "usually functional". (See bug 1347128 for more.) If that's the problem, then you'll have to delve into SPARC atomics yourself. (We'll accept a patch if you get something working.)

If you can get a more complete stack trace of the failure, I might be able to give a little more direction.

PS: Is |assertEq(Atomics[Symbol.toStringTag], "Atomics");| in your testcase actually necessary to make it crash? That would be really weird.

(In reply to Iain Ireland [:iain] from comment #7)

If you can get a more complete stack trace of the failure, I might be able to give a little more direction.

00007fffbfffcd91 JS::Value::isObjectconst+0xa8(7fffb8834218, 7fffb8834228, a, 7fffb8834218, 7fffbf5a8000, 1)
00007fffbfffce41 js::ContextChecks::check+0x28(7fffbfffd7b0, 7fffb8834218, 0, 7fffbfe00001, 1, 7fffb88300d8)
00007fffbfffcf01 CallJSNative+0x2fc(7fffb882d000, 7fffbe0eec58, 7fffbfffdb70, 0, 7fffb889d800, 1)
00007fffbfffcfc1 js::InternalCallOrConstruct+0x8bc(7fffb882d000, 7fffbfffdb70, 0, 7fffb882d000, 7fffbe0eec58, 3193436b3e00)
00007fffbfffd0e1 InternalCall+0x74(7fffb882d000, 7fffbfffdb70, 0, 35, 3193436b3e00, 7fffb8834218)
00007fffbfffd1b1 Interpret+0x7f18(7fffb882d000, 7fffbfffe0b8, 7fffb882d000, 7fffbfffdde8, 7fffb8834218, 0)
00007fffbfffd6f1 js::RunScript+0x164(7fffb882d000, 7fffbfffe0b8, 7fffbfffe088, 7fffb9c31818, 3193436b1b50, 1)
00007fffbfffd7c1 js::ExecuteKernel+0x6bc(3193436b1b50, 7fffbfffe0d8, 319343685040, 7fffbfffe0f0, 0, 0)
00007fffbfffd921 js::Execute+0x90c(7fffbfffe4e0, 7fffbfffe1e8, 20000, 0, 7fffb882d000, 0)
00007fffbfffda51 ExecuteScript+0xd8(7fffb882d000, 7fffbfffe3f0, 7fffbfffe4e0, 0, 0, 3193436b1b50)
00007fffbfffdb11 JS_ExecuteScript+0x64(7fffb882d000, 7fffbfffe4e0, 7fffbffff437, 7fffb9c2a000, 3193436b1b50, 7fffbfffe3e0)
00007fffbfffdc01 RunFile+0x228(7fffb882d000, 7fffbffff437, 7fffb9c31870, 1, 0, 58b12c45f99cd)
00007fffbfffddb1 Process+0x1c8(7fffb882d000, 7fffbffff437, 0, 0, 0, 7fffb9c31870)
00007fffbfffe031 ProcessArgs+0x810(1, 7fffbfffed50, ffffffffffffffff, ffffffffffffffff, e, ffffffffffffffff)
00007fffbfffe251 main+0x1f10(7fffb8882110, 7fffbfffefe8, 7fffbffff068, 100000, 1, 7fffbfffed00)
00007fffbfffe701 _start+0x64(0, 0, 0, 0, 0, 0)

PS: Is |assertEq(Atomics[Symbol.toStringTag], "Atomics");| in your testcase actually necessary to make it crash? That would be really weird.

No. It's not needed.

Thank you!

There are two calls to cx->check in CallJSNative. I'm pretty sure they both eventually resolve to this function here, which accepts a Value and makes sure that it is valid. 0xffff_ffff_ffff_ffff is not a valid Value, so we end up asserting inside isObject.

This implies that either an argument or the return value of a native function call is 0xffff_ffff_ffff_ffff. Given that atomics_load is a native function, I would start looking there.

It's probably a coincidence that -1 = 0xffff_ffff_ffff_ffff = the expected result of Atomics.load, but it is certainly a bit suggestive. If there's some sort of weird sign-extension thing going on, that could also explain why 128-255 (with the top bit set) fail, and 0-127 succeed.

Are you running big-endian? I don't think this code gets tested much on anything that isn't little-endian.

(In reply to Iain Ireland [:iain] from comment #9)

It's probably a coincidence that -1 = 0xffff_ffff_ffff_ffff = the expected result of Atomics.load, but it is certainly a bit suggestive. If there's some sort of weird sign-extension thing going on, that could also explain why 128-255 (with the top bit set) fail, and 0-127 succeed.

The issue seems to be in setIn32. Following makes the test pass:

   void setInt32(int32_t i) {
-    asBits_ = bitsFromTagAndPayload(JSVAL_TAG_INT32, uint32_t(i));
+    asBits_ = bitsFromTagAndPayload(JSVAL_TAG_INT32, PayloadType(i) & UINT32_MAX);
   }

and similar change probably should go at least also into isInt32().

Are you running big-endian? I don't think this code gets tested much on anything that isn't little-endian.

Yes. But this doesn't seem to be the issue here.

Huh. That seems weird.

We are first casting int32_t to uint32_t, which should be a no-op in terms of bit pattern. I believe this is the relevant section of the C++ standard (4.7.2):

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the
number of bits used to represent the unsigned type). (Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern if there is no truncation.)

Then we implicitly cast uint32_t to a PayloadType, which is uint64_t. Converting one unsigned type to an unsigned type of larger rank shouldn't sign-extend.

So the existing code should work. What compiler are you using?

(In reply to Iain Ireland [:iain] from comment #11)

Huh. That seems weird.

Maybe something related with compiler optimization. When I added at the end of setInt32() fprintf which printed out 'i' value it worked.

So the existing code should work. What compiler are you using?

gcc 7.3.0

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8de5da3de4ba
Canonicalize all NaNs on hardware that does not generate canonical NaNs natively r=tcampbell
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: