Open Bug 1971519 Opened 7 months ago Updated 2 months ago

Crash [@ ??] with --disable-main-thread-denormals on ARM64

Categories

(Core :: JavaScript: WebAssembly, defect, P1)

ARM64
Linux
defect

Tracking

()

ASSIGNED
Tracking Status
firefox-esr140 --- affected
firefox140 --- wontfix
firefox141 --- wontfix

People

(Reporter: decoder, Assigned: nbp)

References

Details

(5 keywords, Whiteboard: [jsbugmon:update,bisect][fuzzblocker])

Crash Data

Attachments

(3 files, 5 obsolete files)

The attached testcase crashes on mozilla-central revision 20250610-5223faf9adce (build with opt, run with --fuzzing-safe --ion-offthread-compile=off --disable-main-thread-denormals).

Backtrace:

    Program terminated with signal SIGTRAP, Trace/breakpoint trap.
    #0  0x000014b9b0051398 in ?? ()
    #1  0x00002a763466afb0 in ?? ()
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)
    x0	0x1	-9223372036854775807
    x1	0x848f3420	281472905720864
    x2	0xd2320a48	281474208238152
    x3	0xb0051200	22787754627584
    x4	0xd23205c0	281474208236992
    x5	0xd23205c8	281474208237000
    x6	0xd23206c8	281474208237256
    x7	0x0	-1407374883553280
    x8	0x1	-9223372036854775807
    x9	0xc7a23a00	-6408831667529434624
    x10	0x0	9218868437227405312
    x11	0x34604d80	46687173234048
    x12	0xd23206d0	281474208237264
    x13	0xffffffff	4294967295
    x14	0xffffffff	4294967295
    x15	0x0	0
    x16	0xffff0000	-65536
    x17	0x0	0
    x18	0x1	1
    x19	0xb0006840	22787754321984
    x20	0xfffff800	4294965248
    x21	0xd2320b20	281474208238368
    x22	0xafff1580	22787754235264
    x23	0x848f3420	281472905720864
    x24	0x9a	154
    x25	0xd2320eb8	281474208239288
    x26	0xd2320eb8	281474208239288
    x27	0x0	-985162418487296
    x28	0xd2320a00	281474208238080
    x29	0xd2320a20	281474208238112
    x30	0xb00513e4	22787754628068
    sp	0xd2320a00	281474208238080
    pc	0xb0051398	22787754627992
    cpsr	[ EL=0 SSBS C N ]
    fpcsr	void
    fpcr	0x1000000	16777216
    => 0x14b9b0051398:	brk	#0xf000
       0x14b9b005139c:	ldr	x21, [x23]

Marking s-s per discussion about denormals.

Attached file Testcase
Flags: needinfo?(nicolas.b.pierron)
Assignee: nobody → nicolas.b.pierron
Severity: -- → S3
Status: NEW → ASSIGNED
Priority: -- → P1

The failing breakpoint is located at the same position as in bug 1940716 comment 4, with the following sequence of instructions:

0x2fdbf9daebc0 add     sp, sp, #0x10 (16)
0x2fdbf9daebc4 mov     x28, sp
0x2fdbf9daebc8 ret
0x2fdbf9dbe368 cbnz    w0, #-0xf8 (addr 0x2fdbf9dbe270)
0x2fdbf9dbe270 ldr     x8, [x29, #40]
0x2fdbf9dbe274 asr     x16, x8, #47
0x2fdbf9dbe278 cmn     w16, #0x4 (4)
0x2fdbf9dbe27c b.eq    #+0x98 (addr 0x2fdbf9dbe314)
0x2fdbf9dbe280 cmn     w16, #0xa (10)
0x2fdbf9dbe284 b.eq    #+0x80 (addr 0x2fdbf9dbe304)
0x2fdbf9dbe288 cmn     w16, #0xc (12)
0x2fdbf9dbe28c b.eq    #+0x70 (addr 0x2fdbf9dbe2fc)
0x2fdbf9dbe290 cmn     w16, #0xf (15)
0x2fdbf9dbe294 b.eq    #+0x40 (addr 0x2fdbf9dbe2d4)
0x2fdbf9dbe298 cmn     w16, #0x10 (16)
0x2fdbf9dbe29c b.ls    #+0x8 (addr 0x2fdbf9dbe2a4)
0x2fdbf9dbe2a4 fmov    d16, x8
0x2fdbf9dbe2a8 fjcvtzs w0, d16
0x2fdbf9dbe2ac b.ne    #+0x70 (addr 0x2fdbf9dbe31c)
0x2fdbf9dbe31c brk     #0xf000

One first thing to note is that the arm64 simulator now seems to be using the FJCVTZS instruction, compared to bug 1940716.

Decoder, what device have you use to find this issue? To be precise what CPU?
( see https://gist.github.com/jandem/e6b5660975f145b85d533b97ad054f08?permalink_comment_id=5563383 )

The issue I am seeing with the ARM64 simulator does not look armful:

0x2fdbf9dbe2a4 fmov    d16, x8
0x2fdbf9dbe2a8 fjcvtzs w0, d16
0x2fdbf9dbe2ac b.ne    #+0x70 (addr 0x2fdbf9dbe31c)
x8: 0x8000000000000001
v16: 0x00000000000000008000000000000001
x0: 0x0000000000000000
Z = 0

Z = 0, points to an error, which triggers the breakpoint, but in this case this is most likely coming from the value which flows in x8.

However, this does not match the reported registers, as x0 == 0x8000000000000001 in comment 0 (careful, these are 64 bits registers)

Flags: needinfo?(nicolas.b.pierron) → needinfo?(choller)

The trace here is from a Cortex-A72 CPU.

But the fuzzing runs on GCP, which I believe uses Ampere Altra. So this was reproduced on two distinct CPUs.

Flags: needinfo?(choller)

The problem I am seeing with the simulator is that the JS test case basically calls with the negated Number.MIN_VALUE which does not reflect the disabled denormals, and thus fails when converting the double to an integer, as this would be mapped to negative zero.

The only solution I see would be to change the "MIN_VALUE" to be the smallest non-denormals when denormals are disabled.

js/src/jsnum.cpp:

JS_DOUBLE_PS("MIN_VALUE", MinNumberValue<double>(),
             JSPROP_READONLY | JSPROP_PERMANENT),

mfbt/FloatingPoint.h:

/** Computes the smallest non-zero positive float/double value. */
template <typename T>
static constexpr MOZ_ALWAYS_INLINE T MinNumberValue() {
  return std::numeric_limits<T>::denorm_min();
}

Ideally we should change the MinNumberValue to reflect denormal status and update Number.MIN_VALUE when denormals are enabled/disabled to return the correct value or make it call MinNumberValue.

Paul, do you see any issue with changing MinNumberValue to reflect the disabled denormals?
The idea would be something like:

/** Computes the smallest non-zero positive float/double value. */
template <typename T>
static constexpr MOZ_ALWAYS_INLINE T MinNumberValue() {
  T min = std::numeric_limits<T>::denorm_min();
  if (min != T()) {
    return min;
  }
  return std::numeric_limits<T>::min();
}
Flags: needinfo?(padenot)

(In reply to Nicolas B. Pierron [:nbp] from comment #4)

However, this does not match the reported registers, as x0 == 0x8000000000000001 in comment 0 (careful, these are 64 bits registers)

Running without fjcvtsz yield the same register value, and given the test case I would not expect any additional code path.

We switched the fuzzing to ARMv9 and this is now a fuzzblocker.

Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update,bisect][fuzzblocker]

The std::denorm_min() value already compute the minimal value, even when denormals are disabled.
The problem comes from MIN_VALUE which baked in the denormal minimal value, instead of the minimal value depending on the execution context.

Trying to change MIN_VALUE as follows:

JS_PSG("MIN_VALUE", NumberMinValue, JSPROP_PERMANENT),

where NumberMinValue is defined as:

bool NumberMinValue(JSContext* cx, unsigned argc, Value* vp) {
  CallArgs args = CallArgsFromVp(argc, vp);
  args.rval().setDouble(MinNumberValue<double>());
  return true;
}

causes the failure of test262/built-ins/Object/getOwnPropertyDescriptor/15.2.3.3-4-197.js
Even when denormals are not disabled. I do not know to which extend this we can violate the ECMAScript spec in these conditions…

However, fixing MIN_VALUE to actually be a valid double value does not seems to fix the crash on WebAssembly entry…

Flags: needinfo?(padenot)

(In reply to Nicolas B. Pierron [:nbp] from comment #9)

However, fixing MIN_VALUE to actually be a valid double value does not seems to fix the crash on WebAssembly entry…

This does fix the issue … the problems are that:

  • std::numeric_limits<double>::denorm_min() does not seems to work as described and only returns the denormal value.
  • Optimizing compilers are elide conditions which are comparing denorm_min against zero as false, despite being an unknown fact at runtime.

(In reply to Nicolas B. Pierron [:nbp] from comment #6)

Paul, do you see any issue with changing MinNumberValue to reflect the disabled denormals?

I don't, but that doesn't mean much.

This bug prevents fuzzing from making progress; however, it has low severity. It is important for fuzz blocker bugs to be addressed in a timely manner (see here why?).
:nbp, could you consider increasing the severity?

For more information, please visit BugBot documentation.

Flags: needinfo?(nicolas.b.pierron)

I do not understand this code … The input value is 0x8000000000000001, which should be interpreted to -0 when denormals are disabled.

Arm64 check if the double has to be boxed and and thus jump into CoerceInPlace_JitEntry, which check that this is a number and determine that there is nothing to be done, and thus does nothing, before hitting a break point because we failed to convertValueToWasmAnyRef.

x64 successfully convert the same value to an int32 value of 1. Which from my point of view is even more incorrect than the arm64 behavior.

I have a few questions which are preventing me to make progress as I do not understand what this code is supposed to do:

  • Why convertValueToWasmAnyRef and branchValueConvertsToWasmAnyRefInline fail when we have double inputs? I understand that we might be interested in coercing double to int32, but I do not understand why we fallback to CoerceInPlace_JitEntry or hit a breakpoint if we cannot give a double input. Are double inputs supposed to be handled elsewhere?
  • Why AnyRef goes through all these checks? If AnyRef is supposed to represent any JS::Value, shouldn't this be a no-op?
Flags: needinfo?(nicolas.b.pierron) → needinfo?(rhunt)

(In reply to Nicolas B. Pierron [:nbp] from comment #14)

I have a few questions which are preventing me to make progress as I do not understand what this code is supposed to do:

  • Why convertValueToWasmAnyRef and branchValueConvertsToWasmAnyRefInline fail when we have double inputs? I understand that we might be interested in coercing double to int32, but I do not understand why we fallback to CoerceInPlace_JitEntry or hit a breakpoint if we cannot give a double input. Are double inputs supposed to be handled elsewhere?

The wasm JIT entry conversion code is confusing. It first does a loop over all arguments to see if we can convert the JS::Value into a wasm::AnyRef without any complicated code (like boxing the value). If anything in that loop fails, we jump to an OOL call to CoerceInPlace_JitEntry which will do the boxing and conversions and write the results back in the original JS::Value argv. After that we rejoin the JIT entry code in a loop which does the actual inline conversion, using the now boxed value stored in argv.

So for a double of -1 (which needs to be boxed in WasmValueBox to fit in a wasm::AnyRef), we will first fail branchValueConvertsToWasmAnyRefInline here [1], then jump out of the loop and call CoerceInPlace_JitEntry. That will check if AnyRef::valueNeedsBoxing(), which checks AnyRef::doubleNeedsBoxing() [3], and will return true for -1. We box the value and store it in argv. Then in the conversion loop in JIT entry we will use convertValueToWasmAnyRef [4] to read the boxed value.

Step [4] has an asssertion style breakpoint for in the case that we fail to convert the wasm anyref inline. It means that either branchValueConvertsToWasmAnyRefInline is out of sync and said something could be converted inline but actually can't. Or else that CoerceInPlace_JitEntry didn't do the correct conversions so that inline conversion is infallible.

In this test case, it seems the issue is that CoerceInPlace_JitEntry is not boxing the double when it should be. I opened it up in the debugger and found things go wrong in AnyRef::doubleNeedsBoxing(0x8000000000000001). It looks like mozilla::NumberIsInt32(0x8000000000000001) returns true and thinks that it's actually 0. Which then causes the code to think it doesn't need to be boxed, which fails the following assertion. I'm guessing the issue is somewhere here [5].

  • Why AnyRef goes through all these checks? If AnyRef is supposed to represent any JS::Value, shouldn't this be a no-op?

Some JS::Values must be boxed (like doubles outside of a signed 31-bit range), and so we need to call into the VM to do that.

[1] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/js/src/jit/MacroAssembler.cpp#7250
[2] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/js/src/wasm/WasmBuiltins.cpp#1122
[3] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/js/src/wasm/WasmAnyRef.h#251
[4] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/js/src/wasm/WasmStubs.cpp#1197-1205
[5] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/mfbt/FloatingPoint.h#370

Flags: needinfo?(rhunt)

This is probably unrelated, but I filed bug 1974307 for a thinko in doubleNeedsBoxing that I found.

(In reply to Ryan Hunt [:rhunt] from comment #16)

This is probably unrelated, but I filed bug 1974307 for a thinko in doubleNeedsBoxing that I found.

Thank, I myself made a patch for handling -0 cases.
This should fix the issue seen with the current test case.

Whiteboard: [jsbugmon:update,bisect][fuzzblocker] → assertion, bugmon, crash, regression, testcase
Whiteboard: assertion, bugmon, crash, regression, testcase → [jsbugmon:update,bisect][fuzzblocker]

(In reply to Nicolas B. Pierron [:nbp] from comment #17)

This should fix the issue seen with the current test case.

Do you mean that Ryan's patch in bug 1974307 should fix this bug ("the current test case")? Or were you referring to the patch you made that might be different from Ryan's?

What is an appropriate security rating for this crash?
If it's a security bug how far back does it go? Presumably the recent ESR-140 at least?

Flags: needinfo?(nicolas.b.pierron)
Keywords: sec-low
Attachment #9497238 - Attachment description: Bug 1971519 - Fix IsNegativeZero to handle denormals. → (secure)

Add a flushDenormalsToZero masm method which is used on double
values before doing a conversion to anyref. This ensures we
have well defined behavior in the JIT and C++ code on this
code path.

Multiply by 1.0 was chosen because it is guaranteed to not
change the sign, unlike adding zero. This is not a hot-code
path, so I don't think the overhead is going to be too
significant.

I attached an alternate targeted patch that fixes this issue (and the related failure from bug 1940716).

Nicolas, can we declassify this bug? This bug will always result in a safe crash in JIT code due to the breakpoint instruction.

Flags: needinfo?(nicolas.b.pierron)

I believe we can remove patches and hide comments so they are not visible. At a minimum the sec-low rating should be removed, as this is not a security bug. The only reason we'd keep it hidden is because other security issues were discussed here.

Keywords: sec-lowsec-other

(In reply to Nicolas B. Pierron [:nbp] from comment #24)

First reason (Bug 1974481) briefly mentioned in comment 14.

This was a mistake from my part, what happens is that the denormal value is correctly interpreted as 0, except that the lowest bit is used to tag the int31_t type used by WebAssembly, thus leading to the 1 result, leading to this miss-conception.

Second reason (Bug 1976016) is hiding an OOB which has the same root-cause as the patch attached on this bug.

After much in depth verification, none of the 0 / -0 miss-attribution can be leveraged into an exploitable out-of-bound. These would remain as bad application of the JS standard.

(In reply to Ryan Hunt [:rhunt] from comment #23)

Nicolas, can we declassify this bug? This bug will always result in a safe crash in JIT code due to the breakpoint instruction.

Yes, now we can.

Attachment #9496009 - Attachment is obsolete: true

Comment on attachment 9497238 [details]
Bug 1971519 - Fix IsNegativeZero to handle denormals.

This is a different can of worms that would have to be addressed if we ever want to be spec compliant on denormal-disabled execution context. (including the MIN_VALUE patch)

Attachment #9497238 - Attachment is obsolete: true
Group: javascript-core-security

Unable to reproduce bug 1971519 using build mozilla-central 20250610214301-5223faf9adce. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon
Attachment #9506374 - Attachment description: (secure) → Bug 1971519 - Clean-up zeros before boxing doubles.
Attachment #9506375 - Attachment description: (secure) → Bug 1971519 - masm: Rename canonicalize{} to canonicalize{}NaN.
Attachment #9500164 - Attachment description: (secure) → WIP: Bug 1971519 - WIP - add masm.flushDenormalsToZero
Attachment #9500164 - Attachment is obsolete: true

I have not landed any patches yet, as I am no longer able to reproduce the issue. This seems to have been fixed by Bug 1977692.

Looking through the code, this seems like a legitimate fix, as -0 are now accepted as valid input which are coerced to 0 (int31)

Thus, only the test case remains to land.

Attachment #9497238 - Attachment description: (secure) → Bug 1971519 - Fix IsNegativeZero to handle denormals.
Attachment #9506374 - Attachment is obsolete: true
Attachment #9506375 - Attachment is obsolete: true

Unfortunately we'll need to revert bug 1977692 in bug 1992240. The spec was rewritten to clarify that -0 should not coerce to +0 for i31ref values, as that breaks identity roundtripping for these values. We'll probably want to fix the denormal issue so that it stops crashing on this test case, although my understanding is that the crash is not security sensitive.

See Also: → 1992240
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: