Crash [@ ??] with --disable-main-thread-denormals on ARM64
Categories
(Core :: JavaScript: WebAssembly, defect, P1)
Tracking
()
People
(Reporter: decoder, Assigned: nbp)
References
Details
(5 keywords, Whiteboard: [jsbugmon:update,bisect][fuzzblocker])
Crash Data
Attachments
(3 files, 5 obsolete files)
The attached testcase crashes on mozilla-central revision 20250610-5223faf9adce (build with opt, run with --fuzzing-safe --ion-offthread-compile=off --disable-main-thread-denormals).
Backtrace:
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 0x000014b9b0051398 in ?? ()
#1 0x00002a763466afb0 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
x0 0x1 -9223372036854775807
x1 0x848f3420 281472905720864
x2 0xd2320a48 281474208238152
x3 0xb0051200 22787754627584
x4 0xd23205c0 281474208236992
x5 0xd23205c8 281474208237000
x6 0xd23206c8 281474208237256
x7 0x0 -1407374883553280
x8 0x1 -9223372036854775807
x9 0xc7a23a00 -6408831667529434624
x10 0x0 9218868437227405312
x11 0x34604d80 46687173234048
x12 0xd23206d0 281474208237264
x13 0xffffffff 4294967295
x14 0xffffffff 4294967295
x15 0x0 0
x16 0xffff0000 -65536
x17 0x0 0
x18 0x1 1
x19 0xb0006840 22787754321984
x20 0xfffff800 4294965248
x21 0xd2320b20 281474208238368
x22 0xafff1580 22787754235264
x23 0x848f3420 281472905720864
x24 0x9a 154
x25 0xd2320eb8 281474208239288
x26 0xd2320eb8 281474208239288
x27 0x0 -985162418487296
x28 0xd2320a00 281474208238080
x29 0xd2320a20 281474208238112
x30 0xb00513e4 22787754628068
sp 0xd2320a00 281474208238080
pc 0xb0051398 22787754627992
cpsr [ EL=0 SSBS C N ]
fpcsr void
fpcr 0x1000000 16777216
=> 0x14b9b0051398: brk #0xf000
0x14b9b005139c: ldr x21, [x23]
Marking s-s per discussion about denormals.
| Reporter | ||
Comment 1•7 months ago
|
||
| Reporter | ||
Comment 2•7 months ago
|
||
| Reporter | ||
Updated•7 months ago
|
| Assignee | ||
Updated•7 months ago
|
| Assignee | ||
Comment 3•7 months ago
|
||
The failing breakpoint is located at the same position as in bug 1940716 comment 4, with the following sequence of instructions:
0x2fdbf9daebc0 add sp, sp, #0x10 (16)
0x2fdbf9daebc4 mov x28, sp
0x2fdbf9daebc8 ret
0x2fdbf9dbe368 cbnz w0, #-0xf8 (addr 0x2fdbf9dbe270)
0x2fdbf9dbe270 ldr x8, [x29, #40]
0x2fdbf9dbe274 asr x16, x8, #47
0x2fdbf9dbe278 cmn w16, #0x4 (4)
0x2fdbf9dbe27c b.eq #+0x98 (addr 0x2fdbf9dbe314)
0x2fdbf9dbe280 cmn w16, #0xa (10)
0x2fdbf9dbe284 b.eq #+0x80 (addr 0x2fdbf9dbe304)
0x2fdbf9dbe288 cmn w16, #0xc (12)
0x2fdbf9dbe28c b.eq #+0x70 (addr 0x2fdbf9dbe2fc)
0x2fdbf9dbe290 cmn w16, #0xf (15)
0x2fdbf9dbe294 b.eq #+0x40 (addr 0x2fdbf9dbe2d4)
0x2fdbf9dbe298 cmn w16, #0x10 (16)
0x2fdbf9dbe29c b.ls #+0x8 (addr 0x2fdbf9dbe2a4)
0x2fdbf9dbe2a4 fmov d16, x8
0x2fdbf9dbe2a8 fjcvtzs w0, d16
0x2fdbf9dbe2ac b.ne #+0x70 (addr 0x2fdbf9dbe31c)
0x2fdbf9dbe31c brk #0xf000
One first thing to note is that the arm64 simulator now seems to be using the FJCVTZS instruction, compared to bug 1940716.
| Assignee | ||
Comment 4•7 months ago
|
||
Decoder, what device have you use to find this issue? To be precise what CPU?
( see https://gist.github.com/jandem/e6b5660975f145b85d533b97ad054f08?permalink_comment_id=5563383 )
The issue I am seeing with the ARM64 simulator does not look armful:
0x2fdbf9dbe2a4 fmov d16, x8
0x2fdbf9dbe2a8 fjcvtzs w0, d16
0x2fdbf9dbe2ac b.ne #+0x70 (addr 0x2fdbf9dbe31c)
x8: 0x8000000000000001
v16: 0x00000000000000008000000000000001
x0: 0x0000000000000000
Z = 0
Z = 0, points to an error, which triggers the breakpoint, but in this case this is most likely coming from the value which flows in x8.
However, this does not match the reported registers, as x0 == 0x8000000000000001 in comment 0 (careful, these are 64 bits registers)
| Reporter | ||
Comment 5•7 months ago
|
||
The trace here is from a Cortex-A72 CPU.
But the fuzzing runs on GCP, which I believe uses Ampere Altra. So this was reproduced on two distinct CPUs.
| Assignee | ||
Comment 6•7 months ago
|
||
The problem I am seeing with the simulator is that the JS test case basically calls with the negated Number.MIN_VALUE which does not reflect the disabled denormals, and thus fails when converting the double to an integer, as this would be mapped to negative zero.
The only solution I see would be to change the "MIN_VALUE" to be the smallest non-denormals when denormals are disabled.
JS_DOUBLE_PS("MIN_VALUE", MinNumberValue<double>(),
JSPROP_READONLY | JSPROP_PERMANENT),
/** Computes the smallest non-zero positive float/double value. */
template <typename T>
static constexpr MOZ_ALWAYS_INLINE T MinNumberValue() {
return std::numeric_limits<T>::denorm_min();
}
Ideally we should change the MinNumberValue to reflect denormal status and update Number.MIN_VALUE when denormals are enabled/disabled to return the correct value or make it call MinNumberValue.
Paul, do you see any issue with changing MinNumberValue to reflect the disabled denormals?
The idea would be something like:
/** Computes the smallest non-zero positive float/double value. */
template <typename T>
static constexpr MOZ_ALWAYS_INLINE T MinNumberValue() {
T min = std::numeric_limits<T>::denorm_min();
if (min != T()) {
return min;
}
return std::numeric_limits<T>::min();
}
| Assignee | ||
Comment 7•7 months ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #4)
However, this does not match the reported registers, as
x0 == 0x8000000000000001in comment 0 (careful, these are 64 bits registers)
Running without fjcvtsz yield the same register value, and given the test case I would not expect any additional code path.
| Reporter | ||
Comment 8•7 months ago
|
||
We switched the fuzzing to ARMv9 and this is now a fuzzblocker.
| Assignee | ||
Comment 9•7 months ago
|
||
The std::denorm_min() value already compute the minimal value, even when denormals are disabled.
The problem comes from MIN_VALUE which baked in the denormal minimal value, instead of the minimal value depending on the execution context.
Trying to change MIN_VALUE as follows:
JS_PSG("MIN_VALUE", NumberMinValue, JSPROP_PERMANENT),
where NumberMinValue is defined as:
bool NumberMinValue(JSContext* cx, unsigned argc, Value* vp) {
CallArgs args = CallArgsFromVp(argc, vp);
args.rval().setDouble(MinNumberValue<double>());
return true;
}
causes the failure of test262/built-ins/Object/getOwnPropertyDescriptor/15.2.3.3-4-197.js
Even when denormals are not disabled. I do not know to which extend this we can violate the ECMAScript spec in these conditions…
However, fixing MIN_VALUE to actually be a valid double value does not seems to fix the crash on WebAssembly entry…
| Assignee | ||
Comment 10•7 months ago
•
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #9)
However, fixing MIN_VALUE to actually be a valid double value does not seems to fix the crash on WebAssembly entry…
This does fix the issue … the problems are that:
- std::numeric_limits<double>::denorm_min() does not seems to work as described and only returns the denormal value.
- Optimizing compilers are elide conditions which are comparing
denorm_minagainst zero as false, despite being an unknown fact at runtime.
| Assignee | ||
Comment 11•7 months ago
|
||
Comment 12•7 months ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #6)
Paul, do you see any issue with changing
MinNumberValueto reflect the disabled denormals?
I don't, but that doesn't mean much.
Comment 13•7 months ago
|
||
This bug prevents fuzzing from making progress; however, it has low severity. It is important for fuzz blocker bugs to be addressed in a timely manner (see here why?).
:nbp, could you consider increasing the severity?
For more information, please visit BugBot documentation.
| Assignee | ||
Comment 14•7 months ago
|
||
I do not understand this code … The input value is 0x8000000000000001, which should be interpreted to -0 when denormals are disabled.
Arm64 check if the double has to be boxed and and thus jump into CoerceInPlace_JitEntry, which check that this is a number and determine that there is nothing to be done, and thus does nothing, before hitting a break point because we failed to convertValueToWasmAnyRef.
x64 successfully convert the same value to an int32 value of 1. Which from my point of view is even more incorrect than the arm64 behavior.
I have a few questions which are preventing me to make progress as I do not understand what this code is supposed to do:
- Why
convertValueToWasmAnyRefandbranchValueConvertsToWasmAnyRefInlinefail when we have double inputs? I understand that we might be interested in coercing double to int32, but I do not understand why we fallback toCoerceInPlace_JitEntryor hit a breakpoint if we cannot give a double input. Are double inputs supposed to be handled elsewhere? - Why
AnyRefgoes through all these checks? IfAnyRefis supposed to represent any JS::Value, shouldn't this be a no-op?
Comment 15•7 months ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #14)
I have a few questions which are preventing me to make progress as I do not understand what this code is supposed to do:
- Why
convertValueToWasmAnyRefandbranchValueConvertsToWasmAnyRefInlinefail when we have double inputs? I understand that we might be interested in coercing double to int32, but I do not understand why we fallback toCoerceInPlace_JitEntryor hit a breakpoint if we cannot give a double input. Are double inputs supposed to be handled elsewhere?
The wasm JIT entry conversion code is confusing. It first does a loop over all arguments to see if we can convert the JS::Value into a wasm::AnyRef without any complicated code (like boxing the value). If anything in that loop fails, we jump to an OOL call to CoerceInPlace_JitEntry which will do the boxing and conversions and write the results back in the original JS::Value argv. After that we rejoin the JIT entry code in a loop which does the actual inline conversion, using the now boxed value stored in argv.
So for a double of -1 (which needs to be boxed in WasmValueBox to fit in a wasm::AnyRef), we will first fail branchValueConvertsToWasmAnyRefInline here [1], then jump out of the loop and call CoerceInPlace_JitEntry. That will check if AnyRef::valueNeedsBoxing(), which checks AnyRef::doubleNeedsBoxing() [3], and will return true for -1. We box the value and store it in argv. Then in the conversion loop in JIT entry we will use convertValueToWasmAnyRef [4] to read the boxed value.
Step [4] has an asssertion style breakpoint for in the case that we fail to convert the wasm anyref inline. It means that either branchValueConvertsToWasmAnyRefInline is out of sync and said something could be converted inline but actually can't. Or else that CoerceInPlace_JitEntry didn't do the correct conversions so that inline conversion is infallible.
In this test case, it seems the issue is that CoerceInPlace_JitEntry is not boxing the double when it should be. I opened it up in the debugger and found things go wrong in AnyRef::doubleNeedsBoxing(0x8000000000000001). It looks like mozilla::NumberIsInt32(0x8000000000000001) returns true and thinks that it's actually 0. Which then causes the code to think it doesn't need to be boxed, which fails the following assertion. I'm guessing the issue is somewhere here [5].
- Why
AnyRefgoes through all these checks? IfAnyRefis supposed to represent any JS::Value, shouldn't this be a no-op?
Some JS::Values must be boxed (like doubles outside of a signed 31-bit range), and so we need to call into the VM to do that.
[1] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/js/src/jit/MacroAssembler.cpp#7250
[2] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/js/src/wasm/WasmBuiltins.cpp#1122
[3] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/js/src/wasm/WasmAnyRef.h#251
[4] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/js/src/wasm/WasmStubs.cpp#1197-1205
[5] https://searchfox.org/mozilla-central/rev/de87ab8fb7d046a068943d4728a2345568ff77c5/mfbt/FloatingPoint.h#370
Comment 16•7 months ago
|
||
This is probably unrelated, but I filed bug 1974307 for a thinko in doubleNeedsBoxing that I found.
| Assignee | ||
Comment 17•7 months ago
•
|
||
(In reply to Ryan Hunt [:rhunt] from comment #16)
This is probably unrelated, but I filed bug 1974307 for a thinko in doubleNeedsBoxing that I found.
Thank, I myself made a patch for handling -0 cases.
This should fix the issue seen with the current test case.
| Assignee | ||
Comment 18•7 months ago
|
||
Updated•7 months ago
|
Comment 19•7 months ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #17)
This should fix the issue seen with the current test case.
Do you mean that Ryan's patch in bug 1974307 should fix this bug ("the current test case")? Or were you referring to the patch you made that might be different from Ryan's?
What is an appropriate security rating for this crash?
If it's a security bug how far back does it go? Presumably the recent ESR-140 at least?
Updated•7 months ago
|
| Comment hidden (obsolete) |
Updated•6 months ago
|
Comment 21•6 months ago
|
||
Add a flushDenormalsToZero masm method which is used on double
values before doing a conversion to anyref. This ensures we
have well defined behavior in the JIT and C++ code on this
code path.
Multiply by 1.0 was chosen because it is guaranteed to not
change the sign, unlike adding zero. This is not a hot-code
path, so I don't think the overhead is going to be too
significant.
Comment 22•6 months ago
|
||
I attached an alternate targeted patch that fixes this issue (and the related failure from bug 1940716).
Updated•6 months ago
|
Comment 23•6 months ago
|
||
Nicolas, can we declassify this bug? This bug will always result in a safe crash in JIT code due to the breakpoint instruction.
| Comment hidden (obsolete) |
Comment 25•6 months ago
|
||
I believe we can remove patches and hide comments so they are not visible. At a minimum the sec-low rating should be removed, as this is not a security bug. The only reason we'd keep it hidden is because other security issues were discussed here.
Updated•6 months ago
|
| Assignee | ||
Comment 26•6 months ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #24)
First reason (Bug 1974481) briefly mentioned in comment 14.
This was a mistake from my part, what happens is that the denormal value is correctly interpreted as 0, except that the lowest bit is used to tag the int31_t type used by WebAssembly, thus leading to the 1 result, leading to this miss-conception.
Second reason (Bug 1976016) is hiding an OOB which has the same root-cause as the patch attached on this bug.
After much in depth verification, none of the 0 / -0 miss-attribution can be leveraged into an exploitable out-of-bound. These would remain as bad application of the JS standard.
(In reply to Ryan Hunt [:rhunt] from comment #23)
Nicolas, can we declassify this bug? This bug will always result in a safe crash in JIT code due to the breakpoint instruction.
Yes, now we can.
| Assignee | ||
Updated•5 months ago
|
| Assignee | ||
Comment 27•5 months ago
|
||
| Assignee | ||
Comment 28•5 months ago
|
||
| Assignee | ||
Comment 29•5 months ago
|
||
Comment on attachment 9497238 [details]
Bug 1971519 - Fix IsNegativeZero to handle denormals.
This is a different can of worms that would have to be addressed if we ever want to be spec compliant on denormal-disabled execution context. (including the MIN_VALUE patch)
| Assignee | ||
Updated•5 months ago
|
Comment 30•5 months ago
|
||
Unable to reproduce bug 1971519 using build mozilla-central 20250610214301-5223faf9adce. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
| Assignee | ||
Comment 31•5 months ago
|
||
I have not landed any patches yet, as I am no longer able to reproduce the issue. This seems to have been fixed by Bug 1977692.
Looking through the code, this seems like a legitimate fix, as -0 are now accepted as valid input which are coerced to 0 (int31)
Thus, only the test case remains to land.
| Assignee | ||
Comment 32•5 months ago
|
||
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Comment 33•2 months ago
|
||
Unfortunately we'll need to revert bug 1977692 in bug 1992240. The spec was rewritten to clarify that -0 should not coerce to +0 for i31ref values, as that breaks identity roundtripping for these values. We'll probably want to fix the denormal issue so that it stops crashing on this test case, although my understanding is that the crash is not security sensitive.
Description
•