perma failing bug1940716.js when running macosx aarch64 shippable tests on new os version 15.30 and m4 chipset
Categories
(Core :: JavaScript Engine: JIT, defect, P1)
Tracking
()
People
(Reporter: jmaher, Assigned: jmaher)
References
(Blocks 2 open bugs)
Details
Attachments
(2 files, 2 obsolete files)
48 bytes, text/x-phabricator-request
48 bytes, text/x-phabricator-request (phab-bot: approval-mozilla-beta+)
while migrating to macosx 1500 on the new m4 chipset, this jit test fails on shippable only:
[task 2025-04-28T16:59:42.738Z] 16:59:42 INFO - TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1928407.js | Success (code 0, args "--blinterp-eager") [0.0 s]
[task 2025-04-28T16:59:42.741Z] 16:59:42 INFO - TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.0 s]
[task 2025-04-28T16:59:42.746Z] 16:59:42 INFO - TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--ion-eager --ion-offthread-compile=off --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads") [0.0 s]
[task 2025-04-28T16:59:42.747Z] 16:59:42 INFO - TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--baseline-eager --write-protect-code=off") [0.0 s]
[task 2025-04-28T16:59:42.751Z] 16:59:42 INFO - TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--blinterp-eager") [0.0 s]
[task 2025-04-28T16:59:42.752Z] 16:59:42 INFO - TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Success (code 0, args "--disable-main-thread-denormals") [0.0 s]
[task 2025-04-28T16:59:42.753Z] 16:59:42 INFO - Exit code: -5
[task 2025-04-28T16:59:42.753Z] 16:59:42 INFO - FAIL - auto-regress/bug1940716.js
[task 2025-04-28T16:59:42.754Z] 16:59:42 WARNING - TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Unknown (code -5, args "--disable-main-thread-denormals --ion-eager --ion-offthread-compile=off --more-compartments") [0.0 s]
[task 2025-04-28T16:59:42.755Z] 16:59:42 INFO - INFO exit-status : -5
[task 2025-04-28T16:59:42.755Z] 16:59:42 INFO - INFO timed-out : False
[task 2025-04-28T16:59:42.756Z] 16:59:42 INFO - TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--no-blinterp --no-baseline --no-ion --more-compartments") [0.0 s]
[task 2025-04-28T16:59:42.758Z] 16:59:42 INFO - Exit code: -5
[task 2025-04-28T16:59:42.758Z] 16:59:42 INFO - FAIL - auto-regress/bug1940716.js
[task 2025-04-28T16:59:42.758Z] 16:59:42 WARNING - TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Unknown (code -5, args "--disable-main-thread-denormals --baseline-eager --write-protect-code=off") [0.0 s]
[task 2025-04-28T16:59:42.758Z] 16:59:42 INFO - INFO exit-status : -5
[task 2025-04-28T16:59:42.758Z] 16:59:42 INFO - INFO timed-out : False
[task 2025-04-28T16:59:42.763Z] 16:59:42 INFO - Exit code: -5
[task 2025-04-28T16:59:42.763Z] 16:59:42 INFO - FAIL - auto-regress/bug1940716.js
[task 2025-04-28T16:59:42.763Z] 16:59:42 WARNING - TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Unknown (code -5, args "--disable-main-thread-denormals --ion-eager --ion-offthread-compile=off --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads") [0.0 s]
[task 2025-04-28T16:59:42.763Z] 16:59:42 INFO - INFO exit-status : -5
[task 2025-04-28T16:59:42.763Z] 16:59:42 INFO - INFO timed-out : False
[task 2025-04-28T16:59:42.765Z] 16:59:42 INFO - Exit code: -5
[task 2025-04-28T16:59:42.765Z] 16:59:42 INFO - FAIL - auto-regress/bug1940716.js
[task 2025-04-28T16:59:42.765Z] 16:59:42 WARNING - TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Unknown (code -5, args "--disable-main-thread-denormals --blinterp-eager") [0.0 s]
[task 2025-04-28T16:59:42.765Z] 16:59:42 INFO - INFO exit-status : -5
[task 2025-04-28T16:59:42.766Z] 16:59:42 INFO - INFO timed-out : False
[task 2025-04-28T16:59:42.769Z] 16:59:42 INFO - TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Success (code 0, args "--disable-main-thread-denormals --no-blinterp --no-baseline --no-ion --more-compartments") [0.0 s]
[task 2025-04-28T16:59:42.771Z] 16:59:42 INFO - TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1942648.js | Success (code 0, args "") [0.0 s]
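The `Exit code: -5` lines in the log above can be decoded. Assuming the harness reports the child's status using the Python `subprocess` convention (a negative value `-N` means the child was terminated by signal `N`), -5 corresponds to SIGTRAP on macOS and Linux, which is the signal a breakpoint instruction normally raises. A minimal sketch; `describe_exit` is a hypothetical helper and not part of the jit-test harness:

```python
import signal


def describe_exit(returncode: int) -> str:
    """Describe a subprocess-style return code (hypothetical helper)."""
    if returncode < 0:
        # On POSIX, subprocess reports "terminated by signal N" as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


print(describe_exit(-5))
print(describe_exit(0))
```

This only interprets the number; it does not say which breakpoint in the generated code was hit.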
The test runs in debug, but that build isn't aarch64. I really don't understand the debug tests and why we are running some tests on aarch64 or 64 or fake aarch64. My priority is to migrate the tests to the new machines by May 9th.
In Phabricator D246100 I had the test skipped, but then I ran it on try and it appeared to pass. The results were only showing for debug; the shippable build hadn't finished, so I mistakenly said it was passing.
Comment 1•18 days ago (Assignee)
Updated•18 days ago
Comment 2•17 days ago
Stupid question: does it pass on the shippable build without the --disable-main-thread-denormals flag from the first line of the test?
Note, we should absolutely not remove it, I am just trying to see if this is something related to denormals or something else.
If this is something else, then we can probably find another test case for denormals, and forward this issue to the WebAssembly team.
Comment 4•17 days ago (Assignee)
:nbp, how do you build without --disable-main-thread-denormals? Just remove the reference from:
https://searchfox.org/mozilla-central/source/js/src/jit-test/tests/auto-regress/bug1940716.js#1
Comment 5•17 days ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #4)
:nbp, how do you build without --disable-main-thread-denormals? just remove the reference from:
https://searchfox.org/mozilla-central/source/js/src/jit-test/tests/auto-regress/bug1940716.js#1
As an experiment, just remove the first line of the test case, the same line that has the skip-if.
Updated•17 days ago
Comment 6•17 days ago
bugherder
Comment 7•16 days ago (Assignee)
the test seems to pass:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=06b8a49e29ceab2647bc569d62e6527a450b916a
I removed the whole line at the top of the test:
https://hg-edge.mozilla.org/try/rev/d87b39e8ab8f6c8fe72315819a210910372a0511
:nbp, I am happy to test more stuff, etc...
Comment 8•16 days ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #7)
the test seems to pass:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=06b8a49e29ceab2647bc569d62e6527a450b916a
I removed the whole line at the top of the test:
https://hg-edge.mozilla.org/try/rev/d87b39e8ab8f6c8fe72315819a210910372a0511
Thanks, that definitely narrows the issue down to the --disable-main-thread-denormals flag, which strangely works fine as part of https://searchfox.org/mozilla-central/source/js/src/jit-test/tests/self-test/denormals-1.js
Comment 9•16 days ago
I did some analysis of this on Try.
The test case passes Number.MIN_VALUE to a Wasm function that has an externref argument. It fails with --disable-main-thread-denormals.
The bug is related to the GenerateJitEntry trampoline that's used for calls from JS JIT code to Wasm, so the test only fails with eager compilation (--blinterp-eager, --baseline-eager, etc).
For an externref argument, the code in GenerateJitEntry performs the following steps:
- The first loop uses masm.branchValueConvertsToWasmAnyRefInline to determine if we can convert the JS Value to an externref in place. In this case the function has just one argument, so if that succeeds we're done. If it fails, we call CoerceInPlace_JitEntry to do the conversion in C++ and then also proceed to step 2.
- The second step calls masm.convertValueToWasmAnyRef to perform the actual conversion. It also asserts this is now infallible after step 1 (triggers masm.breakpoint() if the conversion can't be done in JIT code). I'm pretty sure we're hitting this breakpoint.
The double-to-anyref code converts the value to Int32 using convertDoubleToInt32. The problematic case here is the code path that uses the Fjcvtzs instruction. If I force use of the non-Fjcvtzs code path the test passes.
What happens on my M1 is that the Fjcvtzs instruction converts Number.MIN_VALUE to 0 if denormals are disabled, so the conversion can be done in JIT code and we don't need to call into C++.
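To make the role of denormals concrete: Number.MIN_VALUE is the smallest positive double (5e-324), which is a denormal. It is not an integer, so it cannot convert exactly to an Int32, but once it is flushed to zero it can. The sketch below models that in plain Python; `flush_to_zero` and `converts_exactly_to_int32` are hypothetical helpers for illustration, not the engine's code or the instruction's exact semantics:

```python
import math
import sys

MIN_VALUE = 5e-324  # same value as JavaScript's Number.MIN_VALUE


def is_denormal(x: float) -> bool:
    """True for nonzero doubles below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < sys.float_info.min


def flush_to_zero(x: float) -> float:
    """Model of disabled denormals: denormal inputs become a signed zero."""
    return math.copysign(0.0, x) if is_denormal(x) else x


def converts_exactly_to_int32(x: float) -> bool:
    """True if x is integer-valued and within the int32 range."""
    # float.is_integer() is False for NaN and infinities.
    return x.is_integer() and -2**31 <= x < 2**31


print(is_denormal(MIN_VALUE))
print(converts_exactly_to_int32(MIN_VALUE))
print(converts_exactly_to_int32(flush_to_zero(MIN_VALUE)))
```

The first and third checks are true and the second is false, which matches the M1 observation: with denormals disabled, the value behaves like 0 and the inline conversion succeeds.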
What seems to happen on the M4 in CI:
- In step 1, the Fjcvtzs instruction doesn't set the Zero flag for Number.MIN_VALUE so convertDoubleToInt32 unexpectedly fails.
- This means we call CoerceInPlace_JitEntry to do the conversion in C++.
- In C++ we do the equivalent check using AnyRef::valueNeedsBoxing which calls AnyRef::doubleNeedsBoxing. This returns false so it disagrees with step 1 and thinks the conversion can and should be done in JIT code.
- We then return to the second step where we call convertValueToWasmAnyRef. This uses Fjcvtzs again and it still fails and we hit the breakpoint.
Open questions:
- Why am I not seeing this behavior on my M1? Maybe Fjcvtzs with denormals disabled behaves differently on these M4 CPUs?
- Why does this not affect debug builds? Maybe the check that happens in C++ code in AnyRef::doubleNeedsBoxing results in different machine code? I'll trigger a debug build on Try to see what happens there.
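The suspected sequence in this comment can be modeled as a small control-flow sketch. Everything here is an illustrative stand-in: `CpuModel`, `jit_converts_inline`, `cpp_needs_boxing` and `call_with_externref` are hypothetical names, the int32 range check is a simplification, and the assumption that the C++ check sees the denormal as 0 is inferred from the thread rather than confirmed.

```python
import sys
from dataclasses import dataclass

MIN_VALUE = 5e-324  # same value as JavaScript's Number.MIN_VALUE (a denormal)


class HitBreakpoint(Exception):
    """Models the breakpoint behind the 'infallible' assertion in step 2."""


@dataclass
class CpuModel:
    # Hypothetical knob: with denormals disabled, does the inline
    # double-to-int32 path accept a denormal input as an exact 0?
    name: str
    denormal_converts_inline: bool


def is_denormal(x: float) -> bool:
    return x != 0.0 and abs(x) < sys.float_info.min


def fits_int32(x: float) -> bool:
    # float.is_integer() is False for NaN and infinities.
    return x.is_integer() and -2**31 <= x < 2**31


def jit_converts_inline(x: float, cpu: CpuModel) -> bool:
    """Stand-in for the inline check shared by step 1 and step 2."""
    if is_denormal(x):
        return cpu.denormal_converts_inline
    return fits_int32(x)


def cpp_needs_boxing(x: float) -> bool:
    """Stand-in for the C++ check; assumes it sees denormals as 0."""
    flushed = 0.0 if is_denormal(x) else x
    return not fits_int32(flushed)


def call_with_externref(x: float, cpu: CpuModel) -> str:
    if jit_converts_inline(x, cpu):        # step 1: inline check
        return "converted inline in step 1"
    if cpp_needs_boxing(x):                # slow path: C++ boxes the value
        return "boxed in C++"
    if not jit_converts_inline(x, cpu):    # step 2: asserted infallible
        raise HitBreakpoint(f"{cpu.name}: inline conversion failed twice")
    return "converted inline in step 2"


m1 = CpuModel("M1 (as observed locally)", True)
m4 = CpuModel("M4 (as hypothesized for CI)", False)

print(call_with_externref(MIN_VALUE, m1))
try:
    call_with_externref(MIN_VALUE, m4)
except HitBreakpoint as exc:
    print("breakpoint:", exc)
```

The point of the model is that the failure needs a disagreement between the two checks: if both say "convertible" or both say "needs boxing", step 2 never reaches the breakpoint.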
Comment 10•16 days ago
Matt, can you try this again on your machine with this JS shell build and the --tbpl flag passed to jit-tests? So something like:
$ python3 js/src/jit-test/jit_test.py downloaded/build/js --tbpl bug1940716 --repeat 100
(It's the shell build from this try push, for the "OS X Cross Compiled Shippable" Bpgo(B) job, since that's the one we end up running according to the logs.)
Comment 11•16 days ago
(In reply to Jan de Mooij [:jandem] from comment #9)
- Why does this not affect debug builds?
I downloaded the JS shell build that's used by this debug jit-test job and I think it only contains x64 code so it would use Rosetta. That might explain the difference between debug and shippable builds.
Is that expected? It would be better (and likely use less resources in CI!) to run a native arm64 build.
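One way to check which architecture a downloaded shell binary targets is to read its Mach-O header. The sketch below handles only thin (single-architecture) 64-bit little-endian binaries; universal binaries start with a different magic and are not handled. `thin_macho_arch` is a hypothetical helper, and the example feeds it synthetic 8-byte headers rather than a real build artifact:

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # magic of a thin 64-bit Mach-O file
CPU_TYPES = {
    0x01000007: "x86_64",  # CPU_TYPE_X86 | CPU_ARCH_ABI64
    0x0100000C: "arm64",   # CPU_TYPE_ARM | CPU_ARCH_ABI64
}


def thin_macho_arch(header: bytes) -> str:
    """Return the architecture named by the first 8 bytes of a Mach-O file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of header")
    magic, cputype = struct.unpack("<II", header[:8])
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O header")
    return CPU_TYPES.get(cputype, hex(cputype))


# Synthetic headers standing in for real files.
print(thin_macho_arch(struct.pack("<II", MH_MAGIC_64, 0x01000007)))
print(thin_macho_arch(struct.pack("<II", MH_MAGIC_64, 0x0100000C)))
```

For a real file, the input would be the first 8 bytes read in binary mode, for example `open(path, "rb").read(8)`.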
Comment 12•16 days ago
Ok, I can confirm what Jan suspected for why this was failing on shippable and not failing on debug builds.
Using this try push: https://treeherder.mozilla.org/jobs?repo=try&revision=954d05d5524c0abda8851896199ba4a5ae8f2fd6&selectedTaskRun=RunSqDD9TcautsRd-I0dEQ.0
And comparing the config.status artifacts from the following jobs: OS X Cross Compiled debug build-macosx64/debug B and OS X AArch64 Cross Compiled Shippable opt Profile-guided optimization builds build-macosx64-aarch64-shippable/opt B.
We can confirm that the debug builds are being compiled to target x64 instead of arm64. Thus, all debug builds would run under x64 emulation instead of running natively on arm64.
An excerpt of the diff between the config.status files, which shows that debug builds generate x64 code at runtime, whereas shippable builds generate arm64 code:
'HAVE__UNWIND_BACKTRACE': '1',
'JSON_USE_EXCEPTION': 0,
'JS_64BIT': '1',
- 'JS_CODEGEN_X64': '1',
- 'JS_DEBUG': '1',
+ 'JS_CODEGEN_ARM64': '1',
'JS_DEFAULT_JITREPORT_GRANULARITY': '3',
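The comparison above amounts to asking which JS_CODEGEN_* define each build sets. A minimal sketch, assuming the defines have already been loaded into plain dictionaries; the two dictionaries below contain only the keys visible in the diff excerpt, and `codegen_targets` is a hypothetical helper:

```python
def codegen_targets(defines: dict) -> list:
    """Return the sorted JS_CODEGEN_* keys present in a defines mapping."""
    return sorted(key for key in defines if key.startswith("JS_CODEGEN_"))


# Only the keys shown in the diff excerpt; real files contain many more.
debug_defines = {"JS_64BIT": "1", "JS_CODEGEN_X64": "1", "JS_DEBUG": "1"}
shippable_defines = {"JS_64BIT": "1", "JS_CODEGEN_ARM64": "1"}

print("debug:", codegen_targets(debug_defines))
print("shippable:", codegen_targets(shippable_defines))
```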
Comment 13•16 days ago
(In reply to Jan de Mooij [:jandem] from comment #10)
Matt, can you try this again on your machine with this JS shell build and the --tbpl flag passed to jit-tests? So something like:
$ python3 js/src/jit-test/jit_test.py downloaded/build/js --tbpl bug1940716 --repeat 100
(It's the shell build from this try push, for the "OS X Cross Compiled Shippable" Bpgo(B) job, since that's the one we end up running according to the logs.)
Fails first time:
mgaudet@M4Book unified % python3 js/src/jit-test/jit_test.py ~/Downloads/target/js --tbpl bug1940716
[1|0|0|0] 16% ========> | 0.0s
doubleNeedsBoxing: 0
no boxing in CoerceInPlace_JitEntry
Exit code: -5
FAIL - auto-regress/bug1940716.js
[1|1|0|0] 33% =================> | 0.0s
doubleNeedsBoxing: 0
no boxing in CoerceInPlace_JitEntry
Exit code: -5
FAIL - auto-regress/bug1940716.js
[1|2|0|0] 50% ==========================> | 0.0s
doubleNeedsBoxing: 0
no boxing in CoerceInPlace_JitEntry
Exit code: -5
FAIL - auto-regress/bug1940716.js
[1|3|0|0] 66% ===================================> | 0.0s
doubleNeedsBoxing: 0
no boxing in CoerceInPlace_JitEntry
Exit code: -5
FAIL - auto-regress/bug1940716.js
[2|4|0|0] 100% ======================================================>| 0.0s
FAILURES:
--disable-main-thread-denormals --blinterp-eager auto-regress/bug1940716.js
--disable-main-thread-denormals --baseline-eager --write-protect-code=off auto-regress/bug1940716.js
--disable-main-thread-denormals --ion-eager --ion-offthread-compile=off --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads auto-regress/bug1940716.js
--disable-main-thread-denormals --ion-eager --ion-offthread-compile=off --more-compartments auto-regress/bug1940716.js
TIMEOUTS:
Needs at least blinterp-eager
lldb says
Process 37693 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=1, subcode=0x2291556082ac)
frame #0: 0x00002291556082b0
-> 0x2291556082b0: ldr x21, [x23]
0x2291556082b4: str x23, [sp]
Here are the crashing instructions.
(lldb) x/20i $pc-40
0x229155608288: b 0x2291556082a8
0x22915560828c: eor x0, x0, x0
0x229155608290: b 0x2291556082a8
0x229155608294: mov x16, #-0x5000000000000 ; =-1407374883553280
0x229155608298: eor x0, x8, x16
0x22915560829c: orr x0, x0, #0x2
0x2291556082a0: b 0x2291556082a8
0x2291556082a4: eor x0, x8, #0xfffe000000000000
0x2291556082a8: b 0x2291556082b0
0x2291556082ac: brk #0xf000 <------- here's our breakpoint. I'm assuming this is masm.unreachable (which I'm less sure of)
-> 0x2291556082b0: ldr x21, [x23]
0x2291556082b4: str x23, [sp]
0x2291556082b8: mov x28, sp
0x2291556082bc: bl 0x229155608020
0x2291556082c0: mov sp, x29
0x2291556082c4: mov x2, #0x800000000000 ; =140737488355328
0x2291556082c8: movk x2, #0xfff9, lsl #48
0x2291556082cc: ldr x30, [sp, #0x8]
0x2291556082d0: ldr x29, [sp]
0x2291556082d4: add sp, sp, #0x10
Comment 14•16 days ago
I hadn't read comment #9 -- I suspect your diagnosis will be correct; anything you want me to try to help confirm for you?
Comment 15•13 days ago (Assignee)
I don't know why the jittests are running in x86_64/debug and not aarch64/debug. I can try to schedule them on aarch64/debug and see what happens.
Comment 16•12 days ago (Assignee)
these run fine on a proper aarch64 debug build, so I am going to switch them in bug 1964276
Comment 17•10 days ago (Assignee)
Original Revision: https://phabricator.services.mozilla.com/D246983
Updated•10 days ago
Comment 18•10 days ago (Assignee)
Original Revision: https://phabricator.services.mozilla.com/D246983
Updated•10 days ago
Comment 19•10 days ago (Assignee)
Original Revision: https://phabricator.services.mozilla.com/D246983
Updated•10 days ago
Comment 20•10 days ago
uplift