Closed Bug 1963159 Opened 18 days ago Closed 17 days ago

perma failing bug1940716.js when running macosx aarch64 shippable tests on new os version 15.30 and m4 chipset

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

Tracking

RESOLVED FIXED
140 Branch
Tracking Status
firefox139 --- fixed
firefox140 --- fixed

People

(Reporter: jmaher, Assigned: jmaher)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files, 2 obsolete files)

while migrating to macosx 1500 on the new m4 chipset, this jit test fails on shippable only:

[task 2025-04-28T16:59:42.738Z] 16:59:42     INFO -  TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1928407.js | Success (code 0, args "--blinterp-eager") [0.0 s]
[task 2025-04-28T16:59:42.741Z] 16:59:42     INFO -  TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.0 s]
[task 2025-04-28T16:59:42.746Z] 16:59:42     INFO -  TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--ion-eager --ion-offthread-compile=off --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads") [0.0 s]
[task 2025-04-28T16:59:42.747Z] 16:59:42     INFO -  TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--baseline-eager --write-protect-code=off") [0.0 s]
[task 2025-04-28T16:59:42.751Z] 16:59:42     INFO -  TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--blinterp-eager") [0.0 s]
[task 2025-04-28T16:59:42.752Z] 16:59:42     INFO -  TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Success (code 0, args "--disable-main-thread-denormals") [0.0 s]
[task 2025-04-28T16:59:42.753Z] 16:59:42     INFO -  Exit code: -5
[task 2025-04-28T16:59:42.753Z] 16:59:42     INFO -  FAIL - auto-regress/bug1940716.js
[task 2025-04-28T16:59:42.754Z] 16:59:42  WARNING -  TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Unknown (code -5, args "--disable-main-thread-denormals --ion-eager --ion-offthread-compile=off --more-compartments") [0.0 s]
[task 2025-04-28T16:59:42.755Z] 16:59:42     INFO -  INFO exit-status     : -5
[task 2025-04-28T16:59:42.755Z] 16:59:42     INFO -  INFO timed-out       : False
[task 2025-04-28T16:59:42.756Z] 16:59:42     INFO -  TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1939962.js | Success (code 0, args "--no-blinterp --no-baseline --no-ion --more-compartments") [0.0 s]
[task 2025-04-28T16:59:42.758Z] 16:59:42     INFO -  Exit code: -5
[task 2025-04-28T16:59:42.758Z] 16:59:42     INFO -  FAIL - auto-regress/bug1940716.js
[task 2025-04-28T16:59:42.758Z] 16:59:42  WARNING -  TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Unknown (code -5, args "--disable-main-thread-denormals --baseline-eager --write-protect-code=off") [0.0 s]
[task 2025-04-28T16:59:42.758Z] 16:59:42     INFO -  INFO exit-status     : -5
[task 2025-04-28T16:59:42.758Z] 16:59:42     INFO -  INFO timed-out       : False
[task 2025-04-28T16:59:42.763Z] 16:59:42     INFO -  Exit code: -5
[task 2025-04-28T16:59:42.763Z] 16:59:42     INFO -  FAIL - auto-regress/bug1940716.js
[task 2025-04-28T16:59:42.763Z] 16:59:42  WARNING -  TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Unknown (code -5, args "--disable-main-thread-denormals --ion-eager --ion-offthread-compile=off --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads") [0.0 s]
[task 2025-04-28T16:59:42.763Z] 16:59:42     INFO -  INFO exit-status     : -5
[task 2025-04-28T16:59:42.763Z] 16:59:42     INFO -  INFO timed-out       : False
[task 2025-04-28T16:59:42.765Z] 16:59:42     INFO -  Exit code: -5
[task 2025-04-28T16:59:42.765Z] 16:59:42     INFO -  FAIL - auto-regress/bug1940716.js
[task 2025-04-28T16:59:42.765Z] 16:59:42  WARNING -  TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Unknown (code -5, args "--disable-main-thread-denormals --blinterp-eager") [0.0 s]
[task 2025-04-28T16:59:42.765Z] 16:59:42     INFO -  INFO exit-status     : -5
[task 2025-04-28T16:59:42.766Z] 16:59:42     INFO -  INFO timed-out       : False
[task 2025-04-28T16:59:42.769Z] 16:59:42     INFO -  TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1940716.js | Success (code 0, args "--disable-main-thread-denormals --no-blinterp --no-baseline --no-ion --more-compartments") [0.0 s]
[task 2025-04-28T16:59:42.771Z] 16:59:42     INFO -  TEST-PASS | tests/jit-test/jit-test/tests/auto-regress/bug1942648.js | Success (code 0, args "") [0.0 s]
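
The log interleaves passes and failures for the same test, so it can help to group results by flag set. Below is a minimal Python sketch that does this for result lines like the ones above. The regular expression and the `summarize` helper are illustrative assumptions based on those lines, not part of the jit-test harness.

```python
import re
from collections import defaultdict

# Pattern inferred from the result lines in the log above; it is an
# assumption for illustration, not a documented log format.
RESULT_RE = re.compile(
    r'(TEST-PASS|TEST-UNEXPECTED-FAIL) \| (\S+) \| .*?args "([^"]*)"'
)

def summarize(lines):
    """Return {test_path: {"pass": [args, ...], "fail": [args, ...]}}."""
    summary = defaultdict(lambda: {"pass": [], "fail": []})
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # skip "Exit code", "INFO exit-status", etc.
        status, test, args = match.groups()
        key = "pass" if status == "TEST-PASS" else "fail"
        summary[test][key].append(args)
    return dict(summary)

# Three result lines copied from the log above (timestamps dropped).
TEST = "tests/jit-test/jit-test/tests/auto-regress/bug1940716.js"
SAMPLE = [
    'TEST-PASS | ' + TEST + ' | Success (code 0, args "--disable-main-thread-denormals") [0.0 s]',
    'TEST-UNEXPECTED-FAIL | ' + TEST + ' | Unknown (code -5, args "--disable-main-thread-denormals --blinterp-eager") [0.0 s]',
    'TEST-PASS | ' + TEST + ' | Success (code 0, args "--disable-main-thread-denormals --no-blinterp --no-baseline --no-ion --more-compartments") [0.0 s]',
]

RESULT = summarize(SAMPLE)
```

On these sample lines, the only failing flag set is the one with `--blinterp-eager`; the plain run and the run with all JIT tiers disabled both pass.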

The test runs in debug, but that debug build isn't aarch64. I really don't understand the debug tests, or why we run some tests on aarch64, some on x64, and some on fake aarch64. My priority is to migrate the tests to the new machines by May 9th.

In Phabricator D246100 I had the test skipped, but then I ran it on try and it passed. That result was only for debug; the shippable build hadn't finished, so I mistakenly said the test was passing.

Assignee: nobody → jmaher
Status: NEW → ASSIGNED

Stupid question: does it pass on the shippable build without the --disable-main-thread-denormals flag from the first line of the test?
Note that we should absolutely not remove the flag. I am just trying to see whether this is related to denormals or to something else.

If this is something else, then we can probably find another test case for denormals, and forward this issue to the WebAssembly team.

Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9138373fb84a Disable bug1940716.js on mac/arm64 as it fails when we upgrade os/hardware. r=jandem

:nbp, how do you build without --disable-main-thread-denormals ? just remove the reference from:
https://searchfox.org/mozilla-central/source/js/src/jit-test/tests/auto-regress/bug1940716.js#1

Flags: needinfo?(nicolas.b.pierron)

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #4)

:nbp, how do you build without --disable-main-thread-denormals ? just remove the reference from:
https://searchfox.org/mozilla-central/source/js/src/jit-test/tests/auto-regress/bug1940716.js#1

As an experiment, just remove the first line of the test case (the same line that has the skip-if).

Flags: needinfo?(nicolas.b.pierron)
Blocks: sm-testing
Severity: -- → S4
Priority: -- → P1
Status: ASSIGNED → RESOLVED
Closed: 17 days ago
Resolution: --- → FIXED
Target Milestone: --- → 140 Branch
Flags: needinfo?(nicolas.b.pierron)

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #7)

the test seems to pass:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=06b8a49e29ceab2647bc569d62e6527a450b916a

I removed the whole line at the top of the test:
https://hg-edge.mozilla.org/try/rev/d87b39e8ab8f6c8fe72315819a210910372a0511

Thanks, that definitely narrows the issue down to the --disable-main-thread-denormals flag.
Strangely, that flag works fine as part of https://searchfox.org/mozilla-central/source/js/src/jit-test/tests/self-test/denormals-1.js

Flags: needinfo?(nicolas.b.pierron)

I did some analysis of this on Try.

The test case passes Number.MIN_VALUE to a Wasm function that has an externref argument. It fails with --disable-main-thread-denormals.

The bug is related to the GenerateJitEntry trampoline that's used for calls from JS JIT code to Wasm, so the test only fails with eager compilation (--blinterp-eager, --baseline-eager, etc).

For an externref argument, the code in GenerateJitEntry performs the following steps:

  1. The first loop uses masm.branchValueConvertsToWasmAnyRefInline to determine if we can convert the JS Value to an externref in place. In this case the function has just one argument, so if that succeeds we're done. If it fails, we call CoerceInPlace_JitEntry to do the conversion in C++ and then also proceed to step 2.
  2. The second step calls masm.convertValueToWasmAnyRef to perform the actual conversion. It also asserts this is now infallible after step 1 (triggers masm.breakpoint() if the conversion can't be done in JIT code). I'm pretty sure we're hitting this breakpoint.

The double-to-anyref code converts the value to Int32 using convertDoubleToInt32. The problematic case here is the code path that uses the Fjcvtzs instruction. If I force use of the non-Fjcvtzs code path the test passes.

What happens on my M1 is that the Fjcvtzs instruction converts Number.MIN_VALUE to 0 if denormals are disabled, so the conversion can be done in JIT code and we don't need to call into C++.
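
To make the role of denormals concrete, here is a small Python sketch of an "exact int32" check. Number.MIN_VALUE is the smallest positive double (2^-1074), which is subnormal: it truncates to 0 but is not equal to 0, so an exact conversion fails unless subnormal inputs are flushed to zero first. The helper names and the `flush_denormals` parameter are hypothetical; this is a simplified model of the behaviour described above, not how Fjcvtzs is specified.

```python
import math
import sys

# JavaScript's Number.MIN_VALUE: the smallest positive double, 2**-1074.
MIN_VALUE = 5e-324

def is_subnormal(x):
    """True if x is nonzero and smaller in magnitude than the smallest normal double."""
    return x != 0.0 and abs(x) < sys.float_info.min

def exact_int32(x, flush_denormals=False):
    """Return x as an int if it is exactly representable as an int32, else None.

    flush_denormals models a mode in which subnormal inputs are treated
    as zero. This is an assumption made for illustration only.
    """
    if flush_denormals and is_subnormal(x):
        x = 0.0
    if math.isnan(x) or math.isinf(x):
        return None
    if x == 0.0 and math.copysign(1.0, x) < 0:
        return None  # -0 is not an int32 value
    truncated = int(x)
    if float(truncated) != x or not (-2**31 <= truncated < 2**31):
        return None
    return truncated
```

With denormals enabled, `exact_int32(MIN_VALUE)` is `None` (the value is not an integer); with `flush_denormals=True` it is `0`, matching the M1 behaviour described above.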

What seems to happen on the M4 in CI:

  1. In step 1, the Fjcvtzs instruction doesn't set the Zero flag for Number.MIN_VALUE so convertDoubleToInt32 unexpectedly fails.
  2. This means we call CoerceInPlace_JitEntry to do the conversion in C++.
  3. In C++ we do the equivalent check using AnyRef::valueNeedsBoxing which calls AnyRef::doubleNeedsBoxing. This returns false so it disagrees with step 1 and thinks the conversion can and should be done in JIT code.
  4. We then return to the second step where we call convertValueToWasmAnyRef. This uses Fjcvtzs again and it still fails and we hit the breakpoint.
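
The sequence above can be sketched as a toy model in Python. Every name here is hypothetical and the checks are heavily simplified (for example, any non-integral double is treated as "needs boxing"); the only point is to show how a disagreement between the JIT-side check and the C++-side check leads to the breakpoint.

```python
import sys

MIN_VALUE = 5e-324  # JavaScript's Number.MIN_VALUE

class Breakpoint(Exception):
    """Stands in for masm.breakpoint() in this toy model."""

def is_subnormal(x):
    return x != 0.0 and abs(x) < sys.float_info.min

def jit_can_convert_inline(x, flushes_subnormals):
    """Toy version of the JIT-side check (steps 1 and 4 above)."""
    if is_subnormal(x):
        # Succeeds only if the conversion treats the subnormal as zero.
        return flushes_subnormals
    return x.is_integer()

def cpp_needs_boxing(x, denormals_disabled):
    """Toy version of the C++-side check (step 3 above)."""
    if denormals_disabled and is_subnormal(x):
        x = 0.0
    return not x.is_integer()

def jit_entry(x, denormals_disabled, cpu_flushes_in_fjcvtzs):
    # cpu_flushes_in_fjcvtzs is an assumed knob modelling the observed
    # M1 (True) versus M4 (False) difference; the cause is an open question.
    flushes = denormals_disabled and cpu_flushes_in_fjcvtzs
    if jit_can_convert_inline(x, flushes):
        return "converted inline"
    # The inline check failed, so ask C++ to coerce the value in place.
    if cpp_needs_boxing(x, denormals_disabled):
        return "boxed in C++"
    # C++ decided no boxing was needed, so the JIT conversion is assumed
    # to be infallible from here on.
    if not jit_can_convert_inline(x, flushes):
        raise Breakpoint("inline conversion failed after C++ said no boxing")
    return "converted inline"
```

In this model, `jit_entry(MIN_VALUE, True, True)` converts inline (the M1 case), while `jit_entry(MIN_VALUE, True, False)` raises `Breakpoint` (the M4 case). With denormals enabled, both sides agree the value needs boxing.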

Open questions:

  • Why am I not seeing this behavior on my M1? Maybe Fjcvtzs with denormals disabled behaves differently on these M4 CPUs?
  • Why does this not affect debug builds? Maybe the check that happens in C++ code in AnyRef::doubleNeedsBoxing results in different machine code? I'll trigger a debug build on Try to see what happens there.

Matt, can you try this again on your machine with this JS shell build and the --tbpl flag passed to jit-tests? So something like:

$ python3 js/src/jit-test/jit_test.py downloaded/build/js  --tbpl bug1940716 --repeat 100

(It's the shell build from this try push, for the "OS X Cross Compiled Shippable" Bpgo(B) job, since that's the one we end up running according to the logs.)

Flags: needinfo?(mgaudet)

(In reply to Jan de Mooij [:jandem] from comment #9)

  • Why does this not affect debug builds?

I downloaded the JS shell build that's used by this debug jit-test job and I think it only contains x64 code so it would use Rosetta. That might explain the difference between debug and shippable builds.

Is that expected? It would be better (and likely use less resources in CI!) to run a native arm64 build.

Flags: needinfo?(jmaher)

Ok, I can confirm what Jan suspected for why this was failing on shippable and not failing on debug builds.
Using this try push: https://treeherder.mozilla.org/jobs?repo=try&revision=954d05d5524c0abda8851896199ba4a5ae8f2fd6&selectedTaskRun=RunSqDD9TcautsRd-I0dEQ.0

And comparing the config.status artifacts from the following jobs: OS X Cross Compiled debug build-macosx64/debug B and OS X AArch64 Cross Compiled Shippable opt Profile-guided optimization builds build-macosx64-aarch64-shippable/opt B.

We can confirm that the debug builds are being compiled to target x64 instead of arm64. Thus, the debug builds run under x64 emulation instead of running natively on arm64.

Here is part of the diff of the config.status files. It shows that debug builds generate x64 code at runtime, whereas shippable builds generate arm64 code:

     'HAVE__UNWIND_BACKTRACE': '1',
     'JSON_USE_EXCEPTION': 0,
     'JS_64BIT': '1',
-    'JS_CODEGEN_X64': '1',
-    'JS_DEBUG': '1',
+    'JS_CODEGEN_ARM64': '1',
     'JS_DEFAULT_JITREPORT_GRANULARITY': '3',
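
The same comparison can be done mechanically. This is a minimal Python sketch that assumes the defines from each config.status have already been loaded into plain dicts; the `codegen_targets` helper and the dict names are hypothetical, and only the keys shown in the diff above are included.

```python
def codegen_targets(defines):
    """Return the sorted JS_CODEGEN_* keys that are set in a defines mapping."""
    return sorted(
        key for key, value in defines.items()
        if key.startswith("JS_CODEGEN_") and value
    )

# Values copied from the diff above: '-' lines are the debug build,
# '+' lines are the shippable build.
debug_defines = {"JS_64BIT": "1", "JS_CODEGEN_X64": "1", "JS_DEBUG": "1"}
shippable_defines = {"JS_64BIT": "1", "JS_CODEGEN_ARM64": "1"}

debug_target = codegen_targets(debug_defines)
shippable_target = codegen_targets(shippable_defines)
```

A mismatch between the two results is the signal that one build runs under emulation on the arm64 workers.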

(In reply to Jan de Mooij [:jandem] from comment #10)

Matt, can you try this again on your machine with this JS shell build and the --tbpl flag passed to jit-tests? So something like:

$ python3 js/src/jit-test/jit_test.py downloaded/build/js  --tbpl bug1940716 --repeat 100

(It's the shell build from this try push, for the "OS X Cross Compiled Shippable" Bpgo(B) job, since that's the one we end up running according to the logs.)

Fails first time:

mgaudet@M4Book unified % python3 js/src/jit-test/jit_test.py ~/Downloads/target/js  --tbpl bug1940716             
[1|0|0|0]  16% ========>                                              |   0.0s
doubleNeedsBoxing: 0
no boxing in CoerceInPlace_JitEntry
Exit code: -5
FAIL - auto-regress/bug1940716.js
[1|1|0|0]  33% =================>                                     |   0.0s
doubleNeedsBoxing: 0
no boxing in CoerceInPlace_JitEntry
Exit code: -5
FAIL - auto-regress/bug1940716.js
[1|2|0|0]  50% ==========================>                            |   0.0s
doubleNeedsBoxing: 0
no boxing in CoerceInPlace_JitEntry
Exit code: -5
FAIL - auto-regress/bug1940716.js
[1|3|0|0]  66% ===================================>                   |   0.0s
doubleNeedsBoxing: 0
no boxing in CoerceInPlace_JitEntry
Exit code: -5
FAIL - auto-regress/bug1940716.js
[2|4|0|0] 100% ======================================================>|   0.0s
FAILURES:
    --disable-main-thread-denormals --blinterp-eager auto-regress/bug1940716.js
    --disable-main-thread-denormals --baseline-eager --write-protect-code=off auto-regress/bug1940716.js
    --disable-main-thread-denormals --ion-eager --ion-offthread-compile=off --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads auto-regress/bug1940716.js
    --disable-main-thread-denormals --ion-eager --ion-offthread-compile=off --more-compartments auto-regress/bug1940716.js
TIMEOUTS:

Needs at least blinterp-eager

lldb says

Process 37693 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=1, subcode=0x2291556082ac)
    frame #0: 0x00002291556082b0
->  0x2291556082b0: ldr    x21, [x23]
    0x2291556082b4: str    x23, [sp]

Here are the crashing instructions.

(lldb) x/20i $pc-40 
    0x229155608288: b      0x2291556082a8
    0x22915560828c: eor    x0, x0, x0
    0x229155608290: b      0x2291556082a8
    0x229155608294: mov    x16, #-0x5000000000000 ; =-1407374883553280 
    0x229155608298: eor    x0, x8, x16
    0x22915560829c: orr    x0, x0, #0x2
    0x2291556082a0: b      0x2291556082a8
    0x2291556082a4: eor    x0, x8, #0xfffe000000000000
    0x2291556082a8: b      0x2291556082b0
    0x2291556082ac: brk    #0xf000 <------- here's our breakpoint. I'm assuming this is masm.unreachable (which I'm less sure of)
->  0x2291556082b0: ldr    x21, [x23]
    0x2291556082b4: str    x23, [sp]
    0x2291556082b8: mov    x28, sp
    0x2291556082bc: bl     0x229155608020
    0x2291556082c0: mov    sp, x29
    0x2291556082c4: mov    x2, #0x800000000000 ; =140737488355328 
    0x2291556082c8: movk   x2, #0xfff9, lsl #48
    0x2291556082cc: ldr    x30, [sp, #0x8]
    0x2291556082d0: ldr    x29, [sp]
    0x2291556082d4: add    sp, sp, #0x10
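
For anyone reading the disassembly, the immediates can be checked with a few lines of Python. The helpers below are hypothetical and only reproduce the arithmetic of the `mov`/`movk` pair and the signed annotation printed by lldb. Reading the resulting 64-bit patterns as tagged JS Value constants is an assumption and is not confirmed in this bug.

```python
MASK64 = (1 << 64) - 1

def mov_then_movk48(low_imm, high16):
    """Value left in a register after 'mov xN, #low_imm' then 'movk xN, #high16, lsl #48'."""
    return (low_imm & 0x0000FFFFFFFFFFFF) | (high16 << 48)

def as_signed64(value):
    """Interpret a 64-bit pattern as a signed integer, as lldb's annotation does."""
    return value - (1 << 64) if value >= (1 << 63) else value

# mov x2, #0x800000000000 ; movk x2, #0xfff9, lsl #48
x2 = mov_then_movk48(0x800000000000, 0xFFF9)

# mov x16, #-0x5000000000000
x16 = (-0x5000000000000) & MASK64
```

Here `x2` is `0xFFF9800000000000` and `x16` is `0xFFFB000000000000`, which agrees with the decimal annotations lldb prints next to those instructions.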

I hadn't read comment #9 -- I suspect your diagnosis will be correct; anything you want me to try to help confirm for you?

Flags: needinfo?(mgaudet)

I don't know why the jittests are running on x86_64/debug rather than aarch64/debug. I can try to schedule them on aarch64/debug and see what happens.

Depends on: 1964276

these run fine on a proper aarch64 debug build, so I am going to switch them in bug 1964276

Flags: needinfo?(jmaher)
Attachment #9485738 - Flags: approval-mozilla-beta?
Attachment #9485748 - Flags: approval-mozilla-beta?
Attachment #9485753 - Flags: approval-mozilla-beta?
Attachment #9485753 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9485738 - Attachment is obsolete: true
Attachment #9485738 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9485748 - Attachment is obsolete: true
Attachment #9485748 - Flags: approval-mozilla-beta? → approval-mozilla-beta-