Open Bug 1701945 Opened 3 years ago Updated 1 year ago

Ryzen startup crash in [@ js::jit::MoveResolver::resolve] (= with cpu family 23 model 1 stepping 1)

Categories

(Core :: JavaScript Engine: JIT, defect, P2)

Unspecified
Windows 10
defect

Tracking

()

People

(Reporter: aryx, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

10 crashes with 9 installations of 88.0b4 x64 builds on Windows 10, most in the first minutes; no crashes with 88.0b1-b3 and 87.0

Crash report: https://crash-stats.mozilla.org/report/index/ee87047c-5726-44ce-bb5f-d121b0210330

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0 xul.dll js::jit::MoveResolver::resolve js/src/jit/MoveResolver.cpp:269
1 xul.dll js::jit::CodeGenerator::generateBody js/src/jit/CodeGenerator.cpp:6571
2 xul.dll js::jit::CodeGenerator::generate js/src/jit/CodeGenerator.cpp:11450
3 xul.dll js::jit::CompileBackEnd js/src/jit/Ion.cpp:1582
4 xul.dll js::jit::IonCompileTask::runHelperThreadTask js/src/jit/IonCompileTask.cpp:30
5 xul.dll static js::HelperThread::ThreadMain js/src/vm/HelperThreads.cpp:2364
6 xul.dll static js::detail::ThreadTrampoline<void  js/src/threading/Thread.h:205
7 ucrtbase.dll thread_start<unsigned int , 1> 
8 kernel32.dll BaseThreadInitThunk 
9 ntdll.dll RtlUserThreadStart 

Only the uplift of bug 1700610 looks JS-engine-related in 88.0b4. Yury, can you do a sanity check on whether there is a relationship between these crashes and that bug?

Flags: needinfo?(ydelendik)

I'll check. Wasm SIMD is not enabled by default, so it would be nice to see what invokes the bug 1700610 code in this crash. There is a relationship if: a) SIMD is explicitly enabled, b) content or extension code uses WebAssembly SIMD. I'll also enable the tests for x64 to see whether that reproduces the issue.

Unable to reproduce locally with build id 20210328185936. The crash data has no URLs, so I assume an extension is the source of the wasm code (if this is a bug 1700610 regression). The crash reports share a common CPU id: Family 23 Model 1 Stepping 1. I enabled the bug 1700610 regression tests for all CPUs at https://treeherder.mozilla.org/jobs?repo=try&revision=bca275f9be877f9d6ed368e17a93cde9b0d5b618. Julian, Lars, am I missing something?

Summary: startup crash in [@ js::jit::MoveResolver::resolve] → Ryzen startup crash in [@ js::jit::MoveResolver::resolve] (= with cpu family 23 model 1 stepping 1)

Looking back at D109644 of bug 1700610, I now wonder if that is really correct.
In particular, the sequence:

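   xchgl(eax, output);   // move the value in output into eax, whose low byte (al) is addressable
   movsbl(eax, eax);     // sign-extend al into all of eax
   xchgl(eax, output);   // swap back: result lands in output, eax gets its old value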

If those xchgs really are the 32-bit versions, then this sequence will be wrong in
64-bit mode, because they will zero out the upper half of rax. The sequence is
also unnecessary in 64-bit mode, because the lowest 8 bits of every register
are uniformly addressable there (no?). But this sequence appears in
MacroAssembler-x86-shared-SIMD.cpp; maybe it should be in the x86-only
(32-bit-only) equivalent?

The code should not be executed in 64-bit mode because SingleByteRegs holds all registers on x64, hence we should never enter that branch. We should indeed be able to assert that we're not in 64-bit mode in the branch. I now feel bad for making Yury remove the ifdef, even if it was correct to do so.

It would also be extremely weird if that patch, which touches only the masm, caused a problem in the move resolver (even ignoring that SIMD is not enabled at all and that the patch is SIMD-only). Also, the test case should not run with !simd, so it should not be at fault.

Possibly relevant: in FF88 it is possible for privileged extensions to use SIMD, to facilitate the Bergamot extension, so that introduces a variable. Jit-tests should not normally be affected: they should continue to see !simd, unless something went wrong with the prefs. But startup code could be affected, because it is privileged and the predicate tests isSystemOrAddonPrinicipal(). There is some logic around this in XPCJSContext.cpp that changes between early and late beta (search for useWasmSimdWormhole), but I don't see how this matters apart from a possible C++ compiler bug.

All that said, there should be no SIMD code in the self-hosted code, so there is no obvious trigger. It may be worth investigating with some asserts, just to rule it out.


There is a very tiny chance that bug 1700610 caused this regression. I have been monitoring crash-stats; here are my observations, along with a quick summary of the comments above:

  • All crashes are specific to one CPU configuration, with id "Family 23 Model 1 Stepping 1", and occur only on Windows NT
  • Bug 1700610 is specific to WebAssembly SIMD operations, and the crash reports contain no indication that SIMD was used
  • Crash reports do not mention any extension that could possibly use SIMD operations
  • No meaningful URLs are specified in the reports
  • The last crashes occurred a week ago, over a span of only a few days.

I did some brief testing on my local Windows x64 machine and could not reproduce the crash. We probably need more information to connect it to bug 1700610.

Flags: needinfo?(ydelendik)
Severity: -- → S4
Priority: -- → P2