Open Bug 1643369 Opened 4 years ago Updated 3 years ago

Crash in [@ js::jit::MConstant::MConstant]

Categories

(Core :: JavaScript Engine: JIT, defect, P2)

Unspecified
Windows 10
defect

Tracking Status
firefox-esr68 --- wontfix
firefox77 + wontfix
firefox78 --- wontfix
firefox79 --- wontfix

People

(Reporter: pascalc, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

This bug is for crash report bp-e54c555c-fff8-43a1-9fc5-5d10a0200604.

Top 10 frames of crashing thread:

0 xul.dll js::jit::MConstant::MConstant js/src/jit/MIR.cpp:969
1 xul.dll js::jit::MCompare::foldsTo js/src/jit/MIR.cpp
2 xul.dll js::jit::ValueNumberer::visitBlock js/src/jit/ValueNumbering.cpp:1016
3 xul.dll js::jit::ValueNumberer::run js/src/jit/ValueNumbering.cpp:1279
4 xul.dll js::jit::OptimizeMIR js/src/jit/Ion.cpp:1076
5 xul.dll js::jit::CompileBackEnd js/src/jit/Ion.cpp:1438
6 xul.dll js::jit::IonCompileTask::runTask js/src/jit/IonCompileTask.cpp:27
7 xul.dll js::HelperThread::handleIonWorkload js/src/vm/HelperThreads.cpp:2147
8 xul.dll static js::HelperThread::ThreadMain js/src/vm/HelperThreads.cpp:2050
9 xul.dll static js::detail::ThreadTrampoline<void  js/src/threading/Thread.h:206

New crash in 77

There's a strong correlation with CPU "family 23 model 1 stepping 1", which seems to correspond to AMD Ryzen CPUs based on the Zen microarchitecture.
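The "family 23 model 1 stepping 1" triple is decoded from the EAX value of CPUID leaf 1. Family 23 (0x17) is base family 0xF plus extended family 0x8. The sketch below decodes an EAX value using the AMD convention. 0x00800F11 is an illustrative value that decodes to exactly this triple. The helper name DecodeAmdCpuidSignature is hypothetical and is not a Firefox or crash-stats function.

```cpp
#include <cstdint>

// Family/model/stepping as shown in crash metadata, e.g. "family 23 model 1 stepping 1".
struct CpuSignature {
  unsigned family;
  unsigned model;
  unsigned stepping;
};

// Decodes the EAX value of CPUID leaf 1 using the AMD convention:
// the extended family and extended model fields only apply when the
// base family is 0xF.
constexpr CpuSignature DecodeAmdCpuidSignature(std::uint32_t eax) {
  const unsigned stepping = eax & 0xF;            // bits 3:0
  const unsigned baseModel = (eax >> 4) & 0xF;    // bits 7:4
  const unsigned baseFamily = (eax >> 8) & 0xF;   // bits 11:8
  const unsigned extModel = (eax >> 16) & 0xF;    // bits 19:16
  const unsigned extFamily = (eax >> 20) & 0xFF;  // bits 27:20

  if (baseFamily == 0xF) {
    return CpuSignature{baseFamily + extFamily, baseModel + (extModel << 4), stepping};
  }
  return CpuSignature{baseFamily, baseModel, stepping};
}
```

Intel uses a slightly different rule: the extended model field also applies to family 6. Only trust this helper for AMD signatures.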

Crashes from the last 7 days show that most of them crash with the value -1 and an EXCEPTION_ACCESS_VIOLATION_READ.
This is the same pattern as Bug 1643367, which also has matching spikes in release.

Crash Signature: [@ js::jit::MConstant::MConstant] → [@ js::jit::MConstant::MConstant] [@ js::jit::ValueNumberer::visitBlock]
Severity: S4 → S3
Priority: P3 → P2

Concerning volume for these signatures compared to 76, tracking for 77.

Crashes are almost all on Windows, 98% of them on Windows 10. Several users reported crashing on Twitch.

I looked at a few of these crashes, and they crash on arithmetic instructions, which should be impossible.

See Also: → 1512430

Crashes for the architecture reported in comment 2 suggest that most signatures are in the jit namespace, under the Value Numbering phase.

However, 78% of the crashes crash with the value -1, which suggests that this is the same bug regardless of the code being executed.
The crash happens more frequently on machines with more cores (8 cores: 20%, 12 cores: 25%, 16 cores: 51%), which matches the observation in Bug 1512430.

Value numbering is at the center of the storm. This suggests one of two things. Either the code of the value numbering phase triggers this specific CPU bug, or value numbering code simply happens to be running on one of the threads when the bug occurs.

If I recall correctly, Twitch was among the first to adopt asm.js/WebAssembly, which we compile eagerly on multiple threads.
I suggest we add a preference to change the min/max thread count and experiment with it using add-ons to see whether it has any impact. Alternatively, someone who can reliably reproduce this issue could test it locally. The idea is to limit the number of threads running at the same time to avoid overheating.
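Here is a minimal sketch of the proposed experiment. It assumes a hypothetical thread-count override plus min/max bounds. ThreadCountPrefs and EffectiveThreadCount are illustrative names, not existing Firefox prefs or SpiderMonkey APIs. An unset override falls back to the detected CPU count, and the result is clamped so that an experiment can cap concurrency on many-core machines.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical preference values; none of these names exist in Firefox.
struct ThreadCountPrefs {
  unsigned requested;   // requested thread count, 0 = not set
  unsigned minThreads;  // lower bound for the experiment
  unsigned maxThreads;  // upper bound for the experiment
};

// Returns how many helper threads to use for off-thread compilation.
unsigned EffectiveThreadCount(unsigned cpuCount, const ThreadCountPrefs& prefs) {
  assert(prefs.minThreads >= 1 && prefs.minThreads <= prefs.maxThreads);
  // Fall back to the detected CPU count when no override is set.
  const unsigned wanted = prefs.requested != 0 ? prefs.requested : cpuCount;
  // Clamping lets an experiment cap concurrency (e.g. on 16-core machines)
  // without changing any other heuristic.
  return std::clamp(wanted, prefs.minThreads, prefs.maxThreads);
}
```

For example, with bounds [1, 8], a 16-core machine would run 8 helper threads, while a 4-core machine keeps 4.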

Wasm has a complicated system for determining the number of threads (search for maxWasmCompilationThreads). It is used in several ways: to decide whether to tier, whether to clamp the number of compilation threads to the number of physical cores, or whether to use the logical cores (because compilation is backed up and needs to make progress). In most cases, cpuCount (née thread-count) should directly affect these computations, so it should be possible to twiddle that value while leaving everything else constant.
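This rough sketch shows how a single cpuCount value could flow into all of these decisions, so that overriding it alone changes the outcome. It is not SpiderMonkey's actual maxWasmCompilationThreads logic. The types, the function PlanWasmCompilation, and the rule "tier only when more than one thread is available" are hypothetical simplifications.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical inputs; the real heuristics in SpiderMonkey differ in detail.
struct WasmThreadInputs {
  unsigned cpuCount;         // logical cores, or an overridden thread count
  unsigned physicalCores;    // physical cores reported by the OS
  bool compilationBackedUp;  // pending work must make progress
};

struct WasmThreadPlan {
  unsigned threads;  // compilation threads to use
  bool tier;         // hypothetical rule: tier only with more than one thread
};

WasmThreadPlan PlanWasmCompilation(const WasmThreadInputs& in) {
  assert(in.cpuCount >= 1);
  // Never plan for more physical cores than cpuCount allows, so lowering
  // cpuCount also lowers this bound.
  const unsigned physical = std::clamp(in.physicalCores, 1u, in.cpuCount);
  // Normally stay within the physical cores; when compilation is backed up,
  // use every logical core so the queue can drain.
  const unsigned threads = in.compilationBackedUp ? in.cpuCount : physical;
  return WasmThreadPlan{threads, threads > 1};
}
```

Lowering cpuCount below the physical core count caps the thread count in both the normal and the backed-up paths. The experiment relies on this property.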

Flags: needinfo?(jdemooij)