Crash in [@ gemmology::(anonymous namespace)::kernel::maddw]
Categories
(Firefox :: Translations, defect)
Tracking
Release | Tracking | Status |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox121 | --- | unaffected |
firefox122 | --- | unaffected |
firefox123 | + | fixed |
People
(Reporter: mccr8, Assigned: sergesanspaille)
References
(Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
[Tracking Requested - why for this release]:
Crash report: https://crash-stats.mozilla.org/report/index/7c105665-3c0d-461a-a614-b4baa0231220
Reason: EXCEPTION_ILLEGAL_INSTRUCTION
Top 10 frames of crashing thread:
0 xul.dll gemmology:: third_party/gemmology/gemmology.h:208
0 xul.dll gemmology:: third_party/gemmology/gemmology.h:640
0 xul.dll gemmology:: third_party/gemmology/gemmology.h:646
0 xul.dll gemmology::Engine<xsimd::avxvnni>::Shift::PrepareBias<gemmology::callbacks::UnquantizeAndAddBiasAndWrite> third_party/gemmology/gemmology.h:1303
1 xul.dll js::intgemm::IntrI8PrepareBias::<lambda_4>::operator const js/src/intgemm/IntegerGemmIntrinsic.cpp:317
1 xul.dll xsimd::detail::dispatcher<`lambda at /builds/worker/checkouts/gecko/js/src/intgemm/IntegerGemmIntrinsic.cpp:317:3', xsimd::arch_list<xsimd::avxvnni, xsimd::avx2, xsimd::ssse3, xsimd::sse2> >::walk_archs third_party/xsimd/include/xsimd/config/xsimd_arch.hpp:238
1 xul.dll xsimd::detail::dispatcher<`lambda at /builds/worker/checkouts/gecko/js/src/intgemm/IntegerGemmIntrinsic.cpp:317:3', xsimd::arch_list<xsimd::avxvnni, xsimd::avx2, xsimd::ssse3, xsimd::sse2> >::operator third_party/xsimd/include/xsimd/config/xsimd_arch.hpp:253
1 xul.dll js::intgemm::IntrI8PrepareBias js/src/intgemm/IntegerGemmIntrinsic.cpp:301
2 ? @0x00000176f245301e
3 xul.dll WasmMemoryCopy js/src/wasm/WasmInstance.cpp:566
Looks like a regression from bug 1868949.
Comment 1•1 year ago
Set release status flags based on info from the regressing bug 1868949
:sergesanspaille, since you are the author of the regressor, bug 1868949, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
Comment 2 (Assignee)•1 year ago
This probably means that the runtime detection code is incorrect :-/
Comment 3 (Assignee)•1 year ago
Based on the crash report, the proc is a Tiger Lake, which supports AVX VNNI. I've double-checked the detection code and it looks correct. And the stack trace points at vpdpbusd, which is indeed an AVX VNNI instruction :-/
Comment 4•1 year ago
Serge, Tiger Lake supports AVX512 VNNI instructions (in 512-bit and 256-bit width), but that's not the same as AVXVNNI on later CPUs like Alder Lake, which uses a different prefix (VEX). When you see vpdpbusd, check whether the instruction prefix is correct; specifically, it should not be VEX on Tiger Lake.
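To make the distinction concrete, here is a minimal sketch of how the two feature bits can be told apart once the raw CPUID registers are in hand. The struct and function names are hypothetical (they are not taken from xsimd or gemmology), and the sketch only decodes bits; a real detector would also have to confirm OS support for the relevant register state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of xsimd or gemmology.
struct VnniSupport {
    bool avx512_vnni;  // EVEX-encoded vpdpbusd (what Tiger Lake reports)
    bool avx_vnni;     // VEX-encoded vpdpbusd (what Alder Lake reports)
};

// Decode the two independent feature bits from raw CPUID output.
//   leaf7_sub0_ecx: ECX returned by CPUID with EAX=7, ECX=0 (bit 11 = AVX512_VNNI)
//   leaf7_sub1_eax: EAX returned by CPUID with EAX=7, ECX=1 (bit 4  = AVX-VNNI)
constexpr VnniSupport decode_vnni(std::uint32_t leaf7_sub0_ecx,
                                  std::uint32_t leaf7_sub1_eax) {
    return VnniSupport{
        ((leaf7_sub0_ecx >> 11) & 1u) != 0,
        ((leaf7_sub1_eax >> 4) & 1u) != 0,
    };
}

// Illustrative register values only: AVX512_VNNI set, AVX-VNNI clear.
inline void example_tiger_lake_like() {
    VnniSupport s = decode_vnni(1u << 11, 0u);
    assert(s.avx512_vnni && !s.avx_vnni);
}
```

The point of keeping the two flags separate is that neither implies the other: a CPU can report AVX512_VNNI without AVX-VNNI, in which case a VEX-encoded vpdpbusd raises an illegal-instruction exception.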
Comment 5•1 year ago
Yannis confirmed that the disassembly shows a VEX-encoded vpdpbusd, so it's AVXVNNI, which means the detection code misfired. As far as I can tell you're checking the right bits though.
Comment 6•1 year ago
This reproduces on my Zen 4: https://crash-stats.mozilla.org/report/index/ba465ecd-1ccf-425a-84d7-063db0231221
Comment 7•1 year ago
I'm looking through the code, and there was a suspicion "best" CPU detection was at fault (https://github.com/xtensor-stack/xsimd/blob/a48ab430d4b84ecd5449180ee1c6d2eed67c4191/include/xsimd/config/xsimd_cpuid.hpp#L189), but I don't see anything wrong there. Note that even if AVXVNNI detection misfires, it should be overruled by the AVX512_VNNI detection that follows.
What I do notice is that gemmology (https://github.com/mozilla/gemmology/blob/40dda91e99088ff80e21d71e57415aa491a0954c/gemmology.h#L208) ONLY has code for the AVXVNNI version, not the AVX512_VNNI one. So indeed misfiring detection (rather than "best") could still be the cause.
I'm still looking, but for example this looks like a minor bug (wouldn't cause this crash tho as we don't compile with Intel): https://github.com/xtensor-stack/xsimd/blob/a48ab430d4b84ecd5449180ee1c6d2eed67c4191/include/xsimd/config/xsimd_cpuid.hpp#L116
And this also looks suspicious, but it's probably dead code for Firefox: https://searchfox.org/mozilla-central/source/mozglue/misc/SSE.cpp#65 (comment that follows is also misleading)
Comment 8•1 year ago
On AVX512 machines, this would set best_arch_found(available_architectures().best) to some level of AVX512 support, e.g. generic::version(3, 4, 1). Wouldn't the following code: https://hg.mozilla.org/mozilla-central/file/37657c7691664026e54babf7d1cf608fe58a92fb/third_party/xsimd/include/xsimd/config/xsimd_arch.hpp#l237 then match on AVXVNNI generic::version(2, 3, 0)?
This seems to match the specific case we have here where higher version hardware support doesn't imply lower version support, and we provide a lower version routine, but not the higher version one.
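The failure mode described here can be sketched with a small standalone model. Everything below is hypothetical: the Arch struct, the version encoding, and the version numbers for avx2, ssse3 and sse2 are illustrative assumptions rather than the xsimd implementation, and pick_by_support is just one possible alternative, not necessarily what the upstream patch does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a compiled kernel target; not the xsimd API.
struct Arch {
    std::string name;
    unsigned version;  // ordered "level", loosely analogous to generic::version(...)
    bool supported;    // whether the CPU reports this exact extension
};

// Assumed encoding for illustration: larger tuple means larger number.
constexpr unsigned make_version(unsigned major, unsigned minor, unsigned patch) {
    return major * 10000u + minor * 100u + patch;
}

// Version-ordered walk: take the first compiled kernel whose version is <= best.
std::string pick_by_version(const std::vector<Arch>& compiled, unsigned best) {
    for (const Arch& a : compiled)
        if (a.version <= best) return a.name;
    return "none";
}

// Per-extension walk: take the first compiled kernel the CPU actually reports.
std::string pick_by_support(const std::vector<Arch>& compiled) {
    for (const Arch& a : compiled)
        if (a.supported) return a.name;
    return "none";
}

// Kernels compiled in, best first, as seen on an AVX512 machine without AVXVNNI.
std::vector<Arch> avx512_machine_without_avxvnni() {
    return {
        {"avxvnni", make_version(2, 3, 0), false},
        {"avx2", make_version(2, 2, 0), true},
        {"ssse3", make_version(1, 3, 1), true},
        {"sse2", make_version(1, 2, 0), true},
    };
}

inline void example() {
    const unsigned best = make_version(3, 4, 1);  // some AVX512 level
    // The version comparison selects a kernel the CPU cannot execute.
    assert(pick_by_version(avx512_machine_without_avxvnni(), best) == "avxvnni");
    // Checking the exact extension falls through to avx2 instead.
    assert(pick_by_support(avx512_machine_without_avxvnni()) == "avx2");
}
```

In this model the version comparison treats the architecture list as a strict hierarchy, which breaks as soon as a "higher" level does not imply a "lower" one and only the lower one has a kernel.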
Comment 9 (Assignee)•1 year ago
Upstream patch: https://github.com/xtensor-stack/xsimd/pull/994.patch
Comment 10•1 year ago
The bug is marked as tracked for firefox123 (nightly). However, the bug still isn't assigned.
:marco, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply back out the regressor. If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Comment 11•1 year ago
Serge is working on it.