Crash in [@ js::gc::CellWithTenuredGCPointer<T>::headerPtr ]
Categories
(Core :: DOM: Workers, defect)
Tracking
()
People
(Reporter: cpeterson, Unassigned)
References
Details
(Keywords: crash, regression)
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/413a89e5-3a07-4f74-b4e5-080520221020
@ André, do you think your LexicalEnvironmentObjects optimizations in bug 1341937 could have caused this Android GC crash?
This crash looks like a regression in Fenix 106. The first crash report is from 106.0a1 build ID 20220909093917. Here is the pushlog between 2022-09-08 and 20220909093917, which includes your LexicalEnvironmentObjects changes:
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so std::__ndk1::__cxx_atomic_load<unsigned long> /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/atomic:970
0 libxul.so std::__ndk1::__atomic_base<unsigned long, false>::load const /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/atomic:1487
0 libxul.so mozilla::detail::IntrinsicMemoryOps<unsigned long, mfbt/Atomics.h:195
0 libxul.so mozilla::detail::AtomicBaseIncDec<unsigned long, const mfbt/Atomics.h:340
0 libxul.so js::gc::CellWithTenuredGCPointer<js::gc::Cell, js::Shape>::headerPtr const js/src/gc/Cell.h:791
0 libxul.so JSObject::shape const js/src/vm/JSObject.h:91
0 libxul.so JSObject::nonCCWRealm const js/src/vm/JSObject.h:398
0 libxul.so JSObject::nonCCWGlobal const js/src/vm/JSObject-inl.h:215
0 libxul.so JS::GetNonCCWObjectGlobal js/src/jsapi.cpp:1182
0 libxul.so xpc::NativeGlobal js/xpconnect/wrappers/WrapperFactory.cpp:778
Reporter | ||
Updated•2 years ago
|
Comment 1•2 years ago
|
||
This could also be a signature change. That looks like a pretty generic GC crash. Bug 1791509 added some of these atomic functions to the skip list, but I don't know if the Android versions were in there.
Comment 2•2 years ago
|
||
Yes, this is a signature change caused by new inlined methods appearing. I'd say we could ignore the whole std::__ndk1
namespace because it's not interesting.
Comment 3•2 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #0)
@ André, do you think your LexicalEnvironmentObjects optimizations in bug 1341937 could have caused this Android GC crash?
No, I don't think so. Looking at the complete stack trace from the crash report, the object passed to AutoJSAPI::Init
is ModuleLoadRequest::mDynamicPromise
, which is the Promise object created in js::StartDynamicModuleImport
. The Promise object is passed to Gecko through cx->runtime()->moduleDynamicImportHook
. And this is pretty far away from the changes in bug 1341937.
This bug looks more similar to bug 1291535. Maybe it makes sense to add MOZ_RELEASE_ASSERT(aRequest->mDynamicPromise)
in ModuleLoaderBase::FinishDynamicImportAndReject()
to verify we don't clear mDynamicPromise
and are then calling FinishDynamicImportAndReject()
?
Updated•2 years ago
|
Comment 4•2 years ago
|
||
Comment 5•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on release (startup)
:sdetar, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 6•2 years ago
|
||
Now that we're ignoring the std::__ndk1
namespace (bug 1798495), this crash's signature has changed to js::gc::CellWithTenuredGCPointer<T>::headerPtr
in 106. This signature looks pretty generic and affects both Android and desktop, though almost 50% of the crash reports are from Android!
Updated•2 years ago
|
Comment 7•2 years ago
|
||
Looking deeper in the stack trace of the crash, this seems like the issue might be related to Modules loading.
Yulia / Andrew, can you have a look at this issue?
Comment 8•2 years ago
|
||
This isn't related to Workers as modules are not turned on for them, so I can take this.
It looks like this is the problem related to FinishDynamicImportAndReject. This was recently closed in two parts in 1774111 and 1788532 -- and it looks like all of the crashes are pre ~108 when some of that work landed. Jonco was working on this so I'll cc him, at least for monitoring. This can likely be closed as a dup once we identify which fix addressed this.
Comment 9•2 years ago
|
||
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 10•2 years ago
|
||
This bug's crash volume dropped from ~500 reports per day to 0 overnight (Dec 1-2). Is this a reporting problem? Even if this crash was fixed, I would expect we would still receive reports from older builds that haven't updated to a fixed build yet.
Reporter | ||
Comment 11•2 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #10)
This bug's crash volume dropped from ~500 reports per day to 0 overnight (Dec 1-2). Is this a reporting problem? Even if this crash was fixed, I would expect we would still receive reports from older builds that haven't updated to a fixed build yet.
mccr8 suggests the signatures might have changed.
Reporter | ||
Comment 12•2 years ago
|
||
[@ JSObject::nonCCWGlobal]
is a related signature, though I only see reports from 106 and 107.
Crash report: https://crash-stats.mozilla.org/report/index/cd34f9e1-b7e5-4932-84a6-a50c60221210
There are JSObject::nonCCWGlobal
crash reports from all platforms, but 83% are from Android.
Comment 13•2 years ago
|
||
Bug 1774111 should have fixed this. It's present in 108 onwards.
Comment 14•2 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #13)
Bug 1774111 should have fixed this. It's present in 108 onwards.
Thanks! Closing as duplicate seems appropriate then, I'd say.
Updated•2 years ago
|
Description
•