Closed Bug 1796666 Opened 2 years ago Closed 2 years ago

Crash in [@ js::gc::CellWithTenuredGCPointer<T>::headerPtr ]

Categories

(Core :: DOM: Workers, defect)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1774111
Tracking Status
firefox106 --- wontfix
firefox107 --- wontfix
firefox108 --- fixed
firefox109 --- fixed

People

(Reporter: cpeterson, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/413a89e5-3a07-4f74-b4e5-080520221020

@ André, do you think your LexicalEnvironmentObjects optimizations in bug 1341937 could have caused this Android GC crash?

This crash looks like a regression in Fenix 106. The first crash report is from 106.0a1 build ID 20220909093917. Here is the pushlog between 2022-09-08 and 20220909093917, which includes your LexicalEnvironmentObjects changes:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=21386acd44738780c060ee8070d29f7eefc9af5c&tochange=331d6d70f0e992d44e97263beeda22bf4d667857

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0 libxul.so std::__ndk1::__cxx_atomic_load<unsigned long> /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/atomic:970
0 libxul.so std::__ndk1::__atomic_base<unsigned long, false>::load const /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/atomic:1487
0 libxul.so mozilla::detail::IntrinsicMemoryOps<unsigned long,  mfbt/Atomics.h:195
0 libxul.so mozilla::detail::AtomicBaseIncDec<unsigned long,  const mfbt/Atomics.h:340
0 libxul.so js::gc::CellWithTenuredGCPointer<js::gc::Cell, js::Shape>::headerPtr const js/src/gc/Cell.h:791
0 libxul.so JSObject::shape const js/src/vm/JSObject.h:91
0 libxul.so JSObject::nonCCWRealm const js/src/vm/JSObject.h:398
0 libxul.so JSObject::nonCCWGlobal const js/src/vm/JSObject-inl.h:215
0 libxul.so JS::GetNonCCWObjectGlobal js/src/jsapi.cpp:1182
0 libxul.so xpc::NativeGlobal js/xpconnect/wrappers/WrapperFactory.cpp:778
Flags: needinfo?(andrebargull)
Keywords: regression
See Also: → 1341937

This could also be a signature change. That looks like a pretty generic GC crash. Bug 1791509 added some of these atomic functions to the skip list, but I don't know if the Android versions were in there.

Yes, this is a signature change caused by new inlined methods appearing. I'd say we could ignore the whole std::__ndk1 namespace because it's not interesting.

(In reply to Chris Peterson [:cpeterson] from comment #0)

@ André, do you think your LexicalEnvironmentObjects optimizations in bug 1341937 could have caused this Android GC crash?

No, I don't think so. Looking at the complete stack trace from the crash report, the object passed to AutoJSAPI::Init is ModuleLoadRequest::mDynamicPromise, which is the Promise object created in js::StartDynamicModuleImport. The Promise object is passed to Gecko through cx->runtime()->moduleDynamicImportHook. And this is pretty far away from the changes in bug 1341937.

This bug looks more similar to bug 1291535. Maybe it makes sense to add MOZ_RELEASE_ASSERT(aRequest->mDynamicPromise) in ModuleLoaderBase::FinishDynamicImportAndReject() to verify we don't clear mDynamicPromise and are then calling FinishDynamicImportAndReject()?

Flags: needinfo?(andrebargull)

Looks like it's too late for 107.
Is Comment 3 something we should consider for 108 or should this be ignored as per Comment 2

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on release (startup)

:sdetar, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(sdetar)
Depends on: 1798495

Now that we're ignoring the std::__ndk1 namespace (bug 1798495), this crash's signature has changed to js::gc::CellWithTenuredGCPointer<T>::headerPtr in 106. This signature looks pretty generic and affects both Android and desktop, though almost 50% of the crash reports are from Android!

Crash Signature: [@ std::__ndk1::__cxx_atomic_load<T>] → [@ std::__ndk1::__cxx_atomic_load<T>] [@ js::gc::CellWithTenuredGCPointer<T>::headerPtr ]
OS: Android → All
Summary: Android GC crash in [@ std::__ndk1::__cxx_atomic_load<T>] → Crash in [@ js::gc::CellWithTenuredGCPointer<T>::headerPtr ]
Flags: needinfo?(sdetar)

Looking deeper in the stack trace of the crash, this seems like the issue might be related to Modules loading.
Yulia / Andrew, can you have a look at this issue?

Component: JavaScript: GC → DOM: Workers
Flags: needinfo?(ystartsev)
Flags: needinfo?(bugmail)

This isn't related to Workers as modules are not turned on for them, so I can take this.

It looks like this is the problem related to FinishDynamicImportAndReject. This was recently closed in two parts in 1774111 and 1788532 -- and it looks like all of the crashes are pre ~108 when some of that work landed. Jonco was working on this so I'll cc him, at least for monitoring. This can likely be closed as a dup once we identify which fix addressed this.

Flags: needinfo?(ystartsev)
Flags: needinfo?(jcoppeard)
Flags: needinfo?(bugmail)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

This bug's crash volume dropped from ~500 reports per day to 0 overnight (Dec 1-2). Is this a reporting problem? Even if this crash was fixed, I would expect we would still receive reports from older builds that haven't updated to a fixed build yet.

See Also: → 1774111, 1788532

(In reply to Chris Peterson [:cpeterson] from comment #10)

This bug's crash volume dropped from ~500 reports per day to 0 overnight (Dec 1-2). Is this a reporting problem? Even if this crash was fixed, I would expect we would still receive reports from older builds that haven't updated to a fixed build yet.

mccr8 suggests the signatures might have changed.

[@ JSObject::nonCCWGlobal] is a related signature, though I only see reports from 106 and 107.

Crash report: https://crash-stats.mozilla.org/report/index/cd34f9e1-b7e5-4932-84a6-a50c60221210

There are JSObject::nonCCWGlobal crash reports from all platforms, but 83% are from Android.

Crash Signature: [@ std::__ndk1::__cxx_atomic_load<T>] [@ js::gc::CellWithTenuredGCPointer<T>::headerPtr ] → [@ JSObject::nonCCWGlobal] [@ js::gc::CellWithTenuredGCPointer<T>::headerPtr ] [@ std::__ndk1::__cxx_atomic_load<T>]

Bug 1774111 should have fixed this. It's present in 108 onwards.

Crash Signature: [@ JSObject::nonCCWGlobal] [@ js::gc::CellWithTenuredGCPointer<T>::headerPtr ] [@ std::__ndk1::__cxx_atomic_load<T>] → [@ std::__ndk1::__cxx_atomic_load<T>] [@ js::gc::CellWithTenuredGCPointer<T>::headerPtr ]
Flags: needinfo?(jcoppeard)

(In reply to Jon Coppeard (:jonco) from comment #13)

Bug 1774111 should have fixed this. It's present in 108 onwards.

Thanks! Closing as duplicate seems appropriate then, I'd say.

Status: NEW → RESOLVED
Closed: 2 years ago
Duplicate of bug: 1774111
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.