Closed Bug 1738282 Opened 2 years ago Closed 2 years ago

Crash in [@ shutdownhang | js::frontend::CompilationStencil::~CompilationStencil]

Categories

(Core :: JavaScript Engine, defect, P2)

All
Windows
defect

Tracking

()

RESOLVED FIXED
96 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox94 --- wontfix
firefox95 --- wontfix
firefox96 --- fixed

People

(Reporter: aryx, Assigned: arai)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

New Windows crash in Firefox 94, mostly with 32-bit builds after >1h usage. Is this a change of a crash signature? 120 crashes across betas RCs so far.

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/cfa3fc75-f911-4515-bf58-75fa60211028

MOZ_CRASH Reason: Shutdown hanging at step xpcom-shutdown. Something is blocking the main-thread.

Top 10 frames of crashing thread:

0 xul.dll js::frontend::CompilationStencil::~CompilationStencil js/src/frontend/CompilationStencil.h:651
1 xul.dll JS::StencilRelease js/src/frontend/Stencil.cpp:3984
2 xul.dll mozilla::ScriptPreloader::CachedStencil::~CachedStencil js/xpconnect/loader/ScriptPreloader.h:184
3 xul.dll static nsTHashtable<nsBaseHashtableET<nsCStringHashKey, mozilla::UniquePtr<mozilla::ScriptPreloader::CachedStencil, mozilla::DefaultDelete<mozilla::ScriptPreloader::CachedStencil> > > >::s_ClearEntry xpcom/ds/nsTHashtable.h:720
4 xul.dll PLDHashTable::Clear xpcom/ds/PLDHashTable.cpp:311
5 xul.dll mozilla::ScriptPreloader::Observe js/xpconnect/loader/ScriptPreloader.cpp:329
6 xul.dll nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:70
7 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:291
8 xul.dll static mozilla::AppShutdown::AdvanceShutdownPhase xpcom/base/AppShutdown.cpp:379
9 xul.dll mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:610

Arai,is this something you could look into? If not let me know and I can find someone else.

Flags: needinfo?(arai.unmht)
Priority: -- → P2

Looking at crashing, it seems that we are hanging on a simple memory-load that is 4kB aligned. The stencil data is likely to be cold and swapped out after 1hr, so the "hang" is very likely just a slow disk fetching back memory so we can free it. In 94 we changed the data format of this cache so the new signature there makes sense.

On investigation with :arai and :mccr8, it seems odd we are bothering to clean up the data in release builds. We probably should stop doing that and it should solve the crash. Arai is exploring removing the table cleanup code. The cleanup code was largely added previously when they were GC pointers.

Assignee: nobody → arai.unmht
Status: NEW → ASSIGNED

discussed some more with :tcampbell, :mccr8, and :nika.

It turns out that we're doing the free operation for Stencils in too early stage inside ShutdownXPCOM for the current code (without GC pointers):

ShutdownXPCOM -> "xpcom-shutdown" notification -> ScriptPreloader::Cleanup -> ScriptPreloader::CachedStencil::~CachedStencil -> CompilationStencil::~CompilationStencil

And delaying it to the later step solves it because we perform fast shutdown on release build:

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/modules/libpref/init/StaticPrefList.yaml#11338-11347
https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/xpcom/base/AppShutdown.cpp#93-106
https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/xpcom/build/XPCOMInit.cpp#714-715

nsresult ShutdownXPCOM(nsIServiceManager* aServMgr) {
...
  mozilla::AppShutdown::MaybeFastShutdown(
      mozilla::ShutdownPhase::CCPostLastCycleCollection);

So, if we move the free operation after that point, it's not performed on release build (but OS will perform cleanup for us), and it's performed on other builds (debug, LSan etc) to make sure everything is freed properly.

Currently, ScriptPreloader::Cleanup is called during "xpcom-shutdown" notification,
and ScriptPreloader dtor is called during ClearOnShutdown for XPCOMShutdownFinal phase,
and mmap-ed data is freed during ClearOnShutdown for JSPostShutDown phase.

the relation between those 4 calls are the following:

nsresult ShutdownXPCOM(nsIServiceManager* aServMgr) {
...
      mozilla::AppShutdown::AdvanceShutdownPhase(
          mozilla::ShutdownPhase::XPCOMShutdown, nullptr,
          do_QueryInterface(mgr));
...
    mozilla::KillClearOnShutdown(ShutdownPhase::XPCOMShutdownFinal);
...
  mozilla::AppShutdown::MaybeFastShutdown(
      mozilla::ShutdownPhase::CCPostLastCycleCollection);
...
  mozilla::KillClearOnShutdown(ShutdownPhase::JSPostShutDown);

So, if we move the free operation into JSPostShutDown phase, or somewhere (maybe new JSPreShutDown if necessary), we won't hit hang due to slow page-in.

Stencils should be released before JSRuntime::destroyRuntime, that happens in (nsComponentManagerImpl::gComponentManager)->Shutdown().
So, ScriptPreloader should be released between CCPostLastCycleCollection fast shutdown and it,
and actually, we just need to put it next to StartupCache::DeleteSingleton.

maybe we should directly call ScriptPreloader's static method, both for ScriptPreloader singleton and AutoMemMap singleton,
to align with StartupCache, instead of adding shutdown phase.

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/xpcom/build/XPCOMInit.cpp#724

nsresult ShutdownXPCOM(nsIServiceManager* aServMgr) {
...
  mozilla::AppShutdown::MaybeFastShutdown(
      mozilla::ShutdownPhase::CCPostLastCycleCollection);

  mozilla::scache::StartupCache::DeleteSingleton();

  PROFILER_MARKER_UNTYPED("Shutdown xpcom", OTHER);

  // Shutdown xpcom. This will release all loaders and cause others holding
  // a refcount to the component manager to release it.
  if (nsComponentManagerImpl::gComponentManager) {
    rv = (nsComponentManagerImpl::gComponentManager)->Shutdown();

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/xpcom/components/nsComponentManager.cpp#852

nsresult nsComponentManagerImpl::Shutdown(void) {
...
  StaticComponents::Shutdown();

https://searchfox.org/mozilla-central/source/__GENERATED__/__linux64__/xpcom/components/StaticComponents.cpp#12410

/* static */ void StaticComponents::Shutdown() {
  CallUnloadFuncs();

https://searchfox.org/mozilla-central/source/__GENERATED__/__linux64__/xpcom/components/StaticComponents.cpp#8868

static void CallUnloadFuncs() {
...
    nsLayoutModuleDtor();

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/layout/build/nsLayoutModule.cpp#263

void nsLayoutModuleDtor() {
...
  xpcModuleDtor();

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/js/xpconnect/src/XPCModule.cpp#18

void xpcModuleDtor() {
  // Release our singletons
  nsXPConnect::ReleaseXPConnectSingleton();

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/js/xpconnect/src/nsXPConnect.cpp#164

void nsXPConnect::ReleaseXPConnectSingleton() {
  nsXPConnect* xpc = gSelf;
  if (xpc) {
    nsrefcnt cnt;
    NS_RELEASE2(xpc, cnt);
  }

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/js/xpconnect/src/nsXPConnect.cpp#129

nsXPConnect::~nsXPConnect() {
...
  delete mContext;

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/js/xpconnect/src/xpcprivate.h#303

class XPCJSContext final : public mozilla::CycleCollectedJSContext,

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/xpcom/base/CycleCollectedJSContext.cpp#110

CycleCollectedJSContext::~CycleCollectedJSContext() {
...
  JS_DestroyContext(mJSContext);

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/js/src/jsapi.cpp#394

JS_PUBLIC_API void JS_DestroyContext(JSContext* cx) { DestroyContext(cx); }

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/js/src/vm/JSContext.cpp#244

void js::DestroyContext(JSContext* cx) {
...
  rt->destroyRuntime();

https://searchfox.org/mozilla-central/rev/267682a8f45221bf0bfe999d4a0239706a43bc56/js/src/vm/Runtime.cpp#301-304

void JSRuntime::destroyRuntime() {
...
  {
    AutoLockScriptData lock(this);
    MOZ_ASSERT(scriptDataTable(lock).empty());
  }

This stops destructing the stencils held by ScriptPreloader on release build,
to avoid the shutdown hang caused by thrashing.

Also:

  • Move ScriptPreloader::Cleanup from xpcom-shutdown notification to ScriptPreloader destructor, given there's no reason for having 2 steps
  • Replace ClearOnShutdown call with explicit function call for the singleton deletion, for simplicity
  • Remove now-unused PostJSShutDown phase
Flags: needinfo?(arai.unmht)
Pushed by arai_a@mac.com:
https://hg.mozilla.org/integration/autoland/rev/7a328949326b
Delete ScriptPreloader singleton after CCPostLastCycleCollection phase. r=mccr8
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch

Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

Please nominate this for Beta approval when you get a chance.

Flags: needinfo?(arai.unmht)

There's another crash possibly caused by this patch, and now I'm investigating.
I'd like to wait for a while before asking uplift.

Regressions: 1742381
Flags: needinfo?(arai.unmht)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: