MOZ_PROFILER_STARTUP=1 leads to Assertion failure: false (MOZ_ASSERT_UNREACHABLE: WorkerGlobalScope alive after worker shutdown), at C:/mozilla-source/mozilla-unified/dom/workers/RuntimeService.cpp:2227
Categories
(Core :: DOM: Workers, defect, P3)
People
(Reporter: yannis, Unassigned)
I can reproduce the error below when building and running a Windows x64 debug build locally, though not very reliably. This still happens with the latest mozilla-central after bug 1908240, and it was already happening before that (so it is not a regression).
[Parent 29704, DOM Worker] WARNING: 'globalScopeSentinel && globalScopeSentinel->IsAlive()', file C:/mozilla-source/mozilla-unified/dom/workers/RuntimeService.cpp:2226
[29704] Assertion failure: false (MOZ_ASSERT_UNREACHABLE: WorkerGlobalScope alive after worker shutdown), at C:/mozilla-source/mozilla-unified/dom/workers/RuntimeService.cpp:2227
#01: mozilla::dom::workerinternals::`anonymous namespace'::WorkerThreadPrimaryRunnable::Run (C:\mozilla-source\mozilla-unified\dom\workers\RuntimeService.cpp:2227)
#02: nsThread::ProcessNextEvent (C:\mozilla-source\mozilla-unified\xpcom\threads\nsThread.cpp:1150)
#03: NS_ProcessNextEvent (C:\mozilla-source\mozilla-unified\xpcom\threads\nsThreadUtils.cpp:480)
#04: mozilla::ipc::MessagePumpForNonMainThreads::Run (C:\mozilla-source\mozilla-unified\ipc\glue\MessagePump.cpp:300)
#05: MessageLoop::RunHandler (C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc:364)
#06: MessageLoop::Run (C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc:346)
#07: nsThread::ThreadFunc (C:\mozilla-source\mozilla-unified\xpcom\threads\nsThread.cpp:368)
#08: _PR_NativeRunThread (C:\mozilla-source\mozilla-unified\nsprpub\pr\src\threads\combined\pruthr.c:408)
#09: pr_root (C:\mozilla-source\mozilla-unified\nsprpub\pr\src\md\windows\w95thred.c:140)
#10: recalloc[C:\WINDOWS\System32\ucrtbase.dll +0x29333]
#11: BaseThreadInitThunk[C:\WINDOWS\System32\KERNEL32.DLL +0x1257d]
#12: patched_BaseThreadInitThunk (C:\mozilla-source\mozilla-unified\toolkit\xre\dllservices\mozglue\WindowsDllBlocklist.cpp:562)
#13: RtlUserThreadStart[C:\WINDOWS\SYSTEM32\ntdll.dll +0x5af28]
I ran into this because it broke my workflow while working on profiler startup deadlocks: it sometimes makes the browser crash while using the profiler. It is not related to my own patches, though; I can reproduce it on mozilla-central as well.
First proposal for STR:
- ./mach build
- MOZ_PROFILER_STARTUP=1 ./mach run
- Navigate to lemonde.fr
- Stop profiling
- In the profiler window, while symbols are still loading, start typing in the filter box
Unfortunately it doesn't reproduce super reliably. Some of the steps above may not be actually useful for reproduction. How can I provide more debug info for this issue?
Comment 1•1 year ago (Reporter)
NI :edenchuang because you've worked on the similar bug 1908240. Again, this is not a regression; it just appears to be a different occurrence of the same MOZ_ASSERT_UNREACHABLE.
Comment 2•1 year ago
Usually, when you see the same crash signature, it implies that the WorkerGlobalScope lives longer than expected.
For example, in bug 1908240, the root cause is that LockManagerChild::mOwner is not released properly when shutting down the Worker too fast after launching it.
I am unfamiliar with the Profiler's code, but I think one of the following cases might occur during the Worker's shutdown:
- The Profiler tries to get Worker's information through WorkerGlobalScope(nsIGlobalObject)
- The Worker sends some data to the Profiler to show information on the Profiler window. But the data keeps the WorkerGlobalScope(nsIGlobalObject) alive.
If it is the first case:
Holding a WorkerRef is the basic way for code outside the Worker's scope to observe the Worker's shutdown status. By providing a shutdown callback when creating the WorkerRef, we can decide what to do when the Worker starts shutting down. Functionality that uses a WorkerRef usually forbids access to Worker data when it is not holding a WorkerRef, since that means the Worker is shutting down.
But we also need to handle the case where WorkerRef creation fails; bug 1908240 is an example where we did not handle that failure.
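To make the pattern above concrete, here is a minimal standalone sketch. It is not Gecko code: ToyWorker, ToyWorkerRef and ToyConsumer are hypothetical names invented for illustration, and the real WorkerRef classes have a different API. It only models the two points made above: the holder is told when shutdown starts, and creation can fail if shutdown has already begun.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a worker. Hypothetical; NOT Gecko's WorkerPrivate.
class ToyWorker {
 public:
  // Registers a shutdown callback. Returns false once shutdown has begun,
  // which models the "WorkerRef creation fails" case.
  bool AddShutdownCallback(std::function<void()> aCallback) {
    if (mShuttingDown) {
      return false;
    }
    mCallbacks.push_back(std::move(aCallback));
    return true;
  }

  // Starts shutdown and notifies every registered holder once.
  void BeginShutdown() {
    mShuttingDown = true;
    auto callbacks = std::move(mCallbacks);
    mCallbacks.clear();
    for (auto& callback : callbacks) {
      callback();
    }
  }

 private:
  bool mShuttingDown = false;
  std::vector<std::function<void()>> mCallbacks;
};

// Toy stand-in for a WorkerRef. Hypothetical; only models the idea.
class ToyWorkerRef {
 public:
  // Returns nullptr when the worker is already shutting down.
  static std::unique_ptr<ToyWorkerRef> Create(
      ToyWorker& aWorker, std::function<void()> aOnShutdown) {
    if (!aWorker.AddShutdownCallback(std::move(aOnShutdown))) {
      return nullptr;
    }
    return std::unique_ptr<ToyWorkerRef>(new ToyWorkerRef());
  }

 private:
  ToyWorkerRef() = default;
};

// Hypothetical consumer living outside the worker's scope.
class ToyConsumer {
 public:
  // Returns false if no ref could be created; the caller must bail out
  // instead of touching worker data.
  bool Init(ToyWorker& aWorker) {
    mRef = ToyWorkerRef::Create(aWorker, [this] { mRef = nullptr; });
    return mRef != nullptr;
  }

  // Worker data may only be touched while the ref is held.
  bool CanTouchWorkerData() const { return mRef != nullptr; }

 private:
  std::unique_ptr<ToyWorkerRef> mRef;
};
```

In this sketch a consumer created before shutdown loses its ref when BeginShutdown() runs, and a consumer created afterwards sees Init() return false. The second path is the one that is easy to forget to handle.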
If it is the second case:
Currently, there is no good way to prevent it automatically. We can only check what the Worker sends and avoid keeping the WorkerGlobalScope alive.
Let me know if there is anything I can help with.
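The second case can be illustrated with a small standalone model. This is only a sketch under loud assumptions: it uses std::shared_ptr and std::weak_ptr, whereas real worker globals are managed by Gecko's own reference counting, and ToyGlobalScope, LeakyPayload and PlainPayload are hypothetical names. The weak_ptr plays a role loosely similar to the globalScopeSentinel check named in the warning: it observes whether the scope is still alive after the worker dropped its own reference.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a worker global scope. Hypothetical; NOT Gecko code.
struct ToyGlobalScope {
  std::string name;
};

// Problematic shape: the sent data holds a strong edge to the scope.
struct LeakyPayload {
  std::string text;
  std::shared_ptr<ToyGlobalScope> scope;
};

// Safe shape: the sent data is a plain copy with no edge to the scope.
struct PlainPayload {
  std::string text;
};

// Sends a LeakyPayload, then drops the worker's own reference.
// Returns true if the scope is still alive afterwards.
bool LeakyRunLeavesScopeAlive(std::vector<LeakyPayload>& aOutbox) {
  auto scope = std::make_shared<ToyGlobalScope>(ToyGlobalScope{"worker"});
  std::weak_ptr<ToyGlobalScope> sentinel = scope;
  aOutbox.push_back(LeakyPayload{"symbols", scope});
  scope.reset();  // models worker shutdown releasing the scope
  return !sentinel.expired();
}

// Same as above, but the payload copies only plain data.
bool PlainRunLeavesScopeAlive(std::vector<PlainPayload>& aOutbox) {
  auto scope = std::make_shared<ToyGlobalScope>(ToyGlobalScope{"worker"});
  std::weak_ptr<ToyGlobalScope> sentinel = scope;
  aOutbox.push_back(PlainPayload{scope->name + ": symbols"});
  scope.reset();  // models worker shutdown releasing the scope
  return !sentinel.expired();
}
```

With the leaky payload the sentinel still sees a live scope after shutdown, which is the condition the assertion complains about. With the plain payload the scope is gone as soon as the worker releases it.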
Comment 4•1 year ago
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #3)
Is this another variant of the shutdown leak?
I believe this is referring to bug 1699681 and https://phabricator.services.mozilla.com/D169610 where the discussion was that a failure to call profiler_clear_js_context would leave some jitcode entries around which would entrain some stuff. That may make the main thread sad, but in workers we call it 100% of the time before we would reach the assertion.
Doing a search of path:profiler nsIGlobalObject, the only uses of nsIGlobalObject by profiler code seem to be explicitly main-thread nsIProfiler uses. I suspect this is unrelated to the profiler and just a function of some system ChromeWorker using some API that should hold a WorkerRef but does not. That would explain why things might be so intermittent; remote-settings might not want to actually do much every time the browser restarts, etc.
Note that while this assertion will only trip in debug builds, if you ever find an execution of mozilla::dom::WorkerGlobalScopeBase::NoteWorkerTerminated in a pernosco trace of a non-debug build, that is a pernosco trace that will be actionable for finding an instance of this (family of) bug. That said, it is really likely to get inlined so it might be necessary to use pernosco breakpointy things on the invocation lines.
Thanks for the investigation, Andrew! Removing my needinfo as it looks unrelated to the profiler shutdown leak.
Comment 6•1 year ago
This is just a note that I can also reproduce this assertion failure on macOS with a debug + optimized build. I've found that I can reliably reproduce it by searching in the marker chart while symbols are being transferred.
Here's the stacktrace I get:
[Parent 34621, DOM Worker] WARNING: 'globalScopeSentinel && globalScopeSentinel->IsAlive()', file /Users/aabh/Firefox/bootstrap/mozilla-unified/dom/workers/RuntimeService.cpp:2226
[34621] Assertion failure: false (MOZ_ASSERT_UNREACHABLE: WorkerGlobalScope alive after worker shutdown), at /Users/aabh/Firefox/bootstrap/mozilla-unified/dom/workers/RuntimeService.cpp:2227
#01: mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run()[/Users/aabh/Firefox/bootstrap/mozilla-unified/obj-aarch64-apple-darwin23.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x51238a0]
#02: nsThread::ProcessNextEvent(bool, bool*)[/Users/aabh/Firefox/bootstrap/mozilla-unified/obj-aarch64-apple-darwin23.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x57dd9c]
#03: NS_ProcessNextEvent(nsIThread*, bool)[/Users/aabh/Firefox/bootstrap/mozilla-unified/obj-aarch64-apple-darwin23.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5844e8]
#04: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)[/Users/aabh/Firefox/bootstrap/mozilla-unified/obj-aarch64-apple-darwin23.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xfb07f0]
#05: MessageLoop::RunInternal()[/Users/aabh/Firefox/bootstrap/mozilla-unified/obj-aarch64-apple-darwin23.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf1f2dc]
#06: MessageLoop::Run()[/Users/aabh/Firefox/bootstrap/mozilla-unified/obj-aarch64-apple-darwin23.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf1f1d4]
#07: nsThread::ThreadFunc(void*)[/Users/aabh/Firefox/bootstrap/mozilla-unified/obj-aarch64-apple-darwin23.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x579328]
#08: _pt_root[/Users/aabh/Firefox/bootstrap/mozilla-unified/obj-aarch64-apple-darwin23.6.0/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x1a7cc0]
#09: _pthread_start[/usr/lib/system/libsystem_pthread.dylib +0x6f94]
Comment 7•1 year ago
The profiler's use of workers happens from https://searchfox.org/mozilla-central/source/devtools/client/performance-new/shared/symbolication.sys.mjs .
Comment 8•1 year ago
Hm, seems like the WASM compilation/instantiation logic might not be meaningfully hooked into the worker lifecycle and this could lead to problems.
It would be amazing if we could get a reproduction under rr/pernosco. Are there any automated tests that might be able to help reproduce this?