Closed Bug 1709184 Opened 3 years ago Closed 1 year ago

Intermittent Assertion failure: false (ClearOnShutdown for phase that already was cleared), at /builds/worker/checkouts/gecko/xpcom/base/ClearOnShutdown.cpp:20

Categories

(Core :: DOM: Core & HTML, defect)

defect

Tracking

()

RESOLVED FIXED
110 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox108 --- wontfix
firefox109 --- wontfix
firefox110 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jstutte)

References

Details

(Keywords: assertion, intermittent-failure)

Attachments

(2 files, 1 obsolete file)

Filed by: malexandru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=338567772&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/KP3aTPQOSQqESyjzLKEDKA/runs/0/artifacts/public/logs/live_backing.log


[task 2021-05-03T19:00:29.410Z] 19:00:29     INFO - TEST-OK | docshell/test/mochitest/test_bug529119-1.html | took 1897ms
[task 2021-05-03T19:00:29.439Z] 19:00:29     INFO - GECKO(4429) | [Child 4433, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-03T19:00:29.448Z] 19:00:29     INFO - TEST-START | docshell/test/mochitest/test_bug529119-2.html
[task 2021-05-03T19:00:29.457Z] 19:00:29     INFO - GECKO(4429) | [Child 4433, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-03T19:00:29.469Z] 19:00:29     INFO - GECKO(4429) | [Child 4433, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-03T19:00:29.479Z] 19:00:29     INFO - GECKO(4429) | [Child 4433, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-03T19:00:29.489Z] 19:00:29     INFO - GECKO(4429) | [Child 4433, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-03T19:00:29.605Z] 19:00:29     INFO - GECKO(4429) | [Child 4478, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111 (NS_ERROR_NOT_AVAILABLE): file /builds/worker/checkouts/gecko/caps/BasePrincipal.cpp:111
[task 2021-05-03T19:00:29.612Z] 19:00:29     INFO - GECKO(4429) | [Child 4478, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111 (NS_ERROR_NOT_AVAILABLE): file /builds/worker/checkouts/gecko/caps/BasePrincipal.cpp:111
[task 2021-05-03T19:00:29.637Z] 19:00:29     INFO - GECKO(4429) | [Child 4478, Main Thread] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp:3357
[task 2021-05-03T19:00:29.637Z] 19:00:29     INFO - GECKO(4429) | [Child 4478, Main Thread] WARNING: '!obs', file /builds/worker/checkouts/gecko/toolkit/components/sessionstore/RestoreTabContentObserver.cpp:58
[task 2021-05-03T19:00:29.649Z] 19:00:29     INFO - GECKO(4429) | [Child 4478, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:4442
[task 2021-05-03T19:00:29.833Z] 19:00:29     INFO - GECKO(4429) | [Child 4481, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111 (NS_ERROR_NOT_AVAILABLE): file /builds/worker/checkouts/gecko/caps/BasePrincipal.cpp:1349
[task 2021-05-03T19:00:29.845Z] 19:00:29     INFO - GECKO(4429) | [Child 4481, Main Thread] WARNING: nsAppShell::Exit() called redundantly: file /builds/worker/checkouts/gecko/widget/cocoa/nsAppShell.mm:751
[task 2021-05-03T19:00:29.846Z] 19:00:29     INFO - GECKO(4429) | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-05-03T19:00:29.848Z] 19:00:29     INFO - GECKO(4429) | [Child 4481, Main Thread] ###!!! ASSERTION: Using observer service after XPCOM shutdown!: 'Error', file /builds/worker/checkouts/gecko/xpcom/ds/nsObserverService.cpp:176
[task 2021-05-03T19:00:29.853Z] 19:00:29     INFO -  Initializing stack-fixing for the first stack frame, this may take a while...
[task 2021-05-03T19:00:38.134Z] 19:00:38     INFO - GECKO(4429) | #01: NS_DebugBreak [xpcom/base/nsDebugImpl.cpp:431]
[task 2021-05-03T19:00:38.135Z] 19:00:38     INFO - GECKO(4429) | #02: nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [xpcom/ds/nsObserverService.cpp:274]
[task 2021-05-03T19:00:38.135Z] 19:00:38     INFO - GECKO(4429) | #03: mozilla::dom::Document::DispatchContentLoadedEvents() [dom/base/Document.cpp:7657]
[task 2021-05-03T19:00:38.136Z] 19:00:38     INFO - GECKO(4429) | #04: mozilla::detail::RunnableMethodImpl<mozilla::dom::Document*, void (mozilla::dom::Document::*)(), true, (mozilla::RunnableKind)0, >::Run() [xpcom/threads/nsThreadUtils.h:1204]
[task 2021-05-03T19:00:38.136Z] 19:00:38     INFO - GECKO(4429) | #05: mozilla::SchedulerGroup::Runnable::Run() [xpcom/threads/SchedulerGroup.cpp:143]
[task 2021-05-03T19:00:38.136Z] 19:00:38     INFO - GECKO(4429) | #06: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:483]
[task 2021-05-03T19:00:38.137Z] 19:00:38     INFO - GECKO(4429) | #07: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:766]
[task 2021-05-03T19:00:38.137Z] 19:00:38     INFO - GECKO(4429) | #08: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:621]
[task 2021-05-03T19:00:38.138Z] 19:00:38     INFO - GECKO(4429) | #09: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:405]
[task 2021-05-03T19:00:38.138Z] 19:00:38     INFO - GECKO(4429) | #10: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() [xpcom/threads/nsThreadUtils.h:535]
[task 2021-05-03T19:00:38.139Z] 19:00:38     INFO - GECKO(4429) | #11: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1163]
[task 2021-05-03T19:00:38.139Z] 19:00:38     INFO - GECKO(4429) | #12: NS_ProcessPendingEvents(nsIThread*, unsigned int) [xpcom/threads/nsThreadUtils.cpp:496]
[task 2021-05-03T19:00:38.139Z] 19:00:38     INFO - GECKO(4429) | #13: nsThreadManager::Shutdown() [xpcom/threads/nsThreadManager.cpp:372]
[task 2021-05-03T19:00:38.140Z] 19:00:38     INFO - GECKO(4429) | #14: mozilla::ShutdownXPCOM(nsIServiceManager*) [xpcom/build/XPCOMInit.cpp:657]
[task 2021-05-03T19:00:38.140Z] 19:00:38     INFO - GECKO(4429) | #15: XRE_TermEmbedding() [toolkit/xre/nsEmbedFunctions.cpp:216]
[task 2021-05-03T19:00:38.140Z] 19:00:38     INFO - GECKO(4429) | #16: mozilla::ipc::ScopedXREEmbed::Stop() [ipc/glue/ScopedXREEmbed.cpp:91]
[task 2021-05-03T19:00:38.141Z] 19:00:38     INFO - GECKO(4429) | #17: XRE_InitChildProcess(int, char**, XREChildData const*) [toolkit/xre/nsEmbedFunctions.cpp:748]
[task 2021-05-03T19:00:38.141Z] 19:00:38     INFO - GECKO(4429) | #18: main [ipc/app/MozillaRuntimeMain.cpp:72]
[task 2021-05-03T19:00:38.142Z] 19:00:38     INFO - GECKO(4429) | Assertion failure: false (ClearOnShutdown for phase that already was cleared), at /builds/worker/checkouts/gecko/xpcom/base/ClearOnShutdown.cpp:20
[task 2021-05-03T19:00:38.142Z] 19:00:38     INFO - GECKO(4429) | #01: mozilla::ClearOnShutdown_Internal::InsertIntoShutdownList(mozilla::ClearOnShutdown_Internal::ShutdownObserver*, mozilla::ShutdownPhase) [xpcom/base/ClearOnShutdown.cpp:20]
[task 2021-05-03T19:00:38.142Z] 19:00:38     INFO - GECKO(4429) | #02: mozilla::dom::ServiceWorkerManager::GetInstance() [dom/serviceworkers/ServiceWorkerManager.cpp:1623]
[task 2021-05-03T19:00:38.143Z] 19:00:38     INFO - GECKO(4429) | #03: mozilla::dom::Document::DispatchContentLoadedEvents() [dom/base/Document.cpp:7760]
[task 2021-05-03T19:00:38.143Z] 19:00:38     INFO - GECKO(4429) | #04: mozilla::detail::RunnableMethodImpl<mozilla::dom::Document*, void (mozilla::dom::Document::*)(), true, (mozilla::RunnableKind)0, >::Run() [xpcom/threads/nsThreadUtils.h:1204]
[task 2021-05-03T19:00:38.144Z] 19:00:38     INFO - GECKO(4429) | #05: mozilla::SchedulerGroup::Runnable::Run() [xpcom/threads/SchedulerGroup.cpp:143]
[task 2021-05-03T19:00:38.144Z] 19:00:38     INFO - GECKO(4429) | #06: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:483]
[task 2021-05-03T19:00:38.144Z] 19:00:38     INFO - GECKO(4429) | #07: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:766]
[task 2021-05-03T19:00:38.145Z] 19:00:38     INFO - GECKO(4429) | #08: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:621]
[task 2021-05-03T19:00:38.145Z] 19:00:38     INFO - GECKO(4429) | #09: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:405]
[task 2021-05-03T19:00:38.146Z] 19:00:38     INFO - GECKO(4429) | #10: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() [xpcom/threads/nsThreadUtils.h:535]
[task 2021-05-03T19:00:38.146Z] 19:00:38     INFO - GECKO(4429) | #11: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1163]
[task 2021-05-03T19:00:38.146Z] 19:00:38     INFO - GECKO(4429) | #12: NS_ProcessPendingEvents(nsIThread*, unsigned int) [xpcom/threads/nsThreadUtils.cpp:496]
[task 2021-05-03T19:00:38.147Z] 19:00:38     INFO - GECKO(4429) | #13: nsThreadManager::Shutdown() [xpcom/threads/nsThreadManager.cpp:372]
[task 2021-05-03T19:00:38.147Z] 19:00:38     INFO - GECKO(4429) | #14: mozilla::ShutdownXPCOM(nsIServiceManager*) [xpcom/build/XPCOMInit.cpp:657]
[task 2021-05-03T19:00:38.148Z] 19:00:38     INFO - GECKO(4429) | #15: XRE_TermEmbedding() [toolkit/xre/nsEmbedFunctions.cpp:216]
[task 2021-05-03T19:00:38.148Z] 19:00:38     INFO - GECKO(4429) | #16: mozilla::ipc::ScopedXREEmbed::Stop() [ipc/glue/ScopedXREEmbed.cpp:91]
[task 2021-05-03T19:00:38.149Z] 19:00:38     INFO - GECKO(4429) | #17: XRE_InitChildProcess(int, char**, XREChildData const*) [toolkit/xre/nsEmbedFunctions.cpp:748]
[task 2021-05-03T19:00:38.149Z] 19:00:38     INFO - GECKO(4429) | #18: main [ipc/app/MozillaRuntimeMain.cpp:72]
[task 2021-05-03T19:00:38.149Z] 19:00:38     INFO - GECKO(4429) | [Parent 4429, Unnamed thread 125018ed0] WARNING: Resource acquired is being released in non-LIFO order; why?
[task 2021-05-03T19:00:38.150Z] 19:00:38     INFO - GECKO(4429) | : file /builds/worker/checkouts/gecko/xpcom/threads/BlockingResourceBase.cpp:290
[task 2021-05-03T19:00:38.150Z] 19:00:38     INFO - GECKO(4429) | --- Mutex : dumpSafetyLock (currently acquired)
[task 2021-05-03T19:00:38.150Z] 19:00:38     INFO - GECKO(4429) |  calling context
[task 2021-05-03T19:00:38.151Z] 19:00:38     INFO - GECKO(4429) |   [stack trace unavailable]
[task 2021-05-03T19:00:38.151Z] 19:00:38     INFO - GECKO(4429) | [4488, MainThread] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/worker/checkouts/gecko/xpcom/base/nsTraceRefcnt.cpp:204
[task 2021-05-03T19:00:38.152Z] 19:00:38     INFO - GECKO(4429) | [4488, MainThread] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/worker/checkouts/gecko/xpcom/base/nsTraceRefcnt.cpp:204
[task 2021-05-03T19:00:38.152Z] 19:00:38     INFO - GECKO(4429) | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /var/folders/ds/9_p3r7s95p7_1rx9lt0_5wg0000014/T/tmppfg2c9j5.mozrunner/runtests_leaks_tab_pid4488.log
[task 2021-05-03T19:00:38.152Z] 19:00:38     INFO - GECKO(4429) | [4488, MainThread] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/worker/checkouts/gecko/xpcom/base/nsTraceRefcnt.cpp:204
[task 2021-05-03T19:00:38.153Z] 19:00:38     INFO - GECKO(4429) | [4488, MainThread] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/worker/checkouts/gecko/xpcom/base/nsTraceRefcnt.cpp:204
[task 2021-05-03T19:00:38.153Z] 19:00:38     INFO - GECKO(4429) | [4488, Main Thread] WARNING: XPCOM_MEM_BLOAT_LOG is set, disabling native allocations.: file /builds/worker/checkouts/gecko/tools/profiler/core/platform.cpp:254
[task 2021-05-03T19:00:38.154Z] 19:00:38     INFO - GECKO(4429) | MEMORY STAT | vsize 6745MB | residentFast 105MB | heapAllocated 14MB
[task 2021-05-03T19:00:38.154Z] 19:00:38     INFO - Buffered messages logged at 19:00:29
[task 2021-05-03T19:00:38.155Z] 19:00:38     INFO - TEST-PASS | docshell/test/mochitest/test_bug529119-2.html | A valid string reason is expected 
[task 2021-05-03T19:00:38.155Z] 19:00:38     INFO - TEST-PASS | docshell/test/mochitest/test_bug529119-2.html | Reason cannot be empty 
[task 2021-05-03T19:00:38.155Z] 19:00:38     INFO - TEST-FAIL | docshell/test/mochitest/test_bug529119-2.html | The author of the test has indicated that flaky timeouts are expected.  Reason: untriaged 
[task 2021-05-03T19:00:38.156Z] 19:00:38     INFO - TEST-PASS | docshell/test/mochitest/test_bug529119-2.html | Waiting for error page succeeded 
[task 2021-05-03T19:00:38.156Z] 19:00:38     INFO - TEST-FAIL | docshell/test/mochitest/test_bug529119-2.html | The author of the test has indicated that flaky timeouts are expected.  Reason: untriaged 
[task 2021-05-03T19:00:38.157Z] 19:00:38     INFO - Buffered messages logged at 19:00:30
[task 2021-05-03T19:00:38.157Z] 19:00:38     INFO - TEST-PASS | docshell/test/mochitest/test_bug529119-2.html | Waiting for original page succeeded 
[task 2021-05-03T19:00:38.157Z] 19:00:38     INFO - TEST-PASS | docshell/test/mochitest/test_bug529119-2.html | Is on the previous page 
[task 2021-05-03T19:00:38.158Z] 19:00:38     INFO - Buffered messages finished
[task 2021-05-03T19:00:38.158Z] 19:00:38     INFO - TEST-UNEXPECTED-ERROR | docshell/test/mochitest/test_bug529119-2.html | unexpected-crash-dump-found - This test left crash dumps behind, but we weren't expecting it to!
[task 2021-05-03T19:00:38.159Z] 19:00:38     INFO - Found unexpected crash dump file /var/folders/ds/9_p3r7s95p7_1rx9lt0_5wg0000014/T/tmppfg2c9j5.mozrunner/minidumps/9E446040-33D9-44D5-AE13-CFC962C5C4AC.dmp.
[task 2021-05-03T19:00:38.159Z] 19:00:38     INFO - TEST-UNEXPECTED-CRASH | docshell/test/mochitest/test_bug529119-2.html | Finished in 1553ms
[task 2021-05-03T19:00:38.159Z] 19:00:38     INFO - {'runtime': 1553}
[task 2021-05-03T19:00:38.159Z] 19:00:38     INFO - TEST-INFO took 1553ms
[task 2021-05-03T19:00:38.160Z] 19:00:38     INFO - TEST-START | docshell/test/mochitest/test_bug530396.html```

It looks like Document::DispatchContentLoadedEvents() is running very late in shutdown.

Component: XPCOM → DOM: Core & HTML

Yeah, and deeper inside in the event processing we try to create a new service worker registrar which then fails during resurrection, despite the comment in the beginning saying, we want to avoid resurrection.

To my feeling, we should improve and use AppShutdown::IsShuttingDown() more often. I see two ToDos here:

  1. The current implementation of AppShutdown::IsShuttingDown() relies on AppShutdown::OnShutdownConfirmed(); being called. Instead it should probably just check, if AppShutdown::GetCurrentShutdownPhase() >= ShutdownPhase::AppShutdownConfirmed.

  2. Introduce more checks for shutdown in the following places:

  • generation of new global objects (like the service worker registrar here)
  • dispatching of particular events, throwing NS_ERROR_ILLEGAL_DURING_SHUTDOWN

A more advanced architectural change could be, to associate to each event type a latest possible shutdown phase for this event and check that directly during dispatch (with an assert in debug and an if in release). The default would be again ShutdownPhase::AppShutdownConfirmed.

Flags: needinfo?(nika)
Flags: needinfo?(bugmail)

I think this is https://bugzilla.mozilla.org/show_bug.cgi?id=1697972#c13 where I found that I think this set of symptoms is happening because a hidden window is being accidentally created that races xpcshell shutdown on windows and this starts up the ServiceWorkerManager. Fixing that fell off my radar but is likely the right fix to fix a whole class of this problem.

Flags: needinfo?(bugmail)
Flags: needinfo?(nika)

That said, I do agree that ServiceWorkerManager and other singleton getters should check the shutdown phase and return an error when in shutdown. And potentially we should make sure xpcshell's shutdown routine makes sure that the shutdown checks correctly indicate this.

We should trace a path forward here (at least in terms of todos/bugs).

Flags: needinfo?(jstutte)
Depends on: 1716530

The high frequency failure that started a few hours ago with the same assertion but a different stack was handled in bug 1753598.

(In reply to Intermittent Failures Robot from comment #21)

4 failures in 3497 pushes (0.001 failures/push) were associated with this bug in the last 7 days.

Repository breakdown:

  • autoland: 2
  • mozilla-central: 2

Platform and build breakdown:

  • windows10-32-2004-qr: 4
    • debug: 4

For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1709184&startday=2022-07-18&endday=2022-07-24&tree=all

At least one instance here seems to be caused by a late initialization of MCSInfo. This singleton has been introduced by bug 1766307.

Flags: needinfo?(jstutte) → needinfo?(azebrowski)

Thanks for this, taking a look now.

Flags: needinfo?(azebrowski)

Remove XPCOM shutdown listener from MCSInfo to remove possibility of the listener starting after the shutdown process has already begun.

Assignee: nobody → azebrowski
Status: NEW → ASSIGNED

Comment on attachment 9287599 [details]
Bug 1709184 - Intermittent Assertion failure: false (ClearOnShutdown for phase that already was cleared), at /builds/worker/checkouts/gecko/xpcom/base/ClearOnShutdown.cpp:20

Beta/Release Uplift Approval Request

  • User impact if declined: None
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Small patch that simplifies deallocation logic + adds several additional asserts. Treeherder link showing failure reduction: https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2022-07-17&endday=2022-07-28&tree=all&bug=1709184
  • String changes made/needed:
  • Is Android affected?: Unknown
Attachment #9287599 - Flags: approval-mozilla-beta?

Moved the patch over from Bug 1781259 as it fixes this issue.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

Would like to update the patch's commit message after moving it over, but Phabricator isn't cooperating and I have to call it a night. I'll check back tomorrow + plan to update it to be:

Remove XPCOM shutdown listener from MCSInfo to remove possibility of the listener starting after the shutdown process has already begun.

Thanks!

(In reply to az [:az] from comment #29)

Moved the patch over from Bug 1781259 as it fixes this issue.

As shown by the past, there might be different causes for this assertion, not just one (though this is probably the most frequent currently). I'd prefer in these cases to create new bugs for specific fixes and make this bug depend on them.

Depends on: 1782313

Reopening as this will be fixed in Bug 1782313

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

:az Is the beta uplift still needed?

Flags: needinfo?(azebrowski)

Comment on attachment 9287599 [details]
Bug 1709184 - Intermittent Assertion failure: false (ClearOnShutdown for phase that already was cleared), at /builds/worker/checkouts/gecko/xpcom/base/ClearOnShutdown.cpp:20

Revision D152954 was moved to bug 1782313. Setting attachment 9287599 [details] to obsolete.

Attachment #9287599 - Attachment is obsolete: true
Attachment #9287599 - Flags: approval-mozilla-beta?

.

Assignee: azebrowski → nobody
Flags: needinfo?(azebrowski)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended]

There seem to be some cases where we dispatch a RunOnShutdown to the main thread well before we have entered shutdown but end up executing that runnable only after we reached XPCOMShutdown.

We can either check inside the lambda of the "MCSInfo::MCSInfo" runnable if we are in shutdown (and just do the deletion in case) or we might want to check if this assert is actually needed or if the release behavior isn't always the right behavior. Or we want to have a thread-safe InsertIntoShutdownList such that we can avoid the thread hopping just for inserting?

Flags: needinfo?(kmaglione+bmo)

(In reply to Jens Stutte [:jstutte] from comment #45)

There seem to be some cases where we dispatch a RunOnShutdown to the main thread well before we have entered shutdown but end up executing that runnable only after we reached XPCOMShutdown.

We can either check inside the lambda of the "MCSInfo::MCSInfo" runnable if we are in shutdown (and just do the deletion in case) or we might want to check if this assert is actually needed or if the release behavior isn't always the right behavior. Or we want to have a thread-safe InsertIntoShutdownList such that we can avoid the thread hopping just for inserting?

The assertion is needed, unfortunately. Without it, we could run into all sorts of undefined behavior when things run too late in shutdown. The most common manifestation, though, is that someone calls a getter that calls ClearOnShutdown after the target shutdown phase, ClearOnShutdown immediately clears the target, the getter returns null, and we then null pointer crash instead of asserting. The assertion is mainly there to make it clearer where the failure really is, rather than trying to work backward from a null pointer crash.

For this particular case, I think the solution may be to call NS_ProcessPendingEvents before advancing to the XPCOMShutdown phase like we do before progressing to later shutdown phases. That way any runnable dispatched before XPCOMShutdown will be guaranteed to run in time to safely register its shutdown observers.

That said... Are you sure that runnable is actually guaranteed to be dispatched before shutdown has started? Because, without a shutdown blocker, you generally can't guarantee that for background thread code. And I don't see a shutdown blocker anywhere in that code.

Flags: needinfo?(kmaglione+bmo)

(In reply to Kris Maglione [:kmag] from comment #46)

The assertion is needed, unfortunately. Without it, we could run into all sorts of undefined behavior when things run too late in shutdown. The most common manifestation, though, is that someone calls a getter that calls ClearOnShutdown after the target shutdown phase, ClearOnShutdown immediately clears the target, the getter returns null, and we then null pointer crash instead of asserting. The assertion is mainly there to make it clearer where the failure really is, rather than trying to work backward from a null pointer crash.

It seems to me that in release builds InsertIntoShutdownList (used by RunOnShutdown) handles this by just calling aObserver->Shutdown(); immediately. Obviously this can still have any sorts of unexpected side-effects, so I think the intention of that assert is generally right.

For this particular case, I think the solution may be to call NS_ProcessPendingEvents before advancing to the XPCOMShutdown phase like we do before progressing to later shutdown phases. That way any runnable dispatched before XPCOMShutdown will be guaranteed to run in time to safely register its shutdown observers.

Thanks, that seems like something we should do, yes. Probably also before XPCOMWillShutdown.

That said... Are you sure that runnable is actually guaranteed to be dispatched before shutdown has started? Because, without a shutdown blocker, you generally can't guarantee that for background thread code. And I don't see a shutdown blocker anywhere in that code.

At least MCSInfo::GetInstance does check AppShutdown::IsInOrBeyond(ShutdownPhase::AppShutdownConfirmed) at its beginning, but as we seem to be off-main-thread here that can race right afterwards with whatever is happening on the main thread? This would be probably a candidate singleton for something like bug 1796566 (and gives me some input for the thread safety questions I have there). Instantiating singletons and thus adding a blocker for them should be protected by a mutex that prevents us from advancing shutdown phases in parallel in such situations, I think. But in the meantime let's try with processing the pending events before each AdvanceShutdownPhase to improve the situation here (and potentially for other off-main-thread instantiated singletons we already have and that follow the same pattern).

Assignee: nobody → jstutte

(In reply to Jens Stutte [:jstutte] from comment #47)

It seems to me that in release builds InsertIntoShutdownList (used by RunOnShutdown) handles this by just calling aObserver->Shutdown(); immediately. Obviously this can still have any sorts of unexpected side-effects, so I think the intention of that assert is generally right.

Yes, that's what I mean. The typical pattern is:

static StaticRefPtr<Thing> sThing;

if (!sThing) {
  sThing = new Thing();
  ClearOnShutdown(&sThing);
}
return sThing;

Then the shutdown callback runs immediately, the function returns null, and the caller crashes on the null deref. But there are often worse problems from just calling the constructor during shutdown, since it often winds up reinitializing things which have already been shutdown, or calling into code which isn't designed to work during shutdown.

That said... Are you sure that runnable is actually guaranteed to be dispatched before shutdown has started? Because, without a shutdown blocker, you generally can't guarantee that for background thread code. And I don't see a shutdown blocker anywhere in that code.

At least MCSInfo::GetInstance does check AppShutdown::IsInOrBeyond(ShutdownPhase::AppShutdownConfirmed) at its beginning, but as we seem to be off-main-thread here that can race right afterwards with whatever is happening on the main thread?

Yeah, technically that could race, but I guess it's unlikely enough that it probably isn't especially relevant to the assertion. Processing the pending events is likely enough to at least make the intermittent go away.

See Also: → 1797688
Attachment #9300060 - Attachment description: Bug 1709184 - Process pending MT events before advancing the shutdown phase. r?#xpcom-reviewers → Bug 1709184 - Part 10: Process pending MT events before advancing the shutdown phase. r?#xpcom-reviewers
Attachment #9300175 - Attachment description: Bug 1709184 - Add a paranoia check against re-entrance in AdvanceShutdownPhase. r?#xpcom-reviewers → Bug 1709184 - Part 11: Add a paranoia check against re-entrance in AdvanceShutdownPhase. r?#xpcom-reviewers
See Also: → 1799235
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ac48b4d61b39
Part 10: Process pending MT events before advancing the shutdown phase. r=xpcom-reviewers,nika,kmag
https://hg.mozilla.org/integration/autoland/rev/46d7864a45df
Part 11: Add a paranoia check against re-entrance in AdvanceShutdownPhase. r=xpcom-reviewers,nika

(In reply to Natalia Csoregi [:nataliaCs] from comment #53)

* [SUMMARY: AddressSanitizer: SEGV /builds/worker/checkouts/gecko/xpcom/base/nsISupportsImpl.cpp:53:12 in nsAutoOwningEventTarget](https://treeherder.mozilla.org/logviewer?job_id=396846889&repo=autoland&lineNumber=3527)

This one is interesting. IIUC we want to shut down the component manager and in doing so we want to create an AutoSafeJSContext that ends up trying to assign something to mUnprivilegedJunkScope.

This wants to create a WeakReference and the constructor does a MOZ_WEAKPTR_INIT_THREAD_SAFETY_CHECK which wants to create a nsAutoOwningEventTarget that will find GetCurrentSerialEventTarget() being nullptr.

In terms what the patch might have changed here: Indeed we removed this NS_ProcessPendingEvents(thread); which inside AdvanceShutdownPhase is only executed, if we had to notify our observers - which is unlikely here, as we just killed the observer service a few lines above.

This would be repaired by D160628, I think, so landing the entire stack might be the better call here.

Flags: needinfo?(jstutte)
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1181bdcde598
Part 10: Process pending MT events before advancing the shutdown phase. r=xpcom-reviewers,nika,kmag
https://hg.mozilla.org/integration/autoland/rev/a4c2b2d0b63a
Part 11: Add a paranoia check against re-entrance in AdvanceShutdownPhase. r=xpcom-reviewers,nika

Backed out 3 changesets (Bug 1768581, Bug 1709184) for causing multiple failures on LateWriteChecks.cpp.
Backout link
Push with failures <--> gpu
Failure Log
Also R1 Failure Log, Also R2 Failure Log, ...

Also X4 almost perma --> Failure Log

Flags: needinfo?(jstutte)

Wow.

Flags: needinfo?(jstutte)
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/032bf39acabb
Part 10: Process pending MT events before advancing the shutdown phase. r=xpcom-reviewers,nika,kmag
https://hg.mozilla.org/integration/autoland/rev/c7465ec6154e
Part 11: Add a paranoia check against re-entrance in AdvanceShutdownPhase. r=xpcom-reviewers,nika

Backed out for causing xpcshell failures

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: TEST-UNEXPECTED-FAIL | extensions/pref/autoconfig/test/unit/test_autoconfig.js | xpcshell return code: -6
    SUMMARY: ThreadSanitizer: SEGV /builds/worker/checkouts/gecko/xpcom/base/nsISupportsImpl.cpp:53:12 in nsAutoOwningEventTarget
    TEST-UNEXPECTED-FAIL | extensions/pref/autoconfig/test/unit/test_autoconfig_nonascii.js | xpcshell return code: -6
    SUMMARY: ThreadSanitizer: SEGV /builds/worker/checkouts/gecko/xpcom/base/nsISupportsImpl.cpp:53:12 in nsAutoOwningEventTarget
    TEST-UNEXPECTED-FAIL | extensions/pref/autoconfig/test/unit/test_autoconfig_no_sandbox.js | xpcshell return code: -6
    SUMMARY: ThreadSanitizer: SEGV /builds/worker/checkouts/gecko/xpcom/base/nsISupportsImpl.cpp:53:12 in nsAutoOwningEventTarget

Can you also please take a look at this? https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&searchStr=android%2Cbrowsertime&revision=bd3bc1478107b9a72ce5dded92b7821e87a0e052&selectedTaskRun=TQNO9jTxSQ2KK0ouVXURHA.0

Flags: needinfo?(jstutte)
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/98f1dc6f0123
Part 10: Process pending MT events before advancing the shutdown phase. r=xpcom-reviewers,nika,kmag
https://hg.mozilla.org/integration/autoland/rev/a3d89b2ef44a
Part 11: Add a paranoia check against re-entrance in AdvanceShutdownPhase. r=xpcom-reviewers,nika
Attachment #9300060 - Attachment description: Bug 1709184 - Part 10: Process pending MT events before advancing the shutdown phase. r?#xpcom-reviewers → Bug 1709184 - Part 1: Process pending MT events before advancing the shutdown phase. r?#xpcom-reviewers
Attachment #9300175 - Attachment description: Bug 1709184 - Part 11: Add a paranoia check against re-entrance in AdvanceShutdownPhase. r?#xpcom-reviewers → Bug 1709184 - Part 2: Add a paranoia check against re-entrance in AdvanceShutdownPhase. r?#xpcom-reviewers
Flags: needinfo?(jstutte)
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9dcaa7753fc6
Part 1: Process pending MT events before advancing the shutdown phase. r=xpcom-reviewers,nika,kmag
https://hg.mozilla.org/integration/autoland/rev/dffbc74bbaa9
Part 2: Add a paranoia check against re-entrance in AdvanceShutdownPhase. r=xpcom-reviewers,nika
Status: REOPENED → RESOLVED
Closed: 2 years ago1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 110 Branch
See Also: → 1852465
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: