Closed Bug 1492959 Opened 7 years ago Closed 6 years ago

Crash in shutdownhang | mozilla::AlertNotification::Release

Categories

(Core :: SQLite and Embedded Database Bindings, defect, P3)

63 Branch
All
Windows
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox-esr60 --- unaffected
firefox62 --- unaffected
firefox63 --- wontfix
firefox64 --- fix-optional

People

(Reporter: philipp, Unassigned)

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is report bp-9819ac2a-cedf-4802-bd48-46bb80180920.
=============================================================
Top 10 frames of crashing thread:

0 xul.dll mozilla::AlertNotification::Release netwerk/cookie/nsCookie.cpp:196
1 xul.dll void nsTArray_Impl<RefPtr<nsCookie>, nsTArrayInfallibleAllocator>::ClearAndRetainStorage xpcom/ds/nsTArray.h:1370
2 xul.dll void nsTArray_Impl<RefPtr<nsCookie>, nsTArrayInfallibleAllocator>::~nsTArray_Impl xpcom/ds/nsTArray.h:925
3 xul.dll void nsCookieEntry::~nsCookieEntry netwerk/cookie/nsCookieService.h:84
4 xul.dll PLDHashTable::~PLDHashTable xpcom/ds/PLDHashTable.cpp:335
5 xul.dll CloseCookieDBListener::Release netwerk/cookie/nsCookieService.cpp:493
6 xul.dll void mozilla::storage::`anonymous namespace'::CallbackEvent::~CallbackEvent storage/mozStoragePrivateHelpers.cpp:251
7 xul.dll CheckResponsivenessTask::Release xpcom/io/InputStreamLengthHelper.cpp:264
8 xul.dll detail::ProxyReleaseEvent<mozilla::mscom::WeakReferenceSupport>::Run xpcom/threads/nsProxyRelease.h:38
9 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1161

=============================================================

this is a new shutdownhang signature starting to show up since firefox 63 - it's only occurring in low volume so far though.
Component: Networking: Cookies → Storage
Product: Core → Toolkit
The commonality in the crashes seems to be that:

On the main thread:
- LocalStorage is trying to shut down on the main thread, spinning a nested event loop.
- Some nsCookieService async shutdown stuff is on the stack, presumably triggered from https://searchfox.org/mozilla-central/rev/0b8ed772d24605d7cb44c1af6d59e4ca023bd5f5/netwerk/cookie/nsCookieService.cpp#1747
- Within that nested event loop, mozStorage's Connection::shutdownAsyncThread() method is on the stack trying to synchronously trigger shutdown of the nsThread. The main thread is spinning a nested event loop waiting for the shutdown ack to come in from that thread.
- The AsyncCloseConnection's destructor's NS_ReleaseOnMainThreadSystemGroup proxy-release of the cookie service's CloseCookieDBListener is running. It has a default constructor and does maintain an interesting DBState reference, but most of the state should have been nulled out in CleanupDefaultDBConnection immediately after invoking AsyncClose.

And in some of the crashes, the LocalStorage DB thread is actively performing I/O, either shutdown or non-shutdown operations that were backlogged. In those cases PBackground is blocked in synchronous nsThread::Shutdown.

Note that this is based on a small sampling of n=6. More investigation is necessary.
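To make the nested-event-loop pattern described above concrete, here is a minimal sketch in standard C++. It is not Gecko code: `EventQueue` and `ShutdownWorkerSync` are hypothetical names, and the sleep merely stands in for backlogged I/O on the worker. It only illustrates how a synchronous shutdown keeps processing whatever else is queued on the calling thread until the worker's ack arrives.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the main thread's event queue.
class EventQueue {
 public:
  void Dispatch(std::function<void()> aTask) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mTasks.push_back(std::move(aTask));
    }
    mCond.notify_one();
  }

  // Blocks until an event is available, then runs it on the calling thread.
  void ProcessNextEvent() {
    std::function<void()> task;
    {
      std::unique_lock<std::mutex> lock(mMutex);
      mCond.wait(lock, [this] { return !mTasks.empty(); });
      task = std::move(mTasks.front());
      mTasks.pop_front();
    }
    task();
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  std::deque<std::function<void()>> mTasks;
};

// Synchronously shuts down a worker by spinning a nested event loop on the
// calling thread until the worker's shutdown ack arrives. Returns how many
// events ran inside the nested loop.
int ShutdownWorkerSync(EventQueue& aMainQueue,
                       std::chrono::milliseconds aPendingIo) {
  bool ackReceived = false;  // Only read and written on the calling thread.

  std::thread worker([&aMainQueue, &ackReceived, aPendingIo] {
    std::this_thread::sleep_for(aPendingIo);  // Simulated backlogged I/O.
    // The ack is an event that runs back on the calling thread.
    aMainQueue.Dispatch([&ackReceived] { ackReceived = true; });
  });

  int eventsRun = 0;
  while (!ackReceived) {
    aMainQueue.ProcessNextEvent();  // May run unrelated events too.
    ++eventsRun;
  }
  worker.join();
  assert(ackReceived);
  return eventsRun;
}
```

If an unrelated event (for example, a proxied release) is dispatched to the queue before `ShutdownWorkerSync` is called, it runs inside the nested loop. That is one plausible way for frames unrelated to the slow shutdown to end up at the top of the reported stack.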
Given that this is affecting beta and where we are in the cycle, are you doing the more investigation or someone else?
Flags: needinfo?(bugmail)
So, I may have over-spoken about more investigation being necessary. Many of these crash signatures show LocalStorage busy doing I/O on its thread.

Shutdownhangs have a real problem in that:
- We don't do I/O on the main thread.
- The shutdownhang reports the stack of the main thread.
- The shutdownhang reporting timeout doesn't seem to scale based on the I/O throughput of the computer.
- The shutdownhang signatures tend to end up very specific, which results in selection bias that makes them look more actionable than they are. It's not easy to get the context that there are probably tons of shutdown hangs with LocalStorage way down on the stack, spinning a nested event loop while it waits for its I/O thread to shut down, where the real cause is simply slow I/O.

So the summary is:
- This looks like slow I/O that looks more actionable than it is because it's a very specific signature. This also explains the low number of crashes.
- The slow I/O could be due to slow hardware, slow anti-virus, and/or overwhelmed hardware (like the user browsed a site that caused a ton of I/O or used a lot of memory which caused VM paging which caused an I/O storm), etc.
- crash-stats is not as useful as it could be for shutdownhangs. On https://github.com/squarewave/bhr.html/issues/22 I got :dthayer to try an experiment with a more useful UI which was cool, but there needs to be a mechanism to do some filtering/pivoting as it relates to multiple threads.

I don't think there's more to do here unless there's a massive crash spike on this signature or related signatures.
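The point about the reporting timeout not scaling with I/O throughput can be illustrated with a small sketch. This is not the actual shutdown watchdog: `ReportedAsShutdownHang` is a hypothetical helper, the sleep stands in for pending I/O on another thread, and the durations are purely illustrative.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Returns true if a shutdown whose remaining I/O takes aIoDuration would be
// flagged as a hang by a watchdog that uses a fixed deadline.
bool ReportedAsShutdownHang(std::chrono::milliseconds aIoDuration,
                            std::chrono::milliseconds aFixedTimeout) {
  assert(aIoDuration.count() >= 0 && aFixedTimeout.count() >= 0);

  // Simulated I/O thread finishing its backlog before it can shut down.
  std::future<void> shutdown = std::async(std::launch::async, [aIoDuration] {
    std::this_thread::sleep_for(aIoDuration);
  });

  // The watchdog only knows whether the deadline passed, not why.
  bool hang =
      shutdown.wait_for(aFixedTimeout) == std::future_status::timeout;
  shutdown.wait();  // Let the simulated I/O finish before returning.
  return hang;
}
```

The same backlog is classified as a hang or not depending only on how long the I/O takes relative to the fixed deadline, so a slow disk or a busy anti-virus scanner produces a report even though no code path is actually stuck.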
Flags: needinfo?(bugmail)
Keywords: stalled
Priority: -- → P3
Marking fix-optional for 64. We could still take a patch for 65, and if it's verified and doesn't seem risky, could still take fixes for 64 as well.

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Product: Toolkit → Core