Closed Bug 1160459 Opened 10 years ago Closed 9 years ago

shutdown hang in mozilla::dom::indexedDB::`anonymous namespace'::QuotaClient::ShutdownWorkThreads()

Categories

(Core :: Storage: IndexedDB, defect, P3)

Tracking

RESOLVED WORKSFORME
Tracking Status
e10s + ---
firefox40 --- affected
firefox41 --- affected
firefox42 --- affected

People

(Reporter: jimm, Unassigned)

20:36:32 INFO - 4 nss3.dll!PR_Wait [prmon.c:55826466dd7b : 294 + 0xd]
20:36:32 INFO - rip = 0x000007f9d3e4366b rsp = 0x000000094041ecf0
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 5 xul.dll!nsEventQueue::GetEvent(bool,nsIRunnable * *) [nsEventQueue.cpp:55826466dd7b : 67 + 0x10]
20:36:32 INFO - rip = 0x000007f9cda733a0 rsp = 0x000000094041ed30
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 6 xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:55826466dd7b : 857 + 0x15]
20:36:32 INFO - rip = 0x000007f9cda758f5 rsp = 0x000000094041ed60
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 7 xul.dll!NS_ProcessNextEvent(nsIThread *,bool) [nsThreadUtils.cpp:55826466dd7b : 265 + 0xc]
20:36:32 INFO - rip = 0x000007f9cda91aff rsp = 0x000000094041ef40
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 8 xul.dll!mozilla::dom::indexedDB::`anonymous namespace'::QuotaClient::ShutdownWorkThreads() [ActorsParent.cpp:55826466dd7b : 15103 + 0x9]
20:36:32 INFO - rip = 0x000007f9cea6bf9a rsp = 0x000000094041ef70
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 9 xul.dll!mozilla::dom::quota::QuotaManager::Observe(nsISupports *,char const *,wchar_t const *) [QuotaManager.cpp:55826466dd7b : 2859 + 0x11]
20:36:32 INFO - rip = 0x000007f9ce998b49 rsp = 0x000000094041efa0
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 10 xul.dll!nsObserverList::NotifyObservers(nsISupports *,char const *,wchar_t const *) [nsObserverList.cpp:55826466dd7b : 113 + 0x13]
20:36:32 INFO - rip = 0x000007f9cda4d4bb rsp = 0x000000094041f1d0
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 11 xul.dll!nsObserverService::NotifyObservers(nsISupports *,char const *,wchar_t const *) [nsObserverService.cpp:55826466dd7b : 334 + 0x10]
20:36:32 INFO - rip = 0x000007f9cda4d5ab rsp = 0x000000094041f210
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 12 xul.dll!nsXREDirProvider::DoShutdown() [nsXREDirProvider.cpp:55826466dd7b : 902 + 0x18]
20:36:32 INFO - rip = 0x000007f9cf1d0e78 rsp = 0x000000094041f240
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 13 xul.dll!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:55826466dd7b : 1318 + 0xb]
20:36:32 INFO - rip = 0x000007f9cf1c7a93 rsp = 0x000000094041f270
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 14 xul.dll!XREMain::XRE_main(int,char * * const,nsXREAppData const *) [nsAppRunner.cpp:55826466dd7b : 4177 + 0x14]
20:36:32 INFO - rip = 0x000007f9cf1cc7ea rsp = 0x000000094041f2a0
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 15 xul.dll!XRE_main [nsAppRunner.cpp:55826466dd7b : 4240 + 0x11]
20:36:32 INFO - rip = 0x000007f9cf1ce960 rsp = 0x000000094041f320
20:36:32 INFO - rbp = 0x0000000940812438
20:36:32 INFO - Found by: call frame info
20:36:32 INFO - 16 firefox.exe!do_main [nsBrowserApp.cpp:55826466dd7b : 214 + 0x17]
20:36:32 INFO - rip = 0x000007f71e131a0a rsp = 0x000000094041f4d0
20:36:32 INFO - rbp = 0x0000000940812438

We're currently getting about 30 or 40 of these a day while running tests.
Component: General → DOM: IndexedDB
This looks like nested event loops. Could we perhaps use AsyncShutdown instead?
The recent surge in the intermittent looks like it could be a regression from bug 1131766. Can you look at this, Kyle, as Ben is on PTO?
Flags: needinfo?(khuey)
I was out that week too :P This is janv territory anyways.
Flags: needinfo?(khuey) → needinfo?(Jan.Varga)
Hm, ni bent too.
Flags: needinfo?(bent.mozilla)
Hm, where is the data here? Is it only happening on windows? (/me assumes windows because jimm filed)
Flags: needinfo?(bent.mozilla) → needinfo?(jmathies)
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #6)
> Hm, where is the data here? Is it only happening on windows? (/me assumes
> windows because jimm filed)

This is a test-only failure afaict; see the bug this bug blocks, bug 1121145. Looks like it is all Windows.
Flags: needinfo?(jmathies)
This is probably caused by bug 1180978.
Depends on: 1180978
Flags: needinfo?(Jan.Varga)
Flags: needinfo?(mrbkap)
FYI, I'm strongly considering hiding various Windows mochitest-bc suites affected by bug 1121145. What can we do to prioritize getting this (and/or deps it has) fixed?
Flags: needinfo?(jmathies)
Flags: needinfo?(Jan.Varga)
We can land bug 1180978 without the assertion and see what happens.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #9)
> FYI, I'm strongly considering hiding various Windows mochitest-bc suites
> affected by bug 1121145. What can we do to prioritize getting this (and/or
> deps it has) fixed?

Sounds like bug 1180978 needs to be made a priority, assuming it's the cause.
Flags: needinfo?(jmathies)
(In reply to Jim Mathies [:jimm] from comment #11)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #9)
> > FYI, I'm strongly considering hiding various Windows mochitest-bc suites
> > affected by bug 1121145. What can we do to prioritize getting this (and/or
> > deps it has) fixed?
>
> Sounds like bug 1180978 need to be made a priority assuming it's the cause.

Hmm, not looking very promising, but let's see how it goes today: bug 1180978 landed on inbound prior to a test failure report on inbound in bug 1121145.
That doesn't seem to have helped, unfortunately. Looking at http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32-pgo/1437424212/mozilla-inbound_win7-ix_test_pgo-mochitest-e10s-browser-chrome-2-bm112-tests1-windows-build222.txt.gz, the main thread is blocked on the IDB background thread, which is blocked waiting for the ConnectionPool to shut down. There's clearly a connection thread outstanding (thread 55), but it's not doing anything. I'm inclined to stick a fatal assertion in http://hg.mozilla.org/mozilla-central/annotate/2ddec2dedced/dom/indexedDB/ActorsParent.cpp#l11008 that checks to see if we are shutting down. What do you think, janv?
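The wait chain described above (main thread waits on the background thread, which waits on the pool, which waits on a connection thread that never reports back) can be illustrated with a toy model. This is a hedged Python sketch, not Gecko code: `shutdown_completes`, the event names, and the timeouts are all hypothetical and exist only so the demo terminates. The real wait in the log has no timeout, which is why it shows up as a hang.

```python
import threading


def shutdown_completes(connection_reports_idle: bool, timeout: float = 0.5) -> bool:
    """Toy model of the wait chain; returns True if shutdown finishes in time."""
    pool_shut_down = threading.Event()   # set once no connection threads are outstanding
    background_done = threading.Event()  # set once the background thread has finished

    def connection_thread() -> None:
        # A healthy connection thread reports back; the stuck one does nothing.
        if connection_reports_idle:
            pool_shut_down.set()

    def background_thread() -> None:
        # Blocked waiting for the pool; bounded here so the demo cannot hang.
        if pool_shut_down.wait(timeout):
            background_done.set()

    workers = [
        threading.Thread(target=connection_thread, daemon=True),
        threading.Thread(target=background_thread, daemon=True),
    ]
    for worker in workers:
        worker.start()

    # The "main thread" waits on the background thread.
    finished = background_done.wait(timeout * 2)
    for worker in workers:
        worker.join()
    return finished
```

Calling `shutdown_completes(True)` models a clean shutdown, while `shutdown_completes(False)` models the outstanding connection thread that is alive but idle, so nothing upstream of it can make progress.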
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) (UTC+8 July 17-25, expect delays) from comment #13)
> I'm inclined to stick a fatal assertion in
> http://hg.mozilla.org/mozilla-central/annotate/2ddec2dedced/dom/indexedDB/ActorsParent.cpp#l11008
> that checks to see if we are shutting down. What do you think janv?

Ok, sounds good.
Flags: needinfo?(Jan.Varga)
Now that I think about it more I don't think that's correct. I think we'll end up in that path in the testcase from bug 1180978. Perhaps I should try just stacking a ton of blocked transactions on a readwrite and seeing what happens if we shut down.
Trying to reproduce this locally ... is it expected that ./mach mochitest -f browser --e10s starts up a new browser for each directory? Does that match the tinderbox behavior?
Flags: needinfo?(jmathies)
For mochitest-bc and mochitest-dt, I'd expect that, yes. We use run-by-dir on them.
Ok, there's no need to hide the whole test suite then because bug 1121145 appears to always happen in the customizableui directory. Unfortunately customizableui doesn't actually use IndexedDB itself (at least not directly) so figuring out what's going on is non-trivial ...
Looks like Ryan answered the question and that's good since I have no info on our "run by directory" practices in automation.
Flags: needinfo?(jmathies)
Flags: needinfo?(mrbkap)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #18)
> Ok, there's no need to hide the whole test suite then because bug 1121145
> appears to always happen in the customizableui directory.

Confirmed on Try that skipping the customizableui directory on Windows e10s makes the failures go away. Will try to bisect it down next.
Disabling half the tests in the directory led to all green regardless of which half were disabled :\
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6a4ed2ff82c
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2042048e895

Not sure where we go from here.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #21)
> Disabling half the tests in the directory led to all green regardless of
> which half were disabled :\
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6a4ed2ff82c
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2042048e895
>
> Not sure where we go from here.

I would take one of these pushes and add back small blocks of disabled tests for pushing to try. Hopefully we'd find a block with a test in it that triggers the failure.
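The add-back strategy suggested above can be sketched as a small search loop. This is an illustrative Python sketch under stated assumptions, not tooling from this bug: `find_triggering_block`, `run_fails`, and the test names are hypothetical, and `run_fails` stands in for "push this set of tests to try and see whether the shutdown hang appears". It also assumes each block is added back on its own to the green baseline. For an intermittent failure, each such push would need enough retriggers to be trusted.

```python
from typing import Callable, List, Optional, Sequence


def find_triggering_block(
    enabled: Sequence[str],
    disabled: Sequence[str],
    run_fails: Callable[[List[str]], bool],
    block_size: int = 2,
) -> Optional[List[str]]:
    """Re-enable disabled tests one small block at a time.

    Returns the first block whose re-enabling makes the run fail,
    or None if no single block reproduces the failure.
    """
    for start in range(0, len(disabled), block_size):
        block = list(disabled[start:start + block_size])
        if run_fails(list(enabled) + block):
            return block
    return None


# Hypothetical scenario: the hang needs one test from each half to run together,
# which would explain why disabling either half alone went green.
green_half = ["test_a", "test_b", "test_c"]
disabled_half = ["test_d", "test_e", "test_f"]


def demo_run_fails(tests: List[str]) -> bool:
    # Stand-in for a try push: fails only when both culprits are present.
    return "test_a" in tests and "test_e" in tests


culprit_block = find_triggering_block(green_half, disabled_half, demo_run_fails)
```

In this made-up scenario the search returns the block containing `test_e`; the block could then be split further with a smaller `block_size`.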
Priority: -- → P3
This appears to have gone away on its own somewhere along the way.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME