Closed Bug 1006478 Opened 10 years ago Closed 10 years ago

With "Clear History when Firefox closes" enabled, quitting Firefox shortly after startup makes it hang on shutdown

Categories

(Firefox :: General, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 965309

People

(Reporter: ttaubert, Unassigned)

Details

I hit this while trying to reproduce bug 1005487 but I'm not sure that's the same or just a related issue. This one is definitely easier to reproduce and has a clear STR so I'd focus on investigating this first?

STR:

1) Create a new profile
2) Start Firefox and tick "Clear History when Firefox closes"
3) Quit Firefox
4) Start Firefox
5) Quit Firefox immediately (or wait 1-3s) using Cmd+Q or the panel menu button

Repeat 4+5 until it hangs on shutdown.

No matter how long I wait, it doesn't crash or do anything else. It just hangs and doesn't print anything to the console. I can try a debug build though.
Can reproduce on Windows and Mac btw.
This is what I'm seeing on a debug build:

[37691] WARNING: NS_ENSURE_TRUE(!mHasOrHasHadOwnerWindow || mOwnerWindow) failed: file ../../dist/include/mozilla/DOMEventTargetHelper.h, line 122
[37691] WARNING: NS_ENSURE_SUCCESS(status, status) failed with result 0x804B0002: file /Users/tim/workspace/mozilla-central/content/base/src/nsCrossSiteListenerProxy.cpp, line 559
[37691] WARNING: Failed to log blocked cross-site request: 'NS_SUCCEEDED(rv)', file /Users/tim/workspace/mozilla-central/content/base/src/nsCrossSiteListenerProxy.cpp, line 480
[37691] WARNING: cannot post event if not initialized: file /Users/tim/workspace/mozilla-central/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 190
[37694] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /Users/tim/workspace/mozilla-central/xpcom/base/nsTraceRefcnt.cpp, line 142
[37694] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /Users/tim/workspace/mozilla-central/xpcom/base/nsTraceRefcnt.cpp, line 142
[loaded plugin /Library/Internet Plug-Ins/Flash Player.plugin]
[37691] WARNING: cannot post event if not initialized: file /Users/tim/workspace/mozilla-central/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 190
--DOCSHELL 0x1147b0800 == 5 [pid = 37691] [id = 1]
pldhash: for the table at address 0x11a3b6220, the given entrySize of 168 definitely favors chaining over double hashing.
--DOMWINDOW == 14 (0x127d7b800) [pid = 37691] [serial = 13] [outer = 0x127d78400] [url = about:blank]
--DOMWINDOW == 13 (0x1233bb800) [pid = 37691] [serial = 8] [outer = 0x123371400] [url = about:blank]
--DOCSHELL 0x11bce5800 == 4 [pid = 37691] [id = 3]
--DOCSHELL 0x11bfbb800 == 3 [pid = 37691] [id = 4]
[37691] WARNING: NS_ENSURE_TRUE(mTextInputHandler) failed: file /Users/tim/workspace/mozilla-central/widget/cocoa/nsChildView.mm, line 5330
[37694] WARNING: '!compMgr', file /Users/tim/workspace/mozilla-central/xpcom/glue/nsComponentManagerUtils.cpp, line 49
nsStringStats
 => mAllocCount:              5
 => mReallocCount:            1
 => mFreeCount:               5
 => mShareCount:              3
 => mAdoptCount:              0
 => mAdoptFreeCount:          0
 => Process ID: 37694, Thread ID: 140735273071376
[37691] WARNING: nsAppShell::Exit() called redundantly: file /Users/tim/workspace/mozilla-central/widget/cocoa/nsAppShell.mm, line 762
Gijs pointed me to bug 1005958 and I applied the patch locally. Disabling the seer doesn't fix the issue unfortunately.
After some bisecting I found that making nsNavHistory::RemoveAllPages() a no-op fixes the issue. However, I don't see any relevant Places changes for 29. Marco, do you have any idea what could cause this?

To be exact, commenting out this part fixes the shutdown hang:

nsresult rv = mDB->MainConn()->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
  "DELETE FROM moz_historyvisits"
));
NS_ENSURE_SUCCESS(rv, rv);
Flags: needinfo?(mak77)
(In reply to Tim Taubert [:ttaubert] from comment #4)
> nsresult rv = mDB->MainConn()->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>   "DELETE FROM moz_historyvisits"
> ));
> NS_ENSURE_SUCCESS(rv, rv);

Stupid intermittent hangs. Removing this does actually *NOT* fix it. Making the whole function a no-op seems to still fix it though.
I don't expect that to do any heavy work, considering this is a new profile...
Quite strange that nothing happens even after waiting for all of the various timeouts we have around. Maybe a deadlock in Storage? You may check, through the stacks, whether there are 2 threads clearly contending for a Storage mutex.
Flags: needinfo?(mak77)
Looks like this is another issue with Workers on shutdown - see bug 964531. I'm almost sure this is caused by the onClearHistory() listener in PageThumbs.jsm that calls PageThumbsStorage.wipe() and spawns its own I/O worker on shutdown. Removing that makes the hang go away.
So far, our history with chrome workers hasn't been very successful, particularly upon shutdown. Tim, do you think that AsyncShutdown can help you there?
Yeah, that would probably help here. First it doesn't make us hang, second it would ensure that we actually wipe the directory on shutdown.

It would be really nice to find and fix the worker hang too though.
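
As a rough illustration of why a shutdown blocker would help here, this is a minimal C++ sketch of the idea: shutdown waits for registered work (such as the thumbnail wipe) to finish, but only for a bounded time, and reports whatever is still pending. ToyShutdownBarrier, AddBlocker, Wait and DemoPendingBlockers are hypothetical names made up for this sketch. This is not the AsyncShutdown API, and the bounded wait is an assumption about how such a barrier could behave.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy barrier. It only mimics the idea of a shutdown blocker
// and is not the AsyncShutdown API.
class ToyShutdownBarrier {
 public:
  // Registers a named piece of work that shutdown has to wait for.
  void AddBlocker(std::string name, std::future<void> done) {
    blockers_.emplace_back(std::move(name), std::move(done));
  }

  // Waits up to `timeout` per blocker and returns the names of those that
  // did not finish, so a stuck blocker is reported instead of hanging.
  std::vector<std::string> Wait(std::chrono::milliseconds timeout) {
    std::vector<std::string> pending;
    for (auto& blocker : blockers_) {
      if (blocker.second.wait_for(timeout) != std::future_status::ready) {
        pending.push_back(blocker.first);
      }
    }
    return pending;
  }

 private:
  std::vector<std::pair<std::string, std::future<void>>> blockers_;
};

// Registers one blocker that completes and one that never does, then
// returns the names still pending after the bounded wait.
std::vector<std::string> DemoPendingBlockers() {
  std::promise<void> wipe_done;
  std::promise<void> stuck;  // Never fulfilled; stands in for a hung worker.
  ToyShutdownBarrier barrier;
  barrier.AddBlocker("thumbnail wipe", wipe_done.get_future());
  barrier.AddBlocker("stuck worker", stuck.get_future());
  wipe_done.set_value();  // The wipe finishes before shutdown waits.
  return barrier.Wait(std::chrono::milliseconds(10));
}
```

In this toy, the completed wipe does not delay shutdown, and the worker that never finishes shows up by name in the returned list rather than blocking silently.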
By some debugging I can confirm that there definitely is a worker that prevents us from shutting down. The script URL is resource://gre/modules/osfile/osfile_shared_allthreads.jsm as expected.
Is the problem in the JS or in the C++?
Well, there are two problems here, or three.

1) We should use AsyncShutdown for the PageThumbsWorker (JS).
2) We should fix the Worker hang (CPP).

3) Don't start the worker on shutdown. Although this would be fixed by either of the two above. For the sake of OS.File it would be better to fix (2) instead of trying to enforce not spawning workers on shutdown.
Looks like WorkerPrivate::WaitForWorkerEvents() is called over and over. The XHR has already been torn down and unpinned, the SyncTeardownRunnable completed. WorkerPrivate::DoRunLoop() never finishes for the XHR with the aborted request.

We don't hang on shutdown when the XHR is fast enough to complete and we thus use an AsyncTeardownRunnable.

I assume that for some reason the worker thread isn't notified that it should terminate, maybe something is still holding onto it?
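
To make that suspicion concrete, here is a toy C++ model of a worker run loop that only exits once it has been both flagged for termination and woken up. It is a sketch under assumptions, not Gecko code: ToyWorker, Post, Shutdown, Processed and PostAndShutDown are hypothetical names, and the real WorkerPrivate logic is far more involved.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Toy model only: none of these names exist in Gecko.
class ToyWorker {
 public:
  ToyWorker() : thread_([this] { RunLoop(); }) {}
  ~ToyWorker() { Shutdown(); }

  // Queues an event and wakes the run loop.
  void Post(int event) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      events_.push(event);
    }
    cv_.notify_one();
  }

  // Flags termination, wakes the loop, then joins the thread. The join is
  // where a shutdown would block if the loop were never told to exit.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      terminating_ = true;
    }
    cv_.notify_one();
    if (thread_.joinable()) thread_.join();
  }

  int Processed() {
    std::lock_guard<std::mutex> lock(mutex_);
    return processed_;
  }

 private:
  void RunLoop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      // Sleep until there is work or a termination request.
      cv_.wait(lock, [this] { return terminating_ || !events_.empty(); });
      // Drain pending events before honouring termination.
      while (!events_.empty()) {
        events_.pop();
        ++processed_;
      }
      if (terminating_) return;
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<int> events_;
  bool terminating_ = false;
  int processed_ = 0;
  std::thread thread_;  // Declared last so the state above exists first.
};

// Posts `n` events, shuts the worker down, and reports how many it handled.
int PostAndShutDown(int n) {
  ToyWorker worker;
  for (int i = 0; i < n; ++i) worker.Post(i);
  worker.Shutdown();
  return worker.Processed();
}
```

If nothing ever set the termination flag and woke the loop, the wait would keep going back to sleep and the join in Shutdown() would block indefinitely, which is the shape of the hang described above.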
So far, I have decided to stay away from attempting to fix such issues at C++ level, as they could possibly impact Web code hence have security implications. Also, I believe that sending messages after web-workers-shutdown is unsafe by design – are we sure that the message was sent before web-workers-shutdown?
Blocks: 1005487
Kyle, is there anything we can do to debug this further? Or do you maybe have any idea what could be going on? It should be rather easy to reproduce this from the STR given in comment #0.
Flags: needinfo?(khuey)
Is this not just bug 965309?
Flags: needinfo?(khuey)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16)
> Is this not just bug 965309?

Uh, yes. I looked at that but was misled by the patch in there that didn't work. I should have looked at the summary, it's exactly the same issue.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE