Closed
Bug 959218
Opened 11 years ago
Closed 11 years ago
ANR: Deadlock during backgrounding when shutting down disk cache
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Firefox 29
People
(Reporter: jchen, Assigned: jchen)
Details
(Whiteboard: [ANR])
Attachments
(1 file, 1 obsolete file)
892 bytes,
patch
|
michal
:
review+
|
Details | Diff | Splinter Review |
I saw an ANR on my Galaxy Nexus that looks like a double locking in disk cache shutdown code. Gecko stack during deadlock: > #0 0x40188104 in __futex_syscall3 () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/system/lib/libc.so > #1 0x4017d5d8 in ?? () from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/system/lib/libc.so > #2 0x61853dc4 in PR_Lock () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libnss3.so > #3 0x64be6cf2 in nsDeleteDir::Shutdown(bool) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #4 0x64be5e56 in nsCacheService::Shutdown() () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #5 0x6508a4fc in nsAppShell::ProcessNextNativeEvent(bool) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #6 0x65093afc in nsBaseAppShell::DoProcessNextNativeEvent(bool, unsigned int) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #7 0x65093b8e in nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #8 0x64b835e2 in nsThread::ProcessNextEvent(bool, bool*) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #9 0x64b5b4aa in NS_ProcessNextEvent(nsIThread*, bool) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #10 0x64b8693e in nsThread::Shutdown() () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #11 0x64be6dd2 in nsDeleteDir::Shutdown(bool) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #12 0x64be5e56 in nsCacheService::Shutdown() () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #13 0x6508a4fc in nsAppShell::ProcessNextNativeEvent(bool) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #14 0x65093afc in nsBaseAppShell::DoProcessNextNativeEvent(bool, unsigned int) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #15 0x65093bb0 in nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #16 0x64b835e2 in nsThread::ProcessNextEvent(bool, bool*) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #17 0x64b5b4aa in NS_ProcessNextEvent(nsIThread*, bool) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #18 0x64c90508 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #19 0x64c8805a in MessageLoop::RunInternal() () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #20 0x64c8810c in MessageLoop::Run() () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #21 0x65090576 in nsBaseAppShell::Run() () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #22 0x656288f6 in nsAppStartup::Run() () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #23 0x656008ec in XREMain::XRE_mainRun() () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #24 0x656019a8 in XREMain::XRE_main(int, char**, nsXREAppData const*) () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so > #25 0x65601b24 in XRE_main () > from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so I think the cause is that, for some reason, the APP_BACKGROUNDING event [1] was sent twice consecutively, causing us to shut down the disk cache twice. [1] http://mxr.mozilla.org/mozilla-central/source/widget/android/nsAppShell.cpp#312
Assignee | ||
Updated•11 years ago
|
Whiteboard: [ANR]
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → nchen
Status: NEW → ASSIGNED
Assignee | ||
Comment 1•11 years ago
|
||
The deadlock happens during nsDeleteDir thread shutdown. nsDeleteDir::Shutdown calls nsThread::Shutdown inside a lock [1], and nsThread::Shutdown waits for the background thread's event queue to empty [2]. While it's waiting, it will spin the main thread's event loop [3]. This can cause problems because there might be events in the main thread queue that will trigger nsDeleteDir::Shutdown again. Since nsDeleteDir::mLock is not reentrant, calling nsDeleteDir::Shutdown twice in the same thread will cause a deadlock. The particular event in Fennec that (indirectly) calls nsDeleteDir::Shutdown is APP_BACKGROUNDING [4]. This patch moves the nsThread::Shutdown call to outside of the lock, and adds a check to return early for reentrant calls to nsDeleteDir::Shutdown. nsBlockOnBackgroundThreadEvent is not needed because, as mentioned before, nsThread::Shutdown already blocks on the background thread. [1] http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsDeleteDir.cpp?rev=004effcc3ee3#69 [2] http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp?rev=e176126d2b42#470 [3] http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp?rev=e176126d2b42#486 [4] http://mxr.mozilla.org/mozilla-central/source/widget/android/nsAppShell.cpp?rev=5a9badd6db00#320
Assignee | ||
Comment 2•11 years ago
|
||
Comment on attachment 8361843 [details] [diff] [review] Shutdown nsDeleteDir thread later to prevent possible deadlock (v1) Michal, does this look like a good solution?
Attachment #8361843 -
Flags: feedback?(michal.novotny)
Comment 3•11 years ago
|
||
Comment on attachment 8361843 [details] [diff] [review] Shutdown nsDeleteDir thread later to prevent possible deadlock (v1) Review of attachment 8361843 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/cache/nsDeleteDir.cpp @@ +86,3 @@ > > + if (thread) { > + thread->Shutdown(); nsDeleteDir::Shutdown() is called only from nsCacheService::Shutdown() which asserts in debug build or returns early in release build when it is called twice. So there is also a call to nsCacheService::Init(), i.e. this is just another variant of the same problem as in bug #829419. This patch won't ensure that nsCacheService cannot be initialized again before previous nsDeleteDir is destroyed. To fix the problem, we should simply call nsShutdownThread::BlockingShutdown(thread) here.
Attachment #8361843 -
Flags: feedback?(michal.novotny) → feedback-
Assignee | ||
Comment 4•11 years ago
|
||
> nsDeleteDir::Shutdown() is called only from nsCacheService::Shutdown() which
> asserts in debug build or returns early in release build when it is called
> twice. So there is also a call to nsCacheService::Init(), i.e. this is just
> another variant of the same problem as in bug #829419. This patch won't
> ensure that nsCacheService cannot be initialized again before previous
> nsDeleteDir is destroyed. To fix the problem, we should simply call
> nsShutdownThread::BlockingShutdown(thread) here.
Okay, like this?
Attachment #8361843 -
Attachment is obsolete: true
Attachment #8367521 -
Flags: review?(michal.novotny)
Comment 5•11 years ago
|
||
Comment on attachment 8367521 [details] [diff] [review] Avoid race condition when shutting down nsDeleteDir thread (v1) Review of attachment 8367521 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Jim Chen [:jchen :nchen] from comment #4) > Okay, like this? Yes, exactly.
Attachment #8367521 -
Flags: review?(michal.novotny) → review+
Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 6•11 years ago
|
||
Try: https://tbpl.mozilla.org/?tree=Try&rev=1adc2137dfee
Comment 7•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/494eec74bb01
Keywords: checkin-needed
Comment 8•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/494eec74bb01
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 29
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•