ANR: Deadlock during backgrounding when shutting down disk cache

RESOLVED FIXED in Firefox 29

Status

()

RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: jchen, Assigned: jchen)

Tracking

Trunk
Firefox 29
All
Android
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [ANR])

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

5 years ago
I saw an ANR on my Galaxy Nexus that looks like a double locking in disk cache shutdown code. Gecko stack during deadlock:

> #0  0x40188104 in __futex_syscall3 ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/system/lib/libc.so
> #1  0x4017d5d8 in ?? () from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/system/lib/libc.so
> #2  0x61853dc4 in PR_Lock ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libnss3.so
> #3  0x64be6cf2 in nsDeleteDir::Shutdown(bool) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #4  0x64be5e56 in nsCacheService::Shutdown() ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #5  0x6508a4fc in nsAppShell::ProcessNextNativeEvent(bool) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #6  0x65093afc in nsBaseAppShell::DoProcessNextNativeEvent(bool, unsigned int) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #7  0x65093b8e in nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #8  0x64b835e2 in nsThread::ProcessNextEvent(bool, bool*) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #9  0x64b5b4aa in NS_ProcessNextEvent(nsIThread*, bool) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #10 0x64b8693e in nsThread::Shutdown() ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #11 0x64be6dd2 in nsDeleteDir::Shutdown(bool) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #12 0x64be5e56 in nsCacheService::Shutdown() ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #13 0x6508a4fc in nsAppShell::ProcessNextNativeEvent(bool) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #14 0x65093afc in nsBaseAppShell::DoProcessNextNativeEvent(bool, unsigned int) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #15 0x65093bb0 in nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #16 0x64b835e2 in nsThread::ProcessNextEvent(bool, bool*) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #17 0x64b5b4aa in NS_ProcessNextEvent(nsIThread*, bool) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #18 0x64c90508 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #19 0x64c8805a in MessageLoop::RunInternal() ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #20 0x64c8810c in MessageLoop::Run() ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #21 0x65090576 in nsBaseAppShell::Run() ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #22 0x656288f6 in nsAppStartup::Run() ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #23 0x656008ec in XREMain::XRE_mainRun() ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #24 0x656019a8 in XREMain::XRE_main(int, char**, nsXREAppData const*) ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so
> #25 0x65601b24 in XRE_main ()
>    from /Users/nchen/jimdb-arm/lib/0149E0820D01300B/app/org.mozilla.fennec/assets/libxul.so

I think the cause is that, for some reason, the APP_BACKGROUNDING event [1] was sent twice consecutively, causing us to shut down the disk cache twice.

[1] http://mxr.mozilla.org/mozilla-central/source/widget/android/nsAppShell.cpp#312
(Assignee)

Updated

5 years ago
Whiteboard: [ANR]
(Assignee)

Updated

5 years ago
Assignee: nobody → nchen
Status: NEW → ASSIGNED
(Assignee)

Comment 1

5 years ago
Created attachment 8361843 [details] [diff] [review]
Shutdown nsDeleteDir thread later to prevent possible deadlock (v1)

The deadlock happens during nsDeleteDir thread shutdown. nsDeleteDir::Shutdown calls nsThread::Shutdown inside a lock [1], and nsThread::Shutdown waits for the background thread's event queue to empty [2]. While it's waiting, it will spin the main thread's event loop [3]. This can cause problems because there might be events in the main thread queue that will trigger nsDeleteDir::Shutdown again. Since nsDeleteDir::mLock is not reentrant, calling nsDeleteDir::Shutdown twice in the same thread will cause a deadlock. The particular event in Fennec that (indirectly) calls nsDeleteDir::Shutdown is APP_BACKGROUNDING [4].

This patch moves the nsThread::Shutdown call to outside of the lock, and adds a check to return early for reentrant calls to nsDeleteDir::Shutdown. nsBlockOnBackgroundThreadEvent is not needed because, as mentioned before, nsThread::Shutdown already blocks on the background thread.

[1] http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsDeleteDir.cpp?rev=004effcc3ee3#69
[2] http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp?rev=e176126d2b42#470
[3] http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp?rev=e176126d2b42#486
[4] http://mxr.mozilla.org/mozilla-central/source/widget/android/nsAppShell.cpp?rev=5a9badd6db00#320
(Assignee)

Comment 2

5 years ago
Comment on attachment 8361843 [details] [diff] [review]
Shutdown nsDeleteDir thread later to prevent possible deadlock (v1)

Michal, does this look like a good solution?
Attachment #8361843 - Flags: feedback?(michal.novotny)
Comment on attachment 8361843 [details] [diff] [review]
Shutdown nsDeleteDir thread later to prevent possible deadlock (v1)

Review of attachment 8361843 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache/nsDeleteDir.cpp
@@ +86,3 @@
>  
> +  if (thread) {
> +    thread->Shutdown();

nsDeleteDir::Shutdown() is called only from nsCacheService::Shutdown() which asserts in debug build or returns early in release build when it is called twice. So there is also a call to nsCacheService::Init(), i.e. this is just another variant of the same problem as in bug #829419. This patch won't ensure that nsCacheService cannot be initialized again before previous nsDeleteDir is destroyed. To fix the problem, we should simply call nsShutdownThread::BlockingShutdown(thread) here.
Attachment #8361843 - Flags: feedback?(michal.novotny) → feedback-
(Assignee)

Comment 4

5 years ago
Created attachment 8367521 [details] [diff] [review]
Avoid race condition when shutting down nsDeleteDir thread (v1)

> nsDeleteDir::Shutdown() is called only from nsCacheService::Shutdown() which
> asserts in debug build or returns early in release build when it is called
> twice. So there is also a call to nsCacheService::Init(), i.e. this is just
> another variant of the same problem as in bug #829419. This patch won't
> ensure that nsCacheService cannot be initialized again before previous
> nsDeleteDir is destroyed. To fix the problem, we should simply call
> nsShutdownThread::BlockingShutdown(thread) here.

Okay, like this?
Attachment #8361843 - Attachment is obsolete: true
Attachment #8367521 - Flags: review?(michal.novotny)
Comment on attachment 8367521 [details] [diff] [review]
Avoid race condition when shutting down nsDeleteDir thread (v1)

Review of attachment 8367521 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Jim Chen [:jchen :nchen] from comment #4)
> Okay, like this?

Yes, exactly.
Attachment #8367521 - Flags: review?(michal.novotny) → review+
(Assignee)

Updated

5 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/494eec74bb01
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 29
You need to log in before you can comment on or make changes to this bug.