Intermittent Windows & OS X shutdown timeout with MediaShutdownManager::Shutdown() on the stack

RESOLVED FIXED in Firefox 30

Status

()

defect
--
major
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: emorley, Assigned: jwwang)

Tracking

({intermittent-failure})

Trunk
mozilla32
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox30 fixed, firefox31 fixed, firefox32 fixed, firefox-esr24 unaffected, b2g-v1.4 fixed, b2g-v2.0 fixed)

Details

(Whiteboard: [qa-] )

Attachments

(3 attachments, 1 obsolete attachment)

Reporter

Description

5 years ago
Breaking out from bug 918759, which has become a dumping ground.

On OS X and Windows, we're seeing intermittent failures where we hang during shutdown - and the induced crash shows MediaShutdownManager on the stack.

Looking at the timeline, I believe bug 938107 caused this regression, since:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=918759&entireHistory=true&tree=trunk
...shows a spike at 2013-12-16/2013-12-17 (note the OS X instances prior to that we're bug 957693 and bug 960605 aiui).

Example logs:
https://tbpl.mozilla.org/php/getParsedLog.php?id=38356405&full=1&branch=try#error0
Windows 7 32-bit try debug test crashtest on 2014-04-23 17:12:28 PDT for push 97491fef6385

17:35:31     INFO -  Thread 0
17:35:31     INFO -   0  ntdll.dll + 0x470b4
17:35:31     INFO -      eip = 0x772270b4   esp = 0x002af6e8   ebp = 0x002af754   ebx = 0x00000000
17:35:31     INFO -      esi = 0x00000050   edi = 0x00000000   eax = 0x005c1d28   ecx = 0x650e6d3c
17:35:31     INFO -      edx = 0x650e6d46   efl = 0x00200246
17:35:31     INFO -      Found by: given as instruction pointer in context
17:35:31     INFO -   1  kernel32.dll + 0x4baf2
17:35:31     INFO -      eip = 0x759cbaf3   esp = 0x002af75c   ebp = 0x002af76c
17:35:31     INFO -      Found by: previous frame's frame pointer
17:35:31     INFO -   2  kernel32.dll + 0x4baa1
17:35:31     INFO -      eip = 0x759cbaa2   esp = 0x002af774   ebp = 0x002af780
17:35:31     INFO -      Found by: previous frame's frame pointer
17:35:31     INFO -   3  nss3.dll!_PR_MD_WAIT_CV [w95cv.c:97491fef6385 : 248 + 0x10]
17:35:31     INFO -      eip = 0x67a119de   esp = 0x002af788   ebp = 0x002af79c
17:35:31     INFO -      Found by: previous frame's frame pointer
17:35:31     INFO -   4  nss3.dll!_PR_WaitCondVar [prucv.c:97491fef6385 : 172 + 0x24]
17:35:31     INFO -      eip = 0x67a0db1c   esp = 0x002af7a4   ebp = 0x002af7b8
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -   5  nss3.dll!PR_WaitCondVar [prucv.c:97491fef6385 : 525 + 0xb]
17:35:31     INFO -      eip = 0x67a0df36   esp = 0x002af7c0   ebp = 0x002af7d8
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -   6  nss3.dll!PR_Wait [prmon.c:97491fef6385 : 294 + 0xf]
17:35:31     INFO -      eip = 0x67a0116d   esp = 0x002af7e0   ebp = 0x002af7f8
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -   7  xul.dll!mozilla::ReentrantMonitor::Wait(unsigned int) [BlockingResourceBase.cpp:97491fef6385 : 320 + 0xb]
17:35:31     INFO -      eip = 0x63771999   esp = 0x002af800   ebp = 0x002af820
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -   8  xul.dll!nsEventQueue::GetEvent(bool,nsIRunnable * *) [nsEventQueue.cpp:97491fef6385 : 63 + 0x9]
17:35:31     INFO -      eip = 0x637e3499   esp = 0x002af828   ebp = 0x002af83c
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -   9  xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:97491fef6385 : 693 + 0x23]
17:35:31     INFO -      eip = 0x637eeb99   esp = 0x002af844   ebp = 0x002af8a4
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -  10  xul.dll!NS_ProcessNextEvent(nsIThread *,bool) [nsThreadUtils.cpp:97491fef6385 : 263 + 0xc]
17:35:31     INFO -      eip = 0x637867ae   esp = 0x002af8ac   ebp = 0x002af8b8
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -  11  xul.dll!mozilla::SharedThreadPool::SpinUntilShutdown() [SharedThreadPool.cpp:97491fef6385 : 78 + 0xc]
17:35:31     INFO -      eip = 0x648addec   esp = 0x002af8c0   ebp = 0x002af8d8
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -  12  xul.dll!mozilla::MediaShutdownManager::Shutdown() [MediaShutdownManager.cpp:97491fef6385 : 134 + 0x4]
17:35:31     INFO -      eip = 0x648b0a0f   esp = 0x002af8cc   ebp = 0x002af8d8
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -  13  xul.dll!mozilla::MediaShutdownManager::Observe(nsISupports *,char const *,wchar_t const *) [MediaShutdownManager.cpp:97491fef6385 : 102 + 0x7]
17:35:31     INFO -      eip = 0x648b4005   esp = 0x002af8d8   ebp = 0x002af8d8
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -  14  xul.dll!nsObserverList::NotifyObservers(nsISupports *,char const *,wchar_t const *) [nsObserverList.cpp:97491fef6385 : 96 + 0x19]
17:35:31     INFO -      eip = 0x637bf6b0   esp = 0x002af8e0   ebp = 0x002af8f8
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -  15  xul.dll!nsObserverService::NotifyObservers(nsISupports *,char const *,wchar_t const *) [nsObserverService.cpp:97491fef6385 : 302 + 0xd]
17:35:31     INFO -      eip = 0x637bf7cf   esp = 0x002af900   ebp = 0x002af914
17:35:31     INFO -      Found by: call frame info
17:35:31     INFO -  16  xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager *) [nsXPComInit.cpp:97491fef6385 : 777 + 0x16]
17:35:31     INFO -      eip = 0x637868e0   esp = 0x002af91c   ebp = 0x002af950
17:35:31     INFO -      Found by: call frame info
...

And:
https://tbpl.mozilla.org/php/getParsedLog.php?id=38290528&full=1&branch=mozilla-inbound#error1
Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound debug test mochitest-1 on 2014-04-22 18:06:19 PDT for push 4fca4b6565d8
19:13:51     INFO -  Thread 0 (crashed)
19:13:51     INFO -   0  libSystem.B.dylib + 0x3ba6a
19:13:51     INFO -      rbx = 0x00007fff7022bcc0   r12 = 0x000000010012bbf0
19:13:51     INFO -      r13 = 0x000000010012bb78   r14 = 0x0000000000000001
19:13:51     INFO -      r15 = 0x0000000000000000   rip = 0x00007fff819a0a6a
19:13:51     INFO -      rsp = 0x00007fff5fbfe238   rbp = 0x00007fff5fbfe2b0
19:13:51     INFO -      Found by: given as instruction pointer in context
19:13:51     INFO -   1  libSystem.B.dylib + 0x3f880
19:13:51     INFO -      rip = 0x00007fff819a4881   rsp = 0x00007fff5fbfe240
19:13:51     INFO -      rbp = 0x00007fff5fbfe2b0
19:13:51     INFO -      Found by: stack scanning
19:13:51     INFO -   2  libSystem.B.dylib + 0xdb7f2
19:13:51     INFO -      rip = 0x00007fff81a407f3   rsp = 0x00007fff5fbfe268
19:13:51     INFO -      rbp = 0x00007fff5fbfe2b0
19:13:51     INFO -      Found by: stack scanning
19:13:51     INFO -   3  libSystem.B.dylib + 0x3a12c
19:13:51     INFO -      rip = 0x00007fff8199f12d   rsp = 0x00007fff5fbfe280
19:13:51     INFO -      rbp = 0x00007fff5fbfe2b0
19:13:51     INFO -      Found by: stack scanning
19:13:51     INFO -   4  libnss3.dylib!PR_Wait + 0x142
19:13:51     INFO -      rip = 0x0000000100525163   rsp = 0x00007fff5fbfe2c0
19:13:51     INFO -      rbp = 0x00007fff5fbfe320
19:13:51     INFO -      Found by: stack scanning
19:13:51     INFO -   5  XUL!_ZZL13nsEscapeCountPKc12nsEscapeMaskPmE8hexChars + 0x291b
19:13:51     INFO -      rip = 0x00000001055e929c   rsp = 0x00007fff5fbfe2e8
19:13:51     INFO -      rbp = 0x00007fff5fbfe320
19:13:51     INFO -      Found by: stack scanning
19:13:51     INFO -   6  libSystem.B.dylib + 0x6699
19:13:51     INFO -      rip = 0x00007fff8196b69a   rsp = 0x00007fff5fbfe2f0
19:13:51     INFO -      rbp = 0x00007fff5fbfe320
19:13:51     INFO -      Found by: stack scanning
19:13:51     INFO -   7  XUL!mozilla::ReentrantMonitor::Wait(unsigned int) [BlockingResourceBase.cpp:4fca4b6565d8 : 319 + 0xb]
19:13:51     INFO -      rip = 0x0000000101812f53   rsp = 0x00007fff5fbfe330
19:13:51     INFO -      rbp = 0x00007fff5fbfe360
19:13:51     INFO -      Found by: stack scanning
19:13:51     INFO -   8  XUL!nsEventQueue::GetEvent(bool, nsIRunnable**) [ReentrantMonitor.h:4fca4b6565d8 : 199 + 0xc]
19:13:51     INFO -      rbx = 0x000000010012b8d0   r12 = 0x00000001055e929c
19:13:51     INFO -      r13 = 0x00000001055e92b4   r14 = 0x00007fff5fbfe3f8
19:13:51     INFO -      r15 = 0x00000001055e9ef7   rip = 0x00000001018ab3c9
19:13:51     INFO -      rsp = 0x00007fff5fbfe370   rbp = 0x00007fff5fbfe3a0
19:13:51     INFO -      Found by: call frame info
19:13:51     INFO -   9  XUL!nsThread::ProcessNextEvent(bool, bool*) [nsThread.h:4fca4b6565d8 : 97 + 0x16]
19:13:51     INFO -      rbx = 0x0000000000000000   r12 = 0x000000010012b850
19:13:51     INFO -      r13 = 0x000000010012d8a8   r14 = 0x0000000000000001
19:13:51     INFO -      r15 = 0x000000010012b890   rip = 0x00000001018aeca0
19:13:51     INFO -      rsp = 0x00007fff5fbfe3b0   rbp = 0x00007fff5fbfe480
19:13:51     INFO -      Found by: call frame info
19:13:51     INFO -  10  XUL!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:4fca4b6565d8 : 263 + 0xc]
19:13:51     INFO -      rbx = 0x0000000000000001   r12 = 0x00000001563bd880
19:13:51     INFO -      r13 = 0x000000010012d8a8   r14 = 0x0000000151ab5b78
19:13:51     INFO -      r15 = 0x00000001055dbda6   rip = 0x000000010181da53
19:13:51     INFO -      rsp = 0x00007fff5fbfe490   rbp = 0x00007fff5fbfe4a0
19:13:51     INFO -      Found by: call frame info
19:13:51     INFO -  11  XUL!mozilla::SharedThreadPool::SpinUntilShutdown() [SharedThreadPool.cpp:4fca4b6565d8 : 78 + 0x11]
19:13:51     INFO -      rbx = 0x0000000151ab5b60   r12 = 0x00000001563bd880
19:13:51     INFO -      r13 = 0x000000010012d8a8   r14 = 0x0000000151ab5b78
19:13:51     INFO -      r15 = 0x00000001055dbda6   rip = 0x0000000103102c8c
19:13:51     INFO -      rsp = 0x00007fff5fbfe4b0   rbp = 0x00007fff5fbfe4c0
19:13:51     INFO -      Found by: call frame info
19:13:51     INFO -  12  XUL!mozilla::MediaShutdownManager::Shutdown() [MediaShutdownManager.cpp:4fca4b6565d8 : 134 + 0x4]
19:13:51     INFO -      rbx = 0x0000000151ab5b60   r12 = 0x00000001563bd880
19:13:51     INFO -      r13 = 0x000000010012d8a8   r14 = 0x0000000151ab5b78
19:13:51     INFO -      r15 = 0x00000001055dbda6   rip = 0x0000000103102b85
19:13:51     INFO -      rsp = 0x00007fff5fbfe4d0   rbp = 0x00007fff5fbfe4f0
19:13:51     INFO -      Found by: call frame info
19:13:51     INFO -  13  XUL!mozilla::MediaShutdownManager::Observe(nsISupports*, char const*, char16_t const*) [MediaShutdownManager.cpp:4fca4b6565d8 : 102 + 0x7]
19:13:51     INFO -      rbx = 0x00000001055dbda6   r12 = 0x00000001563bd880
19:13:51     INFO -      r13 = 0x000000010012d8a8   r14 = 0x0000000151ab5b60
19:13:51     INFO -      r15 = 0x00000001055dbda6   rip = 0x0000000103102a91
19:13:51     INFO -      rsp = 0x00007fff5fbfe500   rbp = 0x00007fff5fbfe510
19:13:51     INFO -      Found by: call frame info
19:13:51     INFO -  14  XUL!nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverList.cpp:4fca4b6565d8 : 96 + 0xe]
19:13:51     INFO -      rbx = 0x0000000000000005   r12 = 0x00000001563bd880
19:13:51     INFO -      r13 = 0x000000010012d8a8   r14 = 0x0000000000000000
19:13:51     INFO -      r15 = 0x00000001055dbda6   rip = 0x00000001018645f8
19:13:51     INFO -      rsp = 0x00007fff5fbfe520   rbp = 0x00007fff5fbfe550
19:13:51     INFO -      Found by: call frame info
19:13:51     INFO -  15  XUL!nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverService.cpp:4fca4b6565d8 : 302 + 0x10]
19:13:51     INFO -      rbx = 0x0000000100144ae0   r12 = 0x00000001055dbda6
19:13:51     INFO -      r13 = 0x0000000100144b08   r14 = 0x0000000000000000
19:13:51     INFO -      r15 = 0x000000010012d8a8   rip = 0x0000000101865ba4
19:13:51     INFO -      rsp = 0x00007fff5fbfe560   rbp = 0x00007fff5fbfe590
19:13:51     INFO -      Found by: call frame info
19:13:51     INFO -  16  XUL!mozilla::ShutdownXPCOM(nsIServiceManager*) [nsXPComInit.cpp:4fca4b6565d8 : 776 + 0xd]
19:13:51     INFO -      rbx = 0x0000000000000000   r12 = 0x0000000000000000
19:13:51     INFO -      r13 = 0x00007fff5fbfe9e8   r14 = 0x000000010012d8a8
19:13:51     INFO -      r15 = 0x0000000000000000   rip = 0x0000000101825c0c
19:13:51     INFO -      rsp = 0x00007fff5fbfe5a0   rbp = 0x00007fff5fbfe610
19:13:51     INFO -      Found by: call frame info
...
Reporter

Comment 1

5 years ago
This intermittent failure is #10 on OrangeFactor - cpearce or roc, could you take a look?
Flags: needinfo?(cpearce)
Reporter

Updated

5 years ago
Flags: needinfo?(roc)
Comment hidden (Legacy TBPL/Treeherder Robot)
This is going to have to be cpearce I think
Flags: needinfo?(roc)
I'll add this to my queue.

This could caused by *anything* in the media stack, so it will take some time to find a solution.
Flags: needinfo?(cpearce)
If we block waiting for the mediadecoder shutdown like this in release builds, and we hang like this,  this could make the user unable to restart Firefox. Which is bad.

What we can do here is in opt (and thus release but not debug) builds is stick a timeout in SharedThreadPool::SpinUntilShutdown(), and give up if we don't complete shutdown before the timeout expires. Then we're less likely to cause grief with users, but we'll still get data to help us solve this.
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Assignee

Comment 9

5 years ago
It looks like some leaks preventing sPools->Count() from reaching 0 in SharedThreadPool::Release and ShutdownPoolsEvent is not dispatched. SharedThreadPool::SpinUntilShutdown waits until sPools is destroyed which never happens.
Reporter

Comment 10

5 years ago
Thank you for looking at this :-)
Assignee

Comment 11

5 years ago
Posted patch 1004669_fix_leaks.patch (obsolete) — Splinter Review
MediaTaskQueue::Dispatch doesn't take the memory ownership of aRunnable when mIsShutdown is true. (https://hg.mozilla.org/try/file/20ca234fd62b/content/media/MediaTaskQueue.cpp#l33) And there will be a leak if passing |new nsRunnable(xxx)| and the memory ownership isn't transferred. (https://hg.mozilla.org/try/file/20ca234fd62b/content/media/MediaDecoderStateMachine.cpp#l1610)

nsThread::DispatchInternal also has such a problem which can return early before transferring the memory ownership. (https://hg.mozilla.org/try/file/20ca234fd62b/xpcom/threads/nsThread.cpp#l453)

Should we consider |thread->Dispatch(new nsRunnable())| a bad idiom or parameter as raw pointer being bad for the memory ownership is unclear?
Attachment #8420621 - Flags: review?(cpearce)
Attachment #8420621 - Flags: feedback?(rlin)
(In reply to JW Wang [:jwwang] from comment #11)
> Should we consider |thread->Dispatch(new nsRunnable())| a bad idiom or
> parameter as raw pointer being bad for the memory ownership is unclear?

Yes! This is a bad idiom. I hit this last week in new code I was writing. We should prevent these silly mistakes.
Comment on attachment 8420621 [details] [diff] [review]
1004669_fix_leaks.patch

Review of attachment 8420621 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch, but we can tweak the interface to make it explicit.

::: content/media/MediaTaskQueue.cpp
@@ +26,5 @@
>    MOZ_COUNT_DTOR(MediaTaskQueue);
>  }
>  
>  nsresult
> +MediaTaskQueue::Dispatch(RefPtr<nsIRunnable> aRunnable)

Please make this:

nsresult
MediaTaskQueue::Dispatch(TemporaryRef<nsIRunnable> aRunnable)
{
  RefPtr<nsIRunnable> runnable(aRunnable);
  // ...

Then the caller must explicitly call event.forget() and pass ownership of a reference to the task queue, though it may retain another reference it if wishes.
Attachment #8420621 - Flags: review?(cpearce) → review-
Assignee

Comment 15

5 years ago
Sometimes I am confused with when to use nsRefPtr or RefPtr. It appears to me RefPtr+TemporaryRef are easier to use and read than nsRefPtr+already_AddRefed. Should we also replace nsRefPtr in MediaDecoderStateMachine?
(In reply to JW Wang [:jwwang] from comment #15)
> Sometimes I am confused with when to use nsRefPtr or RefPtr. It appears to
> me RefPtr+TemporaryRef are easier to use and read than
> nsRefPtr+already_AddRefed. Should we also replace nsRefPtr in
> MediaDecoderStateMachine?

I prefer mozilla::RefPtr. I started a thread in m.d.platform to clarify:

https://groups.google.com/forum/#!topic/mozilla.dev.platform/Xsuwy4h7lNE
Assignee

Comment 17

5 years ago
Use TemporaryRef to prevent loss of memory ownership.
Assignee: nobody → jwwang
Attachment #8420621 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8420621 - Flags: feedback?(rlin)
Attachment #8420744 - Flags: feedback?(rlin)
Comment on attachment 8420744 [details] [diff] [review]
1004669_fix_leaks.patch

Look good to me. :)
Attachment #8420744 - Flags: feedback?(rlin) → feedback+
Assignee

Updated

5 years ago
Attachment #8420744 - Flags: review?(cpearce)
Attachment #8420744 - Flags: review?(cpearce) → review+
https://hg.mozilla.org/mozilla-central/rev/fbae0dc61cd3
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Given how this bug started (spun off from a big dumping ground bug), I'm not entirely sure what all branches are affected by this. AFAICT, both Aurora and Beta are affected? JW, can you please confirm and nominate this patch for whatever branches need it? :)
Flags: needinfo?(jwwang)
Assignee

Comment 23

5 years ago
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 962385
User impact if declined: browser might hang if shutdown when playing music
Testing completed (on m-c, etc.): try: https://tbpl.mozilla.org/?tree=Try&rev=f4a1a4a84069
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #8421592 - Flags: review+
Attachment #8421592 - Flags: approval-mozilla-aurora?
Flags: needinfo?(jwwang)
Assignee

Comment 24

5 years ago
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 962385
User impact if declined: browser might hang if shutdown when playing music
Testing completed (on m-c, etc.): try: https://tbpl.mozilla.org/?tree=Try&rev=37893fcdde60
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #8421595 - Flags: review+
Attachment #8421595 - Flags: approval-mozilla-beta?
Attachment #8421592 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8421595 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [qa-]
Reporter

Comment 27

5 years ago
(Tweaking summary to avoid future TBPL false positives, now this is fixed)
Summary: Intermittent *Windows & OS X* TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output (with MediaShutdownManager::Shutdown() on the stack) → Intermittent Windows & OS X shutdown timeout with MediaShutdownManager::Shutdown() on the stack
You need to log in before you can comment on or make changes to this bug.