[EME] crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | nsThread::Shutdown()

RESOLVED FIXED in Firefox 41

Status

defect, P2, critical
RESOLVED FIXED
Reported: 4 years ago
Last modified: 4 years ago

People

(Reporter: cpearce, Assigned: cpearce)

Tracking

(Blocks 1 bug, {crash})
Version: unspecified
Target Milestone: mozilla42
Platform: x86 Windows NT
Points: ---
Dependency tree / graph

Firefox Tracking Flags

(firefox41+ fixed, firefox42 fixed)

Details

(crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is report bp-ba768898-9c17-4077-9199-2d4cb2150723.
=============================================================

Note the MediaDataDecoderProxy in the crash report; Firefox was using EME at the time of shutdown.

We have 16 reports of this so far in 40b6, which has only been out a few days.
Thread 30, in the reader's task queue, holds the decoder monitor in TrackBuffer::ContinueShutdown(). It calls into MediaDataDecoderProxy::Flush(), which waits on a condition. Thread 29, the MDSM thread, tries to take the decoder monitor, and the MDSM deadlocks.

Whatever was supposed to awaken the Condition that MediaDataDecoderProxy::Flush() is waiting on didn't work.
I think what can happen is this: if the MediaDataDecoderProxy is blocked waiting for a ResetComplete() or DrainComplete() callback, and the GMP shuts down or crashes before it has sent the Complete message back, the MediaDataDecoderProxy hangs forever waiting for the callback.

I think this could happen if the GMP service's shutdown observer runs before the MediaDecoder's.
Assignee: nobody → cpearce
Status: NEW → ASSIGNED
Attachment #8639064 - Flags: review?(gsquelart)
Comment on attachment 8639064 [details] [diff] [review]
Patch: Ensure we send Reset/Drain complete notifications no matter what happens in GMP{Audio,Video}Decoder.

Review of attachment 8639064 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nits.

::: dom/media/gmp/GMPAudioDecoderParent.cpp
@@ +145,5 @@
>    MOZ_ASSERT(!mPlugin || mPlugin->GMPThread() == NS_GetCurrentThread());
>  
> +  // Ensure if we've received a Close while waiting for a ResetComplete
> +  // or DrainComplete notification, we'll unlbock the caller before processing
> +  // the close. This seems unlikely to happen, but better to be careful...

unlbock -> unblock
(9 times in this patch, do a find-replace)
Also I'd lose the ellipsis.

@@ +252,5 @@
>      return false;
>    }
>  
> +  if (!mIsAwaitingDrainComplete) {
> +    NS_WARNING("Received unexpected (or canceled) GMPAudioDecoder DrainComplete message");

Do we really need a warning here? (And 3 others in this patch)
Though rare, it could be part of a perfectly normal sequence of events, couldn't it?
E.g.: Parent sends a drain request; then parent shuts down (because of user action), which unblocks the parent drain; then child's drain-complete response finally arrives.
Bias disclosure: I'm suffering from warning-fatigue!

@@ +280,1 @@
>    mCallback->ResetComplete();

Style nitpick, I prefer the style in RecvDrainComplete() above, i.e.:
"""
  return true;
}
mIsAwaitingResetComplete = false;

// Ignore any return code. It is OK for this to fail without killing the process.
mCallback->ResetComplete();
"""

And for extra OCD points, pick whether you leave a blank line before "mIsAwaitingFor...=false", and use the same style in both cpp files.
Attachment #8639064 - Flags: review?(gsquelart) → review+
Should uplift.
Flags: needinfo?(cpearce)
https://hg.mozilla.org/mozilla-central/rev/9bccb566456b
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Comment on attachment 8639064 [details] [diff] [review]
Patch: Ensure we send Reset/Drain complete notifications no matter what happens in GMP{Audio,Video}Decoder.

Approval Request Comment
[Feature/regressing bug #]: EME
[User impact if declined]: We are getting shutdown-hang reports when EME GMPs are in use at shutdown. This patch ensures that the GMP code "hangs up" definitively when talking to the Firefox media stack, so Firefox can shut down cleanly. Without this patch, we may continue to get shutdown hangs.
[Describe test coverage new/current, TreeHerder]: We have lots of EME mochitests that cover this code path.
[Risks and why]: I think this is lowish risk, but since it deals with shutdown of a multi-threaded system, I think we should be a little careful and only uplift to Aurora41, not to Beta40, this late in the cycle.
[String/UUID change made/needed]: None.
Attachment #8639064 - Flags: approval-mozilla-aurora?
Chris, since this is a non-trivial patch, I was looking at the treeherder link and saw several tests are purple i.e. had exceptions and one busted (in red) test. Do those test issues need investigation?
(In reply to Ritu Kothari (:ritu) from comment #10)
> Chris, since this is a non-trivial patch, I was looking at the treeherder
> link and saw several tests are purple i.e. had exceptions and one busted (in
> red) test. Do those test issues need investigation?

The push looks fine to me.

If you look in the log for B2G ICS 24, you'll see "Automation Error" at the start. My code is not in automation, it's run by automation. So that can't be my patch.

Purple is infrastructure failure; my code is not in the infrastructure/automation, so it can't be me. 

Most of the other orange is like that too.

It looks like there are a few orange that are probably already fixed in Nightly, or not run on Aurora.
Comment on attachment 8639064 [details] [diff] [review]
Patch: Ensure we send Reset/Drain complete notifications no matter what happens in GMP{Audio,Video}Decoder.

This patch has been in Nightly for ~2 weeks. No known negative impact, seems safe to uplift to Aurora. Chris confirmed that the try push (despite a few failures) is unrelated to his code change. Approved.
Attachment #8639064 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: needinfo?(cpearce)