[EME] crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | nsThread::Shutdown()

RESOLVED FIXED in Firefox 41

Status

defect, P2, critical
RESOLVED FIXED
Reported: 4 years ago
Last modified: 4 years ago

People

(Reporter: cpearce, Assigned: cpearce)

Tracking

(Blocks 1 bug, {crash})
Version: unspecified
Target Milestone: mozilla42
Platform: x86 Windows NT
Points: ---
Dependency tree / graph

Firefox Tracking Flags

(firefox41+ fixed, firefox42 fixed)

Details

(crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is report bp-ba768898-9c17-4077-9199-2d4cb2150723.
=============================================================

Note the MediaDataDecoderProxy in the crash report; Firefox was using EME at the time of shutdown.

We have 16 reports of this so far in 40b6, which has only been out a few days.
Thread 30, in the reader's task queue, holds the decoder monitor in TrackBuffer::ContinueShutdown(). It calls into MediaDataDecoderProxy::Flush(), which waits on a condition. Thread 29, the MDSM thread, tries to take the decoder monitor, and the MDSM deadlocks.

Whatever was supposed to awaken the Condition that MediaDataDecoderProxy::Flush() is waiting on didn't work.
I think what can happen is this: if the MediaDataDecoderProxy is blocked waiting for a ResetComplete() or DrainComplete() callback, and the GMP shuts down or crashes before it has sent the Complete message back, the MediaDataDecoderProxy hangs forever waiting for the callback.

I think this could happen if the GMP service's shutdown observer runs before the MediaDecoder's.
Assignee: nobody → cpearce
Status: NEW → ASSIGNED
Attachment #8639064 - Flags: review?(gsquelart)
Comment on attachment 8639064 [details] [diff] [review]
Patch: Ensure we send Reset/Drain complete notifications no matter what happens in GMP{Audio,Video}Decoder.

Review of attachment 8639064 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nits.

::: dom/media/gmp/GMPAudioDecoderParent.cpp
@@ +145,5 @@
>    MOZ_ASSERT(!mPlugin || mPlugin->GMPThread() == NS_GetCurrentThread());
>  
> +  // Ensure if we've received a Close while waiting for a ResetComplete
> +  // or DrainComplete notification, we'll unlbock the caller before processing
> +  // the close. This seems unlikely to happen, but better to be careful...

unlbock -> unblock
(9 times in this patch, do a find-replace)
Also I'd lose the ellipsis.

@@ +252,5 @@
>      return false;
>    }
>  
> +  if (!mIsAwaitingDrainComplete) {
> +    NS_WARNING("Received unexpected (or canceled) GMPAudioDecoder DrainComplete message");

Do we really need a warning here? (And 3 others in this patch)
Though rare, it could be part of a perfectly normal sequence of events, couldn't it?
E.g.: Parent sends a drain request; then parent shuts down (because of user action), which unblocks the parent drain; then child's drain-complete response finally arrives.
Bias disclosure: I'm suffering from warning-fatigue!

@@ +280,1 @@
>    mCallback->ResetComplete();

Style nitpick, I prefer the style in RecvDrainComplete() above, i.e.:
"""
  return true;
}
mIsAwaitingResetComplete = false;

// Ignore any return code. It is OK for this to fail without killing the process.
mCallback->ResetComplete();
"""

And for extra OCD points, pick whether you leave a blank line before "mIsAwaitingFor...=false", and use the same style in both cpp files.
Attachment #8639064 - Flags: review?(gsquelart) → review+
Should uplift.
Flags: needinfo?(cpearce)
https://hg.mozilla.org/mozilla-central/rev/9bccb566456b
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Comment on attachment 8639064 [details] [diff] [review]
Patch: Ensure we send Reset/Drain complete notifications no matter what happens in GMP{Audio,Video}Decoder.

Approval Request Comment
[Feature/regressing bug #]: EME
[User impact if declined]: We are getting shutdown-hang reports when EME GMPs are in use at shutdown. This patch ensures that the GMP code "hangs up" definitively when talking to the Firefox media stack, so Firefox can shut down cleanly. Without this patch, we may continue to get shutdown hangs.
[Describe test coverage new/current, TreeHerder]: We have lots of EME mochitests that cover this code path.
[Risks and why]: I think this is lowish risk, but since it deals with shutdown of a multi-threaded system, I think we should be a little careful and only uplift to Aurora41, not to Beta40, this late in the cycle.
[String/UUID change made/needed]: None.
Attachment #8639064 - Flags: approval-mozilla-aurora?
Chris, since this is a non-trivial patch, I was looking at the treeherder link and saw several tests are purple i.e. had exceptions and one busted (in red) test. Do those test issues need investigation?
(In reply to Ritu Kothari (:ritu) from comment #10)
> Chris, since this is a non-trivial patch, I was looking at the treeherder
> link and saw several tests are purple i.e. had exceptions and one busted (in
> red) test. Do those test issues need investigation?

The push looks fine to me.

If you look in the log for B2G ICS 24, you'll see "Automation Error" at the start. My code is not in automation, it's run by automation. So that can't be my patch.

Purple is infrastructure failure; my code is not in the infrastructure/automation, so it can't be me. 

Most of the other orange is like that too.

It looks like there are a few orange that are probably already fixed in Nightly, or not run on Aurora.
Comment on attachment 8639064 [details] [diff] [review]
Patch: Ensure we send Reset/Drain complete notifications no matter what happens in GMP{Audio,Video}Decoder.

This patch has been in Nightly for ~2 weeks. No known negative impact, seems safe to uplift to Aurora. Chris confirmed that the try push (despite a few failures) is unrelated to his code change. Approved.
Attachment #8639064 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: needinfo?(cpearce)