Closed Bug 1166107 Opened 9 years ago Closed 9 years ago

shutdown hang SharedDecoderManager::DrainComplete()/Flush() deadlock

Categories

(Core :: Audio/Video, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox40 + fixed
firefox41 + fixed

People

(Reporter: karlt, Assigned: karlt)

References

Details

(Keywords: regression)

Attachments

(3 files, 1 obsolete file)

https://crash-stats.mozilla.com/report/index/930399a3-9302-499f-a7b6-deb492150517#allthreads

SharedDecoderManager::SetIdle() is holding its monitor while calling Flush().
http://hg.mozilla.org/releases/mozilla-aurora/annotate/b03c5f1a77a1/dom/media/fmp4/SharedDecoderManager.cpp#l173

WMFMediaDataDecoder::Flush() waits to run a task on the same queue as
DrainComplete() runs.

SharedDecoderManager::DrainComplete() is waiting for the monitor held by
SharedDecoderManager::SetIdle().
http://hg.mozilla.org/releases/mozilla-aurora/annotate/b03c5f1a77a1/dom/media/fmp4/SharedDecoderManager.cpp#l196

Holding the lock during Flush() is a regression from
http://hg.mozilla.org/releases/mozilla-aurora/rev/380b0a052ef0

I suspect there was an incomplete external Drain before SetIdle() set
mWaitForInternalDrain.  The wait for the internal drain has picked up the
DrainComplete() from the external Drain and the DrainComplete() from the
internal Drain runs after the Wait() acquires the monitor again.
[Tracking Requested - why for this release]:
Recently introduced cause of video playback failure and shutdown hang.
Enabled tracking for 40 and 41 — because what Karl said. :)
The DrainComplete() caught with mWaitForInternalDrain still won't necessarily
be from the internal Drain(), but all we need is that one DrainComplete() is
caught for the internal Drain() because one more will be generated if there is
a Drain() in progress.

What protecting mWaitForInternalDrain access with the monitor provides here is
that the compiler won't use its address for storage of other data meaningless
in the context of mWaitForInternalDrain and so, for example, two
DrainComplete() calls won't unintentionally think that they are both for one
internal drain.
And TSan warnings.
Attachment #8609159 - Flags: review?(gsquelart)
Assignee: nobody → karlt
Status: NEW → ASSIGNED
Comment on attachment 8609159 [details] [diff] [review]
release internal drain monitor before calling Flush()

Review of attachment 8609159 [details] [diff] [review]:
-----------------------------------------------------------------

r+.
Optional: Fix typo if easy enough (not your typo, but we might as well correct it while we're nearby).

::: dom/media/platforms/SharedDecoderManager.cpp
@@ +169,1 @@
>        // We don't want to hold the lock while calling Drain() has some

"has" -> "as"
Attachment #8609159 - Flags: review?(gsquelart) → review+
Attachment #8609160 - Flags: review?(gsquelart) → review+
Comment on attachment 8609159 [details] [diff] [review]
release internal drain monitor before calling Flush()

Review of attachment 8609159 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/platforms/SharedDecoderManager.cpp
@@ +173,2 @@
>      if (NS_SUCCEEDED(rv)) {
> +      MonitorAutoLock mon(mMonitor);

Is it possible that this monitor might make DrainComplete() unable to acquire the lock and cause a dead lock in next while below?
Comment on attachment 8609159 [details] [diff] [review]
release internal drain monitor before calling Flush()

Review of attachment 8609159 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/platforms/SharedDecoderManager.cpp
@@ +173,2 @@
>      if (NS_SUCCEEDED(rv)) {
> +      MonitorAutoLock mon(mMonitor);

wait() releases the lock
(In reply to Jean-Yves Avenard [:jya] from comment #7)
> Comment on attachment 8609159 [details] [diff] [review]
> release internal drain monitor before calling Flush()
> 
> Review of attachment 8609159 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/media/platforms/SharedDecoderManager.cpp
> @@ +173,2 @@
> >      if (NS_SUCCEEDED(rv)) {
> > +      MonitorAutoLock mon(mMonitor);
> 
> wait() releases the lock
Oh,my bad! I read it carelessly...XD
Attached patch 40 branch patch (obsolete) — Splinter Review
Approval Request Comment
[Feature/regressing bug #]: bug 1161443
[User impact if declined]: video playback failure and shutdown hang
[Describe test coverage new/current, TreeHerder]:
no new tests; this is a potential cause of intermittent failure in existing tests.
[Risks and why]: low risk because the deviation from existing code is very small.
[String/UUID change made/needed]: none
Attachment #8611598 - Flags: approval-mozilla-aurora?
Comment on attachment 8611598 [details] [diff] [review]
40 branch patch

Shutdown hangs are a pain. Taking it.
Attachment #8611598 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
applying 1166107-8611598
failed to synchronize metadata for "dom/media/fmp4/SharedDecoderManager.cpp"
Flags: needinfo?(karlt)
In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #14)
> applying 1166107-8611598
> failed to synchronize metadata for "dom/media/fmp4/SharedDecoderManager.cpp"

Thanks.
I guess something was being fussy about the arguments listed against diff.
Flags: needinfo?(karlt)
Attachment #8611598 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.