Closed Bug 1161892 Opened 9 years ago Closed 9 years ago

media threads wait for other tasks dispatched to the same thread pool

Categories

(Core :: Audio/Video, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox39 --- fixed
firefox40 --- fixed

People

(Reporter: karlt, Assigned: karlt)

References

Details

Attachments

(2 files, 1 obsolete file)

Noticed by Gerald in bug 1138294 comment 7.

There may not be more threads available in the pool for more tasks to run.

https://hg.mozilla.org/try/rev/6e46a830fb71
(In reply to Karl Tomlinson (ni?:karlt) from comment #0)
> Noticed by Gerald in

bug 1138294 comment 17.
Isn't this the same as bug 1161405 (which is what is causing this bug)
I'd say this bug here is about trying not to create inter-blocking tasks; bug 1161405 is about the pool not creating enough threads when necessary&possible -- but there would still be a limit which we could theoretically reach in heavy situations, that's why this new bug is useful.
I don't know whether the state machine still waits on the reader anywhere.
This may be a candidate for uplifting to other branches.

I picked 8 threads because it is approximately one third the current 25, and
not many systems have more than 8 parallel processing units.
Attachment #8601915 - Flags: review?(bobbyholley)
Comment on attachment 8601915 [details] [diff] [review]
use different thread pools for task queues that wait on others

Review of attachment 8601915 [details] [diff] [review]:
-----------------------------------------------------------------

What are the actual tasks that block? Is it just the PDM task queues? I did a bunch of work to make MDSM and readers share the same task queue, under the assumption that neither of them should ever block on other task queues. Can you be more specific as to which tasks are blocking on which?

::: dom/media/MediaDecoderStateMachine.cpp
@@ +201,5 @@
>  MediaDecoderStateMachine::MediaDecoderStateMachine(MediaDecoder* aDecoder,
>                                                     MediaDecoderReader* aReader,
>                                                     bool aRealTime) :
>    mDecoder(aDecoder),
> +  mTaskQueue(new MediaTaskQueue(GetMediaThreadPool(MediaThreadType::READER),

Don't you want MDSM here?
Attachment #8601915 - Flags: review?(bobbyholley)
Trying this patch with my some heavy tests from bug 1138294:

Before: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab103658c37b - TL;DR: All media mochitests are red or orange on Linux.

After: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b82ebcb0195 - TL;DR: Only 1 isolated EME test failure out of 50 test runs on Linux (which I believe is not related to this issue here).

So much better!
(In reply to Bobby Holley (:bholley) from comment #6)
> What are the actual tasks that block? Is it just the PDM task queues? I did
> a bunch of work to make MDSM and readers share the same task queue, under
> the assumption that neither of them should ever block on other task queues.
> Can you be more specific as to which tasks are blocking on which?

What I've seen in my tests:
A bunch of video elements are garbage-collected, for each of them at some point there's a call to MSDM::SetDormant(), which dispatches MediaSourceReader::ReleaseMediaResources on the decode task queue.
http://mxr.mozilla.org/mozilla-central/source/dom/media/MediaDecoderStateMachine.cpp#1556

From there, we have MSR::ReleaseMediaResources() -> MP4Reader::ReleaseMediaResources() -> SourceBufferDecoder::GetVideoFrameContainer() -> SharedDecoderProxy::Shutdown() -> SharedDecoderManager::SetIdle().
SetIdle first calls SharedDecoderProxy::Drain() -> FFmpegH264Decoder<LIBAV_VER>::Drain(), which dispatches FFmpegH264Decoder<LIBAV_VER>::DoDrain() on the SDM queue.
Then SetIdle waits for a flag to toggle.
DoDrain is supposed to do some work and then call DrainComplete() on the SDM, which would toggle the flag SetIdle is waiting for.

So in summary:
A- Decode task queue waiting in SDM::SetIdle
B- SDM task queue supposed to unblock SDM::SetIdle
But in my tests, we get a flood of 'A' tasks close together, using and blocking too many of the media ThreadPool threads, so 'B' tasks never get to run.
See Also: → 1162306
Why does SharedDecoderManager::SetIdle need to block? Can't it return a promise?

For context, I dug up and fixed a bunch of these issues before when I was getting bug 1142336 landed. Separating task queues just wallpapers over these problems - they can still occur, you just need two blocking sites rather than one. Imagine, for example, that |B| tasks had a sub-step that required something to happen on the decode task queue - you'd run into the same problem.

The answer is to remove blocking calls, or if the platform API absolutely requires them, make a separate task queue and (small) threadpool for performing them. The situation here is analogous to locking - the setup only works if you tightly-scope the acquisition of blocking resources.
(In reply to Bobby Holley (:bholley) from comment #9)
> Why does SharedDecoderManager::SetIdle need to block? Can't it return a
> promise?

It doesn't have to, but currently SetIdle() is called from various places which are synchronous... Fair amount of work to change it all ; especially when the short term goal is to remove SharedDecoderManager alltogether.

Yes, returning a promise here is the way to go, should we kept the SharedDecoderManager.
IMO this patch here is a useful quick almost-fix to some issues that will eventually be properly fixed through promises.
(In reply to Gerald Squelart [:gerald] from comment #11)
> IMO this patch here is a useful quick almost-fix to some issues that will
> eventually be properly fixed through promises.

The patch here is overly-general as a quick-fix, and undoes the work that I did in bug 1142336. If we want to wallpaper over this specific issue, we should just run the SDM on a separate small threadpool.
(In reply to Bobby Holley (:bholley) from comment #12)
> The patch here is overly-general as a quick-fix, and undoes the work that I
> did in bug 1142336. If we want to wallpaper over this specific issue, we
> should just run the SDM on a separate small threadpool.

Obviously I don't know about the bigger picture!
Sorry for the wrong advice, please ignore me -- and my eagerness to selfishly get my own dependent tests landed. ;-)
so that platform decoder tasks will run when their readers wait and block
their thread pool.

(In reply to Bobby Holley (:bholley) from comment #12)
> The patch here is overly-general as a quick-fix, and undoes the work that I
> did in bug 1142336. If we want to wallpaper over this specific issue, we
> should just run the SDM on a separate small threadpool.

The SDM runs on the same task queue as the readers.  The issue is general with
the MediaDataDecoders [1], with SDM adding another case of an existing issue.

[1] https://hg.mozilla.org/mozilla-central/annotate/60349cbc3d4e/dom/media/fmp4/PlatformDecoderModule.h#l193

(In reply to Bobby Holley (:bholley) from comment #9)
> Imagine, for example, that |B| tasks had a sub-step that
> required something to happen on the decode task queue - you'd run into the
> same problem.

Yes, and my comments were incomplete.  That doesn't happen, and I've changed
the comments to explain.

> The answer is to remove blocking calls, or if the platform API absolutely
> requires them, make a separate task queue and (small) threadpool for
> performing them. The situation here is analogous to locking - the setup only
> works if you tightly-scope the acquisition of blocking resources.

Promises provide a solution here, thank you, but dependency chain on the
blocking here is long, and so such a solution would not be appropriate for
branch uplift.

We have a number of shutdown hangs and this is a demonstrated cause for some
of them, so I'll looking to uplift this.

(In reply to Bobby Holley (:bholley) from comment #6)
> I did
> a bunch of work to make MDSM and readers share the same task queue, under
> the assumption that neither of them should ever block on other task queues.

Thanks.
I'd  flagged you for review because I didn't know how complete that work was.
This version has readers and SM using the same task queue.

> Can you be more specific as to which tasks are blocking on which?

MDD::Flush and SDM::SetIdle are reader tasks blocking on MDD tasks.
Attachment #8602474 - Flags: review?(bobbyholley)
Comment on attachment 8602474 [details] [diff] [review]
use separate thread pool for platform decoder task queues

Review of attachment 8602474 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks. r=bholley

::: dom/media/VideoUtils.h
@@ +219,5 @@
> +// The MediaDataDecoder API blocks, with implementations waiting on platform
> +// decoder tasks.  These platform decoder tasks are queued on a separate
> +// thread pool to ensure they can run when the MediaDataDecoder clients'
> +// thread pool is blocked.  Tasks on the PLATFORM_DECODER thread pool must not
> +// wait on tasks in the PLAYBACK thread pool.

Please also comment that we intend to move away from this model and re-unify the thread pool.

::: dom/media/mediasource/TrackBuffer.cpp
@@ +53,5 @@
>  {
>    MOZ_COUNT_CTOR(TrackBuffer);
>    mParser = ContainerParser::CreateForMIMEType(aType);
> +  mTaskQueue =
> +    new MediaTaskQueue(GetMediaThreadPool(MediaThreadType::PLATFORM_DECODER));

Hm, this is the initialization task queue for the track buffer, right? Does that do anything that blocks, or can this stay on the playback task queue?
Attachment #8602474 - Flags: review?(bobbyholley) → review+
(In reply to Bobby Holley (:bholley) from comment #15)
> Please also comment that we intend to move away from this model and re-unify
> the thread pool.

+// No new dependencies on this mechanism should be added, as methods are being
+// made async supported by MediaPromise, making this unnecessary and
+// permitting unifying the pool.

> ::: dom/media/mediasource/TrackBuffer.cpp
> @@ +53,5 @@
> >  {
> >    MOZ_COUNT_CTOR(TrackBuffer);
> >    mParser = ContainerParser::CreateForMIMEType(aType);
> > +  mTaskQueue =
> > +    new MediaTaskQueue(GetMediaThreadPool(MediaThreadType::PLATFORM_DECODER));
> 
> Hm, this is the initialization task queue for the track buffer, right? Does
> that do anything that blocks, or can this stay on the playback task queue?

Yes, that was a mistake, thanks.  Now PLAYBACK, as it should be.
(In reply to Bobby Holley (:bholley) from comment #15)
> Hm, this is the initialization task queue for the track buffer, right? Does
> that do anything that blocks, or can this stay on the playback task queue?

The initialisation in the TrackBuffer now uses the reader task queue.
The TrackBuffer's taskqueue is only use to shutdown the sub-readers.
Comment on attachment 8602474 [details] [diff] [review]
use separate thread pool for platform decoder task queues

Review of attachment 8602474 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/VideoUtils.h
@@ +219,5 @@
> +// The MediaDataDecoder API blocks, with implementations waiting on platform
> +// decoder tasks.  These platform decoder tasks are queued on a separate
> +// thread pool to ensure they can run when the MediaDataDecoder clients'
> +// thread pool is blocked.  Tasks on the PLATFORM_DECODER thread pool must not
> +// wait on tasks in the PLAYBACK thread pool.

The MediaDataDecoder doesn't block, it's fully async (other than flush). It's the SharedDecoderManager that does. In particular SharedDecoderManager::SetIdle()
Attachment #8601915 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/8c1d33d0ad8b
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Of a sample of ten 39.0a2 20150511004005 shutdown hangs tracked in
bug 1129455, 4 were clearly caused by this bug.

25 threads waiting in WMFMediaDataDecoder::Flush() for a reply from a thread
that won't start, because the pool is full.
https://crash-stats.mozilla.com/report/index/275f7571-8644-4e93-883c-c27142150512#allthreads

20 threads waiting in WMFMediaDataDecoder::Flush()
5 in MediaTaskQueue::AwaitIdleLocked() from MP4Reader::Shutdown()
https://crash-stats.mozilla.com/report/index/0f8502ff-4c0b-4b84-9d9d-556ee2150513#allthreads

25 threads waiting in WMFMediaDataDecoder::Flush()
https://crash-stats.mozilla.com/report/index/f248bd4e-66f4-406f-849d-f023a2150512#allthreads

14 threads waiting in WMFMediaDataDecoder::Flush()
11 in AwaitIdleLocked() from WMFMediaDataDecoder::ReleaseMediaResources()
https://crash-stats.mozilla.com/report/index/bfe95eb1-f557-4959-ba20-027d52150512#allthreads
Blocks: 1129455
Comment on attachment 8602474 [details] [diff] [review]
use separate thread pool for platform decoder task queues

Approval Request Comment
[Feature/regressing bug #]: old media bug
[User impact if declined]: shutdown hang
[Describe test coverage new/current, TreeHerder]: on m-c
Effectiveness of patch verified in bug 1138294
[Risks and why]: low given the simple nature of the patch.
[String/UUID change made/needed]: none.
Attachment #8602474 - Flags: approval-mozilla-beta?
Comment on attachment 8602474 [details] [diff] [review]
use separate thread pool for platform decoder task queues

Approved for uplift to beta (39). It will be great if this helps decrease our rate of shutdown hangs.
Attachment #8602474 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Needs rebasing for Beta uplift.
Flags: needinfo?(karlt)
Attached patch 39 beta patchSplinter Review
Flags: needinfo?(karlt)
You need to log in before you can comment on or make changes to this bug.