Closed Bug 1142336 Opened 9 years ago Closed 9 years ago

Unify state machine and decode thread pools and enable parallel state machine execution

Categories

(Core :: Audio/Video, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: bholley, Assigned: bholley)

References

Details

Attachments

(1 file)

The big win of making the state machine run on a task queue (bug 1135424) is that it allows us to run state machines concurrently. This helps us in situations where a state machine blocks (i.e. shutting down the audio thread), because it avoids janking other unrelated state machines.

To help narrow down regressions, I'm landing the refactoring (bug 1135424) separately and running the queues on a pool with maxThreads = 1. Once that's settled, the code changes to run state machines concurrently should be trivial, but we may uncover a new class of bugs from running state machines concurrently. Time will tell.

I'm also going to take this opportunity to use a single unified SharedThreadPool for both state machine and decode task queues.
Note to self - this would also be a good moment to rename On{Decode,StateMachine}Thread to On{Decode,StateMachine}TaskQueue.
Blocks: 1144519
This allows for parallel MDSM execution. \o/
Attachment #8579117 - Flags: review?(matt.woodrow)
Comment on attachment 8579117 [details] [diff] [review]
Create one unified thread pool for media code and run the MDSM task queues on it. v1

Review of attachment 8579117 [details] [diff] [review]:
-----------------------------------------------------------------

\o/ indeed.
Attachment #8579117 - Flags: review?(matt.woodrow) → review+
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/a907e799d8d4 - not yet clear how much of the other bustage is from this as well, but it already seems clear that what I think are shutdown hangs, primarily in crashtests but also in some reftest runs (https://treeherder.mozilla.org/logviewer.html#?job_id=7780984&repo=mozilla-inbound, crash [@ mozilla::::RunWatchdog] on most platforms, but https://treeherder.mozilla.org/logviewer.html#?job_id=7780050&repo=mozilla-inbound, shutdown 330 seconds without output on Android) are from this. Probably also from this, intermittent but frequent timeouts in various web-platform-reftest webvtt tests. Then there's a bunch of random timeouts and other failures in various media tests, which it's difficult to attribute because there are so many preexisting failures in various media tests.
And because b2g has to be different, b2g emulator crashtest-1, made "dom/media/test/crashtests/691096-1.html | application timed out after 330 seconds with no output" permaorange.
Depends on: 1145203
Hm. So the problem here is that we end up trying to shut down 25 MediaDecoders at once - all 25 state machine task queues grab a thread, run the state machine, and then block on draining the decode task queue. But because we're sharing the thread pool between state machine and decode, and because we've exhausted our capacity of threads, we deadlock. Very interesting!

One approach here would be to just use separate thread pools after all. But I think this deadlock can only happen when task queue A blocks on task queue B _and_ task queue B is not running. This excludes resource-based blocking, and so AwaitIdle is really the only situation where I can think of it happening. So I'm going to try just removing that in bug 1145203.
Blocks: MediaMonitor
Depends on: 1147219
(In reply to Bobby Holley (:bholley) from comment #9)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ce71301ce26

This is all green except for that pesky ICS emulator C1 failure which I just discovered is basically perma-orange already. Disabled that, and repushed this:

https://hg.mozilla.org/integration/mozilla-inbound/rev/5d9b7601850c
https://hg.mozilla.org/mozilla-central/rev/5d9b7601850c
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in before you can comment on or make changes to this bug.