Last Comment Bug 592833 - Reduce thread usage of media elements
: Reduce thread usage of media elements
Status: RESOLVED FIXED
[MemShrink:P1]
:
Product: Core
Classification: Components
Component: Audio/Video (show other bugs)
: Trunk
: All All
: -- critical with 2 votes (vote)
: mozilla8
Assigned To: Chris Pearce (:cpearce)
:
Mentors:
: 628967 (view as bug list)
Depends on: 816041 580531 671978 682593
Blocks: 557423 657115 598870 643171 668449 668973
  Show dependency treegraph
 
Reported: 2010-09-01 13:37 PDT by Benoit Jacob [:bjacob] (mostly away)
Modified: 2014-04-25 15:18 PDT (History)
38 users (show)
mounir: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Backtraces for all threads when crash occurs (481.88 KB, text/plain)
2010-09-01 13:37 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
tracing thread creation (47.74 KB, text/plain)
2010-09-01 13:38 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
Patch 1 v1: Split thread lifetime control (17.48 KB, patch)
2011-06-19 20:35 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review
Patch 2 v1: Move metadata decoding to decode thread (23.36 KB, patch)
2011-06-19 20:36 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review
Patch 3 v1: Move seeking over to the decode thread (36.00 KB, patch)
2011-06-19 20:37 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review
Patch 4 v1: Remove reader monitor (27.32 KB, patch)
2011-06-19 20:38 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review
Patch 5 v1: Move mAudioStream creation to audio thread (25.48 KB, patch)
2011-06-19 20:39 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch 6 v1: Run state machine in events (40.74 KB, patch)
2011-06-19 20:40 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch 7 v1: Unify state machine threads (53.18 KB, patch)
2011-06-19 20:41 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch 8 v1: Shutdown idle decode threads (18.54 KB, patch)
2011-06-19 20:42 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch 9 v1: Update comments/documentation (22.81 KB, patch)
2011-06-21 16:04 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch 1 v2: Split thread lifetime control (17.45 KB, patch)
2011-06-29 16:27 PDT, Chris Pearce (:cpearce)
cpearce: review+
Details | Diff | Splinter Review
Patch 2 v2: Move metadata decoding to decode thread (22.97 KB, patch)
2011-06-29 16:28 PDT, Chris Pearce (:cpearce)
cpearce: review+
Details | Diff | Splinter Review
Patch 3 v2: Move seeking over to the decode thread (35.91 KB, patch)
2011-06-29 16:29 PDT, Chris Pearce (:cpearce)
cpearce: review+
Details | Diff | Splinter Review
Patch 4 v2: Remove reader monitor (27.35 KB, patch)
2011-06-29 16:30 PDT, Chris Pearce (:cpearce)
cpearce: review+
Details | Diff | Splinter Review
Patch 5 v2: Move mAudioStream creation to audio thread (25.48 KB, patch)
2011-06-29 16:33 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review
Patch 6 v2: Run state machine in events (41.33 KB, patch)
2011-06-29 16:36 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch 6 v2 -w: Run state machine in events (28.77 KB, patch)
2011-06-29 16:39 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch 7 v2: Unify state machine threads (51.36 KB, patch)
2011-06-29 16:42 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch 8 v2: Shutdown idle decode threads (18.56 KB, patch)
2011-06-29 16:43 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review
Patch 9 v2: Update comments/documentation (22.73 KB, patch)
2011-06-29 16:45 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review
Patch 6 v3: Run state machine in events (41.37 KB, patch)
2011-06-30 15:20 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch 6 v3 -w: Run state machine in events (28.81 KB, patch)
2011-06-30 15:21 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review
Patch 7 v3: Unify state machine threads (51.76 KB, patch)
2011-06-30 15:22 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review
Patch 10: Don't hold decoder monitor while audio stream does sync dispatch to main thread (on Android). (8.02 KB, patch)
2011-07-11 20:19 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Splinter Review

Description Benoit Jacob [:bjacob] (mostly away) 2010-09-01 13:37:33 PDT
Created attachment 471264 [details]
Backtraces for all threads when crash occurs

Hi,
This is coming from bug 557423. We are trying to understand why the Javascript port of Quake2 is running slowly and sometimes crashing. It's a page with a lot of <audio> elements.

It turns out to have hundreds of threads running concurrently.

Attaching 2 files:
 
 1. backtraces for all threads when a crash occurs (does not occur everytime). Thread 1 is the active thread. The absence of debug info for it suggests that it is JITted code.

 2. tracing thread creation. I just let firefox run until it started creating an unusually high number of threads, interrupted, set a breakpoint on nsThread::Init, got some backtraces.
Comment 1 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 13:38:03 PDT
Created attachment 471265 [details]
tracing thread creation
Comment 2 Chris Pearce (:cpearce) 2010-09-01 14:10:49 PDT
We currently use 3 threads per media element:
1. The decode thread, which decodes ahead of the current playback position.
3. State machine thread, which manages what state we're in, and pushes decoded
frames off to the layers system.
3. Audio 'push' thread, which pushes audio samples to the audio hardware.

Linux's pulseaudio also creates a thread for each audio element.

We need the decode on a separate thread for performance reasons.

We're planing of rewriting the audio subsystem so that it's callback based,
this will remove the audio thread (but not pulseaudio's thread on Linux I
think).

We're doing YUV conversion upon paint in the layers system right? So the state
machine thread is really only setting the imagedata on the video element and
managing time updates? If so it may be possible to refactor this thread away,
but it would be a major refactoring though. Or perhaps we can pool the state machine thread across multiple media elements, as Brendan suggested.
Comment 3 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2010-11-24 19:16:42 PST
We chatted about the relationship between this bug and bug 598870.  IPC will require that a thread which has access to decoded frames+audio run an event loop, and it sounds like the new single state-machine thread will.  Some other things that should make remoting easier
 - have logically distinct decode and playback thread pools.  For single-process FF, they would be the same pool.  For multi-process, the decode pool would live in content and the playback thread in the compositor.
 - so far, we've set things up so that all large allocations are done in content so that it bears most of the OOM risk.  So when we have to do YUV->RGB conversion for basic layers, it would be nice to farm out the conversion task to the decode pool that would live in content, then forward converted frames to the compositor.  It sounded like this was pretty close to how things work currently.

Does that sound reasonable?
Comment 4 Chris Pearce (:cpearce) 2010-11-24 19:25:31 PST
Yup, sounds reasonable.
Comment 5 Chris Pearce (:cpearce) 2011-03-17 21:25:41 PDT
The plan of attack is to remove all operations that can block off the state machine thread, and fold all state machine threads into a single thread whose main purpose is to (basically) do the AdvanceFrame() logic for every media element at the appropriate time.

Specifically:

1. Move the metadata decoding to the decode thread.
2. Move the seeking implementation to the decode thread. We may then be able to remove the reader monitor.
3. We may need to move the nsAudioStream creation and destruction off the state machine thread. I think we an leave it there, but I need to verify that. When bug 623444 lands, we'll be able to remove the audio push thread per element, and then can probably change to creating the audio stream when playback starts and destroying it when either audio playback ends, or when we destroy the media element.
4. Factor out the DECODING, BUFFERING, COMPLETED, and SHUTDOWN states from nsBuiltinDecoderStateMachine::Run() into their own events, which we can schedule using timers on the state machine thread instead of having the thread run as a loop.
5. Change to using a single state machine thread which services all active media elements. This will reduce our thread usage by one third. This thread will then be responsible for advancing the current time, setting the current image on the image container at appropriate times, and triggering invalidation when it does so. Note YCbCr->RGB conversion (for the non HWACCEL case) will still happen on the decode thread.
6. Change to shutdown the decode thread when playback is paused and decode queues are full.

We'll remain with one thread for decoding per media element, and consider changing that in future if necessary.
Comment 6 Timothy B. Terriberry (:derf) 2011-03-17 21:43:07 PDT
(In reply to comment #5)
> when it does so. Note YCbCr->RGB conversion (for the non HWACCEL case) will
> still happen on the decode thread.

It may be worth thinking about how to do this in a separate thread from decode (which I realize is the opposite of the stated intent of this bug). It's probably the easiest parallelization win you can get, and it'll become increasingly important on mobile as more dual-core devices become available if we don't get our HWACCEL story together there.
Comment 7 cajbir (:cajbir) 2011-03-30 19:43:29 PDT
*** Bug 628967 has been marked as a duplicate of this bug. ***
Comment 8 K. Gadd (:kael) 2011-06-14 13:51:31 PDT
In the short term, it might be worthwhile to adjust the stack size ( using an API like http://msdn.microsoft.com/en-us/library/ms682453%28v=vs.85%29.aspx on windows, not sure if this can be done on other OSes ) for these threads so that they consume less address space. Doing that might provide improvements on bug 657115 for minimal risk.
Comment 9 Chris Pearce (:cpearce) 2011-06-14 15:10:41 PDT
I'm almost ready to post a patch to merge all the state machine threads into one, which will cut our thread usage per media element by 1/3 on OSX and Windows, and 1/4 on Linux. It should be ready for review this or next week depending on how shaking out the last few bugs goes...

Reducing the thread stack size is a good idea, as we reserve 10MB per thread stack on Linux, which kills us on 32bit Linux. IIRC, we use 1MB thread stacks on Windows and OSX, so the problem isn't so pronounced there, but we can probably still make some easy gains by reducing stack size there anyway.
Comment 10 Nicholas Nethercote [:njn] 2011-06-14 17:16:44 PDT
(In reply to comment #9)
> I'm almost ready to post a patch to merge all the state machine threads into
> one, which will cut our thread usage per media element by 1/3 on OSX and
> Windows, and 1/4 on Linux.

Great!

> Reducing the thread stack size is a good idea, as we reserve 10MB per thread
> stack on Linux, which kills us on 32bit Linux. IIRC, we use 1MB thread
> stacks on Windows and OSX, so the problem isn't so pronounced there, but we
> can probably still make some easy gains by reducing stack size there anyway.

I filed bug 664341 for this.
Comment 11 Chris Pearce (:cpearce) 2011-06-19 20:35:57 PDT
Created attachment 540383 [details] [diff] [review]
Patch 1 v1: Split thread lifetime control

* Splits control of lifetime of audio and decode threads up, so they can be controlled separately.
Comment 12 Chris Pearce (:cpearce) 2011-06-19 20:36:50 PDT
Created attachment 540384 [details] [diff] [review]
Patch 2 v1: Move metadata decoding to decode thread

* Moves metadata decoding over to the decode thread.
Comment 13 Chris Pearce (:cpearce) 2011-06-19 20:37:33 PDT
Created attachment 540385 [details] [diff] [review]
Patch 3 v1: Move seeking over to the decode thread

* Moves seeking over to the decode thread.
Comment 14 Chris Pearce (:cpearce) 2011-06-19 20:38:30 PDT
Created attachment 540386 [details] [diff] [review]
Patch 4 v1: Remove reader monitor

* Now that all the data which the reader monitor protects is only used on the decode thread, remove the reader monitor.
Comment 15 Chris Pearce (:cpearce) 2011-06-19 20:39:28 PDT
Created attachment 540387 [details] [diff] [review]
Patch 5 v1: Move mAudioStream creation to audio thread

* Destroying the audio stream on the state machine thread can block waiting on acquiring the audio monitor. With a shared state machine, we can't afford this, so move the creation and usage of audio stream over to the audio thread. This enables us to remove the audio monitor as well, provided we always acquire the decoder monitor whenever we create or destroy the audio stream on the audio thread, or whenever we use it off the audio thread. As long as we hold the decoder monitor on the state machine thread, we know the audio stream won't be destroyed while we're using it.
Comment 16 Chris Pearce (:cpearce) 2011-06-19 20:40:25 PDT
Created attachment 540388 [details] [diff] [review]
Patch 6 v1: Run state machine in events

* Refactor nsBuiltinDecoderStateMachine::Run() to run as a series of events rather than as a single synchronous loop.
* Instead of doing a NotifyAll() on the decoder monitor to wake up the state machine, we schedule an event to run instead.
Comment 17 Chris Pearce (:cpearce) 2011-06-19 20:41:16 PDT
Created attachment 540389 [details] [diff] [review]
Patch 7 v1: Unify state machine threads

* Unify the state machine threads, so that all state machine events runners run on a single thread.
* Each nsBuiltinDecoderStateMachine holds a refcount to the state machine thread, so the thread should be destroyed when the last state machine is.
* We have to be very careful to ensure that we don't run more than one event for any given state machine at once, else we'll get inconsistent state.
* The state machine holds a reference to the decoder, so that we can guarantee the decoder stays alive long enough for the state machine to shut down. This creates a cycle which we break when we shutdown.
Comment 18 Chris Pearce (:cpearce) 2011-06-19 20:42:05 PDT
Created attachment 540391 [details] [diff] [review]
Patch 8 v1: Shutdown idle decode threads

* Destroy decode thread when it's decode buffers are full, and playback is paused.
* Hold off creating the audio thread until playback begins.
* Note we can't shutdown the audio thread when paused yet, as if we do so we'll end up discarding the audio which has been written to the nsAudioStream but not played yet. We should be able to refactor again to achieve this when libcubeb lands.
Comment 19 Chris Pearce (:cpearce) 2011-06-19 20:44:11 PDT
In general mochitests should pass at every patch in the queue, though I've not tried each patch individually on tryserver, only with them all applied.

Tests are green on Try with all patches applied:
http://tbpl.mozilla.org/?tree=Try&rev=01f17e830017
Comment 20 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-19 21:17:37 PDT
Comment on attachment 540383 [details] [diff] [review]
Patch 1 v1: Split thread lifetime control

Review of attachment 540383 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 21 cajbir (:cajbir) 2011-06-19 23:14:16 PDT
Which patch updates the documentation in nsBuiltinDecoder.h so I can get an overview of how the new threading system works?
Comment 22 Chris Pearce (:cpearce) 2011-06-20 14:59:21 PDT
(In reply to comment #21)
> Which patch updates the documentation in nsBuiltinDecoder.h so I can get an
> overview of how the new threading system works?

The patch that I'll write and post this afternoon updates the documentation. ;)

In general I followed the plan I outlined in comment 5.
Comment 23 Chris Pearce (:cpearce) 2011-06-21 16:04:02 PDT
Created attachment 540921 [details] [diff] [review]
Patch 9 v1: Update comments/documentation

Updates comments in nsBuiltinDecoder*.h to match new architecture.
Comment 24 antistress 2011-06-26 16:38:32 PDT
isn't that bug related ?
Bug 617852 - Excessively large mmapped regions created by PulseAudio for each active stream
Comment 25 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-26 20:39:31 PDT
Comment on attachment 540384 [details] [diff] [review]
Patch 2 v1: Move metadata decoding to decode thread

Review of attachment 540384 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 26 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-26 20:59:40 PDT
Comment on attachment 540385 [details] [diff] [review]
Patch 3 v1: Move seeking over to the decode thread

Review of attachment 540385 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 27 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-26 21:01:21 PDT
Comment on attachment 540386 [details] [diff] [review]
Patch 4 v1: Remove reader monitor

Review of attachment 540386 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 28 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-26 21:07:58 PDT
Comment on attachment 540387 [details] [diff] [review]
Patch 5 v1: Move mAudioStream creation to audio thread

Review of attachment 540387 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +692,5 @@
>    }
>    return mReader->Init(cloneReader);
>  }
>  
> +void nsBuiltinDecoderStateMachine::StopPlayback()

StopPlayback doesn't cause the audio stream to be recreated anymore. If we start playing, then seek and resume playing, what causes the stream's pending samples to be flushed, now that we're not recreating the audio stream?
Comment 29 Chris Pearce (:cpearce) 2011-06-26 21:16:38 PDT
We call StopAudioThread() in DecodeSeek(), which destroys the old audio thread/stream. We'll start playing when the state machine thread runs in DECODING state again.
Comment 30 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-26 21:55:20 PDT
Comment on attachment 540388 [details] [diff] [review]
Patch 6 v1: Run state machine in events

Review of attachment 540388 [details] [diff] [review]:
-----------------------------------------------------------------

I think this patch would be much easier to review if it was diffed with -w.

::: content/media/nsBuiltinDecoderStateMachine.h
@@ +411,5 @@
>    // to call.
>    void DecodeThreadRun();
>  
> +  nsresult ScheduleStateMachine();
> +  nsresult ScheduleStateMachine(PRInt64 aUsecs);

Need to document what this does, and especially what aUsecs does.
Comment 31 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-26 22:30:35 PDT
Comment on attachment 540389 [details] [diff] [review]
Patch 7 v1: Unify state machine threads

Review of attachment 540389 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +183,5 @@
> +  NS_ASSERTION(NS_IsMainThread(), "Should be on main thread.");
> +  // Equivalent to MOZ_COUNT_CTOR
> +  NS_LogCtor((void*)gStateMachineThread,
> +              "MediaStateMachineThread",
> +              sizeof(*gStateMachineThread));

I think this is unnecessary. Threads are already tracked, and this is confusing since the thread object has its own refcount internally. Instead of this refcount, I would just have a global counter of the number of nsBuiltinDecoderStateMachine instances, tracked in the nsBuiltinDecoderStateMachine constructor/destructor, and have CreateStateMachineThread/DestroyStateMachineThread methods that get called when the counter rises above zero and reaches zero.

@@ +1038,5 @@
>      }
> +    // Equivalent to MOZ_COUNT_CTOR
> +    NS_LogCtor((void*)mDecodeThread,
> +               "MediaDecodeThread",
> +               sizeof(mDecodeThread));

These naked NS_LogCtors are scary! Are they really needed? They're confusing since MediaDecodeThread is not a class.

@@ +1856,5 @@
> +  mIsRunningStateMachine = PR_FALSE;
> +
> +  if (mRunStateMachineAgain && !mDispatchedStateMachineRun) {
> +    mDispatchedStateMachineRun = PR_TRUE;
> +    return NS_DispatchToCurrentThread(this);

Is mRunStateMachineAgain really needed? Why not have ScheduleStateMachine just dispatch this event and set mDispatchedStateMachineRun to true?

::: content/media/nsBuiltinDecoderStateMachine.h
@@ +219,5 @@
> + 
> +  // The decoder object that created this state machine. We hold a strong
> +  // reference to the decoder to ensure it stays alive after it's been shutdown
> +  // by the media element so that the state machine to keep using the decoder's
> +  // monitor until the state machine has finished shutdown. This is accessed

Grammar error somewhere in this sentence

@@ +579,5 @@
>    PRPackedBool mQuickBuffering;
>  
> +  // PR_TRUE if the shared state machine thread is currently running this
> +  // state machine.
> +  PRPackedBool mIsRunningStateMachine;

Just mIsRunning.

@@ +583,5 @@
> +  PRPackedBool mIsRunningStateMachine;
> +
> +  // PR_TRUE if we should run the state machine again once the current
> +  // state machine run has finished.
> +  PRPackedBool mRunStateMachineAgain;

mRunAgain. Document that this doesn't cause us to run again immediately, just post another event to run this state machine later.

@@ +590,5 @@
> +  // imperative that we don't dispatch multiple events to run the state
> +  // machine at the same time, as our code assume all events are synchronous.
> +  // If we dispatch multiple events, the second event can run while the
> +  // first is shutting down a thread, causing inconsistent state.
> +  PRPackedBool mDispatchedStateMachineRun;

mDispatchedRunEvent?
Comment 32 Chris Pearce (:cpearce) 2011-06-26 22:37:27 PDT
(In reply to comment #30)
> Comment on attachment 540388 [details] [diff] [review] [review]
> Patch 6 v1: Run state machine in events
> 
> Review of attachment 540388 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> I think this patch would be much easier to review if it was diffed with -w.
> 
> ::: content/media/nsBuiltinDecoderStateMachine.h
> @@ +411,5 @@
> >    // to call.
> >    void DecodeThreadRun();
> >  
> > +  nsresult ScheduleStateMachine();
> > +  nsresult ScheduleStateMachine(PRInt64 aUsecs);
> 
> Need to document what this does, and especially what aUsecs does.

I add a comment for ScheduleStateMachine() in the next patch.
Comment 33 Chris Pearce (:cpearce) 2011-06-26 23:03:31 PDT
(In reply to comment #31)
> Comment on attachment 540389 [details] [diff] [review] [review]
> Patch 7 v1: Unify state machine threads
> 
> Review of attachment 540389 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> ::: content/media/nsBuiltinDecoderStateMachine.cpp
> @@ +183,5 @@

> Instead
> of this refcount, I would just have a global counter of the number of
> nsBuiltinDecoderStateMachine instances, tracked in the
> nsBuiltinDecoderStateMachine constructor/destructor, and have
> CreateStateMachineThread/DestroyStateMachineThread methods that get called
> when the counter rises above zero and reaches zero.

Ah yes, I can do this now. In a previous iteration I needed to factor the release code out, but I don't need to now.

> 
> @@ +1038,5 @@
> >      }
> > +    // Equivalent to MOZ_COUNT_CTOR
> > +    NS_LogCtor((void*)mDecodeThread,
> > +               "MediaDecodeThread",
> > +               sizeof(mDecodeThread));
> 
> These naked NS_LogCtors are scary! Are they really needed? They're confusing
> since MediaDecodeThread is not a class.

They make it easy to distinguish *which* thread is leaking. I found this useful.

I added this because on my system the video mochitests leak an nsIThread (and a bunch of other stuff) about 40% of the time anyway (with or without my patches) so there's no way to distinguish that nsIThread from the one of the decode threads (or the state machine thread for that matter) unless I do this. Is there a better way to distinguish leaking objects?

> 
> @@ +1856,5 @@
> > +  mIsRunningStateMachine = PR_FALSE;
> > +
> > +  if (mRunStateMachineAgain && !mDispatchedStateMachineRun) {
> > +    mDispatchedStateMachineRun = PR_TRUE;
> > +    return NS_DispatchToCurrentThread(this);
> 
> Is mRunStateMachineAgain really needed? Why not have ScheduleStateMachine
> just dispatch this event and set mDispatchedStateMachineRun to true?

If we have more than one event to run a state machine dispatched at the same time, and the first event shuts down one of the helper threads, that event's execution will block, but the other event can still run. The other event will put the state machine into an inconsistent state. Ensuring that only one event is active for a state machine at one time ensures that our logic can assume everything for that state machine is synchronous, which is simpler.
Comment 34 Nicholas Nethercote [:njn] 2011-06-26 23:10:44 PDT
> I added this because on my system the video mochitests leak an nsIThread
> (and a bunch of other stuff) about 40% of the time anyway (with or without
> my patches)

Is there a bug filed about that?
Comment 35 Chris Pearce (:cpearce) 2011-06-29 16:27:29 PDT
Created attachment 542992 [details] [diff] [review]
Patch 1 v2: Split thread lifetime control

Patch 1 unbitrotten.
Comment 36 Chris Pearce (:cpearce) 2011-06-29 16:28:28 PDT
Created attachment 542993 [details] [diff] [review]
Patch 2 v2: Move metadata decoding to decode thread

Unbitrotten.
Comment 37 Chris Pearce (:cpearce) 2011-06-29 16:29:53 PDT
Created attachment 542994 [details] [diff] [review]
Patch 3 v2: Move seeking over to the decode thread

Unbitrotten...
Comment 38 Chris Pearce (:cpearce) 2011-06-29 16:30:51 PDT
Created attachment 542995 [details] [diff] [review]
Patch 4 v2: Remove reader monitor

Unbitrotten
Comment 39 Chris Pearce (:cpearce) 2011-06-29 16:33:36 PDT
Created attachment 542996 [details] [diff] [review]
Patch 5 v2: Move mAudioStream creation to audio thread

Patch 5 unbitrotten. Roc, see my comment 29 for a response to the question you asked about this patch. Why doesn't bugzilla add those who post review comments to the CC list?!?
Comment 40 Chris Pearce (:cpearce) 2011-06-29 16:36:26 PDT
Created attachment 542997 [details] [diff] [review]
Patch 6 v2: Run state machine in events

Patch 6 unbitrotten. I'll post an `diff -w` patch for easier review.
Comment 41 Chris Pearce (:cpearce) 2011-06-29 16:39:12 PDT
Created attachment 542998 [details] [diff] [review]
Patch 6 v2 -w: Run state machine in events

Patch 6 v2, `diff -w` for easier review. I removed the COUNT_CTOR equivalence, it's not precisely what I want, it doesn't track the object's lifetime, just our reference to it, which isn't actually that useful.
Comment 42 Chris Pearce (:cpearce) 2011-06-29 16:42:05 PDT
Created attachment 542999 [details] [diff] [review]
Patch 7 v2: Unify state machine threads

Patch 7 unbitrotten, and review comments addressed.
Comment 43 Chris Pearce (:cpearce) 2011-06-29 16:43:39 PDT
Created attachment 543000 [details] [diff] [review]
Patch 8 v2: Shutdown idle decode threads

Patch 8 unbitrotten.
Comment 44 Chris Pearce (:cpearce) 2011-06-29 16:45:01 PDT
Created attachment 543002 [details] [diff] [review]
Patch 9 v2: Update comments/documentation

Patch 9 unbitrotten. Yes, it was bitrotted. That's a real word. Look it up.
Comment 45 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-29 17:34:35 PDT
Comment on attachment 542996 [details] [diff] [review]
Patch 5 v2: Move mAudioStream creation to audio thread

Review of attachment 542996 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 46 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-29 17:48:11 PDT
Comment on attachment 543002 [details] [diff] [review]
Patch 9 v2: Update comments/documentation

Review of attachment 543002 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 47 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-29 17:55:15 PDT
Comment on attachment 542998 [details] [diff] [review]
Patch 6 v2 -w: Run state machine in events

Review of attachment 542998 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoder.cpp
@@ +252,5 @@
>                 "Must have state machine to start state machine thread");
>    if (mStateMachineThread) {
>      return NS_OK;
>    }
> +  nsresult res = NS_NewThread(getter_AddRefs(mStateMachineThread));

We're probably going to want a thread pool here, but this'll do for now.

::: content/media/nsBuiltinDecoder.h
@@ +570,5 @@
>    // Ensures the state machine thread is running, starting a new one
>    // if necessary.
>    nsresult StartStateMachineThread();
>  
> +  nsresult CreateStateMachineThread();

Need comments to explain the difference between this and StartStateMachineThread, and why both are needed.

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +1678,5 @@
>    stream->Unpin();
>    return res;
>  }
> +
> +void RunStateMachine(nsITimer *aTimer, void *aClosure) {

static?

@@ +1714,5 @@
> +  // Ensure the state machine thread is alive; we'll be running on it!
> +  nsresult res = mDecoder->CreateStateMachineThread();
> +  if (NS_FAILED(res)) return res;
> +
> +  nsCOMPtr<nsIEventTarget> target = do_QueryInterface(mDecoder->mStateMachineThread);

Why do you need a do_QI here? An implicit static cast should work fine.
Comment 48 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-29 19:55:18 PDT
Comment on attachment 542999 [details] [diff] [review]
Patch 7 v2: Unify state machine threads

Review of attachment 542999 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +223,4 @@
>  {
>    MOZ_COUNT_CTOR(nsBuiltinDecoderStateMachine);
> +  NS_ASSERTION(NS_IsMainThread(), "Should be on main thread.");
> +  if (gStateMachineRefCount == 0) {

Call it gStateMachineCount.

@@ +1826,5 @@
> +   if (mIsRunning) {
> +     mRunAgain = PR_TRUE;
> +   } else if (!mDispatchedRunEvent) {
> +     mDispatchedRunEvent = PR_TRUE;
> +    gStateMachineThread->Dispatch(this, NS_DISPATCH_NORMAL);

Why not just run an iteration of the state machine right here now?
Comment 49 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-29 20:03:42 PDT
Comment on attachment 543000 [details] [diff] [review]
Patch 8 v2: Shutdown idle decode threads

Review of attachment 543000 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoderStateMachine.h
@@ +600,5 @@
>    // If we dispatch multiple events, the second event can run while the
>    // first is shutting down a thread, causing inconsistent state.
>    PRPackedBool mDispatchedRunEvent;
>  
> +  // PR_TRUE if the decode thread has gone filled its buffers and is now

"has gone filled its buffers"?
Comment 50 Chris Pearce (:cpearce) 2011-06-30 15:20:14 PDT
Created attachment 543278 [details] [diff] [review]
Patch 6 v3: Run state machine in events

Patch 6 with review comments addressed.
Comment 51 Chris Pearce (:cpearce) 2011-06-30 15:21:13 PDT
Created attachment 543279 [details] [diff] [review]
Patch 6 v3 -w: Run state machine in events

Patch 6 `diff -w` for easier review.
Comment 52 Chris Pearce (:cpearce) 2011-06-30 15:22:17 PDT
Created attachment 543280 [details] [diff] [review]
Patch 7 v3: Unify state machine threads

Patch 7 with review comments addressed.
Comment 53 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-30 15:30:22 PDT
Comment on attachment 543279 [details] [diff] [review]
Patch 6 v3 -w: Run state machine in events

Review of attachment 543279 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 54 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-06-30 15:33:11 PDT
Comment on attachment 543280 [details] [diff] [review]
Patch 7 v3: Unify state machine threads

Review of attachment 543280 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 55 cajbir (:cajbir) 2011-07-02 23:05:45 PDT
The patch 6 and patch 6 -w versions seem to to result in different nesting of some closing '}' amongst other things. Which is the correct patch that should be applied? For example, see the 'case DECODER_STATE_DECODING_METADATA' in nsBuiltinDecoderStateMachine::Run. In the '-w' version the closing '}' for the case is nested incorrectly.
Comment 56 Chris Pearce (:cpearce) 2011-07-03 12:20:08 PDT
I will be landing "patch 6". Roc asked me to upload a diff -w patch to make it easier for him to review. diff -w patches don't include whitespace changes, which is why you're not seeing correct indentation in some instances with the -w patch applied.
Comment 57 Benoit Jacob [:bjacob] (mostly away) 2011-07-04 14:41:08 PDT
*** Bug 669206 has been marked as a duplicate of this bug. ***
Comment 59 Chris Pearce (:cpearce) 2011-07-06 00:25:59 PDT
Backed out of inbound due to suspected android crashtests permaorange.

http://hg.mozilla.org/integration/mozilla-inbound/rev/e95d8bef125f
http://hg.mozilla.org/integration/mozilla-inbound/rev/fcf2cf152e13


This will be fun to solve.
http://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=04c5bfe94282
Comment 60 :Ms2ger (⌚ UTC+1/+2) 2011-07-06 07:10:35 PDT
Btw, the style guide demands "rv" rather than "res" for nsresults.
Comment 61 Benoit Jacob [:bjacob] (mostly away) 2011-07-06 07:26:35 PDT
There's something I still don't understand about the general approach here. If I understand correctly, these patches reduce the number of threads per media element to one, but that's still one thread per media element so if a page has 300 media elements, that's 300 threads. What's the point of using 300 threads on a 4-core machine? Wouldn't it be better to limit the number of concurrent media threads to the number of logical CPUs and use a queue?
Comment 62 Chris Pearce (:cpearce) 2011-07-06 14:12:05 PDT
Yes, an excellent question, ideally we'd use a thread pool for the decoders. But one step at a time eh?

With these patches we now have one less thread per media element, and we shutdown threads while we're not decoding (basically when not playing). But we still have 2 threads per *playing* media element (3 on Linux). We should be able to get down to one thread per element once bug 623444 has landed.

The hard part is our decoders assume they run synchronously. But decoders can block while reading data when they run out of downloaded data to decode. Their reads will wait for some data to download before returning and continuing. If a decoder's download stalls, such a blocked read could take a long time to return, so in that case we need to switch decode to another decoder. Then the question becomes, how can we switch to another decoder safely? Do we just call the other decoder (conceptually) on top of the existing call stack of the previous decode? What if that decode also blocks, how do we get back to the first decode? What if a malicious website creates enough videos with blocking connections that we end up running out of stack space?

These are hard questions, which I don't yet have a good enough answer for.

One option is to change our underlying decode to being non-blocking, but that would require us to mutilate libnestegg (our WebM demuxer), so we'd need to consider it carefully. Regardless, we should consider that in another bug.
Comment 63 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-07-06 16:48:33 PDT
I think once we get down to one thread per playing element, we'll be in a reasonable good place. We can make further progress relatively easily by giving up the decoder thread when our buffers are full. As Chris says, getting beyond that --- to handle cases where a lot of elements are playing, but stalled --- is hard, but hopefully that isn't the common case, because it would mean the user is getting a terrible experience already.
Comment 64 Chris Pearce (:cpearce) 2011-07-06 20:45:30 PDT
(In reply to comment #59)
> Backed out of inbound due to suspected android crashtests permaorange.

The crashtest hang was introduced in "Patch 5 v2: Move mAudioStream creation to audio thread".
Comment 65 Chris Pearce (:cpearce) 2011-07-11 16:05:36 PDT
The crashtest which is timing out on Android is content/media/tests/crashtests/481136-1.html

http://tbpl.mozilla.org/?tree=Try&rev=942ebc7636c3
Comment 66 Chris Pearce (:cpearce) 2011-07-11 16:18:11 PDT
content/media/test/crashtests/459439-1.html is timing out too.

http://tbpl.mozilla.org/?tree=Try&rev=080d2c0c9eec
Comment 67 Chris Pearce (:cpearce) 2011-07-11 20:19:50 PDT
Created attachment 545313 [details] [diff] [review]
Patch 10: Don't hold decoder monitor while audio stream does sync dispatch to main thread (on Android).

The problem is that I was holding the decoder monitor while calling certain audio stream methods. On Android some nsAudioStreamRemote methods do synchronous event dispatch to the main thread, and the main thread sometimes needs to acquire the decoder monitor, which can cause a deadlock.

This patch fixes that...
http://tbpl.mozilla.org/?tree=Try&rev=f689e1455cfc
Comment 68 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-07-11 20:24:44 PDT
Comment on attachment 545313 [details] [diff] [review]
Patch 10: Don't hold decoder monitor while audio stream does sync dispatch to main thread (on Android).

Review of attachment 545313 [details] [diff] [review]:
-----------------------------------------------------------------

Note You need to log in before you can comment on or make changes to this bug.