Closed Bug 592833 Opened 14 years ago Closed 13 years ago

Reduce thread usage of media elements

Categories

(Core :: Audio/Video, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: bjacob, Assigned: cpearce)

References

Details

(Whiteboard: [MemShrink:P1])

Attachments

(13 files, 12 obsolete files)

481.88 KB, text/plain
Details
47.74 KB, text/plain
Details
17.45 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
22.97 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
35.91 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
27.35 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
25.48 KB, patch
roc
: review+
Details | Diff | Splinter Review
18.56 KB, patch
roc
: review+
Details | Diff | Splinter Review
22.73 KB, patch
roc
: review+
Details | Diff | Splinter Review
41.37 KB, patch
Details | Diff | Splinter Review
28.81 KB, patch
roc
: review+
Details | Diff | Splinter Review
51.76 KB, patch
roc
: review+
Details | Diff | Splinter Review
8.02 KB, patch
roc
: review+
Details | Diff | Splinter Review
Hi,
This is coming from bug 557423. We are trying to understand why the Javascript port of Quake2 is running slowly and sometimes crashing. It's a page with a lot of <audio> elements.

It turns out to have hundreds of threads running concurrently.

Attaching 2 files:
 
 1. backtraces for all threads when a crash occurs (does not occur everytime). Thread 1 is the active thread. The absence of debug info for it suggests that it is JITted code.

 2. tracing thread creation. I just let firefox run until it started creating an unusually high number of threads, interrupted, set a breakpoint on nsThread::Init, got some backtraces.
Attachment #471265 - Attachment description: tr → tracing thread creation
Component: Web Services → Video/Audio
QA Contact: web-services → video.audio
We currently use 3 threads per media element:
1. The decode thread, which decodes ahead of the current playback position.
3. State machine thread, which manages what state we're in, and pushes decoded
frames off to the layers system.
3. Audio 'push' thread, which pushes audio samples to the audio hardware.

Linux's pulseaudio also creates a thread for each audio element.

We need the decode on a separate thread for performance reasons.

We're planing of rewriting the audio subsystem so that it's callback based,
this will remove the audio thread (but not pulseaudio's thread on Linux I
think).

We're doing YUV conversion upon paint in the layers system right? So the state
machine thread is really only setting the imagedata on the video element and
managing time updates? If so it may be possible to refactor this thread away,
but it would be a major refactoring though. Or perhaps we can pool the state machine thread across multiple media elements, as Brendan suggested.
Summary: Audio-heavy page causes us to spawn hundreds of threads → Reduce thread usage of media elements
Depends on: 580531
We chatted about the relationship between this bug and bug 598870.  IPC will require that a thread which has access to decoded frames+audio run an event loop, and it sounds like the new single state-machine thread will.  Some other things that should make remoting easier
 - have logically distinct decode and playback thread pools.  For single-process FF, they would be the same pool.  For multi-process, the decode pool would live in content and the playback thread in the compositor.
 - so far, we've set things up so that all large allocations are done in content so that it bears most of the OOM risk.  So when we have to do YUV->RGB conversion for basic layers, it would be nice to farm out the conversion task to the decode pool that would live in content, then forward converted frames to the compositor.  It sounded like this was pretty close to how things work currently.

Does that sound reasonable?
Yup, sounds reasonable.
Assignee: nobody → chris
Status: NEW → ASSIGNED
The plan of attack is to remove all operations that can block off the state machine thread, and fold all state machine threads into a single thread whose main purpose is to (basically) do the AdvanceFrame() logic for every media element at the appropriate time.

Specifically:

1. Move the metadata decoding to the decode thread.
2. Move the seeking implementation to the decode thread. We may then be able to remove the reader monitor.
3. We may need to move the nsAudioStream creation and destruction off the state machine thread. I think we an leave it there, but I need to verify that. When bug 623444 lands, we'll be able to remove the audio push thread per element, and then can probably change to creating the audio stream when playback starts and destroying it when either audio playback ends, or when we destroy the media element.
4. Factor out the DECODING, BUFFERING, COMPLETED, and SHUTDOWN states from nsBuiltinDecoderStateMachine::Run() into their own events, which we can schedule using timers on the state machine thread instead of having the thread run as a loop.
5. Change to using a single state machine thread which services all active media elements. This will reduce our thread usage by one third. This thread will then be responsible for advancing the current time, setting the current image on the image container at appropriate times, and triggering invalidation when it does so. Note YCbCr->RGB conversion (for the non HWACCEL case) will still happen on the decode thread.
6. Change to shutdown the decode thread when playback is paused and decode queues are full.

We'll remain with one thread for decoding per media element, and consider changing that in future if necessary.
(In reply to comment #5)
> when it does so. Note YCbCr->RGB conversion (for the non HWACCEL case) will
> still happen on the decode thread.

It may be worth thinking about how to do this in a separate thread from decode (which I realize is the opposite of the stated intent of this bug). It's probably the easiest parallelization win you can get, and it'll become increasingly important on mobile as more dual-core devices become available if we don't get our HWACCEL story together there.
Blocks: 657115
Whiteboard: [MemShrink:P1]
In the short term, it might be worthwhile to adjust the stack size ( using an API like http://msdn.microsoft.com/en-us/library/ms682453%28v=vs.85%29.aspx on windows, not sure if this can be done on other OSes ) for these threads so that they consume less address space. Doing that might provide improvements on bug 657115 for minimal risk.
Whiteboard: [MemShrink:P1]
Whiteboard: [MemShrink:P1]
I'm almost ready to post a patch to merge all the state machine threads into one, which will cut our thread usage per media element by 1/3 on OSX and Windows, and 1/4 on Linux. It should be ready for review this or next week depending on how shaking out the last few bugs goes...

Reducing the thread stack size is a good idea, as we reserve 10MB per thread stack on Linux, which kills us on 32bit Linux. IIRC, we use 1MB thread stacks on Windows and OSX, so the problem isn't so pronounced there, but we can probably still make some easy gains by reducing stack size there anyway.
(In reply to comment #9)
> I'm almost ready to post a patch to merge all the state machine threads into
> one, which will cut our thread usage per media element by 1/3 on OSX and
> Windows, and 1/4 on Linux.

Great!

> Reducing the thread stack size is a good idea, as we reserve 10MB per thread
> stack on Linux, which kills us on 32bit Linux. IIRC, we use 1MB thread
> stacks on Windows and OSX, so the problem isn't so pronounced there, but we
> can probably still make some easy gains by reducing stack size there anyway.

I filed bug 664341 for this.
* Splits control of lifetime of audio and decode threads up, so they can be controlled separately.
Attachment #540383 - Flags: review?(roc)
* Moves metadata decoding over to the decode thread.
Attachment #540384 - Flags: review?(roc)
* Moves seeking over to the decode thread.
Attachment #540385 - Flags: review?(roc)
* Now that all the data which the reader monitor protects is only used on the decode thread, remove the reader monitor.
Attachment #540386 - Flags: review?(roc)
* Destroying the audio stream on the state machine thread can block waiting on acquiring the audio monitor. With a shared state machine, we can't afford this, so move the creation and usage of audio stream over to the audio thread. This enables us to remove the audio monitor as well, provided we always acquire the decoder monitor whenever we create or destroy the audio stream on the audio thread, or whenever we use it off the audio thread. As long as we hold the decoder monitor on the state machine thread, we know the audio stream won't be destroyed while we're using it.
Attachment #540387 - Flags: review?(roc)
* Refactor nsBuiltinDecoderStateMachine::Run() to run as a series of events rather than as a single synchronous loop.
* Instead of doing a NotifyAll() on the decoder monitor to wake up the state machine, we schedule an event to run instead.
Attachment #540388 - Flags: review?(roc)
* Unify the state machine threads, so that all state machine events runners run on a single thread.
* Each nsBuiltinDecoderStateMachine holds a refcount to the state machine thread, so the thread should be destroyed when the last state machine is.
* We have to be very careful to ensure that we don't run more than one event for any given state machine at once, else we'll get inconsistent state.
* The state machine holds a reference to the decoder, so that we can guarantee the decoder stays alive long enough for the state machine to shut down. This creates a cycle which we break when we shutdown.
Attachment #540389 - Flags: review?(roc)
* Destroy decode thread when it's decode buffers are full, and playback is paused.
* Hold off creating the audio thread until playback begins.
* Note we can't shutdown the audio thread when paused yet, as if we do so we'll end up discarding the audio which has been written to the nsAudioStream but not played yet. We should be able to refactor again to achieve this when libcubeb lands.
Attachment #540391 - Flags: review?(roc)
In general mochitests should pass at every patch in the queue, though I've not tried each patch individually on tryserver, only with them all applied.

Tests are green on Try with all patches applied:
http://tbpl.mozilla.org/?tree=Try&rev=01f17e830017
Comment on attachment 540383 [details] [diff] [review]
Patch 1 v1: Split thread lifetime control

Review of attachment 540383 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #540383 - Flags: review?(roc) → review+
Which patch updates the documentation in nsBuiltinDecoder.h so I can get an overview of how the new threading system works?
(In reply to comment #21)
> Which patch updates the documentation in nsBuiltinDecoder.h so I can get an
> overview of how the new threading system works?

The patch that I'll write and post this afternoon updates the documentation. ;)

In general I followed the plan I outlined in comment 5.
Updates comments in nsBuiltinDecoder*.h to match new architecture.
Attachment #540921 - Flags: review?(roc)
isn't that bug related ?
Bug 617852 - Excessively large mmapped regions created by PulseAudio for each active stream
Comment on attachment 540384 [details] [diff] [review]
Patch 2 v1: Move metadata decoding to decode thread

Review of attachment 540384 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #540384 - Flags: review?(roc) → review+
Comment on attachment 540385 [details] [diff] [review]
Patch 3 v1: Move seeking over to the decode thread

Review of attachment 540385 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #540385 - Flags: review?(roc) → review+
Comment on attachment 540386 [details] [diff] [review]
Patch 4 v1: Remove reader monitor

Review of attachment 540386 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #540386 - Flags: review?(roc) → review+
Comment on attachment 540387 [details] [diff] [review]
Patch 5 v1: Move mAudioStream creation to audio thread

Review of attachment 540387 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +692,5 @@
>    }
>    return mReader->Init(cloneReader);
>  }
>  
> +void nsBuiltinDecoderStateMachine::StopPlayback()

StopPlayback doesn't cause the audio stream to be recreated anymore. If we start playing, then seek and resume playing, what causes the stream's pending samples to be flushed, now that we're not recreating the audio stream?
We call StopAudioThread() in DecodeSeek(), which destroys the old audio thread/stream. We'll start playing when the state machine thread runs in DECODING state again.
Comment on attachment 540388 [details] [diff] [review]
Patch 6 v1: Run state machine in events

Review of attachment 540388 [details] [diff] [review]:
-----------------------------------------------------------------

I think this patch would be much easier to review if it was diffed with -w.

::: content/media/nsBuiltinDecoderStateMachine.h
@@ +411,5 @@
>    // to call.
>    void DecodeThreadRun();
>  
> +  nsresult ScheduleStateMachine();
> +  nsresult ScheduleStateMachine(PRInt64 aUsecs);

Need to document what this does, and especially what aUsecs does.
Comment on attachment 540389 [details] [diff] [review]
Patch 7 v1: Unify state machine threads

Review of attachment 540389 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +183,5 @@
> +  NS_ASSERTION(NS_IsMainThread(), "Should be on main thread.");
> +  // Equivalent to MOZ_COUNT_CTOR
> +  NS_LogCtor((void*)gStateMachineThread,
> +              "MediaStateMachineThread",
> +              sizeof(*gStateMachineThread));

I think this is unnecessary. Threads are already tracked, and this is confusing since the thread object has its own refcount internally. Instead of this refcount, I would just have a global counter of the number of nsBuiltinDecoderStateMachine instances, tracked in the nsBuiltinDecoderStateMachine constructor/destructor, and have CreateStateMachineThread/DestroyStateMachineThread methods that get called when the counter rises above zero and reaches zero.

@@ +1038,5 @@
>      }
> +    // Equivalent to MOZ_COUNT_CTOR
> +    NS_LogCtor((void*)mDecodeThread,
> +               "MediaDecodeThread",
> +               sizeof(mDecodeThread));

These naked NS_LogCtors are scary! Are they really needed? They're confusing since MediaDecodeThread is not a class.

@@ +1856,5 @@
> +  mIsRunningStateMachine = PR_FALSE;
> +
> +  if (mRunStateMachineAgain && !mDispatchedStateMachineRun) {
> +    mDispatchedStateMachineRun = PR_TRUE;
> +    return NS_DispatchToCurrentThread(this);

Is mRunStateMachineAgain really needed? Why not have ScheduleStateMachine just dispatch this event and set mDispatchedStateMachineRun to true?

::: content/media/nsBuiltinDecoderStateMachine.h
@@ +219,5 @@
> + 
> +  // The decoder object that created this state machine. We hold a strong
> +  // reference to the decoder to ensure it stays alive after it's been shutdown
> +  // by the media element so that the state machine to keep using the decoder's
> +  // monitor until the state machine has finished shutdown. This is accessed

Grammar error somewhere in this sentence

@@ +579,5 @@
>    PRPackedBool mQuickBuffering;
>  
> +  // PR_TRUE if the shared state machine thread is currently running this
> +  // state machine.
> +  PRPackedBool mIsRunningStateMachine;

Just mIsRunning.

@@ +583,5 @@
> +  PRPackedBool mIsRunningStateMachine;
> +
> +  // PR_TRUE if we should run the state machine again once the current
> +  // state machine run has finished.
> +  PRPackedBool mRunStateMachineAgain;

mRunAgain. Document that this doesn't cause us to run again immediately, just post another event to run this state machine later.

@@ +590,5 @@
> +  // imperative that we don't dispatch multiple events to run the state
> +  // machine at the same time, as our code assume all events are synchronous.
> +  // If we dispatch multiple events, the second event can run while the
> +  // first is shutting down a thread, causing inconsistent state.
> +  PRPackedBool mDispatchedStateMachineRun;

mDispatchedRunEvent?
(In reply to comment #30)
> Comment on attachment 540388 [details] [diff] [review] [review]
> Patch 6 v1: Run state machine in events
> 
> Review of attachment 540388 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> I think this patch would be much easier to review if it was diffed with -w.
> 
> ::: content/media/nsBuiltinDecoderStateMachine.h
> @@ +411,5 @@
> >    // to call.
> >    void DecodeThreadRun();
> >  
> > +  nsresult ScheduleStateMachine();
> > +  nsresult ScheduleStateMachine(PRInt64 aUsecs);
> 
> Need to document what this does, and especially what aUsecs does.

I add a comment for ScheduleStateMachine() in the next patch.
(In reply to comment #31)
> Comment on attachment 540389 [details] [diff] [review] [review]
> Patch 7 v1: Unify state machine threads
> 
> Review of attachment 540389 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> ::: content/media/nsBuiltinDecoderStateMachine.cpp
> @@ +183,5 @@

> Instead
> of this refcount, I would just have a global counter of the number of
> nsBuiltinDecoderStateMachine instances, tracked in the
> nsBuiltinDecoderStateMachine constructor/destructor, and have
> CreateStateMachineThread/DestroyStateMachineThread methods that get called
> when the counter rises above zero and reaches zero.

Ah yes, I can do this now. In a previous iteration I needed to factor the release code out, but I don't need to now.

> 
> @@ +1038,5 @@
> >      }
> > +    // Equivalent to MOZ_COUNT_CTOR
> > +    NS_LogCtor((void*)mDecodeThread,
> > +               "MediaDecodeThread",
> > +               sizeof(mDecodeThread));
> 
> These naked NS_LogCtors are scary! Are they really needed? They're confusing
> since MediaDecodeThread is not a class.

They make it easy to distinguish *which* thread is leaking. I found this useful.

I added this because on my system the video mochitests leak an nsIThread (and a bunch of other stuff) about 40% of the time anyway (with or without my patches) so there's no way to distinguish that nsIThread from the one of the decode threads (or the state machine thread for that matter) unless I do this. Is there a better way to distinguish leaking objects?

> 
> @@ +1856,5 @@
> > +  mIsRunningStateMachine = PR_FALSE;
> > +
> > +  if (mRunStateMachineAgain && !mDispatchedStateMachineRun) {
> > +    mDispatchedStateMachineRun = PR_TRUE;
> > +    return NS_DispatchToCurrentThread(this);
> 
> Is mRunStateMachineAgain really needed? Why not have ScheduleStateMachine
> just dispatch this event and set mDispatchedStateMachineRun to true?

If we have more than one event to run a state machine dispatched at the same time, and the first event shuts down one of the helper threads, that event's execution will block, but the other event can still run. The other event will put the state machine into an inconsistent state. Ensuring that only one event is active for a state machine at one time ensures that our logic can assume everything for that state machine is synchronous, which is simpler.
> I added this because on my system the video mochitests leak an nsIThread
> (and a bunch of other stuff) about 40% of the time anyway (with or without
> my patches)

Is there a bug filed about that?
Patch 1 unbitrotten.
Attachment #540383 - Attachment is obsolete: true
Attachment #542992 - Flags: review+
Unbitrotten.
Attachment #540384 - Attachment is obsolete: true
Attachment #542993 - Flags: review+
Unbitrotten...
Attachment #540385 - Attachment is obsolete: true
Attachment #542994 - Flags: review+
Unbitrotten
Attachment #540386 - Attachment is obsolete: true
Attachment #542995 - Flags: review+
Patch 5 unbitrotten. Roc, see my comment 29 for a response to the question you asked about this patch. Why doesn't bugzilla add those who post review comments to the CC list?!?
Attachment #540387 - Attachment is obsolete: true
Attachment #542996 - Flags: review?(roc)
Attachment #540387 - Flags: review?(roc)
Patch 6 unbitrotten. I'll post an `diff -w` patch for easier review.
Attachment #540388 - Attachment is obsolete: true
Attachment #540388 - Flags: review?(roc)
Patch 6 v2, `diff -w` for easier review. I removed the COUNT_CTOR equivalence, it's not precisely what I want, it doesn't track the object's lifetime, just our reference to it, which isn't actually that useful.
Attachment #542998 - Flags: review?(roc)
Patch 7 unbitrotten, and review comments addressed.
Attachment #540389 - Attachment is obsolete: true
Attachment #542999 - Flags: review?(roc)
Attachment #540389 - Flags: review?(roc)
Patch 8 unbitrotten.
Attachment #540391 - Attachment is obsolete: true
Attachment #543000 - Flags: review?(roc)
Attachment #540391 - Flags: review?(roc)
Patch 9 unbitrotten. Yes, it was bitrotted. That's a real word. Look it up.
Attachment #540921 - Attachment is obsolete: true
Attachment #543002 - Flags: review?(roc)
Attachment #540921 - Flags: review?(roc)
Attachment #543002 - Attachment is patch: true
Comment on attachment 542996 [details] [diff] [review]
Patch 5 v2: Move mAudioStream creation to audio thread

Review of attachment 542996 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #542996 - Flags: review?(roc) → review+
Comment on attachment 543002 [details] [diff] [review]
Patch 9 v2: Update comments/documentation

Review of attachment 543002 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #543002 - Flags: review?(roc) → review+
Comment on attachment 542998 [details] [diff] [review]
Patch 6 v2 -w: Run state machine in events

Review of attachment 542998 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoder.cpp
@@ +252,5 @@
>                 "Must have state machine to start state machine thread");
>    if (mStateMachineThread) {
>      return NS_OK;
>    }
> +  nsresult res = NS_NewThread(getter_AddRefs(mStateMachineThread));

We're probably going to want a thread pool here, but this'll do for now.

::: content/media/nsBuiltinDecoder.h
@@ +570,5 @@
>    // Ensures the state machine thread is running, starting a new one
>    // if necessary.
>    nsresult StartStateMachineThread();
>  
> +  nsresult CreateStateMachineThread();

Need comments to explain the difference between this and StartStateMachineThread, and why both are needed.

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +1678,5 @@
>    stream->Unpin();
>    return res;
>  }
> +
> +void RunStateMachine(nsITimer *aTimer, void *aClosure) {

static?

@@ +1714,5 @@
> +  // Ensure the state machine thread is alive; we'll be running on it!
> +  nsresult res = mDecoder->CreateStateMachineThread();
> +  if (NS_FAILED(res)) return res;
> +
> +  nsCOMPtr<nsIEventTarget> target = do_QueryInterface(mDecoder->mStateMachineThread);

Why do you need a do_QI here? An implicit static cast should work fine.
Comment on attachment 542999 [details] [diff] [review]
Patch 7 v2: Unify state machine threads

Review of attachment 542999 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +223,4 @@
>  {
>    MOZ_COUNT_CTOR(nsBuiltinDecoderStateMachine);
> +  NS_ASSERTION(NS_IsMainThread(), "Should be on main thread.");
> +  if (gStateMachineRefCount == 0) {

Call it gStateMachineCount.

@@ +1826,5 @@
> +   if (mIsRunning) {
> +     mRunAgain = PR_TRUE;
> +   } else if (!mDispatchedRunEvent) {
> +     mDispatchedRunEvent = PR_TRUE;
> +    gStateMachineThread->Dispatch(this, NS_DISPATCH_NORMAL);

Why not just run an iteration of the state machine right here now?
Comment on attachment 543000 [details] [diff] [review]
Patch 8 v2: Shutdown idle decode threads

Review of attachment 543000 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoderStateMachine.h
@@ +600,5 @@
>    // If we dispatch multiple events, the second event can run while the
>    // first is shutting down a thread, causing inconsistent state.
>    PRPackedBool mDispatchedRunEvent;
>  
> +  // PR_TRUE if the decode thread has gone filled its buffers and is now

"has gone filled its buffers"?
Attachment #543000 - Flags: review?(roc) → review+
Patch 6 with review comments addressed.
Attachment #542997 - Attachment is obsolete: true
Attachment #542998 - Attachment is obsolete: true
Attachment #542998 - Flags: review?(roc)
Patch 6 `diff -w` for easier review.
Attachment #543279 - Flags: review?(roc)
Patch 7 with review comments addressed.
Attachment #542999 - Attachment is obsolete: true
Attachment #543280 - Flags: review?(roc)
Attachment #542999 - Flags: review?(roc)
Comment on attachment 543279 [details] [diff] [review]
Patch 6 v3 -w: Run state machine in events

Review of attachment 543279 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #543279 - Flags: review?(roc) → review+
Comment on attachment 543280 [details] [diff] [review]
Patch 7 v3: Unify state machine threads

Review of attachment 543280 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #543280 - Flags: review?(roc) → review+
The patch 6 and patch 6 -w versions seem to to result in different nesting of some closing '}' amongst other things. Which is the correct patch that should be applied? For example, see the 'case DECODER_STATE_DECODING_METADATA' in nsBuiltinDecoderStateMachine::Run. In the '-w' version the closing '}' for the case is nested incorrectly.
I will be landing "patch 6". Roc asked me to upload a diff -w patch to make it easier for him to review. diff -w patches don't include whitespace changes, which is why you're not seeing correct indentation in some instances with the -w patch applied.
Target Milestone: --- → mozilla8
Whiteboard: [MemShrink:P1][inbound] → [MemShrink:P1]
Target Milestone: mozilla8 → ---
Btw, the style guide demands "rv" rather than "res" for nsresults.
There's something I still don't understand about the general approach here. If I understand correctly, these patches reduce the number of threads per media element to one, but that's still one thread per media element so if a page has 300 media elements, that's 300 threads. What's the point of using 300 threads on a 4-core machine? Wouldn't it be better to limit the number of concurrent media threads to the number of logical CPUs and use a queue?
Yes, an excellent question, ideally we'd use a thread pool for the decoders. But one step at a time eh?

With these patches we now have one less thread per media element, and we shutdown threads while we're not decoding (basically when not playing). But we still have 2 threads per *playing* media element (3 on Linux). We should be able to get down to one thread per element once bug 623444 has landed.

The hard part is our decoders assume they run synchronously. But decoders can block while reading data when they run out of downloaded data to decode. Their reads will wait for some data to download before returning and continuing. If a decoder's download stalls, such a blocked read could take a long time to return, so in that case we need to switch decode to another decoder. Then the question becomes, how can we switch to another decoder safely? Do we just call the other decoder (conceptually) on top of the existing call stack of the previous decode? What if that decode also blocks, how do we get back to the first decode? What if a malicious website creates enough videos with blocking connections that we end up running out of stack space?

These are hard questions, which I don't yet have a good enough answer for.

One option is to change our underlying decode to being non-blocking, but that would require us to mutilate libnestegg (our WebM demuxer), so we'd need to consider it carefully. Regardless, we should consider that in another bug.
I think once we get down to one thread per playing element, we'll be in a reasonable good place. We can make further progress relatively easily by giving up the decoder thread when our buffers are full. As Chris says, getting beyond that --- to handle cases where a lot of elements are playing, but stalled --- is hard, but hopefully that isn't the common case, because it would mean the user is getting a terrible experience already.
(In reply to comment #59)
> Backed out of inbound due to suspected android crashtests permaorange.

The crashtest hang was introduced in "Patch 5 v2: Move mAudioStream creation to audio thread".
The crashtest which is timing out on Android is content/media/tests/crashtests/481136-1.html

http://tbpl.mozilla.org/?tree=Try&rev=942ebc7636c3
content/media/test/crashtests/459439-1.html is timing out too.

http://tbpl.mozilla.org/?tree=Try&rev=080d2c0c9eec
The problem is that I was holding the decoder monitor while calling certain audio stream methods. On Android some nsAudioStreamRemote methods do synchronous event dispatch to the main thread, and the main thread sometimes needs to acquire the decoder monitor, which can cause a deadlock.

This patch fixes that...
http://tbpl.mozilla.org/?tree=Try&rev=f689e1455cfc
Attachment #545313 - Flags: review?(roc)
Comment on attachment 545313 [details] [diff] [review]
Patch 10: Don't hold decoder monitor while audio stream does sync dispatch to main thread (on Android).

Review of attachment 545313 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #545313 - Flags: review?(roc) → review+
Depends on: 671978
Depends on: 682593
Depends on: 816041
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: