Closed Bug 664918 Opened 13 years ago Closed 12 years ago

Infrastructure for media stream graph processing

Categories

(Core :: Audio/Video, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla15

People

(Reporter: roc, Assigned: roc)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-needed)

Attachments

(13 files, 13 obsolete files)

7.88 KB, patch
jesup
: review+
30.25 KB, patch
jesup
: review+
76.15 KB, patch
jesup
: review+
8.59 KB, patch
jesup
: review+
smaug
: review+
15.96 KB, patch
jesup
: review+
1.06 KB, patch
bas.schouten
: review+
36.40 KB, patch
cpearce
: review+
jesup
: review+
3.49 KB, patch
jesup
: review+
3.15 KB, patch
jesup
: review+
9.93 KB, patch
bzbarsky
: review+
83.13 KB, patch
cpearce
: review+
jesup
: review+
5.24 KB, patch
cpearce
: review+
8.81 KB, patch
cpearce
: review+
jesup
: review+
Any update on how this is going?

Being the only (contemporary) browser without basic looping support is no fun :)
No longer blocks: 449157
Blocks: 654787
Blocks: 507672
Attachment #597686 - Attachment is obsolete: true
Ok, all those patches apply cleanly to alder as of today (and should apply cleanly to m-c as of yesterday).  They're taken from the patch queue roc shared for this work, and won't be considered for landing on m-c until he gets back.  I will probably land them on alder shortly, however, so that everyone doesn't have to constantly be pushing and popping them.
Comment on attachment 599867 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time

I realize you currently have semantics defined in Processed MediaStreams where timed setters null out all changes after the time specified, but I can see considerable utility for not nulling out all following changes (especially if this very generic class is used to address other problems).

I'd suggest SetAt() have an optional "clear all following sets" boolean; it can default to true if you like.

This would imply possibly adding a Merge method in addition to Append.

Given it's potentially widely usable, perhaps it should live lower down in our hierarchy of classes.

I see none of these comments/requests as mandatory; just suggestions.

Other than that, feedback+
Attachment #599867 - Flags: feedback+
(In reply to Randell Jesup [:jesup] from comment #13)
> I realize you currently have semantics defined in Processed MediaStreams
> where timed setters null out all changes after the time specified, but I can
> see considerable utility for not nulling out all following changes
> (especially if this very generic class is used to address other problems).
> 
> I'd suggest SetAt() have an optional "clear all following sets" boolean; it
> can default to true if you like.

Boolean parameters are bad. I should just rename SetAt to SetAfter and if we need another method that doesn't change the entire future, we'll add one.
Agreed: SetAfter() (though we should be careful that people don't assume the value changes only *after* the specified time... it's wordy, but you could use SetAtAndAfter()).  But I agree - separate methods are good, and we can reserve SetAt() for the version that doesn't clear out following changes.
I made it SetAtAndAfter.
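
For readers following the SetAt/SetAtAndAfter discussion, here's a minimal standalone sketch of the intended semantics (simplified C++ with standard types; not the actual class from the patch):

#include <cstdint>
#include <vector>

// Sketch only: SetAtAndAfter() sets the value from aTime onward and discards
// any previously recorded later changes; GetAt() returns the value in effect
// at a given time.
template <typename T>
class TimeVaryingSketch {
public:
  explicit TimeVaryingSketch(const T& aInitial) : mInitial(aInitial) {}

  void SetAtAndAfter(int64_t aTime, const T& aValue) {
    // Drop every recorded change at or after aTime, then record the new one.
    while (!mChanges.empty() && mChanges.back().mTime >= aTime) {
      mChanges.pop_back();
    }
    mChanges.push_back(Change{aTime, aValue});
  }

  const T& GetAt(int64_t aTime) const {
    const T* value = &mInitial;
    for (const Change& c : mChanges) {
      if (c.mTime > aTime) {
        break;
      }
      value = &c.mValue;
    }
    return *value;
  }

private:
  struct Change { int64_t mTime; T mValue; };
  T mInitial;
  std::vector<Change> mChanges; // sorted by mTime, ascending
};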
Comment on attachment 599997 [details] [diff] [review]
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams.

This section has me concerned due to latency:

+/**
+ * This is how much data we should try to have precomputed at any given time.
+ * If we've buffered K milliseconds, then we might write AUDIO_WRITTEN_TARGET_MS out,
+ * sleep for that long, wake up, do control processing (taking GRAPH_MANAGER_CONTROL_DELAY_MS),
+ * queue AUDIO_WRITTEN_TARGET_MS more data without blocking, and
+ * trigger computation of the next needed data.
+ */
+static const int BUFFER_TARGET_MS = AUDIO_WRITTEN_TARGET_MS*2 + GRAPH_MANAGER_CONTROL_DELAY_MS;

With 30 and 10ms for those constants, that means a target for buffering in MediaStreams of 70ms.  That's a LOT - that's 1/2 our mouth-to-ear over-the-network budget (for good quality conversation) right there (150ms is the knee in the quality curve, where people start really noticing delay and start to talk over one another).

What are the prospects for running shorter delays?  How risky is it from an underflow perspective?  Can we make the delay adaptive, where it runs longer delays if we get "too many" underflows?

+ * Allow video frames to be late by this much. Helps avoid unnecessary wakeups
+ * by giving us some scheduling slack.
+ */
+static const int ALLOW_VIDEO_FRAME_DELAY_MS = 5;

It's used like this:

+    // Don't wake up to play the next frame before ALLOW_VIDEO_FRAME_DELAY_MS from now
+    EnsureWakeUpAt(NS_MAX(mCurrentTime + ALLOW_VIDEO_FRAME_DELAY_MS*1000,
+                          frameEndWithBlocking));

The problem is that frame display times want to be as consistent as possible, and it's better to drop a frame and keep sync than to play a frame 'late'.  I realize there are all sorts of constraints on video frame display, but the closer to consistent and the later the decision is made as to which frame of video is displayed the better.

Also, humans are much more adept at noticing audio leading video (which is 'unnatural'), and not so much video leading audio (since that's similar to hearing a speech or sound from a distance, a very common ('natural') situation).

If there's any way to push the "which frame" decision closer to the actual display point, the better.  In past uses we would pre-decode up to 3 or 4 frames, and at the last ms in the compositor we'd select the 'best' frame from those available. The compositor ran in a PLL against video refresh, and would schedule a wakeup to composite the output just enough ms ahead of vblank/frame-refresh to avoid missing a frame (adaptively).  Not sure we can do anything approaching this (perhaps not), and if we did it would involve the GFX team I imagine (deeply).


In the current algorithm, if we're running late enough that we're worried about not having time to wait, we should consider dropping the current frame and substituting the next frame for it, a smidge early.

Still reviewing...
(In reply to Randell Jesup [:jesup] from comment #17)
> What are the prospects for running shorter delays?  How risky is it from an
> underflow perspective?  Can we make the delay adaptive, where it runs longer
> delays if we get "too many" underflows?

Yes, we need to make the delay adaptive.

> If there's any way to push the "which frame" decision closer to the actual
> display point, the better.  In past uses we would pre-decode up to 3 or 4
> frames, and at the last ms in the compositor we'd select the 'best' frame
> from those available. The compositor ran in a PLL against video refresh, and
> would schedule a wakeup to composite the output just enough ms ahead of
> vblank/frame-refresh to avoid missing a frame (adaptively).  Not sure we can
> do anything approaching this (perhaps not), and if we did it would involve
> the GFX team I imagine (deeply).

We haven't seen a need to do that so far in the A/V playback engine. But you have a good point about preferring video leading audio, one that I wasn't aware of.

We need to make sure we don't wake up too often just to update a video frame, but I can probably fix it another way.
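
For illustration, that adaptivity could take roughly this shape (the policy, names and numbers are invented for the sketch, not taken from the patches):

#include <cstdint>

// Sketch only: grow the buffering target quickly when an underrun actually
// happens, and creep back toward lower latency after a sustained clean run.
class AdaptiveBufferTarget {
public:
  int32_t TargetMs() const { return mTargetMs; }

  void NoteUnderrun() {
    mTargetMs *= 2;
    if (mTargetMs > kMaxMs) {
      mTargetMs = kMaxMs;
    }
    mCleanIterations = 0;
  }

  void NoteCleanIteration() {
    if (++mCleanIterations >= 100) {
      mTargetMs -= 10;
      if (mTargetMs < kMinMs) {
        mTargetMs = kMinMs;
      }
      mCleanIterations = 0;
    }
  }

private:
  static const int32_t kMinMs = 30;   // never buffer less than this
  static const int32_t kMaxMs = 500;  // never buffer more than this
  int32_t mTargetMs = 70;             // start at the fixed target discussed above
  int32_t mCleanIterations = 0;
};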
Comment on attachment 599997 [details] [diff] [review]
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams.

+GraphManagerImpl::ChooseActionTime()
+{
+  // Our actions should take effect as soon as possible, but we need to make sure
+  // that any playing audio stream which is affected by our actions will be
+  // able to recompute its data without under-running. Right now we have to
+  // estimate how long it will take us to recompute the data --- the estimate
+  // is DEFAULT_BUFFERING_MS.
+  mLastActionTime = GetEarliestActionTime();

This comment seems incorrect given the code (and there is no DEFAULT_BUFFERING_MS).

+  // End time of the audio that has been written to the hardware. The written
+  // audio can be a mix of silence (while blocking) and actual samples.
+  PRInt64 mAudioWrittenTime;

Just a question: I remember kinetik talking about Pulse allowing changing audio data after it's been written to Pulse, but hasn't gone to the hardware.  Is that worth considering, or does that just seriously mess up everything (probably it does, and especially echo cancellation if it's not being done inside pulse).  Or is that the un-implemented PruneDataStream()?

+GraphManagerImpl::PlayAudio(Stream* aStream)
...
+  float volume = 0.0;
+  for (PRUint32 i = 0; i < aStream->mAudioOutputs.Length(); ++i) {
+    volume += aStream->mAudioOutputs[i].mVolume;
+  }

This seems...odd; perhaps I don't understand the usage.  Multiplicative would make some sense (since 1.0f is the default), or an average perhaps, but additive confused me.

+ * A stream of synchronized audio and video data. All (not blocked) streams
+ * progress at the same rate --- "real time". Streams cannot seek. The only
+ * operation readers can perform on a stream is to read the next data.

As discussed by email/IRC, there are issues here dealing with real-time or otherwise uncontrolled sources (and sinks); for example taking input from a USB mic or remote audio stream and feeding to motherboard-connected speakers; even if theoretically the same sampling rate, they will drift and so some type of compensation (resampling, etc) will eventually be needed.

+/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*-
+/* This Source Code Form is subject to the terms of the Mozilla Public
+ * License, v. 2.0. If a copy of the MPL was not distributed with this file,
+ * You can obtain one at http://mozilla.org/MPL/2.0/. */

That appears all over; it causes compiler warnings due to /* within a comment.  Please close out the emacs mode line comment.

+static PRInt16
+FlipByteOrderIfBigEndian(PRInt16 aValue)
+{
+  short s = aValue;
+#if defined(IS_BIG_ENDIAN)
+  s = ((s & 0x00ff) << 8) | ((s & 0xff00) >> 8);
+#endif
+  return s;
+}


Normally I'd look to have the function take a PRUint16 instead of PRInt16.

Various GCC versions generate better code if this is coded as:
   PRUint16 s = (PRUint16) aValue;
   return (PRInt16) ((s << 8) | (s >> 8));
Shouldn't make a difference, but it does (in some GCCs).  MSVC, Mac and Linux have optimized byte-swapping macros (gcc has bswap_16() - #include <byteswap.h>).
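
To make that concrete, something along these lines (the platform guards and fallback are illustrative; IS_BIG_ENDIAN is the macro from the patch, bswap_16 and _byteswap_ushort are the gcc/MSVC primitives mentioned above):

#include <stdint.h>
#if defined(__linux__)
#include <byteswap.h>   // bswap_16()
#elif defined(_MSC_VER)
#include <stdlib.h>     // _byteswap_ushort()
#endif

static inline int16_t
FlipByteOrderIfBigEndian(int16_t aValue)
{
#if !defined(IS_BIG_ENDIAN)
  return aValue;                        // little-endian: nothing to do
#else
  uint16_t s = (uint16_t)aValue;        // swap as unsigned to avoid sign-bit surprises
#if defined(__linux__)
  s = bswap_16(s);
#elif defined(_MSC_VER)
  s = _byteswap_ushort(s);
#else
  s = (uint16_t)((s << 8) | (s >> 8));  // portable fallback
#endif
  return (int16_t)s;
#endif
}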

+FloatToSample(float aValue, PRUint8* aOut)
+{
+  PRInt32 v = PRInt32(aValue*128 + 128);
+  *aOut = PRUint8(NS_MIN(v, 255));
+}

Either we're concerned about samples out of the +-1.0f range (as shown above) or we're not and can use a simpler conversion.  Since we are concerned, the code here aliases on clipping in the negative direction:

2.0f  -> 256+128=384   -> NS_MIN(384,255)  -> PRUint8(255)  = 255 (ok)
-2.0f -> -256+128=-128 -> NS_MIN(-128,255) -> PRUint8(-128) = 128 (oops)
1.5f  -> 192+128=320   -> NS_MIN(320,255)  -> PRUint8(255)  = 255 (ok)
-1.5f -> -192+128=-64  -> NS_MIN(-64,255)  -> PRUint8(-64)  = 192 (oops)

A similar problem applies to the 16-bit version:

+FloatToSample(float aValue, PRInt16* aOut)
+{
+  PRInt32 v = PRInt32(aValue*32768.0f);
+  *aOut = FlipByteOrderIfBigEndian(PRInt16(NS_MIN(v, 32767)));
+}

This converts to PRInt16, not PRUint16.  However, we still clamp only the positive side, and let the negative side alias.  The numeric results are slightly different due to sign bits, but the problem still exists.

I'm sure there are other or better references (jmspeex should know), but one is:
http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html
and
http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html

SampleToFloat() functions are ok I believe, though that isn't the only reasonable conversion (see links); the choice of conversion should at least be explicit and documented.
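
For concreteness, a conversion that clamps both sides before narrowing could look like this (standalone sketch with standard types rather than the patch's PR* typedefs; not the fix that actually landed):

#include <algorithm>
#include <cstdint>

static inline int16_t
FloatToS16Clamped(float aValue)
{
  // Clamp to [-1, 1] first so both out-of-range directions saturate instead of aliasing.
  float v = std::max(-1.0f, std::min(1.0f, aValue));
  int32_t s = (int32_t)(v * 32768.0f);          // symmetric 2^15 scale
  return (int16_t)std::min<int32_t>(s, 32767);  // +1.0f would otherwise overflow
}

static inline uint8_t
FloatToU8Clamped(float aValue)
{
  float v = std::max(-1.0f, std::min(1.0f, aValue));
  int32_t s = (int32_t)(v * 128.0f) + 128;      // [-1, 1] -> [0, 256]
  return (uint8_t)std::min<int32_t>(s, 255);    // clamp the +1.0f case
}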

+float*
+AudioFrame::GetWritableBuffer(PRInt32 aChannels)
...
+  } else {
+    memset(buf, 0, len);
+  }
+  mBuffer = new AudioBuffer(mDuration, aChannels,
+                            nsAudioStream::FORMAT_FLOAT32, buf);
+  mOffset = 0;
+  return buf;
+}

Please comment that binary 0's are 0.0 (float or double) in IEEE 754 FP.  In theory other FP architectures might break on this; in practice not likely.

+void
+StreamBufferMultiplyFloats(float* aBase, float aComputed)
+{
+  *aBase *= aComputed;
+}

Defined (though rather unclear why), and not referenced.

I'm a little unsure about exactly what AppendAudio() and AppendVideo() are supposed to be doing, so it's hard to evaluate them without delving deep into the class structure to understand why some frames are suppressed.

+InterleaveAndConvertBuffer(const SrcT* aSource, PRInt32 aChannels, PRInt32 aLength,
+                           PRInt32 aStride, DestT* aOutput, float aVolume)
+{
+  DestT* output = aOutput;
+  for (PRInt32 i = 0; i < aLength; ++i) {
+    for (PRInt32 channel = 0; channel < aChannels; ++channel) {
+      float v = SampleToFloat(aSource[channel*aStride + i])*aVolume;
+      v = NS_MAX(v, -1.0f);
+      v = NS_MIN(v, 1.0f);
+      FloatToSample(v, output);
+      ++output;
+    }
+  }
+}

This clamps the floats and then converts, avoiding the problem above; but with this clamping in place, the clamping inside FloatToSample is redundant and a waste of cycles.

+InterleaveAndConvertBuffer(const PRInt16* aSource, PRInt32 aChannels, PRInt32 aLength,
+                           PRInt32 aStride, PRInt16* aOutput, float aVolume)
...
+      short s = aSource[channel*aStride + i];
+#if defined(IS_BIG_ENDIAN)
+      s = ((s & 0x00ff) << 8) | ((s & 0xff00) >> 8);
+#endif

Same comment as above
Also, why does the code often use "short" instead of PRInt16? (there are cases with similar types as well)

+  nsAutoTArray<PRUint8,50000> buf;

Could we move these arbitrary constants up to a #define, and justify the value?  Also, that's a 50K stack allocation... I understand the reason, but still, it's big.

My apologies, MEGO at around 2800 lines into the patch (~80%).  I'm sure I skimmed some parts I shouldn't have; it's late.

feedback- due to the aliasing problem at least (though it's unclear if this will affect things in practice immediately).
Attachment #599997 - Flags: feedback-
Comment on attachment 599997 [details] [diff] [review]
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams.

+void CombineAudioFrameSamples(float* aBuffer, PRInt32 aBufferDuration,
+                              PRInt32 aChannels,
+                              nsTArray<AudioFrame>& aAudio)
...
+      for (PRInt32 channel = 0; channel < aChannels; ++channel) {
+        if (volume == 1.0f) {
+          for (PRInt32 j = 0; j < duration; ++j) {
+            Combine(dest + j, src[j]);
+          }
+        } else {
+          for (PRInt32 j = 0; j < duration; ++j) {
+            Combine(dest + j, src[j]*volume);
+          }
+        }

Might be a smidge more efficient to move the FP test out of the outer loop:

+      if (volume == 1.0f) {
+        for (PRInt32 channel = 0; channel < aChannels; ++channel) {
+          for (PRInt32 j = 0; j < duration; ++j) {
+            Combine(dest + j, src[j]);
+          }
+        }
+      } else {
+        for (PRInt32 channel = 0; channel < aChannels; ++channel) {
+          for (PRInt32 j = 0; j < duration; ++j) {
+            Combine(dest + j, src[j]*volume);
+          }
+        }
+      }

But probably not worth it, and the compiler might well make it moot anyways.

+typedef void (* CombineAudioValues)(float* aDest, float aSrc);
+template <CombineAudioValues Combine>
+void CombineAudioFrameSamples(float* aBuffer, PRInt32 aBufferDuration,
+                              PRInt32 aChannels,
+                              nsTArray<AudioFrame>& aAudio)

This doesn't appear to be used anywhere.

+ * XXX This is the worst possible implementation. We average the channels
+ * to get mono, then we duplicate that across all channels with some super-dumb
+ * linear-interpolation resampling. This must be replaced with something that
+ * doesn't suck!

Hmmm, sounds like an action item/follow-up-bug.  :-)

+  enum {
+    // Always keep around at least 5ms of audio data (if available) to ensure
+    // that we can resample accurately as required
+    AUDIO_MIN_MS = 5
+  };

a) this isn't used anywhere
b) Is this really what we'd want to do?  Seems a bit imprecise.

+  void ConsumeInput(nsTArray<AudioFrame>* aOutput)

I worry about the allocation pattern this function may produce; where the outBuf TArray allocates perhaps multiple times to hold samples, then we moz_xalloc a new buffer and memcpy all the data over.  Especially as the output size *should* be computable from the input size.

(it comes to mind that some of these "unused" functions may well be for the MediaStream Processing code, which I don't have applied)

And I finally come to the end of media-graph!
Comment on attachment 599997 [details] [diff] [review]
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams.

Oh, one more:  Ehsan just revised the recommendations for namespaces, and we're trying to avoid second-level namespaces below mozilla except in specific instances, so the mozilla::media namespace should probably go away.
Comment on attachment 599998 [details] [diff] [review]
Part 3: Support MediaStream DOM object

Looks good to me modulo the namespace thing
Attachment #599998 - Flags: feedback+
Comment on attachment 599999 [details] [diff] [review]
Part 4: Add InputStreams --- streams that can have data written into them by some media data producer.

+void
+GraphManagerImpl::ExtractPendingInput(InputStream* aStream)

Perhaps some comments, though this can be figured out from the code and the name.

+void
+InputStream::Finish()
+{
+  {
+    MutexAutoLock lock(mMutex);
+    mPendingFinished = true;
+  }
+  gManager->WakeUp(ALLOW_DELAY_AFTER_OUTPUT_MS*1000);
+}

(this define is 2ms) - these sorts of delays make me uneasy; the stream is finished or it isn't; shouldn't the wakeup of the graph manager be triggered when the InputStream is really finished?  If we're worried about too many of them triggering independent wakeups, they can be batched in some manner.  If there's a good reason for the 2ms delay, let's document that and why it's 2ms.

+ * XXX This should probably be called SourceStream.

Agreed.

+  // Returns true if the buffer currently "enough"
+  // microseconds of audio data. If it returns true, then when the buffer drops

There's a missing word in the first line I think
Attachment #599999 - Flags: feedback+
Comment on attachment 600002 [details] [diff] [review]
Part 5: Let the src attribute of HTML media elements accept a MediaStream object, so that the media element plays back the contents of the given stream

Good by me, though it will need real review by someone else for m-c (who knows the media elements better).
Attachment #600002 - Flags: feedback+
Comment on attachment 600003 [details] [diff] [review]
Part 6: Add captureStream()/captureStreamUntilEnded() APIs to HTML media elements, returning a stream representing the contents of the media element.

+  // This logic has to mimic AudioLoop closely to make sure we're write
+  // the exact same silences

Minor English nit

+static void WriteSilence(nsAudioStream* aStream, PRUint32 aFrames)
+{
+  PRUint32 numSamples = aFrames * aStream->GetChannels();
+  nsAutoTArray<AudioDataValue, 1000> buf;
+  buf.SetLength(numSamples);
+  memset(buf.Elements(), 0, numSamples * sizeof(AudioDataValue));
+  aStream->Write(buf.Elements(), aFrames);

Is 1000 the right value?  What sizes do we expect to see here?  What chance is there that we'll ever do an allocation due to SetLength(), and how much space are we wasting on the stack?

Would it be more efficient to do it the "old" way:
-    nsAutoArrayPtr<AudioDataValue> buf(new AudioDataValue[numSamples]);
Attachment #600003 - Flags: feedback+
Comment on attachment 600005 [details] [diff] [review]
Part 8: Tentative support for StreamListener::NotifyQueued

Seems good as far as it goes
Attachment #600005 - Flags: feedback+
Assignee: roc → nobody
Component: Video/Audio → WebRTC: Audio/Video
QA Contact: video.audio → webrtc-audiovideo
Component: WebRTC: Audio/Video → Video/Audio
QA Contact: webrtc-audiovideo → video.audio
Assignee: nobody → roc
Comment on attachment 599867 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time

In TimeVarying.h:
+    for (PRInt32 i = mChanges.Length() - 1; i >= 0; --i) {
+      NS_ASSERTION(i == mChanges.Length() - 1, "Always considering last element of array");

The NS_ASSERTION needs a cast to avoid erroring out on builders.
Attachment #599867 - Attachment is obsolete: true
Attachment #605642 - Flags: feedback?(tterribe)
Comment on attachment 605642 [details] [diff] [review]
fix to TimeVarying.h for warnings-as-errors

Review of attachment 605642 [details] [diff] [review]:
-----------------------------------------------------------------

f+ with one nit.

::: content/media/TimeVarying.h
@@ +39,5 @@
>     */
>    void SetAt(PRInt64 aTime, const T& aValue)
>    {
>      for (PRInt32 i = mChanges.Length() - 1; i >= 0; --i) {
> +      NS_ASSERTION(i == (PRInt32) mChanges.Length() - 1, "Always considering last element of array");

Suggest wrapping the line at 80 chars.
Attachment #605642 - Flags: feedback?(tterribe) → feedback+
(In reply to Timothy B. Terriberry (:derf) from comment #29)
> > +      NS_ASSERTION(i == (PRInt32) mChanges.Length() - 1, "Always considering last element 

It occurred to me afterwards (too late) that this should also be a static_cast, instead of a C-style cast. I spend so much time working with C that I forget we're allergic to C-style casts in C++ code.
Roc will be revising his patches based on my changes, feedback from myself and the other media guys, etc, so I'm sure he can do a better fix.  

A lot of me still thinks in C as well, as I worked mostly with C code at WG.
Everything fixed except for what's mentioned below.

(In reply to Randell Jesup [:jesup] from comment #19)
> +  // End time of the audio that has been written to the hardware. The
> written
> +  // audio can be a mix of silence (while blocking) and actual samples.
> +  PRInt64 mAudioWrittenTime;
> 
> Just a question: I remember kinetik talking about Pulse allowing changing
> audio data after it's been written to Pulse, but hasn't gone to the
> hardware.  Is that worth considering, or does that just seriously mess up
> everything (probably it does, and especially echo cancellation if it's not
> being done inside pulse).  Or is that the un-implemented PruneDataStream()?

That doesn't work on most platforms. Let's not do it.

> +GraphManagerImpl::PlayAudio(Stream* aStream)
> ...
> +  float volume = 0.0;
> +  for (PRUint32 i = 0; i < aStream->mAudioOutputs.Length(); ++i) {
> +    volume += aStream->mAudioOutputs[i].mVolume;
> +  }
> 
> This seems...odd; perhaps I don't understand the usage.  Multiplicative
> would make some sense (since 1.0f is the default), or an average perhaps,
> but additive confused me.

Each element of mAudioOutputs represents an independent <audio> element playing back the same stream. Jean-Marc told me that, since these streams are perfectly correlated, adding the volumes together for a single output stream is the same as playing multiple output streams with those volumes.
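(The identity being relied on is just linearity of the mix: if the same samples x[n] play through two outputs at volumes v1 and v2, the combined result is v1*x[n] + v2*x[n] = (v1 + v2)*x[n], so a single output at the summed volume is equivalent.)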

> + * A stream of synchronized audio and video data. All (not blocked) streams
> + * progress at the same rate --- "real time". Streams cannot seek. The only
> + * operation readers can perform on a stream is to read the next data.
> 
> As discussed by email/IRC, there are issues here dealing with real-time or
> otherwise uncontrolled sources (and sinks); for example taking input from a
> USB mic or remote audio stream and feeding to motherboard-connected
> speakers; even if theoretically the same sampling rate, they will drift and
> so some type of compensation (resampling, etc) will eventually be needed.

Yeah. This definition stands, and it's up to graph inputs to make sure they resample as necessary to avoid drift.

> Various GCC versions generate better code if this is coded as:
>    PRUint16 s = (PRUint16) aValue;
>    return (PRInt16) ((s << 8) | (s >> 8));
> Shouldn't make a difference, but it does (in some GCCs).  MSVC, Mac and
> Linux have optimized byte-swapping macros (gcc has bswap_16() - #include
> <byteswap.h>).

I care this much about big-endian performance: ><, but fixed anyway.

> +FloatToSample(float aValue, PRUint8* aOut)
> +{
> +  PRInt32 v = PRInt32(aValue*128 + 128);
> +  *aOut = PRUint8(NS_MIN(v, 255));
> +}
> 
> Either were concerned about samples out of the +-1.0f range (as shown above)
> or we're not and can use a simpler conversion.  Since we are concerned, the
> code here aliases on clipping in the negative direction:
> 
> 2.0f  -> 256+128=384   -> NS_MIN(384,255)  -> PRUInt8(255)  = 255 (ok)
> -2.0f -> -256+128=-128 -> NS_MIN(-128,255) -> PRUInt8(-128) = 128 (oops)
> 1.5f  -> 192+128=320   -> NS_MIN(320,255)  -> PRUInt8(255)  = 255 (ok)
> -1.5f -> -192+128=-64  -> NS_MIN(-64,255)  -> PRUInt8(-64)  = 192 (oops)
> 
> A similar problem applies to the 16-bit version:
> 
> +FloatToSample(float aValue, PRInt16* aOut)
> +{
> +  PRInt32 v = PRInt32(aValue*32768.0f);
> +  *aOut = FlipByteOrderIfBigEndian(PRInt16(NS_MIN(v, 32767)));
> +}
> 
> This converts to PRInt16, not PRUInt16.  However, we still clamp only the
> positive side, and let the negative side alias.  The numeric results are
> slightly different due to sign bits, but the problem still exists.
> 
> I'm sure there are other or better references (jmspeex should know), but one
> is:
> http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html
> and
> http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html

Wow, what a tar pit. I'll fix for out-of-range cases and use a 2^N conversion. Someone who cares more than I do and is willing to fight the flame wars can patch it afterwards.

> +float*
> +AudioFrame::GetWritableBuffer(PRInt32 aChannels)
> ...
> +  } else {
> +    memset(buf, 0, len);
> +  }
> +  mBuffer = new AudioBuffer(mDuration, aChannels,
> +                            nsAudioStream::FORMAT_FLOAT32, buf);
> +  mOffset = 0;
> +  return buf;
> +}
> 
> Please comment that binary 0's are 0.0 (float or double) in IEEE 754 FP.  In
> theory other FP architectures might break on this; in practice not likely.

Right. This is only used for processing, so I'm deferring that until a little bit later (not this bug).

> Defined (though rather unclear why), and not referenced.

Will remove.

(In reply to Randell Jesup [:jesup] from comment #20)
> +void CombineAudioFrameSamples(float* aBuffer, PRInt32 aBufferDuration,
> +                              PRInt32 aChannels,
> +                              nsTArray<AudioFrame>& aAudio)

Deferring to processing patches.

> +typedef void (* CombineAudioValues)(float* aDest, float aSrc);
> +template <CombineAudioValues Combine>
> +void CombineAudioFrameSamples(float* aBuffer, PRInt32 aBufferDuration,
> +                              PRInt32 aChannels,
> +                              nsTArray<AudioFrame>& aAudio)
> 
> This doesn't appear to be used anywhere.

Removed.

> + * XXX This is the worst possible implementation. We average the channels
> + * to get mono, then we duplicate that across all channels with some
> super-dumb
> + * linear-interpolation resampling. This must be replaced with something
> that
> + * doesn't suck!
> 
> Hmmm, sounds like an action item/follow-up-bug.  :-)

Yes.

> +  void ConsumeInput(nsTArray<AudioFrame>* aOutput)
> 
> I worry about the allocation pattern this function may produce; where the
> outBuf TArray allocates perhaps multiple times to hold samples, then we
> moz_xalloc a new buffer and memcpy all the data over.  Especially as the
> output size *should* be computable from the input size.

We never copy sample data, we just addref buffers.
(In reply to Randell Jesup [:jesup] from comment #21)
> Oh, one more:  Ehsan just revised the recommendations for namespaces, and
> we're trying to avoid second-level namespaces below mozilla except in
> specific instances, so the mozilla::media namespace should probably go away.

Yes, I've renamed a lot of things and done that.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #32)

> > +GraphManagerImpl::PlayAudio(Stream* aStream)
> > ...
> > +  float volume = 0.0;
> > +  for (PRUint32 i = 0; i < aStream->mAudioOutputs.Length(); ++i) {
> > +    volume += aStream->mAudioOutputs[i].mVolume;
> > +  }
> > 
> > This seems...odd; perhaps I don't understand the usage.  Multiplicative
> > would make some sense (since 1.0f is the default), or an average perhaps,
> > but additive confused me.
> 
> Each element of mAudioOutputs represents an independent <audio> element playing
> back the same stream. Jean-Marc told me that, since these streams are
> perfectly correlated, adding the volumes together for a single output stream
> is the same as playing multiple output streams with those volumes.

Ah... So these are multiple copies of the same stream going to the same output, playing with the exact same timing, and we're collapsing them to one, louder version.  That wasn't clear (probably means a comment is in order).  Probably good to have jmspeex glance over this code anyways.

What's the use-case for this situation being created?

> 
> > + * A stream of synchronized audio and video data. All (not blocked) streams
> > + * progress at the same rate --- "real time". Streams cannot seek. The only
> > + * operation readers can perform on a stream is to read the next data.
> > 
> > As discussed by email/IRC, there are issues here dealing with real-time or
> > otherwise uncontrolled sources (and sinks); for example taking input from a
> > USB mic or remote audio stream and feeding to motherboard-connected
> > speakers; even if theoretically the same sampling rate, they will drift and
> > so some type of compensation (resampling, etc) will eventually be needed.
> 
> Yeah. This definition stands, and it's up to graph inputs to make sure they
> resample as necessary to avoid drift.

Let's think long and hard on this.  This is a fundamental, critical design point, and I would like to be very sure we understand all the ramifications and that this is the best way to do it (and that we've made sure they can do so properly - that they're getting the feedback they need to adapt).  This likely is the correct decision, but I'm just very wary of the rat's nest of issues that tie into it.

> > http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html
> > and
> > http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html
> 
> Wow, what a tar pit. I'll fix for out-of-range cases and use a 2^N
> conversion. Someone who cares more than I do and is willing to fight the
> flame wars can patch it afterwards.

Yeah, seriously.


> 
> > +float*
> > +AudioFrame::GetWritableBuffer(PRInt32 aChannels)
> > ...
> > +  } else {
> > +    memset(buf, 0, len);
> > +  }
> > +  mBuffer = new AudioBuffer(mDuration, aChannels,
> > +                            nsAudioStream::FORMAT_FLOAT32, buf);
> > +  mOffset = 0;
> > +  return buf;
> > +}
> > 
> > Please comment that binary 0's are 0.0 (float or double) in IEEE 754 FP.  In
> > theory other FP architectures might break on this; in practice not likely.
> 
> Right. This is only used for processing, so I'm deferring that until a
> little bit later (not this bug).

Sure; all that's needed here (and it's a trivial issue anyways) is a comment.
(In reply to Randell Jesup [:jesup] from comment #34)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #32)
> > Each element of mAudioOutputs represents an independent <audio> element playing
> > back the same stream. Jean-Marc told me that, since these streams are
> > perfectly correlated, adding the volumes together for a single output stream
> > is the same as playing multiple output streams with those volumes.
> 
> Ah... So these are multiple copies of the same stream going to the same
> output, playing with the exact same timing, and we're collapsing them to
> one, louder version.  That wasn't clear (probably means a comment is in
> order).  Probably good to have jmspeex glance over this code anyways.

OK.

> What's the use-case for this situation being created?

var stream = ...;
var a1 = new Audio(stream);
var a2 = new Audio(stream);
a1.volume = ...;
a2.volume = ...;
a1.play(); a2.play();

Not really a use-case, but it's easier to add volumes than actually manage multiple nsAudioStreams.

> > Yeah. This definition stands, and it's up to graph inputs to make sure they
> > resample as necessary to avoid drift.
> 
> Lets think long and hard on this.  This is a fundamental, critical design
> point, and I would like to be very sure we understand all the ramifications
> and that this is the best way to do it (and that we've made sure they can do
> so properly - that they're getting the feedback they need to adapt).  This
> likely is the correct decision, but I'm just very wary of the rats-nest of
> issue that tie into it.

Sure. We did discuss it on the phone a while back.
Attachment #599997 - Attachment is obsolete: true
Attachment #599998 - Attachment is obsolete: true
Attachment #599999 - Attachment is obsolete: true
Attachment #600002 - Attachment is obsolete: true
Attachment #600003 - Attachment is obsolete: true
Attachment #600004 - Attachment is obsolete: true
Attachment #600005 - Attachment is obsolete: true
Attachment #605642 - Attachment is obsolete: true
Attachment #610001 - Flags: review?(bzbarsky)
Attachment #610002 - Flags: review? → review?(rjesup)
These patches take into account all the comments above.

There are some pretty major changes compared to the first round of patches. Things that spring to mind:
-- A lot of code that was only used for ProcessedMediaStream has been removed. (I'll put it back later in a separate round of patches for ProcessedMediaStream.)
-- The old StreamBuffer implementation with its Audio/VideoFrames and fixed "one audio track, one video track" setup was totally overhauled. The new StreamBuffer supports dynamic addition and removal of any number of tracks of any kind. This actually simplified some of the code.
-- Logic around various timelines was simplified and clarified by adding explicit types "GraphTime", "StreamTime", "TrackTicks" and methods for converting between them.
-- The graph thread scheduling logic was greatly simplified. The current approach is: try to run the graph thread control loop every 10ms unless there is nothing to do. Try to buffer enough to cover missing our wakeup deadline by up to 10ms.
-- An nsIPrincipal is associated with every nsDOMMediaStream. This tracks the origin of the stream data. Appropriate security checks are in place so that you can't use MediaStreams to get pixel data from non-same-origin videos.

I think the weakest link here is part 8. I need to pretty quickly do some followup work to handle cases where the author monkeys with a media element that's generating a MediaStream.
Comment on attachment 610005 [details] [diff] [review]
Part 4: Create nsDOMMediaStream, a DOM object wrapping an underlying MediaStream

So, MediaStream will get some member variables which will cause cycles.
Attachment #610005 - Flags: review?(bugs) → review+
Attachment #610007 - Flags: review?(bas.schouten) → review+
Comment on attachment 610001 [details] [diff] [review]
Part 0: Refactor nsMediaCache's handling of principals into a helper method in nsContentUtils

It's tempting to mention something about computing the meet of the two principals, but I guess that's ok.

It might be worth exposing an isNull() method on nsIPrincipal.  A followup, please.

I think it's worth more clearly documenting that aResourcePrincipal is required to be non-null but *aResourcePrincipal may be null.  Or perhaps just make aResourcePrincipal an nsCOMPtr<nsIPrincipal>& and avoid that issue?  It might also be worth more clearly distinguishing between null _pointers_ and null _principals_ in the documentation...

Do you really want Equals here?  Seems like a Subsumes() check in one direction or other would make more sense.  And yes, I know you just copied this code...

r=me with the nits fixed.
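
For illustration, the helper could take roughly this shape under the semantics discussed above (the name, the Subsumes-based check and the null-principal contract ID are assumptions for the sketch, not necessarily what the patch does):

#include "nsCOMPtr.h"
#include "nsComponentManagerUtils.h"
#include "nsIPrincipal.h"

// Sketch: fold the principal of newly seen data into a resource's accumulated
// principal. Keep the existing principal if it already covers the new data;
// otherwise downgrade to a fresh null principal so that no content origin can
// read the mixed resource.
static void
CombineResourcePrincipal(nsCOMPtr<nsIPrincipal>& aResourcePrincipal,
                         nsIPrincipal* aExtraPrincipal)
{
  if (!aExtraPrincipal) {
    return;                                // nothing new to fold in
  }
  if (!aResourcePrincipal) {
    aResourcePrincipal = aExtraPrincipal;  // first data seen for this resource
    return;
  }
  bool subsumes = false;
  if (NS_SUCCEEDED(aResourcePrincipal->Subsumes(aExtraPrincipal, &subsumes)) &&
      subsumes) {
    return;                                // existing principal already covers it
  }
  aResourcePrincipal = do_CreateInstance("@mozilla.org/nullprincipal;1");
}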
Attachment #610001 - Flags: review?(bzbarsky) → review+
Comment on attachment 610008 [details] [diff] [review]
Part 7: Let the "src" attribute of HTML media elements accept a MediaStream DOM object, to make the media element play back the contents of the given stream

Review of attachment 610008 [details] [diff] [review]:
-----------------------------------------------------------------

r=cpearce assuming you figure out what to do about SetCurrentTime().

::: content/html/content/src/nsHTMLMediaElement.cpp
@@ +1121,5 @@
>    NS_ENSURE_ARG_POINTER(aListener);
>  
>    *aListener = nsnull;
>  
> +  // Make sure we don't reenter during synchronous abort events

Append full stop.

@@ +1147,5 @@
>  NS_IMETHODIMP nsHTMLMediaElement::MozLoadFrom(nsIDOMHTMLMediaElement* aOther)
>  {
>    NS_ENSURE_ARG_POINTER(aOther);
>  
> +  // Make sure we don't reenter during synchronous abort events

Append full stop.

@@ +1209,5 @@
>  
> +  if (mStream) {
> +    // do nothing since streams aren't seekable; we effectively clamp to
> +    // the current time.
> +    SeekCompleted();

Doesn't this mean you'll dispatch a "seeked" event without having first dispatched a "seeking" event? Maybe you should call SeekStarted() before calling SeekCompleted()?

On the other hand the WhatWG spec for currentTime [1] says "On setting, if the media element has a current media controller, then the user agent must throw an InvalidStateError exception", so you could follow that example and throw an exception. This makes more sense to me, since MediaStreams aren't supposed to be seekable, and I don't think it makes sense to be firing seeking/seeked events if we're not actually seeking.

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#dom-media-currenttime

@@ +1262,5 @@
>    return NS_OK;
>  }
>  
>  /* readonly attribute nsIDOMHTMLTimeRanges seekable; */
>  NS_IMETHODIMP nsHTMLMediaElement::GetSeekable(nsIDOMTimeRanges** aSeekable)

It looks like GetSeekable() and GetBuffered() would return empty ranges when playing streams. Presumably that's ok?

@@ +2202,2 @@
>    if (NS_FAILED(rv)) {
> +    LOG(PR_LOG_DEBUG, ("%p Failed to load decoder/stream for decoder %p", this, aDecoder));

s/decoder\/stream/decoder/ right? This is only called when setting up a decoder (otherwise you'd need to null check aDecoder above).

::: dom/interfaces/html/nsIDOMHTMLMediaElement.idl
@@ +56,5 @@
>  #undef GetCurrentTime
>  #endif
>  %}
>  
> +[scriptable, uuid(3e672e79-a0ea-45ef-87de-828402f1f6d7)]

You'll need to rev the uuids of classes that inherit MediaElement too.
Attachment #610008 - Flags: review?(cpearce) → review+
Comment on attachment 610009 [details] [diff] [review]
Part 8: Add mozCaptureStream()/mozCaptureStreamUntilEnded() APIs to HTML media elements, returning a MediaStream representing the contents of the media element

Review of attachment 610009 [details] [diff] [review]:
-----------------------------------------------------------------

I think we should use AudioDataValue rather than VorbisPCMValue.

::: content/html/content/public/nsHTMLMediaElement.h
@@ +405,5 @@
>     * Stop playback on mStream.
>     */
>    void EndMediaStreamPlayback();
>  
> +  already_AddRefed<nsDOMMediaStream> CaptureStreamInternal(bool aFinishWhenEnded);

Comment describing the effects of aFinishWhenEnded.

@@ +771,5 @@
>  
>    // True if the sound is muted
>    bool mMuted;
>  
> +  // True if the sound is being captured

Full stop.

::: content/media/nsBuiltinDecoder.cpp
@@ +99,5 @@
> +  ms->Init(PRInt64(mCurrentTime*USECS_PER_S), aStream, aFinishWhenEnded);
> +
> +  if (mDecoderStateMachine) {
> +    // Queue data for this stream now in case the decode loop doesn't take
> +    // any more steps for a while.

Can't you just call ScheduleStateMachine here?

::: content/media/nsBuiltinDecoder.h
@@ +421,5 @@
> +    PRInt64 mLastAudioPacketEndTime; // microseconds
> +    // Count of audio frames written to the stream
> +    PRInt64 mAudioFramesWritten;
> +    // Timestamp of the first audio packet whose frames we wrote.
> +   PRInt64 mAudioFramesWrittenBaseTime; // microseconds

Indentation is off.

@@ +431,5 @@
> +    // the image.
> +    nsRefPtr<Image> mLastVideoImage;
> +    nsRefPtr<SourceMediaStream> mStream;
> +    gfxIntSize mLastVideoImageDisplaySize;
> +    bool mIsInitialized;

Might be handy to say why mIsInitialized isn't set in the Init() method.

@@ +735,5 @@
>    // to Wait on this monitor will block the thread until the next
>    // state change.
>    ReentrantMonitor mReentrantMonitor;
>  
> +  nsTArray<OutputMediaStream> mOutputStreams;

Add comment please.

::: content/media/nsBuiltinDecoderReader.cpp
@@ +80,5 @@
> +
> +  AudioDataValue* data = static_cast<AudioDataValue*>(mAudioBuffer->Data());
> +  for (PRUint32 i = 0; i < mFrames; ++i) {
> +    for (PRUint32 j = 0; j < mChannels; ++j) {
> +      data[j*mFrames + i] = mAudioData[i*mChannels + j];

You should make it clear that you're deliberately converting to channel-major ordering here. Or maybe we should just always store samples in channel-major order until we write to hardware.

@@ +411,5 @@
> +                                           PRInt64 aTime,
> +                                           PRInt64 aDuration,
> +                                           PRUint32 aFrames,
> +                                           PRUint32 aChannels,
> +                                           VorbisPCMValue** aChannelBuffers)

Why can't aChannelBuffers be AudioDataValue rather than VorbisPCMValue? I'd rather not pollute the general nsBuiltinDecoderReader architecture with codec specific data types. Why should the Wave decoder output a VorbisPCMValue? If we implemented an MP3 or AAC decoder, why should they?

::: content/media/nsBuiltinDecoderReader.h
@@ +473,5 @@
>    // Queue of audio frames. This queue is threadsafe, and is accessed from
>    // the audio, decoder, state machine, and main threads.
>    MediaQueue<AudioData> mAudioQueue;
>  
> +  void PushAudioData(PRInt64 aOffset,

Comment, including units for aTime and aDuration. And I'd prefer aChannelBuffers to be AudioDataValue** rather than VorbisPCMValue**.

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +562,5 @@
> +  if (!audioWrittenOffset.valid() || !frameOffset.valid())
> +    return;
> +  if (audioWrittenOffset.value() < frameOffset.value()) {
> +    // Write silence to catch up
> +    LOG(PR_LOG_DEBUG, ("%p Decoder writing %d frames of silence to stream",

Maybe you should change this logging to make it clear that this is for the MediaStream output, not regular playback?

@@ +604,5 @@
> +}
> +
> +static const TrackID TRACK_AUDIO = 1;
> +static const TrackID TRACK_VIDEO = 2;
> +static const TrackRate RATE_VIDEO = 1000000;

USECS_PER_S ?

@@ +670,5 @@
> +      VideoSegment output;
> +      for (PRUint32 i = 0; i < video.Length(); ++i) {
> +        VideoData* v = video[i];
> +        if (stream->mNextVideoTime + mStartTime < v->mTime) {
> +          LOG(PR_LOG_DEBUG, ("%p Decoder writing last video to stream for %lld ms",

Same thing here, make clear this is the decoder writing to media stream rather than to regular playback.

@@ +741,5 @@
> +    }
> +  }
> +}
> +
> +bool nsBuiltinDecoderStateMachine::HaveEnoughDecodedAudio(PRInt64 aAmpleAudio)

aAmpleAudioUsecs (presumably).

@@ +890,5 @@
>  
>      // Video decode.
> +    bool throttleVideoDecoding = !videoPlaying || HaveEnoughDecodedVideo();
> +    if (mDidThrottleVideoDecoding && !throttleVideoDecoding) {
> +      videoPump = true;

This seems like a reasonable change to make.  We've got some bugs on file regarding our skip-to-next-keyframe logic being too aggressive (bug 688363); this will affect that.

::: content/media/nsBuiltinDecoderStateMachine.h
@@ +286,5 @@
>  
> +  // Copy queued audio/video data in the reader to any output MediaStreams that
> +  // need it.
> +  void SendOutputStreamData();
> +  bool HaveEnoughDecodedAudio(PRInt64 aAmpleAudio);

aAmpleAudioUsecs (presumably).

@@ +452,5 @@
>    // Decode thread run function. Determines which of the Decode*() functions
>    // to call.
>    void DecodeThreadRun();
>  
> +  void SendOutputStreamAudio(AudioData* aAudio, OutputMediaStream* aStream,

Comment?

@@ +659,5 @@
>  
>    // True is we are decoding a realtime stream, like a camera stream
>    bool mRealTime;
>  
> +  bool mDidThrottleAudioDecoding;

Descriptive comments?

::: content/media/nsMediaCache.h
@@ +476,5 @@
>    bool                   mDidNotifyDataEnded;
>  
>    // The following fields are protected by the cache's monitor but are
> +  // only written on the main thread, thus can be used either on the main thread
> +  // or while holding the cache's monitor.

Presumably you must hold the monitor while writing though?

::: dom/interfaces/html/nsIDOMHTMLMediaElement.idl
@@ +48,5 @@
>   *
>   * For more information on this interface, please see
>   * http://www.whatwg.org/specs/web-apps/current-work/#htmlmediaelement
>   *
>   * @status UNDER_DEVELOPMENT

Rev uuid and uuid of subclasses.

There's a tool for that: http://people.mozilla.com/~sfink/uploads/update-uuids
`update-uuids nsIDOMHTMLMediaElement dom` should do it.
Attachment #610009 - Flags: review?(cpearce) → review-
I fixed a few bugs and my patches are green on try on all platforms. Of course, the test here is pretty simple but it does exercise a great deal of the existing functionality.
(In reply to Chris Pearce (:cpearce) from comment #50)
> On the other hand the WhatWG spec for currentTime [1] says "On setting, if
> the media element has a current media controller, then the user agent must
> throw an InvalidStateError exception", so you could follow that example and
> throw an exception. This makes more sense to me, since MediaStreams are
> supposed to be seekable, and I don't think it makes sense to be firing
> seeking/seeked events if we're not actually seeking.

I changed it to throw an exception.

> It looks like GetSeekable() and GetBuffered() would return empty ranges when
> playing streams. Presumably that's ok?

I think so.

Everything else fixed.
(In reply to Chris Pearce (:cpearce) from comment #51)
> I think we should use AudioDataValue rather than VorbisPCMValue.

I removed all changes to PushAudioData. It was an experiment that didn't work out. However, at some point after this lands we should probably switch the decoder stack to use the same channel-major audio buffers as media streams.

> Might be handy to say why mIsInitialized isn't set in the Init() method.

I changed its name to mStreamInitialized and added a comment.

Everything else fixed.
(In reply to Boris Zbarsky (:bz) from comment #49)

I'm actually not so sure about the safety of the code in this patch. My concern is that if the current principal is some null principal, and some other principal is merged into it, we don't change the current principal, but maybe we should. Because imagine if we have a stream A with a null principal, and that principal gets propagated to stream B because B gets data from A (and nowhere else). Then imagine we inject data from some fresh origin O into A (but B is not changed). In theory, we should disallow future access from B to A, but with this code, we wouldn't because they'd still have the same principal. In theory, it seems to me that every time we mix content from some origin O into a null principal for a resource, we should generate a fresh null principal for the resource.

Now, maybe it doesn't matter because nothing with a null principal can "access" A in a meaningful way, i.e., treating different null principals as same-origin to each other is harmless. But I don't find that convincing.

Maybe what I really want is a special "poisoned" principal that isn't even same-origin to itself. Does that make sense?
Maybe a better way to go would be to regard the principal of a resource as one that must be subsumed by any principal reading the resource? Then the meet of two incompatible origins could be the system principal. How does that sound?
Per irc discussion, if the principal only controls access to the stream, not what the stream can access, then using the system principal to indicate "no one but system can access this" makes perfect sense.
Attachment #610009 - Attachment is obsolete: true
Attachment #610009 - Flags: review?(rjesup)
Attachment #611715 - Flags: review?(cpearce) → review+
Comment on attachment 610002 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time

Looks good; I like the parameterization of Time.
Attachment #610002 - Flags: review?(rjesup) → review+
Comment on attachment 610004 [details] [diff] [review]
Part 3: Create MediaStream and MediaGraphManager for internal management of real-time media processing

Review of attachment 610004 [details] [diff] [review]:
-----------------------------------------------------------------

r=jesup, though note that I expect some additional synchronization work will be needed.

::: content/media/MediaStreamGraph.cpp
@@ +297,5 @@
> +  bool IsEmpty() { return mStreams.IsEmpty(); }
> +
> +  // For use by control messages
> +  /**
> +   * Identiy which graph update index we are currently processing.

"Identiy" -> Identify

@@ +311,5 @@
> +   * Marks aStream as affected by a change in its output at the earliest
> +   * possible time.
> +   */
> +  void NoteStreamAffected(MediaStream* aStream);
> +  void AddStream(MediaStream* aStream)

Given how wonderfully thorough the comments here are, AddStream() and RemoveStream() stand out as un-commented (especially AddStream).

@@ +357,5 @@
> +   * Readonly after initialization on the main thread.
> +   */
> +  nsCOMPtr<nsIThread> mThread;
> +
> +  // This state is managed on the graph thread only, unless

Which state is "this state"?  mStreams?  Or all the stuff following this?

@@ +359,5 @@
> +  nsCOMPtr<nsIThread> mThread;
> +
> +  // This state is managed on the graph thread only, unless
> +  // mLifecycleState > LIFECYCLE_RUNNING in which case the graph thread
> +  // is not running and this can all be used from the main thread.

So, if this can be used when the state is > LIFECYCLE_RUNNING, that should be commented in the enum ("values below here allow you to assume that you can safely access <blah> state from the main thread"), or some such.  How does this interact with mMonitor below?

::: content/media/MediaStreamGraph.h
@@ +138,5 @@
> + * Any stream can have its audio and video playing when requested. The media
> + * stream graph plays audio by constructing audio output streams as necessary.
> + * Video is played by setting video frames into an VideoFrameContainer at the right
> + * time. To ensure video plays in sync with audio, make sure that the same
> + * stream is playing both the audio and video.

One thing we need to be prepared to handle: synchronizing the output (from the media graph) of the audio and video does not mean they'll be synchronized in playback.

In particular, if audio has a downstream mixer + audio driver + kernel driver + hardware delay, it might be consistently much more than video's delay (output code plus 0-1 frames).  While it's better to have video lead audio, if it's by too much people notice lack of frame sync.  Ideally you want audio to be in roughly the +10 to -50ms range.  I'd prefer +0 to -20ms.

I suspect we'll need some A/V offset to compensate, plus perhaps a value fed back to us by the audio driver about known additional latency.
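
As a purely illustrative sketch of that kind of compensation (everything below is invented for the example):

#include <algorithm>
#include <cstdint>

// If the audio path reports some known downstream latency, shift the video
// presentation clock later by most of it, keeping only a small intentional
// video lead (video leading audio being the more tolerable direction, per above).
static int64_t
VideoPresentationTimeUs(int64_t aGraphTimeUs, int64_t aReportedAudioLatencyUs)
{
  const int64_t kAllowedVideoLeadUs = 20 * 1000;  // ~20ms of acceptable video lead
  int64_t offset =
    std::max<int64_t>(0, aReportedAudioLatencyUs - kAllowedVideoLeadUs);
  return aGraphTimeUs + offset;
}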

@@ +332,5 @@
> +  // The first active track is the track that started earliest; if multiple
> +  // tracks start at the same time, the one with the lowest ID.
> +  TrackID mFirstActiveTracks[MediaSegment::TYPE_COUNT];
> +
> +  // Temporary data used by MediaStreamGraph on the grpah thread

typo
Attachment #610004 - Flags: review?(rjesup) → review+
Comment on attachment 610003 [details] [diff] [review]
Part 2: Create MediaSegment, AudioSegment and VideoSegment classes to manage intervals of media data

Review of attachment 610003 [details] [diff] [review]:
-----------------------------------------------------------------

r+ modulo correcting the 50ms constant

::: content/media/MediaSegment.h
@@ +20,5 @@
> +
> +inline MediaTime MillisecondsToMediaTime(PRInt32 aMS)
> +{
> +  NS_ASSERTION(aMS <= (MEDIA_TIME_MAX >> MEDIA_TIME_FRAC_BITS)*1000,
> +               "Out of range");

About 278 years or so ;-)

@@ +21,5 @@
> +inline MediaTime MillisecondsToMediaTime(PRInt32 aMS)
> +{
> +  NS_ASSERTION(aMS <= (MEDIA_TIME_MAX >> MEDIA_TIME_FRAC_BITS)*1000,
> +               "Out of range");
> +  return (MediaTime(aMS) << MEDIA_TIME_FRAC_BITS)/1000;

Not that it matters in the slightest, but loses range due to 64-bit intermediary - max 278 years instead of something huge.

::: content/media/SharedBuffer.h
@@ +20,5 @@
> +public:
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SharedBuffer)
> +  ~SharedBuffer() {}
> +
> +  void* Data() { return this + 1; }

I see the ASSERTION below that guarantees that this+1 is at least 4-byte aligned; so long as this+1 == ((char *) this) + sizeof(*this), this (and the allocation) should be fine.
There are some more complex ways to do this that are probably a bit more canonically correct - I think this is good for any platform we care about.
At most comment along the lines of "assumes sizeof(SharedBuffer) includes padding for use in arrays".

My whole comment here is probably overly pedantic. :-)

::: content/media/StreamBuffer.cpp
@@ +37,5 @@
> +void
> +StreamBuffer::ForgetUpTo(StreamTime aTime)
> +{
> +  // Round to nearest 50ms so we don't spend too much time pruning segments.
> +  StreamTime forget = (aTime/50000)*50000;

50ms != 50000, and also it should use MEDIA_TIME_FRAC_BITS -- ((1<<MEDIA_TIME_FRAC_BITS)-1)/20
Nearest would be ((aTime+25000)/50000)*50000 (appropriate modification to replace 50000/etc assumed)
Not that it matters, but binary masking would be faster - forget &= ~(((1<<MEDIA_TIME_FRAC_BITS)-1)>>4) or some such (mods for nearest would be to add 1/32nd of a second to forget first)
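
For concreteness, the round-down with corrected units might look like this (assuming MediaTime/StreamTime is fixed-point seconds with MEDIA_TIME_FRAC_BITS fractional bits; the constant name is invented):

  // ~50ms expressed in MediaTime units, not microseconds.
  static const StreamTime FORGET_CHUNK =
      (StreamTime(1) << MEDIA_TIME_FRAC_BITS) / 20;
  StreamTime forget = (aTime / FORGET_CHUNK) * FORGET_CHUNK;  // round down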
Attachment #610003 - Flags: review?(rjesup) → review+
Attachment #610005 - Flags: review?(rjesup) → review+
Comment on attachment 610006 [details] [diff] [review]
Part 5: Create SourceMediaStream, a MediaStream with an API allowing data to be injected into it by some source

Review of attachment 610006 [details] [diff] [review]:
-----------------------------------------------------------------

r- on RemoveElementAt; if I'm wrong and it's safe, comment it please and I'll r+ (feel free to convert)

::: content/media/MediaStreamGraph.cpp
@@ +618,5 @@
> +        dest->AppendFrom(data->mData);
> +      }
> +      if (data->mCommands & SourceMediaStream::TRACK_END) {
> +        aStream->mBuffer.FindTrack(data->mID)->SetEnded();
> +        aStream->mUpdateTracks.RemoveElementAt(i);

Is this safe given we're iterating through mUpdateTracks[i]?  Wouldn't that cause us to skip a track following the current one?
Attachment #610006 - Flags: review?(rjesup) → review-
(In reply to Randell Jesup [:jesup] from comment #63)
> ::: content/media/MediaStreamGraph.cpp
> @@ +618,5 @@
> > +        dest->AppendFrom(data->mData);
> > +      }
> > +      if (data->mCommands & SourceMediaStream::TRACK_END) {
> > +        aStream->mBuffer.FindTrack(data->mID)->SetEnded();
> > +        aStream->mUpdateTracks.RemoveElementAt(i);
> 
> Is this safe given we're iterating through mUpdateTracks[i]?  Wouldn't that
> cause us to skip a track following the current one?

No, because we're deliberately iterating in reverse order to avoid this issue.
Comment on attachment 610008 [details] [diff] [review]
Part 7: Let the "src" attribute of HTML media elements accept a MediaStream DOM object, to make the media element play back the contents of the given stream

Review of attachment 610008 [details] [diff] [review]:
-----------------------------------------------------------------

So long as re-setting element.src from a URL to a stream and vice-versa works properly, r+ with nits addressed.

::: content/html/content/public/nsHTMLMediaElement.h
@@ +602,5 @@
>     * Process any media fragment entries in the URI
>     */
>    void ProcessMediaFragmentURI();
>  
>    // The current decoder. Load() has been called on this decoder.

I think this comment may now be slightly misleading (we call Load() when setting src to an object aka MediaStream)

::: content/html/content/src/nsHTMLMediaElement.cpp
@@ +471,5 @@
> +    stream = do_QueryInterface(nsContentUtils::XPConnect()->
> +        GetNativeOfWrapper(aCtx, JSVAL_TO_OBJECT(aParams)));
> +    if (stream) {
> +      mSrcAttrStream = static_cast<nsDOMMediaStream*>(stream.get());
> +      UnsetAttr(kNameSpaceID_None, nsGkAtoms::src, true);

Does this (or the Load()) clear out mDecoder when mStream is set?  And how do we handle changing the src attribute in general?  (This may all be fine, but I haven't wandered through all the code paths.)

@@ +2831,5 @@
>  {
> +  if (mDecoder) {
> +    return mDecoder->GetCurrentPrincipal();
> +  }
> +  if (mStream) {

If mDecoder and mStream are truly non-overlapping, that should be "else if" (as it is elsewhere)

::: dom/interfaces/html/nsIDOMHTMLMediaElement.idl
@@ +56,5 @@
>  #undef GetCurrentTime
>  #endif
>  %}
>  
> +[scriptable, uuid(3e672e79-a0ea-45ef-87de-828402f1f6d7)]

Same comment as cpearce - update-uuids.sh is handy :-)
Attachment #610008 - Flags: review?(rjesup) → review+
Comment on attachment 611714 [details] [diff] [review]
Part 0: Refactor nsMediaCache's handling of principals into a helper method in nsContentUtils

r=me
Attachment #611714 - Flags: review?(bzbarsky) → review+
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #64)
> > Is this safe given we're iterating through mUpdateTracks[i]?  Wouldn't that
> > cause us to skip a track following the current one?
> 
> No, because we're deliberately iterating in reverse order to avoid this
> issue.

Cool, sorry I missed it was iterating backwards.  Might be worth a couple-word comment ("safe because we're iterating backwards")
Attachment #610010 - Flags: review?(rjesup) → review+
Comment on attachment 611715 [details] [diff] [review]
Part 8: Add mozCaptureStream()/mozCaptureStreamUntilEnded() APIs to HTML media elements, returning a MediaStream representing the contents of the media element

Review of attachment 611715 [details] [diff] [review]:
-----------------------------------------------------------------

Just nits

::: content/html/content/src/nsHTMLMediaElement.cpp
@@ +2373,5 @@
>    NS_ASSERTION(!mStream && !mStreamListener, "Should have been ended already");
>  
>    mStream = mSrcAttrStream;
> +  // XXX if we ever support capturing the output of this media element,
> +  // need to add a CombineWithPrincipal call here.

Isn't this patch to capture the output of media elements?  That makes this XXX comment confusing.

::: content/media/nsAudioAvailableEventManager.cpp
@@ +160,5 @@
>  
>      // Fill the signalBuffer.
>      PRUint32 i;
>      float *signalBuffer = mSignalBuffer.get() + mSignalBufferPosition;
> +    if (audioData) {

Was there any chance of this (null audioData) happening before this patchset lands?  If so, is there any reason to file a bug on that issue?  It would be a straightforward null-pointer crash.

::: content/media/nsBuiltinDecoderReader.h
@@ +441,5 @@
>    // Reads and decodes one video frame. Packets with a timestamp less
>    // than aTimeThreshold will be decoded (unless they're not keyframes
>    // and aKeyframeSkip is true), but will not be added to the queue.
>    virtual bool DecodeVideoFrame(bool &aKeyframeSkip,
> +                                PRInt64 aTimeThreshold) = 0;

Whitespace change?

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +554,5 @@
> +  NS_ASSERTION(aOutput->GetChannels() == aAudio->mChannels,
> +               "Wrong number of channels");
> +
> +  // This logic has to mimic AudioLoop closely to make sure we're write
> +  // the exact same silences

we're -> we

@@ +682,5 @@
> +        }
> +        if (stream->mNextVideoTime + mStartTime < v->mEndTime) {
> +          LOG(PR_LOG_DEBUG, ("%p Decoder writing video frame %lld to MediaStream",
> +                             mDecoder.get(), v->mTime));
> +          WriteVideoToMediaStream(v->mImage,

Not really sure of the logic here without digging a bunch deeper into the code - is there a case where we'd call WriteVideoToMediaStream() in the previous if() (which modifies mNextVideoTime) and then do so again in this if()?  If so, is this correct, and what does it mean?
Attachment #611715 - Flags: review?(rjesup) → review+
Attachment #610011 - Flags: review?(rjesup) → review+
Comment on attachment 610006 [details] [diff] [review]
Part 5: Create SourceMediaStream, a MediaStream with an API allowing data to be injected into it by some source

per comments, r+
Attachment #610006 - Flags: review- → review+
Roc, what is holding up landing this?
Me refreshing patches. I am about to do that.
(In reply to Randell Jesup [:jesup] from comment #62)
> I see the ASSERTION below that guarantees that this+1 is at least 4-byte
> aligned; so long as this+1 == ((char *) this) + sizeof(*this), this (and the
> allocation) should be fine.
> There are some more complex ways to do this that are probably a bit more
> canonically correct - I think this is good for any platform we care about.
> At most comment along the lings of "assumes sizeof(SharedBuffer) includes
> padding for use in arrays".
> 
> My whole comment here is probably overly pedantic. :-)

I added a comment.

> 50ms != 50000, and also it should use MEDIA_TIME_FRAC_BITS --
> ((1<<MEDIA_TIME_FRAC_BITS)-1)/20
> Nearest would be ((aTime+25000)/50000)*50000 (appropriate modification to
> replace 50000/etc assumed)
> Not that it matters, but binary masking would be faster - forget &=
> ~(((1<<MEDIA_TIME_FRAC_BITS)-1)>>4) or some such (mods for nearest would be
> to add 1/32nd of a second to forget first)

I just used MillisecondsToMediaTime(50) and did the division. Choosing a power-of-2 to mask with was unnecessarily obscure.
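
In other words, something along these lines (a sketch only, with stand-in types and a hypothetical MEDIA_TIME_FRAC_BITS value; not copied from the updated patch):

#include <cstdint>

typedef int64_t StreamTime;             // stand-in for the patch's StreamTime
const int MEDIA_TIME_FRAC_BITS = 20;    // hypothetical value, for illustration

inline StreamTime MillisecondsToMediaTime(int32_t aMS)
{
  return (StreamTime(aMS) << MEDIA_TIME_FRAC_BITS) / 1000;
}

// Round down to a 50ms boundary before pruning segments, using the existing
// helper instead of a hand-written power-of-two mask.
inline StreamTime RoundDownForPruning(StreamTime aTime)
{
  StreamTime quantum = MillisecondsToMediaTime(50);
  return (aTime / quantum) * quantum;
}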
(In reply to Randell Jesup [:jesup] from comment #61)
> @@ +357,5 @@
> > +   * Readonly after initialization on the main thread.
> > +   */
> > +  nsCOMPtr<nsIThread> mThread;
> > +
> > +  // This state is managed on the graph thread only, unless
> 
> Which state is "this state"?  mStreams?  Or all the stuff following this?

Everything following (until mMonitor). Improved the comment.

> @@ +359,5 @@
> > +  nsCOMPtr<nsIThread> mThread;
> > +
> > +  // This state is managed on the graph thread only, unless
> > +  // mLifecycleState > LIFECYCLE_RUNNING in which case the graph thread
> > +  // is not running and this can all be used from the main thread.
> 
> So, if this can be used when the state is > LIFECYCLE_RUNNING, that should
> be commented in the enum ("values below here allow you to assume that you
> can safely access <blah> state from the main thread"), or some such.

Done.

> How does this interact with mMonitor below?

It doesn't. Data guarded by mMonitor must always be accessed with mMonitor held. Added a comment.

> One thing we need to be prepared to handle: synchronizing the output (from
> the media graph) of the audio and video does not mean they'll be
> synchronized in playback.
> 
> In particular, if audio has a downstream mixer + audio driver + kernel
> driver + hardware delay, it might be consistently much more than video's
> delay (output code plus 0-1 frames).  While it's better to have video lead
> audio, if it's by too much people notice the lack of frame sync.  Ideally you
> want audio to be roughly in the +10 to -50ms range.  I'd prefer +0
> to -20ms.
> 
> I suspect we'll need some A/V offset to compensate, plus perhaps a value fed
> back to us by the audio driver about known additional latency.

Yes. MediaStreamGraphImpl::GetAudioPosition calls the nsAudioStream API for that. I don't currently use it for video sync because it wasn't reliable enough in my testing. There's some work to do here, but the situation isn't much different to our current A/V sync for media elements, which generally seems to be acceptable.
(In reply to Randell Jesup [:jesup] from comment #65)
> ::: content/html/content/src/nsHTMLMediaElement.cpp
> @@ +471,5 @@
> > +    stream = do_QueryInterface(nsContentUtils::XPConnect()->
> > +        GetNativeOfWrapper(aCtx, JSVAL_TO_OBJECT(aParams)));
> > +    if (stream) {
> > +      mSrcAttrStream = static_cast<nsDOMMediaStream*>(stream.get());
> > +      UnsetAttr(kNameSpaceID_None, nsGkAtoms::src, true);
> 
> Does this (or the Load()) clear out mDecoder when mStream is set?  How do we
> handle an element with changing the src element in general?  (This may all
> be fine, but I haven't wandered through all the code paths.)

Load() calls AbortExistingLoads() which clears mDecoder and mStream if necessary. Well, unless 'src' is set during a synchronous event triggered by Load(), in which case Load() immediately bails out. Basically we handle 'src' changes the same way whether it's being set to a stream or a URI.

> @@ +2831,5 @@
> >  {
> > +  if (mDecoder) {
> > +    return mDecoder->GetCurrentPrincipal();
> > +  }
> > +  if (mStream) {
> 
> If mDecoder and mStream are truly non-overlapping, that should be "else if"
> (and is elsewhere)

Style is to not put "else" clauses after returns.
(In reply to Randell Jesup [:jesup] from comment #68)
> ::: content/html/content/src/nsHTMLMediaElement.cpp
> @@ +2373,5 @@
> >    NS_ASSERTION(!mStream && !mStreamListener, "Should have been ended already");
> >  
> >    mStream = mSrcAttrStream;
> > +  // XXX if we ever support capturing the output of this media element,
> > +  // need to add a CombineWithPrincipal call here.
> 
> Isn't this patch to capture the output of media elements?  That makes this
> XXX comment confusing.

Clarified.

> ::: content/media/nsAudioAvailableEventManager.cpp
> @@ +160,5 @@
> >  
> >      // Fill the signalBuffer.
> >      PRUint32 i;
> >      float *signalBuffer = mSignalBuffer.get() + mSignalBufferPosition;
> > +    if (audioData) {
> 
> Was there any chance this (null audioData) would have happened before this
> patchset lands?

No.

> @@ +682,5 @@
> > +        }
> > +        if (stream->mNextVideoTime + mStartTime < v->mEndTime) {
> > +          LOG(PR_LOG_DEBUG, ("%p Decoder writing video frame %lld to MediaStream",
> > +                             mDecoder.get(), v->mTime));
> > +          WriteVideoToMediaStream(v->mImage,
> 
> Not really sure of the logic here without getting a bunch  deeper into the
> code - is there an instance where we'd WriteVideoToMediaStream() in the
> previous if() (which modifies mNextVideoTime) and then do so again in this
> if?  If so, is this correct, and what does it mean?

The first if() block handles the case where the previous video frame was supposed to have ended before the current video frame starts. That doesn't make sense, so we write another copy of the previous video frame (not really a copy, of course!) with enough duration to catch up to the start time of the current video frame. Then we write the current video frame with a duration up to its stated duration.
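
A self-contained sketch of that control flow is below; the types, field names, and the WriteVideoToMediaStream signature are simplified stand-ins rather than the patch's actual code:

#include <cstdint>
#include <cstdio>

struct Image { int id; };
struct VideoFrame { Image* mImage; int64_t mTime; int64_t mEndTime; };
struct OutputStream { int64_t mNextVideoTime; Image* mLastVideoImage; };

static void WriteVideoToMediaStream(Image* aImage, int64_t aDuration)
{
  std::printf("write image %d for %lld\n", aImage ? aImage->id : -1,
              (long long)aDuration);
}

static void SendVideoFrame(OutputStream* stream, VideoFrame* v, int64_t mStartTime)
{
  if (stream->mNextVideoTime + mStartTime < v->mTime) {
    // The previous frame was supposed to end before this frame starts:
    // replay the previous image just long enough to catch up.
    WriteVideoToMediaStream(stream->mLastVideoImage,
                            v->mTime - (stream->mNextVideoTime + mStartTime));
    stream->mNextVideoTime = v->mTime - mStartTime;
  }
  if (stream->mNextVideoTime + mStartTime < v->mEndTime) {
    // Normal case: write the current frame for (up to) its stated duration.
    WriteVideoToMediaStream(v->mImage,
                            v->mEndTime - (stream->mNextVideoTime + mStartTime));
    stream->mNextVideoTime = v->mEndTime - mStartTime;
    stream->mLastVideoImage = v->mImage;
  }
}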
This is a prep patch for part 12.
Attachment #618994 - Flags: review?(cpearce)
As I noted above, I was a bit worried about stability of the media playback capture code. I did some testing and found that seeking doesn't cause significant problems but loading a new media resource does. So this patch just finishes and disconnects all output media streams when the current decoder shuts down.

This doesn't implement the correct behavior in such situations --- that depends on whether we did captureStreamUntilEnded or just captureStream --- but I have a plan for fixing that soon. I just want to get this in so we can land these patches without introducing known crashability.

(The plan to fix it soon is:
-- Land basic ProcessedMediaStream (internal use only)
-- When output is being captured to a stream, make the decoder output to an internal per-decoder MediaStream, starting a new output stream at each seek
-- Attach the decoder output stream as the input to each captureStream/captureStreamUntilEnded
-- Play the video of the internal streams through the media element, so the output stays in sync with the captured streams)
Attachment #619000 - Flags: review?(cpearce)
Attachment #618994 - Flags: review?(cpearce) → review+
Comment on attachment 619000 [details] [diff] [review]
Part 12: finish output media streams when the decoder is destroyed

Review of attachment 619000 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me, but I'm not sure about the MediaStreamGraphImpl change; I don't feel qualified to review that.

::: content/media/test/test_streams_element_capture_reset.html
@@ +56,5 @@
> +
> +var audio = new Audio();
> +for (var i = 0; i < gSmallTests.length; ++i) {
> +  if (audio.canPlayType(gSmallTests[i].type)) {
> +    startTest(gSmallTests[i]);

You could have used getPlayableAudio() from manifest.js to achieve this.
Attachment #619000 - Flags: review?(cpearce) → review+
(In reply to Chris Pearce (:cpearce) from comment #78)
> You could have used getPlayableAudio() from manifest.js to achieve this.

Done.
Attachment #619000 - Flags: review?(rjesup) → review+
Depends on: 750769
Depends on: 752087
Pushed followup to fix missing comment-terminator in the emacs modeline for StreamBuffer.cpp:
 https://hg.mozilla.org/integration/mozilla-inbound/rev/2f62ae9d981c
(Now the header for StreamBuffer.cpp matches StreamBuffer.h exactly.)
Depends on: 752784
Depends on: 752826
I noticed instances of comment 82 (gcc warning: "/*" within comment [-Wcomment]) in 2 other files added here, so I just fixed those in another followup:
 https://hg.mozilla.org/integration/mozilla-inbound/rev/a9f44b9bfa13
(I also verified that those were the last 2 instances of this warning in m-c.)
(Sorry for bugspam)
The static constructor was added in nsDOMMediaStream. The only global stuff there is DOMCI_DATA and CC boilerplate:
http://mxr.mozilla.org/mozilla-central/source/content/media/nsDOMMediaStream.cpp

So I don't think there's anything I can do about it.
Cycle collection boilerplate involves a static constructor for the cycle collection helper class, fwiw.

So yes, not much you can do about it short of restructuring how the cycle collector works.
So the static constructor should be fixed by bug 616262, if I understand correctly. Thanks!
Depends on: 758583
Depends on: 752796
(Note to the documentation team: this feature's documentation may not be targeted at web developers.)
Depends on: 790854
No longer depends on: 790854
Comment on attachment 610004 [details] [diff] [review]
Part 3: Create MediaStream and MediaGraphManager for internal management of real-time media processing

>+    *mStreams.AppendElement() = already_AddRefed<MediaStream>(aStream);
[Presumably you temporarily forgot that we have a templated function dont_AddRef() which saves you from typing <MediaStream> again.]
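i.e., the equivalent spelling would be something like this (a sketch, assuming aStream is the raw pointer whose reference is being transferred):

  *mStreams.AppendElement() = dont_AddRef(aStream);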