Bug 664918 - Infrastructure for media stream graph processing
Status: RESOLVED FIXED
Keywords: dev-doc-needed
Product: Core
Classification: Components
Component: Audio/Video
Version: Trunk
Hardware: All All
Importance: -- normal (1 vote)
Target Milestone: mozilla15
Assigned To: Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
Depends on: 708109 708116 726889 726891 726894 727694 727695 750769 752087 752784 752796 752826 758583
Blocks: 507672 654787 709459 gecko-games webrtc 691234 731429 739566 750656
Reported: 2011-06-16 21:02 PDT by Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
Modified: 2014-02-27 09:14 PST
CC: 35 users
ryanvm: in-testsuite+


Attachments
Part 1: Create TimeVarying<T> class to represent values that change over time (7.24 KB, patch)
2012-02-15 21:41 PST, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
no flags
Part 1: Create TimeVarying<T> class to represent values that change over time (7.17 KB, patch)
2012-02-15 21:44 PST, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
no flags
Part 1: Create TimeVarying<T> class to represent values that change over time (7.36 KB, patch)
2012-02-22 21:22 PST, Randell Jesup [:jesup]
rjesup: feedback+
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams. (121.12 KB, patch)
2012-02-23 07:05 PST, Randell Jesup [:jesup]
rjesup: feedback-
Part 3: Support MediaStream DOM object (7.95 KB, patch)
2012-02-23 07:08 PST, Randell Jesup [:jesup]
rjesup: feedback+
Part 4: Add InputStreams --- streams that can have data written into them by some media data producer. (14.26 KB, patch)
2012-02-23 07:09 PST, Randell Jesup [:jesup]
rjesup: feedback+
Part 5: Let the src attribute of HTML media elements accept a MediaStream object, so that the media element plays back the contents of the given stream (37.07 KB, patch)
2012-02-23 07:10 PST, Randell Jesup [:jesup]
rjesup: feedback+
Part 6: Add captureStream()/captureStreamUntilEnded() APIs to HTML media elements, returning a stream representing the contents of the media element. (74.50 KB, patch)
2012-02-23 07:11 PST, Randell Jesup [:jesup]
rjesup: feedback+
Part 7: MediaStream/etc tests (2.81 KB, patch)
2012-02-23 07:12 PST, Randell Jesup [:jesup]
rjesup: feedback+
Part 8: Tentative support for StreamListener::NotifyQueued (4.25 KB, patch)
2012-02-23 07:12 PST, Randell Jesup [:jesup]
rjesup: feedback+
fix to TimeVarying.h for warnings-as-errors (1.09 KB, patch)
2012-03-13 20:24 PDT, Randell Jesup [:jesup]
tterribe: feedback+
Part 0: Refactor nsMediaCache's handling of principals into a helper method in nsContentUtils (8.84 KB, patch)
2012-03-27 21:03 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
bzbarsky: review+
Part 1: Create TimeVarying<T> class to represent values that change over time (7.88 KB, patch)
2012-03-27 21:04 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
rjesup: review+
Part 2: Create MediaSegment, AudioSegment and VideoSegment classes to manage intervals of media data (30.25 KB, patch)
2012-03-27 21:05 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
rjesup: review+
Part 3: Create MediaStream and MediaGraphManager for internal management of real-time media processing (76.15 KB, patch)
2012-03-27 21:06 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
rjesup: review+
Part 4: Create nsDOMMediaStream, a DOM object wrapping an underlying MediaStream (8.59 KB, patch)
2012-03-27 21:07 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
rjesup: review+
bugs: review+
Part 5: Create SourceMediaStream, a MediaStream with an API allowing data to be injected into it by some source (15.96 KB, patch)
2012-03-27 21:08 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
rjesup: review+
Part 6: ImageContainer::GetCurrentAsSurface shouldn't crash when mActiveImage is null (1.06 KB, patch)
2012-03-27 21:08 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
bas: review+
Part 7: Let the "src" attribute of HTML media elements accept a MediaStream DOM object, to make the media element play back the contents of the given stream (36.40 KB, patch)
2012-03-27 21:09 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
cpearce: review+
rjesup: review+
Part 8: Add mozCaptureStream()/mozCaptureStreamUntilEnded() APIs to HTML media elements, returning a MediaStream representing the contents of the media element (90.85 KB, patch)
2012-03-27 21:11 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
cpearce: review-
Part 9: Tentative support for MediaStreamListener::NotifyQueuedTrackChanges (3.49 KB, patch)
2012-03-27 21:12 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
rjesup: review+
Part 10: Add test_streams_element_capture.html to test capturing a media element into a stream and playing that stream through another media element (3.15 KB, patch)
2012-03-27 21:13 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
rjesup: review+
Part 0: Refactor nsMediaCache's handling of principals into a helper method in nsContentUtils (9.93 KB, patch)
2012-04-02 21:43 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
bzbarsky: review+
Part 8: Add mozCaptureStream()/mozCaptureStreamUntilEnded() APIs to HTML media elements, returning a MediaStream representing the contents of the media element (83.13 KB, patch)
2012-04-02 21:44 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
cpearce: review+
rjesup: review+
Part 11: refactor code that shuts down mDecoder (5.24 KB, patch)
2012-04-27 05:32 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
cpearce: review+
Part 12: finish output media streams when the decoder is destroyed (8.81 KB, patch)
2012-04-27 05:41 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
cpearce: review+
rjesup: review+

Description Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-16 21:02:50 PDT
See http://weblogs.mozillazine.org/roc/archives/2011/06/media_processin.html.
Comment 1 Dietrich Ayala (:dietrich) 2011-11-16 18:10:43 PST
Any update on how this is going?

Being the only (contemporary) browser without basic looping support is no fun :)
Comment 2 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-02-15 21:41:55 PST
Created attachment 597684 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time
Comment 3 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-02-15 21:44:33 PST
Created attachment 597686 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time
Comment 4 Randell Jesup [:jesup] 2012-02-22 21:22:04 PST
Created attachment 599867 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time
Comment 5 Randell Jesup [:jesup] 2012-02-23 07:05:48 PST
Created attachment 599997 [details] [diff] [review]
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams.
Comment 6 Randell Jesup [:jesup] 2012-02-23 07:08:11 PST
Created attachment 599998 [details] [diff] [review]
Part 3: Support MediaStream DOM object
Comment 7 Randell Jesup [:jesup] 2012-02-23 07:09:07 PST
Created attachment 599999 [details] [diff] [review]
Part 4: Add InputStreams --- streams that can have data written into them by some media data producer.
Comment 8 Randell Jesup [:jesup] 2012-02-23 07:10:07 PST
Created attachment 600002 [details] [diff] [review]
Part 5: Let the src attribute of HTML media elements accept a MediaStream object, so that the media element plays back the contents of the given stream
Comment 9 Randell Jesup [:jesup] 2012-02-23 07:11:10 PST
Created attachment 600003 [details] [diff] [review]
Part 6: Add captureStream()/captureStreamUntilEnded() APIs to HTML media elements, returning a stream representing the contents of the media element.
Comment 10 Randell Jesup [:jesup] 2012-02-23 07:12:06 PST
Created attachment 600004 [details] [diff] [review]
Part 7: MediaStream/etc tests
Comment 11 Randell Jesup [:jesup] 2012-02-23 07:12:53 PST
Created attachment 600005 [details] [diff] [review]
Part 8: Tentative support for StreamListener::NotifyQueued
Comment 12 Randell Jesup [:jesup] 2012-02-23 07:16:21 PST
Ok, all those patches apply cleanly to alder as of today (and should apply cleanly to m-c as of yesterday).  They're taken from roc's patch-queue he shared for this work, and won't be considered for landing on m-c until he gets back.  I will probably land them on alder shortly, however, so that everyone doesn't have to constantly be pushing and popping them.
Comment 13 Randell Jesup [:jesup] 2012-02-24 11:26:16 PST
Comment on attachment 599867 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time

I realize you currently have semantics defined in Processed MediaStreams where timed setters null out all changes after the time specified, but I can see considerable utility for not nulling out all following changes (especially if this very generic class is used to address other problems).

I'd suggest SetAt() have an optional "clear all following sets" boolean; it can default to true if you like.

This would imply possibly adding a Merge method in addition to Append.

Given it's potentially widely usable, perhaps it should live lower down in our class hierarchy.

I see none of these comments/requests as mandatory; just suggestions.

Other than that, feedback+
Comment 14 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-02-28 04:39:42 PST
(In reply to Randell Jesup [:jesup] from comment #13)
> I realize you currently have semantics defined in Processed MediaStreams
> where timed setters null out all changes after the time specified, but I can
> see considerable utility for not nulling out all following changes
> (especially if this very generic class is used to address other problems).
> 
> I'd suggest SetAt() have an optional "clear all following sets" boolean; it
> can default to true if you like.

Boolean parameters are bad. I should just rename SetAt to SetAfter and if we need another method that doesn't change the entire future, we'll add one.
Comment 15 Randell Jesup [:jesup] 2012-02-28 09:42:31 PST
Agreed: SetAfter() (though we should be careful that people don't assume the value changes only *after* the specified time... it's wordy, but you could use SetAtAndAfter()). But I agree - separate methods are good; reserve SetAt() for the version that doesn't clear out following changes.
Comment 16 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-02-28 10:06:39 PST
I made it SetAtAndAfter.
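
To make the two semantics concrete, here is a minimal sketch of the setters being discussed — a hypothetical simplification using standard C++ types, not the actual TimeVarying<T> from the patch:

#include <algorithm>
#include <cstdint>
#include <vector>

template <typename T>
class TimeVaryingSketch {
public:
  // Set the value at aTime and all later times, discarding any changes
  // previously scheduled at or after aTime (the semantics chosen above).
  void SetAtAndAfter(int64_t aTime, const T& aValue)
  {
    while (!mChanges.empty() && mChanges.back().mTime >= aTime) {
      mChanges.pop_back();
    }
    mChanges.push_back(Change{aTime, aValue});
  }
  // The name reserved for a setter that does NOT clear later changes:
  // insert a change at aTime and keep everything scheduled after it.
  void SetAt(int64_t aTime, const T& aValue)
  {
    auto it = std::lower_bound(mChanges.begin(), mChanges.end(), aTime,
        [](const Change& c, int64_t t) { return c.mTime < t; });
    mChanges.insert(it, Change{aTime, aValue});
  }
  // The value in effect at aTime is the last change at or before it.
  T GetAt(int64_t aTime, const T& aDefault) const
  {
    T value = aDefault;
    for (const Change& c : mChanges) {
      if (c.mTime > aTime) {
        break;
      }
      value = c.mValue;
    }
    return value;
  }
private:
  struct Change { int64_t mTime; T mValue; };
  std::vector<Change> mChanges; // sorted by mTime
};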
Comment 17 Randell Jesup [:jesup] 2012-02-28 22:45:34 PST
Comment on attachment 599997 [details] [diff] [review]
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams.

This section has me concerned due to latency:

+/**
+ * This is how much data we should try to have precomputed at any given time.
+ * If we've buffered K milliseconds, then we might write AUDIO_WRITTEN_TARGET_MS out,
+ * sleep for that long, wake up, do control processing (taking GRAPH_MANAGER_CONTROL_DELAY_MS),
+ * queue AUDIO_WRITTEN_TARGET_MS more data without blocking, and
+ * trigger computation of the next needed data.
+ */
+static const int BUFFER_TARGET_MS = AUDIO_WRITTEN_TARGET_MS*2 + GRAPH_MANAGER_CONTROL_DELAY_MS;

With 30 and 10ms for those constants, that means a target for buffering in MediaStreams of 70ms.  That's a LOT - that's 1/2 our mouth-to-ear over-the-network budget (for good quality conversation) right there (150ms is the knee in the quality curve, where people start really noticing delay and start to talk over one another).

What are the prospects for running shorter delays?  How risky is it from an underflow perspective?  Can we make the delay adaptive, where it runs longer delays if we get "too many" underflows?

+ * Allow video frames to be late by this much. Helps avoid unnecessary wakeups
+ * by giving us some scheduling slack.
+ */
+static const int ALLOW_VIDEO_FRAME_DELAY_MS = 5;

It's used like this:

+    // Don't wake up to play the next frame before ALLOW_VIDEO_FRAME_DELAY_MS from now
+    EnsureWakeUpAt(NS_MAX(mCurrentTime + ALLOW_VIDEO_FRAME_DELAY_MS*1000,
+                          frameEndWithBlocking));

The problem is that frame display times want to be as consistent as possible, and it's better to drop a frame and keep sync than to play a frame 'late'.  I realize there are all sorts of constraints on video frame display, but the more consistent the timing, and the later the decision about which video frame to display, the better.

Also, humans are much more adept at noticing audio leading video (which is 'unnatural'), and not so much video leading audio (since that's similar to hearing a speech or sound from a distance, a very common ('natural') situation).

If there's any way to push the "which frame" decision closer to the actual display point, so much the better.  In past uses we would pre-decode up to 3 or 4 frames, and at the last ms in the compositor we'd select the 'best' frame from those available. The compositor ran in a PLL against video refresh, and would schedule a wakeup to composite the output just enough ms ahead of vblank/frame-refresh to avoid missing a frame (adaptively).  Not sure we can do anything approaching this (perhaps not), and if we did it would involve the GFX team I imagine (deeply).


In the current algorithm, if we're running late enough that we're worried about not having time to wait, we should consider dropping the current frame and substituting the next frame for it, a smidge early.
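
A hedged sketch of the late frame-selection idea described above, assuming a small queue of pre-decoded frames; all names here are hypothetical:

#include <cstdint>
#include <vector>

struct DecodedFrame {
  int64_t mPresentationTimeUs; // when this frame is due to be shown
  // ... image payload elided ...
};

// At composite time, pick the latest frame that is due at or before now.
// Earlier frames are implicitly dropped, preferring A/V sync over
// showing every frame late.
const DecodedFrame*
PickFrameForComposite(const std::vector<DecodedFrame>& aQueue,
                      int64_t aCompositeTimeUs)
{
  const DecodedFrame* best = nullptr;
  for (const DecodedFrame& f : aQueue) {
    if (f.mPresentationTimeUs <= aCompositeTimeUs) {
      best = &f; // keep advancing to the latest frame that is due
    }
  }
  return best; // null if nothing is due yet
}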

Still reviewing...
Comment 18 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-02-29 16:38:01 PST
(In reply to Randell Jesup [:jesup] from comment #17)
> What are the prospects for running shorter delays?  How risky is it from an
> underflow perspective?  Can we make the delay adaptive, where it runs longer
> delays if we get "too many" underflows?

Yes, we need to make the delay adaptive.
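One plausible shape for that adaptivity, as a hedged sketch (the class, names, and constants below are hypothetical, not from these patches): grow the buffering target on underrun, and creep back down while playback is stable.

#include <cstdint>

class AdaptiveBufferTarget {
public:
  int64_t TargetMs() const { return mTargetMs; }
  // Called when an audio underrun is detected: back off aggressively.
  void NoteUnderrun()
  {
    mTargetMs *= 2;
    if (mTargetMs > kMaxMs) {
      mTargetMs = kMaxMs;
    }
  }
  // Called periodically while playback is underrun-free: slowly trade
  // buffer back for latency.
  void NoteStablePeriod()
  {
    mTargetMs -= 5;
    if (mTargetMs < kMinMs) {
      mTargetMs = kMinMs;
    }
  }
private:
  static constexpr int64_t kMinMs = 30;  // hypothetical floor
  static constexpr int64_t kMaxMs = 200; // hypothetical ceiling
  int64_t mTargetMs = 70; // the 70ms target questioned above
};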

> If there's any way to push the "which frame" decision closer to the actual
> display point, the better.  In past uses we would pre-decode up to 3 or 4
> frames, and at the last ms in the compositor we'd select the 'best' frame
> from those available. The compositor ran in a PLL against video refresh, and
> would schedule a wakeup to composite the output just enough ms ahead of
> vblank/frame-refresh to avoid missing a frame (adaptively).  Not sure we can
> do anything approaching this (perhaps not), and if we did it would involve
> the GFX team I imagine (deeply).

We haven't seen a need to do that so far in the A/V playback engine. But you have a good point about preferring video leading audio, one that I wasn't aware of.

We need to make sure we don't wake up too often just to update a video frame, but I can probably fix it another way.
Comment 19 Randell Jesup [:jesup] 2012-02-29 22:20:38 PST
Comment on attachment 599997 [details] [diff] [review]
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams.

+GraphManagerImpl::ChooseActionTime()
+{
+  // Our actions should take effect as soon as possible, but we need to make sure
+  // that any playing audio stream which is affected by our actions will be
+  // able to recompute its data without under-running. Right now we have to
+  // estimate how long it will take us to recompute the data --- the estimate
+  // is DEFAULT_BUFFERING_MS.
+  mLastActionTime = GetEarliestActionTime();

This comment seems incorrect given the code (and there is no DEFAULT_BUFFERING_MS).

+  // End time of the audio that has been written to the hardware. The written
+  // audio can be a mix of silence (while blocking) and actual samples.
+  PRInt64 mAudioWrittenTime;

Just a question: I remember kinetik talking about Pulse allowing changing audio data after it's been written to Pulse, but hasn't gone to the hardware.  Is that worth considering, or does that just seriously mess up everything (probably it does, and especially echo cancellation if it's not being done inside pulse).  Or is that the un-implemented PruneDataStream()?

+GraphManagerImpl::PlayAudio(Stream* aStream)
...
+  float volume = 0.0;
+  for (PRUint32 i = 0; i < aStream->mAudioOutputs.Length(); ++i) {
+    volume += aStream->mAudioOutputs[i].mVolume;
+  }

This seems...odd; perhaps I don't understand the usage.  Multiplicative would make some sense (since 1.0f is the default), or an average perhaps, but additive confused me.

+ * A stream of synchronized audio and video data. All (not blocked) streams
+ * progress at the same rate --- "real time". Streams cannot seek. The only
+ * operation readers can perform on a stream is to read the next data.

As discussed by email/IRC, there are issues here dealing with real-time or otherwise uncontrolled sources (and sinks); for example taking input from a USB mic or remote audio stream and feeding to motherboard-connected speakers; even if theoretically the same sampling rate, they will drift and so some type of compensation (resampling, etc) will eventually be needed.

+/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*-
+/* This Source Code Form is subject to the terms of the Mozilla Public
+ * License, v. 2.0. If a copy of the MPL was not distributed with this file,
+ * You can obtain one at http://mozilla.org/MPL/2.0/. */

That appears all over; it causes compiler warnings due to /* within a comment.  Please close out the emacs mode line comment.

+static PRInt16
+FlipByteOrderIfBigEndian(PRInt16 aValue)
+{
+  short s = aValue;
+#if defined(IS_BIG_ENDIAN)
+  s = ((s & 0x00ff) << 8) | ((s & 0xff00) >> 8);
+#endif
+  return s;
+}


Normally I'd look to have the function take a PRUint16 instead of PRInt16.

Various GCC versions generate better code if this is coded as:
   PRUint16 s = (PRUint16) aValue;
   return (PRInt16) ((s << 8) | (s >> 8));
Shouldn't make a difference, but it does (in some GCCs).  MSVC, Mac and Linux have optimized byte-swapping macros (gcc has bswap_16() - #include <byteswap.h>).
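
For concreteness, the suggested form written out as a full function — a sketch of the suggestion above, assuming NSPR's prtypes.h for the PR* types; the patch's final code may differ:

static PRInt16
FlipByteOrderIfBigEndian(PRInt16 aValue)
{
#if defined(IS_BIG_ENDIAN)
  // Swap via an unsigned type so the right shift doesn't drag the sign
  // bit in; some GCC versions recognize this as a byte-swap idiom.
  PRUint16 s = (PRUint16) aValue;
  return (PRInt16) ((s << 8) | (s >> 8));
#else
  return aValue;
#endif
}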

+FloatToSample(float aValue, PRUint8* aOut)
+{
+  PRInt32 v = PRInt32(aValue*128 + 128);
+  *aOut = PRUint8(NS_MIN(v, 255));
+}

Either we're concerned about samples out of the +-1.0f range (as shown above) or we're not and can use a simpler conversion.  Since we are concerned, the code here aliases on clipping in the negative direction:

2.0f  -> 256+128=384   -> NS_MIN(384,255)  -> PRUint8(255)  = 255 (ok)
-2.0f -> -256+128=-128 -> NS_MIN(-128,255) -> PRUint8(-128) = 128 (oops)
1.5f  -> 192+128=320   -> NS_MIN(320,255)  -> PRUint8(255)  = 255 (ok)
-1.5f -> -192+128=-64  -> NS_MIN(-64,255)  -> PRUint8(-64)  = 192 (oops)

A similar problem applies to the 16-bit version:

+FloatToSample(float aValue, PRInt16* aOut)
+{
+  PRInt32 v = PRInt32(aValue*32768.0f);
+  *aOut = FlipByteOrderIfBigEndian(PRInt16(NS_MIN(v, 32767)));
+}

This converts to PRInt16, not PRUint16.  However, we still clamp only the positive side, and let the negative side alias.  The numeric results are slightly different due to sign bits, but the problem still exists.

I'm sure there are other or better references (jmspeex should know), but one is:
http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html
and
http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html

SampleToFloat() functions are ok I believe, though that isn't the only reasonable conversion (see links); the choice of conversion should at least be explicit and documented.
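
To illustrate, a sketch of a conversion that clamps both sides before narrowing — one reasonable choice among those discussed in the links, with byte-swapping omitted and NSPR types from prtypes.h assumed:

static void
FloatToSampleClamped(float aValue, PRInt16* aOut)
{
  // Clamp to [-1.0, 1.0] first so negative overflow can't alias.
  float v = aValue;
  if (v > 1.0f) {
    v = 1.0f;
  } else if (v < -1.0f) {
    v = -1.0f;
  }
  // 2^15 scaling; +1.0f maps to 32768, which overflows PRInt16,
  // so clamp the positive extreme after scaling.
  PRInt32 s = PRInt32(v * 32768.0f);
  if (s > 32767) {
    s = 32767;
  }
  *aOut = PRInt16(s);
}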

+float*
+AudioFrame::GetWritableBuffer(PRInt32 aChannels)
...
+  } else {
+    memset(buf, 0, len);
+  }
+  mBuffer = new AudioBuffer(mDuration, aChannels,
+                            nsAudioStream::FORMAT_FLOAT32, buf);
+  mOffset = 0;
+  return buf;
+}

Please comment that binary 0's are 0.0 (float or double) in IEEE 754 FP.  In theory other FP architectures might break on this; in practice not likely.

+void
+StreamBufferMultiplyFloats(float* aBase, float aComputed)
+{
+  *aBase *= aComputed;
+}

Defined (though rather unclear why), and not referenced.

I'm a little unsure about exactly what AppendAudio() and AppendVideo() are supposed to be doing, so it's hard to evaluate them without delving deep into the class structure to understand why some frames are suppressed.

+InterleaveAndConvertBuffer(const SrcT* aSource, PRInt32 aChannels, PRInt32 aLength,
+                           PRInt32 aStride, DestT* aOutput, float aVolume)
+{
+  DestT* output = aOutput;
+  for (PRInt32 i = 0; i < aLength; ++i) {
+    for (PRInt32 channel = 0; channel < aChannels; ++channel) {
+      float v = SampleToFloat(aSource[channel*aStride + i])*aVolume;
+      v = NS_MAX(v, -1.0f);
+      v = NS_MIN(v, 1.0f);
+      FloatToSample(v, output);
+      ++output;
+    }
+  }
+}

This clamps the floats and then converts; avoiding the problem above, but with this clamping the code to clamp in FloatToSample is unneeded and a waste of cycles.

+InterleaveAndConvertBuffer(const PRInt16* aSource, PRInt32 aChannels, PRInt32 aLength,
+                           PRInt32 aStride, PRInt16* aOutput, float aVolume)
...
+      short s = aSource[channel*aStride + i];
+#if defined(IS_BIG_ENDIAN)
+      s = ((s & 0x00ff) << 8) | ((s & 0xff00) >> 8);
+#endif

Same comment as above
Also, why does the code often use "short" instead of PRInt16? (there are cases with similar types as well)

+  nsAutoTArray<PRUint8,50000> buf;

Could we move these arbitrary constants up to a #define, and justify the value?  Also, that's a 50K stack allocation... I understand the reason, but still, it's big.

My apologies, MEGO at around 2800 lines into the patch (~80%).  I'm sure I skimmed some parts I shouldn't have; it's late.

feedback- due to the aliasing problem at least (though it's unclear if this will affect things in practice immediately).
Comment 20 Randell Jesup [:jesup] 2012-03-01 13:00:59 PST
Comment on attachment 599997 [details] [diff] [review]
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams.

+void CombineAudioFrameSamples(float* aBuffer, PRInt32 aBufferDuration,
+                              PRInt32 aChannels,
+                              nsTArray<AudioFrame>& aAudio)
...
+      for (PRInt32 channel = 0; channel < aChannels; ++channel) {
+        if (volume == 1.0f) {
+          for (PRInt32 j = 0; j < duration; ++j) {
+            Combine(dest + j, src[j]);
+          }
+        } else {
+          for (PRInt32 j = 0; j < duration; ++j) {
+            Combine(dest + j, src[j]*volume);
+          }
+        }

Might be a smidge more efficient to move the FP test out of the outer loop:

+      if (volume == 1.0f) {
+        for (PRInt32 channel = 0; channel < aChannels; ++channel) {
+          for (PRInt32 j = 0; j < duration; ++j) {
+            Combine(dest + j, src[j]);
+          }
+        }
+      } else {
+        for (PRInt32 channel = 0; channel < aChannels; ++channel) {
+          for (PRInt32 j = 0; j < duration; ++j) {
+            Combine(dest + j, src[j]*volume);
+          }
+        }
+      }

But probably not worth it, and the compiler might well make it moot anyways.

+typedef void (* CombineAudioValues)(float* aDest, float aSrc);
+template <CombineAudioValues Combine>
+void CombineAudioFrameSamples(float* aBuffer, PRInt32 aBufferDuration,
+                              PRInt32 aChannels,
+                              nsTArray<AudioFrame>& aAudio)

This doesn't appear to be used anywhere.

+ * XXX This is the worst possible implementation. We average the channels
+ * to get mono, then we duplicate that across all channels with some super-dumb
+ * linear-interpolation resampling. This must be replaced with something that
+ * doesn't suck!

Hmmm, sounds like an action item/follow-up-bug.  :-)

+  enum {
+    // Always keep around at least 5ms of audio data (if available) to ensure
+    // that we can resample accurately as required
+    AUDIO_MIN_MS = 5
+  };

a) this isn't used anywhere
b) Is this really what we'd want to do?  seems a bit imprecise.

+  void ConsumeInput(nsTArray<AudioFrame>* aOutput)

I worry about the allocation pattern this function may produce, where the outBuf TArray allocates perhaps multiple times to hold samples, then we moz_xalloc a new buffer and memcpy all the data over.  Especially as the output size *should* be computable from the input size.

(it comes to mind that some of these "unused" functions may well be for the MediaStream Processing code, which I don't have applied)

And I finally come to the end of media-graph!
Comment 21 Randell Jesup [:jesup] 2012-03-01 13:45:03 PST
Comment on attachment 599997 [details] [diff] [review]
Part 2: Basic StreamBuffer structures for holding audio and video data, including audio resampling support. Basic stream GraphManager infrastructure for routing media data through Streams.

Oh, one more: ehsan just revised the recommendations for namespaces, and we're trying to avoid second-level namespaces below mozilla except in specific instances, so the mozilla::media namespace should probably go away.
Comment 22 Randell Jesup [:jesup] 2012-03-01 13:48:00 PST
Comment on attachment 599998 [details] [diff] [review]
Part 3: Support MediaStream DOM object

Looks good to me modulo the namespace thing
Comment 23 Randell Jesup [:jesup] 2012-03-01 14:15:25 PST
Comment on attachment 599999 [details] [diff] [review]
Part 4: Add InputStreams --- streams that can have data written into them by some media data producer.

+void
+GraphManagerImpl::ExtractPendingInput(InputStream* aStream)

Perhaps some comments, though this can be figured out from the code and the name.

+void
+InputStream::Finish()
+{
+  {
+    MutexAutoLock lock(mMutex);
+    mPendingFinished = true;
+  }
+  gManager->WakeUp(ALLOW_DELAY_AFTER_OUTPUT_MS*1000);
+}

(this define is 2ms) - these sorts of delays make me uneasy; the stream is finished or it isn't; shouldn't the wakeup of the graph manager be triggered when the InputStream is really finished?  If we're worried about too many of them triggering independent wakeups, they can be batched in some manner.  If there's a good reason for the 2ms delay, let's document that and why it's 2ms.

+ * XXX This should probably be called SourceStream.

Agreed.

+  // Returns true if the buffer currently "enough"
+  // microseconds of audio data. If it returns true, then when the buffer drops

There's a missing word in the first line I think
Comment 24 Randell Jesup [:jesup] 2012-03-01 14:28:08 PST
Comment on attachment 600002 [details] [diff] [review]
Part 5: Let the src attribute of HTML media elements accept a MediaStream object, so that the media element plays back the contents of the given stream

Good by me, though it will need real review by someone else for m-c (who knows the media elements better).
Comment 25 Randell Jesup [:jesup] 2012-03-01 15:14:26 PST
Comment on attachment 600003 [details] [diff] [review]
Part 6: Add captureStream()/captureStreamUntilEnded() APIs to HTML media elements, returning a stream representing the contents of the media element.

+  // This logic has to mimic AudioLoop closely to make sure we're write
+  // the exact same silences

Minor English.

+static void WriteSilence(nsAudioStream* aStream, PRUint32 aFrames)
+{
+  PRUint32 numSamples = aFrames * aStream->GetChannels();
+  nsAutoTArray<AudioDataValue, 1000> buf;
+  buf.SetLength(numSamples);
+  memset(buf.Elements(), 0, numSamples * sizeof(AudioDataValue));
+  aStream->Write(buf.Elements(), aFrames);

Is 1000 the right value?  What sizes do we expect to see here?  What chance is there that we'll ever do an allocation due to SetLength(), and how much space are we wasting on the stack?

Would it be more efficient to do it the "old" way:
-    nsAutoArrayPtr<AudioDataValue> buf(new AudioDataValue[numSamples]);
Comment 26 Randell Jesup [:jesup] 2012-03-01 15:18:08 PST
Comment on attachment 600005 [details] [diff] [review]
Part 8: Tentative support for StreamListener::NotifyQueued

Seems good as far as it goes
Comment 27 Randell Jesup [:jesup] 2012-03-13 20:21:56 PDT
Comment on attachment 599867 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time

In TimeVarying.h:
+    for (PRInt32 i = mChanges.Length() - 1; i >= 0; --i) {
+      NS_ASSERTION(i == mChanges.Length() - 1, "Always considering last element of array");

The NS_ASSERTION needs a cast to avoid erroring out on builders.
Comment 28 Randell Jesup [:jesup] 2012-03-13 20:24:59 PDT
Created attachment 605642 [details] [diff] [review]
fix to TimeVarying.h for warnings-as-errors
Comment 29 Timothy B. Terriberry (:derf) 2012-03-13 21:07:13 PDT
Comment on attachment 605642 [details] [diff] [review]
fix to TimeVarying.h for warnings-as-errors

Review of attachment 605642 [details] [diff] [review]:
-----------------------------------------------------------------

f+ with one nit.

::: content/media/TimeVarying.h
@@ +39,5 @@
>     */
>    void SetAt(PRInt64 aTime, const T& aValue)
>    {
>      for (PRInt32 i = mChanges.Length() - 1; i >= 0; --i) {
> +      NS_ASSERTION(i == (PRInt32) mChanges.Length() - 1, "Always considering last element of array");

Suggest wrapping the line at 80 chars.
Comment 30 Timothy B. Terriberry (:derf) 2012-03-14 06:57:01 PDT
(In reply to Timothy B. Terriberry (:derf) from comment #29)
> > +      NS_ASSERTION(i == (PRInt32) mChanges.Length() - 1, "Always considering last element 

It occurred to me afterwards (too late) that this should also be a static_cast, instead of a C-style cast. I spend so much time working with C that I forget we're allergic to C-style casts in C++ code.
Comment 31 Randell Jesup [:jesup] 2012-03-14 10:54:29 PDT
Roc will be revising his patches based on my changes, feedback from myself and the other media guys, etc, so I'm sure he can do a better fix.  

A lot of me still thinks in C as well, as I worked mostly with C code at WG.
Comment 32 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-22 02:19:32 PDT
Everything fixed except for what's mentioned below.

(In reply to Randell Jesup [:jesup] from comment #19)
> +  // End time of the audio that has been written to the hardware. The
> written
> +  // audio can be a mix of silence (while blocking) and actual samples.
> +  PRInt64 mAudioWrittenTime;
> 
> Just a question: I remember kinetik talking about Pulse allowing changing
> audio data after it's been written to Pulse, but hasn't gone to the
> hardware.  Is that worth considering, or does that just seriously mess up
> everything (probably it does, and especially echo cancellation if it's not
> being done inside pulse).  Or is that the un-implemented PruneDataStream()?

That doesn't work on most platforms. Let's not do it.

> +GraphManagerImpl::PlayAudio(Stream* aStream)
> ...
> +  float volume = 0.0;
> +  for (PRUint32 i = 0; i < aStream->mAudioOutputs.Length(); ++i) {
> +    volume += aStream->mAudioOutputs[i].mVolume;
> +  }
> 
> This seems...odd; perhaps I don't understand the usage.  Multiplicative
> would make some sense (since 1.0f is the default), or an average perhaps,
> but additive confused me.

Each element mAudioOutputs represents an independent <audio> element playing back the same stream. Jean-Marc told me that, since these streams are perfectly correlated, adding the volumes together for a single output stream is the same as playing multiple output streams with those volumes.
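
Written out, the linearity argument for perfectly correlated copies of one stream with samples $s(t)$ and per-element volumes $v_i$ is just

  $\sum_i v_i \, s(t) = \Bigl(\sum_i v_i\Bigr) s(t)$

so playing one mixed stream at the summed volume is sample-for-sample identical to playing each copy at its own volume.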

> + * A stream of synchronized audio and video data. All (not blocked) streams
> + * progress at the same rate --- "real time". Streams cannot seek. The only
> + * operation readers can perform on a stream is to read the next data.
> 
> As discussed by email/IRC, there are issues here dealing with real-time or
> otherwise uncontrolled sources (and sinks); for example taking input from a
> USB mic or remote audio stream and feeding to motherboard-connected
> speakers; even if theoretically the same sampling rate, they will drift and
> so some type of compensation (resampling, etc) will eventually be needed.

Yeah. This definition stands, and it's up to graph inputs to make sure they resample as necessary to avoid drift.

> Various GCC versions generate better code if this is coded as:
>    PRUInt16 s = (PRUInt16) aValue;
>    return (PRInt16) ((s << 8) | (s >> 8));
> Shouldn't make a difference, but it does (in some GCCs).  MSVC, Mac and
> Linux have optimized byte-swapping macros (gcc has bswap_16() - #include
> <byteswap.h>).

I care this much about big-endian performance: ><, but fixed anyway.

> +FloatToSample(float aValue, PRUint8* aOut)
> +{
> +  PRInt32 v = PRInt32(aValue*128 + 128);
> +  *aOut = PRUint8(NS_MIN(v, 255));
> +}
> 
> Either were concerned about samples out of the +-1.0f range (as shown above)
> or we're not and can use a simpler conversion.  Since we are concerned, the
> code here aliases on clipping in the negative direction:
> 
> 2.0f  -> 256+128=384   -> NS_MIN(384,255)  -> PRUInt8(255)  = 255 (ok)
> -2.0f -> -256+128=-128 -> NS_MIN(-128,255) -> PRUInt8(-128) = 128 (oops)
> 1.5f  -> 192+128=320   -> NS_MIN(320,255)  -> PRUInt8(255)  = 255 (ok)
> -1.5f -> -192+128=-64  -> NS_MIN(-64,255)  -> PRUInt8(-64)  = 192 (oops)
> 
> A similar problem applies to the 16-bit version:
> 
> +FloatToSample(float aValue, PRInt16* aOut)
> +{
> +  PRInt32 v = PRInt32(aValue*32768.0f);
> +  *aOut = FlipByteOrderIfBigEndian(PRInt16(NS_MIN(v, 32767)));
> +}
> 
> This converts to PRInt16, not PRUInt16.  However, we still clamp only the
> positive side, and let the negative side alias.  The numeric results are
> slightly different due to sign bits, but the problem still exists.
> 
> I'm sure there are other or better references (jmspeex should know), but one
> is:
> http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html
> and
> http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html

Wow, what a tar pit. I'll fix for out-of-range cases and use a 2^N conversion. Someone who cares more than I do and is willing to fight the flame wars can patch it afterwards.

> +float*
> +AudioFrame::GetWritableBuffer(PRInt32 aChannels)
> ...
> +  } else {
> +    memset(buf, 0, len);
> +  }
> +  mBuffer = new AudioBuffer(mDuration, aChannels,
> +                            nsAudioStream::FORMAT_FLOAT32, buf);
> +  mOffset = 0;
> +  return buf;
> +}
> 
> Please comment that binary 0's are 0.0 (float or double) in IEEE 754 FP.  In
> theory other FP architectures might break on this; in practice not likely.

Right. This is only used for processing, so I'm deferring that until a little bit later (not this bug).

> Defined (though rather unclear why), and not referenced.

Will remove.

(In reply to Randell Jesup [:jesup] from comment #20)
> +void CombineAudioFrameSamples(float* aBuffer, PRInt32 aBufferDuration,
> +                              PRInt32 aChannels,
> +                              nsTArray<AudioFrame>& aAudio)

Deferring to processing patches.

> +typedef void (* CombineAudioValues)(float* aDest, float aSrc);
> +template <CombineAudioValues Combine>
> +void CombineAudioFrameSamples(float* aBuffer, PRInt32 aBufferDuration,
> +                              PRInt32 aChannels,
> +                              nsTArray<AudioFrame>& aAudio)
> 
> This doesn't appear to be used anywhere.

Removed.

> + * XXX This is the worst possible implementation. We average the channels
> + * to get mono, then we duplicate that across all channels with some
> super-dumb
> + * linear-interpolation resampling. This must be replaced with something
> that
> + * doesn't suck!
> 
> Hmmm, sounds like an action item/follow-up-bug.  :-)

Yes.

> +  void ConsumeInput(nsTArray<AudioFrame>* aOutput)
> 
> I worry about the allocation pattern this function may produce; where the
> outBuf TArray allocates perhaps multiple times to hold samples, then re
> moz_xalloc a new buffer and memcpy all the data over.  Especially as the
> output size *should* be computable from the input size.

We never copy sample data, we just addref buffers.
Comment 33 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-22 02:27:51 PDT
(In reply to Randell Jesup [:jesup] from comment #21)
> Oh, one more:  eshan just revised the recommendations for namespaces, and
> we're trying to avoid second-level namespaces below mozilla except in
> specific instances, so the mozilla::media namespace should probably go away.

Yes, I've renamed a lot of things and done that.
Comment 34 Randell Jesup [:jesup] 2012-03-22 07:45:33 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #32)

> > +GraphManagerImpl::PlayAudio(Stream* aStream)
> > ...
> > +  float volume = 0.0;
> > +  for (PRUint32 i = 0; i < aStream->mAudioOutputs.Length(); ++i) {
> > +    volume += aStream->mAudioOutputs[i].mVolume;
> > +  }
> > 
> > This seems...odd; perhaps I don't understand the usage.  Multiplicative
> > would make some sense (since 1.0f is the default), or an average perhaps,
> > but additive confused me.
> 
> Each element mAudioOutputs represents an independent <audio> element playing
> back the same stream. Jean-Marc told me that, since these streams are
> perfectly correlated, adding the volumes together for a single output stream
> is the same as playing multiple output streams with those volumes.

Ah... So these are multiple copies of the same stream going to the same output, playing with the exact same timing, and we're collapsing them to one, louder version.  That wasn't clear (probably means a comment is in order).  Probably good to have jmspeex glance over this code anyways.

What's the use-case for this situation being created?

> 
> > + * A stream of synchronized audio and video data. All (not blocked) streams
> > + * progress at the same rate --- "real time". Streams cannot seek. The only
> > + * operation readers can perform on a stream is to read the next data.
> > 
> > As discussed by email/IRC, there are issues here dealing with real-time or
> > otherwise uncontrolled sources (and sinks); for example taking input from a
> > USB mic or remote audio stream and feeding to motherboard-connected
> > speakers; even if theoretically the same sampling rate, they will drift and
> > so some type of compensation (resampling, etc) will eventually be needed.
> 
> Yeah. This definition stands, and it's up to graph inputs to make sure they
> resample as necessary to avoid drift.

Let's think long and hard on this.  This is a fundamental, critical design point, and I would like to be very sure we understand all the ramifications and that this is the best way to do it (and that we've made sure they can do so properly - that they're getting the feedback they need to adapt).  This likely is the correct decision, but I'm just very wary of the rat's nest of issues that tie into it.

> > http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html
> > and
> > http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html
> 
> Wow, what a tar pit. I'll fix for out-of-range cases and use a 2^N
> conversion. Someone who cares more than I do and is willing to fight the
> flame wars can patch it afterwards.

Yeah, seriously.


> 
> > +float*
> > +AudioFrame::GetWritableBuffer(PRInt32 aChannels)
> > ...
> > +  } else {
> > +    memset(buf, 0, len);
> > +  }
> > +  mBuffer = new AudioBuffer(mDuration, aChannels,
> > +                            nsAudioStream::FORMAT_FLOAT32, buf);
> > +  mOffset = 0;
> > +  return buf;
> > +}
> > 
> > Please comment that binary 0's are 0.0 (float or double) in IEEE 754 FP.  In
> > theory other FP architectures might break on this; in practice not likely.
> 
> Right. This is only used for processing, so I'm deferring that until a
> little bit later (not this bug).

Sure; all that's needed here (and it's a trivial issue anyways) is a comment.
Comment 35 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-22 09:39:28 PDT
(In reply to Randell Jesup [:jesup] from comment #34)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #32)
> > Each element mAudioOutputs represents an independent <audio> element playing
> > back the same stream. Jean-Marc told me that, since these streams are
> > perfectly correlated, adding the volumes together for a single output stream
> > is the same as playing multiple output streams with those volumes.
> 
> Ah... So these are multiple copies of the same stream going to the same
> output, playing with the exact same timing, and we're collapsing them to
> one, louder version.  That wasn't clear (probably means a comment is in
> order).  Probably good to have jmspeex glance over this code anyways.

OK.

> What's the use-case for this situation being created?

var stream = ...;
var a1 = new Audio(stream);
var a2 = new Audio(stream);
a1.volume = ...;
a2.volume = ...;
a1.play(); a2.play();

Not really a use-case, but it's easier to add volumes than actually manage multiple nsAudioStreams.

> > Yeah. This definition stands, and it's up to graph inputs to make sure they
> > resample as necessary to avoid drift.
> 
> Lets think long and hard on this.  This is a fundamental, critical design
> point, and I would like to be very sure we understand all the ramifications
> and that this is the best way to do it (and that we've made sure they can do
> so properly - that they're getting the feedback they need to adapt).  This
> likely is the correct decision, but I'm just very wary of the rats-nest of
> issue that tie into it.

Sure. We did discuss it on the phone a while back.
Comment 36 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:03:45 PDT
Created attachment 610001 [details] [diff] [review]
Part 0: Refactor nsMediaCache's handling of principals into a helper method in nsContentUtils
Comment 37 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:04:21 PDT
Created attachment 610002 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time
Comment 38 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:05:38 PDT
Created attachment 610003 [details] [diff] [review]
Part 2: Create MediaSegment, AudioSegment and VideoSegment classes to manage intervals of media data
Comment 39 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:06:14 PDT
Created attachment 610004 [details] [diff] [review]
Part 3: Create MediaStream and MediaGraphManager for internal management of real-time media processing
Comment 40 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:07:04 PDT
Created attachment 610005 [details] [diff] [review]
Part 4: Create nsDOMMediaStream, a DOM object wrapping an underlying MediaStream
Comment 41 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:08:01 PDT
Created attachment 610006 [details] [diff] [review]
Part 5: Create SourceMediaStream, a MediaStream with an API allowing data to be injected into it by some source
Comment 42 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:08:33 PDT
Created attachment 610007 [details] [diff] [review]
Part 6: ImageContainer::GetCurrentAsSurface shouldn't crash when mActiveImage is null
Comment 43 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:09:36 PDT
Created attachment 610008 [details] [diff] [review]
Part 7: Let the "src" attribute of HTML media elements accept a MediaStream DOM object, to make the media element play back the contents of the given stream
Comment 44 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:11:36 PDT
Created attachment 610009 [details] [diff] [review]
Part 8: Add mozCaptureStream()/mozCaptureStreamUntilEnded() APIs to HTML media elements, returning a MediaStream representing the contents of the media element
Comment 45 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:12:35 PDT
Created attachment 610010 [details] [diff] [review]
Part 9: Tentative support for MediaStreamListener::NotifyQueuedTrackChanges
Comment 46 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:13:21 PDT
Created attachment 610011 [details] [diff] [review]
Part 10: Add test_streams_element_capture.html to test capturing a media element into a stream and playing that stream through another media element
Comment 47 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-27 21:24:27 PDT
These patches take into account all the comments above.

There are some pretty major changes compared to the first round of patches. Things that spring to mind:
-- A lot of code that was only used for ProcessedMediaStream has been removed. (I'll put it back later in a separate round of patches for ProcessedMediaStream.)
-- The old StreamBuffer implementation with its Audio/VideoFrames and fixed "one audio track, one video track" setup was totally overhauled. The new StreamBuffer supports dynamic addition and removal of any number of tracks of any kind. This actually simplified some of the code.
-- Logic around various timelines was simplified and clarified by adding explicit types "GraphTime", "StreamTime", "TrackTicks" and methods for converting between them (see the sketch after this list).
-- The graph thread scheduling logic was greatly simplified. The current approach is: try to run the graph thread control loop every 10ms unless there is nothing to do. Try to buffer enough to cover missing our wakeup deadline by up to 10ms.
-- An nsIPrincipal is associated with every nsDOMMediaStream. This tracks the origin of the stream data. Appropriate security checks are in place so that you can't use MediaStreams to get pixel data from non-same-origin videos.
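
A minimal sketch of what such explicit time types might look like — a hypothetical simplification using standard integer types, not the actual Gecko definitions of GraphTime/StreamTime/TrackTicks:

#include <cstdint>

typedef int64_t GraphTime;   // time on the graph's master clock
typedef int64_t StreamTime;  // time since this stream started playing
typedef int64_t TrackTicks;  // sample/frame count at a track's own rate
typedef int32_t TrackRate;   // ticks per second

// Keeping the types named (even as typedefs) makes a missing conversion
// show up in review; a conversion always has to name the track's rate.
static TrackTicks
StreamTimeToTicks(StreamTime aTimeUs, TrackRate aRate)
{
  // Assumes StreamTime is in microseconds here; real code would need to
  // watch for overflow with large times.
  return aTimeUs * aRate / 1000000;
}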

I think the weakest link here is part 8. I need to pretty quickly do some followup work to handle cases where the author monkeys with a media element that's generating a MediaStream.
Comment 48 Olli Pettay [:smaug] 2012-03-27 21:44:11 PDT
Comment on attachment 610005 [details] [diff] [review]
Part 4: Create nsDOMMediaStream, a DOM object wrapping an underlying MediaStream

So, MediaStream will get some member variables which will cause cycles.
Comment 49 Boris Zbarsky [:bz] 2012-03-28 02:54:20 PDT
Comment on attachment 610001 [details] [diff] [review]
Part 0: Refactor nsMediaCache's handling of principals into a helper method in nsContentUtils

It's tempting to mention something about computing the meet of the two principals, but I guess that's ok.

It might be worth exposing an isNull() method on nsIPrincipal.  A followup, please.

I think it's worth more clearly documenting that aResourcePrincipal is required to be non-null but *aResourcePrincipal may be null.  Or perhaps just make aResourcePrincipal an nsCOMPtr<nsIPrincipal>& and avoid that issue?  It might also be worth more clearly distinguishing between null _pointers_ and null _principals_ in the documentation...

Do you really want Equals here?  Seems like a Subsumes() check in one direction or other would make more sense.  And yes, I know you just copied this code...

r=me with the nits fixed.
Comment 50 Chris Pearce (:cpearce) 2012-03-28 15:15:41 PDT
Comment on attachment 610008 [details] [diff] [review]
Part 7: Let the "src" attribute of HTML media elements accept a MediaStream DOM object, to make the media element play back the contents of the given stream

Review of attachment 610008 [details] [diff] [review]:
-----------------------------------------------------------------

r=cpearce assuming you figure out what to do about SetCurrentTime().

::: content/html/content/src/nsHTMLMediaElement.cpp
@@ +1121,5 @@
>    NS_ENSURE_ARG_POINTER(aListener);
>  
>    *aListener = nsnull;
>  
> +  // Make sure we don't reenter during synchronous abort events

Append full stop.

@@ +1147,5 @@
>  NS_IMETHODIMP nsHTMLMediaElement::MozLoadFrom(nsIDOMHTMLMediaElement* aOther)
>  {
>    NS_ENSURE_ARG_POINTER(aOther);
>  
> +  // Make sure we don't reenter during synchronous abort events

Append full stop.

@@ +1209,5 @@
>  
> +  if (mStream) {
> +    // do nothing since streams aren't seekable; we effectively clamp to
> +    // the current time.
> +    SeekCompleted();

Doesn't this mean you'll dispatch a "seeked" event without having first dispatched a "seeking" event? Maybe you should call SeekStarted() before calling SeekCompleted()?

On the other hand the WhatWG spec for currentTime [1] says "On setting, if the media element has a current media controller, then the user agent must throw an InvalidStateError exception", so you could follow that example and throw an exception. This makes more sense to me, since MediaStreams aren't seekable, and I don't think it makes sense to be firing seeking/seeked events if we're not actually seeking.

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#dom-media-currenttime

@@ +1262,5 @@
>    return NS_OK;
>  }
>  
>  /* readonly attribute nsIDOMHTMLTimeRanges seekable; */
>  NS_IMETHODIMP nsHTMLMediaElement::GetSeekable(nsIDOMTimeRanges** aSeekable)

It looks like GetSeekable() and GetBuffered() would return empty ranges when playing streams. Presumably that's ok?

@@ +2202,2 @@
>    if (NS_FAILED(rv)) {
> +    LOG(PR_LOG_DEBUG, ("%p Failed to load decoder/stream for decoder %p", this, aDecoder));

s/decoder\/stream/decoder/ right? This is only called when setting up a decoder (otherwise you'd need to null check aDecoder above).

::: dom/interfaces/html/nsIDOMHTMLMediaElement.idl
@@ +56,5 @@
>  #undef GetCurrentTime
>  #endif
>  %}
>  
> +[scriptable, uuid(3e672e79-a0ea-45ef-87de-828402f1f6d7)]

You'll need to rev the uuids of classes that inherit MediaElement too.
Comment 51 Chris Pearce (:cpearce) 2012-03-28 19:32:43 PDT
Comment on attachment 610009 [details] [diff] [review]
Part 8: Add mozCaptureStream()/mozCaptureStreamUntilEnded() APIs to HTML media elements, returning a MediaStream representing the contents of the media element

Review of attachment 610009 [details] [diff] [review]:
-----------------------------------------------------------------

I think we should use AudioDataValue rather than VorbisPCMValue.

::: content/html/content/public/nsHTMLMediaElement.h
@@ +405,5 @@
>     * Stop playback on mStream.
>     */
>    void EndMediaStreamPlayback();
>  
> +  already_AddRefed<nsDOMMediaStream> CaptureStreamInternal(bool aFinishWhenEnded);

Comment describing the effects of aFinishWhenEnded.

@@ +771,5 @@
>  
>    // True if the sound is muted
>    bool mMuted;
>  
> +  // True if the sound is being captured

Full stop.

::: content/media/nsBuiltinDecoder.cpp
@@ +99,5 @@
> +  ms->Init(PRInt64(mCurrentTime*USECS_PER_S), aStream, aFinishWhenEnded);
> +
> +  if (mDecoderStateMachine) {
> +    // Queue data for this stream now in case the decode loop doesn't take
> +    // any more steps for a while.

Can't you just call ScheduleStateMachine here?

::: content/media/nsBuiltinDecoder.h
@@ +421,5 @@
> +    PRInt64 mLastAudioPacketEndTime; // microseconds
> +    // Count of audio frames written to the stream
> +    PRInt64 mAudioFramesWritten;
> +    // Timestamp of the first audio packet whose frames we wrote.
> +   PRInt64 mAudioFramesWrittenBaseTime; // microseconds

Indentation is off.

@@ +431,5 @@
> +    // the image.
> +    nsRefPtr<Image> mLastVideoImage;
> +    nsRefPtr<SourceMediaStream> mStream;
> +    gfxIntSize mLastVideoImageDisplaySize;
> +    bool mIsInitialized;

Might be handy to say why mIsInitialized isn't set in the Init() method.

@@ +735,5 @@
>    // to Wait on this monitor will block the thread until the next
>    // state change.
>    ReentrantMonitor mReentrantMonitor;
>  
> +  nsTArray<OutputMediaStream> mOutputStreams;

Add comment please.

::: content/media/nsBuiltinDecoderReader.cpp
@@ +80,5 @@
> +
> +  AudioDataValue* data = static_cast<AudioDataValue*>(mAudioBuffer->Data());
> +  for (PRUint32 i = 0; i < mFrames; ++i) {
> +    for (PRUint32 j = 0; j < mChannels; ++j) {
> +      data[j*mFrames + i] = mAudioData[i*mChannels + j];

You should make it clear that you're deliberately converting to channel-major ordering here. Or maybe we should just always store samples in channel-major order until we write to hardware.
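
For context, a small sketch of the transposition the quoted loop performs — a hedged illustration reusing the patch's type names, not code from the patch itself:

// Interleaved (frame-major): L R L R L R ...  index = frame*channels + ch
// Channel-major (planar):    L L L ... R R R  index = ch*frames + frame
static void
InterleavedToChannelMajor(const AudioDataValue* aIn, AudioDataValue* aOut,
                          PRUint32 aFrames, PRUint32 aChannels)
{
  for (PRUint32 i = 0; i < aFrames; ++i) {
    for (PRUint32 j = 0; j < aChannels; ++j) {
      aOut[j * aFrames + i] = aIn[i * aChannels + j];
    }
  }
}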

@@ +411,5 @@
> +                                           PRInt64 aTime,
> +                                           PRInt64 aDuration,
> +                                           PRUint32 aFrames,
> +                                           PRUint32 aChannels,
> +                                           VorbisPCMValue** aChannelBuffers)

Why can't aChannelBuffers be AudioDataValue rather than VorbisPCMValue? I'd rather not pollute the general nsBuiltinDecoderReader architecture with codec specific data types. Why should the Wave decoder output a VorbisPCMValue? If we implemented an MP3 or AAC decoder, why should they?

::: content/media/nsBuiltinDecoderReader.h
@@ +473,5 @@
>    // Queue of audio frames. This queue is threadsafe, and is accessed from
>    // the audio, decoder, state machine, and main threads.
>    MediaQueue<AudioData> mAudioQueue;
>  
> +  void PushAudioData(PRInt64 aOffset,

Comment, including units for aTime and aDuration. And I'd prefer aChannelBuffers to be AudioDataValue** rather than VorbisPCMValue**.

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +562,5 @@
> +  if (!audioWrittenOffset.valid() || !frameOffset.valid())
> +    return;
> +  if (audioWrittenOffset.value() < frameOffset.value()) {
> +    // Write silence to catch up
> +    LOG(PR_LOG_DEBUG, ("%p Decoder writing %d frames of silence to stream",

Maybe you should change this logging to make it clear that this is for the MediaStream output, not regular playback?

@@ +604,5 @@
> +}
> +
> +static const TrackID TRACK_AUDIO = 1;
> +static const TrackID TRACK_VIDEO = 2;
> +static const TrackRate RATE_VIDEO = 1000000;

USECS_PER_S ?

@@ +670,5 @@
> +      VideoSegment output;
> +      for (PRUint32 i = 0; i < video.Length(); ++i) {
> +        VideoData* v = video[i];
> +        if (stream->mNextVideoTime + mStartTime < v->mTime) {
> +          LOG(PR_LOG_DEBUG, ("%p Decoder writing last video to stream for %lld ms",

Same thing here, make clear this is the decoder writing to media stream rather than to regular playback.

@@ +741,5 @@
> +    }
> +  }
> +}
> +
> +bool nsBuiltinDecoderStateMachine::HaveEnoughDecodedAudio(PRInt64 aAmpleAudio)

aAmpleAudioUsecs (presumably).

@@ +890,5 @@
>  
>      // Video decode.
> +    bool throttleVideoDecoding = !videoPlaying || HaveEnoughDecodedVideo();
> +    if (mDidThrottleVideoDecoding && !throttleVideoDecoding) {
> +      videoPump = true;

This seems like a reasonable change to make.  We've got some bugs on file regarding our skip-to-next-keyframe logic being too aggressive (bug 688363), this will affect that.

::: content/media/nsBuiltinDecoderStateMachine.h
@@ +286,5 @@
>  
> +  // Copy queued audio/video data in the reader to any output MediaStreams that
> +  // need it.
> +  void SendOutputStreamData();
> +  bool HaveEnoughDecodedAudio(PRInt64 aAmpleAudio);

aAmpleAudioUsecs (presumably).

@@ +452,5 @@
>    // Decode thread run function. Determines which of the Decode*() functions
>    // to call.
>    void DecodeThreadRun();
>  
> +  void SendOutputStreamAudio(AudioData* aAudio, OutputMediaStream* aStream,

Comment?

@@ +659,5 @@
>  
>    // True is we are decoding a realtime stream, like a camera stream
>    bool mRealTime;
>  
> +  bool mDidThrottleAudioDecoding;

Descriptive comments?

::: content/media/nsMediaCache.h
@@ +476,5 @@
>    bool                   mDidNotifyDataEnded;
>  
>    // The following fields are protected by the cache's monitor but are
> +  // only written on the main thread, thus can be used either on the main thread
> +  // or while holding the cache's monitor.

Presumably you must hold the monitor while writing though?

::: dom/interfaces/html/nsIDOMHTMLMediaElement.idl
@@ +48,5 @@
>   *
>   * For more information on this interface, please see
>   * http://www.whatwg.org/specs/web-apps/current-work/#htmlmediaelement
>   *
>   * @status UNDER_DEVELOPMENT

Rev the uuid, and the uuids of the subclasses.

There's a tool for that: http://people.mozilla.com/~sfink/uploads/update-uuids
`update-uuids nsIDOMHTMLMediaElement dom` should do it.
Comment 52 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-29 13:47:58 PDT
I fixed a few bugs and my patches are green on try on all platforms. Of course, the test here is pretty simple but it does exercise a great deal of the existing functionality.
Comment 53 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-29 20:48:55 PDT
(In reply to Chris Pearce (:cpearce) from comment #50)
> On the other hand the WhatWG spec for currentTime [1] says "On setting, if
> the media element has a current media controller, then the user agent must
> throw an InvalidStateError exception", so you could follow that example and
> throw an exception. This makes more sense to me, since MediaStreams are
> supposed to be seekable, and I don't think it makes sense to be firing
> seeking/seeked events if we're not actually seeking.

I changed it to throw an exception.

> It looks like GetSeekable() and GetBuffered() would return empty ranges when
> playing streams. Presumably that's ok?

I think so.

Everything else fixed.
Comment 54 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-03-29 21:12:22 PDT
(In reply to Chris Pearce (:cpearce) from comment #51)
> I think we should use AudioDataValue rather than VorbisPCMValue.

I removed all changes to PushAudioData. It was an experiment that didn't work out. However, at some point after this lands we should probably switch the decoder stack to use the same channel-major audio buffers as media streams.

> Might be handy to say why mIsInitialized isn't set in the Init() method.

I changed its name to mStreamInitialized and added a comment.

Everything else fixed.
Comment 55 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-01 19:20:02 PDT
(In reply to Boris Zbarsky (:bz) from comment #49)

I'm actually not so sure about the safety of the code in this patch. My concern is that if the current principal is some null principal and some other principal is merged into it, we don't change the current principal, but maybe we should. Imagine we have a stream A with a null principal, and that principal gets propagated to stream B because B gets data from A (and nowhere else). Then we inject data from some fresh origin O into A (but B is not changed). In theory we should disallow future access from B to A, but with this code we wouldn't, because they'd still have the same principal. It seems to me that every time we mix content from some origin O into a resource with a null principal, we should generate a fresh null principal for the resource.

Now, maybe it doesn't matter because nothing with a null principal can "access" A in a meaningful way, i.e., treating different null principals as same-origin to each other is harmless. But I don't find that convincing.

Maybe what I really want is a special "poisoned" principal that isn't even same-origin to itself. Does that make sense?
Comment 56 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-01 19:23:05 PDT
Maybe a better way to go would be to regard the principal of a resource as one that must be subsumed by any principal reading the resource? Then the meet of two incompatible origins could be the system principal. How does that sound?
Comment 57 Boris Zbarsky [:bz] 2012-04-01 21:50:01 PDT
Per IRC discussion, if the principal only controls access to the stream, not what the stream can access, then using the system principal to indicate "no one but system can access this" makes perfect sense.
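Something along these lines, sketched (the helper name and placement are hypothetical, not the actual nsContentUtils signature):

// Hypothetical sketch of the "meet" agreed on above: combine a resource's
// principal with the principal of newly mixed-in data. If they differ,
// fall back to the system principal, meaning "only system code may read
// this resource".
already_AddRefed<nsIPrincipal>
CombineResourcePrincipals(nsIPrincipal* aCurrent, nsIPrincipal* aIncoming)
{
  bool equal = false;
  if (aCurrent && NS_SUCCEEDED(aCurrent->Equals(aIncoming, &equal)) && equal) {
    nsCOMPtr<nsIPrincipal> result = aCurrent;
    return result.forget();
  }
  nsCOMPtr<nsIPrincipal> system;
  nsContentUtils::GetSecurityManager()->
    GetSystemPrincipal(getter_AddRefs(system));
  return system.forget();
}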
Comment 58 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-02 21:43:58 PDT
Created attachment 611714 [details] [diff] [review]
Part 0: Refactor nsMediaCache's handling of principals into a helper method in nsContentUtils
Comment 59 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-02 21:44:46 PDT
Created attachment 611715 [details] [diff] [review]
Part 8: Add mozCaptureStream()/mozCaptureStreamUntilEnded() APIs to HTML media elements, returning a MediaStream representing the contents of the media element
Comment 60 Randell Jesup [:jesup] 2012-04-03 21:51:12 PDT
Comment on attachment 610002 [details] [diff] [review]
Part 1: Create TimeVarying<T> class to represent values that change over time

Looks good; I like the parameterization of Time.
Comment 61 Randell Jesup [:jesup] 2012-04-04 23:52:38 PDT
Comment on attachment 610004 [details] [diff] [review]
Part 3: Create MediaStream and MediaGraphManager for internal management of real-time media processing

Review of attachment 610004 [details] [diff] [review]:
-----------------------------------------------------------------

r=jesup, though note that I expect some additional synchronization work will be needed.

::: content/media/MediaStreamGraph.cpp
@@ +297,5 @@
> +  bool IsEmpty() { return mStreams.IsEmpty(); }
> +
> +  // For use by control messages
> +  /**
> +   * Identiy which graph update index we are currently processing.

"Identiy" -> Identify

@@ +311,5 @@
> +   * Marks aStream as affected by a change in its output at the earliest
> +   * possible time.
> +   */
> +  void NoteStreamAffected(MediaStream* aStream);
> +  void AddStream(MediaStream* aStream)

Given how wonderfully thorough the comments here are, AddStream() and RemoveStream() stand out as un-commented (especially AddStream).

@@ +357,5 @@
> +   * Readonly after initialization on the main thread.
> +   */
> +  nsCOMPtr<nsIThread> mThread;
> +
> +  // This state is managed on the graph thread only, unless

Which state is "this state"?  mStreams?  Or all the stuff following this?

@@ +359,5 @@
> +  nsCOMPtr<nsIThread> mThread;
> +
> +  // This state is managed on the graph thread only, unless
> +  // mLifecycleState > LIFECYCLE_RUNNING in which case the graph thread
> +  // is not running and this can all be used from the main thread.

So, if this can be used when the state is > LIFECYCLE_RUNNING, that should be commented in the enum ("values below here allow you to assume that you can safely access <blah> state from the main thread"), or some such.  How does this interact with mMonitor below?

::: content/media/MediaStreamGraph.h
@@ +138,5 @@
> + * Any stream can have its audio and video playing when requested. The media
> + * stream graph plays audio by constructing audio output streams as necessary.
> + * Video is played by setting video frames into an VideoFrameContainer at the right
> + * time. To ensure video plays in sync with audio, make sure that the same
> + * stream is playing both the audio and video.

One thing we need to be prepared to handle: synchronizing the output (from the media graph) of the audio and video does not mean they'll be synchronized in playback.

In particular, if audio has a downstream mixer + audio driver + kernel driver + hardware delay, it might be consistently much more than video's delay (output code plus 0-1 frames). While it's better to have video lead audio, if it's by too much, people notice the lack of frame sync. Ideally you want audio to be in roughly the +10 to -50ms range; I'd prefer +0 to -20ms.

I suspect we'll need some A/V offset to compensate, plus perhaps a value fed back to us by the audio driver about known additional latency.
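Sketching the kind of compensation I mean (all names here are hypothetical):

// Delay video presentation by the extra latency the audio path reports,
// plus a tunable offset, so audio and video line up at the speakers and
// screen rather than at the graph output.
struct AVSyncOffsets {
  PRInt64 mAudioPipelineLatencyUsecs; // queried from the audio backend
  PRInt64 mTuningOffsetUsecs;         // manual A/V offset, if needed

  PRInt64 VideoPresentationTime(PRInt64 aGraphTimeUsecs) const {
    return aGraphTimeUsecs + mAudioPipelineLatencyUsecs + mTuningOffsetUsecs;
  }
};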

@@ +332,5 @@
> +  // The first active track is the track that started earliest; if multiple
> +  // tracks start at the same time, the one with the lowest ID.
> +  TrackID mFirstActiveTracks[MediaSegment::TYPE_COUNT];
> +
> +  // Temporary data used by MediaStreamGraph on the grpah thread

typo
Comment 62 Randell Jesup [:jesup] 2012-04-05 01:59:04 PDT
Comment on attachment 610003 [details] [diff] [review]
Part 2: Create MediaSegment, AudioSegment and VideoSegment classes to manage intervals of media data

Review of attachment 610003 [details] [diff] [review]:
-----------------------------------------------------------------

r+ modulo correcting the 50ms constant

::: content/media/MediaSegment.h
@@ +20,5 @@
> +
> +inline MediaTime MillisecondsToMediaTime(PRInt32 aMS)
> +{
> +  NS_ASSERTION(aMS <= (MEDIA_TIME_MAX >> MEDIA_TIME_FRAC_BITS)*1000,
> +               "Out of range");

About 278 years or so ;-)

@@ +21,5 @@
> +inline MediaTime MillisecondsToMediaTime(PRInt32 aMS)
> +{
> +  NS_ASSERTION(aMS <= (MEDIA_TIME_MAX >> MEDIA_TIME_FRAC_BITS)*1000,
> +               "Out of range");
> +  return (MediaTime(aMS) << MEDIA_TIME_FRAC_BITS)/1000;

Not that it matters in the slightest, but this loses range due to the 64-bit intermediary: max ~278 years instead of something huge.
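To spell it out (assuming MEDIA_TIME_FRAC_BITS is 20, which is what gives the 278-year figure): the shift happens before the divide, so MediaTime(aMS) << 20 overflows a signed 64-bit value once aMS exceeds 2^43 ms ~= 8.8e12 ms ~= 278 years. Dividing by 1000 first would raise that ceiling a thousandfold, and since aMS is a PRInt32 (max ~24.8 days) neither limit is reachable anyway.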

::: content/media/SharedBuffer.h
@@ +20,5 @@
> +public:
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SharedBuffer)
> +  ~SharedBuffer() {}
> +
> +  void* Data() { return this + 1; }

I see the ASSERTION below that guarantees that this+1 is at least 4-byte aligned; so long as this+1 == ((char *) this) + sizeof(*this), this (and the allocation) should be fine.
There are some more complex ways to do this that are probably a bit more canonically correct - I think this is good for any platform we care about.
At most, add a comment along the lines of "assumes sizeof(SharedBuffer) includes padding for use in arrays".

My whole comment here is probably overly pedantic. :-)
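For the record, the idiom being relied on, sketched (the Create helper is illustrative, not necessarily how the patch allocates):

#include <new>

// "Payload follows the header": one allocation holds the object and the
// sample data back-to-back. Assumes sizeof(SharedBuffer) includes padding
// so that this+1 is suitably aligned.
class SharedBuffer {
public:
  void* Data() { return this + 1; }

  static SharedBuffer* Create(size_t aSize) {
    void* m = ::operator new(sizeof(SharedBuffer) + aSize);
    return new (m) SharedBuffer();
  }

private:
  SharedBuffer() {}
};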

::: content/media/StreamBuffer.cpp
@@ +37,5 @@
> +void
> +StreamBuffer::ForgetUpTo(StreamTime aTime)
> +{
> +  // Round to nearest 50ms so we don't spend too much time pruning segments.
> +  StreamTime forget = (aTime/50000)*50000;

50ms != 50000, and also it should use MEDIA_TIME_FRAC_BITS -- ((1<<MEDIA_TIME_FRAC_BITS)-1)/20
Nearest would be ((aTime+25000)/50000)*50000 (appropriate modification to replace 50000/etc assumed)
Not that it matters, but binary masking would be faster - forget &= ~(((1<<MEDIA_TIME_FRAC_BITS)-1)>>4) or some such (mods for nearest would be to add 1/32nd sec to forget first)
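i.e., something like this (untested sketch, assuming the usual convention that one second is 1 << MEDIA_TIME_FRAC_BITS):

// 50ms expressed in MediaTime fixed-point units, instead of a raw 50000.
const StreamTime FIFTY_MS = (StreamTime(1) << MEDIA_TIME_FRAC_BITS) / 20;
// Round down to a multiple of 50ms:
StreamTime forget = (aTime / FIFTY_MS) * FIFTY_MS;
// Round to nearest: add half the quantum first.
StreamTime nearest = ((aTime + FIFTY_MS/2) / FIFTY_MS) * FIFTY_MS;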
Comment 63 Randell Jesup [:jesup] 2012-04-05 02:13:29 PDT
Comment on attachment 610006 [details] [diff] [review]
Part 5: Create SourceMediaStream, a MediaStream with an API allowing data to be injected into it by some source

Review of attachment 610006 [details] [diff] [review]:
-----------------------------------------------------------------

r- on RemoveElementAt; if I'm wrong and it's safe, comment it please and I'll r+ (feel free to convert)

::: content/media/MediaStreamGraph.cpp
@@ +618,5 @@
> +        dest->AppendFrom(data->mData);
> +      }
> +      if (data->mCommands & SourceMediaStream::TRACK_END) {
> +        aStream->mBuffer.FindTrack(data->mID)->SetEnded();
> +        aStream->mUpdateTracks.RemoveElementAt(i);

Is this safe given we're iterating through mUpdateTracks[i]?  Wouldn't that cause us to skip a track following the current one?
Comment 64 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-05 03:02:16 PDT
(In reply to Randell Jesup [:jesup] from comment #63)
> ::: content/media/MediaStreamGraph.cpp
> @@ +618,5 @@
> > +        dest->AppendFrom(data->mData);
> > +      }
> > +      if (data->mCommands & SourceMediaStream::TRACK_END) {
> > +        aStream->mBuffer.FindTrack(data->mID)->SetEnded();
> > +        aStream->mUpdateTracks.RemoveElementAt(i);
> 
> Is this safe given we're iterating through mUpdateTracks[i]?  Wouldn't that
> cause us to skip a track following the current one?

No, because we're deliberately iterating in reverse order to avoid this issue.
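For reference, the shape of the loop (a sketch; TrackData is my shorthand for the element type of mUpdateTracks):

// Iterating in reverse makes RemoveElementAt(i) safe: it only shifts
// elements at indices greater than i, all of which we've already visited.
for (PRInt32 i = PRInt32(aStream->mUpdateTracks.Length()) - 1; i >= 0; --i) {
  TrackData* data = &aStream->mUpdateTracks[i];
  if (data->mCommands & SourceMediaStream::TRACK_END) {
    aStream->mBuffer.FindTrack(data->mID)->SetEnded();
    aStream->mUpdateTracks.RemoveElementAt(i);
  }
}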
Comment 65 Randell Jesup [:jesup] 2012-04-05 09:26:10 PDT
Comment on attachment 610008 [details] [diff] [review]
Part 7: Let the "src" attribute of HTML media elements accept a MediaStream DOM object, to make the media element play back the contents of the given stream

Review of attachment 610008 [details] [diff] [review]:
-----------------------------------------------------------------

So long as re-setting element.src from a URL to a stream and vice-versa works properly, r+ with nits addressed.

::: content/html/content/public/nsHTMLMediaElement.h
@@ +602,5 @@
>     * Process any media fragment entries in the URI
>     */
>    void ProcessMediaFragmentURI();
>  
>    // The current decoder. Load() has been called on this decoder.

I think this comment may now be slightly misleading (we call Load() when setting src to an object aka MediaStream)

::: content/html/content/src/nsHTMLMediaElement.cpp
@@ +471,5 @@
> +    stream = do_QueryInterface(nsContentUtils::XPConnect()->
> +        GetNativeOfWrapper(aCtx, JSVAL_TO_OBJECT(aParams)));
> +    if (stream) {
> +      mSrcAttrStream = static_cast<nsDOMMediaStream*>(stream.get());
> +      UnsetAttr(kNameSpaceID_None, nsGkAtoms::src, true);

Does this (or the Load()) clear out mDecoder when mStream is set?  How do we handle an element with changing the src element in general?  (This may all be fine, but I haven't wandered through all the code paths.)

@@ +2831,5 @@
>  {
> +  if (mDecoder) {
> +    return mDecoder->GetCurrentPrincipal();
> +  }
> +  if (mStream) {

If mDecoder and mStream are truly non-overlapping, that should be "else if" (and is elsewhere)

::: dom/interfaces/html/nsIDOMHTMLMediaElement.idl
@@ +56,5 @@
>  #undef GetCurrentTime
>  #endif
>  %}
>  
> +[scriptable, uuid(3e672e79-a0ea-45ef-87de-828402f1f6d7)]

Same comment as cpearce - update-uuids.sh is handy :-)
Comment 66 Boris Zbarsky [:bz] 2012-04-05 13:25:47 PDT
Comment on attachment 611714 [details] [diff] [review]
Part 0: Refactor nsMediaCache's handling of principals into a helper method in nsContentUtils

r=me
Comment 67 Randell Jesup [:jesup] 2012-04-05 13:36:52 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #64)
> > Is this safe given we're iterating through mUpdateTracks[i]?  Wouldn't that
> > cause us to skip a track following the current one?
> 
> No, because we're deliberately iterating in reverse order to avoid this
> issue.

Cool, sorry I missed that it was iterating backwards. Might be worth a couple-word comment ("safe because we're iterating backwards").
Comment 68 Randell Jesup [:jesup] 2012-04-08 23:56:12 PDT
Comment on attachment 611715 [details] [diff] [review]
Part 8: Add mozCaptureStream()/mozCaptureStreamUntilEnded() APIs to HTML media elements, returning a MediaStream representing the contents of the media element

Review of attachment 611715 [details] [diff] [review]:
-----------------------------------------------------------------

Just nits

::: content/html/content/src/nsHTMLMediaElement.cpp
@@ +2373,5 @@
>    NS_ASSERTION(!mStream && !mStreamListener, "Should have been ended already");
>  
>    mStream = mSrcAttrStream;
> +  // XXX if we ever support capturing the output of this media element,
> +  // need to add a CombineWithPrincipal call here.

Isn't this patch to capture the output of media elements?  That makes this XXX comment confusing.

::: content/media/nsAudioAvailableEventManager.cpp
@@ +160,5 @@
>  
>      // Fill the signalBuffer.
>      PRUint32 i;
>      float *signalBuffer = mSignalBuffer.get() + mSignalBufferPosition;
> +    if (audioData) {

Was there any chance this (null audioData) could have happened before this patchset lands? If so, should we file a bug on that issue? It would be a straightforward null-ptr crash.

::: content/media/nsBuiltinDecoderReader.h
@@ +441,5 @@
>    // Reads and decodes one video frame. Packets with a timestamp less
>    // than aTimeThreshold will be decoded (unless they're not keyframes
>    // and aKeyframeSkip is true), but will not be added to the queue.
>    virtual bool DecodeVideoFrame(bool &aKeyframeSkip,
> +                                PRInt64 aTimeThreshold) = 0;

Whitespace change?

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +554,5 @@
> +  NS_ASSERTION(aOutput->GetChannels() == aAudio->mChannels,
> +               "Wrong number of channels");
> +
> +  // This logic has to mimic AudioLoop closely to make sure we're write
> +  // the exact same silences

we're -> we

@@ +682,5 @@
> +        }
> +        if (stream->mNextVideoTime + mStartTime < v->mEndTime) {
> +          LOG(PR_LOG_DEBUG, ("%p Decoder writing video frame %lld to MediaStream",
> +                             mDecoder.get(), v->mTime));
> +          WriteVideoToMediaStream(v->mImage,

Not really sure of the logic here without getting a bunch deeper into the code - is there an instance where we'd WriteVideoToMediaStream() in the previous if() (which modifies mNextVideoTime) and then do so again in this if? If so, is this correct, and what does it mean?
Comment 69 Randell Jesup [:jesup] 2012-04-12 15:01:41 PDT
Comment on attachment 610006 [details] [diff] [review]
Part 5: Create SourceMediaStream, a MediaStream with an API allowing data to be injected into it by some source

per comments, r+
Comment 70 [:fabrice] Fabrice Desré 2012-04-25 10:57:50 PDT
Roc, what's holding up landing this?
Comment 71 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-25 19:55:11 PDT
Me refreshing patches. I am about to do that.
Comment 72 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-26 20:05:22 PDT
(In reply to Randell Jesup [:jesup] from comment #62)
> I see the ASSERTION below that guarantees that this+1 is at least 4-byte
> aligned; so long as this+1 == ((char *) this) + sizeof(*this), this (and the
> allocation) should be fine.
> There are some more complex ways to do this that are probably a bit more
> canonically correct - I think this is good for any platform we care about.
> At most, add a comment along the lines of "assumes sizeof(SharedBuffer)
> includes padding for use in arrays".
> 
> My whole comment here is probably overly pedantic. :-)

I added a comment.

> 50ms != 50000, and also it should use MEDIA_TIME_FRAC_BITS --
> ((1<<MEDIA_TIME_FRAC_BITS)-1)/20
> Nearest would be ((aTime+25000)/50000)*50000 (appropriate modification to
> replace 50000/etc assumed)
> Not that it matters, but binary masking would be faster - forget &=
> ~(((1<<MEDIA_TIME_FRAC_BITS)-1)>>4) or some such (mods for nearest would be
> to add 1/32nd sec to forget first)

I just used MillisecondsToMediaTime(50) and did the division. Choosing a power-of-2 to mask with was unnecessarily obscure.
Comment 73 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-26 20:23:50 PDT
(In reply to Randell Jesup [:jesup] from comment #61)
> @@ +357,5 @@
> > +   * Readonly after initialization on the main thread.
> > +   */
> > +  nsCOMPtr<nsIThread> mThread;
> > +
> > +  // This state is managed on the graph thread only, unless
> 
> Which state is "this state"?  mStreams?  Or all the stuff following this?

Everything following (until mMonitor). Improved the comment.

> @@ +359,5 @@
> > +  nsCOMPtr<nsIThread> mThread;
> > +
> > +  // This state is managed on the graph thread only, unless
> > +  // mLifecycleState > LIFECYCLE_RUNNING in which case the graph thread
> > +  // is not running and this can all be used from the main thread.
> 
> So, if this can be used when the state is > LIFECYCLE_RUNNING, that should
> be commented in the enum ("values below here allow you to assume that you
> can safely access <blah> state from the main thread"), or some such.

Done.

> How does this interact with mMonitor below?

It doesn't. Data guarded by mMonitor must always be accessed with mMonitor held. Added a comment.

> One thing we need to be prepared to handle: synchronizing the output (from
> the media graph) of the audio and video does not mean they'll be
> synchronized in playback.
> 
> In particular, if audio has a downstream mixer + audio driver + kernel
> driver + hardware delay, it might be consistently much more than video's
> delay (output code plus 0-1 frames.  While it's better to have video lead
> audio, if it's by too much people notice lack of frame sync.  Ideally you
> want to have audio be roughly +10 to -50ms range (roughly).  I'd prefer +0
> to -20ms.
> 
> I suspect we'll need some A/V offset to compensate, plus perhaps a value fed
> back to us by the audio driver about known additional latency.

Yes. MediaStreamGraphImpl::GetAudioPosition calls the nsAudioStream API for that. I don't currently use it for video sync because it wasn't reliable enough in my testing. There's some work to do here, but the situation isn't much different to our current A/V sync for media elements, which generally seems to be acceptable.
Comment 74 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-26 20:51:26 PDT
(In reply to Randell Jesup [:jesup] from comment #65)
> ::: content/html/content/src/nsHTMLMediaElement.cpp
> @@ +471,5 @@
> > +    stream = do_QueryInterface(nsContentUtils::XPConnect()->
> > +        GetNativeOfWrapper(aCtx, JSVAL_TO_OBJECT(aParams)));
> > +    if (stream) {
> > +      mSrcAttrStream = static_cast<nsDOMMediaStream*>(stream.get());
> > +      UnsetAttr(kNameSpaceID_None, nsGkAtoms::src, true);
> 
> Does this (or the Load()) clear out mDecoder when mStream is set?  How do we
> handle an element with changing the src element in general?  (This may all
> be fine, but I haven't wandered through all the code paths.)

Load() calls AbortExistingLoads() which clears mDecoder and mStream if necessary. Well, unless 'src' is set during a synchronous event triggered by Load(), in which case Load() immediately bails out. Basically we handle 'src' changes the same way whether it's being set to a stream or a URI.

> @@ +2831,5 @@
> >  {
> > +  if (mDecoder) {
> > +    return mDecoder->GetCurrentPrincipal();
> > +  }
> > +  if (mStream) {
> 
> If mDecoder and mStream are truly non-overlapping, that should be "else if"
> (and is elsewhere)

Style is to not put "else" clauses after returns.
Comment 75 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-26 21:00:34 PDT
(In reply to Randell Jesup [:jesup] from comment #68)
> ::: content/html/content/src/nsHTMLMediaElement.cpp
> @@ +2373,5 @@
> >    NS_ASSERTION(!mStream && !mStreamListener, "Should have been ended already");
> >  
> >    mStream = mSrcAttrStream;
> > +  // XXX if we ever support capturing the output of this media element,
> > +  // need to add a CombineWithPrincipal call here.
> 
> Isn't this patch to capture the output of media elements?  That makes this
> XXX comment confusing.

Clarified.

> ::: content/media/nsAudioAvailableEventManager.cpp
> @@ +160,5 @@
> >  
> >      // Fill the signalBuffer.
> >      PRUint32 i;
> >      float *signalBuffer = mSignalBuffer.get() + mSignalBufferPosition;
> > +    if (audioData) {
> 
> Was there any chance this (null audioData) would have happened before this
> patchset lands?

No.

> @@ +682,5 @@
> > +        }
> > +        if (stream->mNextVideoTime + mStartTime < v->mEndTime) {
> > +          LOG(PR_LOG_DEBUG, ("%p Decoder writing video frame %lld to MediaStream",
> > +                             mDecoder.get(), v->mTime));
> > +          WriteVideoToMediaStream(v->mImage,
> 
> Not really sure of the logic here without getting a bunch  deeper into the
> code - is there an instance where we'd WriteVideoToMediaStream() in the
> previous if() (which modifies mNextVideoTime) and then do so again in this
> if?  If so, is this correct, and what does it mean?

The first if() block handles the case where the previous video frame is supposed to have ended before the current video frame starts. That doesn't make sense, so we write another copy of the previous video frame (not really a copy, of course!) with enough duration to catch up to the start time of the current video frame. Then we write the current video frame itself, with a duration running up to its stated end time.
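In sketch form, with stand-in types (this is the logic, not the actual code):

class Image; // stand-in for the real image type

struct Frame { PRInt64 mTime, mEndTime; Image* mImage; };
struct StreamState { PRInt64 mNextVideoTime; Image* mLastImage; };

void WriteFrame(Image* aImage, PRInt64 aDuration) { /* append to VideoSegment */ }

void SendFrame(StreamState* s, const Frame& v, PRInt64 aStartTime)
{
  if (s->mNextVideoTime + aStartTime < v.mTime) {
    // Gap: re-emit the previous image (another reference, not a pixel copy)
    // with enough duration to catch up to the start of v.
    WriteFrame(s->mLastImage, v.mTime - (s->mNextVideoTime + aStartTime));
    s->mNextVideoTime = v.mTime - aStartTime;
  }
  if (s->mNextVideoTime + aStartTime < v.mEndTime) {
    // Then emit v itself, up to its stated end time. Both branches can run
    // for the same frame, and that's correct.
    WriteFrame(v.mImage, v.mEndTime - (s->mNextVideoTime + aStartTime));
    s->mNextVideoTime = v.mEndTime - aStartTime;
    s->mLastImage = v.mImage;
  }
}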
Comment 76 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-27 05:32:10 PDT
Created attachment 618994 [details] [diff] [review]
Part 11: refactor code that shuts down mDecoder

This is a prep patch for part 12.
Comment 77 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-27 05:41:34 PDT
Created attachment 619000 [details] [diff] [review]
Part 12: finish output media streams when the decoder is destroyed

As I noted above, I was a bit worried about stability of the media playback capture code. I did some testing and found that seeking doesn't cause significant problems but loading a new media resource does. So this patch just finishes and disconnects all output media streams when the current decoder shuts down.

This doesn't implement the correct behavior in such situations --- that depends on whether we did captureStreamUntilEnded or just captureStream --- but I have a plan for fixing that soon. I just want to get this in so we can land these patches without introducing known crashability.

(The plan to fix it soon is:
-- Land basic ProcessedMediaStream (internal use only)
-- When output is being captured to a stream, make the decoder output to an internal per-decoder MediaStream, starting a new output stream at each seek
-- Attach the decoder output stream as the input to each captureStream/captureStreamUntilEnded
-- Play the video of the internal streams through the media element, so the output stays in sync with the captured streams)
Comment 78 Chris Pearce (:cpearce) 2012-04-29 18:56:08 PDT
Comment on attachment 619000 [details] [diff] [review]
Part 12: finish output media streams when the decoder is destroyed

Review of attachment 619000 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me, but I'm not sure about the MediaStreamGraphImpl change; I don't feel qualified to review that.

::: content/media/test/test_streams_element_capture_reset.html
@@ +56,5 @@
> +
> +var audio = new Audio();
> +for (var i = 0; i < gSmallTests.length; ++i) {
> +  if (audio.canPlayType(gSmallTests[i].type)) {
> +    startTest(gSmallTests[i]);

You could have used getPlayableAudio() from manifest.js to achieve this.
Comment 79 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-04-29 19:47:05 PDT
(In reply to Chris Pearce (:cpearce) from comment #78)
> You could have used getPlayableAudio() from manifest.js to achieve this.

Done.
Comment 82 Daniel Holbert [:dholbert] 2012-05-07 11:26:37 PDT
Pushed followup to fix missing comment-terminator in the emacs modeline for StreamBuffer.cpp:
 https://hg.mozilla.org/integration/mozilla-inbound/rev/2f62ae9d981c
(Now the header for StreamBuffer.cpp matches StreamBuffer.h exactly.)
Comment 83 Ryan VanderMeulen [:RyanVM] 2012-05-07 18:14:29 PDT
Follow-up:
http://hg.mozilla.org/mozilla-central/rev/2f62ae9d981c
Comment 84 Daniel Holbert [:dholbert] 2012-05-11 10:42:34 PDT
I noticed more instances of the comment 82 issue (gcc warning: "/*" within comment [-Wcomment]) in 2 other files added here, so I just fixed those in another followup:
 https://hg.mozilla.org/integration/mozilla-inbound/rev/a9f44b9bfa13
(I also verified that those were the last 2 instances of this warning in m-c.)
(Sorry for bugspam)
Comment 85 Matt Brubeck (:mbrubeck) 2012-05-12 08:59:20 PDT
https://hg.mozilla.org/mozilla-central/rev/a9f44b9bfa13
Comment 86 Matt Brubeck (:mbrubeck) 2012-05-16 22:07:37 PDT
This patch regressed the number of static constructors by 1:
http://graphs.mozilla.org/graph.html#tests=[[81,63,18]]&sel=1335718072146.6555,1335796290669.028&displayrange=30&datatype=running
Comment 87 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-16 23:54:58 PDT
The static constructor was added in nsDOMMediaStream. The only global stuff there is DOMCI_DATA and CC boilerplate:
http://mxr.mozilla.org/mozilla-central/source/content/media/nsDOMMediaStream.cpp

So I don't think there's anything I can do about it.
Comment 88 Boris Zbarsky [:bz] 2012-05-17 00:10:14 PDT
Cycle collection boilerplate involves a static constructor for the cycle collection helper class, fwiw.

So yes, not much you can do about it short of restructuring how the cycle collector works.
Comment 89 Matt Brubeck (:mbrubeck) 2012-05-17 07:11:34 PDT
So the static constructor should be fixed by bug 616262, if I understand correctly. Thanks!
Comment 90 Martin Best (:mbest) 2012-06-06 10:52:39 PDT
*** Bug 668449 has been marked as a duplicate of this bug. ***
Comment 91 :Ehsan Akhgari (busy, don't ask for review please) 2012-08-15 10:16:22 PDT
(Note to the documentation team: this feature's documentation may not be targeted at web developers.)
Comment 92 neil@parkwaycc.co.uk 2013-03-21 04:52:38 PDT
Comment on attachment 610004 [details] [diff] [review]
Part 3: Create MediaStream and MediaGraphManager for internal management of real-time media processing

>+    *mStreams.AppendElement() = already_AddRefed<MediaStream>(aStream);
[Presumably you temporarily forgot that we have a templated function dont_AddRef() which saves you from typing <MediaStream> again.]
