Closed Bug 779715 Opened 12 years ago Closed 12 years ago

Create ProcessedMediaStream infrastructure and use it to fix media element mozCaptureStream

Categories

(Core :: Web Audio, defect)

Platform: x86_64 Windows 7
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla17
People

(Reporter: roc, Assigned: roc)

References

Details

Attachments

(8 files)

54.82 KB, patch
jesup
: review+
Details | Diff | Splinter Review
13.37 KB, patch
jesup
: review+
Details | Diff | Splinter Review
6.31 KB, patch
jesup
: review+
Details | Diff | Splinter Review
21.11 KB, text/plain
jesup
: review+
Details
21.65 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
50.56 KB, patch
cpearce
: review+
jesup
: review+
Details | Diff | Splinter Review
4.77 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
3.54 KB, patch
jesup
: review+
Details | Diff | Splinter Review
Currently the media element methods mozCaptureStream and mozCaptureStreamUntilEnded are quite broken when the media element seeks or restarts playback after ending.

I'm fixing this by introducing stream processing and creating a TrackUnionStream that aggregates the tracks of its input streams. Then I make nsBuiltinDecoder create a new SourceMediaStream each time we play back the resource, i.e., after each seek or end it creates a new SourceMediaStream. mozCaptureStream creates a TrackUnionStream and each SourceMediaStream created for the element is wired up as the sole input to all the element's TrackUnionStreams. mozCaptureStreamUntilEnded behaves the same way, except that when resource playback actually ends we forget about the TrackUnionStream so that no future SourceMediaStream is added as an input.

TrackUnionStream will also be useful for WebRTC. The ProcessedMediaStream superclass will be useful for Web Audio.
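The wiring described above can be sketched roughly as follows. This is a hypothetical, simplified model (the names and structure do not match the real Gecko classes, and the until-ended distinction is collapsed into a single flag):

```cpp
#include <memory>
#include <vector>

// Stand-in for a SourceMediaStream; one is created per playback run.
struct SourceStream {};

// Aggregates the tracks of its input streams, like TrackUnionStream.
struct UnionStream {
    std::vector<SourceStream*> inputs;
    // Simplification: set once playback ends, so no future sources are added
    // (models the mozCaptureStreamUntilEnded case).
    bool finished = false;
    void AddInput(SourceStream* s) { inputs.push_back(s); }
};

struct Decoder {
    std::vector<std::unique_ptr<UnionStream>> captures;
    std::unique_ptr<SourceStream> decodedStream;

    // Called on initial playback and after every seek or restart: a fresh
    // source stream becomes the newest input of every live capture.
    void RecreateDecodedStream() {
        decodedStream = std::make_unique<SourceStream>();
        for (auto& u : captures)
            if (!u->finished)
                u->AddInput(decodedStream.get());
    }

    // Models mozCaptureStream: the new union stream is wired to the
    // current source stream, and to every future one.
    UnionStream* CaptureStream() {
        captures.push_back(std::make_unique<UnionStream>());
        if (decodedStream)
            captures.back()->AddInput(decodedStream.get());
        return captures.back().get();
    }

    // After the resource ends, until-ended captures are forgotten so that
    // later SourceStreams are not added as inputs.
    void PlaybackEnded() {
        for (auto& u : captures) u->finished = true;
    }
};
```

Each seek thus appends one more input to every still-live capture, which is why a capture keeps playing across seeks while an until-ended capture stops growing once the resource ends.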
No longer blocks: 779721
Depends on: 779721
I have put the entire implementation in the header file since it's only included from MediaStreamGraph.cpp and we don't have to separate definitions from declarations that way. I could break a lot of it out into TrackUnionStream.cpp if you think that's better.
Attachment #648904 - Flags: review?(rjesup)
Attachment #648908 - Flags: review?(cpearce) → review+
Comment on attachment 648909 [details] [diff] [review]
Part 6: Rework capturing MediaStreams from media elements to use TrackUnionStreams

Review of attachment 648909 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoder.cpp
@@ +124,5 @@
> +{
> +  NS_ASSERTION(NS_IsMainThread(), "Should be on main thread.");
> +  mReentrantMonitor.AssertCurrentThreadIn();
> +  LOG(PR_LOG_DEBUG, ("nsBuiltinDecoder::RecreateDecodedStream this=%p aStartTimeUSecs=%lld!",
> +                     this, (long long)aStartTimeUSecs));

You shouldn't need the (long long) cast?

@@ +689,5 @@
> +      }
> +    }
> +  }
> +
> +  printf("nsBuiltinDecoder::PlaybackEnded this=%p mPlayState=%d\n", this, mPlayState);

This should use LOG, not printf.

@@ +908,5 @@
>        ChangeState(PLAY_STATE_SEEKING);
>        seekWasAborted = true;
>      } else {
>        UnpinForSeek();
> +      printf("nsBuiltinDecoder::SeekingStopped, this=%p next state=%d\n", this, mNextState);

Use LOG not printf.

@@ +992,5 @@
> +      mDecodedStream->mStream->ChangeExplicitBlockerCount(blockForPlayState ? 1 : -1);
> +      mDecodedStream->mHaveBlockedForPlayState = blockForPlayState;
> +    }
> +  }
> +  printf("nsBuiltinDecoder::ChangeState this=%p prev=%d new=%d\n", this, mPlayState, aState);

LOG not printf.

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +571,2 @@
>  {
>    mDecoder->GetReentrantMonitor().AssertCurrentThreadIn();

It would be good if you could assert what thread this runs on; it's a good way to document what's going on.
Attachment #648909 - Flags: review?(cpearce) → review+
Comment on attachment 648912 [details] [diff] [review]
Part 7: Update test_streams_element_capture_reset.html to test new functionality

Review of attachment 648912 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/test/test_streams_element_capture_reset.html
@@ +45,5 @@
>    };
> +  function endedAfterSeek() {
> +    isWithinEps(v.currentTime, test.duration, "checking v.currentTime at second 'ended' event");
> +    isWithinEps(vout.currentTime, (v.currentTime - seekTime) + test.duration,
> +	            "checking vout.currentTime after seeking and playing through again");

Looks like there's tabs messing up indentation here.

@@ +58,3 @@
>      v.removeEventListener("seeked", seeked, false);
> +    v.addEventListener("ended", endedAfterSeek, false);
> +	v.play();

So I'm guessing vout_untilended should *not* play again now, since the captured stream ended? Maybe you should add checks to ensure that?

Tabs here too.

@@ +61,5 @@
>    };
> +  function ended() {
> +    isWithinEps(vout.currentTime, test.duration, "checking vout.currentTime at first 'ended' event");
> +    isWithinEps(v.currentTime, test.duration, "checking v.currentTime at first 'ended' event");
> +	is(vout.ended, false, "checking vout has not ended");

Indentation seems off here? Are there tabs instead of spaces inside this function?
Attachment #648912 - Flags: review?(cpearce) → review+
Comment on attachment 648902 [details] [diff] [review]
Part 1: Add basic support for ProcessedMediaStreams

Review of attachment 648902 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaStreamGraph.h
@@ +625,5 @@
> +  }
> +
> +  // Called on graph manager thread
> +  // Do not call these from outside MediaStreamGraph.cpp!
> +  void Init();

Is there a convenient way to enforce this or flag it if done from elsewhere?
Attachment #648902 - Flags: review?(rjesup) → review+
Comment on attachment 648904 [details] [diff] [review]
Part 2: Create TrackUnionStream

Review of attachment 648904 [details] [diff] [review]:
-----------------------------------------------------------------

All looks good to me as best I can tell.

::: content/media/TrackUnionStream.h
@@ +15,5 @@
> +#else
> +#define LOG(type, msg)
> +#endif
> +
> +class TrackUnionStream : public ProcessedMediaStream {

Not that it's an issue, but is there a reason the entire TrackUnionStream impl is in the .h file?  If there is a specific reason, perhaps that should be noted; if it's just convenience then I don't think any note is needed.
Attachment #648904 - Flags: review?(rjesup) → review+
Attachment #648905 - Flags: review?(rjesup) → review+
Comment on attachment 648906 [details]
Part 4: Play all tracks of a media stream with multiple tracks

"For video, we pick the last track that has a video frame for the current time, and display that."

+  for (StreamBuffer::TrackIter tracks(aStream->GetStreamBuffer(), MediaSegment::VIDEO);
+       !tracks.IsEnded(); tracks.Next()) {
+    VideoSegment* segment = tracks->Get<VideoSegment>();
+    TrackTicks thisStart;
+    const VideoFrame* thisFrame =
+      segment->GetFrameAt(tracks->TimeToTicksRoundDown(frameBufferTime), &thisStart);
+    if (thisFrame && thisFrame->GetImage()) {
+      start = thisStart;
+      frame = thisFrame;
+      track = tracks.get();
+    }
   }

I don't think it matters a whole lot, but if there are multiple video tracks, then this will get and throw away frames from all but the last, which seems... inefficient.  Also, isn't priority normally given to the first track, not the last, up to now?

Does this mean it's impossible to process all the video tracks in a MediaStream from MediaStream Processing?  Can we take the individual tracks and break them into separate MediaStreams?  Can we process all of them, and then break them out?  Can we write a JS process filter that swizzles the video to show the track that has the most motion?  (Think security system with 8 cameras, or conference system with 4 cameras.)  Etc....

All that said, r+ on the code itself; the above are design questions.
Attachment #648906 - Flags: review?(rjesup) → review+
Comment on attachment 648909 [details] [diff] [review]
Part 6: Rework capturing MediaStreams from media elements to use TrackUnionStreams

Review of attachment 648909 [details] [diff] [review]:
-----------------------------------------------------------------

Great comments.  All code should have ones like these.

::: content/media/nsBuiltinDecoder.cpp
@@ +108,5 @@
> +    // During cycle collection, nsDOMMediaStream can be destroyed and send
> +    // its Destroy message before this decoder is destroyed. So we have to
> +    // be careful not to send any messages after the Destroy().
> +    if (!os.mStream->IsDestroyed()) {
> +      os.mStream->ChangeExplicitBlockerCount(1);

I see the reentrantMonitor is held here.  Is that sufficient to avoid race conditions with the Cycle Collector destroying nsDOMMediaStream?  I.e. does nsDOMMediaStream hold mReentrantMonitor?

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +571,2 @@
>  {
>    mDecoder->GetReentrantMonitor().AssertCurrentThreadIn();

Chris suggested asserting the thread this runs on, but I think the point of holding the ReentrantMonitor is that this can be called from more than one thread.

@@ +1493,1 @@
>    ScheduleStateMachine();

Vague concern: what if there are multiple Seek() calls before the state machine runs?  I.e. multiple RecreateDecodedStreams(), with (I assume) only the last one actually happening.  It may well be fine.
Attachment #648909 - Flags: review?(rjesup) → review+
(In reply to Chris Pearce (:cpearce) from comment #11)
> ::: content/media/nsBuiltinDecoder.cpp
> @@ +124,5 @@
> > +{
> > +  NS_ASSERTION(NS_IsMainThread(), "Should be on main thread.");
> > +  mReentrantMonitor.AssertCurrentThreadIn();
> > +  LOG(PR_LOG_DEBUG, ("nsBuiltinDecoder::RecreateDecodedStream this=%p aStartTimeUSecs=%lld!",
> > +                     this, (long long)aStartTimeUSecs));
> 
> You shouldn't need the (long long) cast?

I don't know that we're allowed to assume PRInt64 == long long. This is safer.
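The concern is that a 64-bit integer type can be `long` on LP64 platforms rather than `long long`, so passing it straight to a `%lld` conversion is not portable; the cast pins the argument type. A minimal illustration (hypothetical helper name, `int64_t` standing in for PRInt64):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 64-bit value portably: int64_t may be `long` (LP64) or
// `long long` (LLP64/32-bit), so the cast makes the argument always
// match the %lld conversion specifier.
std::string FormatUSecs(int64_t aUSecs) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%lld", (long long)aUSecs);
    return buf;
}
```

(C99's `PRId64` macro from `<inttypes.h>` is the other common way to avoid the cast.)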

> > +  printf("nsBuiltinDecoder::PlaybackEnded this=%p mPlayState=%d\n", this, mPlayState);
> 
> This should use LOG, not printf.

I just removed it.

> @@ +908,5 @@
> > +      printf("nsBuiltinDecoder::SeekingStopped, this=%p next state=%d\n", this, mNextState);
> 
> Use LOG not printf.

I just removed it.

> @@ +992,5 @@
> > +  printf("nsBuiltinDecoder::ChangeState this=%p prev=%d new=%d\n", this, mPlayState, aState);
> 
> LOG not printf.

I just removed it.

> ::: content/media/nsBuiltinDecoderStateMachine.cpp
> @@ +571,2 @@
> >  {
> >    mDecoder->GetReentrantMonitor().AssertCurrentThreadIn();
> 
> It would be good if you could assert what thread this runs on; it's a good
> way to document what's going on.

Done.
(In reply to Chris Pearce (:cpearce) from comment #12)
> So I'm guessing vout_untilended should *not* play again now, since the
> captured stream ended? Maybe you should add checks to ensure that?

Done. Everything else fixed.
(In reply to Randell Jesup [:jesup] from comment #13)
> > +  // Called on graph manager thread
> > +  // Do not call these from outside MediaStreamGraph.cpp!
> > +  void Init();
> 
> Is there a convenient way to enforce this or flag it if done from elsewhere?

It's a bit of a pain, because I call this from a class that's local to a function, so there's no way to friend it.
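For reference, one common way to get compile-time enforcement without friending a function-local class is the passkey idiom: only code that can construct the key can call the method. This is a sketch of the general technique with hypothetical names, not what the patch does:

```cpp
class MediaStreamGraphImpl;  // hypothetical: the only intended caller

// Passkey: the constructor is private, so only MediaStreamGraphImpl
// (or code it explicitly hands a key to) can call Stream::Init.
class GraphKey {
    friend class MediaStreamGraphImpl;
    GraphKey() = default;
};

class Stream {
public:
    // Publicly visible, but uncallable without a GraphKey.
    void Init(GraphKey) { mInited = true; }
    bool Inited() const { return mInited; }
private:
    bool mInited = false;
};

class MediaStreamGraphImpl {
public:
    void Setup(Stream& s) { s.Init(GraphKey()); }
};
```

Any attempt to call `s.Init(GraphKey())` from outside `MediaStreamGraphImpl` fails to compile, because `GraphKey`'s constructor is inaccessible there.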
(In reply to Randell Jesup [:jesup] from comment #14)
> Not that it's an issue, but is there a reason the entire TrackUnionStream
> impl is in the .h file?  If there is a specific reason, perhaps that should
> be noted; if it's just convenience then I don't think any note is needed.

Added a comment. It's only included from MediaStreamGraph.cpp so it's a bit simpler and less code to put it all in the .h file.

(In reply to Randell Jesup [:jesup] from comment #15)
> I don't think it matter a whole lot, but if there are multiple video tracks,
> then this will get and throw away frames from all but the last, which
> seems... inefficient.

Adding the ability to iterate tracks in reverse order just for this case seems not worthwhile.

> Also, isn't normally priority given to the first
> track not the last, up to now?

I think the behavior we actually want is to composite the tracks in track order so later tracks are rendered on top of earlier tracks. Since currently all our images are opaque, that means choosing the last one (modulo some issues around image sizes).
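With fully opaque frames, compositing in track order degenerates to "last track with a frame wins", which is what the patch's loop computes. A toy illustration of that equivalence (an `int` standing in for an image):

```cpp
#include <optional>
#include <vector>

// Toy model: each track may or may not have a frame at the current time.
// Painting opaque layers bottom-to-top is equivalent to keeping the last
// track that has a frame, since each later opaque frame fully covers the
// earlier ones.
std::optional<int> CompositeOpaque(const std::vector<std::optional<int>>& tracks) {
    std::optional<int> result;
    for (const auto& frame : tracks) {
        if (frame) result = frame;  // topmost available frame replaces earlier ones
    }
    return result;
}
```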

> Does this mean it's impossible to process all the video tracks in a
> MediaStream from MediaStream Processing?

No, this has nothing to do with processing. This only affects what gets rendered when a MediaStream is used as the source for a <video>.

> Can we take the individual tracks
> and break them into separate MediaStreams?  Can we process all of them, and
> then break them out?  Can we write a JS process filter that swizzles the
> video to show the track that has the most motion?  (Think security system
> with 8 cameras, or conference system with 4 cameras.)  Etc....

We can still do all of those things.
(In reply to Randell Jesup [:jesup] from comment #16)
> ::: content/media/nsBuiltinDecoder.cpp
> @@ +108,5 @@
> > +    // During cycle collection, nsDOMMediaStream can be destroyed and send
> > +    // its Destroy message before this decoder is destroyed. So we have to
> > +    // be careful not to send any messages after the Destroy().
> > +    if (!os.mStream->IsDestroyed()) {
> > +      os.mStream->ChangeExplicitBlockerCount(1);
> 
> I see the reentrantMonitor is held here.  Is that sufficient to avoid race
> conditions with the Cycle Collector destroying nsDOMMediaStream?  I.e. does
> nsDOMMediaStream hold mReentrantMonitor?

It doesn't. I'm not sure what race conditions you're worried about, since this code and the cycle collector both run on the main thread.

> ::: content/media/nsBuiltinDecoderStateMachine.cpp
> @@ +571,2 @@
> >  {
> >    mDecoder->GetReentrantMonitor().AssertCurrentThreadIn();
> 
> Chris suggested asserting the thread this runs on, but I think the point of
> holding the ReentrantMonitor is that this can be called from more than one
> thread.

No, in quite a few places we hold the monitor and ensure we're only running on the main thread. We need to exclude other threads (decoder and state machine threads) from touching the decoder while we do our work.
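The pattern described here — hold the monitor *and* assert which thread is running — can be sketched with standard C++ primitives, with `std::recursive_mutex` standing in for ReentrantMonitor and hypothetical names throughout:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Recorded at static-initialization time, before any worker thread starts;
// stands in for NS_IsMainThread().
static const std::thread::id kMainThread = std::this_thread::get_id();

class Decoder {
public:
    void DoMainThreadWork() {
        // Excludes the decoder and state-machine threads from touching
        // the decoder while we work...
        std::lock_guard<std::recursive_mutex> lock(mMonitor);
        // ...and additionally documents and enforces that only the main
        // thread ever runs this method. The two checks are independent.
        assert(std::this_thread::get_id() == kMainThread);
        mCounter++;
    }
    int Counter() const { return mCounter; }
private:
    std::recursive_mutex mMonitor;
    int mCounter = 0;
};
```

Holding the lock alone would permit any thread to enter; the thread assertion is what turns "mutually excluded" into "main thread only".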

> @@ +1493,1 @@
> >    ScheduleStateMachine();
> 
> Vague concern: what if there are multiple Seek() calls before the state
> machine runs?  I.e. multiple RecreateDecodedStreams(), with (I assume) only
> the last one actually happening.  It may well be fine.

It is fine. ScheduleStateMachine is idempotent.
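Idempotent scheduling typically means a pending flag guards the dispatch, so any number of Seek() calls before the runnable fires still produce a single state-machine run. A hypothetical sketch of that shape (not the actual ScheduleStateMachine implementation):

```cpp
class StateMachine {
public:
    // Safe to call any number of times: only one run is ever queued at once.
    void ScheduleStateMachine() {
        if (mDispatched) return;  // a run is already pending; nothing to do
        mDispatched = true;
        mQueuedRuns++;            // models dispatching one event to the loop
    }
    // The event loop eventually calls this once per queued dispatch.
    void Run() {
        mDispatched = false;      // allow the next schedule to queue again
        mRuns++;
    }
    int QueuedRuns() const { return mQueuedRuns; }
    int Runs() const { return mRuns; }
private:
    bool mDispatched = false;
    int mQueuedRuns = 0;
    int mRuns = 0;
};
```

Under this scheme three back-to-back Seek() calls queue one run, and that run sees only the final state — which is why only the last RecreateDecodedStream() matters.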
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #20)
> (In reply to Randell Jesup [:jesup] from comment #15)
> > I don't think it matter a whole lot, but if there are multiple video tracks,
> > then this will get and throw away frames from all but the last, which
> > seems... inefficient.
> 
> Adding the ability to iterate tracks in reverse order just for this case
> seems not worthwhile.

Given I think this returns a single frame, from the last track that provided one, wouldn't it make sense to instead go backwards and return the first track that gives you one?  That said: multiple video tracks (especially >2) are very rare and I doubt they will ever be common (though some specific use cases will use them), and if GetFrameAt() just returns a reference and doesn't actually copy data, then this is not worth worrying about.

> > Also, isn't normally priority given to the first
> > track not the last, up to now?
> 
> I think the behavior we actually want is to composite the tracks in track
> order so later tracks are rendered on top of earlier tracks. Since currently
> all our images are opaque, that means choosing the last one (modulo some
> issues around image sizes).

If we're compositing by default, the <video> tag spec needs a LOT of updating.  It currently provides several mechanisms for selecting a single track (of video), and none that appear to allow selecting multiple tracks at once.  If it did allow that, you'd really want the ability to place them independently, support for (partial) transparency, swizzling the order, etc, directly in the <video> tag (and spec).

I could be wrong; you've lived this *far* more than I, but that's what I'd understood, and what reading the spec now to check appears to uphold.
(In reply to Randell Jesup [:jesup] from comment #23)
> Given I think this returns a single frame, from the last track that provided
> one, wouldn't it makes sense to instead go backwards and return the first
> track that gives you one?

Yes, that code would be slightly simpler than what's here, but it would require us to add an API to iterate through tracks in reverse order. Currently we have no other use for that API and maybe we'll never have another use for it. So I don't think it's worth it at this time.

> if GetFrameAt() just returns a reference to it
> and doesn't actually copy data, then this is not worth worrying about.

Yes, we never copy Images. There isn't even an API to do that.

> > I think the behavior we actually want is to composite the tracks in track
> > order so later tracks are rendered on top of earlier tracks. Since currently
> > all our images are opaque, that means choosing the last one (modulo some
> > issues around image sizes).
> 
> If we're compositing by default, the <video> tag spec needs a LOT of
> updating.  It currently provides several mechanisms for selecting a single
> track (of video), and none that appear to allow selecting multiple tracks at
> once.

True.

> If it did allow that, you'd really want the ability to place them
> independently, support for (partial) transparency, swizzling the order, etc,
> directly in the <video> tag (and spec).

I don't want to go down that path. The main use for compositing video tracks that I've thought of is that we could have <canvas> elements produce a video track of their contents, and then you could use composite video to create "burned in" captions, special effects, etc.

But maybe we shouldn't bother with that. We could just render one video track and support the compositing use-cases with a special built-in ProcessedMediaStream subclass that composites video tracks together. Or, we could support it later with real-time JS video stream processing.

I can post a followup patch to select the first video track instead of the last, if you think that's the right way to go. I don't really mind.
Attachment #653691 - Flags: review?(rjesup) → review+
I forgot to add [leave open].
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 786081
Mass moving Web Audio bugs to the Web Audio component.  Filter on duckityduck.
Component: Video/Audio → Web Audio
Depends on: 898967
Depends on: 966800