Closed Bug 801843 Opened 7 years ago Closed 7 years ago

Video captured via getUserMedia() isn't correctly inserted into MediaStreams (blocks)

Categories

(Core :: WebRTC: Audio/Video, defect, major)

defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla19
Tracking Status
firefox18 --- affected
firefox19 --- verified

People

(Reporter: jesup, Assigned: jesup)

References

Details

Attachments

(2 files, 2 obsolete files)

Audio isn't slow if we only capture audio.

It appears to be slow even if we assign the result to an audio element, so it's likely an A/V sync issue that's causing the mediastream to slow audio down, perhaps assumptions about timestamps.
Comment on attachment 671859 [details] [diff] [review]
Change how video frames are inserted into getUserMedia streams to remove blocking

The problem was video, not audio.  In order to avoid blocking in MediaStreams, we need to manage frame duplication in the MediaStream insertion code via NotifyPull given the current design of MediaStreams.

We manage a "last ended" time ("end" time of the last frame pushed), and on new frames insert the frame with a duration of 1ms.  Then if MediaStreams needs more video data, we re-send the same image (which roc tells me doesn't actually cause data copying) with the minimum required duration to avoid blocking.
Attachment #671859 - Flags: review?(roc)
Attachment #671859 - Flags: review?(anant)
Summary: Audio is very slow/delayed when capturing with video in one GetUserMedia call → Video captured via getUserMedia() isn't correctly inserted into MediaStreams (blocks)
Comment on attachment 671859 [details] [diff] [review]
Change how video frames are inserted into getUserMedia streams to remove blocking

Looks good. I was going to suggest moving NotifyPull into the MediaEngineVideoSource interface since audio is always pushed, but it looks like it might be useful in certain cases?
Attachment #671859 - Flags: review?(anant) → review+
Comment on attachment 671859 [details] [diff] [review]
Change how video frames are inserted into getUserMedia streams to remove blocking

Review of attachment 671859 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/webrtc/MediaEngineDefault.cpp
@@ +204,5 @@
> +void
> +MediaEngineDefaultVideoSource::NotifyPull(MediaStreamGraph* aGraph,
> +                                          StreamTime aDesiredTime)
> +{
> +  // Ignore - we push audio data

"we push video data"

::: content/media/webrtc/MediaEngineWebRTCVideo.cpp
@@ +116,2 @@
>    VideoSegment segment;
> +  // we only advance the timeline 1ms because video frames inherently have

1 microsecond

@@ +119,3 @@
>    segment.AppendFrame(image.forget(), 1, gfxIntSize(mWidth, mHeight));
>    mSource->AppendToTrack(mTrackID, &(segment));
> +  mLastEndTime += 1;

This is OK, but I don't know why you need to do it.

@@ +130,5 @@
> +MediaEngineWebRTCVideoSource::NotifyPull(MediaStreamGraph* aGraph,
> +                                         StreamTime aDesiredTime)
> +{
> +  (void) aGraph;
> +  (void) aDesiredTime;

Why do we need to fake these?

@@ +288,2 @@
>    mSource->AdvanceKnownTracksTime(STREAM_TIME_MAX);
> +  mLastEndTime = TimeToTicksRoundDown(USECS_PER_S, mSource->GetCurrentTime());

Isn't this always just zero?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #4)
> Comment on attachment 671859 [details] [diff] [review]
> Change how video frames are inserted into getUserMedia streams to remove
> blocking
> 
> Review of attachment 671859 [details] [diff] [review]:
> -----------------------------------------------------------------

> @@ +119,3 @@
> >    segment.AppendFrame(image.forget(), 1, gfxIntSize(mWidth, mHeight));
> >    mSource->AppendToTrack(mTrackID, &(segment));
> > +  mLastEndTime += 1;
> 
> This is OK, but I don't know why you need to do it.

Just trying to keep mLastEndTime matching when the end of the last video frame we pushed was.  At USEC_PER_SEC it's not likely to drift a noticeable amount, but if that were changed I'd like the code to just work.  :-)

> 
> @@ +130,5 @@
> > +MediaEngineWebRTCVideoSource::NotifyPull(MediaStreamGraph* aGraph,
> > +                                         StreamTime aDesiredTime)
> > +{
> > +  (void) aGraph;
> > +  (void) aDesiredTime;
> 
> Why do we need to fake these?

Habit to avoid unused-params warnings.

> @@ +288,2 @@
> >    mSource->AdvanceKnownTracksTime(STREAM_TIME_MAX);
> > +  mLastEndTime = TimeToTicksRoundDown(USECS_PER_S, mSource->GetCurrentTime());
> 
> Isn't this always just zero?

Could be.  Didn't know that was mandated. Any situation where we might not find that to be true some time in the future?  I assumed the stream's currenttime came from the audio track(s) anyways, or rather (or should) come from the audio output.
(In reply to Randell Jesup [:jesup] from comment #5)
> Just trying to keep mLastEndTime matching when the end of the last video
> frame we pushed was.  At USEC_PER_SEC it's not likely to drift a noticeable
> amount, but if that were changed I'd like the code to just work.  :-)

I don't know why we need to push the video frame here at all.

> Habit to avoid unused-params warnings.

I don't think we need to do this in Gecko.

> Could be.  Didn't know that was mandated. Any situation where we might not
> find that to be true some time in the future?  I assumed the stream's
> currenttime came from the audio track(s) anyways, or rather (or should) come
> from the audio output.

I think it's always zero because you're setting up the stream for the first time and nothing can have played yet. Is that true? If it is, let's use zero here. If it's not true, then we need to do some more work to pad the new track with null data up to the current point in the stream (because all tracks, even newly added ones, logically start at the beginning of the stream).
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #6)
> (In reply to Randell Jesup [:jesup] from comment #5)
> > Just trying to keep mLastEndTime matching when the end of the last video
> > frame we pushed was.  At USEC_PER_SEC it's not likely to drift a noticeable
> > amount, but if that were changed I'd like the code to just work.  :-)
> 
> I don't know why we need to push the video frame here at all.

Well, when I didn't no media flowed - but I must have had another bug (I was rather tired...), with that removed it still works. Done.

> > Could be.  Didn't know that was mandated. Any situation where we might not
> > find that to be true some time in the future?  I assumed the stream's
> > currenttime came from the audio track(s) anyways, or rather (or should) come
> > from the audio output.
> 
> I think it's always zero because you're setting up the stream for the first
> time and nothing can have played yet. Is that true? If it is, let's use zero
> here. If it's not true, then we need to do some more work to pad the new
> track with null data up to the current point in the stream (because all
> tracks, even newly added ones, logically start at the beginning of the
> stream).

Yup.  The audio and video should have Start() called together (virtually), but their first data being available may vary widely.  On my Mac the difference is ~1.5 seconds, which is around the audio-video delay.  Trying a solution
Attachment #671859 - Attachment is obsolete: true
Attachment #671859 - Flags: review?(roc)
Comment on attachment 672080 [details] [diff] [review]
Change how video frames are inserted into getUserMedia streams to remove blocking

No audio delay with this.  It's a hack, but it works.  I'd love a better solution.
Attachment #672080 - Flags: review?(roc)
Comment on attachment 672080 [details] [diff] [review]
Change how video frames are inserted into getUserMedia streams to remove blocking

Review of attachment 672080 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/webrtc/MediaEngineDefault.cpp
@@ +204,5 @@
> +void
> +MediaEngineDefaultVideoSource::NotifyPull(MediaStreamGraph* aGraph,
> +                                          StreamTime aDesiredTime)
> +{
> +  // Ignore - we push audio data

"we push video data"

::: content/media/webrtc/MediaEngineWebRTCVideo.cpp
@@ +136,5 @@
> +    LOG(("NotifyPull, target = %lu, delta = %lu", (uint64_t) target, (uint64_t) delta));
> +#endif
> +    segment.AppendFrame(image.forget(), delta, gfxIntSize(mWidth, mHeight));
> +    mSource->AppendToTrack(mTrackID, &(segment));
> +    mLastEndTime = target;

I suspect the problem is that this code doesn't run if mImage is null, which it will be until we've received the first image.

I think you can fix your problems simply by removing the "if (mImage)". It's OK to pass a null Image pointer to AppendFrame (that should be documented, although VideoFrame documents that its mImage can be null).

@@ +308,5 @@
> +  PR_Free(frame);
> +
> +  // AddTrack takes ownership of segment
> +  VideoSegment *segment = new VideoSegment();
> +  segment->AppendFrame(image.forget(), 1, gfxIntSize(mWidth, mHeight));

Just call segment->AppendNullData(duration) to pad with "no image". You don't need to create an image at all. But I'm not sure how this helps since it's only one microsecond long...
I'm assuming your desired behavior before we receive the first camera image is that audio plays and there is no image.
correct.  passing null seems to work fine if we don't have an image yet, with no AddTrack-time push.  so I think we're about ready!
Thanks for the pointer; I would not have assumed a NULL image was allowed.  Hopefully this should be a minimal patch.  Audio is smooth and low-delay
Assignee: nobody → rjesup
Attachment #672080 - Attachment is obsolete: true
Attachment #672080 - Flags: review?(roc)
Attachment #672113 - Flags: review?(roc)
Comment on attachment 672113 [details] [diff] [review]
Final patch.  No temp frames created.

Review of attachment 672113 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with those changes

::: content/media/webrtc/MediaEngineWebRTCVideo.cpp
@@ +277,5 @@
> +
> +  // Allocate a single blank Image
> +  ImageFormat format = PLANAR_YCBCR;
> +
> +  nsRefPtr<layers::Image> image = mImageContainer->CreateImage(&format, 1);

The above new lines are not used. Remove.

@@ +284,2 @@
>    mSource->AdvanceKnownTracksTime(STREAM_TIME_MAX);
> +  mLastEndTime = TimeToTicksRoundDown(USECS_PER_S, mSource->GetCurrentTime());

This should be zero. Right? I think we should just use zero here. GetCurrentTime isn't the right thing to use in any case, since it returns how much has been played (as relayed to the main thread) rather than how much has been buffered.
Attachment #672113 - Flags: review?(roc) → review+
With nits fixed.
Includes check of mState against kStarted in NotifyPull() r+'d by roc on IRC

https://hg.mozilla.org/integration/mozilla-inbound/rev/b5964726ad30
Adds audio+video button.
Properly sets mozSrcObject to null when you hit stop.
Anant, please update your page... You can wait to enable audio+video until that hits m-c or nightly.  You could also detect Aurora and not display it or warn people it won't work (until we uplift)
Whiteboard: [getUserMedia] [blocking-gum+]
https://hg.mozilla.org/mozilla-central/rev/b5964726ad30
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: [getUserMedia] [blocking-gum+]
Target Milestone: --- → mozilla19
Should this have a test?
Flags: in-testsuite?
(In reply to Ryan VanderMeulen from comment #18)
> Should this have a test?

Would be helpful, yes. Randell, should this be an unit test?
Yeah, I'm not sure how to test this reasonably, hard even from a unit test.

basically, we'd need to know if the MediaStream blocks periodically while capturing video from a real camera (or reasonable facsimile thereof)
Depends on: 803086
Keywords: verifyme
Tested this on Win 7 with the audio/video test page here - https://people.mozilla.com/~anarayanan/webrtc/gum_test.html, but no dice on this entirely working. It appears I'm hitting the followup bug - bug 803799.
Keywords: verifyme
Whiteboard: [qa verification failed]
Well, this problem/fix can be checked two ways: One, video+audio captures will be horribly distorted and delayed/slowed down (though not in frequency).  Two, if you capture *just* a video stream, the "blockedTime=x.xxxxx" will be frequently non-zero.  This can be done even if the audio is delayed on Windows and occasionally mac due to bug 803799.
(In reply to Randell Jesup [:jesup] from comment #22)
> Well, this problem/fix can be checked two ways: One, video+audio captures
> will be horribly distorted and delayed/slowed down (though not in
> frequency).  Two, if you capture *just* a video stream, the
> "blockedTime=x.xxxxx" will be frequently non-zero.  This can be done even if
> the audio is delayed on Windows and occasionally mac due to bug 803799.

Ah. What I saw when I tested this was the slow down effect you've described - if I clap into my mic, I'll hear it seconds later with an audio/video gUM call. If I was just viewing it with audio only, then I hear the clap immediately on generating the clap.

I think we're on the same page here, but if I understand this fix correctly, then the slowness/delay I'm seeing is expected as of a result of this patch then, right? Is this intended behavior then? Can you explain why?
It wasn't expected that his patch would leave a second-plus delays on Windows, but a side-effect of issues with the design of where the camera gets enabled.  We needed to move that to a thread that can block.   This is bug 803799.  We didn't expect the Windows camera driver to block us for 1 second every time we try to open it...  (And Mac to semi-randomly block us, though not usually.)

Before this patch combined video+audio was horrid (and video had issues too, though they were mostly internal or not noticable).
Okay. Makes sense. I'll mark this as verified then.
Status: RESOLVED → VERIFIED
Whiteboard: [qa verification failed]
You need to log in before you can comment on or make changes to this bug.