Closed Bug 1109644 Opened 5 years ago Closed 5 years ago

Ever-increasing MediaStreamGraph video delay (and jerkiness) on 36/37

Categories

(Core :: Audio/Video, defect, major)

defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
firefox36 + fixed
firefox37 --- fixed

People

(Reporter: jesup, Assigned: roc)

References

()

Details

(Keywords: regression)

Attachments

(6 files, 1 obsolete file)

5.69 KB, patch
jesup
: review+
Details | Diff | Splinter Review
1.41 KB, patch
jesup
: review+
Details | Diff | Splinter Review
1.24 KB, patch
jesup
: review+
Details | Diff | Splinter Review
1.48 KB, patch
jesup
: review+
Details | Diff | Splinter Review
1.42 KB, patch
jesup
: review+
Details | Diff | Splinter Review
16.78 KB, patch
jesup
: review+
Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #1061046 +++

On 36 and 37 in uses of gUM + MediaStreams (gum tests, or local video viewing in a call), there's ever-increasing video delay.  After 40 minutes on linux, I had ~2s of video delay, but no significant audio delay.  Similar results on windows, but there after 3-5 minutes the video also gets increasingly jerky (goes down from 15-30fps to ~5fps).  This happens with video or video+audio gum.

This can be seen in a simple gUM test
https://mozilla.github.io/webrtc-landing/gum_test.html

I strongly believe this is a regression from bug 1061046.  The only other things to touch these paths significantly were bug 1080755 (which is in 34 - it was uplifted) and bug 1053130 (it's in 35, and 35 doesn't show this).  The callback-driven MSG is also in 34, so it's not that.
roc - I think this one's yours or KarlT's
Flags: needinfo?(roc)
Summary: Ever-increasing video delay (and jerkiness) on 36/37 → Ever-increasing MediaStreamGraph video delay (and jerkiness) on 36/37
On my machine, I don't see video lag but I do see jerky video.

MediaEngineWebRTCVideoSource::NotifyPull is being called on every rendered video frame (i.e. not frequently enough). But MediaStreamGraphImpl::Process is being called at a higher rate. So for some reason NotifyPull is not being called often enough.
Flags: needinfo?(roc)
It looks like mLastEndTimeVideo is incorrect (too low). It doesn't match the end of the video track in the MSG. That causes us to append too-long video segments to the track in each NotifyPull.
There's some kind of rounding error accumulating in the track length:
### MediaEngineWebRTCVideoSource::NotifyPull changed aLastEndTime from 27392 to 27904
### Track duration changed from 27392 to 27904
### MediaEngineWebRTCVideoSource::NotifyPull changed aLastEndTime from 27904 to 28288
### Track duration changed from 27904 to 28289 <--- this is where we pulled the data from the SourceMediaStream's staging buffer into the track itself, right after calling NotifyPull
Ah, it's MediaEngineWebRTCVideoSource::DeliverFrame appending a one-sample frame periodically. Unfortunately that is not being added to mLastEndTimeVideo!
Attached patch fix (obsolete) — Splinter Review
Assignee: nobody → roc
Attachment #8535359 - Flags: review?(rjesup)
Comment on attachment 8535359 [details] [diff] [review]
fix

Review of attachment 8535359 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaStreamGraph.cpp
@@ +216,5 @@
>          STREAM_LOG(PR_LOG_DEBUG+1, ("SourceMediaStream %p track %d, advancing end from %lld to %lld",
>                                      aStream, data->mID,
>                                      int64_t(dest->GetDuration()),
>                                      int64_t(dest->GetDuration() + data->mData->GetDuration())));
> +        auto old = dest->GetDuration();

Why is this here?  Looks like dead code..

@@ +2342,5 @@
>        ResampleAudioToGraphSampleRate(track, aSegment);
>  
>        // Must notify first, since AppendFrom() will empty out aSegment
>        NotifyDirectConsumers(track, aRawSegment ? aRawSegment : aSegment);
> +      auto old = track->mData->GetDuration();

Why is this here?  Looks like dead code..
Attachment #8535359 - Flags: review?(rjesup) → review+
Needless to say I can't reproduce this crash locally.
The bug was that MediaEngineWebRTCVideoSource (and I guess other sources) are used with more than one SourceMediaStream, each one with a different start time. We tried to use a single mProducedDuration value for all the streams, which doesn't make sense.
Attachment #8541960 - Flags: review?(rjesup) → review+
Attachment #8541961 - Flags: review?(rjesup) → review+
Attachment #8541962 - Flags: review?(rjesup) → review+
Attachment #8541963 - Flags: review?(rjesup) → review+
Attachment #8541964 - Flags: review?(rjesup) → review+
Comment on attachment 8541965 [details] [diff] [review]
Part 6: Remove aLastEndTime parameter from NotifyPull

Review of attachment 8541965 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webrtc/MediaEngineWebRTCAudio.cpp
@@ +402,4 @@
>  {
>    // Ignore - we push audio data
>  #ifdef DEBUG
> +  LOG(("Audio: NotifyPull: aDesiredTime %ld", aDesiredTime));

Let's move this to a higher logging level - add a LOG_FRAMES to match MediaEngineWebRTCVideo.cpp (logs at level 6/DEBUG+1), and use that here.
#define LOG_FRAMES(msg) PR_LOG(GetMediaManagerLog(), 6, msg)
Attachment #8541965 - Flags: review?(rjesup) → review+
[Tracking Requested - why for this release]: Regression, and quite a bad one.
Comment on attachment 8541965 [details] [diff] [review]
Part 6: Remove aLastEndTime parameter from NotifyPull

Approval Request Comment
[Feature/regressing bug #]: 1061046
[User impact if declined]: bad regression in long-running getUserMedia sessions
[Describe test coverage new/current, TBPL]: many automated tests exercise this code
[Risks and why]: pretty low risk
[String/UUID change made/needed]: none
Attachment #8541965 - Flags: approval-mozilla-aurora?
Attachment #8541965 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.