Closed Bug 884365 Opened 11 years ago Closed 11 years ago

Audio realtime input clock mismatch (drift) blows up delay in MediaStreamGraph

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

22 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla26
Tracking Status
firefox22 --- wontfix
firefox23 - wontfix
firefox24 - wontfix
firefox25 + fixed
firefox26 --- verified

People

(Reporter: jesup, Assigned: jesup)

References

Details

(Keywords: verifyme, Whiteboard: [getUserMedia] [blocking-gum-] [desktop+android verification needed])

Attachments

(5 files, 7 obsolete files)

8.75 KB, patch
Details | Diff | Splinter Review
3.15 KB, patch
roc
: review+
Details | Diff | Splinter Review
33.91 KB, patch
jesup
: review+
Details | Diff | Splinter Review
14.86 KB, patch
Details | Diff | Splinter Review
25.38 KB, patch
roc
: review+
Details | Diff | Splinter Review
We have a huge problem with audio latency building up over time in MSG (or, conversely, with it having to insert silence on underflows) when the microphone input and MSG don't share the same clock (which is common, as different hardware devices often will have different clocks)

MSG is currently clocked off the system clock, delivering 10ms of audio data for each 10ms of system clock time.  This will be changed to clock off the output clock of cubeb.  Neither is guaranteed to be the same as any given microphone, even if both are part of the same headset.  For added fun, hardware clockrates themselves can (slowly) drift.

Resampling the input to the system clock rate will work, but can cause problems with the AEC, especially if the resampling rate is not *extremely* consistent.

* A simple feedback loop modifying the resampling rate (above and below the "long-term" estimate of the clockrate skew) will control delay buildup, but will cause the resampling rate to vary significantly from moment to moment, which the AEC very likely can't handle.  

* A highly damped PLL on the sampling rate with input from the delay measurement may find a more stable average sampling rate (eventually), but likely will react too slowly to control delay in any reasonable time period.

* We can't drop/duplicate data to control delay since that will require a AEC reset and re-converge each time (it's as if you suddenly moved the mic forward or back).  We could do so *after* the AEC, but the AEC currently lives in PeerConnection, which is after MSG, and MSG will buffer and delay.

* Moving the AEC to getUserMedia (before MSG) would allow us to do that (resample before, drop/add data after) and is planned anyways.  However, that's noticeably more work to accomplish.  Combined with a (highly) damped PLL for resampling per above to try to minimize the amount of drop/dups long-term, this may work.

* Saving measurements of previous mic-specific clock skew factors may help improve call-start performance

* Avoiding reclocking in the MSG (when directly connected to PeerConnection, for example) would push the problem off to the AEC and to the far-end's jitter buffer and time-base-corrector, which is the "classic" VoIP method of dealing with clock skew.  However, MSG is designed to reclock always, and even a special-case for direct connection to PeerConnection would not solve the problem for local connections to other objects.  It might also cause problems for a/v sync.

** To resolve this, a modified MSG to allow an output to get "raw" inputs (no reclocking or queuing) while not affecting other users would help, but you still could get "infinite" buffer buildup in MSG (to non-realtime outputs).  This could be controlled by adding a resampler to data that goes into the graph's buffers, while providing immediate non-resampled non-reclocked input to destinations that want it.  I.e.
    AppendToTrack(trackID, segment, rawdata) (rawdata would be optional)
where rawdata is provided in place of the segment data to realtime consumers only.  Non-realtime outputs would get the segment data, queued and reclocked, and would have a resampler and delay-control logic.  Note that unless there's a consumer that needs the buffered/reclocked data, we wouldn't bother doing that and resampling it.

Realtime consumers would register by implementing NotifyUnqueuedTrackChanges().  They would get rawdata if available, or the normal segment data (for example if attached to a non-realtime source).

Realtime sources would call call AddTrack(trackid, rate, start, segment, realtime, resample); (realtime would default to false, as would resample).  Realtime sources should always allow resampling unless you're sure they share a clock with MSG.  We might still avoid delay buildup in realtime sources without resampling by dropping/adding buffers.

To avoid sync problems, PeerConnection/MediaPipeline would register for NotifyUnqueueTrackChanges().  To avoid excess drift at the AEC (currently in PeerConnection) between input and output, a highly-damped PLL would control the existing pre-AEC resampler (this resampler is currently disabled).  Expansion or contraction of the output would not matter, as the far-end's jitter buffer and Time Base Correction would reclock the audio there.

This would be followed by moving the AEC (with its input resampler) to getUserMedia later.

------------------
So, to summarize, my proposed plan is:

1) Add support for unqueued outputs from MSG
 a) Modify the control code for the existing pre-AEC resampler to correct for drift (highly damped)
 b) Add a resampler similar to my current prototype (with delay control via resampling variation or sample dropping/dupping) to queued data from realtime sources.  This would only run if there were non-realtime sinks; we would do something to know this.

2) Move the AEC to getUserMedia
 a) If possible (and it may not) combine the resamplers 

#1 will solve the delay buildup problem AND will remove 15-25ms of inherent delay in MSG

#2 will solve the problem of not cancelling echos from *other* peerconnections (see 3-way calls), and not cancelling echoes from other audio outputs from the browser (see YouTube).  It also will make it easier to use OS driver echo cancelers (recent windows versions apparently have them; android may well, though I've heard they only function at low sample rates).  It *may* allow us to merge the two resamplers.
Depends on: 818822
Blocks: 881785
http://kokkinizita.linuxaudio.org/papers/adapt-resamp.pdf

Relevant paper about controlling a resampler while also controlling delay.  It's unclear how much of this is useful, as we have a slightly different usecase here, but a lot of the discussion in the paper is relevant, and if we can implement a resampler control here equivalent to this (or even close), it should control delay in a way that an AEC would likely be happy with.  (Jean-Marc, do you agree?)
Flags: needinfo?(jmvalin)
This looks like a reasonable starting point. The way I see this, there are three parts to solving this problem:
1) Getting actual timing info. As far as I understand, the GIPS stack already does that for Windows and the papers above discuss how to do this on Linux (haven't checked if it's the same)
2) Reliably estimating the skew from the data. Looking at the GIPS code, even though I did not understand all the details of it, they seem to have a carefully tuned robust estimator for this. It's worth checking if it's worth reusing for other operating systems.
3) Actually controlling the sampling rate based on 2) and the delay. I'm not sure if the GIPS stack does it, but the paper you link to seems to address the issue.
Flags: needinfo?(jmvalin)
(In reply to Randell Jesup [:jesup] from comment #0)
> * Moving the AEC to getUserMedia (before MSG) would allow us to do that
> (resample before, drop/add data after) and is planned anyways.  However,
> that's noticeably more work to accomplish.  Combined with a (highly) damped
> PLL for resampling per above to try to minimize the amount of drop/dups
> long-term, this may work.

I think moving the AEC is significantly less work than the rest of the plan outlined in this bug, and we know we have to do it anyway, so I don't understand why you're putting it off.

> ** To resolve this, a modified MSG to allow an output to get "raw" inputs
> (no reclocking or queuing) while not affecting other users would help, but
> you still could get "infinite" buffer buildup in MSG (to non-realtime
> outputs).  This could be controlled by adding a resampler to data that goes
> into the graph's buffers, while providing immediate non-resampled
> non-reclocked input to destinations that want it.  I.e.

Given that we have AEC CPU consumption issues on B2G already, your proposal sounds like it will resample twice, at least in the case any of the media gets processed before being sent to the PC, and using any audio effect at all add a bunch of delay. This seems bad.

> share a clock with MSG.  We might still avoid delay buildup in realtime
> sources without resampling by dropping/adding buffers.

Maybe I'm confused, but doesn't dropping/adding buffers have to happen _after_ AEC? It sounds like you're proposing to do this before the data enters GIPS.

> Expansion or contraction of the output would not matter, as the far-end's
> jitter buffer and Time Base Correction would reclock the audio there.

Well, that's not exactly true. One thing you're going to have to deal with is that audio buffers are being passed around in discrete chunks of, e.g., 10 ms. Once a chunk goes through a resampler (at any layer), it's no longer 10 ms worth of data, but we have a packetization time that's a multiple of 10 ms. So every trip through a resampler adds delay.
(In reply to Timothy B. Terriberry (:derf) from comment #5)
> (In reply to Randell Jesup [:jesup] from comment #0)
> > * Moving the AEC to getUserMedia (before MSG) would allow us to do that
> > (resample before, drop/add data after) and is planned anyways.  However,
> > that's noticeably more work to accomplish.  Combined with a (highly) damped
> > PLL for resampling per above to try to minimize the amount of drop/dups
> > long-term, this may work.
> 
> I think moving the AEC is significantly less work than the rest of the plan
> outlined in this bug, and we know we have to do it anyway, so I don't
> understand why you're putting it off.

This is gated on access to the speaker stream, and reclocking MSG.  I don't know the eta on those landing.

> > ** To resolve this, a modified MSG to allow an output to get "raw" inputs
> > (no reclocking or queuing) while not affecting other users would help, but
> > you still could get "infinite" buffer buildup in MSG (to non-realtime
> > outputs).  This could be controlled by adding a resampler to data that goes
> > into the graph's buffers, while providing immediate non-resampled
> > non-reclocked input to destinations that want it.  I.e.
> 
> Given that we have AEC CPU consumption issues on B2G already, your proposal
> sounds like it will resample twice, at least in the case any of the media
> gets processed before being sent to the PC, and using any audio effect at
> all add a bunch of delay. This seems bad.

ehsan and I talked, and we think WebAudio can be fed and process mic input without added much delay (just chunking to 128 samples and back to 10ms).  This was just an IRC conversation without going into details, however.

> > share a clock with MSG.  We might still avoid delay buildup in realtime
> > sources without resampling by dropping/adding buffers.
> 
> Maybe I'm confused, but doesn't dropping/adding buffers have to happen
> _after_ AEC? It sounds like you're proposing to do this before the data
> enters GIPS

No, dropping/dupping *after* the AEC; sorry I didn't state that.  You certainly can't do that anywhere ahead of the AEC.
 
> > Expansion or contraction of the output would not matter, as the far-end's
> > jitter buffer and Time Base Correction would reclock the audio there.
> 
> Well, that's not exactly true. One thing you're going to have to deal with
> is that audio buffers are being passed around in discrete chunks of, e.g.,
> 10 ms. Once a chunk goes through a resampler (at any layer), it's no longer
> 10 ms worth of data, but we have a packetization time that's a multiple of
> 10 ms. So every trip through a resampler adds delay.

Correct, if the resampler is (or has been) active; if it stays off or if you're careful to turn it off at a 10ms boundary when possible you can avoid that adding delay.  Average delay for this would be 5ms (0-10 depending).
Resampler for feeding to MSG that controls delay (in a somewhat ugly fashion) and will likely not be compatible with an AEC.  Tested on a laptop with a 0.24% fast clock, and on my Linux box (not sure of clock rate, though seemed a little fast).  Control loop/PLL/delay-adjustments work, but are ugly and at least could use further tuning and testing on more systems; far preferable would be a Jack-like resampler, or at least a better PLL.
rename NotifyQueuedTrackChanges() to NotifyTrackChanges() in preparation for supporting both queued and unqueued notifications ('bypass')
Implement 'bypass' of MSG queuing for realtime sinks (i.e. PeerConnection).  Does not yet handle mid-call graph changes, though most of the support for that is there (need to kick off a rebuilding of the mRealtimeListener arrays when a TrackUnion input changes for all potentially affected SourceMediaStreams).  Also, my latest changes to clean things up seem to have introduced a bug I'm tracking down, so don't try running this yet (or expect quality issues and an eventual crash).
Attachment #765645 - Flags: review?(roc)
Attachment #765649 - Flags: feedback?(roc)
Comment on attachment 765644 [details] [diff] [review]
Use resampler from AEC to control clock drift and MSG buffering/delay (Obsolete)

f? to Jean-Marc, who will probably lose his dinner over this control loop. ;-)

It's more of a proof-of-concept that we can control delay in the MSG here with a resampler.
Attachment #765644 - Flags: feedback?(jmvalin)
Based on the patch I'm seeing here, this might not be safe for Beta - this looks like a lot of cross-cutting changes across WebRTC and content/media, which could have regression risk. I do understand this is important, but I think an analysis of risk vs. value is needed before figuring out if this is worth tracking for Beta (Fx23).
(In reply to Jason Smith [:jsmith] from comment #11)
> Based on the patch I'm seeing here, this might not be safe for Beta - this
> looks like a lot of cross-cutting changes across WebRTC and content/media,
> which could have regression risk. I do understand this is important, but I
> think an analysis of risk vs. value is needed before figuring out if this is
> worth tracking for Beta (Fx23).

The patchset here currently likely would not be nominated for uplift.  A simpler solution might be.
So, the worst root problem here is that on systems/devices using 44100 Hz (Lenovo W520 on Windows by default!), the internal webtc.org code lies and says it's 44000Hz, to avoid having to do an ugly resample in their fixed-rate resampler (it supports about 20 fixed ratios).  They then compensate for that in the floating-point sampler on the input to the AEC.  (Ugh)  There still is a need to control clock drift and mismatch, so delay control-resampling will still be needed, though likely with tighter operating ranges.

After looking all over the code and talking jean-marc, I'm trying replacing the fixed-ratios-resampler on the input with a speex resampler, and removing all the special-casing of 44100/44000
Depends on: 886886
Comment on attachment 765649 [details] [diff] [review]
add 'bypass' of the MSG queuing for listeners who want realtime input WIP

Review of attachment 765649 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaSegment.h
@@ +110,5 @@
> +  virtual bool DeliveredRealtime() = 0;
> +  /**
> +   * Set realtime delivery status
> +   */
> +  virtual void SetDeliveredRealtime(bool aStatus) = 0;

This kind of statefulness is nasty, sorry. Can we possibly avoid it?
Will track for a simpler, lower risk solution for branches.  We should aim to get this landed within the next couple of weeks though so there's some bake time on Beta and also opportunity to back out if it destabilizes the product.
Comment on attachment 765644 [details] [diff] [review]
Use resampler from AEC to control clock drift and MSG buffering/delay (Obsolete)

Note: this patch is partly invalidated by the work in bug 886886 which deals with the primary source of skew.  Remaining skew will be (very) small and delay buildup or slide towards underrun will be corresponding slow (and except maybe for underrun, can be corrected fairly slowly).

Per IRC discussion with jmspeex, this rough control loop should be replaced with a (tuned) PID controller (or in this case more likely a PI controller).  Right now it's only very roughly proportional (0 and two steps on either side).  However, given the likely remaining magnitude, PI may not be needed; fixed-rate correction (different above and below 0 though) is likely all that is needed.
Attachment #765644 - Flags: feedback?(jmvalin)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #14)
> Comment on attachment 765649 [details] [diff] [review]
> add 'bypass' of the MSG queuing for listeners who want realtime input WIP
> 
> Review of attachment 765649 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: content/media/MediaSegment.h
> @@ +110,5 @@
> > +  virtual bool DeliveredRealtime() = 0;
> > +  /**
> > +   * Set realtime delivery status
> > +   */
> > +  virtual void SetDeliveredRealtime(bool aStatus) = 0;
> 
> This kind of statefulness is nasty, sorry. Can we possibly avoid it?

The problem here was to make sure that we don't double-deliver data.

So long as all data is delivered "realtime" to realtime consumers (as AppendToTrack does now) I think we can get rid of it.  Think of the reason for it being there as paranoia about not being 100% sure all the sources of data were going through AppendToTrack when I started writing it.
New patch coming and nominated for uplift?  We'll want to get this in sooner rather than later for time to verify before shipping.
Flags: needinfo?(rjesup)
The patch we'd look at for 23 is in bug 886886.  I'll mark that for tracking, and we can unmark this one
Flags: needinfo?(rjesup)
Moving tracking over to bug 886886 (as well as relnote nomination).
Keywords: relnote
Blocks: 785584
Attachment #765645 - Attachment is obsolete: true
Attachment #790069 - Attachment description: Add method to return the amount of buffered data on a SourceMediaStream → Part 1: Add method to return the amount of buffered data on a SourceMediaStream
Comment on attachment 790070 [details] [diff] [review]
Part 2: Rename NotifyQueuedTrackChanges to NotifyTrackChanges

carry forward r=roc
Attachment #790070 - Flags: review+
Attachment #765649 - Attachment is obsolete: true
Attachment #765649 - Flags: feedback?(roc)
Attachment #765644 - Attachment description: Use resampler from AEC to control clock drift and MSG buffering/delay WIP → Use resampler from AEC to control clock drift and MSG buffering/delay (Obsolete)
Attachment #790069 - Flags: review?(roc)
Attachment #790074 - Flags: review?(tterribe)
Comment on attachment 790073 [details] [diff] [review]
Part 3: add 'bypass' of the MSG queuing for listeners who want realtime input

This patchset (tested with the bug 886052 and bug 905062 patches) delivers data to consumers directly on the appending thread.  This bypasses a small amount of synchronization delay in MediaStreamGraph, and also any delay due to buffer buildup in the graph for any reason.

In the normal case, you can hear a slight "echo" if you enable audio on both the self-image and remote image in http://mozilla.github.io/webrtc-landing/pc_test.html   After 30 minutes running (without triggering bug 901831), there was <100ms delay between them, but when I started it was probably <30ms.  This may have been clock drift (circa 0.005%) and/or a couple of "buffer" events.

If something causes buffering (clock mismatch/drift, or a bug such as bug 901831 where persistent drift occurs due to blocking of MSG), you will hear an ever-increasing offset between the audio (after a few minutes of bug 901831, I heard 10 second(!) differences).

The missing piece of this puzzle will be to hook up a resampler to feed data not for real-time consumption into AppendToTrack() to avoid/fix any delay buildup in data for non-realtime sinks such as media elements.

(Also, the MediaRecorder sink likely should be a realtime sink... - that can be a follow-on bug)
Attachment #790073 - Flags: review?(roc)
Comment on attachment 790069 [details] [diff] [review]
Part 1: Add method to return the amount of buffered data on a SourceMediaStream

Review of attachment 790069 [details] [diff] [review]:
-----------------------------------------------------------------

The code looks fine but the comment is unclear.

::: content/media/MediaStreamGraph.h
@@ +662,5 @@
> +   * Only call from Media Graph thread (eg NotifyPull)
> +   * Returns amount of time (data) that is currently buffered in the track,
> +   * assuming playout via PlayAudio or via a TrackUnion - note that NotifyTrackChanges()
> +   * on a SourceMediaStream will occur without any "extra" buffering, but
> +   * NotifyTrackChanges() on a TrackUnion will be buffered.

I don't understand the last three lines here.
Comment on attachment 790073 [details] [diff] [review]
Part 3: add 'bypass' of the MSG queuing for listeners who want realtime input

Review of attachment 790073 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaStreamGraph.h
@@ +155,5 @@
>    enum {
>      TRACK_EVENT_CREATED = 0x01,
>      TRACK_EVENT_ENDED = 0x02
>    };
> +  virtual bool UnqueuedDataPreferred() { return false; } // FIX - const data?

This needs documentation.

@@ +613,5 @@
>     * but aSegment is emptied.
>     * Returns false if the data was not appended because no such track exists
>     * or the stream was already finished.
>     */
> +  bool AppendToTrack(TrackID aID, MediaSegment* aSegment, MediaSegment *aRawData = nullptr);

So does this.

@@ +926,5 @@
> +  /*
> +   * Make sure all the "ancestor" SourceMediaStream nodes have
> +   * aRealtimeListeners removed
> +   */
> +  void RemoveRealtimeListeners(nsTArray<nsRefPtr<MediaStreamListener> > &aRealtimeListeners);

And these :-)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28)
> Comment on attachment 790069 [details] [diff] [review]
> Part 1: Add method to return the amount of buffered data on a
> SourceMediaStream
> 
> Review of attachment 790069 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The code looks fine but the comment is unclear.
> 
> ::: content/media/MediaStreamGraph.h
> @@ +662,5 @@
> > +   * Only call from Media Graph thread (eg NotifyPull)
> > +   * Returns amount of time (data) that is currently buffered in the track,
> > +   * assuming playout via PlayAudio or via a TrackUnion - note that NotifyTrackChanges()
> > +   * on a SourceMediaStream will occur without any "extra" buffering, but
> > +   * NotifyTrackChanges() on a TrackUnion will be buffered.
> 
> I don't understand the last three lines here.

Those lines capture (poorly) the information in the code added in the patch in AppendToTrack():

     if (track) {
+      // Data goes into mData, and on the next iteration of the MSG moves
+      // into the track's segment after NotifyQueuedTrackChanges().  This adds
+      // 0-10ms of delay before data gets to direct listeners.
+      // Indirect listeners (via subsequent TrackUnion nodes) are synced to
+      // playout time, and so can be delayed by buffering.
+
       track->mData->AppendFrom(aSegment);


These perhaps can be written more clearly, though this second one is likely clearer than the .h version.  If I'm wrong about this, please correct me - I wrote these a while ago after a lot of stepping through the code (visually and in GDB).  I should note that these calls should say "NotifyQueuedTrackChanges()" instead of NotifyTrackChanges(), since this patch is before the renaming patch, but it did keep them not as dependent on each other.
Attachment #790073 - Attachment is obsolete: true
Attachment #790073 - Flags: review?(roc)
Comment on attachment 790635 [details] [diff] [review]
Part 3: add 'bypass' of the MSG queuing for listeners who want realtime input

Documentation of those added (and ClearRealtimeListeners() removed as we aren't calling it)
Attachment #790635 - Flags: review?(roc)
(In reply to Randell Jesup [:jesup] from comment #30)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28)
> > Comment on attachment 790069 [details] [diff] [review]
> > Part 1: Add method to return the amount of buffered data on a
> > SourceMediaStream
> > 
> > Review of attachment 790069 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > The code looks fine but the comment is unclear.
> > 
> > ::: content/media/MediaStreamGraph.h
> > @@ +662,5 @@
> > > +   * Only call from Media Graph thread (eg NotifyPull)
> > > +   * Returns amount of time (data) that is currently buffered in the track,
> > > +   * assuming playout via PlayAudio or via a TrackUnion - note that NotifyTrackChanges()
> > > +   * on a SourceMediaStream will occur without any "extra" buffering, but
> > > +   * NotifyTrackChanges() on a TrackUnion will be buffered.
> > 
> > I don't understand the last three lines here.
> 
> Those lines capture (poorly) the information in the code added in the patch
> in AppendToTrack():
> 
>      if (track) {
> +      // Data goes into mData, and on the next iteration of the MSG moves
> +      // into the track's segment after NotifyQueuedTrackChanges().  This
> adds
> +      // 0-10ms of delay before data gets to direct listeners.
> +      // Indirect listeners (via subsequent TrackUnion nodes) are synced to
> +      // playout time, and so can be delayed by buffering.
> +

Indirect listeners and direct listeners should receive their NotifyQueuedTrackChanges at about the same time, in the same iteration of the MSG thread.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #33)
> (In reply to Randell Jesup [:jesup] from comment #30)

> >      if (track) {
> > +      // Data goes into mData, and on the next iteration of the MSG moves
> > +      // into the track's segment after NotifyQueuedTrackChanges().  This
> > adds
> > +      // 0-10ms of delay before data gets to direct listeners.
> > +      // Indirect listeners (via subsequent TrackUnion nodes) are synced to
> > +      // playout time, and so can be delayed by buffering.
> > +
> 
> Indirect listeners and direct listeners should receive their
> NotifyQueuedTrackChanges at about the same time, in the same iteration of
> the MSG thread.


I believe that's demonstratably not true, as calls that experience large amounts of drift (bug 901831) or hardware clock drift (this bug) get badly delayed across a PeerConnection (NotifyQueuedTrackChanges via a TrackUnion).

Start Aurora.  Start windows Nightly, go to http://mozilla.github.io/webrtc-landing/pc_test.html, Start (use a headset and select the headset mic!) turn on audio for both self and remote video, wait a minute or two.  (i.e. provoke bug 901831, let some drift build up)  Talk.  You'll hear a small delay between the two (which varies by around 100+ms - I think mostly due to sawtoothing of delay in AudioStream), but after a few minutes both will be delayed by a second or two.   This says that the data isn't reaching PeerConnection until then, since PeerConnection sends data as soon as it's received, and on the receive side, incoming data goes into NetEQ and it resamples based on the calls from NotifyPull, so it won't build up there.

Side note: data provided via NotifyPull() (i.e. video) does not get delayed and does not (I think) build up in MSG, I believe due to our simply always providing the most-recently-captured frame (and not trying to translate the actual capture time into duration info) on NotifyPull().  Providing data on NotifyPull effectively means resampling it to the MSG clock.
Comment on attachment 790074 [details] [diff] [review]
Part 4: Lock access to ExternalRecordingInsertData (non-thread-safe)

Review of attachment 790074 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/media-conduit/AudioConduit.cpp
@@ +516,5 @@
>    }
>  
>    capture_delay = mCaptureDelay;
>    //Insert the samples
> +  // This function is NOT threadsafe

So, the first question I tried to answer was "What threads does this run on?"

The answer appears to be the MSG thread, of which there is only one, and this is the only place we call this function. So this mutex isn't actually preventing multiple calls to ExternalRecordingInsertData().

Now, ExternalRecordingInsertData() certainly does access variables that were set on other threads. It may be possible we guarantee a happens-before relationship for all of them, but if not, this lock isn't buying us anything because we don't use it anywhere else.

In other words, I have no idea what this lock is for.

::: media/webrtc/signaling/src/media-conduit/AudioConduit.h
@@ +227,5 @@
>    webrtc::EcModes  mEchoCancel;
> +
> +  // Avoid thread collisions in ExternalRecordingInsertData()
> +  // XXX Find a better solution
> +  Mutex mMutex;

If we do use this for something, give it a less generic name and document what, exactly, it is protecting.
Attachment #790074 - Flags: review?(tterribe) → review-
(In reply to Timothy B. Terriberry (:derf) from comment #35)
> Comment on attachment 790074 [details] [diff] [review]
> Part 4: Lock access to ExternalRecordingInsertData (non-thread-safe)
> 
> Review of attachment 790074 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: media/webrtc/signaling/src/media-conduit/AudioConduit.cpp
> @@ +516,5 @@
> >    }
> >  
> >    capture_delay = mCaptureDelay;
> >    //Insert the samples
> > +  // This function is NOT threadsafe
> 
> So, the first question I tried to answer was "What threads does this run on?"
> 
> The answer appears to be the MSG thread, of which there is only one, and
> this is the only place we call this function. So this mutex isn't actually
> preventing multiple calls to ExternalRecordingInsertData().
> 
> Now, ExternalRecordingInsertData() certainly does access variables that were
> set on other threads. It may be possible we guarantee a happens-before
> relationship for all of them, but if not, this lock isn't buying us anything
> because we don't use it anywhere else.
> 
> In other words, I have no idea what this lock is for.

This patchset changes the threading model such that ExternalRecordingInsertData() can be called on threads other than the MSG thread (e.g. the thread that calls ::Process() and thus AppendToTrack()).
Comment on attachment 790074 [details] [diff] [review]
Part 4: Lock access to ExternalRecordingInsertData (non-thread-safe)

Review of attachment 790074 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing pending review of the prior part. I'll take another look when that's sorted.
Attachment #790074 - Flags: review-
(In reply to Randell Jesup [:jesup] from comment #34)
> I believe that's demonstratably not true, as calls that experience large
> amounts of drift (bug 901831) or hardware clock drift (this bug) get badly
> delayed across a PeerConnection (NotifyQueuedTrackChanges via a TrackUnion).

You're right and I was wrong.

NotifyQueuedTrackChanges gets called when the data is moved from a SourceMediaStream's input buffer to its MSG-thread buffer. This can happen arbitrarily far ahead if that input buffer keeps growing.

But on a TrackUnionStream, NotifyQueuedTrackChanges gets called when the TrackUnionStream moves data into its track buffers, which only happens for the times up to mStateComputedTime, which is not allowed to get too far ahead of the current time.
Comment on attachment 790069 [details] [diff] [review]
Part 1: Add method to return the amount of buffered data on a SourceMediaStream

Review of attachment 790069 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaStreamGraph.h
@@ +662,5 @@
> +   * Only call from Media Graph thread (eg NotifyPull)
> +   * Returns amount of time (data) that is currently buffered in the track,
> +   * assuming playout via PlayAudio or via a TrackUnion - note that NotifyTrackChanges()
> +   * on a SourceMediaStream will occur without any "extra" buffering, but
> +   * NotifyTrackChanges() on a TrackUnion will be buffered.

Say this:
 * Returns amount of time (data) that is currently buffered in the track.
 * Note that NotifyTrackChanges() fires on a SourceMediaStream when the MSG moves data
 * from mData to the track buffer.
Attachment #790069 - Flags: review?(roc) → review+
This still doesn't have a working method for verifying if a DOMMediaStream is a gUM stream (or DOMLocalMediaStream).  Works, but I haven't verified it blocks delay yet (though it should).  Note again: not finished
Cleaned up and tested for delay, now just calls AddDirectListener on all DOMMediaStreams, which does nothing except if it's a gUM-sourced stream.
Attachment #794412 - Attachment is obsolete: true
Attachment #794524 - Flags: review?(roc)
Comment on attachment 794524 [details] [diff] [review]
alternate implementation that special-cases DOMLocalMediaStream->PeerConnection

Review of attachment 794524 [details] [diff] [review]:
-----------------------------------------------------------------

Basically looks good.

::: content/media/MediaStreamGraph.h
@@ +117,5 @@
>     * some reason, then data before aDesiredTime may not be played immediately.
>     */
>    virtual void NotifyPull(MediaStreamGraph* aGraph, StreamTime aDesiredTime) {}
>  
> +  virtual void NotifyRealtimeData(MediaStreamGraph* graph, TrackID tid,

Needs a comment explaining how this will be called when the listener is set on a SourceMediaStream being fed by getUserMedia, and also explaining that 'media' is non-resampled raw input.

We shouldn't be reusing MediaStreamListener here. Let's put this in a new class MediaStreamDirectListener.

@@ +121,5 @@
> +  virtual void NotifyRealtimeData(MediaStreamGraph* graph, TrackID tid,
> +                                  TrackRate rate,
> +                                  TrackTicks offset,
> +                                  uint32_t events,
> +                                  const MediaSegment& media) {}

a* prefixes for parameters

@@ +606,5 @@
>    void SetPullEnabled(bool aEnabled);
>    /**
> +   * Add a direct listener who will get direct NotifyRealtimeData() calls
> +   * bypassing MSG and any TrackUnions.  The data does not have to be
> +   * resampled.

Remove comment. This should be self-evident once we have a MediaStreamDirectListener class with good documentation.
Attachment #790635 - Attachment is obsolete: true
Attachment #790635 - Flags: review?(roc)
Attachment #794524 - Attachment is obsolete: true
Attachment #794524 - Flags: review?(roc)
Comment on attachment 794642 [details] [diff] [review]
alternate implementation that special-cases DOMLocalMediaStream->PeerConnection

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #42)

> ::: content/media/MediaStreamGraph.h
> @@ +117,5 @@
> >     * some reason, then data before aDesiredTime may not be played immediately.
> >     */
> >    virtual void NotifyPull(MediaStreamGraph* aGraph, StreamTime aDesiredTime) {}
> >  
> > +  virtual void NotifyRealtimeData(MediaStreamGraph* graph, TrackID tid,
> 
> Needs a comment explaining how this will be called when the listener is set
> on a SourceMediaStream being fed by getUserMedia, and also explaining that
> 'media' is non-resampled raw input.

done (rather that it will be non-resampled if the AppendToTrack caller provides it)

> We shouldn't be reusing MediaStreamListener here. Let's put this in a new
> class MediaStreamDirectListener.

Since anything listening via NotifyRealtimeData() must also be prepared to listen via NotifyQueuedTrackChanges() (if AddDirectListener fails), I'm making MediaStreamDirectListener a superclass of MediaStreamListener (to avoid needing to have a more complex listener setup that doesn't actually gain anything useful).
Attachment #794642 - Flags: review?(roc)
Blocks: 908834
Target Milestone: --- → mozilla26
Attachment #790074 - Attachment is obsolete: true
Verification instructions:

In addition to observing for general regressions, explicit steps to verify this patch:

Find a headset or sound card/mic and PC where the microphone runs at a slightly higher rate than the PC.  This can be verified (at least on Windows) by using
http://mozilla.github.com/webrtc-landing/pc_test.html

First, using a build from before this patch to find/verify you have hardware that shows it:
1) start a call to yourself.
2) unmute the large 'remote' image of yourself.
3) There should be a small delay between speaking and hearing yourself (generally <200ms, perhaps much less with this patch applied)
4) Wait 30-60 minutes with the 'call' running.  See if the delay has increased and estimate the amount.  If delay has increased, wait a similar amount of time again and see if the increase over the initial delay has roughly doubled.   
5) you can calculate the rough clock mismatch: (amount-of-drift-in-seconds/number-of-seconds-in-call) * 100 = percentage mismatch.  My headset has a 0.0055% mismatch to my Windows laptop.  These values can move some with temperature, etc.

Once you have a known combination that drifts over time, switch to a build with bug 884365 applied, and retry the test.  To Verify:
1) see if after a similar amount of time the drift isn't present
2) verify this by un-muting the "local" (small) image of yourself, and see if it has drifted (note: with both unmuted you'll hear two echoes of yourself, probably one fast and one delayed).  Currently "local" destinations will still see the drift until we land a followup bug.
Keywords: verifyme
Blocks: 909187
QA Contact: jsmith
Just an FYI here, that our Fx 24 Beta 7 will be going to build tomorrow morning PT and may be the last opportunity to get this in.If this is not landed by then it might be a wontfix for Fx24, given we do not want to introduce a change like this in so late in the cycle.

At a glance on the patches this looks like a substantial change and I am pretty sure we want to get good testing here before the uplift.Let me know if I got anything wrong here.
I'm marking this as wontfix for 24; I'll probably ask for 25 uplift in the next day or so - we know it wallpapers over another problem with graphics (on some windows systems) there, but that bug is not believed to be in 24, and we've seen no verified regressions in 24 that would require this fix (though it helps call quality a lot).
Comment on attachment 794642 [details] [diff] [review]
alternate implementation that special-cases DOMLocalMediaStream->PeerConnection

[Approval Request Comment]
(for this bug and the regression fix for track.enable)

Bug caused by (feature/regressing bug #): N/A
 
User impact if declined: Anything that causes delay in the MediaStreamGraph (underruns, clock mismatches between microphone and system, CPU overload of the MSG graph due to encoding audio on the thread, etc) will cause permanent delay in the call for audio  - this is a major improvement in call quality.  Also, on windows some systems are affected by rapidly increasing delay caused by running two browsers (bug 901831); this patch masks the affect of that bug for WebRTC calls (though not for local loopback of audio, which normally isn't done).

Testing completed (on m-c, etc.): On M/C for a while now; tested on a bunch of platforms.

Risk to taking this patch (and alternatives if risky): Fairly low risk - the primary risk (and also a benefit in avoiding MSG thread overload) is moving the audio AEC and codec encoding from the MSG graph to the capturing thread.  This appears to cause no problems (and in a number of the audio subsystems like Android and B2G, we or the audio system is using a separate thread for this).

String or IDL/UUID changes made by this patch: None
Attachment #794642 - Flags: approval-mozilla-aurora?
Note - in order for verifyme to be met, this really should be verified on Desktop and FxAndroid. Adding a note in the whiteboard as such.
Whiteboard: [getUserMedia] [blocking-gum-] → [getUserMedia] [blocking-gum-] [desktop+android verification needed]
SoftVision, do you hear any audio improvements in Android testing with this patch on Nightly?
Flags: needinfo?(fennec)
(In reply to Aaron Train [:aaronmt] from comment #54)
> SoftVision, do you hear any audio improvements in Android testing with this
> patch on Nightly?

Just to be precise: the audio won't sound better, but the audio delay relative to video that we sometimes hear about should be much better -- and should not build up (increase) over the duration of the call.  Please read Comment 49 (above) for detailed instructions on how to verify the fix for this bug.  Thanks.
(In reply to Aaron Train [:aaronmt] from comment #54)
> SoftVision, do you hear any audio improvements in Android testing with this
> patch on Nightly?

Waiting on this prior to uplift - given this was wontfix'd twice already though, I'm curious if this requires uplift at all.
(In reply to Alex Keybl [:akeybl] from comment #56)
> (In reply to Aaron Train [:aaronmt] from comment #54)
> > SoftVision, do you hear any audio improvements in Android testing with this
> > patch on Nightly?
> 
> Waiting on this prior to uplift - given this was wontfix'd twice already
> though, I'm curious if this requires uplift at all.

Getting this onto release train soon is extremely important. The current user experience around audio latency with webrtc calls on FxAndroid and Desktop Firefox runs into a lot of problems specifically around build up of audio, causing audio to fall behind in the call (e.g. Desktop sees audio falling behind by a few seconds, FxAndroid is 6 - 8 seconds behind). These patches aim to address this problem. If we see a significant improvement from QA testing on audio latency from these patches, then I definitely think we should get this onto Aurora. Otherwise, we risk putting our users through another Firefox release with having a bad UX for doing webrtc calls between peers.

Note that this is currently #1 pain point from QA's perspective on the WebRTC implementation.
Agreed strongly with jsmith - this can make a huge difference on Android, and also can help desktop a lot on low-powered systems or when other bugs cause underflows or temporary CPU overloads (see bug 901831 for an example that causes a 1-2 second slip per minute on my Lenovo W520, which is fixed for WebRTC calls by this patch).
Report from QA in bug 902438 (dupped to this; the cause was this bug, not bug 901527 (the bug 886886 regression)).  This helps verify this bug.

>In both cases I can see a great improvement on Latest Nightly BuildID: 20130903030201 .No video/audio >delay occured. Based on this I can say it was fixed but not sure if Bug 901527 fixed this issue or not.
I've asked our team in Romania to verify the fix on Nightly desktop. Because, in recent 24beta testing, they have machines known to be affected by this bug and are far more familiar with testing this feature than I am.
per https://bugzilla.mozilla.org/show_bug.cgi?id=902428#c19 this is fixed on nightly

leaving verifyme keyword for retesting on uplift..
Status: RESOLVED → VERIFIED
(In reply to Tracy Walker [:tracy] from comment #62)
> per https://bugzilla.mozilla.org/show_bug.cgi?id=902428#c19 this is fixed on
> nightly
> 
> leaving verifyme keyword for retesting on uplift..

See the whiteboard - this needs to be verified on FxAndroid as well. Moving back to resolved.
Status: VERIFIED → RESOLVED
Closed: 11 years ago11 years ago
Testing with Nightly 26.0a1 (2013-09-05) on Samsung Galaxy Tab(Android 4.0.4) and Nightly 26.0a1 (2013-09-05) on LG Nexus 4(Android 4.2.2) using http://mysecondwebrtc.appspot.com/. The audio does not have any delay even after a long call. However, this behaviour is not consistent, and sometimes I've still encountered delays.
(In reply to Pop Mihai from comment #64)
> Testing with Nightly 26.0a1 (2013-09-05) on Samsung Galaxy Tab(Android
> 4.0.4) and Nightly 26.0a1 (2013-09-05) on LG Nexus 4(Android 4.2.2) using
> http://mysecondwebrtc.appspot.com/. The audio does not have any delay even
> after a long call. However, this behaviour is not consistent, and sometimes
> I've still encountered delays.

Thanks. Switching to verified now that we've done testing on desktop & android.
Status: RESOLVED → VERIFIED
Flags: needinfo?(fennec)
Keywords: verifyme
One question though - in regards to the delays you saw, how long were they in terms of seconds?
As the time of call is increasing, the delay is also increasing, in some cases the delay was over 90 seconds.
(In reply to Pop Mihai from comment #67)
> As the time of call is increasing, the delay is also increasing, in some
> cases the delay was over 90 seconds.

Was this delay for audio and video or for audio only?
Flags: needinfo?(mihai.g.pop)
The delay was for audio only.
Flags: needinfo?(mihai.g.pop)
Attachment #794642 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Jason, is there any testing that needs to be repeated here now that this is uplifted to Aurora?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #71)
> Jason, is there any testing that needs to be repeated here now that this is
> uplifted to Aurora?

It's probably worth doing a sanity check on this for FxAndroid to FxAndroid calls as well on Aurora, given that they also saw this audio latency problem.
(In reply to Jason Smith [:jsmith] from comment #72)
> (In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #71)
> > Jason, is there any testing that needs to be repeated here now that this is
> > uplifted to Aurora?
> 
> It's probably worth doing a sanity check on this for FxAndroid to FxAndroid
> calls as well on Aurora, given that they also saw this audio latency problem.

Meant to say - just a general quick sanity check (FxAndroid & Desktop) on Aurora - nothing extensive.
Thanks Jason, I'll flag this for verification with a Firefox Desktop -> Firefox Android and Firefox Android -> Firefox Desktop call using Aurora 25.
Keywords: verifyme
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #74)
> Thanks Jason, I'll flag this for verification with a Firefox Desktop ->
> Firefox Android and Firefox Android -> Firefox Desktop call using Aurora 25.
Tested on FF 25b4, 27.0a1 (2013-10-03).
https://apprtc.webrtc.org [1]
http://mysecondwebrtc.appspot.com/ [2]
Android devices: Samsung Galaxy Tab(Android 4.0.4), Samsung Galaxy Note(Android 4.0.4)
Desktop: Win 7 x64
In both cases (android -> desktop and desktop -> android), on FF desktop I couldn't see (video is frozen on [1], no video at all on [2]) and hear the person from the mobile side. Instead, on FF android, the video and the sound of the desktop caller work fine.
Blocks: 958090
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: