884365 - Audio realtime input clock mismatch (drift) blows up delay in MediaStreamGraph


Description


12 years ago

We have a huge problem with audio latency building up over time in MSG (or, conversely, with MSG having to insert silence on underflows) when the microphone input and MSG don't share the same clock. This is common, as different hardware devices often have different clocks.

MSG is currently clocked off the system clock, delivering 10ms of audio data for each 10ms of system clock time. This will be changed to clock off the output clock of cubeb. Neither is guaranteed to match any given microphone, even if both are part of the same headset. For added fun, hardware clock rates themselves can (slowly) drift.

Resampling the input to the system clock rate will work, but can cause problems with the AEC, especially if the resampling rate is not *extremely* consistent.

* A simple feedback loop modifying the resampling rate (above and below the "long-term" estimate of the clock-rate skew) will control delay buildup, but will cause the resampling rate to vary significantly from moment to moment, which the AEC very likely can't handle.
* A highly damped PLL on the sampling rate, with input from the delay measurement, may (eventually) find a more stable average sampling rate, but will likely react too slowly to control delay in any reasonable time period.
* We can't drop/duplicate data to control delay, since that would require an AEC reset and re-convergence each time (it's as if you suddenly moved the mic forward or back). We could do so *after* the AEC, but the AEC currently lives in PeerConnection, which is after MSG, and MSG will buffer and delay.
* Moving the AEC to getUserMedia (before MSG) would allow us to do that (resample before, drop/add data after), and is planned anyway. However, that's noticeably more work to accomplish. Combined with a (highly) damped PLL for resampling, per above, to minimize the number of drops/dups long-term, this may work.
* Saving measurements of previous mic-specific clock skew factors may help improve call-start performance.
* Avoiding reclocking in MSG (when directly connected to PeerConnection, for example) would push the problem off to the AEC and to the far end's jitter buffer and time-base corrector, which is the "classic" VoIP method of dealing with clock skew. However, MSG is designed to always reclock, and even a special case for direct connection to PeerConnection would not solve the problem for local connections to other objects. It might also cause problems for a/v sync.
** To resolve this, a modified MSG that allows an output to get "raw" inputs (no reclocking or queuing) without affecting other users would help, but you could still get "infinite" buffer buildup in MSG (to non-realtime outputs). This could be controlled by adding a resampler to data that goes into the graph's buffers, while providing immediate non-resampled, non-reclocked input to destinations that want it, i.e. AppendToTrack(trackID, segment, rawdata), where the optional rawdata is provided in place of the segment data to realtime consumers only. Non-realtime outputs would get the segment data, queued and reclocked, and would have a resampler and delay-control logic. Note that unless there's a consumer that needs the buffered/reclocked data, we wouldn't bother queuing and resampling it. Realtime consumers would register by implementing NotifyUnqueuedTrackChanges(). They would get rawdata if available, or the normal segment data otherwise (for example, if attached to a non-realtime source). Realtime sources would call AddTrack(trackid, rate, start, segment, realtime, resample), where realtime and resample would both default to false. Realtime sources should always allow resampling unless you're sure they share a clock with MSG. We might still avoid delay buildup in realtime sources without resampling by dropping/adding buffers.
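To make the trade-off between the first two options in the list above concrete, here is a toy simulation in Python (not Gecko code). It assumes a 48 kHz nominal rate, 10 ms graph iterations, a microphone running 100 ppm fast, a 20 ms target buffer, and ±0.5 ms of jitter on each delay measurement. The names and gains (`simulate`, `proportional`, `damped_pll`, `kp`, `ki`) are hypothetical and chosen only for illustration; the "PLL" is modelled as a heavily damped proportional-integral loop on the resampling ratio, one common way to build a software rate PLL, not necessarily what the prototype does.

```python
import random
import statistics
from typing import Callable, List, Tuple

# Toy model, not Gecko code. All constants below are assumptions.
RATE_HZ = 48_000                             # nominal sample rate
BLOCK_MS = 10.0                              # MSG processes 10 ms per iteration
BLOCK_SAMPLES = RATE_HZ * BLOCK_MS / 1000.0  # 480 output samples per block

Controller = Callable[[float], float]


def simulate(controller: Controller, skew_ppm: float = 100.0,
             seconds: float = 60.0, jitter_ms: float = 0.5,
             seed: int = 1) -> Tuple[List[float], List[float]]:
    """Feed a mic running `skew_ppm` fast through a variable-ratio resampler
    into a buffer that the graph drains at exactly RATE_HZ.

    After each block, `controller(measured_delay_ms)` picks the ratio
    (output samples per input sample) for the next block. The measurement
    carries uniform jitter, standing in for bursty arrival of mic data.
    Returns the true buffered delay (ms) and the chosen ratio per block.
    """
    rng = random.Random(seed)
    skew = skew_ppm * 1e-6
    buffered = 0.0  # samples waiting, at the graph's rate
    ratio = 1.0
    delays: List[float] = []
    ratios: List[float] = []
    for _ in range(round(seconds * 1000.0 / BLOCK_MS)):
        produced = BLOCK_SAMPLES * (1.0 + skew) * ratio
        # The buffer can't go negative: an underflow means inserted silence.
        buffered = max(0.0, buffered + produced - BLOCK_SAMPLES)
        delay_ms = 1000.0 * buffered / RATE_HZ
        ratio = controller(delay_ms + rng.uniform(-jitter_ms, jitter_ms))
        delays.append(delay_ms)
        ratios.append(ratio)
    return delays, ratios


def no_control(_measured_ms: float) -> float:
    return 1.0  # plain reclocking: never correct for the skew


def proportional(target_ms: float = 20.0, kp: float = 0.01) -> Controller:
    """Simple feedback loop: the ratio moves in proportion to the delay
    error, so measurement jitter goes straight into the resampling rate."""
    return lambda measured_ms: 1.0 + kp * (target_ms - measured_ms)


def damped_pll(target_ms: float = 20.0, kp: float = 1e-4,
               ki: float = 1e-8) -> Controller:
    """Heavily damped PI loop on the ratio: the integrator slowly learns the
    long-term skew and the tiny gains keep the ratio smooth."""
    integral = 0.0

    def step(measured_ms: float) -> float:
        nonlocal integral
        err = target_ms - measured_ms
        integral += ki * err
        return 1.0 + kp * err + integral

    return step


def ratio_jitter(ratios: List[float]) -> float:
    """Mean absolute block-to-block change of the resampling ratio."""
    return statistics.fmean([abs(b - a) for a, b in zip(ratios, ratios[1:])])
```

With no control, the buffer grows by 100 ppm × 60 s = 6 ms every minute and never stops. The proportional loop pulls the delay to the target within a second but changes the ratio far more from block to block than the damped loop does, while after one second the damped loop has filled only a small fraction of the 20 ms target. That is the dilemma described above: smooth ratio or fast delay control, not both.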
To avoid sync problems, PeerConnection/MediaPipeline would register for NotifyUnqueuedTrackChanges(). To avoid excess drift between input and output at the AEC (currently in PeerConnection), a highly damped PLL would control the existing pre-AEC resampler (this resampler is currently disabled). Expansion or contraction of the output would not matter, as the far end's jitter buffer and time-base correction would reclock the audio there. This would later be followed by moving the AEC (with its input resampler) to getUserMedia.

------------------

So, to summarize, my proposed plan is:

1) Add support for unqueued outputs from MSG
   a) Modify the control code for the existing pre-AEC resampler to correct for drift (highly damped)
   b) Add a resampler similar to my current prototype (with delay control via resampling variation or sample dropping/duping) to queued data from realtime sources. This would only run if there were non-realtime sinks; we would need some way to know this.
2) Move the AEC to getUserMedia
   a) If possible (it may not be), combine the resamplers

#1 will solve the delay buildup problem AND will remove 15-25ms of inherent delay in MSG.

#2 will solve the problem of not cancelling echoes from *other* PeerConnections (see 3-way calls), and of not cancelling echoes from other audio outputs of the browser (see YouTube). It will also make it easier to use OS driver echo cancellers (recent Windows versions apparently have them; Android may well, though I've heard they only function at low sample rates). It *may* allow us to merge the two resamplers.
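The "unqueued output" part of step 1, i.e. the AppendToTrack(trackID, segment, rawdata) / NotifyUnqueuedTrackChanges() proposal, can be modelled in a few lines. This is a hypothetical Python sketch of the dispatch logic only, not MSG's API: `ModelTrack`, `append`, `unqueued_listeners`, `queued_sinks` and `take_queued` are invented names, and a comment marks where the drift-correcting resampler from 1b would sit.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Optional

# Hypothetical model of the proposed bypass; none of these names are Gecko APIs.
Listener = Callable[[int, Any], None]


class ModelTrack:
    """One realtime source track feeding both realtime and queued consumers."""

    def __init__(self, track_id: int) -> None:
        self.track_id = track_id
        # Realtime sinks (e.g. PeerConnection/MediaPipeline) that, in the
        # proposal, implement NotifyUnqueuedTrackChanges().
        self.unqueued_listeners: List[Listener] = []
        # Number of non-realtime sinks that need queued, reclocked data.
        self.queued_sinks = 0
        self.queue: Deque[Any] = deque()

    def append(self, segment: Any, rawdata: Optional[Any] = None) -> None:
        """Analogue of AppendToTrack(trackID, segment, rawdata)."""
        # Realtime consumers get data immediately: rawdata when the source
        # provides it, otherwise the normal segment.
        direct = rawdata if rawdata is not None else segment
        for notify in self.unqueued_listeners:
            notify(self.track_id, direct)
        # Only queue (and, per step 1b, resample with delay control) when a
        # non-realtime sink actually consumes the graph's buffered data.
        if self.queued_sinks > 0:
            self.queue.append(segment)

    def take_queued(self) -> List[Any]:
        """Graph side: drain queued segments for non-realtime sinks."""
        drained = list(self.queue)
        self.queue.clear()
        return drained
```

A PeerConnection-like listener added to `unqueued_listeners` receives the raw data as soon as it is appended, while nothing is buffered until a non-realtime sink exists, matching the note that MSG wouldn't bother queuing and resampling data nobody consumes.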

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 1


12 years ago

Sounds good.

Use resampler from AEC to control clock drift and MSG buffering/delay (Obsolete); 12 years ago; Randell Jesup [:jesup] (needinfo me); 8.75 KB, patch
Rename NotifyQueuedTrackChanges to NotifyTrackChanges; 12 years ago; Randell Jesup [:jesup] (needinfo me); 33.39 KB, patch; roc: review+
add 'bypass' of the MSG queuing for listeners who want realtime input WIP; 12 years ago; Randell Jesup [:jesup] (needinfo me); 23.93 KB, patch
Part 1: Add method to return the amount of buffered data on a SourceMediaStream; 12 years ago; Randell Jesup [:jesup] (needinfo me); 3.15 KB, patch; roc: review+
Part 2: Rename NotifyQueuedTrackChanges to NotifyTrackChanges; 12 years ago; Randell Jesup [:jesup] (needinfo me); 33.91 KB, patch; jesup: review+
Part 3: add 'bypass' of the MSG queuing for listeners who want realtime input; 12 years ago; Randell Jesup [:jesup] (needinfo me); 20.93 KB, patch
Part 4: Lock access to ExternalRecordingInsertData (non-thread-safe); 12 years ago; Randell Jesup [:jesup] (needinfo me); 3.07 KB, patch
Part 3: add 'bypass' of the MSG queuing for listeners who want realtime input; 12 years ago; Randell Jesup [:jesup] (needinfo me); 22.12 KB, patch
alternate implementation that special-cases DOMLocalMediaStream->PeerConnection (WIP); 12 years ago; Randell Jesup [:jesup] (needinfo me); 27.75 KB, patch
alternate implementation that special-cases DOMLocalMediaStream->PeerConnection; 12 years ago; Randell Jesup [:jesup] (needinfo me); 26.33 KB, patch
interdiffs for review response (MediaStreamDirectListener); 12 years ago; Randell Jesup [:jesup] (needinfo me); 14.86 KB, patch
alternate implementation that special-cases DOMLocalMediaStream->PeerConnection; 12 years ago; Randell Jesup [:jesup] (needinfo me); 25.38 KB, patch; roc: review+, akeybl: approval-mozilla-aurora+