Need to resample inputs to MediaStreamGraph

RESOLVED FIXED in mozilla31

Status

()

Core
WebRTC: Audio/Video
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: jesup, Assigned: padenot)

Tracking

(Blocks: 3 bugs)

Trunk
mozilla31
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [s=fx32][p=13, 1.5:p1, ft:webrtc])

Attachments

(4 attachments, 2 obsolete attachments)

(Reporter)

Description

5 years ago
We need to resample some/most inputs to the MediaStreamGraph in order to avoid drift issues with audio being sampled on clocks other than the CPU clock (or sampled on an entirely different machine).

There have been considerable discussion (see below and bug 691234 for art of it), and roc, mreavy, derf and I talked extensively about it at IETF and other points.

roc: you may want to summarize your thoughts/plan here.  My understanding was that we would try clocking off the playback driver, and resample inputs.

Handling mic inputs (which may have a different clock than the audio out!) is tricky, especially with echo cancellation, which really needs to see the actual samples that went out to the hardware. 

I *believe* the right way to handle that if the mic and speakers don't share a clock is to resample the "far-end" sample (going to the speakers) to match the input clock rate, since that's effectively what the differing hardware sample rates are doing to you and them anyways.


From bug 691234 comment 41:

 Randell Jesup [:jesup] 2012-06-21 11:03:38 PDT

Right - jitter buffers move a lot more than any reasonable hardware clock, so they need to do more than resample.  The Time Scale Modification (TSM) stuff in NetEQ is a common and quite effective trick.  (For anyone who didn't know.)

I certainly would be tempted to use the hardware capture clock for incoming mic data, and the playout clock (not guaranteed to the be same as the capture clock!) for data from the PeerConnection, and if the two are connected directly (audio.src = getUserMedia_stream) then force the resample; but I realize that's not how MediaStreams work.

An example for a videophone (DSP details omitted).  Buffer nodes rebuffer into 10ms frames.
Recording taps, VAD, control, etc all are omitted.  A lot of this is inside the webrtc code, but it helps to understand the complexity of stream management.

The wonder of Emacs artist-mode!
                                   
-------------------------------------------------------------------------
            RTP                         
             |                                     |
       +-------------+                             |    Send audio thread
       | Jitter Buff |     RTP reception thread    | 
       |   Insert    |                             |         RTP
       +------+------+                             |          |
              |                                    |  +-----------------+ 
 -------------+------------------------------------+  |  Speech Encode  |
       +------+------+     Receive audio           |  |                 | 
       | Jitter Buff |         thread              |  +--------+--------+
       |   Dequeue   |                             |  +--------+--------+
       +------+------+                             |  |   Mixer         +- Other
       +------+------+                             |  +--------+--------+ sources
       |  Decode     |                             |   +-------+--------+
       |             |                             |   |  Buffer        |
       +------+------+                             |   +-------+--------+
       +------+------+                             |           |
       |Time Scale   |                             |    +------+----------+
       |Modification |                             |    |   Conference    |
       +------+------+                             | /--+Bridge (optional)|
       +------+------+       +-------------+   /---+-   +-------+---------+
       |   Buffer    +-------+   audio     +---    |            |
       |             |       | conf. bridge|       |     +------+-------+
       +------+------+       +-------------+       |     |AGC and output|
       +------+------+                             |     | level adjust |
       |   Mixer     +-- other sources             |     |& DTMF insert |
       |             |                             |     +------+-------+
       +------+------+                             |            |
       +------+------+                             |     +------+-------+
       |AGC, DC Filt |            +---------------+|     |    Main      |
       | & volume    +------------+ fsb queue     ++-----+    AEC       |
       +------+------+            |               ||     +-------+------+
       +------+------+            +---------------+|      +------+------+
       |Playout audio|                             |      |Capture audio|
       |    driver   |                             |      |    driver   |
       +------+------+                             |      +------+------+
--------------+------------------------------------+-------------+---------
       +------+------+    Linux Kernel                           |
       |  Buffers    |                                    +------+-------+
       +-------------+                                    |  Buffers     |
       |             |                                    +--------------+
       +-------------+                                    |              |
       |             |                                    +--------------+
       +----------+--+                                    |              |
                  |                                       +-------+------+
               +--+-----------------------------------------------+-------+
               |                                                          |
               |           Hardware Audio Codec                           |
               | Note: Use HW Biquad filters to flatten response if avail |
               +--+-----------------------------------------+-------------+
                  |                                         |
                ..|.                                        |
               ..  ..                                     --+--
               ..  ..                                     --+--
              ..    ..                                   (     )
             ..........                                   -----


This let the output (mic -> RTP) thread run with microphone timestamps, and the input thread (after the TSM and associated buffer node) run with audio playout timestamps.  RTP input to TSM runs with RTP source timestamps; the TSM takes the place of a resampler.  The only complexity is at the cross input/output thread links, which would need resampling if the mic and playout clocks aren't synced (mine were), and mixing of other sources (ditto).

---

It's probably helpful to label your clock sources; merging two streams with the same clock source doesn't need a resample; merging them with different sources does.  The complication here is that MediaStreams run on internal (CPU/NTP) clock time; the only sources that do so are probably pre-recorded sources.

So, for all sources that aren't internal (sourced at a video or audio decoder element), the inputs will need associated resamplers or timestamp transcoders.  For audio, the sources will need to monitor input samples versus nominal frequency (at CPU/NTP clock rate), perhaps with a PLL, and use that to drive the settings for the resampler.  For video, I would simply use the PLL to convert input timestamps to internal CPU timestamps.  This might need some tweakery, as many RTP sinks assume video sources are at consistent cadences and timestamp deltas in order to provide smooth playback, generally some integer frame rate, or some integer number of frames each on a 30 (or 25!) fps boundary (i.e. timestamp deltas of 3000 at 90KHz timestamp rate.)

Note that many audio sources may require a resampler anyways to feed into the core, so this would just affect settings for it.  The PLL aspect of it does mean it may take a short time to converge after starting.

We probably need this to be a core part of MediaStream data inputs, with an option to disable it if we know the source is synced to the CPU.

Sorry for the long message...
(In reply to Randell Jesup [:jesup] from comment #0)
> I *believe* the right way to handle that if the mic and speakers don't share
> a clock is to resample the "far-end" sample (going to the speakers) to match
> the input clock rate, since that's effectively what the differing hardware
> sample rates are doing to you and them anyways.

I think we should do things almost exactly the other way around (and I think I've said this before): the audio output should be the master clock, and we should resample at the microphones to compensate for drift. Part of the reason for this is that while we will basically always have a single audio output device, we won't always have exactly one input device. In fact we will often have zero. In a system with multiple microphones, none of which are in use, which clock do you sync to?

In any case, from a practical point of view, the Speex resampler was designed to handle exactly this case (small, continuously changing sample rate differences), and is already in the tree, so we should probably just use that. After bug 818327 it should even work.
I tend to agree with Tim that syncing with the playback is better than syncing with capture, for exactly the same reasons. I don't think it makes any difference to the AEC whether it's the capture or the playback that's resampled. The other possibility is to go with the "global clock" (RTC/NTP-based I assume) that's used for the RTP timestamps. The advantage of this is that you never have to deal with any clock drift and/or RTP timestamp adjustments. I think the RTC is also much more accurate than what a soudcard gives you. Of course, the downside is that it means twice the resampling.
(Reporter)

Comment 3

5 years ago
My apologies - it was late when I entered that bug.  My discussions with roc involved using the output clock as the master, not the input clock - I got it backwards when writing it down.  And as mentioned, that would mean resampling the input, not the output.

I think locking to the playout clock also may help us control playout latency.  Note that the output frequency may jump if the output destination changes.
(In reply to Randell Jesup [:jesup] from comment #3)
> latency.  Note that the output frequency may jump if the output destination
> changes.

This is something Jean-Marc pointed out to me, as well, but I think it's something we'd want to know about (and since this can happen somewhere beyond our control, may be our only way of finding out).
Summary: Need to resample inputs to MediaStramGraph → Need to resample inputs to MediaStreamGraph
(Reporter)

Updated

5 years ago
Duplicate of this bug: 771834
(Reporter)

Updated

4 years ago
Blocks: 884365
(Reporter)

Updated

4 years ago
Blocks: 908834
(Reporter)

Updated

4 years ago
Blocks: 958090
When there are two different output cards connected to the MSG, I wonder whether there is anything ensuring that they are in sync.  If nothing guarantees this, then something needs to be done on every output except one, to avoid drift.
(Reporter)

Comment 7

4 years ago
Correct: nothing guarantees two outputs are synced.  In theory the "right" way to do it is to have the graph self-divide (graph nodes are driven by the output), and when there's a cross-connect, force a resample.  Most graphs will end in one output, and not cross-connect.  In some rare instances a source might get played on both outputs, in which case at the join point (the source) you'd need to resample.

It may be simpler to choose one output as "main", and resample anything sent to the other one to that (if both are in use).  It gets trickier when there's a switch from two to one outputs active, or one to another, but probably not too problematic modulo that you might need to leave a small buffer in place after updating the resample rate which you wouldn't need if it had been originally run without a resampler.  (ie. if A is direct (no resample), B is added (and resampled to match A), then A stops, B will still need a buffer even if the resampler is turned off (though maybe you can drift it slowly until the buffer underflows and you can remove the buffer...)
(Reporter)

Updated

4 years ago
Assignee: roc → paul
Blocks: 970426, 694814, 929128
(Reporter)

Updated

4 years ago
Duplicate of this bug: 970715
(Reporter)

Comment 9

4 years ago
If we're going to provide a single mixed stream from the MediaStreamGraph back to the AEC (and to support driving the MSG from the outputs), we need to resample to 48000Hz at all inputs that aren't 48000Hz, even if they're don't need it for clock domain reasons (like gUM audio does).

gUM (or anything else with a different clockbase) will need to be resampled/buffered.
(Assignee)

Comment 10

4 years ago
Created attachment 8389635 [details] [diff] [review]
Resample all inputs of the MediaStreamGraph to the ideal audio rate. r=

This resamples all SourceMediaStream to have a rate of IdealAudioRate(), so that
all the audio that flows through the graph has the same sample rate, so that we
can easily mix the tracks.

The resampling happens in `SourceMediaStream::AppendToTrack`.
Attachment #8389635 - Flags: review?(roc)
(Assignee)

Updated

4 years ago
Blocks: 982490
Blocks: 983023
Blocks: 983052
Blocks: 983062
Attachment #8389635 - Flags: review?(roc) → review+
(Assignee)

Comment 11

3 years ago
Created attachment 8397196 [details] [diff] [review]
Update AudioConduit so it can work at 44.1kHz. r=

This is necessary for the rest to work.
Attachment #8397196 - Flags: review?(rjesup)
(Reporter)

Comment 12

3 years ago
Comment on attachment 8397196 [details] [diff] [review]
Update AudioConduit so it can work at 44.1kHz. r=

Review of attachment 8397196 [details] [diff] [review]:
-----------------------------------------------------------------

r+ but this change is also included in my patchset
Attachment #8397196 - Flags: review?(rjesup) → review+
(Assignee)

Comment 13

3 years ago
Created attachment 8400708 [details] [diff] [review]
Resample all inputs of the MediaStreamGraph to the ideal audio rate. r=

Rebased and ready to land.
(Assignee)

Updated

3 years ago
Attachment #8389635 - Attachment is obsolete: true
(Assignee)

Comment 14

3 years ago
Created attachment 8400709 [details] [diff] [review]
Update AudioConduit so it can work at 44.1kHz. r=

Rebased and ready to land.
(Assignee)

Updated

3 years ago
Attachment #8397196 - Attachment is obsolete: true
(Reporter)

Comment 15

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/965c62289427
https://hg.mozilla.org/integration/mozilla-inbound/rev/b0ba65997aba
Target Milestone: --- → mozilla31
(Reporter)

Updated

3 years ago
Blocks: 991273
(Reporter)

Comment 16

3 years ago
Backed out for perma-orange on b2g emulator M10:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=cb894b5d342f

http://hg.mozilla.org/integration/mozilla-inbound/rev/c0eaa844f2f0
(Reporter)

Comment 17

3 years ago
Also https://tbpl.mozilla.org/php/getParsedLog.php?id=37161532&tree=Mozilla-Inbound may be new

13:16:47     INFO -  [2358] ###!!! ASSERTION: Buffer underran: 'bufferEnd >= mCurrentTime', file /builds/slave/m-in-lx-d-00000000000000000000/build/content/media/MediaStreamGraph.cpp, line 473
https://hg.mozilla.org/mozilla-central/rev/965c62289427
https://hg.mozilla.org/mozilla-central/rev/b0ba65997aba
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
(Reporter)

Comment 19

3 years ago
Backed out of central:
https://hg.mozilla.org/mozilla-central/rev/ac6cbaa47f34
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 991504
(Reporter)

Comment 20

3 years ago
Got some logs via testing/marionette/client/marionette/emulator.py changes:

     env = {'MOZ_CRASHREPORTER': '1',
            'MOZ_CRASHREPORTER_NO_REPORT': '1',
-           'MOZ_CRASHREPORTER_SHUTDOWN': '1'}
+           'MOZ_CRASHREPORTER_SHUTDOWN': '1',
+           'NSPR_LOG_MODULES': 'mediastreamgraph:5,signaling:5,getusermedia:5'}
 


https://pastebin.mozilla.org/4755653

13:24:39     INFO -  138 INFO TEST-PASS | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | iceConnectionState should not be undefined
13:24:39     INFO -  139 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | PeerConnectionWrapper (pcRemote): oniceconnectionstatechange fired, new state is: closed
13:24:39     INFO -  140 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | PeerConnectionWrapper (pcRemote): Closed connection.
13:24:39     INFO -  141 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | Test finished
13:24:39     INFO -  142 INFO TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | Test timed out.

Probably some type of shutdown/cleanup of the MSG or stream...
(Reporter)

Comment 21

3 years ago
See bug 991866 for code needed to run full M10 mochitest-remote locally (and no, it didn't fail for me)
(Reporter)

Comment 22

3 years ago
https://tbpl.mozilla.org/?tree=Try&rev=889689e76b16
and 
https://tbpl.mozilla.org/?tree=Try&rev=8c981d13a945
were my Try runs
(Reporter)

Comment 23

3 years ago
Hmmm, it didnt' seem to have any extra logging in the above.  :-(
(Reporter)

Comment 24

3 years ago
https://pastebin.mozilla.org/4756101 may also be relevant.  The test destroyed the peerconnection at timestamp 489.xx - which may be near the mochitest timeout time
(In reply to Randell Jesup [:jesup] from comment #23)
> Hmmm, it didnt' seem to have any extra logging in the above.  :-(

I wonder whether this might be to blame:

http://hg.mozilla.org/mozilla-central/annotate/6c924a018540/testing/mozbase/mozrunner/mozrunner/remote.py#l122
(Reporter)

Updated

3 years ago
Depends on: 992436
(Reporter)

Comment 26

3 years ago
Created attachment 8402226 [details] [diff] [review]
Reduce fake audio/video rates on b2g debug only to avoid overloading mochitest emulator VMs
(Reporter)

Updated

3 years ago
Attachment #8402226 - Flags: review?(paul)
(Reporter)

Comment 27

3 years ago
https://tbpl.mozilla.org/?tree=Try&rev=303b107d286f

The problem is the debug emulator running out of virtual cpu power to keep up with realtime inputs.

The patch reduces fake video and audio input rates to reduce the CPU load from the media streams.
(Reporter)

Comment 28

3 years ago
https://tbpl.mozilla.org/?tree=Try&rev=ccf4b3d625d8
M10 is 11 for 11 green :-)
Once this is r+'d we can reland
(Reporter)

Comment 29

3 years ago
https://tbpl.mozilla.org/?tree=Try&rev=bd2edc046e1f
(without the wakelock stuff) also green
(Assignee)

Updated

3 years ago
Attachment #8402226 - Flags: review?(paul) → review+
Comment on attachment 8402226 [details] [diff] [review]
Reduce fake audio/video rates on b2g debug only to avoid overloading mochitest emulator VMs

>+#if defined(MOZ_WIDGET_GONK) && defined(DEBUG)
>+// B2G emulator debug is very, very slow and has problems dealing with realtime audio inputs
>+  mTimer->InitWithCallback(this, (1000 / mOpts.mFPS)/10, nsITimer::TYPE_REPEATING_SLACK);
>+#else
>   mTimer->InitWithCallback(this, 1000 / mOpts.mFPS, nsITimer::TYPE_REPEATING_SLACK);
>+#endif

I would have naively guessed that reducing the interval to 1/10th would mean timer is called 10x as often, *increasing* the video rate.
(Reporter)

Comment 31

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/974c4db3003e
with nit fix for karlt - thanks.  The fact that the timer is SLACK makes this far less critical.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5349ecd9c313
https://hg.mozilla.org/integration/mozilla-inbound/rev/33072f5b4c66
(Reporter)

Updated

3 years ago
No longer depends on: 992436
(Reporter)

Comment 32

3 years ago
backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/423df46d8d57
https://hg.mozilla.org/integration/mozilla-inbound/rev/670cb6d1750a
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e094298eaef
(Reporter)

Comment 33

3 years ago
Created attachment 8404286 [details] [diff] [review]
Disable problematic test for B2G

This test a) has issues with missing preload/loops which makes it flaky, b) even with that fixed, on b2g emulator opt (both local and on tbpl) randomly fails with timeouts with this patch.  Likely this is a timing issue simply provoked by the changes in this bug.

See bug 994351
(Reporter)

Updated

3 years ago
Attachment #8404286 - Flags: review?(drno)
(Reporter)

Comment 34

3 years ago
Comment on attachment 8404286 [details] [diff] [review]
Disable problematic test for B2G

We'll add the annotation to bug 994351 as well
Comment on attachment 8404286 [details] [diff] [review]
Disable problematic test for B2G

Review of attachment 8404286 [details] [diff] [review]:
-----------------------------------------------------------------

One nit.
Otherwise looks good to me.

::: content/media/test/crashtests/crashtests.list
@@ +65,5 @@
>  load 952756.html
>  load 986901.html
>  load buffer-source-ended-1.html
>  load offline-buffer-source-ended-1.html
> +skip-if(B2G) HTTP load media-element-source-seek-1.html

Can we a comment referring to bug 994351 please?
Otherwise looks good to me.
Attachment #8404286 - Flags: review?(drno) → review+
(Reporter)

Comment 36

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c57a363fe25
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5b652aacb37
https://hg.mozilla.org/integration/mozilla-inbound/rev/d8cc47051879
(Reporter)

Comment 37

3 years ago
Forgot to re-push the "slow down inputs for B2G emulator debug" patch this time:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ebc60820032
https://hg.mozilla.org/mozilla-central/rev/5c57a363fe25
https://hg.mozilla.org/mozilla-central/rev/c5b652aacb37
https://hg.mozilla.org/mozilla-central/rev/d8cc47051879
https://hg.mozilla.org/mozilla-central/rev/7ebc60820032
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago3 years ago
Resolution: --- → FIXED
No longer blocks: 991273
Depends on: 991273
(Reporter)

Updated

3 years ago
Depends on: 996853
Depends on: 999267

Updated

3 years ago
Whiteboard: [s=fx32]
Updating this to WebRTC since that's the work driving this.
Component: Video/Audio → WebRTC: Audio/Video
Whiteboard: [s=fx32] → [s=fx32][p=2, 1.5:p1, ft:webrtc]
Whiteboard: [s=fx32][p=2, 1.5:p1, ft:webrtc] → [s=fx32][p=13, 1.5:p1, ft:webrtc]
Depends on: 994351
Depends on: 1014862
No longer blocks: 983023
No longer blocks: 983062
You need to log in before you can comment on or make changes to this bug.