Closed Bug 886886 Opened 11 years ago Closed 11 years ago

Replace fixed-ratio audio resampler in webrtc.org capture code with Speex resampler and eliminate pseudo-44000Hz rate

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

22 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla25
Tracking Status
firefox22 --- wontfix
firefox23 + wontfix
firefox24 + verified
firefox25 --- verified
relnote-firefox --- 23+
fennec 24+ ---

People

(Reporter: jesup, Assigned: jesup)

References

Details

(Whiteboard: [getUserMedia][android-gum+][blocking-gum-])

Attachments

(12 files, 11 obsolete files)

11.16 KB, patch
Details | Diff | Splinter Review
13.55 KB, patch
Details | Diff | Splinter Review
60.20 KB, patch
Details | Diff | Splinter Review
38.29 KB, patch
Details | Diff | Splinter Review
8.28 KB, patch
Details | Diff | Splinter Review
1.61 KB, patch
Details | Diff | Splinter Review
40.85 KB, patch
derf
: review+
Details | Diff | Splinter Review
13.23 KB, patch
derf
: review+
Details | Diff | Splinter Review
16.21 KB, patch
jmvalin
: review+
Details | Diff | Splinter Review
4.73 KB, patch
Details | Diff | Splinter Review
59.71 KB, patch
derf
: review+
jmvalin
: review+
Details | Diff | Splinter Review
35.94 KB, text/plain
Details
The webrtc.org code has a fixed-ratios resampler used for input->codec resampling (48000->16000 for example).  This resampler doesn't support 44100 hz directly due to it requiring larger interpolation/decimation ratios than they were willing to code.  Instead, the code lies and says 44100Hz is 44000Hz, which allows 11-to-2/4/8 resampling ratios.  Then it compensates (on windows only) in the input to the AEC with a floating-point resampler to correct things.

We should replace the fixed-ratio resampler with the speex resampler and remove all the 44000/44100 hz special-casing.  We might also end up replacing the arbitrary resampler on the input to the AEC.  Note that some windows laptops, perhaps macs, and android and b2g use 44100.

This does not remove the need to compensate for clock drift and MSG delay (see bug 884365), but this solves the worst problem leading to the work in that bug.

This work should be upstreamed to webrtc.org.  Speex should be possible to include in their 3rd-party list with libyuv.

Note: as there are no current users of the async APIs for this resampler, I'll stub them.
Attached patch Remove 44100->44000 kludges (obsolete) — Splinter Review
Attachment #767326 - Flags: review?(tterribe)
Attachment #767326 - Flags: review?(jmvalin)
Attachment #767327 - Flags: review?(tterribe)
Alternate set of 6 patches to backport the webrtc.org sinc resampler and associated CL's to enable it in place of the fixed resampler (which is still used when the ratios are supported; the sinc resampler is ONLY used when the ratios are not in the existing hard-coded-ratios).  In practice, this means the sinc resampler is used only when the input frequency is 44100.
Why was this marked android-gum+? Can you clarify?
I believe that Android drivers/hardware can't reliably be assumed to support 48000, but always support 44100.  B2G probably has the same issues.
tracking-fennec: --- → ?
tracking-fennec: ? → 24+
Update from chatting with Jan Skoglund and Andrew at Google:

They're moving away from the fixed-ratio resampler as well, and getting rid of the 44000 hack.  It appears that their resampler may glitch on rate changes (such as to compensate for small amounts of drift), and it doesn't have a quality knob.  They have not yet dealt with getUserMedia/MediaStream drift, and their AEC is still in PeerConnection.

jmspeex has looked at their implementation, and while it has Neon and runtime SSE3 optimizations, we believe the speex resampler is better overall (and the Neon and SSE3 changes can be made to it as well).  So we decided it made sense to move forward with it in Firefox, and propose it back to Google for webrtc.org.

This also is a far smaller patch than trying to land all the (not-yet-on-stable) pieces of the google sinc resampler, per the patch sets here.  I believe the speex resampler patches should be upliftable (and I'd like to uplift them to beta).
We'd be looking to land and uplift the first two patches, not the large backport set.
Whiteboard: [getUserMedia][android-gum+][blocking-gum-] → [getUserMedia][android-gum+][blocking-gum-][webrtc-uplift]
Comment on attachment 767326 [details] [diff] [review]
replace fixed-ratio capture resampler in webrtc with speex resample

Review of attachment 767326 [details] [diff] [review]:
-----------------------------------------------------------------

If the plan is to upstream this (I don't see an issue filed upstream, and in fact I seem them actively working on their SincResampler, so if that's the plan, nobody at GIPS knows about it), then we will need to follow upstream style conventions.

r-'ing for the stereo bug, which also tells me that this patch has not been adequately tested (there are unittests for this).

::: media/webrtc/trunk/webrtc/common_audio/resampler/include/resampler.h
@@ +36,5 @@
>      kResamplerInvalid = 0xff
>  };
> +#define RESAMPLE_MODE_MASK 0x0F
> +#define RESAMPLE_MODE_SYNC 0
> +#define RESAMPLE_STEREO    0x20

These need to be documented. (mode & RESAMPLE_STEREO) to test for a stereo mode is, for example, not at all obvious, since the non-stereo modes use 0x10 instead of 0x20 (0x20 is not being used as a bit flag in the original code). Similarly it's not at all clear from the name that RESAMPLE_MODE_MASK goes with RESAMPLE_MODE_SYNC (and just by looking at the values, synchronous modes _might_ be using 0x01 as a bit flag... it's not obvious why you want to mask off four bits). MODE is also pretty misleading, as these apply to a ResamplerType, not a ResamplerMode (you just deleted the latter, in fact).

@@ +41,2 @@
>  
> +// XXX adjust per platform ability

Google style is TODO(name).

@@ +41,3 @@
>  
> +// XXX adjust per platform ability
> +#define RESAMPLER_QUALITY 10

We don't allow specifying the quality anywhere in the API, so this shouldn't be in the public header.

@@ +81,4 @@
>  
>      // State
> +    int inFreq_;
> +    int outFreq_;

Google style is all-lowercase, underscore-delimited. There's old code in the old style, but you'll probably get flak from upstream for writing more code in that style instead of the new style.

::: media/webrtc/trunk/webrtc/common_audio/resampler/resampler.cc
@@ +22,4 @@
>  namespace webrtc
>  {
>  
> +Resampler::Resampler() : state_(NULL), inFreq_(48000), outFreq_(48000),

The sampling rates used to be initialized to 0. Why are we using 48000 now?

@@ +29,4 @@
>  }
>  
>  Resampler::Resampler(int inFreq, int outFreq, ResamplerType type)
>  {

You aren't initializing any member variables here, but Reset() uses plenty of them without writing to them first.

@@ +49,3 @@
>  
> +  } else if (inFreq != inFreq_ || outFreq != outFreq_) {
> +    if (speex_resampler_set_rate(state_, inFreq, outFreq) == RESAMPLER_ERR_SUCCESS)

Long line.

@@ +67,5 @@
> +  if (state_)
> +  {
> +    speex_resampler_destroy(state_);
> +  }
> +  state_ = speex_resampler_init(channels, inFreq, outFreq, RESAMPLER_QUALITY, NULL);

Long line.

@@ +93,5 @@
>  
> +  if (maxLen < lengthIn) {
> +    return -1;
> +  }
> +  spx_uint32_t len = lengthIn;

lengthIn is the total number of samples*channels, but the Speex resampler takes the number of samples per channel. So you're telling Speex to resample twice as many samples as there actually are when using stereo.

@@ +95,5 @@
> +    return -1;
> +  }
> +  spx_uint32_t len = lengthIn;
> +  spx_uint32_t out;
> +  if ((speex_resampler_process_interleaved_int(state_, samplesIn, &len, samplesOut, &out) !=

Long line.

@@ +108,5 @@
>  
>  // Asynchronous resampling, input
>  int Resampler::Insert(WebRtc_Word16 * samplesIn, int lengthIn)
>  {
> +#ifdef ASYNC_RESAMPLER_SUPPORTED

This code isn't going to compile at all.

Either fix it (in which case you don't need the #ifdef) or delete it (in which case you don't need the #ifdef).

Given that I can't find any code that actually uses the asynchronous API (not even in tests! though they do test passing asynchronos types to Reset()), we should confirm with upstream that we can just kill it (all of it, including the types).
Attachment #767326 - Flags: review?(tterribe) → review-
(In reply to Timothy B. Terriberry (:derf) from comment #13)
> Comment on attachment 767326 [details] [diff] [review]
> replace fixed-ratio capture resampler in webrtc with speex resample
> 
> Review of attachment 767326 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> If the plan is to upstream this (I don't see an issue filed upstream, and in
> fact I seem them actively working on their SincResampler, so if that's the
> plan, nobody at GIPS knows about it), then we will need to follow upstream
> style conventions.

We have talked to both Jan Skoglund and Andrew; they're open to taking this in place of theirs if it does a better job, and from our discussion and jmspeex's evaluation, it likely does/will.  If theirs ends up covering it better, great, we'll stick with sinc resampler on next update.

> r-'ing for the stereo bug, which also tells me that this patch has not been
> adequately tested (there are unittests for this).

I don't have their tests built right now.  I can look to do so and apply this against 3.30.  I was holding off until we actually we considering upstreaming it.

> @@ +81,4 @@
> >  
> >      // State
> > +    int inFreq_;
> > +    int outFreq_;
> 
> Google style is all-lowercase, underscore-delimited. There's old code in the
> old style, but you'll probably get flak from upstream for writing more code
> in that style instead of the new style.

Done
 
> ::: media/webrtc/trunk/webrtc/common_audio/resampler/resampler.cc
> @@ +22,4 @@
> >  namespace webrtc
> >  {
> >  
> > +Resampler::Resampler() : state_(NULL), inFreq_(48000), outFreq_(48000),
> 
> The sampling rates used to be initialized to 0. Why are we using 48000 now?

No reason other than to have them be a reasonable value; in practice they're overridden before it can be used - reset MUST be called to use this, so I'm just setting state_ to NULL (see below).

> 
> @@ +29,4 @@
> >  }
> >  
> >  Resampler::Resampler(int inFreq, int outFreq, ResamplerType type)
> >  {
> 
> You aren't initializing any member variables here, but Reset() uses plenty
> of them without writing to them first.

Actually only state_ (it sets all the rest), but that's a bug.  Also made all the other calls check for a null state_ at entry, since in theory Reset() can fail leaving state_ null.

> @@ +93,5 @@
> >  
> > +  if (maxLen < lengthIn) {
> > +    return -1;
> > +  }
> > +  spx_uint32_t len = lengthIn;
> 
> lengthIn is the total number of samples*channels, but the Speex resampler
> takes the number of samples per channel. So you're telling Speex to resample
> twice as many samples as there actually are when using stereo.

Thanks, fixed.
 

> >  // Asynchronous resampling, input
> >  int Resampler::Insert(WebRtc_Word16 * samplesIn, int lengthIn)
> >  {
> > +#ifdef ASYNC_RESAMPLER_SUPPORTED
> 
> This code isn't going to compile at all.
> 
> Either fix it (in which case you don't need the #ifdef) or delete it (in
> which case you don't need the #ifdef).

Pulled it leaving stubs.  If they agree to upstreaming, we can pull all support.
> I was holding off until we actually we considering upstreaming it.
Try #2:
I was holding off until we actually were considering upstreaming it.
Attachment #767326 - Attachment is obsolete: true
Attachment #767326 - Flags: review?(jmvalin)
Jesup, our read is this is needed for android webrtc which is going to be enabled in Fx24 , can you further explain why this would be needed on FX23 ?
bhavana - This problem breaks any system using 44100Hz microphones, like the default setting on Lenovo W520's, or people who have headsets that do 44100 but not 48000, like mreavy's.  So we definitely would like it on 23.
Removed unused and un-tested async interface; fix another stereo issue with returned number of samples.  There should be a separate change to switch the API from total-samples to samples-per-channel.  Also note that the AudioFrame output buffer in theory could be too small, though in practice it's fine and this code avoids overrunning it.
Attachment #772501 - Attachment is obsolete: true
Attachment #767327 - Attachment is obsolete: true
Attachment #767327 - Flags: review?(tterribe)
Attachment #774214 - Flags: review?(tterribe)
Attachment #774216 - Flags: review?(tterribe)
(In reply to Randell Jesup [:jesup] from comment #21)
> Created attachment 774222 [details]
> resampler unit test output and valgrind output

"definitely lost: 11,432 bytes in 15 blocks"

Is that us?
fixes leak due to not destroying the speex object properly in ~Resampler()
Attachment #774214 - Attachment is obsolete: true
Attachment #774214 - Flags: review?(tterribe)
Attached file Updated unit test and valgrind results (obsolete) —
Attachment #774222 - Attachment is obsolete: true
FYI, I could change the conditional assignments on channels_ to /channels_ and *channels_, I just automatically avoid non-static divides (especially if I can avoid them entirely in the common case)
Attachment #774252 - Flags: review?(tterribe)
Comment on attachment 774252 [details] [diff] [review]
replace fixed-ratio capture resampler in webrtc with speex resample

Review of attachment 774252 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/trunk/webrtc/common_audio/resampler/resampler.cc
@@ +37,5 @@
>  
>  Resampler::~Resampler()
>  {
> +  if (state_)
> +  { 

Whitespace.

@@ +91,5 @@
> +  if (!state_ || max_len < length_in) {
> +    return -1;
> +  }
> +  spx_uint32_t len = length_in = (channels_ == 1 ? length_in : length_in/2);
> +  spx_uint32_t out = (spx_uint32_t) (channels_ == 1 ? max_len : max_len/2);

length_in and max_len are signed, so using /2 is more expensive than you think. You should use >> 1.

Personally, I would use >> (channels_ - 1), and assert that channels_ is 1 or 2.

@@ +98,5 @@
> +      len != (spx_uint32_t) length_in)
> +  {
> +    return -1;
> +  }
> +  out_len = (int) (channels_ == 1 ? out : out*2);

I approve of avoiding divides, but out*channels_ is going to be faster than this conditional stuff.
Attachment #774252 - Flags: review?(tterribe) → review+
Comment on attachment 767327 [details] [diff] [review]
Remove 44100->44000 kludges

Review of attachment 767327 [details] [diff] [review]:
-----------------------------------------------------------------

I haven't done a detailed review of this yet, but there are some obvious problems you should fix.

::: media/webrtc/trunk/webrtc/modules/audio_device/linux/audio_device_pulse_linux.cc
@@ +1975,4 @@
>  #endif
>      }
>  
>      _samplingFreq = paSampleRate / 1000;

This, of course, means you still have a problem here.

<http://review.webrtc.org/1384004/> has a perfectly good, reviewed patch for this driver. We should just use that.

::: media/webrtc/trunk/webrtc/modules/audio_device/win/audio_device_core_win.cc
@@ -2318,5 @@
>          _playBlockSize = Wfx.nSamplesPerSec/100;
>          _playSampleRate = Wfx.nSamplesPerSec;
>          _devicePlaySampleRate = Wfx.nSamplesPerSec; // The device itself continues to run at 44.1 kHz.
>          _devicePlayBlockSize = Wfx.nSamplesPerSec/100;
> -        if (_playBlockSize == 441)

There's a perfectly good, reviewed patch for this driver at <http://review.webrtc.org/1383004/>. We should just use that.
Attachment #767327 - Attachment is obsolete: false
Comment on attachment 774216 [details] [diff] [review]
Remove 44100->44000 kludges

Review of attachment 774216 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, did the last review from an old tab. But comment 27 applies to this patch as well.
Attachment #774216 - Flags: review?(tterribe) → review-
Attachment #767327 - Attachment is obsolete: true
Attached patch Remove 44100->44000 kludges (obsolete) — Splinter Review
Replaced those parts of the patch with a direct import of the diffs from issues 1383004 (effectively identical) and 1384004 (adds removal of dead drift accumulator code).  Still unhappy with the fact that they seem to have still not fixed a bunch of these 44000 issues in this patch on trunk, especially with tests and voe_external_media_impl
Attachment #774216 - Attachment is obsolete: true
Attachment #774498 - Flags: review?(tterribe)
tested with a modified version of data_test.html
Attachment #774498 - Attachment is obsolete: true
Attachment #774498 - Flags: review?(tterribe)
Comment on attachment 774680 [details] [diff] [review]
Remove 44100->44000 kludges

Sorry, this is identical to the previous one.  Did a bzexport with the wrong patch at the top of my queue
Attachment #774680 - Flags: review?(tterribe)
This is our last week for more speculative fixes (and ones with slightly higher risk) so please get a nomination for uplift ready this week or we'll be looking at pushing this to FF24.
Comment on attachment 774680 [details] [diff] [review]
Remove 44100->44000 kludges

Review of attachment 774680 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty close.

I was expecting to see fixes for modules/audio_device/android/audio_device_jni_android.cc, as well. Any reason they're not here?

voice_engine/output_mixer_unittest.cc:RemixAndResampleFailsWithBadSampleRate() now looks broken. Once again I think we're going to need to actually run some tests.

::: media/webrtc/trunk/webrtc/modules/media_file/source/media_file_utility.cc
@@ +621,3 @@
>                  (_wavFormatObj.nBitsPerSample / 8);
>          } else if(_wavFormatObj.nSamplesPerSec == 22050) {
> +            _readSizeBytes = 220 * _wavFormatObj.nChannels * // XXX inexact!

I'm not sure what this comment is supposed to mean (here and everywhere below). Obviously 22050/100.0 != 220.0, but what's the consequence of that?

::: media/webrtc/trunk/webrtc/modules/utility/source/file_player_impl.cc
@@ +93,5 @@
>      else if(_codec.plfreq == 22000)
>      {
>          return 32000;
>      }
> +    else if(_codec.plfreq == 44100 || _codec.plfreq == 44000 ) // XXX just 44100?

I think this whole if/else chain should just map >= 32000 to 32000, >= 16000 to 16000, and then > 0 to 8000. If you're mixing in an audio file on the playout side, any rate other than one of these three will break things (e.g., the limiter in the AudioConferenceMixer will not run).

::: media/webrtc/trunk/webrtc/voice_engine/voe_external_media_impl.cc
@@ +299,5 @@
>             "ExternalPlayoutGetData() external playout is not enabled");
>          return -1;
>      }
>      if ((16000 != samplingFreqHz) && (32000 != samplingFreqHz) &&
> +        (48000 != samplingFreqHz) && (44100 != samplingFreqHz))

You need to update the corresponding documentation for this and for GetAudioFrame() in voice_engine/include/voe_external_media.h.
Attachment #774680 - Flags: review?(tterribe) → review-
(In reply to Timothy B. Terriberry (:derf) from comment #33)
> Comment on attachment 774680 [details] [diff] [review]
> Remove 44100->44000 kludges
> 
> Review of attachment 774680 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks pretty close.
> 
> I was expecting to see fixes for
> modules/audio_device/android/audio_device_jni_android.cc, as well. Any
> reason they're not here?

Not sure why they got lost.  They'll be in the update; fairly straightforward.

> voice_engine/output_mixer_unittest.cc:
> RemixAndResampleFailsWithBadSampleRate() now looks broken. Once again I
> think we're going to need to actually run some tests.

In fact in the current sinc-only trunk of webrtc.org that test has been removed, so we should remove it here (it tests for being unable to do the resample).

I'm looking into the rest of the tests there; there's a fixed assumption about the resampling kernel delay which may be violated (and they adjusted for sinc on trunk as well).
 
> ::: media/webrtc/trunk/webrtc/modules/media_file/source/media_file_utility.cc
> @@ +621,3 @@
> >                  (_wavFormatObj.nBitsPerSample / 8);
> >          } else if(_wavFormatObj.nSamplesPerSec == 22050) {
> > +            _readSizeBytes = 220 * _wavFormatObj.nChannels * // XXX inexact!
> 
> I'm not sure what this comment is supposed to mean (here and everywhere
> below). Obviously 22050/100.0 != 220.0, but what's the consequence of that?

Apparently nothing. I'm being pedantic and flagging it.  They didn't modify it in trunk.

> ::: media/webrtc/trunk/webrtc/modules/utility/source/file_player_impl.cc
> @@ +93,5 @@
> >      else if(_codec.plfreq == 22000)
> >      {
> >          return 32000;
> >      }
> > +    else if(_codec.plfreq == 44100 || _codec.plfreq == 44000 ) // XXX just 44100?
> 
> I think this whole if/else chain should just map >= 32000 to 32000, >= 16000
> to 16000, and then > 0 to 8000. If you're mixing in an audio file on the
> playout side, any rate other than one of these three will break things
> (e.g., the limiter in the AudioConferenceMixer will not run).

They didn't bother to even update it on trunk (it checks for 44000 and lets 44100 fall through and get returned).  I agree I'm not fixing all possible bugs in this code or how it's used, but I am fixing the one caused by the other patch shifting 44000 -> 44100.

> ::: media/webrtc/trunk/webrtc/voice_engine/voe_external_media_impl.cc
> @@ +299,5 @@
> >             "ExternalPlayoutGetData() external playout is not enabled");
> >          return -1;
> >      }
> >      if ((16000 != samplingFreqHz) && (32000 != samplingFreqHz) &&
> > +        (48000 != samplingFreqHz) && (44100 != samplingFreqHz))
> 
> You need to update the corresponding documentation for this and for
> GetAudioFrame() in voice_engine/include/voe_external_media.h.

Thanks.
delta to the remove_44000 patch, as a separate patch since these are files the old one didn't touch
update to the output_mixer unit tests for speex resampler.  Also, we found that the RESAMPLER_QUALITY parameter should be reduced to 1 to be roughly equivalent in delay and quality to the sinc_resampler, and to reduce delay from ~8ms-ish to 1-2ms-ish - on a par with sinc.  Final quality is still in the 55-75db SNR range.
Note that valgrind on the original webrtc.org code has the same leaks as shown here (for both tests).
Attachment #774255 - Attachment is obsolete: true
Attachment #777993 - Flags: review?(tterribe)
Attachment #777998 - Flags: review?(tterribe)
Comment on attachment 774680 [details] [diff] [review]
Remove 44100->44000 kludges

re-marking for review as updates in response to review are in the other patch
Attachment #774680 - Flags: review- → review?(tterribe)
Per above, I'd also change RESAMPLER_QUALITY in the main patch (already r+'d) to 1.
jmspeex suggests quality 3, at least for desktop.  delay in each direction is only around 1.5ms, which is ok.  this does require a small adjustment to the unittest max_delay
Comment on attachment 777998 [details] [diff] [review]
output_mixer_unittest updates for speex resampler

Review of attachment 777998 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/trunk/webrtc/voice_engine/output_mixer_unittest.cc
@@ +161,3 @@
>  }
>  
> +#if 0

Rather than say "that's not a valid assumption" and disable the tests, it seems likely that someone wrote a test for this exactly because they wanted this behavior preserved should the code change. I should have caught it in my review of the earlier patch, but that's what tests are for.

It seems particularly important for, e.g., output_mixer, which _always_ calls RemixAndResample(), even when the sample rates are the same and will never change. I don't think we should take the performance hit of running a anti-aliasing low-pass filter on the audio in this case, and I'll bet Google wouldn't be happy about that, either.
Attachment #777998 - Flags: review?(tterribe) → review-
Comment on attachment 777993 [details] [diff] [review]
additional 44000->44100 updates

Review of attachment 777993 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with nits fixed.

::: media/webrtc/trunk/webrtc/modules/audio_device/android/audio_device_jni_android.cc
@@ +134,5 @@
>              _playError(0), _recWarning(0), _recError(0), _delayPlayout(0),
>              _delayRecording(0),
>              _AGC(false),
> +            _samplingFreqIn((N_REC_SAMPLES_PER_SEC)),
> +            _samplingFreqOut((N_PLAY_SAMPLES_PER_SEC)),

Can we drop the extraneous parentheses while we're here?

@@ +2665,5 @@
>          }
>          else if (res > 0)
>          {
>              // we are not recording and have got a delay value from playback
> +            _delayPlayout = (res * 1000) / _samplingFreqOut;

A comment about why this won't overflow might be nice.

::: media/webrtc/trunk/webrtc/voice_engine/include/voe_external_media.h
@@ +95,5 @@
>      virtual int ExternalRecordingInsertData(
>          const int16_t speechData10ms[], int lengthSamples,
>          int samplingFreqHz, int current_delay_ms) = 0;
>  
> +

Extraneous whitespace change.

@@ +100,5 @@
>      // This function gets audio for an external playout sink.
>      // During transmission, this function should be called every ~10 ms
>      // to obtain a new 10 ms frame of audio. The length of the block will
> +    // be 160, 320, 440 or 480 samples (for 16000, 32000, 44100 or 48000
> +    // kHz sampling rates respectively).

Hz, not kHz.

@@ +108,5 @@
>  
>      // Pulls an audio frame from the specified |channel| for external mixing.
>      // If the |desired_sample_rate_hz| is 0, the signal will be returned with
>      // its native frequency, otherwise it will be resampled. Valid frequencies
> +    // are 16000, 22050, 32000, 44100 or 48000 kHz.

Hz, not kHz.
Attachment #777993 - Flags: review?(tterribe) → review+
Interdiffs to supply a memcpy() 'resampler' for Fixed-rate, same-rate, same-channels resamples similar to existing 'sinc resampler'.  This is now used in the output mixer.  Since the output_mixer unit tests are no longer testing the non-fixed-rate cases, I mirrored the tests in resampler_unittests (which were resampling, but not checking the results in any useful way).  Note: Fixed-rate resamples are assumed to not change rates in use, and may glitch when told to change rates.  Non-fixed-rate resamplers cand adjust for drift (for example) by changing from 48000->48000 to 48000->48001) with no glitching, but this requires the resampler to run all the time (and filter)
re-enables the cut-through memcpy resample tests
Attachment #777998 - Attachment is obsolete: true
rolledup changes for resampler and unit tests
Attachment #774252 - Attachment is obsolete: true
Comment on attachment 778532 [details] [diff] [review]
replace fixed-ratio capture resampler in webrtc with speex resample

jean-marc - mostly I'd like you to look at the fixed-rate/memcpy changes in the interdiff patch
Attachment #778532 - Flags: review?(tterribe)
Attachment #778532 - Flags: review?(jmvalin)
Comment on attachment 778532 [details] [diff] [review]
replace fixed-ratio capture resampler in webrtc with speex resample

Review of attachment 778532 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with all comments addressed.

::: media/webrtc/trunk/webrtc/common_audio/resampler/resampler.cc
@@ +23,5 @@
> +// TODO(jesup) better adjust per platform ability
> +#if defined(WEBRTC_ANDROID) || defined(WEBRTC_GONK)
> +#define RESAMPLER_QUALITY 2
> +#else
> +#define RESAMPLER_QUALITY 3

Please use SPEEX_RESAMPLER_QUALITY_VOIP instead of the numeric 3.

I'd also recommend SPEEX_RESAMPLER_QUALITY_VOIP-2 instead of the hard-coded 2 for Android/Gonk (not sure why you're using 2 here instead of 1, since the latter is I believe what jmspeex recommended for mobile, and matches the current GIPS CPU/quality tradeoffs).

@@ +108,5 @@
> +    return -1;
> +  }
> +  if (!state_)
> +  {
> +    if (!IsFixedRate() || in_freq_ != out_freq_)

The member variables you're examining here aren't initialized in the constructor. If you want this check to work, you need to initialize type_ to a non-fixed-rate type.

I'd also include a comment when you do that this will cause Push() to fail until Reset() is invoked. The old code worked the same way (because my_type_ was initialized to kResamplerInvalid, which got treated as asynchronous), but it was not obvious this was intentional.

::: media/webrtc/trunk/webrtc/common_audio/resampler/resampler_unittest.cc
@@ +41,2 @@
>  bool ValidRates(int in_rate, int out_rate) {
>    // Not the most compact notation, for clarity.

This comment no longer applies.

@@ +158,5 @@
> +    SetStereoFrame(data_out_, 0, 0, dst_sample_rate_hz);
> +    SetStereoFrame(data_reference_, kDstLeft, kDstRight, dst_sample_rate_hz);
> +  }
> +
> +  // The sinc resampler has a known delay, which we compute here. Multiplying by

We want this estimated for the Speex resampler, not the sinc resampler, right?

::: media/webrtc/trunk/webrtc/voice_engine/output_mixer_unittest.cc
@@ +151,5 @@
> +  // The speex resampler has a known delay dependent on quality, which we
> +  // approximate here. Multiplying by two gives us a crude maximum for
> +  // any resampling, as the old resampler typically (but not always) has
> +  // lower delay.
> +  static const int kInputKernelDelaySamples = 16*3;

Why 16*3? The Speex filter length is based on the values in its quality_map table, which for Q3 is 48, scaled by src_rate/dst_rate when downsampling. Phrasing 48 as "16*3" might imply that this scales linearly with quality, which it does not. The latency is half the filter length, but you're not dividing by 2 here, which means you're really using something 4 times the known delay, instead of 2 as the comment claims (because the next line scales by 2 more).

A comment pointing to quality_map wouldn't be amiss, either, since if someone does change the quality, they will likely have to manually update this unit test to get it to pass, and might not have any idea how the Speex resampler works.

@@ +153,5 @@
> +  // any resampling, as the old resampler typically (but not always) has
> +  // lower delay.
> +  static const int kInputKernelDelaySamples = 16*3;
> +  const int max_delay = static_cast<double>(dst_sample_rate_hz)
> +      / src_sample_rate_hz * kInputKernelDelaySamples * dst_channels * 2;

Pretty sure you want (resampling_factor > 1 ? 1.0 : 1/resampling_factor) * dst_channels * 2. See line 569 of media/libspeex_resampler/src/resample.c.

::: media/webrtc/trunk/webrtc/voice_engine/transmit_mixer.cc
@@ +1152,5 @@
>  }
>  
>  // TODO(andrew): use RemixAndResample for this.
> +// Note that if RemixAndResample is used, the type will need not be
> +// kResampleFixedSynchronous(Stereo) for this usage

I'm not sure what you mean here. "Will need not be" implies sometimes you might want one, sometimes the other, but there's no guidance for anyone to decide which is which.

I actually think you always want fixed. This is mapping from capture rate to an internal rate used for processing before encoding, which is equal to the capture rate unless the codec is not capable of encoding at a rate that high. This should be a no-op in the typical case (Opus).

We'll only change the codec rate if the codec changes (i.e., on renegotiation). There's no way to avoid a glitch in this case.

Because the API that feeds this only handles chunks of exactly 10 ms, we can only vary the capture rate by multiples of 100, so if we're going to do any fine drift compensation of the capture rate, we have to do that resampling before the audio gets here (in practice we'll do it in gUM before the data enters the MSG, and by the time we get here we're in the PC). That means the capture rate is never going to change unless the capture device changes. Again, there's no way to avoid a glitch when that happens.

I think you also need to use a fixed resampler type for ACMResampler, for the same reasons.
Attachment #778532 - Flags: review?(tterribe) → review+
Comment on attachment 774680 [details] [diff] [review]
Remove 44100->44000 kludges

Review of attachment 774680 [details] [diff] [review]:
-----------------------------------------------------------------

Okay, once you address the comments in the other patches, I think we're good.
Attachment #774680 - Flags: review?(tterribe) → review+
Comment on attachment 778529 [details] [diff] [review]
Interdiffs for adding support for fixed-rate same-frequency "resampling" via memcpy

Review of attachment 778529 [details] [diff] [review]:
-----------------------------------------------------------------

I can only comment on the memcpy() shortcut, but it looks good to me. We originally discussed using quality 1 for mobile, but I think it's OK to start with 2 and see if CPU is a problem before going down to 1 (only if necessary).
Attachment #778529 - Flags: review+
Comment on attachment 778532 [details] [diff] [review]
replace fixed-ratio capture resampler in webrtc with speex resample

Review of attachment 778532 [details] [diff] [review]:
-----------------------------------------------------------------

I actually think using the numeric value is better than referring to SPEEX_RESAMPLER_QUALITY_VOIP. Also, I don't have an issue with referring to the Speex resampler as "sinc resampler" (that's what it is), unless it creates confusion with the Google sinc resampler.
Attachment #778532 - Flags: review?(jmvalin) → review+
(In reply to Timothy B. Terriberry (:derf) from comment #48)
> Comment on attachment 778532 [details] [diff] [review]
> replace fixed-ratio capture resampler in webrtc with speex resample
> 
> Review of attachment 778532 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me with all comments addressed.

Thanks!

> 
> ::: media/webrtc/trunk/webrtc/common_audio/resampler/resampler.cc
> @@ +23,5 @@
> > +// TODO(jesup) better adjust per platform ability
> > +#if defined(WEBRTC_ANDROID) || defined(WEBRTC_GONK)
> > +#define RESAMPLER_QUALITY 2
> > +#else
> > +#define RESAMPLER_QUALITY 3
> 
> Please use SPEEX_RESAMPLER_QUALITY_VOIP instead of the numeric 3.

> I'd also recommend SPEEX_RESAMPLER_QUALITY_VOIP-2 instead of the hard-coded
> 2 for Android/Gonk (not sure why you're using 2 here instead of 1, since the
> latter is I believe what jmspeex recommended for mobile, and matches the
> current GIPS CPU/quality tradeoffs).

Per Jean-marc's comments, I'll keep them as they are for now, with possibility for adjustment once we get real measurements.


> @@ +108,5 @@
> > +    return -1;
> > +  }
> > +  if (!state_)
> > +  {
> > +    if (!IsFixedRate() || in_freq_ != out_freq_)
> 
> The member variables you're examining here aren't initialized in the
> constructor. If you want this check to work, you need to initialize type_ to
> a non-fixed-rate type.
> 
> I'd also include a comment when you do that this will cause Push() to fail
> until Reset() is invoked. The old code worked the same way (because my_type_
> was initialized to kResamplerInvalid, which got treated as asynchronous),
> but it was not obvious this was intentional.

Ok.

> ::: media/webrtc/trunk/webrtc/common_audio/resampler/resampler_unittest.cc
> @@ +41,2 @@
> >  bool ValidRates(int in_rate, int out_rate) {
> >    // Not the most compact notation, for clarity.
> 
> This comment no longer applies.

Removed it and the one call to it, since it does nothing.

I adjusted the comments about the max_delay calculations.

> @@ +153,5 @@
> > +  // any resampling, as the old resampler typically (but not always) has
> > +  // lower delay.
> > +  static const int kInputKernelDelaySamples = 16*3;
> > +  const int max_delay = static_cast<double>(dst_sample_rate_hz)
> > +      / src_sample_rate_hz * kInputKernelDelaySamples * dst_channels * 2;
> 
> Pretty sure you want (resampling_factor > 1 ? 1.0 : 1/resampling_factor) *
> dst_channels * 2. See line 569 of media/libspeex_resampler/src/resample.c.

I'm trying to avoid making it too algorithm-specific here, but this is probably reasonable.  The max_delay just has to be a bit bigger than any actual delay.  I used std::min() however.  Passes all tests with this.

> ::: media/webrtc/trunk/webrtc/voice_engine/transmit_mixer.cc
> @@ +1152,5 @@
> >  }
> >  
> >  // TODO(andrew): use RemixAndResample for this.
> > +// Note that if RemixAndResample is used, the type will need not be
> > +// kResampleFixedSynchronous(Stereo) for this usage
> 
> I'm not sure what you mean here. "Will need not be" implies sometimes you
> might want one, sometimes the other, but there's no guidance for anyone to
> decide which is which.
> 
> I actually think you always want fixed. This is mapping from capture rate to
> an internal rate used for processing before encoding, which is equal to the
> capture rate unless the codec is not capable of encoding at a rate that
> high. This should be a no-op in the typical case (Opus).
> 
> We'll only change the codec rate if the codec changes (i.e., on
> renegotiation). There's no way to avoid a glitch in this case.
> 
> Because the API that feeds this only handles chunks of exactly 10 ms, we can
> only vary the capture rate by multiples of 100, so if we're going to do any
> fine drift compensation of the capture rate, we have to do that resampling
> before the audio gets here (in practice we'll do it in gUM before the data
> enters the MSG, and by the time we get here we're in the PC). That means the
> capture rate is never going to change unless the capture device changes.
> Again, there's no way to avoid a glitch when that happens.

Shortly I'll be working to move the AEC into gUM, and thus the resamples after capture are the best place to adapt to drift.  We want to use a variable resample only in one place since each one requires a 10ms buffer (avg 5ms delay), and since we must do it before gUM we'll want to do it at the input conversion point (even if it's roughly x -> x (+- a few hz occasionally)).

Since that hasn't been done yet (by us or by Google), I switched them to Fixed resamples and rewrote the comment to say:
// TODO(andrew): use RemixAndResample for this.
// Note that if drift compensation is done here, a buffering stage will be
// needed and this will need to switch to non-fixed resamples.

> I think you also need to use a fixed resampler type for ACMResampler, for
> the same reasons.

The ACMResampler is used to feed the encoder, and for output to the mediastream -- but as that is pull-based, any actual adaptation occurs higher up in NetEq and adaptive jitter-buffer/TimeScaleModifier.  I agree; these can be fixed.
Comment on attachment 778532 [details] [diff] [review]
replace fixed-ratio capture resampler in webrtc with speex resample

[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A

User impact if declined: Anyone with a 44100Hz mic on Windows will have significant audio delay within a minute or two, increasing with time (0.23% drift).  Some popular headsets default to 44100 or can do no higher than that (including ones distributed by Mozilla IT at times), some laptops and other devices default to 44100 even if they can do 48000Hz, such as the (popular) Lenovo W520.  This bug makes getUserMedia() and WebRTC peerconnections/calls unusable on those devices.

Testing completed (on m-c, etc.): On m-c soon (landed Sat).  Mirrors a solution used in later versions of the upstream code, but (as can be seen on this patch) taking the set of patches from upstream to fix it requires a slew of changes.  We mirrored the fix using the already-available Speex resampler.  Tested using the webrtc.org codebase unit tests, including under valgrind.  Should be verifiable.

Risk to taking this patch (and alternatives if risky): Low.  No real alternative other than the much larger patchset from upstream (and getting them to apply to the older Aurora and Beta webrtc.org imports would be even tougher).

String or IDL/UUID changes made by this patch: none

Note: the related patch to removed the 44000/44100 kludges is part of this request
Attachment #778532 - Flags: approval-mozilla-beta?
Attachment #778532 - Flags: approval-mozilla-aurora?
Comment on attachment 774680 [details] [diff] [review]
Remove 44100->44000 kludges

Approval request for part 1 of the kludge removal
Attachment #774680 - Flags: approval-mozilla-beta?
Attachment #774680 - Flags: approval-mozilla-aurora?
Comment on attachment 777993 [details] [diff] [review]
additional 44000->44100 updates

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: 
Testing completed (on m-c, etc.): 
Risk to taking this patch (and alternatives if risky): 
String or IDL/UUID changes made by this patch:

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: 
Testing completed (on m-c, etc.): 
Risk to taking this patch (and alternatives if risky): 
String or IDL/UUID changes made by this patch:

Approval request for part 2 of the kludge removal
Attachment #777993 - Flags: approval-mozilla-beta?
Attachment #777993 - Flags: approval-mozilla-aurora?
Attachment #774680 - Flags: approval-mozilla-beta?
Attachment #774680 - Flags: approval-mozilla-beta+
Attachment #774680 - Flags: approval-mozilla-aurora?
Attachment #774680 - Flags: approval-mozilla-aurora+
Attachment #777993 - Flags: approval-mozilla-beta?
Attachment #777993 - Flags: approval-mozilla-beta+
Attachment #777993 - Flags: approval-mozilla-aurora?
Attachment #777993 - Flags: approval-mozilla-aurora+
Comment on attachment 778532 [details] [diff] [review]
replace fixed-ratio capture resampler in webrtc with speex resample

approving for Beta - let's get this on today's Beta so if there's any unexpected fallout we can pull it from Thursday's Beta.
Attachment #778532 - Flags: approval-mozilla-beta?
Attachment #778532 - Flags: approval-mozilla-beta+
Attachment #778532 - Flags: approval-mozilla-aurora?
Attachment #778532 - Flags: approval-mozilla-aurora+
Whiteboard: [getUserMedia][android-gum+][blocking-gum-][webrtc-uplift] → [getUserMedia][android-gum+][blocking-gum-]
How can QA verify this fix?
Flags: needinfo?
You need a windows machine with a mic (internal or headset) set to 44100Hz.

In a call, or in http://mozilla.github.com/webrtc-landing/gum_test.html , watch for audio that you send (or is echoed to you in the gum_test) should not get progressively more delayed.  Before this patch, it would get delayed by a factor of 0.23% (which adds up quickly -- About 130ms per minute).  After 4 minutes you'd have about 1/2 second delay.

Note that there still may be small sources of clock drift/delay-buildup, but they should be <0.01% (and likely much smaller).

Also verify that audio (getUserMedia) works correctly and has no major drift on Linux and Mac, and that other frequencies on Windows work as well (at least 48000).  On windows the mic frequencies are often adjustable in the Advanced tab of the properties for the mic.
Flags: needinfo?
Did you file a bug for the clobber you needed?
Flags: needinfo?(rjesup)
The gypi dependency issue is known; though I can't find the bug for it now.

Bug 800847 solved this for .gyp files, but not gypi.  It's possible that Ted and I only discussed it on IRC.

I'll file a bug
Flags: needinfo?(rjesup)
I've done different verifications, 1 to 1 call across two different machines on different networks.

1. tokbox.com Testing: http://tokbox.com/opentok/webrtc/demo
Results: Worked as expected.

2. freshtilledsoil.com Testing: http://freshtilledsoil.com/the-future-of-web/webrtc-video/
Results: Worked as expected.

3. codassium.com Testing: http://codassium.com/
Results: Worked as expected.

4. conversat.io Testing: http://conversat.io/  
Results: Worked as expected.

All tests were done on Windows 7x64, Windows 8x86 and Mac OS 10 using FF 23b8.

The only issue that I had found was that there were a couple of audio delays and interruptions, but less than 1sec.

Based on my verifications done I can confirm the fix is verified.
Keywords: verifyme
QA Contact: mihai.morar
Depends on: 897300
This is noted in the Firefox 23 release notes as:

CHANGED
Replace fixed-ratio audio resampler in webrtc.org capture code with Speex resampler and eliminate pseudo-44000Hz rate


Any changes or concerns can be addressed to release-mgmt@mozilla.com
Huge audio delay while testing on FF 24b1 on following URLs :

1:1 call across 2 different machines on same network:
http://freshtilledsoil.com/the-future-of-web/webrtc-video/

3-4 call across different machines from different networks
http://codassium.com/

In both cases in first 3-5 minutes everything was fine, after that a huge audio delay appeared (about 15-20 sec)
(In reply to Mihai Morar, QA (:MihaiMorar) [ PTO 08/09 - 08/26] from comment #66)
> Huge audio delay while testing on FF 24b1 on following URLs :
> 
> 1:1 call across 2 different machines on same network:
> http://freshtilledsoil.com/the-future-of-web/webrtc-video/
> 
> 3-4 call across different machines from different networks
> http://codassium.com/
> 
> In both cases in first 3-5 minutes everything was fine, after that a huge
> audio delay appeared (about 15-20 sec)

This description sounds like a different bug than the one in this bug report.  So I opened a new bug for the audio delay: Bug 902428.
Depends on: 901527
Blocks: 785584
Regression caused by this: bug 901527

Fixed via that bug on 24/25/26; this change backed out in 23.0.1.  23.0 will show the bug 901527 regression.
Blocks: 908834
What is needed to call this verified for Firefox 25 and 26?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #69)
> What is needed to call this verified for Firefox 25 and 26?

Can you use the steps from comment #61?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #69)
> What is needed to call this verified for Firefox 25 and 26?

We did different investigations in 1:1 call on Latest Nightly 26 and FF 24b9 last Friday and everything worked as expected. We used most scenarios from Comment 64 and Comment 66.

If there are any additional investigations required please let me know
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: