Closed Bug 852366 Opened 11 years ago Closed 11 years ago

Avoid resampling on the main thread

Categories

(Core :: Web Audio, defect)

Platform: x86 macOS
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla23

People

(Reporter: ehsan.akhgari, Assigned: padenot)

References

Details

Attachments

(1 file, 3 obsolete files)

Attached patch WIP (obsolete) — Splinter Review
I think we should avoid doing any resampling on the main thread and just do it on the fly in the AudioBufferSourceNodeEngine.  I did a quick and dirty implementation but libspeex's resampler sucks in that it doesn't guarantee how many output samples it produces, so the output with this patch is choppy.

The other thing to consider is that we should probably just produce the entire output buffer and then resample it in one go as opposed to the current approach in the patch.
Blocks: 854336
I'm going to finish this, possibly writing a very simple linear resampler in the process.

Linear resampling is not great, but we need a resampler that can have a predictable number of output samples to go forward. If someone complains (and I expect someone will), we will switch to a sinc resampler (either tweaking speex's, writing a new one, or using some other resampler).
Assignee: nobody → paul
Please do not ever use a linear resampler. You may listen to a few files and think you can get away with it, but other files (or in the context of WebRTC) will sound absolutely horrible. I'm not exactly sure what you mean by "doesn't guarantee how many output samples it produces", but I'm sure there's something that can be done. If you give it enough input samples, it should definitely fill the output buffer (there was a bug where it wouldn't but it's fixed in opus-tools).
fwiw, Chrome uses a linear resampler in their WebAudio code.

We are using speex's resampler, and it seems to do the job. I'll let you know if we experience problems, but I agree having a nice resampler would be cool.
(In reply to Jean-Marc Valin (:jmspeex) from comment #2)
> I'm not exactly sure what you mean
> by "doesn't guarantee how many output samples it produces", but I'm sure
> there's something that can be done.

Basically, every time we put N samples in, we want to get N samples out.
(In reply to Paul Adenot (:padenot) from comment #1)
> Linear resampling is not great, but we need a resampler that can have a
> predictable number of output samples to go forward. 

Actually now that I think about it I don't know what the problem is.
(In reply to comment #5)
> (In reply to Paul Adenot (:padenot) from comment #1)
> > Linear resampling is not great, but we need a resampler that can have a
> > predictable number of output samples to go forward. 
> 
> Actually now that I think about it I don't know what the problem is.

I suspect that the problem is that if the ratio between the source and destination sampling rates is not exact and you're not keeping track of the non-integer fraction of a sample manually, just asking for N samples sometimes gives you back N-1.  Paul told me today that he has found some magic rounding tricks which seem to make the speex resampler always return N frames.

In an ideal world, the API I would have liked to have would be a stream resampler which gives back N resampled frames every time it's called (except for the last time, of course) and manages the state of the left-over fraction internally, but I'm not sure if speex does that (or if it does, using it as simply as I have in the patch attached here will not work.)
The Speex resampler is close to that ideal behaviour you're describing. It'll buffer the "fractional samples", but it will not buffer multiple samples internally to avoid having to dynamically allocate memory. When you give it N input samples and ask for M output samples, it will convert as much as it can and then it may tell you that it either did not use all input samples or did not create all output samples. In this case, it seems like you simply want to give it more than N samples as input to make sure it can always produce M output samples.
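For reference, this is roughly what that in/out bookkeeping looks like against the Speex API (a minimal sketch for a single channel, with made-up names; not the patch code):

#include <stdint.h>
#include "speex/speex_resampler.h"

// Sketch of the behaviour described above: in/out lengths are in-out
// parameters that report how much was actually consumed and produced.
void ResampleOneChannel(SpeexResamplerState* aResampler,
                        const float* aInput, uint32_t aInputFrames,
                        float* aOutput, uint32_t aOutputFrames)
{
  spx_uint32_t inLen = aInputFrames;   // frames available in aInput
  spx_uint32_t outLen = aOutputFrames; // frames of space in aOutput
  speex_resampler_process_float(aResampler, 0, aInput, &inLen, aOutput, &outLen);
  // On return, inLen is the number of input frames actually consumed and
  // outLen the number of output frames actually produced; either one can be
  // smaller than what was requested.
}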
(In reply to comment #7)
> The Speex resampler is close to that ideal behaviour you're describing. It'll
> buffer the "fractional samples", but it will not buffer multiple samples
> internally to avoid having to dynamically allocate memory. When you give it N
> input samples and ask for M output samples, it will convert as much as it can
> and then it may tell you that it either did not use all input samples or did
> not create all output samples. In this case, it seems like you simply want to
> give it more than N samples as input to make sure it can always produce M
> output samples.

Hmm, should that be done by first calling it with N and M and if it returns M-1, call it with N+1 one more time?  Or is there a better way to predict how many samples will be returned without actually resampling the buffer twice?
Right now, I just report that it consumed less data than asked for, and call it again with the remaining data we want to resample, until we have the amount of data we need for this |produceAudioBlock| call.

That is to say, sometimes we are short by 1 sample on the output and have to call the resampler again (because we resample, say, from 44100Hz to 88200Hz), giving it one frame of input (ceil(number of output frames needed * input rate / output rate)) to get one frame of output. I can't hear audio glitches, for the reason Jean-Marc explains in comment 7, I believe.
What Paul describes works. One simpler way is to always call it with a large enough N that it's guaranteed to return M output samples. Then if the resampler says it read N2 samples, you save the last N-N2 samples for the next frame you need to resample. The number of extra samples is determined by the maximum ratio you need to handle. For example, if you need to resample 88.2 kHz to 48 kHz, then you need at least two extra samples on the input.
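A rough sketch of that carry-over strategy, for a single channel (the helper and all names here are hypothetical, not the patch code):

#include <stdint.h>
#include <vector>
#include "speex/speex_resampler.h"

// Sketch of "feed it more than enough, keep what it didn't consume".
// aLeftover persists across calls.
void ResampleWithCarryOver(SpeexResamplerState* aResampler,
                           std::vector<float>& aLeftover,
                           const float* aInput, uint32_t aInputFrames,
                           float* aOutput, uint32_t aOutputFrames)
{
  // Prepend the frames the resampler did not consume last time.
  std::vector<float> in(aLeftover);
  in.insert(in.end(), aInput, aInput + aInputFrames);

  spx_uint32_t inLen = static_cast<spx_uint32_t>(in.size());
  spx_uint32_t outLen = aOutputFrames;
  speex_resampler_process_float(aResampler, 0, in.data(), &inLen, aOutput, &outLen);

  // Keep whatever was not consumed for the next block. If the caller always
  // passes in ceil(aOutputFrames * inRate / outRate) plus a couple of extra
  // frames (two is enough for 88.2 kHz -> 48 kHz), outLen should always come
  // back equal to aOutputFrames.
  aLeftover.assign(in.begin() + inLen, in.end());
}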
So, I'm having this working just fine (with the nice speex resampler), plus the playbackRate property of the AudioBufferSourceNode, plus the doppler effect for the panner node, but somehow it's all in one patch. I'm going to split it in three patches tomorrow.
This basically is ehsan's patch, with the following changes:
- The function |CopyFromInputBufferWithResampling| has been modified a lot to
prepare for the playbackRate member. I've added comments, which should be enough
to understand what is going on. (Note that everywhere we are doing computations
with the sample rate, there will be a multiplication by the playback rate, which
explains the bizarre code style.)
- I added a |mPosition| member, which keeps track of the position from which we
should read the input buffer to resample it, so we don't have to use the stream
time, which is expressed in resampled frames and would need an adjustment.
- The parameters we send to the engine (mLoopEnd, mDuration, etc.) are now
passed before being adjusted for the resampling. This allows much clearer code
in |CopyFromInputBufferWithResampling|, and of course does not affect the rest
of the code.

With this code, the resampling is smooth, no dropped frames.
Attachment #731128 - Flags: review?(roc)
Attachment #731128 - Flags: review?(ehsan)
Comment on attachment 731128 [details] [diff] [review]
Don't resample on the main thread. r=

Review of attachment 731128 [details] [diff] [review]:
-----------------------------------------------------------------

This mostly looks good.

I am a bit concerned that doing resampling this way means that if a game has stored sounds at a non-ideal sample rate and plays them a lot, we'll do a lot of resampling work. This sounds like it could be a common case. I think we should follow up with some plan for caching resampled buffers for at least some cases.

::: content/media/webaudio/AudioBuffer.cpp
@@ -165,5 @@
>  {
> -  if (mResampledChannels && mResampledChannelsRate == aRate) {
> -    // return cached data
> -    *aLength = mResampledChannelsLength;
> -    return mResampledChannels;

We can remove mResampledChannelsLength and mResampledChannels from AudioBuffer I think

::: content/media/webaudio/AudioBufferSourceNode.cpp
@@ +156,5 @@
> +      float* outputData =
> +        static_cast<float*>(const_cast<void*>(aOutput->mChannelData[i])) +
> +        aBufferOffset;
> +
> +      speex_resampler_process_float(Resampler(aChannels), i,

Hoist this call to Resampler() outside the channel loop.

@@ +272,5 @@
>        TrackTicks t = currentPosition - mStart;
>        if (mLoop) {
> +        uint32_t loopDuration = (mLoopEnd - mLoopStart);
> +        uint32_t offsetInLoop = (mOffset + t) % loopDuration;
> +        CopyFromBuffer(aOutput, channels, &written, &currentPosition, mLoopStart + offsetInLoop, mLoopEnd);

I don't believe this change is correct when mOffset + t < mLoopStart.
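(For example, with mLoopStart = 100, mLoopEnd = 200 and mOffset + t = 30, loopDuration is 100 and offsetInLoop is 30, so the copy would start at frame 130 instead of frame 30, which is where playback should still be before the loop region is reached.)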

@@ +299,5 @@
>    bool mLoop;
>    int32_t mLoopStart;
>    int32_t mLoopEnd;
> +  int32_t mSampleRate;
> +  SpeexResamplerState* mResampler;

Move this pointer up next to mBuffer. And move mLoop down to the end for better field packing.
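That is, something along these lines (an illustrative struct only; the member types are taken from the hunk above, the struct itself is not the real class):

#include <stdint.h>
#include "speex/speex_resampler.h"

// Packing illustration: pointer-sized member first, the 32-bit members next,
// and the 1-byte bool last, so the compiler doesn't have to insert padding
// holes between fields.
struct Fields {
  SpeexResamplerState* mResampler;
  int32_t mLoopStart;
  int32_t mLoopEnd;
  int32_t mSampleRate;
  bool mLoop;
};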

@@ +312,5 @@
>    , mStartCalled(false)
>  {
>    SetProduceOwnOutput(true);
> +  mStream = aContext->Graph()->CreateAudioNodeStream(
> +      new AudioBufferSourceNodeEngine(aContext->GetRate()),

What's the point of setting the rate here? It's going to be overridden in the Start() call before we could ever use it.
Ideal indeed is sinc, but linear is fine in most cases. Resampling should be able to target floating point sample rates, something I noticed the original moz audio API doesn't allow (it converts to long int).
I wrote a linear resampler in JS a long time ago (upsampling is linear; downsampling is a custom algorithm that doesn't fail like linear does when the downsample factor is > 2), and it handles fractional resampling correctly down to the frame (it carries over the fractional part from one resampling iteration to the next): https://github.com/grantgalitz/XAudioJS/blob/master/resampler.js
Linear is not fine. If you need cheaper resampling, it's possible to use a lower quality setting with the Speex resampler -- even quality 0 is *much* better than linear resampling. As for floating point sample rates, I'm not sure I see the application for this. The main reason that float rates are not supported in the Speex resampler is that I wanted to make sure that every machine would end up with *exactly* the same number of samples. With a float ratio, you could have two different machines resampling the same file and ending up with a different number of samples because of rounding. That being said, it's possible to set rates as fractions of Hz with the current API, so I don't think it's an issue. Oh and BTW, if there are some conversions that are very common (e.g. 44.1 kHz to 48 kHz), it would be possible to write a specially optimized resampler just for that.
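For reference, the fraction-of-Hz API looks roughly like this (a sketch with example numbers only, unrelated to the patch):

#include "speex/speex_resampler.h"

// Sketch: rates can be expressed as exact fractions, so every machine ends
// up with exactly the same number of output samples. Example: an input rate
// of 44100 * 1000/1001 Hz (~44056 Hz) resampled to 48000 Hz.
SpeexResamplerState* CreateFractionalRateResampler()
{
  int err;
  SpeexResamplerState* st =
    speex_resampler_init(2, 44056, 48000, SPEEX_RESAMPLER_QUALITY_DEFAULT, &err);
  // ratio_num/ratio_den give the exact in/out ratio; the trailing arguments
  // are the nominal integer rates.
  speex_resampler_set_rate_frac(st, 44100 * 1000, 48000 * 1001, 44056, 48000);
  return st;
}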
One of the issues I have with a resampler that actively filters is that it can cause the waveform amplitude to go out of range. Built-up energy can push a wave that is already at the max/min out of bounds. I usually see a lot of resamplers actively reduce volume to mitigate this.
I agree that linear resampling is improper to use; there are better algorithms out there. A two-in => one-out scheme will fail to tap all samples in the stream if the downsample factor gets too large. That is a problem with it, even ignoring the issue of frequency energy above Nyquist. For one thing, white noise (generated at a MHz rate) will sound wrong when heavily downsampled with linear (as the frequency of white noise approaches infinity, the amplitude should get softer; this is not the case with linear).
Linear resampling is bad for down-sampling, but it's actually *terrible* for up-sampling because it adds high-frequency content where there is nothing to mask it. About the waveform amplitude issue, it's overall a minor problem (it's much better to clip once in a while than to add HF aliasing all the time), but I actually have code that does soft-clipping in such a way that the waveform amplitude issues you mention become inaudible.
Indeed it adds high frequency content; I'm not going to ignore the triangular wave pattern that results as we approach high upsampling factors.
Personally it never bothered me, but you're right, that high frequency noise does indeed get generated by linear. I originally used the downsampler code in JS for the upsample portion as well, but that resulted in square waves rather than triangular ones for high upsampling. At some point I need to drop in another resampler that produces a clean sine for the upsampling.
Jean-Marc: The reason I avoided filtering that would cause clipping is that I was maxing out the amplitude in the JS Game Boy emulator a long time ago. The audio is essentially all sharp square waves, so it would be constantly going past min/max with even minor filtering. This is probably the worst-case scenario for the clipping issue with filtering.
But in any case, the JS Game Boy emulator uses downsampling and not upsampling, and sources at 4194304 Hz; the white noise is generated at half that, which explains the custom downsampler.
Comment on attachment 731128 [details] [diff] [review]
Don't resample on the main thread. r=

Review of attachment 731128 [details] [diff] [review]:
-----------------------------------------------------------------

Minusing because of the mPosition and loop handling comments.

::: content/media/webaudio/AudioBufferSourceNode.cpp
@@ +24,5 @@
>  
>  class AudioBufferSourceNodeEngine : public AudioNodeEngine
>  {
>  public:
> +  AudioBufferSourceNodeEngine(uint32_t aInputRate) :

Nit: please make this explicit.

@@ +54,5 @@
>    };
>    virtual void SetStreamTimeParameter(uint32_t aIndex, TrackTicks aParam)
>    {
>      switch (aIndex) {
> +    case START: mStart = aParam; mPosition = aParam; break;

If mPosition is trying to track our current position, you should not change it here or in the LOOPSTART case below.  If not, then I have totally missed its purpose.

@@ +68,4 @@
>      case OFFSET: mOffset = aParam; break;
>      case DURATION: mDuration = aParam; break;
>      case LOOP: mLoop = !!aParam; break;
> +    case LOOPSTART: mLoopStart = aParam; mPosition = aParam; break;

No matter what mPosition is, it should not be modified here for sure, since LOOPSTART just encodes where we should start looping once the first round through the buffer has been played back.  The spec semantics here is sort of weird, please review it to make sure you're on the same page as I am.  :-)

@@ +86,5 @@
> +                                        IdealAudioRate(),
> +                                        SPEEX_RESAMPLER_QUALITY_DEFAULT,
> +                                        nullptr);
> +    }
> +    return mResampler;

Please add a comment saying that it's OK to ignore aChannels here since it will never change for an AudioBufferSourceNode.

@@ +132,5 @@
> +                                         uint32_t aMaxNumFrames,
> +                                         uint32_t& aFramesRead,
> +                                         uint32_t& aFramesWritten) {
> +    // Compute the sample rate we want to resample to.
> +    double finalSamplerate = mSampleRate;

Nit: finalSampleRate.

@@ +231,5 @@
> +        mPosition += numFrames;
> +      } else {
> +        uint32_t framesRead, framesWritten, neededFrames;
> +
> +        neededFrames = aBufferMax - aBufferOffset;

Hmm, why is this correct?  I think you should just use |numFrames| here.

@@ +234,5 @@
> +
> +        neededFrames = aBufferMax - aBufferOffset;
> +
> +        CopyFromInputBufferWithResampling(aOutput, aChannels, aBufferOffset, *aOffsetWithinBlock, neededFrames, framesRead, framesWritten);
> +        *aOffsetWithinBlock += numFrames;

Shouldn't you drop this?

@@ +257,5 @@
>        return;
>      }
>  
>      uint32_t written = 0;
> +    TrackTicks currentPosition = mPosition;

This doesn't seem right if the track is started at a non-zero position.  You will probably do the wrong thing before playback should start (see the |currentPosition < mStart| case below.)

I _think_ the right way to track mPosition is to set it to aStream->GetCurrentPosition() before the time when we start playback, then stop paying attention to GetCurrentPosition() and increment it by the rules you have below, dropping the START/LOOPSTART handling you have above.  Let me know if that would make sense...
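Something in this spirit, if I read the suggestion right (a very rough, self-contained sketch with stand-in types; the real types would be AudioNodeStream and TrackTicks, and this is not real Gecko code):

#include <stdint.h>

// Stand-in for the stream, only to make the sketch compile.
struct FakeStream {
  int64_t mCurrent = 0;
  int64_t GetCurrentPosition() const { return mCurrent; }
};

struct PositionTracker {
  int64_t mStart = 0;
  int64_t mPosition = 0;

  void Update(const FakeStream& aStream, int64_t aFramesWritten)
  {
    if (aStream.GetCurrentPosition() < mStart) {
      // Before playback starts, just mirror the stream time.
      mPosition = aStream.GetCurrentPosition();
    } else {
      // Once playback has started, stop consulting GetCurrentPosition() and
      // only advance by the frames actually produced in this block.
      mPosition += aFramesWritten;
    }
  }
};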

@@ +272,5 @@
>        TrackTicks t = currentPosition - mStart;
>        if (mLoop) {
> +        uint32_t loopDuration = (mLoopEnd - mLoopStart);
> +        uint32_t offsetInLoop = (mOffset + t) % loopDuration;
> +        CopyFromBuffer(aOutput, channels, &written, &currentPosition, mLoopStart + offsetInLoop, mLoopEnd);

Yeah, you need to handle that case separately.

@@ +299,5 @@
>    bool mLoop;
>    int32_t mLoopStart;
>    int32_t mLoopEnd;
> +  int32_t mSampleRate;
> +  SpeexResamplerState* mResampler;

And beware of fatal compiler warnings about out of order member initialization when doing so.  :-)
Attachment #731128 - Flags: review?(ehsan) → review-
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #13)
> I am a bit concerned that doing resampling this way means if a game has
> stored sounds at a non-ideal sample rate and plays them a lot we'll do a lot
> of resampling work. This sounds like it could be a common case. I think we
> should follow up with some plan for caching resampled buffers for at least
> some cases.

Agreed.  One complication is that it is the AudioBufferSourceNode not the AudioBuffer which should determine the caching "key", since the result of the resampling will be affected by params on the source node (such as looping and start/stop positions.)  I can't think of an easy and obvious way to cache those complicated cases.  One possibly easier approach could be adjusting the resampling quality if we detect that resampling is causing us to glitch, but we need to do some careful measurements first to see what the cost is, once this patch lands.  I can do that.
Comment on attachment 731128 [details] [diff] [review]
Don't resample on the main thread. r=

Review of attachment 731128 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/webaudio/AudioBufferSourceNode.cpp
@@ +86,5 @@
> +                                        IdealAudioRate(),
> +                                        SPEEX_RESAMPLER_QUALITY_DEFAULT,
> +                                        nullptr);
> +    }
> +    return mResampler;

Actually, that's not true, the channel count can change by setting the channelCount attribute on AudioNode.  So I guess you want to remember which channel count was used the last time and recreate the resampler if the channel count changes.
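Roughly like this, presumably (a sketch extending the quoted hunk; the mChannels member is an assumption, the rest of the names come from the patch):

// Remember the channel count and rebuild the resampler when it changes.
SpeexResamplerState* Resampler(uint32_t aChannels)
{
  if (mResampler && aChannels != mChannels) {
    speex_resampler_destroy(mResampler);
    mResampler = nullptr;
  }
  if (!mResampler) {
    mChannels = aChannels;
    mResampler = speex_resampler_init(aChannels,
                                      mSampleRate,
                                      IdealAudioRate(),
                                      SPEEX_RESAMPLER_QUALITY_DEFAULT,
                                      nullptr);
  }
  return mResampler;
}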
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #25)
> Agreed.  One complication is that it is the AudioBufferSourceNode not the
> AudioBuffer which should determine the caching "key", since the result of
> the resampling will be affected by params on the source node (such as
> looping and start/stop positions.)  I can't think of an easy and obvious way
> to cache those complicated cases.

I would just give up on those cases. I expect the by-far-most-common case is that there is no looping and no special start or stop positions.
> @@ +231,5 @@
> > +        mPosition += numFrames;
> > +      } else {
> > +        uint32_t framesRead, framesWritten, neededFrames;
> > +
> > +        neededFrames = aBufferMax - aBufferOffset;
> 
> Hmm, why is this correct?  I think you should just use |numFrames| here.

|numFrames| is always <= 128, because |WEBAUDIO_BLOCK_SIZE - *aOffsetWithinBlock| is always <= 128.

Because we might be downsampling (either to resample, or to apply the |playbackRate| or the doppler shift), we might need to pull in more than 128 frames of input to produce exactly the 128 frames needed for the output. |neededFrames|, being the number of frames left in the input buffer, allows |CopyFromInputBufferWithResampling| to determine whether we have enough input material to produce the number of frames we need. If we don't, then either we are done with this AudioBufferSourceNode, or |CopyFromInputBufferWithResampling| will be called again because we are looping.

Let me know if this makes sense, or if you have a better way of doing this.
New version with all the comments addressed.
Attachment #733767 - Flags: review?(roc)
Attachment #733767 - Flags: review?(ehsan)
Attachment #731128 - Attachment is obsolete: true
Attachment #731128 - Flags: review?(roc)
Attachment #726445 - Attachment is obsolete: true
Comment on attachment 733767 [details] [diff] [review]
Don't resample on the main thread. r=

Review of attachment 733767 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

::: content/media/webaudio/AudioBuffer.h
@@ +117,1 @@
>    uint32_t mResampledChannelsRate;

Please remove this too, as it's currently unused.

::: content/media/webaudio/AudioBufferSourceNode.cpp
@@ +24,5 @@
>  
>  class AudioBufferSourceNodeEngine : public AudioNodeEngine
>  {
>  public:
> +  explicit AudioBufferSourceNodeEngine() :

Ah sorry, making this explicit would only make sense if it accepted one argument.  Please remove the explicit keyword now.
Attachment #733767 - Flags: review?(ehsan) → review+
This new version fixes the loopstart/loopend calculation, where I forgot to
convert from seconds to ticks.

This has been tested using
http://people.mozilla.org/~eakhgari/webaudio/loop.html, and I hear similar
output when comparing chrome, nightly (without off main thread resampling), and
a build with this patch.
Attachment #734758 - Flags: review?(ehsan)
Attachment #733767 - Attachment is obsolete: true
Attachment #734758 - Flags: review?(ehsan) → review+
Backed out because of mochitest-1 crashes: https://hg.mozilla.org/integration/mozilla-inbound/rev/7a7d890a2253
https://hg.mozilla.org/mozilla-central/rev/ee683fed039d
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Depends on: 870174
Mass moving Web Audio bugs to the Web Audio component.  Filter on duckityduck.
Component: Video/Audio → Web Audio
Depends on: 937475