Closed Bug 913854 Opened 6 years ago Closed 6 years ago

AudioBufferSourceNodes scheduled in a sequence aren't seamless when buffer sample rate doesn't match context


(Core :: Web Audio, defect, P1)

26 Branch
Windows 7



Tracking Status
firefox30 --- verified


(Reporter: jujjyl, Assigned: karlt)




(5 files, 1 obsolete file)

User Agent: Opera/9.80 (Windows NT 6.1; WOW64) Presto/2.12.388 Version/12.16

Steps to reproduce:

Press 'Start Audio' on this page:

Actual results:

Web Audio API playback produces a constant snapping sound along the output tone.

Expected results:

The playback should be error-free, like in the Youtube video linked in the page.
Tested on Firefox Nightly 26.0a1 (2013-08-21).
Component: Untriaged → Web Audio
Product: Firefox → Core
Confirmed 27.0a1 (2013-10-06)
Blocks: webaudio
Ever confirmed: true
Closed: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 848954
48 kHz plays fine with large enough sample frame size, which is how far this testcase buffers in advance.

The suggestion in Bug 886381 comment 2 would improve things for other frame rates.
Note that bug 886653 has changed things a bit from what was described in that comment.
Blocks: 886381
Resolution: DUPLICATE → ---
Summary: Audio synthesis produces glitches with Web Audio. → AudioBufferSourceNodes scheduled in a sequence aren't seamless when buffer sample rate doesn't match context
Duplicate of this bug: 883844
Keeping this separate from Bug 886381 because the examples there are using requestAnimationFrame.
If you add a printf statement here <> to print underrunFrames, do you see underruns happening in this test case?
There are occasionally underruns (on Linux debug build), but very much less frequently than the snaps between buffers.
Depends on: 937057
Changes in bug 937057 will hopefully mean that buffers chopped at zero crossings should be close to seamless.

This patch won't apply without some work because this was on top of a patch for bug 937057 which has now changed.

With some tweaks to (real) time to (integer) sample number rounding, this sometimes played seamlessly for buffers chopped at non-zero points when the buffer sample rate was half the context sample rate, suggesting that with some more careful rounding we could behave similarly to chromium.  I'm not sure that is a valuable goal though.

For buffer sample rates that are not an integer fraction of the context sample rate, things are more complex and chromium doesn't produce seamless results.

A fundamental problem is that the buffer start time is rounded to the nearest sample and the duration is then also rounded to the nearest sample.  The end point is therefore rounded twice, which is inconsistent with the start time of the next buffer.  That means that buffers can overlap by one sample or have a missing sample between them.

To deal with that perfectly in the implementation, we'd need to be able to start play back at an intersample time, and the speex resampler doesn't bother to provide such an option.  Perhaps it would be easier to make things sound better if buffers overlap a little, each having extra samples at the end, and AudioBufferSourceNode.stop(when) is used to stop the buffer playback exactly when the next buffer starts.  We may also need to prefill the resampler, as in this patch, so as not to interpolate from zero when starting each new buffer.  That may work well enough without needing extra samples at the beginning for interpolation and an offset in the start() call.

However AudioBufferSourceNode.stop(when) is not precise when resampling and that is a bug we should fix.
Priority: -- → P2
Jukka, if this is important then please speak up and we can adjust priorities.

We can add a method to the Speex resampler to set samp_frac_num for a specific
channel index, so that the start time can be used to ensure that sub-sample
start times of buffer playback align with the ends of the previous buffers.
This requires passing the double precision start time through to the audio

The Speex resampler is a linear function, so adding the appropriately aligned
resampled output of subsequent buffers can give seamless results.  We need to
ensure that the resampled output of the buffers overlaps sufficiently to cover
the effects of the resampler's filter.  We would no longer call
speex_resampler_skip_zeros, which actually chops off some initial non-zeros in
the output, but instead start the resampler early using
speex_resampler_get_input_latency and setting samp_frac_num appropriately.
We also need to feed 2 * input_latency - 1 samples into the resampler at the
end to extract all the trailing effects of the filter.
This is still a problem, more so after I understood that the audio context sampling frequency was not hardcoded in both Chrome and Firefox to 48kHz, but can change between systems, which means that it's not possible to author audio data in that one known working sampling frequency, but if the user happens to get bad sampling frequency, then he'll get snapping audio.

To recap: In Emscripten streamed audio synthesis is used in the SDL and OpenAL backends. OpenAL allows playing back variable-length clips as indidivual audio files (and also allows streaming), but the whole SDL API is based on streaming fixed-length audio data, i.e. SDL soft-mixes its own audio data on the CPU. This issue therefore hits any port of C/C++ projects that use SDL as their audio backend, or projects that use OpenAL and stream audio through that. OpenAL projects playing single audio clips don't give snaps. 

The two Emscripten projects I know of in the wild that use SDL are HTML5ScummVM: and JSMESS: . Both exhibit audio issues due to this bug. The JSMESS people are probably the ones currently battling hardest through this issue, to the point that they expressed their wish that Mozilla Audio Data API were still supported, since they report it is working much better than Web Audio API.

The above comment 10 makes it sound like the snapping issue could be resolved by smarter logic in internal resampler? (i.e "just" an internal code change) I think the big worry was whether that really is the case, or if the fix needs to go all the way to the Web Audio spec to make sense.

So the P1 question here is probably that - does something need to be changed in the Web Audio spec itself to make sure that this streaming audio use case is covered?
I should be able to make this work without a spec change.
Assignee: nobody → karlt
Priority: P2 → P1
(In reply to comment #12)
> I should be able to make this work without a spec change.

That would be awesome!!
Depends on: 972678
This allows a client to align output samples consistently for independent
resampling of contiguous input buffers.

A method to explicitly set the fractional offset makes it clearer that this
doesn't affect sub-sample alignment at any precision tighter than provided by
ratio_den (or ratio_num, if considering output samples), and it suits my needs nicely.  If this is depending too much on the resampler implementation, then perhaps I could make the method take a floating point argument in [0,1], or in [0, 1/in_rate], with the resampler selecting the nearest subsample.
Attachment #8377308 - Flags: review?(jmvalin)
The resampling filter means that the buffer influences a greater number of
samples than indicated by just its length.  Including the full influence of
the linear filter means that adjacent buffers aligned appropriately will
behave as if they were one extended buffer.

The buffers are not yet aligned more carefully than track ticks, so buffers
play back seamlessly only if their sample rates and lengths are such that
their duration is an integer number of track ticks.

Knowing how far the filter extends before the start time requires
initializing the resampler before buffer processing.

The patch also includes the input latency in the first resampler input
buffer sample count estimate to reduce the number to calls required
to start the resampler.

See documentation of speex_resampler_set_skip_frac_num in attachment 8377308 [details] [diff] [review] to explain some of the math.
Attachment #8377309 - Flags: review?(paul)
The subsample alignment of resampled buffers provides seamless playback even
when buffer durations are not an integer number of track ticks.
Attachment #8377310 - Flags: review?(paul)
Restoring some previous behavior:
Even simple values of playbackRate may be -ve, and so need sanitizing.
Attachment #8377309 - Attachment is obsolete: true
Attachment #8377309 - Flags: review?(paul)
Attachment #8377311 - Flags: review?(paul)
Attachment #8377310 - Flags: review?(paul) → review+
Attachment #8377311 - Flags: review?(paul) → review+
Comment on attachment 8377308 [details] [diff] [review]
add speex_resampler_set_skip_frac_num

Review of attachment 8377308 [details] [diff] [review]:

Attachment #8377308 - Flags: review?(jmvalin) → review+
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
The initial problem is gone, but:
Press Start, Stop, then Start again - I hear a short snapping sound.
Thoughts ?
Flags: needinfo?(karlt)
(In reply to Paul Silaghi, QA [:pauly] from comment #21)
> The initial problem is gone, but:
> Press Start, Stop, then Start again - I hear a short snapping sound.
> Thoughts ?

Yes, thanks.  A snapping sound is expected on stop (because the testcase cuts the sound abruptly.  A snapping sound is also expected on subsequent starts because the phase does not begin at zero.  There is also an expected glitch one block (fraction of a second) into the start of the signal because the first block runs a little out of sync with the other blocks.
Flags: needinfo?(karlt)
Verified fixed on 30.0a1 (2014-03-03) Win 7 x64 based on comment 22.
Depends on: 1020370
Depends on: 1215096
Attached patch mochitestSplinter Review
Attachment #8674598 - Flags: review?(padenot)
Attachment #8674598 - Flags: review?(padenot) → review+
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.