Closed Bug 1427267 Opened 7 years ago Closed 5 years ago

Audio skips if <video>.playbackRate > 2

Categories

(Core :: Audio/Video: Playback, defect, P3)

64 Branch
x86_64
All
defect

Tracking

()

RESOLVED FIXED
mozilla73
Tracking Status
relnote-firefox --- 73+
firefox73 --- fixed

People

(Reporter: s.bugzilla.mozilla, Assigned: ke5trel)

References

Details

Attachments

(10 files)

Steps to reproduce:

0. (fresh profile)
1. Navigate to any page with a <video> element on it

- Preferably with human speech, since it's much more noticeable that way
- Suggested links: https://vimeo.com/64654583, https://www.youtube.com/watch?v=JWD1Fpdd4Pc, https://www.youtube.com/watch?v=OyfBQmvr2Hc

2. Open the JS Console
3. Run `document.querySelector('video').playbackRate = 3.5` (any number > 2 works, but higher speeds are much more noticeable)
4. Listen to the audio (there's nothing wrong with the video)

Expected results: Smooth audio playback with no audible stuttering, skipping, or artifacts

Observed results: Audio with skipping and stuttering, especially noticeable when there's human speech

Notes:

- Playback works just fine on WebKit and its derivatives (Opera, Chrome, Safari, Electron-based apps)
- Playback is also just fine on VLC
- Reproduced on: my 2014 MacBook Pro, a 2013 Mac Mini, and a 1st-gen MacBook, all running macOS High Sierra 10.13.2
Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
I can't repro the issue.

I suspect your network connection is not fast enough for playback to consume data without underflow. Can you try downloading the file and playing it from local storage? Thanks!
Flags: needinfo?(s.bugzilla.mozilla)
Playing a video from the local filesystem has the same effect. Networking is not the issue since video playback works perfectly fine in WebKit-based browsers. 

For example, compare the playback of https://www.youtube.com/watch?v=K0Tsa3smr1w on Firefox at 3.5x speed vs. the same on Safari: audio on Safari is much smoother than on FF, especially since FF skips audio every few seconds. Human speech is practically indecipherable on FF, but is smooth on Safari.
Can you launch firefox from the console by doing:
MOZ_LOG=MediaDecoder:4,AudioStream:3 path/to/your/firefox "https://www.youtube.com/watch?v=K0Tsa3smr1w"

and submit the logs?

Thanks!
Attached file fflog.log
Log from `MOZ_LOG=MediaDecoder:4,AudioStream:3 /Applications/Firefox.app/Contents/MacOS/firefox "https://www.youtube.com/watch?v=K0Tsa3smr1w"`
Bryce, do you have any interest in the playback rate stuff?
Flags: needinfo?(bvandyk)
Priority: -- → P3
:rillian, I've taken a look and can't repro thus far. I'm going to hold the needinfo while I wait for some new hardware to arrive and then I'll be able to test this on OSX and see if I have any more luck.
I've been unable to repro this on OSX and a couple of other systems I've tested.

Isaac, are you still seeing the issue? If so, could you please gather a log using MOZ_LOG=MediaFormatReader:5?
Flags: needinfo?(bvandyk)
Attached file newlog.log.gz
Log from `MOZ_LOG=MediaFormatReader:5 /Applications/Firefox\ Developer\ Edition.app/Contents/MacOS/firefox-bin`

I changed the playback rate about 5-10 secs into the video playing
Flags: needinfo?(s.bugzilla.mozilla)
Attached audio firefox.mp3
A loopback recording of https://www.youtube.com/watch?v=UzsnzeaOD84 played at 4x speed from firefox. I have the raw wav file if needed.
Attached audio opera.mp3
A loopback recording of https://www.youtube.com/watch?v=UzsnzeaOD84 played at 4x speed from Opera. I have the raw wav file if needed for further analysis.
If you play the recording from firefox, you can see that the words are mangled, compared to that from Opera. You can clearly tell that some audio segments are completely skipped over if you slow down the playback. The audio from Opera isn't that different, from what I can hear, to that of Chrome's. I haven't tested it on Safari.

I can upload wav files for analysis if you need them.
I can reproduce this bug on version 61.0b6, currently the latest release for Developer Edition.
This is the #1 reason I don't use Firefox for YouTube. Chrome plays back audio smoothly at 4x. Firefox sounds like it's dropping samples after 2x. This might not be problematic for music, but extremely problematic for speech.

I started digging into the Firefox code for playback adjustments. It looks like Firefox uses the SoundTouch library: https://dxr.mozilla.org/mozilla-central/source/media/libsoundtouch/src/RateTransposer.h. I'm not sure if this bug exists upstream.
I did a visual comparison of the two samples in Audacity and while they look similar there are occasionally dropped words which are detectable in playback. For example 8 seconds into the Firefox sample (27 seconds in original) is the line "So for people who don't know" and the word "for" is missing. 

The intro tune at 5 seconds has a wavy max amplitude when it should be relatively flat and have a faster beat.

I recorded my own sample on Nightly 64 Ubuntu 18.04 and was able to reproduce the issue.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Mac OS X → All
Version: 57 Branch → 64 Branch
Attached image missing-word.png
Attached image wavy-intro-tune.png
See Also: → 1383363

I believe this is due to Firefox playing ~70ms samples of audio at original speed, and skipping between them. In my experience with a similar effect in mplayer (https://mplayerhq.hu/), blending samples of about 15ms produces quality comparable to Webkit (Chrome, Opera).

To test this yourself in Firefox and mplayer:

video: https://www.youtube.com/watch?v=K0Tsa3smr1w
"DASH Audio" stream URL for same video (retrieved with youtube-dl -j K0Tsa3smr1w)

mplayer -speed 2 -af scaletempo=stride=70 <media_file_or_url> # present Firefox quality

mplayer -speed 2 -af scaletempo=stride=15:overlap=1 <media_file_or_url> # better quality
Note that when using the ] key to speed up beyond 3x, the words are still intelligible with the latter method. BTW: "overlap=1" causes the samples to be blended together across their entire duration.
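The difference between the two scaletempo settings above can be sketched as a naive overlap-add loop. This is a minimal Python sketch under stated assumptions: `ola_speedup` and its parameters are hypothetical names, `stride_ms` and `overlap` only loosely mirror the mplayer options, and there is no waveform-similarity search, so it is neither mplayer's nor Firefox's actual implementation.

```python
def ola_speedup(samples, rate, speed, stride_ms, overlap):
    """Naive overlap-add speed-up (hypothetical helper, no similarity search).

    samples   : list of floats (mono PCM)
    rate      : sample rate in Hz
    speed     : playback speed factor; values > 1 skip input between chunks
    stride_ms : length of output produced per chunk, in milliseconds
    overlap   : fraction (0..1) of each chunk cross-faded with the audio
                that followed the previous chunk
    """
    stride = max(1, int(rate * stride_ms / 1000))
    n_overlap = int(stride * overlap)
    out = []
    tail = []   # input samples that came right after the previous chunk
    pos = 0.0   # read position in the input, advanced by stride * speed
    while int(pos) + stride + n_overlap <= len(samples):
        start = int(pos)
        chunk = samples[start:start + stride]  # slicing copies the data
        # Linear cross-fade: fade out the old continuation, fade in the chunk.
        for i in range(min(n_overlap, len(tail))):
            w = (i + 1) / (n_overlap + 1)
            chunk[i] = (1.0 - w) * tail[i] + w * chunk[i]
        out.extend(chunk)
        tail = samples[start + stride:start + stride + n_overlap]
        pos += stride * speed
    return out
```

With a long stride and little overlap, each chunk is played almost verbatim and the jump between chunks is large; a short stride with full overlap spreads the same total skip over many small, blended jumps.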

(In reply to Bryce Seager van Dyk (:bryce) from comment #7)

I've been unable to repro this on OSX and a couple of other systems I've
tested.

Isaac, are you still seeing the issue? If so, could you please gather a log
using MOZ_LOG=MediaFormatReader:5?

What are the prospects of this bug getting fixed in the near future? Do you need anything to reproduce the bug? (I personally have noticed the bug for months (ever since I wanted to switch back to firefox) across devices and platforms; so reproducing it should be easy I guess?)

Paul, do you have any thoughts on the difficulty of a fix here and if there are any related audio bugs we should be aware of/link to this bug?

Flags: needinfo?(padenot)

It would be 1383363, and it's not too hard for someone that knows how to write DSP code, but needs to be prioritized.

Flags: needinfo?(padenot)

I also just want to clarify: I notice this at 1.5 and 1.75 and 2x speeds as well, it's just more subtle - definitely still there though. Since the title here is talking about the issue at higher speeds, just want to make sure that it's not ignored for speeds <2x.

I used MOZ_DUMP_AUDIO=1 to capture samples from 1x vs 3x. The waveforms at 3x are identical to 1x for about 45ms. Then it skips and copies another 45ms.
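One way to put a number on this from two dumps is to count how long the sped-up output stays sample-identical to the 1x data. A minimal Python sketch, assuming both dumps have already been decoded to lists of mono float samples and aligned by hand; `verbatim_run_ms`, its offsets, and the tolerance are hypothetical and not part of any Firefox tooling.

```python
def verbatim_run_ms(original, sped_up, rate, orig_start=0, fast_start=0, tol=1e-4):
    """Length (ms) of the run where `sped_up` matches `original` sample by sample.

    orig_start / fast_start are the hand-picked indices where the two
    waveforms line up; the run ends at the first sample that differs by
    more than `tol`.
    """
    n = 0
    while (orig_start + n < len(original)
           and fast_start + n < len(sped_up)
           and abs(original[orig_start + n] - sped_up[fast_start + n]) <= tol):
        n += 1
    return 1000.0 * n / rate
```

Repeating the measurement at each re-alignment point gives both the length of each copied chunk and the amount of input skipped between chunks.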

In this screenshot, the first track is Chrome, 2nd track is Firefox at 3x, and 3rd track is Firefox at 1x.

Top is 3x. Bottom is 1x. Estimate of the amount of audio data that is skipped before waveforms align.

I started digging into the SoundTouch library and confirmed my suspicion. https://hg.mozilla.org/mozilla-central/file/tip/media/libsoundtouch/src/TDStretch.cpp#l637

Author says it's implemented using "WSOLA-like method". Waveform-similarity-based synchronized overlap-add. Maybe based on this paper from 1993? https://www.semanticscholar.org/paper/An-overlap-add-technique-based-on-waveform-(WSOLA)-Verhelst-Roelands/d94abd77e52a56c425e4b86e6c7d692583ea406d

Perhaps it doesn't scale well past 2x.
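For reference, the waveform-similarity part of a WSOLA-like method can be sketched as follows: within a seek window, pick the candidate position whose samples correlate best with the segment being overlapped. This is a simplified Python illustration with hypothetical names (`best_overlap_offset`, `target`, `seek_len`), not the code in TDStretch.cpp.

```python
import math

def best_overlap_offset(samples, target, search_start, seek_len):
    """Return the offset in [0, seek_len) whose segment best matches `target`.

    The score is the cross-correlation between the candidate segment and
    `target`, normalized by the candidate's energy so that loud segments
    are not favored just for being loud.
    """
    n = len(target)
    best_off, best_score = 0, -math.inf
    for off in range(seek_len):
        seg = samples[search_start + off:search_start + off + n]
        if len(seg) < n:
            break  # ran past the end of the input
        dot = sum(a * b for a, b in zip(seg, target))
        energy = sum(a * a for a in seg)
        score = dot / math.sqrt(energy) if energy > 0 else 0.0
        if score > best_score:
            best_off, best_score = off, score
    return best_off
```

The search only chooses where the next chunk starts; it does not change how much input is skipped on average, which is set by the playback rate and the chunk length.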

(In reply to alex.korchemniy from comment #23)

The waveforms at 3x are identical to 1x for about 45ms. Then it skips and copies another 45ms.

This is because the default sequence size at 2x speed is 40ms, with some variation due to the 15ms seek window and 8ms overlap. I made a build with a 10ms sequence size that produces similar results to Chrome up to 4x speed; 1ms produces slightly better results, but the difference is subtle. According to the documentation, a smaller sequence is more computationally expensive, but I have been unable to detect any significant difference myself. I think 10ms is a good middle ground, below which there are diminishing returns. The seek window and overlap can be left alone, as these values help smooth out irregularities quite well.
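A rough back-of-the-envelope model of why the sequence size matters, written as a Python sketch. The model is an assumption made for illustration: each cycle plays one sequence of input verbatim and then advances the read position by sequence × speed, ignoring the overlap and seek window, so it is not SoundTouch's exact bookkeeping. `skipped_ms_per_cycle` is a hypothetical helper.

```python
def skipped_ms_per_cycle(sequence_ms, speed):
    """Input audio (ms) that is never played between consecutive sequences.

    Simplified model: `sequence_ms` of input is played verbatim, then the
    read position moves forward by `sequence_ms * speed` in total.
    Overlap and seek window are ignored.
    """
    return max(0.0, sequence_ms * (speed - 1.0))

for seq in (40, 10, 1):
    skipped = skipped_ms_per_cycle(seq, 3.0)
    print(f"{seq:>2} ms sequence at 3x: ~{skipped:.0f} ms skipped per cycle")
```

Under this model a 40ms sequence at 3x drops about 80ms of input at a time, which is on the order of a short spoken syllable, while a 10ms sequence drops about 20ms at a time.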

Example video used in comparison: https://www.youtube.com/watch?v=fDek6cYijxI

Depends on: 1588233
Assignee: nobody → ke5trel
Status: NEW → ASSIGNED

Thanks. The patch is a significant improvement!

At 4x with a 10ms window, it's easy to comprehend speech. There is some slightly noticeable choppiness in some videos. I'm not an audio expert, but I tried to do some analysis using Audacity + Gimp. I plotted both tracks as mel spectrograms, took a screenshot, aligned the two graphs in Gimp, and set the layer mode to difference. There are some noticeable periodic artifacts, perhaps related to the "sliding" part of the algorithm.

I'll try 1ms later today.

With the SoundTouch algorithm there isn't much improvement with a window size below 10ms. I found a quicker way to test various parameters using mpv, which uses the same or a similar algorithm. When I get some time I'll generate random samples with various parameters and do a blind perceptual test.

mpv --af=scaletempo=stride=8:overlap=1:search=10 --speed=4 test.mp3

  • Overlap in this case is a percentage rather than ms.
Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f21621632fac
Improve libsoundtouch high playback rate speech clarity with pitch preservation by using a smaller time stretcher sequence size r=padenot
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73

Not sure whether this is considered inappropriate here, but I just wanted to say thank you for tackling this issue.

The latest Firefox Nightly sounds nearly identical to Chrome for me now, so this is a huge improvement! I'm not sure whether this would be worth the effort, but I can see use cases (especially for power users) for configuring SoundTouch without recompiling Firefox.

So thanks again for making this happen <3. This has been a showstopper for me in switching from Chrome to Firefox for the past few years, and now I can finally make the switch. I'm really grateful for this.

FWICT there seem to be many people on the internet who did not or could not switch to Firefox because of this bug (OTOH this might be selective perception). Anyway, IMHO it might be a good idea to communicate this change broadly and not bury it somewhere deep in the changelog. This is a major feature for some people and might be the biggest improvement in Firefox 73!

Release Note Request (optional, but appreciated)
[Why is this notable]: Vastly improved audio quality when changing the audio playback rate during media playback. This was requested for years, and brings Firefox closer perceptually to other browsers
[Affects Firefox for Android]: yes
[Suggested wording]: Improved audio quality when playing back audio at a faster or slower speed
[Links (documentation, blog post, etc)]:

relnote-firefox: --- → ?

Added to the Beta73 relnotes, thanks for flagging this.

QA Whiteboard: [qa-73b-p2]

Firefox is an essential project. I am so god damn happy that this has been fixed. OMG. Speed control is a basic feature built into YouTube and Audible; any major media distributor has built-in speed controls and expects a functional browser implementation. This is especially important in the era of the Chromium-based browser monopoly. (I don't trust Chromium even though it is open source.) I listen to audio at around 3.5-5.5x speed with closed-back, studio-tuned headphones, so any improvement can be noticeable. I can't get past 2.0-3.0x with laptop speakers on Chrome or Firefox 73, or with pre-73 Firefox and headphones. But now, YEET Chrome, Google. I have already moved my bookmarks over and deleted most of my Chrome data; this comment is my last action on Chrome before uninstalling it.

As an aside, I am wondering whether I could get some code usable in the console to adjust the window size and related parameters more precisely, so I can experiment at very high speeds. (If you have spare time, I can figure it out after some finagling.) I am very satisfied with Firefox 73 beta audio at high speeds, but I want to experiment. I consume a lot of media, always involving speech. In any case, thank you, thank you, thank you guys ad infinitum for your very important work here and in general.

Clearly something that we can do yes. I'm going to think about it today. Thanks for your kind message!

Flags: needinfo?(padenot)
See Also: → 1618603
See Also: → 1624026
See Also: → 1344756
Regressions: 1677881
Flags: needinfo?(padenot)