Linux <audio> broken, corrupted (and <video> b/c of audio) regression

RESOLVED FIXED in Firefox 15

Status

()

Core
Audio/Video
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: scientes, Assigned: kinetik)

Tracking

({regression})

Trunk
mozilla17
x86_64
Linux
regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox15+ verified, firefox16+ verified)

Details

(URL)

Attachments

(6 attachments, 1 obsolete attachment)

(Reporter)

Description

5 years ago
video goes fast/slow, sound corrupted. Images in video good.

pushlog:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=64187d60fae7&tochange=36e938e51481

likely culprit: 
Bug 756944 - Import ALSA cubeb backend. r=doublec
https://hg.mozilla.org/mozilla-central/rev/4fd1d6d84b07
(Reporter)

Updated

5 years ago
Component: Untriaged → Video/Audio
Product: Firefox → Core
(Reporter)

Updated

5 years ago
Blocks: 756944
Depends on: 382267
Keywords: regression
(Assignee)

Comment 1

5 years ago
Can you please attach the output from alsa-info.sh (found here: http://git.alsa-project.org/?p=alsa-driver.git;a=blob_plain;f=utils/alsa-info.sh) to the bug?

Can you confirm that creating and setting the boolean pref "media.use_cubeb" to false works around the problem you're seeing?
(Reporter)

Comment 2

5 years ago
yes, i set that pref to false, and it fixed the problem. If I then set the pref to true, and reset the audio context (stop, then start--pause/play only doesn't do it) It goes back to the corrupted state
(Reporter)

Comment 3

5 years ago
Created attachment 630461 [details]
alsa-info.sh output
(Assignee)

Comment 4

5 years ago
Thanks, I'll see if I can reproduce this in a VM running the same software.
status-firefox15: --- → affected
tracking-firefox15: --- → ?
(Assignee)

Comment 5

5 years ago
In a VirtualBox VM, I can't reproduce this with Ubuntu 12.04 LTS x86.  Installing the x64_64 version to test against now.
(Assignee)

Comment 6

5 years ago
No luck with a fresh or updated x86_64 install, either.
(Reporter)

Comment 7

5 years ago
You could try kvm's emulated intel ac97. Its not HDA audio, which which would be better (closer) however.
Cant confirm any problems with html5 audio/video on 
Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/16.0 Firefox/16.0a1 with either cubeb setting

My alsa-info output http://www.alsa-project.org/db/?f=6b970b27fbd72c1ede6e98aaf4e7d7b1883e088d
(Reporter)

Comment 9

5 years ago
I built this: https://github.com/kinetiknz/cubeb/
against both alsa and pulse backends. And test_tone worked in both cases.
However ever with a recompile with system compiler and libraries, Firefox audio is broken if i turn cubeb audio on.
(Assignee)

Comment 10

5 years ago
Thanks, that's useful to know.  The major difference between test_tone and Firefox is the latency requested when creating the stream.  Would you mind trying the same buffering value in Firefox?  Create the integer pref "media.cubeb_latency_ms" and try setting it to 250.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Comment 11

5 years ago
(In reply to scientes from comment #7)
> You could try kvm's emulated intel ac97. Its not HDA audio, which which
> would be better (closer) however.

I tried this (and the ich6 device, which is HDA) with KVM and a Fedora 17 host, but was unable to reproduce--same result as with VirtualBox and a Windows 7 host.  I checked the VirtualBox config and found it was using ac97; I tried it again with the virtualized HDA device but couldn't get any sound out of the VM at all.
(Reporter)

Comment 12

5 years ago
> Create the integer pref "media.cubeb_latency_ms" and try setting it to 250.

I tried setting this at 250, and between 10 and 1000, and no value made it work, and it is an integer type.
(Assignee)

Comment 13

5 years ago
Thanks for testing that.  Would you mind running PulseAudio in verbose mode, reproducing the playback problem in Firefox, and attaching the resulting PA output?  Instructions here: https://wiki.ubuntu.com/PulseAudio/Log
(Reporter)

Comment 14

5 years ago
I can't reproduce the problem after manually launching pulseaudio

even if started just as "pulseaudio" from bash

....wierd
(Assignee)

Comment 15

5 years ago
That is odd.  When you can reproduce the problem, can you confirm that Firefox is still using PulseAudio?  Open the PA volume control and check the applications tab for "ALSA plug-in [firefox]".

Comment 16

5 years ago
> When you can reproduce the problem, can you confirm that Firefox is still using PulseAudio? 
Yes.

> Open the PA volume control and check the applications tab for "ALSA plug-in [firefox]".
Yes, this is present.
(Reporter)

Comment 17

5 years ago
I just reproduced it and it was not showing up in the pavucontrol, but still made the corrupting sounds. I turned cubeb off, and then it worked, when i turned cubeb back on, it still worked after a reload of the page, disconnecting from pulseaudio and then reconnecting.
Adding qawanted to see if we can reproduce. If not, we'll untrack for release.
tracking-firefox15: ? → +
Keywords: qawanted
(Reporter)

Comment 19

5 years ago
I'm feeling its some sort of race condition at this point, even though initially it was 100%--could be related to the pulseaudio daemon running at nice -11 by default.

Comment 20

5 years ago
In my case pulseaudio daemon running at nice level -6, but I have same issue.
(Assignee)

Comment 21

5 years ago
runetmember@gmail.com, would you mind posting the output of alsa-info.sh (see comment 1 for a link)?

Comment 22

5 years ago
Created attachment 634828 [details]
Another alsa-info.sh report

Sure, attached. Only E-Mu 0204 is in use.
(Reporter)

Comment 23

5 years ago
I'm having the problem no with youtube html5. when i hit play, it goes in and out of the "applications" sound prefs, like its getting kicked off the bus for some reason

If i set media.cubeb_latency_ms to 115(ms) it works
if I then set it to 114(ms) it goes corrupted (above symptoms) again.

Not sure if I was testing with youtube before, but if I was, youtube resets to flash if you do a force-reset, (ctrl-r) but if you turn off flash, while it resets to flash, it will fallback to html5. That might explain the wierd behavior reported above. I then reproduced this with the url in the bug report (<audio> tag)
(Assignee)

Comment 24

5 years ago
Thanks, that is useful info.  Are you testing with a very recent nightly?  The latency pref turned out to be broken until I fixed bug 765524 a few days ago (the result being that if the pref was present, it would force a latency of 20ms).  I was going to get you to retest with that fix, and also with an instrumented build, but hadn't had time to prepare it yet.

The disconnect/reconnects are probably caused by the underrun handling--when we call snd_pcm_recover it causes the ALSA PulseAudio plugin to disconnect and reconnect from the PA server.  That a higher latency works around the problem adds weight to that theory.
(Reporter)

Comment 25

5 years ago
Yes I am on the nightly channel, and am up to date http://hg.mozilla.org/mozilla-central/rev/10e019421e6b

I was surprised that there was such a specific cutoff, I expected it to be more range.
(Assignee)

Comment 26

5 years ago
There are instrumented builds here: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mgregan@mozilla.com-65a68f56e408/

They log to stderr, and will produce quite a lot of output.  If you could run them while reproducing the problem for a short time and attach the resulting log, that would be much appreciated.
(Assignee)

Comment 27

5 years ago
Just a friendly reminder: if you are able to download and reproduce this bug with the instrumented build I mentioned in comment 26, that would be greatly appreciated.  Thanks in advance.
(Reporter)

Comment 28

5 years ago
Created attachment 638783 [details]
stderr of instrumented build reproducing problem

I reproduced the problem, then set up the latency, and playback was fine, then set latency down, and playback broke again. Lots of "underrun occurred" as was suspected

stdout:

Suppressed timeupdate during seeking: currentTime=0.000000, new time=0.000000
Suppressed timeupdate during seeking: currentTime=0.000000, new time=0.000000
nsBuiltinDecoder::SeekingStopped, next state=2
Suppressed timeupdate during seeking: currentTime=0.000000, new time=0.000000
nsBuiltinDecoder::SeekingStopped, next state=2
nsBuiltinDecoder::SeekingStopped, next state=2
Suppressed timeupdate during seeking: currentTime=0.000000, new time=0.000000
nsBuiltinDecoder::SeekingStopped, next state=2
nsBuiltinDecoder::SeekingStopped, next state=2
Suppressed timeupdate during seeking: currentTime=0.000000, new time=0.000000
nsBuiltinDecoder::SeekingStopped, next state=2
(Assignee)

Comment 29

5 years ago
Thanks for the log!

The buffer and period sizes reported during setup look correct and match what I see locally.  Buffer size is 100ms as requested, with a period size of just under 25ms (4 periods per buffer).  After the first wakeup where an entire buffer worth of data is requested, a second wakeup occurs immediately and is already reporting an underrun via the result of snd_pcm_avail_update.  I'd expect to see the second wakeup after ~25ms, requesting an additional period (25ms) of data.  It seems like something's going wrong at the alsa-pulse or PA level, but I'm not sure what yet.
(Assignee)

Updated

5 years ago
Duplicate of this bug: 770815
(Assignee)

Comment 31

5 years ago
Bug 770815 has a PA log with the bug reproduced, which gives me another clue to work on.  See bug 770815 comment 9.
(Assignee)

Comment 32

5 years ago
I can reproduce this locally now, by using a hacked version of the ALSA PulseAudio plugin to force the PA stream to have buffer sizes different to what we've requested via ALSA.  I modified this chunk http://gitorious.org/alsa/alsa-plugins/blobs/50562a9d5d54270ab67f183013204d24cdbeff21/pulse/pcm_pulse.c#line801 so that every reference to io->buffer_size and io->period_size is treated as 2x the actual value.  This results in the PA stream being configured with twice the latency we've requested, and approximates the same failure we're seeing here when PA increases the latency on our behalf.

So far I've confirmed that the buffer and period sizes reported back to us via ALSA after stream setup remain the initial computed values e.g. if we request 100ms latency, the buffer size will be 4410 frames and is reported as that size once the stream is configured (that is, after PA setup has resulted in a different size being selected).  When the stream is started, snd_pcm_avail_update requests no more than 4410 frames initially, and reports zero after writing the requested number of frames.  Calling snd_pcm_avail_update again immediately after a write reports zero, but sleeping on the PCM's waitable via poll results in an immediate wakeup requesting more data--except now snd_pcm_avail_update returns -EPIPE (signalling an underrun), so we enter xrun processing.  In short, ALSA ends up in a situation where the higher level API thinks the buffer/period sizes are what it's reporting to us, but the ALSA PulseAudio plugin is working with a different (larger) buffer.

We don't hit this bug with the old sydneyaudio backend for two major reasons: the first is that we used to use a 500ms latency, and the second is that the ALSA refill is driven by blocking writes rather than using poll to wait for space.  The former means that for the cases we've seen so far, PA would not have upsized the latency request behind ALSA's back.  Even if it had, because of the latter case, the buffer refill seems to be serviced correctly by the blocking-write model that sydneyaudio uses (even with my hacked plugin they old SA backend works).

I'll take this up on the ALSA/PA devel lists.  I suspect we're not going to be able to work around this except by bumping the default latency requested up high enough to cover the problem configurations.
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
(Assignee)

Comment 33

5 years ago
Note to self (and anyone trying to reproduce Linux audio issues in a VM): while looking through the provided PA log, I noticed that PA detects if it's running in a VM (pa_running_in_vm()) and disables tsched mode if so.  tsched mode has much stricter requirements on timing and drivers, so it's where we're much more likely to encounter bugs and other behavioural issues.

Also, I can reproduce this without a hacked ALSA PA plugin.  pacmd list-sinks reports a latency range of 46-2000ms for my active sink (listed under the configured latency field).  Setting media.cubeb_latency_ms to 45ms works, and 44ms fails per the bug description.
Keywords: qawanted
(Assignee)

Comment 34

5 years ago
(In reply to Matthew Gregan [:kinetik] from comment #33)
> Also, I can reproduce this without a hacked ALSA PA plugin.  pacmd
> list-sinks reports a latency range of 46-2000ms for my active sink (listed
> under the configured latency field).  Setting media.cubeb_latency_ms to 45ms
> works, and 44ms fails per the bug description.

This is somehow connected to the PA server increasing the minimum latency over time.  When I restart the PA server (either with pulseaudio -k and auto respawn or by running it in non-daemon debug mode), pacmd list-sinks reports the latency as 0.5-2000ms and I can run the cubeb stream with any latency.

In the case reported in bug 770815, the server is running with tsched disabled, so it's slightly different--it hasn't increased latency over time and just happens to run with a default latency higher than we're requesting.
(Assignee)

Comment 35

5 years ago
Raised this on the PA mailing list: http://lists.freedesktop.org/archives/pulseaudio-discuss/2012-July/014091.html
(Assignee)

Comment 36

5 years ago
Created attachment 642725 [details] [diff] [review]
patch v0

I haven't found a nice way to detect and work around this, and even if the ALSA PA plugin was fixed now we'd need to work around this for already deployed versions, so here's an ugly hack to do so now.

Detect the ALSA PA plugin using the same logic we're already using in sydneyaudio, and round the latency up to 200ms if less is requested.

Pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=779af852bad4
Attachment #642725 - Flags: review?(chris.double)
(Assignee)

Comment 37

5 years ago
Raised this on the ALSA mailing list too: http://mailman.alsa-project.org/pipermail/alsa-devel/2012-July/053334.html
(Assignee)

Comment 38

5 years ago
scientes, runetmember, marb, and anyone else on this bug who was having problems, would you mind testing the builds here (http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mgregan@mozilla.com-779af852bad4/( and updating the bug to let me know if this work around is sufficient.

Comment 39

5 years ago
Yes, it is working for me.

Comment 40

5 years ago
Comment on attachment 642725 [details] [diff] [review]
patch v0

Review of attachment 642725 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry about the delay, I missed this in my queue!

::: content/media/nsAudioStream.cpp
@@ +884,2 @@
>    {
>      return static_cast<nsBufferedAudioStream*>(aThis)->StateCallback(aState);

Return not needed now that StateCallback's return type is void.
Attachment #642725 - Flags: review?(chris.double) → review+
(Assignee)

Comment 41

5 years ago
Comment on attachment 642725 [details] [diff] [review]
patch v0

Per discussion on the mailing list, a better workaround is to tell the PA ALSA plugin to ignore underruns.  Patch coming up.
Attachment #642725 - Attachment is obsolete: true
(Assignee)

Comment 42

5 years ago
Created attachment 647067 [details] [diff] [review]
patch v1

This contains a different workaround in cubeb_alsa.c  The rest of the patch is the same as the last revision, but with the review comment addressed.

Try push: https://tbpl.mozilla.org/?tree=Try&rev=e2a1b014a263
Attachment #647067 - Flags: review?(chris.double)

Updated

5 years ago
Attachment #647067 - Flags: review?(chris.double) → review+
(Assignee)

Comment 43

5 years ago
Apologies for a second round of testing, but scientes, runetmember, marb, and anyone else on this bug who was having
problems, would you mind testing the builds here
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mgregan@mozilla.com-e2a1b014a263/ and letting me know if this fix works for you?

Comment 44

5 years ago
Issue is still present for me with this build.
If we'd like to get a fix into FF15 still, we should be landing a final solution on mozilla-central as soon as possible. FF15 beta 3 (of 6) is going to build tomorrow.
tracking-firefox16: --- → +
(Assignee)

Comment 46

5 years ago
(In reply to runetmember from comment #44)
> Issue is still present for me with this build.

Thanks for letting me know.  It did work for me, but perhaps that workaround only helps within a small range of latency difference (I tested a PA server running with a minimum latency of 56ms while playing a stream configured for 30ms).

Did the first attempted fix work for you?  If so, I'll go ahead and land that one for now.
(Assignee)

Comment 47

5 years ago
(In reply to Matthew Gregan [:kinetik] from comment #46)
> Thanks for letting me know.  It did work for me, but perhaps that workaround
> only helps within a small range of latency difference (I tested a PA server
> running with a minimum latency of 56ms while playing a stream configured for
> 30ms).

Also, would you mind attaching the output of "pacmd list-sinks"? That'll show me the minimum latency your PA is running with.
(Assignee)

Comment 48

5 years ago
Comment on attachment 642725 [details] [diff] [review]
patch v0

Per comment 45, for the sake of expediency I'll unobsolete the original fix and land it.  It's simple and safe, and is closer to the behaviour we used to have. We're using the same PA detection code in the old audio backend already, and the other only change is to force the requested latency of the stream upwards.
Attachment #642725 - Attachment is obsolete: false
(Assignee)

Comment 49

5 years ago
Comment on attachment 647067 [details] [diff] [review]
patch v1

And marking this one obsolete since it's an incomplete fix.
Attachment #647067 - Attachment is obsolete: true
(Assignee)

Comment 50

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/e40d733dae53
Target Milestone: --- → mozilla17
(Assignee)

Comment 51

5 years ago
Comment on attachment 642725 [details] [diff] [review]
patch v0

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 623444
User impact if declined: broken HTML 5 playback on Linux with particular configurations
Testing completed (on m-c, etc.): locally
Risk to taking this patch (and alternatives if risky): very low, uses quirk detection already in use in the old audio backend, plus increases audio stream latency to a value closer to the old backend
String or UUID changes made by this patch: none
Attachment #642725 - Flags: approval-mozilla-beta?
Attachment #642725 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 52

5 years ago
Created attachment 647457 [details]
latency test

(In reply to runetmember from comment #44)
> Issue is still present for me with this build.

If you could compile and run the attached test and attach the resulting output, that'd be very useful for working out why the second fix didn't work for you.

Compile it with:
% cc $(pkg-config alsa --cflags --libs) latency.c

And then just run it:
% ./a.out
default node type is pulse
0.00: latency = 1, buffer_size = 96, period_size = 32
2.20: avail = 96, wrote = 96
[...]

If it prints "xrun, giving up", try passing a larger latency via the command line, e.g.

$ ./a.out 100
(also try 50, 200, 500)

Thanks in advance!

Comment 53

5 years ago
I guess it's not so important now, but for me patch v1 was working.
(Assignee)

Comment 54

5 years ago
(In reply to marb from comment #53)
> I guess it's not so important now, but for me patch v1 was working.

That's still useful to know, thank you.  It's a better fix, and I plan to revisit that solution when I get some more time; hopefully I can find out why it doesn't work for runetmember.

Comment 55

5 years ago
Created attachment 647474 [details]
pacmd list-sinks output

> Did the first attempted fix work for you?
Link in Comment 38 is give 404 now. I guess I will able to check patch v0 in 15 beta 3 soon, right?

> If you could compile and run the attached test and attach the resulting output, that'd be very useful for working out why the second fix didn't work for you.
I would like to, but I'm not programmer. If you upload binary, I will check it.

> Also, would you mind attaching the output of "pacmd list-sinks"?
Attached.
(Assignee)

Comment 56

5 years ago
(In reply to runetmember from comment #55)
> > Did the first attempted fix work for you?
> Link in Comment 38 is give 404 now. I guess I will able to check patch v0 in
> 15 beta 3 soon, right?

Sorry, that test build expired.  I pushed a new one, it should be available in an hour or so: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mgregan@mozilla.com-adb9fbbd06cf/

I'm fairly confident this will work, but it's good to collect more data to confirm that.

> I would like to, but I'm not programmer. If you upload binary, I will check
> it.

Thanks, while preparing a binary that I knew would work on your system I found what I suspect is the problem, which is that Ubuntu's ALSA config space is quite different to the system I was testing on, so the workaround wasn't being applied.  Once I've fixed that, I'll let you know so you can test the new build.

Comment 57

5 years ago
> I pushed a new one, it should be available in an hour or so
Ok, I download it, and will check later.
This bug is reproducible for me not on every work session. Sometimes (like today) bug disappear after reboot. I will check test build when I again get bug reproducible.
https://hg.mozilla.org/mozilla-central/rev/e40d733dae53
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Comment on attachment 642725 [details] [diff] [review]
patch v0

[Triage Comment]
Approving for Aurora 16, but holding off till the next beta for approval there. That'll allow for some bake time.
Attachment #642725 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 60

5 years ago
http://hg.mozilla.org/releases/mozilla-aurora/rev/bd3e0818ea0d
status-firefox16: --- → fixed
(Assignee)

Updated

5 years ago
Blocks: 779392
(Assignee)

Comment 61

5 years ago
I'll continue working on the "patch v1"/handle_underrun workaround in bug 779392.
Matthew can you confirm if this is fixed on Aurora builds now?
(Reporter)

Comment 63

5 years ago
I can confirm this is fixed in nightly builds, after removing workarounds discussed above or using a new profile. (original reporter)
(Assignee)

Comment 64

5 years ago
(In reply to Lukas Blakk [:lsblakk] from comment #62)
> Matthew can you confirm if this is fixed on Aurora builds now?

Tested locally and the fix works as expected.

How I tested:
1. Run the following command line to discover the PA server's minimum latency:
   % pacmd list-sinks | awk '/index:/ {p=$1=="*"}; p==1 && /latency/ {print $0}'
       current latency: 0.00 ms
       configured latency: 0.00 ms; range is 56.00 .. 2000.00 ms
2. Note that the current minimum is 56ms (this value depends on the server's state)
3. Set the hidden integer pref media.cubeb_latency_ms to a value less than 56ms (I used 20ms)
   (this isn't required if the minimum is higher than 100ms, our default)
4. Opened a test WebM file and confirmed that it plays back as expected with audio
5. Reset the pref to the default value
status-firefox16: fixed → verified
Comment on attachment 642725 [details] [diff] [review]
patch v0

We'll take this on Beta then, please land today so this gets into Beta 4.
Attachment #642725 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(Assignee)

Comment 66

5 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/411b48cf7481
status-firefox15: affected → fixed

Comment 67

5 years ago
Verified on:
Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0 (20120808131812)
status-firefox15: fixed → verified
You need to log in before you can comment on or make changes to this bug.