Closed Bug 1396542 Opened 4 years ago Closed 4 years ago
Firefox 57 audio fails on some Linux machines; needs read access to /var/lib/dbus/machine-id
7.76 KB, text/x-log
1.45 KB, text/x-log
28.86 KB, text/plain
27.34 KB, text/x-log
176.00 KB, text/plain
59 bytes, text/x-review-board-request
[Tracking Requested - why for this release]: May be a major regression in WebRTC
Can you please use this tool to determine the regression range? https://mozilla.github.io/mozregression/ From command line: > mozregression --good 56 --bad 57
I just tried to repro locally with a single machine only, but everything works as expected. I'll try to repro tomorrow with two machines.
Re #4: I tried reproing on my linux machine, and I can repro on a single machine as well (I just had to pass --use-fake-device-for-media-stream to Chrome since I only have one webcam). Nils: you are connecting chrome-to-firefox, right, and not firefox-to-firefox?
Re #3: Running your command gives: 0:01.11 ERROR: The url 'https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=56' contains no pushlog. Maybe use another range ? I did run mozregression --good 2017-07-21 --bad 2017-09-03 instead. It looks like this happened between the 2017-07-24 and 2017-07-25 nightlies. I kept bisecting and this is the final result (I'll attach the full bisect log). 13:21.72 INFO: No more inbound revisions, bisection finished. 13:21.72 INFO: Last good revision: 8e1e06adf80f82d3d5cf08eadaf569a107bd1ecf 13:21.72 INFO: First bad revision: b5fa08551d6e74d8300fa94f3161afdffd867764 13:21.72 INFO: Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8e1e06adf80f82d3d5cf08eadaf569a107bd1ecf&tochange=b5fa08551d6e74d8300fa94f3161afdffd867764
It's interesting that the implicated patches are Linux-specific, so this could be a linux-only problem. Which platform were you on in #4, Nils?
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0 Firefox: 57.0a1, Build ID: 20170904220027 I have tested this issue on latest Firefox release (55.0.3) and latest Nightly (57.0a1) build and could not reproduce it. I've navigated to the provided URL, but the connection to the website is unsecure (NET::ERR_CERT_COMMON_NAME_INVALID). Instead of the provided URL, I've used https://appr.tc with 3 different machines to test this issue, one with Win 10, a Mac 10.12 and an Ubuntu 16.04, and everything works as expected.
Re #9: Ok, which Chrome version did you use?
Version 60.0.3112.113 (Official Build) (64-bit)
Ok, just to ensure we agree what "works as expected" means. The page should load, the video feeds make a flip-over animation and the feed from the other machine starts playing. There are two video tags on the page, one local preview and one remote view. Is that what you're seeing Marius?
Yes, the page loads, video feed makes a flip-over animation when the other party joins/connects, there are two video tags on the page (you are on the bottom right corner and the other party is displayed on the screen). Could you please post all the information from the about:support page? Also, have you also tried to reproduce on https://appr.tc? I could only manage to get a call working on this page.
FYI, from Chrome: (Linux, Version 60.0.3112.113 (Official Build) (64-bit)) trying to connect to: www.apprtc.appspot.com/r/jesup Your connection is not private Attackers might be trying to steal your information from www.apprtc.appspot.com (for example, passwords, messages, or credit cards). Learn more NET::ERR_CERT_COMMON_NAME_INVALID Trying appr.tc/r/jesup: no problems, with Firefox Nightly on Windows or Linux. I was using e10s (multi-process) in Firefox (since the commits flagged above have to do with sandboxing. What exact OS are you using? and what audio devices?
Yesterday I was trying on Mac only. Today I tried on Fx on Mac and Chrome 60 on Win 10. Again I can't repro either. I tried: - with both machines on the same LAN - with both machines in different LAN so they ended up using TURN servers - from Fx to Chrome - from Chrome to Fx All works fine. You briefly see your own video. Then then video flips over and you start to see the video from the other side and your own preview in the lower right corner. I also encountered the TLS cert problem with www.apprtc.apspot.com. Patrik: do you have any more details on what is actually not working? Is the ICE connection not getting established? Is the video not playing? Could you please attach a copy of "about:webrtc" from Firefox from such a failed call?
Re #13: I've run on apprtc.appspot.com, but appr.tc is just an alias for that right? It's the same page. Re #14: Thanks, noted. I'm running Ubuntu 14.04 trusty. I have a GF108 High Definition Audio Controller Digital Stereo (HDMI) in this machine. Re #15: Chrome says ICE is completed. The firefox side says "End of candidates" from apprtc.debug.js when a call succeeds (like when I run on my mac), and it does _not_ say that for a failing call on my Linux box. Here is the code that says that: https://github.com/webrtc/apprtc/blob/1bf0f49c731715bd9c4f53b41df37525e670ee3f/src/web_app/js/peerconnectionclient.js#L329 I'm not good enough at WebRTC call debugging to know what this means. Anyway, AppRTC flips the video tags when the remote video's ready state is at least 2 (see https://github.com/webrtc/apprtc/blob/cf63a6f65badbf2cb38382271bbe2486c66b2958/src/web_app/js/appcontroller.js#L276), i.e. video bits have started arriving. Something fails before that.
I've made another finding: I can't get a call up even when I call from my working firefox nightly on mac to my nightly on Linux. Therefore, this isn't an interop bug between Chrome and Firefox. We should rename this bug to "Firefox nightly WebRTC doesn't work on certain Linux machines" or something like that :)
I found one indicator in Patrik's logs: Firefox SDP shows in the audio part only a=recvonly, which I guess means gUM failed to get the mic. As the video m-section contains a=sendrecv this appears to be an audio only problem on Linux. Paul: what kind of log files would we need from Patrik to find out what is going on here?
Summary: Firefox and Chrome fail to set up WebRTC call on AppRTC; regressed sometime the past few weeks. → Firefox 57 audio fails on some Linux machines
Please activate flags: cubeb:4,MediaStreamGraph:4,MediaManager:4
I'm not sure audio is at fault here. An audio issue would most likely manifest itself as a fail-to-start (gUM rejected) or a started-but-no-callbacks (gUM stalled). Here gUM must have been resolved since apprtc proceeded to setting up the connection. A gUM reject on apprtc can't be missed. Nils, what do you look at when deciding to make it recvonly? MediaStreamTracks?
Aha! Yeah, that's the difference. When I run on Linux it doesn't ask for mic, only webcam, but when I run on mac it asks for both mic and webcam. So that suggests it doesn't recognize my mic. At least not after the three patches I bisected down to in #6. So that's why the call isn't getting up, because AppRTC doesn't handle that? It doesn't recognize either the mic in my webcam or the front built-in audio. I can see they are working in the OS though.
Re #20: Can you explain in detail how to activate the flags? I found cubeb in about:config but not the other two.
Aha, maybe apprtc does an enumeration pass before deciding what to ask for -- and only ask for video when no audio devices were found. See  for info on enabling logging. TL;DR set some environment variables when running firefox, like > MOZ_LOG=timestamp,cubeb:4,MediaStreamGraph:4,MediaManager:4 MOZ_LOG_FILE=/tmp/moz.log ./firefox  https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Gecko_Logging#Enabling_Logging
(In reply to Andreas Pehrson [:pehrsons] from comment #21) > Nils, what do you look at when deciding to make it recvonly? > MediaStreamTracks? The SDP in the logs which Patrik send to me show a=recvonly for the audio m-section, a=sendrecv for the video m-section. So I think appr.tc failed to attach an audio track to the PC. How would gUM behave if audio and video got requested, but only video succeeded?
(In reply to Patrik Höglund from comment #22) > Aha! Yeah, that's the difference. When I run on Linux it doesn't ask for > mic, only webcam, but when I run on mac it asks for both mic and webcam. So > that suggests it doesn't recognize my mic. At least not after the three > patches I bisected down to in #6. So that's why the call isn't getting up, > because AppRTC doesn't handle that? Not sure about the current state. But historically AppRTC did not handle it very well when it requests audio and video, but only got one of these.
Log from failed apprtc call. with MOZ_LOG=timestamp,cubeb:4,MediaStreamGraph:4,MediaManager:4 MOZ_LOG_FILE=/tmp/moz.log ./firefox-nightly/firefox_aug31/firefox
Re #24: attached moz.log. I only managed to get one out on the first attempt though: when I tried again, all I get is empty logs!
Attachment #8906506 - Attachment mime type: text/x-log → text/plain
(In reply to Nils Ohlmeier [:drno] from comment #25) > How would gUM behave if audio and video got requested, but only video succeeded? We have been rejecting the entire request in that case for quite some time, per the spec.
(In reply to Patrik Höglund from comment #27) > Created attachment 8906506 [details] > moz.log > > Log from failed apprtc call. with > > MOZ_LOG=timestamp,cubeb:4,MediaStreamGraph:4,MediaManager:4 > MOZ_LOG_FILE=/tmp/moz.log ./firefox-nightly/firefox_aug31/firefox I don't see cubeb or media logs here. This could be from the parent process, while media logs are in the child process. Those logs should appear in the same location but under other file names that indicate which process they came from. It could be that they cannot be written due to sandboxing. That can be lowered per the logging wiki page I linked in comment 24. Alternatively turn off multi-process firefox in about:preferences and all you get is one file.
I found a /tmp/moz.log.child-1 which has cubeb lines in it, attaching that. Another data point: When I turn multi-process off, the call works!
Log from child process with failing call
Failure of audio on Linux with e10s is very likely a sandboxing issue - Likely the sandboxwas tightened, and (your?) Pulse accesses some files which may not be whitelisted. What OS version are you running on? Anything non-standard, or non-standard settings? Unusual audio drivers? I'll get info on how to log sandbox violations.
:gcp fyi this bug looks like sandbox related breakage on linux, maybe you can help with comment 33?
Tracking for 57, from what I understand the tightened linux sandbox might ship there.
Patrik - can you do a run (with e10s (multiprocess) on), with env variable MOZ_SANDBOX_LOGGING set (I presume setting it to '1' is sufficient). Thanks!
Sandbox printouts from failed run
mozlog from latest run
Re #36: setting the env variable caused Sandbox lines to be printed on stdout/stderr, so I assume those are the ones you want. I also attached the child-1 log from the same run if you need it.
Changing the component to cubeb, this is clearly a cubeb issue. Cubeb is our cross-platform audio input output library.
Component: WebRTC → Audio/Video: cubeb
Which Linux distribution is this? PulseAudio is probably failing to connect to the server. We have several specific holes for letting it figure out where to connect, but whatever PulseAudio version/config is used here seems to be using another method which is blocked.
Flags: needinfo?(gpascutto) → needinfo?(phoglund)
Whiteboard: [needinfo 2017-09-07 drno] → [needinfo 2017-09-07 drno],sb?
This is weird: Sandbox: Failed errno -13 op 0 flags 02400000 path /dev/shm/pulse-shm-3626393402 /dev/shm is whitelisted even for writing: https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp#48 Is it the flags? 02400000
The flags are O_NOFOLLOW 00400000 O_CLOEXEC 02000000 We allow those, so that shouldn't be the problem: https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/security/sandbox/linux/broker/SandboxBroker.cpp#444
Since there are a few new people in the bug, let me remind you that I bisected this issue in #6 and it boiled down to three possible culprits: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8e1e06adf80f82d3d5cf08eadaf569a107bd1ecf&tochange=b5fa08551d6e74d8300fa94f3161afdffd867764 The distribution is Goobuntu. I have no idea how it differs from Ubuntu unfortunately.
Something interesting is this: Normal refuses will look like this: Sandbox: SandboxBroker: denied op=open rflags=2000000 perms=0 path=/var/lib/dbus/machine-id for pid=568 Sandbox: Failed errno -13 op 0 flags 02000000 path /var/lib/dbus/machine-id Logging the error in the SandboxBroker and SandboxBrokerClient However the shm fails: Sandbox: Failed errno -13 op 0 flags 02400000 path /dev/shm/pulse-shm-3626393402 Seem to fail in a way that the Broker doesn't log them.
(In reply to Patrik Höglund from comment #44) > Since there are a few new people in the bug, let me remind you that I > bisected this issue in #6 and it boiled down to three possible culprits: > https://hg.mozilla.org/integration/mozilla-inbound/ > pushloghtml?fromchange=8e1e06adf80f82d3d5cf08eadaf569a107bd1ecf&tochange=b5fa > 08551d6e74d8300fa94f3161afdffd867764 I already removed the regression-window wanted flag because I saw that. Those 3 patches enable sandboxing of filesystem reads. We have no plans to back those out, so we'll have to figure out why PulseAudio is getting blocked by it. There's several specific holes to let it through until most of bug 1362220 lands. > The distribution is Goobuntu. I have no idea how it differs from Ubuntu unfortunately. Ouch, that will complicate matters.
Is it possible to run Firefox with the sandbox logging, look for the /dev/shm/pulse-shm-xxxxx refusal as in comment 45, and check what the actual permissions on that file are? This logging is consistent with this open() failing: https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/security/sandbox/linux/broker/SandboxBroker.cpp#725 Maybe we're not supposed to be able to open this at all, and we're only falling back to this because another way of connecting to Pulse was blocked earlier.
It looks like nobody's mentioned this line from the log yet: Sandbox: SandboxBroker: denied op=open rflags=2000000 perms=0 path=/usr/local/xyz/home/phoglund/.pulse/client.conf for pid=568 If that file exists, that might be part of the cause.
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #48) > It looks like nobody's mentioned this line from the log yet: > > Sandbox: SandboxBroker: denied op=open rflags=2000000 perms=0 > path=/usr/local/xyz/home/phoglund/.pulse/client.conf for pid=568 > > If that file exists, that might be part of the cause. I saw this, but as I understand .pulse is an older location, and now it's XDG_CONFIG_HOME, which is .config, which we cover. My Ubuntu 16.04 seems to have the PA stuff there indeed. So unless Goobuntu for some reason uses the older location, that should be fine.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #49) > I saw this, but as I understand .pulse is an older location, and now it's > XDG_CONFIG_HOME, which is .config, which we cover. https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp#133 We should probably use XDG_CONFIG_HOME instead of .config, or both. I'll file that separately.
Re #49: I don't have a ~/.pulse dir, but I do have a ~/.config/pulse dir: $ ls ~/.config/pulse/ 7855a8131f74d443df3b71ef59a3d3ff-card-database.tdb 7855a8131f74d443df3b71ef59a3d3ff-device-volumes.tdb 7855a8131f74d443df3b71ef59a3d3ff-default-sink 7855a8131f74d443df3b71ef59a3d3ff-stream-volumes.tdb 7855a8131f74d443df3b71ef59a3d3ff-default-source cookie
Patrik, can you try the test in comment 47?
Re #52: All the /dev/shm/pulse-shm-* files have -rwx------.
(In reply to Patrik Höglund from comment #53) > Re #52: > > All the /dev/shm/pulse-shm-* files have -rwx------. Yes, but owned by what user/group?
All are owned by phoglund and group "eng", except for one which is owned by user lightdm and group lightdm.
Whiteboard: [needinfo 2017-09-07 drno],sb? → [needinfo 2017-09-07 drno][sb?]
I wondered if this might be related — /etc/machine-id doesn't exist, and /var/lib/dbus/machine-id isn't allowed by policy: Sandbox: Failed errno -2 op 0 flags 02000000 path /etc/machine-id Sandbox: SandboxBroker: denied op=open rflags=2000000 perms=0 path=/var/lib/dbus/machine-id for pid=568 That doesn't happen for me with regular WebAudio, because we start the audio backend before sandboxing, but it does with WebRTC, probably because of the unused extra PA instance it creates (bug 1394163). This suggests a possible test on Goobuntu: adding /var/lib/dbus/machine-id to the security.sandbox.content.read_path_whitelist pref (it's a comma-separated list of paths, but this should be the only item). However, on plain Ubuntu 16.04, even if I remove /etc/machine-id I can't reproduce this using the instructions in comment #0. So that might not be it. There's also this oddity, which also turned up in bug 1363378; we're returning EACCES instead of ENOENT for opening the empty string, which may or may not affect anything because we don't yet know what's doing that: Sandbox: rewriting /proc/self/exe -> /proc/568/exe Sandbox: Recording mapping /usr/local/xyz/home/phoglund/dev/chromium/firefox-nightly/firefox/firefox -> /proc/568/exe Sandbox: SandboxBroker: denied op=open rflags=0 perms=0 path= for pid=568 Sandbox: Failed errno -13 op 0 flags 00 path Sandbox: SandboxBroker: denied op=open rflags=0 perms=0 path= for pid=568 Sandbox: Failed errno -13 op 0 flags 00 path And then there's this: Sandbox: SandboxBroker: denied op=access rflags=4 perms=0 path=/proc/net for pid=568 Sandbox: Failed errno -13 op 1 flags 04 path /proc/net My guess is that this is glibc's internal function __opensock, called from if_indextoname or similar, and bug 1384292 might be related, but from looking at the glibc source it should be harmless to deny access.  https://searchfox.org/mozilla-central/rev/6326724982c66aaeaf70bb7c7ee170f7a38ca226/dom/ipc/ContentChild.cpp#1654
Re #56: adding /var/lib/dbus/machine-id to the security.sandbox.content.read_path_whitelist fixes the error.
Priority: -- → P2
Whiteboard: [needinfo 2017-09-07 drno][sb?] → [sb?]
Bug 1388545 comment #6 seems to be the same problem on other distributions (including Ubuntu 14.04 but not Ubuntu 16.04, which might hint at why Goobuntu is affected). I'll just take this bug, because it should be a one-line fix.
Assignee: nobody → jld
Summary: Firefox 57 audio fails on some Linux machines → Firefox 57 audio fails on some Linux machines; needs read access to /var/lib/dbus/machine-id
Comment on attachment 8910529 [details] Bug 1396542 - Let sandboxed content processes read /var/lib/dbus/machine-id. https://reviewboard.mozilla.org/r/181962/#review187506 Didn't we have some meta to hang up the hundreds of PA related bugs on so we know what stuff to back out when it gets remoted? Maybe we should.
Attachment #8910529 - Flags: review?(gpascutto) → review+
(In reply to Gian-Carlo Pascutto [:gcp] from comment #60) > Comment on attachment 8910529 [details] > Bug 1396542 - Let sandboxed content processes read /var/lib/dbus/machine-id. > > https://reviewboard.mozilla.org/r/181962/#review187506 > > Didn't we have some meta to hang up the hundreds of PA related bugs on so we > know what stuff to back out when it gets remoted? Maybe we should. There's bug 1104619 (sb-audio), but I thought that was for things we plan to do, not things we plan to back out. I could file another bug for this, but that doesn't really seem useful if I'm just going to rip out all the #ifdef MOZ_PULSEAUDIO things at once.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/d095506bd1d6 Let sandboxed content processes read /var/lib/dbus/machine-id. r=gcp
Splendid! When will this fix make it Nightly, so I can re-enable the Chrome-Firefox interop test?
(In reply to Patrik Höglund from comment #63) > Splendid! When will this fix make it Nightly, so I can re-enable the > Chrome-Firefox interop test? ~12 hours after there is comment here from Pulsebot about mozilla-central. We just switched to Firefox 58 so that might be some of the delay.
Actually this is in nightly now. https://hg.mozilla.org/mozilla-central/rev/d095506bd1d6 Seems that merge wasn't marked in bugzilla?
It seems that we want to uplift that in 57, Jed, could you fill an uplift request? Thanks
Comment on attachment 8910529 [details] Bug 1396542 - Let sandboxed content processes read /var/lib/dbus/machine-id. Approval Request Comment [Feature/Bug causing the regression]: Bug 1308400 [User impact if declined]: No sound for some Linux distributions [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: No [Needs manual test from QE? If yes, steps to reproduce]: Would be nice. Manual QA team found the bug. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No [Why is the change risky/not risky?]: Addition to whitelist. [String changes made/needed]: N/A
Attachment #8910529 - Flags: approval-mozilla-beta?
Comment on attachment 8910529 [details] Bug 1396542 - Let sandboxed content processes read /var/lib/dbus/machine-id. A browser without sound is a bit sad, taking it to beta. Should be in 57beta3
Attachment #8910529 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Sylvestre Ledru [:sylvestre] from comment #68) > Comment on attachment 8910529 [details] > Bug 1396542 - Let sandboxed content processes read /var/lib/dbus/machine-id. > > A browser without sound is a bit sad, taking it to beta. > Should be in 57beta3 Beta is fine due to bug 1388046, which I am going to back out (in a new bug, I guess) right about now!
(In reply to Julien Cristau [:jcristau] from comment #65) > Seems that merge wasn't marked in bugzilla? Weird, could've sworn I ran Bugherder on that merge push to mark all of the bugs in the push, but apparently I didn't. Sorry about that. Looks like Aryx got it, though.
I couldn't reproduce the bug and we tested on three different machines. We tried to make a call between Windows 10 (Chrome)and Windows 10 (Nightly from 2017-09-03), Windows 10 (Chrome) and Ubuntu 16.04 x64 (Nightly from 2017-09-03), mac 10.13 (Chrome) and Ubuntu 16.04 x64 (Nightly from 2017-09-03). We tested using this https://appr.tc/r/jesup from comment 14, because the others didn't work. Using this link we managed to make a call without any problems. Since we cannot reproduce this bug can you please verify it on Beta 57.0b14? So we can close this bug as verified fixed. Thank you.
(In reply to Oana Botisan from comment #73) > I couldn't reproduce the bug and we tested on three different machines. We > tried to make a call between Windows 10 (Chrome)and Windows 10 (Nightly from > 2017-09-03), Windows 10 (Chrome) and Ubuntu 16.04 x64 (Nightly from > 2017-09-03), mac 10.13 (Chrome) and Ubuntu 16.04 x64 (Nightly from > 2017-09-03). This bug affects Ubuntu 14.04 but not 16.04; see bug 1388545 comment #6 and 7 (and comment #17 on this bug).
We managed to reproduce the bug using the method from https://bugzilla.mozilla.org/show_bug.cgi?id=1388545#c6 on Ubuntu 14.04 x32 on an older version of Nightly (2017-09-04). We retested everything using the latest Nightly 58.0a1 and Firefox 57.0 - build 4 on the same platform, but the bug was not reproducing anymore.
You need to log in before you can comment on or make changes to this bug.