13.45 KB, text/plain
59 bytes, text/x-review-board-request
2.30 KB, patch
|Details | Diff | Splinter Review|
Since the latest nightly upgrade, when I go to <youtube.com>, the website loads (most of the time, sometimes it doesn't even seem to do that) and then freezes. Hovering the links doesn't affect the mouse cursor, clicking them doesn't do anything Most other tabs still work, but some do not. Quitting Firefox most of the times leaves a dead content process behind; I assume the one hosting YouTube somehow froze. about:support says Name Firefox Version 56.0a1 Build ID 20170726100241 Update History Update Channel nightly User Agent Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0 OS Linux 4.11.0+ Multiprocess Windows 1/1 (Enabled by default) Web Content Processes 4/4 Stylo false (disabled by default) I was using the nightly from the 24th most of the day; YouTube still worked fine there.
Actually this doesn't just happen with YouTube, it also sometimes happens e.g. on Rust's Discourse forum. Nightly is pretty much unusable right now for me.
Can you find a regression window?
What does that mean? The July 26th nightly has the problem, the July 24th nightly did not. Is there some way to get more builds in between?
(In reply to Ralf Jung from comment #3) > What does that mean? The July 26th nightly has the problem, the July 24th > nightly did not. Is there some way to get more builds in between? Yes! You can use http://mozilla.github.io/mozregression/ and tell it to narrow down the range between July 24th and July 26th, given that that range likely still contains several hundreds of changes.
Hm, the bug doesn't show up with mozregression. Must be something about my profile. I will investigate.
Okay, so it's something about the sandbox I use. I run Firefox in a "firejail" sandbox, just to contain the browser a little better and, in particular, to not give it access to sensitive parts of the file system like private keys. Disabling the use of seccomp by firejail fixes the issue. So something about the seccomp filters firejail applies must be in conflict with something that happened the last few days. I will try to run mozregression inside the sandbox.
2 years ago
Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
Result of the regression test: 8:46.09 INFO: Last good revision: 8e1e06adf80f82d3d5cf08eadaf569a107bd1ecf 8:46.09 INFO: First bad revision: b5fa08551d6e74d8300fa94f3161afdffd867764 8:46.09 INFO: Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8e1e06adf80f82d3d5cf08eadaf569a107bd1ecf&tochange=b5fa08551d6e74d8300fa94f3161afdffd867764 I have attached the full log.
(In reply to Ralf Jung from comment #6) > Okay, so it's something about the sandbox I use. I run Firefox in a > "firejail" sandbox, just to contain the browser a little better and, in > particular, to not give it access to sensitive parts of the file system like > private keys. Disabling the use of seccomp by firejail fixes the issue. So > something about the seccomp filters firejail applies must be in conflict > with something that happened the last few days. I'm assuming this is https://firejail.wordpress.com/ ? gcp, seems this regressed in bug 1308400 - can you take a look?
Firejail and Firefox's sandbox both use seccomp-bpf to filter and intercept syscalls. I'm a bit surprised this still worked after Firefox 54 made use of seccomp-bpf itself, as far as I remember you couldn't nest seccomp-bpf filters. There might be a possibility here that firejail quietly broke Firefox's sandbox, and this quiet breakage is fatal now that seccomp-bpf usage is used to broker filesystem reads. I'll defer to jld who's the expert on this, but I suspect this will be a WONTFIX and you can either have firejail, or Firefox's own sandbox, but not both at the same time.
Flags: needinfo?(gpascutto) → needinfo?(jld)
Component: Audio/Video: Playback → Security: Process Sandboxing
Summary: youtube.com freezes → Broken browser when running under a firejail sandbox
> I'm a bit surprised this still worked after Firefox 54 made use of seccomp-bpf itself, as far as I remember you couldn't nest seccomp-bpf filters. There might be a possibility here that firejail quietly broke Firefox's sandbox, and this quiet breakage is fatal now that seccomp-bpf usage is used to broker filesystem reads. Yeah, I guessed the same. > you can either have firejail, or Firefox's own sandbox, but not both at the same time. I can also just turn off seccomp in firejail, which I did and now things all work again. (I still prefer having Firefox in a separate mount namespace that blocks access to sensitive files like my private keys. You know, better safe than sorry.)
It seems that at the very least, seccomp diagnosis is not working the way it should. No matter whether I have seccomp enabled in firejail (i.e., no matter whether Firefox is in a nested seccomp or not), about:support shows Seccomp-BPF (System Call Filtering): true Seccomp Thread Synchronization: true User Namespaces: true Content Process Sandboxing: true Media Plugin Sandboxing: true Content Process Sandbox Level: 3 Effective Content Process Sandbox Level: 3
I digged a bit and nested seccomp-bpf is supported, we specifically fixed that a year ago. But something firejail is doing might obviously interfere with our sandbox. Needs further investigation.
Specifically, with multiple filters the kernel will run all of them and take the “most restrictive” result. More specifically, it uses the numerically lowest return value: kill < trap < error < trace < allow. Firejail seems to use SECCOMP_RET_KILL, so the process will immediately exit with no error messages if it uses a syscall firejail doesn't like. Given that chroot is one of them, I'd expect media plugins (EME CDMs, and OpenH264 which is used by WebRTC but not <video>) to have been broken since Firefox 40 (bug 1151607) on systems with unprivileged user namespaces, and if so this will break the entire browser relatively soon (bug 1213998). But I haven't seen anything that would interact with bug 1308400 yet.  https://github.com/netblue30/firejail/blob/9a3344f9a569de5a2b619ff9ebc01cbd195ee1d0/src/include/seccomp.h#L139  https://github.com/netblue30/firejail/blob/9a3344f9a569de5a2b619ff9ebc01cbd195ee1d0/src/fseccomp/seccomp.c#L178
I can reproduce this locally. This happens: Sandbox: rewriting /proc/self/status -> /proc/242/status Sandbox: SandboxBroker: denied op=open rflags=0 perms=0 path=/proc/242/status for pid=242 Sandbox: Failed errno -13 op 0 flags 00 path /proc/242/status And that's the last anyone heard of the process with local pid 242, which I'm pretty sure is the one with global pid 73946: jld 73946 1.9 0.0 0 0 pts/7 Zl+ 10:32 0:06 [Web Content] <defunct> Not sure why the parent process hasn't cleaned that up yet. I think the only time I've seen something like that happen was when we backported seccomp-bpf to 3.0.x kernels for Boot2Gecko and SECCOMP_RET_KILL was buggy such that the parent/ptracer wouldn't be notified; as far as I know it force-exits the process normally on mainline kernels that have seccomp-bpf. Next is to find out who's looking at /proc/self/status and what they try to do when they fail…
libnuma (required by libx265, required by ffmpeg). Fails to open the file at , falls back to calling get_mempolicy at , killed by firejail due to . So if we allow /proc/self/status, which doesn't seem to have anything too sensitive, then (a) youtube works again without changes to firejail, and (b) we might be able to stop allowing get_mempolicy in our own seccomp policy.  https://github.com/numactl/numactl/blob/v2.0.11/libnuma.c#L382  https://github.com/numactl/numactl/blob/v2.0.11/libnuma.c#L403  https://github.com/netblue30/firejail/blob/9a3344f9a569de5a2b619ff9ebc01cbd195ee1d0/src/fseccomp/seccomp.c#L163
Also, in case someone else needs to debug something like this: I changed the broker policy to have an entry for /proc/%d/status (like the one for /proc/%d/maps) with the SandboxBroker::CRASH_INSTEAD flag set, ran with the env var MOZ_DISABLE_CRASHREPORTER set (probably unnecessary because it's a debug build), and fed the stack dump to tools/rb/fix_linux_stack.py (with the libnuma-dbg package installed so I could get a line number for that frame instead of tracing through disassembly).
FYI, I reported this against Firejail as <https://github.com/netblue30/firejail/issues/1414> and they, for now, stop blocking get_mempolicy.
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #15) > So if we allow /proc/self/status, which doesn't seem to have anything too > sensitive, then (a) youtube works again without changes to firejail, and (b) > we might be able to stop allowing get_mempolicy in our own seccomp policy. This seems like an easy change. Looking into /prof/self/status, I see a bunch of pids including parent pid. We currently block getppid in preparation for bug 1151624. Presumably, ppid in here will be 0 when running inside a namespace. Given that this is the only thing that broke on /proc/self/status so far, I don't think we have to be afraid that stuff ends up accidentally using ppids this way.
I verified that blocking get_mempolicy makes me crash on YouTube, and that whitelisting that proc entry makes it work again.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #18) > This seems like an easy change. Looking into /prof/self/status, I see a > bunch of pids including parent pid. We currently block getppid in > preparation for bug 1151624. Presumably, ppid in here will be 0 when running > inside a namespace. procfs uses the pid namespace where it was mounted, so that would be the outer ppid. (This is not how it treats other namespaces, like net and user.) I'm not too concerned about revealing it. If at some point in the distant future we take Chrome's approach of remoting all logging and hiding the process's own outer pid from it, then we might reconsider.
Comment on attachment 8898342 [details] Bug 1384804 - Allow libnuma to read /proc/self/status, block get_mempolicy. https://reviewboard.mozilla.org/r/169704/#review175038
Attachment #8898342 - Flags: review?(jld) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/afdd35ed8902 Allow libnuma to read /proc/self/status, block get_mempolicy. r=jld
I backed this out for crashes on get_mempolicy in libnuma. Example crash report: bp-a2754076-0449-4777-9662-4be9c0170823
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Merge of backout: https://hg.mozilla.org/mozilla-central/rev/4ffacd080dc6
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #25) > I backed this out for crashes on get_mempolicy in libnuma. Example crash > report: bp-a2754076-0449-4777-9662-4be9c0170823 We should've only backed out the get_mempolicy block, not the unblock of the /proc/pid/status.
Attachment #8913220 - Flags: review?(jld) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/d07bcfc779eed06b8f521b630841f18070ec6a74 Bug 1384804 - Allow reading /proc/self/status for libnuma. r=jld
gcp, do you want to uplift that to 57 (if really affected)
Comment on attachment 8913220 [details] [diff] [review] Allow reading /proc/self/status for libnuma Approval Request Comment [Feature/Bug causing the regression]: bug 1308400 and the previously landed part this bug itself [User impact if declined]: Running Firefox under older firejails can crash. (Very limited amount of users affected) [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: Landed a few days ago [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: N/A [Is the change risky?]: No [Why is the change risky/not risky?]: Adding an entry to a whitelist [String changes made/needed]: N/A
Attachment #8913220 - Flags: approval-mozilla-beta?
Comment on attachment 8913220 [details] [diff] [review] Allow reading /proc/self/status for libnuma Recent regression, Beta57+
Attachment #8913220 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.