[Wayland] dom/canvas/test/webgl-conf/generated/test_2_conformance2__glsl3__array-as-return-value.html crash with sandbox violation
Categories
(Core :: Security: Process Sandboxing, defect, P3)
Tracking
()
Release | Tracking | Status |
---|---|---|
firefox90 | --- | fixed |
People
(Reporter: stransky, Assigned: gerard-majax)
Attachments
(1 file)
On Wayland, dom/canvas/test/webgl-conf/generated/test_2_conformance2__glsl3__array-as-return-value.html crashes with:
0:04.01 GECKO(16251) Sandbox: seccomp sandbox violation: pid 16420, tid 16420, syscall 157, args 23 32 0 139706498268232 0 1.
Running with MOZ_DISABLE_CONTENT_SANDBOX=1 prevents the crash.
Comment 1•4 years ago
|
||
`prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE)`, if I'm reading that correctly?
Reporter | ||
Comment 2•4 years ago
|
||
It comes from `nsSystemInfo::Init()`:
void* libpulse = dlopen("libpulse.so.0", RTLD_LAZY);
There's a backtrace:
#4 __GI___prctl (option=option@entry=23) at ../sysdeps/unix/sysv/linux/prctl.c:38
#5 0x00007f9243b1a1f7 in cap_get_bound (cap=cap@entry=32) at cap_proc.c:272
#6 0x00007f9243b197b6 in _initialize_libcap () at cap_alloc.c:20
#7 0x00007f927b5468de in call_init (l=<optimized out>, argc=argc@entry=15, argv=argv@entry=0x7ffec734bc48, env=env@entry=0x7f927ad24400) at dl-init.c:74
#8 0x00007f927b5469c8 in call_init (env=0x7f927ad24400, argv=0x7ffec734bc48, argc=15, l=<optimized out>) at dl-init.c:37
#9 _dl_init (main_map=0x7f925f1dd000, argc=15, argv=0x7ffec734bc48, env=0x7f927ad24400) at dl-init.c:121
#10 0x00007f927b0e02e5 in __GI__dl_catch_exception (exception=exception@entry=0x0, operate=operate@entry=0x7f927b54a350 <call_dl_init>, args=args@entry=0x7ffec7347300)
at dl-error-skeleton.c:182
#11 0x00007f927b54ae25 in dl_open_worker (a=a@entry=0x7ffec73474a0) at dl-open.c:783
#12 0x00007f927b0e0288 in __GI__dl_catch_exception (exception=exception@entry=0x7ffec7347480, operate=operate@entry=0x7f927b54aa40 <dl_open_worker>, args=args@entry=0x7ffec73474a0)
at dl-error-skeleton.c:208
#13 0x00007f927b54a65e in _dl_open
(file=0x7ffec7347480 "ht4\307\376\177", mode=-2147483647, caller_dlopen=0x7f9272945e53 <nsSystemInfo::Init()+1171>, nsid=-2, argc=15, argv=0x7ffec734bc48, env=0x7f927ad24400)
at dl-open.c:864
#14 0x00007f927b4bb39c in dlopen_doit (a=a@entry=0x7ffec73476d0) at dlopen.c:66
#15 0x00007f927b0e0288 in __GI__dl_catch_exception (exception=exception@entry=0x7ffec7347670, operate=operate@entry=0x7f927b4bb340 <dlopen_doit>, args=args@entry=0x7ffec73476d0)
at dl-error-skeleton.c:208
#16 0x00007f927b0e0353 in __GI__dl_catch_error
(objname=objname@entry=0x7f927ad34090, errstring=errstring@entry=0x7f927ad34098, mallocedp=mallocedp@entry=0x7f927ad34088, operate=operate@entry=0x7f927b4bb340 <dlopen_doit>, args=args@entry=0x7ffec73476d0) at dl-error-skeleton.c:227
#17 0x00007f927b4bbbd9 in _dlerror_run (operate=operate@entry=0x7f927b4bb340 <dlopen_doit>, args=args@entry=0x7ffec73476d0) at dlerror.c:170
#18 0x00007f927b4bb428 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#19 0x00007f9272945e53 in nsSystemInfo::Init() (this=<optimized out>) at /raid/src2/xpcom/base/nsSystemInfo.cpp:1009
The call is:
268 int cap_get_bound(cap_value_t cap)
269 {
270 int result;
271
272 result = prctl(PR_CAPBSET_READ, pr_arg(cap), pr_arg(0));
273 if (result < 0) {
274 errno = -result;
275 return -1;
276 }
but it comes from libc.
Reporter | ||
Comment 3•4 years ago
|
||
OS is Fedora 33.
Comment 4•4 years ago
|
||
So this is a mess. `libcap2` started using `PR_CAPBSET_READ` at static initializer time to probe the kernel's capability set size, in version 2.30, released 2020-01-04. And we're loading `libpulse` in `nsSystemInfo`, even in sandboxed processes that can't use it, because of bug 1245745. (We really should pass down system info like that over IPC or in the environment instead of making every process recompute it, and I remember saying something similar on some other sandboxing bug recently...)
Meaning that I don't know why this would have started happening only recently. Were we not using `nsSystemInfo` in content processes before this? Or was it previously always being used before sandbox startup, and now sometimes the first use is after sandbox startup?
I suppose it's easy enough to fix by always returning `EINVAL`: sandboxed processes shouldn't have access to actually do anything with capabilities, so it shouldn't matter what it thinks the set size is.
Comment 5•4 years ago
|
||
Meaning that I don't know why this would have started happening only recently.
IIUC this only happens on Wayland, which doesn't have CI yet - Martin is working on bringing it up. Given that it's a WebGL test, I wonder if it also happens on X11/EGL though, or what exactly is different for Wayland.
Assignee | ||
Comment 7•4 years ago
|
||
Martin, is there any specific environment or setup needed to repro?
I just did a build here and I can't repro by running `./mach test dom/canvas/test/webgl-conf/generated/test_2_conformance2__glsl3__array-as-return-value.html`, whether on X11/EGL, under Xwayland, or on pure Wayland.
But I'm running on an Ubuntu 21.04 setup (GNOME/Wayland).
Assignee | ||
Comment 8•4 years ago
|
||
So, after setting up an up-to-date Fedora 33 VM, I can repro the crash there.
Comment 9•4 years ago
|
||
Thanks for looking into this. Just to be sure: does this only reproduce on Wayland or also X11/EGL? Because in the latter case we should make this bug block bug 1695933, which will hopefully land soon.
Assignee | ||
Comment 10•4 years ago
|
||
(In reply to Robert Mader [:rmader] from comment #9)
Thanks for looking into this. Just to be sure: does this only reproduce on Wayland or also X11/EGL? Because in the latter case we should make this bug block bug 1695933, which will hopefully land soon.
Only on Wayland so far. I have not tested on X11/EGL on Fedora yet, but I'll verify that soon.
Assignee | ||
Comment 11•4 years ago
|
||
This might need cross-checking, but:
- Xwayland installed and running on the F33 VM
- `GDB_BACKEND=xwayland MOZ_ENABLE_WAYLAND=0 ./mach run` and checking `Window Protocol` shows `xwayland`
- test running with `GDB_BACKEND=xwayland MOZ_ENABLE_WAYLAND=0` repros the issue
However, I'm unsure about whether XWayland protocol is a valid alternative in this case, or do I need pure X11?
Assignee | ||
Comment 12•4 years ago
|
||
Good thing, Fedora 33 still has an Xorg setup, so I booted a new session using "GNOME with Xorg":
- `XDG_SESSION_TYPE=x11`
- `./mach run` properly shows `Window Protocol: x11` in `about:support`
- test still fails the same way: Signature: [@ libc.so.6 + 0x1023f1]
Comment 13•4 years ago
|
||
Thanks. EGL/X11 can be activated both in an X11 or Wayland session via `MOZ_X11_EGL=1` (i.e. `Window Protocol` can be `x11` or `xwayland`) - it's visible in `Driver WSI Info` showing `EGL_...` instead of `GLX_...` extensions.
However, from what I understand now, the bug also happens on plain GLX - i.e. this does not only affect Wayland or EGL, but all configurations. Now it would be interesting if it's a Firefox regression or a mesa bug or so :/ Odd that it doesn't repro on Ubuntu.
Comment 14•4 years ago
|
||
P.S.: AFAIK Fedora builds firefox with GCC, while Ubuntu IIUC has switched to Clang/LLVM like upstream has.
Martin, could this be the issue?
Assignee | ||
Comment 15•4 years ago
|
||
(In reply to Robert Mader [:rmader] from comment #13)
Thanks. EGL/X11 can be activated both in an X11 or Wayland session via `MOZ_X11_EGL=1` (i.e. `Window Protocol` can be `x11` or `xwayland`) - it's visible in `Driver WSI Info` showing `EGL_...` instead of `GLX_...` extensions.
However, from what I understand now, the bug also happens on plain GLX - i.e. this does not only affect Wayland or EGL, but all configurations. Now it would be interesting if it's a Firefox regression or a mesa bug or so :/ Odd that it doesn't repro on Ubuntu.
Indeed, running with `MOZ_X11_EGL=1` also shows the issue, but since the previous tests reproduced it with GLX, I don't think it should be blocking the EGL work?
Comment 16•4 years ago
|
||
P.S.: AFAIK Fedora builds firefox with GCC, while Ubuntu IIUC has switched to Clang/LLVM like upstream has.
Seems unlikely given the stacks @jld posted. More likely: Fedora has newer versions of one of the involved libraries.
Assignee | ||
Comment 17•4 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #16)
P.S.: AFAIK Fedora builds firefox with GCC, while Ubuntu IIUC has switched to Clang/LLVM like upstream has.
Seems unlikely given the stacks @jld posted. More likely: Fedora has newer versions of one of the involved libraries.
libcap is at 2.44 on Debian and Ubuntu, 2.48 on Fedora... let's look...
Comment 18•4 years ago
|
||
(In reply to Alexandre LISSY :gerard-majax from comment #15)
I don't think it should be blocking the EGL work?
Indeed - neither should it block bug 1578640 then.
Assignee | ||
Comment 19•4 years ago
|
||
(In reply to Alexandre LISSY :gerard-majax from comment #17)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #16)
P.S.: AFAIK Fedora builds firefox with GCC, while Ubuntu IIUC has switched to Clang/LLVM like upstream has.
Seems unlikely given the stacks @jld posted. More likely: Fedora has newer versions of one of the involved libraries.
libcap is at 2.44 on Debian and Ubuntu, 2.48 on Fedora... let's look...
(and Fedora 32 was on 2.26)
Assignee | ||
Comment 20•4 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #4)
Meaning that I don't know why this would have started happening only recently. Were we not using `nsSystemInfo` in content processes before this? Or was it previously always being used before sandbox startup, and now sometimes the first use is after sandbox startup?
So, there might be a bit of both:
- LD_PRELOAD with libcap.so.2.48, test passes
- changing symlink /usr/lib64/libcap.so.2 from libcap.so.2.48 to libcap.so.2.26 also passes tests, without any preload
Assignee | ||
Comment 21•4 years ago
|
||
(In reply to Alexandre LISSY :gerard-majax from comment #20)
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #4)
Meaning that I don't know why this would have started happening only recently. Were we not using `nsSystemInfo` in content processes before this? Or was it previously always being used before sandbox startup, and now sometimes the first use is after sandbox startup?
So, there might be a bit of both:
- LD_PRELOAD with libcap.so.2.48, test passes
I've also tried rebuilding with the Debian patches, just in case, but with no success.
- changing symlink /usr/lib64/libcap.so.2 from libcap.so.2.48 to libcap.so.2.26 also passes tests, without any preload
I think this is just consistent with https://git.kernel.org/pub/scm/libs/libcap/libcap.git/diff/libcap/cap_alloc.c?id=f1f62a748d7c67361e91e32d26abafbfb03eeee4 as you mentioned in comment 4.
I'm going to try and find a better way to trace the difference between Fedora and Ubuntu. I'm getting really convinced the difference is the one you suspect: on Ubuntu it is used before sandbox startup, while on Fedora it's after.
Assignee | ||
Comment 22•4 years ago
|
||
Tracing with `MOZ_SANDBOX_LOGGING=1`, I can confirm on Fedora 33:
- `libcap.so` is being loaded during the test, by the content process
- when using `LD_PRELOAD` there is no trace of loading `libcap.so` in the content process
Running with `MOZ_SANDBOX_LOGGING=1` on Ubuntu as well, I can confirm there is no trace of `libcap.so` being loaded by the content process.
Assignee | ||
Comment 23•4 years ago
|
||
Martin, I'm still trying to verify my idea that this is related to the current Fedora setup of `libcap` and might involve `PAM`. In the meantime, I'm mostly confident that this is unrelated to Wayland, since I happen to repro under a GNOME/Xorg session on Fedora 33 in my VM. I'm continuing the investigation to try and verify my current hypothesis on the source of the issue and why we see it only on Fedora and not on Debian, but in the meantime I think we can drop the `wayland-tests` blocker?
Reporter | ||
Comment 24•4 years ago
|
||
Yes, let's drop the wayland-tests.
Thanks.
Assignee | ||
Comment 25•4 years ago
|
||
I ended up hacking directly:
diff --git a/toolkit/xre/nsAppRunner.cpp b/toolkit/xre/nsAppRunner.cpp
index 0d887d933f3d1..3afcf295eaf09 100644
--- a/toolkit/xre/nsAppRunner.cpp
+++ b/toolkit/xre/nsAppRunner.cpp
@@ -5416,6 +5416,9 @@ int XREMain::XRE_main(int argc, char* argv[], const BootstrapConfig& aConfig) {
NS_SetCurrentThreadName("MainThread");
#endif
+ PR_SetEnv("LD_DEBUG=libs,files");
+ PR_SetEnv("LD_DEBUG_OUTPUT=libcap/ld.log");
+
AUTO_BASE_PROFILER_LABEL("XREMain::XRE_main (around Gecko Profiler)", OTHER);
AUTO_PROFILER_INIT;
AUTO_PROFILER_LABEL("XREMain::XRE_main", OTHER);
From there, I can confirm:
- on Fedora, loading `libpulse.so.0` from our `nsSystemInfo::Init()` pulls in `libcap.so.2`, which triggers the violation
- on Debian, no `libcap.so.2` gets loaded at all (no dep from `libpulse.so.0`)
- on Ubuntu, `libpulse.so.0` has a dependency on `libcap.so.2`, BUT `libsystemd.so.0` also has one, and it gets loaded by a dep chain that goes up to `libmozgtk.so`
Assignee | ||
Comment 26•4 years ago
|
||
Assignee | ||
Comment 27•4 years ago
|
||
Feel free to confirm on your side that the fix works for you; as far as I can tell, it unblocks what I was reproducing on Fedora 33 under Xorg and Wayland.
Comment 29•4 years ago
|
||
Comment 30•4 years ago
|
||
bugherder |