Closed Bug 1222500 Opened 9 years ago Closed 9 years ago

Assertion in SandboxEarlyInit() could fail when nVidia proprietary driver is used

Categories

(Core :: Security: Process Sandboxing, defect)

40 Branch
All
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla46
Tracking Status
firefox42 --- wontfix
firefox43 --- wontfix
firefox44 --- wontfix
firefox45 --- fixed
firefox46 --- fixed
firefox-esr38 --- unaffected
b2g-master --- unaffected

People

(Reporter: ibragimovrinat, Assigned: jld)

References

Details

(Whiteboard: [gfx-noted])

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0 Iceweasel/42.0 Build ID: 20151104000845 Steps to reproduce: * Find a machine with nVidia video adapter and proprietary driver * Enable threaded optimizations by setting __GL_THREADED_OPTIMIZATIONS environment variable to 1; * Start Firefox, open a page with Flash element. Actual results: plugin-container process crashes at assetion MOZ_RELEASE_ASSERT(IsSingleThreaded()); in SandboxEarlyInit() Expected results: plugin-container shouldn't crash
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Enabled threaded optimizations cause libGL.so to start a worker thread, so when SandboxEarlyInit() checks for number of threads, there are two instead of expected one. (gdb) thread apply all bt Thread 2 (Thread 0x7f0ee10ff700 (LWP 11393)): #0 0x00007f0ef6235428 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0 #1 0x00007f0ee6529c24 in ?? () from /usr/lib/libGL.so.1 #2 0x00007f0ee2feec16 in ?? () from /usr/lib/libnvidia-glcore.so.352.41 #3 0x00007f0ee6529d0c in ?? () from /usr/lib/libGL.so.1 #4 0x00007f0ef622f4a4 in start_thread () from /usr/lib/libpthread.so.0 #5 0x00007f0ef134913d in clone () from /usr/lib/libc.so.6 Thread 1 (Thread 0x7f0ef657a9c0 (LWP 11392)): #0 0x0000556b74ea7f7f in mozilla::SandboxEarlyInit(GeckoProcessType, bool) () #1 0x0000556b74e67beb in ?? () #2 0x00007f0ef1280610 in __libc_start_main () from /usr/lib/libc.so.6 #3 0x0000556b74e67599 in _start ()
Component: Untriaged → Graphics
Product: Firefox → Core
Jed, this assertion comes from your patches in bug 1151607. Does the assertion mean to also prevent threads created by 3rd party code (in our case the gl driver), or just gecko threads?
Flags: needinfo?(jld)
Whiteboard: [gfx-noted]
In principle, the process really does need to be single-threaded at that point, in order to unshare the user namespace and to make changes that affect all threads. Currently this is important only for GeckoMediaPlugin child processes (key word being “currently”), but I wanted the assertion to be present for all processes in order to catch regressions sooner. In hindsight, this should have been a MOZ_DIAGNOSTIC_ASSERT, and SandboxEarlyInit should have been unified with SandboxFlags initialization to be able to fall back on disabling functionality that can't work, but I'd been thinking about it as potentially security-critical (see also bug 1088387 which eventually didn't happen, and bug 1151632 which would move the B2G uid/gid change there), and I wasn't really thinking about the failure mode of LD_PRELOAD libraries creating threads in static initializers. As a workaround, setting __GL_THREADED_OPTIMIZATIONS=1 without setting LD_PRELOAD doesn't reproduce this bug, and according to nVidia's docs that should be enough to enable threaded GL optimizations for Gecko — we link against libpthread so don't need to preload it, and we call XInitThreads before using X11 so don't need to preload libGL.
Assignee: nobody → jld
Status: UNCONFIRMED → ASSIGNED
Component: Graphics → Security: Process Sandboxing
Ever confirmed: true
Flags: needinfo?(jld)
Hardware: x86_64 → All
Assuming this is present since bug 1151607 landed in 40 although the oldest report so far is from 42. Also worth mentioning, carried over from bug 1224057: this won't show up in crash-stats, because it doesn't generate crash dumps, because the assertion happens before the crashing process's crash reporter is initialized. (The parent is perfectly capable of taking a crash dump of the child, but it doesn't have any idea the child crashed until it's too late.)
With bugs like this there's always the question of “what does Chromium do”, so I checked. If user namespaces are available, it crashes immediately at startup: $ LD_PRELOAD="libpthread.so.0 libGL.so.1" __GL_THREADED_OPTIMIZATIONS=1 chromium [1:1:1124/152118:FATAL:sandbox_linux.cc(178)] Check failed: sandbox::Credentials::MoveToNewUserNS(). Otherwise, the setuid sandbox is used and strips out the LD_PRELOAD for renderer processes, which work normally, but the GPU process is affected and disables its sandbox. See also: https://crbug.com/477925 (and https://crbug.com/226181 which seems to have been fixed at some point). As for Firefox/Gecko: comment #4 isn't quite right, because the parent process needs to know about this, which means hooking into its startup; it might make the most sense just to move SandboxEarlyInit to initialization time. Also, the thing I said in bug 1224057 about interposing a wrapper for pthread_create is definitely not right, because it's called “preload” for a reason: libpthread and libGL are loaded first, and run their initializers before the executable (or its dependencies) can do anything, and they precede in the symbol search order so they can't be interposed like that in any case. And none of that matters anyway, because nVidia libGL doesn't call pthread_create via symbol resolution; it uses dlopen/dlvsym to specifically search "libpthread.so.0".
(In reply to Jed Davis [:jld] from comment #6) > it might make the most sense just to move SandboxEarlyInit to initialization time. Except that that means needing to know the process type before XRE_SetProcessType has been called (or, put another way, trying to read from argv before main() has been called). So that's also not a good idea. > Also, the thing I said in bug 1224057 about interposing a wrapper for > pthread_create is definitely not right, because it's called “preload” for a > reason: libpthread and libGL are loaded first, and run their initializers > before the executable (or its dependencies) can do anything, and they > precede in the symbol search order so they can't be interposed like that in > any case. This sentence is pretty much entirely wrong, it turns out. My test program didn't work only because I hadn't linked it with --export-dynamic. The executable *is* first in the search path, followed by preloaded objects, followed by dependencies. But initializers are run in reverse search order; i.e., in reverse dependency order starting from leaf libraries, ending with the preloaded objects and, last of all, the executable. See also: https://sourceware.org/bugzilla/show_bug.cgi?id=14379 The relevant part of all this is that it's probably best to avoid assuming anything in particular about initializer order (except *maybe* the executable running last). > And none of that matters anyway, because nVidia libGL doesn't > call pthread_create via symbol resolution; it uses dlopen/dlvsym to > specifically search "libpthread.so.0". And this part, at least, is true: symbol interposition would not be the answer to this problem. Current idea: hook in at XRE_main and, if multithreaded there, disable anything that cares about singlethreadedness.
This patch: on desktop, if we're multithreaded in XRE_main, set a special SandboxInfo flag and disable features that need threading; check for that in SandboxEarlyInit. Also checks for it in the gtests, so that (hopefully?) we don't start getting false positives everywhere because of a Gecko change. Filed bug 1229136 about the build config issues I ran into. I'd like to get this uplifted, so I'm trying to keep this patch simple, and I just copypasted IsSingleThreaded(). Tested this with the nVidia drivers in question, and a simple LD_PRELOAD library that just creates a thread.
Attachment #8693922 - Flags: review?(gdestuynder)
Comment on attachment 8693922 [details] [diff] [review] Patch: check for excess threads in XRE_main. Review of attachment 8693922 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/sandbox/linux/Sandbox.cpp @@ +527,2 @@ > const SandboxInfo info = SandboxInfo::Get(); > + if (info.Test(SandboxInfo::kUnexpectedThreads)) { is this potentially dangerous/disabling sandbox in cases where the env variable is set unexpectedly? (pun not intended) ::: security/sandbox/linux/common/SandboxInfo.cpp @@ +245,5 @@ > + " nVidia GL: that's not necessary for Gecko.)" : ""); > + > + // Propagate this information for use by child processes. (setenv > + // isn't thread-safe, but other threads are from non-Gecko code so > + // they wouldn't be using NSPR; we have to hope for the best.) theres a few opportunities for race conditions here and in the fs-based thread check (if we are spawning a thread at the same time for ex) - however worst case scenario seems to be we crash because we didn't detect the unexpected threads, and the scenarios would be involving a lot of bad luck. so i guess its ok ;-)
Attachment #8693922 - Flags: review?(gdestuynder) → review+
(In reply to Guillaume Destuynder [:kang] from comment #9) > ::: security/sandbox/linux/Sandbox.cpp > @@ +527,2 @@ > > const SandboxInfo info = SandboxInfo::Get(); > > + if (info.Test(SandboxInfo::kUnexpectedThreads)) { > > is this potentially dangerous/disabling sandbox in cases where the env > variable is set unexpectedly? (pun not intended) I'm not sure what the failure mode here is? We more or less trust the environment; there's already MOZ_DISABLE_CONTENT_SANDBOX / MOZ_DISABLE_GMP_SANDBOX and a few others. SandboxEarlyInit is called from main(), so it's after the SandboxInfo initializer has run and checked it and set the flags; I could assert or comment that dependency if that's the concern. > ::: security/sandbox/linux/common/SandboxInfo.cpp > @@ +245,5 @@ > > + " nVidia GL: that's not necessary for Gecko.)" : ""); > > + > > + // Propagate this information for use by child processes. (setenv > > + // isn't thread-safe, but other threads are from non-Gecko code so > > + // they wouldn't be using NSPR; we have to hope for the best.) > > theres a few opportunities for race conditions here and in the fs-based > thread check (if we are spawning a thread at the same time for ex) - however > worst case scenario seems to be we crash because we didn't detect the > unexpected threads, and the scenarios would be involving a lot of bad luck. > so i guess its ok ;-) The stat() should be safe — if it's possible for anything to happen in the process at the same time, then the process is already multithreaded, so the thread count will be ≥2 at any point there. (Or, put another way: if there's a point when the stat() could read the thread count as 1, then it will remain 1 until something on the thread calling stat() creates a thread. I'm assuming the kernel won't underestimate the thread count, but I know that threads that are already in the exit syscall can still appear in procfs, so that's probably safe.) I'm not seeing what else there is thread-unsafe besides the getenv()s — but, yes, the worst case should just be failing the assertion that's already failing.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
(Note to self: request uplift once this has had a little time on m-c.)
Flags: needinfo?(jld)
Comment on attachment 8693922 [details] [diff] [review] Patch: check for excess threads in XRE_main. Approval Request Comment [Feature/regressing bug #]: bug 1151607 [User impact if declined]: Startup crash on Linux systems using nVidia graphics drivers with multithreaded optimization enabled according to nVidia's instructions. [Describe test coverage new/current, TreeHerder]: No regression tests for this; media mochitests and B2G tests use sandboxing and give some coverage; has been stable on m-c for most of a month now. [Risks and why]: Relatively low, even though this patch has more “moving parts” than I'd like. The effect on non-sandboxing uses is to weaken an assertion, so shouldn't make things worse. The effect on sandboxing is more worrying because it affects feature detection, but the feature it can disable is currently used only on a best-effort basis and only for desktop media plugins (OpenH264, EME). [String/UUID change made/needed]: None.
Flags: needinfo?(jld)
Attachment #8693922 - Flags: approval-mozilla-aurora?
Rebased for beta — there are some context changes. Carrying over r+. See previous comment for approval request info.
Attachment #8704845 - Flags: review+
Attachment #8704845 - Flags: approval-mozilla-beta?
Rinat, could you please verify this issue is fixed as expected on a latest Nightly build? Thanks!
Flags: needinfo?(ibragimovrinat)
Comment on attachment 8693922 [details] [diff] [review] Patch: check for excess threads in XRE_main. Taking it to improve the testing coverage and having it asap in aurora.
Attachment #8693922 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Milan, I am having trouble deciding whether this is a "low risk+must fix" for 44 or not. Would you be able to comment on how severe this startup crash can be and whether we should be taking it to Beta44? As you may know from now until 44 goes live, the uplift criteria includes critical regressions, sec and stability fixes. I know this meets the stability fix criteria but how risky is it to take this patch?
Flags: needinfo?(milan)
Not really graphics, way outside my domain, so I can only judge it based on the comments in here. Combine that with not getting crash reports for this one way or another, and I presume we don't have telemetry telling us that people have chosen this option, I'd say this is OK, but it will be difficult for us to prove it. For what it's worth, I'd take it :)
Flags: needinfo?(milan)
Flags: needinfo?(milan)
Comment on attachment 8704845 [details] [diff] [review] bug1222500-for-beta-hg0.diff I discussed this with Aaron as well and given that this crash only manifests itself when the environment variable is set (which I assume is not our default mode), the likelihood of running into this crash should be very low. Also, this is not linked to a top-crasher and given the medium risk associated, I have decided not to take this to Beta44.
Attachment #8704845 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
(In reply to Ritu Kothari (:ritu) from comment #17) > Rinat, could you please verify this issue is fixed as expected on a latest > Nightly build? Thanks! Hi. Sorry for the delay. I've tried Nightly just now (Linux x86_64) and can confirm that issue is fixed. Also tried Firefox 43 to ensure I have a right setup. Assertion triggers there as expected, but Nightly works fine. So, definitely fixed.
Flags: needinfo?(ibragimovrinat)
Status: RESOLVED → VERIFIED
Flags: needinfo?(milan)
See Also: → 1401062
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: