Open Bug 1129492 Opened 6 years ago Updated 10 days ago

Firefox content process has a live connection to the X11 server.

Categories

(Core :: Security: Process Sandboxing, defect, P3)

x86_64
Linux
defect

Tracking

()

Fission Milestone M6c
Tracking Status
firefox-esr68 --- wontfix
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix

People

(Reporter: jld, Assigned: gcp)

References

(Depends on 4 open bugs, Blocks 2 open bugs)

Details

(Whiteboard: sblc4)

Attachments

(1 file)

See also bug 582057 comment #13 for historical context.  We basically can't have a meaningful sandbox for content processes on X11 platforms as long as that file descriptor is there.

Chromium doesn't do this — indeed, it doesn't connect to the X11 server from the renderer process in the first place — so it is possible.  The question is how much work would be needed.
I've used conditional breakpoints to find where we're using the X11 socket, and I'm attaching a selection of stack traces.  It seems to be heavily used in layers/graphics code, and also polled for events in nsAppShell::ProcessNextNativeEvent.
Hi Milan, any idea who knows enough about our linux gfx plumbing to say what is needed here? If this is going to be a big chunk of work it would be good to know about it ahead of time!
Flags: needinfo?(milan)
Big chunk of work, but unclear it's the work that we would want to do anyway.  CC-ing Jeff, he wanted to have the conversation about "why is the sandbox meaningless if we have this connection", though we know it won't be as good a sandboxing with it.
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #3)
> CC-ing Jeff, he wanted to have the conversation about "why is the
> sandbox meaningless if we have this connection", though we know it won't be
> as good a sandboxing with it.

A connection to the X server can be used to read from or write to any window/drawable, inject input events, and poll the input devices.  I can see the argument that this would still raise the bar for meaningful exploitation as compared to control of an unrestricted process, but at the same it seems wrong to describe something as “sandboxed” if it can be turned into a clandestine remote access tool and keylogger.
(In reply to Jed Davis [:jld] from comment #4)
> (In reply to Milan Sreckovic [:milan] from comment #3)
> > CC-ing Jeff, he wanted to have the conversation about "why is the
> > sandbox meaningless if we have this connection", though we know it won't be
> > as good a sandboxing with it.
> 
> A connection to the X server can be used to read from or write to any
> window/drawable, inject input events, and poll the input devices.  I can see
> the argument that this would still raise the bar for meaningful exploitation
> as compared to control of an unrestricted process, but at the same it seems
> wrong to describe something as “sandboxed” if it can be turned into a
> clandestine remote access tool and keylogger.

Switching from using XRender for rendering is a requirement for this bug. It is currently planned but not really scheduled.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #5)
> (In reply to Jed Davis [:jld] from comment #4)
> > (In reply to Milan Sreckovic [:milan] from comment #3)
> > > CC-ing Jeff, he wanted to have the conversation about "why is the
> > > sandbox meaningless if we have this connection", though we know it won't be
> > > as good a sandboxing with it.
> > 
> > A connection to the X server can be used to read from or write to any
> > window/drawable, inject input events, and poll the input devices.  I can see
> > the argument that this would still raise the bar for meaningful exploitation
> > as compared to control of an unrestricted process, but at the same it seems
> > wrong to describe something as “sandboxed” if it can be turned into a
> > clandestine remote access tool and keylogger.
> 
> Switching from using XRender for rendering is a requirement for this bug. It
> is currently planned but not really scheduled.

Further, we have the problem of GLX which we don't really have any plan to stop using, nor is there much of a replacement for it beyond using Wayland.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)
> Further, we have the problem of GLX which we don't really have any plan to
> stop using, nor is there much of a replacement for it beyond using Wayland.

For reference: Chromium on Linux also uses GLX, but the GL drivers and the X11 connection are in the GPU process, so a renderer process can't directly access them.
Noting for future reference: The MIT-SHM extension might also be a problem, because SysV shared memory follows Unix's “same uid policy” and can't be restricted/brokered like file access.  (It was observed when the initial attempt at a desktop content system call whitelist was made, but that was long enough ago that there could have been significant changes to how graphics work that might make this not a problem, so this should be double-checked.)  There's a not-well-specified revision to use memory-mapped files (http://patchwork.freedesktop.org/patch/15082/) but I don't know what would need to happen to make it work — Ubuntu 14.04 has a new enough X server and should (I think?) have new enough libraries, but X clients still empirically use SysV (including the Firefox parent process).
This should probably depend on bug 1180942
Whiteboard: sb+
Related work on exploiting a sandbox that allows X11 access: https://mjg59.dreamwidth.org/42320.html
Moving to sblc4 which deals with this issue.
Whiteboard: sb+ → sblc4
Component: Widget: Gtk → Security: Process Sandboxing
It would be worth looking into the Xorg sandboxing support (for servers compiled with --enable-xcsecurity) described here https://notehub.org/rp5n2.
(In reply to Haik Aftandilian [:haik] from comment #12)
> It would be worth looking into the Xorg sandboxing support (for servers
> compiled with --enable-xcsecurity) described here https://notehub.org/rp5n2.

I tested this, and as Jed already suspected, it kills performance because GLX and OpenGL no longer work:
GLXtest process failed (exited with status 1): GLX extension missing.

This has a fairly extreme effect on performance for me.

There are also some undesirable side effects: I don't seem to be able to copy/paste from the untrusted process.
See Also: → 1326361
Assignee: nobody → gpascutto
I tried using gdb to see what we're sending to the X server from the content process after sandbox startup.  If WebGL isn't used, the only thing I saw was, whenever a native widget is rendered in a new state, a GetInputFocus request sent by XSync() from this gdk_flush() call: https://searchfox.org/mozilla-central/rev/152c0296f8a10f81185ee88dfb4114ec3882b4c6/widget/gtk/nsNativeThemeGTK.cpp#1217

However, there doesn't appear to be anything that needs to be flushed/synchronized; that GetInputFocus message is the only thing in the buffer being sent to the server.  We seem to be doing the actual rendering to memory and handling it through our own IPC and compositor, which is good.  But it's not clear whether this is something we can depend on, and (if so) in what case that would hold; there's a lot of code that's being skipped due to conditionals like https://searchfox.org/mozilla-central/rev/152c0296f8a10f81185ee88dfb4114ec3882b4c6/gfx/2d/DrawTargetCairo.cpp#2329

It would be interesting, if WebGL is disabled or preffed off, to be able to just close the X11 connection or not even open it in the first place.  Failing that, if GLX is the only thing we need to deal with, then that could simplify the problem of proxying X11.

Another useful piece of information I've been told is that we don't ever use indirect GLX, because WebGL needs a newer version of OpenGL than that supports, so what we'd actually be sending over X11 is just for setting up out-of-band direct rendering, which could also make proxying a simpler problem.
There's also gfxPlatformGtk::GetPlatformCMSOutputProfile, which looks to be relatively easy to handle if needed (probably by sending down from the parent while setting up the content process): https://searchfox.org/mozilla-central/rev/5bb6dfcb3355a912c20383578bc435c1d4ecf575/gfx/thebes/gfxPlatformGtk.cpp#421

And this, which seems to be something about fonts and subpixel rendering: https://searchfox.org/mozilla-central/rev/5bb6dfcb3355a912c20383578bc435c1d4ecf575/gfx/thebes/gfxFcPlatformFontList.cpp#799
See Also: → 1376559
Priority: -- → P3
See Also: → 1503054

It seems like this is coming up again due to Fission. With the high number of content processes, we run the risk of encountering the 256 Xorg connection limit, which can cause crashes. :jesup has apparently run into this multiple times while dogfooding fission with high tab counts.

What remaining features are currently requiring that we create X connections in every content process? If it's only needed for WebGL, for example, it might be nice to delay creating the Xorg connection until a webpage requests it.

Flags: needinfo?(jmuizelaar)

Bug 788319 may help get rid of GLX, if needed.

WebGL can also potentially be remoted if need be. Does anyone know what's currently making X connections?

Flags: needinfo?(jmuizelaar) → needinfo?(jld)

WebGL is most of it and there's work in progress to remote that (see bug 1621762).

We're also initializing GTK in the content process, but we don't use it to draw native widgets directly; as I understand it they're drawn to shared memory and composited. There are still some unnecessary calls to XSync when that happens, but otherwise normal non-WebGL operation doesn't seem to actually talk to the X server. However, starting content processes in headless mode means no widgets are drawn (most noticeably scroll bars), and the last time I tried that, enabling :spohl's non-native widgets with widget.disable-native-theme-for-content didn't help. And I don't currently know enough about graphics to say what might be going on there or what it might take to fix it.

Flags: needinfo?(jld)
Fission Milestone: --- → ?

Perhaps we can just switch to calling gtk-parse-args instead of gtk_init() to avoid the display connection?

I tested MOZ_HEADLESS again, using the same patch as my previous attempt (just the existing headless mode, no gtk_parse_args), and it seems to work now — scrollbars are drawn (same as non-native + non-headless), and form controls seem to work. Headless content + native GTK widgets continues to not work, but that's expected. And WebGL is broken, but that's also expected.

I confirmed the absence of X connections with a patch to log child processes' sockets before sandbox startup. I also observed that we normally make two connections to the X server per content process (a finding also reported in bug 1635451): one from gtk_init called by ContentChild::Init, and the other from XOpenDisplay called by gfxPlatform::Init; both are held open for the life of the process.

I haven't tested extensively, but if there aren't other regressions, this seems useful for use cases that don't need (or actively don't want) WebGL, like Tor Browser.

So I did some bisecting, and headless non-native widgets appear to have been fixed in bug 1619664 (specifically the “Decide which theme to use per document” patch), then regressed in bug 1615026, then fixed again in bug 1620246.

The second broken range does look like a coordinate system problem: scroll bars are blank but occupy the expected size, and form controls are displaced and partly cut off by their bounding boxes.

With the original breakage, the absent scroll bars are the narrower GTK width… because it was ignoring the pref and using GTK in the headless case, until bug 1619664 reordered the conditionals. So non-native widgets have probably always worked in headless-content mode.

See Also: → 1636493

Tracking for Fission Nightly milestone M6c because this bug causes crashes with many tabs. This problem affects all Linux users, but Fission makes it worse because we'll have more processes.

The fix is probably a lot of work so we should start soon for Fission.

Severity: normal → S2
Fission Milestone: ? → M6c
See Also: → 1636555

(In reply to Chris Peterson [:cpeterson] from comment #23)

Tracking for Fission Nightly milestone M6c because this bug causes crashes with many tabs. This problem affects all Linux users, but Fission makes it worse because we'll have more processes.

The fix is probably a lot of work so we should start soon for Fission.

There are two problems here:

  • Remove X11 connections from content processes in all cases, for security, which is this bug
  • Avoid X11 connections in the common case, to avoid exhausting resource limits, which is bug 1635451

This bug has two main blockers at this point:

  • WebGL, which needs to be either remoted (bug 1607940) or changed use DRI directly of using X (bug 788319, if I understand it correctly)
  • Flash, which will be removed eventually but probably not before Fission is intended to ship

The first one of those has two potential solutions actively being worked on; I don't know how close either one is to being shippable. The second is more difficult; the NPAPI code has a number of dependencies on the content process being an X client (search nsPluginInstanceOwner.cpp for uses of Display* and Window for some examples; see also bug 1548475) and I have no idea how feasible it would be to change that.

If Fission Nightly is intended to ship before we remove support for Flash, then it will likely be necessary to figure out a smaller-scoped solution for bug 1635451.

Depends on: remove-flash

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #24)

There are two problems here:

  • Remove X11 connections from content processes in all cases, for security, which is this bug
  • Avoid X11 connections in the common case, to avoid exhausting resource limits, which is bug 1635451

Sounds like this bug might not need to block Fission Nightly, but bug 1635451 (avoiding X11 in the common case) would be good to fix before Nightly. I'll move this bug to Fission milestone M7 (Beta) so we continue to track this work without blocking Fission Nightly.

  • Flash, which will be removed eventually but probably not before Fission is intended to ship
    ...
    If Fission Nightly is intended to ship before we remove support for Flash, then it will likely be necessary to figure out a smaller-scoped solution for bug 1635451.

We currently plan to remove Flash support in Firefox 84 (in Nightly in October, Release in December). If Fission is indeed ready to roll out to (some percentage of) Nightly users before October, we can disable Flash for those Fission users even before Flash EOL in Fx84. Flash is going away soon and we don't want any extra Flash work to delay Fission development or rollout.

Fission Milestone: M6c → M7

As :cpeterson mentioned, we explicitly have no intention of supporting Flash with Fission enabled, so it's OK to land code which breaks NPAPI when Fission is enabled.

I've been told there are plans to implement WebGL remoting in Firefox 80 (July) or 81 (August). Tracking for Fission Nightly M6c.

Fission Milestone: M7 → M6c
Depends on: 1638466
Depends on: 1640345
Depends on: 1642463

To help alleviate this bug, we only need to disable Flash for Fission Linux users (bug 1642463), not all users on all platforms (bug 1455897).

No longer depends on: remove-flash
Depends on: linux-nnt
Depends on: remove-flash
No longer depends on: 1642463
Blocks: fission
You need to log in before you can comment on or make changes to this bug.