Closed Bug 1565972 Opened 5 years ago Closed 5 years ago

Firefox hangs with Xephyr, both running inside an LXC container when sysctl kernel.unprivileged_userns_clone=1

Categories

(Core :: Security: Process Sandboxing, defect, P1)

60 Branch
Unspecified
Linux
defect

Tracking


RESOLVED DUPLICATE of bug 1559368
Tracking Status
firefox-esr60 --- wontfix
firefox-esr68 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- fixed

People

(Reporter: francois.lesueur, Assigned: jld)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0

Steps to reproduce:

  1. Install a Debian stretch (tested in a Virtualbox VM and on my bare-metal host)

  2. Install LXC

  3. Activate network in LXC : USE_LXC_BRIDGE="true" in /etc/default/lxc-net

  4. Create a Debian stretch or buster container named "debian" : lxc-create -n debian -t download -- -d debian -r stretch -a amd64

  5. Add network and X11 to the container. At the end of /var/lib/lxc/debian/config, remove the existing lxc.network line and add :
    lxc.network.0.type = veth
    lxc.network.0.link = lxcbr0
    lxc.network.0.flags = up
    lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none ro,bind,create=dir 0 0

  6. Log into the container : lxc-start -n debian && lxc-attach -n debian

  7. Install Xephyr and Firefox : apt install xserver-xephyr firefox-esr

  8. run "sysctl kernel.unprivileged_userns_clone=1"

  9. Create an unprivileged user to start Firefox

  10. su - to this unprivileged user

  11. run "DISPLAY=:0 Xephyr :2 &"

  12. run "DISPLAY=:2 firefox"

Actual results:

Firefox starts but cannot render any tab. The console prints many errors like these:
###!!! [Parent][MessageChannel] Error: (msgtype=0x160061,name=PBrowser::Msg_UpdateDimensions) Channel error: cannot send/recv

###!!! [Parent][MessageChannel] Error: (msgtype=0x160080,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv

Unable to init server: Could not connect: Connection refused

Switching off multiprocess (browser.tabs.remote.autostart in about:config) or setting the kernel.unprivileged_userns_clone sysctl to 0 resolves the issue on stretch. With a Debian buster host, even setting kernel.unprivileged_userns_clone to 0 does not resolve it.

I have tested both firefox-esr from the Debian repository and the up-to-date tar.bz2 from the Mozilla website; the behavior is similar.

Expected results:

Firefox should render tabs

The STR and environment for this bug are too convoluted for me to try to reproduce, so I'll take the short path first: triaging it to Core :: Widget: Gtk and setting needinfo for Martin, who may have a better idea of which component to start with.

Component: Untriaged → Widget: Gtk
Flags: needinfo?(stransky)
Product: Firefox → Core

I'm afraid this is out of my scope. Looks like some IPC issues. Does it work if you disable e10s by setting browser.tabs.remote.autostart to false at about:config?

Flags: needinfo?(stransky) → needinfo?(francois.lesueur)

Yes, it works when I disable browser.tabs.remote.autostart (setting it to false)

Cheers
Francois

Flags: needinfo?(francois.lesueur)

(but in my case, this does not solve my issue since I auto-provision LXC containers, which then run with firefox default config, and I'm thus unable to automatically disable e10s)

Okay, moving to IPC then.

Component: Widget: Gtk → IPC

This looks like sandboxing (kernel.unprivileged_userns_clone), and I'm guessing this is a case of the X11 socket detection code not doing quite the right thing. It is possibly a duplicate of bug 1559368 (but with Xephyr instead of Xwayland), given the comment about running Firefox as a different unprivileged user from the X server.

Can you paste the output of ls -l /tmp/.X11-unix?

Component: IPC → Security: Process Sandboxing
Flags: needinfo?(francois.lesueur)
OS: Unspecified → Linux

Hi

Inside the container :

ls -l /tmp/.X11-unix

total 0
srwxrwxrwx 1 root root 0 juil. 10 22:19 X0

On the host :
$ ls -l /tmp/.X11-unix
total 0
srwxrwxrwx 1 root root 0 juil. 10 22:19 X0

Bug 1559368 seems definitely related. During my testing, I found the same behavior at some point (firefox crashing with the exact same screen as 1559368, and also crashing at the restart in the same way)

/tmp/.X11-unix is bind-mounted read-only inside the container, so the Xephyr running inside the container cannot create its socket there. I do not know where this Xephyr socket actually lives. I have this line in netstat -alnp, inside the container :

netstat -alnp | grep Xephyr

unix 2 [ ACC ] STREAM LISTENING 9455950 509/Xephyr @/tmp/.X11-unix/X15

But I can't "ls" this @/tmp/.X11-unix/X15 ...

Cheers
François

Flags: needinfo?(francois.lesueur)

I forgot to mention, to be more precise : Xephyr creates this :15 display and Firefox is running on this :15 display

Hi,

Judging from the comment in the X11 socket detection code, I think you're right!

In my case :

  • I have a /tmp/.X11-unix in the container (it is bind-mounted from the host). The assumption that this directory does not exist in a container (the comment gives a snap as an example) is thus false
  • The listening socket for :15 is an abstract socket address (the @ at the beginning) and, again according to the comment, this display should then be considered remote

Cheers,
Francois

Yes, that's an abstract address; that @ is how netstat prints a null byte (a.k.a. \0 or ^@).
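For anyone who wants to check this on their own setup, here is a minimal sketch of a diagnostic that distinguishes a filesystem-backed X11 socket from an abstract one. It relies on /proc/net/unix, where the kernel lists abstract names with a leading @. The function name x11_socket_kind and its optional second argument are hypothetical, made up for this illustration; this is not the check Firefox performs.

```shell
# Hypothetical helper: classify the X11 socket for a display number.
# $1 = display number (e.g. 15)
# $2 = socket table to read; defaults to /proc/net/unix and is overridable
#      only so the function can be run against a saved copy.
x11_socket_kind() {
    num=$1
    table=${2:-/proc/net/unix}
    path="/tmp/.X11-unix/X${num}"
    if [ -S "$path" ]; then
        # A socket file is visible in the filesystem.
        echo "filesystem"
    elif awk -v want="@${path}" '$NF == want { found = 1 } END { exit !found }' "$table" 2>/dev/null; then
        # Only an abstract-namespace entry exists; it cannot be listed with ls.
        echo "abstract"
    else
        echo "none"
    fi
}
```

Given the netstat output above, running `x11_socket_kind 15` inside the reporter's container would be expected to report `abstract`. Note that the function reports `filesystem` whenever the socket file exists, even if the server also listens on an abstract address.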

So this looks like we'll need to refine that check to look for the actual socket. I'll take this, because I have an idea for how to do this without too much complication.

As a temporary workaround, setting MOZ_ASSUME_USER_NS=0 in the environment should work, similarly to turning off kernel.unprivileged_userns_clone.
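For auto-provisioned containers, where editing about:config is not practical, the variable can be set by a small wrapper. This is only a sketch: the function name run_with_userns_assumed_off is hypothetical, and the only thing taken from this bug is the MOZ_ASSUME_USER_NS=0 setting itself.

```shell
# Hypothetical wrapper: run any command with MOZ_ASSUME_USER_NS=0 in its
# environment, leaving the rest of the environment untouched.
run_with_userns_assumed_off() {
    env MOZ_ASSUME_USER_NS=0 "$@"
}
```

With the display from the steps to reproduce, the call would look like `DISPLAY=:2 run_with_userns_assumed_off firefox`.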

Assignee: nobody → jld
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Hi,

Thanks! The MOZ_ASSUME_USER_NS=0 workaround makes my setup work as expected.

Let me know if you need more tests when you refine the check.

Cheers,
Francois

Priority: -- → P1
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE