Closed Bug 1376559 Opened 7 years ago Closed 5 years ago

Fx 52 content process crashes on startup on RHEL 7 when run with --no-remote over ssh

Categories

(Core :: DOM: Content Processes, defect, P3)

52 Branch
All
Linux
defect

Tracking

RESOLVED FIXED

People

(Reporter: u581815, Unassigned)

Details

(Keywords: crash)

Attachments

(1 file)

Passing along this report from Robert Knight <knight@princeton.edu> 

> If one connects to a RHEL 7 system (with ssh -X or -Y) and invokes firefox with the --no-remote option, you get a "Gah, Your Panel crashed" or some such message.

> The fix is to change (in about:config)

> browser.tabs.remote.autostart.2

> from true (the default) to false.

In a followup email, he said he's running firefox-52.2.0-1.el7_3.x86_64 and will be available next week for additional questions.
Can they open about:crashes in a new tab and link to a crash report?

It would also be worth checking whether the same thing happens on a mozilla.org ESR build (I think you'd have to manually enable e10s), and/or whether they can reproduce it on a newer build.

Do we know why RHEL are enabling e10s on ESR? My understanding is that it's disabled on our ESR builds... Ryan, can you confirm?
Component: General → Tabbed Browser
Flags: needinfo?(ryanvm)
Flags: needinfo?(gguthe)
Summary: Fx 52 crashes on startup on RHEL 7 when run with --no-remote over ssh → Fx 52 content process crashes on startup on RHEL 7 when run with --no-remote over ssh
We have e10s enabled by default on ESR52 (subject to the eligibility rules set in the e10s rollout system addon). I would assume this is a sandboxing issue offhand.
Flags: needinfo?(ryanvm)
I'm unable to open about:crashes. It will let me create a new tab without crashing, but entering about:crashes into the URL bar and hitting return just reproduces the error in the first tab.
Jim, who can help figure out what is going on here?
Flags: needinfo?(jmathies)
Flags: needinfo?(gguthe)
Jed/gcp, do you have clues, given comment #2 suggesting this is to do with sandboxing?
Flags: needinfo?(jld)
Flags: needinfo?(gpascutto)
If you set |security.sandbox.content.level| to |0| in |about:config|, does that stop this issue? That'll help definitively answer if this is sandbox related.
Flags: needinfo?(knight)
The 52 ESR branch of our sources only enables content sandboxing for Nightly builds:
https://dxr.mozilla.org/mozilla-esr52/source/old-configure.in#3961

We didn't generally enable sandboxing on Linux until Firefox 54 IIRC, and that has several important fixes.
Flags: needinfo?(jmathies)
Flags: needinfo?(gpascutto)
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #6)
> If you set |security.sandbox.content.level| to |0| in |about:config|, does
> that stop this issue? That'll help definitively answer if this is sandbox
> related.

Entering "security.sandbox" in the about:config search box produces no results at all.  Just "security." produces quite a few.
Flags: needinfo?(knight)
(In reply to Robert Knight from comment #3)
> I'm unable to open an about:crashes.  It will let me create a new tab
> without crashing, but entering about:crashes into the URL bar and hitting
> return just reproduces the error in the first tab.

Presumably you can open about:crashes after turning off e10s, right?

(In reply to Greg Guthe [:g-k] from comment #0)
> > If one connects to a RHEL 7 system (with ssh -X or -Y) and invokes firefox with the --no-remote option, you get a "Gah, Your Panel crashed" or some such message.
> 
> > The fix is to change (in about:config)
> 
> > browser.tabs.remote.autostart.2
> 
> > from true (the default) to false.
(In reply to Robert Knight from comment #8)
> Created attachment 8883549 [details]
> stderr from failing "firefox --no-remote" command line invocation.

Thanks. Unfortunately this only has errors from the parent process, not the child that crashed, so it doesn't reveal much.

>Entering "security.sandbox" in the about:config search box produces no results at all.  Just "security." produces quite a few.

I think this is consistent with comment 7, i.e. ESR 52 has no sandboxing.
Flags: needinfo?(jld)
(In reply to :Gijs from comment #10)
> Presumably you can open about:crashes after turning off e10s, right?

Probably, if I knew how to turn off e10s.  This crash is my first contact with that term.  

Which of the e10s keys should be turned off? And if that would still produce information, can I just use it in one of the 10 or so browsers that have crashed and in which I have not yet set browser.tabs.remote.autostart.2 to false?
Sadly, in the RHEL 7 builds that I'm using, Firefox reports "The address isn't valid" when I try about:crashes.
(In reply to Robert Knight from comment #12)
> Probably, if I knew how to turn off e10s.  This crash is my first contact
> with that term.  

Sorry, it's a shorthand for our parent/content process separation, and controlled by the preference mentioned (browser.tabs.remote.autostart.2). Greg relayed in comment #0 that this makes the crashes go away - is that right? (It would probably take a restart of the browser after changing that preference)
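For the "10 or so" profiles mentioned in comment #12, flipping the pref by hand in about:config each time gets tedious. A minimal sketch, assuming the standard Firefox mechanism where a `user.js` file in the profile directory is read at every startup and its `user_pref()` calls override the values stored in `prefs.js`. The helper names `user_pref_line` and `append_prefs` are hypothetical, and you have to supply the profile path yourself (on Linux it usually lives under `~/.mozilla/firefox/`). Close Firefox before editing a profile.

```python
"""Hypothetical helper: pin Firefox prefs through a profile's user.js.

Assumes Firefox applies each `user_pref(name, value);` line found in
<profile>/user.js on startup, overriding the value saved in prefs.js.
"""
import json
from pathlib import Path
from typing import Union

PrefValue = Union[bool, int, str]


def user_pref_line(name: str, value: PrefValue) -> str:
    """Format one user_pref() call.

    json.dumps renders Python True/False as true/false, ints as-is and
    strings double-quoted, matching the JavaScript literals user.js uses.
    """
    if not isinstance(value, (bool, int, str)):
        raise TypeError(f"unsupported pref type: {type(value).__name__}")
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


def append_prefs(profile_dir: Path, prefs: dict) -> Path:
    """Append prefs to <profile_dir>/user.js, creating the file if needed."""
    target = Path(profile_dir) / "user.js"
    with target.open("a", encoding="utf-8") as fh:
        for name, value in prefs.items():
            fh.write(user_pref_line(name, value) + "\n")
    return target


if __name__ == "__main__":
    # The e10s workaround from comment 0, as a user.js line.
    print(user_pref_line("browser.tabs.remote.autostart.2", False))
```

Because the file is appended to, running `append_prefs` twice on the same profile writes the line twice; that is harmless here because both lines set the same value.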

If so, then about:crashes should show content process crashes from before you changed the pref in the same crashy browser. So if you crash, change the preference, restart, open about:crashes, it should have a link with a recent date/time that will go to a page on https://crash-stats.mozilla.com/ . A link to the relevant report will hopefully help suss out what is going on.

(In reply to Robert Knight from comment #13)
> Sadly, in the RHEL 7 builds that I'm using, Firefox reports "The address
> isn't valid" when I try about:crashes.

It sounds like RHEL disable the crash reporter in their builds... if so, then you might need to use a mozilla.org build to crash (assuming those also crash?), then disable e10s and check about:crashes to get a useful link to a crash report. :-(

I'm unsure how one would go about getting a crash dump with symbols for the content process with the RHEL build. Maybe someone else can chime in on that.
(In reply to :Gijs from comment #14)
> Sorry, it's a shorthand for our parent/content process separation, and
> controlled by the preference mentioned (browser.tabs.remote.autostart.2).
> Greg relayed in comment #0 that this makes the crashes go away - is that
> right? (It would probably take a restart of the browser after changing that
> preference)
Yes, that's right.

> 
> If so, then about:crashes should show content process crashes from before
> you changed the pref in the same crashy browser. So if you crash, change the
> preference, restart, open about:crashes, it should have a link with a recent
> date/time that will go to a page on https://crash-stats.mozilla.com/ . A
> link to the relevant report will hopefully help suss out what is going on.
> 
> (In reply to Robert Knight from comment #13)
> > Sadly, in the RHEL 7 builds that I'm using, Firefox reports "The address
> > isn't valid" when I try about:crashes.
> 
> It sounds like RHEL disable the crashreporter on their builds... if not then
> you might need to use a mozilla.org build to crash (assuming those do also
> crash?), then disable and check about:crashes, to get a useful link to a
> crash report. :-(
> 
> I'm unsure how one would go about getting a crash dump with symbols for the
> content process with the RHEL build. Maybe someone else can chime in on that.

Shall I wait for that response or go ahead with installing the mozilla.org build? I have perhaps another hour today to work on this.
Flags: needinfo?(gijskruitbosch+bugs)
Jan, Martin, do you think you could help Mozilla to find out what's going on? IIUC it seems specific to RHEL 7.
I can report that 54.0.1 (the version mozilla.org offers by default when I just download) does not crash. I'm off to try to find the 52.2.0 version.
(In reply to Robert Knight from comment #17)
> I can report that 54.0.1 (the mozilla version default when I just download)
> does not crash.  I'm off to try to find the 52.2.0 version.

https://download.mozilla.org/?product=firefox-52.2.1esr-SSL&os=linux64&lang=en-US should work (from https://www.mozilla.org/en-US/firefox/organizations/all/ )

Checking if it reproduces with the official build will be helpful either way, I expect. :-)
Flags: needinfo?(gijskruitbosch+bugs)
From my local Fedora 25 machine, I used "ssh -Y" to connect to a RHEL 7 VM with the firefox 52.2 package installed (the same version as reported in comment 0), and then ran "firefox -no-remote".

I was able to reproduce the "tab crashed" error.

In parallel, on the screen of that VM, I saw an SELinux alert:
"SELinux is preventing /usr/lib64/firefox/plugin-container from name_connect access on the tcp_socket port 6010."

For testing purposes, I did something REALLY insecure, and executed "setenforce 0" as root on that VM, and afterwards, firefox worked.

So apparently Firefox needs more access than the current SELinux policy allows. If that access is considered reasonable, additional SELinux allow rules might need to be shipped on RHEL 7.

I assume that Martin and Jan know whom to talk to about Firefox permissions, so I'll set needinfo on them. This might be a Linux distribution issue rather than a Firefox code issue, but let's wait and see what they say.
Flags: needinfo?(stransky)
gcp in the IRC channel just posted a link to
https://bugzilla.redhat.com/show_bug.cgi?id=1188290
I confirm that making that SELinux change does prevent the crash.
Note that that command was really insecure, and I don't recommend using it on a real system.
You should probably go back to "setenforce 1".

Another bug that gcp discovered seems to have a more narrowly scoped solution; could you check whether that works for you?
https://bugzilla.redhat.com/show_bug.cgi?id=1188290#c1

setsebool -P mozilla_plugin_can_network_connect 1
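To confirm that the boolean actually took effect (the `-P` flag also makes it persist across reboots), it can be read back with `getsebool`. A minimal sketch, assuming `getsebool NAME` prints lines of the form `NAME --> on` or `NAME --> off`; the function names `parse_getsebool` and `sebool_enabled` are hypothetical.

```python
"""Hypothetical helper: read SELinux boolean state via getsebool."""
import subprocess
from typing import Dict


def parse_getsebool(output: str) -> Dict[str, bool]:
    """Map each "name --> on|off" line to True (on) or False (off)."""
    states: Dict[str, bool] = {}
    for line in output.splitlines():
        name, sep, state = line.partition("-->")
        if sep:  # skip blank or unexpected lines
            states[name.strip()] = state.strip() == "on"
    return states


def sebool_enabled(name: str) -> bool:
    """Run `getsebool NAME` and report whether the boolean is on.

    check=True raises CalledProcessError if getsebool exits non-zero.
    """
    result = subprocess.run(
        ["getsebool", name], capture_output=True, text=True, check=True
    )
    return parse_getsebool(result.stdout).get(name, False)
```

On the affected RHEL 7 machine, `sebool_enabled("mozilla_plugin_can_network_connect")` should return `True` after the `setsebool` command above has been run as root.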
This is the only one that I tried.  (The second one.)
(In reply to Kai Engert (:kaie:) from comment #19)
> In parallel, on the screen of that VM, I saw an selinux alert:
> "SELinux is preventing /usr/lib64/firefox/plugin-container from name_connect
> access on the tcp_socket port 6010."

Content processes using X11 directly is a known problem, and one that we'll have to deal with eventually for sandboxing; see bug 1129492.
Presumably, RHEL policies already have an exception or permission that allows plugin-container to connect to local unix domain sockets (which is what is normally used for that X11 connection), but when using X11 forwarding the connection goes over TCP, which is blocked.
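The two connection paths can be told apart from `DISPLAY` alone. A minimal sketch, assuming the usual Xlib/xcb convention: an empty host (or the literal host `unix`) selects the local Unix domain socket `/tmp/.X11-unix/X<N>`, while any other host, including the `localhost:10.0` that OpenSSH sets for a forwarded display (its default `X11DisplayOffset` is 10), selects TCP port 6000 + N. That is where the port 6010 in the SELinux alert from comment #19 comes from. The `x11_endpoint` helper is hypothetical, and Linux abstract sockets and IPv6 literal hosts are ignored.

```python
"""Hypothetical helper: which transport an X11 client uses for a DISPLAY."""
from typing import NamedTuple, Optional

X11_TCP_BASE_PORT = 6000  # display N listens on TCP port 6000 + N


class X11Endpoint(NamedTuple):
    transport: str        # "unix" or "tcp"
    address: str          # socket path (unix) or host name (tcp)
    port: Optional[int]   # TCP port; None for Unix domain sockets


def x11_endpoint(display: str) -> X11Endpoint:
    """Parse a DISPLAY of the form [host]:number[.screen]."""
    host, sep, rest = display.rpartition(":")
    if not sep or not rest:
        raise ValueError(f"not a DISPLAY value: {display!r}")
    number = int(rest.partition(".")[0])
    if host in ("", "unix"):
        # Local display: the connection never touches a TCP socket.
        return X11Endpoint("unix", f"/tmp/.X11-unix/X{number}", None)
    # Any named host, even "localhost", means a TCP connection.
    return X11Endpoint("tcp", host, X11_TCP_BASE_PORT + number)
```

With `ssh -X` or `ssh -Y`, `x11_endpoint("localhost:10.0")` yields a TCP endpoint on port 6010, matching the denied `name_connect`, whereas a local session's `:0` stays on the Unix socket that the existing policy already permits.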

That should be unblocked. RHEL engineers can watch bug 1129492 - if that is fixed, those SELinux rules could be removed again and tightened.

Also, in the case of SSH forwarding, that TCP connection to port 6010 would be a local connection that is forwarded by SSH, right? Then the permission could be restricted to host-local TCP from plugin-container.
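The narrower rule proposed here (allow only host-local TCP) amounts to a predicate on the connection target. A minimal sketch of that predicate, assuming sshd's default `X11UseLocalhost yes`, so the forwarded display listens on loopback only. The function name `is_host_local_x11` and the accepted display range are hypothetical illustrations, not an existing SELinux rule.

```python
"""Hypothetical predicate: is an X11 TCP target host-local?"""
import ipaddress

X11_TCP_BASE_PORT = 6000
MAX_DISPLAY = 63  # hypothetical upper bound on display numbers to accept


def is_host_local_x11(host: str, port: int) -> bool:
    """True only for loopback targets on an X11 display port."""
    if not X11_TCP_BASE_PORT <= port <= X11_TCP_BASE_PORT + MAX_DISPLAY:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other host names would need DNS resolution; treat them as remote.
        return False
```

Restricting to loopback limits which sockets the content process can reach, but it cannot distinguish an SSH-forwarded display from any other local listener on the same port.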
Since the downstream bug report [1] was closed as not a bug, I doubt the SELinux maintainers are willing to add an exception for this one. And it looks like running Firefox over ssh is not very common. It's good to have it documented somewhere, though.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1129492
Flags: needinfo?(stransky)
(In reply to Jan Horak from comment #26)
> Since downstream bug report [1] was closed as not a bug, I doubt the selinux
> guys are willing to add an exception for this one.

From reading that bug, it's not clear to me they understood that plugin-container is actually the Firefox content process, instead of a plugin. (Yes, the name is weird, due to legacy reasons - I'm not sure what the state of renaming it to something sensible is.)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #27)
> From reading that bug, it's not clear to me they understood that
> plugin-container is actually the Firefox content process, instead of a
> plugin. (Yes, the name is weird, due to legacy reasons - I'm not sure what
> the state of renaming it to something sensible is.)
Can we be sure that the plugin-container process contains only safe code (i.e. Firefox's own)? In RHEL we're using Firefox 52 ESR, which still supports NPAPI plugins. IIRC they also share the same process name (plugin-container).
(In reply to Gian-Carlo Pascutto [:gcp] from comment #27)
> From reading that bug, it's not clear to me they understood that
> plugin-container is actually the Firefox content process, instead of a
> plugin. (Yes, the name is weird, due to legacy reasons - I'm not sure what
> the state of renaming it to something sensible is.)

That's bug 1277968, which landed in 53.  It has a bunch of dependencies, but bug 1313808 seems to be the only one that didn't make it into 52.  It might be possible to backport those two to 52 ESR, but I haven't tried it.
Just a note that the duplicate bug
   https://bugzilla.mozilla.org/show_bug.cgi?id=1383141
has an attachment with an SELinux patch that stopped the issue.
Component: Tabbed Browser → DOM: Content Processes
Product: Firefox → Core
Priority: -- → P3
OS: Unspecified → Linux
Hardware: Unspecified → All

ESR 52 is past end-of-life, and ESR 60 uses the firefox executable for content processes rather than plugin-container so it shouldn't exhibit this bug.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Keywords: crash