Closed Bug 1326245 Opened 3 years ago Closed 3 years ago

Handling safe mode is completely broken in the child process

Categories

(Toolkit :: Startup and Profile System, defect)

defect
Not set

Tracking

()

VERIFIED FIXED
mozilla54
Tracking Status
firefox51 - wontfix
firefox52 blocking verified
firefox-esr52 --- fixed
firefox53 + verified
firefox54 + verified

People

(Reporter: ehsan, Assigned: mossop)

References

Details

(Keywords: regression, Whiteboard: [fce-active-legacy])

Attachments

(2 files, 1 obsolete file)

The gSafeMode boolean is initialized by looking at either the command line or an environment variable for the restart in safe mode path, but it is never transmitted to the child process, so all consumers of nsIXULAppInfo::GetInSafeMode() in the child process (or any other process than the parent for that matter) will do the wrong thing.

An example of a Web-facing regression caused by this is WebGL no longer being disabled in safe mode in e10s mode due to this code <http://searchfox.org/mozilla-central/rev/51aa673e28802fe6ea89f4793721fc55334a6ac8/dom/canvas/WebGLContext.cpp#965> no longer working.  This is easy to demonstrate for example by visiting <http://webglsamples.org/blob/blob.html> in e10s vs non-e10s mode in safe mode.  The non-e10s window shows you a prompt saying WebGL is disabled, but the e10s window will show nothing and render nothing.

It seems the tracking-e10s flag is no longer a thing.  Needinfoing mrbkap instead of that to make sure this is tracked.
Flags: needinfo?(mrbkap)
See Also: → 1325651
Tracking 51/52/53+ to track whatever we still need to do for safe mode.
What behavior do we actually want here? The reason we disable things like JITs, webgl, etc in safe mode is because they can cause the main process to crash very early, and therefore never check for updates. I don't think we actually need to do any of that in the content process, since that doesn't affect the basic functionality of launching and checking for updates.

So I'm inclined to say this is WORKSFORME or WONTFIX or someth8ing like that.

Although, I don't really understand the bit about webgl rendering nothing. Is that because part of the webgl code is still in the chrome process and therefore still disabled?
Flags: needinfo?(ehsan)
Component: XPCOM → Startup and Profile System
Product: Core → Toolkit
(In reply to Benjamin Smedberg [:bsmedberg] from comment #2)
> What behavior do we actually want here? The reason we disable things like
> JITs, webgl, etc in safe mode is because they can cause the main process to
> crash very early, and therefore never check for updates. I don't think we
> actually need to do any of that in the content process, since that doesn't
> affect the basic functionality of launching and checking for updates.

There are many places in the code base where we check for whether we are in safe mode and do things differently based on that.  If you're wondering whether what we are currently doing makes sense in each specific place in the code, you need to ask the individuals responsible for those places, I don't know about all of those places.

> So I'm inclined to say this is WORKSFORME or WONTFIX or someth8ing like that.

Without going through the list of all of the places where post-e10s we're doing something different, that seems premature to me.

> Although, I don't really understand the bit about webgl rendering nothing.
> Is that because part of the webgl code is still in the chrome process and
> therefore still disabled?

Presumably.  I'm not sure.  Jeff Gilbert is a better person to ask about WebGL.

But note that the WebGL call site is just a random one I chose as an example since it was easy to test.  There's nothing WebGL specific here per se.
Flags: needinfo?(ehsan) → needinfo?(jgilbert)
We shouldn't use this bug to decide whether or not to move WebGL out from under Safe Mode, so I'm in favor of fixing this bug.
If we decide we don't want WebGL under Safe Mode, we can work on that elsewhere.
Flags: needinfo?(jgilbert)
If "safe mode" is a property of the entire application, then yes we should propagate the flag.  I'm asserting that safe mode is not a property of the application, but of the process, in the sense that we want to do safe things in that process so that it doesn't crash and can check for updates.

If we do what you're saying, I assert that we're going to have to go back to each thing that currently checks safe mode and make it check safe-mode+main process, which seems silly.
It for sure seems like there are different levels here.

I think we should clarify in documentation what 'safe mode' means to resolve this bug.
I don't know who "owns" safe mode. Mike, do you know?
Flags: needinfo?(mrbkap) → needinfo?(mconley)
Sorry for the delay - especially since the answer is "I don't know". :/ Maybe Mossop knows?
Flags: needinfo?(mconley) → needinfo?(dtownsend)
(In reply to Mike Conley (:mconley) - Backed up on review / needinfo's from comment #8)
> Sorry for the delay - especially since the answer is "I don't know". :/
> Maybe Mossop knows?

I don't know that there really is an owner. The closest fit is probably Benjamin but I'm going to disagree with his thoughts.

(In reply to Benjamin Smedberg [:bsmedberg] from comment #5)
> If "safe mode" is a property of the entire application, then yes we should
> propagate the flag.  I'm asserting that safe mode is not a property of the
> application, but of the process, in the sense that we want to do safe things
> in that process so that it doesn't crash and can check for updates.

Users see safe mode as a property of the application. They launch Firefox in safe mode and expect it to behave safely. The fact that there are multiple processes isn't surfaced to them at all so all the processes should behave as if in safe mode.

> If we do what you're saying, I assert that we're going to have to go back to
> each thing that currently checks safe mode and make it check safe-mode+main
> process, which seems silly.

I think we just need to propagate gSafeMode to each child process. This will be the least confusing to developers.
Flags: needinfo?(dtownsend)
We should definitely fix this for 52, given that it's ESR, but too late for 51.
Flags: needinfo?(milan)
Bug 1172491 was where we started allowing e10s in safe mode.
See Also: → 1172491
Whiteboard: [fce-active]
Mossop, since you have some helpful sounding analysis, can you help find someone to work on this for 52?
Is there some sort of "ongoing e10s cleanup" team? Jimm, maybe you can help so this doesn't fall through the cracks.
Flags: needinfo?(jmathies)
Flags: needinfo?(dtownsend)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #12)
> Is there some sort of "ongoing e10s cleanup" team?

no, e10s is shipping product. now owned by everyone.

> Jimm, maybe you can help
> so this doesn't fall through the cracks.

it should track and block a release, sounds like we want it to block 52.
Flags: needinfo?(milan)
Flags: needinfo?(jmathies)
accidentally cleared this, re-adding.
Flags: needinfo?(milan)
While we did already ship this issue (so maybe it shouldn't block, I wouldn't like to ship it in ESR 52 if we can help it.
Assignee: nobody → dtownsend
Flags: needinfo?(dtownsend)
Comment on attachment 8834652 [details]
Bug 1326245: Tell the child processes when we're in safe mode.

https://reviewboard.mozilla.org/r/110510/#review111968

::: dom/ipc/ContentChild.cpp:2324
(Diff revision 1)
>    mAppInfo.buildID.Assign(buildID);
>    mAppInfo.name.Assign(name);
>    mAppInfo.UAName.Assign(UAName);
>    mAppInfo.ID.Assign(ID);
>    mAppInfo.vendor.Assign(vendor);
> +  mAppInfo.safeMode = safeMode;

oh, our nsXULAppInfo is silly. Why our code doesn't just assign these process specific values to the global variables defined in nsXULAppInfo.

But ok, you're following the current way to do this.

Want to file a followup bug to make nsXULAppInfo a tad less error prone by storing the data only in one place.
Attachment #8834652 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #17)
> Want to file a followup bug to make nsXULAppInfo a tad less error prone by
> storing the data only in one place.

Filed bug 1337877.
Pushed by dtownsend@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7248ea754e40
Tell the child processes when we're in safe mode. r=smaug
Backed out for failing various webgl tests:

https://hg.mozilla.org/integration/autoland/rev/9da1857374780d5fc4c8babf11bf2d1ac8443bd4

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=7248ea754e40a2ffc017272a9ba98ea571729bf8
Failure log (example, please check the whole push): https://treeherder.mozilla.org/logviewer.html#?job_id=75611457&repo=autoland

[task 2017-02-08T19:10:07.969224Z] 19:10:07     INFO - TEST-START | dom/canvas/test/webgl-conf/generated/test_conformance__attribs__gl-bindAttribLocation-aliasing.html
[task 2017-02-08T19:10:08.090764Z] 19:10:08     INFO - ++DOMWINDOW == 15 (0x7f48022ee800) [pid = 1334] [serial = 15] [outer = 0x7f4803b45c00]
[task 2017-02-08T19:10:08.169551Z] 19:10:08     INFO - ++DOCSHELL 0x7f4802137800 == 5 [pid = 1334] [id = {080074ae-9b8a-4a26-bf20-74f7a2adf778}]
[task 2017-02-08T19:10:08.169648Z] 19:10:08     INFO - ++DOMWINDOW == 16 (0x7f48022f4000) [pid = 1334] [serial = 16] [outer = (nil)]
[task 2017-02-08T19:10:08.238109Z] 19:10:08     INFO - ++DOMWINDOW == 17 (0x7f48022eec00) [pid = 1334] [serial = 17] [outer = 0x7f48022f4000]
[task 2017-02-08T19:10:08.266389Z] 19:10:08     INFO - ++DOCSHELL 0x7f4802140000 == 6 [pid = 1334] [id = {3e7df34f-011d-4984-ba76-4e40896385a1}]
[task 2017-02-08T19:10:08.267418Z] 19:10:08     INFO - ++DOMWINDOW == 18 (0x7f48022f6800) [pid = 1334] [serial = 18] [outer = (nil)]
[task 2017-02-08T19:10:08.282525Z] 19:10:08     INFO - ++DOMWINDOW == 19 (0x7f48022f8800) [pid = 1334] [serial = 19] [outer = 0x7f48022f6800]
[task 2017-02-08T19:10:08.415253Z] 19:10:08     INFO - ++DOMWINDOW == 20 (0x7f48021c8c00) [pid = 1334] [serial = 20] [outer = 0x7f48022f6800]
[task 2017-02-08T19:10:08.868669Z] 19:10:08     INFO - [Child 1334] WARNING: NS_ENSURE_TRUE(ParseTypeAttribute(type, &version)) failed: file /home/worker/workspace/build/src/dom/base/nsScriptLoader.cpp, line 1456
[task 2017-02-08T19:10:08.876698Z] 19:10:08     INFO - JavaScript warning: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js, line 1443: Error: WebGL warning: Failed to create WebGL context: WebGL is currently disabled.
[task 2017-02-08T19:10:08.878217Z] 19:10:08     INFO - JavaScript warning: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js, line 1443: Error: WebGL warning: Failed to create WebGL context: WebGL is currently disabled.
[task 2017-02-08T19:10:08.885706Z] 19:10:08     INFO - TEST-INFO | started process screentopng
[task 2017-02-08T19:10:09.729510Z] 19:10:09     INFO - TEST-INFO | screentopng: exit 0
[task 2017-02-08T19:10:09.730464Z] 19:10:09     INFO - Buffered messages logged at 19:10:08
[task 2017-02-08T19:10:09.731724Z] 19:10:09     INFO - TEST-PASS | dom/canvas/test/webgl-conf/generated/test_conformance__attribs__gl-bindAttribLocation-aliasing.html | A valid string reason is expected 
[task 2017-02-08T19:10:09.732917Z] 19:10:09     INFO - TEST-PASS | dom/canvas/test/webgl-conf/generated/test_conformance__attribs__gl-bindAttribLocation-aliasing.html | Reason cannot be empty 
[task 2017-02-08T19:10:09.734684Z] 19:10:09     INFO - Buffered messages finished
[task 2017-02-08T19:10:09.736250Z] 19:10:09     INFO - TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_conformance__attribs__gl-bindAttribLocation-aliasing.html | Unable to fetch WebGL rendering context for Canvas 
[task 2017-02-08T19:10:09.737584Z] 19:10:09     INFO -     reportResults@dom/canvas/test/webgl-conf/mochi-single.html?checkout/conformance/attribs/gl-bindAttribLocation-aliasing.html:22:7
[task 2017-02-08T19:10:09.738959Z] 19:10:09     INFO -     reportTestResultsToHarness@dom/canvas/test/webgl-conf/checkout/js/js-test-pre.js:116:5
[task 2017-02-08T19:10:09.740332Z] 19:10:09     INFO -     testFailed@dom/canvas/test/webgl-conf/checkout/js/js-test-pre.js:246:5
[task 2017-02-08T19:10:09.741794Z] 19:10:09     INFO -     create3DContext@dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js:1451:5
[task 2017-02-08T19:10:09.743107Z] 19:10:09     INFO -     @dom/canvas/test/webgl-conf/checkout/conformance/attribs/gl-bindAttribLocation-aliasing.html:51:10
[task 2017-02-08T19:10:09.744919Z] 19:10:09     INFO - JavaScript error: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/conformance/attribs/gl-bindAttribLocation-aliasing.html, line 52: TypeError: gl is null
Flags: needinfo?(dtownsend)
We access gSafeMode directly in some places, rather than through nsXULAppInfo::GetInSafeMode, so this patch won't cover those cases.
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #21)
> We access gSafeMode directly in some places, rather than through
> nsXULAppInfo::GetInSafeMode, so this patch won't cover those cases.

Yeah. We also call GetInSafeMode earlier in startup than the message that I've used to set it so this approach can't work.
Flags: needinfo?(dtownsend)
I looked a bit into passing it as a command line flag (the whole ContentParent/ContentChild/ContentProcess) but I don't know if that's the right place for it.
blassey's patch in bug 1303096 made it so that this was early enough to send a bunch of XPCOM-related information to the content process: http://searchfox.org/mozilla-central/rev/672c83ed65da286b68be1d02799c35fdd14d0134/dom/ipc/ContentParent.cpp#2040

I wonder if you could send the safe mode state there as well.

That just landed a few days ago though, so if this approach works, the right way to uplift it would be to append the safe mode state to the SendGetXPCOMProcessAttributes message: https://dxr.mozilla.org/mozilla-aurora/rev/82c75fd3a2de796351296592c459ab4aa4cd0baf/dom/ipc/ContentChild.cpp#999
Attachment #8834652 - Attachment is obsolete: true
Attachment #8835185 - Flags: review?(bugs)
Comment on attachment 8835185 [details]
Bug 1326245: Tell the child processes when we're in safe mode.

https://reviewboard.mozilla.org/r/110874/#review112366
Attachment #8835185 - Flags: review?(bugs) → review+
Pushed by dtownsend@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/20a0d5bdd1c2
Tell the child processes when we're in safe mode. r=smaug
We already have a -safe-mode command line argument <http://searchfox.org/mozilla-central/rev/afcf40f3eafd895611a839017730debb58a342a6/toolkit/xre/nsAppRunner.cpp#3378>.  Any reason you can't use that here too?
Flags: needinfo?(dtownsend)
(In reply to :Ehsan Akhgari from comment #28)
> We already have a -safe-mode command line argument
> <http://searchfox.org/mozilla-central/rev/
> afcf40f3eafd895611a839017730debb58a342a6/toolkit/xre/nsAppRunner.cpp#3378>. 
> Any reason you can't use that here too?

No reason, I just chose to keep the style of the argument consistent with the rest of the arguments that the child processes accept.
Flags: needinfo?(dtownsend)
https://hg.mozilla.org/mozilla-central/rev/20a0d5bdd1c2
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
We'll likely want an uplift to aurora and beta here.
Flags: needinfo?(dtownsend)
Attached patch patchSplinter Review
This is the same approach but for aurora and beta where process arguments are handled elsewhere. I had to stop us breaking out of the loop early to make sure that we catch any potential -safeMode argument.
Attachment #8836908 - Flags: review?(bugs)
Comment on attachment 8836908 [details] [diff] [review]
patch

I guess adding a continue there doesn't really matter.
Attachment #8836908 - Flags: review?(bugs) → review+
Comment on attachment 8836908 [details] [diff] [review]
patch

Approval Request Comment
[Feature/Bug causing the regression]: No regression
[User impact if declined]: Certain aspects of safe mode won't apply if the user is running with e10s enabled
[Is this code covered by automated tests?]: We have no automated tests for safe mode
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: Verify that webgl is disabled when in safe mode with e10s enabled.
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Low risk
[Why is the change risky/not risky?]: The change is straightforward
[String changes made/needed]: None
Flags: needinfo?(dtownsend)
Attachment #8836908 - Flags: approval-mozilla-beta?
Attachment #8836908 - Flags: approval-mozilla-aurora?
Comment on attachment 8836908 [details] [diff] [review]
patch

safe mode fix for e10s, aurora53+, beta52+
Attachment #8836908 - Flags: approval-mozilla-beta?
Attachment #8836908 - Flags: approval-mozilla-beta+
Attachment #8836908 - Flags: approval-mozilla-aurora?
Attachment #8836908 - Flags: approval-mozilla-aurora+
Flagging this for verification, instructions in Comment 34.
Flags: qe-verify+
Reproduced the initial issue on Firefox 51.0.1 (Build ID: 20170125094131) on Windows 10 x64.

Verified as fixed using the instructions from Comment 34 with the latest Nightly 54.0a1 (Build ID: 20170221030205) on Windows 10 x64, Ubuntu 16.04 x64 and Mac OS X 10.11.
Status: RESOLVED → VERIFIED
Managed to reproduce this issue on Firefox 52.0a1 (2016-10-04), under Windows 10x64.

The issue is no longer reproducible on Firefox 53.0a2 (2017-02-24), or on Firefox 52.0b9.
Tests were performed under Windows 10x64, Mac OS X 10.11.6 , Ubuntu 16.04x86.
Flags: qe-verify+
Whiteboard: [fce-active] → [fce-active-legacy]
You need to log in before you can comment on or make changes to this bug.