Closed
Bug 1326245
Opened 8 years ago
Closed 7 years ago
Handling safe mode is completely broken in the child process
Categories
(Toolkit :: Startup and Profile System, defect)
Toolkit
Startup and Profile System
Tracking
()
People
(Reporter: ehsan.akhgari, Assigned: mossop)
References
Details
(Keywords: regression, Whiteboard: [fce-active-legacy])
Attachments
(2 files, 1 obsolete file)
59 bytes,
text/x-review-board-request
|
smaug
:
review+
|
Details |
2.09 KB,
patch
|
smaug
:
review+
jcristau
:
approval-mozilla-aurora+
jcristau
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
The gSafeMode boolean is initialized by looking at either the command line or an environment variable for the restart in safe mode path, but it is never transmitted to the child process, so all consumers of nsIXULAppInfo::GetInSafeMode() in the child process (or any other process than the parent for that matter) will do the wrong thing. An example of a Web-facing regression caused by this is WebGL no longer being disabled in safe mode in e10s mode due to this code <http://searchfox.org/mozilla-central/rev/51aa673e28802fe6ea89f4793721fc55334a6ac8/dom/canvas/WebGLContext.cpp#965> no longer working. This is easy to demonstrate for example by visiting <http://webglsamples.org/blob/blob.html> in e10s vs non-e10s mode in safe mode. The non-e10s window shows you a prompt saying WebGL is disabled, but the e10s window will show nothing and render nothing. It seems the tracking-e10s flag is no longer a thing. Needinfoing mrbkap instead of that to make sure this is tracked.
Flags: needinfo?(mrbkap)
Comment 1•8 years ago
|
||
Tracking 51/52/53+ to track whatever we still need to do for safe mode.
Comment 2•8 years ago
|
||
What behavior do we actually want here? The reason we disable things like JITs, webgl, etc in safe mode is because they can cause the main process to crash very early, and therefore never check for updates. I don't think we actually need to do any of that in the content process, since that doesn't affect the basic functionality of launching and checking for updates. So I'm inclined to say this is WORKSFORME or WONTFIX or someth8ing like that. Although, I don't really understand the bit about webgl rendering nothing. Is that because part of the webgl code is still in the chrome process and therefore still disabled?
Flags: needinfo?(ehsan)
Updated•8 years ago
|
Component: XPCOM → Startup and Profile System
Product: Core → Toolkit
Reporter | ||
Comment 3•8 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #2) > What behavior do we actually want here? The reason we disable things like > JITs, webgl, etc in safe mode is because they can cause the main process to > crash very early, and therefore never check for updates. I don't think we > actually need to do any of that in the content process, since that doesn't > affect the basic functionality of launching and checking for updates. There are many places in the code base where we check for whether we are in safe mode and do things differently based on that. If you're wondering whether what we are currently doing makes sense in each specific place in the code, you need to ask the individuals responsible for those places, I don't know about all of those places. > So I'm inclined to say this is WORKSFORME or WONTFIX or someth8ing like that. Without going through the list of all of the places where post-e10s we're doing something different, that seems premature to me. > Although, I don't really understand the bit about webgl rendering nothing. > Is that because part of the webgl code is still in the chrome process and > therefore still disabled? Presumably. I'm not sure. Jeff Gilbert is a better person to ask about WebGL. But note that the WebGL call site is just a random one I chose as an example since it was easy to test. There's nothing WebGL specific here per se.
Flags: needinfo?(ehsan) → needinfo?(jgilbert)
Comment 4•8 years ago
|
||
We shouldn't use this bug to decide whether or not to move WebGL out from under Safe Mode, so I'm in favor of fixing this bug. If we decide we don't want WebGL under Safe Mode, we can work on that elsewhere.
Flags: needinfo?(jgilbert)
Comment 5•8 years ago
|
||
If "safe mode" is a property of the entire application, then yes we should propagate the flag. I'm asserting that safe mode is not a property of the application, but of the process, in the sense that we want to do safe things in that process so that it doesn't crash and can check for updates. If we do what you're saying, I assert that we're going to have to go back to each thing that currently checks safe mode and make it check safe-mode+main process, which seems silly.
Comment 6•8 years ago
|
||
It for sure seems like there are different levels here. I think we should clarify in documentation what 'safe mode' means to resolve this bug.
Comment 7•8 years ago
|
||
I don't know who "owns" safe mode. Mike, do you know?
Flags: needinfo?(mrbkap) → needinfo?(mconley)
Comment 8•7 years ago
|
||
Sorry for the delay - especially since the answer is "I don't know". :/ Maybe Mossop knows?
Flags: needinfo?(mconley) → needinfo?(dtownsend)
Assignee | ||
Comment 9•7 years ago
|
||
(In reply to Mike Conley (:mconley) - Backed up on review / needinfo's from comment #8) > Sorry for the delay - especially since the answer is "I don't know". :/ > Maybe Mossop knows? I don't know that there really is an owner. The closest fit is probably Benjamin but I'm going to disagree with his thoughts. (In reply to Benjamin Smedberg [:bsmedberg] from comment #5) > If "safe mode" is a property of the entire application, then yes we should > propagate the flag. I'm asserting that safe mode is not a property of the > application, but of the process, in the sense that we want to do safe things > in that process so that it doesn't crash and can check for updates. Users see safe mode as a property of the application. They launch Firefox in safe mode and expect it to behave safely. The fact that there are multiple processes isn't surfaced to them at all so all the processes should behave as if in safe mode. > If we do what you're saying, I assert that we're going to have to go back to > each thing that currently checks safe mode and make it check safe-mode+main > process, which seems silly. I think we just need to propagate gSafeMode to each child process. This will be the least confusing to developers.
Flags: needinfo?(dtownsend)
We should definitely fix this for 52, given that it's ESR, but too late for 51.
Updated•7 years ago
|
Flags: needinfo?(milan)
Bug 1172491 was where we started allowing e10s in safe mode.
See Also: → 1172491
Updated•7 years ago
|
Updated•7 years ago
|
Whiteboard: [fce-active]
Comment 12•7 years ago
|
||
Mossop, since you have some helpful sounding analysis, can you help find someone to work on this for 52? Is there some sort of "ongoing e10s cleanup" team? Jimm, maybe you can help so this doesn't fall through the cracks.
Flags: needinfo?(jmathies)
Flags: needinfo?(dtownsend)
Comment 13•7 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #12) > Is there some sort of "ongoing e10s cleanup" team? no, e10s is shipping product. now owned by everyone. > Jimm, maybe you can help > so this doesn't fall through the cracks. it should track and block a release, sounds like we want it to block 52.
Flags: needinfo?(milan)
Flags: needinfo?(jmathies)
Updated•7 years ago
|
Comment 15•7 years ago
|
||
While we did already ship this issue (so maybe it shouldn't block, I wouldn't like to ship it in ESR 52 if we can help it.
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → dtownsend
Flags: needinfo?(dtownsend)
Comment 17•7 years ago
|
||
mozreview-review |
Comment on attachment 8834652 [details] Bug 1326245: Tell the child processes when we're in safe mode. https://reviewboard.mozilla.org/r/110510/#review111968 ::: dom/ipc/ContentChild.cpp:2324 (Diff revision 1) > mAppInfo.buildID.Assign(buildID); > mAppInfo.name.Assign(name); > mAppInfo.UAName.Assign(UAName); > mAppInfo.ID.Assign(ID); > mAppInfo.vendor.Assign(vendor); > + mAppInfo.safeMode = safeMode; oh, our nsXULAppInfo is silly. Why our code doesn't just assign these process specific values to the global variables defined in nsXULAppInfo. But ok, you're following the current way to do this. Want to file a followup bug to make nsXULAppInfo a tad less error prone by storing the data only in one place.
Attachment #8834652 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 18•7 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #17) > Want to file a followup bug to make nsXULAppInfo a tad less error prone by > storing the data only in one place. Filed bug 1337877.
Comment 19•7 years ago
|
||
Pushed by dtownsend@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7248ea754e40 Tell the child processes when we're in safe mode. r=smaug
Comment 20•7 years ago
|
||
Backed out for failing various webgl tests: https://hg.mozilla.org/integration/autoland/rev/9da1857374780d5fc4c8babf11bf2d1ac8443bd4 Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=7248ea754e40a2ffc017272a9ba98ea571729bf8 Failure log (example, please check the whole push): https://treeherder.mozilla.org/logviewer.html#?job_id=75611457&repo=autoland [task 2017-02-08T19:10:07.969224Z] 19:10:07 INFO - TEST-START | dom/canvas/test/webgl-conf/generated/test_conformance__attribs__gl-bindAttribLocation-aliasing.html [task 2017-02-08T19:10:08.090764Z] 19:10:08 INFO - ++DOMWINDOW == 15 (0x7f48022ee800) [pid = 1334] [serial = 15] [outer = 0x7f4803b45c00] [task 2017-02-08T19:10:08.169551Z] 19:10:08 INFO - ++DOCSHELL 0x7f4802137800 == 5 [pid = 1334] [id = {080074ae-9b8a-4a26-bf20-74f7a2adf778}] [task 2017-02-08T19:10:08.169648Z] 19:10:08 INFO - ++DOMWINDOW == 16 (0x7f48022f4000) [pid = 1334] [serial = 16] [outer = (nil)] [task 2017-02-08T19:10:08.238109Z] 19:10:08 INFO - ++DOMWINDOW == 17 (0x7f48022eec00) [pid = 1334] [serial = 17] [outer = 0x7f48022f4000] [task 2017-02-08T19:10:08.266389Z] 19:10:08 INFO - ++DOCSHELL 0x7f4802140000 == 6 [pid = 1334] [id = {3e7df34f-011d-4984-ba76-4e40896385a1}] [task 2017-02-08T19:10:08.267418Z] 19:10:08 INFO - ++DOMWINDOW == 18 (0x7f48022f6800) [pid = 1334] [serial = 18] [outer = (nil)] [task 2017-02-08T19:10:08.282525Z] 19:10:08 INFO - ++DOMWINDOW == 19 (0x7f48022f8800) [pid = 1334] [serial = 19] [outer = 0x7f48022f6800] [task 2017-02-08T19:10:08.415253Z] 19:10:08 INFO - ++DOMWINDOW == 20 (0x7f48021c8c00) [pid = 1334] [serial = 20] [outer = 0x7f48022f6800] [task 2017-02-08T19:10:08.868669Z] 19:10:08 INFO - [Child 1334] WARNING: NS_ENSURE_TRUE(ParseTypeAttribute(type, &version)) failed: file /home/worker/workspace/build/src/dom/base/nsScriptLoader.cpp, line 1456 [task 2017-02-08T19:10:08.876698Z] 19:10:08 INFO - JavaScript warning: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js, line 1443: Error: WebGL warning: Failed to create WebGL context: WebGL is currently disabled. [task 2017-02-08T19:10:08.878217Z] 19:10:08 INFO - JavaScript warning: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js, line 1443: Error: WebGL warning: Failed to create WebGL context: WebGL is currently disabled. [task 2017-02-08T19:10:08.885706Z] 19:10:08 INFO - TEST-INFO | started process screentopng [task 2017-02-08T19:10:09.729510Z] 19:10:09 INFO - TEST-INFO | screentopng: exit 0 [task 2017-02-08T19:10:09.730464Z] 19:10:09 INFO - Buffered messages logged at 19:10:08 [task 2017-02-08T19:10:09.731724Z] 19:10:09 INFO - TEST-PASS | dom/canvas/test/webgl-conf/generated/test_conformance__attribs__gl-bindAttribLocation-aliasing.html | A valid string reason is expected [task 2017-02-08T19:10:09.732917Z] 19:10:09 INFO - TEST-PASS | dom/canvas/test/webgl-conf/generated/test_conformance__attribs__gl-bindAttribLocation-aliasing.html | Reason cannot be empty [task 2017-02-08T19:10:09.734684Z] 19:10:09 INFO - Buffered messages finished [task 2017-02-08T19:10:09.736250Z] 19:10:09 INFO - TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_conformance__attribs__gl-bindAttribLocation-aliasing.html | Unable to fetch WebGL rendering context for Canvas [task 2017-02-08T19:10:09.737584Z] 19:10:09 INFO - reportResults@dom/canvas/test/webgl-conf/mochi-single.html?checkout/conformance/attribs/gl-bindAttribLocation-aliasing.html:22:7 [task 2017-02-08T19:10:09.738959Z] 19:10:09 INFO - reportTestResultsToHarness@dom/canvas/test/webgl-conf/checkout/js/js-test-pre.js:116:5 [task 2017-02-08T19:10:09.740332Z] 19:10:09 INFO - testFailed@dom/canvas/test/webgl-conf/checkout/js/js-test-pre.js:246:5 [task 2017-02-08T19:10:09.741794Z] 19:10:09 INFO - create3DContext@dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js:1451:5 [task 2017-02-08T19:10:09.743107Z] 19:10:09 INFO - @dom/canvas/test/webgl-conf/checkout/conformance/attribs/gl-bindAttribLocation-aliasing.html:51:10 [task 2017-02-08T19:10:09.744919Z] 19:10:09 INFO - JavaScript error: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/conformance/attribs/gl-bindAttribLocation-aliasing.html, line 52: TypeError: gl is null
Flags: needinfo?(dtownsend)
We access gSafeMode directly in some places, rather than through nsXULAppInfo::GetInSafeMode, so this patch won't cover those cases.
Flags: needinfo?(milan)
Assignee | ||
Comment 22•7 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #21) > We access gSafeMode directly in some places, rather than through > nsXULAppInfo::GetInSafeMode, so this patch won't cover those cases. Yeah. We also call GetInSafeMode earlier in startup than the message that I've used to set it so this approach can't work.
Flags: needinfo?(dtownsend)
I looked a bit into passing it as a command line flag (the whole ContentParent/ContentChild/ContentProcess) but I don't know if that's the right place for it.
Comment 24•7 years ago
|
||
blassey's patch in bug 1303096 made it so that this was early enough to send a bunch of XPCOM-related information to the content process: http://searchfox.org/mozilla-central/rev/672c83ed65da286b68be1d02799c35fdd14d0134/dom/ipc/ContentParent.cpp#2040 I wonder if you could send the safe mode state there as well. That just landed a few days ago though, so if this approach works, the right way to uplift it would be to append the safe mode state to the SendGetXPCOMProcessAttributes message: https://dxr.mozilla.org/mozilla-aurora/rev/82c75fd3a2de796351296592c459ab4aa4cd0baf/dom/ipc/ContentChild.cpp#999
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8834652 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Attachment #8835185 -
Flags: review?(bugs)
Comment 26•7 years ago
|
||
mozreview-review |
Comment on attachment 8835185 [details] Bug 1326245: Tell the child processes when we're in safe mode. https://reviewboard.mozilla.org/r/110874/#review112366
Attachment #8835185 -
Flags: review?(bugs) → review+
Comment 27•7 years ago
|
||
Pushed by dtownsend@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/20a0d5bdd1c2 Tell the child processes when we're in safe mode. r=smaug
Reporter | ||
Comment 28•7 years ago
|
||
We already have a -safe-mode command line argument <http://searchfox.org/mozilla-central/rev/afcf40f3eafd895611a839017730debb58a342a6/toolkit/xre/nsAppRunner.cpp#3378>. Any reason you can't use that here too?
Flags: needinfo?(dtownsend)
Assignee | ||
Comment 29•7 years ago
|
||
(In reply to :Ehsan Akhgari from comment #28) > We already have a -safe-mode command line argument > <http://searchfox.org/mozilla-central/rev/ > afcf40f3eafd895611a839017730debb58a342a6/toolkit/xre/nsAppRunner.cpp#3378>. > Any reason you can't use that here too? No reason, I just chose to keep the style of the argument consistent with the rest of the arguments that the child processes accept.
Flags: needinfo?(dtownsend)
Comment 30•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/20a0d5bdd1c2
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
We'll likely want an uplift to aurora and beta here.
Flags: needinfo?(dtownsend)
Assignee | ||
Comment 32•7 years ago
|
||
This is the same approach but for aurora and beta where process arguments are handled elsewhere. I had to stop us breaking out of the loop early to make sure that we catch any potential -safeMode argument.
Attachment #8836908 -
Flags: review?(bugs)
Comment 33•7 years ago
|
||
Comment on attachment 8836908 [details] [diff] [review] patch I guess adding a continue there doesn't really matter.
Attachment #8836908 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 34•7 years ago
|
||
Comment on attachment 8836908 [details] [diff] [review] patch Approval Request Comment [Feature/Bug causing the regression]: No regression [User impact if declined]: Certain aspects of safe mode won't apply if the user is running with e10s enabled [Is this code covered by automated tests?]: We have no automated tests for safe mode [Has the fix been verified in Nightly?]: Yes [Needs manual test from QE? If yes, steps to reproduce]: Verify that webgl is disabled when in safe mode with e10s enabled. [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Low risk [Why is the change risky/not risky?]: The change is straightforward [String changes made/needed]: None
Flags: needinfo?(dtownsend)
Attachment #8836908 -
Flags: approval-mozilla-beta?
Attachment #8836908 -
Flags: approval-mozilla-aurora?
Comment 35•7 years ago
|
||
Comment on attachment 8836908 [details] [diff] [review] patch safe mode fix for e10s, aurora53+, beta52+
Attachment #8836908 -
Flags: approval-mozilla-beta?
Attachment #8836908 -
Flags: approval-mozilla-beta+
Attachment #8836908 -
Flags: approval-mozilla-aurora?
Attachment #8836908 -
Flags: approval-mozilla-aurora+
Comment 36•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/a9c518127961
Comment 37•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/4c6104612205
Comment 38•7 years ago
|
||
Flagging this for verification, instructions in Comment 34.
Flags: qe-verify+
Comment 39•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-esr52/rev/4c6104612205
status-firefox-esr52:
--- → fixed
Comment 40•7 years ago
|
||
Reproduced the initial issue on Firefox 51.0.1 (Build ID: 20170125094131) on Windows 10 x64. Verified as fixed using the instructions from Comment 34 with the latest Nightly 54.0a1 (Build ID: 20170221030205) on Windows 10 x64, Ubuntu 16.04 x64 and Mac OS X 10.11.
Status: RESOLVED → VERIFIED
Comment 41•7 years ago
|
||
Managed to reproduce this issue on Firefox 52.0a1 (2016-10-04), under Windows 10x64. The issue is no longer reproducible on Firefox 53.0a2 (2017-02-24), or on Firefox 52.0b9. Tests were performed under Windows 10x64, Mac OS X 10.11.6 , Ubuntu 16.04x86.
Updated•7 years ago
|
Flags: qe-verify+
Updated•6 years ago
|
Whiteboard: [fce-active] → [fce-active-legacy]
You need to log in
before you can comment on or make changes to this bug.
Description
•