Closed Bug 1384336 Opened 8 years ago Closed 8 years ago

Avoid running the appshell in content processes

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla57

Tracking Flags:

Tracking

Status

firefox57

---

fixed

People

(Reporter: billm, Assigned: billm)

References

Details

Attachments

(2 files, 3 obsolete files)

don't register thread observer in child process 8 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 1.03 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
accessibility changes 8 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 3.20 KB, patch	bugzilla : feedback+	Details \| Diff \| Splinter Review
accessibility changes, v2 8 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 6.64 KB, patch	bugzilla : review+	Details \| Diff \| Splinter Review
don't register thread observer in child process, v2 8 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 6.47 KB, patch		Details \| Diff \| Splinter Review
don't register thread observer in child process, v3 8 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 7.81 KB, patch	mstange : review+	Details \| Diff \| Splinter Review

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Description

•

8 years ago

Having to run the appshell in content processes makes Quantum DOM a lot harder. It's also mostly useless, and it probably is a small drag on our performance. And it makes sandboxing harder (see bug 1381022). I'm trying to make this bug pretty tightly scoped: rather than remove the appshell entirely from the content process, I just want to avoid calling the nsIThreadObserver callbacks for it. That will allow some of the widget-specific code that gets initialized in appshell to continue working. But it means we won't be using OS-specific event waiting mechanisms, which is what I really want to avoid.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 1

•

8 years ago

Attached patch don't register thread observer in child process (obsolete) — Details — Splinter Review

How does this seem to you, Olli?

Attachment #8890124 - Flags: review?(bugs)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 2

•

8 years ago

Attached patch accessibility changes (obsolete) — Details — Splinter Review

Our accessibility code wants relies on us being in an "alertable state" (as defined by Win32) while processing events since that allows us to process APCs. I tried getting rid of the APC mechanism entirely and just using events, but that caused deadlocks if the main thread is blocked in EnsureMTA. So this patch does both, as we discussed at the workweek, Aaron. If we decide to go this route, I think there's a bit more cleanup that could be done. However, I want to get some feedback before doing that. These patches are green on Linux and Windows. I haven't tried Mac yet.

Attachment #8890125 - Flags: feedback?(aklotz)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Updated

•

8 years ago

Blocks: 1381022

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 3

•

8 years ago

Comment on attachment 8890124 [details] [diff] [review] don't register thread observer in child process Is this ok for all the non-parent processes? I think someone familiar with our plugin processes and gpu processes should take a look too. But if we can do this, great!

Attachment #8890124 - Flags: review?(bugs) → review+

(No longer employed by Mozilla) Aaron Klotz

Comment 4

•

8 years ago

Comment on attachment 8890125 [details] [diff] [review] accessibility changes Review of attachment 8890125 [details] [diff] [review]: ----------------------------------------------------------------- This looks great, Bill! FYI, with respect to additional cleanup work and a11y, we can do some stuff in MessageChannel::WaitForSyncNotifyWithA11yReentry once the main event loop changes are done.

Attachment #8890125 - Flags: feedback?(aklotz) → feedback+

(No longer employed by Mozilla) Aaron Klotz

Comment 5

•

8 years ago

(In reply to Olli Pettay [:smaug] (vacation-ish until July 30) from comment #3) > Is this ok for all the non-parent processes? > > I think someone familiar with our plugin processes and gpu processes should > take a look too. I can't comment on gpu or GMP processes, but we probably can't in the NPAPI plugin container. I suspect that Flash assumes a fully-fledged Windows message pump in there.

(No longer employed by Mozilla) Aaron Klotz

Comment 6

•

8 years ago

(Unless we're already using something different down there... I can't remember)

(No longer employed by Mozilla) Aaron Klotz

Updated

•

8 years ago

Blocks: 1385014

(No longer employed by Mozilla) Aaron Klotz

Comment 7

•

8 years ago

(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #4) > Comment on attachment 8890125 [details] [diff] [review] > accessibility changes > FYI, with respect to additional cleanup work and a11y, we can do some stuff > in MessageChannel::WaitForSyncNotifyWithA11yReentry once the main event loop > changes are done. I filed bug 1385014 for this.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 8

•

8 years ago

Attached patch accessibility changes, v2 — Details — Splinter Review

OK, here's a cleaner patch.

Attachment #8890125 - Attachment is obsolete: true

Attachment #8891509 - Flags: review?(aklotz)

(No longer employed by Mozilla) Aaron Klotz

Updated

•

8 years ago

Attachment #8891509 - Flags: review?(aklotz) → review+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 9

•

8 years ago

Ugh, this is totally orange on Mac.

Alex Gaynor [:Alex_Gaynor]

Comment 10

•

8 years ago

I don't know the appshell code very well, but please let me know if there's anyway I can help out with the mac bits.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 11

•

8 years ago

Attached patch don't register thread observer in child process, v2 (obsolete) — Details — Splinter Review

This patch is similar to the previous one: it disables the appshell's event loop code in the content process. However, I made it pref-controlled. I also needed some extra changes for Mac, since the MacOS appshell code is a little different. Stephen, could you review this? The main question here is whether there's any need to run a native event loop in the content process on Macs.

Attachment #8890124 - Attachment is obsolete: true

Attachment #8893563 - Flags: review?(spohl.mozilla.bugs)

Stephen A Pohl [:spohl]

Comment 12

•

8 years ago

Comment on attachment 8893563 [details] [diff] [review] don't register thread observer in child process, v2 (In reply to Bill McCloskey (:billm) from comment #11) > The main question here is whether there's > any need to run a native event loop in the content process on Macs. Forwarding to mstange to make sure.

Attachment #8893563 - Flags: review?(spohl.mozilla.bugs) → review?(mstange)

Markus Stange [:mstange]

Comment 13

•

8 years ago

Things that would need the native event loop in the content process are: (1) Callers that use [someObjCObject performSelector:@selector(someSelector) afterDelay:someDelay]. (2) Notification observers, e.g. things that listen for changes to system settings. I don't see either of those things used in any code that runs in the content process. That's great! I was expecting more problems here. Looking through nsAppShell.mm, I also noticed two other things: HangReporter integration, and MacWakeLockListener. We need to make sure that this patch doesn't cause hang reporting to detect the content process as hanging all the time. And the fact that we currently set up MacWakeLockListeners in the content process is probably just a bug. I also checked what kinds of events go through -[GeckoNSApplication sendEvent:] in the content process at the moment. I've only seen two event types: NSApplicationDefined events, which we use internally to wake up the event loop, and NSSystemDefined events which seem to be firing system-wide when certain things happen. For some of them I wasn't able to determine a cause. Some seem to fire whenever you use the dictionary popup anywhere on the system. I'm excited about this change and I think it's time that we made it. Sometimes I see overhead from the native event loop in profiles, and somethings things related to -[NSApplication setWindowsNeedUpdate:] which I don't understand, and this change should reduce that overhead. I've briefly tested this patch now, and the only problem I've found is that Activity Monitor now says our child processes are "Not Responding". We should fix that. There's a Chrome bug about a similar problem with some useful information, it gets interesting around here: https://bugs.chromium.org/p/chromium/issues/detail?id=304860#c21

Markus Stange [:mstange]

Comment 14

•

8 years ago

(In reply to Markus Stange [:mstange] from comment #13) > Looking through nsAppShell.mm, I also noticed two other things: HangReporter > integration, and MacWakeLockListener. We need to make sure that this patch > doesn't cause hang reporting to detect the content process as hanging all > the time. And the fact that we currently set up MacWakeLockListeners in the > content process is probably just a bug. Looking at the patch again, I can see that you're not changing anything about the MacWakeLockListener situation, so let's not worry about that.

Alex Gaynor [:Alex_Gaynor]

Comment 15

•

8 years ago

In bug 1386308 I removed the use of MacWakeLockListener in content.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 16

•

8 years ago

I'm afraid I didn't understand that Chrome bug. Is there something we could do to fix it?

Markus Stange [:mstange]

Comment 17

•

8 years ago

(In reply to Alex Gaynor [:Alex_Gaynor] from comment #15) > In bug 1386308 I removed the use of MacWakeLockListener in content. Oh, good. (In reply to Bill McCloskey (:billm) from comment #16) > I'm afraid I didn't understand that Chrome bug. Is there something we could > do to fix it? I think we need to find out where we trigger a call to _RegisterApplication and avoid making that call. The chromium issue points out process renaming as one source of these calls. The general idea is: if the WindowServer has never heard from our process, it won't care about whether we're responding. So we need to make sure that we never communicate with it in the first place.

Alex Gaynor [:Alex_Gaynor]

Comment 18

•

8 years ago

Not sure if it'd solve this problem, but bug 1322024 deals with blocking new connections to the windoserver at the sandbox level -- as long as we don't obtain a connection to the windowserver before the sandbox is enabled, then it'd become impossible for us to ever talk to the windowserver. (If you find any places where we're talking to the windowserver, I guess they're blockers for that bug!)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 19

•

8 years ago

The patch in bug 1322024 doesn't fix the "not responding" problem. It seems like the window server connection is probably created before the sandbox is initialized. I set a breakpoint in _RegisterApplication and saw it being called from SetProcessName [1]. When I commented out that call, I saw it in appshell initialization [2]. I was afraid to comment out that thing since I don't understand what it does. However, I did find a Chromium patch [3] that shuts down the connection to the window server and prevents all future connections. I tried that and it seemed to work. I've got a try push pending to see how that goes [4]. I'm assuming that would subsume the change to the sandboxing rules, but I don't really know. None of this stuff seems to be documented. [1] http://searchfox.org/mozilla-central/rev/765cc1b925c5d32d05111c364257a0b79bf2952e/dom/plugins/ipc/PluginUtilsOSX.mm#276 [2] http://searchfox.org/mozilla-central/rev/765cc1b925c5d32d05111c364257a0b79bf2952e/widget/cocoa/nsAppShell.mm#287 [3] https://codereview.chromium.org/673443002/patch/20001/30001 [4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b4356599ff4d04d3233fad5ca7701913963173c

Haik Aftandilian [:haik]

Comment 20

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #19) > However, I did find a Chromium patch [3] that shuts down the connection to > the window server and prevents all future connections. I tried that and it > seemed to work. I've got a try push pending to see how that goes [4]. I'm > assuming that would subsume the change to the sandboxing rules, but I don't > really know. None of this stuff seems to be documented. We'd still want to remove the allowance for windowserver.active from the sandbox rules because that blocks access at the kernel level. I don't know exactly how CGSSetDenyWindowServerConnections works, but from the disassembly it appears to set a process-local flag which could be easily subverted. It's a shame to have to use private API's, but it seems like a reasonable thing to do in this situation. I'm not sure it's worth it, but if we wanted to avoid the private API's, we could check if _RegisterApplication still works properly after the sandbox has initialized. If it does, we might be able to refactor our ContentChild setup code so that _RegisterApplication is called after we've blocked connections to the window server in the sandbox and then we wouldn't need to use the private API calls. Chromium intentionally uses the windowserver in their "sandbox warmup code" before enabling the sandbox. Since this is all undocumented behavior, I'm not sure we'd be gaining much though. And explicitly closing the windowserver.active is a good thing to do in case it is ever opened as a side-effect of a different Apple API call / in a new OS version. -- I think that not using a native event loop will resolve bug 1306663. That's an example of messages coming in on the native event loop trying to register an IPC service in plugin-container. (i.e. an IPC server in plugin-container). We disable registering IPC services in the sandbox rules, but we still get the event loop messages which attempt to do it. And they trigger error messages on the command line console and in the Console app.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 21

•

8 years ago

What do you think, Markus? I think our alternatives are: 1. Use CGSShutdownServerConnections when we shut down the sandbox. 2. Avoid setting the process name and also figure out how to fix up nsAppShell::Init so it doesn't use the window server. And maybe fix up anything else that contacts the window server after that.

Flags: needinfo?(mstange)

Markus Stange [:mstange]

Comment 22

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #19) > I set a breakpoint in _RegisterApplication and saw it being > called from SetProcessName [1]. Ok, this one matches what the Chrome people encountered in their bug. > When I commented out that call, I saw it in > appshell initialization [2]. I was afraid to comment out that thing since I > don't understand what it does. This initializes the process-wide NSApplication instance. NSApplication manages the event loop and any open windows. Since we don't need either of those in the content process, ideally we wouldn't ever need to create NSApplication. I'd prefer not using anything of the Mac nsAppShell in the content process, instead of initializing some of it and then never calling back into its code. But I don't know how much work that would be. > However, I did find a Chromium patch [3] that shuts down the connection to > the window server and prevents all future connections. I tried that and it > seemed to work. Interesting. So maybe CGSShutdownServerConnections() effectively tells the window server to stop monitoring this process for responsiveness. (In reply to Bill McCloskey (:billm) from comment #21) > What do you think, Markus? I think our alternatives are: > 1. Use CGSShutdownServerConnections when we shut down the sandbox. > 2. Avoid setting the process name and also figure out how to fix up > nsAppShell::Init so it doesn't use the window server. And maybe fix up > anything else that contacts the window server after that. 2 certainly seems like the more prudent thing to do, but 1 is so much simpler. And since Chrome is using that API, they must have determined that it's necessary. So if we have to use it anyway, we might as well use it to save us some work. Let's go with 1.

Flags: needinfo?(mstange)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 23

•

8 years ago

Attached patch don't register thread observer in child process, v3 — Details — Splinter Review

Attachment #8893563 - Attachment is obsolete: true

Attachment #8893563 - Flags: review?(mstange)

Attachment #8895537 - Flags: review?(mstange)

Markus Stange [:mstange]

Comment 24

•

8 years ago

Comment on attachment 8895537 [details] [diff] [review] don't register thread observer in child process, v3 Review of attachment 8895537 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/ipc/ContentChild.cpp @@ +1480,5 @@ > + // future connections as well. We do this for sandboxing, but it also ensures > + // that the Activity Monitor will not label the content process as "Not > + // responding" because it's not running a native event loop. See bug 1384336. > + CGSSetDenyWindowServerConnections(true); > + CGSShutdownServerConnections(); So this also runs if dom.ipc.useNativeEventProcessing.content is true? Does that work?

Attachment #8895537 - Flags: review?(mstange) → review+

Pulsebot

Comment 25

•

8 years ago

Pushed by wmccloskey@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/4786ec6700d0 Fix to MainThreadInvoker to avoid deadlocks (r=aklotz) https://hg.mozilla.org/integration/mozilla-inbound/rev/01fd73e4b8b8 Stop using OS-level event loop in content process (r=mstange)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 26

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/4786ec6700d0 https://hg.mozilla.org/mozilla-central/rev/01fd73e4b8b8

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Ionuț Goldan [:igoldan]

Updated

•

8 years ago

Depends on: 1389498

roxleitan

Comment 28

•

7 years ago

According to the kick-off meeting with Andrew, stage 2 from https://docs.google.com/document/d/1KYuJu23tzweO4WOceF5Fhle6SQ-PXnfV6NfrGf4kiBM is to be tested by Softvision - to which bug 1384336 refers to. Our current understanding is that we need to cover testing Firefox with third-party software (accessibility, antivirus). So, we are planning to test as follows: Environment: - Windows : Win 7 x32, Win 7x 64, Win 8.1 x32, Win 8.1 x64, Win 10 x32, Win 10 x64 - Linux : Ubuntu 16.04 x64 , Ubuntu 15.04 x32 - Mac OS X 10.12 Test Scenarios: running FF in the above environments with: -Mac+ Ubuntu - native firewalls, native screen readers -Windows: KAV 10.3.0, NVDA Andrew, Bill could you list that pref. that is mentioned in comment 11 if that's still applicable? Could you also please advise if there are any variations to the above scenarios or any other scenarios or setups that would be relevant to test against? Is there anything related to content processes that would need to be tested during this test run?

Flags: needinfo?(overholt)

Flags: needinfo?(WJMII)

Andrew Overholt [:overholt]

Comment 29

•

7 years ago

I'll let Bill answer.

Flags: needinfo?(wmccloskey)

Flags: needinfo?(overholt)

Flags: needinfo?(WJMII)

Alice0775 White

Updated

•

7 years ago

Depends on: 1395187

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 30

•

7 years ago

The pref is set by default, so it's fine to test a normal Nightly here. The testing setup sounds good, although the more AV/accessibility software we can test with, the better. I think that skipping Windows 8 should be fine. Probably no testing on Linux is needed at all.

Flags: needinfo?(wmccloskey)

Adam Gashlin (he/him) [:agashlin] (ex-moco)

Updated

•

7 years ago

Depends on: 1396984

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Updated

•

7 years ago

Comment 31

•

7 years ago

I'm going to be disabling this for 57 in bug 1384336. I'll try to sort out these regressions and hopefully we can re-enable in 58.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 32

•

7 years ago

(In reply to Bill McCloskey (:billm) from comment #31) > I'm going to be disabling this for 57 in bug 1384336. I'll try to sort out > these regressions and hopefully we can re-enable in 58. That should have been bug 1397956.

Bob Owen (:bobowen)

Updated

•

7 years ago

Blocks: 1423628

Hiroyuki Ikezoe (:hiro)

Updated

•

6 years ago

Depends on: 1486971

Markus Stange [:mstange]

Comment 33

•

6 years ago

This got re-enabled in bug 1426100.

Hiroyuki Ikezoe (:hiro)

Updated

•

6 years ago

Depends on: 1492284

You need to log in before you can comment on or make changes to this bug.