Closed Bug 1467758 Opened 6 years ago Closed 3 years ago

WindowServer connection never being terminated

Categories

(Core :: Security: Process Sandboxing, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
95 Branch
Performance Impact ?
Tracking Status
firefox95 --- fixed

People

(Reporter: alex.plaskett, Assigned: haik)

References

Details

(Keywords: sec-want, Whiteboard: [adv-main95-])

Attachments

(1 file)

The plugin-container content processes will never terminate the connection to the WindowServer using CGSSetDenyWindowServerConnections and CGSShutdownServerConnections. XRE_UseNativeEventProcessing will always return true as the default preference dom.ipc.useNativeEventProcessing.content is set to true by default on macOS (https://searchfox.org/mozilla-central/source/modules/libpref/init/all.js#3218). We can also confirm this under lldb, attached to a plugin content process by printing the value of deny_future_windowserver_connections, which will always be 0. It is recommended this is reviewed. Annotated code as follows: https://searchfox.org/mozilla-central/source/dom/ipc/ContentChild.cpp#1612 static bool StartMacOSContentSandbox() { int sandboxLevel = GetEffectiveContentSandboxLevel(); if (sandboxLevel < 1) { return false; } if (!XRE_UseNativeEventProcessing()) { // If we've opened a connection to the window server, shut it down now. Forbid // future connections as well. We do this for sandboxing, but it also ensures // that the Activity Monitor will not label the content process as "Not // responding" because it's not running a native event loop. See bug 1384336. CGSSetDenyWindowServerConnections(true); CGSShutdownServerConnections(); } bool XRE_UseNativeEventProcessing() { if (XRE_IsContentProcess()) { // In the case of the content process static bool sInited = false; static bool sUseNativeEventProcessing = false; if (!sInited) { Preferences::AddBoolVarCache(&sUseNativeEventProcessing, "dom.ipc.useNativeEventProcessing.content"); // This pulls back existing true cache value if its not already set. sInited = true; } return sUseNativeEventProcessing; // Will return true here as well based on default pref value. } return true; } ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// // Adds the aVariable to cache table. |aVariable| must be a pointer for a // static variable. The value will be modified when the pref value is changed // but note that even if you modified it, the value isn't assigned to the // pref. static nsresult AddBoolVarCache(bool* aVariable, const char* aPref, bool aDefault = false, bool aSkipAssignment = false); /* static */ nsresult Preferences::AddBoolVarCache(bool* aCache, const char* aPref, bool aDefault, bool aSkipAssignment) { AssertNotAlreadyCached("bool", aPref, aCache); if (!aSkipAssignment) { // aSkipAssignment is false by default *aCache = GetBool(aPref, aDefault); // GetBool retrieves true and sets sUseNativeEventProcessing to true. } CacheData* data = new CacheData(); data->mCacheLocation = aCache; data->mDefaultValueBool = aDefault; CacheDataAppendElement(data); Preferences::RegisterCallback(BoolVarChanged, aPref, data, Preferences::ExactMatch, /* isPriority */ true); return NS_OK; } //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
See Also: → 1397956
Group: core-security → dom-core-security
See Also: → 1468163
See Also: → 1468693
Depends on: 1426100
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
Group: dom-core-security
Keywords: sec-want
To summarize the results of the research in bug 1426100, and the state of this bug: Originally we thought the blocker to disconnecting the windowserver was removing native event processing (which needed some performance work to be landable). However, on implementing that and digging in, I found that windowserver connections were still hanging around. Disassembly of the macOS system libraries showed that we (along with every other browser) were misusing the (undocumented, private) disconnect-windowserver APIs. Correcting our usage of them produced (a) a browser with no windowserver connections, (b) a browser where OpenGL didn't work. Digging into the OpenGL code I found that many of these APIs require a windowserver connection. Therefore disconnecting from the windowserver will require WebGL's OpenGL interactions to be remoted. Therefore the route to go is (a) land the removal of native event processing without touching the windowserver code (this is what the latest patch on bug 1426100 does), (b) once the WebGL remoting is completed, turn off the windowserver connection. (Both Chromium and WebKit were notified that they had the same bug as we did with disconnecting from the windowserver.)
Depends on: 1504000
Depends on: 1524722
See Also: → 1524722
Depends on: 1529390
No longer depends on: 1529390

This may also depend on non-native for Mac. On 10.14, without the WindowServer, some HTML form elements such as radio buttons are not rendered. More debugging needed.

Depends on: 1381938

:handyman, do you think you could weigh in on the WebGL bits here? Without reproducing Alex's research - what bugs would you guess are blockers for this, to give us an idea of when we can re-evaluate the situation here?

Flags: needinfo?(davidp99)

For testing, setting the pref security.sandbox.content.mac.disconnect-windowserver=true (which doesn't exist by default) will change the sandbox rules to not allow the WindowServer connection.

Bug tracking for this is not great ATM but some useful bugs:

Bug 1621762 - Add IpdlQueue option for remote WebGL
Current WIP that adds functionality for Linux/Mac WebGL remoting. This will land soon.

Bug 1624726 - Refactor of WebGL remoting classes
Just some code cleanup.

Bug 1607940 - WebGL out-of-process prototype
The bug for landing webgl remoting turned on. ATM, this is only planned for Windows as there may well be bad performance regressions on other platforms (but that is not certain). If regressions aren't too bad then we will use this to enable all platforms. There are a few pieces of this that are still somewhat unsettled (I think just related to textures and VR) but otherwise the plan for this is concrete. We can make this the dependency for now.

Depends on: 1607940
Flags: needinfo?(davidp99)

I should clarify that bug 1621762 adds some behavior needed for Linux and Mac but won't be a complete working implementation that people can play with.

Depends on: 1642621

Drop the window server connection from the content process sandbox when out-of-process WebGL is enabled.

Assignee: nobody → haftandilian
Status: NEW → ASSIGNED
Pushed by haftandilian@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5aaf8123b97a WindowServer connection never being terminated r=spohl
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 95 Branch
See Also: → 1540776

(In reply to Pulsebot from comment #9)

Pushed by haftandilian@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5aaf8123b97a
WindowServer connection never being terminated r=spohl

== Change summary for alert #32053 (as of Fri, 22 Oct 2021 11:58:42 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
74% cpstartup content-process-startup macosx1015-64-shippable-qr e10s fission stylo webrender 420.58 -> 109.33
73% cpstartup content-process-startup macosx1015-64-shippable-qr e10s stylo webrender 399.29 -> 108.92
72% cpstartup content-process-startup macosx1015-64-shippable-qr e10s stylo webrender 386.08 -> 109.92
70% cpstartup content-process-startup macosx1015-64-shippable-qr e10s fission stylo webrender 369.21 -> 112.00
70% cpstartup content-process-startup macosx1015-64-shippable-qr e10s stylo webrender-sw 385.96 -> 117.25
... ... ... ... ...
7% perf_reftest_singletons coalesce-2.html macosx1014-64-shippable-qr e10s stylo webrender 205.19 -> 190.66

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=32053

Whiteboard: [qf]

On a MacBook Pro 13-inch, M1, 2020, the change in cpstartup with the profiler running is around a 30% reduction (from 70ms to 48ms) from some quick measurements.

One noticeable difference in the profiles on the 2020 M1 with/without the fix is that before removing WindowServer access, the call to mozilla::ipc::SetThisProcessName() ends up in macOS SkyLight function SLSMainConnectionID() presumably getting the WindowServer connection info which takes ~20ms. This gets skipped with this change and that amounts to most of the difference on this faster machine. (Faster when compared to the CI test machines.)

Depends on: 1737998
No longer depends on: 1737998
Blocks: 1740119
Whiteboard: [qf] → [qf][adv-main95-]
Performance Impact: --- → ?
Whiteboard: [qf][adv-main95-] → [adv-main95-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: