Open Bug 1635451 Opened 7 months ago Updated 14 days ago

Limit number of client connections to X for fission to avoid xclient exhaustion (fission)

Categories

(Core :: Widget: Gtk, defect, P2)

Unspecified
Linux
defect

Tracking

()

Fission Milestone M7

People

(Reporter: jesup, Unassigned)

References

Details

(Whiteboard: [fission])

Attachments

(1 file)

In fission, we may have a large number of content processes. In linux, each currently opens 2 xclient connections. Xorg code defaults to a max number of clients from all programs of 256, though it can be manually increased to either 512 or 2048. Running fission with ~100 content processes, I've run out of xclient connections multiple times.

See Bug 1621762, bug 788319, bug 1535761 and bug 1129492

See also bug 1129492 comment #21 — if there aren't other regressions, we might be able to connect lazily to X if and when WebGL is used, or at least remove one of the two connections per process. That would require non-native widgets but it wouldn't block on WebGL remoting.

From #ipc:
"So it turns out we actually connect to the X server twice per content process — once from gtk_init, once from gfxPlatform::Init.
Which is odd, because I didn't see them separately in xlsclients, but I can replay in rr and see it open both sockets and never close them."

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #1)

See also bug 1129492 comment #21 — if there aren't other regressions, we might be able to connect lazily to X if and when WebGL is used, or at least remove one of the two connections per process. That would require non-native widgets but it wouldn't block on WebGL remoting.

I should add that this is all theoretical; there might turn out to be reasons why we'd need both X connections just for WebGL, and the idea of lazily initializing the X client is even more speculative.

Bug 1195359 appears to be the source of the two X connections; I don't know how much the original rationale there still applies.

Blocks: 1636493

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #24)

There are two problems here:

  • Remove X11 connections from content processes in all cases, for security, which is bug 1129492
  • Avoid X11 connections in the common case, to avoid exhausting resource limits, which is bug 1635451

Based on Jed's bug 1129492 comment 24, we should try to fix this bug (avoiding X11 in the common case) before Fission Nightly. Tracking for Fission Nightly milestone M6c.

Fission Milestone: --- → M6c

https://bugzilla.mozilla.org/show_bug.cgi?id=1617078#c17 indicates that the X connections limit is also seen with running ~100 copies of reftest layout/reftests/scrolling/frame-scrolling-attr-1.html

See Also: → 1617078
Blocks: 1629093
Priority: -- → P2

There has been a change to our docker images to work around this issue ignoring the underlying cause, see bug 1638183.

See Also: → 1638183

Hi Neha, Martin, this is spiking up on bug 1629093. If I understand correctly, someone should look into this here in this bug?

Flags: needinfo?(stransky)
Flags: needinfo?(nkochar)

This is a duplicate of Bug 1129492 where the work is tracked.

Flags: needinfo?(stransky)
Depends on: 1657315

Some thoughts how to limit the number of X11 connections:

  • load/cache GTK theme info - sizes/borders/etc and the close the X11 connection. Unfortunately almost every page contain scrollbar which is painted in content process. Can we re-connect to X server just for the widget repaint?
  • To use Gtk styles without X11 connection. That needs gsettings which needs X11 display. Not sure if we can fiddle with that somehow.
  • Use a fake X11 connection (like Xvfb) for the offscreen drawing.

(In reply to Martin Stránský [:stransky] from comment #10)

Some thoughts how to limit the number of X11 connections:

  • load/cache GTK theme info - sizes/borders/etc and the close the X11 connection. Unfortunately almost every page contain scrollbar which is painted in content process. Can we re-connect to X server just for the widget repaint?
  • To use Gtk styles without X11 connection. That needs gsettings which needs X11 display. Not sure if we can fiddle with that somehow.
  • Use a fake X11 connection (like Xvfb) for the offscreen drawing.

Aren't we already drawing widgets offscreen and compositing them ourselves? I remember talking to someone about that, probably karlt, in mid-2017, and using a debugger to break on writing to the X11 socket seemed to confirm it.

But there are problems with loading GTK at all: the memory usage mentioned in bug 1470983 is the main one, but also we occasionally get sandbox crash reports where GTK theme loading was trying to use DBus or something else like that instead of just reading files from a well-known location.

If GTK3 themes were close enough to standard CSS, and if it were feasible to implement these widgets in HTML itself via shadow DOM or similar, then there would be a relatively nice solution to the problem of transporting the theme between processes and rendering it without any external dependencies, but I don't know if either of those preconditions is true.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #11)

Aren't we already drawing widgets offscreen and compositing them ourselves? I remember talking to someone about that, probably karlt, in mid-2017, and using a debugger to break on writing to the X11 socket seemed to confirm it.

Yes, you're right. We use GtkStyleContext to draw the widgets to cairo surfaces. And on recent enough Gtk systems we use styles created from CSS styles: https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/widget/gtk/WidgetStyleCache.cpp#1356

The problem is:

  • The style can't be created without display connection / gsettings as Gtk itself is not designed to run without active display connection (Wayland/X11). If something works it may be just partial functionality and it's not guaranteed.
  • Some widgets can't be created from CSS and we use actual widgets to get the style.

But there are problems with loading GTK at all: the memory usage mentioned in bug 1470983 is the main one, but also we occasionally get sandbox crash reports where GTK theme loading was trying to use DBus or something else like that instead of just reading files from a well-known location.

That's interesting. I think the DBus can be avoided, I'm not aware of any styling over DBus or so. I suspect the DBus connection comes from unrelated parts, I'm interesting to hear more about it.

If GTK3 themes were close enough to standard CSS, and if it were feasible to implement these widgets in HTML itself via shadow DOM or similar, then there would be a relatively nice solution to the problem of transporting the theme between processes and rendering it without any external dependencies, but I don't know if either of those preconditions is true.

I'm afraid we need to take Gtk as-is unless we want to migrate to Gtk4.

I think the easiest solution may be to fake the X11 connection to make Gtk3 happy or close display connection after the style initialization (if that's even possible).

Comment 13 would make impossible to handle theme changes and such, right? We use some gtk function calls to actually paint the widget backgrounds and borders and such which may not work if you close the display connection like this right?

Does something like extracting all the geometry+style data we need from the parent process and forward it, then use Gecko instead of gtk to paint with that style information, sound doable?

So, like, make a Gecko re-implementation of gtk_render_background / gtk_render_frame, etc? I think making a WebRender-based one should be relatively straight-forward (if a bit of code). Not sure how easy would it be to do a skia-based one though, hopefully not too terrible.

Just for curiosity I asked the WebKitGTK folks, and they have apparently given up on rendering native form controls and scrollbars for both GTK3 and GTK4: https://bugs.webkit.org/show_bug.cgi?id=208129. They told me there's a switch for scrollbars in GTK3 but that it'll go away.

Another thing that we use GtkStyleContext for but wouldn't be solved by the non-native theme (at least as-is) is the selection colors and other system colors.

We could probably forward them from the parent process and such (we have the LookAndFeel caches to do this kind of thing).

(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)

Comment 13 would make impossible to handle theme changes and such, right?

I suppose so.

We use some gtk function calls to actually paint the widget backgrounds and borders and such which may not work if you close the display connection like this right?

The painting is done via cairo to cairo context we create which is an offscreen image. I don't think that needs an active display connection. But It may depend on actual screen configuration, like desktop scale for instance.

A bit crazy idea may be to bundle gtk3 library, stripped of unrelated code and fixed to work offscreen, which may be used just for the widget/style rendering. That means to create a fake display connection and some stock screen/settings. That library may still load themes from /usr/share and use it.

Just for curiosity I asked the WebKitGTK folks, and they have apparently given up on rendering native form controls and scrollbars for both GTK3 and GTK4: https://bugs.webkit.org/show_bug.cgi?id=208129. They told me there's a switch for scrollbars in GTK3 but that it'll go away.

Just so everyone is on the same page, we are planning to do this for Firefox as well:
https://bugzilla.mozilla.org/show_bug.cgi?id=1535761
https://bugzilla.mozilla.org/show_bug.cgi?id=1411425

The underlying driver isn't directly this bug (although it helps fix it as a side effect so that's nice...), but rather that drawing native controls means the content process ends up accessing (native) graphics APIs (with all their dependencies) which ends up making sandboxing more complicated and less secure, because we have no control over what those libraries are and end up doing (which tends to be an even larger problem on Linux due to ecosystem diversity). Being able to draw them ourselves with the same appearance as a native one would be a nice to have, but the aforementioned upsides mean that a non-entirely native look wouldn't be a blocker - especially as unstyled form controls aren't that common.

(In reply to Martin Stránský [:stransky] from comment #14)

As for the style creation, we're blocked here:
https://gitlab.gnome.org/GNOME/gtk/blob/3.24.20/gtk/gtkstylecontext.c#L345

Thanks for identifying that. A GdkScreen is required for settings. It need not be a GdkX11Screen, but there is no existing headless backend. Perhaps the broadway backend might be adaptable, but I assume we can't depend on that being installed.

We can create a plain GdkScreen, but it's an abstract class really. Filling in the necessary parts for an implementation, such as screen settings from ipc, requires assumptions about GDK internals. That would make this approach risky, but maybe it's an option if there will be no more GDK 3 development?

Attachment #9179809 - Attachment mime type: text/x-c++src → text/plain

To be clear, much of this discussion is about the possibility of an interim solution until bug 1411425 is fixed.

Would it be feasible to have Fission use a new process only when under some limit (say 100 processes), or is the code starting to make assumptions that same process means same site? I'm wondering whether 100 would be sufficient for the vast majority of users at least.

As a means for delaying reaching the limit, lazy GdkDisplay creation may be possible for when native widgets are drawn or sized, which is at least usually on-demand. We'd still need to send at least some of the LookAndFeel info from the parent (bug 1470983 and/or bug 1503054) because some of this is not on-demand. I don't have a good feel for how many cross-site iframes would use native widgets, but I guess that would be a smaller number. Lazy GdkDisplay creation would help only if processes are being started and stopped often. If processes are pooled for re-use, then they'll all end up having a connection.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)

Does something like extracting all the geometry+style data we need from the parent process and forward it, then use Gecko instead of gtk to paint with that style information, sound doable?

So, like, make a Gecko re-implementation of gtk_render_background / gtk_render_frame, etc?

I expect this is doable, but a large chunk of work. There would be a large number different combinations of widgets and states and properties to serialize, and there is also some likelihood that different GTK versions will have different implementations of the render functions. Any interim solution would kind of be wasted work, given the plan is to remove it, so I'd prefer a solution with less work.

See Also: → 1626536

Would it be feasible to have Fission use a new process only when under some limit (say 100 processes), or is the code starting to make assumptions that same process means same site? I'm wondering whether 100 would be sufficient for the vast majority of users at least.

I am told that, unlike e10s, Fission's process architecture does not allow content processes to be reused for multiple sites. So capping the number of Fission content processes as an interim solution is not feasible.

We will need to wait for non-native theming (bug 1411425) and the other work gcp listed in bug 1129492 comment 29.

(In reply to Karl Tomlinson (:karlt) from comment #21)

We can create a plain GdkScreen, but it's an abstract class really. Filling in the necessary parts for an implementation, such as screen settings from ipc, requires assumptions about GDK internals. That would make this approach risky, but maybe it's an option if there will be no more GDK 3 development?

Yes, there's no development on Gtk3, all Gtk devs are focused to Gtk4 and Gtk3 is in maintenance mode.

Fission Milestone: M6c → M7
You need to log in before you can comment on or make changes to this bug.