consider sharing UA style sheets across processes

ASSIGNED
Assigned to

Status

()

P3
normal
ASSIGNED
9 months ago
4 days ago

People

(Reporter: heycam, Assigned: heycam)

Tracking

(Blocks: 3 bugs)

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Fission Milestone:M2)

Details

(Whiteboard: [overhead:400K][layout:p1])

Attachments

(21 attachments)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
(Assignee)

Description

9 months ago
I don't know if this is feasible yet.  Maybe not.

But we seem to use about 300 KiB to store all of the UA style sheets in a freshly opened content process that just loaded about:blank, and another 160 KiB for the cached UA sheet cascade data.  Can we share this data between processes in shared memory somehow?

The contents of UA style sheets shouldn't change; even when shown in devtools, you shouldn't be able to edit the rules.  But devtools does need to be able to look up those rules.  It would be tricky to handle the reference counting across processes.

The cached UA cascade data has referenced counted objects in it too, like selectors.  And there are atoms in there, which obviously wouldn't be pointer identical to ones generated in the content process.

Maybe there's a different way we can represent this data in memory so it's suitable for use across processes?
(Assignee)

Comment 1

9 months ago
I think it's going to be hard for the cascade data.  But maybe do-able for the style sheets themselves.  Since we should know at build time what the UA style sheets contain, we can ensure that we only ever need to have static atom references.

If we kept the same representation we would need to avoid actually bumping reference counts on the objects in the style sheet, since that would end up doing a copy-on-write.

Or we could have a completely separate representation -- maybe some kind of compact bytecode-like format -- that we generate at build time.  But it would be a pain to maintain both that representation and the regular Rust object representation.
(In reply to Cameron McCormack (:heycam) (busy until July 12) from comment #1)
> I think it's going to be hard for the cascade data.  But maybe do-able for
> the style sheets themselves.  Since we should know at build time what the UA
> style sheets contain, we can ensure that we only ever need to have static
> atom references.
> 
> If we kept the same representation we would need to avoid actually bumping
> reference counts on the objects in the style sheet, since that would end up
> doing a copy-on-write.

I'm not sure how the copy-on-write bit would work, since the child processes aren't forked from the parent process (i.e. e10s!=NUWA). So you can't share anything with pointers across the processes (since the virtual space doesn't line up), which makes this pretty hard.

> Or we could have a completely separate representation -- maybe some kind of
> compact bytecode-like format -- that we generate at build time.  But it
> would be a pain to maintain both that representation and the regular Rust
> object representation.

Yeah that seems like a huge amount of work, and not obviously tractable.

I wonder if we could get somewhere by partitioning up the UA sheets and lazily-loading them based on certain conditions. For example, we could probably avoid loading the mathml and svg sheets until an element from that namespace was inserted into the document. counterstyles.css seems like we could trim it down and load the infrequently-used parts lazily. There's a lot of stuff in forms.css for a few infrequently-used form controls like number and range, perhaps we could load those on-demand as well?
(Assignee)

Comment 3

9 months ago
We've actually been moving away from on-demand loading recently, because the more granular you get about which UA style sheets are loaded in a given document, the more chance you have of having many entries in the UA sheet cascade cache (for different sets of sheets).  Though maybe in a post-Fission world there will be fewer entries in that cache anyway just because we have fewer documents loaded in a given process.
(Assignee)

Comment 4

9 months ago
(In reply to Bobby Holley (:bholley) from comment #2)
> > Or we could have a completely separate representation -- maybe some kind of
> > compact bytecode-like format -- that we generate at build time.  But it
> > would be a pain to maintain both that representation and the regular Rust
> > object representation.
> 
> Yeah that seems like a huge amount of work, and not obviously tractable.

And I agree with this, FWIW, although it might be possible to leverage serde to avoid writing a whole bunch of code to handle this representation.
I discussed with heycam during the work week and I suggested another possible route to go.

My suggestion was that, we can probably migrate majority of the UA sheets into (maybe generated) Rust code, and probably have a special routine invoked during selector matching and an additional (non-Arced) style source variant for rule tree.

This is feasible because as far as I can see, majority of rules in UA sheets only have a single compound selector (i.e. no combinator), and many of them are even just single simple selector, which makes it very easy to do the matching in code without going through the general matching mechanism.

Additionally, doing so may allow us to remove several internal pseudo-classes which were added purely for performance reason and/or enabling special matching.

I suppose that putting data into code is probably the safest way to share something between processes.
And for combinators used in UA sheets, probably all of them are child combinator, and they should usually take happy path when the last element matches. Child combinator it probably not very hard to support in code as well.
Oh, well, we have some descendant combinators... which is probably not to bad either :)
(Assignee)

Comment 8

9 months ago
Here's the style sheet sizes for the sheets loaded in a fresh content process that only loaded about:blank:

128312 resource://gre-resources/forms.css
104912 resource://gre-resources/html.css
50216 resource://gre-resources/quirk.css
39288 resource://gre-resources/ua.css
25944 resource://gre-resources/mathml.css
14704 chrome://global/content/minimal-xul.css
8592 resource://gre/res/svg.css
5472 chrome://global/skin/scrollbars.css
2096 resource://gre-resources/counterstyles.css
1584 about:PreferenceStyleSheet
400 resource://gre-resources/noscript.css
(Assignee)

Comment 9

9 months ago
Slightly different results with bug 1475191 patch applied:

128472 resource://gre-resources/forms.css
118880 resource://gre-resources/html.css
50504 resource://gre-resources/quirk.css
39416 resource://gre-resources/ua.css
26456 resource://gre-resources/mathml.css
14864 chrome://global/content/minimal-xul.css
8816 resource://gre/res/svg.css
5504 chrome://global/skin/scrollbars.css
2096 resource://gre-resources/counterstyles.css
1648 about:PreferenceStyleSheet
400 resource://gre-resources/noscript.css
(Assignee)

Updated

9 months ago
Depends on: 1475197
It does seem like we could forms.css is a big fish here. A lot of that is the stuff for range, number, and date, so we could consider being more lazy with those, but that has the downsides mentioned in comment 3.

I think Xidorn's idea about baking this stuff into the binary is a really good one, and worth investigating more.
(Assignee)

Comment 11

9 months ago
I guess that should be fine, although it does mean a step backwards in maintainability of the UA sheets.  A couple of things to be mindful of:

* handling invalidation correctly, if we're hard coding any rules that have dependencies on attribute values and element state

* ensuring we walk these rules at the right time, in case the relative order of rules (within a given sheet or between other UA sheets) matters for the cascade order


Xidorn mentions doing this just for rules that are sequences of simple selectors (i.e. have no combinators), but I wonder if it would be that much harder to support any arbitrary selectors, so long as we only reference static atoms.
Given the numbers from comment 9, roughly guessing 500K of overhead / content process.
Whiteboard: [overhead:500K]
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #12)
> Given the numbers from comment 9, roughly guessing 500K of overhead /
> content process.

Well, it's really 397k if you add up the numbers. Is the issue that we need to round up or down, or should we put 400k instead?
Flags: needinfo?(erahm)
(In reply to Bobby Holley (:bholley) from comment #13)
> (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment
> #12)
> > Given the numbers from comment 9, roughly guessing 500K of overhead /
> > content process.
> 
> Well, it's really 397k if you add up the numbers. Is the issue that we need
> to round up or down, or should we put 400k instead?

I wasn't sure if we're also addressing the cached sheets (comment #0) in this bug as well. Maybe we should split that out?

Either way we're just trying to track level of impact with the [overhead] tag, rounding is okay :)
Flags: needinfo?(erahm)
Whiteboard: [overhead:500K] → [overhead:400K]
(In reply to Cameron McCormack (:heycam) from comment #11)
> I guess that should be fine, although it does mean a step backwards in
> maintainability of the UA sheets.

It might be a step backwards to some extent, but it has an advantage that the values are guaranteed to be valid at compile time. (We don't really have approach to verify ua sheets we ship, especially the ua-only stuffs at the moment.)

Also if we, at some point, convert all ua styles into code, we can remove the UA-only checks as well as parsing functions of some internal properties.

> * handling invalidation correctly, if we're hard coding any rules that have
> dependencies on attribute values and element state
> 
> * ensuring we walk these rules at the right time, in case the relative order
> of rules (within a given sheet or between other UA sheets) matters for the
> cascade order
> 
> Xidorn mentions doing this just for rules that are sequences of simple
> selectors (i.e. have no combinators), but I wonder if it would be that much
> harder to support any arbitrary selectors, so long as we only reference
> static atoms.

The things you listed above are exactly the reason that I think we can start with compound selectors (i.e. no combinator). If everything can be check against just a single element (without its context), there should be fewer cases which we would make mistake, I guess...
(Assignee)

Comment 16

8 months ago
(In reply to Eric Rahm [:erahm] from comment #14)
> I wasn't sure if we're also addressing the cached sheets (comment #0) in
> this bug as well. Maybe we should split that out?

Split that out to bug 1481975.
Assignee: nobody → cam
Status: NEW → ASSIGNED
Summary: consider sharing UA style sheets and/or cascade data across processes → consider sharing UA style sheets across processes
(Assignee)

Updated

8 months ago
Depends on: 1482782
Note that compiling the UA sheets into code is going to make artifact builders unhappy, so there's going to need to be some sort of fallback case that doesn't use the compiled-in sheets, or uses the actual sheets if they're "newer", or something.
Why would that make artifact builders unhappy? The sheets would go into the binary directly, and it would no longer be .css files changeable for artifact builds, which should be fine since those are UA sheets and frontend people unlikely want to change them.
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #18)
> Why would that make artifact builders unhappy?

I don't understand why this is a question when:

> The sheets would go into the
> binary directly, and it would no longer be .css files changeable for
> artifact builds

really provides the answer.

> which should be fine since those are UA sheets and frontend
> people unlikely want to change them.

I dunno, if I'm a frontend person and I want to iterate on sheet design, for whatever reason, and I now have to actually compile a binary every single time, I feel like that would make me quite unhappy, no matter how infrequent it is.  How often do these sheets get modified, anyway?  Do we have any kind of numbers on that?
(In reply to Nathan Froyd [:froydnj] from comment #19)
> I dunno, if I'm a frontend person and I want to iterate on sheet design, for
> whatever reason, and I now have to actually compile a binary every single
> time, I feel like that would make me quite unhappy, no matter how infrequent
> it is.  How often do these sheets get modified, anyway?  Do we have any kind
> of numbers on that?

The point I think Xidorn was making is that most front-end CSS aren't UA sheets. UA sheets are generally defined in the HTML spec, and is something mostly the Gecko people touch.
Aha, #eggonface.

My misunderstanding, then, carry on! :)
(Assignee)

Updated

7 months ago
Depends on: 1483121
I spoke a bit with Cameron about this, and I think depending on how complex this ends up looking like it may or may not end up paying off, assuming we can do more targeted reductions for all sheets.

For example, Facebook's login page and nytimes.com all load more than 5MB of stylesheets here (lol). It may be worth to see what the average origin's sheet sizes are, but if we could reduce, let's say, 10% of the memory than the current stylesheets or cascade data use, it will end up reducing way more memory than this.

I may take a look at some these if I have some free time, filed bug 1484563 to track stuff.
(In reply to Emilio Cobos Álvarez (:emilio) from comment #22)
> I spoke a bit with Cameron about this, and I think depending on how complex
> this ends up looking like it may or may not end up paying off, assuming we
> can do more targeted reductions for all sheets.
> 
> For example, Facebook's login page and nytimes.com all load more than 5MB of
> stylesheets here (lol). It may be worth to see what the average origin's
> sheet sizes are, but if we could reduce, let's say, 10% of the memory than
> the current stylesheets or cascade data use, it will end up reducing way
> more memory than this.

If there are ways we can generally reduce stylesheet memory usage across the board, that's great - but we also need to get our fixed per-process overhead down. There are lots of simple sites that currently don't consume a lot of memory to load, but will start to do so when we enable fission. Given that the UA sheets consume ~400k per process, encoding the UA sheets into the text segment seems very much worth doing.
If we can handle this sharing in a way which will allow it to use it for stylesheets added by extensions, it will probably substantially add to the savings. Ad blockers are notorious for injecting huge stylesheets into every site, and lots of other add-ons tend to inject huge UI stylesheets on the off chance that they may eventually need them.

That said, even if we can't manage that, just sharing the UA stylesheets is almost certainly worth doing even if web sites in general are a bigger problem. The same sort of argument goes areas like JS. Web sites in general will probably have a lot more JS overhead than our chrome code does, but they don't suffer from the same sort of multiplying factor that we will as we increase our process count.

Our current base number is still close to 15MB in the best case, and we definitely can't afford 15MB of extra overhead for every frame, even if the websites we load into many frames are much larger than that.
Sure, to be clear I'm just somewhat concerned about the potential complexity here.

This would only ever work for UA sheets (static sheets), not for add-ons), and honestly it's a fair amount of work and complexity. Also note that no other engine supporting Site Isolation does this at all as far as I can tell (I can try to ask Blink folks if you're curious about why, I kind of am...).

WebKit is doing a large set of general memory-usage improvements, which end up adding-up. Just cherry-picking the ones that are style-system related and that have landed over the last couple months:

  https://bugs.webkit.org/show_bug.cgi?id=186731
  https://bugs.webkit.org/show_bug.cgi?id=186705
  https://bugs.webkit.org/show_bug.cgi?id=186711
  https://bugs.webkit.org/show_bug.cgi?id=186700
  https://bugs.webkit.org/show_bug.cgi?id=186737
  https://bugs.webkit.org/show_bug.cgi?id=186739
  https://bugs.webkit.org/show_bug.cgi?id=186741
  https://bugs.webkit.org/show_bug.cgi?id=186713
  https://bugs.webkit.org/show_bug.cgi?id=186698
  https://bugs.webkit.org/show_bug.cgi?id=186708
  https://bugs.webkit.org/show_bug.cgi?id=186988
  https://bugs.webkit.org/show_bug.cgi?id=186990

I'm not asserting that they'd have the same multiplicative effect on every content process as this of course (maybe, though?). But also this kind of stuff would affect all stylesheets across the board, including add-on sheets, and every content page. I'm sure that this kind of stuff does end up adding up.

Anyway, the point I was trying to make in comment 22 is just that I don't think that any price in complexity is worth paying for 400K of savings if we can get them from somewhere else, and this sure looks complex without having seen the whole patch. I think it's worth investigating what other engines do in terms of memory usage for Site Isolation for layout and style.

That being said if the thing ends up not being as complex / hard to maintain as I fear and we can get all of it then that'd be great, for sure. In any case Cam is the style system owner, he has the last word, and also he's doing the work here so...
(In reply to Kris Maglione [:kmag] from comment #24)
> If we can handle this sharing in a way which will allow it to use it for
> stylesheets added by extensions, it will probably substantially add to the
> savings. Ad blockers are notorious for injecting huge stylesheets into every
> site, and lots of other add-ons tend to inject huge UI stylesheets on the
> off chance that they may eventually need them.

Ugh, I forgot about adblockers. That's going to be a big problem, come to think of it - those stylesheets tend to be very large, and we can't bake them into the binary.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #25)

> This would only ever work for UA sheets (static sheets), not for add-ons)

This is a fair point. Per above, it would be really great if we could find a solution that also helped addon sheets. But that doesn't mean we can just punt on the UA sheets.

> and honestly it's a fair amount of work and complexity. Also note that no
> other engine supporting Site Isolation does this at all as far as I can tell
> (I can try to ask Blink folks if you're curious about why, I kind of am...).

It's hard to compare engines here. Other browsers don't have frame scripts or as many vtables, so they have a natural advantage in terms of per-process memory overhead. And Chrome is certainly taking heat over the memory regressions from site isolation.

> WebKit is doing a large set of general memory-usage improvements, which end
> up adding-up. Just cherry-picking the ones that are style-system related and
> that have landed over the last couple months:
> 
>   https://bugs.webkit.org/show_bug.cgi?id=186731
>   https://bugs.webkit.org/show_bug.cgi?id=186705
>   https://bugs.webkit.org/show_bug.cgi?id=186711
>   https://bugs.webkit.org/show_bug.cgi?id=186700
>   https://bugs.webkit.org/show_bug.cgi?id=186737
>   https://bugs.webkit.org/show_bug.cgi?id=186739
>   https://bugs.webkit.org/show_bug.cgi?id=186741
>   https://bugs.webkit.org/show_bug.cgi?id=186713
>   https://bugs.webkit.org/show_bug.cgi?id=186698
>   https://bugs.webkit.org/show_bug.cgi?id=186708
>   https://bugs.webkit.org/show_bug.cgi?id=186988
>   https://bugs.webkit.org/show_bug.cgi?id=186990
> 
> I'm not asserting that they'd have the same multiplicative effect on every
> content process as this of course (maybe, though?). But also this kind of
> stuff would affect all stylesheets across the board, including add-on
> sheets, and every content page. I'm sure that this kind of stuff does end up
> adding up.

If we can shrink stylesheets significantly without significantly compromising performance, I'm all ears. 

> Anyway, the point I was trying to make in comment 22 is just that I don't
> think that any price in complexity is worth paying for 400K of savings if we
> can get them from somewhere else

If "somewhere else" means "shrinking stylesheets", that's certainly an alternative, but per above I'm skeptical that we can shrink them enough. If it means "finding memory savings elsewhere in the platform", that's a tougher sell. Per [1], there just aren't enough options for memory reduction to ignore a potential 400k win.


> That being said if the thing ends up not being as complex / hard to maintain
> as I fear and we can get all of it then that'd be great, for sure. In any
> case Cam is the style system owner, he has the last word, and also he's
> doing the work here so...

Yeah. And maybe he'll have a good idea for addon stylesheets. :-)  

[1] https://mail.mozilla.org/pipermail/firefox-dev/2018-July/006628.html
(Assignee)

Comment 27

7 months ago
Most of my patch queue is oriented around pointing to selector components, rules, declaration blocks, etc. in static const memory without using pointers.  If we want to support sheets loaded by WebExtensions too, then that sounds like something we'd need to do in the parent process and use shared memory for, but we can probably use most of the same same setup to avoid pointer values in the data.  Still it would be a little trickier as we wouldn't just have a single flat array of selector components etc., but one per extension-loaded style sheet.

Do we need to support mutation of style sheets loaded by WebExtensions?  I hope not. :-)
(In reply to Cameron McCormack (:heycam) (away 24 Aug) from comment #27)
> Most of my patch queue is oriented around pointing to selector components,
> rules, declaration blocks, etc. in static const memory without using
> pointers.  If we want to support sheets loaded by WebExtensions too, then
> that sounds like something we'd need to do in the parent process and use
> shared memory for, but we can probably use most of the same same setup to
> avoid pointer values in the data.  Still it would be a little trickier as we
> wouldn't just have a single flat array of selector components etc., but one
> per extension-loaded style sheet.

Oh, nice! That actually sounds pretty tractable, though certainly some amount of complexity. 
 
> Do we need to support mutation of style sheets loaded by WebExtensions?  I
> hope not. :-)

I also hope not. Kris?
Flags: needinfo?(kmaglione+bmo)
(In reply to Cameron McCormack (:heycam) (away 24 Aug) from comment #27)
> If we want to support sheets loaded by WebExtensions too, then
> that sounds like something we'd need to do in the parent process and use
> shared memory for, but we can probably use most of the same same setup to
> avoid pointer values in the data.

That should be doable. I've already been thinking about moving some of the
content script pre-loading logic to the parent process so we don't have to
re-compile every script for every new process. Doing the same for stylesheets
wouldn't be much extra work.


(In reply to Bobby Holley (:bholley) from comment #28)
> > Do we need to support mutation of style sheets loaded by WebExtensions?  I
> > hope not. :-)
> 
> I also hope not. Kris?

Not the ones that are loaded via the sheet loading APIs. Those are shared
between documents, and injected via DOMWindowUtils so they shouldn't be visible
to pages.

Sheets loaded via <link> tags probably need to support mutation, but those are a
lost cause anyway.
Flags: needinfo?(kmaglione+bmo)
(Assignee)

Comment 30

7 months ago
One fly in the ointment is atoms.  The switch to use indexes for static atoms in my patch queue works well for the UA style sheets, since we just require up front that any identifier in the UA sheets have a static atom, and then have the canonical reference to them be their index, rather than the nsStaticAtom pointer.  For sheets loaded by extensions, we would need to have some sort of dynamically registered indexes/IDs for atoms, have that in shared memory, and then always resolve that index/ID to the nsDynamicAtom pointer when doing equality checks.  Doable but much more of a pain.
(In reply to Cameron McCormack (:heycam) (away 24 Aug) from comment #30)
> One fly in the ointment is atoms.  The switch to use indexes for static
> atoms in my patch queue works well for the UA style sheets, since we just
> require up front that any identifier in the UA sheets have a static atom,
> and then have the canonical reference to them be their index, rather than
> the nsStaticAtom pointer.  For sheets loaded by extensions, we would need to
> have some sort of dynamically registered indexes/IDs for atoms, have that in
> shared memory, and then always resolve that index/ID to the nsDynamicAtom
> pointer when doing equality checks.  Doable but much more of a pain.

Yeah. It seems reasonable to land the UA sheet stuff as the MVP, just keeping in mind the eventual desire to make it work for addon sheets (and probably getting a bug on file to track it).
Whiteboard: [overhead:400K] → [overhead:400K][layout:p1]
(Assignee)

Updated

5 months ago
Depends on: 1500362
Since this got mentioned in the meeting today, I just wanted to pop in and say that sharing memory between processes at the same address space is concerning. I'd like to better understand how that memory will be interacted with. Specifically, passing pointers (raw or offsets) over IPC is exceptionally dangerous.
(In reply to Tom Ritter [:tjr] from comment #32)
> Since this got mentioned in the meeting today, I just wanted to pop in and
> say that sharing memory between processes at the same address space is
> concerning. I'd like to better understand how that memory will be interacted
> with. Specifically, passing pointers (raw or offsets) over IPC is
> exceptionally dangerous.

Since all of the data would be generated in the parent process, I'm not especially concerned about it being dangerous.

I think the main issue is that style data has references to other data structures within the same data set, and having to store/access all of that data by offset is complicated. If there are certain data blocks that are mapped at the same address in all child processes, that makes the situation much simpler. Aside from making sure we can actually map data blocks at the same address in all content processes. That's easy enough on 64-bit, but a bit tricky on 32-bit.
(In reply to Kris Maglione [:kmag] from comment #33)
> (In reply to Tom Ritter [:tjr] from comment #32)
> > Since this got mentioned in the meeting today, I just wanted to pop in and
> > say that sharing memory between processes at the same address space is
> > concerning. I'd like to better understand how that memory will be interacted
> > with. Specifically, passing pointers (raw or offsets) over IPC is
> > exceptionally dangerous.
> 
> Since all of the data would be generated in the parent process, I'm not
> especially concerned about it being dangerous.
> 
> I think the main issue is that style data has references to other data
> structures within the same data set, and having to store/access all of that
> data by offset is complicated. If there are certain data blocks that are
> mapped at the same address in all child processes, that makes the situation
> much simpler.

Okay, that's what I was hoping: all pointers go Parent -> Child only.

This is a built-in remote code execution engine for Parent -> Child Process compromise, but I expect we already have a few of those.
(Assignee)

Comment 35

5 months ago
(In reply to Kris Maglione [:kmag] from comment #33)
> I think the main issue is that style data has references to other data
> structures within the same data set, and having to store/access all of that
> data by offset is complicated. If there are certain data blocks that are
> mapped at the same address in all child processes, that makes the situation
> much simpler.

Yes, that's exactly my thinking.  The original patches I was working towards in this bug, for storing the style sheet in static data, basically added a bunch of places where we stored offsets into static data arrays, and adding a bunch of enums that allowed either that or regular heap pointers.  It was getting too complex.

> Aside from making sure we can actually map data blocks at the
> same address in all content processes. That's easy enough on 64-bit, but a
> bit tricky on 32-bit.

How important it is it to do something at all on 32 bit?  The technique to choose a random address to map the shared memory to should work well for 64 bit, but on 32 bit on average it's going to cut the available contiguous address space in half, which wouldn't be great.
(In reply to Cameron McCormack (:heycam) from comment #35)
> > Aside from making sure we can actually map data blocks at the
> > same address in all content processes. That's easy enough on 64-bit, but a
> > bit tricky on 32-bit.
> 
> How important it is it to do something at all on 32 bit?  The technique to
> choose a random address to map the shared memory to should work well for 64
> bit, but on 32 bit on average it's going to cut the available contiguous
> address space in half, which wouldn't be great.

I think we should probably make a best effort to use shared data on 32 bit and fall back to per-process loads if we need to. I'm still not sure what the Fission situation is going to look like on 32 bit, but I'm still pretty worried that 32 bit users will in general have less RAM, and that we won't be able to entirely ignore them.

We're going to wind up allocating stylesheet memory in those content processes regardless, so I kind of suspect the address space situation will in general look about the same either way, and possibly even give us less fragmentation in the shared sheet case.
(Assignee)

Updated

5 months ago
Depends on: 1504034
(Assignee)

Updated

5 months ago
Depends on: 1504104
(Assignee)

Updated

4 months ago
Depends on: 1506760
(Assignee)

Updated

4 months ago
Depends on: 1507050
(Assignee)

Updated

4 months ago
Depends on: 1507687
(Assignee)

Updated

4 months ago
Depends on: 1511276
This looks like a use case for a general way to freeze shared memory (bug 1479960).  A read-only mapping of a writeable descriptor/handle can generally have write permissions re-added later, so we need to make sure we're not ever giving the content processes write access (even if we trust the process at the time the mapping is established).
Depends on: 1479960
(Assignee)

Comment 38

3 months ago
Thanks for the pointer to your work, Jed; I agree it would be good to drop write access from the parent process as soon as we finish copying the data into the buffer.
(Assignee)

Updated

3 months ago
Depends on: 1515533
(Assignee)

Updated

3 months ago
Depends on: 1516212
(Assignee)

Updated

3 months ago
Blocks: 1519001

(In reply to Cameron McCormack (:heycam) from comment #39)

Now that bug 1516493 and bug 1515201 have landed, my current patch queue shows the expected content process overhead reductions:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d565e9d7de84da8d3974c8f5a02fc3067cc3420f&newProject=try&newRevision=3fbde7f7886236ca2641b8187883068663ddc6a1&framework=4

Nice work! And to make sure I understand - the solution here will also apply to sheets injected via addons?

(Assignee)

Comment 41

3 months ago

Almost. There are a couple of style sheet features (like @import sheets) that I skipped support for since they weren't needed in UA sheets. I still need to investigate popular add-ons though to see whether they are loading big sheets that apply to all documents, or if they have adjusted to inserting page-specific sheets. I will look at that once this lands.

If we do have big addon inserted sheets, then the one tricky thing will be atoms. In the UA sheets we have a fixed set of strings that are atomized, so we can just assert that all of the atoms within the style sheets are static atoms. For a sharable add-on generated sheet, we'd probably need to do something like atomize in each content process separately, and dynamically look up those atoms from a big table mapping from the string's pointer or something.

(Assignee)

Updated

3 months ago
Depends on: 1519011
(Assignee)

Updated

2 months ago
Depends on: 1519737
(Assignee)

Comment 42

2 months ago

I had a look at what ABP, the most popular ad-blocking addon, is doing. I tested by opening a number of tabs (loading theage.com.au, reddit.com, bbc.com, mozilla.org, cnn.com, abc.net.au, and duckduckgo.com).

Each content process ended up with at least one very large (a bit over 1 MiB) style sheet, and most of them were unique. One of them ended up being loaded in all but one content process, and it was inserted into tracking / ad iframes. The sheet loaded into the top level document for each of these sites was unique for each site. I haven't dug into what the differences are, but they only differ by several hundred of bytes, so it's probably a ~1 MiB prefix that has general blocking rules plus a small amount of site-specific rules. So this is already bad news in a non-Fission world, since these will be unique style sheets even within the same content process.

Some sites also got a small, site-specific style sheet (< 2 KiB).

If ABP were always using the same, separate 1 MiB style sheet, and splitting out anything site specific to a separate sheet, then we might be able to do something. The WebExtensions API ends up calling nsStyleSheetService::PreloadSheetAsync to create a StyleSheet object, and nsDOMWindowUtils::AddSheet later to insert into explicitly into each document. In PreloadSheetAsync we could send the style sheet contents to the parent process, get it to parse it into a sharable form, then pass a handle to the sharable form back down the content process. The chance of allocating another shared memory buffer that can be mapped at the same address in the parent and content processes will be lower, at least on 32 bit, though I guess we could just reserve another few megabytes of address space for this purpose at the time we allocate the shared memory for the UA style sheets.

(The WebExtensions API takes the style sheet text from the addon and generates a data: URI which it passes into the style sheet service. This is also not great, as we have a massive nsIURI object hanging around in addition to the 1 MiB of parsed style sheet data.)

I also looked a uBlock Origin, which is the second most popular ad-blocking addon. It didn't load any style sheets into tracking / ad iframes. Each site had a unique style sheet. There may well be some commonality between them but it's not a big prefix like the ABP ones, so I couldn't tell at first glance. The sheet sizes were around 130 KiB.

(Assignee)

Comment 43

2 months ago

The only thing left blocking uploading patches here is a Talos regression in a couple of tests (ts_paint, sessionrestore) on macOS. Profiling and bisecting my code reveals that this is due to the ipc::SharedMemoryBasic::ShareToProcess call that I do in ContentParent::InitInternal, so I can pass the shared memory buffer down to the content process early on, in the SetXPCOMProcessAttributes IPC call.

It turns out that this call gets blocked, waiting for the newly created content process to finish its sandbox compilation call, which happens under mozilla::EarlyStartMacSandboxIfEnabled. We have bug 1412855 on file for sandbox compilation being slow; here it's taking around 100 ms.

Making the early macOS sandbox start later (after my ShareToProcess call) sounds like it defeats the purpose of being early. So I think either we need to fix the slow sandbox compilation or use some other method for passing the shared memory down to the child, e.g. like what we do for prefs, passing a handle on the command line.

Base ts_paint profile:

https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FCcjM7ADDQw-oKWKL0Qqtlg%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_ts_paint.zip/calltree/?file=profile_ts_paint%2Fstartup%2Fcycle_19.profile&globalTrackOrder=0-2-3-4-5-1&hiddenGlobalTracks=2-3-4-5&localTrackOrderByPid=2566-1-0~2567-0~&range=0.0000_0.7335&search=mach_msg_trap&thread=0&v=3

Profile with patch queue applied:

https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FG6c2RW6ASS-w31HzoCooQw%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_ts_paint.zip/calltree/?file=profile_ts_paint%2Fstartup%2Fcycle_19.profile&globalTrackOrder=0-1-2-3-4&hiddenGlobalTracks=1-2-4&localTrackOrderByPid=1442-1-0~1444-0~1443-0~&range=0.0000_0.8167&search=mach_msg_trap&thread=0&v=3

(Assignee)

Comment 44

2 months ago

(Not passing a handle on the command line, but having a fixed file descriptor for the UA sheet shared memory, which gets dup'd after the content process spawns.)

(In reply to Cameron McCormack (:heycam) from comment #43)

It turns out that this call gets blocked, waiting for the newly created content process to finish its sandbox compilation call, which happens under mozilla::EarlyStartMacSandboxIfEnabled. We have bug 1412855 on file for sandbox compilation being slow; here it's taking around 100 ms.

Bug 1465669 (using POSIX shm on Mac) would probably help here: sending the SCM_RIGHTS probably wouldn't block on anything to do with the receiving process, but even if it did it would block the IPC thread and not the main thread, which ought to at least impact total time less.

(In reply to Cameron McCormack (:heycam) (away Jan 18) from comment #43)

Making the early macOS sandbox start later (after my ShareToProcess call) sounds like it defeats the purpose of being early. So I think either we need to fix the slow sandbox compilation or use some other method for passing the shared memory down to the child, e.g. like what we do for prefs, passing a handle on the command line.

FWIW, I collected some rough data on Mac sandbox string compilation times on different hardware. Copied from bug 1501126 comment 3:

I also found that sandbox_init_with_parameters() is much slower
on older hardware like we have on try. Some rough numbers I collected
for the average execution time:

2008 MacBook with macOS 10.9: 500ms
try hardware (t-yosemite-r7-186) with macOS 10.10: 122ms
2015 MacBook Air with macOS 10.14: 15ms
2012 MacBook Pro with macOS 10.11: 14ms
2018 MacBook Pro with macOS 10.14: 8ms

(Assignee)

Comment 47

2 months ago

Thanks, Haik, that's interesting. I wonder how much of the improvement over the try hardware is due to the newer OS rather than the hardware. In any case, I've written a workaround to avoid sending the shared memory over IPC on macOS for now.

(Assignee)

Comment 49

2 months ago

Each user agent style sheet has a unique URLExtraData object containing
its URL, but since they are refcounted objects, we can't share them
easily across processes. Rather than adding support for copying them
into a shared memory buffer like we will do with the Rust objects, here
we just set up a static array of URLExtraData objects per UA style
sheet. The array will be filled in in a later patch.

Rust UrlExtraData objects, once they are transformed into their
sharable form and copied into the shared memory buffer, will reference
them by an index.

Depends on D17181

(Assignee)

Comment 50

2 months ago

UA style sheets only ever specify a single generic font family in font-family
properties, so we pre-create a unique, static SharedFontList for each generic
and change the representation of FontFamilyList to be able to refer to them
by their generic ID. This avoids having to share refcounted SharedFontList
objects across processes.

Depends on D17182

(Assignee)

Comment 51

2 months ago

This avoids having to support storing refcounted URLValue objects in shared memory,
which would be tricky.

Depends on D17183

(Assignee)

Comment 59

2 months ago

Depends on D17191

Attachment #9038121 - Attachment description: Bug 1474793 - Part 7: Add SharedMemoryBuilder type and ToShared trait. → Bug 1474793 - Part 7: Add SharedMemoryBuilder type and ToShmem trait.
Attachment #9038123 - Attachment description: Bug 1474793 - Part 9: Add support for deriving ToShared. → Bug 1474793 - Part 9: Add support for deriving ToShmem.
Attachment #9038124 - Attachment description: Bug 1474793 - Part 10.1: Add simple ToShared implementations. → Bug 1474793 - Part 10.1: Add simple ToShmem implementations.
Attachment #9038125 - Attachment description: Bug 1474793 - Part 10.2: Add ToShared impls for collections and strings. → Bug 1474793 - Part 10.2: Add ToShmem impls for collections and strings.
Attachment #9038126 - Attachment description: Bug 1474793 - Part 10.3: Add ToShared impl for Atom. → Bug 1474793 - Part 10.3: Add ToShmem impl for Atom.
Attachment #9038127 - Attachment description: Bug 1474793 - Part 10.4: Add ToShared impl for shared_lock::Locked. → Bug 1474793 - Part 10.4: Add ToShmem impl for shared_lock::Locked.
Attachment #9038128 - Attachment description: Bug 1474793 - Part 10.5: Add ToShared impl for URLValueSource. → Bug 1474793 - Part 10.5: Add ToShmem impl for URLValueSource.
Attachment #9038129 - Attachment description: Bug 1474793 - Part 10.6: Add ToShared impl for UrlExtraData. → Bug 1474793 - Part 10.6: Add ToShmem impl for UrlExtraData.
Attachment #9038130 - Attachment description: Bug 1474793 - Part 10.7: Add ToShared impl for FontFamilyList. → Bug 1474793 - Part 10.7: Add ToShmem impl for FontFamilyList.
Attachment #9038131 - Attachment description: Bug 1474793 - Part 10.8: Add derived ToShared implementations. → Bug 1474793 - Part 10.8: Add derived ToShmem implementations.
Blocks: 1495940

Updated

26 days ago
Fission Milestone: --- → M2
(Assignee)

Updated

19 days ago
No longer depends on: 1479960
You need to log in before you can comment on or make changes to this bug.