consider sharing UA style sheets across processes
Categories
(Core :: CSS Parsing and Computation, enhancement, P3)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox68 | --- | fixed |
People
(Reporter: heycam, Assigned: heycam)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [overhead:400K][layout:p1])
Attachments
(20 files, 1 obsolete file)
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review |
| Assignee | ||
Comment 1•7 years ago
|
||
Comment 2•7 years ago
|
||
| Assignee | ||
Comment 3•7 years ago
|
||
| Assignee | ||
Comment 4•7 years ago
|
||
Comment 5•7 years ago
|
||
Comment 6•7 years ago
|
||
Comment 7•7 years ago
|
||
| Assignee | ||
Comment 8•7 years ago
|
||
| Assignee | ||
Comment 9•7 years ago
|
||
Comment 10•7 years ago
|
||
| Assignee | ||
Comment 11•7 years ago
|
||
Comment 12•7 years ago
|
||
Comment 13•7 years ago
|
||
Comment 14•7 years ago
|
||
Comment 15•7 years ago
|
||
| Assignee | ||
Comment 16•7 years ago
|
||
Comment 17•7 years ago
|
||
Comment 18•7 years ago
|
||
Comment 19•7 years ago
|
||
Comment 20•7 years ago
|
||
Comment 21•7 years ago
|
||
Updated•7 years ago
|
Comment 22•7 years ago
|
||
Comment 23•7 years ago
|
||
Comment 24•7 years ago
|
||
Comment 25•7 years ago
|
||
Comment 26•7 years ago
|
||
| Assignee | ||
Comment 27•7 years ago
|
||
Comment 28•7 years ago
|
||
Comment 29•7 years ago
|
||
| Assignee | ||
Comment 30•7 years ago
|
||
Comment 31•7 years ago
|
||
Updated•7 years ago
|
Comment 32•7 years ago
|
||
Comment 33•7 years ago
|
||
Comment 34•7 years ago
|
||
| Assignee | ||
Comment 35•7 years ago
|
||
Comment 36•7 years ago
|
||
Comment 37•6 years ago
|
||
| Assignee | ||
Comment 38•6 years ago
|
||
| Assignee | ||
Comment 39•6 years ago
|
||
Now that bug 1516493 and bug 1515201 have landed, my current patch queue shows the expected content process overhead reductions:
Comment 40•6 years ago
|
||
(In reply to Cameron McCormack (:heycam) from comment #39)
Now that bug 1516493 and bug 1515201 have landed, my current patch queue shows the expected content process overhead reductions:
Nice work! And to make sure I understand - the solution here will also apply to sheets injected via addons?
| Assignee | ||
Comment 41•6 years ago
|
||
Almost. There are a couple of style sheet features (like @import sheets) that I skipped support for since they weren't needed in UA sheets. I still need to investigate popular add-ons though to see whether they are loading big sheets that apply to all documents, or if they have adjusted to inserting page-specific sheets. I will look at that once this lands.
If we do have big addon inserted sheets, then the one tricky thing will be atoms. In the UA sheets we have a fixed set of strings that are atomized, so we can just assert that all of the atoms within the style sheets are static atoms. For a sharable add-on generated sheet, we'd probably need to do something like atomize in each content process separately, and dynamically look up those atoms from a big table mapping from the string's pointer or something.
| Assignee | ||
Comment 42•6 years ago
|
||
I had a look at what ABP, the most popular ad-blocking addon, is doing. I tested by opening a number of tabs (loading theage.com.au, reddit.com, bbc.com, mozilla.org, cnn.com, abc.net.au, and duckduckgo.com).
Each content process ended up with at least one very large (a bit over 1 MiB) style sheet, and most of them were unique. One of them ended up being loaded in all but one content process, and it was inserted into tracking / ad iframes. The sheet loaded into the top level document for each of these sites was unique for each site. I haven't dug into what the differences are, but they only differ by several hundred of bytes, so it's probably a ~1 MiB prefix that has general blocking rules plus a small amount of site-specific rules. So this is already bad news in a non-Fission world, since these will be unique style sheets even within the same content process.
Some sites also got a small, site-specific style sheet (< 2 KiB).
If ABP were always using the same, separate 1 MiB style sheet, and splitting out anything site specific to a separate sheet, then we might be able to do something. The WebExtensions API ends up calling nsStyleSheetService::PreloadSheetAsync to create a StyleSheet object, and nsDOMWindowUtils::AddSheet later to insert into explicitly into each document. In PreloadSheetAsync we could send the style sheet contents to the parent process, get it to parse it into a sharable form, then pass a handle to the sharable form back down the content process. The chance of allocating another shared memory buffer that can be mapped at the same address in the parent and content processes will be lower, at least on 32 bit, though I guess we could just reserve another few megabytes of address space for this purpose at the time we allocate the shared memory for the UA style sheets.
(The WebExtensions API takes the style sheet text from the addon and generates a data: URI which it passes into the style sheet service. This is also not great, as we have a massive nsIURI object hanging around in addition to the 1 MiB of parsed style sheet data.)
I also looked a uBlock Origin, which is the second most popular ad-blocking addon. It didn't load any style sheets into tracking / ad iframes. Each site had a unique style sheet. There may well be some commonality between them but it's not a big prefix like the ABP ones, so I couldn't tell at first glance. The sheet sizes were around 130 KiB.
| Assignee | ||
Comment 43•6 years ago
|
||
The only thing left blocking uploading patches here is a Talos regression in a couple of tests (ts_paint, sessionrestore) on macOS. Profiling and bisecting my code reveals that this is due to the ipc::SharedMemoryBasic::ShareToProcess call that I do in ContentParent::InitInternal, so I can pass the shared memory buffer down to the content process early on, in the SetXPCOMProcessAttributes IPC call.
It turns out that this call gets blocked, waiting for the newly created content process to finish its sandbox compilation call, which happens under mozilla::EarlyStartMacSandboxIfEnabled. We have bug 1412855 on file for sandbox compilation being slow; here it's taking around 100 ms.
Making the early macOS sandbox start later (after my ShareToProcess call) sounds like it defeats the purpose of being early. So I think either we need to fix the slow sandbox compilation or use some other method for passing the shared memory down to the child, e.g. like what we do for prefs, passing a handle on the command line.
Base ts_paint profile:
Profile with patch queue applied:
| Assignee | ||
Comment 44•6 years ago
|
||
(Not passing a handle on the command line, but having a fixed file descriptor for the UA sheet shared memory, which gets dup'd after the content process spawns.)
Comment 45•6 years ago
|
||
(In reply to Cameron McCormack (:heycam) from comment #43)
It turns out that this call gets blocked, waiting for the newly created content process to finish its sandbox compilation call, which happens under mozilla::EarlyStartMacSandboxIfEnabled. We have bug 1412855 on file for sandbox compilation being slow; here it's taking around 100 ms.
Bug 1465669 (using POSIX shm on Mac) would probably help here: sending the SCM_RIGHTS probably wouldn't block on anything to do with the receiving process, but even if it did it would block the IPC thread and not the main thread, which ought to at least impact total time less.
Comment 46•6 years ago
|
||
(In reply to Cameron McCormack (:heycam) (away Jan 18) from comment #43)
Making the early macOS sandbox start later (after my ShareToProcess call) sounds like it defeats the purpose of being early. So I think either we need to fix the slow sandbox compilation or use some other method for passing the shared memory down to the child, e.g. like what we do for prefs, passing a handle on the command line.
FWIW, I collected some rough data on Mac sandbox string compilation times on different hardware. Copied from bug 1501126 comment 3:
I also found that sandbox_init_with_parameters() is much slower
on older hardware like we have on try. Some rough numbers I collected
for the average execution time:
2008 MacBook with macOS 10.9: 500ms
try hardware (t-yosemite-r7-186) with macOS 10.10: 122ms
2015 MacBook Air with macOS 10.14: 15ms
2012 MacBook Pro with macOS 10.11: 14ms
2018 MacBook Pro with macOS 10.14: 8ms
| Assignee | ||
Comment 47•6 years ago
|
||
Thanks, Haik, that's interesting. I wonder how much of the improvement over the try hardware is due to the newer OS rather than the hardware. In any case, I've written a workaround to avoid sending the shared memory over IPC on macOS for now.
| Assignee | ||
Comment 48•6 years ago
|
||
| Assignee | ||
Comment 49•6 years ago
|
||
Each user agent style sheet has a unique URLExtraData object containing
its URL, but since they are refcounted objects, we can't share them
easily across processes. Rather than adding support for copying them
into a shared memory buffer like we will do with the Rust objects, here
we just set up a static array of URLExtraData objects per UA style
sheet. The array will be filled in in a later patch.
Rust UrlExtraData objects, once they are transformed into their
sharable form and copied into the shared memory buffer, will reference
them by an index.
Depends on D17181
| Assignee | ||
Comment 50•6 years ago
|
||
UA style sheets only ever specify a single generic font family in font-family
properties, so we pre-create a unique, static SharedFontList for each generic
and change the representation of FontFamilyList to be able to refer to them
by their generic ID. This avoids having to share refcounted SharedFontList
objects across processes.
Depends on D17182
| Assignee | ||
Comment 51•6 years ago
|
||
This avoids having to support storing refcounted URLValue objects in shared memory,
which would be tricky.
Depends on D17183
| Assignee | ||
Comment 52•6 years ago
|
||
Depends on D17184
| Assignee | ||
Comment 53•6 years ago
|
||
Depends on D17185
| Assignee | ||
Comment 54•6 years ago
|
||
Depends on D17186
| Assignee | ||
Comment 55•6 years ago
|
||
Depends on D17187
| Assignee | ||
Comment 56•6 years ago
|
||
Depends on D17188
| Assignee | ||
Comment 57•6 years ago
|
||
Depends on D17189
| Assignee | ||
Comment 58•6 years ago
|
||
Depends on D17190
| Assignee | ||
Comment 59•6 years ago
|
||
Depends on D17191
| Assignee | ||
Comment 60•6 years ago
|
||
Depends on D17192
| Assignee | ||
Comment 61•6 years ago
|
||
Depends on D17193
| Assignee | ||
Comment 62•6 years ago
|
||
Depends on D17194
| Assignee | ||
Comment 63•6 years ago
|
||
Depends on D17195
| Assignee | ||
Comment 64•6 years ago
|
||
Depends on D17196
| Assignee | ||
Comment 65•6 years ago
|
||
Depends on D17197
| Assignee | ||
Comment 66•6 years ago
|
||
Depends on D17198
| Assignee | ||
Comment 67•6 years ago
|
||
Depends on D17199
| Assignee | ||
Comment 68•6 years ago
|
||
Depends on D17200
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
| Comment hidden (obsolete) |
Updated•6 years ago
|
Updated•6 years ago
|
| Comment hidden (obsolete) |
| Assignee | ||
Comment 71•6 years ago
|
||
Updated•6 years ago
|
Comment 72•6 years ago
|
||
Comment 73•6 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/df3abfe65d13
https://hg.mozilla.org/mozilla-central/rev/858a1c942df3
https://hg.mozilla.org/mozilla-central/rev/6e3e40504fe5
https://hg.mozilla.org/mozilla-central/rev/2022cea7c178
https://hg.mozilla.org/mozilla-central/rev/1133c358953b
https://hg.mozilla.org/mozilla-central/rev/f10b3a04f246
https://hg.mozilla.org/mozilla-central/rev/dbaaaa587dfe
https://hg.mozilla.org/mozilla-central/rev/e3103dfb3c84
https://hg.mozilla.org/mozilla-central/rev/112e458d8382
https://hg.mozilla.org/mozilla-central/rev/48168cceb50e
https://hg.mozilla.org/mozilla-central/rev/020b16c1291d
https://hg.mozilla.org/mozilla-central/rev/11ac463510f6
https://hg.mozilla.org/mozilla-central/rev/1d7e012e4c26
https://hg.mozilla.org/mozilla-central/rev/cc47e694718e
https://hg.mozilla.org/mozilla-central/rev/6fe8bc633215
https://hg.mozilla.org/mozilla-central/rev/637160560b12
https://hg.mozilla.org/mozilla-central/rev/7c9ebe38c5a9
https://hg.mozilla.org/mozilla-central/rev/e450b952d748
https://hg.mozilla.org/mozilla-central/rev/7d7b50ad974a
https://hg.mozilla.org/mozilla-central/rev/81735d15243c
Comment 74•6 years ago
|
||
Impressive patch queue! Is there a high-level summary in the comments somewhere about how this works? If not, could you write one here? I'm also interested to know where we ended up on the adblocker question.
Comment 75•6 years ago
|
||
The awsy numbers for this look really good when enabling the pref, nice work Cameron!
Comment 76•6 years ago
|
||
Could you also file a bug for flipping the pref, link it to this bug, the meta bugs, and the bug(s) that block the flipping?
| Assignee | ||
Comment 77•6 years ago
|
||
I don't think the comments paint a high level view of what's going on, so here's an overview:
- In the parent process early on, we parse the UA style sheets into Gecko StyleSheet objects. Whereas some of the UA sheets used to be lazily loaded, we now load them all up front.
- We create a shared memory segment that we know ahead of time will be big enough to store the UA sheets.
- We create a Rust SharedMemoryBuilder object and pass it a pointer to the start of the shared memory.
- Then for each UA sheet we call Servo_SharedMemoryBuilder_AddStylesheet, which takes the StylesheetContents::rules (an Arc<Locked<CssRules>>), produces a deep clone of it, and writes that clone into the shared memory.
- The deep clone isn't a clone() call, but a call to ToShmem::to_shmem. The ToShmem trait must be implemented on every type that is reachable under the Arc<Locked<CssRules>>. The to_shmem call for each type will clone the value, but any heap allocation will be cloned and placed into the shared memory buffer, rather than heap allocated.
- For most types, the ToShmem implementation is simple, and we just #[derive(ToShmem)] it. For the types that need special handling due to having heap allocations (Vec<T>, Box<T>, Arc<T>, etc.) we have impls that call to_shmem on the heap allocated data, and then create a new container (e.g. using Box::from_raw) that points into the shared memory.
- Arc<T> and Locked<T> want to perform atomic writes on data that needs to be in the shared memory buffer (the reference count for Arc<T>, and the SharedRwLock's AtomicRefCell for Locked<T>), so we add special modes to those objects that skip the writes. For Arc<T>, that means never dropping the object since we don't track the reference count. That's fine, since we want to just drop the entire shared memory buffer at shutdown. For Locked<T>, we just panic on attempting to take the lock for writing. That's also fine, since we don't want devtools being able to modify UA sheets.
- For Atoms in Rust, we change the representation of static atoms to be an index into the static atom table. Then if we need to Deref the Atom we look up the table. We panic if any Atom we encounter in the UA style sheets is not a static atom.
- For each UA sheet, we create a new C++ StyleSheet object using the shared memory clone of the sheet contents, and throw away the original heap allocated one. (I guess we could avoid creating a new C++ StyleSheet object wrapping the shared contents, and update the original StyleSheet object's contents, but I doubt it matters.)
- When we initially map the shared memory in the parent process, we choose an address which is far away from the current extent of the heap. Although not too far, since we don't want to unnecessarily fragment the virtual address space. Then in the child process, as early as possible, we try to map the shared memory at that same address. This means any internal pointers in the UA sheets back into the shared memory buffer are valid in the child process too. In practice this works nearly all the time. If we fail to map at the address we need, the child process falls back to parsing and allocating its own copy of the UA sheets.
For add-ons, the way style sheets are generated with site specific rules would defeat the sharing. uBlock Origin for example has small-ish style sheets with just the right rules needed for each site, and so wouldn't benefit much from sharing. ABP has massive style sheets, but they are almost all unique style sheet text, and so wouldn't be shareable. We could ask ABP to change what they're doing.
But there are at least two challenges to extending this UA sheet sharing approach to work for addon loaded sheets. One is that they are dynamically added at a point in the process' life much later than we do the UA sheets. That means if we wanted a separate shared memory segment per addon loaded style sheet, there is a much lower chance of being able to map it to the same address in the child processes. Maybe we could reserve a certain amount of virtual address space ahead of time, but we would have to choose a size. The other challenge is that the addon loaded sheets have all kinds of dynamic atoms in them. We would need to have some kind of coordinated atom allocation strategy between processes if we wanted to make sure each of these atoms could be associated with a given index that could then be stored in the shared memory.
That isn't something I'm working on now.
| Assignee | ||
Comment 78•6 years ago
|
||
Bug 1533569 is the one to turn the pref on.
Comment 79•6 years ago
|
||
(In reply to Cameron McCormack (:heycam) (away Apr 19-28) from comment #77)
I don't think the comments paint a high level view of what's going on, so here's an overview:
This is an excellent writeup, thank you! Would you mind dropping it in a comment somewhere so that it's in-tree? r=me on this text wherever you feel is appropriate.
Comment 80•6 years ago
|
||
| Assignee | ||
Updated•6 years ago
|
Comment 81•6 years ago
|
||
| bugherder | ||
Description
•