CPG (compartment-per-global) caused memory consumption to increase significantly.
In particular, the amount of fragmentation -- shown in about:memory under "gc-heap/arena/unused" -- increased greatly. We now have *lots* of small compartments like this:
│ ├───────39,224 B (00.01%) -- compartment([System Principal], chrome://global/content/globalOverlay.xul)
│ │ ├──32,768 B (00.01%) -- gc-heap
│ │ │ ├──25,736 B (00.01%) ── arena/unused
│ │ │ └───7,032 B (00.00%) ── sundries
│ │ └───6,456 B (00.00%) ── other-sundries
This is a tracking bug for fixing some or all of this regression.
CPG and generational GC compaction seem to go hand in hand. I've noticed more Fx freezes from hitting the LAA virtual address limit (roughly 3.78GB) with my large sessions since CPG landed.
> CPG and generational GC compaction seem to go hand in hand.
Not necessarily. The problem with these small compartments is that we have 20-odd different kinds of GC thing, and each arena (which is 4KB) can only hold GC things of a single kind. So if we have e.g. a single GC thing of a particular kind, we'll waste just under 4KB for it.
So the fix seems to be to allow some kind of mingling of different GC thing kinds in a single arena.
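As a rough illustration of the arena arithmetic above, here is a minimal Python sketch. It assumes 4KB arenas, ignores arena headers and per-cell rounding, and uses made-up kind names and byte counts chosen so that the totals match the globalOverlay.xul compartment shown earlier (7,032 B live in a 32,768 B gc-heap). It only does the bookkeeping; it is not how SpiderMonkey's allocator is written.

```python
ARENA_SIZE = 4096  # bytes; the 4KB arena size quoted above


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def segregated_waste(live_bytes_by_kind, arena_size=ARENA_SIZE):
    """Arenas and unused bytes when each arena holds a single GC thing kind."""
    arenas = sum(
        ceil_div(n, arena_size) for n in live_bytes_by_kind.values() if n > 0
    )
    live = sum(live_bytes_by_kind.values())
    return arenas, arenas * arena_size - live


def mingled_waste(live_bytes_by_kind, arena_size=ARENA_SIZE):
    """Same totals if different kinds could share arenas (idealized)."""
    live = sum(live_bytes_by_kind.values())
    arenas = ceil_div(live, arena_size)
    return arenas, arenas * arena_size - live


# Hypothetical per-kind live bytes (names and split are assumptions);
# they sum to 7,032 B, the live figure from the compartment shown earlier.
SMALL_COMPARTMENT = {
    "kind_0": 2400, "kind_1": 1600, "kind_2": 800, "kind_3": 900,
    "kind_4": 700, "kind_5": 400, "kind_6": 200, "kind_7": 32,
}

if __name__ == "__main__":
    print(segregated_waste(SMALL_COMPARTMENT))  # (8, 25736)
    print(mingled_waste(SMALL_COMPARTMENT))     # (2, 1160)
```

With eight sparsely used kinds, the one-kind-per-arena model reproduces the 25,736 B of arena/unused seen in about:memory, while the idealized shared model would need two arenas.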
> So the fix seems to be to allow some kind of mingling of different GC thing
> kinds in a single arena.
Or allowing different compartments to share arenas, as bug 759585 suggests.
The high compartment overhead hurts Firefox features that are implemented in JS in a modular style, e.g. Firefox Health Report (bug 831397).
The compartment overhead has three components:
1. Wasted space within arenas.
2. Space taken up by cross-compartment wrappers (both the objects and the CCW tables).
3. Strings get copied between compartments.
Having zones (bug 759585) would help avoid both 1 and 3. 2 seems to be less of a problem.
Alternatively, something like bug 807205 would also help both 1 and 3.
(In reply to Nicholas Nethercote [:njn] from comment #4)
> Having zones (bug 759585) would help avoid both 1 and 3. 2 seems to be less
> of a problem.
Strings do seem to be the main problem (because we push a lot of them around), but Bug 756549 does seem to provide some indication that cross-compartment wrappers are an expense for Sync (and perhaps FHR).
Sync is basically a giant test case for cross-compartment cost, given that the HTTP layer, crypto, record parsing, record handling, and each individual engine are in different compartments. Many of the objects we create — and we create a lot! — will be passed through at least four compartments, sometimes more than once.
Sync and its dependent libraries occupy more than 40 different compartments.
When we were tuning for Fx4 we found that we created around 1.6M objects per sync. That should be a little smaller today, and not all of those will cross compartments, but it gives you an idea of the scope of the problem.
Can we figure out some way to measure the amount of overhead to decide whether point 2 is worth addressing?
> Can we figure out some way to measure the amount of overhead to decide
> whether point 2 is worth addressing?
We already measure cross-compartment wrapper overhead in about:memory. Here's an example from the "Other measurements" section at the bottom, which gives the total overhead:
258,266,592 B (100.0%) -- js-main-runtime
├──209,884,320 B (81.27%) -- compartments
│ ├──127,614,976 B (49.41%) -- gc-heap
│ │ ├───34,671,424 B (13.42%) -- objects
│ │ │ ├─────796,192 B (00.31%) ── cross-compartment-wrapper
│ ├────1,530,816 B (00.59%) ── cross-compartment-wrapper-table
It shows the overhead of the CCW objects themselves, and the tables that reference them. If you look in the "explicit" tree, you'll see the same figures on a per-compartment level.
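For anyone who wants to total these figures from a saved about:memory dump, a minimal Python sketch follows. The regular expression is an assumption based only on the line shapes pasted in this bug ("N B (P%) -- name" for branches, "N B (P%) ── name" for leaves); other about:memory versions may format lines differently.

```python
import re

# Matches e.g. "796,192 B (00.31%) ── cross-compartment-wrapper".
LINE_RE = re.compile(r"([\d,]+) B \(\s*[\d.]+%\) (?:--|──) (\S+)")

CCW_NAMES = {"cross-compartment-wrapper", "cross-compartment-wrapper-table"}


def parse_entries(text):
    """Return (name, bytes) pairs for every line that looks like a tree entry."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            entries.append((m.group(2), int(m.group(1).replace(",", ""))))
    return entries


def ccw_overhead(text):
    """Sum the CCW object and CCW table entries found in the text."""
    return sum(size for name, size in parse_entries(text) if name in CCW_NAMES)


# The excerpt quoted above, copied as-is.
SAMPLE = """\
258,266,592 B (100.0%) -- js-main-runtime
├──209,884,320 B (81.27%) -- compartments
│ ├──127,614,976 B (49.41%) -- gc-heap
│ │ ├───34,671,424 B (13.42%) -- objects
│ │ │ ├─────796,192 B (00.31%) ── cross-compartment-wrapper
│ ├────1,530,816 B (00.59%) ── cross-compartment-wrapper-table
"""

if __name__ == "__main__":
    total = dict(parse_entries(SAMPLE))["js-main-runtime"]
    ccw = ccw_overhead(SAMPLE)
    print(ccw, f"{100.0 * ccw / total:.2f}%")
```

Run against the "Other measurements" tree this gives the runtime-wide total; run against the "explicit" tree it would sum the per-compartment entries instead.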
I've never seen these numbers get very high, but I haven't paid them close attention.
But even if we discover that the CCW overhead is higher, I don't know of any ideas on how to reduce it.
> But even if we discover that the CCW overhead is higher, I don't know of any ideas on how to reduce it.
Other than changing sync so it's not a worst-case benchmark of CCW overhead.
I know that would be ugly. But surely there are occasions when it's necessary to write ugly code to work around performance issues in your platform. This may be one of them; can we consider that option?
I set up a profile with a snapshot of my current places.sqlite, and did an upload first sync.
Memory usage hovered around 350MB during the sync, dropping to 185MB after completion and Minimize Memory Usage.
Counting up the individual cross-compartment-wrapper lines in each of the Sync modules reported in one snapshot totaled 6,308,576 bytes (wrappers + tables).
Other Measurements reports basically the same thing, just in one place:
136,149,032 B (100.0%) -- js-main-runtime
├──117,929,608 B (86.62%) -- compartments
│ ├───61,165,568 B (44.93%) -- gc-heap
│ │ ├──19,549,088 B (14.36%) -- objects
│ │ │ ├───8,583,792 B (06.30%) ── ordinary
│ │ │ ├───7,910,176 B (05.81%) ── function
│ │ │ ├───2,437,536 B (01.79%) ── cross-compartment-wrapper
│ │ │ └─────617,584 B (00.45%) ── dense-array
│ │ ├──15,895,456 B (11.68%) -- strings
│ │ │ ├──10,323,040 B (07.58%) ── normal
│ │ │ └───5,572,416 B (04.09%) ── short
│ │ ├──12,727,576 B (09.35%) ── unused-gc-things
│ ├────4,518,240 B (03.32%) ── cross-compartment-wrapper-table
so it looks like mid-sync we're holding about 10x the cross-compartment-wrapper-table size that you're seeing in regular use (and almost all of that is Sync).
(Of course we're actually generating way more than that, but it's being GCed in phases over the three minutes that the first sync takes on my machine.)
(In reply to Justin Lebar [:jlebar] from comment #7)
> Other than changing sync so it's not a worst-case benchmark of CCW overhead.
I'm totally on-board with this.
That said, I don't know if we have the leeway to consider *any* option that's focused on making Sync and Sync alone better: if cross-compartment overhead is not a problem for other consumers in the tree, then I would have a hard time arguing that it's worth the effort to address, whether that's by rewriting Sync or by altering our runtime environment.
> Counting up the individual cross-compartment-wrapper lines in each of the
> Sync modules reported in one snapshot totaled 6,308,576 bytes (wrappers +
> tables).
> Other Measurements reports basically the same thing, just in one place:
Just in case it wasn't clear: the "other measurements" numbers are the sum of all the individual numbers in "explicit allocations". So that's exactly what should happen.
> │ │ │ ├───2,437,536 B (01.79%) ── cross-compartment-wrapper
> │ ├────4,518,240 B (03.32%) ── cross-compartment-wrapper-table
> so it looks like mid-sync we're holding about 10x the
> cross-compartment-wrapper-table size that you're seeing in regular use (and
> almost all of that is Sync).
Which is a fairly small proportion of the total memory consumption caused by Sync, right?
Hmm... the wrapper objects will be GC'd once they're no longer needed, but it's possible that the CCW tables won't be shrunk back down if the number of objects tracked by them drops. It might be worth opening a bug for that.
(In reply to Nicholas Nethercote [:njn] from comment #10)
> Just in case it wasn't clear: the "other measurements" numbers are the sum
> of all the individual numbers in "explicit allocations". So that's exactly
> what should happen.
Yes, but additionally it means that Sync is almost 100% of the total (at least, with no browsing occurring), which is nice to confirm.
> Which is a fairly small proportion of the total memory consumption caused by
> Sync, right?
The result of this investigation is:
* Sync will cause a tenfold increase in CCW use over steady state
* Said consumption is (only) 3% of the (massive) heap.
Or, phrased differently: compartments add a 3% overhead for Sync on the object side of the fence.
That's probably not worth actively trying to improve unless there's some other part of the product that's being impacted (see my Comment 9).
That said, if we *did* manage to reduce that CCW overhead -- perhaps combining JSMs into a single compartment (Bug 807205) as an approach to addressing other parts of this meta bug? -- then we would save 6MB of heap versus the current state of affairs.
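The 3% and 6MB figures can be checked against the mid-sync numbers quoted earlier in this bug. A small Python sketch, using only those numbers (the helper names are just for illustration):

```python
MIB = 1024 * 1024

# Figures quoted earlier in this bug (mid-sync snapshot).
JS_MAIN_RUNTIME = 136_149_032   # js-main-runtime total, bytes
CCW_OBJECTS = 2_437_536         # cross-compartment-wrapper, bytes
CCW_TABLES = 4_518_240          # cross-compartment-wrapper-table, bytes
SYNC_CCW_TOTAL = 6_308_576      # per-module wrappers + tables, bytes


def share_percent(part, total):
    """Part as a percentage of total."""
    return 100.0 * part / total


def to_mib(nbytes):
    """Bytes to mebibytes."""
    return nbytes / MIB


if __name__ == "__main__":
    print(f"tables:   {share_percent(CCW_TABLES, JS_MAIN_RUNTIME):.2f}%")   # 3.32%
    print(f"wrappers: {share_percent(CCW_OBJECTS, JS_MAIN_RUNTIME):.2f}%")  # 1.79%
    print(f"Sync CCW total: {to_mib(SYNC_CCW_TOTAL):.2f} MiB")              # 6.02 MiB
```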
> Hmm... the wrapper objects will be GC'd once they're no longer needed, but
> it's possible that the CCW tables won't be shrunk back down if the number of
> objects tracked by them drops. It might be worth opening a bug for that.
Closing this, because we're tracking the two remaining blockers separately.