Closed Bug 1751162 Opened 4 months ago Closed 4 months ago

Restrict nursery size to improve responsiveness if minor GCs are taking a long time

Categories: Core :: JavaScript: GC (task, P1)

Status: RESOLVED FIXED, Target Milestone: 98 Branch

Reporter: jonco, Assigned: jonco

Attachments: 4 files

There are many reports of minor GCs taking over 10ms and these can hurt responsiveness.

One possibility is to reduce the maximum nursery size, which will correspondingly reduce the maximum minor collection time. However, this will also reduce throughput for some workloads.

This bug is about investigating this tradeoff.
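The tradeoff can be sketched with a deliberately simple cost model. This is an illustrative assumption, not SpiderMonkey's actual accounting: each minor GC pays a fixed overhead (the hypothetical `fixedOverheadMs`) plus the time to copy surviving data at an assumed throughput (`bytesPerMs`), and one collection happens each time the nursery fills. The names `MinorGCModel`, `pauseMs` and `totalMs` are hypothetical.

```cpp
// Illustrative cost model for nursery sizing; the structure and parameters
// are assumptions for explanation, not SpiderMonkey code.
struct MinorGCModel {
  double fixedOverheadMs;  // assumed per-collection cost independent of size
  double bytesPerMs;       // assumed rate at which surviving data is copied
};

// Length of one minor GC pause for a full nursery of `nurseryBytes`,
// of which the fraction `survivalRate` must be copied out.
double pauseMs(const MinorGCModel& m, double nurseryBytes, double survivalRate) {
  return m.fixedOverheadMs + survivalRate * nurseryBytes / m.bytesPerMs;
}

// Total minor GC time spent while the program allocates `allocatedBytes`,
// assuming one collection each time the nursery fills up.
double totalMs(const MinorGCModel& m, double allocatedBytes,
               double nurseryBytes, double survivalRate) {
  double collections = allocatedBytes / nurseryBytes;
  return collections * pauseMs(m, nurseryBytes, survivalRate);
}
```

Under this model the worst-case pause (everything survives) grows linearly with nursery size, so capping the maximum size caps the longest pauses. When almost everything dies, each pause is mostly fixed overhead, and a smaller nursery just means more collections, which is where the throughput loss comes from. In real workloads the survival rate also tends to rise as the nursery shrinks, because objects have less time to die before they are collected.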

An alternative approach to this problem is to improve our pretenuring of long-lived allocations (bug 1700291), but that is much more effort than adjusting the nursery size parameter.

Octane tests in the shell show our EarleyBoyer score decreasing by 4.1%, Splay by 2.5%, and a number of other tests by between 0.5% and 1%.

Results (showing differences only):

                        Min       Mean      Max       CofV    Runs  Change    %     
====================================================================================
EarleyBoyer:
             opt-build   30666.0   31439.1   32941.0    1.2%    50
  bug/js/src/opt-build   29459.0   30165.3   31463.0    1.4%    50   -1273.8   -4.1%
Splay:
             opt-build   18482.0   19021.7   19542.0    1.2%    50
  bug/js/src/opt-build   18083.0   18548.8   18963.0    0.7%    50    -472.8   -2.5%
RegExp:
             opt-build    7569.0    7845.9    8232.0    2.0%    50
  bug/js/src/opt-build    7372.0    7765.8    8092.0    2.0%    50     -80.2   -1.0%
Gameboy:
             opt-build   77017.0   79762.9   82880.0    1.7%    50
  bug/js/src/opt-build   74888.0   79014.0   82798.0    2.0%    50    -748.9   -0.9%
NavierStokes:
             opt-build   27251.0   27718.0   28641.0    1.1%    50
  bug/js/src/opt-build   27157.0   27498.7   28703.0    1.2%    50    -219.3   -0.8%
Typescript:
             opt-build   40197.0   42749.4   44150.0    1.6%    50
  bug/js/src/opt-build   41915.0   42490.0   43502.0    1.0%    50    -259.4   -0.6%
CodeLoad:
             opt-build   30698.0   31431.9   32459.0    1.3%    50
  bug/js/src/opt-build   30678.0   31274.3   31900.0    1.2%    50    -157.6   -0.5%

Geometric mean:
             opt-build             27346.4
  bug/js/src/opt-build             27157.9                         -0.7%
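The "Change" and "%" columns and the geometric mean summary are plain arithmetic on the scores. A minimal sketch (the function names are hypothetical):

```cpp
#include <cmath>
#include <vector>

// Percentage change from the old build's mean score to the new build's,
// matching the sign convention of the "%" column (negative = lower score).
double percentChange(double oldMean, double newMean) {
  return (newMean - oldMean) / oldMean * 100.0;
}

// Geometric mean of positive benchmark scores, computed via logarithms to
// avoid overflow from multiplying many large scores together.
// Precondition: `scores` is non-empty and every score is > 0.
double geometricMean(const std::vector<double>& scores) {
  double logSum = 0.0;
  for (double s : scores) {
    logSum += std::log(s);
  }
  return std::exp(logSum / static_cast<double>(scores.size()));
}
```

For example, `percentChange(31439.1, 30165.3)` is about -4.05 (EarleyBoyer, shown as -4.1%), and `percentChange(27346.4, 27157.9)` is about -0.69 (shown as -0.7%). Because the displayed means are rounded, recomputing from them can differ from the table in the last digit (Splay gives -472.9 rather than -472.8).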

An alternative approach is to factor collection times into the nursery size calculations, reducing the size if collections are taking too long. This shouldn't limit throughput in cases where most of the nursery is garbage and collections are quick.

Looking at the nursery sizing code, I'm not sure why I wrote it this way. We
hope that there is a single optimum nursery size that we can estimate, but the
code tries to estimate a growth rate instead. I think this leads us to
over-grow the nursery at the start of intensive workloads.

The patch changes the code to estimate an optimum nursery size (in
smoothedTargetSize) based on the existing heuristics. This resulted in a
slower ramp-up of nursery size, so I changed it to give clamped growth factors
more weight when calculating the smoothed value.
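One possible reading of this change, sketched as an exponential moving average: each collection turns the existing heuristics' growth factor into a raw target size, and `smoothedTargetSize` (the only name taken from the patch description) moves part of the way toward it, taking a larger step when the growth factor had to be clamped. The struct name, clamp bounds and weights are hypothetical; the actual patch may differ.

```cpp
#include <algorithm>

// Illustrative sketch of smoothing a target nursery size. Only the name
// smoothedTargetSize comes from the bug; everything else is hypothetical.
struct NurseryTargetEstimator {
  double smoothedTargetSize;  // running estimate of the optimum size (bytes)

  // currentSize: nursery size used for the collection that just finished.
  // growthFactor: raw factor suggested by the existing heuristics.
  double update(double currentSize, double growthFactor) {
    constexpr double kMinFactor = 0.5;  // hypothetical clamp bounds
    constexpr double kMaxFactor = 2.0;
    constexpr double kNormalWeight = 0.1;  // hypothetical smoothing weights
    constexpr double kClampedWeight = 0.5;

    double clamped = std::clamp(growthFactor, kMinFactor, kMaxFactor);
    double target = currentSize * clamped;

    // A clamped factor means the heuristics wanted a bigger change than
    // allowed, so weight that sample more to avoid ramping up too slowly.
    double weight = (clamped != growthFactor) ? kClampedWeight : kNormalWeight;
    smoothedTargetSize = (1.0 - weight) * smoothedTargetSize + weight * target;
    return smoothedTargetSize;
  }
};
```

Starting from a 1 MB estimate with a 1 MB nursery, a raw growth factor of 4.0 is clamped to 2.0 and pulls the estimate halfway to 2 MB (1.5 MB), whereas an unclamped factor of 1.5 only moves it to about 1.05 MB. Smoothing a size rather than a rate means the estimate converges on whatever size the heuristics keep suggesting, instead of compounding a growth rate.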

I tried hard to make this neutral on benchmark results, but I guess we'll
see.

Assignee: nobody → jcoppeard
Status: NEW → ASSIGNED

This limits the growth rate based on the length of collection time, such that
the size is reduced proportionally when collections take longer than 4ms.

The existing 'timeFraction' got renamed to 'dutyFactor' to disambiguate.

This isn't applied during page load, so as not to regress all of our page
load benchmarks.
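A minimal sketch of how such a limit could be expressed, assuming the heuristic caps the growth factor at the ratio of a 4 ms goal to the measured collection time. The 4 ms threshold and the page load exemption come from the description above; the function name, the `std::min` formulation and the `inPageLoad` flag are assumptions.

```cpp
#include <algorithm>

// Hypothetical goal for minor GC length, taken from the 4ms in the bug.
constexpr double kGoalMinorGCTimeMs = 4.0;

// Caps the nursery growth factor according to how long the last minor GC
// took. Illustrative only; not the actual SpiderMonkey implementation.
double limitGrowthByCollectionTime(double growthFactor, double collectionTimeMs,
                                   bool inPageLoad) {
  // Not applied during page load; also guard against a zero/negative time.
  if (inPageLoad || collectionTimeMs <= 0.0) {
    return growthFactor;
  }
  // Cap growth at goal/actual: collections slower than the goal shrink the
  // nursery proportionally, faster ones bound how much it may grow.
  return std::min(growthFactor, kGoalMinorGCTimeMs / collectionTimeMs);
}
```

With this formulation an 8 ms collection caps the factor at 0.5, halving the nursery, while a 2 ms collection still permits up to 2x growth. Because the cap only bites when collections are slow, workloads where most of the nursery is garbage and collections are quick keep their large nursery, which is the property the earlier comment was after.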

This is roughly neutral on Octane, although it improves Splay latency by 3.8%
for me.

This should drastically reduce the number of long minor GCs we see. I expect
we'll see a change in the telemetry.

Depends on D136636

I'm going to land the patches individually as this will make it easier to pinpoint any performance regressions.

Keywords: leave-open
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/61fba7727d74
Part 1: Estimate a target nursery size rather than a target nursery growth rate r=sfink
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2c00bc26b97a
Part 2: Add a nursery size heuristic based on the length of collection time r=sfink
Severity: -- → N/A
Priority: -- → P1

== Change summary for alert #33069 (as of Tue, 25 Jan 2022 20:35:07 GMT) ==

Regressions:

Ratio  Test                                 Platform                   Options                 Absolute values (old vs new)
8%     google-docs-canvas LastVisualChange  linux1804-64-shippable-qr  cold fission webrender  2,226.67 -> 2,396.67

Improvements:

Ratio  Test                                 Platform                   Options                 Absolute values (old vs new)
2%     google-mail LastVisualChange         linux1804-64-shippable-qr  cold fission webrender  4,385.00 -> 4,290.00

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=33069

Regressions: 1752995
Regressions: 1753001
Regressions: 1753074

Nursery collection time telemetry shows that these changes were effective. The 99th percentile of collection times was reduced from ~15ms to ~10ms and the 95th percentile from 5.5ms to 4.3ms. Note that this is due to splitting up longer collections rather than doing less work, which means we are now doing more minor collections.

Nursery size telemetry shows a large reduction in size at the 95th percentile.

Keywords: leave-open
Summary: Consider reducing the maximum nursery size to improve responsiveness → Restrict nursery size to improve responsiveness if minor GCs are taking a long time
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch
Regressions: 1768813