Closed Bug 1492711 Opened 1 year ago Closed 1 year ago

Nursery shrinking may take multiple cycles before tenure rates are effected

Categories

(Core :: JavaScript: GC, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: pbone, Assigned: pbone)

References

Details

Attachments

(3 files)

When we make a decision to shrink the nursery there's a good chance it does nothing to change the tenure performance.

If the nursery is 4MB large and only 2MB are used, and we decide to shrink it based on collecting that 2MB, then we shrink it to 3MB (4 chunks, minus 1 chunk).  This means it will take more collections before the nursery reaches a more ideal size (one where the tenure rate is more ideal).
Summary: Shrink the nursery to below the used size → Nursery shrinking may take multiple cycles before tenure rates are effected
When we shrink the nursery always choose a size that's less than the used
size, not the capacity.  If we have used 2MB out of a 4MB nursery, and
decide to shrink it, the new size will be 1MB (1 chunk less than the used
size).
Attachment #9010523 - Flags: review?(jcoppeard)
It's not a clear signal, but this change looks like it reduces memory usage:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=0a6ac0129952b708387e526929b1742856c46b25&framework=4&showOnlyComparable=1&selectedTimeRange=172800

Explicit memory used is lower (between 0.5% - 1%)

Resident memory used is lower (About 2%)

These have really poor cofidence scores though, so it's unclear.  Some other usages go up including Base Resident memory (I think that's the memory used after Firefox starts but before the test "begins"), but that also has a lot more noise with my patch compared to without.  My guess is that this change also causes SpiderMonkey to allocate and release chunks more often, creating noise.

The speedometer score is about 0.76% slower,

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=0a6ac0129952b708387e526929b1742856c46b25&framework=1&showOnlyComparable=1&selectedTimeRange=172800

This suggests that a sudden shrinking of the nursery affects the ability to react to changes in workload.  Other scores within the Talos suite imporve.

But this data is pretty noisy, I could run more tests, but I think this is the correct behaviour anyway.
(In reply to Paul Bone [:pbone] from comment #3)
I agree that we should make our nursery shrinking more aggressive.  This is important now that we're doing fewer nursery collections.

I'm a bit worried about that speedometer result though.  But the platform that matters most is Windows.  Can you retest for all platforms and see what the results show?

Another thing to consider: what if we kept the current behaviour of shrinking the capacity by one chunk at at time, but only checked the current promotion rate, rather than current and previous as we do now?  That could make the nursery faster to react.  Could you write patch to do that and compare?
Attachment #9010524 - Flags: review?(jcoppeard) → review+
I ran the tests again and rather than compare against the last 24/48 hours of central pushes, I compared against the point my code branches off.  I do not think this change is making a difference in memory usage:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f9490f1d50ea&newProject=try&newRevision=9670b6b2606c&framework=4&showOnlyComparable=1

They're all "low" confidence and some are up and some are down.

However, there does seem to be at least a little (although it's in the noise) performance loss for speedometer.  Particularly on OS X.  My guess is OS X's mmap routine may be a little slower than the other platforms.  I know these free chunks get cached, but OS X still gets a poorer score,  IMHO this is still really in the noise and we'd need many more runs to be confident.  I don't want to land this patch yet even if it seems cleaner until we know more about the potential slowdown for speedometer.


I also tried your alternative idea Jon, shrinking the nursery only depends on the current promotion rate, not the current + last ones:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f9490f1d50ea&newProject=try&newRevision=4cb30c2e1399&framework=4&showOnlyComparable=1

It made no difference, so I propose landing this because it simplifies things.
Comment on attachment 9010948 [details] [diff] [review]
Bug 1492711 - Use only the current promotion rate for resize decisions

Review of attachment 9010948 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thanks for doing this.
Attachment #9010948 - Flags: review?(jcoppeard) → review+
See Also: → MinGCMem
There are two reasons why this may be slower and less effective than I'd hoped.

1) With the nursery changing size more quickly, free chunks will be poisoned more often as they move between the nursery and the free chunk list.  This will make this slower than expected.

2) If the nursery tends to be smaller, we'll collect it more often and promote objects that might have died if the nursery were bigger.  This means we're doing more work and those objects will then take up memory in the tenured heap, possibly negating any memory savings this change makes.

I'll try to test without poisoning.  But I don't think there's much we can do about the second option, except for having more than one step in the nursery or similar things.  Which following from a conversation with jonco, sfink and I, I think we should retest and see if it helps (it was tried in the past but did not help, but we can try again).
Comment on attachment 9010523 [details] [diff] [review]
Bug 1492711 - Make nursery shrinking more aggressive

Review of attachment 9010523 [details] [diff] [review]:
-----------------------------------------------------------------

I think we should land the third patch ("Use only the current promotion rate for resize decisions") and see what happens.  As you say it's simpler and makes no difference to benchmark results.

This one feels a bit drastic so I don't think we should land it for now.
Attachment #9010523 - Flags: review?(jcoppeard)
Oh wait, you didn't say that, I just thought you had.  Anyway, I reckon we should land it and see what happens.  We can back it out if it makes things worse.
(In reply to Jon Coppeard (:jonco) (PTO until 9th October) from comment #11)
> Oh wait, you didn't say that, I just thought you had.  Anyway, I reckon we
> should land it and see what happens.  We can back it out if it makes things
> worse.

You're not mistake, I did say that (comment 5).  Okay I'll land these two patches and let this bug close.  I'll keep the other patch around and might propose it for another bug.
Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d4ccb5aa7b14
Use only the current promotion rate for resize decisions r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/9fdde4e90efb
Change a silent failure to an assertion r=jonco
https://hg.mozilla.org/mozilla-central/rev/d4ccb5aa7b14
https://hg.mozilla.org/mozilla-central/rev/9fdde4e90efb
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Regressions: 1538594
You need to log in before you can comment on or make changes to this bug.