Closed Bug 1433007 Opened 6 years ago Closed 5 years ago

Allow the nursery to use less than a single chunk

Categories

(Core :: JavaScript: GC, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla67
Performance Impact low
Tracking Status
firefox60 --- wontfix
firefox66 --- wontfix
firefox67 --- fixed

People

(Reporter: pbone, Assigned: pbone)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [overhead:noted])

Attachments

(6 files, 11 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
As functional and immutable JS is becomming more popular we're going to see more situations like Bug 1414385 where a lot of memory is allocated and then becomes dead.  There are a lot of Minor GC events and they're rather cheap since very little memory is tenured.

It may help to tune this situation by reducing the nursery size further, currently the limit is one chunk (usually 1MB).  We could add a more aggressive mode that limited the nursery size to less than a chunk to attempt to keep more of it in cache (or leave some cache for other non-nursery) objects.  This won't speed-up marking/tenuring, and indeed there will be more MinorGC events, however the mutator will touch less memory and therefore have fewer cache misses.
Sorry, I meant to say Bug 1432343.
Blocks: 1432343
Whiteboard: [qf]
Whiteboard: [qf] → [qf:f63][qf:p3]
Priority: -- → P3
Blocks: MinGCMem
Whiteboard: [qf:f63][qf:p3] → [qf:f63][qf:p3][overhead:noted]
Assignee: nobody → pbone
Status: NEW → ASSIGNED
Blocks: 1490917
Status update: I have some partly written patches for this, it's about half done.
+ Comment currentStartChunk_ in more detail
 + Remove misplaced comment about allocation during maybeResizeNursery, this
   code no-longer does any allocation.
Attachment #9015830 - Flags: review?(jcoppeard)
+ Add a limit so that the nursery can collect after using a fraction of a
   chunk.
 + Don't do idle time collection if the nursery is empty, even though the
   free space may now be less than the free space threshold, collecting it
   wont help.
 + Modify a test that expects a larger nursery.
Attachment #9015831 - Flags: review?(jcoppeard)
Comment on attachment 9015830 [details] [diff] [review]
Bug 1397549 - Fix two comments in nursery code

Review of attachment 9015830 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Nursery.h
@@ +367,5 @@
>  
> +    /*
> +     * Pointer to the logical start of the Nursery.  This can be an arbitrary
> +     * position in an arbitrary chunk when a generational GC zeal mode is
> +     * active.  See Nursery::clear(), otherwise it's 0, chunk(0).start().

The last sentence doesn't parse for me.  How about:

...when a generational GC zeal mode is active, otherwise chunk(0).start().

Also, can you move this comment to apply to currentStartPosition_ not currentStartChunk_?
Attachment #9015830 - Flags: review?(jcoppeard) → review+
Comment on attachment 9015831 [details] [diff] [review]
Bug 1397549 - Add a sub-chunk limit to the nursery

Review of attachment 9015831 [details] [diff] [review]:
-----------------------------------------------------------------

This seems fine.  I'd like to see what difference this makes before giving r+ though.

::: js/src/gc/Nursery.cpp
@@ +1145,5 @@
> +        bytes = currentEnd_ - currentStartPosition_;
> +    } else {
> +        bytes = (chunk(currentStartChunk_).end() - currentStartPosition_) +
> +                ((lastChunk - currentStartChunk_) * NurseryChunkUsableSize);
> +    }

Does it work to use the original calculation with chunk(currentStartChunk_).end() replaced by currentEnd_?

@@ +1259,5 @@
> +                updateSubChunkMode(subChunkLimit_ - DefaultSubChunkLimit);
> +            }
> +        } else {
> +            if (maxChunkCount() == 1) {
> +                updateSubChunkMode(ChunkSize/2);

Should this be |ChunkSize - DefaultSubChunkLimit| to be in keeping with the other shrinking calculation in the branch above?
Attachment #9015831 - Flags: review?(jcoppeard) → feedback+
(In reply to Jon Coppeard (:jonco) from comment #6)
I'm also wondering if there's a simpler way to organise this than having a separate sub chunk mode.  Say storing the nursery size in bytes and dividing by chunk size to get the chunk count when needed.
jonco, Sorry about the confusion on IRC last night, I wrote the following comment in bugzilla here but it must have been at the same moment you reviewed a patch, so there was a conflict and I didn't notice until now:

These patches are the work so far, it requires one further patch so that we don't poision the memory beyond the current sub-chunk limit, or ideally only the used memory.

Maybe as a seperate bug we should also avoid touching the chunk before it is handed to the nursery, so that the pages in the chunk are not mapped in until required.  And optionally decommit parts of a chunk if we know we havn't used them in a while.  This should be seperate because it provides a different benifit (memory resident) rather than performance as this bug does.
Depends on: 1498095
(In reply to Jon Coppeard (:jonco) from comment #6)
> Comment on attachment 9015831 [details] [diff] [review]
> Bug 1397549 - Add a sub-chunk limit to the nursery
> 
> Review of attachment 9015831 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This seems fine.  I'd like to see what difference this makes before giving
> r+ though.
> 
> ::: js/src/gc/Nursery.cpp
> @@ +1145,5 @@
> > +        bytes = currentEnd_ - currentStartPosition_;
> > +    } else {
> > +        bytes = (chunk(currentStartChunk_).end() - currentStartPosition_) +
> > +                ((lastChunk - currentStartChunk_) * NurseryChunkUsableSize);
> > +    }
> 
> Does it work to use the original calculation with
> chunk(currentStartChunk_).end() replaced by currentEnd_?

No, those can refer to different chunks, therefore their positions can't be compared.

> 
> @@ +1259,5 @@
> > +                updateSubChunkMode(subChunkLimit_ - DefaultSubChunkLimit);
> > +            }
> > +        } else {
> > +            if (maxChunkCount() == 1) {
> > +                updateSubChunkMode(ChunkSize/2);
> 
> Should this be |ChunkSize - DefaultSubChunkLimit| to be in keeping with the
> other shrinking calculation in the branch above?

Good idea.

Thanks.
Hi jonco,

You've already given this r+ but I've rephrased one of the comments completely.  SO I want to do the right thing and request review again.

Cheers.
Attachment #9015830 - Attachment is obsolete: true
Attachment #9017380 - Flags: review?(jcoppeard)
Depends on: 1499043
Comment on attachment 9017380 [details] [diff] [review]
Bug 1397549 - (Part 1) Fix two comments in nursery code

Review of attachment 9017380 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the updates.
Attachment #9017380 - Flags: review?(jcoppeard) → review+
Attachment #9017380 - Attachment description: Bug 1397549 - Fix two comments in nursery code → Bug 1397549 - (Part 1) Fix two comments in nursery code
Comment on attachment 9017380 [details] [diff] [review]
Bug 1397549 - (Part 1) Fix two comments in nursery code

># HG changeset patch
># User Paul Bone <pbone@mozilla.com>
># Date 1539164140 -39600
>#      Wed Oct 10 20:35:40 2018 +1100
># Node ID 9bf0f6c41547d0f9c0f375d2d7ea60eca469c7bb
># Parent  4c11ab0cd98950983cfc957f579ace6c3e918a43
>Bug 1397549 - Fix two comments in nursery code
>
> + Comment currentStartChunk_ in more detail
> + Remove misplaced comment about allocation during maybeResizeNursery, this
>   code no-longer does any allocation.
>
>diff --git a/js/src/gc/Nursery.cpp b/js/src/gc/Nursery.cpp
>--- a/js/src/gc/Nursery.cpp
>+++ b/js/src/gc/Nursery.cpp
>@@ -1232,8 +1232,6 @@ js::Nursery::maybeResizeNursery(JS::gcre
>         float(previousGC.tenuredBytes) / float(previousGC.nurseryCapacity);
> 
>     if (promotionRate > GrowThreshold) {
>-        // The GC nursery is an optimization and so if we fail to allocate
>-        // nursery chunks we do not report an error.
>         growAllocableSpace();
>     } else if (maxChunkCount() > 1 && promotionRate < ShrinkThreshold) {
>         shrinkAllocableSpace(maxChunkCount() - 1);
>diff --git a/js/src/gc/Nursery.h b/js/src/gc/Nursery.h
>--- a/js/src/gc/Nursery.h
>+++ b/js/src/gc/Nursery.h
>@@ -365,7 +365,11 @@ class Nursery
>     /* Pointer to the first unallocated byte in the nursery. */
>     uintptr_t position_;
> 
>-    /* Pointer to the logical start of the Nursery. */
>+    /*
>+     * These fields refer to the beginning of the nursery. They're normally 0
>+     * and chunk(0).start() respectively. Except when a generational GC zeal
>+     * mode is active, then they may be arbitrary (see Nursery::clear()).
>+     */
>     unsigned currentStartChunk_;
>     uintptr_t currentStartPosition_;
>
Attachment #9017380 - Attachment is obsolete: true
Blocks: 1497873
Depends on: 1505666
Add capacity(), lazyCapacity() and usedSpace() functions to the Nursery.
Attachment #9023566 - Flags: review?(jcoppeard)
To make the next change simpler we will store the nursery capacity in bytes
so it can be varied more granularly.
Attachment #9023567 - Flags: review?(jcoppeard)
+ Add a limit so that the nursery can collect after using a fraction of a
   chunk.
 + Don't do idle time collection if the nursery is empty, even though the
   free space may now be less than the free space threshold, collecting it
   wont help.
 + Modify a test that expects a larger nursery.

Hi Jon, This appears to be stable, but I don't have performance results yet.  I won't commit it until we have those / they seem okay, so feel free to hold back r+ until then.
Attachment #9015831 - Attachment is obsolete: true
Attachment #9023568 - Flags: review?(jcoppeard)
Attachment #9023566 - Flags: review?(jcoppeard) → review+
Comment on attachment 9023567 [details] [diff] [review]
Bug 1433007 - (Part 2) Store the nursery capacity in bytes

Review of attachment 9023567 [details] [diff] [review]:
-----------------------------------------------------------------

It's good you're being exact by using the chunk usable size, but I think it would simplify things to just use ChunkSize and have capacity be a multiple of that.  The trailer size is very small compared to the size of a chunk so this shouldn't make any real difference.  What do you think?

::: js/src/gc/Nursery.h
@@ +162,5 @@
>      // lazilly allocated and added to the chunks array up to this limit, after
>      // that the nursery must be collected, this limit may be raised during
>      // collection.
> +    unsigned maxChunkCount() const
> +    {

nit: brace goes on same line for inline method definitions.

@@ +166,5 @@
> +    {
> +        if (capacity()) {
> +            return (capacity() + NurseryChunkUsableSize - 1) / NurseryChunkUsableSize;
> +        } else {
> +            return 0;

Do you need the if statement?

@@ +407,4 @@
>      unsigned currentChunk_;
>  
>      /*
> +     * The current nursery capacity measured in bytes, it may grow up to this

nit: comma splice.  "it may grow..." should be a new sentence.

This should also state whether the size includes chunk trailers.

@@ +412,3 @@
>       * change during maybeResizeNursery() each collection.
>       */
> +    unsigned capacity_;

Please make this a size_t as it's storing a size.
Comment on attachment 9023568 [details] [diff] [review]
Bug 1433007 - (Part 3) Add a sub-chunk limit to the nursery

Review of attachment 9023568 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Nursery.cpp
@@ +730,4 @@
>  js::Nursery::needIdleTimeCollection() const {
>      uint32_t threshold =
>          runtime()->gc.tunables.nurseryFreeThresholdForIdleCollection();
> +    return !isEmpty() && (minorGCRequested() || freeSpace() < threshold);

Is the problem that the nursery may now be smaller than the threshold to start with?  It seems like this will end up collecting the nursery every time it is non-empty when in this state.  Maybe we should vary the threshold based on the nursery size.

@@ +1140,5 @@
>  
> +    size_t bytes;
> +    if (chunkCount == 1) {
> +        // We need to use currentEnd_ when in sub chunk mode, but it also
> +        // works generally when chunkCount == 1.

Does it work when chunkCount > 1?  If so can simplify this?

@@ +1189,5 @@
> +        currentEnd_ = chunk(0).start() + capacity_;
> +        MOZ_ASSERT(currentEnd_ < chunk(0).end());
> +    } else {
> +        currentEnd_ = chunk(currentChunk_).end();
> +    }

I think you could simplify this to:

currentEnd_ = chunk(0).start() + Min(capacity_, ChunkUsableSize)
(In reply to Jon Coppeard (:jonco) from comment #17)
I also meant to say: the issues above are minor and it's more important to get some good performance data on what difference these changes make.
After some profiling I've found what appears to be a lower limit to a practical nursery size.  About 160KB.  Much lower and the frequent nursery collections' root scanning is greater than any locality benifit.  The localty benifit itself is not directly measureable, but it is indirectly measurable.

More frequent collections mean more root scanning, and the total time spent scanning roots indeed goes up.  If you subtract this from the program's total runtime with a 160K and 1MB heap, (which basically the same for my test program).  Then the remaining time (mutator time + gc time - root scanning time) is lower with a smaller nursery.

tl;dr: these patches do not make anything faster (without also changing root scanning) but they may save some memory.

Comparing central with a 160KB minimum nursery: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c5b199307ac3&framework=1&showOnlyComparable=1&selectedTimeRange=172800

Comparing central with a 192KB minimum nursery: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=2de9aa5d5f30&framework=1&showOnlyComparable=1&selectedTimeRange=172800

Comparing 160KB minimum nursery with a 160KB minimum nursery: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c5b199307ac3&newProject=try&newRevision=2de9aa5d5f30&framework=1&showOnlyComparable=1&showOnlyConfident=1

I'd be happy to land this now with one of these nursery sizes, or I can attempt to quantify the memory savings first.
(In reply to Paul Bone [:pbone] from comment #20)
> I'd be happy to land this now with one of these nursery sizes, or I can
> attempt to quantify the memory savings first.

It looks like across the board [1,2,3,4] there's a 3-4MB win on the AWSY 'JS Fresh start [+30s]' metric, the rest are pretty much in the noise. Possibly a slight increase in the non-'+30s' measurements (so without sitting idle of 30 seconds).

[1] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=2de9aa5d5f30&originalSignature=b902f4a938e54de76925d9ad1c902b65681cb33f&newSignature=b902f4a938e54de76925d9ad1c902b65681cb33f&framework=4&selectedTimeRange=172800
[2] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=2de9aa5d5f30&originalSignature=276ee65ea65ac53fbf3e703ac268a332101e315b&newSignature=276ee65ea65ac53fbf3e703ac268a332101e315b&framework=4&selectedTimeRange=172800
[3] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=2de9aa5d5f30&originalSignature=27a2e95c5b50b4a65f02eb061fc1288359bd58f3&newSignature=27a2e95c5b50b4a65f02eb061fc1288359bd58f3&framework=4&selectedTimeRange=172800
[4] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=2de9aa5d5f30&originalSignature=17ad55f8ba2a99d51c8635ae11776a844f4d2c4d&newSignature=17ad55f8ba2a99d51c8635ae11776a844f4d2c4d&framework=4&selectedTimeRange=172800
To check my own understanding: With this change, we still allocate the nursery in chunk-sized (e.g. 1MiB) units but we allow the nursery to shrink below the reserved size, enabling the OS to decommit the unused pages. Is that correct?
(In reply to Eric Rahm [:erahm] from comment #21)
> (In reply to Paul Bone [:pbone] from comment #20)
> > I'd be happy to land this now with one of these nursery sizes, or I can
> > attempt to quantify the memory savings first.
> 
> It looks like across the board [1,2,3,4] there's a 3-4MB win on the AWSY 'JS
> Fresh start [+30s]' metric, the rest are pretty much in the noise. Possibly
> a slight increase in the non-'+30s' measurements (so without sitting idle of
> 30 seconds).
> 

Thanks, I didn't notice those subtests.  This is good.
Blocks: 1506733
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #22)
> To check my own understanding: With this change, we still allocate the
> nursery in chunk-sized (e.g. 1MiB) units but we allow the nursery to shrink
> below the reserved size, enabling the OS to decommit the unused pages. Is
> that correct?

Almost correct.

It is still allocated in chunk sized units, and allowed to shrink below that size.  But It lacks the ability to tell the OS it can decommit those pages.  The OS may try to swap them out instead.
The saving I'm aiming for here is cache and TLB usage.

I have a follow up patch (bug 1506733) to tell the OS it can swap them out, but it needs some more tweaking.
(In reply to Jon Coppeard (:jonco) from comment #16)
> Comment on attachment 9023567 [details] [diff] [review]
> Bug 1433007 - (Part 2) Store the nursery capacity in bytes
> 
> Review of attachment 9023567 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It's good you're being exact by using the chunk usable size, but I think it
> would simplify things to just use ChunkSize and have capacity be a multiple
> of that.  The trailer size is very small compared to the size of a chunk so
> this shouldn't make any real difference.  What do you think?

I think making it the usable size makes two things simpler: the freeSpace() and isSubChunkMode functions.  freeSpace() can easily return the actual usable space, and isSubChunkMode() is used when deciding how much to shrink or grow the nursery by.  So I'd like to leave it the way it is, I don't think changing it would make anything else simpler.

> ::: js/src/gc/Nursery.h
> @@ +166,5 @@
> > +    {
> > +        if (capacity()) {
> > +            return (capacity() + NurseryChunkUsableSize - 1) / NurseryChunkUsableSize;
> > +        } else {
> > +            return 0;
> 
> Do you need the if statement?

It would return a bad value if the nursery is disabled.  But we can avoid calling it in that case to remove the if statement.
To make the next change simpler we will store the nursery capacity in bytes
so it can be varied more granularly.
Attachment #9023567 - Attachment is obsolete: true
Attachment #9023567 - Flags: review?(jcoppeard)
Attachment #9024629 - Flags: review?(jcoppeard)
Blocks: 1506761
(In reply to Jon Coppeard (:jonco) from comment #17)
> Comment on attachment 9023568 [details] [diff] [review]
> Bug 1433007 - (Part 3) Add a sub-chunk limit to the nursery
> 
> Review of attachment 9023568 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/gc/Nursery.cpp
> @@ +730,4 @@
> >  js::Nursery::needIdleTimeCollection() const {
> >      uint32_t threshold =
> >          runtime()->gc.tunables.nurseryFreeThresholdForIdleCollection();
> > +    return !isEmpty() && (minorGCRequested() || freeSpace() < threshold);
> 
> Is the problem that the nursery may now be smaller than the threshold to
> start with?  It seems like this will end up collecting the nursery every
> time it is non-empty when in this state.  Maybe we should vary the threshold
> based on the nursery size.

Yes, that's the problem.  I'd prefer to make that seperate from this change. (Bug 1506761)

> @@ +1140,5 @@
> >  
> > +    size_t bytes;
> > +    if (chunkCount == 1) {
> > +        // We need to use currentEnd_ when in sub chunk mode, but it also
> > +        // works generally when chunkCount == 1.
> 
> Does it work when chunkCount > 1?  If so can simplify this?

Not generally, we either have to make it depend on that, or on a zeal mode.  Since sometimes the first chunk doesn't begin at its beginning.

> @@ +1189,5 @@
> > +        currentEnd_ = chunk(0).start() + capacity_;
> > +        MOZ_ASSERT(currentEnd_ < chunk(0).end());
> > +    } else {
> > +        currentEnd_ = chunk(currentChunk_).end();
> > +    }
> 
> I think you could simplify this to:
> 
> currentEnd_ = chunk(0).start() + Min(capacity_, ChunkUsableSize)

ITYM chunk(currentChunk_) in there.  But yes, that's simpler.
+ Add a limit so that the nursery can collect after using a fraction of a
   chunk.
 + Don't do idle time collection if the nursery is empty, even though the
   free space may now be less than the free space threshold, collecting it
   wont help.
 + Modify a test that expects a larger nursery.
Attachment #9023568 - Attachment is obsolete: true
Attachment #9023568 - Flags: review?(jcoppeard)
Attachment #9024635 - Flags: review?(jcoppeard)
Use a 192KB lower limit for the nursery size but allow it to adjust by 64KB.
Attachment #9024636 - Flags: review?(jcoppeard)
Attachment #9024645 - Flags: review?(jcoppeard)
(In reply to Paul Bone [:pbone] from comment #30)
> Created attachment 9024645 [details] [diff] [review]
> Bug 1433007 - (Part 5) Make initial nursery size smaller

I'm testing the impact of this change now.
(In reply to Paul Bone [:pbone] from comment #20)

> I'd be happy to land this now with one of these nursery sizes, or I can
> attempt to quantify the memory savings first.

Looking at those performance results I see a whole load of regressions and only one or two improvements.  We can't land this with those regressions.

Do you know why this is affecting performance so much?  Maybe the nursery size is staying smaller than necessary.
(In reply to Paul Bone [:pbone] from comment #27)
> (In reply to Jon Coppeard (:jonco) from comment #17)
> >  It seems like this will end up collecting the nursery every
> > time it is non-empty when in this state.  Maybe we should vary the threshold
> > based on the nursery size.
> 
> Yes, that's the problem.  I'd prefer to make that seperate from this change.
> (Bug 1506761)

No, we'll need to fix that before this can land.
(In reply to Jon Coppeard (:jonco) from comment #32)
> (In reply to Paul Bone [:pbone] from comment #20)
> 
> > I'd be happy to land this now with one of these nursery sizes, or I can
> > attempt to quantify the memory savings first.
> 
> Looking at those performance results I see a whole load of regressions and
> only one or two improvements.  We can't land this with those regressions.
> 
> Do you know why this is affecting performance so much?  Maybe the nursery
> size is staying smaller than necessary.

I thought I had addressed these, but my latest tests weren't done yet.  Here are the latest.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=35d4af39bc21&framework=1&showOnlyConfident=1&selectedTimeRange=172800

They look similar to the previous tests.  What I'm noticing is that tests like kraken which have sub tests which exercise the GC differently:  https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=35d4af39bc21&originalSignature=c1ea90be01eecd577300791700af3661be7d36b6&newSignature=c1ea90be01eecd577300791700af3661be7d36b6&framework=1&selectedTimeRange=172800

The poor-performing subtests are exactly the ones I'd hoped to optimise, they do a lot of allocation that dies early and the nursery shrinks.  So far what I've found is that when this happens, and the nursery is collected more often is that root marking occurs more often but isn't any cheaper, so it begins to dominate the cost.  I thought I addressed that by setting the lower limit for the nursery size at 192KB, testing locally that was just above the point where the savings due to a smaller nursery and the cost of root marking traded off.

I am also wondering if Bug 1497873 will make things imporve / helps us get better signal with these regressions.
No longer blocks: 1497873, 1506761
Depends on: 1506761, 1497873
Blocks: 1507445
(In reply to Paul Bone [:pbone] from comment #25)
> I think making it the usable size makes two things simpler: the freeSpace() and isSubChunkMode functions.

It doesn't matter if freeSpace() is exact and isSubChunkMode() is still just | current size < chunk size | surely?

> I don't think changing it would make anything else simpler.

You put a comment in part 3 that says "The current nursery size is is not a multiple of DefaultSubChunkLimit, but it'd be nice if it was" and I agree with this comment.

> > ::: js/src/gc/Nursery.h
> > @@ +166,5 @@
> > > +    {
> > > +        if (capacity()) {
> > > +            return (capacity() + NurseryChunkUsableSize - 1) / NurseryChunkUsableSize;
> > > +        } else {
> > > +            return 0;
> > 
> > Do you need the if statement?
> 
> It would return a bad value if the nursery is disabled.  But we can avoid
> calling it in that case to remove the if statement.

I don't get this.  What bad value would it return?
See Also: → 1515380

Hoping to land this on this train, its dependencies still have some regressions though.

Priority: P3 → P1
Whiteboard: [qf:f63][qf:p3][overhead:noted] → [qf:f66][qf:p3][overhead:noted]
Comment on attachment 9024629 [details] [diff] [review]
Bug 1433007 - (Part 2) Store the nursery capacity in bytes

Review of attachment 9024629 [details] [diff] [review]:
-----------------------------------------------------------------

I was expecting you to post updated patches following my comments above.  But yes, hopefully we can get this in soon.
Attachment #9024629 - Flags: review?(jcoppeard)
Attachment #9024635 - Flags: review?(jcoppeard)

(In reply to Paul Bone [:pbone] from comment #36)
I was

Summary: Tune nursery for high allocation and low tenure rates → Allow the nursery to use less than a single chunk

(In reply to Jon Coppeard (:jonco) from comment #37)

Comment on attachment 9024629 [details] [diff] [review]
Bug 1433007 - (Part 2) Store the nursery capacity in bytes

Review of attachment 9024629 [details] [diff] [review]:

I was expecting you to post updated patches following my comments above.
But yes, hopefully we can get this in soon.

It was waiting on Bug 1497873 which should land soon.

Attachment #9024635 - Attachment is obsolete: true
Attachment #9024636 - Attachment is obsolete: true
Attachment #9024636 - Flags: review?(jcoppeard)
Attachment #9024645 - Attachment is obsolete: true
Attachment #9024645 - Flags: review?(jcoppeard)
Blocks: 1506761
No longer depends on: 1506761

The latest patches might be regressing some things a slight amount, but it's hard to say. Meanwhile some of the js-bench benchmarks improve a lot. This is basically on par with what I learnt last time I did performance testing for this change. That said, once we follow up with Bug 1506733 we should see some good memory improvements.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=64af4e0fa2662500d7b520cdfe05411d932221c5&framework=1&selectedTimeRange=172800

Add capacity(), lazyCapacity() and usedSpace() functions to the Nursery.

Depends on D19344

To make the next change simpler we will store the nursery capacity in bytes
so it can be varied more granularly.

Depends on D19345

  • Add a limit so that the nursery can collect after using a fraction of a
    chunk.
  • Don't do idle time collection if the nursery is empty, even though the
    free space may now be less than the free space threshold, collecting it
    wont help.
  • Modify a test that expects a larger nursery.

Depends on D19347

Depends on D19348

Attachment #9023566 - Attachment is obsolete: true
Attachment #9024629 - Attachment is obsolete: true

spaceToEnd() now works when the nursery is disabled.

Blocks: 1527532
Attachment #9042925 - Attachment description: Bug 1433007 - (part 1) Add new size functions to the Nursery r=sfink! → Bug 1433007 - (part 2) Add new size functions to the Nursery r=sfink!
Attachment #9042926 - Attachment description: Bug 1433007 - (part 2) Store the nursery capacity in bytes r=sfink! → Bug 1433007 - (part 3) Store the nursery capacity in bytes r=sfink!
Attachment #9042927 - Attachment description: Bug 1433007 - (part 3) Perform nursery resize calculation in bytes r=sfink! → Bug 1433007 - (part 4) Perform nursery resize calculation in bytes r=sfink!
Attachment #9042928 - Attachment description: Bug 1433007 - (part 4) Add a sub-chunk limit to the nursery r=sfink! → Bug 1433007 - (part 5) Add a sub-chunk limit to the nursery r=sfink!
Attachment #9042929 - Attachment description: Bug 1433007 - (part 5) Make initial nursery size smaller r=sfink! → Bug 1527532 - Make initial nursery size smaller r=sfink!

Comment on attachment 9042929 [details]
Bug 1527532 - Make initial nursery size smaller r=sfink!

Revision D19349 was moved to bug 1527532. Setting attachment 9042929 [details] to obsolete.

Attachment #9042929 - Attachment is obsolete: true
Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9dffda69dd2a
(part 1) Make growAllocableSpace more strict about its argument r=sfink
https://hg.mozilla.org/integration/autoland/rev/b2d8b986d422
(part 2) Add new size functions to the Nursery r=sfink
https://hg.mozilla.org/integration/autoland/rev/54066b2f1a73
(part 3) Store the nursery capacity in bytes r=sfink
https://hg.mozilla.org/integration/autoland/rev/d6ef554aa023
(part 4) Perform nursery resize calculation in bytes r=sfink
https://hg.mozilla.org/integration/autoland/rev/cd3427c916a4
(part 5) Add a sub-chunk limit to the nursery r=sfink
https://hg.mozilla.org/integration/autoland/rev/6cec67142105
(part 6) Rename lazyCapacity() to committed() and remove sizeOfHeapCommitted() r=sfink
Depends on: 1528159

Taking a look at the impact this has in telemetery.

There's a fairly obvious reduction in average nursery size,

https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=Median!Mean!5th%2520percentile&cumulative=0&end_date=2019-02-17&include_spill=0&keys=!__none__!__none__&max_channel_version=nightly%252F67&measure=GC_NURSERY_BYTES&min_channel_version=nightly%252F64&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2019-02-15&trim=1&use_submission_date=0

Which is great, that's the memory savings we can expect to gain once we're able to decommit the unused memory and some tweaks to poisoning.

Looks like a huge impact on the long end of pauses for nursery collection, bringing the 95th percentile from ~250us to ~60us!

https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=Median!5th%2520percentile!95th%2520percentile&cumulative=0&end_date=2019-02-17&include_spill=0&keys=!__none__!__none__&max_channel_version=nightly%252F67&measure=GC_MINOR_US&min_channel_version=nightly%252F64&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2019-01-28&trim=1&use_submission_date=0

The nursery is collected more often, meaning this won't affect throughput (we knew it wouldn't). I think I can see that in some of the increased sample rate but that's always hard to know.

Big AWSY improvements noticed! \0/
Half of the JS <platform> improvements can be attributed to bug 1527742 also.

== Change summary for alert #19393 (as of Fri, 15 Feb 2019 06:35:28 GMT) ==

Improvements:

21% Base Content JS windows7-32 opt stylo 4,162,016.00 -> 3,302,251.33
21% Base Content JS windows7-32 pgo stylo 4,162,040.00 -> 3,306,343.33
17% Base Content JS linux64-qr opt stylo 5,034,618.00 -> 4,172,972.67
17% Base Content JS linux64 opt stylo 5,034,610.00 -> 4,172,976.67
17% Base Content JS windows10-64 opt stylo 5,099,952.00 -> 4,238,681.33
17% Base Content JS windows10-64 pgo stylo 5,099,965.33 -> 4,244,328.00
9% Base Content Explicit windows7-32 opt stylo 8,603,648.00 -> 7,823,189.33
8% Base Content Explicit windows10-64 pgo stylo 11,353,770.67 -> 10,485,589.33
8% Base Content Explicit windows10-64 opt stylo 11,256,490.67 -> 10,402,474.67
7% JS windows7-32 pgo stylo 86,027,087.64 -> 80,349,790.95
7% JS windows7-32 opt stylo 85,982,210.08 -> 80,325,465.84
7% JS linux64-qr opt stylo 113,887,106.78 -> 106,444,803.28
6% JS linux64 opt stylo 113,458,706.47 -> 106,234,537.48
6% JS windows10-64-qr opt stylo 114,480,464.48 -> 107,359,956.88
6% JS windows10-64 opt stylo 114,542,737.86 -> 107,509,563.50
6% JS osx-10-10 opt stylo 114,384,030.41 -> 107,629,067.06
5% Base Content Explicit linux64 opt stylo 13,764,608.00 -> 13,023,232.00
5% Base Content Explicit linux64-qr opt stylo 13,760,170.67 -> 13,032,448.00
3% Explicit Memory linux64 opt stylo 368,888,459.00 -> 357,057,764.05
3% Explicit Memory windows7-32 pgo stylo 272,025,153.61 -> 264,530,747.47

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19393

w00t!

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #52)

Big AWSY improvements noticed! \0/
Half of the JS <platform> improvements can be attributed to bug 1527742 also.

== Change summary for alert #19393 (as of Fri, 15 Feb 2019 06:35:28 GMT) ==

Improvements:

21% Base Content JS windows7-32 opt stylo 4,162,016.00 -> 3,302,251.33
21% Base Content JS windows7-32 pgo stylo 4,162,040.00 -> 3,306,343.33
17% Base Content JS linux64-qr opt stylo 5,034,618.00 -> 4,172,972.67
17% Base Content JS linux64 opt stylo 5,034,610.00 -> 4,172,976.67
17% Base Content JS windows10-64 opt stylo 5,099,952.00 -> 4,238,681.33
17% Base Content JS windows10-64 pgo stylo 5,099,965.33 -> 4,244,328.00
9% Base Content Explicit windows7-32 opt stylo 8,603,648.00 -> 7,823,189.33
8% Base Content Explicit windows10-64 pgo stylo 11,353,770.67 -> 10,485,589.33
8% Base Content Explicit windows10-64 opt stylo 11,256,490.67 -> 10,402,474.67

These ones could be due to this bug. But I wasn't expecting them because this bug makes no attempt to change poisoning behviour so I dodn't expect resident memory to change.

7% JS windows7-32 pgo stylo 86,027,087.64 -> 80,349,790.95
7% JS windows7-32 opt stylo 85,982,210.08 -> 80,325,465.84
7% JS linux64-qr opt stylo 113,887,106.78 -> 106,444,803.28
6% JS linux64 opt stylo 113,458,706.47 -> 106,234,537.48
6% JS windows10-64-qr opt stylo 114,480,464.48 -> 107,359,956.88
6% JS windows10-64 opt stylo 114,542,737.86 -> 107,509,563.50
6% JS osx-10-10 opt stylo 114,384,030.41 -> 107,629,067.06
5% Base Content Explicit linux64 opt stylo 13,764,608.00 -> 13,023,232.00
5% Base Content Explicit linux64-qr opt stylo 13,760,170.67 -> 13,032,448.00
3% Explicit Memory linux64 opt stylo 368,888,459.00 -> 357,057,764.05
3% Explicit Memory windows7-32 pgo stylo 272,025,153.61 -> 264,530,747.47

The majority of the difference in this tests cannot possibly be from this bug. I can only take credit for up to 1MB. This change won't affect nurseries larger than that.

(In reply to Paul Bone [:pbone] from comment #53)

w00t!

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #52)

Big AWSY improvements noticed! \0/
Half of the JS <platform> improvements can be attributed to bug 1527742 also.

== Change summary for alert #19393 (as of Fri, 15 Feb 2019 06:35:28 GMT) ==

Improvements:

21% Base Content JS windows7-32 opt stylo 4,162,016.00 -> 3,302,251.33
21% Base Content JS windows7-32 pgo stylo 4,162,040.00 -> 3,306,343.33
17% Base Content JS linux64-qr opt stylo 5,034,618.00 -> 4,172,972.67
17% Base Content JS linux64 opt stylo 5,034,610.00 -> 4,172,976.67
17% Base Content JS windows10-64 opt stylo 5,099,952.00 -> 4,238,681.33
17% Base Content JS windows10-64 pgo stylo 5,099,965.33 -> 4,244,328.00
9% Base Content Explicit windows7-32 opt stylo 8,603,648.00 -> 7,823,189.33
8% Base Content Explicit windows10-64 pgo stylo 11,353,770.67 -> 10,485,589.33
8% Base Content Explicit windows10-64 opt stylo 11,256,490.67 -> 10,402,474.67

These ones could be due to this bug. But I wasn't expecting them because this bug makes no attempt to change poisoning behviour so I dodn't expect resident memory to change.

We didn't see an improvement in resident so that makes sense.

7% JS windows7-32 pgo stylo 86,027,087.64 -> 80,349,790.95
7% JS windows7-32 opt stylo 85,982,210.08 -> 80,325,465.84
7% JS linux64-qr opt stylo 113,887,106.78 -> 106,444,803.28
6% JS linux64 opt stylo 113,458,706.47 -> 106,234,537.48
6% JS windows10-64-qr opt stylo 114,480,464.48 -> 107,359,956.88
6% JS windows10-64 opt stylo 114,542,737.86 -> 107,509,563.50
6% JS osx-10-10 opt stylo 114,384,030.41 -> 107,629,067.06
5% Base Content Explicit linux64 opt stylo 13,764,608.00 -> 13,023,232.00
5% Base Content Explicit linux64-qr opt stylo 13,760,170.67 -> 13,032,448.00
3% Explicit Memory linux64 opt stylo 368,888,459.00 -> 357,057,764.05
3% Explicit Memory windows7-32 pgo stylo 272,025,153.61 -> 264,530,747.47

The majority of the difference in this tests cannot possibly be from this bug. I can only take credit for up to 1MB. This change won't affect nurseries larger than that.

These are the "big" AWSY numbers, so summed across 10ish processes, but also a geometric average of several subtests. The biggest win was something like 54MB for After tabs open [+30s, forced GC] (on windows at least) which does seem to imply bug 1527742.

== Change summary for alert #19384 (as of Fri, 15 Feb 2019 02:33:11 GMT) ==

Improvements:

17% Base Content JS osx-10-10 opt stylo 5,034,073.33 -> 4,179,684.00
17% Base Content JS windows10-64-qr opt stylo 5,099,260.00 -> 4,244,212.00
11% Base Content Explicit windows7-32 pgo stylo 8,843,946.67 -> 7,903,061.33
7% Base Content Explicit windows10-64-qr opt stylo 11,213,312.00 -> 10,388,480.00

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19384

Depends on: 1530573
Depends on: 1530583
Depends on: 1530634
Blocks: 1530575
Blocks: 1531626
Regressions: 1544651
Regressions: 1561832
Performance Impact: --- → P3
Whiteboard: [qf:f66][qf:p3][overhead:noted] → [overhead:noted]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: