Closed Bug 1298018 Opened 3 years ago Closed 2 years ago

Investigate allocating nursery chunks in the background

Categories

(Core :: JavaScript: GC, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: jonco, Assigned: pbone)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 16 obsolete files)

1.24 KB, patch
pbone
: review+
Details | Diff | Splinter Review
847 bytes, patch
pbone
: review+
Details | Diff | Splinter Review
7.93 KB, patch
pbone
: review+
Details | Diff | Splinter Review
3.95 KB, patch
pbone
: review+
Details | Diff | Splinter Review
13.40 KB, patch
pbone
: review+
Details | Diff | Splinter Review
Following bug 1291292, we might want to investigate allocating nursery chunks concurrently with main thread execution.  If we decide we need to grow the nursery the chances are there are not going to be enough free chunks in our empty chunk pool and we will have to request more from the OS, which may take time.  We don't need to use these chunks immediately though suggesting that we might be able to do it on a background thread so as to not block execution.
More background: We try to maintain a configurable number of free chunks in GCRuntime::emptyChunks_.  This is accomplished by AutoMaybeStartBackgroundAllocation / GCRuntime::startBackgroundAllocTaskIfIdle() / BackgroundAllocTask.

We should be able to change Nursery::setCurrentChunk() to request a single chunk at a time when we try to move beyond the number we currently have allocated.  As long as the minimum number of free chunks >= 1 this we should end up allocating these chunks concurrently.
Assignee: nobody → pbone
Status: NEW → ASSIGNED
Priority: -- → P3
Attachment #8918853 - Flags: review?(jcoppeard)
Attachment #8918858 - Flags: review?(jcoppeard)
Attachment #8918859 - Flags: review?(jcoppeard)
setCurrentChunk() was used for two separate things.  Resetting the nursery
(back) to the first chunk, and advancing to the next chunk.  These are now
separate functions.

The new gotoNextChunk() function returns bool, callers can test to see if
advancing to the next chunk was successful.
Attachment #8918860 - Flags: review?(jcoppeard)
Attachment #8918862 - Flags: review?(jcoppeard)
Blocks: 1409021
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=eef92d4da0f52a831ea4671297650ccc5c23f39f
Talos: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=eef92d4da0f52a831ea4671297650ccc5c23f39f

It looks like content processes may be starting up faster as a result of this change.  But note the high variablility there, I find that is common with racing tasks.
Blocks: 1409324
Comment on attachment 8918853 [details] [diff] [review]
Bug 1298018 (part 1) - Clairfy emptyChunks_ comment

Review of attachment 8918853 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/GCRuntime.h
@@ +1179,5 @@
>    private:
> +    // When chunks are empty, they reside in the emptyChunks pool and are
> +    // re-used as needed or eventually expired if not re-used. The emptyChunks
> +    // pool gets refilled from the background allocation task heuristically so
> +    // that empty chunks should always available for immediate allocation

While you're here, can you add a 'be' inbetween 'always available'?
Attachment #8918853 - Flags: review?(jcoppeard) → review+
Comment on attachment 8918858 [details] [diff] [review]
Bug 1298018 (part 2) - Remove two unused methods

Review of attachment 8918858 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the tidyup.
Attachment #8918858 - Flags: review?(jcoppeard) → review+
Attached patch bug1298018-combined-patch (obsolete) — Splinter Review
I'm uploading a combined patch of all of the above because it's easier to see the cumulative effect, and I'll post comments on this.
Comment on attachment 8919238 [details] [diff] [review]
bug1298018-combined-patch

Review of attachment 8919238 [details] [diff] [review]:
-----------------------------------------------------------------

First of all: this looks good.  As always though I have some comments.

::: js/src/gc/Nursery.cpp
@@ +209,5 @@
> +        for (unsigned i = 0; i < chunks_.length(); i++)
> +            runtime()->gc.recycleChunk(chunk(i).toChunk(runtime()), lock);
> +    }
> +    chunks_.shrinkTo(0);
> +    maxChunksAlloc_ = 0;

The code here and in shrinkAllocableSpace() is very similar.  Please factor this out.

@@ +943,5 @@
> +    MOZ_ASSERT(numChunksAlloc() >= 1);
> +    currentChunk_ = 0;
> +    position_ = chunk(0).start();
> +    currentEnd_ = chunk(0).end();
> +    chunk(0).poisonAndInit(runtime(), JS_FRESH_NURSERY_PATTERN);

The code in this method and the end of gotoNextChunk() are the same, so I'd like us to factor this out.  Maybe you can keep the existing setCurrentChunk() and call that from both places.

@@ +958,5 @@
>      MOZ_ASSERT(chunkno < maxChunks());
> +    MOZ_ASSERT(chunkno < maxChunksAlloc());
> +
> +    if (chunkno == numChunksAlloc()) {
> +        // Allocate a new chunk.

This method is marked to be inlined.  Can you factor this block (the allocation path) out into a separate (non-inline) method?  You may be able to use this in allocateFirstChunk() too.

::: js/src/gc/Nursery.h
@@ +152,5 @@
> +    // Total number of chunks and the capacity of the nursery.  Chunks will be
> +    // lazilly allocated and added to the chunks array up to this limit, after
> +    // that the nursery must be collected, this limit may be raised during
> +    // collection.
> +    unsigned maxChunksAlloc() const { return maxChunksAlloc_; }

Can we come up with some better names here? numChunksAlloc and maxChunksAlloc don't make much sense to me.

Maybe something like: allocatedChunkCount(), maxChunkCount() and chunkCountLimit() (for maxChunks()).

I don't mind too much as long as it is readable English and makes it obvious what these things mean (or as obvious as is reasonably possible).
Attachment #8919238 - Flags: feedback+
Attachment #8918859 - Flags: review?(jcoppeard)
Attachment #8918860 - Flags: review?(jcoppeard)
Attachment #8918862 - Flags: review?(jcoppeard)
(In reply to Jon Coppeard (:jonco) from comment #11)
> Created attachment 8919238 [details] [diff] [review]
> bug1298018-combined-patch
> 
> I'm uploading a combined patch of all of the above because it's easier to
> see the cumulative effect, and I'll post comments on this.

Shall I squash them also?  I imagine ti's easier to review in descrete pieces, particularly if I can sperate refactoring changes from algorithmic changes so I usually do this.

Thanks.
Flags: needinfo?(jcoppeard)
Addressed feedback, r+ carried forward.
Attachment #8918853 - Attachment is obsolete: true
Attachment #8919499 - Flags: review+
(In reply to Paul Bone [:pbone] from comment #13)
Can you if you like, it doesn't matter.  Usually reviewing in small pieces is easier, but just for this one I wanted to see all the changes in one go.
Flags: needinfo?(jcoppeard)
(In reply to Jon Coppeard (:jonco) from comment #12)
> Comment on attachment 8919238 [details] [diff] [review]
> bug1298018-combined-patch
> 
> Review of attachment 8919238 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The code here and in shrinkAllocableSpace() is very similar.  Please factor
> this out.
> 
> The code in this method and the end of gotoNextChunk() are the same, so I'd
> like us to factor this out.  Maybe you can keep the existing
> setCurrentChunk() and call that from both places.

Ah sometimes I duplicate code when I think it can help with readability, but it often makes it worse for anyone modifying it later. I think you're right.

> @@ +958,5 @@
> >      MOZ_ASSERT(chunkno < maxChunks());
> > +    MOZ_ASSERT(chunkno < maxChunksAlloc());
> > +
> > +    if (chunkno == numChunksAlloc()) {
> > +        // Allocate a new chunk.
> 
> This method is marked to be inlined.  Can you factor this block (the
> allocation path) out into a separate (non-inline) method?  You may be able
> to use this in allocateFirstChunk() too.

In that case since gotoNextChunk has only one caller I'll also move it's outer block into allocate() directly.

> ::: js/src/gc/Nursery.h
> @@ +152,5 @@
> > +    // Total number of chunks and the capacity of the nursery.  Chunks will be
> > +    // lazilly allocated and added to the chunks array up to this limit, after
> > +    // that the nursery must be collected, this limit may be raised during
> > +    // collection.
> > +    unsigned maxChunksAlloc() const { return maxChunksAlloc_; }
> 
> Can we come up with some better names here? numChunksAlloc and
> maxChunksAlloc don't make much sense to me.
> 
> Maybe something like: allocatedChunkCount(), maxChunkCount() and
> chunkCountLimit() (for maxChunks()).

I found it difficult to think of good names, and I know I failed ;-)  Honnestly though I don't think these names are any better, but I'm happy to use them since you think they're better, that's at least one of us.

> I don't mind too much as long as it is readable English and makes it obvious
> what these things mean (or as obvious as is reasonably possible).

It's difficult without getting into EpicJavaSwingClassNamesFactory although sometimes you get lucky and someone on your team comes up with something.
(In reply to Paul Bone [:pbone] from comment #16)
> (In reply to Jon Coppeard (:jonco) from comment #12)
> > 
> > This method is marked to be inlined.  Can you factor this block (the
> > allocation path) out into a separate (non-inline) method?  You may be able
> > to use this in allocateFirstChunk() too.
> 
> In that case since gotoNextChunk has only one caller I'll also move it's
> outer block into allocate() directly.
> 

I forgot to say earlier.  I prefer not to share this code between allocate() and allocateFirstChunk() since they use the GC lock a little differently.  The other options are holding the lock in allocate() for longer than necessary or complicating the logic with a maybe-pointer to the lock.  Both are easy enough but having seperate code for each is also easy.
r+ carried forward.
Attachment #8919499 - Attachment is obsolete: true
Attachment #8922632 - Flags: review+
r+ carried forward
Attachment #8918858 - Attachment is obsolete: true
Attachment #8922634 - Flags: review+
Attachment #8918859 - Attachment is obsolete: true
Attachment #8922635 - Flags: review?(jcoppeard)
Attachment #8918860 - Attachment is obsolete: true
Attachment #8918862 - Attachment is obsolete: true
Attachment #8919238 - Attachment is obsolete: true
Attachment #8922636 - Flags: review?(jcoppeard)
Comment on attachment 8922635 [details] [diff] [review]
Bug 1298018 (part 3) - Replace updateNumChunks()

Review of attachment 8922635 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments fixed.

::: js/src/gc/Nursery.h
@@ +434,5 @@
>      void setCurrentChunk(unsigned chunkno);
>      void setStartPosition();
>  
> +    /*
> +     * Ensure that the first chunk has been allocated, callers will probably

Nit: this should be two sentences split at the comma.

@@ +438,5 @@
> +     * Ensure that the first chunk has been allocated, callers will probably
> +     * want to call gotoFirstChunk() next.
> +     */
> +    MOZ_MUST_USE bool allocateFirstChunk(
> +        AutoLockGCBgAlloc& lock);

Please put the method definition all on one line.

@@ +500,4 @@
>      void minimizeAllocableSpace();
>  
> +    // Free the chunks starting at firstFreeChunk until the end of the chunks
> +    // vector.  Shrinks the vector but does not update maxChunksAlloc().

Nit: style dictates a single space between sentences in comments.

@@ +500,5 @@
>      void minimizeAllocableSpace();
>  
> +    // Free the chunks starting at firstFreeChunk until the end of the chunks
> +    // vector.  Shrinks the vector but does not update maxChunksAlloc().
> +    void freeChunks(unsigned firstFreeChunk);

We could call this freeChunksFrom() to make it clearer what the parameter means at the call site.
Attachment #8922635 - Flags: review?(jcoppeard) → review+
Comment on attachment 8922636 [details] [diff] [review]
Bug 1298018 (part 4) - Lazilly allocate nursery chunks

Review of attachment 8922636 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments fixed.

::: js/src/gc/Nursery.cpp
@@ +953,5 @@
>      chunk(chunkno).poisonAndInit(runtime(), JS_FRESH_NURSERY_PATTERN);
>  }
>  
> +bool
> +js::Nursery::allocateNextChunk(const unsigned chunkno)

I still feel like we could common up allocateNextChunk and allocateFirstChunk.  This is called on the main thread and it's OK to hold the GC lock for slightly longer than necessary here.

::: js/src/gc/Nursery.h
@@ +151,5 @@
>  
> +    // Total number of chunks and the capacity of the nursery.  Chunks will be
> +    // lazilly allocated and added to the chunks array up to this limit, after
> +    // that the nursery must be collected, this limit may be raised during
> +    // collection.

Please fix extra spaces between sentences and comma splicing.
Attachment #8922636 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) from comment #22)
> 
> @@ +438,5 @@
> > +     * Ensure that the first chunk has been allocated, callers will probably
> > +     * want to call gotoFirstChunk() next.
> > +     */
> > +    MOZ_MUST_USE bool allocateFirstChunk(
> > +        AutoLockGCBgAlloc& lock);
> 
> Please put the method definition all on one line.

Bad editing, sorry.

> @@ +500,5 @@
> >      void minimizeAllocableSpace();
> >  
> > +    // Free the chunks starting at firstFreeChunk until the end of the chunks
> > +    // vector.  Shrinks the vector but does not update maxChunksAlloc().
> > +    void freeChunks(unsigned firstFreeChunk);
> 
> We could call this freeChunksFrom() to make it clearer what the parameter
> means at the call site.

Good idea, I like that.
r+ carried forward.
Attachment #8922634 - Attachment is obsolete: true
Attachment #8923274 - Flags: review+
r+ carried forward.
Attachment #8922635 - Attachment is obsolete: true
Attachment #8923275 - Flags: review+
r+ carried forward
Attachment #8923275 - Attachment is obsolete: true
Attachment #8923331 - Flags: review+
Updated based on review comments.
Attachment #8922636 - Attachment is obsolete: true
Attachment #8923332 - Flags: review+
Attachment #8923333 - Flags: review?(jcoppeard)
Comment on attachment 8923333 [details] [diff] [review]
Bug 1298018 (part 5) - Remove allocateFirstChunk()

Review of attachment 8923333 [details] [diff] [review]:
-----------------------------------------------------------------

This all looks good.

::: js/src/gc/Nursery.cpp
@@ +980,5 @@
>  
>      const unsigned priorCount = allocatedChunkCount();
>      const unsigned newCount = priorCount + 1;
> +    MOZ_ASSERT((chunkno == currentChunk_ + 1) ||
> +        ((chunkno == 0) && (allocatedChunkCount() == 0)));

You don't need the brackets around the individual comparison expressions here (although you do need brackets to group the part after the || obviously).  Also this can probably fit on one line since line length for code is 100 columns.
Attachment #8923333 - Flags: review?(jcoppeard) → review+
Responded to review feedback.
Attachment #8923333 - Attachment is obsolete: true
Attachment #8923664 - Flags: review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c9299bd7441
Part 1: Clarify emptyChunks_ comment. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/679e75a67b8e
Part 2: Remove some unused methods. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/b6dafc346ef5
Part 3: Replace updateNumChunks(). r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/668f1422c955
Part 4: Lazily allocate nursery chunks. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/f029195245c0
Part 5: Remove allocateFirstChunk(). r=jonco
Keywords: checkin-needed
Trying to reproduce locally and not having much luck so far. Also trying a try job based on my current workspace:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=11be966f188d86ac1c81d55a6e6cf4393ffadd7d

I've been working from central, so central vs inbound is one difference beteen this try job and the patches that were landed.
Fixed an assertion failure in this patch. r+ carried forward.
Attachment #8923331 - Attachment is obsolete: true
Attachment #8924849 - Flags: review+
Update patch to apply cleanly, r+ carried forward.
Attachment #8923332 - Attachment is obsolete: true
Attachment #8924851 - Flags: review+
Update patch to behave correctly when OOM. r+ carried forward.
Attachment #8923664 - Attachment is obsolete: true
Attachment #8924852 - Flags: review+
Keywords: checkin-needed
Part 4 needs rebasing.
Flags: needinfo?(pbone)
Keywords: checkin-needed
Blocks: 1412729
Update this patch after it collieded one of my other bugs on inbound ;-)

r+ carried forward.
Attachment #8924851 - Attachment is obsolete: true
Flags: needinfo?(pbone)
Attachment #8925402 - Flags: review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7baa6f43d16a
Part 1: Clarify emptyChunks_ comment. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/7434bbd4eee4
Part 2: Remove some unused methods. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/bf065ffab5f7
Part 3: Replace updateNumChunks(). r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/d8dbac587181
Part 4: Lazily allocate nursery chunks. r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/7bd9208b8c7c
Part 5: Remove allocateFirstChunk(). r=jonco
Keywords: checkin-needed
Good new! This brought performance improvements on all desktop platforms:

== Change summary for alert #10370 (as of Sun, 05 Nov 2017 23:00:10 GMT) ==

Improvements:

  3%  JS summary windows7-32 opt      82,500,807.36 -> 79,769,418.53
  3%  JS summary windows10-64-stylo-disabled opt stylo-disabled107,852,873.76 -> 104,522,058.13
  3%  JS summary windows10-64 opt     108,577,497.74 -> 105,240,906.96
  3%  JS summary windows10-64 pgo     107,901,620.73 -> 104,642,357.82
  3%  JS summary osx-10-10 opt        96,607,332.66 -> 93,825,884.40
  3%  JS summary windows7-32 pgo      82,116,802.24 -> 79,880,127.86
  3%  JS summary windows7-32-stylo-disabled opt stylo-disabled81,602,343.44 -> 79,487,100.86
  3%  JS summary linux32-stylo-disabled opt stylo-disabled71,516,104.70 -> 69,706,964.91
  3%  JS summary linux64 opt          96,603,284.61 -> 94,185,065.38
  2%  JS summary linux64-stylo-disabled opt stylo-disabled96,791,471.62 -> 94,382,184.75
  2%  JS summary macosx64-stylo-disabled opt stylo-disabled96,160,270.30 -> 93,835,931.67
  2%  JS summary linux64-stylo-sequential opt stylo-sequential96,814,975.41 -> 94,520,770.93

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10370
Woot!  I did some of my own testing in the shell and with talos, but I looked mostly at time, not memory.  I'm very pleased. thank you for sharing!

*updates Q4 OKRs*
You need to log in before you can comment on or make changes to this bug.