Investigate why CodeMirror frequently has ~20ms long minor GCs because of FULL_STORE_BUFFER

RESOLVED FIXED in Firefox 47

Status

()

Core
JavaScript: GC
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: fitzgen, Assigned: fitzgen)

Tracking

(Blocks: 1 bug)

unspecified
mozilla47
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox47 fixed)

Details

Attachments

(1 attachment, 2 obsolete attachments)

Tweaked MonoTypeBuffer::MaxEntries to see if that would lower the length of gc pauses, but it didn't help much:

"""
Loading 50MB source text into CodeMirror, max minor gc pause (microseconds, lower is better)

current 48 * 1024 / sizeof(T) = 24775
         2 * 1024 / sizeof(T) = 23029
         4 * 1024 / sizeof(T) = 25364
         8 * 1024 / sizeof(T) = 22432
        16 * 1024 / sizeof(T) = 22230
        32 * 1024 / sizeof(T) = 23001
"""

Then I dumped out the number of slots in the store buffer for each minor GC (this is mostly from Array#splice which uses the slots buffer not the whole cell buffer), and found that it was consistently in the same ballpark, regardless the value of MaxEntries.

Current theory is that we aren't reaching a safepoint in time, and maybe we should check in Array#splice if we should minor GC because the slots buffer might be full.
I am consistently seeing only 245 slots edges in the buffer, but the current MaxEntries is 6144.

Added this to the bottom of array_splice_impl:

  cx->runtime()->gc.gcIfRequested(cx);

But I'm not seeing any differences, which makes sense if we are only putting 245 slots in the store buffer. But that doesn't make sense since my gcreason always seems to be FULL_STORE_BUFFER...

So I set a breakpoint in StoreBuffer::setAboutToOverflow, set breakpoint commands for it to dump the 5 youngest frames and then continue, and then ran the code mirror test case again.

*We aren't ever hitting this breakpoint when running the test case.* Yes, we hit it on startup and shutdown, but never when running the actual test case.

Digging in some more to see what is actually triggering the minor collections...
Blocks: 1072675
Created attachment 8713260 [details] [diff] [review]
Merge overlapping SlotsEdges in the store buffer

Frequently, the mutator will modify nearly the same elements of an object
repeatedly. However, because the set of elements aren't exactly the same, the
single item buffer in front of MonoTypeBuffer can't de-duplicate these
edges. For example, in one CodeMirror test case, we would add 245 SlotsEdges
entries for almost the same 50,000 elements in an object, causing us to trace
these same 50,000 elements 245 times!

This patch makes `js::gc::StoreBuffer::putSlot` check to see if the new range is
overlapping with the last range added, and if so, merge the ranges rather than
adding partially duplicated elements into the store buffer.

This gives a 1000 point increase on Octane's pdf.js subsuite locally. The
CodeMirror test case mentioned above goes from ~10 seconds execution time to
~1.5 seconds, with the max minor gc pause dropping from up to 40 milliseconds,
down to 4 milliseconds.
Attachment #8713260 - Flags: review?(terrence)
Comment on attachment 8713260 [details] [diff] [review]
Merge overlapping SlotsEdges in the store buffer

Review of attachment 8713260 [details] [diff] [review]:
-----------------------------------------------------------------

\o/

::: js/src/gc/StoreBuffer.h
@@ +296,5 @@
> +        // false if they do not overlap.
> +        bool overlaps(const SlotsEdge& other) const {
> +            // TODO FITZGEN: should we just do objectAndKind_ !=
> +            // other.objectAndKind_ here or can we assume the compiler will see
> +            // through all this?

I don't think the perf is going to matter that much, but I would still just compare objectAndKind_: I think the naming makes it clear enough what's being compared.

@@ +303,5 @@
> +            }
> +            auto end = start_ + count_;
> +            auto otherEnd = other.start_ + other.count_;
> +            return object() == other.object() &&
> +                   kind() == other.kind() &&

You are comparing object and kind a second time? I'd think you could skip this as they will always be true after the test above.
Attachment #8713260 - Flags: review?(terrence) → review+
Created attachment 8713326 [details] [diff] [review]
Merge overlapping SlotsEdges in the store buffer

Addressed review comments (d'oh!) and made it so that adjacent ranges are
considered overlapping so that a series of writes to element 0, then 1, then 2,
..., then N are coalesced into a single range [0, N] in the store buffer.
Attachment #8713326 - Flags: review+
Attachment #8713260 - Attachment is obsolete: true
Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=15074bf694e7
Created attachment 8713392 [details] [diff] [review]
Merge overlapping SlotsEdges in the store buffer

Fix header include order.
Attachment #8713392 - Flags: review+
Attachment #8713326 - Attachment is obsolete: true
New try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=931905133c68
Keywords: checkin-needed

Comment 8

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/ba837846ee44
Keywords: checkin-needed
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #2)
> This gives a 1000 point increase on Octane's pdf.js subsuite locally. The
> CodeMirror test case mentioned above goes from ~10 seconds execution time to
> ~1.5 seconds, with the max minor gc pause dropping from up to 40
> milliseconds,
> down to 4 milliseconds.

Can you confirm this is still the case with the newest patch?
I didn't see a bump on AWFY on x86 and x64 shell. Which surprised me since there is only 50 points noise. As a result a 1000 point increase (doubling) would definitely be visible?

Comment 10

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ba837846ee44
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox47: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
(In reply to Hannes Verschore [:h4writer] from comment #9)
> (In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #2)
> > This gives a 1000 point increase on Octane's pdf.js subsuite locally. The
> > CodeMirror test case mentioned above goes from ~10 seconds execution time to
> > ~1.5 seconds, with the max minor gc pause dropping from up to 40
> > milliseconds,
> > down to 4 milliseconds.
> 
> Can you confirm this is still the case with the newest patch?
> I didn't see a bump on AWFY on x86 and x64 shell. Which surprised me since
> there is only 50 points noise. As a result a 1000 point increase (doubling)
> would definitely be visible?

ni myself to figure this out -- I didn't re-test the updated patches...
Flags: needinfo?(nfitzgerald)

Comment 12

2 years ago
Comment on attachment 8713392 [details] [diff] [review]
Merge overlapping SlotsEdges in the store buffer

Review of attachment 8713392 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/StoreBuffer.h
@@ +307,5 @@
> +            auto start = start_ - 1;
> +
> +            auto otherEnd = other.start_ + other.count_;
> +            return (start <= other.start_ && other.start_ <= end) ||
> +                   (start <= otherEnd && otherEnd <= end);

I think this could be simplified to:

return !(otherEnd < start || end < other.start);

I don't understand why end needs to be _start + count_ + 1, either with or without my suggestion.
Just realized that I was testing this on top of the patch in bug 1233857, so that might be contributing to the speedup I observed when combined with this one.
Ran octane with all variations of this patch and WIP patch from bug 1233857:

M = Adding elements edges write barrier into the MIR (patch from bug 1233857)
R = Merging SlotsEdge ranges (this patch)

                      M+R         R        (none)        M
pdf.js subscore      19951      16858      17991       20109
total score          32729      32358      32662       33127

Conclusions: I need to have larger sample sizes; I think this is all noise. Hopefully once I stop using shared js for my shell builds, I will be able to run octane under the shell rather than in the browser and gathering more samples will get easier.

That said, the CodeMirror testcase's speedup stands with this patch, and that alone is super valuable IMO.
Flags: needinfo?(nfitzgerald)
Duplicate of this bug: 1072675

Comment 16

2 years ago
Should this have the "perf" key word?
You need to log in before you can comment on or make changes to this bug.