Closed Bug 1242691 Opened 8 years ago Closed 8 years ago

Investigate why CodeMirror frequently has ~20ms long minor GCs because of FULL_STORE_BUFFER

Categories

(Core :: JavaScript: GC, defect)

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla47
Tracking Status: firefox47 --- fixed

People

(Reporter: fitzgen, Assigned: fitzgen)

Attachments

(1 file, 2 obsolete files)

Tweaked MonoTypeBuffer::MaxEntries to see if that would shorten the minor GC pauses, but it didn't help much:

"""
Loading 50MB source text into CodeMirror, max minor gc pause (microseconds, lower is better)

current 48 * 1024 / sizeof(T) = 24775
         2 * 1024 / sizeof(T) = 23029
         4 * 1024 / sizeof(T) = 25364
         8 * 1024 / sizeof(T) = 22432
        16 * 1024 / sizeof(T) = 22230
        32 * 1024 / sizeof(T) = 23001
"""

Then I dumped out the number of slots in the store buffer for each minor GC (this is mostly from Array#splice, which uses the slots buffer, not the whole-cell buffer), and found that it was consistently in the same ballpark, regardless of the value of MaxEntries.

My current theory is that we aren't reaching a safepoint in time, and that maybe we should check in Array#splice whether we should minor GC, because the slots buffer might be full.
However, I am consistently seeing only 245 slot edges in the buffer, while the current MaxEntries is 6144.

Added this to the bottom of array_splice_impl:

  cx->runtime()->gc.gcIfRequested(cx);

But I'm not seeing any difference, which makes sense if we are only putting 245 slot edges in the store buffer. Then again, that doesn't make sense either, since my gcreason always seems to be FULL_STORE_BUFFER...

So I set a breakpoint in StoreBuffer::setAboutToOverflow, set breakpoint commands for it to dump the 5 youngest frames and then continue, and then ran the CodeMirror test case again.
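
The breakpoint-plus-commands setup looks roughly like this, assuming gdb (the comment doesn't say which debugger was used):

  (gdb) break js::gc::StoreBuffer::setAboutToOverflow
  (gdb) commands
  > bt 5
  > continue
  > end
  (gdb) run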

*We aren't ever hitting this breakpoint when running the test case.* Yes, we hit it on startup and shutdown, but never when running the actual test case.

Digging in some more to see what is actually triggering the minor collections...
Frequently, the mutator will modify nearly the same elements of an object
repeatedly. However, because the set of elements isn't exactly the same each
time, the single-item buffer in front of MonoTypeBuffer can't de-duplicate
these edges. For example, in one CodeMirror test case, we would add 245
SlotsEdge entries for almost the same 50,000 elements in an object, causing us
to trace those same 50,000 elements 245 times!

This patch makes `js::gc::StoreBuffer::putSlot` check to see if the new range is
overlapping with the last range added, and if so, merge the ranges rather than
adding partially duplicated elements into the store buffer.

This gives a 1000 point increase on Octane's pdf.js subsuite locally. The
CodeMirror test case mentioned above goes from ~10 seconds execution time to
~1.5 seconds, with the max minor gc pause dropping from up to 40 milliseconds,
down to 4 milliseconds.
Attachment #8713260 - Flags: review?(terrence)
Comment on attachment 8713260 [details] [diff] [review]
Merge overlapping SlotsEdges in the store buffer

Review of attachment 8713260 [details] [diff] [review]:
-----------------------------------------------------------------

\o/

::: js/src/gc/StoreBuffer.h
@@ +296,5 @@
> +        // false if they do not overlap.
> +        bool overlaps(const SlotsEdge& other) const {
> +            // TODO FITZGEN: should we just do objectAndKind_ !=
> +            // other.objectAndKind_ here or can we assume the compiler will see
> +            // through all this?

I don't think the perf is going to matter that much, but I would still just compare objectAndKind_: I think the naming makes it clear enough what's being compared.

@@ +303,5 @@
> +            }
> +            auto end = start_ + count_;
> +            auto otherEnd = other.start_ + other.count_;
> +            return object() == other.object() &&
> +                   kind() == other.kind() &&

You are comparing object and kind a second time? I'd think you could skip this, as those comparisons will always be true after the test above.
Attachment #8713260 - Flags: review?(terrence) → review+
Addressed review comments (d'oh!) and made it so that adjacent ranges are
considered overlapping, so that a series of writes to element 0, then 1, then 2,
..., then N is coalesced into a single range [0, N] in the store buffer.
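
In terms of the sketch above, one way to express that tweak is to treat touching ranges as overlapping too; this is just a sketch under the same half-open-range assumption, while the landed patch expresses the same idea by widening the range by one on each side, as the follow-up review below shows:

  // Overlapping *or adjacent*: a run of writes to element 0, then 1, then 2,
  // ..., then N collapses into the single range [0, N] in the buffer.
  bool overlapsOrAdjacent(const SlotsEdgeSketch& a, const SlotsEdgeSketch& b) {
      if (a.objectAndKind_ != b.objectAndKind_)
          return false;
      int32_t aEnd = a.start_ + a.count_;
      int32_t bEnd = b.start_ + b.count_;
      return b.start_ <= aEnd && a.start_ <= bEnd;
  }
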
Attachment #8713326 - Flags: review+
Attachment #8713260 - Attachment is obsolete: true
Fix header include order.
Attachment #8713392 - Flags: review+
Attachment #8713326 - Attachment is obsolete: true
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #2)
> This gives a 1000 point increase on Octane's pdf.js subsuite locally. The
> CodeMirror test case mentioned above goes from ~10 seconds execution time to
> ~1.5 seconds, with the max minor gc pause dropping from up to 40
> milliseconds,
> down to 4 milliseconds.

Can you confirm this is still the case with the newest patch?
I didn't see a bump on AWFY on the x86 and x64 shell, which surprised me, since there is only about 50 points of noise; a 1000-point increase (doubling) should definitely be visible?
https://hg.mozilla.org/mozilla-central/rev/ba837846ee44
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
(In reply to Hannes Verschore [:h4writer] from comment #9)
> (In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #2)
> > This gives a 1000 point increase on Octane's pdf.js subsuite locally. The
> > CodeMirror test case mentioned above goes from ~10 seconds execution time to
> > ~1.5 seconds, with the max minor gc pause dropping from up to 40
> > milliseconds,
> > down to 4 milliseconds.
> 
> Can you confirm this is still the case with the newest patch?
> I didn't see a bump on AWFY on the x86 and x64 shell, which surprised me,
> since there is only about 50 points of noise; a 1000-point increase
> (doubling) should definitely be visible?

ni myself to figure this out -- I didn't re-test the updated patches...
Flags: needinfo?(nfitzgerald)
Comment on attachment 8713392 [details] [diff] [review]
Merge overlapping SlotsEdges in the store buffer

Review of attachment 8713392 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/StoreBuffer.h
@@ +307,5 @@
> +            auto start = start_ - 1;
> +
> +            auto otherEnd = other.start_ + other.count_;
> +            return (start <= other.start_ && other.start_ <= end) ||
> +                   (start <= otherEnd && otherEnd <= end);

I think this could be simplified to:

return !(otherEnd < start || end < other.start_);

I don't understand why end needs to be start_ + count_ + 1, either with or without my suggestion.
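
A small self-contained check of the two forms, treating the ranges as closed intervals as in the quoted snippet (the helper names twoSided and negated are purely illustrative): besides being shorter, the negated form also catches the case where the other range strictly contains this one, which the two-sided form misses.

  #include <cassert>
  #include <cstdint>

  // Form from the quoted patch: does either endpoint of [oStart, oEnd]
  // fall inside [start, end]?
  static bool twoSided(int32_t start, int32_t end, int32_t oStart, int32_t oEnd) {
      return (start <= oStart && oStart <= end) || (start <= oEnd && oEnd <= end);
  }

  // Reviewer's suggestion: the intervals intersect unless one ends before
  // the other begins.
  static bool negated(int32_t start, int32_t end, int32_t oStart, int32_t oEnd) {
      return !(oEnd < start || end < oStart);
  }

  int main() {
      // The two forms agree on the common cases...
      assert(twoSided(0, 10, 5, 15) == negated(0, 10, 5, 15));    // partial overlap
      assert(twoSided(0, 10, 11, 20) == negated(0, 10, 11, 20));  // disjoint
      assert(twoSided(0, 10, 10, 12) == negated(0, 10, 10, 12));  // touching endpoint
      // ...but differ when the other range strictly contains [start, end].
      assert(!twoSided(5, 6, 0, 10));
      assert(negated(5, 6, 0, 10));
      return 0;
  }

That containment case may not be reachable given how SlotsEdges are built up, but the simpler form handles it anyway.
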
Just realized that I was testing this on top of the patch in bug 1233857, so that might be contributing to the speedup I observed when combined with this one.
Ran octane with all variations of this patch and WIP patch from bug 1233857:

M = Adding elements edges write barrier into the MIR (patch from bug 1233857)
R = Merging SlotsEdge ranges (this patch)

                      M+R         R        (none)        M
pdf.js subscore      19951      16858      17991       20109
total score          32729      32358      32662       33127

Conclusions: I need larger sample sizes; I think this is all noise. Hopefully, once I stop using shared js for my shell builds, I will be able to run Octane under the shell rather than in the browser, and gathering more samples will get easier.

That said, the CodeMirror testcase's speedup stands with this patch, and that alone is super valuable IMO.
Flags: needinfo?(nfitzgerald)
Should this have the "perf" keyword?
