Closed
Bug 990336
Opened 11 years ago
Closed 11 years ago
Improve the performance of DenseRangeWriteBarrierPost
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla31
People
(Reporter: terrence, Assigned: terrence)
References
(Blocks 1 open bug)
Details
(Whiteboard: [talos_regression])
Attachments
(6 files)
22.95 KB,
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
869 bytes,
patch
|
Details | Diff | Splinter Review | |
5.07 KB,
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
9.96 KB,
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
1.68 KB,
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
5.92 KB,
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
This is impacting at least kraken's json-parse-financial and octane's TypeScript and Pdf.js benchmarks. The problem here is that we are ended up shoving thousands of these in the generic buffer, gcing too frequently, and being extra slow because of the generic buffer.
The attached patch does two things. The first is to turn the SlotEdge buffer into a SlotsEdge buffer: e.g. store the slot reference as a range rather than a single slot.
Then we can use that for both HeapSlot references and for DenseRangeRef. This is just a hair faster than going through the generic buffer, but not a real win. The second optimization is to make subsequent pushes of a new range for the same object merge their ranges, rather than creating a new entry. This dramatically reduces the amount of time we spend in minor GC on the benchmarks listed above.
On my desktop, this gets about 1.5% on both octane and kraken.
Note: we had an existing bug where PJS code was asserting !inRememberedSet for all edges, even ones where we don't really know the answer exactly. This happened not to bite us because we mostly only use wholeCellEdges in the jit. Now that SlotsEdge is not exact anymore, the assertion triggered right away. I've removed the assertion and renamed several methods for clarity, given that we've changed what most of them do:
inRememberedSet -> maybeInRememberedSet
location -> deduplicationKey
isNullRef -> moved the code inline into mark, where it makes more sense
Note2: I suppose we could also do the new merging optimization as a compaction pass. Doing it eagerly seems to work well enough for a first try. I may experiment more here later.
Attachment #8399749 -
Flags: review?(jcoppeard)
Assignee | ||
Comment 1•11 years ago
|
||
Comment 2•11 years ago
|
||
Comment on attachment 8399749 [details] [diff] [review]
fix_dense_range_ref-v0.diff
Review of attachment 8399749 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/ds/LifoAlloc.h
@@ +477,5 @@
> }
> };
> +
> + // Return a modifiable pointer to the most recently allocated sizeof(T)
> + // bytes. Do not use this if you value well-defined behavior in your code.
It might be good to explain what you mean by this!
::: js/src/gc/StoreBuffer.h
@@ +284,2 @@
> {
> + // These definitions must match those in HeapSlot::Kind.
Maybe we could make these constants defined in some common header.
@@ +321,5 @@
> + if (other.count_ < 0)
> + count_ = -count_;
> + }
> +
> + void *deduplicationKey() const { return nullptr; }
This means that compaction doesn't do anything for SlotsEdges, but won't we still iterate all the entries anyway?
Attachment #8399749 -
Flags: review?(jcoppeard) → review+
Assignee | ||
Comment 3•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9c6f5627855
(In reply to Jon Coppeard (:jonco) from comment #2)
>
> ::: js/src/ds/LifoAlloc.h
> @@ +477,5 @@
> > }
> > };
> > +
> > + // Return a modifiable pointer to the most recently allocated sizeof(T)
> > + // bytes. Do not use this if you value well-defined behavior in your code.
>
> It might be good to explain what you mean by this!
Ouch yeah, added some more explanation. LifoAlloc is a fully generic allocator, so looking at the last allocated bytes and casting from void* to some random T provided later is not exactly safe, except in limited circumstances.
> ::: js/src/gc/StoreBuffer.h
> @@ +284,2 @@
> > {
> > + // These definitions must match those in HeapSlot::Kind.
>
> Maybe we could make these constants defined in some common header.
I left this out, since it's not exactly clear where they should go. I agree that it needs to be done at some point though.
> @@ +321,5 @@
> > + if (other.count_ < 0)
> > + count_ = -count_;
> > + }
> > +
> > + void *deduplicationKey() const { return nullptr; }
>
> This means that compaction doesn't do anything for SlotsEdges, but won't we
> still iterate all the entries anyway?
Urk, good point. I've added supportsDeduplication and an early exit now.
https://tbpl.mozilla.org/?tree=Try&rev=42b896aaa51a
And verification that the optimizations are working as expected in:
https://tbpl.mozilla.org/?tree=Try&rev=727f89c22da9
Comment 4•11 years ago
|
||
I see a 6% improvement on kraken and an improvement on octane. The octane improvement is harder to quantify, but exists. Awesome!
Comment 5•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Assignee | ||
Comment 6•11 years ago
|
||
(In reply to Hannes Verschore [:h4writer] from comment #4)
> I see a 6% improvement on kraken and an improvement on octane. The octane
> improvement is harder to quantify, but exists. Awesome!
Locally I see pdf.js going from ~72ms in minor gc to ~16ms. So, enormous, but tiny in context.
Comment 7•11 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #6)
> (In reply to Hannes Verschore [:h4writer] from comment #4)
> > I see a 6% improvement on kraken and an improvement on octane. The octane
> > improvement is harder to quantify, but exists. Awesome!
>
> Locally I see pdf.js going from ~72ms in minor gc to ~16ms. So, enormous,
> but tiny in context.
Yeah I saw the improvement on pdf.js (31.6% improvement). I just couldn't give an estimate on the full octane benchmark. It might be somewhere around 1.5% to 3%. That's why I didn't give any numbers on octane ;).
Assignee | ||
Comment 8•11 years ago
|
||
Actually. The 1.5% improvement here is 100% down to using the slot buffer instead of the generic buffer: the generic buffer is just /way/ slower than we thought. A very sad bug is currently disabling the "optimization" that was supposed to be introduced by the patch. With this optimization "working", there is actually a tiny regression over having it disabled, but nowhere near the previous slowness. The good news is that we may still be able to get >>1% out of this rock.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 9•11 years ago
|
||
Assignee | ||
Comment 10•11 years ago
|
||
I chose to store the SlotEdge kind in the sign bit of count. This was a monumentally poor choice. If we instead store the kind in the low bit of the object, everything magically gets simpler and faster, even on the worst-case test in bug 989414.
Attachment #8401486 -
Flags: review?(jcoppeard)
Comment 11•11 years ago
|
||
Comment on attachment 8401486 [details] [diff] [review]
fix_slotedge_repr-v0.diff
Review of attachment 8401486 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good!
Attachment #8401486 -
Flags: review?(jcoppeard) → review+
Assignee | ||
Comment 12•11 years ago
|
||
Assignee | ||
Comment 13•11 years ago
|
||
This uses some template chicanery to make the algorithm generic rather than the data. This allows us to use the normal after-the-fact dedup for slot edges too.
Attachment #8403633 -
Flags: review?(jcoppeard)
Assignee | ||
Updated•11 years ago
|
Keywords: leave-open
Assignee | ||
Comment 14•11 years ago
|
||
And fix the bug preventing dedup-on-entry from working and re-add the code that makes use of it since it works now.
Attachment #8403670 -
Flags: review?(jcoppeard)
Comment 15•11 years ago
|
||
Comment on attachment 8403633 [details] [diff] [review]
dedup_slot_edges_normally-v0.diff
Review of attachment 8403633 [details] [diff] [review]:
-----------------------------------------------------------------
It's nice to use the same deduplication machinery for all edge types.
Attachment #8403633 -
Flags: review?(jcoppeard) → review+
Comment 16•11 years ago
|
||
Comment on attachment 8403670 [details] [diff] [review]
fix_length_calculation-v0.diff
Review of attachment 8403670 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/ds/LifoAlloc.h
@@ -141,5 @@
> return aligned;
> }
>
> void *peek(size_t n) {
> - if (bumpBase() - bump < ptrdiff_t(n))
Sorry I missed this first time round!
Attachment #8403670 -
Flags: review?(jcoppeard) → review+
Comment 17•11 years ago
|
||
Assignee | ||
Comment 18•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/92dd49bd29b6
Try errors are know intermittents. I'd expect more bustage if there was an error here.
https://tbpl.mozilla.org/?tree=Try&rev=0c26c720abb5
Assignee | ||
Comment 19•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a1b2bc9af796
No obvious change on AWFY with the previous patch so lets put the second one in. Same try run applies.
Assignee | ||
Comment 20•11 years ago
|
||
Looks like a tiny win on both Latency benchmarks, and a bit larger of a loss on Splay. I think I'll purge the second part and supporting machinery.
Comment 21•11 years ago
|
||
Updated•11 years ago
|
Whiteboard: [talos_regression]
Assignee | ||
Comment 23•11 years ago
|
||
As per bug 933309 and comment 20, we want to remove the duplicated dedup.
Attachment #8407717 -
Flags: review?(jcoppeard)
Comment 24•11 years ago
|
||
in addition to the osx 10.6 kraken 2.6% regression, I see a regression on win 8 for dromaeo_css by 2.18% as a result of the recent landing in here.
Updated•11 years ago
|
Attachment #8407717 -
Flags: review?(jcoppeard) → review+
Assignee | ||
Comment 25•11 years ago
|
||
Comment 26•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 27•7 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•