Closed Bug 990336 Opened 6 years ago Closed 6 years ago

Improve the performance of DenseRangeWriteBarrierPost

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: terrence, Assigned: terrence)

References

(Blocks 1 open bug)

Details

(Whiteboard: [talos_regression])

Attachments

(6 files)

This is impacting at least kraken's json-parse-financial and octane's TypeScript and Pdf.js benchmarks. The problem here is that we are ended up shoving thousands of these in the generic buffer, gcing too frequently, and being extra slow because of the generic buffer.

The attached patch does two things. The first is to turn the SlotEdge buffer into a SlotsEdge buffer: e.g. store the slot reference as a range rather than a single slot.
Then we can use that for both HeapSlot references and for DenseRangeRef. This is just a hair faster than going through the generic buffer, but not a real win. The second optimization is to make subsequent pushes of a new range for the same object merge their ranges, rather than creating a new entry. This dramatically reduces the amount of time we spend in minor GC on the benchmarks listed above.

On my desktop, this gets about 1.5% on both octane and kraken.

Note: we had an existing bug where PJS code was asserting !inRememberedSet for all edges, even ones where we don't really know the answer exactly. This happened not to bite us because we mostly only use wholeCellEdges in the jit. Now that SlotsEdge is not exact anymore, the assertion triggered right away. I've removed the assertion and renamed several methods for clarity, given that we've changed what most of them do:

inRememberedSet -> maybeInRememberedSet
location -> deduplicationKey
isNullRef -> moved the code inline into mark, where it makes more sense

Note2: I suppose we could also do the new merging optimization as a compaction pass. Doing it eagerly seems to work well enough for a first try. I may experiment more here later.
Attachment #8399749 - Flags: review?(jcoppeard)
Comment on attachment 8399749 [details] [diff] [review]
fix_dense_range_ref-v0.diff

Review of attachment 8399749 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ds/LifoAlloc.h
@@ +477,5 @@
>          }
>      };
> +
> +    // Return a modifiable pointer to the most recently allocated sizeof(T)
> +    // bytes. Do not use this if you value well-defined behavior in your code.

It might be good to explain what you mean by this!

::: js/src/gc/StoreBuffer.h
@@ +284,2 @@
>      {
> +        // These definitions must match those in HeapSlot::Kind.

Maybe we could make these constants defined in some common header.

@@ +321,5 @@
> +            if (other.count_ < 0)
> +                count_ = -count_;
> +        }
> +
> +        void *deduplicationKey() const { return nullptr; }

This means that compaction doesn't do anything for SlotsEdges, but won't we still iterate all the entries anyway?
Attachment #8399749 - Flags: review?(jcoppeard) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9c6f5627855

(In reply to Jon Coppeard (:jonco) from comment #2)
> 
> ::: js/src/ds/LifoAlloc.h
> @@ +477,5 @@
> >          }
> >      };
> > +
> > +    // Return a modifiable pointer to the most recently allocated sizeof(T)
> > +    // bytes. Do not use this if you value well-defined behavior in your code.
> 
> It might be good to explain what you mean by this!

Ouch yeah, added some more explanation. LifoAlloc is a fully generic allocator, so looking at the last allocated bytes and casting from void* to some random T provided later is not exactly safe, except in limited circumstances.
 
> ::: js/src/gc/StoreBuffer.h
> @@ +284,2 @@
> >      {
> > +        // These definitions must match those in HeapSlot::Kind.
> 
> Maybe we could make these constants defined in some common header.

I left this out, since it's not exactly clear where they should go. I agree that it needs to be done at some point though.

> @@ +321,5 @@
> > +            if (other.count_ < 0)
> > +                count_ = -count_;
> > +        }
> > +
> > +        void *deduplicationKey() const { return nullptr; }
> 
> This means that compaction doesn't do anything for SlotsEdges, but won't we
> still iterate all the entries anyway?

Urk, good point. I've added supportsDeduplication and an early exit now.

https://tbpl.mozilla.org/?tree=Try&rev=42b896aaa51a

And verification that the optimizations are working as expected in:

https://tbpl.mozilla.org/?tree=Try&rev=727f89c22da9
I see a 6% improvement on kraken and an improvement on octane. The octane improvement is harder to quantify, but exists. Awesome!
https://hg.mozilla.org/mozilla-central/rev/d9c6f5627855
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
(In reply to Hannes Verschore [:h4writer] from comment #4)
> I see a 6% improvement on kraken and an improvement on octane. The octane
> improvement is harder to quantify, but exists. Awesome!

Locally I see pdf.js going from ~72ms in minor gc to ~16ms. So, enormous, but tiny in context.
(In reply to Terrence Cole [:terrence] from comment #6)
> (In reply to Hannes Verschore [:h4writer] from comment #4)
> > I see a 6% improvement on kraken and an improvement on octane. The octane
> > improvement is harder to quantify, but exists. Awesome!
> 
> Locally I see pdf.js going from ~72ms in minor gc to ~16ms. So, enormous,
> but tiny in context.

Yeah I saw the improvement on pdf.js (31.6% improvement). I just couldn't give an estimate on the full octane benchmark. It might be somewhere around 1.5% to 3%. That's why I didn't give any numbers on octane ;).
Actually. The 1.5% improvement here is 100% down to using the slot buffer instead of the generic buffer: the generic buffer is just /way/ slower than we thought. A very sad bug is currently disabling the "optimization" that was supposed to be introduced by the patch. With this optimization "working", there is actually a tiny regression over having it disabled, but nowhere near the previous slowness. The good news is that we may still be able to get >>1% out of this rock.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I chose to store the SlotEdge kind in the sign bit of count. This was a monumentally poor choice. If we instead store the kind in the low bit of the object, everything magically gets simpler and faster, even on the worst-case test in bug 989414.
Attachment #8401486 - Flags: review?(jcoppeard)
Comment on attachment 8401486 [details] [diff] [review]
fix_slotedge_repr-v0.diff

Review of attachment 8401486 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8401486 - Flags: review?(jcoppeard) → review+
This uses some template chicanery to make the algorithm generic rather than the data. This allows us to use the normal after-the-fact dedup for slot edges too.
Attachment #8403633 - Flags: review?(jcoppeard)
Keywords: leave-open
And fix the bug preventing dedup-on-entry from working and re-add the code that makes use of it since it works now.
Attachment #8403670 - Flags: review?(jcoppeard)
Comment on attachment 8403633 [details] [diff] [review]
dedup_slot_edges_normally-v0.diff

Review of attachment 8403633 [details] [diff] [review]:
-----------------------------------------------------------------

It's nice to use the same deduplication machinery for all edge types.
Attachment #8403633 - Flags: review?(jcoppeard) → review+
Comment on attachment 8403670 [details] [diff] [review]
fix_length_calculation-v0.diff

Review of attachment 8403670 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ds/LifoAlloc.h
@@ -141,5 @@
>          return aligned;
>      }
>  
>      void *peek(size_t n) {
> -        if (bumpBase() - bump < ptrdiff_t(n))

Sorry I missed this first time round!
Attachment #8403670 - Flags: review?(jcoppeard) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/92dd49bd29b6

Try errors are know intermittents. I'd expect more bustage if there was an error here.
https://tbpl.mozilla.org/?tree=Try&rev=0c26c720abb5
https://hg.mozilla.org/integration/mozilla-inbound/rev/a1b2bc9af796

No obvious change on AWFY with the previous patch so lets put the second one in. Same try run applies.
Looks like a tiny win on both Latency benchmarks, and a bit larger of a loss on Splay. I think I'll purge the second part and supporting machinery.
Duplicate of this bug: 997146
Whiteboard: [talos_regression]
As per bug 933309 and comment 20, we want to remove the duplicated dedup.
Attachment #8407717 - Flags: review?(jcoppeard)
in addition to the osx 10.6 kraken 2.6% regression, I see a regression on win 8 for dromaeo_css by 2.18% as a result of the recent landing in here.
Attachment #8407717 - Flags: review?(jcoppeard) → review+
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Blocks: 1507445
You need to log in before you can comment on or make changes to this bug.