Closed Bug 990336 Opened 10 years ago Closed 10 years ago

Improve the performance of DenseRangeWriteBarrierPost

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla31

People

(Reporter: terrence, Assigned: terrence)

References

(Blocks 1 open bug)

Details

(Whiteboard: [talos_regression])

Attachments

(6 files)

fix_dense_range_ref-v0.diff 10 years ago Terrence Cole [:terrence] 22.95 KB, patch	jonco : review+	Details \| Diff \| Splinter Review
diff_of_shame.diff 10 years ago Terrence Cole [:terrence] 869 bytes, patch		Details \| Diff \| Splinter Review
fix_slotedge_repr-v0.diff 10 years ago Terrence Cole [:terrence] 5.07 KB, patch	jonco : review+	Details \| Diff \| Splinter Review
dedup_slot_edges_normally-v0.diff 10 years ago Terrence Cole [:terrence] 9.96 KB, patch	jonco : review+	Details \| Diff \| Splinter Review
fix_length_calculation-v0.diff 10 years ago Terrence Cole [:terrence] 1.68 KB, patch	jonco : review+	Details \| Diff \| Splinter Review
backout_dedup_on_insert-v0.diff 10 years ago Terrence Cole [:terrence] 5.92 KB, patch	jonco : review+	Details \| Diff \| Splinter Review

Terrence Cole [:terrence]

Assignee

Description

•

10 years ago

Attached patch fix_dense_range_ref-v0.diff — Details — Splinter Review

This is impacting at least kraken's json-parse-financial and octane's TypeScript and Pdf.js benchmarks. The problem here is that we are ended up shoving thousands of these in the generic buffer, gcing too frequently, and being extra slow because of the generic buffer.

The attached patch does two things. The first is to turn the SlotEdge buffer into a SlotsEdge buffer: e.g. store the slot reference as a range rather than a single slot.
Then we can use that for both HeapSlot references and for DenseRangeRef. This is just a hair faster than going through the generic buffer, but not a real win. The second optimization is to make subsequent pushes of a new range for the same object merge their ranges, rather than creating a new entry. This dramatically reduces the amount of time we spend in minor GC on the benchmarks listed above.

On my desktop, this gets about 1.5% on both octane and kraken.

Note: we had an existing bug where PJS code was asserting !inRememberedSet for all edges, even ones where we don't really know the answer exactly. This happened not to bite us because we mostly only use wholeCellEdges in the jit. Now that SlotsEdge is not exact anymore, the assertion triggered right away. I've removed the assertion and renamed several methods for clarity, given that we've changed what most of them do:

inRememberedSet -> maybeInRememberedSet
location -> deduplicationKey
isNullRef -> moved the code inline into mark, where it makes more sense

Note2: I suppose we could also do the new merging optimization as a compaction pass. Doing it eagerly seems to work well enough for a first try. I may experiment more here later.

Attachment #8399749 - Flags: review?(jcoppeard)

Terrence Cole [:terrence]

Assignee

Comment 1

•

10 years ago

https://tbpl.mozilla.org/?tree=Try&rev=42b896aaa51a

Jon Coppeard (:jonco)

Comment 2

•

10 years ago

Comment on attachment 8399749 [details] [diff] [review]
fix_dense_range_ref-v0.diff

Review of attachment 8399749 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ds/LifoAlloc.h
@@ +477,5 @@
>          }
>      };
> +
> +    // Return a modifiable pointer to the most recently allocated sizeof(T)
> +    // bytes. Do not use this if you value well-defined behavior in your code.

It might be good to explain what you mean by this!

::: js/src/gc/StoreBuffer.h
@@ +284,2 @@
>      {
> +        // These definitions must match those in HeapSlot::Kind.

Maybe we could make these constants defined in some common header.

@@ +321,5 @@
> +            if (other.count_ < 0)
> +                count_ = -count_;
> +        }
> +
> +        void *deduplicationKey() const { return nullptr; }

This means that compaction doesn't do anything for SlotsEdges, but won't we still iterate all the entries anyway?

Attachment #8399749 - Flags: review?(jcoppeard) → review+

Terrence Cole [:terrence]

Assignee

Comment 3

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d9c6f5627855

(In reply to Jon Coppeard (:jonco) from comment #2)
> 
> ::: js/src/ds/LifoAlloc.h
> @@ +477,5 @@
> >          }
> >      };
> > +
> > +    // Return a modifiable pointer to the most recently allocated sizeof(T)
> > +    // bytes. Do not use this if you value well-defined behavior in your code.
> 
> It might be good to explain what you mean by this!

Ouch yeah, added some more explanation. LifoAlloc is a fully generic allocator, so looking at the last allocated bytes and casting from void* to some random T provided later is not exactly safe, except in limited circumstances.
 
> ::: js/src/gc/StoreBuffer.h
> @@ +284,2 @@
> >      {
> > +        // These definitions must match those in HeapSlot::Kind.
> 
> Maybe we could make these constants defined in some common header.

I left this out, since it's not exactly clear where they should go. I agree that it needs to be done at some point though.

> @@ +321,5 @@
> > +            if (other.count_ < 0)
> > +                count_ = -count_;
> > +        }
> > +
> > +        void *deduplicationKey() const { return nullptr; }
> 
> This means that compaction doesn't do anything for SlotsEdges, but won't we
> still iterate all the entries anyway?

Urk, good point. I've added supportsDeduplication and an early exit now.

https://tbpl.mozilla.org/?tree=Try&rev=42b896aaa51a

And verification that the optimizations are working as expected in:

https://tbpl.mozilla.org/?tree=Try&rev=727f89c22da9

Hannes Verschore [:h4writer]

Comment 4

•

10 years ago

I see a 6% improvement on kraken and an improvement on octane. The octane improvement is harder to quantify, but exists. Awesome!

Carsten Book [:Tomcat]

Comment 5

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/d9c6f5627855

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla31

Terrence Cole [:terrence]

Assignee

Comment 6

•

10 years ago

(In reply to Hannes Verschore [:h4writer] from comment #4)
> I see a 6% improvement on kraken and an improvement on octane. The octane
> improvement is harder to quantify, but exists. Awesome!

Locally I see pdf.js going from ~72ms in minor gc to ~16ms. So, enormous, but tiny in context.

Hannes Verschore [:h4writer]

Comment 7

•

10 years ago

(In reply to Terrence Cole [:terrence] from comment #6)
> (In reply to Hannes Verschore [:h4writer] from comment #4)
> > I see a 6% improvement on kraken and an improvement on octane. The octane
> > improvement is harder to quantify, but exists. Awesome!
> 
> Locally I see pdf.js going from ~72ms in minor gc to ~16ms. So, enormous,
> but tiny in context.

Yeah I saw the improvement on pdf.js (31.6% improvement). I just couldn't give an estimate on the full octane benchmark. It might be somewhere around 1.5% to 3%. That's why I didn't give any numbers on octane ;).

Terrence Cole [:terrence]

Assignee

Comment 8

•

10 years ago

Actually. The 1.5% improvement here is 100% down to using the slot buffer instead of the generic buffer: the generic buffer is just /way/ slower than we thought. A very sad bug is currently disabling the "optimization" that was supposed to be introduced by the patch. With this optimization "working", there is actually a tiny regression over having it disabled, but nowhere near the previous slowness. The good news is that we may still be able to get >>1% out of this rock.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Terrence Cole [:terrence]

Assignee

Comment 9

•

10 years ago

Attached patch diff_of_shame.diff — Details — Splinter Review

Terrence Cole [:terrence]

Assignee

Comment 10

•

10 years ago

Attached patch fix_slotedge_repr-v0.diff — Details — Splinter Review

I chose to store the SlotEdge kind in the sign bit of count. This was a monumentally poor choice. If we instead store the kind in the low bit of the object, everything magically gets simpler and faster, even on the worst-case test in bug 989414.

Attachment #8401486 - Flags: review?(jcoppeard)

Jon Coppeard (:jonco)

Comment 11

•

10 years ago

Comment on attachment 8401486 [details] [diff] [review]
fix_slotedge_repr-v0.diff

Review of attachment 8401486 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

Attachment #8401486 - Flags: review?(jcoppeard) → review+

Terrence Cole [:terrence]

Assignee

Comment 12

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/830c3c0720c3
https://tbpl.mozilla.org/?tree=Try&rev=4c3940961fba

Terrence Cole [:terrence]

Assignee

Comment 13

•

10 years ago

Attached patch dedup_slot_edges_normally-v0.diff — Details — Splinter Review

This uses some template chicanery to make the algorithm generic rather than the data. This allows us to use the normal after-the-fact dedup for slot edges too.

Attachment #8403633 - Flags: review?(jcoppeard)

Terrence Cole [:terrence]

Assignee

Updated

•

10 years ago

Keywords: leave-open

Terrence Cole [:terrence]

Assignee

Comment 14

•

10 years ago

Attached patch fix_length_calculation-v0.diff — Details — Splinter Review

And fix the bug preventing dedup-on-entry from working and re-add the code that makes use of it since it works now.

Attachment #8403670 - Flags: review?(jcoppeard)

Jon Coppeard (:jonco)

Comment 15

•

10 years ago

Comment on attachment 8403633 [details] [diff] [review]
dedup_slot_edges_normally-v0.diff

Review of attachment 8403633 [details] [diff] [review]:
-----------------------------------------------------------------

It's nice to use the same deduplication machinery for all edge types.

Attachment #8403633 - Flags: review?(jcoppeard) → review+

Jon Coppeard (:jonco)

Comment 16

•

10 years ago

Comment on attachment 8403670 [details] [diff] [review]
fix_length_calculation-v0.diff

Review of attachment 8403670 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ds/LifoAlloc.h
@@ -141,5 @@
>          return aligned;
>      }
>  
>      void *peek(size_t n) {
> -        if (bumpBase() - bump < ptrdiff_t(n))

Sorry I missed this first time round!

Attachment #8403670 - Flags: review?(jcoppeard) → review+

Carsten Book [:Tomcat]

Comment 17

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/830c3c0720c3

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Blocks: 989414

Terrence Cole [:terrence]

Assignee

Comment 18

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/92dd49bd29b6

Try errors are know intermittents. I'd expect more bustage if there was an error here.
https://tbpl.mozilla.org/?tree=Try&rev=0c26c720abb5

Terrence Cole [:terrence]

Assignee

Comment 19

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a1b2bc9af796

No obvious change on AWFY with the previous patch so lets put the second one in. Same try run applies.

Terrence Cole [:terrence]

Assignee

Comment 20

•

10 years ago

Looks like a tiny win on both Latency benchmarks, and a bit larger of a loss on Splay. I think I'll purge the second part and supporting machinery.

Carsten Book [:Tomcat]

Comment 21

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/92dd49bd29b6
https://hg.mozilla.org/mozilla-central/rev/a1b2bc9af796

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

10 years ago

Blocks: 997146

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

10 years ago

Whiteboard: [talos_regression]

Terrence Cole [:terrence]

Assignee

Comment 23

•

10 years ago

Attached patch backout_dedup_on_insert-v0.diff — Details — Splinter Review

As per bug 933309 and comment 20, we want to remove the duplicated dedup.

Attachment #8407717 - Flags: review?(jcoppeard)

Joel Maher ( :jmaher ) (UTC -8)

Comment 24

•

10 years ago

in addition to the osx 10.6 kraken 2.6% regression, I see a regression on win 8 for dromaeo_css by 2.18% as a result of the recent landing in here.

Jon Coppeard (:jonco)

Updated

•

10 years ago

Attachment #8407717 - Flags: review?(jcoppeard) → review+

Terrence Cole [:terrence]

Assignee

Comment 25

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/7340e1b5c7e1

Ryan VanderMeulen [:RyanVM]

Comment 26

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/7340e1b5c7e1

Terrence Cole [:terrence]

Assignee

Updated

•

10 years ago

Status: REOPENED → RESOLVED

Closed: 10 years ago → 10 years ago

Resolution: --- → FIXED

Firefox Bug Husbandry Bot

Comment 27

•

6 years ago

Removing leave-open keyword from resolved bugs, per :sylvestre.

Keywords: leave-open

Paul Bone [:pbone]

Updated

•

5 years ago

Blocks: 1507445

You need to log in before you can comment on or make changes to this bug.