Investigate optimising gray root buffering

RESOLVED FIXED in Firefox 56

Status

()

enhancement
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: jonco, Assigned: jonco)

Tracking

55 Branch
mozilla56
Points:
---

Firefox Tracking Flags

(firefox56 fixed)

Details

(Whiteboard: [qf:p1])

Attachments

(3 attachments, 1 obsolete attachment)

(Assignee)

Description

2 years ago
Telemetry shows gray root buffering as the top reason in GC_SLOW_TASK.

The proper fix for this is to maintain a remembered set of these roots per zone and mark them incrementally, but this requires incremental gray marking at least (any would add some memory overhead).

In the meantime there's a couple of things we can do to speed this up a little.
(Assignee)

Comment 1

2 years ago
Change BufferGrayRootsTracer to use CallbackTracer's onFooEdge() methods rather than onChild() to remove a virtual dispatch.  Remove release asserts that weren't being triggered.
Attachment #8883899 - Flags: review?(sphink)
(Assignee)

Comment 2

2 years ago
Allow inlining of the "if markable" check for public TraceEdge functions without calling into the JS engine.
Attachment #8883900 - Flags: review?(sphink)
(Assignee)

Comment 3

2 years ago
Posted patch optimise-buffer-gray-roots-3 (obsolete) — Splinter Review
Slightly more controversial, store JS holders in a vector rather than a map to make iterating over them faster.

If we implement the remembered set approach in the future we could take this out again.
Attachment #8883901 - Flags: review?(continuation)
(Assignee)

Comment 4

2 years ago
I instrumented the code to measure the time spent buffering gray roots and calculate the time per root in nanoseconds.  I tested this repeatedly scrolling to the bottom of the facebook news feed (this generates a lot of gray roots).  These are the results for buffering > 1000 roots:

Pre: 366, 308, 363, 302, 201
 - best 201, avg 288.302
Patch 1: 197, 279, 206, 281, 168, 171
 - best 171, avg 203.475
Patch 2: 240, 137, 381, 257, 262, 179, 211, 149, 144
 - best 137, avg 188.483
Patch 3: 98, 380, 268, 210, 190, 118, 298, 169, 112, 105, 113
 - best 98, avg 147.722

Note 'best' is best average for a single collection, 'avg' is average over all collections.
(Assignee)

Updated

2 years ago
Whiteboard: [qf:p1]
Comment on attachment 8883901 [details] [diff] [review]
optimise-buffer-gray-roots-3

Review of attachment 8883901 [details] [diff] [review]:
-----------------------------------------------------------------

My main concern here is the extra memory, but maybe the additional cost every time we create or destroy a JS holder is an issue, too. Olli should look at it to see if he thinks the additional time will be a problem.

::: xpcom/base/CycleCollectedJSRuntime.h
@@ +318,5 @@
>  
>    mozilla::TimeStamp mLatestNurseryCollectionStart;
>  
> +  Vector<JSHolder, 0> mJSHolders;
> +  nsDataHashtable<nsPtrHashKey<void>, uint32_t> mJSHolderMap;

Hmm, I wonder if this should be size_t, not uint32_t, to match the return type of Vector::length(). I guess it isn't too likely we have 2 billion objects...
Attachment #8883901 - Flags: review?(continuation)
Attachment #8883901 - Flags: review?(bugs)
Attachment #8883901 - Flags: review+
Comment on attachment 8883901 [details] [diff] [review]
optimise-buffer-gray-roots-3


>-  nsDataHashtable<nsPtrHashKey<void>, nsScriptObjectTracer*> mJSHolders;
>+  Vector<JSHolder, 0> mJSHolders;
This looks malloc heavy (when increasing the vector size), and requires large continuous memory area. Could we use SegmentedVector? Remember that there can
be hundreds of thousands of holders. (and yes, the hashtable is an issue already, but let's not make the issue worse.)


Have you tested microbenchmarks which create lots of JS holders.
Say something like
for (var i = 0; i < 1000000; ++i) {
  document.createElement("div").randomExpando = document;
}

>+  nsDataHashtable<nsPtrHashKey<void>, uint32_t> mJSHolderMap;
Hmm, what does the value point to? Is that now the index in vector? Looks like so.

Could we do something better?
SegmentedVector for JSHolders.
Hashtable could keep a pointer to JSholder as value, and when holder is removed, do similar check that whether it was the last one, and if not
swap.
It is hopefully fast to access the last item of SegmentedVector. 
And if it isn't, then no swapping or anything, but the JSHolder would be just marked as 'gone' and next time the whole segmentedvector is iterated, it could be compacted.
PurpleBuffer does such compacting http://searchfox.org/mozilla-central/rev/f1472e5b57fea8d4a7bbdd81c44e46958fe3c1ce/xpcom/base/nsCycleCollector.cpp#1059
Though, compacting does make iterating slower.

Anyhow, I'm worried about the Vector usage here.
Attachment #8883901 - Flags: review?(bugs) → review-
Accessing the last item of a SegmentedVector is fast, I believe, but the patch requires random access to the vector, which SegmentedVector does not have (and cannot implement in constant time, without an additional vector of pointers to segments).
yes, which is why I proposed to not use index as value but pointer. One wouldn't need to random access using index.
Attachment #8883899 - Flags: review?(sphink) → review+
Comment on attachment 8883900 [details] [diff] [review]
optimise-buffer-gray-roots-2

Review of attachment 8883900 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/RootingAPI.h
@@ +1514,5 @@
>  }
>  
>  #undef DELETE_ASSIGNMENT_OPS
>  
> +// Define the TraceEdge functions declared in TracingAPI.h.

I imagine there's a good reason these need to be here rather in TracingAPI.h. The comment should explain it. And I should think hard about it, whatever it is, to try to avoid it even though I'm sure you've already done that.
Attachment #8883900 - Flags: review?(sphink)
(Assignee)

Comment 10

2 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #9)
> I imagine there's a good reason these need to be here rather in
> TracingAPI.h.

They depend on the definitions of Heap and TenuredHeap above.  Including RootingAPI.h in TracingAPI.h leads to some horrible cyclic dependency problem that I wasn't able to refactor away.  I'll add a comment.
(Assignee)

Comment 11

2 years ago
Thanks for the feedback.  Patch updated as suggested.
Attachment #8883901 - Attachment is obsolete: true
Attachment #8884873 - Flags: review?(bugs)
Comment on attachment 8884873 [details] [diff] [review]
optimise-buffer-gray-roots-3 v2

Could you still try some microbenchmark.
Perhaps

data:Text/html,<script>for (var i = 0; i < 1000000; ++i) { document.documentElement.appendChild(document.createElement("div"));} var s = performance.now(); var el = document.documentElement.firstChild; while(el) { el.expando = document; el = el.nextSibling;} alert((performance.now() - s) + "ms"); </script>
Attachment #8884873 - Flags: review?(bugs) → review+
(In reply to Jon Coppeard (:jonco) from comment #10)
> (In reply to Steve Fink [:sfink] [:s:] from comment #9)
> > I imagine there's a good reason these need to be here rather in
> > TracingAPI.h.
> 
> They depend on the definitions of Heap and TenuredHeap above.  Including
> RootingAPI.h in TracingAPI.h leads to some horrible cyclic dependency
> problem that I wasn't able to refactor away.  I'll add a comment.

I certainly wouldn't want to add that dependency. But putting tracing functions into RootingAPI.h seems bad too.

I experimented with this. As I see it, we just want these to be visible to anything that includes both RootingAPI.h and TracingAPI.h (and they'll have to, if they need both rooting and tracing.) Which means that as long as the compiler doesn't need to compile these things immediately, we should be ok. The specific problem seems to be the declaration

    inline void
    TraceEdge(JSTracer* trc, JS::TenuredHeap<JSObject*>* thingp, const char* name)
    { ... }

...but I don't even understand why that makes sense in the first place, since TraceEdge is a templated function. I would have expected anything including both RootingAPI.h and gc/Tracer.h to complain about this, but it doesn't, so I'm wrong.

Anyway, it works if you move all of the tracing stuff to TracingAPI.h but change the above to

    template <typename T>
    inline void
    TraceEdge(JSTracer* trc, JS::TenuredHeap<T>* thingp, const char* name)

which makes more sense to me anyway, since now we have specializations for Heap<T> and TenuredHeap<T> in TracingAPI.h, and WriteBarrieredBase<T> and ReadBarriered<T> in gc/Tracer.h, and don't have a mixture of template specializations and plain functions.

Given that you were specifically doing JSObject* there, my first attempt was to declare a generic template <T> TraceEdge(..., T*, ...) and then a specific specialization template <> TraceEdge(..., TenuredHeap<JSObject*>*, ...), but that doesn't work because there's no template declaration to specialize. And if we did do a generic specialization (eg to static_assert that you are missing a specialization), then we'd lose PreBarriered<T>* inheriting from WriteBarrieredBase<T>*. (Trust me, I tried, since I was confused about how it was working in the first place.)

tl;dr: I think if you define template<T> TraceEdge(..., TenuredHeap<T>*, ...) then you can move this stuff into TracingAPI.h and everything should Just Work. If someone uses TenuredHeap<JSObject*> in a way that requires TraceEdge, and neglects to #include both headers, it should still fail to compile.

Let me know if I'm wrong! :-)
Comment on attachment 8883900 [details] [diff] [review]
optimise-buffer-gray-roots-2

Review of attachment 8883900 [details] [diff] [review]:
-----------------------------------------------------------------

Oops, that should've been a review comment. r+ with this stuff placed into TracingAPI.h. Assuming I'm not missing something else that prevents that from working.
Attachment #8883900 - Flags: review+
(Assignee)

Comment 15

2 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #13)
> tl;dr: I think if you define template<T> TraceEdge(..., TenuredHeap<T>*,
> ...) then you can move this stuff into TracingAPI.h and everything should
> Just Work. 

Ah yes, that fixes it.  Thanks, that is much better.
(Assignee)

Comment 16

2 years ago
(In reply to Olli Pettay [:smaug] from comment #12)

Sure.

Pre change: 1179 447 622 618 738 483     (average 681)
Post change: 860 684 662 908 510 753 620 (average 714)
(Assignee)

Comment 17

2 years ago
Olli, does that ^ sound OK to you?
Flags: needinfo?(bugs)
I think so. It isn't any 2x slowdown or so.
Flags: needinfo?(bugs)

Comment 19

2 years ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9536d61e1167
Speed up gray root buffering tracer by removing a virtual dispatch and some release asserts r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/76b06d8c3204
Allow inlining of TraceEdge API's null check r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/c4b3e9bd89bf
Store JS holders in a vector for faster iteration r=smaug

Comment 20

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9536d61e1167
https://hg.mozilla.org/mozilla-central/rev/76b06d8c3204
https://hg.mozilla.org/mozilla-central/rev/c4b3e9bd89bf
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.