Closed Bug 1378717 Opened 7 years ago Closed 7 years ago

Investigate optimising gray root buffering

Categories

(Core :: JavaScript: GC, enhancement)

Version: 55 Branch
Type: enhancement
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla56
Performance Impact: high
Tracking Status: firefox56 --- fixed

People

(Reporter: jonco, Assigned: jonco)

Attachments

(3 files, 1 obsolete file)

Telemetry shows gray root buffering as the top contributor to GC_SLOW_TASK. The proper fix for this is to maintain a remembered set of these roots per zone and mark them incrementally, but that requires at least incremental gray marking (and would add some memory overhead). In the meantime there are a couple of things we can do to speed this up a little.
Change BufferGrayRootsTracer to use CallbackTracer's onFooEdge() methods rather than onChild() to remove a virtual dispatch. Remove release asserts that weren't being triggered.
Attachment #8883899 - Flags: review?(sphink)
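For reference, a minimal sketch of the dispatch pattern involved, using made-up types rather than the real CallbackTracer interface: the per-type edge hooks default to funnelling into a single generic onChild() callback, so a tracer that only overrides onChild() pays an extra virtual call (plus the boxing into a generic cell pointer) per edge, while one that overrides the typed hooks directly pays a single virtual call.

struct Object {};
struct String {};

struct GenericCellPtr {
  void* ptr;  // type-erased cell pointer, a stand-in for a generic GC cell handle
};

// Base tracer: typed hooks whose default implementations forward to onChild().
struct ExampleCallbackTracer {
  virtual void onObjectEdge(Object** objp) { onChild(GenericCellPtr{*objp}); }
  virtual void onStringEdge(String** strp) { onChild(GenericCellPtr{*strp}); }
  virtual void onChild(GenericCellPtr) {}
  virtual ~ExampleCallbackTracer() = default;
};

// Before: override only onChild(); every edge goes onObjectEdge -> onChild.
struct BufferingTracerViaOnChild : ExampleCallbackTracer {
  void onChild(GenericCellPtr thing) override { /* append thing to the root buffer */ }
};

// After: override the typed hooks, so each edge costs one virtual call.
struct BufferingTracerViaTypedHooks : ExampleCallbackTracer {
  void onObjectEdge(Object** objp) override { /* append *objp to the root buffer */ }
  void onStringEdge(String** strp) override { /* append *strp to the root buffer */ }
};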
Allow inlining of the "if markable" check for public TraceEdge functions without calling into the JS engine.
Attachment #8883900 - Flags: review?(sphink)
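The shape of that change, sketched with placeholder names rather than the actual TraceEdge signatures: keep the public wrapper in the header so the cheap "is there anything markable here?" test is inlined at every call site, and only enter the engine when there is real work to do.

// In the real code this is the out-of-line engine entry point; it is stubbed
// here so the sketch is self-contained.
void ExampleTraceEdgeSlow(void* trc, void** thingp, const char* name) {
  // ... mark the thing *thingp points at ...
  (void)trc; (void)thingp; (void)name;
}

// Public, header-defined wrapper: the null/markable check is inlined, so the
// common empty case never pays for a call into the engine.
inline void ExampleTraceEdge(void* trc, void** thingp, const char* name) {
  if (*thingp) {  // stand-in for the real "is markable" test
    ExampleTraceEdgeSlow(trc, thingp, name);
  }
}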
Attached patch optimise-buffer-gray-roots-3 (obsolete) — Splinter Review
Slightly more controversial, store JS holders in a vector rather than a map to make iterating over them faster. If we implement the remembered set approach in the future we could take this out again.
Attachment #8883901 - Flags: review?(continuation)
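A minimal sketch of that layout, using std::vector and std::unordered_map as stand-ins for the Mozilla/XPCOM containers in the actual patch: the vector keeps the holders densely packed so the gray-root buffering pass is a simple linear scan, while a side map keyed on the holder pointer keeps add/remove O(1) by swapping the removed entry with the last one.

#include <cstddef>
#include <unordered_map>
#include <vector>

struct ExampleJSHolder {
  void* holder;
  void* tracer;
};

class ExampleJSHolderList {
  std::vector<ExampleJSHolder> mHolders;           // iterated when buffering gray roots
  std::unordered_map<void*, size_t> mHolderIndex;  // holder pointer -> index in mHolders

 public:
  void Add(void* holder, void* tracer) {
    mHolderIndex[holder] = mHolders.size();
    mHolders.push_back({holder, tracer});
  }

  void Remove(void* holder) {
    auto it = mHolderIndex.find(holder);
    if (it == mHolderIndex.end()) {
      return;
    }
    size_t index = it->second;
    if (index != mHolders.size() - 1) {
      // Move the last entry into the hole and fix up its index in the map.
      mHolders[index] = mHolders.back();
      mHolderIndex[mHolders[index].holder] = index;
    }
    mHolders.pop_back();
    mHolderIndex.erase(it);
  }

  template <typename F>
  void ForEach(F&& f) {
    for (ExampleJSHolder& h : mHolders) {
      f(h);  // contiguous scan, cheaper than walking hashtable entries
    }
  }
};

The trade-off the review comments below focus on follows directly from this sketch: iteration gets faster, but a plain vector reallocates as it grows and needs one large contiguous allocation.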
I instrumented the code to measure the time spent buffering gray roots and calculate the time per root in nanoseconds. I tested this by repeatedly scrolling to the bottom of the Facebook news feed (this generates a lot of gray roots). These are the results for buffering > 1000 roots:

Pre:     366, 308, 363, 302, 201                               - best 201, avg 288.302
Patch 1: 197, 279, 206, 281, 168, 171                          - best 171, avg 203.475
Patch 2: 240, 137, 381, 257, 262, 179, 211, 149, 144           - best 137, avg 188.483
Patch 3: 98, 380, 268, 210, 190, 118, 298, 169, 112, 105, 113  - best 98, avg 147.722

Note 'best' is the best average for a single collection, 'avg' is the average over all collections.
Whiteboard: [qf:p1]
Comment on attachment 8883901 [details] [diff] [review]
optimise-buffer-gray-roots-3

Review of attachment 8883901 [details] [diff] [review]:
-----------------------------------------------------------------

My main concern here is the extra memory, but maybe the additional cost every time we create or destroy a JS holder is an issue, too. Olli should look at it to see if he thinks the additional time will be a problem.

::: xpcom/base/CycleCollectedJSRuntime.h
@@ +318,5 @@
>   mozilla::TimeStamp mLatestNurseryCollectionStart;
>
> + Vector<JSHolder, 0> mJSHolders;
> + nsDataHashtable<nsPtrHashKey<void>, uint32_t> mJSHolderMap;

Hmm, I wonder if this should be size_t, not uint32_t, to match the return type of Vector::length(). I guess it isn't too likely we have 2 billion objects...
Attachment #8883901 - Flags: review?(continuation)
Attachment #8883901 - Flags: review?(bugs)
Attachment #8883901 - Flags: review+
Comment on attachment 8883901 [details] [diff] [review]
optimise-buffer-gray-roots-3

>- nsDataHashtable<nsPtrHashKey<void>, nsScriptObjectTracer*> mJSHolders;
>+ Vector<JSHolder, 0> mJSHolders;

This looks malloc heavy (when increasing the vector size), and requires a large contiguous memory area. Could we use SegmentedVector? Remember that there can be hundreds of thousands of holders. (And yes, the hashtable is an issue already, but let's not make the issue worse.)

Have you tested microbenchmarks which create lots of JS holders? Say something like
for (var i = 0; i < 1000000; ++i) { document.createElement("div").randomExpando = document; }

>+ nsDataHashtable<nsPtrHashKey<void>, uint32_t> mJSHolderMap;

Hmm, what does the value point to? Is that now the index in the vector? Looks like so. Could we do something better? Use a SegmentedVector for the JSHolders, and have the hashtable keep a pointer to the JSHolder as the value. When a holder is removed, do a similar check for whether it was the last one, and if not, swap. It is hopefully fast to access the last item of a SegmentedVector. And if it isn't, then do no swapping or anything; the JSHolder would just be marked as 'gone', and the next time the whole SegmentedVector is iterated it could be compacted. PurpleBuffer does such compacting: http://searchfox.org/mozilla-central/rev/f1472e5b57fea8d4a7bbdd81c44e46958fe3c1ce/xpcom/base/nsCycleCollector.cpp#1059 Though compacting does make iterating slower.

Anyhow, I'm worried about the Vector usage here.
Attachment #8883901 - Flags: review?(bugs) → review-
Accessing the last item of a SegmentedVector is fast, I believe, but the patch requires random access to the vector, which SegmentedVector does not have (and cannot implement in constant time, without an additional vector of pointers to segments).
Yes, which is why I proposed using a pointer as the value rather than an index. Then one wouldn't need random access by index.
Attachment #8883899 - Flags: review?(sphink) → review+
Comment on attachment 8883900 [details] [diff] [review]
optimise-buffer-gray-roots-2

Review of attachment 8883900 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/RootingAPI.h
@@ +1514,5 @@
>  }
>
>  #undef DELETE_ASSIGNMENT_OPS
>
> +// Define the TraceEdge functions declared in TracingAPI.h.

I imagine there's a good reason these need to be here rather than in TracingAPI.h. The comment should explain it. And I should think hard about it, whatever it is, to try to avoid it, even though I'm sure you've already done that.
Attachment #8883900 - Flags: review?(sphink)
(In reply to Steve Fink [:sfink] [:s:] from comment #9)
> I imagine there's a good reason these need to be here rather than in
> TracingAPI.h.

They depend on the definitions of Heap and TenuredHeap above. Including RootingAPI.h in TracingAPI.h leads to some horrible cyclic dependency problem that I wasn't able to refactor away. I'll add a comment.
Thanks for the feedback. Patch updated as suggested.
Attachment #8883901 - Attachment is obsolete: true
Attachment #8884873 - Flags: review?(bugs)
Comment on attachment 8884873 [details] [diff] [review]
optimise-buffer-gray-roots-3 v2

Could you still try some microbenchmark? Perhaps
data:Text/html,<script>for (var i = 0; i < 1000000; ++i) { document.documentElement.appendChild(document.createElement("div"));} var s = performance.now(); var el = document.documentElement.firstChild; while(el) { el.expando = document; el = el.nextSibling;} alert((performance.now() - s) + "ms"); </script>
Attachment #8884873 - Flags: review?(bugs) → review+
(In reply to Jon Coppeard (:jonco) from comment #10)
> (In reply to Steve Fink [:sfink] [:s:] from comment #9)
> > I imagine there's a good reason these need to be here rather than in
> > TracingAPI.h.
>
> They depend on the definitions of Heap and TenuredHeap above. Including
> RootingAPI.h in TracingAPI.h leads to some horrible cyclic dependency
> problem that I wasn't able to refactor away. I'll add a comment.

I certainly wouldn't want to add that dependency. But putting tracing functions into RootingAPI.h seems bad too.

I experimented with this. As I see it, we just want these to be visible to anything that includes both RootingAPI.h and TracingAPI.h (and they'll have to, if they need both rooting and tracing). Which means that as long as the compiler doesn't need to compile these things immediately, we should be ok.

The specific problem seems to be the declaration

  inline void TraceEdge(JSTracer* trc, JS::TenuredHeap<JSObject*>* thingp, const char* name) { ... }

...but I don't even understand why that makes sense in the first place, since TraceEdge is a templated function. I would have expected anything including both RootingAPI.h and gc/Tracer.h to complain about this, but it doesn't, so I'm wrong. Anyway, it works if you move all of the tracing stuff to TracingAPI.h but change the above to

  template <typename T>
  inline void TraceEdge(JSTracer* trc, JS::TenuredHeap<T>* thingp, const char* name)

which makes more sense to me anyway, since now we have specializations for Heap<T> and TenuredHeap<T> in TracingAPI.h, and WriteBarrieredBase<T> and ReadBarriered<T> in gc/Tracer.h, and don't have a mixture of template specializations and plain functions.

Given that you were specifically doing JSObject* there, my first attempt was to declare a generic template <T> TraceEdge(..., T*, ...) and then a specific specialization template <> TraceEdge(..., TenuredHeap<JSObject*>*, ...), but that doesn't work because there's no template declaration to specialize. And if we did do a generic specialization (eg to static_assert that you are missing a specialization), then we'd lose PreBarriered<T>* inheriting from WriteBarrieredBase<T>*. (Trust me, I tried, since I was confused about how it was working in the first place.)

tl;dr: I think if you define template<T> TraceEdge(..., TenuredHeap<T>*, ...) then you can move this stuff into TracingAPI.h and everything should Just Work. If someone uses TenuredHeap<JSObject*> in a way that requires TraceEdge, and neglects to #include both headers, it should still fail to compile.

Let me know if I'm wrong! :-)
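A self-contained sketch of why the templated form fixes the header problem (the names are stand-ins, not the real JSAPI declarations): the compiler needs neither a template's body nor a complete TenuredHeap at the point of declaration, only at the point of instantiation, so the declaration can sit in one header and the definition can sit next to the type it depends on in the other.

struct TracerT;  // stand-in for JSTracer

// Conceptually in TracingAPI.h: declarations only.
template <typename T>
struct TenuredHeapT;

template <typename T>
void TraceEdgeT(TracerT* trc, TenuredHeapT<T>* thingp, const char* name);

// Conceptually in RootingAPI.h: the class and the template's body, so the
// definition lives beside the type it needs without creating a header cycle.
template <typename T>
struct TenuredHeapT {
  T ptr;
};

template <typename T>
void TraceEdgeT(TracerT* trc, TenuredHeapT<T>* thingp, const char* name) {
  // ... check thingp->ptr and mark it if necessary ...
  (void)trc; (void)thingp; (void)name;
}

// A translation unit that includes both headers can instantiate TraceEdgeT;
// one that only saw the TracingAPI.h part fails to compile at the call site
// rather than silently misbehaving.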
Comment on attachment 8883900 [details] [diff] [review]
optimise-buffer-gray-roots-2

Review of attachment 8883900 [details] [diff] [review]:
-----------------------------------------------------------------

Oops, that should've been a review comment. r+ with this stuff placed into TracingAPI.h. Assuming I'm not missing something else that prevents that from working.
Attachment #8883900 - Flags: review+
(In reply to Steve Fink [:sfink] [:s:] from comment #13)
> tl;dr: I think if you define template<T> TraceEdge(..., TenuredHeap<T>*, ...)
> then you can move this stuff into TracingAPI.h and everything should Just Work.

Ah yes, that fixes it. Thanks, that is much better.
(In reply to Olli Pettay [:smaug] from comment #12)

Sure.

Pre change:  1179 447 622 618 738 483 (average 681)
Post change: 860 684 662 908 510 753 620 (average 714)
Olli, does that ^ sound OK to you?
Flags: needinfo?(bugs)
I think so. It isn't anything like a 2x slowdown.
Flags: needinfo?(bugs)
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9536d61e1167
Speed up gray root buffering tracer by removing a virtual dispatch and some release asserts r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/76b06d8c3204
Allow inlining of TraceEdge API's null check r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/c4b3e9bd89bf
Store JS holders in a vector for faster iteration r=smaug
Performance Impact: --- → P1
Whiteboard: [qf:p1]