Closed
Bug 1378717
Opened 7 years ago
Closed 7 years ago
Investigate optimising gray root buffering
Categories
(Core :: JavaScript: GC, enhancement)
Tracking
()
Tracking | Status | |
---|---|---|
firefox56 | --- | fixed |
People
(Reporter: jonco, Assigned: jonco)
Details
Attachments
(3 files, 1 obsolete file)
3.21 KB,
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
6.06 KB,
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
6.94 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
Telemetry shows gray root buffering as the top reason in GC_SLOW_TASK.
The proper fix for this is to maintain a remembered set of these roots per zone and mark them incrementally, but this requires incremental gray marking at least (any would add some memory overhead).
In the meantime there's a couple of things we can do to speed this up a little.
Assignee | ||
Comment 1•7 years ago
|
||
Change BufferGrayRootsTracer to use CallbackTracer's onFooEdge() methods rather than onChild() to remove a virtual dispatch. Remove release asserts that weren't being triggered.
Attachment #8883899 -
Flags: review?(sphink)
Assignee | ||
Comment 2•7 years ago
|
||
Allow inlining of the "if markable" check for public TraceEdge functions without calling into the JS engine.
Attachment #8883900 -
Flags: review?(sphink)
Assignee | ||
Comment 3•7 years ago
|
||
Slightly more controversial, store JS holders in a vector rather than a map to make iterating over them faster.
If we implement the remembered set approach in the future we could take this out again.
Attachment #8883901 -
Flags: review?(continuation)
Assignee | ||
Comment 4•7 years ago
|
||
I instrumented the code to measure the time spent buffering gray roots and calculate the time per root in nanoseconds. I tested this repeatedly scrolling to the bottom of the facebook news feed (this generates a lot of gray roots). These are the results for buffering > 1000 roots:
Pre: 366, 308, 363, 302, 201
- best 201, avg 288.302
Patch 1: 197, 279, 206, 281, 168, 171
- best 171, avg 203.475
Patch 2: 240, 137, 381, 257, 262, 179, 211, 149, 144
- best 137, avg 188.483
Patch 3: 98, 380, 268, 210, 190, 118, 298, 169, 112, 105, 113
- best 98, avg 147.722
Note 'best' is best average for a single collection, 'avg' is average over all collections.
Assignee | ||
Updated•7 years ago
|
Whiteboard: [qf:p1]
Comment 5•7 years ago
|
||
Comment on attachment 8883901 [details] [diff] [review]
optimise-buffer-gray-roots-3
Review of attachment 8883901 [details] [diff] [review]:
-----------------------------------------------------------------
My main concern here is the extra memory, but maybe the additional cost every time we create or destroy a JS holder is an issue, too. Olli should look at it to see if he thinks the additional time will be a problem.
::: xpcom/base/CycleCollectedJSRuntime.h
@@ +318,5 @@
>
> mozilla::TimeStamp mLatestNurseryCollectionStart;
>
> + Vector<JSHolder, 0> mJSHolders;
> + nsDataHashtable<nsPtrHashKey<void>, uint32_t> mJSHolderMap;
Hmm, I wonder if this should be size_t, not uint32_t, to match the return type of Vector::length(). I guess it isn't too likely we have 2 billion objects...
Attachment #8883901 -
Flags: review?(continuation)
Attachment #8883901 -
Flags: review?(bugs)
Attachment #8883901 -
Flags: review+
Comment 6•7 years ago
|
||
Comment on attachment 8883901 [details] [diff] [review]
optimise-buffer-gray-roots-3
>- nsDataHashtable<nsPtrHashKey<void>, nsScriptObjectTracer*> mJSHolders;
>+ Vector<JSHolder, 0> mJSHolders;
This looks malloc heavy (when increasing the vector size), and requires large continuous memory area. Could we use SegmentedVector? Remember that there can
be hundreds of thousands of holders. (and yes, the hashtable is an issue already, but let's not make the issue worse.)
Have you tested microbenchmarks which create lots of JS holders.
Say something like
for (var i = 0; i < 1000000; ++i) {
document.createElement("div").randomExpando = document;
}
>+ nsDataHashtable<nsPtrHashKey<void>, uint32_t> mJSHolderMap;
Hmm, what does the value point to? Is that now the index in vector? Looks like so.
Could we do something better?
SegmentedVector for JSHolders.
Hashtable could keep a pointer to JSholder as value, and when holder is removed, do similar check that whether it was the last one, and if not
swap.
It is hopefully fast to access the last item of SegmentedVector.
And if it isn't, then no swapping or anything, but the JSHolder would be just marked as 'gone' and next time the whole segmentedvector is iterated, it could be compacted.
PurpleBuffer does such compacting http://searchfox.org/mozilla-central/rev/f1472e5b57fea8d4a7bbdd81c44e46958fe3c1ce/xpcom/base/nsCycleCollector.cpp#1059
Though, compacting does make iterating slower.
Anyhow, I'm worried about the Vector usage here.
Attachment #8883901 -
Flags: review?(bugs) → review-
Comment 7•7 years ago
|
||
Accessing the last item of a SegmentedVector is fast, I believe, but the patch requires random access to the vector, which SegmentedVector does not have (and cannot implement in constant time, without an additional vector of pointers to segments).
Comment 8•7 years ago
|
||
yes, which is why I proposed to not use index as value but pointer. One wouldn't need to random access using index.
Updated•7 years ago
|
Attachment #8883899 -
Flags: review?(sphink) → review+
Comment 9•7 years ago
|
||
Comment on attachment 8883900 [details] [diff] [review]
optimise-buffer-gray-roots-2
Review of attachment 8883900 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/public/RootingAPI.h
@@ +1514,5 @@
> }
>
> #undef DELETE_ASSIGNMENT_OPS
>
> +// Define the TraceEdge functions declared in TracingAPI.h.
I imagine there's a good reason these need to be here rather in TracingAPI.h. The comment should explain it. And I should think hard about it, whatever it is, to try to avoid it even though I'm sure you've already done that.
Attachment #8883900 -
Flags: review?(sphink)
Assignee | ||
Comment 10•7 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #9)
> I imagine there's a good reason these need to be here rather in
> TracingAPI.h.
They depend on the definitions of Heap and TenuredHeap above. Including RootingAPI.h in TracingAPI.h leads to some horrible cyclic dependency problem that I wasn't able to refactor away. I'll add a comment.
Assignee | ||
Comment 11•7 years ago
|
||
Thanks for the feedback. Patch updated as suggested.
Attachment #8883901 -
Attachment is obsolete: true
Attachment #8884873 -
Flags: review?(bugs)
Comment 12•7 years ago
|
||
Comment on attachment 8884873 [details] [diff] [review]
optimise-buffer-gray-roots-3 v2
Could you still try some microbenchmark.
Perhaps
data:Text/html,<script>for (var i = 0; i < 1000000; ++i) { document.documentElement.appendChild(document.createElement("div"));} var s = performance.now(); var el = document.documentElement.firstChild; while(el) { el.expando = document; el = el.nextSibling;} alert((performance.now() - s) + "ms"); </script>
Attachment #8884873 -
Flags: review?(bugs) → review+
Comment 13•7 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #10)
> (In reply to Steve Fink [:sfink] [:s:] from comment #9)
> > I imagine there's a good reason these need to be here rather in
> > TracingAPI.h.
>
> They depend on the definitions of Heap and TenuredHeap above. Including
> RootingAPI.h in TracingAPI.h leads to some horrible cyclic dependency
> problem that I wasn't able to refactor away. I'll add a comment.
I certainly wouldn't want to add that dependency. But putting tracing functions into RootingAPI.h seems bad too.
I experimented with this. As I see it, we just want these to be visible to anything that includes both RootingAPI.h and TracingAPI.h (and they'll have to, if they need both rooting and tracing.) Which means that as long as the compiler doesn't need to compile these things immediately, we should be ok. The specific problem seems to be the declaration
inline void
TraceEdge(JSTracer* trc, JS::TenuredHeap<JSObject*>* thingp, const char* name)
{ ... }
...but I don't even understand why that makes sense in the first place, since TraceEdge is a templated function. I would have expected anything including both RootingAPI.h and gc/Tracer.h to complain about this, but it doesn't, so I'm wrong.
Anyway, it works if you move all of the tracing stuff to TracingAPI.h but change the above to
template <typename T>
inline void
TraceEdge(JSTracer* trc, JS::TenuredHeap<T>* thingp, const char* name)
which makes more sense to me anyway, since now we have specializations for Heap<T> and TenuredHeap<T> in TracingAPI.h, and WriteBarrieredBase<T> and ReadBarriered<T> in gc/Tracer.h, and don't have a mixture of template specializations and plain functions.
Given that you were specifically doing JSObject* there, my first attempt was to declare a generic template <T> TraceEdge(..., T*, ...) and then a specific specialization template <> TraceEdge(..., TenuredHeap<JSObject*>*, ...), but that doesn't work because there's no template declaration to specialize. And if we did do a generic specialization (eg to static_assert that you are missing a specialization), then we'd lose PreBarriered<T>* inheriting from WriteBarrieredBase<T>*. (Trust me, I tried, since I was confused about how it was working in the first place.)
tl;dr: I think if you define template<T> TraceEdge(..., TenuredHeap<T>*, ...) then you can move this stuff into TracingAPI.h and everything should Just Work. If someone uses TenuredHeap<JSObject*> in a way that requires TraceEdge, and neglects to #include both headers, it should still fail to compile.
Let me know if I'm wrong! :-)
Comment 14•7 years ago
|
||
Comment on attachment 8883900 [details] [diff] [review]
optimise-buffer-gray-roots-2
Review of attachment 8883900 [details] [diff] [review]:
-----------------------------------------------------------------
Oops, that should've been a review comment. r+ with this stuff placed into TracingAPI.h. Assuming I'm not missing something else that prevents that from working.
Attachment #8883900 -
Flags: review+
Assignee | ||
Comment 15•7 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #13)
> tl;dr: I think if you define template<T> TraceEdge(..., TenuredHeap<T>*,
> ...) then you can move this stuff into TracingAPI.h and everything should
> Just Work.
Ah yes, that fixes it. Thanks, that is much better.
Assignee | ||
Comment 16•7 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #12)
Sure.
Pre change: 1179 447 622 618 738 483 (average 681)
Post change: 860 684 662 908 510 753 620 (average 714)
Comment 19•7 years ago
|
||
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9536d61e1167
Speed up gray root buffering tracer by removing a virtual dispatch and some release asserts r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/76b06d8c3204
Allow inlining of TraceEdge API's null check r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/c4b3e9bd89bf
Store JS holders in a vector for faster iteration r=smaug
Comment 20•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9536d61e1167
https://hg.mozilla.org/mozilla-central/rev/76b06d8c3204
https://hg.mozilla.org/mozilla-central/rev/c4b3e9bd89bf
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Updated•3 years ago
|
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in
before you can comment on or make changes to this bug.
Description
•