Closed
Bug 708382
Opened 13 years ago
Closed 13 years ago
investigate faster GC scanning of objects
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: igor, Assigned: igor)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
Attachments
(2 files, 1 obsolete file)
8.37 KB,
patch
|
billm
:
review+
|
Details | Diff | Splinter Review |
27.67 KB,
patch
|
billm
:
review+
|
Details | Diff | Splinter Review |
During investigation for the bug 706062 I observed extreme sensitivity of the GC marking time during V8 to the order of GC graph traversal, the tail recursion elimination and to the way we currently split between marking of large items and other object slots.
The attached patch tries to optimize the GC marking based on that information. In particular:
1. All marking stacks are merged into one that uses tagged pointers or assumed knowledge to distinguish between different stack elements. LargeItem are replaced by explicit pushing into the stack of (value_array_start, value_array_end) pair.
2. ScanRope now also pops pushed ropes from the stack allowing to assume that the stack is not changed after it returns.
3. The patch tries to eliminate the tail recursion whenever possible
With those changes in JS shell on a 3.0 GHz Linux box the GC timing during V8-splay are reduced from (using MOZ_GCTIMER=stdout):
30.878000 30.765000 0.083000
32.662000 31.993000 0.058000
32.939000 32.324000 0.068000
33.136000 33.062000 0.057000
33.111000 33.039000 0.048000
33.733000 33.665000 0.047000
33.639000 33.533000 0.071000
33.736000 33.667000 0.048000
33.863000 33.768000 0.076000
to:
22.326000 22.214000 0.084000
23.232000 22.584000 0.057000
23.201000 23.115000 0.070000
23.801000 23.715000 0.065000
24.441000 24.376000 0.047000
24.217000 24.147000 0.050000
24.768000 24.676000 0.070000
24.622000 24.556000 0.048000
24.600000 24.503000 0.077000
That is, roughly 30% speed-up. I will investigate if the win persists during normal browser session.
Assignee | ||
Comment 2•13 years ago
|
||
This part makes ScanRope to pop the stack until all ropes pushed by the function are processed. This improves string scanning localization and allows to merge object and string stacks.
Attachment #579798 -
Attachment is obsolete: true
Attachment #580039 -
Flags: review?(bhackett1024)
Assignee | ||
Comment 3•13 years ago
|
||
This patch merges all the mark stack into one using the tagged pointers to distinguish between the pushed item kinds. The patch tries to eliminate push/pop pair when it is know what would be pushed. Instead it just jumps to the necessary code directly.
Here is the output MOZ_GC_TIMER=stdout output:
Current TIP part1 + part2
During V8
33.614000 31.979000 1.583000 25.019000 24.387000 0.589000
32.987000 32.357000 0.587000 25.630000 25.031000 0.555000
33.667000 33.057000 0.560000 26.277000 25.200000 0.513000
34.289000 33.719000 0.528000 26.138000 25.576000 0.521000
34.436000 33.869000 0.525000 26.333000 25.690000 0.542000
35.022000 34.425000 0.553000 26.304000 25.744000 0.516000
34.800000 34.226000 0.531000 26.287000 25.705000 0.539000
34.948000 34.354000 0.550000 26.220000 25.628000 0.547000
10.334000 8.233000 2.033000 7.863000 6.779000 1.016000
After http://gregor-wagner.com/tmp/mem
opens all 150+ tabs
422.587000 329.048000 90.807000 390.314000 300.842000 87.583000
406.061000 323.988000 80.499000 380.170000 297.208000 81.493000
404.894000 323.755000 79.629000 380.071000 297.395000 81.230000
During V8 with all 150+ tabs still
openned
418.272000 326.218000 89.835000 391.836000 300.919000 88.592000
402.318000 323.612000 77.362000 381.030000 299.589000 80.095000
402.594000 324.181000 77.073000 380.389000 299.487000 79.555000
402.212000 323.569000 77.302000 380.374000 299.660000 79.367000
55.005000 42.561000 11.376000 46.795000 34.127000 11.583000
54.616000 42.247000 11.046000 46.906000 34.554000 11.303000
54.881000 42.745000 11.036000 46.889000 34.759000 11.091000
55.779000 43.685000 11.032000 47.499000 35.273000 11.100000
56.463000 44.186000 11.132000 47.860000 35.694000 11.121000
56.296000 44.243000 10.993000 47.989000 35.824000 11.120000
56.917000 44.766000 11.031000 47.848000 35.682000 11.110000
56.506000 44.420000 10.994000 48.553000 36.379000 11.133000
So V8 shows 25%-30% reduction in the GC time when only V8 tab is opened and 20% speed up with 150+ other tabs.
The global GC with 150+ opened shows consistent 5% faster timing.
This is with 32-bit browser build on 3.0 GHz Linux bux with GCC 4.6.
Attachment #580039 -
Attachment is obsolete: true
Attachment #580039 -
Flags: review?(bhackett1024)
Attachment #580041 -
Flags: review?(bhackett1024)
Comment 4•13 years ago
|
||
Comment on attachment 580041 [details] [diff] [review]
part2 - common mark stack and push/pop optimizations
This looks great, but I don't think I know enough about the mark stack internals to review this properly.
Attachment #580041 -
Flags: review?(bhackett1024) → review?(wmccloskey)
Comment 5•13 years ago
|
||
Also, does this apply on top of part 1? That patch was marked as obsolete.
Assignee | ||
Updated•13 years ago
|
Attachment #580039 -
Attachment is obsolete: false
Attachment #580039 -
Flags: review?(wmccloskey)
Assignee | ||
Comment 6•13 years ago
|
||
(In reply to Brian Hackett (:bhackett) from comment #5)
> Also, does this apply on top of part 1? That patch was marked as obsolete.
Yes, the patch consists of two parts - the obsolete mark was accidental.
Assignee | ||
Comment 7•13 years ago
|
||
I can also 25% win in GC timing during V-splay on Atom N270 netbook:
152.306000 151.811000 0.320000 122.330000 121.841000 0.327000
156.669000 154.589000 0.222000 126.423000 124.271000 0.228000
Splay: 1765 Splay: 1728
Assignee | ||
Comment 8•13 years ago
|
||
Another reason for the wins the patch provides for the GC under V8-splay is that it scans unmarked objects stored in the slots array from smallest slot to highest slot. Without the patch the code first pushes objects to the stack in that order resulting in scanning in the reverse order.
Apparently the object graph from v8-splay benchmark is rather sensitive to the traversal order. If I change the scanning loop in processMarkStackTop from the patch to go from end to start, then the 25% win in V8 shrinks down to 10%. On he other hand http://29a.ch/2010/6/2/realtime-raytracing-in-javascript is not sensitive to that order. With or without that loop directin change the patch shows 10% win.
Comment on attachment 580039 [details] [diff] [review]
part 1 - merging string and object mark stacks
Review of attachment 580039 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jsgcmark.cpp
@@ +768,5 @@
>
> static void
> ScanRope(GCMarker *gcmarker, JSRope *rope)
> {
> + unsigned savedTos = gcmarker->objStack.tos;
Could you add a comment to the top of the function giving a brief description of what it's doing? Make it clear that, for this function to be correct, marking a string should never cause an object to be marked.
Also, it might be nice to add an accessor function to MarkStack that returns the top of the stack, but that's up to you.
@@ +770,5 @@
> ScanRope(GCMarker *gcmarker, JSRope *rope)
> {
> + unsigned savedTos = gcmarker->objStack.tos;
> + for (;;) {
> + JS_ASSERT(rope->JSString::isRope());
Before this, can you assert that rope is actually a JSTRACE_STRING (via GetGCThingTraceKind, or whatever it's called)?
@@ +805,5 @@
> + rope = static_cast<JSRope *>(gcmarker->objStack.pop());
> + } else {
> + break;
> + }
> + }
How about asserting here (at the end of ScanRope) that gcmarker->objStack.tos == savedTos?
Attachment #580039 -
Flags: review?(wmccloskey) → review+
Comment on attachment 580041 [details] [diff] [review]
part2 - common mark stack and push/pop optimizations
Review of attachment 580041 [details] [diff] [review]:
-----------------------------------------------------------------
Nice work. I hope some day compilers will handle tail recursion better, because processMarkStackTop is not very pretty.
::: js/src/jsgc.h
@@ +1644,5 @@
> + static const uintptr_t StackTagMask = 7;
> +
> + static void staticAsserts() {
> + JS_STATIC_ASSERT(StackTagMask >= uintptr_t(LastTag));
> + JS_STATIC_ASSERT(StackTagMask + 1 == sizeof(Value));
What's the second assert for?
::: js/src/jsgcmark.cpp
@@ +734,5 @@
> + * Make sure that ScanBaseShape is not recursive so its inlining
> + * is possible.
> + */
> + UnownedBaseShape *unowned = base->baseUnowned();
> + JS_OPT_ASSERT(base->compartment() == unowned->compartment());
We're no longer using JS_OPT_ASSERT in jsgcmark.cpp. If you don't want to use the usual compartment assert (since the one you have here is stronger), then could you add a macro like JS_GC_COMPARTMENT_ASSERT that can expand to either JS_ASSERT or JS_OPT_ASSERT? Then we could very easily change whether to enable these assertions in nightlies.
@@ +791,5 @@
> /*
> * When both children are ropes, set aside the right one to
> * scan it later.
> */
> + if (next && !gcmarker->stack.push(reinterpret_cast<uintptr_t>(next)))
Yikes, untagged! Maybe say something about that in the comment at the top of ScanRope (i.e., that everything above savedTos is known to be a rope, so the tag is unnecessary).
@@ +831,5 @@
> ScanString(gcmarker, str);
> }
>
> +static JS_NEVER_INLINE void
> +DelayScanningForArray(GCMarker *gcmarker, HeapValue *begin, HeapValue *end)
Could you call this DelayMarkingValueArray?
@@ +846,5 @@
> + color = gcmarker->getMarkColor();
> + } else {
> + continue;
> + }
> + if (cell->markIfUnmarked(color))
I don't understand why you always color strings black. The code is simpler if you always call getMarkColor(). Is this an optimization to avoid calling getMarkColor? It doesn't seem worth it here, since we're already in delayed marking.
@@ +854,3 @@
>
> +static inline void
> +ScanValueArray(GCMarker *gcmarker, HeapValue *array, size_t size)
I think PushValueArray would be a better name for this.
@@ +1067,5 @@
> } /* namespace gc */
>
> +inline void
> +GCMarker::processMarkStackTop()
> +{
I think this function would be easier to read if you moved scan_value_array and scan_obj to the end. They both end in return statements, so then we could think of "goto scan_obj" as sort of like a function call. And the structure of the function would be clearer, since it would look like this:
if (tag == ...) ... goto scan_value_array
else if (tag == ...) ... goto scan_obj
else if (tag == ...) ...
scan_obj:
...
return;
scan_value_array:
...
return;
Also, it would be nice to put a comment at the top of the function to say why it's written this way.
@@ +1074,5 @@
> + uintptr_t tag = addr & StackTagMask;
> + uintptr_t payload;
> + HeapValue *vp, *end;
> + JSObject *obj;
> + if (tag == ValueArrayTag) {
Extra line between the variables and the code, please.
@@ +1087,5 @@
> + const Value &v = *vp++;
> + if (v.isString()) {
> + JSString *str = v.toString();
> + if (str->markIfUnmarked())
> + ScanString(this, str);
Could you just call PushMarkStack(str) here? I would think it would get inlined.
Also, could you add a comment in PushMarkStack that you can mark strings black because the cycle collector never needs to see them.
@@ +1109,5 @@
> + unsigned nslots = obj->slotSpan();
> + vp = obj->fixedSlots();
> + if (obj->slots) {
> + unsigned nfixed = obj->numFixedSlots();
> + if (nslots > nfixed) {
Do we ever have dynamic slots and have nslots <= nfixed?
Attachment #580041 -
Flags: review?(wmccloskey) → review+
Actually, on second thought Igor, could you make sure that strings are always marked gray if the GCMarker color is gray? I don't think it actually make the cycle collector run any faster if they're marked black. And I'm worried that someone might copy-and-paste the string marking code somewhere else where we actually want to mark gray. People tend to copy-and-paste a lot in jsgcmark.cpp. (Which suggests that there's probably too much redundancy in the file, but that's for another bug.)
Assignee | ||
Comment 12•13 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #11)
> Actually, on second thought Igor, could you make sure that strings are
> always marked gray if the GCMarker color is gray?
The patch does not change that always-color-strings-in-black code. It is leftover of the 2-word strings era when we did not have spare bits in the marking bitmap for the color. If we want to change that, lets discuss it in another bug.
Assignee | ||
Comment 13•13 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #10)
> Do we ever have dynamic slots and have nslots <= nfixed?
The try server told me so ;)
Comment 14•13 years ago
|
||
(In reply to Igor Bukanov from comment #13)
> (In reply to Bill McCloskey (:billm) from comment #10)
> > Do we ever have dynamic slots and have nslots <= nfixed?
nfixed is the inline capacity which predicts eventual object size (via type information), so it makes sense that you could get objects not using their full inline capacity.
Assignee | ||
Comment 15•13 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/f9eee5b06c47
https://hg.mozilla.org/mozilla-central/rev/3f0c8604e2c1
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•