Last Comment Bug 616666 - switch to an explicit mark stack
: switch to an explicit mark stack
Status: RESOLVED FIXED
fixed-in-tracemonkey
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: P2 normal with 1 vote (vote)
: ---
Assigned To: Bill McCloskey (:billm)
:
Mentors:
Depends on: 654196
Blocks: 638660 IncrementalGC
  Show dependency treegraph
 
Reported: 2010-12-04 00:30 PST by Andreas Gal :gal
Modified: 2011-05-03 11:17 PDT (History)
14 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (67.20 KB, patch)
2011-03-29 11:45 PDT, Bill McCloskey (:billm)
no flags Details | Diff | Review
rebased (68.80 KB, patch)
2011-04-15 17:32 PDT, Bill McCloskey (:billm)
gal: review+
Details | Diff | Review

Description Andreas Gal :gal 2010-12-04 00:30:04 PST
We should add an explicit mark work lists, or preferably 3 (string/object/xml). Instead of recurring to deal with children, push the thing onto the appropriate list and then pull from the list. The mark bit should be set the moment an object goes onto the worklist so we don't accidentally add it again. This should help locality, since we will be able to process string/object/xml in groups, improving our chances of cache hits, in particular now that we have less false sharing with compartments. I will try to hack this up with gregor during work week if we have time. Could be a nice speed boost.
Comment 1 Gregor Wagner [:gwagner] 2011-03-22 10:56:38 PDT
Andreas tried it, I tried it but both of us came to the same conclusion that there is no real performance win. You still need some "overflow" complexity and you can't get completely rid of the delayed marking scheme. 
My conclusion was that only a "threadsafe" marking stack would be a win where N threads can be used for marking. 
The performance numbers from bug 638660 show a huge speedup.
So if you redesign the marking architecture it would be nice to make it easy for multiple threads to work on it.
Comment 2 Bill McCloskey (:billm) 2011-03-22 11:43:24 PDT
Well, we need this to do incremental GC.

The marking architecture will change. I think it will make parallel marking easier, although I haven't looked too closely at that patch yet.
Comment 3 Bill McCloskey (:billm) 2011-03-29 11:45:06 PDT
Created attachment 522748 [details] [diff] [review]
patch

This patch implements an explicit mark stack. It also refactors and moves a lot of the marking code. I mainly want to do this in preparation for incremental marking, although there is also a small speedup (5-10%).

The main templated function Mark(), instead of calling TypedMarker, now works differently:
- If it's an object, it's pushed onto the object mark stack.
- XML objects go on a separate mark stack.
- Strings and shapes are marked immediately.

Then there's a function, drainMarkStack, that processes each item in the object and XML stack. In addition, if we encounter any objects with > 2048 slots (presumably arrays), they get pushed onto a special large stack. Objects in this stack have their slots pushed onto the stack in bunches, rather than all at once. Otherwise we would use up all the stack and switch to delayed marking.

These are the hot functions:
- drainMarkStack, and the functions it calls...
- ScanObject takes an object and pushes all its children onto the mark stack
- ScanShape does the same for shapes

For the cycle collector and JS_DumpHeap, I have retained a few of the old MarkChildren function. These continue to go to the trouble of naming the different fields and providing all the tracing debug info that the GC doesn't need. Although there is some code duplication between MarkChildren and ScanObject/ScanShape, I think this makes sense. The Scan functions should do whatever works best for the GC, in terms of traversal order. We may want to change the MarkChildren functions in the future to be more useful to the cycle collector (say, by skipping strings all together).

I've tried to move to a new terminology. A "mark" function sets the mark bit and schedules the object for traversal. A "scan" function actually traverses its outgoing pointers. I'd like to purge the word "trace" all together, since we also use it for the tracing JIT. At most, it should be used for the trace hook. I've retained the MarkChildren terminology only for the JSAPI users.

I've moved all the marking code to jsgcmark.cpp because it no longer needs to be inlined. I benchmarked the code before and after the move, and there was no difference from making it not be inline. I like the fact that it's now consolidated in a single file (aside from strings, which are still in jsstr.cpp).

Performance gets a little better. I measured v8-splay as well as a JS raytracer where our GC performance is terrible. On splay, marking time goes from ~50ms to ~45ms. On the raytracer, it goes from ~125ms to ~116ms. Again, though, I think that the main benefit of the patch is that it prepares us for incremental GC.
Comment 4 Gregor Wagner [:gwagner] 2011-03-29 13:02:43 PDT
Very Nice!

One question that comes up (maybe I didn't look close enough):
You always allocate the markstack when entering the GC. What about situations where we trigger a GC because we are out of memory?
Comment 5 Bill McCloskey (:billm) 2011-03-29 14:04:19 PDT
(In reply to comment #4)
> One question that comes up (maybe I didn't look close enough):
> You always allocate the markstack when entering the GC. What about situations
> where we trigger a GC because we are out of memory?

Right now if we can't allocate a mark stack, the limit field of the stack is set to 0, so we do delayed marking of everything. Maybe it would be better to try to allocate the stacks when the runtime is created, since we're less likely to OOM there. In any case, there is a fallback if we can't make a stack.
Comment 6 Nicholas Nethercote [:njn] 2011-03-30 03:18:33 PDT
(In reply to comment #3)
>
> On splay, marking time goes from ~50ms to ~45ms. On the raytracer, it goes
> from ~125ms to ~116ms.

Nice!
Comment 7 Gregor Wagner [:gwagner] 2011-03-30 17:49:43 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > One question that comes up (maybe I didn't look close enough):
> > You always allocate the markstack when entering the GC. What about situations
> > where we trigger a GC because we are out of memory?
> 
> Right now if we can't allocate a mark stack, the limit field of the stack is
> set to 0, so we do delayed marking of everything. Maybe it would be better to
> try to allocate the stacks when the runtime is created, since we're less likely
> to OOM there. In any case, there is a fallback if we can't make a stack.

Yeah allocating it once sounds good. The background-free also allocates memory. We could try to use the same memory since their usage doesn't interfere.
Comment 8 Bill McCloskey (:billm) 2011-04-15 17:32:12 PDT
Created attachment 526433 [details] [diff] [review]
rebased

Please please please review this. It's very straightforward. There are a lot of GC changes coming down the pipe and I don't want it to rot. All the work I'm doing is based on top of it.
Comment 9 Andreas Gal :gal 2011-04-15 17:41:58 PDT
Comment on attachment 526433 [details] [diff] [review]
rebased

>
> 
>-    /* Remove dead wrappers from the compartment map. */
>     for (JSCompartment **c = rt->compartments.begin(); c != rt->compartments.end(); ++c)
>         (*c)->sweep(cx, releaseInterval);
> }
> 

Why is this comment being removed?
>+static void
>+ScanShape(GCMarker *gcmarker, const Shape *shape)
>+{
>+restart:

>+        if (isMarkStackEmpty()) {
>+            /*
>+             * Mark children of things that caused too deep recursion during the above
>+             * tracing. Don't do this until we're done with everything else.
>+             */

I think Brendan wants a newline here.

>+            markDelayedChildren();
>+        }
>+    }
>+}
>+

Did you shark this? Please do with some microbenchmarks to see if anything has to be inlined differently.

Great patch. Very sorry for the delay.
Comment 10 Bill McCloskey (:billm) 2011-04-18 14:13:42 PDT
(In reply to comment #9)
> >-    /* Remove dead wrappers from the compartment map. */
> 
> Why is this comment being removed?

Well, JSCompartment::sweep does a lot more than remove dead wrappers now. It just seemed confusing to me.

> >+        if (isMarkStackEmpty()) {
> >+            /*
> >+             * Mark children of things that caused too deep recursion during the above
> >+             * tracing. Don't do this until we're done with everything else.
> >+             */
> 
> I think Brendan wants a newline here.

You mean after the curly brace? I thought we didn't use a newline if the comment comes after the start of a new scope. Brendan?

> Did you shark this? Please do with some microbenchmarks to see if anything has
> to be inlined differently.

I just did a marking microbenchmark. Previously, all the marking activity was in js::gc::MarkChildren and js::gc::MarkKind. Now it's all focused in js::GCMarker::drainMarkStack, which is where it should be.
Comment 11 Andreas Gal :gal 2011-04-18 14:57:57 PDT
(In reply to comment #10)
> (In reply to comment #9)
> > >-    /* Remove dead wrappers from the compartment map. */
> > 
> > Why is this comment being removed?
> 
> Well, JSCompartment::sweep does a lot more than remove dead wrappers now. It
> just seemed confusing to me.

The comment is above the line that does what the comment says. I think its useful but you can expand it.

> 
> > >+        if (isMarkStackEmpty()) {
> > >+            /*
> > >+             * Mark children of things that caused too deep recursion during the above
> > >+             * tracing. Don't do this until we're done with everything else.
> > >+             */
> > 
> > I think Brendan wants a newline here.
> 
> You mean after the curly brace? I thought we didn't use a newline if the
> comment comes after the start of a new scope. Brendan?

Not after the brace. After the comment.

> 
> > Did you shark this? Please do with some microbenchmarks to see if anything has
> > to be inlined differently.
> 
> I just did a marking microbenchmark. Previously, all the marking activity was
> in js::gc::MarkChildren and js::gc::MarkKind. Now it's all focused in
> js::GCMarker::drainMarkStack, which is where it should be.

Awesome. Thanks!
Comment 12 Bill McCloskey (:billm) 2011-04-19 12:32:21 PDT
http://hg.mozilla.org/tracemonkey/rev/3e5aaea1ccf8
Comment 13 Chris Leary [:cdleary] (not checking bugmail) 2011-04-26 15:10:14 PDT
http://hg.mozilla.org/mozilla-central/rev/3e5aaea1ccf8
Comment 14 :Ms2ger 2011-04-27 08:09:05 PDT
Comment on attachment 526433 [details] [diff] [review]
rebased

>@@ -1513,47 +1515,31 @@ js_DestroyCachedScript(JSContext *cx, JS
> void
> js_TraceScript(JSTracer *trc, JSScript *script)
> {
>     JSAtomMap *map = &script->atomMap;
>     MarkAtomRange(trc, map->length, map->vector, "atomMap");
> 
>     if (JSScript::isValidOffset(script->objectsOffset)) {
>         JSObjectArray *objarray = script->objects();
>-        uintN i = objarray->length;
>-        do {
>-            --i;
>-            if (objarray->vector[i]) {
>-                JS_SET_TRACING_INDEX(trc, "objects", i);
>-                Mark(trc, objarray->vector[i]);
>-            }
>-        } while (i != 0);
>+        MarkObjectRange(trc, objarray->length, objarray->vector, "objects");
>     }
> 
>     if (JSScript::isValidOffset(script->regexpsOffset)) {
>         JSObjectArray *objarray = script->regexps();
>-        uintN i = objarray->length;
>-        do {
>-            --i;
>-            if (objarray->vector[i]) {
>-                JS_SET_TRACING_INDEX(trc, "regexps", i);
>-                Mark(trc, objarray->vector[i]);
>-            }
>-        } while (i != 0);
>+        MarkObjectRange(trc, objarray->length, objarray->vector, "objects");
>     }

s/"objects"/"regexps"/

Note You need to log in before you can comment on or make changes to this bug.