Closed Bug 658041 Opened 9 years ago Closed 9 years ago

Stack based marking for JSRopes

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: gwagner, Assigned: gwagner)

References

(Blocks 1 open bug)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 8 obsolete files)

For parallel marking we have to change the way rope-strings are marked.

The idea is to use a stack-based marking with some sort of delayed marking for over-recursion.
Blocks: 638660
One instance of Schorr-Waite is enough :)
Attached patch patch (obsolete) — Splinter Review
First simple version.
We can avoid the over recursion check if we do it mark-stack based.
The time for marking 5000000 ropes increases from 2.7msec to 2.9msec on my machine.
Assignee: general → anygregor
Ok this patch is wrong. It's not that easy :)
Attached patch WiP (obsolete) — Splinter Review
Reintroducing the stack limits for rope marking.
This patch is just for regression testing and still needs some correctness work. 

I haven't figured out a smart way to set the stackLimit for the gcHelperThread. Igor any ideas about that?
Attachment #534098 - Attachment is obsolete: true
(In reply to comment #4)
> I haven't figured out a smart way to set the stackLimit for the
> gcHelperThread. Igor any ideas about that?

I do not see why we need to worry about the stack limit. With the tail recursion eliminated when marking a rope the stack depth cannot exceed log2(max_string_count) so at worst we would have (32 or 64)/log2(string-size) frames. Also, handling the stack explicitly should not add too much complexity AFICS.
(In reply to comment #5)
> I do not see why we need to worry about the stack limit. With the tail
> recursion eliminated when marking a rope the stack depth cannot exceed
> log2(max_string_count) so at worst we would have (32 or
> 64)/log2(string-size) frames. Also, handling the stack explicitly should not
> add too much complexity AFICS.

This is wrong - the rope tree is not balanced and can have arbitrary depth. 
So back to the original question about the helper thread. When the thread is initialized we can calculate the limit based on the address of a variable in the first native frame of the thread.

But it looks like navigating the tree with an explicit stack can be implemented in a quite straightforward way. It could even less source lines than the native stack management.
testBug614653.js is a good testcase that blows away the stack if we don't check for overflow.
Igor do you mean an explicit stack for each GCMarker or for each Rope? So something similar to the objStack?
(In reply to comment #7)
> testBug614653.js is a good testcase that blows away the stack if we don't
> check for overflow.

What makes the ropes special compared with objects so that using native stack is better for them rather then explicit stacks. It does not look intuitive to have so different approaches.

> Igor do you mean an explicit stack for each GCMarker or for each Rope? So
> something similar to the objStack?

(In reply to comment #7)
> Igor do you mean an explicit stack for each GCMarker or for each Rope? So
> something similar to the objStack?

I was thinking about fixed-size buffer per GCMarker used as an explicit stack when marking a rope. It could be similar to objStack, but since rope has at most two children it can be coded differently.
Attached patch patch (obsolete) — Splinter Review
Explicit mark stack for ropes.
Maybe I am missing something but I get a speedup from 622ms to 557ms for the corrected version of testBug614653.js in the shell.
Attachment #534583 - Attachment is obsolete: true
Comment on attachment 534874 [details] [diff] [review]
patch

> static inline void
> PushMarkStack(GCMarker *gcmarker, JSString *str)
> {
>     JS_ASSERT_IF(gcmarker->context->runtime->gcCurrentCompartment,
>                  str->compartment() == gcmarker->context->runtime->gcCurrentCompartment
>                  || str->compartment() == gcmarker->context->runtime->atomsCompartment);
> 
>-    str->mark(gcmarker);
>+    if (str->isLinear()) {
>+        str->mark(gcmarker);
>+    } else {
>+        JSRope *rope = &str->asRope();
>+        if (!str->markIfUnmarked()) {
>+            gcmarker->pushString(rope->leftChild());
>+            gcmarker->pushString(rope->rightChild());

A tail recursion can be eliminated here at least for one of the children string to avoid going to the stack and back.

>@@ -719,16 +727,19 @@ MarkChildren(JSTracer *trc, JSXML *xml)
> 
> void
> GCMarker::drainMarkStack()
> {
>     while (!isMarkStackEmpty()) {
>         while (!objStack.isEmpty())
>             ScanObject(this, objStack.pop());
> 
>+        while (!ropeStack.isEmpty())
>+            PushMarkStack(this, ropeStack.pop());
>+

As objects may push strings to the stack while strings can only push strings, it is better to drain ropes first to minimize their stack usage.

Could you check cache misses with the patch? I suppose it can explain the speedup. Mutating the string in place to implement O(1) traversal implies writing  a word twice per string. The patch replaces that with coping the word into the stack. Our strings takes 4 words so that minimizes the write bandwidth to the cache and memory by factor of 4.
Attached patch patch (obsolete) — Splinter Review
new version with depth-first-seach like traversal.

I don't see any real performance difference to the other approach but this might be different on smaller devices.
Attachment #534874 - Attachment is obsolete: true
Hm I guess now we can end up in an endless loop where we start delayed marking with the first string we wanna mark.
Comment on attachment 534905 [details] [diff] [review]
patch

> static inline void
> PushMarkStack(GCMarker *gcmarker, JSString *str)
> {
>     JS_ASSERT_IF(gcmarker->context->runtime->gcCurrentCompartment,
>                  str->compartment() == gcmarker->context->runtime->gcCurrentCompartment
>                  || str->compartment() == gcmarker->context->runtime->atomsCompartment);
> 
>-    str->mark(gcmarker);
>+    if (str->isLinear()) {
>+        str->mark(gcmarker);
>+    } else {
>+        if (!str->markIfUnmarked()) {
>+            JSRope *rope = &str->asRope();
>+            gcmarker->pushString(rope->rightChild());
>+            JSString *leftStr = rope->leftChild();
>+            while (leftStr->isRope()) {
>+                rope = &leftStr->asRope();
>+                gcmarker->pushString(rope->rightChild());
>+                leftStr = rope->leftChild();
>+            }

Shouldn't this include some markIfUnmarked() calls for leftStr? Also this can be written in a simpler form like:

while (!str->isLinear()) {
    if (!str->markIfUnmarked())
        return;
    JSRope *rope = &str->asRope();
    gcmarker->pushString(rope->rightChild());
    str = rope->leftChild();
}
str->mark(trx);
(In reply to comment #13)
> Shouldn't this include some markIfUnmarked() calls for leftStr? Also this
> can be written in a simpler form like:
> 
> while (!str->isLinear()) {
>     if (!str->markIfUnmarked())
>         return;
>     JSRope *rope = &str->asRope();
>     gcmarker->pushString(rope->rightChild());
>     str = rope->leftChild();
> }
> str->mark(trx);


This is wrong as gcmarker->pushSomething should be called only for already marked GC thing. So that code becomes:

while (!str->isLinear()) {
    if (!str->markIfUnmarked())
        return;
    JSRope *rope = &str->asRope();
    JSString *right = rope->rightChild();
    if (right->markIfUnmarked()) {
        if (right->isLinear())
            right->mark(gcmarker);
        else
            gcmarker->pushString(right);
    }
    str = rope->leftChild();
}
str->mark(gcmarker);
(In reply to comment #14)
> (In reply to comment #13)
> while (!str->isLinear()) {
>     if (!str->markIfUnmarked())
>         return;
>     JSRope *rope = &str->asRope();
>     JSString *right = rope->rightChild();
>     if (right->markIfUnmarked()) {
>         if (right->isLinear())
>             right->mark(gcmarker);
>         else
>             gcmarker->pushString(right);
>     }
>     str = rope->leftChild();
> }
> str->mark(gcmarker);

Thx for looking at it.
We are getting closer but still not there. "right" might be a static string so we need another isStatic check. I will change it.
Attached patch patch (obsolete) — Splinter Review
More complicated than I expected...
So the problem is that we have to be very careful when we can mark the string and we have to consider that the Rope is already marked once it comes from the ropeStack. I created a ScanString function to handle all strings from the ropestack.
Attachment #534905 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Now with correct isMarkStackEmpty handling.
Attachment #534988 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Finally, try-server seems to like it.
Attachment #535065 - Attachment is obsolete: true
Attachment #535119 - Flags: review?(igor)
Comment on attachment 535119 [details] [diff] [review]
patch

Review of attachment 535119 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with this fixed.

::: js/src/jscntxt.h
@@ +426,5 @@
>      JSObject           *gcWeakMapList;
>  
>      /* Pre-allocated space for the GC mark stacks. Pointer type ensures alignment. */
>      void                *gcMarkStackObjs[js::OBJECT_MARK_STACK_SIZE / sizeof(void *)];
> +    void                *gcMarkStackRopes[js::ROPES_MARK_STACK_SIZE / sizeof(void *)];

We should change the constants into NAME_MARK_STACK_LENGTH to avoid multiplication by the size at the definition point just to divide by the size here.

::: js/src/jsgc.h
@@ +1236,5 @@
>      void markDelayedChildren();
>  
>      bool isMarkStackEmpty() {
> +        return objStack.isEmpty() && ropeStack.isEmpty() &&
> +               xmlStack.isEmpty() && largeStack.isEmpty();

Nit: use one isEmpty() per line if all do not fit one line.

@@ +1247,5 @@
>              delayMarkingChildren(obj);
>      }
>  
> +    void pushString(JSString *str) {
> +        JS_ASSERT(str->isRope());

Rename this into pushRope, take JSRope as a parameter and remove the assert.

::: js/src/jsgcmark.cpp
@@ +542,5 @@
> +       
> +       if (rightChild->isLinear()) {
> +           rightChild->mark(gcmarker);
> +       } else {
> +           if (rightChild->isRope() && rightChild->markIfUnmarked())

The current code in JSString::mark assumes that a non-linear string means rope. So either the code there has a bug (and the comments in jsstr.h are wrong) or we can drop the rightChild->isRope() check here. In the latter case we should replace !str->isLinear() with str->isRope() in the loop condition.

@@ +551,5 @@
>      str->mark(gcmarker);
>  }
>  
> +static inline void
> +ScanString(GCMarker *gcmarker, JSString *str)

Replace that with ScanRope and take JSRope * as a parameter.

@@ +567,5 @@
> +        if (rightChild->isLinear()) {
> +            rightChild->mark(gcmarker);
> +        } else {
> +            if (rightChild->isRope() && rightChild->markIfUnmarked())
> +                gcmarker->pushString(rightChild);

The same comment as in PushString. Also couldn't we just replace the PushString body with:

if (str->isLinear()) {
    str->asLinear().mark(gcmarker);
} else {
    JS_ASSERT(str->isRope());
    if (str->markIfUnmarked())
        ScanRope(gcmarker, str->asRope());
}

? If not, comment why this is not done.

::: js/src/jsstr.cpp
@@ +121,5 @@
>  
>  void
>  JSString::mark(JSTracer *trc)
>  {
> +    JS_ASSERT(isLinear());

Remove JSString::mark alltogether and just call asLinear().mark() instead.

@@ -193,5 @@
> -        parent->d.s.u2.right = str;
> -        str = parent;
> -        parent = nextParent;
> -        goto finish_node;
> -    }

It is interesting that this is the second time Deutsch-Schorr-Waite is removed in SM. The first time was when the delayed marking replaced it for object marking. The reason was that it was not possible to generalize to non-standard objects. Now the patch does that for ropes to make marking thread-safe. What makes DSW so bad I wonder?..
Attachment #535119 - Flags: review?(igor) → review+
(In reply to comment #19)
> Comment on attachment 535119 [details] [diff] [review] [review]
> patch
> 
> Review of attachment 535119 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> r+ with this fixed.
> 
> ::: js/src/jscntxt.h
> @@ +426,5 @@
> >      JSObject           *gcWeakMapList;
> >  
> >      /* Pre-allocated space for the GC mark stacks. Pointer type ensures alignment. */
> >      void                *gcMarkStackObjs[js::OBJECT_MARK_STACK_SIZE / sizeof(void *)];
> > +    void                *gcMarkStackRopes[js::ROPES_MARK_STACK_SIZE / sizeof(void *)];
> 
> We should change the constants into NAME_MARK_STACK_LENGTH to avoid
> multiplication by the size at the definition point just to divide by the
> size here.

I already changed this in my parallel-marking patch. 

> 
> ::: js/src/jsgcmark.cpp
> @@ +542,5 @@
> > +       
> > +       if (rightChild->isLinear()) {
> > +           rightChild->mark(gcmarker);
> > +       } else {
> > +           if (rightChild->isRope() && rightChild->markIfUnmarked())
> 
> The current code in JSString::mark assumes that a non-linear string means
> rope. So either the code there has a bug (and the comments in jsstr.h are
> wrong) or we can drop the rightChild->isRope() check here. In the latter
> case we should replace !str->isLinear() with str->isRope() in the loop
> condition.

I got confused with static atoms and there we can't call markIfUnmarked() on them. But it works out now with the new control flow.

> 
> The same comment as in PushString. Also couldn't we just replace the
> PushString body with:
> 
> if (str->isLinear()) {
>     str->asLinear().mark(gcmarker);
> } else {
>     JS_ASSERT(str->isRope());
>     if (str->markIfUnmarked())
>         ScanRope(gcmarker, str->asRope());
> }
> 
> ? If not, comment why this is not done.

Yeah that's better! Great!

> 
> It is interesting that this is the second time Deutsch-Schorr-Waite is
> removed in SM. The first time was when the delayed marking replaced it for
> object marking. The reason was that it was not possible to generalize to
> non-standard objects. Now the patch does that for ropes to make marking
> thread-safe. What makes DSW so bad I wonder?..

Most of the papers that report results for the DSW are fairly old. They measure the GC time in minutes... I guess it's time for academia to revisit old algorithms and re-measure the impact with modern architectures.

BTW: This patch is a 50% marking time speedup for ropes! We reduce the marking time from 5.5ms to around 2.5ms for each GC in testBug614653.js
Attached patch patch (obsolete) — Splinter Review
Attachment #535119 - Attachment is obsolete: true
Attached patch patchSplinter Review
fix typo
Attachment #535222 - Attachment is obsolete: true
Sweet!  I wish I had known that there was all this existing delayed-marking machinery when I dag-ified ropes :)
Comment on attachment 535237 [details] [diff] [review]
patch

re-applying Igors r+
Attachment #535237 - Flags: review+
http://hg.mozilla.org/tracemonkey/rev/e0ca895fde7e
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/e0ca895fde7e
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.