Closed Bug 707051 Opened 13 years ago Closed 13 years ago

Failure to mark some shapes during delayed marking

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla11
Tracking Status
firefox10 --- fixed
firefox11 --- fixed
status1.9.2 --- unaffected

People

(Reporter: billm, Assigned: billm)

Details

(Whiteboard: [sg:critical][qa-])

Attachments

(1 file, 1 obsolete file)

Attached patch fix (obsolete) — Splinter Review
When we do delayed marking, we call JS_TraceChildren on each marked object in the arena. It calls MarkChildren. When given a shape, MarkChildren fails to set the marked bit for its parent shape--it just walks up the shape tree. I stumbled upon this while doing incremental GC, but it suggests that we need to do better testing of delayed marking--probably via bug 673551.

In the patch, I had to change the return type of previous() so that the write barrier checks don't complain.
Attachment #578462 - Flags: review?(igor)
Comment on attachment 578462 [details] [diff] [review]
fix

Review of attachment 578462 [details] [diff] [review]:
-----------------------------------------------------------------

Comment in ScanShape before shape = shape->previous(); why the loop in that function is essential to stop unbounded recursion.
Attachment #578462 - Flags: review?(igor) → review+
Could this have been the cause of some of our GC topcrashes?
Comment on attachment 578462 [details] [diff] [review]
fix

I think I broke this in Firefox 5. So I don't think it's causing the topcrashes that we examined a long time ago, but it may be causing some current topcrashes.

It would be nice to take this patch, since it seems pretty exploitable. I'm a bit worried about regression risks, though. This is the same path that the cycle collector takes. So we were probably missing a bunch of shapes from the CC graph, and now we won't be. Andrew, do you know if that's likely to be a performance problem? We often have a lot of shapes.
Attachment #578462 - Flags: approval-mozilla-beta?
Attachment #578462 - Flags: approval-mozilla-aurora?
Assignee: general → wmccloskey
Shapes are not added to the CC graph.  See AddToCCKind.  Instead of adding the graph, the CC traces through the children to find non-shape nodes.  Potentially this could cause the CC to blow the stack, but if this only came into play in 5 maybe it is okay.  It could also cause slowdown.

I don't know much about shapes.  Do they point from parent to child as well as from child to parent?  Are there likely to be a lot of shapes that are reachable from gray nodes but not black ones?
Shapes point to parent (previous) and child or children, but the latter linkage should not matter here.

/be
Well, my main worry is that the combination of two creates doubly-linked trees, which tend to embiggen the CC graph, in cases like the DOM (with the addition of strong parent pointer) and the JS global.

If you want to measure the effect on the graph, bill, you can either count how many shapes are being marked gray, or, more directly, go into NoteJSRoot in nsXPConnect.cpp and add something like

if (kind == JSTRACE_SHAPE)
  myNewGlobalCounter++;

in the else if (kind != JSTRACE_STRING) case.  Initialize myNewGlobalCounter in BeginCycleCollection and print it out in EndCycleCollection in that file.
Shapes can participate in cycles only if they are for accessor (getter/setter) properties or methods (function-valued properties memoized/shared via shapes). These are relatively uncommon across all object populations, more common among prototype objects. Can you take advantage of this sparseness? You shouldn't have to reify peer CC data structures for all shapes, just the ones that might be suspects.

/be
No shapes are reified, they are just recursively scanned for non-shapes.  Only non-shapes are added to the graph.  But yes, it sounds like we could avoid even doing that a lot of the time.  The cycle collector does not take advantage of any preknowledge about the shape of the JS heap, except that strings (even compound ones) are never cyclic.

Ultimately, I'd like to push the CC's traversal of the GC heap inside the GC heap, if only to avoid the virtual call in XPCOM->XPConnect->JS->XPConnect->XPCOM dance we currently have to do to take single step in the CC (bug 653013).  Once that is done we could be smarter about stuff.

Peterv has some thoughts on shapes in the CC graph in bug 651445, mainly concerns about not adding them.
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0cb9fb30750

It doesn't sound like this will screw anything up. As Brendan said, we don't trace the children of a shape. And since shapes aren't actually put in the CC graph, we only have to worry about shapes further up the parent chain that point to objects via accessors. It seems pretty unlikely that this would have a noticeable performance impact. So I think the regression risk is low. And, as I said, the bug is pretty serious.
Target Milestone: --- → mozilla11
Comment on attachment 578462 [details] [diff] [review]
fix

[Triage Comment]
Please re-nominate once this has landed on m-c.
Attachment #578462 - Flags: approval-mozilla-beta?
Attachment #578462 - Flags: approval-mozilla-beta-
Attachment #578462 - Flags: approval-mozilla-aurora?
Attachment #578462 - Flags: approval-mozilla-aurora-
This causes a stack overflow in the cycle collector. Here is what NoteJSChild in nsXPConnect.cpp looks like:

static void
NoteJSChild(JSTracer *trc, void *thing, JSGCTraceKind kind)
{
    if (AddToCCKind(kind)) {
        ...
    } else if (kind != JSTRACE_STRING) {
        JS_TraceChildren(trc, thing, kind);
    }
}

Since AddToCCKind(JSTRACE_SHAPE) is false, we call JS_TraceChildren. This can blow up the stack when traversing long parent chains.

It looks like the previous version of the code was doing exactly what the cycle collector needed: it was skipping over the parent shape itself, and just tracing through the shape's children. However, this is not what we need for delayed marking (this is the aspect I forgot in our conversation today, Andrew; it's still a serious bug).

I'm thinking of adding a flag to JSTracer to say which behavior is desired. This is pretty ugly, but there aren't a lot of good alternatives. We can detect a stack overflow in NoteJSChild, but what do we do then? Aborting the CC seems like a bad idea, since it would cause us to leak. Andrew and I talked about adding shapes to the CC graph selectively, when we run out of stack space for them. This seems really dangerous, though. There may be other parts of the CC that break if some shapes are in the graph and not others.

Peterv, Igor, Andreas, do you guys have any good ideas? It would be nice if whatever we come up with is safe enough to land in aurora and beta. That's why I like the flag idea.
Target Milestone: mozilla11 → ---
I think the safest thing to do in the short term is to add a flag (or repurpose the one I already added with some kind of enum) that leaves the behavior alone except for the delayed marker.  There's a delicate symbiosis between the current behavior of JS_TraceChildren and the CC that seems to work.  In the long term, of course, having even more ways to trace through JS seems very future-unproof.
We need to eliminate the tail recursion over the shape.parent. A simplest solution for that would be to provide a function like 

Shape *TraceShapeChildren(JSTracer *trc, Shape *shape)

that would not call the tracer on the shape but rather return it so NoteJSChild can have:

    } else if (kind == JSTRACE_SHAPE) {
        while (!!(thing = JS_TraceChildren(trc, (Shape *) thing));
    }
Attached patch fixSplinter Review
I like Igor's idea. This patch doesn't die on tinderbox the way the previous one did.
Attachment #578462 - Attachment is obsolete: true
Attachment #579485 - Flags: review?(igor)
Comment on attachment 579485 [details] [diff] [review]
fix

Review of attachment 579485 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgcmark.cpp
@@ +983,5 @@
>  MarkChildren(JSTracer *trc, const Shape *shape)
>  {
> +    MarkShapeChildrenAcyclic(trc, shape);
> +    if (shape->previous())
> +        MarkShape(trc, shape->previous(), "parent");

This should use MarkShapeChildrenAcyclic result instead of accessing shape->previous() one more time.
Attachment #579485 - Flags: review?(igor) → review+
(In reply to Igor Bukanov from comment #16)
> This should use MarkShapeChildrenAcyclic result instead of accessing
> shape->previous() one more time.

The problem is that MarkShape wants a MarkablePointer and MarkShapeChildrenAcyclic just returns a Shape. I could use MarkShapeUnbarriered, but I'm trying to limit the number of those.
(In reply to Bill McCloskey (:billm) from comment #17)
> The problem is that MarkShape wants a MarkablePointer and
> MarkShapeChildrenAcyclic just returns a Shape. I could use
> MarkShapeUnbarriered, but I'm trying to limit the number of those.

Then comment why the result of MarkShapeChildrenAcyclic is ignored.

Also, we need comments in MarkShapeChildrenAcyclic that tracing via BaseShape link shape->base()->baseUnowned() stops right at that point so MarkShapeChildrenAcyclic is truly O(1) in space.
https://hg.mozilla.org/integration/mozilla-inbound/rev/d91ce1c668e7

I did my best to write a good comment.
Target Milestone: --- → mozilla11
(In reply to Bill McCloskey (:billm) from comment #19)
> I did my best to write a good comment.

The comment is very clear, thanks!
Via ed morley:

https://hg.mozilla.org/mozilla-central/rev/d91ce1c668e7

marking fixed with target milestone of mozilla11
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment on attachment 579485 [details] [diff] [review]
fix

This seems to have stuck. This patch is more conservative than the last one in that it shouldn't alter the cycle collector behavior at all. I think the regression risk should be low.
Attachment #579485 - Flags: approval-mozilla-beta?
Attachment #579485 - Flags: approval-mozilla-aurora?
Comment on attachment 579485 [details] [diff] [review]
fix

[Triage Comment]
Having discussed with curtisk and dveditz, it sounds like this issue would not cause us to chemspill FF9 or otherwise mitigate (the bar we're setting for beta at this point), and the code that's already landed on m-c does not point directly at an exploit.

Let's take this on aurora only.
Attachment #579485 - Flags: approval-mozilla-beta?
Attachment #579485 - Flags: approval-mozilla-beta-
Attachment #579485 - Flags: approval-mozilla-aurora?
Attachment #579485 - Flags: approval-mozilla-aurora+
Is this something QA can verify?
Whiteboard: [sg:critical] → [sg:critical][qa?]
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #25)
> Is this something QA can verify?

No. We might be able to write a test if we could artificially induce delayed marking. That work in going on in bug 673551.
Whiteboard: [sg:critical][qa?] → [sg:critical][qa-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: