Failure to mark some shapes during delayed marking

RESOLVED FIXED in Firefox 10

Status

()

Core
JavaScript Engine
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: billm, Assigned: billm)

Tracking

unspecified
mozilla11
Points:
---

Firefox Tracking Flags

(firefox10 fixed, firefox11 fixed, status1.9.2 unaffected)

Details

(Whiteboard: [sg:critical][qa-])

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

6 years ago
Created attachment 578462 [details] [diff] [review]
fix

When we do delayed marking, we call JS_TraceChildren on each marked object in the arena. It calls MarkChildren. When given a shape, MarkChildren fails to set the marked bit for its parent shape--it just walks up the shape tree. I stumbled upon this while doing incremental GC, but it suggests that we need to do better testing of delayed marking--probably via bug 673551.

In the patch, I had to change the return type of previous() so that the write barrier checks don't complain.
Attachment #578462 - Flags: review?(igor)

Comment 1

6 years ago
Comment on attachment 578462 [details] [diff] [review]
fix

Review of attachment 578462 [details] [diff] [review]:
-----------------------------------------------------------------

Comment in ScanShape before shape = shape->previous(); why the loop in that function is essential to stop unbounded recursion.
Attachment #578462 - Flags: review?(igor) → review+
Could this have been the cause of some of our GC topcrashes?
(Assignee)

Comment 3

6 years ago
Comment on attachment 578462 [details] [diff] [review]
fix

I think I broke this in Firefox 5. So I don't think it's causing the topcrashes that we examined a long time ago, but it may be causing some current topcrashes.

It would be nice to take this patch, since it seems pretty exploitable. I'm a bit worried about regression risks, though. This is the same path that the cycle collector takes. So we were probably missing a bunch of shapes from the CC graph, and now we won't be. Andrew, do you know if that's likely to be a performance problem? We often have a lot of shapes.
Attachment #578462 - Flags: approval-mozilla-beta?
Attachment #578462 - Flags: approval-mozilla-aurora?
Assignee: general → wmccloskey
Shapes are not added to the CC graph.  See AddToCCKind.  Instead of adding the graph, the CC traces through the children to find non-shape nodes.  Potentially this could cause the CC to blow the stack, but if this only came into play in 5 maybe it is okay.  It could also cause slowdown.

I don't know much about shapes.  Do they point from parent to child as well as from child to parent?  Are there likely to be a lot of shapes that are reachable from gray nodes but not black ones?
Shapes point to parent (previous) and child or children, but the latter linkage should not matter here.

/be
Well, my main worry is that the combination of two creates doubly-linked trees, which tend to embiggen the CC graph, in cases like the DOM (with the addition of strong parent pointer) and the JS global.

If you want to measure the effect on the graph, bill, you can either count how many shapes are being marked gray, or, more directly, go into NoteJSRoot in nsXPConnect.cpp and add something like

if (kind == JSTRACE_SHAPE)
  myNewGlobalCounter++;

in the else if (kind != JSTRACE_STRING) case.  Initialize myNewGlobalCounter in BeginCycleCollection and print it out in EndCycleCollection in that file.
Shapes can participate in cycles only if they are for accessor (getter/setter) properties or methods (function-valued properties memoized/shared via shapes). These are relatively uncommon across all object populations, more common among prototype objects. Can you take advantage of this sparseness? You shouldn't have to reify peer CC data structures for all shapes, just the ones that might be suspects.

/be
No shapes are reified, they are just recursively scanned for non-shapes.  Only non-shapes are added to the graph.  But yes, it sounds like we could avoid even doing that a lot of the time.  The cycle collector does not take advantage of any preknowledge about the shape of the JS heap, except that strings (even compound ones) are never cyclic.

Ultimately, I'd like to push the CC's traversal of the GC heap inside the GC heap, if only to avoid the virtual call in XPCOM->XPConnect->JS->XPConnect->XPCOM dance we currently have to do to take single step in the CC (bug 653013).  Once that is done we could be smarter about stuff.

Peterv has some thoughts on shapes in the CC graph in bug 651445, mainly concerns about not adding them.
(Assignee)

Comment 9

6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0cb9fb30750

It doesn't sound like this will screw anything up. As Brendan said, we don't trace the children of a shape. And since shapes aren't actually put in the CC graph, we only have to worry about shapes further up the parent chain that point to objects via accessors. It seems pretty unlikely that this would have a noticeable performance impact. So I think the regression risk is low. And, as I said, the bug is pretty serious.
Target Milestone: --- → mozilla11
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/527a79f535d8 - I haven't seen a single useful stack yet, but Mac and Windows, reftests, jsreftests, and mochitest-2 were crashing. Random selection of logs:

https://tbpl.mozilla.org/php/getParsedLog.php?id=7744624&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7744595&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7744600&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7744444&tree=Mozilla-Inbound
Comment on attachment 578462 [details] [diff] [review]
fix

[Triage Comment]
Please re-nominate once this has landed on m-c.
Attachment #578462 - Flags: approval-mozilla-beta?
Attachment #578462 - Flags: approval-mozilla-beta-
Attachment #578462 - Flags: approval-mozilla-aurora?
Attachment #578462 - Flags: approval-mozilla-aurora-
(Assignee)

Comment 12

6 years ago
This causes a stack overflow in the cycle collector. Here is what NoteJSChild in nsXPConnect.cpp looks like:

static void
NoteJSChild(JSTracer *trc, void *thing, JSGCTraceKind kind)
{
    if (AddToCCKind(kind)) {
        ...
    } else if (kind != JSTRACE_STRING) {
        JS_TraceChildren(trc, thing, kind);
    }
}

Since AddToCCKind(JSTRACE_SHAPE) is false, we call JS_TraceChildren. This can blow up the stack when traversing long parent chains.

It looks like the previous version of the code was doing exactly what the cycle collector needed: it was skipping over the parent shape itself, and just tracing through the shape's children. However, this is not what we need for delayed marking (this is the aspect I forgot in our conversation today, Andrew; it's still a serious bug).

I'm thinking of adding a flag to JSTracer to say which behavior is desired. This is pretty ugly, but there aren't a lot of good alternatives. We can detect a stack overflow in NoteJSChild, but what do we do then? Aborting the CC seems like a bad idea, since it would cause us to leak. Andrew and I talked about adding shapes to the CC graph selectively, when we run out of stack space for them. This seems really dangerous, though. There may be other parts of the CC that break if some shapes are in the graph and not others.

Peterv, Igor, Andreas, do you guys have any good ideas? It would be nice if whatever we come up with is safe enough to land in aurora and beta. That's why I like the flag idea.
Target Milestone: mozilla11 → ---
I think the safest thing to do in the short term is to add a flag (or repurpose the one I already added with some kind of enum) that leaves the behavior alone except for the delayed marker.  There's a delicate symbiosis between the current behavior of JS_TraceChildren and the CC that seems to work.  In the long term, of course, having even more ways to trace through JS seems very future-unproof.

Comment 14

6 years ago
We need to eliminate the tail recursion over the shape.parent. A simplest solution for that would be to provide a function like 

Shape *TraceShapeChildren(JSTracer *trc, Shape *shape)

that would not call the tracer on the shape but rather return it so NoteJSChild can have:

    } else if (kind == JSTRACE_SHAPE) {
        while (!!(thing = JS_TraceChildren(trc, (Shape *) thing));
    }
(Assignee)

Comment 15

6 years ago
Created attachment 579485 [details] [diff] [review]
fix

I like Igor's idea. This patch doesn't die on tinderbox the way the previous one did.
Attachment #578462 - Attachment is obsolete: true
Attachment #579485 - Flags: review?(igor)

Comment 16

6 years ago
Comment on attachment 579485 [details] [diff] [review]
fix

Review of attachment 579485 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgcmark.cpp
@@ +983,5 @@
>  MarkChildren(JSTracer *trc, const Shape *shape)
>  {
> +    MarkShapeChildrenAcyclic(trc, shape);
> +    if (shape->previous())
> +        MarkShape(trc, shape->previous(), "parent");

This should use MarkShapeChildrenAcyclic result instead of accessing shape->previous() one more time.
Attachment #579485 - Flags: review?(igor) → review+
(Assignee)

Comment 17

6 years ago
(In reply to Igor Bukanov from comment #16)
> This should use MarkShapeChildrenAcyclic result instead of accessing
> shape->previous() one more time.

The problem is that MarkShape wants a MarkablePointer and MarkShapeChildrenAcyclic just returns a Shape. I could use MarkShapeUnbarriered, but I'm trying to limit the number of those.

Comment 18

6 years ago
(In reply to Bill McCloskey (:billm) from comment #17)
> The problem is that MarkShape wants a MarkablePointer and
> MarkShapeChildrenAcyclic just returns a Shape. I could use
> MarkShapeUnbarriered, but I'm trying to limit the number of those.

Then comment why the result of MarkShapeChildrenAcyclic is ignored.

Also, we need comments in MarkShapeChildrenAcyclic that tracing via BaseShape link shape->base()->baseUnowned() stops right at that point so MarkShapeChildrenAcyclic is truly O(1) in space.
(Assignee)

Comment 19

6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/d91ce1c668e7

I did my best to write a good comment.
Target Milestone: --- → mozilla11

Comment 20

6 years ago
(In reply to Bill McCloskey (:billm) from comment #19)
> I did my best to write a good comment.

The comment is very clear, thanks!

Comment 21

6 years ago
Via ed morley:

https://hg.mozilla.org/mozilla-central/rev/d91ce1c668e7

marking fixed with target milestone of mozilla11
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Updated

6 years ago
status-firefox11: --- → fixed
(Assignee)

Comment 22

6 years ago
Comment on attachment 579485 [details] [diff] [review]
fix

This seems to have stuck. This patch is more conservative than the last one in that it shouldn't alter the cycle collector behavior at all. I think the regression risk should be low.
Attachment #579485 - Flags: approval-mozilla-beta?
Attachment #579485 - Flags: approval-mozilla-aurora?
Comment on attachment 579485 [details] [diff] [review]
fix

[Triage Comment]
Having discussed with curtisk and dveditz, it sounds like this issue would not cause us to chemspill FF9 or otherwise mitigate (the bar we're setting for beta at this point), and the code that's already landed on m-c does not point directly at an exploit.

Let's take this on aurora only.
Attachment #579485 - Flags: approval-mozilla-beta?
Attachment #579485 - Flags: approval-mozilla-beta-
Attachment #579485 - Flags: approval-mozilla-aurora?
Attachment #579485 - Flags: approval-mozilla-aurora+
(Assignee)

Comment 24

6 years ago
Seems reasonable.

https://hg.mozilla.org/releases/mozilla-aurora/rev/c99bfdb03cfd
(Assignee)

Updated

6 years ago
status-firefox10: --- → fixed
Is this something QA can verify?
Whiteboard: [sg:critical] → [sg:critical][qa?]
(Assignee)

Comment 26

6 years ago
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #25)
> Is this something QA can verify?

No. We might be able to write a test if we could artificially induce delayed marking. That work in going on in bug 673551.
Whiteboard: [sg:critical][qa?] → [sg:critical][qa-]
status1.9.2: --- → unaffected
Group: core-security
You need to log in before you can comment on or make changes to this bug.