Closed Bug 880816 Opened 7 years ago Closed 7 years ago

GenerationalGC: Crash [@ GetGCThingRuntime]

Categories

(Core :: JavaScript Engine, defect, major)

x86
Linux
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla24

People

(Reporter: decoder, Assigned: terrence)

References

(Blocks 1 open bug)

Details

(Keywords: crash, testcase, Whiteboard: [jsbugmon:ignore])

Crash Data

Attachments

(3 files, 2 obsolete files)

The following testcase crashes on mozilla-central built with --enable-exact-rooting --enable-gcgenerational, revision 9ca690835a5e (run with --ion-eager):


var lfcode = new Array();
lfcode.push("const baz = 'bar';");
lfcode.push("2");
lfcode.push("{ function foo() {} }");
lfcode.push("evaluate('\
var INVALIDATE_MODES = INVALIDATE_MODE_STRINGS.map(s => ({mode: s}));\
function range(n, m) {}\
function seq_scan(array, f) {}\
function assertStructuralEq(e1, e2) {}\
for (var i = 0, l = a.length; i < l; i++) {}\
');");
lfcode.push("for (var x of Set(Object.getOwnPropertyNames(this))) {}");
var lfRunTypeId = -1;
while (true) {
  var file = lfcode.shift(); if (file == undefined) { break; }
  loadFile(file)
}
function loadFile(lfVarx) {
    try {
        if (lfVarx.substr(-3) == ".js") {}
        if (!isNaN(lfVarx)) {
            lfRunTypeId = parseInt(lfVarx);
        } else {
            switch (lfRunTypeId) {
                case 2: new Function(lfVarx)(); break;
                default: evaluate(lfVarx); break;
            }
       }
    } catch (lfVare) {}
}
Awesome find! This appears to be the same error that is currently afflicting tbpl. Even if not, it is a very real error that needed fixing.

It turns out that we were never purposefully marking the ImmGCPointers embedded in IonCode objects. I think this was so rarely a problem because IonCode is on the stack if it is running, generally in a GC suppression when not, and rarely bakes in pointers to nursery things anyway.

https://tbpl.mozilla.org/?tree=Try&rev=1f6e78dea60a
Attached patch v0 (obsolete) — Splinter Review
Great success! \o/

Lets get this in so we can unhide ggc.
Assignee: general → terrence
Status: NEW → ASSIGNED
Attachment #760030 - Flags: review?(sphink)
Comment on attachment 760030 [details] [diff] [review]
v0

Review of attachment 760030 [details] [diff] [review]:
-----------------------------------------------------------------

I have no clue what that test case is doing, but I won't worry about it.

So method_ was originally converted from HeapPtr -> EncapsulatedPtr in bug 869222, with this comment:

/*
 * IonScripts normally live as long as their owner JSScript; however, they can
 * occasionally get destroyed outside the context of a GC by FinishInvalidationOf.
 * Because of this case, we cannot use the normal store buffer to guard them.
 * Instead we use the generic buffer to mark the owner script, which will mark the
 * IonScript's fields, if it is still alive.
 */

Is this no longer the case?
I think I'll wait for this to land before filing more bugs because it seems that this might be causing some of the other failures I'm seeing right now.
(In reply to Steve Fink [:sfink] from comment #4)
> I have no clue what that test case is doing, but I won't worry about it.

It's a fuzz test; it didn't even occur to me to try to figure out what it's doing.
 
> So method_ was originally converted from HeapPtr -> EncapsulatedPtr in bug
> 869222, with this comment:

Thanks for tracking that down! It was pretty late here when I finally figured out what was going on so I didn't take the time to do the investigation.

> /*
>  * IonScripts normally live as long as their owner JSScript; however, they
> can
>  * occasionally get destroyed outside the context of a GC by
> FinishInvalidationOf.
>  * Because of this case, we cannot use the normal store buffer to guard them.
>  * Instead we use the generic buffer to mark the owner script, which will
> mark the
>  * IonScript's fields, if it is still alive.
>  */
> 
> Is this no longer the case?

Hmm, I expect that is still the case. I don't remember if I didn't switch it to RelocatablePtr because it is memcpy'd or because I thought it was marked off the runtime. I'll take a look at the bug and try to figure it out.
The JSScript, Ion/BaselineScript and IonCode have the following organization, edges labeled for clarity in the following discussion:

{Heap}-(1)->JSScript-(2)->Ion/BaselineScript-(3)->IonCode-(4)->{Heap}

During normal GC's this is all just traced and everything works fine, but for generational GC's we need barriers to keep all these components live. There are two specific complexities here.

First, Ion/BaselineScript is a C++ object and it can be freed outside the GC's control. Thus we can't make (3), the pointer to IonCode, a standard HeapPtr. Bug 869222 attempted to fix this, but the patch is buggy and marks (1), not (3) as intended.

Secondly, the (4) edges are embedded in JIT code and have extra liveness constraints that I do not yet fully understand. For simplicity, these should probably be marked by tracing the IonCode, rather than individual barriers.

Thus, we have (1) marked by normal heap barriers. (2) is not barriered because it is a C++ thing. (3) is normally marked by JSScript::markChildren -> TraceIonScripts -> Ion/BaselineScript::Trace -> Ion/BaselineScript::trace. These are currently not called for GGC. (4) are the edges this patch is tracing, but I think we should also fix (3) here. This just needs to call JSScript::markChildren, rather than MarkScript.
Attached patch v1 (obsolete) — Splinter Review
I no longer think our prior plan of just ensuring that no nursery things leak into code during compilation is tenable. This case illustrates one way (baseline's Lambda IC) and Bug 869222 (the IonScript const-pool) represents another. Moreover, actually tracking these down when they do happen has been extremely troublesome. On the plus side, the attached patch should make this case totally fine: it ensures all of the relevant pointers are marked during the next minor GC after compilation.
Attachment #760030 - Attachment is obsolete: true
Attachment #760030 - Flags: review?(sphink)
Attachment #760581 - Flags: review?(wmccloskey)
Blocks: 869222
Duplicate of this bug: 880840
(In reply to Terrence Cole [:terrence] from comment #8)
> Created attachment 760581 [details] [diff] [review]
> v1
> 
> I no longer think our prior plan of just ensuring that no nursery things
> leak into code during compilation is tenable. This case illustrates one way
> (baseline's Lambda IC) and Bug 869222 (the IonScript const-pool) represents
> another. Moreover, actually tracking these down when they do happen has been
> extremely troublesome. On the plus side, the attached patch should make this
> case totally fine: it ensures all of the relevant pointers are marked during
> the next minor GC after compilation.

Per discussion on IRC, I think we need to ensure there are no nursery pointers in jitcode.  While the IonCode can in principle mark/relocate GC things embedded in it, the real problem is with off thread compilation.  Nursery things embedded in the jitcode means that the off thread compiler can use structures with nursery things embedded in them, and the main thread can do nursery collections without canceling or interrupting off thread compilation.  There are assertions to catch this in Ion, but not I think for Baseline.

It sounds like the problem here is that we need the objects() in a script to be in the tenured heap, but deep cloning a script can place those objects in the nursery.
Comment on attachment 760581 [details] [diff] [review]
v1

Review of attachment 760581 [details] [diff] [review]:
-----------------------------------------------------------------

Terrence and Brian talked about this on IRC and it sounds like we want to tenure this stuff rather than barriering it.
Attachment #760581 - Flags: review?(wmccloskey)
Bill and I discussed this over lunch.

Summary:

We all pointers baked into JIT code to be tenured: there is no advantage to having these pointers in the nursery since IonCode is always tenured. In most cases we know the allocation site of these statically so that is easy to force tenuring. However, there are some rare cases where we cannot know this statically -- we need to decide what to do in these cases. This bug should be about adding assertions of the above and fixing them with whatever strategy we decide on.

I see two options: (1) trigger a minor GC when they occur or (2) allow nursery things into jit-code and clean them up after the fact.

(1) is problematic because the sites in question are under a SuppressGC: adding a GC would require exactly rooting many parts of the compiler. Alternatively we would need to make a new pre-compilation pass to guess if MinorGC is needed before entering the SuppressGC. Triggering MinorGC unconditionally for all compilations (including IC's) is prohibitively costly.

(2) is problematic because the SuppressGC is leaky for MinorGC, which could lead to objects being relocated on the main thread while the background thread is baking them into the code. Additionally, the tracing paths for IonCode are currently only used for liveness and are not tested well with relocation. Further, tracing all IonCode, even if only in one GC, would be quite expensive.

There is no trivial solution here, but after discussing with Bill, I still think that (2) is the easier option. I've got a patch which works using this approach -- I'll split it up and upload shortly.
This patch enhances StoreBuffer::WholeObjectEdges to allow taking a Cell. This lets us use the existing buffer for IonCode.
Attachment #761788 - Flags: review?(wmccloskey)
This patch adds several assertions, tenures in several more places, and adds an ImmMaybeNurseryPtr to ion, using it where needed. The ImmMaybeNurseryPtr is an ImmGCPtr that puts the relevant IonCode to the wholeCellBuffer iff ptr is in the Nursery. It turns out we only need these in IC's where, fortunately, the amount of IonCode we need to mark will be extremely minimal.
Attachment #760581 - Attachment is obsolete: true
Attachment #761795 - Flags: review?(bhackett1024)
Comment on attachment 761795 [details] [diff] [review]
Part 2 of 2 - Fix v2

Review of attachment 761795 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, but can you either revert or explain the CloneInterpretedFunction changes?

::: js/src/ion/CompileInfo-inl.h
@@ +23,5 @@
> +
> +    // The function here can flow in from anywhere. If it is nursery allocated, look up the
> +    // orignal, tenured, function so that we do not embed a nursery pointer in JIT code.
> +    if (fun_ && gc::IsInsideNursery(fun_->runtime(), fun_))
> +        fun_ = fun_->nonLazyScript()->function();

I think it would be cleaner to just always canonicalize the function.

::: js/src/ion/shared/Assembler-shared.h
@@ +124,1 @@
>          JS_ASSERT(!IsPoisonedPtr(ptr));

Maybe reverse the order of these asserts, as the first will bug out if the pointer is poisoned.

::: js/src/jsfun.cpp
@@ +394,5 @@
>  template bool
>  js::XDRInterpretedFunction(XDRState<XDR_DECODE> *, HandleObject, HandleScript, MutableHandleObject);
>  
>  JSObject *
> +js::CloneInterpretedFunction(JSContext *cx, HandleObject enclosingScope, HandleFunction srcFun)

What are these changes fixing?  Per IRC discussion we should be able to clone most interpreted functions into the nursery, and since this patch doesn't seem to touch Ion codegen for JSOP_LAMBDA, Ion will still be cloning interpreted functions into the nursery.

I hope that with the change to ::CompileInfo() this stuff is no longer necessary.
Attachment #761795 - Flags: review?(bhackett1024) → review+
(In reply to Brian Hackett (:bhackett) from comment #16)
> Comment on attachment 761795 [details] [diff] [review]
> Part 2 of 2 - Fix v2
> 
> Review of attachment 761795 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good, but can you either revert or explain the
> CloneInterpretedFunction changes?

CloneInterpretedFunction contains:
    clonedScript->setFunction(clone);

I made JSScript::setFunction assert that the function is tenured so that the fun->nonLazyScript()->function() we do in CompileInfo() would always return a tenured function. Maybe I'm missing something important here?

> ::: js/src/ion/shared/Assembler-shared.h
> @@ +124,1 @@
> >          JS_ASSERT(!IsPoisonedPtr(ptr));
> 
> Maybe reverse the order of these asserts, as the first will bug out if the
> pointer is poisoned.

Oh, good catch!

> ::: js/src/jsfun.cpp
> @@ +394,5 @@
> >  template bool
> >  js::XDRInterpretedFunction(XDRState<XDR_DECODE> *, HandleObject, HandleScript, MutableHandleObject);
> >  
> >  JSObject *
> > +js::CloneInterpretedFunction(JSContext *cx, HandleObject enclosingScope, HandleFunction srcFun)
> 
> What are these changes fixing?  Per IRC discussion we should be able to
> clone most interpreted functions into the nursery, and since this patch
> doesn't seem to touch Ion codegen for JSOP_LAMBDA, Ion will still be cloning
> interpreted functions into the nursery.

Let me double-check, but I think as far as our test suite is concerned, this is the only construction site that can flow into CodeGenerator::visitLambda.
 
> I hope that with the change to ::CompileInfo() this stuff is no longer
> necessary.

I was probably too aggressive with this. I'll try removing the assertion in setFunction and the automatic tenuring for CloneInterpretedFunction and see what happens.
Comment on attachment 761788 [details] [diff] [review]
Part 1 of 2: StoreBuffer improvements v0

Review of attachment 761788 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/StoreBuffer.cpp
@@ +61,3 @@
>  {
> +    JS_ASSERT(tenured->isTenured());
> +    AllocKind kind = tenured->tenuredGetAllocKind();

Please use GetGCThingTraceKind here and you can avoid the assertion.
Attachment #761788 - Flags: review?(wmccloskey) → review+
Brian and I discussed this over IRC and decided to rename CloneInterpretedFunction to CloneFunctionAndScript instead of what is in comment 16-17.
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   http://hg.mozilla.org/mozilla-central/rev/3297733a2661
user:        Terrence Cole
date:        Thu Mar 14 10:26:06 2013 -0700
summary:     Bug 706885 - Implement generational GC for the SpiderMonkey interpreter; r=billm

(testing autoBisect on GGC bugs)
https://hg.mozilla.org/mozilla-central/rev/f16ce0a0f333
https://hg.mozilla.org/mozilla-central/rev/83665aba804f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Duplicate of this bug: 880805
You need to log in before you can comment on or make changes to this bug.