Shape regeneration does not touch most empty scopes

RESOLVED FIXED

Status

()

defect
P1
critical
RESOLVED FIXED
10 years ago
8 years ago

People

(Reporter: gkw, Assigned: jorendorff)

Tracking

(Blocks 1 bug, {crash, regression, testcase})

Trunk
x86
macOS
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9.2 +
in-testsuite ?

Firefox Tracking Flags

(status1.9.2 beta1-fixed)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 attachment)

a = this.toSource
gc();
uneval((function () {
    return this
})())
__defineGetter__("x", Function)
gc();
((function () {
    for (y = 0; y < 2; ++y) {
        x.prototype
    }
})())

crashes TM branch debug and opt js shell without -j at js_Interpret. Doesn't seem to affect 1.9.1 branch.

We should take this on 1.9.1 branch if bug 503080 does indeed get checked in to 1.9.1 branch per bug 503080 comment 25.

autoBisect shows this is probably related to bug 503080 :

The first bad revision is:
changeset:   30378:3915e2d2c748
user:        Jason Orendorff
date:        Tue Jul 21 16:25:11 2009 -0500
summary:     Bug 503080 - Remove prototype-scope-sharing. r=brendan.
Flags: in-testsuite?
Flags: blocking1.9.2?
Security-sensitive because this involves gc.
Group: core-security
This is a patch that landed yesterday. Except for trunk tip users, nobody is affected.
(In reply to comment #2)
> This is a patch that landed yesterday. Except for trunk tip users, nobody is
> affected.

"trunk tip" meaning tracemonkey tip. Do we keep bugs like this restricted?

/be
If it were mozilla-central I'd wait a few days for nightly users to catch up.  But for tm branch it doesn't need to stay hidden.
Can't reproduce using tracemonkey tip (rev b749a174cd42) on Mac or Linux.
Can't reproduce using rev 3915e2d2c748 (see comment 0) on Mac either.

Gary, do you have any suggestions? Can you attach a stack?
(In reply to comment #6)
> Can't reproduce using rev 3915e2d2c748 (see comment 0) on Mac either.
> 
> Gary, do you have any suggestions? Can you attach a stack?

I can reproduce using an opt build on TM tip in Linux and Mac per IRC.

Mac stack:

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000
Crashed Thread:  0

Thread 0 Crashed:
0   js-opt-tm-intelmac            	0x000503c2 js_Interpret + 25026
1   js-opt-tm-intelmac            	0x0005a027 js_Execute + 407
2   js-opt-tm-intelmac            	0x0000e22c JS_ExecuteScript + 60
3   js-opt-tm-intelmac            	0x00003cea Process(JSContext*, JSObject*, char*, int) + 1338
4   js-opt-tm-intelmac            	0x000074cf main + 879
5   js-opt-tm-intelmac            	0x00001fdb _start + 209
6   js-opt-tm-intelmac            	0x00001f09 start + 41
Thanks! I can reproduce in opt with symbols on Mac as well. Taking.
Assignee: general → jorendorff
Miraculously WORKSFORME in tip.

The first good revision is:
changeset:   30635:ce025660fb6b
user:        Brendan Eich <brendan@mozilla.org>
date:        Fri Jul 24 06:55:28 2009 -0700
summary:     Bug 488731 - Avoid shape regeneration and property cache purge during the GC (r=mrbkap).

I'm going to look at this a bit more before closing it.
Well, I get SIGSEGV on this line in js_Interpret, case JSOP_GETPROP:

                        ASSERT_VALID_PROPERTY_CACHE_HIT(i, aobj, obj2, entry);
                        if (PCVAL_IS_OBJECT(entry->vword)) {
                            rval = PCVAL_OBJECT_TO_JSVAL(entry->vword);
                        } else if (PCVAL_IS_SLOT(entry->vword)) {
                            slot = PCVAL_TO_SLOT(entry->vword);
                            JS_ASSERT(slot < OBJ_SCOPE(obj2)->freeslot);
                            rval = LOCKED_OBJ_GET_SLOT(obj2, slot);
                        } else {
                            JS_ASSERT(PCVAL_IS_SPROP(entry->vword));
                            sprop = PCVAL_TO_SPROP(entry->vword);
>                           NATIVE_GET(cx, obj, obj2, sprop, &rval);
                        }

but that can't be the right line, because entry->vword == 11, so we should be in the PCVAL_IS_SLOT block.  Since this is a release build the line numbers are all off.

The property cache entry appears to be correct. Hard to say what's causing the crash. I'm going to try release with -g3 or -ggdb3 or something in the morning.
Phew, did I say morning? I meant Monday.
After evaluating f.prototype, f has a prototype property defined on it; it has dslots; it has its own scope with freeslot==6; that scope's shape is the same as the shape of its lastProp, which is the sprop for the .prototype property.

But in this test case, for some reason, that sprop's shape is the same as the shape of a new, empty Function.  I don't know why.

Obviously it is a problem for the empty Function to have the same shape as a Function that already has a .prototype property. This crash is a null deref reaching for funobj->dslots[0] on a new, empty Function where dslots==NULL.

The property cache gets an entry with kshape==(the shape of a new empty Function) and slot==5.

I don't know what causes the sprop for "prototype" to have shape==9, the same as the empty Function object.  Something about the first five lines of the test set this up. Puzzling! Cc-ing bz in case he knows of a bug in this vicinity.

a = this.toSource;
gc();
uneval(this);
__defineGetter__("q", Function)
gc();

var a = [function(){}, function(){}];
for (var y = 0; y < 2; ++y) {
    a[y].prototype;
    print(shapeOf(a[0]) + " " + shapeOf(a[1]));
}
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P1
I think emptyScopes are not traced during GC if no object points directly to them.

I didn't think it was necessary to trace them because they never have any sprops, but their shapes do need to be regenerated.

If this is correct, then Brendan's patch (cited in comment 9) "fixed" this by not regenerating shapes so often. I imagine the bug still exists. (This sort of bug does not submit to meaningful test cases, since it depends so much on the GC algorithm and the implementation of shapes.)
> I didn't think it was necessary to trace them because they never have any
> sprops, but their shapes do need to be regenerated.

Mmm.  If they have non-0 shapes (so unique shapes), then yes!

What gives an empty scope a non-0 shape?
JSScope::createEmptyScope always gives the emptyScope a nonzero shape.

Two empty objects of different classes sometimes must have different shapes (for example, they have a different number of reserved slots; or one has a JSClass.setProperty hook).

I didn't see any win in making createEmptyScope check for this and use shape 0 if possible--especially if bug 471214 plays out the way I think it will.
> JSScope::createEmptyScope always gives the emptyScope a nonzero shape.

Ah, this is a change from bug 503080.  OK.  I haven't kept up with all the changes here.  :(

So yes, we need to renumber empty scopes during gc if we renumber anything at all.
Posted patch v1Splinter Review
Attachment #390869 - Flags: review?(brendan)
Comment on attachment 390869 [details] [diff] [review]
v1

>diff --git a/js/src/jscntxt.h b/js/src/jscntxt.h
>--- a/js/src/jscntxt.h
>+++ b/js/src/jscntxt.h
>@@ -387,16 +387,27 @@ struct JSRuntime {
>      * flag is written only by the GC thread.  Atomic updates to packed bytes
>      * are not guaranteed, so stores issued by one thread may be lost due to
>      * unsynchronized read-modify-write cycles on other threads.
>      */
>     JSPackedBool        gcPoke;
>     JSPackedBool        gcRunning;
>     JSPackedBool        gcRegenShapes;
>     uint8               gcPadding;
>+
>+    /*
>+     * During gc, if rt->gcRegenShapes &&
>+     *   (scope->flags & JSScope::SHAPE_REGEN) == rt->gcRegenShapesScopeFlag,
>+     * then the scope's shape has already been regenerated during this GC.
>+     * To avoid having to sweep JSScopes, the bit's meaning toggles with each shape-regenerating GC.
..12345678901234567890123456789012345678901234567890123456789012345678901234567890

Wrap comment before column 80.

>+
>+     *
>+     * FIXME Once scopes are GC'd (bug 505004), this will be obsolete.
>+     */
>+    uint32              gcRegenShapesScopeFlag;

Why not use uint8 gcPadding for this? Could JS_STATIC_ASSERT(JSScope::SHAPE_REGEN < JS_BIT(8)).

>+    uint32 regenFlag = cx->runtime->gcRegenShapesScopeFlag;
>+    if (IS_GC_MARKING_TRACER(trc) &&
>+        cx->runtime->gcRegenShapes &&
>+        (scope->flags & JSScope::SHAPE_REGEN) != regenFlag) {

Add a scope->hasRegenFlag(regenFlag) inline helper. Maybe call it matchRegenFlag?

>+             empty && (empty->flags & JSScope::SHAPE_REGEN) != regenFlag;

Helper helps here too.

Clever solution, too bad short-lived ;-).

r=me with fixes.

/be
Attachment #390869 - Flags: review?(brendan) → review+
(In reply to comment #18)
> >+    uint32              gcRegenShapesScopeFlag;
> 
> Why not use uint8 gcPadding for this? Could
> JS_STATIC_ASSERT(JSScope::SHAPE_REGEN < JS_BIT(8)).

Oops - yes, that's an obvious win, and no static assert seems necessary since gcPadding and JSScope::flags are the same size.

Pushed.

http://hg.mozilla.org/tracemonkey/rev/d0816ba5044c
Summary: Crash [@ js_Interpret] with gc, defineGetter, prototype, uneval → Shape regeneration does not touch most empty scopes
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/d0816ba5044c
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Mass change: adding fixed1.9.2 keyword

(This bug was identified as a mozilla1.9.2 blocker which was fixed before the mozilla-1.9.2 repository was branched (August 13th, 2009) as per this query: http://is.gd/2ydcb - if this bug is not actually fixed on mozilla1.9.2, please remove the keyword. Apologies for the bugspam)
Keywords: fixed1.9.2
This bug has the same crash signature as bug 519363, so I took a detailed look at it. I confirmed that this bug was in fact caused by empty shapes not being regenerated. The result (for the original test case on my machine) was that an empty scope E was created with shape 9, then a non-empty scope A was created with shape 528, and then during GC, A was regenerated to have shape 9, while the shape of E was unchanged. Thus, E (with no props) and A (with props) both share shape 9, which is obviously bad. (The crash itself was because a prop cache entry got created for A, then hit for E, which then tried to read a dslot.)

I also confirmed that jorendorff's patch here does make this test case not crash (rebasing the patch to the first bug revision noted here). Whether this means the underlying bug (empty shapes not always being regenerated on shape- regenerating GC runs) is entirely fixed, I don't know.

What's interesting is that topcrashes with this signature (null dslots read in js_Interpret) seem to have increased (both absolutely and relative to number of js_Interpret crashes) with the 7/31 nightly build, and this patch went in on 7/30. I'll need to gather better stats and read the list of patches that went in for that build to see what that means, but it might be a clue for the tip version of this topcrash. It might also mean that the 3.5 version really is different from the tip version.

I will also have to dig further to see what all this means for 1.9.1. It's possible that this patch and/or some related ones might be the cure for the 1.9.1 topcrash.
Blocks: 519363
It sure sounds like this is wanted for 1.9.1.

Should this bug block bug 503080?

/be
More on the history. 

1. First dslots null crash.
The first time the null dslots crash was ever reported on trunk was for the 2009-06-07 m-c nightly build. The crash was fairly infrequent at first, being generally <10% of js_Interpret crashes.

2. Dramatic increase in frequency of dslots null crash.
Starting with the 2009-07-25 build, the null dslots crash became about 100% of the js_Interpret crashes. This pattern holds until at least 2009-08-07, at which time the nightly version switched to 3.6a2pre, for which I have not yet collected data.

3. Analysis of dramatic increase and hg history
The patch for bug 503080 landed on 07-24. That was the only shape-related patch to land on m-c within 2 days of 07-25. Thus, I am willing to guess that bug 503080 was what dramatically increased the incidence of this crash on trunk. What's strange is that none of the follow-on patches to mitigate/fix Gary's crash seem to have affected the frequency of the crash. (a) The patch that reduced the frequency of shape regeneration (bug 488731) landed on m-c on 07-27. I think this could be explained if the crash is rare, and the non-GC conditions for the crash usually take a long time to develop, long enough for shape regeneration to have happened. (b) The patch for this bug, to make sure shape regeneration touches empty scopes, landed on m-c on 07-29. It does appear that this crash went from about 100% to about 80% at that time, so there is some evidence that this patch mitigated, but did not fix, the problem.

4. Analysis of start of crash and hg history
After the first crash, it was another 7 nightly builds until we saw a second example. So the first crash could easily have been introduced up to 2 weeks before 06-07 (of course no definite lower bound is possible). Patches related to shape that landed going back to 05-20 were:

- bug 496054 (landed m-c 06-04, landed 1.9.1 06-04): This modified the code that generates new shapes for objects on a prototype chain when obj.__proto__ is set. Specifically, the patch made the object itself also get a new shape.

- bug 494208 (landed m-c 05-29, landed 1.9.1 05-30): This made it so that when setting obj.__proto__, js_GetMutableScope is called only if the object is a native object.

Also during that time we landed upvar2 code and code to trace more ops. Although both of those did introduce bugs, I don't think they introduced bugs similar to this one.

The first regular beta where this crash occurred was 3.5b99, which has a build date of 06-05, so both of these patches had landed there.

In 3.5pre, the 1.9.1 nightly at that time, the first crash of this kind also had a build date of 06-05.

5. Two-week reprieve analysis
There were no crashes of this kind for builds between 06-30 and 07-14, inclusive. There weren't a lot of crashes in general during that time, so this probably doesn't mean anything. During that time, we had:

- some work on DropScope (07-13/14), including making it inline and reducing the amount of locking.
- bug 502890 (07-13): correctly change shape if a branded property is assigned on trace
- bug 498565 (07-13): change js_FillPropertyCache
- bug 497789 (06-30): trace property cache hits for longer prototype chains. This was landed and backed out the same day.

6. "Conclusions"

Of course nothing is for certain in this kind of historical analysis. But based on the fact that in 3 different builds, the first crashes of this kind was on 06-05, 06-05, and 06-07, I think it's likely that the shape-related change (bug 496054) of 06-04 introduced the problem.

Then, it seems pretty clear that bug 503080 made the problem much worse, and was not improved by the related followon patches.
This kind of null dslots indicates that the Basic Layout Guarantee is busted. (Basic layout guarantee: objects with the same shape have the same layout. More precisely expressed in <https://developer.mozilla.org/En/SpiderMonkey/Internals/Property_cache#Shape>.)

This bug obviously caused very widespread bustage, and any branch that took the patch for bug 503080 should most certainly take this patch. (Ever since this bug was reported, it has blocked 503080.)

The way I see it, other bugs that break the shape guarantees are not necessarily "related to" this bug or 503080.
We seem to have newly initialized objects with shapes that lead to null dslots dereferences. Is this bug's patch a partial fix? Or are there other bugs (e.g., to do with the js_ObjectIsSimilarToProto condition change)? We could use nearly null dslots values to try to distinguish.

/be
(In reply to comment #24)
> Of course nothing is for certain in this kind of historical analysis. But based
> on the fact that in 3 different builds, the first crashes of this kind was on
> 06-05, 06-05, and 06-07, I think it's likely that the shape-related change (bug
> 496054) of 06-04 introduced the problem.

The patch for bug 496054 was "safe" in the sense that it regenerated an object's shape where that object was not reshaped before. So by itself it should have caused more property cache misses, not invalid hits. Of course, a latent bug could make the patch cause more invalid hits.

I should have pointed out that setting __proto__ actually runs the GC in a special mode (possibly in a full GC if the GC was poked) to serialize the runtime and check for cycles. It's still hard to understand how we could be regenerating shapes but perhaps there's some content or add-on that triggers excessive shape generation.

/be
Comment on attachment 390869 [details] [diff] [review]
v1

>+        /* Also regenerate the shapes of empty scopes, in case they are not shared. */
>+        for (JSScope *empty = scope->emptyScope;
>+             empty && (empty->flags & JSScope::SHAPE_REGEN) != regenFlag;
>+             empty = empty->emptyScope) {
>+            empty->shape = js_RegenerateShapeForGC(cx);
>+            empty->flags ^= JSScope::SHAPE_REGEN;
>+        }

Sorry if this has come up and I missed it: dmandelin and I were reviewing this change and the loop condition leading to break before empty becomes null made us wonder whether an empty scope might be left with a stale shape. Jorendorff, I missed this during initial review. Is it a bug?

/be
Blocks: 524743
Aha. Yes, there's a bug in the checked-in version of that code. Filed bug 524743, which should block this.
No longer blocks: 524743
Depends on: 524743
Group: core-security
You need to log in before you can comment on or make changes to this bug.