Closed Bug 497789 Opened 11 years ago Closed 10 years ago

[TM]Failure to trace loop which calls function which lives two up the prototype chain

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.3a3

People

(Reporter: bzbarsky, Assigned: brendan)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: fixed-in-tracemonkey)

Attachments

(5 files, 22 obsolete files)

321 bytes, text/plain
Details
653 bytes, text/plain
Details
20.07 KB, patch
Details | Diff | Splinter Review
147.47 KB, patch
jorendorff
: review+
Details | Diff | Splinter Review
5.88 KB, patch
dvander
: review+
Details | Diff | Splinter Review
Attached file JS shell testcase
The attached shell testcase is reduced from the testcase in bug 497788.  When I run it in current TM debug shell, I get 35 copies of:

  Abort recording of tree /Users/bzbarsky/baz.js:15@108 at
    /Users/bzbarsky/baz.js:17@129: Inner tree is trying to grow,
    abort outer recording.

and the stats say:

  recorder: started(37), aborted(35), completed(34), different header(0),
    trees trashed(2), slot promoted(0), unstable loop variable(0), breaks(0),
    returns(0), unstableInnerCalls(0)
  monitor: triggered(4531), exits(4531), type mismatch(0), global mismatch(0)

If using new Ball() instead of new DHTMLBall(), the problem does not occur.  This is keeping us from tracing the O(N^2) part of the testcase in bug 497788, which seems to be the one that dominates for large numbers of balls.
Waldo thinks this code in js_FillPropertyCache in the

   if (scopeIndex != 0 || protoIndex != 1)

block looks kinda suspicious:

340             pc = (jsbytecode *) atom;
341             kshape = (jsuword) obj;

Makes it hard to hit the cache, if we don't do similar pc/kshape mangling whenever we do the cache lookup.
Then again, js_FullTestPropertyCache does seem to do similar mangling.
Attachment #382907 - Attachment mime type: application/x-javascript → text/plain
The problem is that the cache can't invalidate many-to-one dependencies, which arise with deeper prototype chains, and prototype/scope chain combinations, by purging shapes.

In the test we have 128 objects, all with the same proto, DHTMLBall.prototype. It has a proto, Ball.prototype, where doCollide lives. Each time through the inner (j) loop, we cache for a different one of these 128 objects, that it has a method to call, but only if (a) the direct object identity matches; (b) Ball.prototype's shape matches.

While (b) is satisfied in the test, and would be correctly falsified (guard fires) if a shadowing doCollide were added to DHTMLBall.prototype, in general we must guard (a) too.

The alternative, sound when caching a direct property or immediate prototype hit, of (a') the direct object's shape matches, is not safe in general:

js> function f() { this.foo = 1 }
js> function g() { this.bar = 2 }
js> g.prototype = new f
[object Object]
js> o = new g
[object Object]
js> shapeOf(o)
134
js> p = new g
[object Object]
js> shapeOf(p) 
134
js> function h() { }
js> h.prototype = new g      
[object Object]
js> f.prototype.doCollide = function() "oink"
function () "oink"
js> h.prototype.doCollide = function() "meow"
function () "meow"
js> g.prototype.doCollide = function() "woof"
function () "woof"
js> o.doCollide()
woof
js> p.doCollide()
woof
js> q.doCollide()
meow
js> shapeOf(o)
134
js> shapeOf(p)
134
js> q = new h
[object Object]
js> shapeOf(q)
148
js> q.bar = 2
2
js> shapeOf(q)
134

This case requires invalidation by brute force, rather than purging by shape regeneration, in the case of a shadowing event along one of N proto chains that reach the cached method.

/be
Assignee: general → brendan
Status: NEW → ASSIGNED
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.2a1
Shapes are per-object. Otherwise you would have shape aliases and "shape gaps" as objects grew properties directly and in prototypes, in arbitrary order. The VM can't start a new object's shape with its proto's shape (I played at this in the fx3 daze), or it might have to revise thousands of objects' shapes when their proto changes.

In case it is not clear, the set-up in comment 3 allows "foreshadowing":

js> h.prototype.doCollide2 = function () "moo"
function () "moo"
js> f.prototype.doCollide2 = function () "peep"
function () "peep"
js> o.doCollide2()
peep
js> q.doCollide2()
moo

And o and q still have the same shape (so does p). We reshape f.prototype when doCollide2 is set on it, but if you do

js> a = [o, p, q]
[object Object],[object Object],[object Object]
js> for (let i in a) a[i].doCollide2();
moo

then under (a'), caching the direct object's shape instead of its identity, we would call f.prototype.doCollide2 for a[2] (which references q), because q has the same shape as o and p, and f.prototype indeed has the same shape as was set in the cache's vcap field, at the right proto/scope coordinates.

This is the "foreshadowing" hazard I found in the wild, debugging gmail when bringing up the property cache during fx3 development.

Regular shadowing ("aftshadowing"?) is handled by js_PurgeScopeChain.

Elaborating the notion of shapes to deal with foreshadowing is possible, by throwing more memory at the problem. Conceptually one can keep lists of objects, cache entries, and JITted code to invalidate when foreshadowing happens. The key here is the unlikely evolution of a prototype after instances have been new'ed.

/be
In the example above, do f.prototype and h.prototype also have the same shapes as each other?  That is, would it maybe make sense to guard on shapes of things on the proto chain instead of object identity?
f.prototype and h.prototype have different shapes, because they have different methods, so different "hidden classes" to use the V8/Self-with-classes term.

/be
(In reply to comment #6)
> f.prototype and h.prototype have different shapes, because they have different
> methods, so different "hidden classes" to use the V8/Self-with-classes term.

This means bz's idea from comment 5 could be used when JITting this particular case: emit shape guards for all objects from the direct one up to the grand-proto.

But, these cost more when the chain is deep. It would be better to have a single "shape", "hidden class" or "structure id" to guard.

More, if the property in question weren't a method, then the shapes could match for the foreshadowing scope (h.prototype's) and the farther one (f.prototype's). For non-method properties, shape-guarding up the prototype chain would give rise to false positive bugs.

A composite shape that incorporates all current prototype shapes and the direct shape, but which can be regenerated correctly and efficiently when any cached or guarded prototype property is deleted or shadowed, suggests a tree-structured inheritance graph of shapes:

  Object.prototype
       ^
       |
  f.prototype <-- h.prototype <-- q
       ^
       |
  g.prototype <-- o, p

Each node in the tree has a parent shape struct and its own struct. This tree is unrelated to the property tree, whose relations are induced by property addition to a single scope only. The composite shape of any object can be viewed as a path from root to that object.

These shape structs are essentially today's JSScopes, but without the object back-pointer in JSScope. Such object-independent scopes/shapes can be memoized and shared in thread-local (JSThread) lock-free storage.

These shared shape/scope structs would be related by parent struct pointers.

Thus JSScopes would be memoized and shared aggressively (and probably renamed, the "scope" name is confusing, an idiom not a metaphor or well-used jargon word). This would require moving the lock-free title out of the struct and into the object, but only multi-thread-accessible objects would need titles at all.

This is an optimization we've talked about and it should be done soon, to eliminate JS_THREADSAFE overhead for by far the common case. See bug 349263, bug 419537, and bug 408416, which has as its URL the wiki page https://wiki.mozilla.org/JSObject_Makeover.

Composite shapes don't tell the VM what slot or function-pair in a given object to use when getting or setting, or what particular method value to use as a callee when processing an invocation, unless composite shapes split whenever an object mutates in a way that would invalidate cached or traced method, slot, or sprop references. Such mutations to a given shape struct within the composite would potentially have to revise all cached/traced records.

The kinds of mutation that could invalidate cached/traced records include:

1. Adding to shadow a farther property (but not foreshadow, in this composite scheme).

2. Deleting a proto-property to let a farther one, or else no-such-property => undefined, be seen.

3. Assigning __proto__ (but this is a hard case already, and we should break it for non-newborn objects unless the object is being created by an object initialiser, where the newborn can't escape, so can't make a cycle or be used to create potentially stale cache entries or trace guards).

4. Changing attributes, getter, setter, etc. on an existing property. This can be done with the JS API today, and with ES5 meta-programming APIs soon.

Can anyone think of others? V8 does this by taking its hidden classes through various transitions and (AFAIK) invalidating linked PICs and so on. Can we do any better?

/be
Another option might be to guard on object shape and proto object identity.  Should be faster than guarding on all proto chain shapes, but a lot less complex than what comment 7 describes, and should cover the common cases...
(In reply to comment #8)
> Another option might be to guard on object shape and proto object identity.

That is not an option in this object-relationship case, because the grand-proto could be mutated and nothing about direct objects or the proto would change.

> Should be faster than guarding on all proto chain shapes, but a lot less
> complex than what comment 7 describes, and should cover the common cases...

I'm not sure what you're proposing. There's an N-body problem here, so cache-hit-qualifying or trace-guarding one or two facts about the N bodies won't be enough in general. The js_PurgeScopeChain machinery handles shadowing, and the direct object identity key/guard defends against foreshadowing.

For one- or two-condition caching/guarding, the current arrangement is correct as designed -- but of course it misses the cache or branch-exits the guard for loops over arrays of objects, when calling grand-proto methods.

/be
Right now we're using object identity on the object the property lookup to guard against foreshadowing, right?

Another option to guard against foreshadowing, per above, is to guard on shapes of the object all protos, if I understood comment 6 and comment 7 right.

So it seems to me that one could also guard on the first N shapes and the object identity of the next proto, no?  Right now we use N == 0, basically.  The "guard on shapes of all protos" case uses N == infinity.  I was wondering whether it's worth looking at N == 1.

Unless I'm completely misunderstanding the foreshadowing problem and what guards would protect against it?
(In reply to comment #10)
> Right now we're using object identity on the object the property lookup to
> guard against foreshadowing, right?

Yes, and the reason this works is that (barring __proto__ assignment, which reshapes all objects along old and new proto chains, including [thanks to Waldo for the fix] the direct object whose __proto__ is being set) foreshadowing requires different proto chains to end in the same prototype object in which the property will eventually be defined.

Only one of the proto paths foreshadows, but the starting objects for both paths, foreshadowing and non-foreshadowing, have the same shape.

> Another option to guard against foreshadowing, per above, is to guard on shapes
> of the object all protos, if I understood comment 6 and comment 7 right.

Right. Or make shapes depend on proto-shapes, etc., which is what I am trying to develop now.

> So it seems to me that one could also guard on the first N shapes and the
> object identity of the next proto, no?

Oh, you are right that this would solve foreshadowing, but as I pointed out in comment 9, you also need to guard on the grand-proto shape. So it's not a savings compared to guarding on the proto shape, which is also more general. But for N > 3 then it might be a win, depending.

The real trick is to compute a new object's shapes starting from its proto-shape. I didn't do this in Firefox 3 for want of time, but I should have revisited it for 3.5 (where did the time go? Tracing work plus upvars, in my case). I'm going to try to include the prototype shapes in delegating objects' shapes now.

/be
Attached patch patch for testing (obsolete) — Splinter Review
Passes the suite, ensures zero aborts tracing the JS shell testcase. Hacked with #if 0 surgically, will clean up and test more as time allows.

/be
hit rates: pc 1.53679% (proto 0%), set 0%, ini nan%, full 99.879%

/be
That patch sort of breaks the Firefox UI (keyboard shortcuts don't work on Mac, say, nor does selecting options from menus)...
(In reply to comment #14)
> That patch sort of breaks the Firefox UI (keyboard shortcuts don't work on Mac,
> say, nor does selecting options from menus)...

I smell XBL.

/be
With this patch there are only two hit cases now, one direct (single shape guard) and one indirect (up the proto and/or parent chains). So the non-hacked patch for this bug will simplify everything accordingly, once I debug what is wrong with (I suspect) XBL.

/be
bz: were you running with assertions non-fatal or compiled out? I spotted out and it pointed directly to the problem, evident from the interdiff. Let me know how this works for ya.

/be
Attachment #383331 - Attachment is obsolete: true
Oops, both bugzilla and cmd-line interdiff fail. Inline diff of patches:

50c50
< @@ -575,9 +589,8 @@ js_PurgePropertyCacheForScript(JSContext
---
> @@ -569,9 +583,8 @@ js_PurgePropertyCacheForScript(JSContext
76c76,77
< @@ -381,7 +383,6 @@ js_FillPropertyCache(JSContext *cx, JSOb
---
> @@ -380,7 +382,6 @@ js_FillPropertyCache(JSContext *cx, JSOb
>          if (entry->kpc == pc && entry->kshape == kshape_) {                   \
79d79
<              JS_LOCK_OBJ(cx, pobj);                                            \
82c82
<                  (tmp_ = LOCKED_OBJ_GET_PROTO(pobj)) != NULL &&                \
---
>                  (tmp_ = OBJ_GET_PROTO(cx, pobj)) != NULL &&                   \
102c102,112
< @@ -1207,6 +1211,10 @@ js_AddScopeProperty(JSContext *cx, JSSco
---
> @@ -1202,11 +1206,20 @@ js_AddScopeProperty(JSContext *cx, JSSco
>                          continue;
>  
>                      JS_ASSERT(sprop != overwriting);
> +#if 0
>                      JS_ASSERT(i != 0);
> +#else
> +                    if (i == 0)
> +                        break;
> +#endif
>                      spvec[--i] = sprop;
116c126
< @@ -6730,16 +6730,19 @@ TraceRecorder::test_property_cache(JSObj
---
> @@ -6831,16 +6831,19 @@ TraceRecorder::test_property_cache(JSObj
136c146
< @@ -6758,6 +6761,7 @@ TraceRecorder::test_property_cache(JSObj
---
> @@ -6859,6 +6862,7 @@ TraceRecorder::test_property_cache(JSObj

/be
> bz: were you running with assertions non-fatal or compiled out?

I was running an opt build.
OK, that last patch works fine in terms of menus for me.

I still see what I described in bug 497788 comment 9.  I'll try and see if I can create a testcase (and separate bug) for it.
Ok, I will clean up the patch to have no #if 0s and to simplify interfaces.

/be
I filed bug 499866 on the issue I mention in comment 20.  It looks like basically this fix stops working after a gc.... or something.  Will try to minimize tomorrow.
Note: gal added a comment to jstracer.cpp about some code that can be removed once this patch lands.  Might be worth filing a bug on that and making it depend on this bug or something....
(In reply to comment #23)
> Note: gal added a comment to jstracer.cpp about some code that can be removed
> once this patch lands.  Might be worth filing a bug on that and making it
> depend on this bug or something....

I'll address that in this bug, as much as I can. I was over-optimistic over iChat with Andreas about it, though. That comment promises the moon, but you cannot use inherited shapes to solve the many-to-one dependency/update problem inherent in prototype-based delegation.

Inherited shapes solve foreshadowing, but you still need a way to invalidate when the proto chain itself, or a proto on the extant chain, is mutated.

For existing properties, this is handled by js_PurgeScopeChain and the __proto__ assignment code in jsgc.cpp that reshapes old and new chains along with the obj whose __proto__ is being set.

For missing properties there's no scoreboard to consult, as js_PurgeScopeChain does when it looks for shadowed proto-props. Perhaps there could be, but that may not be worth maintaining compared to what the tracer emits already (most proto chains are short).

/be
Attached patch cleaned up minimal patch (obsolete) — Splinter Review
Attachment #384475 - Attachment is obsolete: true
Attachment #385139 - Flags: review?(igor)
Blocks: 500431
Comment on attachment 385139 [details] [diff] [review]
cleaned up minimal patch

>diff --git a/js/src/jsinterp.cpp b/js/src/jsinterp.cpp
>--- a/js/src/jsinterp.cpp
>+++ b/js/src/jsinterp.cpp
>@@ -334,12 +334,6 @@ js_FillPropertyCache(JSContext *cx, JSOb
> #endif
> 
>         if (scopeIndex != 0 || protoIndex != 1) {
>-            khash = PROPERTY_CACHE_HASH_ATOM(atom, obj, pobj);
>-            PCMETER(if (PCVCAP_TAG(cache->table[khash].vcap) <= 1)
>-                        cache->pcrecycles++);
>-            pc = (jsbytecode *) atom;
>-            kshape = (jsuword) obj;
>-

With these changes neither atom no pcoff are no longer used in js_FillPropertyCache plus the only use of khash is to calculate entry.

>             /*
>              * Make sure that a later shadowing assignment will enter
>              * PurgeProtoChain and invalidate this entry, bug 479198.
>@@ -376,14 +370,13 @@ js_FillPropertyCache(JSContext *cx, JSOb
> JS_REQUIRES_STACK JSAtom *
> js_FullTestPropertyCache(JSContext *cx, jsbytecode *pc,
>                          JSObject **objp, JSObject **pobjp,
>-                         JSPropCacheEntry **entryp)
>+                         JSPropCacheEntry *entry)
> {
>     JSOp op;
>     const JSCodeSpec *cs;
>     ptrdiff_t pcoff;
>     JSAtom *atom;
>     JSObject *obj, *pobj, *tmp;
>-    JSPropCacheEntry *entry;
>     uint32 vcap;
> 
>     JS_ASSERT(uintN((cx->fp->imacpc ? cx->fp->imacpc : pc) - cx->fp->script->code)
>@@ -400,12 +393,10 @@ js_FullTestPropertyCache(JSContext *cx, 
> 
>     obj = *objp;
>     JS_ASSERT(OBJ_IS_NATIVE(obj));
>-    entry = &JS_PROPERTY_CACHE(cx).table[PROPERTY_CACHE_HASH_ATOM(atom, obj, NULL)];
>-    *entryp = entry;
>     vcap = entry->vcap;
> 
>-    if (entry->kpc != (jsbytecode *) atom) {
>-        PCMETER(JS_PROPERTY_CACHE(cx).idmisses++);
>+    if (entry->kpc != pc) {
>+        PCMETER(JS_PROPERTY_CACHE(cx).kpcmisses++);
> 
> #ifdef DEBUG_notme
>         entry = &JS_PROPERTY_CACHE(cx).table[PROPERTY_CACHE_HASH_PC(pc, OBJ_SHAPE(obj))];
>@@ -427,11 +418,15 @@ js_FullTestPropertyCache(JSContext *cx, 
>         return atom;
>     }
> 
>-    if (entry->kshape != (jsuword) obj) {
>-        PCMETER(JS_PROPERTY_CACHE(cx).komisses++);
>+    if (entry->kshape != OBJ_SHAPE(obj)) {
>+        PCMETER(JS_PROPERTY_CACHE(cx).kshmisses++);
>         return atom;
>     }
> 
>+    /*
>+     * PROPERTY_CACHE_TEST handles only the direct and immediate-prototype hit
>+     * cases, all others go here.
>+     */
>     pobj = obj;
> 
>     if (JOF_MODE(cs->format) == JOF_NAME) {
>@@ -522,7 +517,6 @@ js_PurgePropertyCache(JSContext *cx, JSP
>         P(noprotos);
>         P(longchains);
>         P(recycles);
>-        P(pcrecycles);
>         P(tests);
>         P(pchits);
>         P(protopchits);
>@@ -535,8 +529,8 @@ js_PurgePropertyCache(JSContext *cx, JSP
>         P(setpcmisses);
>         P(slotchanges);
>         P(setmisses);
>-        P(idmisses);
>-        P(komisses);
>+        P(kpcmisses);
>+        P(kshmisses);
>         P(vcmisses);
>         P(misses);
>         P(flushes);
>@@ -569,9 +563,8 @@ js_PurgePropertyCacheForScript(JSContext
>          entry++) {
>         if (JS_UPTRDIFF(entry->kpc, script->code) < script->length) {
>             entry->kpc = NULL;
>-            entry->kshape = 0;
>-#ifdef DEBUG
>-            entry->vcap = entry->vword = 0;
>+#ifdef DEBUG
>+            entry->kshape = entry->vcap = entry->vword = 0;
> #endif
>         }
>     }
>@@ -4810,7 +4803,7 @@ js_Interpret(JSContext *cx)
>                     }
> 
>                     atom = js_FullTestPropertyCache(cx, regs.pc, &obj, &obj2,
>-                                                    &entry);
>+                                                    entry);
>                     if (atom) {
>                         PCMETER(cache->misses++);
>                         PCMETER(cache->setmisses++);
>diff --git a/js/src/jsinterp.h b/js/src/jsinterp.h
>--- a/js/src/jsinterp.h
>+++ b/js/src/jsinterp.h
>@@ -217,9 +217,6 @@ typedef struct JSInlineFrame {
> #define PROPERTY_CACHE_HASH_PC(pc,kshape)                                     \
>     PROPERTY_CACHE_HASH(pc, kshape)
> 
>-#define PROPERTY_CACHE_HASH_ATOM(atom,obj,pobj)                               \
>-    PROPERTY_CACHE_HASH((jsuword)(atom) >> 2, OBJ_SHAPE(obj))
>-
> /*
>  * Property cache value capability macros.
>  */
>@@ -282,8 +279,6 @@ typedef struct JSPropertyCache {
>     uint32              noprotos;       /* resolve-returned non-proto pobj */
>     uint32              longchains;     /* overlong scope and/or proto chain */
>     uint32              recycles;       /* cache entries recycled by fills */
>-    uint32              pcrecycles;     /* pc-keyed entries recycled by atom-
>-                                           keyed fills */
>     uint32              tests;          /* cache probes */
>     uint32              pchits;         /* fast-path polymorphic op hits */
>     uint32              protopchits;    /* pchits hitting immediate prototype */
>@@ -297,8 +292,8 @@ typedef struct JSPropertyCache {
>     uint32              slotchanges;    /* clasp->reserveSlots result variance-
>                                            induced slot changes */
>     uint32              setmisses;      /* JSOP_SET{NAME,PROP} total misses */
>-    uint32              idmisses;       /* slow-path key id == atom misses */
>-    uint32              komisses;       /* slow-path key object misses */
>+    uint32              kpcmisses;      /* slow-path key pc misses */
>+    uint32              kshmisses;      /* slow-path key shape misses */
>     uint32              vcmisses;       /* value capability misses */
>     uint32              misses;         /* cache misses */
>     uint32              flushes;        /* cache flushes */
>@@ -380,7 +375,6 @@ js_FillPropertyCache(JSContext *cx, JSOb
>         if (entry->kpc == pc && entry->kshape == kshape_) {                   \
>             JSObject *tmp_;                                                   \
>             pobj = obj;                                                       \
>-            JS_ASSERT(PCVCAP_TAG(entry->vcap) <= 1);                          \
>             if (PCVCAP_TAG(entry->vcap) == 1 &&                               \
>                 (tmp_ = OBJ_GET_PROTO(cx, pobj)) != NULL &&                   \
>                 OBJ_IS_NATIVE(tmp_)) {                                        \
>@@ -395,7 +389,7 @@ js_FillPropertyCache(JSContext *cx, JSOb
>                 break;                                                        \
>             }                                                                 \
>         }                                                                     \
>-        atom = js_FullTestPropertyCache(cx, pc, &obj, &pobj, &entry);         \
>+        atom = js_FullTestPropertyCache(cx, pc, &obj, &pobj, entry);          \
>         if (atom)                                                             \
>             PCMETER(cache_->misses++);                                        \
>     } while (0)
>@@ -403,7 +397,7 @@ js_FillPropertyCache(JSContext *cx, JSOb
> extern JS_REQUIRES_STACK JSAtom *
> js_FullTestPropertyCache(JSContext *cx, jsbytecode *pc,
>                          JSObject **objp, JSObject **pobjp,
>-                         JSPropCacheEntry **entryp);
>+                         JSPropCacheEntry *entry);
> 
> /* The property cache does not need a destructor. */
> #define js_FinishPropertyCache(cache) ((void) 0)
>diff --git a/js/src/jsscope.cpp b/js/src/jsscope.cpp
>--- a/js/src/jsscope.cpp
>+++ b/js/src/jsscope.cpp
>@@ -110,8 +110,12 @@ js_GetMutableScope(JSContext *cx, JSObje
> static void
> InitMinimalScope(JSContext *cx, JSScope *scope)
> {
>-    js_LeaveTraceIfGlobalObject(cx, scope->object);
>-    scope->shape = 0;
>+    JSObject *obj = scope->object;
>+    js_LeaveTraceIfGlobalObject(cx, obj);
>+
>+    JSObject *proto = OBJ_GET_PROTO(cx, obj);
>+    scope->shape = (proto && OBJ_IS_NATIVE(proto)) ? OBJ_SHAPE(proto) : 0;
>+
>     scope->hashShift = JS_DHASH_BITS - MIN_SCOPE_SIZE_LOG2;
>     scope->entryCount = scope->removedCount = 0;
>     scope->table = NULL;
>@@ -1018,7 +1022,7 @@ js_AddScopeProperty(JSContext *cx, JSSco
>                     uintN attrs, uintN flags, intN shortid)
> {
>     JSScopeProperty **spp, *sprop, *overwriting, **spvec, **spp2, child;
>-    uint32 size, splen, i;
>+    uintN size, splen, i;
>     int change;
>     JSTempValueRooter tvr;
> 
>@@ -1198,14 +1202,16 @@ js_AddScopeProperty(JSContext *cx, JSSco
>                      * sprop, while the former simply tests whether sprop->id
>                      * is bound in scope.
>                      */
>-                    if (!SCOPE_GET_PROPERTY(scope, sprop->id))
>-                        continue;
>+                    if (SCOPE_GET_PROPERTY(scope, sprop->id)) {
>+                        JS_ASSERT(sprop != overwriting);
>+                        spvec[--i] = sprop;
>+                    }
>+                    sprop = sprop->parent;
>+                } while (i != 0);
> 
>-                    JS_ASSERT(sprop != overwriting);
>-                    JS_ASSERT(i != 0);
>-                    spvec[--i] = sprop;
>-                } while ((sprop = sprop->parent) != NULL);
>-                JS_ASSERT(i == 0);
>+                JSObject *proto = OBJ_GET_PROTO(cx, scope->object);
>+                if (proto && OBJ_IS_NATIVE(proto))
>+                    sprop = OBJ_SCOPE(proto)->lastProp;
> 
>                 /*
>                  * Now loop forward through spvec, forking the property tree
>diff --git a/js/src/jstracer.cpp b/js/src/jstracer.cpp
>--- a/js/src/jstracer.cpp
>+++ b/js/src/jstracer.cpp
>@@ -7409,33 +7409,17 @@ TraceRecorder::test_property_cache(JSObj
>     JS_ASSERT(cx->requestDepth);
> #endif
> 
>-    // Emit guard(s), common code for both hit and miss cases.
>-    // Check for first-level cache hit and guard on kshape if possible.
>-    // Otherwise guard on key object exact match.
>-    if (PCVCAP_TAG(entry->vcap) <= 1) {
>-        if (aobj != globalObj) {
>-            LIns* shape_ins = addName(lir->insLoad(LIR_ld, map_ins, offsetof(JSScope, shape)),
>-                                      "shape");
>-            guard(true, addName(lir->ins2i(LIR_eq, shape_ins, entry->kshape), "guard(kshape)(test_property_cache)"),
>-                  BRANCH_EXIT);
>-        }
>-    } else {
>-#ifdef DEBUG
>-        JSOp op = js_GetOpcode(cx, cx->fp->script, pc);
>-        JSAtom *pcatom;
>-        if (op == JSOP_LENGTH) {
>-            pcatom = cx->runtime->atomState.lengthAtom;
>-        } else {
>-            ptrdiff_t pcoff = (JOF_TYPE(js_CodeSpec[op].format) == JOF_SLOTATOM) ? SLOTNO_LEN : 0;
>-            GET_ATOM_FROM_BYTECODE(cx->fp->script, pc, pcoff, pcatom);
>-        }
>-        JS_ASSERT(entry->kpc == (jsbytecode *) pcatom);
>-        JS_ASSERT(entry->kshape == jsuword(aobj));
>-#endif
>-        if (aobj != globalObj && !obj_ins->isconstp()) {
>-            guard(true, addName(lir->ins2i(LIR_eq, obj_ins, entry->kshape), "guard(kobj)"),
>-                  BRANCH_EXIT);
>-        }
>+    /*
>+     * Guard on the shape of the directly accessed native object, unless it's
>+     * the global object whose shape can't change on trace.
>+     */
>+    if (aobj != globalObj) {
>+        LIns* shape_ins = addName(lir->insLoad(LIR_ld, map_ins, offsetof(JSScope, shape)),
>+                                  "shape");
>+        guard(true,
>+              addName(lir->ins2i(LIR_eq, shape_ins, entry->kshape),
>+                      "guard(kshape)(test_property_cache)"),
>+              BRANCH_EXIT);
>     }
> 
>     // For any hit that goes up the scope and/or proto chains, we will need to
(In reply to comment #26)
> 
> With these changes neither atom no pcoff are no longer used in
> js_FillPropertyCache plus the only use of khash is to calculate entry.

Sorry for overciting - this is the only comment for the patch, r+ with that fixed.
Attachment #385139 - Attachment is obsolete: true
Attachment #385159 - Flags: review?(igor)
Attachment #385139 - Flags: review?(igor)
Attachment #385159 - Flags: review?(igor) → review+
http://hg.mozilla.org/tracemonkey/rev/dc1593de081a

/be
Whiteboard: fixed-in-tracemonkey
Blocks: 499866
So I'm trying to understand this fix...  We set the shape of an empty scope to the proto's shape...  But what happens if the proto is then reshaped?  Shouldn't we be updating the shape of the empty scope at that point?
(In reply to comment #30)
> So I'm trying to understand this fix...  We set the shape of an empty scope to
> the proto's shape...  But what happens if the proto is then reshaped? 
> Shouldn't we be updating the shape of the empty scope at that point?

Certainly not. There could be 1e6 delegating objects to update.

The problem we solve this way is foreshadowing. Normal (aft-)shadowing is handled by js_PurgeScopeChain calls.

/be
Attachment #385210 - Flags: review?(jorendorff)
Attachment #385210 - Flags: review?(jorendorff) → review+
Depends on: 500528
Backed out:

http://hg.mozilla.org/tracemonkey/rev/f43488dbb065

/be
Whiteboard: fixed-in-tracemonkey
Much C++ cleanup here, though, so drive-by reviews welcome.

/be
Comment on attachment 385528 [details] [diff] [review]
work in progress, no substantive change yet

Bugzilla interdiff against the "not so minimal..." patch works.

This patch also has the virtue of moving js_MakeScopeShapeUnique into jsscope.cpp where it is used by the specific ...ShapeChange methods of JSScope, which are its only callers now. This is the first step toward getting rid of random uniqification by remembering scope shapes in a shape state transition graph.

/be
(In reply to comment #29)
> http://hg.mozilla.org/tracemonkey/rev/dc1593de081a

caused NS_ERROR_DOM_SECURITY_ERR - browser only JavaScript Tests

js1_5/extensions/regress-369696-02.js
js1_6/extensions/regress-312385-01.js
js1_6/extensions/regress-455464-01.js
js1_6/extensions/regress-455464-02.js
js1_6/extensions/regress-455464-03.js
js1_6/extensions/regress-455464-04.js
js1_6/extensions/regress-475144.js
js1_7/extensions/regress-455982-01.js
js1_7/extensions/regress-455982-02.js
js1_7/regress/regress-484524-02.js
js1_8/extensions/regress-452476.js
js1_8/extensions/regress-476427.js
js1_8_1/trace/trace-test.js
Is the plan still to make shape cover the prototype's shape? This seems like a great idea.

I think it should also cover aspects of the JSClass that can complicate property access, like JSClass.resolve and JSClass.getProperty. This would support tracing more property access cases with fewer guards, and I agree with bz that an unusual JSClass probably strongly predicts a latent static type in the code.
It bothers me that "empty" objects share their prototype's shape and scope. I wish they could (instead) all share an empty scope whose shape includes the prototype's shape. To show why, here are some docs I wrote about shapes:

> Native objects have a shape, a 24-bit unsigned integer such that
>
>    * An object that has no own properties has the same shape as
>      its prototype.
>    * If two objects each have at least one own property, and they
>      have the same shape (OBJ_SHAPE(x) == OBJ_SHAPE(y)), then they
>      have the same property specs. [...] The two objects have the
>      same layout.
>    * If an object's shape remains the same for some period of time,
>      then it has the same property specs throughout that time.
https://developer.mozilla.org/En/SpiderMonkey/Internals/Property_cache#Shape

These are the axioms of shape.

The odd "If two objects each have at least one own property" precondition on the second axiom reads to me like a trap for unwary optimizations. ...In fact, see bug 501690.
(In reply to comment #38)
> Is the plan still to make shape cover the prototype's shape? This seems like a
> great idea.
> 
> I think it should also cover aspects of the JSClass that can complicate
> property access, like JSClass.resolve and JSClass.getProperty. This would
> support tracing more property access cases with fewer guards, and I agree with
> bz that an unusual JSClass probably strongly predicts a latent static type in
> the code.

Agree with all of this, have also (tangentially, just looking for a sanity check here before moving on to filing a bugh) been pondering whether it's worth having JSTraceType::TT_NATIVE_OBJECT and JSTraceType::TT_CUSTOM_OBJECT (naming, gack) so that we can make useful don't-guard assumptions about most object-valued values while tracing -- that is, assuming we don't support chimeric objects with class as derived from a JS_GetClass(cx, obj) call and with a custom map with goofy getProperty/etc. hooks.  Do we?  How many (few) people would we break if we said js_ObjectClass is off-limits when creating custom objects?
Stitched together from comment 3 and comment 4.

/be
Re: comment 40, splitting object trace-types just makes for more expensive guards. Splitting function from object won because of scope branding already computing shapes based on method values. This says we want different shapes for different kinds of objects, as jorendorff is urging. Not different trace types.

/be
>    * An object that has no own properties has the same shape as
>      its prototype.

This isn't true. I have weakened the statement in the wiki page.
Brendan, I think the JSScope methods are worth landing right away, under a separate bug. I can shepherd that in if you don't have time.

One nit: Can we rename JSScope::getProperty to JSScope::find? We already have JSClass::getProperty and JSObjectOps::getProperty and I think we want JSObject::getProperty too. All those are concerned with [[Get]]ting properties as opposed to merely getting them.

One way would be:
  hasProperty    => scope->has(sprop)
  getProperty    => scope->find(id)
  addProperty    => scope->add(cx, id, getter, setter, ...)
  removeProperty => scope->remove(sprop)

JSScope::search can be private.
Attached patch updated cleanup patch (obsolete) — Splinter Review
Jason, if you would spin this out and land it, I would be very grateful -- I'm at IEEE CSF with spotty network rest of this week.

/be
Attachment #385528 - Attachment is obsolete: true
Attachment #387505 - Flags: review?(jorendorff)
Attachment #385210 - Attachment is obsolete: true
Attachment #385159 - Attachment is obsolete: true
Hmm, but landing the cleanup patch will break foreshadowing cases such as the one I ran into with gmail (if still present). Maybe I should just finish the "guts" patch and get it attached here.

/be
Filed bug 503343 with a patch.

The patch there omits the substantive changes (in property cache code, mainly, but I also left out some changes to the timing of shape changes, in the interest of quickly landing something with no semantic changes).

You can subtract that patch from your current working patch using interdiff, but be sure to unbitrot your patch first, at least up to bfd1f4324a2f, or interdiff will complain.
Depends on: 503343
Comment on attachment 387505 [details] [diff] [review]
updated cleanup patch

Working on new consolidated patch.

/be
Attachment #387505 - Attachment is obsolete: true
Attachment #387505 - Flags: review?(jorendorff)
Blocks: 503860
Bug 500528 shows how to make two similar objects a -> b -> x and aa -> bb -> x
such that a.m is a plain old method inherited from x, but aa.m is not.

Once these objects exist, if we want to create a new, teleport-capable property
cache entry for `a.m`, it must be able to distinguish between `a` and `aa`
somehow.  But these two JSObjects differ in only three ways.

  1. They have different addresses.
  2. They have different scopes (but the same shape).
  3. Their PROTO slots point to different prototypes.

Otherwise they are identical.

The current property cache uses difference #1. But no matter what combination
of the three differences we use, I don't see how we can also satisfy bzbarsky's
use case-- short of actually walking the prototype chain at run time.

In short, the problem is overconstrained. Dropping teleportation would be
enough to fix that.

Another constraint we could drop is "once these objects exist". We could say,
"yes, well, once such prototype chains exist, we are in a circumstance where we
know we can't teleport". But we would have to detect that circumstance and
permanently turn off teleportation for all affected objects. I see no
reasonable way to do this, given that the key operation of modifying a
prototype is (I think) rather common with XPConnect resolve hooks.

So I vote for dropping teleportation--though of course it would be nice if
Brendan already has a patch that disproves all of the above. :)
There's no need to drop teleportation for bz's testcase:

function Ball() {
}
Ball.prototype.doCollide = function(b) {
}
function DHTMLBall() {
}
DHTMLBall.prototype.__proto__ = Ball.prototype;

const N = 128;
var balls = [];
for (var i=0; i<N; i++) {
    balls[i] = new DHTMLBall();
}

for (i=0; i<N; i++) {
    for (var j=i+1; j<N; j++) {
        balls[i].doCollide();
    }
}

Same proto and grand-proto for all N objects.

Jorendorff: watch Galaxy Quest again, note Sam Rockwell's character's change toward the end ;-). Also Tech Sgt. Chen during the entire digital conveyor scene (teleportation!).

/be
Yeah using difference #3 from comment 49 would work for my testcase.  I suspect in practice, it would be quite rare to have objects of same shape but different protos that could meaningfully share a propcache entry, so using difference 3 is the way to go (as Brendan has been saying).
Attached patch work in progress, not done yet (obsolete) — Splinter Review
Tried several approaches to making property tree lineages descend by prototype. Still not happy with tradeoffs, but close -- aim to finish after sleep.

/be
Driving by to give some numbers.
Tried the most recent patch just now. It fixes the problem I was seeing in deltablue. It now spends *most* of its time in native.

before:
Richards: 1957
DeltaBlue: 33
Crypto: 742
RayTrace: 177
EarleyBoyer: 269
----
Score: 297

after:
Richards: 1947
DeltaBlue: 1593
Crypto: 686
RayTrace: 130
EarleyBoyer: 282
----
Score: 601
Depends on: 488731
Blocks: 479090
Flags: blocking1.9.2+
Priority: P1 → P2
Eager to get those numbers in a tree somewhere, I admit. :-)

Brendan: will this patch give us the ability to reason about "super-shapes" efficiently?  Thinking about cases where we encounter objects that have had irrelevant properties added to the case for which we're guarding, I guess:

o1 = {a:1, b:2}
o2 = {a:10, b:20, c:30}

function loop(obj) {
  obj.a = obj.b;
}

loop(o1);
loop(o2); // can we tell that this is the same shape for our purposes?

What about:

o1 = {a:1, b:2, z:5}
o2 = {a:10, b:20, c:30}

I think those have identical layouts as far as |loop| is concerned, no?
Shaver: yes, I'm working on the whole enchilada. But bug 471214 is ahead in the q.

/be
Blocks: 508716
Depends on: 511591
Depends on: 473228
this is still marked blocking192. how's it looking?
would be nice, but doesn't seem like it will make it. can't block, I don't think.
Flags: blocking1.9.2+ → blocking1.9.2-
I'm working toward this but with shape-related topcrashes (now possibly with fix in hand, see bug 524743 -- field-testing required to know for sure) I can't see blocking on this. With stability and more complete understanding of shapes we can get this bug fixed, possibly even in 1.9.2.

/be
Attached patch half-done patch (obsolete) — Splinter Review
The big change to shape a derived object's first property based on its prototype's shape is not done, but I'm working on it.

/be
Attachment #388429 - Attachment is obsolete: true
So as expected, this partial patch botches the foreshadowing test:

$ ./Darwin_DBG.OBJ/js -j fore.js
fore.js:23: TypeError: Assertion failed: got "peep", expected "moo"

/be
No longer blocks: 508716
Target Milestone: mozilla1.9.2a1 → mozilla1.9.3a2
Attached patch updated incomplete patch (obsolete) — Splinter Review
Attachment #414595 - Attachment is obsolete: true
No longer depends on: 511591
(In reply to comment #61)
> Created an attachment (id=428357) [details]
> updated incomplete patch

Drive-by fuzzing:

for (var a = 0; a < 1; a++) {
  x = a
}
(function() {
  for (b = 0; b < 1; b)
  {
      x.__proto__ = []
  }
})()



$ ./js-dbg-32-tm-darwin 
js> for (var a = 0; a < 1; a++) {
  x = a
}
0
js> (function() {
  for (b = 0; b < 1; b)
  {
      x.__proto__ = []
  }
})()
Assertion failure: PCVCAP_TAG(entry->vcap) <= 1, at ../jsops.cpp:1714
Bus error
The patch is incomplete, so not ready for testing, but thanks!

/be
Attached patch working patch (obsolete) — Splinter Review
I am doing more testing, but this passes trace-tests and now js -j foreshadow.js is happy (so is bz, I bet ;-).

/be
Attachment #428357 - Attachment is obsolete: true
Attached patch working better patch (obsolete) — Splinter Review
Removed bogus assertions (we can't assert that kids are not marked if the node being swept is not marked -- the kids could be at virtual-higher [in the arena pool heap of sprops] addresses, so not yet visited to have their mark bits cleared).

I will split this up a bit, it mixes concerns.

/be
Attachment #429644 - Attachment is obsolete: true
TMFLAGS=stats js -j bzball.js (first attachment) out diff, unpatched shell on left as usual:

1,2c1,2
< recorder: started(37), aborted(35), completed(34), different header(0), trees trashed(2), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), merged loop exits(32), unstableInnerCalls(0), blacklisted(1)
< monitor: exits(4531), timeouts(0), type mismatch(0), triggered(4531), global mismatch(0), flushed(0)
---
> recorder: started(3), aborted(0), completed(3), different header(0), trees trashed(3), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), merged loop exits(0), unstableInnerCalls(0), blacklisted(0)
> monitor: exits(4), timeouts(0), type mismatch(0), triggered(4), global mismatch(0), flushed(0)

Victory over aborts. Not sure whether 4 exits is ideal but that is for another bug, if it is a bug.

/be
> Not sure whether 4 exits is ideal

It is, given HOTLOOP==2.  One exit for the setup loop, two for the inner loop while the outer loop is running its first two iterations, and then one for the outer loop running through the end of the testcase.
This will make switching to thread-local property trees a snap.

/be
Attachment #429647 - Attachment is obsolete: true
Attachment #429656 - Flags: review?(jorendorff)
Plus, C++ decl-at-first-dominating-set style for PT::{insert,remove,get}Child, and some manual PGO to make certain prefetched short-then-clause/early-return cases the fast (non-chunky-kid, no-kids-yet-when-inserting) ones.

/be
Attachment #429656 - Attachment is obsolete: true
Attachment #429669 - Flags: review?(jorendorff)
Attachment #429656 - Flags: review?(jorendorff)
To review, the jsops.cpp patch should be applied and then your best diff -w used; and the jspropertytree.cpp code should be read through as a new whole file, with the old jsscope.cpp code in another window for checking.

/be
Found by jstestbrowser, all praise bc! I should have seen this.

Because the root ply nodes are hashed by (firstProp, protoShape), not just on the node pointer as in below-root-ply hashes, we can't sweep root-ply nodes via the heap -- we have no idea of what protoShape to use to find them. So I had thought to sweep the propertyTree.hash table afterward, but of course whole arenas are freed to the malloc pool in each iteration of the loop over the node arena pool. D'oh!

Easy enough to fix with some refactoring to subroutine OrphanNodeKids. This required making RemoveNodeIfDead a friend of JSScopeProperty so it can test sprop->marked(). Jason, is there a better way?

/be
Attachment #429669 - Attachment is obsolete: true
Attachment #429678 - Flags: review?(jorendorff)
Attachment #429669 - Flags: review?(jorendorff)
Attached patch fix self-deadlock (obsolete) — Splinter Review
Self-lock was via checkMallocGCPressure from cx->calloc from NewPropTreeKidChunk while its ancestor-caller PT::getChild holds the GC lock.

This means property tree nodes do not contribute to memory pressure. That's how it has been. I shouldn't have tried to fix that unrelated bug at the last minute in writing this bug's patch.

Our memory pressure feedback system is pretty broken. We should use OS-based RSS measurements instead of time-killing and incomplete in-process bean-counting, a la bug 506125.

This patch passes jsreftests (aka `make jstestbrowser` -- all hail bc again!).

/be
Attachment #429678 - Attachment is obsolete: true
Attachment #429682 - Flags: review?(jorendorff)
Attachment #429678 - Flags: review?(jorendorff)
Attachment #429682 - Attachment is obsolete: true
Attachment #429684 - Flags: review?(jorendorff)
Attachment #429682 - Flags: review?(jorendorff)
(In reply to comment #72)
> This means property tree nodes do not contribute to memory pressure. That's how
> it has been.

Switching to use cx->calloc and cx->free without selflock is easy once the property tree is moved from JSRuntime to JSThreadData, and free of all locking (including rt->gcLock usage). This should be done for bug 511591.

/be
Blocks: 511591
Blocks: 549683
Attached patch rebase post-538307-landing (obsolete) — Splinter Review
Attachment #429684 - Attachment is obsolete: true
Attachment #430197 - Flags: review?(jorendorff)
Attachment #429684 - Flags: review?(jorendorff)
Target Milestone: mozilla1.9.3a2 → mozilla1.9.3a3
Now that I've reparemeterized from rt to cx in the PropertyTree::*Child methods, I will try to push JS_ReportOutOfMemory (after JS_UNLOCK_GC as needed) down to where OOM is first detected. Last patch is schizo on this point. Should be able to get rid of a goto out_of_memory in getChild, at least.

/be
Attachment #430197 - Attachment is obsolete: true
Attachment #430240 - Flags: review?(jorendorff)
Attachment #430197 - Flags: review?(jorendorff)
The patch regresses this test case:

var a = {x: 'a'},
    b1 = Object.create(a),
    c1 = Object.create(b1),
    b2 = Object.create(a),
    c2 = Object.create(b2);

b2.x = 'b';

var s = '';
for each (var obj in [c1, c2])
    s += obj.x;
assertEq(s, 'ab');
Déjà vu: comment 78 is roughly the same as bug 500528. I don't think I found it any faster this time. Why on earth didn't I check in a test for that?

I'll certainly add comment 78 to js/src/tests tomorrow morning.
Pushed http://hg.mozilla.org/tracemonkey/rev/4424c2547d79 .

This optimization is no good because it still doesn't make shape cover
prototype shape, except initially. Prototypes can change shape while their
descendents remain the same shape.

Therefore with the patch we fail to handle foreshadowing. (fore.js must not test
what it claims to be testing, but I haven't looked at it carefully.)

The cleanups around JSPropertyTree are great and (for the second time in this
bug :-) should go in if possible. A few comments on those (and maybe a
pre-existing bug, not sure):

In jstracer.cpp, TraceRecorder::hasMethod:
>     if (OBJ_IS_NATIVE(pobj)) {
>         JSScope* scope = OBJ_SCOPE(pobj);
>         JSScopeProperty* sprop = (JSScopeProperty*) prop;
> 
>         [...]
>     }

If pobj isn't native, shouldn't this set status = RECORD_STOP? (Pre-existing
bug, if so.)

>-    if (js_MatchPropertyCacheShape(cx, pobj, PCVCAP_SHAPE(vcap))) {
>+    if (entry->matchShape(cx, pobj, PCVCAP_SHAPE(vcap))) {

This is static, so JSPropCacheEntry::matchShape would be less misleading.
Same thing in the other two places where it's used.

In js_PurgePropertyCacheForScript:
>     for (entry = cache->table; entry < cache->table + PROPERTY_CACHE_SIZE;
>          entry++) {

Micro-nit in existing code: This fits on one line now.

In JSScope:
>+    void setSealed()            { JS_ASSERT(!isSharedEmpty()); flags |= SEALED; }
>+
>+    void seal(JSContext *cx) {
>+        JS_ASSERT(!sealed());
>+        sealingShapeChange(cx);
>+        setSealed();
>+    }

This is the only use of setSealed. Let's get rid of it and just inline
those two lines here.

Likewise sealingShapeChange.

>     void setBranded()           { flags |= BRANDED; }
> 
>+    bool brand(JSContext *cx, uint32 slot, jsval v) {
>+        JS_ASSERT(!branded());
>+        brandingShapeChange(cx, slot, v);
>+        if (js_IsPropertyCacheDisabled(cx))  // check for rt->shapeGen overflow
>+            return false;
>+        setBranded();
>+        return true;
>+    }

Likewise, these are the only uses of setBranded and brandingShapeChange.

(Good thing, too, as it would be a silly bug to call any of setSealed,
sealingShapeChange, setBranded, brandingShapeChange without going through
these code paths.)

In jsscope.h:
>+class js::PropertyTree;

GCC warns:
../jsscope.h:571: warning: declaration ‘class js::PropertyTree’ does not declare anything

I think to forward-declare this class you have to say,

  namespace js {
      class PropertyTree;
  }

In struct JSScopeProperty, private flags enum:
>+        /* Root pseudo-property with int-tagged shape id, not mapped by scope. */
>+        PROTO_ROOT      = 0x40

This bit isn't used anywhere.

I think it would be better to make a separate class, PropertyTreeNode, to
represent proto roots and the kids-related bits of each JSScopeProperty.
(That is, a JSScopeProperty would have a PropertyTreeNode member.)
Less pseudo.
Attachment #430240 - Flags: review?(jorendorff) → review-
(In reply to comment #80)
> Pushed http://hg.mozilla.org/tracemonkey/rev/4424c2547d79 .

Thanks!

> This optimization is no good because it still doesn't make shape cover
> prototype shape, except initially. Prototypes can change shape while their
> descendents remain the same shape.

D'oh, so close (see next patch). Thanks for testing this case.

> Therefore with the patch we fail to handle foreshadowing. (fore.js must not
> test
> what it claims to be testing, but I haven't looked at it carefully.)

Foreshadowing as tested by fore.js is not what your test exercises. Your test shadows x in b2 after x is bound in a.

But there's another before/after relation to consider, and this is what you must have been thinking I meant: the order in which the property cache is filled with respect to the shadowing property being bound.

This case is so obvious, and I was so focused on the foreshadowing case, that I missed it! I intended to make shapes differ by their initial emptyScope shapes.

Using the empty shape would seem like an easy patch, except that the scopes for b1 and b2 are scope(a).emptyScope, and (this is a bug, it should bite any grand-proto-deep hierarchy with two sibling protos -- two aunts or uncles) b1's emptyScope, which is shared by c1, is scope(a).emptyScope.emptyScope -- so it is also c2's emptyScope!

Fixing this emptyScope sharing bug gets a pass from your testcase. The fix is in JSScope::canProvideEmptyScope (I used the mighty if statement, again instead of introducing virtual method overhead). Can you contrive a test for it that does not depend on this bug's patch?

I renamed protoShape to emptyShape as needed.

/be
Attachment #430240 - Attachment is obsolete: true
Attachment #431223 - Flags: review?(jorendorff)
Attached patch plain diff of last two patches (obsolete) — Splinter Review
It's going to be Thursday before I can look at this in the detail it merits...

It's unclear to me exactly *why* this kind of scope sharing is a bug. It seems like it depends on how you aim to fix the shape guarantees, which is also not quite clear to me yet.

Perhaps you're aiming for shape to cover prototype identity. If so, then emptyScope.emptyScope sharing is clearly a bug; but bug 503860 comment 9 seems to say you were still trying to avoid that as of two days ago.

I am probably just missing something obvious, in which case I'll find it Thursday. :-)
(In reply to comment #84)
> It's unclear to me exactly *why* this kind of scope sharing is a bug. It seems
> like it depends on how you aim to fix the shape guarantees, which is also not
> quite clear to me yet.

You're right, it's not a bug without the essential change to fix this bug, the change to index tag > 1 hits by shape rather than by direct object identity. If one makes two prototypes with the same empty shape (from their proto's empty scope), then makes objects delegating to these two protos (sharing their shared empty scope, the emptyScope of their grandProto's emptyScope), then everything works correctly:

* For tag == 0 hits there's no difference under the basic layout guarantee.

* For tag == 1 the basic layout guarantee again suffices:
** same slot for a new property x in each prototype results in the correct value being fetched from that slot;
** ditto same getter or setter, attrs, etc; and different g/s/&c mean different proto-shapes;
** proto-shapes differ due to branding for method rather than sprop hits.

* For tag >= 2, different direct object identities distinguish the paths to the grand-or-greater-proto.

> Perhaps you're aiming for shape to cover prototype identity.

Yes!

> If so, then emptyScope.emptyScope sharing is clearly a bug; but bug 503860 comment 9 seems
> to say you were still trying to avoid that as of two days ago.

I'm defeated. You? If there's no alternative then we should proceed to collapse scopes into sprops in single-threaded proptrees. This will require JSObjectMap changes.

> I am probably just missing something obvious, in which case I'll find it
> Thursday. :-)

No, you're all over it, if not ahead of the curve! ;-)

/be
We must distinguish the paths to a grand-or-greater proto. The property cache as designed (i.e., without this bug's patch) did so by indexing by direct object identity. This bug's patch uses direct object shape as key instead, but this must mean shape depends on prototype identity.

If not, then a prototype object could be mutated without changes to its delegating objects shapes or to a foreshadowed prototype property that is yet to be created (of one that exists and is reshaped when shadowed, but then the cache fills happen).

It's annoying (to me anyway, viscerally) that we have to pessimize for the hard case, but without either:

1. a way to broadcast changes to all delegating objects or their cache entries and trace-JITted guards; or

2. a way to infer "super-shapes" that can be shared among all isomorphic trees of proto-linked objects with the same paths up from leave objects -- and with transitions of whole trees to new shapes when any path difference emerges; then

I don't see an alternative.

/be
(In reply to comment #86)
> We must distinguish the paths to a grand-or-greater proto. The property cache
> as designed (i.e., without this bug's patch) did so by indexing by direct
> object identity. This bug's patch uses direct object shape as key instead, but
> this must mean shape depends on prototype identity.

This bug is worth fixing (what's more) in order to sacrifice space (less sharing via the property tree) for speed for benchmarks including bzball.js.

> If not, then a prototype object could be mutated without changes to its
> delegating objects shapes or to a foreshadowed prototype property that is yet
> to be created (of

s/of/or/

> one that exists and is reshaped when shadowed, but then the
> cache fills happen).

/be
One idea, an oldie, is to make property trees start from Object.prototype and keep going, tacking on across prototype boundaries. Currently we have for this testcase (set JS_PROPTREE_DUMPFILE=/tmp/ptd, look in /tmp/ptd.1):

var o = {p: 42};
var q = Object.create(o, {r: {value: 43}});
print(q.p, q.r);
gc();

these lines among others in the dumpfile:

...
id "r" g/s 0x0/0x0 slot 2 attrs 6 (readonly permanent) flags 0 shortid 0
id "p" g/s 0x0/0x0 slot 2 attrs 1 (enumerate) flags 0 shortid 0
...

No indentation, and the "r" line happens to come before the "p" one, so these are peer tree nodes under different emptyShapes (I will add proper emptyShape dumping to the output).

id "p" g/s 0x0/0x0 slot 2 attrs 1 (enumerate) flags 0 shortid 0
 id "r" g/s 0x0/0x0 slot 2 attrs 6 (readonly permanent) flags 0 shortid 0

(This leaves out the emptyShape for q, perhaps that needs to be a node too.)

But what if we extended the tree from p's node to r's? Then for-in and similar code would get the proto-properties at time of delegating object creation or __proto__ setting for free. That could be good (have to do shadowing). We'd get super-shapes, the shapes the cover prototype chains this way -- except:

when someone then mutates o, what happens to q's shape? We lose the desired O(1) complexity for each step in the construction process. Prototypes are extended after instances have been created, we see this in real web JS (measuring a large sample would be good).

This could be worth pursuing, but it would want this bug fixed, and then single-threaded proptrees and native objects. So I'm still wanting to land this patch.

Comments welcome, we could take them to a new bug if the above looks promising.

/be
Mostly for my own education, I decided to try to prove the prototype-identity version correct. Simplifying drastically, we assume the Basic Layout Guarantee extended to include prototype identity:

  (Notation: I'll use o,p,q for objects; x,y,z for property names; s(o)
   means the shape of o; find(o, x) is the object that the property for
   the expression o.x; and up(o, N) is the object N steps up the proto
   chain from o.)

  BLG2. Basic Layout Guarantee V2: 
      If s(o) == t(p), 
      then o and p have the same property specs and o.proto == p.proto.

Now, we want to show that this assumption (plus assumptions we will introduce about how shapes are purged by the engine) implies a correspondingly changed version of the prototype chain shadowing guarantee:

  PCSG2. Proto Chain Shadowing Guarantee V2:
      If at time t0, find(o, x) == up(o, N),
      let s_o = s(o) at time t0 and         (shape of orig object)
          s_r = s(up(o, N)) at time t0,     (shape of orig target on chain)

      then, at any later time t1,
          if s(p) == s_o and s(up(p, N)) == s_r,
          then find(p, x) == up(p, N)

      (i.e., if base object shape and target object shapes match those at
             first lookup, then we can do the lookup in the same way)

For N = 0, this reduces to BLG2. For N = 1, BLG2 requires that p doesn't have a property x; up(p, 1) == up(o, 1) at time t0; and thus also that up(p, 1) has a property x, so find(p, x) = up(p, 1). For N > 1, we proceed as follows:

At time t0 we have this, where -> means a link to the prototype.

  object o    ->   object u  -> ... -> object r
  shape s_o        shape s_u           shape s_r

And we know that find(o, x) == r, which in turn also implies that find(q, x) == r for all q on this chain.

At time t1 we have this:

  object p    ->   object u  -> ... -> object r
  shape s_o        shape s_u'          shape s_r

We need to show that find(p, x) == r. We will go by contradiction. If find(p, x) != t, then either (a) find(t, x) != r (i.e., r has no property x), or (b) there exists some q != r on this chain such that find(q, x) == q (i.e., some other element on this chain has an x). (a) is clearly impossible because it violates BLG2, so we only need to worry about (b). And clearly we can't have (b) happen for the base object p, because that would also violate BLG2. So we only need to worry about (b) happening for the other objects on the chain. WLOG, consider u. We just need to show that find(u, x) != u to complete the proof.

Note that all the elements on the prototype chain must be equal at time t0 and t1, because at time t1 s(p) == s_o, so up(o, k) == up(p, k) by BLG2. But we have a problem: the shape of u may have changed between t0 and t1; in particular it may have gained a property x. We need to assume that the engine changes the shape of r if this happens. Specifically, if u gets a property that shadows x, then the shape of r will be changed. (Interestingly, it seems that the proto chain purging is exactly as strong as what is needed to complete the proof.) Hence, if the shape of r is unchanged, u must not have have had a property x added.

Well, that *seems* valid. :-) The more difficult issue is what secret back doors and corner cases are not covered by this simplified setup. At least assigning __proto__ is not a problem: we know that must change the shape of the base object, so we won't get a match.
(In reply to comment #89)

Too cool. :) There are a few small holes in the proof, easily patched:

> Note that all the elements on the prototype chain must be equal at time t0
> and t1, because at time t1 s(p) == s_o, so up(o, k) == up(p, k) by BLG2.

That's not quite true as written. For any given o, s(o) does not cover o's prototype's shape, only its identity. o's prototype's __proto__ may have been assigned.

> Well, that *seems* valid. :-) The more difficult issue is what secret back
> doors and corner cases are not covered by this simplified setup. At least
> assigning __proto__ is not a problem: we know that must change the shape of
> the base object, so we won't get a match.

The same special case occurs here -- something could assign to u.__proto__. We cope with that in js_SetProtoOrParent by regenerating shapes for all scopes along u's old prototype chain.

This is good stuff. It would be nice for the property cache docs to develop the guarantees as theorems, from axioms that describe the operational behavior of the engine...
Comparing notes with dmandelin yesterday, he pointed out that JSC compares their equivalent of shapes all the way up a proto chain, even though Structure depends on prototype identity.

This is because JSC does not purge the prototype chain by reshaping any objects holding shadowed properties. Doing so with Structure pointer identity instead of shape number equality would require forking Structure trees -- shades of the old O(n^2) middle-delete property tree code.

Again the deep proto chains of gmail (and other JS apps), combined with the fixed size of property cache entries, drove me to stick to the two-word test used for immediate prototype hits, for all cache hits. For deep hits this requires purging (past life experience with value capabilities for cache invalidation suggested the reshaping idea in the first place).

Shadowing is rare but it happens. We flag delegates to avoid purging when there is no possible observer looking through the object in which the shadowing property is being bound. But shadowing-induced purging has non-local effects, both in climbing the chain to check for shadowed ids and reshape their objects, and more significantly in terms of invalidation effects.

The trade-off is that we have faster and fixed-cost (space and time) deep-chain hit testing (read and write), but sometimes (write, rarer than read; shadowing write, even rarer) a grand-proto or greater may be reshaped, invalidating widely (higher up -> greater sharing, potentially greater use and caching).

We should study dynamic shadowing frequency in more detail.

Thanks for the proifoid, dmandelin!

/be
> This is good stuff. It would be nice for the property cache docs to develop the
> guarantees as theorems, from axioms that describe the operational behavior of
> the engine...

Interesting; dmandelin's approach reminds me of backward chaining proof strategies (start with the goal and work towards the axioms) whereas building from axioms would be more like forward chaining. I think Dave's got the right strategy-- optimizes for the high-order bits. Working backwards from the goal, you can stop at roughly any given level of abstraction where you suspect there are diminishing returns compared with the cost of pushing the proof all the way through. If you don't finish, it's not a Real Proof, of course.

Regardless, stating invariants and correctness arguments (even informal) = win.

Dave

PS If you are interested in formalizing more, though, do talk to me. Beware: \lim fun/utility --> \infty.
(In reply to comment #90)
> (In reply to comment #89)
> 
> Too cool. :) There are a few small holes in the proof, easily patched:
> 
> > Note that all the elements on the prototype chain must be equal at time t0
> > and t1, because at time t1 s(p) == s_o, so up(o, k) == up(p, k) by BLG2.
> 
> That's not quite true as written. For any given o, s(o) does not cover o's
> prototype's shape, only its identity. o's prototype's __proto__ may have been
> assigned.

Ooh, nice catch. I see that assigning u.__proto__ (e.g., for u as o's prototype) would necessarily change the shape of u, but nothing in the proof looks at the shape of u (since it could change for all sorts of uninteresting reasons anyway). So we could fix the proof either by checking u's shape as well (going back to the JSC solution of guarding on all shapes) or regenerating the shapes going up in that case as well.

> > Well, that *seems* valid. :-) The more difficult issue is what secret back
> > doors and corner cases are not covered by this simplified setup. At least
> > assigning __proto__ is not a problem: we know that must change the shape of
> > the base object, so we won't get a match.
> 
> The same special case occurs here -- something could assign to u.__proto__. We
> cope with that in js_SetProtoOrParent by regenerating shapes for all scopes
> along u's old prototype chain.
> 
> This is good stuff. It would be nice for the property cache docs to develop the
> guarantees as theorems, from axioms that describe the operational behavior of
> the engine...

Yes. The thing I wonder about is how much specialness with getters, readonly properties, and such needs to be added, and how hard that is to cover
(In reply to comment #93)
> (In reply to comment #90)
> > (In reply to comment #89)
> > 
> > Too cool. :) There are a few small holes in the proof, easily patched:
> > 
> > > Note that all the elements on the prototype chain must be equal at time t0
> > > and t1, because at time t1 s(p) == s_o, so up(o, k) == up(p, k) by BLG2.
> > 
> > That's not quite true as written. For any given o, s(o) does not cover o's
> > prototype's shape, only its identity. o's prototype's __proto__ may have been
> > assigned.
> 
> Ooh, nice catch. I see that assigning u.__proto__ (e.g., for u as o's
> prototype) would necessarily change the shape of u, but nothing in the proof
> looks at the shape of u (since it could change for all sorts of uninteresting
> reasons anyway). So we could fix the proof either by checking u's shape as well
> (going back to the JSC solution of guarding on all shapes) or regenerating the
> shapes going up in that case as well.

Setting u.__proto__ reshapes u and everything on its old chain:

        /*
         * Regenerate property cache shape ids for all of the scopes along the
         * old prototype chain to invalidate their property cache entries, in
         * case any entries were filled by looking up through obj.
         */
        JSObject *oldproto = obj;
        while (oldproto && OBJ_IS_NATIVE(oldproto)) {
            JS_LOCK_OBJ(cx, oldproto);
            JSScope *scope = OBJ_SCOPE(oldproto);
            scope->protoShapeChange(cx);
            JSObject *tmp = oldproto->getProto();
            JS_UNLOCK_OBJ(cx, oldproto);
            oldproto = tmp;
        }

/be
Comment on attachment 431223 [details] [diff] [review]
fixed emptyscope sharing, fixed to use empty shape not proto shape

Phew. Looks good!

r=me with the nits in comment 80 and a fix for shape-regenerating GC (mentioned on IRC)
Attachment #431223 - Flags: review?(jorendorff) → review+
Attached patch cope with emptyscope reshaping (obsolete) — Splinter Review
Brute force. I'll sleep on it, wanted to throw it up for re-review. Plain diff of patches coming if interdiff fails.

/be
Attachment #431223 - Attachment is obsolete: true
Attachment #431225 - Attachment is obsolete: true
Attachment #433264 - Flags: review?(jorendorff)
remember to unskip regress-503860.js
Attachment #433264 - Attachment is obsolete: true
Attachment #433527 - Flags: review?(jorendorff)
Attachment #433264 - Flags: review?(jorendorff)
Attachment #433527 - Flags: review?(jorendorff) → review+
http://hg.mozilla.org/tracemonkey/rev/ff6b54ac276d

/be
Whiteboard: fixed-in-tracemonkey
This is a followup patch to fix Linux orange tinderboxes.

/be
Attachment #434067 - Flags: review?(dvander)
Attachment #434067 - Flags: review?(dvander) → review+
Note second patch to merge downstream along with first:

http://hg.mozilla.org/tracemonkey/rev/d412747189f6

/be
FWIW on gcc-4.4.1, I don't get any strict aliasing warnings, and we pass -fno-strict-aliasing. It's not really clear why this was compiling wrong, but it clearly was. The code for |*pprev = next| was:

    mov %rdx, (%rax)

Where it should have been:

    mov %rdx, 16(%rax)
(In reply to comment #103)
> FWIW on gcc-4.4.1, I don't get any strict aliasing warnings, and we pass
> -fno-strict-aliasing. It's not really clear why this was compiling wrong, but
> it clearly was. The code for |*pprev = next| was:

s/pprev/prevp/

in the old FreeNode::remove inline method.

> 
>     mov %rdx, (%rax)
> 
> Where it should have been:
> 
>     mov %rdx, 16(%rax)

Thanks to dvander for all the help. I hope this is the only problem!

/be
Comparing TM 39531:d52333800c73 vs 39536:d412747189f6, MacOSX 10.5.8,
32-bit, gcc -g -O2, opt build:

TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      1.97x as fast     15834.1ms +/- 0.5%   8056.7ms +/- 0.3%     significant

=============================================================================

  v8:             1.97x as fast     15834.1ms +/- 0.5%   8056.7ms +/- 0.3%     significant
    crypto:       1.062x as fast      889.0ms +/- 3.7%    837.2ms +/- 0.2%     significant
    deltablue:    8.30x as fast      9288.8ms +/- 0.9%   1119.6ms +/- 0.2%     significant
    earley-boyer: *1.059x as slow*   2917.1ms +/- 0.5%   3088.1ms +/- 0.8%     significant
    raytrace:     *1.168x as slow*   1674.4ms +/- 0.5%   1956.1ms +/- 0.5%     significant
    richards:     1.009x as fast     1064.8ms +/- 0.1%   1055.7ms +/- 0.1%     significant
'Standard set' SunSpider deltas are pretty much at noise level.


TEST                   COMPARISON            FROM                 TO          

=============================================================================

** TOTAL **:           *1.002x as slow*  1046.7ms +/- 0.1%   1048.7ms +/- 0.1%

=============================================================================

  3d:                  *1.010x as slow*   170.2ms +/- 0.1%    171.9ms +/- 0.1%
    cube:              *1.004x as slow*    53.5ms +/- 0.2%     53.7ms +/- 0.2%
    morph:             *1.040x as slow*    34.4ms +/- 0.3%     35.8ms +/- 0.4%
    raytrace:          -                   82.3ms +/- 0.1%     82.4ms +/- 0.1%

  access:              *1.007x as slow*   150.6ms +/- 0.1%    151.7ms +/- 0.1%
    binary-trees:      *1.009x as slow*    27.9ms +/- 0.4%     28.1ms +/- 0.3%
    fannkuch:          *1.010x as slow*    72.5ms +/- 0.1%     73.3ms +/- 0.1%
    nbody:             *1.005x as slow*    32.5ms +/- 0.3%     32.7ms +/- 0.3%
    nsieve:            -                   17.7ms +/- 0.6%     17.6ms +/- 0.6%

  bitops:              ??                  42.9ms +/- 0.4%     43.0ms +/- 0.4%
    3bit-bits-in-byte: -                    1.9ms +/- 2.6%      1.9ms +/- 3.7%
    bits-in-byte:      *1.014x as slow*    11.9ms +/- 0.7%     12.1ms +/- 0.4%
    bitwise-and:       -                    2.7ms +/- 3.3%      2.7ms +/- 3.5%
    nsieve-bits:       ??                  26.4ms +/- 0.4%     26.4ms +/- 0.6%

  controlflow:         1.017x as fast      10.4ms +/- 0.9%     10.2ms +/- 0.8%
    recursive:         1.017x as fast      10.4ms +/- 0.9%     10.2ms +/- 0.8%

  crypto:              -                   66.6ms +/- 0.6%     66.3ms +/- 0.5%
    aes:               -                   38.5ms +/- 0.9%     38.4ms +/- 0.9%
    md5:               1.010x as fast      18.7ms +/- 0.5%     18.5ms +/- 0.6%
    sha1:              -                    9.5ms +/- 1.0%      9.4ms +/- 1.0%

  date:                *1.005x as slow*   143.1ms +/- 0.1%    143.8ms +/- 0.1%
    format-tofte:      *1.011x as slow*    72.5ms +/- 0.1%     73.3ms +/- 0.1%
    format-xparb:      -                   70.7ms +/- 0.2%     70.6ms +/- 0.2%

  math:                -                   42.4ms +/- 0.3%     42.4ms +/- 0.3%
    cordic:            -                   13.4ms +/- 0.8%     13.4ms +/- 0.7%
    partial-sums:      ??                  19.9ms +/- 0.3%     19.9ms +/- 0.3%
    spectral-norm:     ??                   9.0ms +/- 0.4%      9.1ms +/- 0.6%

  regexp:              ??                  56.5ms +/- 0.2%     56.6ms +/- 0.2%
    dna:               ??                  56.5ms +/- 0.2%     56.6ms +/- 0.2%

  string:              1.003x as fast     363.9ms +/- 0.1%    362.8ms +/- 0.1%
    base64:            1.014x as fast      16.5ms +/- 0.6%     16.3ms +/- 0.5%
    fasta:             *1.009x as slow*    73.4ms +/- 0.2%     74.1ms +/- 0.1%
    tagcloud:          *1.003x as slow*   120.7ms +/- 0.1%    121.1ms +/- 0.1%
    unpack-code:       1.016x as fast     116.4ms +/- 0.1%    114.5ms +/- 0.2%
    validate-input:    -                   36.9ms +/- 0.3%     36.9ms +/- 0.2%
(In reply to comment #105)

>     deltablue:    8.30x as fast      9288.8ms +/- 0.9%   1119.6ms +/- 0.2%    
>     earley-boyer: *1.059x as slow*   2917.1ms +/- 0.5%   3088.1ms +/- 0.8%    
>     raytrace:     *1.168x as slow*   1674.4ms +/- 0.5%   1956.1ms +/- 0.5%    

Great stuff with deltablue.


Looking a bit to see what's with raytrace.  First thing is, I can't
get the same user times as sunspider when running the tests in
isolation (sigh).  What I get is

  before: 1.876s     insns 4,438,849,622    IPC 1.10
  after:  2.240s     insns 4,543,203,834    IPC 0.95

(Core 2 Duo at 2.13 GHz)
so there's still a 19% slowdown, but most of it is IPC lossage, not
increased instruction counts.


Another thing I noticed is it turns over a lot more heap, 98M vs 55M:

vTRUNK --dsymutil=yes --smc-check=all ./TM_39531/js/src/BuildR/js -j ./SunSpider/tests/v8-raytrace.js

==13269== HEAP SUMMARY:
==13269==     in use at exit: 100,937 bytes in 1,388 blocks
==13269==   total heap usage: 1,152,046 allocs, 1,150,658 frees, 55,293,700 bytes allocated

vTRUNK --dsymutil=yes --smc-check=all ./TM_39536/js/src/BuildR/js -j ./SunSpider/tests/v8-raytrace.js

==13270== HEAP SUMMARY:
==13270==     in use at exit: 100,937 bytes in 1,388 blocks
==13270==   total heap usage: 1,154,927 allocs, 1,153,539 frees, 98,852,572 bytes allocated

This would tally with lower IPC if it is indicative of worse cache
behaviour.


Peering at heap snapshots w/ Massif, it's not easy to divine where the
extra space went, but the most obvious change is that 39536 allocates
an extra 28MB along the following path, which isn't visible in the
39531 profile:

(28,401,849B) 0x12C49: JS_ArenaAllocate (jsutil.h:185)
(28,393,638B) 0xAE08C: js::PropertyTree::getChild(JSContext*, JSScopeProperty*,unsigned int, JSScopeProperty const&) (jspropertytree.cpp:164)
(28,393,638B) 0xCCC91: JSScope::getChildProperty(JSContext*, JSScopeProperty*, JSScopeProperty&) (jsscope.cpp:499)
(28,393,638B) 0xCD106: JSScope::addPropertyHelper(JSContext*, long, int (*)(JSContext*, JSObject*, long, long*), int (*)(JSContext*, JSObject*, long, long*), unsigned int, unsigned int, unsigned int, int, JSScopeProperty**) (jsscope.cpp:790)
(28,393,638B) 0xCDC60: JSScope::putProperty(JSContext*, long, int (*)(JSContext*, JSObject*, long, long*), int (*)(JSContext*, JSObject*, long, long*), unsigned int, unsigned int, unsigned int, int) (jsscope.cpp:849)
(28,385,427B) 0x70BB9: js_DefineNativeProperty (jsobj.cpp:4490)
(28,369,005B) 0x70DC9: js_DefineProperty (jsobj.cpp:4349)
(28,352,583B) 0x4838E: args_resolve(JSContext*, JSObject*, long, unsigned int, JSObject**) (jsfun.cpp:635)
(28,352,583B) 0x6F6FA: js_LookupPropertyWithFlags (jsobj.cpp:4599)
(28,352,583B) 0x72231: js_GetPropertyHelper (jsobj.cpp:5008)
(28,352,583B) 0x72C4C: js_GetProperty (jsobj.cpp:5094)
More peering at raytrace slowdown:


The fragprofiler asserts due to reading freed memory.  The trivial
patch at bug 531350 fixes it.  With that in place, there is not much
difference to be seen before and after.


Cachegrinding is more interesting.  The insn counts are up a bit, the
I1 and D1 miss rates are up a bit, but the L2 miss rate has doubled,
almost entirely as a result of missing in D1.  The new miss rate
(5/77) is pretty high.  Symptomatic of larger data set as speculated
above.  I can dredge out the precise functions and lines of code
responsible if anyone wants to see; just shout.


$ vTRUNK --smc-check=all --tool=cachegrind ./TM_39531/js/src/BuildR/js -j ./SunSpider/tests/v8-raytrace.js
==13514== Cachegrind, a cache and branch-prediction profiler
==13514== Copyright (C) 2002-2009, and GNU GPL'd, by Nicholas Nethercote et al.
==13514== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==13514== Command: ./TM_39531/js/src/BuildR/js -j ./SunSpider/tests/v8-raytrace.js
==13514==
==13514==
==13514== I   refs:      4,438,849,660
==13514== I1  misses:       56,746,866
==13514== L2i misses:           34,285
==13514== I1  miss rate:          1.27%
==13514== L2i miss rate:          0.00%
==13514==
==13514== D   refs:      2,428,050,120  (1,550,785,535 rd   + 877,264,585 wr)
==13514== D1  misses:       13,452,172  (   11,061,018 rd   +   2,391,154 wr)
==13514== L2d misses:        2,628,312  (    1,285,103 rd   +   1,343,209 wr)
==13514== D1  miss rate:           0.5% (          0.7%     +         0.2%  )
==13514== L2d miss rate:           0.1% (          0.0%     +         0.1%  )
==13514==
==13514== L2 refs:          70,199,038  (   67,807,884 rd   +   2,391,154 wr)
==13514== L2 misses:         2,662,597  (    1,319,388 rd   +   1,343,209 wr)
==13514== L2 miss rate:            0.0% (          0.0%     +         0.1%  )

$ vTRUNK --smc-check=all --tool=cachegrind ./TM_39536/js/src/BuildR/js -j ./SunSpider/tests/v8-raytrace.js
==13517== Cachegrind, a cache and branch-prediction profiler
==13517== Copyright (C) 2002-2009, and GNU GPL'd, by Nicholas Nethercote et al.
==13517== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==13517== Command: ./TM_39536/js/src/BuildR/js -j ./SunSpider/tests/v8-raytrace.js
==13517==
==13517==
==13517== I   refs:      4,543,203,872
==13517== I1  misses:       60,407,383
==13517== L2i misses:           67,038
==13517== I1  miss rate:          1.32%
==13517== L2i miss rate:          0.00%
==13517==
==13517== D   refs:      2,499,076,484  (1,595,182,619 rd   + 903,893,865 wr)
==13517== D1  misses:       16,578,402  (   13,063,505 rd   +   3,514,897 wr)
==13517== L2d misses:        5,205,738  (    2,853,900 rd   +   2,351,838 wr)
==13517== D1  miss rate:           0.6% (          0.8%     +         0.3%  )
==13517== L2d miss rate:           0.2% (          0.1%     +         0.2%  )
==13517==
==13517== L2 refs:          76,985,785  (   73,470,888 rd   +   3,514,897 wr)
==13517== L2 misses:         5,272,776  (    2,920,938 rd   +   2,351,838 wr)
==13517== L2 miss rate:            0.0% (          0.0%     +         0.2%  )
I'll look at shape guard stats in a bit. Here just some v8-raytrace.js trace stats, I didn't see them above.

Fresh tm tip this morning:

$ TMFLAGS=stats ./Darwin_DBG.OBJ/js -j /tmp/v8-raytrace.js 
recorder: started(30), aborted(23), completed(67), different header(0), trees trashed(16), slot promoted(0), unstable loop variable(47), breaks(0), returns(3), merged loop exits(38), unstableInnerCalls(4), blacklisted(15)
monitor: exits(19519), timeouts(0), type mismatch(0), triggered(19519), global mismatch(0), flushed(0)

Fresh m-c tip (no patch for bug 497789 or many others, but close enough to tm tip without the patch for this bug):

$ TMFLAGS=stats ./Darwin_DBG.OBJ/js -j /tmp/v8-raytrace.js 
recorder: started(31), aborted(22), completed(16), different header(0), trees trashed(9), slot promoted(0), unstable loop variable(8), breaks(0), returns(0), merged loop exits(0), unstableInnerCalls(8), blacklisted(10)
monitor: exits(43067), timeouts(0), type mismatch(0), triggered(43067), global mismatch(0), flushed(0)

So we complete 67 traces this bug fixed, vs. 16 without (assuming m-c tip close enough to tm tip without this patch). We trigger 19519 traces, vs. 43067. More I1 cache misses by a bit as reported in comment 108 is not surprising.

But the big jump in L2 misses due to D1 misses is cause for a new bug. Julian, could you please file one? Thanks, and thanks for the analysis here.

/be
(In reply to comment #102)
> Note second patch to merge downstream along with first:
> 
> http://hg.mozilla.org/tracemonkey/rev/d412747189f6


I am confirming that this indeed fixed the issue. I got the minimal test case with GCC:

gczeal(2);
var w = "r".match(/r/);
new String();
var w = "r".match(/r/);
new Function("");

This was after reduction the test case js1_8/regress/regress-464418.js which was falling for me with debug build with -O3....
(In reply to comment #109)
 
> But the big jump in L2 misses due to D1 misses is cause for a new bug. Julian,
> could you please file one?

Done: bug 554626.  I couldn't think of a good name for it; you might
want to rename it.

In short, it appears the hop-up in L2 misses is caused by a hash table
associated with the property tree.  Bug 554626 contains a breakdown of
L2 read/write misses by function, before & after, which points in that
direction.
Depends on: 556124
Depends on: 556525
Depends on: 554955
http://hg.mozilla.org/mozilla-central/rev/ff6b54ac276d

/be
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Depends on: 565603
No longer depends on: 554955
Depends on: 554955
You need to log in before you can comment on or make changes to this bug.