Closed Bug 536564 Opened 12 years ago Closed 12 years ago

Performance fell off a cliff on JSA* benchmark due to changes in the benchmark

Categories

(Core :: JavaScript Engine, defect, P1)

Tracking

RESOLVED FIXED
mozilla1.9.3a1

People

(Reporter: bzbarsky, Assigned: brendan)

References

(Blocks 1 open bug)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 10 obsolete files)

Compare http://jsastar.tapirpirates.net/original_tracemonkeybench/JSAStarBenchmark.html to http://jsastar.tapirpirates.net/dfdemo/JSAStarBenchmark.html

In all other browsers the two perform about the same; in SpiderMonkey the latter performs about 4x slower than the former.  The difference is largely due to the fact that the latter has a lot more aborts, a _lot_ more compiled traces, 2 orders of magnitude more trace enters/exits (400,000 or so instead of 2,000), and spends a good bit of time in interp.

I looked at the aborts on the latter, and other than the lambda aborts (which I eliminated using the patch in bug 495331), they look pretty similar for the two testcases: several SetPropHit, several "no compatible inner tree", a few "inner tree is trying to grow".  That's if I filter out the jquery stuff, but we don't seem to be aborting actual testcase loops on jquery stuff.

So I have no idea why we're ending up in interp so much here, nor jumping on/off trace so much...
I guess one thing that could be going on is us tracing things "fine" and then failing guards all over the place... I guess I should look at whereabouts all those trace exits are.
And there are in particular 400,000-some trace exits like so:

leaving trace at http://jsastar.tapirpirates.net/dfdemo/testFunctions.js:66@1, op=getargprop, lr=0x2084631c, exitType=BRANCH, sp=0, calldepth=1, cycles=0

Line 66 is:

  return n1.rank < n2.rank;

and the function it's in is:

        compareRank : function (n1, n2) {
            return n1.rank < n2.rank;
        },
    About to try emitting guard code for SideExit=0x2084631c exitType=BRANCH
    shape = ld map[4]
    39930 = int 39930
    guard_kshape = eq shape, 39930
    xf2: xf guard_kshape -> pc=0x1f182d8d imacpc=0x0 sp+48 rp+4 (GuardID=003)

And there are about 30-some other branch exits on the same line.  So it looks like we just hit MAX_BRANCHES here and then die our usual death.

As for why there are so many shapes...  n1 and n2 are Points; Points have a function-valued property that is a lambda.  The relevant code seems to be:

var Point = new (function() {
    var width = 0,
        height = 0;

    this.setSize = function (w, h) {
        width = w;
        height = h;
    }

    this.Point = function (x, y) {
        this.x = x;
        this.y = y;
        this.predecessor = null;
        this.cost = 0;
        this.rank = 0;

        if (arguments.length == 3) {
            this.predecessor = arguments[2];

        } else if (arguments.length == 4) {
            this.predecessor = arguments[2];
            this.cost        = arguments[3];

        } else if (arguments.length == 5) {
            this.predecessor = arguments[2];
            this.cost        = arguments[3];
            this.rank        = arguments[4];
        }

        this.samePosition = function(p) {
            return p.x == this.x && p.y == this.y;
        }

        this.toString = function () {
            return '(' + this.x + '|' + this.y + ')[' + this.cost + ']{' + this.rank + '} [id: ' + this.getPosId() + ']';
        }

        this.getPosId = function() {
            return this.y*width + this.x;
        }
    }

with getPosId being a lambda.  Points are then created using |new Point.Point()|.

Brendan, would this give all the points different shapes or something?
> Points have a function-valued property that is a lambda. 

I think this would give the points different shapes: assigning to a function-valued property changes the shape. This is so that when there is a call like |o.f()|, the prop cache can store the pointer to the target function directly and speed up the call. Two things that could help:

- a PIC

- relax function branding (i.e., changing the shape when we assign a function) so that it doesn't change the shape in cases like this (and then also make the property cache not specialize such things as much).
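The effect described above can be seen from script. A minimal sketch (hypothetical names, not the benchmark's actual code) of why per-instance lambdas give every object its own method identity, which is what branding keys on:

```javascript
// Sketch: each evaluation of a lambda that closes over an upvar yields a
// distinct function object, so every instance carries a unique method
// identity -- the condition that gives each object its own shape.
var width = 0; // mutated upvar, standing in for the testcase's module state
function Point(x, y) {
    this.x = x;
    this.y = y;
    this.getPosId = function () { return this.y * width + this.x; };
}
var p1 = new Point(1, 2);
var p2 = new Point(3, 4);
console.log(p1.getPosId === p2.getPosId); // false: two distinct clones
```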
Yeah...  I bet moving the functions in question to Point.Point.prototype would help out in this case.  I guess there's no way we can detect that the same exact functions (or at least clones) are being assigned in this case and end up evolving shape the same way for all the points?
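For concreteness, a sketch of the kind of rewrite suggested here (illustrative code, not a patch to the benchmark): methods that touch only |this| and their parameters can live on the prototype, so all instances share one function object and keep one shape:

```javascript
function Point(x, y) {
    this.x = x;
    this.y = y;
}
// One shared method object for all instances; no per-instance cloning.
Point.prototype.samePosition = function (p) {
    return p.x === this.x && p.y === this.y;
};
var a = new Point(1, 2);
var b = new Point(1, 2);
console.log(a.samePosition(b));                 // true
console.log(a.samePosition === b.samePosition); // true: shared method
```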
(In reply to comment #5)
> Yeah...  I bet moving the functions in question to Point.Point.prototype would
> help out in this case.  I guess there's no way we can detect that the same
> exact functions (or at least clones) are being assigned in this case and end up
> evolving shape the same way for all the points?

We can and do -- remember bug 471214, which is fixed now? But that works only if the lambda is a null closure. In the getPosId lambda's case, it can't be, because its upvars set is not the null set -- it uses width.

That width usage does not by itself rule out a flat closure, but alas width is a mutated upvar, so it can't be copy-propagated into the getPosId lambda across the this.Point lambda (such propagation across an intervening function would require fixing bug 493754).

The mutated width upvar means the outermost lambda used to create the Point object is heavyweight. Whatever we do to fix this bug, the style of JS used here is overwrought -- it creates module objects to hold constructors whose state is hidden in mutated upvars, and so on. Not cheap, as I've been saying in keynotes for several years now.
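To illustrate the distinctions above with a toy example (names are illustrative): a null closure uses no upvars at all, while a lambda that reads a mutated upvar must see the live variable and so cannot be joined:

```javascript
function Outer() {
    var w = 0;  // w is assigned below, so it is a mutated upvar
    this.setSize = function (n) { w = n; };        // mutates w
    this.getId = function () { return w; };        // not a null closure: reads w
    this.sum = function (a, b) { return a + b; };  // null closure: no upvars
}
var o = new Outer();
o.setSize(7);
console.log(o.getId()); // 7: getId must observe the updated upvar
```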

Still, if we view this bug as a followup to bug 471214, it may be fixable. The getPosId lambda, because it's not a null closure, and because the upvar it depends on lives in Call object, must be cloned every time. This makes for a new shape on every (new Point.Point). Some comments on the solutions dmandelin lists in comment 4:

- We have a PIC in the form of the property cache for the interpreter, but of course we still need an on-trace PIC -- see bug 518487.

- We could indeed despecialize given the megamorphism of the callsite, and that would let us trace-JIT better, as well as not over-generate shapes and thereby thrash the property cache. This is bug 531132.

- Another idea: use our analysis that can detect mutated upvars to optimize harder. We know enough to throw in the towel and deoptimize. Deoptimizing means the mutated upvars go in a Call object for each activation of the outermost lambda, the "module pattern" one. All those Call objects have exactly the same shape. So do we really need to clone the getPosId lambda and its peer "method lambda" or their containing this.Point lambda?

The last thought suggests computing Call object shape, currently not generated eagerly to avoid pessimal overhead on function activation, at compile time, and using it to optimize away closure-cloning. I'd like to work on that in this bug over the holiday break. Comments welcome,

/be
Assignee: general → brendan
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.3a1
> - We have a PIC in the form of the property cache for the interpreter, but of
> course we still need an on-trace PIC -- see bug 518487.

I'm not sure how a PIC would necessarily help here.  This testcase is _very_ megamorphic (there are 10,000 points, right?).  It would blow out any reasonable-sized PIC, as far as I can see...

> This is bug 531132.

Yeah, that seems more plausible.

> All those Call objects have exactly the same shape. So do we really need to
> clone the getPosId lambda and its peer "method lambda" or their containing
> this.Point lambda?

That's basically my question in comment 5, yes.  ;)  The plan sounds reasonable, given my at-best-vague understanding of our current shape evolution and Call object setups.
(In reply to comment #7)
> > - We have a PIC in the form of the property cache for the interpreter, but of
> > course we still need an on-trace PIC -- see bug 518487.
> 
> I'm not sure how a PIC would necessarily help here.  This testcase is _very_
> megamorphic (there are 10,000 points, right?).  It would blow out any
> reasonable-sized PIC, as far as I can see...

See bug 531132 comment 6 on PIC as instrumentation by which to decide you need a MIC.

> > All those Call objects have exactly the same shape. So do we really need to
> > clone the getPosId lambda and its peer "method lambda" or their containing
> > this.Point lambda?
> 
> That's basically my question in comment 5, yes.  ;)  The plan sounds
> reasonable, given my at-best-vague understanding of our current shape evolution
> and Call object setups.

When we deoptimize, we reify the scope chain and clone function objects as needed to carry the intrinsic scope chain link [*], the JSSLOT_PARENT slot. A cloned function object is an object-sized (8 words) allocation. We have no other way to go safer-and-slower at present.

/be

[*] An extrinsic scope chain link would still require allocating a linked list element, say two words, that points to the scope object and also links to the outer scope linked list element. This alternative scope chain representation could be optimized to be cheap for shallow scope chains, but it still doesn't win as much as not allocating anything and sharing a joined function object.
(In reply to comment #8)
> [*] An extrinsic scope chain link would still require allocating a linked list
> element, say two words, that points to the scope object and also links to the
> outer scope linked list element. This alternative scope chain representation
> could be optimized to be cheap for shallow scope chains, but it still doesn't
> win as much as not allocating anything and sharing a joined function object.

But this is all beside the point: a closure whose environment (one or more chained Call objects) has to be created each time the function expression is evaluated requires more than just the joined function object for that function expression: it requires some kind of scope chain link or cookie.

This means we can't avoid creating something at closure evaluation time. Normally it would not be a scope chain element, since we are not calling the closure (the case where the closure is immediately applied does not happen in this testcase for any of the "method lambdas", although it does for the "module pattern" lambda -- again an aside).

The ES3 spec represents the needed scope chain cookie via the [[Scope]] internal property of function objects. The ES3 language about joined function objects, which is gone in ES5 because it allowed implementations to differ observably via either mutation or equality testing, clearly can't apply to getPosId.

*Something*, some pointer-equivalent cookie, must be stored in the object that results from each evaluation of the lambda initializing this.getPosId, so that a later call to it from an unrelated scope (it escapes to the heap via this) can ensure that its particular outer scope (a Call object, an activation object in ES3 terms, or something different but serving the same purpose: holding the width and height upvars) is established as the new scope chain for the call.
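A small observable consequence of this (sketch with made-up names): each evaluation of a module-style lambda creates a distinct environment, and escaped closures must remember which one they captured:

```javascript
function makeModule() {
    var width = 0;  // lives in a fresh activation on every call
    return {
        setSize: function (w) { width = w; },
        getId: function (x, y) { return y * width + x; }
    };
}
var m1 = makeModule();
var m2 = makeModule();
m1.setSize(10);
m2.setSize(100);
console.log(m1.getId(1, 2)); // 21: sees m1's width
console.log(m2.getId(1, 2)); // 201: sees m2's width
```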

We could optimize scope representation and linkage but it would not improve the big-O complexity of the testcase, or the consequences of branding (shaping) an object with methods based on called method identity. We need a new function object on each method-lambda evaluation, and without a way to despecialize from branded method identity caching/tracing, the rest follows.

So I'm concluding that the only fix here is to notice the megamorphic pattern and despecialize. This should affect interpreter and JIT together, so arguably it is not the same as the TM:MIC bug.

/be
Sorry if I'm unclear -- trying to clarify what I was getting at in comment 6's "Another idea" item:

I had hoped to make the shapes of the branded objects' unjoined methods be the same, and so make the branded objects' shapes all the same, somehow. This would entail compile-time analysis, which would avoid branding different shapes for different objects as their methods are called. The analysis is easier with object initialisers, harder with a sequence of assignments to this.method1, this.method2 etc.

The analysis would despecialize, since it would know that the method-lambdas were hopelessly unjoinable.

If the TM:MIC bug is fixed based on instrumentation built on-trace in the TM:PIC bug, the interpreter will still lack a megamorphic despecialization path. But if the above analysis is feasible, perhaps that does not matter.

More when I have more, again comments welcome.

/be
> We need a new function object on each method-lambda evaluation

So why would this work without the upvar?  Because in that case we use the same function object for all the points until someone tries to tell the difference?
(In reply to comment #11)
> > We need a new function object on each method-lambda evaluation
> 
> So why would this work without the upvar?  Because in that case we use the same
> function object for all the points until someone tries to tell the difference?

Yes -- bug 471214. No (null) parent, no upvars => no need to clone until a ref escapes (callee refs in call expressions do not escape -- note significance of TCF_FUN_USES_ARGUMENTS | TCF_FUN_USES_OWN_NAME in jsparse.cpp).

/be
OK, I just had jorendorff talk me through the first paragraph of comment 9.

In this particular case, all instances of getPosId could in fact share the same Call object: since Point.Point has no locals for getPosId to close over, they could actually use the Call object for the outer Point invocation.  jorendorff also pointed out that if the outer Point is called multiple times, we'd need separate Call objects for each invocation.

So if we had a setup where we effectively reused function clones or something, in cases when they're not escaping and the two clones would have the same Call object, then we could in fact keep this case monomorphic.  Note that having the same Call object is a stronger condition than having same-shape Call objects; the latter is of course insufficient to actually prevent function cloning here.

It's not clear to me whether such a clone-sharing setup would actually be worth it; if we can effectively despecialize and get decent performance, that seems like a better first step (likely to help more code patterns) to me.
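The sharing opportunity can be seen in a stripped-down version of the testcase's pattern (illustrative, not the benchmark source): every getPosId closes over the same width from the single outer invocation, and over nothing in Point.Point itself:

```javascript
var Point = new (function () {
    var width = 10;  // the only upvar getPosId uses; one per outer invocation
    this.Point = function (x, y) {
        this.x = x;
        this.y = y;
        this.getPosId = function () { return this.y * width + this.x; };
    };
})();
var p = new Point.Point(2, 3);
var q = new Point.Point(4, 5);
console.log(p.getPosId()); // 32: both closures read the one shared width
console.log(q.getPosId()); // 54
```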
(In reply to comment #13)
> OK, I just had jorendorff talk me through the first paragraph of comment 9.

Argh, do I really need a protocol adapter? :-P

> In this particular case, all instances of getPosId could in fact share the same
> Call object: since Point.Point has no locals for getPosId to close over, they
> could actually use the Call object for the outer Point invocation.

Good point (no pun intended!).

> jorendorff
> also pointed out that if the outer Point is called multiple times, we'd need
> separate Call objects for each invocation.

It looks like it's called once in the A* impl. Is it? (Lazy here, also on vacation still.)

Hacking a patch in my relatively spare time, will attach when ready.

/be
BTW, this pattern is the "private class static" pattern I first recall Richard Cornford espousing. You need a closure to have private variables in JS. To have class-private instance vars, the constructor's locals suffice -- the methods close over the constructor. For class-private class-statics you need another layer of lambda, around the constructor (this.Point here).

/be
> It looks like it's called once in the A* impl. Is it?

That's correct in the A* case; jorendorff also pointed out that this is a hard thing to determine (e.g. someone could JS_Evaluate the script multiple times, or the module pattern could be in a loop, or in a function called multiple times, or whatever).
(In reply to comment #16)
> > It looks like it's called once in the A* impl. Is it?
> 
> That's correct in the A* case; jorendorff also pointed out that this is a hard
> thing to determine (e.g. someone could JS_Evaluate the script multiple times,
> or the module pattern could be in a loop, or in a function called multiple
> times, or whatever).

See comment 15. This is a class-static-private-var pattern, the outermost lambda's Call object is almost certain to be a singleton. In a hypothetical ES4/AS3-ish extension of JS, or Java if you squint and replace var and method with types:

class C { static var sv; /*instance*/ var iv; method m1(){...}}

would translate to

var C = new function() { var sv; this.C = function(){var iv}; this.m1 = function(){...}};

The multiple JS_Evaluate* case is hard to analyze statically in order to reuse the Call, I'd say impossible in JS today. The in-a-loop case we already analyze. But again this is a singleton pattern, so apart from detecting in-a-loop and not trying to optimize, we don't need to study too hard on it in the parser.

The bigger deal is avoiding those empty Call objects. That'll help lots of code. Great point, jorendorff!

/be
Oops, I made m1 a static method in comment 17's translation example. Here's the fix:

var C = new function() {
    var sv;
    this.C = function(){
        var iv;
        this.m1 = function(){...}
    }
};

Same as in the Point case at hand.

The code is still suboptimal in that it could set the methods once on the prototype if they do not use iv or other such private instance vars. The Point testcase's methods use only this.x, etc., and formal parameters, apart from getPosId which also uses the "class static" private var width.

/be
Attached patch patch (obsolete) — Splinter Review
Putting this up after passing js/src/{tests,trace-test} -- will test this bug's URL next, others trying same and other tests welcome.

/be
Attachment #420689 - Flags: review?(jorendorff)
On the url testcase, that patch drops us from 1300ms to 625ms or so.  Safari 4 is at 380ms, chrome at 390, Opera 10.5a at 410.

So we've got a factor of 2 back out of the loss of 4x.  ;)

I'll take a look at the details tomorrow.
Ah, and merging the lambda patch again (which I forgot to do before comment 20) gets me to 540ms or so.
OK, with the lambda patch in, I see about 1700 trace enters/exits (it's 11k without, presumably due to more stuff being aborted/blacklisted).

There are lots of exit spots with only one or two exits (for the BRANCH exits), some deep bail exits (canvas ops that need the jsstack like fillRect near the end of the whole thing; about 130 exits here).  There are 423 loop exits, which isn't great, but could be worse.  It might be nice if we logged what code locations we blacklist better, and if we somehow blacklisted those perma-deep-bail trees...

There are now 13 aborts in the "main" part of the testcase code (ignoring jquery, which I hope is not being called in the performance-sensitive parts).  4 inner trees trying to grow, 3 no compatible inner trees, and 6 SetPropHit when setting fillStyle on a canvas (I would assume this last again comes at a point when we're not in the algorithm guts).

I'm not quite sure how best to measure from here.  :(
It's also possible that we just generate crappy code, of course.  I'll try to profile tomorrow.
OK.  Profile says:

3% in js_Interpret
39% under ExecuteTree (including entering/leaving trace, etc).  About a quarter
    of this is under Array_p_push1.
28% under fillRect on canvas; almost all of this is cairo code.
14% under TraceRecorder::monitorRecording.  About half of this is under
    closeLoop.  About half of the rest looks like propcache tests.  I'll file
    some bugs on the other things I see in here.
11% is under the js_NativeSet that sets the fillStyle.  In true pre-quickstub
    fashion, only about 1/3 of this is under the actual fillStyle setter.  The
    rest is JS_ComputeThis, JSData2Native, security check, XPCVariant stuff,
    etc.  Can we maybe custom-quickstub this thing?
2% allocating an array object under js_Interpret
2% array_join under js_Interpret

and some minor stuff.

It looks like the testcase does in fact include the initial drawMatrix() call in its timings.  If I move the timer start past that, then as expected our time drops to 278ms.  Safari's drops to 215ms.  So for them drawMatrix takes about 165ms, while for us it takes closer to 250.  drawMatrix is about 60% fill on the canvas, about 30% the fillStyle sets (complete with all their baggage), and the rest looks like jsinterp.  Due to the fillStyle thing it doesn't trace at all.

vlad, could we custom-quickstub the fillStyle setter?
Depends on: 538592, 538593
Ah, bug 534735 covers quickstubbing fillStyle.  With that, our time drops from 540ms to about 500ms.  The fraction of drawMatrix spent on fillStyle drops from 30% to 11%, as expected.  On the whole testcase, the fraction of time under fillStyle-related stuff drops from 11% to 5% or so, and now 82% of our time is under ExecuteTree (in particular, all the fillRect/fillStyle stuff is on trace).

Now the key thing is that 20% of the time is IN ExecuteTree itself.
Depends on: 534735
Attached patch patch, v2 (obsolete) — Splinter Review
bz, can you try this? Gets rid of double-test_property_cache, still needs work to share code and fix latent teleporting-to-obj2 bug in test_property_cache.

/be
Attachment #420689 - Attachment is obsolete: true
Attachment #420689 - Flags: review?(jorendorff)
And that's because we now have 22000 trace entries/exits.  And that's because the canvas quickstubs don't have a TN yet, and use JS_THIS_OBJECT, so deep-bail.  It's still faster because we dropped all that xpconnect gunk, but....

Of those 22k trace exits, 21k are deep-bails.  Mostly on fillRect.
OK.  With a bit of a hack to make the canvas stuff not deep-bail (bug coming up and will block this one), we get down to 460ms.  And the 20% in ExecuteTree was at least partially due to my profiler settings: that's 20% in ExecuteTree or running any of the actual symbol-less jit-generated code, which is a lot saner.

I'm out of obvious low-hanging fruit in the profile, for the moment... :(

Testing v2 with all the above bits now.
v2 performs about the same.
Oh, and in case someone cares, the new breakdown:

16% recording
20% running jit-generated code with no symbols
29% under fillRect in cairo.
12% under array_push
5% setting fillStyle

The remaining 15% is various stuff called from trace: js_AddProperty, js_NewArrayWithSlots, unboxing doubles, getpropertybyindex, cloning function objects, joining arrays, unbranding, setpropertybyindex, etc. All at about 0.5% to 3% of total.
And I bet the reason we record so much is the 5 "inner tree trying to grow" aborts...
Depends on: 538663
Attached patch patch, v3 (obsolete) — Splinter Review
Simplify ShouldUnbrand to return true if there are any slow methods. bz, can you test this and compare to previous patch? I hope it's no-change or pure-win.

Still need to common harder in jstracer.cpp.

/be
Attachment #420776 - Attachment is obsolete: true
Blocks: 467263
It seems to take the time from 450ms to 460ms or so....  Might not be worth worrying about.
I'm convinced that walking up the prototype chain in TraceRecorder::prop is unnecessary for the case where we can teleport.

In the PCVAL_IS_NULL case and the tag==1 case we still need to walk.

(This is mainly a note to myself to remember about the tag==1 case on review.)
Attached patch patch, v4 (obsolete) — Splinter Review
Passes trace-test and tests, commons without specializing via propTail method (name could be longer and uglier, propTailSpropOrSlotCases, what's the point?). Could try templates to specialize away some constant falses for the call from TR::record_JSOP_CALLPROP, but waiting for bz profiling data to show the need.

Like v3, this aggressively unbrands an object if any of its ad-hoc methods is a closure that can't be joined (is not a null closure).

/be
Attachment #421082 - Attachment is obsolete: true
Attachment #421157 - Flags: review?(jorendorff)
Attached patch patch, v4a (obsolete) — Splinter Review
Just rebased to tm tip, and a comment improvement.

/be
Attachment #421157 - Attachment is obsolete: true
Attachment #421187 - Flags: review?(jorendorff)
Attachment #421157 - Flags: review?(jorendorff)
Attached patch patch, v5 (obsolete) — Splinter Review
Jason pointed out inconsistent branded-generic state, this tests for that and fixes the bug.

/be
Attachment #421187 - Attachment is obsolete: true
Attachment #421295 - Flags: review?(jorendorff)
Attachment #421187 - Flags: review?(jorendorff)
This trace-test asserts with the patch, due to some refactoring in jstracer.cpp. (Save as js/src/trace-test/tests/basic/testMissingMethod.js or some such).

var o = {y: function () {}};
var a = [o, o, o, o, o, o, o, o, o];
a[RECORDLOOP - 1] = {};
try {
    for (var i = 0; i < 9; i++)
        a[i].y();
} catch (exc) {
    assertEq(exc.name, "TypeError"); // should happen when i == RECORDLOOP - 1
}
Attached patch patch, v6 (obsolete) — Splinter Review
Thanks to Jason for the great irc-time review.

/be
Attachment #421295 - Attachment is obsolete: true
Attachment #421320 - Flags: review?(jorendorff)
Attachment #421295 - Flags: review?(jorendorff)
Status: NEW → ASSIGNED
The refactoring still has a hole.

var o = {y: function() {}};
var a = [o, o, o, o, o, o, o, o, o];
Number.prototype.y = 0;
a[RECORDLOOP - 1] = 0;
try {
    for (var i = 0; i < 9; i++)
        a[i].y();
} catch (exc) {
    assertEq(exc.name, "TypeError"); // should happen when i == RECORDLOOP - 1
}
Attached patch patch, v7 (obsolete) — Splinter Review
Thanks again!

/be
Attachment #421320 - Attachment is obsolete: true
Attachment #421334 - Flags: review?(jorendorff)
Attachment #421320 - Flags: review?(jorendorff)
Won't this patch affect perf for code like this?

    var myModule = {    /* global, created once, with unique shape */
        blah: function () ...
        ...
    };

I haven't read the analysis code too closely yet, but it seems like this will
be unbranded if it contains any slow methods at all. That seems
unfortunate. It's a common idiom in library code, right?

In jsemit.cpp, js_EmitFunctionScript:
>+        if (js_Emit1(cx, cg, JSOP_THIS) < 0)
>+            return false;
>+        if (js_Emit1(cx, cg, JSOP_UNBRAND) < 0)
>+            return false;
>+        if (js_NewSrcNote(cx, cg, SRC_HIDDEN) < 0)
>+            return JS_FALSE;

Nit: s/JS_FALSE/false/

In jsemit.h:
> /* Function has parameter named 'eval'. */
>-#define TCF_FUN_PARAM_EVAL 0x80000
>+#define TCF_FUN_PARAM_EVAL      0x80000
>+
>+/* Let's keep aligning the least significant hex digit from here on... */
>+#define TCF_FUN_UNBRAND_THIS   0x100000

Heh! I doubt that comment would work as intended. A few words on
TCF_FUN_UNBRAND_THIS would be nice though. :)

In jstracer.cpp, TraceRecorder::hasMethod:
>             if (VALUE_IS_FUNCTION(cx, v)) {
>                 found = true;
>-                if (!scope->branded()) {
>+                if (!scope->generic() && !scope->branded()) {

I think that line is supposed to read:
                  if (scope->generic() || !scope->branded()) {

This seems like too much obfuscation to save an AND instruction when
unbranding. :-\

In TraceRecorder::getProp:
>-    const JSCodeSpec& cs = js_CodeSpec[*cx->fp->regs->pc];
>+    JSOp op = JSOp(*cx->fp->regs->pc);
>+    const JSCodeSpec& cs = js_CodeSpec[op];

This surprised me a little bit. Just debugger-friendliness? Fine with me, in
any case.
(In reply to comment #42)
> Won't this patch affect perf for code like this?
> 
>     var myModule = {    /* global, created once, with unique shape */
>         blah: function () ...
>         ...
>     };

Only if those functions are not null closures. If they aren't null closures, though, then there must be a closure wrapping the var myModule = ..., in which case it is not clear we have a singleton. More likely, it's a non-new-able constructor pattern with "class statics".

> I haven't read the analysis code too closely yet, but it seems like this will
> be unbranded if it contains any slow methods at all. That seems
> unfortunate. It's a common idiom in library code, right?

See http://prototypejs.org/ and this grep output:

$ grep ' = {' prototype.js 
var Prototype = {
var Class = {
Class.Methods = {
var Abstract = { };
var Try = {
var $break = { };
var Enumerable = {
var Ajax = {
Ajax.Responders = {
    this.options = {
    var headers = {
    this.container = {
          var insertion = { }; insertion[options.insertion] = responseText;
    this.updater = { };
if (!window.Node) var Node = { };
Element.cache = { };
Element.Methods = {
          insertions = {bottom:insertions};
    var attributes = { }, t = Element._attributeTranslations.write;
Element._attributeTranslations = {
  Element._attributeTranslations = {
  Element._attributeTranslations.write = {
  Element._attributeTranslations.has = {};
Element._insertionTranslations = {
Element.Methods.Simulated = {
Element.Methods.ByTag = { };
  window.HTMLElement = { };
  var Methods = { }, ByTag = Element.Methods.ByTag;
      Element.Methods.ByTag[tagName] = { };
    var trans = {
    window[klass] = { };
  Element.cache = { };
document.viewport = {
    var dimensions = { };
var Form = {
    if (typeof options != 'object') options = { hash: !!options };
Form.Methods = {
Form.Element = {
Form.Element.Methods = {
        var pair = { };
Form.Element.Serializers = {
if (!window.Event) var Event = { };
    var buttonMap = { 0: 1, 1: 4, 2: 2 };
var Toggle = { display: Element.toggle };
var Insertion = {
var Position = {
Element.ClassNames.prototype = {

None of the unindented assignment statements are wrapped in a closure, so any methods must be null closures. The indented ones seem to be data definitions rather than "objects with methods".

The harder case is jQuery, because it uses a "module pattern" closure just to avoid defining top-level variables and functions. Using jquery.com/download, same grep pattern:

jQuery.fn = jQuery.prototype = {
				options = {};
		target = {};
		var old = {};
			var val, props = { position: "absolute", visibility: "hidden", display:"block" }, which = name == "width" ? [ "Left", "Right" ] : [ "Top", "Bottom" ];
		var ret = [], done = {};
jQuery.browser = {
var expando = "jQuery" + now(), uuid = 0, windowData = {};
			jQuery.cache[ id ] = {};
var Expr = Sizzle.selectors = {
jQuery.event = {
				handlers = events[type] = {};
jQuery.Event.prototype = {
	jQuery.event.special[ fix ] = {
	jQuery.support = {};
	jQuery.support = {
jQuery.props = {
			data = {};
var elemdisplay = {},
	var obj = {};
			options.orig = {};
jQuery.fx.prototype = {
jQuery.offset = {
		rules = { position: 'absolute', top: 0, left: 0, margin: 0, border: 0, width: '1px', height: '1px', visibility: 'hidden' };
			results = {

Indentation is no help due to the module pattern. I'll study this harder and report back. Your point is a good one, for sure, but I'm hopeful we're better off unbranding aggressively. I'll try to divine what v8 and jsc do as well.
 
> In jstracer.cpp, TraceRecorder::hasMethod:
> >             if (VALUE_IS_FUNCTION(cx, v)) {
> >                 found = true;
> >-                if (!scope->branded()) {
> >+                if (!scope->generic() && !scope->branded()) {
> 
> I think that line is supposed to read:
>                   if (scope->generic() || !scope->branded()) {

Confusion -- we don't want to brand a generic scope, or that assertion in branded() will botch.

> This seems like too much obfuscation to save an AND instruction when
> unbranding. :-\

Maybe so but we could set the bits in either order. We don't want both set, so don't we have to check in both cases, here and JSObject::unbrand? The other place, js_FillPropertyCache, just uses carefully ordered optimization clauses.

> In TraceRecorder::getProp:
> >-    const JSCodeSpec& cs = js_CodeSpec[*cx->fp->regs->pc];
> >+    JSOp op = JSOp(*cx->fp->regs->pc);
> >+    const JSCodeSpec& cs = js_CodeSpec[op];
> 
> This surprised me a little bit. Just debugger-friendliness? Fine with me, in
> any case.

Yeah, it was exactly the enum pretty-printing. Usually we avoid single-use vars but this is a reason for an exception.

/be
So jQuery does suffer eager unbranding in object literals, including the one that initializes jQuery.prototype -- this will hurt.

Idea: JSOP_UNBRAND gives you "one free pass" before doing what its name implies -- i.e., it does not call scope->setGeneric() until the second time for a given JSOP_UNBRAND bytecode.

This could be implemented with self-modifying code, or using the property cache (with inaccuracies due to collisions), or via another on-the-side data structure whether lossy or not.

Then it's not a concern if the object being nominally unbranded is the only one created by the relevant prior bytecode (constructor or object literal in which the JSOP_UNBRAND is emitted):

* If there's more than one, only the first object will have unpredictable shape.

* If there are several executions of the script for the same object, it must be a constructor called as a function, in which case we'll unbrand (setGeneric should clear branded in this case -- bug!).

* If there are several executions each for a distinct object, all but the first will be unbranded. The first will be shaped according to method identities but the rest will be despecialized to fetch methods from slots (or sprops, if there are getters involved).

So crazy it just might work. Comments?
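The on-the-side-table option might look like this toy sketch (plain JavaScript, all names hypothetical -- not the real JSOP_UNBRAND implementation): a map keyed per UNBRAND site counts executions, and only the second hit actually goes generic.

```javascript
// Toy sketch of the "one free pass" idea: an on-the-side table keyed by
// a stand-in for the UNBRAND bytecode's address, counting executions.
// The scope only goes generic on the second hit at the same site.
const unbrandHits = new Map();

function maybeUnbrand(pcKey, scope) {
  const hits = (unbrandHits.get(pcKey) || 0) + 1;
  unbrandHits.set(pcKey, hits);
  if (hits >= 2)
    scope.generic = true;   // second time through: really unbrand
}

const scopeA = { generic: false };
const scopeB = { generic: false };
maybeUnbrand("ctor@0x10", scopeA);  // first pass: free, shape stays special
maybeUnbrand("ctor@0x10", scopeB);  // second pass: despecialize
console.log(scopeA.generic, scopeB.generic);  // → false true
```

This matches the bullets above: a singleton site never reaches the second hit, while a site that creates many objects despecializes all but the first.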

/be
Comment 44 is a bit much. It may be a problem to defer unbranding once per UNBRAND opcode, aside from implementation ugliness.

For jQuery, another approach is to notice the module pattern in the JS compiler, and reason about its lambda being immediately applied, and at top-level as the whole of an expression statement. This would be treated as a singleton and turn off the JSOP_UNBRAND selection.
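The shape the compiler would look for is roughly this (a simplified sketch in the style of jQuery, not its actual source): an immediately applied lambda forming the whole of a top-level expression statement, so it runs exactly once and the objects it creates are singletons.

```javascript
// Simplified module pattern: an immediately applied lambda as the whole
// of a top-level expression statement. Since the lambda runs once, the
// object literals inside are singletons and need no UNBRAND.
(function () {
  var jQuery = function (selector) { return new jQuery.fn.init(selector); };
  jQuery.fn = jQuery.prototype = {
    init: function (selector) { this.selector = selector; return this; }
  };
  jQuery.fn.init.prototype = jQuery.fn;
  globalThis.jQuery = jQuery;   // the export; real jQuery assigns window.jQuery
})();

console.log(typeof jQuery, jQuery("div").selector);  // → function div
```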

Comments welcome again.

/be
(In reply to comment #45)
> For jQuery, another approach is to notice the module pattern in the JS
> compiler, and reason about its lambda being immediately applied, and at
> top-level as the whole of an expression statement. This would be treated as a
> singleton and turn off the JSOP_UNBRAND selection.

That does sound a lot saner. :)
In jsparse.cpp, JSCompiler::setFunctionKinds:
>+                /* Skim the top level statements to tally kinds of assignments. */
>+                for (JSParseNode *stmt = pn2->pn_head; stmt; stmt = stmt->pn_next) {
>+                    if (PN_TYPE(stmt) == TOK_SEMI) {
>+                        ++allExprStmts;
>+                        JSParseNode *expr = stmt->pn_kid;
>+                        if (expr && PN_TYPE(expr) == TOK_ASSIGN) {
>+                            ++allSets;
>+                            if (PN_OP(expr->pn_left) == JSOP_SETPROP &&
>+                                PN_OP(expr->pn_left->pn_expr) == JSOP_THIS) {
>+                                ++thisPropSets;
>+                                if (PN_OP(expr->pn_right) == JSOP_LAMBDA ||
>+                                    PN_OP(expr->pn_right) == JSOP_LAMBDA_FC) {
>+                                    ++methodSets;
>+                                    JSFunction *method = (JSFunction *)
>+                                        expr->pn_right->pn_funbox->object;
>+                                    if (FUN_KIND(method) != JSFUN_NULL_CLOSURE)
>+                                        ++slowMethodSets;
>+                                }
>+                            }
>+                        }
>+                    }
>+                }
>+
>+                if (allExprStmts == allSets &&
>+                    allSets == thisPropSets &&
>+                    ShouldUnbrand(methodSets, slowMethodSets)) {
>+                    funbox->tcflags |= TCF_FUN_UNBRAND_THIS;
>+                }

As you mentioned this is crude.

What is the requirement allExprStmts == allSets good for? It seems like you'll
miss constructors like

    function C() {
        this.method1 = function ...;
        this.method2 = function ...;
        this.clear = function ...;
        this.clear();
    }

Perhaps worse, this happened to me while playing with the patch:

    function C(x) {
        print(shapeOf(this));  // inhibits unbranding!
        this.method = function () { x++; };
    }

Too sensitive to the input. I feel the same way about allSets == thisPropSets
and the limitation to top-level statements (skipping over if statements, e.g.).

You could instead burn another tcflags bit, call it TCF_THIS_METHOD, for
lambdas that are assigned to this.x. Set that during parsing. Then this code
could just walk funbox->kids instead of skimming the parse tree. (Is anyone
else reminded of the "Too Light / Too Heavy" beer commercials?)

In JSCompiler::setFunctionKinds:
>-        if (FUN_KIND(fun) == JSFUN_INTERPRETED) {
>-            if (pn->pn_type != TOK_UPVARS) {
>-                if (parent)
>-                    parent->tcflags |= TCF_FUN_HEAVYWEIGHT;
>-            } else {
>+        if (FUN_KIND(fun) == JSFUN_INTERPRETED && pn->pn_type == TOK_UPVARS) {

I don't understand what the code you're deleting here was supposed to be
doing. Was that a bug?
Comment on attachment 421334 [details] [diff] [review]
patch, v7

r=me with the nits fixed and a better heuristic in setFunctionKinds. (Marginally better is OK.)
Attachment #421334 - Flags: review?(jorendorff) → review+
Attached patch patch, v8 (obsolete) — Splinter Review
This gives the right results for jQuery: only two shouldUnbrand true return values, for the object initialisers starting at lines 2884 and 3258.

/be
Attachment #421334 - Attachment is obsolete: true
Attachment #421488 - Flags: review?(jorendorff)
Comment on attachment 421488 [details] [diff] [review]
patch, v8

Great.

>-    while (pn->pn_type == TOK_RP)
>-        pn = pn->pn_kid;

Also delete this comment, around line 7180:

>            /* CheckForImmediatelyAppliedLambda skips useless TOK_RP nodes. */

(You could even get rid of the return value. Keeping it is ok too.)

The comments about allExprStmts and so on remain to be dealt with, of course. r=me with that.
Attachment #421488 - Flags: review?(jorendorff) → review+
(In reply to comment #47)
> You could instead burn another tcflags bit, call it TCF_THIS_METHOD, for
> lambdas that are assigned to this.x. Set that during parsing. Then this code
> could just walk funbox->kids instead of skimming the parse tree. (Is anyone
> else reminded of the "Too Light / Too Heavy" beer commercials?)

TCF_FUN_SETS_THIS_METHODS is a good idea -- thanks.

> In JSCompiler::setFunctionKinds:
> >-        if (FUN_KIND(fun) == JSFUN_INTERPRETED) {
> >-            if (pn->pn_type != TOK_UPVARS) {
> >-                if (parent)
> >-                    parent->tcflags |= TCF_FUN_HEAVYWEIGHT;
> >-            } else {
> >+        if (FUN_KIND(fun) == JSFUN_INTERPRETED && pn->pn_type == TOK_UPVARS) {
> 
> I don't understand what the code you're deleting here was supposed to be
> doing. Was that a bug?

It was suboptimal to make every intervening activation require a Call object, as you noted -- some hold no variables used as upvars, indeed some (the Point.Point constructor) have no locals at all.

Will work up a v9 patch shortly.

/be
Or just TCF_HAS_THIS_METHODS, and test TCF_IN_FUNCTION too.

/be
Argh, TCF_HAS_THIS_METHODS is not enough. We want to unbrand only if any this-method is not a null closure. We don't know that till after setFunctionKinds recursion has unwound -- exactly the place where the patch's "Skim" code goes.

The parser could keep a list of such inferred methods around, for faster analysis. But one reason I skimmed was because if a ctor has if/else random logic to govern what methods are set, it's hard to analyze for likely method suite, and then check for non-null-closure methods.

I will noodle on this for a bit longer. The allExprStmts == thisPropSets == allSets requirement is too restrictive, agreed.

/be
Attached patch patch, v9 (obsolete) — Splinter Review
Once more, should be easy re-review.

/be
Attachment #421488 - Attachment is obsolete: true
Attachment #421579 - Flags: review?(jorendorff)
Comment on attachment 421579 [details] [diff] [review]
patch, v9

In JSCompiler::setFunctionKinds:
>+                    JSFunction *fun = (JSFunction *) method->pn_funbox->object;
>+                    if (!FUN_NULL_CLOSURE(fun))
>+                        ++slowMethodSets;
>+                }

if (!method->pn_funbox->joinable()) perhaps?

Very nice.
Attachment #421579 - Flags: review?(jorendorff) → review+
Thanks, joinable is better there. I beefed up the comment too.

/be
Attachment #421579 - Attachment is obsolete: true
Attachment #421651 - Flags: review+
http://hg.mozilla.org/tracemonkey/rev/36bbd730e24f

/be
Whiteboard: fixed-in-tracemonkey
This is causing a mochitest assertion on TM tinderbox:

41885 INFO TEST-PASS | /tests/content/smil/test/test_smilCSSFromBy.xhtml | filter: shouldn't be affected by animation (after animation end) - defined as non-additive in SVG spec
Assertion failure: op == JSOP_CALL || op == JSOP_APPLY || op == JSOP_NEW || op == JSOP_GETPROP || op == JSOP_GETTHISPROP || op == JSOP_GETARGPROP || op == JSOP_GETLOCALPROP || op == JSOP_LENGTH || op == JSOP_GETELEM || op == JSOP_CALLELEM || op == JSOP_SETPROP || op == JSOP_SETNAME || op == JSOP_SETMETHOD || op == JSOP_SETELEM || op == JSOP_INITELEM || op == JSOP_ENUMELEM || op == JSOP_INSTANCEOF, at /builds/slave/tracemonkey-linux-debug/build/js/src/jstracer.cpp:6752

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1263495155.1263496749.30632.gz
Also (mad props to jorendorff for noticing):

http://hg.mozilla.org/tracemonkey/rev/f6b37dc8709e

/be
(In reply to comment #58)
> This is causing a mochitest assertion on TM tinderbox:

Sorry, will fix as soon as I can.

> Assertion failure: op == JSOP_CALL || op == JSOP_APPLY || op == JSOP_NEW || op
> == JSOP_GETPROP || op == JSOP_GETTHISPROP || op == JSOP_GETARGPROP || op ==
> JSOP_GETLOCALPROP || op == JSOP_LENGTH || op == JSOP_GETELEM || op ==
> JSOP_CALLELEM || op == JSOP_SETPROP || op == JSOP_SETNAME || op ==
> JSOP_SETMETHOD || op == JSOP_SETELEM || op == JSOP_INITELEM || op ==
> JSOP_ENUMELEM || op == JSOP_INSTANCEOF, at
> /builds/slave/tracemonkey-linux-debug/build/js/src/jstracer.cpp:6752

Is this assertion worth anything? It's a slowly-growing laundry-list of ops at which we can deep-bail from a builtin.

/be
(In reply to comment #60)
> Is this assertion worth anything? It's a slowly-growing laundry-list of ops at
> which we can deep-bail from a builtin.

If we want to keep it, we should probably add a 'deep-bailable' bit to the debug op attributes.
Second followup patch. I added JSOP_CALLPROP to the assertion, the expedient fix:

http://hg.mozilla.org/tracemonkey/rev/09095420f56e

/be
Blocks: 508716
Depends on: 495331
http://hg.mozilla.org/mozilla-central/rev/36bbd730e24f
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED