Closed Bug 614834 Opened 14 years ago Closed 14 years ago

Recursive code using the module pattern is always at least quadratic in recursion depth due to GetUpvar walking the stack

Categories

Categories: Core :: JavaScript Engine (defect)
Platform: x86_64 / Windows CE
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
blocking2.0: betaN+

People

(Reporter: jerome.vouillon, Assigned: cdleary)

References

Details

(Keywords: regression, Whiteboard: softblocker)

Attachments

(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; fr; rv:1.9.1.15) Gecko/20101028 Iceweasel/3.5.15 (like Firefox/3.5.15)
Build Identifier: 

The attached piece of code is extremely slow with TraceMonkey or when run in Firefox 4.  Firefox 3.6 does not seem to have this issue.

The same computation is performed several times.  I would expect the duration of each iteration to be constant, but it seems to be linear in the number of iterations!  This may be due to the increasing depth of the stack, but I can't see why.


Reproducible: Always
Attached file Code sample
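The attachment itself is not inlined in this page; a minimal sketch of the pattern being described (hypothetical, not the actual attached code) is a module-pattern wrapper whose inner recursive function must resolve its own name as an upvar on every call:

```javascript
// Hypothetical sketch, not the actual attachment: a module-pattern wrapper
// whose inner function resolves its own name ("n") through the enclosing
// function's scope on every recursive call.
var n = (function () {
    function n(depth) {
        if (depth > 0)
            return n(depth - 1) + 1;  // upvar lookup walks the stack
        return 0;
    }
    return n;
})();

console.log(n(1000)); // → 1000
```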
The time here is completely dominated by js::GetUpvar, under both mjit and interp; tjit is equivalent to interp here, since this doesn't trace on trunk.

GetUpvar has this findFrameAtLevel call which is certainly linear in the number of frames on the stack.  So yes, your source of quadratic behavior is the stack depth.  In particular, discovering what the name "n" is bound to while making those recursive n() calls inside n() is what's taking up all the time.  Note that removing that outermost function() bit, so that "n" is just a global variable, makes things much faster under methodjit here.
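The quadratic cost can be sketched numerically (a toy model of mine, not SpiderMonkey code): if findFrameAtLevel pays one hop per frame on the stack and every recursive call triggers one lookup, a depth-N recursion pays the triangular number N(N+1)/2 hops in total:

```javascript
// Toy cost model (not SpiderMonkey code): a lookup at recursion depth d
// pays d frame hops to find the upvar-holding frame, so a depth-N
// recursion pays 1 + 2 + ... + N hops overall.
function totalHops(N) {
    var hops = 0;
    for (var d = 1; d <= N; ++d)
        hops += d;   // one linear stack walk per recursive call
    return hops;     // N * (N + 1) / 2
}

console.log(totalHops(1000)); // → 500500
```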

I assume that 3.6 managed to trace this using the now-defunct tracing of recursion and somehow did faster upvar access?
Status: UNCONFIRMED → NEW
blocking2.0: --- → ?
Ever confirmed: true
Keywords: regression
No, I lie.  This has nothing to do with tracing recursion and everything to do with us removing the display optimization that handled this case efficiently in bug 578272.  Running interp before/after that change on the attached testcase I get numbers like this (interp and tjit are about the same):

Before:
4.96
4.89
5.065
5.0075
5.11875

After:
8.36
9.88
14.25
34.22
75.48625

In fact, any recursive code using the module pattern is now automatically O(N^2) in recursion depth... that seems bad.
Summary: Unexpected quadratic behavior → Recursive code using the module pattern is always at least quadratic in recursion depth due to GetUpvar walking the stack
One of the major post 4.0 goals is "scope chain redux".  I don't think there is an official meta-bug, but this sounds like it should be on the list of things to fix.
Sure, but in the meantime we have a regression from 3.6 that affects code using a pattern that people are being encouraged to use....  Can we wait till post-4.0 to address that?  I'm assuming the common JS libs like jquery just don't use deep callstacks at all (and in particular, no deeply recursive functions)?  Or do they have things that recurse up the DOM and end up calling functions they close over?
I don't think display removal is the problem here. The problem is we're not IC'ing CALLNAME, and the inner functions reference their own name via the scope chain. This shouldn't be too difficult to fix.

If I just make the inner functions be global, I get these times, which will be around what ICs give us:
0.54
0.56
0.555
0.5375
0.53875

Compare to v8:
0.44
0.46
0.46
0.4575
0.45

So we can do something before getting new scope chains.
Assignee: general → dvander
Status: NEW → ASSIGNED
Well, the lack of IC for callname is what lands us in GetUpvar, yes.  And display removal is what makes the work GetUpvar needs to do slow.

I agree that if we ICed callname the problem I'm concerned about would likely go away, since we wouldn't be on trace for recursive code for the most part and thus wouldn't need to care about tjit/interp behavior.

Do we have an existing bug for a callname IC?  Is it something we can reasonably try to do for 4.0?
Sorry, I misanalyzed this. There are two problems.

We are generating CALLUPVAR, which has the very slow pathological walk. When I inserted "dis(s)", it generated CALLNAME instead. That's still slow, but with CALLNAME IC'd, it's almost as fast as using globals (read scope chain, guard on shape, get fp out of callobj).

So I guess I have a few questions:
 (1) Is there a point to *UPVAR if we are able to IC these things more efficiently as generic name lookups?
 (2) Why do the functions not have upvars if you dis() them? Is the analysis really fragile?
 (3) Should we have an IC for GETUPVAR that devolves to NAME if there are more than say, 3 frames?

Anyway, attacking this from an IC seems 4.0-safe.
> (2) Why do the functions not have upvars if you dis() them?

Presumably because escaping closures can't do the various upvar optimizations?

(So yeah, a somewhat conservative analysis that disables upvar stuff if you do something to the function other than call it.)
(In reply to comment #9)
> > (2) Why do the functions not have upvars if you dis() them?
> 
> Presumably because escaping closures can't do the various upvar optimizations?

When GETUPVAR was implemented in terms of the display, there was the strict requirement that a function had to be called while its lexical parent function was active.  When the display was removed, we never loosened the analysis.
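A hedged illustration of the distinction being described (the function names and shapes here are mine, not from the patch): a closure that is only ever called while its lexical parent is active satisfies the strict requirement above, while one that escapes its parent's activation does not:

```javascript
// Illustration only: both functions compute the same kind of value; the
// point is the difference the analysis cares about, not the results.
function parent() {
    var x = 1;
    function inner() { return x; }
    return inner();      // inner runs while parent is active: upvar-eligible
}

function parentEscaping() {
    var x = 2;
    function inner() { return x; }
    return inner;        // inner escapes parent's activation
}

console.log(parent());            // → 1
console.log(parentEscaping()());  // → 2
```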
blocking2.0: ? → beta9+
Stealing with permission.
Assignee: dvander → cdleary
Researching a fix I made another interesting observation: we seem to walk up the tree context for every upvar to offset by the holding function's nargs. Could be a nice location for some caching.
Attached file Microbenchmark.
Using name PICs takes this from 808ms to 9ms on my box. Patch is coming up.
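The microbenchmark itself is attached rather than inlined; a plausible shape for it (a guess, patterned after the tick/tock shell test quoted later in this bug) is mutually recursive Algol-like closures timed over many calls:

```javascript
// Guessed shape of the microbenchmark; the real one is in the attachment.
function outer(x) {
    function tick(x) { if (x > 0) tock(x - 1); }   // tick and tock are
    function tock(x) { if (x > 0) tick(x - 1); }   // upvars of each other
    tick(x);
    return x;
}

var start = Date.now();
for (var i = 0; i < 100; ++i)
    outer(1000);
console.log('Elapsed: ' + (Date.now() - start) + ' ms');
```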
Add JS_ALWAYS_INLINE to the factored out BindNames, just in case.
Attachment #496832 - Attachment is obsolete: true
Attachment #496838 - Flags: review?
Attachment #496832 - Flags: review?
Attachment #496838 - Flags: review? → review?(dvander)
Hmm.  That affects the tracer too, right?  How does this affect closure perf on things like dromaeo's harness?  Though I guess you'd have to find a dromaeo test that actually traces nowadays.  :(
Comment on attachment 496838 [details] [diff] [review]
Algol-like closures use JSOP_NAME PIC.

This patch solves (or at least really significantly helps) with bug 614874 - hangs are reduced from 20-30 seconds to say 100-200 milliseconds.
Comment on attachment 496838 [details] [diff] [review]
Algol-like closures use JSOP_NAME PIC.

Brendan, can you take this review? I won't have time during the next 2 weeks.
Attachment #496838 - Flags: review?(dvander) → review?(brendan)
Comment on attachment 496838 [details] [diff] [review]
Algol-like closures use JSOP_NAME PIC.

>+        /*
>+         * Add the upvar cookie for the atom to the upvar cookie vector.
>+         * The upvar cookie vectors are allocated lazily.
>+         * We walk the tree context to find the upvar-holding function, could
>+         * be a nice place to add caching.
>+         */
>+

The comment needs rewrapping but I'm not sure about the caching idea. FIXME: with bug or it isn't real :-P.

No blank line here.

>+        UpvarCookie *vector = cg->upvarMap.vector;
>+        if (!vector) {

This lost a crucial fix (moving code hazard), see bug 615657.

>+            uint32 length = cg->lexdeps.count;
>+
>+            vector = (UpvarCookie *) js_calloc(length * sizeof *vector);

I noticed the loss because this needs to be js_realloc (the if condition is a disjunction per bug 615657).

>+                bool hasBoundUpvar = false;
>+
>                 /*
>+                 * Check that at least one outer lexical binding exists
>+                 * (global variables don't count). This is conservative: we
>                  * could limit assignments to those in the current function,
>                  * but that's too much work. As with flat closures (handled
>                  * below), we optimize for the case where outer bindings are
>                  * not reassigned anywhere.
>                  */
>                 while ((ale = iter()) != NULL) {
>                     JSDefinition *lexdep = ALE_DEFN(ale)->resolve();
> 
>                     if (!lexdep->isFreeVar()) {
>                         JS_ASSERT(lexdep->frameLevel() <= funbox->level);
>-                        ++nupvars;
>-                        if (lexdep->isAssigned())
>-                            break;
>+                        hasBoundUpvar = true;
>+                        break;
>                     }
>                 }
>-                if (!ale)
>-                    mutation = false;
>-
>-                if (nupvars == 0) {
>+
>+                if (!hasBoundUpvar) {

Don't need hasBoundUpvar, just test if (!ale) here.

>                     FUN_METER(onlyfreevar);
>                     FUN_SET_KIND(fun, JSFUN_NULL_CLOSURE);

I'll re-review when the fix for bug 615657 is back.

/be
Attachment #496838 - Flags: review?(brendan) → review-
Whiteboard: softblocker
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if you believe it MUST be in the next beta (ie: trunk is in an unshippable state without this)
No longer blocks: 617835
blocking2.0: beta9+ → betaN+
Keywords: regression
Blocks: 617835
Fixing fields my automated script accidentally blanked. Apologies for the bugspam
Keywords: regression
I'm pretty sure this is a dup of bug 592202 (and close to the same patch), but I told brendan I'd test once it landed before marking as dup.
Seems quite fixed by the patch for bug 592202:


$ ./Darwin_OPT.OBJ/js -j -m -p
js> function outer(x) {
    //with({}) {}
    function tick(x) { tock(x); }
    function tock(x) { if (x > 0) tick(x - 1); };
    tick(x);
}
js> 
js> function bench() {
    var start = new Date();
    for (var i = 0; i < 100; ++i)
        outer(1000);
    var end = new Date();
    print('Elapsed: ' + (end - start));
}
js> 
js> bench();
Elapsed: 12

/be
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
OS: Linux → Windows CE
Resolution: --- → FIXED
Could've dup'ed but went with fixed cuz this bug has a distinct testcase about perf, not crashing on a flat closure assert.

/be
Yep, confirmed that on the original testcase we now get pretty constant time for each call, under both methodjit and interp (the test doesn't trace, so tjit is not an issue).
Oh, the only remaining issue is that mjit is about 3x faster than interp (which is good) but 4x slower than v8 (which is bad) on that testcase.  ;)
(In reply to comment #27)
> Oh, the only remaining issue is that mjit is about 3x faster than interp (which
> is good) but 4x slower than v8 (which is bad) on that testcase.  ;)

New bug please.

/be
(In reply to comment #27)
> 4x slower than v8 (which is bad) on that testcase.  ;)

We don't have a callname IC yet, IIRC.
OK, filed bug 624298 on a callname IC and bug 624299 on the remaining performance issue here.