Closed
Bug 614834
Opened 14 years ago
Closed 14 years ago
Recursive code using the module pattern is always at least quadratic in recursion depth due to GetUpvar walking the stack
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | betaN+ |
People
(Reporter: jerome.vouillon, Assigned: cdleary)
References
Details
(Keywords: regression, Whiteboard: softblocker)
Attachments
(3 files, 1 obsolete file)
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; fr; rv:1.9.1.15) Gecko/20101028 Iceweasel/3.5.15 (like Firefox/3.5.15) Build Identifier: The attached piece of code is extremely with Tracemonkey or when run in Firefox 4. Firefox 3.6 does not seem to have this issue. The same computation is performed several times. I would expect the duration of each iteration to be constant, but it seems to be linear in the number of iterations! This may be due to the increasing depth of the stack, but I can't see why. Reproducible: Always
Reporter | ||
Comment 1•14 years ago
|
||
Comment 2•14 years ago
|
||
The time here is completely dominated by js::GetUpvar, under both mjit and interp; tjit is equivalent to interp here, since this doesn't trace on trunk. GetUpvar has this findFrameAtLevel call which is certainly linear in the number of frames on the stack. So yes, your source of quadratic behavior is the stack depth. In particular, discovering what the name "n" is bound to while making those recursive n() calls inside n() is what's taking up all the time. Note that removing that outermost function() bit, so that "n" is just a global variable, makes things much faster under methodjit here. I assume that 3.6 managed to trace this using the now-defunct tracing of recursion and somehow did faster upvar access?
Comment 3•14 years ago
|
||
No, I lie. This has nothing to do with tracing recursion and everything to do with us removing the display optimization that handled this case efficiently in bug 578272. Running interp before/after that change on the attached testcase I get numbers like this (interp and tjit are about the same): Before: 4.96 4.89 5.065 5.0075 5.11875 After: 8.36 9.88 14.25 34.22 75.48625 In fact, any recursive code using the module pattern is now automatically O(N^2) in recursion depth... that seems bad.
Summary: Unexpected quadratic behavior → Recursive code using the module pattern is always at least quadratic in recursion depth due to GetUpvar walking the stack
Comment 4•14 years ago
|
||
One of the major post 4.0 goals is "scope chain redux". I don't think there is an official meta-bug, but this sounds like it should be on the list of things to fix.
Comment 5•14 years ago
|
||
Sure, but in the meantime we have a regression from 3.6 that affects code using a pattern that people are being encouraged to use.... Can we wait till post-4.0 to address that? I'm assuming the common JS libs like jquery just don't use deep callstacks at all (and in particular, no deeply recursive functions)? Or do they have things that recurse up the DOM and end up calling functions they close over?
I don't think display removal is the problem here. The problem is we're not IC'ing CALLNAME, and the inner functions reference their own name via the scope chain. This shouldn't be too difficult to fix. If I just make the inner functions be global, I get these times, which will be around what ICs give us: 0.54 0.56 0.555 0.5375 0.53875 Compare to v8: 0.44 0.46 0.46 0.4575 0.45 So we can do something before getting new scope chains.
Updated•14 years ago
|
Assignee: general → dvander
Status: NEW → ASSIGNED
Comment 7•14 years ago
|
||
Well, the lack of IC for callname is what lands us in GetUpvar, yes. And display removal is what makes the work GetUpvar needs to do slow. I agree that if we ICed callname the problem I'm concerned about would likely go away, since we wouldn't be on trace for recursive code for the most part and thus wouldn't need to care about tjit/interp behavior. Do we have an existing bug for a callname IC? Is it something we can reasonably try to do for 4.0?
Sorry, I misanalyzed this. There are two problems. We are generating CALLUPVAR, which has the very slow pathological walk. When I inserted "dis(s)", it generated CALLNAME instead. That's still slow, but with CALLNAME IC'd, it's almost as fast as using globals (read scope chain, guard on shape, get fp out of callobj). So I guess I have a few questions: (1) Is there a point to *UPVAR if we are able to IC these things more efficiently as generic name lookups? (2) Why do the functions not have upvars if you dis() them? Is the analysis really fragile? (3) Should we have an IC for GETUPVAR that devolves to NAME if there are more than say, 3 frames? Anyway, attacking this from an IC seems 4.0-safe.
Comment 9•14 years ago
|
||
> (2) Why do the functions not have upvars if you dis() them?
Presumably because escaping closures can't do the various upvar optimizations?
(So yeah, a somewhat conservative analysis that disables upvar stuff if you do something to the function other than call it.)
Comment 10•14 years ago
|
||
(In reply to comment #9) > > (2) Why do the functions not have upvars if you dis() them? > > Presumably because escaping closures can't do the various upvar optimizations? When GETUPVAR was implemented in terms of the display, there was the strict requirement that a function had to be called while its lexical parent function was active. When the display was removed, we never loosened the analysis.
Updated•14 years ago
|
blocking2.0: ? → beta9+
Assignee | ||
Comment 12•14 years ago
|
||
Researching a fix I made another interesting observation: we seem to walk up the tree context for every upvar to offset by the holding function's nargs. Could be a nice location for some caching.
Assignee | ||
Comment 13•14 years ago
|
||
Using name PICs takes this from 808ms to 9ms on my box. Patch is coming up.
Assignee | ||
Comment 14•14 years ago
|
||
Attachment #496832 -
Flags: review?
Assignee | ||
Comment 15•14 years ago
|
||
Add JS_ALWAYS_INLINE to the factored out BindNames, just in case.
Attachment #496832 -
Attachment is obsolete: true
Attachment #496838 -
Flags: review?
Attachment #496832 -
Flags: review?
Assignee | ||
Updated•14 years ago
|
Attachment #496838 -
Flags: review? → review?(dvander)
Comment 16•14 years ago
|
||
Hmm. That affects the tracer too, right? How does this affect closure perf on things like dromaeo's harness? Though I guess you'd have to find a dromaeo test that actually traces nowadays. :(
Comment 17•14 years ago
|
||
Comment on attachment 496838 [details] [diff] [review] Algol-like closures use JSOP_NAME PIC. This patch solves (or at least really significantly helps) with bug 614874 - hangs are reduced from 20-30 seconds to say 100-200 milliseconds.
Comment on attachment 496838 [details] [diff] [review] Algol-like closures use JSOP_NAME PIC. Brendan, can you take this review? I won't have time during the next 2 weeks.
Attachment #496838 -
Flags: review?(dvander) → review?(brendan)
Comment 20•14 years ago
|
||
Comment on attachment 496838 [details] [diff] [review] Algol-like closures use JSOP_NAME PIC. >+ /* >+ * Add the upvar cookie for the atom to the upvar cookie vector. >+ * The upvar cookie vectors are allocated lazily. >+ * We walk the tree context to find the upvar-holding function, could >+ * be a nice place to add caching. >+ */ >+ The comment needs rewrapping but I'm not sure about the caching idea. FIXME: with bug or it isn't real :-P. No blank line here. >+ UpvarCookie *vector = cg->upvarMap.vector; >+ if (!vector) { This lost a crucial fix (moving code hazard), see bug 615657. >+ uint32 length = cg->lexdeps.count; >+ >+ vector = (UpvarCookie *) js_calloc(length * sizeof *vector); I noticed the loss because this needs to be js_realloc (the if condition is a disjunction per bug 615657). >+ bool hasBoundUpvar = false; >+ > /* >+ * Check that at least one outer lexical binding exists >+ * (global variables don't count). This is conservative: we > * could limit assignments to those in the current function, > * but that's too much work. As with flat closures (handled > * below), we optimize for the case where outer bindings are > * not reassigned anywhere. > */ > while ((ale = iter()) != NULL) { > JSDefinition *lexdep = ALE_DEFN(ale)->resolve(); > > if (!lexdep->isFreeVar()) { > JS_ASSERT(lexdep->frameLevel() <= funbox->level); >- ++nupvars; >- if (lexdep->isAssigned()) >- break; >+ hasBoundUpvar = true; >+ break; > } > } >- if (!ale) >- mutation = false; >- >- if (nupvars == 0) { >+ >+ if (!hasBoundUpvar) { Don't need hasBoundUpvar, just test if (!ale) here. > FUN_METER(onlyfreevar); > FUN_SET_KIND(fun, JSFUN_NULL_CLOSURE); I'll re-review when the fix for bug 615657 is back. /be
Attachment #496838 -
Flags: review?(brendan) → review-
Updated•14 years ago
|
Whiteboard: softblocker
Comment 21•14 years ago
|
||
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if you believe it MUST be in the next beta (ie: trunk is in an unshippable state without this)
Comment 22•14 years ago
|
||
Fixing fields my automated script accidentally blanked. Apologies for the bugspam
Keywords: regression
Assignee | ||
Comment 23•14 years ago
|
||
I'm pretty sure this is a dup of bug 592202 (and close to the same patch), but I told brendan I'd test once it landed before marking as dup.
Comment 24•14 years ago
|
||
Seems quite fixed by the patch for bug 592202: $ ./Darwin_OPT.OBJ/js -j -m -p js> function outer(x) { //with({}) {} function tick(x) { tock(x); } function tock(x) { if (x > 0) tick(x - 1); }; tick(x); } js> js> function bench() { var start = new Date(); for (var i = 0; i < 100; ++i) outer(1000); var end = new Date(); print('Elapsed: ' + (end - start)); } js> js> bench(); Elapsed: 12 /be
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
OS: Linux → Windows CE
Resolution: --- → FIXED
Comment 25•14 years ago
|
||
Could've dup'ed but went with fixed cuz this bug has a distinct testcase about perf, not crashing on a flat closure assert. /be
Comment 26•14 years ago
|
||
Yep, confirmed that on the original testcase we now get pretty constant time/j for each call, under both methodjit and interp (the test doesn't trace, so tjit is not an issue).
Comment 27•14 years ago
|
||
Oh, the only remaining issue is that mjit is about 3x faster than interp (which is good) but 4x slower than v8 (which is bad) on that testcase. ;)
Comment 28•14 years ago
|
||
(In reply to comment #27) > Oh, the only remaining issue is that mjit is about 3x faster than interp (which > is good) but 4x slower than v8 (which is bad) on that testcase. ;) New bug please. /be
Assignee | ||
Comment 29•14 years ago
|
||
(In reply to comment #27) > 4x slower than v8 (which is bad) on that testcase. ;) We don't have a callname IC yet, IIRC.
Comment 30•14 years ago
|
||
OK, filed bug 624298 on a callname IC and bug 624299 on the remaining performance issue here.
You need to log in
before you can comment on or make changes to this bug.
Description
•