Closed Bug 501908 Opened 15 years ago Closed 14 years ago

UnlinkFunctionBoxes is slow for many nested functions

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.3a5
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: write2sebastian, Assigned: jimb)

References

()

Details

(4 keywords, Whiteboard: hardblocker,fixed-in-tracemonkey)

Attachments

(4 files, 4 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1) Gecko/20090624 Firefox/3.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1) Gecko/20090624 Firefox/3.5

Firefox 3.5 freezes (does not respond for about 10-12 seconds) on access to the mentioned website. This happens for the initial page load and whenever you change to a subdomain of the URL mentioned.

PS: Behavior is not shown for FF3, FF2, IE6, IE7, IE8

Reproducible: Always

Steps to Reproduce:
1. Go to http://www.opennetworx.org
2. Go to http://www.tu-darmstadt.opennetworx.org
3. Go to http://www.opennetworx.org
4. To to http://www.sozialenetzwerke.opennetworx.org
Actual Results:  
Every time you change the domain, FF 3.5 freezes for 10-12 seconds.

Expected Results:  
Time consumption should only be 3 seconds (like on FF 3.0)

This Website is running in Google Web Toolkit and using extensive JavaScript
wfm on win2003 and Seamonkey 1.9.1 branch and FF3.5

Did you already tried http://support.mozilla.com/en-US/kb/Safe+Mode ?
Unfortunately safe mode doesn't change the behavior. Firefox is know to slow down if you are using plugins like Firebug in combination with GWT. But this isn't the case with all plugins disabled in safe mode.
I can confirm the slowdown of firefox 3.5 on access http://www.opennetworx.org and changing the domain on this social networking plattform.
I can confirm it too using

Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.2a1pre) Gecko/20090706 Minefield/3.6a1pre (.NET CLR 3.5.30729) WinXP x64
and 
Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.1) Gecko/20090624 Firefox/3.5 (.NET CLR 3.5.30729) Win7 x64
Both running in safe mode, so someone could change platform to windows and not only x86.
Yep, same here. Using:

Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1) Gecko/20090624 Firefox/3.5
Yep, also:

Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2

WinXP SP3
Pentium 4, 2.35GHz, 2 GB RAM
CPU goes to 100% for more than 10 seconds
can you narrow the issue to something specific within the web page(s)?

100% for about 15 sec with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2b1pre) Gecko/20090928 Namoroka/3.6b1pre
Assignee: nobody → general
Severity: critical → major
Component: General → JavaScript Engine
Product: Firefox → Core
QA Contact: general → general
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.4) Gecko/20100503 Firefox/3.6.4

I see the same issue on Firefox 3.6.4.
This is the http://www.opennetworx.org link saved and zipped.
Attached file Reduced testcase
All that is left is a big script.
Big difference between Firefox 3 and the others. 

Hangs ~ 1 sec:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.0.19pre) Gecko/2010021606 GranParadiso/3.0.19pre

Hangs ~ 10 sec:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.11pre) Gecko/20100608 Shiretoko/3.5.11pre

Hangs ~ 10 sec:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.5pre) Gecko/20100526 Namoroka/3.6.5pre

Hangs ~ 7 sec:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a5pre) Gecko/20100603 Minefield/3.7a5pre
Keywords: testcase
Very nice testcase. Thanks a lot. We definitely have a problem here.

UnlinkFunctionBoxes and setFunctionKinds suck up a lot of time.

+ 95.4%, start, firefox-bin
| + 95.4%, main, firefox-bin
| | + 95.4%, XRE_main, XUL
| | | + 95.4%, nsAppStartup::Run(), XUL
| | | | + 95.4%, nsAppShell::Run(), XUL
| | | | | + 95.4%, -[NSApplication run], AppKit
| | | | | | + 95.0%, -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:], AppKit
| | | | | | | + 95.0%, _DPSNextEvent, AppKit
| | | | | | | | + 94.9%, BlockUntilNextEventMatchingListInMode, HIToolbox
| | | | | | | | | + 94.9%, ReceiveNextEventCommon, HIToolbox
| | | | | | | | | | + 94.9%, RunCurrentEventLoopInMode, HIToolbox
| | | | | | | | | | | + 94.8%, CFRunLoopRunSpecific, CoreFoundation
| | | | | | | | | | | | + 94.7%, __CFRunLoopRun, CoreFoundation
| | | | | | | | | | | | | + 91.2%, __CFRunLoopDoSources0, CoreFoundation
| | | | | | | | | | | | | | + 91.2%, nsAppShell::ProcessGeckoEvents(void*), XUL
| | | | | | | | | | | | | | | + 91.1%, nsBaseAppShell::NativeEventCallback(), XUL
| | | | | | | | | | | | | | | | + 91.1%, NS_ProcessPendingEvents_P(nsIThread*, unsigned int), XUL
| | | | | | | | | | | | | | | | | + 91.1%, nsThread::ProcessNextEvent(int, int*), XUL
| | | | | | | | | | | | | | | | | | + 74.1%, nsInputStreamReadyEvent::Run(), XUL
| | | | | | | | | | | | | | | | | | | + 74.0%, nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*), XUL
| | | | | | | | | | | | | | | | | | | | + 72.4%, nsInputStreamPump::OnStateStop(), XUL
| | | | | | | | | | | | | | | | | | | | | + 72.3%, nsHttpChannel::OnStopRequest(nsIRequest*, nsISupports*, unsigned int), XUL
| | | | | | | | | | | | | | | | | | | | | | + 64.7%, nsStreamLoader::OnStopRequest(nsIRequest*, nsISupports*, unsigned int), XUL
| | | | | | | | | | | | | | | | | | | | | | | + 64.7%, nsScriptLoader::OnStreamComplete(nsIStreamLoader*, nsISupports*, unsigned int, unsigned int, unsigned char const*), XUL
| | | | | | | | | | | | | | | | | | | | | | | | + 64.7%, nsScriptLoader::ProcessPendingRequests(), XUL
| | | | | | | | | | | | | | | | | | | | | | | | | + 64.7%, nsScriptLoader::ProcessRequest(nsScriptLoadRequest*), XUL
| | | | | | | | | | | | | | | | | | | | | | | | | | + 64.7%, nsScriptLoader::EvaluateScript(nsScriptLoadRequest*, nsString const&), XUL
| | | | | | | | | | | | | | | | | | | | | | | | | | | + 64.7%, nsJSContext::EvaluateString(nsAString_internal const&, void*, nsIPrincipal*, char const*, unsigned int, unsigned int, nsAString_internal*, int*), XUL
| | | | | | | | | | | | | | | | | | | | | | | | | | | | + 64.7%, JS_EvaluateUCScriptForPrincipals, libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | + 64.2%, js::Compiler::compileScript(JSContext*, JSObject*, JSStackFrame*, JSPrincipals*, unsigned int, unsigned short const*, unsigned long, __sFILE*, char const*, unsigned int, JSString*, unsigned int), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | + 41.2%, RecycleTree(JSParseNode*, JSTreeContext*), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | + 41.2%, UnlinkFunctionBoxes(JSParseNode*, JSTreeContext*), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | + 41.2%, UnlinkFunctionBoxes(JSParseNode*, JSTreeContext*), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | + 41.2%, UnlinkFunctionBoxes(JSParseNode*, JSTreeContext*), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 41.2%, UnlinkFunctionBoxes(JSParseNode*, JSTreeContext*), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 12.5%, js::Parser::setFunctionKinds(JSFunctionBox*, unsigned int&), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 7.1%, js::Parser::statement(), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 2.7%, js_EmitTree, libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 0.5%, js::Parser::markFunArgs(JSFunctionBox*, unsigned int), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 0.2%, js::Parser::~Parser(), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 0.0%, js_NewScriptFromCG, libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 0.0%, js::TokenStream::getTokenInternal(), libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 0.0%, js_FoldConstants, libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - 0.0%, JS_FinishArenaPool, libmozjs.dylib
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |   0.0%, JSAtomListIterator::operator()(), libmozjs.dylib
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
            JSParseNode **pnp = &funbox->parent->methods;
            while (JSParseNode *method = *pnp) {
                if (method == pn) {
                    *pnp = method->pn_link;
                    break;
                }
                pnp = &method->pn_link;
            }

        JSFunctionBox **funboxp = &tc->functionList;
        while (*funboxp) {
            if (*funboxp == funbox) {
                *funboxp = funbox->siblings;
                break;
            }
            funboxp = &(*funboxp)->siblings;
        }

Both loops are quadratic for large numbers of functions as in this test case.
Assignee: general → brendan
blocking2.0: --- → ?
OS: Windows XP → All
Priority: -- → P2
Hardware: x86 → All
Actually, UnlinkFunctionBoxes is recursive and explores the expression tree, so this is more cubic than quadratic.
Summary: GWT (Google Web Toolkit) / JavaScript slow/freezes on Firefox 3.5 → UnlinkFunctionBoxes is slow for many nested functions
Happy to help.

Seems the testcase is workable so removing "testcase-wanted".
Keywords: testcase-wanted
The testcases were tested locally.

Regression range:

Works - 1 sec hang:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2a1pre) Gecko/20090406 Minefield/3.6a1pre
http://hg.mozilla.org/mozilla-central/rev/b14428284d51

Broken - 15 sec hang:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2a1pre) Gecko/20090407 Minefield/3.6a1pre
http://hg.mozilla.org/mozilla-central/rev/80f223f6296e

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b14428284d51&tochange=80f223f6296e
Keywords: regression
Yeah, your bisecting is about right. This is the upvar patch.

http://hg.mozilla.org/mozilla-central/rev/2cf0bbe3772a
Once again any O(n^2) on the web is a bad idea. I've learned this before, this time for sure.

/be
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.9.3a5
sayrer, want to block on this?
blocking2.0: ? → betaN+
I think Chris was interested in this bug and willing to take it, but if not, bounce to me -- I'll get to it.

/be
Assignee: brendan → cdleary
Blocks: 610296
Blocks: 610240
Blocks: 616465
Passing to jimb, who *loves* front-end bugs!
Assignee: cdleary → jimb
I do enjoy working on the front end.
One interesting wrinkle: carefully constructed code can cause FindFunArgs to access recycled function body parse nodes:

function f(x) { return delete (true ? null : function g() { return x; }); }

When we parse a 'delete' expression, we call js_FoldConstants on its operand, which recycles the unused branch of a conditional it folds --- in this case, the TOK_FUNCTION node. However, the funbox, still pointing to the recycled body parse node, is left in the tree, and is visited by FindFunArgs, which treats it as the body.

If we're going to use RecycleTree the way we do, it should have simpler semantics --- it should clean up after itself properly. People shouldn't have to reason through this kind of crap.
Sorry --- moment of exasperation there.

If the way function boxes stay in the tree referring to nodes destined to be recycled is accidental, well, I've done worse myself before and I'll do worse again; we clean it up and move on.

But if that was a deliberate design decision --- that is, the author observed, or believed they observed, that it just so happened that no harm would ever come from leaving the funboxes in the tree --- then this is the sort of nonsense up with which we should not put. Such delicate invariants are usually, and properly, known by less flattering names.
Jim: no worries, of course it was a mix of intentional folly and unintended bugs. The idea was to keep body-less function definition nodes around. That's why RecycleTree has this test:

     if (pn->pn_used || pn->pn_defn) {

which tries to recycle kids of PN_FUNC and PN_NAME nodes but leave the parent nodes themselves around, for analysis purposes. But RecycleFuncNameKids does not null things appropriately, and it's not clear at this late hour that such nulling would be checked for elsewhere.

/be
[This may be too big a change to take at this point, but it's what we should do in the long run.]

It turns out that the careful effort RecycleTree and NewOrRecycledNode make
to disassemble the recycled tree lazily is wasted: every recycling call
ends up calling UnlinkFunctionBoxes and walking the entire tree to fix up
funbox and method links. There's no locality; you might as well queue up
the nodes while you're there.

This patch replaces the (very clever) lazy recycling with a simple
walk-and-free pass, and only cleans up the method and funbox lists when we
finish up the function box in functionDef, making a single linear pass.
Putting this off to the end avoids quadratic behavior, as noted in the
comments.

The patch localizes the process of adding nodes to the free list in a
single function, ensuring that we don't recycle used/defn nodes. It also
poisons newly freed nodes.

The patch also simplifies js_FoldConstants to simply recycle the whole tree
it is replacing with a boolean constant and pull a fresh node off the list,
rather than copying the logic for correctly freeing PN_FUNC nodes.
I think I have a tiny fix...
This is a known problem for GWT generated code. We should really fix it for 4.
With the big patch, by the way, the allocator no longer appears as one of the top items in the cachegrind profile; NewOrRecycledNode is the largest allocation-related item, at 53Minstructions out of 5Gi total. (Without the patch, UnlinkFunctionBox is 1.7Gi out of 6.8Gi total.)
I think another important point to note in comment 30 is that the overall instruction count is reduced by 1.8 Gi, which is a 26.5% win.  (Yay!)
Comment on attachment 500383 [details] [diff] [review]
Bug 501908: Avoid O(n^2) behavior when recycling large trees.

Jim, great work -- thanks for wiping up my mess here.

>+/*
>+ * Add |pn| and its children to |tc|'s free node list. However, do not free pn_used or
>+ * pn_defn nodes; there are references to those nodes in places that we cannot fix up, so
>+ * we leak them.

"leak" is too strong -- how about retain them until end of compilation or some such?

I don't have any concern about this patch's size. But the lazy recycling from NewOrRecycledNode was a win in its day. Eagerly recycling could hurt code that had this kind of structure:

while (deep) {
   code; ...
       here ...
}
shallow();
shallow();

Most of the nodes for the deep code here structure could be left untouched as the shallow successors were compiled a statement or function declaration at a time.

Igor may remember more, the very clever lazy recycling was his idea. The bug where lazy recycling went in was bug 195385.

/be
Attachment #500383 - Flags: feedback?(igor)
Patch is deeply flawed. Fixing.
(In reply to comment #32)
> >+/*
> >+ * Add |pn| and its children to |tc|'s free node list. However, do not free pn_used or
> >+ * pn_defn nodes; there are references to those nodes in places that we cannot fix up, so
> >+ * we leak them.
> 
> "leak" is too strong -- how about retain them until end of compilation or some
> such?

Yeah, the comment's misleading. Will fix.

> I don't have any concern about this patch's size. But the lazy recycling from
> NewOrRecycledNode was a win in its day. Eagerly recycling could hurt code that
> had this kind of structure:
> 
> while (deep) {
>    code; ...
>        here ...
> }
> shallow();
> shallow();
> 
> Most of the nodes for the deep code here structure could be left untouched as
> the shallow successors were compiled a statement or function declaration at a
> time.

That would certainly be a win, but we don't get that win now --- the function box unlinking already walks that deep structure now, completely.

> Igor may remember more, the very clever lazy recycling was his idea. The bug
> where lazy recycling went in was bug 195385.

It is a great hack.
(In reply to comment #34)
> That would certainly be a win, but we don't get that win now --- the function
> box unlinking already walks that deep structure now, completely.

Apples to oranges. Function boxes are skeletal and sparse or even non-existent. The huge deep code here structure could be function-free, just a big/deep/nesty statement full of expressions and substatements. IINM your patch will walk the whole tree of nodes when recycling the top node, where tm tip won't -- right?

> It is a great hack.

It's reminiscent of sweep-during-GC-allocation but of course simpler, even though ad hoc.

/be
(In reply to comment #35)
> Igor may remember more, the very clever lazy recycling was his idea. The bug
> where lazy recycling went in was bug 195385.

The primary reason for that was to avoid the recursion during the recycling and corresponding too deep native stack checks. The stack check in the parser itself was not enough as the tree to recycle could be bigger then the parsing stack depth. So to avoid the error propagation from the recycle functions I replaced that with lazy recycle. 

The patch AFAICS must be fixed to add those over recursive stack use checks.
(In reply to comment #36)
> The primary reason for that was to avoid the recursion during the recycling and
> corresponding too deep native stack checks. The stack check in the parser
> itself was not enough as the tree to recycle could be bigger then the parsing
> stack depth. So to avoid the error propagation from the recycle functions I
> replaced that with lazy recycle. 
> 
> The patch AFAICS must be fixed to add those over recursive stack use checks.

And do what, though? Fail the compilation?

We could in principle restore lazy recycling at some complexity. Jim, have you thought about that?

/be
(In reply to comment #37)
> And do what, though? Fail the compilation?

This is a very good point. The parser tries to eliminate the tail recursion to allow presumably for compiling more complex scripts. Throwing an error during recycle would defeat that.
(In reply to comment #35)
> Apples to oranges. Function boxes are skeletal and sparse or even non-existent.
> The huge deep code here structure could be function-free, just a big/deep/nesty
> statement full of expressions and substatements. IINM your patch will walk the
> whole tree of nodes when recycling the top node, where tm tip won't -- right?

I'm not making myself clear. UnlinkFunctionBoxes walks the entire parse node tree being recycled, not the function box tree. In unmodified TM, the following code:

    function f() {
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
    while(false)
        1;
    }

produces the following call stack:

Breakpoint 17, UnlinkFunctionBoxes (pn=0xaf23d0 {NUMBER:DOUBLE, pn_dval=1}, tc=0x7fffffffdcb0) at ../jsparse.cpp:390
(gdb) where
#0  UnlinkFunctionBoxes (pn=0xaf23d0 {NUMBER:DOUBLE, pn_dval=1}, tc=0x7fffffffdcb0) at ../jsparse.cpp:390
#1  0x0000000000512600 in UnlinkFunctionBoxes (pn=0xaf2418 {SEMI} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:392
#2  0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf2340 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#3  0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf22b0 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#4  0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf2220 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#5  0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf2190 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#6  0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf2100 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#7  0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf2070 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#8  0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1fe0 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#9  0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1ef8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#10 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1e68 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#11 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1dd8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#12 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1d48 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#13 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1cb8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#14 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1c28 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#15 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1b98 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#16 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1b08 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#17 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1a78 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#18 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf19e8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#19 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1958 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#20 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf18c8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#21 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1838 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#22 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf17a8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#23 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1718 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#24 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1688 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#25 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf15f8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#26 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1568 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#27 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf14d8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#28 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1448 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#29 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf13b8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#30 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1328 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#31 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1298 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#32 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1208 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#33 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf1178 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#34 0x0000000000512633 in UnlinkFunctionBoxes (pn=0xaf10e8 {WHILE} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:396
#35 0x00000000005126a3 in UnlinkFunctionBoxes (pn=0xaf10a0 {LC} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:405
#36 0x000000000051255a in UnlinkFunctionBox (pn=0xaf0fd0 {FUNCTION, "f"} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:374
#37 0x0000000000512826 in RecycleFuncNameKids (pn=0xaf0fd0 {FUNCTION, "f"} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:461
#38 0x00000000005127a2 in RecycleTree (pn=0xaf0fd0 {FUNCTION, "f"} = {...}, tc=0x7fffffffdcb0) at ../jsparse.cpp:443
#39 0x0000000000513b57 in js::Compiler::compileScript (cx=0xa9f100, scopeChain=0x7ffff6803048 [Object global], callerFrame=0x0, principals=0x0, tcflags=155648, chars=0xaf0a80, length=453, filename=0x7fffffffeabe "/home/jimb/moz/deep.js", lineno=1, source=NULL, staticLevel=0) at ../jsparse.cpp:913
#40 0x000000000042cacb in CompileFileHelper (cx=0xa9f100, obj=0x7ffff6803048 [Object global], principals=0x0, tcflags=155648, filename=0x7fffffffeabe "/home/jimb/moz/deep.js", fp=0xaf0840) at ../jsapi.cpp:4630
#41 0x000000000042cce0 in JS_CompileFileHandleForPrincipals (cx=0xa9f100, obj=0x7ffff6803048 [Object global], filename=0x7fffffffeabe "/home/jimb/moz/deep.js", file=0xaf0840, principals=0x0) at ../jsapi.cpp:4680
#42 0x000000000042cdfd in JS_CompileFileHandle (cx=0xa9f100, obj=0x7ffff6803048 [Object global], filename=0x7fffffffeabe "/home/jimb/moz/deep.js", file=0xaf0840) at ../jsapi.cpp:4702
#43 0x00000000004053be in Process (cx=0xa9f100, obj=0x7ffff6803048 [Object global], filename=0x7fffffffeabe "/home/jimb/moz/deep.js", forceTTY=0) at ../../shell/js.cpp:449
#44 0x00000000004061da in ProcessArgs (cx=0xa9f100, obj=0x7ffff6803048 [Object global], argv=0x7fffffffe7e0, argc=2) at ../../shell/js.cpp:870
#45 0x000000000040fa74 in Shell (cx=0xa9f100, argc=2, argv=0x7fffffffe7e0, envp=0x7fffffffe7f8) at ../../shell/js.cpp:5471
#46 0x000000000040fc3a in main (argc=2, argv=0x7fffffffe7e0, envp=0x7fffffffe7f8) at ../../shell/js.cpp:5579
(gdb) 

The optimization has been lost.
(In reply to comment #37)
> We could in principle restore lazy recycling at some complexity. Jim, have you
> thought about that?

At first I wanted to keep things lazy, for niftyness if nothing else, but if one walks the function tree it's impossible to tell which functions lie within the tree you're recycling and which lie outside it. We could force a full unlazying of the freelist before doing the function analysis; that would keep the stack shallow, but still waste the time.
Re: comment 39 -- D'oh!

Re: comment 40: could you use pn_pos (which has nice operators for ordering source coords) to tell what's in the tree and what is not?

/be
(In reply to comment #41)
> Re: comment 39 -- D'oh!
> 
> Re: comment 40: could you use pn_pos (which has nice operators for ordering
> source coords) to tell what's in the tree and what is not?

That's a thought. Do those work nicely with the generator reordering, and interesting target expressions in destructuring lhs's?
(In reply to comment #42)
> (In reply to comment #41)
> > Re: comment 39 -- D'oh!
> > 
> > Re: comment 40: could you use pn_pos (which has nice operators for ordering
> > source coords) to tell what's in the tree and what is not?
> 
> That's a thought. Do those work nicely with the generator reordering, and
> interesting target expressions in destructuring lhs's?

Should -- if we have any interior node pn_pos errors those are usually easy bugs to fix.

/be
Whiteboard: hardblocker
It seems to me that MakeDefIntoUse can drop the old definition's body on the floor when called from functionDef. I do see function boxes pointing to NAME nodes in FindFunArgs, and I've caught MakeDefIntoUse mashing a function box without recycling its body.

In my current patch (almost ready, really), I have a function PrepareNodeForMutation that js_FoldConstants can use; I believe MakeDefIntoUse should do the same. Does this sound right?
Yes, body-dropping with nulling was the plan -- sorry it wasn't carried out.

/be
One slightly disturbing thing I've noticed: while the loop in compileScript recycles the tree, it leaves the decls table in place for the next iteration to find, meaning that you can have a functionDef->MakeDefIntoUse call rewriting a node from a previous iteration. This seems like asking for trouble.
(In reply to comment #46)
> One slightly disturbing thing I've noticed: while the loop in compileScript
> recycles the tree, it leaves the decls table in place for the next iteration to
> find, meaning that you can have a functionDef->MakeDefIntoUse call rewriting a
> node from a previous iteration. This seems like asking for trouble.

Yeah, compilerScript is special that way. It's not going to revisit the old uses or do anything that might visit function bodies. If you null-poison it seems ok. What trouble are you thinking of?

The requirement that uses of a given def can span multiple top-level statements or function declarations seems hard to avoid. Any ideas on removing uses?

/be
(In reply to comment #47)
> Yeah, compilerScript is special that way. It's not going to revisit the old
> uses or do anything that might visit function bodies. If you null-poison it
> seems ok. What trouble are you thinking of?

I haven't been able to concoct a stimulus that gets an interesting response. It may be fine.

> The requirement that uses of a given def can span multiple top-level statements
> or function declarations seems hard to avoid. Any ideas on removing uses?

Well, my understanding of this stuff never seems to be complete, but as far as I can tell, it can't be necessary to maintain decls from one iteration to the next: We generate bytecode before we've seen all the source code. I would expect that code that needs to see decls from prior iterations ought to also depend on decls from later iterations. But obviously it doesn't have that information.

I should try wiping out cg.decls at the bottom of the compileScript loop and see what fails... tomorrow.
Proper function recycling may mean eliminating the tree context's entire
function list; it's misleading to pass in the function list, rather than
side-effecting the tc in place.

Let analyzeFunctions take care of testing whether we have any functions to
analyze, instead of making each caller do it. In the next patch in the
series, we won't know whether the function list is really clear or not in
the callers anyway.

Avoid passing tcflags around by non-const reference; SpiderMonkey style is
to use pointers for parameters the callee may mutate, to make call sites
more evidently potential mutations.
Attachment #500383 - Attachment is obsolete: true
Attachment #501600 - Flags: review?(brendan)
Attachment #500383 - Flags: feedback?(igor)
It turns out that the careful effort RecycleTree and NewOrRecycledNode make
to disassemble the recycled tree lazily is wasted: every recycling call
ends up calling UnlinkFunctionBoxes and walking the entire tree to fix up
funbox and method links. There's no locality; you might as well queue up
the nodes while you're at it. And the stack doesn't stay shallow.

This patch replaces the (very clever) lazy recycling with a simple
walk-and-free pass, and only cleans up the method lists and funbox tree
before function analysis, making a single linear pass. Putting this off to
the end avoids quadratic behavior, as noted in the comments.

The patch localizes the process of adding nodes to the free list in a
single function, ensuring that we don't recycle used/defn nodes. It also
poisons newly freed nodes.

The patch also more clearly distinguishes between function nodes that have
been fully deleted, and function nodes that have been mutated (by
js_FoldConstants) into other kinds of nodes. See the comments before
Parser::cleanFunctionList.
Attachment #501601 - Flags: review?(brendan)
Brendan, as always, feel free to kick the review off to someone else if you're swamped.
Confirmed that, with latest patch rev, the allocator's contribution to the instruction count is as required.
Hi, Igor. Would you be willing to take the reviews from Brendan?
[Revised; new changes in last paragraph below.]

It turns out that the careful effort RecycleTree and NewOrRecycledNode make
to disassemble the recycled tree lazily is wasted: every recycling call
ends up calling UnlinkFunctionBoxes and walking the entire parse node tree
to fix up funbox and method links. There's no locality; you might as well
queue up the parse nodes while you're at it. And the stack doesn't stay
shallow.

This patch replaces the (very clever) lazy recycling with a simple
recursive walk-and-free pass, and only cleans up the method lists and
funbox tree before function analysis, making a single linear pass. Putting
this off to the end avoids quadratic behavior, as noted in the comments.

The patch localizes the process of adding nodes to the free list in a
single function, ensuring that we don't recycle used/defn nodes. It also
poisons newly freed nodes.

The patch also more clearly distinguishes between function nodes that have
been fully deleted, and function nodes that have been mutated (by
js_FoldConstants) into other kinds of nodes. See the comments before
Parser::cleanFunctionList.

I believe the patch also improves the care with which we handle nodes that
cannot be recycled immediately (those that appear in JSAtomLists, or are
referred to by JSFunctionBoxes). In some cases, those nodes may be picked
up and fiddled with later, so it is important that they not refer to nodes
around them that did get recycled.
Attachment #501601 - Attachment is obsolete: true
Attachment #502199 - Flags: review?(brendan)
Attachment #501601 - Flags: review?(brendan)
[Sorry --- *this* is the revised patch. Time to go to bed, apparently.]
Attachment #502199 - Attachment is obsolete: true
Attachment #502201 - Flags: review?(brendan)
Attachment #502199 - Flags: review?(brendan)
Attachment #501600 - Flags: review?(brendan) → review?(igor)
Attachment #502201 - Flags: review?(brendan) → review?(igor)
Attachment #502201 - Flags: review?(igor) → review+
Comment on attachment 502201 [details] [diff] [review]
Avoid O(n^2) behavior when recycling large trees.

Ops, the + for the wrong patch
Attachment #502201 - Flags: review+ → review?
Attachment #501600 - Flags: review?(igor) → review+
Comment on attachment 502201 [details] [diff] [review]
Avoid O(n^2) behavior when recycling large trees.

The patch makes the situation with ignoring the native stack recursion limit worse then currently present. At the very least the patch should add those checks. A better possibility would be to implement non-recursive tree walk, but that adds a lot of complexity. So what about simply not doing the recycling and measuring performance/memory usage?
Attachment #502201 - Flags: review? → review-
(In reply to comment #57)
> Comment on attachment 502201 [details] [diff] [review]
> Avoid O(n^2) behavior when recycling large trees.
> 
> The patch makes the situation with ignoring the native stack recursion limit
> worse then currently present. At the very least the patch should add those
> checks. A better possibility would be to implement non-recursive tree walk, but
> that adds a lot of complexity. So what about simply not doing the recycling and
> measuring performance/memory usage?

Well, the trees we're traversing were created by the parser, which already did the depth check. Why do we need to check again?

But, in any case, I think I can do a constant-space (including C++ stack space) version pretty quickly.
(In reply to comment #58)
> Well, the trees we're traversing were created by the parser, which already did
> the depth check. Why do we need to check again?

See comment 38. The parser tail-call-optimizes recursion to iteration for certain structures but the recycler may not (this needs to be checked -- long time ago we did the lazy recycler).

> But, in any case, I think I can do a constant-space (including C++ stack space)
> version pretty quickly.

Great. I keep showing my scars (some recent) to warn everyone away from bad asymptotic behavior in web-facing code.

/be
Revised, to avoid creating deep C++ stacks when recycling deep parse trees.
Attachment #502201 - Attachment is obsolete: true
Attachment #503326 - Flags: review?(igor)
Comment on attachment 503326 [details] [diff] [review]
Avoid O(n^2) behavior when recycling large trees.

>+class NodeStack {

AFAICS this stack is not necessary. Instead of moving the nodes between lists the code needs a pointer into the recycled list denoting the end of unprocessed part and then doing a loop until the list head matches the unprocessed part.

Another observation is that the list is expensive as it involves writes to a node link field leading to a write to the whole cache line. If the list would be replaced by a vector then this extra write would be avoided. But this can wait to another bug.
(In reply to comment #61)
> AFAICS this stack is not necessary. Instead of moving the nodes between lists
> the code needs a pointer into the recycled list denoting the end of unprocessed
> part and then doing a loop until the list head matches the unprocessed part.

So, my first shot at this patch used a beautiful queue data that did something like what you're describing, where the freelist was generated as a side effect of processing the worklist. Very elegant, less memory traffic.

The reason I took it out is that we need to have nodes on the worklist that will not go on the freelist: function nodes and pn_used/pn_defn nodes must not be recycled, because they are referred to by other data structures. Integrating the work pool and the free list meant that one had to unlink certain structures, and it wasn't graceful any more. It could certainly be made to work.

> Another observation is that the list is expensive as it involves writes to a
> node link field leading to a write to the whole cache line. If the list would
> be replaced by a vector then this extra write would be avoided. But this can
> wait to another bug.

This is an interesting point.
(In reply to comment #62)
> The reason I took it out is that we need to have nodes on the worklist that
> will not go on the freelist: function nodes and pn_used/pn_defn nodes must not
> be recycled, because they are referred to by other data structures. Integrating
> the work pool and the free list meant that one had to unlink certain
> structures, and it wasn't graceful any more. It could certainly be made to
> work.

I have missed the node that should not be in the list. But removal of those does when processing the unprocessed part of the list does not look bad.

> 
> > Another observation is that the list is expensive as it involves writes to a
> > node link field leading to a write to the whole cache line. If the list would
> > be replaced by a vector then this extra write would be avoided. But this can
> > wait to another bug.
> 
> This is an interesting point.

The function nodes to be removed also complicate things with a vector as one would need to maintain a potential hole between fully recycled nodes and unprocessed ones. Alternatively one may replace with null still used nodes for the cost of an extra null check during the allocation. Still overall this should be simpler then list manipulations and would minimize the write bandwidth by factor of 9 (JSNode has 9 words) when recycling a big tree.

With vector there is one more complication as one need to deal with failed allocation. In such case the recycling can just throw away the node under assumption that OOM would quickly reveal itself during other allocations.
(In reply to comment #63)
> I have missed the node that should not be in the list. But removal of those
> does when processing the unprocessed part of the list does not look bad.

It can be done, of course. It just isn't very readable.

> With vector there is one more complication as one need to deal with failed
> allocation. In such case the recycling can just throw away the node under
> assumption that OOM would quickly reveal itself during other allocations.

Or it could fall back to using the linked list...
So, you feel that using a vector for the stack instead of a linked list is important enough that it should be addressed in this blocker, not in a follow-up bug?
(In reply to comment #65)
> So, you feel that using a vector for the stack instead of a linked list is
> important enough that it should be addressed in this blocker, not in a
> follow-up bug?

Clearly this can wait a follow-up due to not-to-be-reused nodes complications. I will add r+ and a nit in a moment.
Comment on attachment 503326 [details] [diff] [review]
Avoid O(n^2) behavior when recycling large trees.

>+PrepareNodeForMutation(JSParseNode *pn, JSTreeContext *tc)
>+{
>+    if (pn->pn_arity != PN_NULLARY) {
>+        if (pn->pn_arity == PN_FUNC) {
...
>+        }
>+
>+        NodeStack stack;
>+        PushNodeChildren(pn, &stack);

Nit: we need an assert and perhaps comments about why PushNodeChildren is not followed by AddNodeToFreeList and why RecycleTree is not called here.
Attachment #503326 - Flags: review?(igor) → review+
(In reply to comment #67)
> Nit: we need an assert and perhaps comments about why PushNodeChildren is not
> followed by AddNodeToFreeList and why RecycleTree is not called here.

I've added some comments, but I'm not sure which assertion you would like to see. 

-/* Prepare a node to be mutated in place into a new kind of node. */
+/*
+ * Prepare |pn| to be mutated in place into a new kind of node. Recycle all
+ * |pn|'s recyclable children (but not |pn| itself!), and disconnect it from
+ * metadata structures (the function box tree).
+ */
 static void
 PrepareNodeForMutation(JSParseNode *pn, JSTreeContext *tc)
 {
...
+        /* Put |pn|'s children (but not |pn| itself) on a work stack. */
         NodeStack stack;
         PushNodeChildren(pn, &stack);
+        /*
+         * For each node on the work stack, push its children on the work stack,
+         * and free the node if we can.
+         */
         while (!stack.empty()) {
             pn = stack.pop();
             if (PushNodeChildren(pn, &stack))
                 AddNodeToFreeList(pn, tc);
         }
     }
 }
http://hg.mozilla.org/tracemonkey/rev/f1be82c29a1e
Whiteboard: hardblocker → hardblocker,fixed-in-tracemonkey
Depends on: 626436
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: