Closed Bug 494268 Opened 15 years ago Closed 13 years ago

Failure to trace a loop we should really be tracing

Categories

(Core :: JavaScript Engine, defect, P1)

Tracking

RESOLVED WONTFIX
mozilla1.9.1
Tracking Status
status1.9.1 --- ?

People

(Reporter: bzbarsky, Assigned: gal)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

706 bytes, application/x-javascript
Attached file Testcase
The attached testcase is basically Aaron's testcase from bug 460050 comment 76, but with the loop sizes changed from 1000 to 100 to run a bit faster.

The fundamental problem with it is that we end up with about 100*100 trace enters/exits; most are LOOP_EXIT.

Taking out the outermost loop (the one to loopCount) and thus running |manyTimes(funcB); funcA();| only once gives us 6 trace enters/exits.  Clearly the worst-case behavior here should be 6*100, not 100*100.

I suspect that fixing this might go a long way towards improving life on the go game for us.
As gal just noted on IRC, setting loopCount to 1 gives us 6 exits. Setting it to 2 gives us 1006 exits....
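
For reference, a minimal sketch of the testcase, reconstructed from the pieces quoted in this report and in comment 3 (the actual attachment may differ in details; I'm assuming both the manyTimes loop and loopCount are the sizes that were reduced from 1000 to 100):

function manyTimes(func) {
    for (var i = 0; i < 100; i++)
        func();
}

function funcB() {
    for (var i = 0; i < 10; ++i);
}

function funcA() {
    var n = 0;
    manyTimes(function() { n += 1; });
}

var loopCount = 100;
for (var j = 0; j < loopCount; j++) {
    manyTimes(funcB);
    funcA();
}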
Blocks: 460050
Flags: blocking1.9.1?
We record identical traces for loopCount = 1 and loopCount = 2, but with 2 we exit 1000 times at the loop edge. Working on it.
OK, I figured out what's going on here.

This function here calls its argument:

function manyTimes(func) {  // was eachInt
    for (var i = 0; i < 1000; i++) {
        func();
    }
}

We initially pass it funcB:

function funcB() {
    for (var i = 0; i < 10; ++i);
}

here:

manyTimes(funcB);

That creates a happy and fast nested tree.

Then we call funcA, which calls manyTimes as follows:

function funcA() {  // was findChains
    var n = 0;
    manyTimes(function() {
        n += 1;
    });
}

So we go into the loop in manyTimes, and fall on our face repeatedly, because we can't trace the flat closure lambda function.
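
To illustrate the distinction (a sketch, not from the testcase; op names per this thread): a lambda that closes over a local is emitted as a flat closure via JSOP_LAMBDA_FC, which is what we abort on, while a lambda with no upvars doesn't need that op:

// No upvars: no JSOP_LAMBDA_FC involved, so no abort from this op.
manyTimes(function() { return 1; });

// Closes over n: n is an upvar, so the lambda is a flat closure
// (JSOP_LAMBDA_FC), and the recorder aborts when it hits it.
function funcA2() {
    var n = 0;
    manyTimes(function() { n += 1; });
}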

Every time we abort, we blacklist THE TREE we were coming from. That tree contains the outer loop of funcA. The part we abort on is a branch we attach at the func() dispatch, and aborting there blacklists the entire tree. We do this because sometimes we manage to capture part of the control flow but abort somewhere else, and the overall thing is slow.

So basically the second use of manyTimes kills all uses of it because we abort on it repeatedly. All that remains is the for 1..10 loop, which we execute a gazillion times.

This kinda blows. The best fix is not to abort at all. The second-best fix would be to change the blacklisting logic, which I am totally terrified to do, but I can discuss it with dmandelin. Maybe we don't want to blacklist if we are recording a secondary trace (branch). David?

Anyway, this is not some terrible correctness bug as I initially suspected. Might drop blocking.
Btw, not blacklisting here doesn't help either. If we skip it, we get even more side exits in total, because the loop in manyTimes is then always entered even though we don't cover half of it, and that half keeps exiting immediately (since we couldn't extend the tree there due to lambda_fc).

Summary: we need lambda_fc and bindname. I don't think that's feasible for 3.5. I am not happy with a 1.3x slowdown either, but we will always have such cases as long as we can't trace everything, and we still have a bunch of work to do in that department.
Hmm.  I'd thought we blacklisted branches, not trees, but that might have just been a proposal.  If we blacklist entire trees, then the behavior here makes sense.

While having lambda_fc would help this testcase, there are also other aborts on the original go game...  

That said, entering the manyTimes loop every time and then immediately exiting if we call the func that does the lambda_fc shouldn't give us more side-exits on this testcase, should it?  I can confirm that it _does_, but it seems like it _should_ give us one side-exit (a branch exit) per invocation of funcA().  Is that not the case?

I can also confirm that completely turning off blacklisting just slows down the go testcase, fwiw.
JSOP_LAMBDA_FC is easy now that js_GetUpvarOnTrace is correct. Given the function to clone, fun, you just need to build a flat closure from a builtin by calling js_GetUpvarOnTrace for all upvars in JS_SCRIPT_UPVARS(fun->u.i.script). I will do it later today; dmandelin should steal it if he can do it sooner.

/be
Assignee: general → brendan
Status: NEW → ASSIGNED
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.1
Let's keep that in the lambda_fc bug, mkay?  ;)

Given comment 3, this should probably not be blocking.
Concur with brendan; we can do LAMBDA_FC (I can probably do it; it looked easy indeed, and I would unroll the GetUpvars loop). The problem is still BINDNAME, though. Either way, I don't think we have to block on this. We can bring it online in 3.5.x if we are confident about it.
Assignee: brendan → gal
BINDNAME is doable too; dmandelin and I talked about it and about SetPropHit, both of which need to work when a Call object is at the scope chain head instead of insisting on a global object (BINDNAME copes with Blocks too; SetPropHit doesn't yet; this should all be rolled up into a helper related to activeCallOrGlobalSlot, if not a generalization of that helper).

What is the BINDNAME bug?

/be
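
To illustrate the two BINDNAME cases (a sketch, not from the bug; traceability per the discussion above):

// Name resolves on the global object: the case the tracer's BINDNAME
// already insists on.
var g = 0;
function setGlobal() {
    g += 1;
}

// Name resolves on a Call object at the scope chain head: the case
// BINDNAME/SetPropHit can't yet handle on trace (this is the n += 1
// in funcA's lambda).
function makeCounter() {
    var n = 0;
    return function() { n += 1; };
}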
Flags: wanted1.9.1.x?
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Depends on: 495323
Depends on: 494269
Please file the BINDNAME bug and link it up as a blocker. And any others, but maybe they all get patched under that bug ;-).

/be
No longer depends on: 495323
Depends on: 495329
Depends on: 495331
Depends on: 507449
status1.9.1: --- → ?
Flags: wanted1.9.1.x?
Is this bug still applicable?
Applicable in what sense?

On the original testcase in this bug, jitstats on today's tip look like this:

recorder: started(6), aborted(4), completed(34), different header(0), trees trashed(2), slot promoted(0), unstable loop variable(1), breaks(0), returns(0), merged loop exits(31), unstableInnerCalls(1), blacklisted(1)
monitor: exits(6996), timeouts(0), type mismatch(0), triggered(6996), global mismatch(0), flushed(0)

That's fairly suboptimal, though about 14 times fewer exits than when I filed. Then again, I get those stats even if I take out the manyTimes() call in the main loop body and just leave the funcA() call in there. That seems broken to me; why are we blacklisting on inner trees trying to grow, for example (the only aborts in the testcase)? Why are we getting growing inner trees at all?

Perhaps that should be filed as a separate focused bug; Andreas/dvander/Brendan would have to make that call, since I'm a little lost in the details of the discussion above at this point.
It is probably not necessary to work on making this trace at this point, unless it is easy. Currently, JM runs this test about 3x faster than we do with the tracer turned on, so it would probably run at about the same speed as in JM if we did make it trace.

But we are still 4x slower than JSC on this test in JM, just because we have not optimized setting closed-over variables yet. I will file.
I thought we had PICs for sets on Call objects now.  Or is that not the same as what comment 13 is talking about?
Obsolete with the removal of tracejit.

JM+TI looks pretty good on the attached testcase.
Interp: loop count: 100, run time: 20
JM: loop count: 100, run time: 13
JM+TI: loop count: 100, run time: 2
d8: loop count: 100, run time: 1
Blocks: 467263
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX