Closed Bug 494268 Opened 15 years ago Closed 13 years ago

Failure to trace a loop we should really be tracing

Categories

(Core :: JavaScript Engine, defect, P1)

Tracking

RESOLVED WONTFIX
mozilla1.9.1
Tracking Status
status1.9.1 --- ?

People

(Reporter: bzbarsky, Assigned: gal)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

706 bytes, application/x-javascript
Attached file Testcase
The attached testcase is basically Aaron's testcase from bug 460050 comment 76, but with the loop sizes changed from 1000 to 100 to run a bit faster.

The fundamental problem with it is that we end up with about 100*100 trace enters/exits; most are LOOP_EXIT.

Taking out the outermost loop (the one to loopCount) and thus running |manyTimes(funcB); funcA();| only once gives us 6 trace enters/exits.  Clearly the worst-case behavior here should be 6*100, not 100*100.

I suspect that fixing this might go a long way towards improving life on the go game for us.
As gal just noted on IRC, setting loopCount to 1 gives us 6 exits. Setting it to 2 gives us 1006 exits....
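
For reference, a minimal sketch of the testcase, reconstructed from the pieces quoted in this report and in comment 3 (the actual attachment may differ in details; I'm assuming both the manyTimes loop and loopCount are the sizes that were reduced from 1000 to 100):

function manyTimes(func) {
    for (var i = 0; i < 100; i++)
        func();
}

function funcB() {
    for (var i = 0; i < 10; ++i);
}

function funcA() {
    var n = 0;
    manyTimes(function() { n += 1; });
}

var loopCount = 100;
for (var j = 0; j < loopCount; j++) {
    manyTimes(funcB);
    funcA();
}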
Blocks: 460050
Flags: blocking1.9.1?
We record identical traces for loopCount = 1 and loopCount = 2, but with 2 we exit 1000 times at the loop edge. Working on it.
OK, I figured out what's going on here.

This function here calls its argument:

function manyTimes(func) {  // was eachInt
    for (var i = 0; i < 1000; i++) {
        func();
    }
}

We initially pass it funcB:

function funcB() {
    for (var i = 0; i < 10; ++i);
}

here:

manyTimes(funcB);

That creates a happy and fast nested tree.

Then we call funcA, which calls manyTimes as follows:

function funcA() {  // was findChains
    var n = 0;
    manyTimes(function() {
        n += 1;
    });
}

So we go into the loop in manyTimes, and fall on our face repeatedly, because we can't trace the flat closure lambda function.
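
To illustrate the distinction (a sketch, not from the testcase; op names per this thread): a lambda that closes over a local is emitted as a flat closure via JSOP_LAMBDA_FC, which is what we abort on, while a lambda with no upvars doesn't need that op:

// No upvars: no JSOP_LAMBDA_FC involved, so no abort from this op.
manyTimes(function() { return 1; });

// Closes over n: n is an upvar, so the lambda is a flat closure
// (JSOP_LAMBDA_FC), and the recorder aborts when it hits it.
function funcA2() {
    var n = 0;
    manyTimes(function() { n += 1; });
}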

Every time we abort, we blacklist THE TREE we were coming from. That tree contains the outer loop of funcA. The part we abort on is a branch we attach at the func() dispatch, and aborting there blacklists the entire tree. We do this because sometimes we manage to capture part of the control flow but abort somewhere else, and the overall thing is slow.

So basically the second use of manyTimes kills all uses of it because we abort on it repeatedly. All that remains is the for 1..10 loop, which we execute a gazillion times.

This kinda blows. The best fix is not to abort at all. The second-best fix would be to change the blacklisting logic, which I am totally terrified to do, but I can discuss it with dmandelin. Maybe we don't want to blacklist if we are recording a secondary trace (branch). David?

Anyway, this is not some terrible correctness bug as I initially suspected. Might drop blocking.
Btw, not blacklisting here doesn't help either. If we skip it, we get even more side exits in total, because the loop in manyTimes is then always entered even though we don't cover half of it, and that half keeps exiting immediately (since we couldn't extend the tree there due to lambda_fc).

Summary: we need lambda_fc and bindname. I don't think that's feasible for 3.5. I am not happy with a 1.3x slowdown either, but we will always have such cases as long as we can't trace everything, and we still have a bunch of work to do in that department.
Hmm.  I'd thought we blacklisted branches, not trees, but that might have just been a proposal.  If we blacklist entire trees, then the behavior here makes sense.

While having lambda_fc would help this testcase, there are also other aborts on the original go game...  

That said, entering the manyTimes loop every time and then immediately exiting if we call the func that does the lambda_fc shouldn't give us more side-exits on this testcase, should it?  I can confirm that it _does_, but it seems like it _should_ give us one side-exit (a branch exit) per invocation of funcA().  Is that not the case?

I can also confirm that completely turning off blacklisting just slows down the go testcase, fwiw.
JSOP_LAMBDA_FC is easy now that js_GetUpvarOnTrace is correct. Given the function to clone, fun, you just need to build a flat closure from a builtin by calling js_GetUpvarOnTrace for all upvars in JS_SCRIPT_UPVARS(fun->u.i.script). I will do it later today; dmandelin should steal it if he can do it sooner.

/be
Assignee: general → brendan
Status: NEW → ASSIGNED
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.1
Let's keep that in the lambda_fc bug, mkay?  ;)

Given comment 3, this should probably not be blocking.
Concur with brendan; we can do LAMBDA_FC (I can probably do it; it looked easy indeed, and I would unroll the GetUpvars loop). The problem is still BINDNAME, though. Either way, I don't think we have to block on this. We can bring it online in 3.5.x if we are confident about it.
Assignee: brendan → gal
BINDNAME is doable too; dmandelin and I talked about it and about SetPropHit, both of which need to work when a Call object is at the scope chain head instead of insisting on a global object (BINDNAME copes with Blocks too; SetPropHit doesn't yet; this should all be rolled up into a helper related to activeCallOrGlobalSlot, if not a generalization of that helper).

What is the BINDNAME bug?

/be
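
To illustrate the two BINDNAME cases (a sketch, not from the bug; traceability per the discussion above):

// Name resolves on the global object: the case the tracer's BINDNAME
// already insists on.
var g = 0;
function setGlobal() {
    g += 1;
}

// Name resolves on a Call object at the scope chain head: the case
// BINDNAME/SetPropHit can't yet handle on trace (this is the n += 1
// in funcA's lambda).
function makeCounter() {
    var n = 0;
    return function() { n += 1; };
}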
Flags: wanted1.9.1.x?
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Depends on: 495323
Depends on: 494269
Please file the BINDNAME bug and link it up as a blocker. And any others, but maybe they all get patched under that bug ;-).

/be
No longer depends on: 495323
Depends on: 495329
Depends on: 495331
Depends on: 507449
status1.9.1: --- → ?
Flags: wanted1.9.1.x?
Is this bug still applicable?
Applicable in what sense?

On the original testcase in this bug, jitstats on today's tip look like this:

recorder: started(6), aborted(4), completed(34), different header(0), trees trashed(2), slot promoted(0), unstable loop variable(1), breaks(0), returns(0), merged loop exits(31), unstableInnerCalls(1), blacklisted(1)
monitor: exits(6996), timeouts(0), type mismatch(0), triggered(6996), global mismatch(0), flushed(0)

That's fairly suboptimal, though about 14 times fewer exits than when I filed. Then again, I get those stats even if I take out the manyTimes() call in the main loop body and just leave the funcA() call in there. That seems broken to me; why are we blacklisting on inner trees trying to grow, for example (the only aborts in the testcase)? Why are we getting growing inner trees at all?

Perhaps that should be filed as a separate focused bug; Andreas/dvander/Brendan would have to make that call, since I'm a little lost in the details of the discussion above at this point.
It is probably not necessary to work on making this trace at this point, unless it is easy. Currently, JM runs this test about 3x faster than we do with the tracer turned on, so it would probably run at about the same speed as in JM if we did make it trace.

But we are still 4x slower than JSC on this test in JM, just because we have not optimized setting closed-over variables yet. I will file.
I thought we had PICs for sets on Call objects now.  Or is that not the same as what comment 13 is talking about?
Obsolete with the removal of tracejit.

JM+TI looks pretty good on the attached testcase.
Interp: loop count: 100, run time: 20
JM: loop count: 100, run time: 13
JM+TI: loop count: 100, run time: 2
d8: loop count: 100, run time: 1
Blocks: 467263
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX