Closed Bug 535925 Opened 15 years ago Closed 13 years ago

TM: High loop and branch exits and poor performance with JIT in this example

Categories

(Core :: JavaScript Engine, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: duncan.loveday, Assigned: dvander)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, testcase)

Attachments

(3 files, 1 obsolete file)

691.91 KB, application/x-javascript
Details
691.92 KB, application/x-javascript
Details
402 bytes, application/x-javascript
Details
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 2.0.50727)
Build Identifier: Local m-c build as of 18/12/2009

The attached has a loop structure that seems to perform badly with JIT. I get large numbers of LOOP and BRANCH exits


Reproducible: Always

Steps to Reproduce:
1. Run the attached in the shell with TMFLAGS=stats
2.
3.
Actual Results:  
$ ./js test.js
Time=2696
$ ./js -j test.js
Time=5874
$ export TMFLAGS=stats       
$ ./js -j test.js
Time=7915
recorder: started(27), aborted(20), completed(135), different header(0), trees trashed(11), slot promoted(0), unstable loop variable(93), breaks(0), returns(0), merged loop exits(37), unstableInnerCalls(5), blacklisted(3)
monitor: exits(81794), BRANCH(36160), CASE(0), DEFAULT(0), LOOP(45634), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(0), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(81794), global mismatch(0), flushed(1)
$
Attached file test case
Blocks: 530956
This might have something to do with initialisation of locals to null within a nested loop structure. Attaching a slightly modified case that runs a ton better. The only difference is a couple of locals are not initialised. These locals are set but never read so it shouldn't be affecting the control flow.

$ diff test.js test2.js
33,34c33,34
<                   var inboundREQGraphID=null;
<                   var leftArrowGraphID=null;
---
>                   var inboundREQGraphID; //=null;
>                   var leftArrowGraphID; //=null;
$ diff -u test.js test2.js
--- test.js     2009-12-19 15:01:45.078125000 +0000
+++ test2.js    2009-12-19 15:01:20.468750000 +0000
@@ -30,8 +30,8 @@
                    var messageOtherTierID=values[7];
 
                    var isREQOrRCT=(messageType=="REQ" || messageType=="RCT");
-                   var inboundREQGraphID=null;
-                   var leftArrowGraphID=null;
+                   var inboundREQGraphID; //=null;
+                   var leftArrowGraphID; //=null;
                    var lTierName=null, otherTierKey=null;
 
                    var otherTierIDorName=(messageOtherTierID=="" ? (dir=="IN" ? from : to) : messageOtherTierID);
$ ./js -j test.js
Time=7453
recorder: started(27), aborted(20), completed(135), different header(0), trees trashed(11), slot promoted(0), unstable loop variable(93), breaks(0), returns(0), merged loop exits(37), unstableInnerCalls(5), blacklisted(3)
monitor: exits(81794), BRANCH(36160), CASE(0), DEFAULT(0), LOOP(45634), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(0), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(81794), global mismatch(0), flushed(1)
$ ./js -j test2.js
Time=3391
recorder: started(32), aborted(26), completed(154), different header(0), trees trashed(13), slot promoted(0), unstable loop variable(94), breaks(0), returns(0), merged loop exits(50), unstableInnerCalls(9), blacklisted(5)
monitor: exits(3618), BRANCH(1736), CASE(0), DEFAULT(0), LOOP(1882), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(0), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(3618), global mismatch(0), flushed(1)
$
I have been working on a patch to help mitigate this problem, so taking. It is sort of a prerequisite to fixing the NES emulator, JPEG decoder, etc - pretty much anything with deep nesting and lots of variables.
Assignee: general → dvander
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
I talked myself into believing the work on merge traces might help here (though not sure it explains the difference between the two test cases). I counted 384 paths from top to bottom of the inner loop (don't know if they're all taken).
Depends on: 516264
No longer depends on: 516264
(In reply to comment #4)
> I talked myself into believing the work on merge traces might help here...

No, that can't be right. The second test case is no more branchy than the first. I'll shut up.
Though certainly merge traces will help, after playing around with the test case a bit, I think the problem is with variable hoisting. The |var| is hoisted to the top of the function as an |undefined|.

If it's initialized to |null| inside the innermost loop, then there's an initial transition from |undefined| -> |null|. That's fine, but then there's a type change to |null| -> |string| in the middle of the loop, and then future invocations of the loop have a |string| -> |null| transition. This flipping back and forth badly affects the tracing behavior of the outer loops.

When it's left uninitialized, there's only one transition, and everything traces nicely.

I'm working on a patch to mark slots as ignorable for the purpose of connecting outer trees to inner trees - I think (or hope) it will help here.
I think I see. So I ought to be able to reduce the test case quite a bit - the branchiness is a distraction. Might have a go at that tomorrow.

One thing that I don't follow: There are other variables that are declared, initialised and set in much the same way, e.g. "var lTierName=null" but don't seem to disrupt tracing to the same extent (if at all).
Attached file Reduced testcase (obsolete) —
I think this is more or less minimal and hopefully shows the same effect. Seems at least four variables are required (perhaps you cater for up to 16 different type maps ?)

My results are

No JIT
======

$ ./js.exe test4.js
cnt=1000000
Time=8765

WITHOUT INITIALISATION TO null
==============================

$ ./js.exe -j test4.js
cnt=1000000
Time=265
recorder: started(7), aborted(0), completed(22), different header(0), trees trashed(7), slot promoted(0), unstable loop variable(2), breaks(0), returns(0), merged loop exits(15), unstableInnerCalls(0), blacklisted(0)
monitor: exits(73), BRANCH(32), CASE(0), DEFAULT(0), LOOP(41), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(0), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(73), global mismatch(0), flushed(1)
$

WITH INITIALISATION TO null
===========================

$ ./js.exe -j test4.js
cnt=1000000
Time=15328
recorder: started(30), aborted(18), completed(28), different header(0), trees trashed(12), slot promoted(0), unstable loop variable(27), breaks(0), returns(0), merged loop exits(0), unstableInnerCalls(2), blacklisted(5)
monitor: exits(999987), BRANCH(23), CASE(0), DEFAULT(0), LOOP(999961), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(3), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(999987), global mismatch(0), flushed(1)
$
Attached file Reduced testcase v2
Slightly reduced
Attachment #418816 - Attachment is obsolete: true
Here is the culprit:

checking vm types 0x83e0dc (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/N var2=S/N null != tag4
checking vm types 0x849384 (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/S string != tag0
checking vm types 0x84f6dc (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/N var2=S/N null != tag4
checking vm types 0x850de4 (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/S string != tag0
checking vm types 0x85157c (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/S string != tag0
Blacklisted: too many peer trees.

Merge traces won't help here.

This loop has exponential type explosion and we stop trying to specialize it further. The long-term strategy for this type of code is to back off on aggressive specialization. In the short term we should be able to do some sort of dumb analysis and figure out that those variables don't matter for connecting the loop edge to the header.

My first thought was to do this entirely in the tracer, but it doesn't seem possible. Say a trace has variable |x| that is only used or defined in one or more branches from the main trace. When leaving the trace, there is no (efficient) way to know which set of branches were taken in order to flush native values with the correct types. We could trash trees if this problem came up, and blacklist those slots from being lazily specialized, but that seems pretty complicated, and awful.

So it seems like a dumb parser analysis is the best way to try tackling this. If we can mark which variables have a definition that dominates all uses, we can exclude those variables from poisoning tree stability.
In my last attachment the variables can't impact the loop because they're not in scope.

This is an orthogonal point but has the suggestion been made that once the system has stopped recording traces, whether due to branchiness or too many type maps or any other reason, stats are kept on trace entries vs exits ? If the percentage of bad exits exceeds a threshold, delete all the compiled traces and black list the loop.

Crude but would allow code that can't currently be traced well to revert to interpreter performance rather than suffer a drop in performance due to continual exits.
(In reply to comment #11)
> In my last attachment the variables can't impact the loop

Yup, that is the sort of thing we should be able to recognize.

> but has the suggestion been made that once the system has stopped recording traces

Yes, definitely - I don't think there's an open bug on it yet, but we need a heuristic for situations where a very short inner tree's time is dominated by the entry/exit expense, and an outer tree around it doesn't trace.
The ultimate goal is bug 536277, a hybrid approach where we can de-specialize, still get decent perf on the things that don't trace well, and quickly transition to the inner loops which do trace well.
Thanks David. For me, I feel TM has yet to show its true potential. It performs brilliantly on very simple control structures but the gains quickly evaporate for code of even modest complexity. I now understand a little more about why this is so and I can see there's a great deal of good work going which I am sure will make things better and better so keep it up !
> ... we need a heuristic for situations where a very short inner tree's time is dominated by the entry/exit expense, and an outer tree around it doesn't trace.

Yes, indeed, that is the hard part. I played about with a simple patch where whenever an outer tree is blacklisted, all inner trees are ignored. Huge gain in the test case here but as you point out that's only because the inner tree is so short and entered so often. With a longer inner tree and a short, untraceable outer the patch destroys JIT performance of a well-traced inner loop.

The only heuristic I can think of is to record the elapsed time spent executing the inner tree vs time spent looking up fragments etc. Short loops would have a low percentage of time spent actually executing the tree.
Depends on: 470779
Current js shell results for attached reduced testcase v2:
Interp: 856.004 ms
-j:     872.811 ms
-m:     75.313 ms
-m -n:  46.262 ms

Looks like JM+TI wins here.
Blocks: 467263
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: