Closed
Bug 535925
Opened 15 years ago
Closed 13 years ago
TM: High loop and branch exits and poor performance with JIT in this example
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: duncan.loveday, Assigned: dvander)
References
(Blocks 1 open bug)
Details
(Keywords: perf, regression, testcase)
Attachments
(3 files, 1 obsolete file)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 2.0.50727) Build Identifier: Local m-c build as of 18/12/2009 The attached has a loop structure that seems to perform badly with JIT. I get large numbers of LOOP and BRANCH exits Reproducible: Always Steps to Reproduce: 1. Run the attached in the shell with TMFLAGS=stats 2. 3. Actual Results: $ ./js test.js Time=2696 $ ./js -j test.js Time=5874 $ export TMFLAGS=stats $ ./js -j test.js Time=7915 recorder: started(27), aborted(20), completed(135), different header(0), trees trashed(11), slot promoted(0), unstable loop variable(93), breaks(0), returns(0), merged loop exits(37), unstableInnerCalls(5), blacklisted(3) monitor: exits(81794), BRANCH(36160), CASE(0), DEFAULT(0), LOOP(45634), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(0), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(81794), global mismatch(0), flushed(1) $
Reporter | ||
Comment 1•15 years ago
|
||
Reporter | ||
Comment 2•15 years ago
|
||
This might have something to do with initialisation of locals to null within a nested loop structure. Attaching a slightly modified case that runs a ton better. The only difference is a couple of locals are not initialised. These locals are set but never read so it shouldn't be affecting the control flow.
$ diff test.js test2.js
33,34c33,34
< var inboundREQGraphID=null;
< var leftArrowGraphID=null;
---
> var inboundREQGraphID; //=null;
> var leftArrowGraphID; //=null;
$ diff -u test.js test2.js
--- test.js 2009-12-19 15:01:45.078125000 +0000
+++ test2.js 2009-12-19 15:01:20.468750000 +0000
@@ -30,8 +30,8 @@
var messageOtherTierID=values[7];
var isREQOrRCT=(messageType=="REQ" || messageType=="RCT");
- var inboundREQGraphID=null;
- var leftArrowGraphID=null;
+ var inboundREQGraphID; //=null;
+ var leftArrowGraphID; //=null;
var lTierName=null, otherTierKey=null;
var otherTierIDorName=(messageOtherTierID=="" ? (dir=="IN" ? from : to) : messageOtherTierID);
$ ./js -j test.js
Time=7453
recorder: started(27), aborted(20), completed(135), different header(0), trees trashed(11), slot promoted(0), unstable loop variable(93), breaks(0), returns(0), merged loop exits(37), unstableInnerCalls(5), blacklisted(3)
monitor: exits(81794), BRANCH(36160), CASE(0), DEFAULT(0), LOOP(45634), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(0), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(81794), global mismatch(0), flushed(1)
$ ./js -j test2.js
Time=3391
recorder: started(32), aborted(26), completed(154), different header(0), trees trashed(13), slot promoted(0), unstable loop variable(94), breaks(0), returns(0), merged loop exits(50), unstableInnerCalls(9), blacklisted(5)
monitor: exits(3618), BRANCH(1736), CASE(0), DEFAULT(0), LOOP(1882), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(0), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(3618), global mismatch(0), flushed(1)
$
Assignee | ||
Comment 3•15 years ago
|
||
I have been working on a patch to help mitigate this problem, so taking. It is sort of a prerequisite to fixing the NES emulator, JPEG decoder, etc - pretty much anything with deep nesting and lots of variables.
Assignee: general → dvander
Assignee | ||
Updated•15 years ago
|
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Reporter | ||
Comment 4•15 years ago
|
||
I talked myself into believing the work on merge traces might help here (though not sure it explains the difference between the two test cases). I counted 384 paths from top to bottom of the inner loop (don't know if they're all taken).
Depends on: 516264
Reporter | ||
Comment 5•15 years ago
|
||
(In reply to comment #4) > I talked myself into believing the work on merge traces might help here... No, that can't be right. The second test case is no more branchy than the first. I'll shut up.
Assignee | ||
Comment 6•15 years ago
|
||
Though certainly merge traces will help, after playing around with the test case a bit, I think the problem is with variable hoisting. The |var| is hoisted to the top of the function as an |undefined|. If it's initialized to |null| inside the innermost loop, then there's an initial transition from |undefined| -> |null|. That's fine, but then there's a type change to |null| -> |string| in the middle of the loop, and then future invocations of the loop have a |string| -> |null| transition. This flipping back and forth badly affects the tracing behavior of the outer loops. When it's left uninitialized, there's only one transition, and everything traces nicely. I'm working on a patch to mark slots as ignorable for the purpose of connecting outer trees to inner trees - I think (or hope) it will help here.
Reporter | ||
Comment 7•15 years ago
|
||
I think I see. So I ought to be able to reduce the test case quite a bit - the branchiness is a distraction. Might have a go at that tomorrow. One thing that I don't follow: There are other variables that are declared, initialised and set in much the same way, e.g. "var lTierName=null" but don't seem to disrupt tracing to the same extent (if at all).
Reporter | ||
Comment 8•15 years ago
|
||
I think this is more or less minimal and hopefully shows the same effect. Seems at least four variables are required (perhaps you cater for up to 16 different type maps ?) My results are No JIT ====== $ ./js.exe test4.js cnt=1000000 Time=8765 WITHOUT INITIALISATION TO null ============================== $ ./js.exe -j test4.js cnt=1000000 Time=265 recorder: started(7), aborted(0), completed(22), different header(0), trees trashed(7), slot promoted(0), unstable loop variable(2), breaks(0), returns(0), merged loop exits(15), unstableInnerCalls(0), blacklisted(0) monitor: exits(73), BRANCH(32), CASE(0), DEFAULT(0), LOOP(41), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(0), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(73), global mismatch(0), flushed(1) $ WITH INITIALISATION TO null =========================== $ ./js.exe -j test4.js cnt=1000000 Time=15328 recorder: started(30), aborted(18), completed(28), different header(0), trees trashed(12), slot promoted(0), unstable loop variable(27), breaks(0), returns(0), merged loop exits(0), unstableInnerCalls(2), blacklisted(5) monitor: exits(999987), BRANCH(23), CASE(0), DEFAULT(0), LOOP(999961), NESTED(0), MISMATCH(0), OOM(0), OVERFLOW(0), UNSTABLE_LOOP(3), TIMEOUT(0), DEEP_BAIL(0), STATUS(0), RECURSIVE_UNLINKED(0), RECURSIVE_LOOP(0), RECURSIVE_MISMATCH(0), RECURSIVE_EMPTY_RP(0), RECURSIVE_SLURP_FAIL(0), RECURSIVE_SLURP_MISMATCH(0), timeouts(0), type mismatch(0), triggered(999987), global mismatch(0), flushed(1) $
Reporter | ||
Comment 9•15 years ago
|
||
Slightly reduced
Attachment #418816 -
Attachment is obsolete: true
Assignee | ||
Comment 10•15 years ago
|
||
Here is the culprit: checking vm types 0x83e0dc (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/N var2=S/N null != tag4 checking vm types 0x849384 (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/S string != tag0 checking vm types 0x84f6dc (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/N var2=S/N null != tag4 checking vm types 0x850de4 (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/S string != tag0 checking vm types 0x85157c (ip: 0x30cc45): args0=O/F args1=O/O arguments0=O/N var0=I/I var1=O/S string != tag0 Blacklisted: too many peer trees. Merge traces won't help here. This loop has exponential type explosion and we stop trying to specialize it further. The long-term strategy for this type of code is to back off on aggressive specialization. In the short term we should be able to do some sort of dumb analysis and figure out that those variables don't matter for connecting the loop edge to the header. My first thought was to do this entirely in the tracer, but it doesn't seem possible. Say a trace has variable |x| that is only used or defined in one or more branches from the main trace. When leaving the trace, there is no (efficient) way to know which set of branches were taken in order to flush native values with the correct types. We could trash trees if this problem came up, and blacklist those slots from being lazily specialized, but that seems pretty complicated, and awful. So it seems like a dumb parser analysis is the best way to try tackling this. If we can mark which variables have a definition that dominates all uses, we can exclude those variables from poisoning tree stability.
Reporter | ||
Comment 11•15 years ago
|
||
In my last attachment the variables can't impact the loop because they're not in scope. This is an orthogonal point but has the suggestion been made that once the system has stopped recording traces, whether due to branchiness or too many type maps or any other reason, stats are kept on trace entries vs exits ? If the percentage of bad exits exceeds a threshold, delete all the compiled traces and black list the loop. Crude but would allow code that can't currently be traced well to revert to interpreter performance rather than suffer a drop in performance due to continual exits.
Assignee | ||
Comment 12•15 years ago
|
||
(In reply to comment #11) > In my last attachment the variables can't impact the loop Yup, that is the sort of thing we should be able to recognize. > but has the suggestion been made that once the system has stopped recording traces Yes, definitely - I don't think there's an open bug on it yet, but we need a heuristic for situations where a very short inner tree's time is dominated by the entry/exit expense, and an outer tree around it doesn't trace.
Assignee | ||
Comment 13•15 years ago
|
||
The ultimate goal is bug 536277, a hybrid approach where we can de-specialize, still get decent perf on the things that don't trace well, and quickly transition to the inner loops which do trace well.
Reporter | ||
Comment 14•15 years ago
|
||
Thanks David. For me, I feel TM has yet to show its true potential. It performs brilliantly on very simple control structures but the gains quickly evaporate for code of even modest complexity. I now understand a little more about why this is so and I can see there's a great deal of good work going which I am sure will make things better and better so keep it up !
Reporter | ||
Comment 15•15 years ago
|
||
> ... we need a heuristic for situations where a very short inner tree's time is dominated by the entry/exit expense, and an outer tree around it doesn't trace.
Yes, indeed, that is the hard part. I played about with a simple patch where whenever an outer tree is blacklisted, all inner trees are ignored. Huge gain in the test case here but as you point out that's only because the inner tree is so short and entered so often. With a longer inner tree and a short, untraceable outer the patch destroys JIT performance of a well-traced inner loop.
The only heuristic I can think of is to record the elapsed time spent executing the inner tree vs time spent looking up fragments etc. Short loops would have a low percentage of time spent actually executing the tree.
Comment 16•13 years ago
|
||
Current js shell results for attached reduced testcase v2: Interp: 856.004 ms -j: 872.811 ms -m: 75.313 ms -m -n: 46.262 ms Looks like JM+TI wins here.
You need to log in
before you can comment on or make changes to this bug.
Description
•