Closed Bug 225831 Opened 22 years ago Closed 22 years ago

Byte code generation: using dups instead of temporaries

Categories

(Rhino Graveyard :: Core, defect)


Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

Attachments

(15 files, 1 obsolete file)

7.36 KB, patch
2.16 KB, patch
8.49 KB, patch
37.18 KB, patch
24.72 KB, patch
10.52 KB, patch
3.50 KB, patch
30.05 KB, patch
36.42 KB, patch
14.41 KB, patch
20.36 KB, patch
47.27 KB, patch
13.38 KB, patch
38.10 KB, patch
206.64 KB, patch
Currently, to implement operations such as x += y, ++x, and object and array literals ({} and []), the parser in Rhino generates nodes to hold temporary values. The interpreter in Rhino does not support more than 256 temporaries and silently recycles them when their number exceeds 256. The following test case demonstrates this interpreter limitation. In principle this could be resolved by widening the space taken by the temporary index, but it seems that eliminating such temporary nodes and using various forms of dup operations on the stack would make the generated byte code smaller and faster for both the interpreter and the optimizer.
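The kind of expression at issue can be illustrated at the language level (a hypothetical example, not from the bug; pick() is a made-up helper): a post-increment on a computed lvalue requires remembering the results of the object and index subexpressions, either in temporaries or by dup-ing them on the stack.

```javascript
// pick() has a side effect, so the object reference it yields must be
// evaluated exactly once and remembered for both the read and the write.
var calls = 0;
var arr = [10];
function pick() { calls++; return arr; }

var old = pick()[0]++;
// pick() runs once; old holds the previous value; arr[0] is incremented
```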
Assigning to self
Assignee: nboyd → igor
The test evaluates a string of the form ++(a[++f().x + ++f().x + ... + ++f().x]), which exposes the recycling of temporaries in interpreter mode when there are more than 255 ++f terms.
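A hypothetical reconstruction of such a stress test (the names f, target, a, and the value of N are illustrative, not the checked-in test):

```javascript
// Build and evaluate ++(a[++f().x + ++f().x + ... + ++f().x]) with N terms.
// With more than 255 side-effecting ++f().x terms, an interpreter that
// silently recycles temporaries would miscompute the index expression.
var calls = 0;
var target = { x: 0 };
function f() { calls++; return target; }

var N = 300;
var terms = [];
for (var i = 0; i < N; i++) {
    terms.push("++f().x");
}
var a = [];
eval("++(a[" + terms.join(" + ") + "])");
// every term runs exactly once: calls === N and target.x === N
```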
This is the trivial part, since the patch changes IRFactory to always create Token.INC/Token.DEC nodes for post ++ and --. Previously that happened only for child nodes of ++ or -- where IRFactory.hasSideEffects() returned true, but the rest of the code in Rhino, in both interpreter and optimizer modes, contained full support for INC/DEC applied to arbitrarily complex lvalue expressions. Or did I miss something?
Igor: thank you for this testcase! I've added it to the JS testsuite: mozilla/js/tests/js1_5/Regress/regress-225831.js

As Igor indicates, it currently passes in Rhino compiled mode, but fails in interpreted mode:

*-* Testcase js1_5/Regress/regress-225831.js failed:
Bug Number 225831
STATUS: Stressing the byte code generator
Failure messages were:
FAILED!: [reported from test()] Section 1 of test -
FAILED!: [reported from test()] Expected value '11', Actual value 'NaN'
The patch adds 3 more sections to the test case to cover the post --, {} and [] operations. Without the first fix all sections fail, but with the fix applied, section 2, which tests post --, passes.
Attachment #135601 - Attachment is obsolete: true
Igor: thanks, I checked in the extension to the test. But note that SpiderMonkey doesn't pass Section 4. The failure looks like this:

Expected value = '({1:({a:1}), 2:({a:1}), 3:({a:1}), 4:({a:1}), etc.
Actual value = '({1:{a:1}, 2:{a:1}, 3:{a:1}, 4:{a:1}, etc.

Rhino and SpiderMonkey seem to differ on how toSource() and uneval() treat objects nested inside objects:

js> var obj = { prop:({a:1}) };

--------------------------------- RHINO -----------------------------------
js> obj.toSource();
({prop:({a:1})})
js> uneval(obj);
({prop:({a:1})})

------------------------------- SPIDERMONKEY -------------------------------
js> obj.toSource();
({prop:{a:1}})
js> uneval(obj);
({prop:{a:1}})

That is, SpiderMonkey doesn't put the parens around the inner object. What should we do? Is this something that SpiderMonkey should change, or Rhino?
I will change Rhino to match SM, so the test should use the version without the inner (). The test would then be a double-edged sword, checking both bytecode generation and toSource().
Igor: thanks. I have checked in v 1.3 of the test, without the inner ()'s:

RCS file: /cvsroot/mozilla/js/tests/js1_5/Regress/regress-225831.js,v
retrieving revision 1.2
diff -r1.2 regress-225831.js
106c106
< // build string of the form ({1:({a:1}), 2:({a:1}), ... N:({a:1})})
---
> // build string of the form ({1:{a:1}, 2:{a:1}, ... N:{a:1}})
111c111
<     arr[i] = i+":({a:1}), ";
---
>     arr[i] = i+":{a:1}, ";
113c113
<     arr[N] = N+":({a:1})})";
---
>     arr[N] = N+":{a:1}})";

[//d/JS_TRUNK/mozilla/js/tests/js1_5/Regress] cvs ci regress-225831.js
Checking in regress-225831.js;
/cvsroot/mozilla/js/tests/js1_5/Regress/regress-225831.js,v  <--  regress-225831.js
new revision: 1.3; previous revision: 1.2
done
I created a branch in CVS, templess_bytecode_225831, to hold the code for this bug so it can be merged later when ready, and committed the above patch there.
Status: NEW → ASSIGNED
It turned out that the use of dups or similar stack-based constructions requires proper maintenance of Interpreter.itsStackDepth, which is not done now. This is not visible since its only consequence is wasted stack space, but it would be exposed through stack-based references.
One of the reasons for the stack miscalculations is that the current IRFactory reuses the code for the "if" statement when generating the tree for "?". Since the then/else parts of "?" each place an object on the stack while the then/else of "if" are supposed to leave the stack unchanged, byte code generation effectively ends up assuming 2 additional objects on the stack, not one, for each "?". The problem is present in the optimizer as well, so class files claim more stack space than necessary there too. The patch handles this by using Token.HOOK for the parse subtree representing "?" and adding code to the interpreter/optimizer to process it properly. The patch also adds missing stack adjustments for &&, ||, __proto__ and __parent__ in the Interpreter. I will commit the patch to the branch after regression testing.
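The miscount can be sketched with a toy model of net stack contribution per node (illustrative only, not Rhino's actual code): modeling "?" with the "if" logic adds both branches' pushes (+2 per "?"), while a correct HOOK model notes that exactly one branch runs and leaves a single value (+1).

```javascript
// Net number of values a node leaves on the operand stack.
function netPush(node) {
    switch (node.type) {
        case "num":
            return 1;   // a literal pushes one value
        case "if":
            return 0;   // statement branches leave the stack unchanged
        case "hook":    // cond ? then : else
            // the condition is consumed by the branch test; exactly one
            // branch executes and leaves one value
            return Math.max(netPush(node.then), netPush(node.alt));
    }
    throw new Error("unknown node type: " + node.type);
}

var hook = { type: "hook", then: { type: "num" }, alt: { type: "num" } };
// modeled correctly, a "?" contributes 1 stack slot, not 2
```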
Attached patch templess switchSplinter Review
Another source of stack depth miscalculations is the switch code, where the stack was not updated to account for the IF jump. The patch resolves this by using 2 new byte codes: Icode_DUPSECOND, which duplicates the stack element one below the top, and Icode_IFEQ_POP, which pops an additional stack element on a successful jump. In this way the switch condition is popped from the stack when done. To prevent stack miscalculations, the patch adds explicit checks for the expected stack change during code generation. The overhead of this self-checking should be very low. The patch also adds a one-line fix to optimizer/Codegen to release the switch register after code for switches is generated.
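A sketch of how the two icodes might behave on an array-modeled operand stack (behavior inferred from the description above, not the actual Interpreter code):

```javascript
// Icode_DUPSECOND: duplicate the element one below the top of the stack.
function dupSecond(stack) {
    stack.push(stack[stack.length - 2]);
}

// Icode_IFEQ_POP: pop the test value; on a taken jump also pop one more
// element (the retained switch condition) and return the jump target.
function ifeqPop(stack, pc, jumpTarget) {
    var test = stack.pop();
    if (test) {
        stack.pop();      // switch condition no longer needed
        return jumpTarget;
    }
    return pc + 1;        // fall through; condition stays for the next case
}
```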
Attached patch templess op=Splinter Review
The patch changes generation of code for the various op= forms to use dup commands instead of saving the lvalue into temporaries and then restoring it from them to store the assignment result. In this way more compact code is generated for both the interpreter and the optimizer, and the IRFactory code for op= is simplified since it is no longer necessary to optimize by loading numbers and strings twice instead of using temporaries, as DUP is the fastest way to duplicate even constant values. The price for this is the need to support 2 additional node types, SETELEM_OP and SETPROP_OP, but most of the code for them is shared with SETELEM and SETPROP so the total code bloat is not that big.
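The language-level constraint the dup-based code must preserve can be seen in a small example (illustrative; base() is a made-up helper): for base().p += v, the base object expression is evaluated only once, and the dup of the object reference supplies the object for the final store.

```javascript
var calls = 0;
var obj = { p: 1 };
function base() { calls++; return obj; }

// The reference base().p is computed once: the same object reference is
// used for the read, and (via dup rather than a temporary) for the write.
base().p += 2;
// calls === 1 and obj.p === 3
```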
The patch changes the code generated for [] and {} to use dups instead of temporaries, giving shorter byte code for both the interpreter and the optimizer. With the patch the test case now passes in interpreted mode.
The previous patches remove the need to have Node.cloneNode, so the patch removes it.
The patch changes the interpreter not to use temporaries to calculate both the function reference and the function's this for function calls. The patch moves the code that transformed the parse tree out of NodeTransformer and instead adds the necessary logic to the interpreter. The code is moved to OptTransformer so the optimizer still uses it. The patch changes a few special interpreter byte codes that supported function this calculations to place both the function reference and its this on the stack. In this way byte code for calls becomes more compact and faster. It seems that direct code generation from the non-transformed call subtree would shrink the generated byte code in the optimizer as well, but since various optimizations there depend on the particular transformed tree structure, I left it as is since the optimizer handles references properly.
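What the byte code must guarantee can be illustrated at the language level (the names g and receiver are made up): for g().m(), one evaluation of g() supplies both the function m and its this value, which is exactly the pair the patch keeps on the stack.

```javascript
var calls = 0;
var receiver = {
    name: "r",
    m: function () { return this.name; }
};
function g() { calls++; return receiver; }

// g() is evaluated once; its result is used both to look up m and as the
// this value inside the call.
var result = g().m();
// calls === 1 and result === "r"
```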
Attached patch templess enumSplinter Review
The patch adds a new parse tree node, Token.ENUM_ID, to get the id string from the enumeration object that is used to implement "for (x in y)". In this way the byte code that stored the result of enumNext, compared it with null and assigned it to the enum counter becomes unnecessary.
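A rough hand desugaring of for (x in y) shows the sequence involved (Object.keys stands in for the internal enumeration machinery; this is illustrative, not Rhino's implementation):

```javascript
var y = { a: 1, b: 2 };
var seen = [];

// enumInit: obtain the enumeration state
var keys = Object.keys(y);
var i = 0;

// loop: the old code stored the enumeration step's result in a temporary
// before testing it and assigning the id; ENUM_ID fetches the id string
// directly from the enumeration object
while (i < keys.length) {
    var x = keys[i++];
    seen.push(x);
}
```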
Attached patch templess catchSplinter Review
Currently handling of the catch statement involves creating 2 locals per catch block: one to store the catch object and another to hold the catch scope during its setup. The second should really be a temporary, since no additional catch, finally or with blocks can appear between the initial creation of the scope and the creation of the with object that uses it. So effectively it is a temporary, and as the previous patches show, it can be eliminated. The patch does just that by changing Token.NEWSCOPE nodes into Token.CATCH_SCOPE, since the former was used only to support catch nodes. The new node is responsible for properly populating the scope object based on the exception, thus eliminating the last temporary usage in the Interpreter and simplifying catch parse trees substantially.
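The scoping behavior that CATCH_SCOPE must implement is visible at the language level: a catch block introduces a scope binding only the exception identifier, shadowing any outer binding of the same name.

```javascript
var e = "outer";
var seen;
try {
    throw "inner";
} catch (e) {
    // inside the catch scope, e is bound to the thrown value
    seen = e;
}
// after the block, the outer e is visible again
// seen === "inner" and e === "outer"
```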
The patch changes omj/optimizer/Block.java to use temporaries instead of locals to hold the results of common this subexpressions, since in the optimizer temporaries are reused while locals are not unless special measures are taken. Since the previous patches moved creation of temporaries to a static method in OptTransformer, there is no longer any need to pass an IRFactory instance around Block and FatBlock, and the patch removes it.
Attached patch reuse of localsSplinter Review
To allow reuse of the additional locals that the interpreter/optimizer has to allocate to support try/catch/finally and for (in), the patch adds a new parse node, Token.LOCAL_BLOCK, to indicate a sequence of statements that needs a new local. In this way simple stack-like allocation of local slots in the interpreter provides local reuse, and in the optimizer the locals can be treated in the same way as temporaries and do not require initial preallocation.
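Stack-like slot allocation of this kind can be sketched as follows (a toy model, not the patch's code): entering a LOCAL_BLOCK grabs the next free slot and leaving releases it, so sibling blocks share slots and only nesting depth determines how many the function must reserve.

```javascript
function SlotAllocator() {
    this.next = 0;   // next free slot index
    this.max = 0;    // high-water mark = slots the function must reserve
}
SlotAllocator.prototype.enter = function () {
    var slot = this.next++;
    if (this.next > this.max) this.max = this.next;
    return slot;
};
SlotAllocator.prototype.leave = function () {
    this.next--;
};
```

Two sibling try blocks would then share slot 0, while a try nested inside another would get slot 1.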
The patch adds checks in the Interpreter to throw an exception if the number of locals exceeds 256, and to switch to using an activation if the number of variables exceeds 256. In this way silent reuse of locals is avoided and interpreter limitations are exposed earlier. Without the patch, js1_5/Regress/regress-226507.js goes into an infinite loop when N there is increased to 257.
Blocks: 226517
The patch adds a new parse tree node type, Token.RETURN_POPV, to return the value stored by the last Token.POPV, and updates the interpreter/optimizer to support POPV for functions and the new node. With this in place, NodeTransformer changes the code for returning from finally to first store the result of the original return via POPV, then execute finally, and then return the stored value, fixing bug 225831. The patch does not depend on the changes to avoid using temporaries during expression evaluation as I initially expected, but it does depend on the change to use recursion in NodeTransformer, which is committed on the bug branch, so I log it here.
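The semantics being implemented are the standard ones for return inside try/finally, shown here in plain JavaScript (illustrative):

```javascript
var log = [];
function f() {
    try {
        return "from-try";        // the return value is computed and saved...
    } finally {
        log.push("finally ran");  // ...then the finally block runs...
    }
}                                 // ...then the saved value is returned
var r = f();
// r === "from-try" and log records that finally executed
```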
Attached patch Cumulative patchSplinter Review
Cumulative patch to merge to HEAD.
I committed the change
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Verified FIXED. The above testcase used to fail in both the compiled and interpreted modes of Rhino. Now it passes in both.
Status: RESOLVED → VERIFIED
*** Bug 228446 has been marked as a duplicate of this bug. ***
Targeting as resolved against 1.5R5.
Target Milestone: --- → 1.5R5
*** Bug 273745 has been marked as a duplicate of this bug. ***
I don't really understand in which version it is fixed/will be fixed. Target milestone indicates 1.5R5 but I have the problem with 1.6R1.
(In reply to comment #29)
> I don't really understand in which version it is fixed/will be fixed. Target
> milestone indicates 1.5R5 but I have the problem with 1.6R1.

That was fixed in 1.5R5. What is your test case? For example, the following case passes for me in either 1.5R5 or 1.6R1:

var N = 600;
var array = new Array(N);
for (var i = 0; i != N; ++i) {
    array[i] = '{index: '+i+', name: "foo'+i+'"}';
}
var script_text = "["+array.join(',\n')+"]";
var script = Script(script_text);
var tab = script();
java.lang.System.out.println(uneval(tab[N - 1]));
I think that you're right. I'm using js inside htmlunit and I thought that I had correctly changed my classpath to use version 1.6R1. Testing again, it seems to work fine. Sorry for the trouble.
