Last Comment Bug 647624 - Remove property access inc/dec ops
: Remove property access inc/dec ops
Status: RESOLVED FIXED
[not a good introductory bug]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: David Anderson [:dvander]
:
Mentors:
Depends on: 672076 672153 673710 673731 706710
Blocks: 647620 667096
  Show dependency treegraph
 
Reported: 2011-04-03 18:43 PDT by Brian Hackett (:bhackett)
Modified: 2011-11-30 16:33 PST (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
WIP v1 (25.18 KB, patch)
2011-04-07 13:28 PDT, David Anderson [:dvander]
no flags Details | Diff | Review
WIP v2 remove elem ops too (109.05 KB, patch)
2011-04-07 14:13 PDT, David Anderson [:dvander]
no flags Details | Diff | Review
remove prop/elem inc ops (43.08 KB, patch)
2011-04-07 14:41 PDT, David Anderson [:dvander]
dmandelin: review+
Details | Diff | Review
updated (86.93 KB, patch)
2011-07-07 16:10 PDT, Brian Hackett (:bhackett)
dvander: review+
Details | Diff | Review
followup (16.41 KB, patch)
2011-07-16 13:55 PDT, Brian Hackett (:bhackett)
dvander: review+
Details | Diff | Review
followup 2 (10.87 KB, patch)
2011-07-26 08:53 PDT, Brian Hackett (:bhackett)
dvander: review+
Details | Diff | Review

Description Brian Hackett (:bhackett) 2011-04-03 18:43:52 PDT
The pre- and post- inc/dec ops are very complex opcodes to implement in the interpreter and JITs.  Handling overflow in these is very tricky for type inference and has caused a few bugs.  JM implements these by decomposing them into their basic operations (i.e. propinc => getprop, convert to number, add 1, setprop), which is pretty unsatisfying both because of the ugliness and because of a vague desire going forward to make one stub call per opcode.  TM implements INCELEM using imacros, which in a microbenchmark are more than eight times slower than if the op is expanded.  TM implements the other inc/dec ops directly; times are shown below.  INCPROP is slower than the expanded version, other incops are 20% to 100% faster.

INCELEM

x[i] = x[i] + 1  205
x[i]++           1700
++x[i]           1700

INCPROP

x.p = x.p + 1  440
x.p++          503
++x.p          503

INCGNAME

x = x + 1      330
x++            217
++x            217

INCNAME

x = x + 1      619
x++            500
++x            500

INCLOCAL/INCARG

a = a + 1      760
a++            340
++a            340

INCLOCAL and INCARG probably shouldn't be removed (at least for now): they are significantly faster in the tracer, and they simplify upcoming analysis for bounds check hoisting (determining linear relationships holding in a loop).  INCELEM/INCPROP could be removed immediately and improve perf (to my knowledge, these are not used at all on the three main benchmarks we track).  Provided that the perf issues with INCGNAME and INCNAME could be sorted, those would also be nice to remove; the main issue is in combining the multiple property accesses with an arithmetic operation in the same bytecode.

Can the tracer eliminate redundant shape guards when there are no intervening calls that can change object shapes?  If so, it doesn't seem that 'x = x + 1' should be any slower than 'x++' for a GNAME or NAME.
Comment 1 Nicholas Nethercote [:njn] 2011-04-04 17:29:10 PDT
Some redundant guard elimination occurs in the tracer, but it all depends on what occurs in between.

Seeing the LIR for all of these alternative versions would be very instructive!
Comment 2 David Anderson [:dvander] 2011-04-07 13:28:21 PDT
Created attachment 524466 [details] [diff] [review]
WIP v1

To avoid too much mucking with the decompiler and emitter, the goal of this patch is to perform the decomposition in the emitter and transform the existing ops into nops. There's a PCDELTA srcnote so the decompiler skips the decomposition. The method JIT, tracer, and interpreter implementations are removed.

The method JIT needed some hackery to implement JSOP_PICK.
Comment 3 David Anderson [:dvander] 2011-04-07 14:13:19 PDT
Created attachment 524489 [details] [diff] [review]
WIP v2 remove elem ops too
Comment 4 David Anderson [:dvander] 2011-04-07 14:41:56 PDT
Created attachment 524494 [details] [diff] [review]
remove prop/elem inc ops

These cases were slower in the tracer than their decomposed counterparts. I'll leave the NAME/GNAME ops for a follow-up bug since those are currently faster in the tracer.

 11 files changed, 265 insertions(+), 477 deletions(-)

No perf change on ss/v8. Some microbenchmarks that use the old/new ops in a loop:

ELEMINC benchmark:
  pre -m:  71ms
  pre -j:  59ms
  post -m: 55ms
  post -j: 59ms

PROPINC benchmark:
  pre -m:  48ms
  pre -j:  26ms
  post -m: 56ms
  post -j: 35ms
Comment 5 David Anderson [:dvander] 2011-04-07 14:44:23 PDT
The POSTINC benchmark is:
    for (var i = 0; i < 10000000; i++) {
        a.x++;
    }

So it got a fraction slower per iteration, but I think this is well worth it for the simplicity it will bring to TI and ionmonkey.
Comment 6 Brian Hackett (:bhackett) 2011-04-07 14:52:50 PDT
Oops, I suspect in comment 0 I was adding undefined properties of the object/array for INCELEM and INCPROP.
Comment 7 Brian Hackett (:bhackett) 2011-04-07 14:56:48 PDT
Also, for GNAME here is the tracer-generated code for infinite loops incrementing values:

x++

1: x/i $pc  0x69ffbc:	mov    0x10(%ecx),%esi
1: x/i $pc  0x69ffbf:	mov    0x827440,%ebx
1: x/i $pc  0x69ffc5:	test   %ebx,%ebx
1: x/i $pc  0x69ffc7:	jne    0x6affa8
1: x/i $pc  0x69ffcd:	mov    0x638(%esi),%ebx
1: x/i $pc  0x69ffd3:	add    $0x1,%ebx
1: x/i $pc  0x69ffd6:	jo     0x6affbd
1: x/i $pc  0x69ffdc:	mov    %ebx,0x638(%esi)
1: x/i $pc  0x69ffe2:	jmp    0x69ffbc

x = x + 1

1: x/i $pc  0x69ffa9:	mov    0xc(%ecx),%edi
1: x/i $pc  0x69ffac:	mov    0x10(%ecx),%esi
1: x/i $pc  0x69ffaf:	mov    0x827440,%ebx
1: x/i $pc  0x69ffb5:	test   %ebx,%ebx
1: x/i $pc  0x69ffb7:	jne    0x6affa8
1: x/i $pc  0x69ffbd:	movl   $0x1402028,(%edi)
1: x/i $pc  0x69ffc3:	mov    0x638(%esi),%ebx
1: x/i $pc  0x69ffc9:	mov    %ebx,0x8(%edi)
1: x/i $pc  0x69ffcc:	movl   $0x1,0x10(%edi)
1: x/i $pc  0x69ffd3:	add    $0x1,%ebx
1: x/i $pc  0x69ffd6:	jo     0x6affbd
1: x/i $pc  0x69ffdc:	mov    %ebx,0x638(%esi)
1: x/i $pc  0x69ffe2:	jmp    0x69ffa9

It looks like the shape guard on the setprop *is* eliminated, but there are three extra stores issued before the overflow jump (syncing the stack?) and an extra load at the head of the loop (loading stack pointer?).
Comment 8 Boris Zbarsky [:bz] 2011-04-07 15:14:10 PDT
Are those something we think we'll be able to eliminate later?  Making x++ pretty fast is actually important in some "benchmarks"...
Comment 9 David Anderson [:dvander] 2011-04-07 15:17:40 PDT
I'm not removing (x++) yet where x is an arg/local. Eliminating the stack stores is not easy in the tracer. JM should be able to match the tracer, eventually, and then we can remove these ops entirely.
Comment 10 David Mandelin [:dmandelin] 2011-04-07 18:20:14 PDT
Comment on attachment 524494 [details] [diff] [review]
remove prop/elem inc ops

Patch looks good.

My only concern is about whether we won't like the effect on interpreter performance. Of course, in Firefox we run in the jits. David Anderson also points out that these ops are rare. So it seems OK. If we end up needing to get that interp perf back, it seems we can just put the interp cases back in, and let them skip the following ops--and then we still get to keep the jit simplifications.
Comment 11 Brian Hackett (:bhackett) 2011-07-06 08:32:44 PDT
(In reply to comment #10)
> Comment on attachment 524494 [details] [diff] [review] [review]
> remove prop/elem inc ops
> 
> Patch looks good.
> 
> My only concern is about whether we won't like the effect on interpreter
> performance. Of course, in Firefox we run in the jits. David Anderson also
> points out that these ops are rare. So it seems OK. If we end up needing to
> get that interp perf back, it seems we can just put the interp cases back
> in, and let them skip the following ops--and then we still get to keep the
> jit simplifications.

Would be good to get this in pretty soon.  It is also I think worth revisiting and extending the idea in comment 10 to remove all property access ops (including NAME/GNAME/etc.) from JM only: keep the ops and their semantics, but follow them up with their broken down version.  The interpreter and tracer can run the fat ops, and JM and analysis/inference can run the broken down ops.  If we need to rejoin in the middle of an INCPROP, we can point the regs at the appropriate place to resume and the interpreter will know what to do.
Comment 12 David Anderson [:dvander] 2011-07-06 12:59:34 PDT
Yeah, this patch absolutely has to get in for IonMonkey to handle these opcodes. What do NAME/GNAME break into?
Comment 13 Brian Hackett (:bhackett) 2011-07-06 13:11:29 PDT
Mind if I take this patch/bug and rework per comment 11?  I've been wanting to not think about these opcodes for soooooo long.  NAME and GNAME are the same as PROP, except there is a BINDNAME or BINDGNAME up front.  The consistent stack layouts were necessary to simplify (or, at least, make not-impossibly-complex) the handling of these ops in the interpoline.
Comment 14 David Anderson [:dvander] 2011-07-06 13:15:13 PDT
(In reply to comment #13)
> Mind if I take this patch/bug and rework per comment 11?  I've been wanting
> to not think about these opcodes for soooooo long.  NAME and GNAME are the
> same as PROP, except there is a BINDNAME or BINDGNAME up front.  The
> consistent stack layouts were necessary to simplify (or, at least, make
> not-impossibly-complex) the handling of these ops in the interpoline.

Please do take this patch :D but I think the NAME/GNAME thing should be a separate bug. This one is basically done, and it seems like NAME and GNAME might not necessarily be decomposed with the same trick, they're a lot simpler.
Comment 15 Brendan Eich [:brendan] 2011-07-06 13:21:45 PDT
These ops date from '97 or so when I was benchmarking jsinterp.c code (profiled on my SGI Indy) against Perl 4!

/be
Comment 16 David Anderson [:dvander] 2011-07-06 13:27:27 PDT
FWIW, v8, Nitro seem to have them too, so they must be convenient. Crankshaft just aborts if it sees an Inc/Dec node with a property access, so I'm guessing they ran into the same bailout problems that decomposition or imacros solve.
Comment 17 Brendan Eich [:brendan] 2011-07-06 13:32:29 PDT
Comment on attachment 524494 [details] [diff] [review]
remove prop/elem inc ops

Why the JOF_TMPSLOT2 addition for JSOP_SWAP and JSOP_PICK?

/be
Comment 18 Brian Hackett (:bhackett) 2011-07-06 13:33:11 PDT
(In reply to comment #14)
> it seems like NAME and GNAME
> might not necessarily be decomposed with the same trick, they're a lot
> simpler.

Not sure about this, NAME and GNAME are both property incops (one on a scope chain object, the other on a global object) so with the extra BIND they should be more complicated, rather than less.
Comment 19 David Anderson [:dvander] 2011-07-06 14:22:30 PDT
(In reply to comment #18)
Sorry, I misunderstood - totally, INCNAME/GNAME should go, the same tricks in this patch will work.

(In reply to comment #17)
> Why the JOF_TMPSLOT2 addition for JSOP_SWAP and JSOP_PICK?
JM implements these by simulating a bunch of stack movement, so it needs a little more temporary space than the interpreter.
Comment 20 Brian Hackett (:bhackett) 2011-07-07 16:10:21 PDT
Created attachment 544648 [details] [diff] [review]
updated

With all the analysis changes pulled in on the JM merge this needs a new review.  That being the case, I made some other changes:

- INCNAME, INCGNAME and friends also have decomposed versions.  This allows removing all the code related to property incops in inference, JM, stub calls and interpoline.
- Going with the comment 10 strategy where the interpreter and tracer are basically unchanged (the interpreter runs the decomposed version if inference is on to accumulate types on the property accesses).
- Decomposed opcode lengths are generated into a table, rather than added as source notes (should be faster).

Double checked that tracer performance was not affected on any of the opcodes.  JM+TI is faster than TM or d8 on all the opcodes except INCNAME, where TM is fastest (JM+TI should get faster here with the reentrant closures patch).
Comment 21 Brian Hackett (:bhackett) 2011-07-16 08:28:33 PDT
Pushed to JM.

http://hg.mozilla.org/projects/jaegermonkey/rev/3273738a165e
Comment 22 Brian Hackett (:bhackett) 2011-07-16 13:55:27 PDT
Created attachment 546350 [details] [diff] [review]
followup

Some followup fixes to cover for inadequate testing before pushing:

- The method of setting the decomposed length of opcodes doesn't work with XDR, and is also totally broken in the presence of INDEXBASE opcodes.  This table is removed, in favor of adding an extra byte at the end of DECOMPOSE ops storing the decomposed length.

- Some fixes so imacros work.

A test added for an obscure language feature (how many times the index is converted to an id in incelem ops) also breaks on trunk -j so is disabled until that is fixed (see bug 672076).
Comment 23 Brian Hackett (:bhackett) 2011-07-26 08:53:00 PDT
Created attachment 548477 [details] [diff] [review]
followup 2

Patches from bugs 673710, 672153, and 673731 fixing regressions from this patch.  Two of these are for special cases in the emitter which previously had no test coverage.
Comment 24 David Anderson [:dvander] 2011-07-28 13:37:31 PDT
Comment on attachment 544648 [details] [diff] [review]
updated

Review of attachment 544648 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, very glad to see even more code go.

::: js/src/jsopcode.h
@@ +526,5 @@
> +static inline uintN
> +GetDecomposeLength(JSOp op)
> +{
> +    /* Table of lengths, updated on demand. See SetDecomposeLength. */
> +    extern uint8 js_decomposeLengthTable[];

Nit: js_DecomposeLengthTable?
Comment 25 David Anderson [:dvander] 2011-07-28 13:42:34 PDT
Comment on attachment 546350 [details] [diff] [review]
followup

Review of attachment 546350 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsopcode.h
@@ +528,5 @@
>  {
> +    /*
> +     * The last byte of a DECOMPOSE op stores the decomposed length. This can
> +     * vary across different instances of an opcode due to INDEXBASE ops.
> +     */

Could this bit of info be copied into either jsopcode.tbl or to the definition of JOF_DECOMPOSE?
Comment 26 David Anderson [:dvander] 2011-07-28 13:45:14 PDT
Comment on attachment 548477 [details] [diff] [review]
followup 2

Review of attachment 548477 [details] [diff] [review]:
-----------------------------------------------------------------

Note You need to log in before you can comment on or make changes to this bug.