Closed Bug 647624 Opened 13 years ago Closed 13 years ago

Remove property access inc/dec ops

Categories: Core :: JavaScript Engine, defect
Platform: x86 macOS
Severity: normal
Status: RESOLVED FIXED

People

(Reporter: bhackett1024, Assigned: dvander)

(Whiteboard: [not a good introductory bug])

Attachments

(4 files, 2 obsolete files)

The pre- and post-increment/decrement ops are very complex opcodes to implement in the interpreter and JITs.  Handling overflow in them is very tricky for type inference and has caused a few bugs.  JM implements them by decomposing them into their basic operations (i.e. propinc => getprop, convert to number, add 1, setprop), which is unsatisfying both because of the ugliness and because of a vague desire going forward to make one stub call per opcode.  TM implements INCELEM using imacros, which in a microbenchmark are more than eight times slower than expanding the op.  TM implements the other inc/dec ops directly; times are shown below.  INCPROP is slower than the expanded version; the other incops are 20% to 100% faster.

INCELEM

x[i] = x[i] + 1  205
x[i]++           1700
++x[i]           1700

INCPROP

x.p = x.p + 1  440
x.p++          503
++x.p          503

INCGNAME

x = x + 1      330
x++            217
++x            217

INCNAME

x = x + 1      619
x++            500
++x            500

INCLOCAL/INCARG

a = a + 1      760
a++            340
++a            340

INCLOCAL and INCARG probably shouldn't be removed (at least for now): they are significantly faster in the tracer, and they simplify upcoming analysis for bounds check hoisting (determining linear relationships holding in a loop).  INCELEM/INCPROP could be removed immediately and improve perf (to my knowledge, these are not used at all on the three main benchmarks we track).  Provided that the perf issues with INCGNAME and INCNAME could be sorted, those would also be nice to remove; the main issue is in combining the multiple property accesses with an arithmetic operation in the same bytecode.
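The decomposition described in comment 0 (propinc => getprop, convert to number, add 1, setprop) can be modeled in plain JavaScript. This is a sketch of the semantics only, not the actual emitter output; the helper name is made up:

```javascript
// Illustrative model of decomposing a property postincrement (x.p++)
// into its basic operations.
function propIncDecomposed(obj, name) {
  const old = obj[name];     // getprop
  const n = Number(old);     // convert to number (ToNumber)
  obj[name] = n + 1;         // add 1, then setprop
  return n;                  // postincrement yields the old numeric value
}

const a = { p: "41" };       // a string value, so the ToNumber step matters
const b = { p: "41" };
const r1 = propIncDecomposed(a, "p");
const r2 = b.p++;            // native postincrement for comparison
// r1 === r2 (41), and a.p === b.p (42)
```

Both paths must agree on the ToNumber conversion and on returning the old numeric value, which is part of why the fat opcodes are awkward for type inference.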

Can the tracer eliminate redundant shape guards when there are no intervening calls that can change object shapes?  If so, it doesn't seem that 'x = x + 1' should be any slower than 'x++' for a GNAME or NAME.
Some redundant guard elimination occurs in the tracer, but it all depends on what occurs in between.

Seeing the LIR for all of these alternative versions would be very instructive!
Whiteboard: [good first bug]
Attached patch WIP v1 (obsolete) — Splinter Review
To avoid too much mucking with the decompiler and emitter, the goal of this patch is to perform the decomposition in the emitter and transform the existing ops into nops. There's a PCDELTA srcnote so the decompiler skips the decomposition. The method JIT, tracer, and interpreter implementations are removed.

The method JIT needed some hackery to implement JSOP_PICK.
Assignee: general → dvander
Status: NEW → ASSIGNED
These cases were slower in the tracer than their decomposed counterparts. I'll leave the NAME/GNAME ops for a follow-up bug since those are currently faster in the tracer.

 11 files changed, 265 insertions(+), 477 deletions(-)

No perf change on ss/v8. Some microbenchmarks that use the old/new ops in a loop:

ELEMINC benchmark:
  pre -m:  71ms
  pre -j:  59ms
  post -m: 55ms
  post -j: 59ms

PROPINC benchmark:
  pre -m:  48ms
  pre -j:  26ms
  post -m: 56ms
  post -j: 35ms
Attachment #524466 - Attachment is obsolete: true
Attachment #524489 - Attachment is obsolete: true
Attachment #524494 - Flags: review?(dmandelin)
The POSTINC benchmark is:
    for (var i = 0; i < 10000000; i++) {
        a.x++;
    }

So it got a fraction slower per iteration, but I think this is well worth it for the simplicity it will bring to TI and ionmonkey.
Oops, I suspect in comment 0 I was adding undefined properties of the object/array for INCELEM and INCPROP.
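Incrementing a not-yet-defined property behaves differently from the steady-state case, which is why measuring that way is misleading: the getprop yields undefined, and ToNumber(undefined) is NaN. A quick sketch:

```javascript
// Incrementing a missing property: getprop yields undefined,
// ToNumber(undefined) is NaN, and NaN + 1 is still NaN.
const obj = {};
obj.x++;
obj.x++;
// obj.x is NaN, and stays NaN on every subsequent increment
```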
Also, for GNAME here is the tracer-generated code for infinite loops incrementing values:

x++

0x69ffbc:  mov    0x10(%ecx),%esi
0x69ffbf:  mov    0x827440,%ebx
0x69ffc5:  test   %ebx,%ebx
0x69ffc7:  jne    0x6affa8
0x69ffcd:  mov    0x638(%esi),%ebx
0x69ffd3:  add    $0x1,%ebx
0x69ffd6:  jo     0x6affbd
0x69ffdc:  mov    %ebx,0x638(%esi)
0x69ffe2:  jmp    0x69ffbc

x = x + 1

0x69ffa9:  mov    0xc(%ecx),%edi
0x69ffac:  mov    0x10(%ecx),%esi
0x69ffaf:  mov    0x827440,%ebx
0x69ffb5:  test   %ebx,%ebx
0x69ffb7:  jne    0x6affa8
0x69ffbd:  movl   $0x1402028,(%edi)
0x69ffc3:  mov    0x638(%esi),%ebx
0x69ffc9:  mov    %ebx,0x8(%edi)
0x69ffcc:  movl   $0x1,0x10(%edi)
0x69ffd3:  add    $0x1,%ebx
0x69ffd6:  jo     0x6affbd
0x69ffdc:  mov    %ebx,0x638(%esi)
0x69ffe2:  jmp    0x69ffa9

It looks like the shape guard on the setprop *is* eliminated, but there are three extra stores issued before the overflow jump (syncing the stack?) and an extra load at the head of the loop (loading stack pointer?).
Are those something we think we'll be able to eliminate later?  Making x++ pretty fast is actually important in some "benchmarks"...
I'm not removing (x++) yet where x is an arg/local. Eliminating the stack stores is not easy in the tracer. JM should be able to match the tracer, eventually, and then we can remove these ops entirely.
Comment on attachment 524494 [details] [diff] [review]
remove prop/elem inc ops

Patch looks good.

My only concern is about whether we won't like the effect on interpreter performance. Of course, in Firefox we run in the jits. David Anderson also points out that these ops are rare. So it seems OK. If we end up needing to get that interp perf back, it seems we can just put the interp cases back in, and let them skip the following ops--and then we still get to keep the jit simplifications.
Attachment #524494 - Flags: review?(dmandelin) → review+
Depends on: 667096
Blocks: 667096
No longer depends on: 667096
(In reply to comment #10)
> My only concern is about whether we won't like the effect on interpreter
> performance. [...] If we end up needing to get that interp perf back, it
> seems we can just put the interp cases back in, and let them skip the
> following ops--and then we still get to keep the jit simplifications.

Would be good to get this in pretty soon.  It is also I think worth revisiting and extending the idea in comment 10 to remove all property access ops (including NAME/GNAME/etc.) from JM only: keep the ops and their semantics, but follow them up with their broken down version.  The interpreter and tracer can run the fat ops, and JM and analysis/inference can run the broken down ops.  If we need to rejoin in the middle of an INCPROP, we can point the regs at the appropriate place to resume and the interpreter will know what to do.
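A toy model of that layout: the fat op is immediately followed by its broken-down version, the interpreter and tracer execute the fat op and jump over the tail, and analysis/inference (or JM) would consume the tail instead. All opcode names, lengths, and the dispatch loop here are hypothetical, purely to illustrate the idea:

```javascript
// Hypothetical bytecode stream: a fat INCPROP followed by its
// decomposed expansion. The fat op records how many trailing ops
// to skip when it executes.
const script = [
  { op: "INCPROP", name: "p", decomposedLength: 3 },
  { op: "GETPROP", name: "p" },   // decomposed version starts here
  { op: "ADDONE" },
  { op: "SETPROP", name: "p" },
  { op: "STOP" },
];

function interpretFat(script, obj) {
  let pc = 0;
  while (script[pc].op !== "STOP") {
    const insn = script[pc];
    if (insn.op === "INCPROP") {
      obj[insn.name] = Number(obj[insn.name]) + 1; // fat op does everything
      pc += 1 + insn.decomposedLength;             // skip the decomposed tail
    } else {
      pc += 1;                                      // (other ops elided)
    }
  }
}

const o = { p: 5 };
interpretFat(script, o);
// o.p is now 6
```

Rejoining mid-INCPROP then amounts to pointing the interpreter at the appropriate op inside the decomposed tail, as comment 11 describes.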
Yeah, this patch absolutely has to get in for IonMonkey to handle these opcodes. What do NAME/GNAME break into?
Mind if I take this patch/bug and rework per comment 11?  I've been wanting to not think about these opcodes for soooooo long.  NAME and GNAME are the same as PROP, except there is a BINDNAME or BINDGNAME up front.  The consistent stack layouts were necessary to simplify (or, at least, make not-impossibly-complex) the handling of these ops in the interpoline.
(In reply to comment #13)
> Mind if I take this patch/bug and rework per comment 11? [...]

Please do take this patch :D but I think the NAME/GNAME thing should be a separate bug. This one is basically done, and it seems like NAME and GNAME might not necessarily be decomposed with the same trick, they're a lot simpler.
These ops date from '97 or so when I was benchmarking jsinterp.c code (profiled on my SGI Indy) against Perl 4!

/be
FWIW, V8 and Nitro seem to have them too, so they must be convenient. Crankshaft just aborts if it sees an Inc/Dec node with a property access, so I'm guessing they ran into the same bailout problems that decomposition or imacros solve.
Comment on attachment 524494 [details] [diff] [review]
remove prop/elem inc ops

Why the JOF_TMPSLOT2 addition for JSOP_SWAP and JSOP_PICK?

/be
(In reply to comment #14)
> it seems like NAME and GNAME
> might not necessarily be decomposed with the same trick, they're a lot
> simpler.

Not sure about this, NAME and GNAME are both property incops (one on a scope chain object, the other on a global object) so with the extra BIND they should be more complicated, rather than less.
(In reply to comment #18)
Sorry, I misunderstood - totally, INCNAME/GNAME should go, the same tricks in this patch will work.

(In reply to comment #17)
> Why the JOF_TMPSLOT2 addition for JSOP_SWAP and JSOP_PICK?
JM implements these by simulating a bunch of stack movement, so it needs a little more temporary space than the interpreter.
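For reference, a sketch of the stack movement being simulated, modeled with a plain array (the semantics follow the usual stack-machine convention; this is illustrative, not SpiderMonkey's encoding):

```javascript
// JSOP_SWAP: exchange the top two stack values.
// JSOP_PICK N: move the value N slots down from the top to the top.
function swap(stack) {
  const len = stack.length;
  [stack[len - 2], stack[len - 1]] = [stack[len - 1], stack[len - 2]];
}

function pick(stack, n) {
  const [v] = stack.splice(stack.length - 1 - n, 1);
  stack.push(v);
}

const s = [1, 2, 3, 4];
swap(s);     // [1, 2, 4, 3]
pick(s, 2);  // moves the 2 to the top: [1, 4, 3, 2]
```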
Attached patch updated — Splinter Review
With all the analysis changes pulled in on the JM merge this needs a new review.  That being the case, I made some other changes:

- INCNAME, INCGNAME and friends also have decomposed versions.  This allows removing all the code related to property incops in inference, JM, stub calls and interpoline.
- Going with the comment 10 strategy where the interpreter and tracer are basically unchanged (the interpreter runs the decomposed version if inference is on to accumulate types on the property accesses).
- Decomposed opcode lengths are generated into a table, rather than added as source notes (should be faster).

Double checked that tracer performance was not affected on any of the opcodes.  JM+TI is faster than TM or d8 on all the opcodes except INCNAME, where TM is fastest (JM+TI should get faster here with the reentrant closures patch).
Attachment #544648 - Flags: review?(dmandelin)
Attachment #544648 - Flags: review?(dmandelin) → review?(dvander)
Attached patch followup — Splinter Review
Some followup fixes to cover for inadequate testing before pushing:

- The method of setting the decomposed length of opcodes doesn't work with XDR, and is also totally broken in the presence of INDEXBASE opcodes.  This table is removed, in favor of adding an extra byte at the end of DECOMPOSE ops storing the decomposed length.

- Some fixes so imacros work.

A test added for an obscure language feature (how many times the index is converted to an id in incelem ops) also breaks on trunk -j so is disabled until that is fixed (see bug 672076).
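The trailing-byte scheme can be sketched as follows; the opcode value, immediate layout, and helper signature here are made up for illustration:

```javascript
// Hypothetical model: the last immediate byte of a DECOMPOSE op holds
// its decomposed length, so the length can vary per instance (e.g. when
// INDEXBASE prefixes change the expansion).
function getDecomposeLength(bytecode, pc, oplen) {
  return bytecode[pc + oplen - 1]; // read the op's final byte
}

// A 4-byte op: opcode, two immediate bytes, then the decomposed length.
const bytecode = [0x50, 0x00, 0x07, 0x09];
// getDecomposeLength(bytecode, 0, 4) returns 9
```

Storing the length per-instance sidesteps both the XDR problem and the INDEXBASE problem that broke the global table.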
Attachment #546350 - Flags: review?(dvander)
Depends on: 672076
Depends on: 673710
Depends on: 672153, 673731
Attached patch followup 2 — Splinter Review
Patches from bugs 673710, 672153, and 673731 fixing regressions from this patch.  Two of these are for special cases in the emitter which previously had no test coverage.
Attachment #548477 - Flags: review?(dvander)
Whiteboard: [good first bug] → [not a good first bug]
Whiteboard: [not a good first bug] → [not a good introductory bug]
Comment on attachment 544648 [details] [diff] [review]
updated

Review of attachment 544648 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, very glad to see even more code go.

::: js/src/jsopcode.h
@@ +526,5 @@
> +static inline uintN
> +GetDecomposeLength(JSOp op)
> +{
> +    /* Table of lengths, updated on demand. See SetDecomposeLength. */
> +    extern uint8 js_decomposeLengthTable[];

Nit: js_DecomposeLengthTable?
Attachment #544648 - Flags: review?(dvander) → review+
Comment on attachment 546350 [details] [diff] [review]
followup

Review of attachment 546350 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsopcode.h
@@ +528,5 @@
>  {
> +    /*
> +     * The last byte of a DECOMPOSE op stores the decomposed length. This can
> +     * vary across different instances of an opcode due to INDEXBASE ops.
> +     */

Could this bit of info be copied into either jsopcode.tbl or to the definition of JOF_DECOMPOSE?
Attachment #546350 - Flags: review?(dvander) → review+
Comment on attachment 548477 [details] [diff] [review]
followup 2

Review of attachment 548477 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #548477 - Flags: review?(dvander) → review+
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED