Closed Bug 647624 Opened 13 years ago Closed 13 years ago

Remove property access inc/dec ops

Categories: Core :: JavaScript Engine, defect
Platform: x86 macOS
Severity: normal
Status: RESOLVED FIXED

People

(Reporter: bhackett1024, Assigned: dvander)

(Whiteboard: [not a good introductory bug])

Attachments

(4 files, 2 obsolete files)

The pre- and post-increment/decrement ops are very complex opcodes to implement in the interpreter and JITs.  Handling overflow in them is very tricky for type inference and has caused a few bugs.  JM implements them by decomposing them into their basic operations (i.e. propinc => getprop, convert to number, add 1, setprop), which is unsatisfying both because of the ugliness and because of a vague desire going forward to make one stub call per opcode.  TM implements INCELEM using imacros, which in a microbenchmark are more than eight times slower than expanding the op.  TM implements the other inc/dec ops directly; times are shown below.  INCPROP is slower than the expanded version; the other incops are 20% to 100% faster.

INCELEM

x[i] = x[i] + 1  205
x[i]++           1700
++x[i]           1700

INCPROP

x.p = x.p + 1  440
x.p++          503
++x.p          503

INCGNAME

x = x + 1      330
x++            217
++x            217

INCNAME

x = x + 1      619
x++            500
++x            500

INCLOCAL/INCARG

a = a + 1      760
a++            340
++a            340

INCLOCAL and INCARG probably shouldn't be removed (at least for now): they are significantly faster in the tracer, and they simplify upcoming analysis for bounds check hoisting (determining linear relationships holding in a loop).  INCELEM/INCPROP could be removed immediately and improve perf (to my knowledge, these are not used at all on the three main benchmarks we track).  Provided that the perf issues with INCGNAME and INCNAME could be sorted, those would also be nice to remove; the main issue is in combining the multiple property accesses with an arithmetic operation in the same bytecode.
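The decomposition described in comment 0 (propinc => getprop, convert to number, add 1, setprop) can be modeled in plain JavaScript. This is a sketch of the semantics only, not the actual emitter output; the helper name is made up:

```javascript
// Illustrative model of decomposing a property postincrement (x.p++)
// into its basic operations.
function propIncDecomposed(obj, name) {
  const old = obj[name];     // getprop
  const n = Number(old);     // convert to number (ToNumber)
  obj[name] = n + 1;         // add 1, then setprop
  return n;                  // postincrement yields the old numeric value
}

const a = { p: "41" };       // a string value, so the ToNumber step matters
const b = { p: "41" };
const r1 = propIncDecomposed(a, "p");
const r2 = b.p++;            // native postincrement for comparison
// r1 === r2 (41), and a.p === b.p (42)
```

Both paths must agree on the ToNumber conversion and on returning the old numeric value, which is part of why the fat opcodes are awkward for type inference.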

Can the tracer eliminate redundant shape guards when there are no intervening calls that can change object shapes?  If so, it doesn't seem that 'x = x + 1' should be any slower than 'x++' for a GNAME or NAME.
Some redundant guard elimination occurs in the tracer, but it all depends on what occurs in between.

Seeing the LIR for all of these alternative versions would be very instructive!
Whiteboard: [good first bug]
Attached patch WIP v1 (obsolete) — Splinter Review
To avoid too much mucking with the decompiler and emitter, the goal of this patch is to perform the decomposition in the emitter and transform the existing ops into nops. There's a PCDELTA srcnote so the decompiler skips the decomposition. The method JIT, tracer, and interpreter implementations are removed.

The method JIT needed some hackery to implement JSOP_PICK.
Assignee: general → dvander
Status: NEW → ASSIGNED
These cases were slower in the tracer than their decomposed counterparts. I'll leave the NAME/GNAME ops for a follow-up bug since those are currently faster in the tracer.

 11 files changed, 265 insertions(+), 477 deletions(-)

No perf change on ss/v8. Some microbenchmarks that use the old/new ops in a loop:

ELEMINC benchmark:
  pre -m:  71ms
  pre -j:  59ms
  post -m: 55ms
  post -j: 59ms

PROPINC benchmark:
  pre -m:  48ms
  pre -j:  26ms
  post -m: 56ms
  post -j: 35ms
Attachment #524466 - Attachment is obsolete: true
Attachment #524489 - Attachment is obsolete: true
Attachment #524494 - Flags: review?(dmandelin)
The POSTINC benchmark is:
    for (var i = 0; i < 10000000; i++) {
        a.x++;
    }

So it got a fraction slower per iteration, but I think this is well worth it for the simplicity it will bring to TI and ionmonkey.
Oops, I suspect in comment 0 I was adding undefined properties of the object/array for INCELEM and INCPROP.
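Incrementing a not-yet-defined property behaves differently from the steady-state case, which is why measuring that way is misleading: the getprop yields undefined, and ToNumber(undefined) is NaN. A quick sketch:

```javascript
// Incrementing a missing property: getprop yields undefined,
// ToNumber(undefined) is NaN, and NaN + 1 is still NaN.
const obj = {};
obj.x++;
obj.x++;
// obj.x is NaN, and stays NaN on every subsequent increment
```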
Also, for GNAME here is the tracer-generated code for infinite loops incrementing values:

x++

0x69ffbc:  mov    0x10(%ecx),%esi
0x69ffbf:  mov    0x827440,%ebx
0x69ffc5:  test   %ebx,%ebx
0x69ffc7:  jne    0x6affa8
0x69ffcd:  mov    0x638(%esi),%ebx
0x69ffd3:  add    $0x1,%ebx
0x69ffd6:  jo     0x6affbd
0x69ffdc:  mov    %ebx,0x638(%esi)
0x69ffe2:  jmp    0x69ffbc

x = x + 1

0x69ffa9:  mov    0xc(%ecx),%edi
0x69ffac:  mov    0x10(%ecx),%esi
0x69ffaf:  mov    0x827440,%ebx
0x69ffb5:  test   %ebx,%ebx
0x69ffb7:  jne    0x6affa8
0x69ffbd:  movl   $0x1402028,(%edi)
0x69ffc3:  mov    0x638(%esi),%ebx
0x69ffc9:  mov    %ebx,0x8(%edi)
0x69ffcc:  movl   $0x1,0x10(%edi)
0x69ffd3:  add    $0x1,%ebx
0x69ffd6:  jo     0x6affbd
0x69ffdc:  mov    %ebx,0x638(%esi)
0x69ffe2:  jmp    0x69ffa9

It looks like the shape guard on the setprop *is* eliminated, but there are three extra stores issued before the overflow jump (syncing the stack?) and an extra load at the head of the loop (loading stack pointer?).
Are those something we think we'll be able to eliminate later?  Making x++ pretty fast is actually important in some "benchmarks"...
I'm not removing (x++) yet where x is an arg/local. Eliminating the stack stores is not easy in the tracer. JM should be able to match the tracer, eventually, and then we can remove these ops entirely.
Comment on attachment 524494 [details] [diff] [review]
remove prop/elem inc ops

Patch looks good.

My only concern is about whether we won't like the effect on interpreter performance. Of course, in Firefox we run in the jits. David Anderson also points out that these ops are rare. So it seems OK. If we end up needing to get that interp perf back, it seems we can just put the interp cases back in, and let them skip the following ops--and then we still get to keep the jit simplifications.
Attachment #524494 - Flags: review?(dmandelin) → review+
Depends on: 667096
Blocks: 667096
No longer depends on: 667096
(In reply to comment #10)
> My only concern is about whether we won't like the effect on interpreter
> performance. [...] If we end up needing to get that interp perf back, it
> seems we can just put the interp cases back in, and let them skip the
> following ops--and then we still get to keep the jit simplifications.

Would be good to get this in pretty soon.  It is also I think worth revisiting and extending the idea in comment 10 to remove all property access ops (including NAME/GNAME/etc.) from JM only: keep the ops and their semantics, but follow them up with their broken down version.  The interpreter and tracer can run the fat ops, and JM and analysis/inference can run the broken down ops.  If we need to rejoin in the middle of an INCPROP, we can point the regs at the appropriate place to resume and the interpreter will know what to do.
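A toy model of that layout: the fat op is immediately followed by its broken-down version, the interpreter and tracer execute the fat op and jump over the tail, and analysis/inference (or JM) would consume the tail instead. All opcode names, lengths, and the dispatch loop here are hypothetical, purely to illustrate the idea:

```javascript
// Hypothetical bytecode stream: a fat INCPROP followed by its
// decomposed expansion. The fat op records how many trailing ops
// to skip when it executes.
const script = [
  { op: "INCPROP", name: "p", decomposedLength: 3 },
  { op: "GETPROP", name: "p" },   // decomposed version starts here
  { op: "ADDONE" },
  { op: "SETPROP", name: "p" },
  { op: "STOP" },
];

function interpretFat(script, obj) {
  let pc = 0;
  while (script[pc].op !== "STOP") {
    const insn = script[pc];
    if (insn.op === "INCPROP") {
      obj[insn.name] = Number(obj[insn.name]) + 1; // fat op does everything
      pc += 1 + insn.decomposedLength;             // skip the decomposed tail
    } else {
      pc += 1;                                      // (other ops elided)
    }
  }
}

const o = { p: 5 };
interpretFat(script, o);
// o.p is now 6
```

Rejoining mid-INCPROP then amounts to pointing the interpreter at the appropriate op inside the decomposed tail, as comment 11 describes.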
Yeah, this patch absolutely has to get in for IonMonkey to handle these opcodes. What do NAME/GNAME break into?
Mind if I take this patch/bug and rework per comment 11?  I've been wanting to not think about these opcodes for soooooo long.  NAME and GNAME are the same as PROP, except there is a BINDNAME or BINDGNAME up front.  The consistent stack layouts were necessary to simplify (or, at least, make not-impossibly-complex) the handling of these ops in the interpoline.
(In reply to comment #13)
> Mind if I take this patch/bug and rework per comment 11? [...]

Please do take this patch :D but I think the NAME/GNAME thing should be a separate bug. This one is basically done, and it seems like NAME and GNAME might not necessarily be decomposed with the same trick, they're a lot simpler.
These ops date from '97 or so when I was benchmarking jsinterp.c code (profiled on my SGI Indy) against Perl 4!

/be
FWIW, V8 and Nitro seem to have them too, so they must be convenient. Crankshaft just aborts if it sees an Inc/Dec node with a property access, so I'm guessing they ran into the same bailout problems that decomposition or imacros solve.
Comment on attachment 524494 [details] [diff] [review]
remove prop/elem inc ops

Why the JOF_TMPSLOT2 addition for JSOP_SWAP and JSOP_PICK?

/be
(In reply to comment #14)
> it seems like NAME and GNAME
> might not necessarily be decomposed with the same trick, they're a lot
> simpler.

Not sure about this, NAME and GNAME are both property incops (one on a scope chain object, the other on a global object) so with the extra BIND they should be more complicated, rather than less.
(In reply to comment #18)
Sorry, I misunderstood - totally, INCNAME/GNAME should go, the same tricks in this patch will work.

(In reply to comment #17)
> Why the JOF_TMPSLOT2 addition for JSOP_SWAP and JSOP_PICK?
JM implements these by simulating a bunch of stack movement, so it needs a little more temporary space than the interpreter.
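For reference, a sketch of the stack movement being simulated, modeled with a plain array (the semantics follow the usual stack-machine convention; this is illustrative, not SpiderMonkey's encoding):

```javascript
// JSOP_SWAP: exchange the top two stack values.
// JSOP_PICK N: move the value N slots down from the top to the top.
function swap(stack) {
  const len = stack.length;
  [stack[len - 2], stack[len - 1]] = [stack[len - 1], stack[len - 2]];
}

function pick(stack, n) {
  const [v] = stack.splice(stack.length - 1 - n, 1);
  stack.push(v);
}

const s = [1, 2, 3, 4];
swap(s);     // [1, 2, 4, 3]
pick(s, 2);  // moves the 2 to the top: [1, 4, 3, 2]
```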
Attached patch updated — Splinter Review
With all the analysis changes pulled in on the JM merge this needs a new review.  That being the case, I made some other changes:

- INCNAME, INCGNAME and friends also have decomposed versions.  This allows removing all the code related to property incops in inference, JM, stub calls and interpoline.
- Going with the comment 10 strategy where the interpreter and tracer are basically unchanged (the interpreter runs the decomposed version if inference is on to accumulate types on the property accesses).
- Decomposed opcode lengths are generated into a table, rather than added as source notes (should be faster).

Double checked that tracer performance was not affected on any of the opcodes.  JM+TI is faster than TM or d8 on all the opcodes except INCNAME, where TM is fastest (JM+TI should get faster here with the reentrant closures patch).
Attachment #544648 - Flags: review?(dmandelin)
Attachment #544648 - Flags: review?(dmandelin) → review?(dvander)
Attached patch followup — Splinter Review
Some followup fixes to cover for inadequate testing before pushing:

- The method of setting the decomposed length of opcodes doesn't work with XDR, and is also totally broken in the presence of INDEXBASE opcodes.  This table is removed, in favor of adding an extra byte at the end of DECOMPOSE ops storing the decomposed length.

- Some fixes so imacros work.

A test added for an obscure language feature (how many times the index is converted to an id in incelem ops) also breaks on trunk -j so is disabled until that is fixed (see bug 672076).
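The trailing-byte scheme can be sketched as follows; the opcode value, immediate layout, and helper signature here are made up for illustration:

```javascript
// Hypothetical model: the last immediate byte of a DECOMPOSE op holds
// its decomposed length, so the length can vary per instance (e.g. when
// INDEXBASE prefixes change the expansion).
function getDecomposeLength(bytecode, pc, oplen) {
  return bytecode[pc + oplen - 1]; // read the op's final byte
}

// A 4-byte op: opcode, two immediate bytes, then the decomposed length.
const bytecode = [0x50, 0x00, 0x07, 0x09];
// getDecomposeLength(bytecode, 0, 4) returns 9
```

Storing the length per-instance sidesteps both the XDR problem and the INDEXBASE problem that broke the global table.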
Attachment #546350 - Flags: review?(dvander)
Depends on: 672076
Depends on: 673710
Depends on: 672153, 673731
Attached patch followup 2 — Splinter Review
Patches from bugs 673710, 672153, and 673731 fixing regressions from this patch.  Two of these are for special cases in the emitter which previously had no test coverage.
Attachment #548477 - Flags: review?(dvander)
Whiteboard: [good first bug] → [not a good first bug]
Whiteboard: [not a good first bug] → [not a good introductory bug]
Comment on attachment 544648 [details] [diff] [review]
updated

Review of attachment 544648 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, very glad to see even more code go.

::: js/src/jsopcode.h
@@ +526,5 @@
> +static inline uintN
> +GetDecomposeLength(JSOp op)
> +{
> +    /* Table of lengths, updated on demand. See SetDecomposeLength. */
> +    extern uint8 js_decomposeLengthTable[];

Nit: js_DecomposeLengthTable?
Attachment #544648 - Flags: review?(dvander) → review+
Comment on attachment 546350 [details] [diff] [review]
followup

Review of attachment 546350 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsopcode.h
@@ +528,5 @@
>  {
> +    /*
> +     * The last byte of a DECOMPOSE op stores the decomposed length. This can
> +     * vary across different instances of an opcode due to INDEXBASE ops.
> +     */

Could this bit of info be copied into either jsopcode.tbl or to the definition of JOF_DECOMPOSE?
Attachment #546350 - Flags: review?(dvander) → review+
Comment on attachment 548477 [details] [diff] [review]
followup 2

Review of attachment 548477 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #548477 - Flags: review?(dvander) → review+
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED