Closed Bug 522593 Opened 15 years ago Closed 11 years ago

Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: gal, Assigned: gal)

References

Details

Attachments

(4 files, 15 obsolete files)

patch 15 years ago Andreas Gal :gal 10.96 KB, patch		Details \| Diff \| Splinter Review
patch 15 years ago Andreas Gal :gal 16.11 KB, patch		Details \| Diff \| Splinter Review
patch 15 years ago Andreas Gal :gal 28.98 KB, patch		Details \| Diff \| Splinter Review
patch 15 years ago Andreas Gal :gal 30.53 KB, patch		Details \| Diff \| Splinter Review
working patch (for the testcase anyway) 15 years ago Andreas Gal :gal 31.32 KB, patch		Details \| Diff \| Splinter Review
patch 15 years ago Andreas Gal :gal 36.34 KB, patch		Details \| Diff \| Splinter Review
patch 15 years ago Andreas Gal :gal 36.78 KB, patch		Details \| Diff \| Splinter Review
patch 15 years ago Andreas Gal :gal 36.78 KB, patch		Details \| Diff \| Splinter Review
working patch (passes trace tests) 15 years ago Andreas Gal :gal 36.78 KB, patch		Details \| Diff \| Splinter Review
inline UnboxInt32 as well 15 years ago Andreas Gal :gal 41.96 KB, patch		Details \| Diff \| Splinter Review
Rebased patch (nanojit only, untested) 15 years ago Steven Johnson 14.08 KB, patch		Details \| Diff \| Splinter Review
Rebased, tested patch 15 years ago Steven Johnson 14.28 KB, patch		Details \| Diff \| Splinter Review
proof-of-concept for tamarin 15 years ago Steven Johnson 5.13 KB, patch		Details \| Diff \| Splinter Review
Rebased, tested patch, v2 15 years ago Steven Johnson 14.84 KB, patch		Details \| Diff \| Splinter Review
Additional patch 15 years ago Steven Johnson 784 bytes, patch		Details \| Diff \| Splinter Review
Rebased, tested patch, v3 15 years ago Steven Johnson 14.41 KB, patch		Details \| Diff \| Splinter Review
intToAtom, uintToAtom for TR 15 years ago Steven Johnson 5.37 KB, patch	edwsmith : review-	Details \| Diff \| Splinter Review
Rebased, tested patch, v4 15 years ago Steven Johnson 14.64 KB, patch	edwsmith : review-	Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal 24.82 KB, patch		Details \| Diff \| Splinter Review

Andreas Gal :gal

Assignee

Description

•

15 years ago

I could have sworn I had a bug for this but I can't find it. The idea here is to add a LIns* field to CallInfo and when assembling we can inline that LIR code in place instead of emitting a call. This avoids the prolog/epilog overhead. At the same time the LIR only stores the call, and it can be matched in our filters so box/unbox elimination works, but when a box really remains on trace, we emit optimized code for it.

Andreas Gal :gal

Assignee

Updated

•

15 years ago

Assignee: general → gal

Andreas Gal :gal

Assignee

Comment 1

•

15 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Andreas Gal :gal

Assignee

Comment 2

•

15 years ago

Current state. I had to rewrite the reservation code a bit to be more flexible. Inlined LIR code must end with a LIR_[i|f|q]ret, and that ret inherits the reservation of the call. Next up is properly connecting arguments to their definitions when inlining.

Andreas Gal :gal

Assignee

Comment 3

•

15 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Attachment #406563 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Comment 4

•

15 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Complete inlining support and basic framework in place to on-the fly generate inline code in the tracer and cache and reuse it (still untested, and no builtin translated so far).

Attachment #406585 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Updated

•

15 years ago

Summary: TM: Support inlining of LIR_calls if the function is available as LIR as well → TM: Support inlining of LIR_calls if the function is available as LIR as well [shaver is very hot for this]

Andreas Gal :gal

Assignee

Updated

•

15 years ago

Attachment #406613 - Attachment is patch: true

Attachment #406613 - Attachment mime type: application/octet-stream → text/plain

Andreas Gal :gal

Assignee

Updated

•

15 years ago

OS: Mac OS X → All

Hardware: x86 → All

Summary: TM: Support inlining of LIR_calls if the function is available as LIR as well [shaver is very hot for this] → TM: Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call

Andreas Gal :gal

Assignee

Comment 5

•

15 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Complete patch, but not working yet. Here is a test case: var k = 0, a = [5]; for (var i = 0; i < 10; ++i) k += a[0]; This calls UnboxDouble for the array access.

Attachment #406613 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Comment 6

•

15 years ago

Attached patch working patch (for the testcase anyway) (obsolete) — Details — Splinter Review

We compile 2 traces. I have to look into that why, but this compiles the code as intended.

Attachment #406617 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Comment 7

•

15 years ago

Here is the gist of it. Instead of emitting code to do the call, LIR is directly inlined into the trace. The control-flow merge phi is done with a stack store, which is all sorts of lame, but its probably still better than the epilog/prolog overhead. ----------------------------------- ## END exit block 0x600c218 js_UnboxDouble2 = fcall #js_UnboxDouble ( ld11 ) ialloc2 = ialloc 8 >FP 4(state) 8(ebx) 12(esi) 16(edi) 20-24(ialloc2) and6 = and iparam2, JSVAL_INT 0005e95e7c mov eax,edi ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1) 0005e95e7e and eax,1 ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1) jt eq7 -> label5 0005e95e81 cmp eax,1 eax(and6) ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1) 0005e95e84 je 0x5e95e92 ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1) ldq4 = ldq iparam2[-2] 0005e95e86 movq xmm1,-2(edi) ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1) stqi ialloc2[0] = ldq4 0005e95e8b movq -24(ebp),xmm1 ecx(state) ebx(ld9) esi(sp) xmm0(ld1) xmm1(ldq4) j -> label4 0005e95e90 jmp 0x5e95ea2 ecx(state) ebx(ld9) esi(sp) xmm0(ld1) label5: 0005e95e92 [label5] ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1) rsh2 = rsh iparam2, 1 0005e95e92 sar edi,1 ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1) i2f3 = i2f rsh2 0005e95e95 xorpd xmm1,xmm1 ecx(state) ebx(ld9) esi(sp) edi(rsh2) xmm0(ld1) 0005e95e99 cvtsi2sd xmm1,edi ecx(state) ebx(ld9) esi(sp) edi(rsh2) xmm0(ld1) stqi ialloc2[0] = i2f3 0005e95e9d movq -24(ebp),xmm1 ecx(state) ebx(ld9) esi(sp) xmm0(ld1) xmm1(i2f3) label4: 0005e95ea2 [label4] ecx(state) ebx(ld9) esi(sp) xmm0(ld1) ldq3 = ldq ialloc2[0] 0005e95ea2 movq xmm1,-24(ebp) ecx(state) ebx(ld9) esi(sp) xmm0(ld1) <FP 4(state) 8(ebx) 12(esi) 16(edi) fret ldq3 fadd2 = fadd ld1, js_UnboxDouble2 0005e95ea7 addsd xmm0,xmm1 ecx(state) ebx(ld9) esi(sp) xmm0(ld1) xmm1(js_UnboxDouble2) stqi state[748] = fadd2

Andreas Gal :gal

Assignee

Comment 8

•

15 years ago

As the title change indicates I decided to leave the LIR filter pipeline alone and not pass in an extra LIns* parameter to indicate the chunk. The semantics is basically that a filter can decide to annotate the call with a LIR chunk as needed. We might reconsider this later but for now it seems to work quite well.

Andreas Gal :gal

Assignee

Comment 9

•

15 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Remove labels after inlining a chunk in case we inline it again. Now we die when we try to forward the use of a LIR_param inside the inlined chunk and the definition already has a reservation. This can't happen for the LIR_ret at the end of the inlined chunk since that we clear every time, but the LIR_param value is a problem. Ideally we want to snoop the use of the LIR_param instead of ever allocating a reservation for it but I am not sure thats possible. I might have to copy here if the definition of the LIR_param source already has a reservation.

Attachment #406627 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Comment 10

•

15 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Force a register allocation for both participants of a forwarding if the destination is already used and register-copy the value.

Attachment #406741 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Comment 11

•

15 years ago

Looks like we are not freeing the activation record entry of a LIR_iparam somewhere. TEST-PASS | trace-test.js | testEqFalseEmptyString TEST-PASS | trace-test.js | testIncDec Assertion failed: frame entry -28 wasn't freed : _activation.entry[i] == 0 (../nanojit/Assembler.cpp:834) (gdb) p *_activation.entry[i] $2 = { { lastWord = { arIndex = 7, reg = nanojit::UnknownReg, used = 1, opcode = nanojit::LIR_iparam }, dummy = 0x8910007 } } (gdb)

Andreas Gal :gal

Assignee

Comment 12

•

15 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Deallocate the forwarding source's reservation.

Attachment #406811 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Comment 13

•

15 years ago

Remaining failures: /Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/basic/testApplyCall.js /Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/basic/testTableSwitch2.js /Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/sunspider/check-3d-raytrace.js /Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/sunspider/check-date-format-xparb.js

Andreas Gal :gal

Assignee

Comment 14

•

15 years ago

Attached patch working patch (passes trace tests) (obsolete) — Details — Splinter Review

I am going to get those two lines right eventually.

Attachment #406830 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Updated

•

15 years ago

Attachment #406831 - Attachment is patch: true

Attachment #406831 - Attachment mime type: application/octet-stream → text/plain

Andreas Gal :gal

Assignee

Updated

•

15 years ago

Depends on: 522849

Andreas Gal :gal

Assignee

Updated

•

15 years ago

Depends on: 523251

Andreas Gal :gal

Assignee

Comment 15

•

15 years ago

Attached patch inline UnboxInt32 as well — Details — Splinter Review

Attachment #406831 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Comment 16

•

15 years ago

I see a very slight speedup (3ms).

Andreas Gal :gal

Assignee

Comment 17

•

15 years ago

Discussed with ed. Patch will land.

Steven Johnson

Comment 18

•

15 years ago

Andreas: cool stuff, but we're gonna stomp on each others toes re: changes to AR. You may want to review my proposed patch for https://bugzilla.mozilla.org/show_bug.cgi?id=473769 since I was planning on landing it Real Soon Now.

Steven Johnson

Comment 19

•

15 years ago

Attached patch Rebased patch (nanojit only, untested) (obsolete) — Details — Splinter Review

Updated patch 407200 for current NJ tip... compiles, but I haven't tested in TM at all yet (this is for some TR experimentation). Offered with no warranty for now.

Andreas Gal :gal

Assignee

Comment 20

•

15 years ago

Thanks Steven.

Steven Johnson

Comment 21

•

15 years ago

This looks like a cool approach I really want to make use of... one nit I see with this is that nonlinear inlines pretty much have to use insAlloc() for a phi nodes, which means we end up doing redundant load/stores unless we're verrry careful. Any way we could dance around this, you think?

Steven Johnson

Comment 22

•

15 years ago

Attached patch Rebased, tested patch (obsolete) — Details — Splinter Review

Updated nanojit patch (again), tweaked and fixed a few minor issues. Tested with TR (see subsequent patch) but not TM.

Attachment #420431 - Attachment is obsolete: true

Steven Johnson

Comment 23

•

15 years ago

Attached patch proof-of-concept for tamarin (obsolete) — Details — Splinter Review

really just a proof-of-concept patch for tamarin, which inlines the intToAtom callout. (on 64-bit this is probably worth doing. on 32-bit it may or may not, careful code measurement would be necessary... this part was really done just to flush out bugs)

Steven Johnson

Comment 24

•

15 years ago

Attached patch Rebased, tested patch, v2 (obsolete) — Details — Splinter Review

Revised version of 420455 that allows inlining of calls-returning-void (ie, don't assume that the function result is in use)

Attachment #420455 - Attachment is obsolete: true

Steven Johnson

Comment 25

•

15 years ago

Attached patch Additional patch (obsolete) — Details — Splinter Review

Additive to previous patch: calling markAsClear is inadequate, we need to call freeResourcesOf() to free stack/reg reservations as well.

Attachment #422563 - Flags: review?(gal)

Steven Johnson

Comment 26

•

15 years ago

Attached patch Rebased, tested patch, v3 (obsolete) — Details — Splinter Review

Rebased to current NJ; also includes the fix for freeResourcesOf (patch 422563) and an additional fix (LIR_param should break, not return). I think TR will be using this soon, so I'd like to think about landing it.

Attachment #420647 - Attachment is obsolete: true

Attachment #422563 - Attachment is obsolete: true

Attachment #422657 - Flags: review?(gal)

Attachment #422563 - Flags: review?(gal)

Steven Johnson

Comment 27

•

15 years ago

Comment on attachment 422657 [details] [diff] [review] Rebased, tested patch, v3 Adding Edwin.

Attachment #422657 - Flags: review?(edwsmith)

Steven Johnson

Comment 28

•

15 years ago

Attached patch intToAtom, uintToAtom for TR — Details — Splinter Review

updated proof-of-concept for TR usage: implement inlining of intToAtom and uintToAtom for 64-bit (where it's trivial).

Attachment #420456 - Attachment is obsolete: true

Attachment #422669 - Flags: review?(edwsmith)

Steven Johnson

Comment 29

•

15 years ago

Attached patch Rebased, tested patch, v4 — Details — Splinter Review

Uploaded wrong patch. Oops.

Attachment #422657 - Attachment is obsolete: true

Attachment #422811 - Flags: review?(edwsmith)

Attachment #422657 - Flags: review?(gal)

Attachment #422657 - Flags: review?(edwsmith)

Steven Johnson

Updated

•

15 years ago

Attachment #422811 - Flags: review?(gal)

Andreas Gal :gal

Assignee

Comment 30

•

15 years ago

Comment on attachment 422811 [details] [diff] [review] Rebased, tested patch, v4 >+ } >+ else >+ { >+ // call-returning-void? >+ NanoAssert(!to->isUsed()); >+ //to->markAsClear(); >+ } Can you double check this and then fix the comment and remove the dead code in the comment?

Steven Johnson

Comment 31

•

15 years ago

Oops. Yes, that's definitely dead-code that should just be nuked. Want a new patch minus that?

Edwin Smith

Comment 32

•

15 years ago

Comment on attachment 422669 [details] [diff] [review] intToAtom, uintToAtom for TR As a test-case for assembly phase inlining, these make sense. however for the purpose of inlining intToAtom (etc) in general, a better place to do it is in Specializer, so that the LIR of the inlined function body is exposed to the optimization pipeline.

Attachment #422669 - Flags: review?(edwsmith) → review-

Steven Johnson

Comment 33

•

15 years ago

(In reply to comment #32) > (From update of attachment 422669 [details] [diff] [review]) > As a test-case for assembly phase inlining, these make sense. Fair enough -- the primary purpose here was a testcase. If there's a better way to inline these, let's do that instead.

Edwin Smith

Comment 34

•

15 years ago

I'm nervous about this chunk of code in Assembler.cpp case LIR_icall: // Clear the reservation state of the code we are about to inline. If any instruction in the body of the inlined function is "live" then it's a bug in the inliner, right? If a previous copy of the function is not yet finished being inlined, all hell breaks loose. The reg/stack reservation fields in the LIR instructinos could be nonzero as leftover cruft; if thats whats happening, we should zero them out, but not generate any code or mutate AR/RegAlloc, as a side effect. calling freeResourcesOf() seems dangerous. nit: don't use "l" for a local variable, hard to read in some fonts. * calling gen() recursively is somewhat mindblowing here, but it works, so okay. * HashMap::remove() does what it says, but doesn't free the hash list nodes. inlining a lot of functions will leak nodes until the outer assembler pass is finished and the whole allocator is released. A fix would be to make a new nested label map, modify the label map code to support nested label maps, and make the new nested label map use an allocator with the lifetime of gen(), so that it will get freed eagerly. Alternatively HashMap could keep a private list of free hashnodes it could reuse. for a very long compile session the slow leakage would be O(lir-code-size), so it may not be a problem in practice. (this is how we already rationalized other minor transient leakage). If the allocator used by Assembler lives longer than the assembly session, it could be a worse problem. (I think that's the case for TM, but not TR).

Edwin Smith

Comment 35

•

15 years ago

I can't find any other potential bugs but it really needs a "how this works" comment and comments on the key helper methods: - the recursion in gen() - how reservations are transferred from a LIR_call to LIR_ret's argument - ditto for LIR_param -> LIR_call's args It looks like inlineState nests correctly, tests needed since this has a high pucker factor.

Edwin Smith

Comment 36

•

15 years ago

(In reply to comment #35) > I can't find any other potential bugs but it really needs a "how this works" > comment and comments on the key helper methods: > - the recursion in gen() > - how reservations are transferred from a LIR_call to LIR_ret's argument > - ditto for LIR_param -> LIR_call's args > > tests needed since this has a high pucker factor. Specifically i meant tests where a() inlines b(), which inlines c() -- gen() recursion more than one level deep. An assert that protects against impossible inline situations (a() calls a()) would also be good, now or in a followon bug.

Edwin Smith

Updated

•

15 years ago

Attachment #422811 - Flags: review?(edwsmith) → review-

Andreas Gal :gal

Assignee

Comment 37

•

15 years ago

(In reply to comment #34) > I'm nervous about this chunk of code in Assembler.cpp case LIR_icall: > > // Clear the reservation state of the code we are about to inline. > > If any instruction in the body of the inlined function is "live" then it's a > bug in the inliner, right? If a previous copy of the function is not yet > finished being inlined, all hell breaks loose. The reg/stack reservation > fields in the LIR instructinos could be nonzero as leftover cruft; if thats > whats happening, we should zero them out, but not generate any code or mutate > AR/RegAlloc, as a side effect. calling freeResourcesOf() seems dangerous. If we abort compilation half-way through an inlined chunk, we leave it in a bad state. That's what the loop was for. freeResourceOf() seems a terrible idea. I hope I didn't write that originally. My intent was markAsClear() or something like that. > > nit: don't use "l" for a local variable, hard to read in some fonts. > > * calling gen() recursively is somewhat mindblowing here, but it works, so > okay. > > * HashMap::remove() does what it says, but doesn't free the hash list nodes. > inlining a lot of functions will leak nodes until the outer assembler pass is > finished and the whole allocator is released. A fix would be to make a new > nested label map, modify the label map code to support nested label maps, and > make the new nested label map use an allocator with the lifetime of gen(), so > that it will get freed eagerly. Alternatively HashMap could keep a private > list of free hashnodes it could reuse. Not opposed to a separate allocator, however, if we inline the same short fragment a lot we make a lot of malloc calls. Trade-off. > > for a very long compile session the slow leakage would be O(lir-code-size), so > it may not be a problem in practice. (this is how we already rationalized > other minor transient leakage). I think we already leak all over the place. > > If the allocator used by Assembler lives longer than the assembly session, it > could be a worse problem. (I think that's the case for TM, but not TR). Not sure if we still do, I have to check. If we do, we shouldn't, so not too worried.

Steven Johnson

Comment 38

•

15 years ago

(In reply to comment #37) > freeResourceOf() seems a terrible idea. I changed it in my patch; it originally was markAsClear(). When I made that change I thought I had a legitimate reason, but in hindsight, I think I was originally doing the sort of naughtiness that Edwin pointed out... and later changed it, but didn't change it back to markAsClear.

Nicholas Nethercote [inactive]

Comment 39

•

15 years ago

BTW markAsClear() is now called deprecated_markAsClear() and should be avoided. Use clearReg() and/or clearArIndex() instead.

Steven Johnson

Comment 40

•

15 years ago

(In reply to comment #35) > it really needs a "how this works" comment and comments on the key helper methods: > - the recursion in gen() > - how reservations are transferred from a LIR_call to LIR_ret's argument > - ditto for LIR_param -> LIR_call's args Andreas, you have first dibs on this. Want to take a stab at it, or shall I? > It looks like inlineState nests correctly, tests needed since this has a high > pucker factor. Indeed, I guess an inlined-function-calling-another-inlined-function. Alternately, we could simply disallow this on principle.

Andreas Gal :gal

Assignee

Comment 41

•

15 years ago

> > tests needed since this has a high pucker factor. Yeah, took me a while to get all cases right, a few test would be important. The critical part is the state of AR and RegAlloc. It forms a matrix. The outcome depends on whether the value is already in use etc. > > Specifically i meant tests where a() inlines b(), which inlines c() -- gen() > recursion more than one level deep. > > An assert that protects against impossible inline situations (a() calls a()) > would also be good, now or in a followon bug. I am not too worried about these cases. There are already a lot of ways in which you can shoot yourself in the foot with nanojit, i.e. not using live correctly. Our IR requires you to understand the structure of the code you generate. If we can add easy asserts to help diagnose problems, we should definitely do that though.

Steven Johnson

Updated

•

15 years ago

Blocks: 458279

Steven Johnson

Updated

•

15 years ago

No longer blocks: 458279

Edwin Smith

Updated

•

14 years ago

Component: JavaScript Engine → Nanojit

Andreas Gal :gal

Assignee

Comment 42

•

14 years ago

Attached patch patch — Details — Splinter Review

rebased, untested

Steven Johnson

Comment 43

•

14 years ago

Comment on attachment 422811 [details] [diff] [review] Rebased, tested patch, v4 removed long-dormat review request

Attachment #422811 - Flags: review?(gal)

Ryan VanderMeulen [:RyanVM]

Comment 44

•

13 years ago

Edwin, does Nanojit want this?

Ryan VanderMeulen [:RyanVM]

Updated

•

13 years ago

Summary: TM: Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call → Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: Core → Core Graveyard

Nicholas Nethercote [inactive]

Comment 45

•

11 years ago

Nanojit has been dead for several years. Its Bugzilla component has been moved to the graveyard (bug 984276). I checked all the open bugs. They're all uninteresting, so I'm WONTFIXing them all. Apologies for the bugspam.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → WONTFIX

You need to log in before you can comment on or make changes to this bug.