Closed Bug 522593 Opened 13 years ago Closed 8 years ago

Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call

Categories

(Core Graveyard :: Nanojit, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gal, Assigned: gal)

References

Details

Attachments

(4 files, 15 obsolete files)

41.96 KB, patch
Details | Diff | Splinter Review
5.37 KB, patch
edwsmith
: review-
Details | Diff | Splinter Review
14.64 KB, patch
edwsmith
: review-
Details | Diff | Splinter Review
24.82 KB, patch
Details | Diff | Splinter Review
I could have sworn I had a bug for this but I can't find it.

The idea here is to add a LIns* field to CallInfo and when assembling we can inline that LIR code in place instead of emitting a call. This avoids the prolog/epilog overhead. At the same time the LIR only stores the call, and it can be matched in our filters so box/unbox elimination works, but when a box really remains on trace, we emit optimized code for it.
Assignee: general → gal
Attached patch patch (obsolete) — Splinter Review
Current state. I had to rewrite the reservation code a bit to be more flexible. Inlined LIR code must end with a LIR_[i|f|q]ret, and that ret inherits the reservation of the call. Next up is properly connecting arguments to their definitions when inlining.
Attached patch patch (obsolete) — Splinter Review
Attachment #406563 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Complete inlining support and basic framework in place to on-the fly generate inline code in the tracer and cache and reuse it (still untested, and no builtin translated so far).
Attachment #406585 - Attachment is obsolete: true
Summary: TM: Support inlining of LIR_calls if the function is available as LIR as well → TM: Support inlining of LIR_calls if the function is available as LIR as well [shaver is very hot for this]
Attachment #406613 - Attachment is patch: true
Attachment #406613 - Attachment mime type: application/octet-stream → text/plain
OS: Mac OS X → All
Hardware: x86 → All
Summary: TM: Support inlining of LIR_calls if the function is available as LIR as well [shaver is very hot for this] → TM: Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call
Attached patch patch (obsolete) — Splinter Review
Complete patch, but not working yet. Here is a test case:

var k = 0, a = [5];

for (var i = 0; i < 10; ++i)
    k += a[0];

This calls UnboxDouble for the array access.
Attachment #406613 - Attachment is obsolete: true
We compile 2 traces. I have to look into that why, but this compiles the code as intended.
Attachment #406617 - Attachment is obsolete: true
Here is the gist of it. Instead of emitting code to do the call, LIR is directly inlined into the trace. The control-flow merge phi is done with a stack store, which is all sorts of lame, but its probably still better than the epilog/prolog overhead.

  ----------------------------------- ## END exit block 0x600c218
      js_UnboxDouble2 = fcall #js_UnboxDouble ( ld11 )
      ialloc2 = ialloc 8
                                                >FP 4(state) 8(ebx) 12(esi) 16(edi) 20-24(ialloc2)
      and6 = and iparam2, JSVAL_INT
  0005e95e7c   mov eax,edi            ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
  0005e95e7e   and eax,1              ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
      jt eq7 -> label5
  0005e95e81   cmp eax,1              eax(and6) ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
  0005e95e84   je    0x5e95e92        ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
      ldq4 = ldq iparam2[-2]
  0005e95e86   movq xmm1,-2(edi)      ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
      stqi ialloc2[0] = ldq4
  0005e95e8b   movq -24(ebp),xmm1     ecx(state) ebx(ld9) esi(sp) xmm0(ld1) xmm1(ldq4)
      j -> label4
  0005e95e90   jmp 0x5e95ea2          ecx(state) ebx(ld9) esi(sp) xmm0(ld1)
      label5:
  0005e95e92   [label5]               ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
      rsh2 = rsh iparam2, 1
  0005e95e92   sar edi,1              ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
      i2f3 = i2f rsh2
  0005e95e95   xorpd xmm1,xmm1        ecx(state) ebx(ld9) esi(sp) edi(rsh2) xmm0(ld1)
  0005e95e99   cvtsi2sd xmm1,edi      ecx(state) ebx(ld9) esi(sp) edi(rsh2) xmm0(ld1)
      stqi ialloc2[0] = i2f3
  0005e95e9d   movq -24(ebp),xmm1     ecx(state) ebx(ld9) esi(sp) xmm0(ld1) xmm1(i2f3)
      label4:
  0005e95ea2   [label4]               ecx(state) ebx(ld9) esi(sp) xmm0(ld1)
      ldq3 = ldq ialloc2[0]
  0005e95ea2   movq xmm1,-24(ebp)     ecx(state) ebx(ld9) esi(sp) xmm0(ld1)
                                                <FP 4(state) 8(ebx) 12(esi) 16(edi)
      fret ldq3
      fadd2 = fadd ld1, js_UnboxDouble2
  0005e95ea7   addsd xmm0,xmm1        ecx(state) ebx(ld9) esi(sp) xmm0(ld1) xmm1(js_UnboxDouble2)
      stqi state[748] = fadd2
As the title change indicates I decided to leave the LIR filter pipeline alone and not pass in an extra LIns* parameter to indicate the chunk. The semantics is basically that a filter can decide to annotate the call with a LIR chunk as needed. We might reconsider this later but for now it seems to work quite well.
Attached patch patch (obsolete) — Splinter Review
Remove labels after inlining a chunk in case we inline it again.

Now we die when we try to forward the use of a LIR_param inside the inlined chunk and the definition already has a reservation. This can't happen for the LIR_ret at the end of the inlined chunk since that we clear every time, but the LIR_param value is a problem. Ideally we want to snoop the use of the LIR_param instead of ever allocating a reservation for it but I am not sure thats possible. I might have to copy here if the definition of the LIR_param source already has a reservation.
Attachment #406627 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Force a register allocation for both participants of a forwarding if the destination is already used and register-copy the value.
Attachment #406741 - Attachment is obsolete: true
Looks like we are not freeing the activation record entry of a LIR_iparam somewhere.

TEST-PASS | trace-test.js | testEqFalseEmptyString
TEST-PASS | trace-test.js | testIncDec
Assertion failed: frame entry -28 wasn't freed
: _activation.entry[i] == 0 (../nanojit/Assembler.cpp:834)


(gdb) p *_activation.entry[i]
$2 = {
  {
    lastWord = {
      arIndex = 7, 
      reg = nanojit::UnknownReg, 
      used = 1, 
      opcode = nanojit::LIR_iparam
    }, 
    dummy = 0x8910007
  }
}
(gdb)
Attached patch patch (obsolete) — Splinter Review
Deallocate the forwarding source's reservation.
Attachment #406811 - Attachment is obsolete: true
Remaining failures:

    /Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/basic/testApplyCall.js
    /Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/basic/testTableSwitch2.js
    /Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/sunspider/check-3d-raytrace.js
    /Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/sunspider/check-date-format-xparb.js
I am going to get those two lines right eventually.
Attachment #406830 - Attachment is obsolete: true
Attachment #406831 - Attachment is patch: true
Attachment #406831 - Attachment mime type: application/octet-stream → text/plain
Depends on: 522849
Depends on: 523251
Attachment #406831 - Attachment is obsolete: true
I see a very slight speedup (3ms).
Discussed with ed. Patch will land.
Andreas: cool stuff, but we're gonna stomp on each others toes re: changes to AR. You may want to review my proposed patch for https://bugzilla.mozilla.org/show_bug.cgi?id=473769 since I was planning on landing it Real Soon Now.
Updated patch 407200 for current NJ tip... compiles, but I haven't tested in TM at all yet (this is for some TR experimentation). Offered with no warranty for now.
Thanks Steven.
This looks like a cool approach I really want to make use of... one nit I see
with this is that nonlinear inlines pretty much have to use insAlloc() for a
phi nodes, which means we end up doing redundant load/stores unless we're
verrry careful. Any way we could dance around this, you think?
Attached patch Rebased, tested patch (obsolete) — Splinter Review
Updated nanojit patch (again), tweaked and fixed a few minor issues. Tested with TR (see subsequent patch) but not TM.
Attachment #420431 - Attachment is obsolete: true
Attached patch proof-of-concept for tamarin (obsolete) — Splinter Review
really just a proof-of-concept patch for tamarin, which inlines the intToAtom callout. (on 64-bit this is probably worth doing. on 32-bit it may or may not, careful code measurement would be necessary... this part was really done just to flush out bugs)
Attached patch Rebased, tested patch, v2 (obsolete) — Splinter Review
Revised version of 420455 that allows inlining of calls-returning-void (ie, don't assume that the function result is in use)
Attachment #420455 - Attachment is obsolete: true
Attached patch Additional patch (obsolete) — Splinter Review
Additive to previous patch: calling markAsClear is inadequate, we need to call freeResourcesOf() to free stack/reg reservations as well.
Attachment #422563 - Flags: review?(gal)
Attached patch Rebased, tested patch, v3 (obsolete) — Splinter Review
Rebased to current NJ; also includes the fix for freeResourcesOf (patch 422563) and an additional fix (LIR_param should break, not return).

I think TR will be using this soon, so I'd like to think about landing it.
Attachment #420647 - Attachment is obsolete: true
Attachment #422563 - Attachment is obsolete: true
Attachment #422657 - Flags: review?(gal)
Attachment #422563 - Flags: review?(gal)
Comment on attachment 422657 [details] [diff] [review]
Rebased, tested patch, v3

Adding Edwin.
Attachment #422657 - Flags: review?(edwsmith)
updated proof-of-concept for TR usage: implement inlining of intToAtom and uintToAtom for 64-bit (where it's trivial).
Attachment #420456 - Attachment is obsolete: true
Attachment #422669 - Flags: review?(edwsmith)
Uploaded wrong patch. Oops.
Attachment #422657 - Attachment is obsolete: true
Attachment #422811 - Flags: review?(edwsmith)
Attachment #422657 - Flags: review?(gal)
Attachment #422657 - Flags: review?(edwsmith)
Attachment #422811 - Flags: review?(gal)
Comment on attachment 422811 [details] [diff] [review]
Rebased, tested patch, v4

>+        }
>+        else
>+        {
>+            // call-returning-void?
>+            NanoAssert(!to->isUsed());
>+            //to->markAsClear();
>+        }


Can you double check this and then fix the comment and remove the dead code in the comment?
Oops. Yes, that's definitely dead-code that should just be nuked. Want a new patch minus that?
Comment on attachment 422669 [details] [diff] [review]
intToAtom, uintToAtom for TR

As a test-case for assembly phase inlining, these make sense.

however for the purpose of inlining intToAtom (etc) in general, a better place to do it is in Specializer, so that the LIR of the inlined function body is exposed to the optimization pipeline.
Attachment #422669 - Flags: review?(edwsmith) → review-
(In reply to comment #32)
> (From update of attachment 422669 [details] [diff] [review])
> As a test-case for assembly phase inlining, these make sense.

Fair enough -- the primary purpose here was a testcase. If there's a better way to inline these, let's do that instead.
I'm nervous about this chunk of code in Assembler.cpp case LIR_icall:

   // Clear the reservation state of the code we are about to inline.

If any instruction in the body of the inlined function is "live" then it's a bug in the inliner, right?  If a previous copy of the function is not yet finished being inlined, all hell breaks loose.  The reg/stack reservation fields in the LIR instructinos could be nonzero as leftover cruft; if thats whats happening, we should zero  them out, but not generate any code or mutate AR/RegAlloc, as a side effect.  calling freeResourcesOf() seems dangerous.

nit: don't use "l" for a local variable, hard to read in some fonts.

* calling gen() recursively is somewhat mindblowing here, but it works, so okay.

* HashMap::remove() does what it says, but doesn't free the hash list nodes.  inlining a lot of functions will leak nodes until the outer assembler pass is finished and the whole allocator is released.  A fix would be to make a new nested label map, modify the label map code to support nested label maps, and make the new nested label map use an allocator with the lifetime of gen(), so that it will get freed eagerly.  Alternatively HashMap could keep a private list of free hashnodes it could reuse.

for a very long compile session the slow leakage would be O(lir-code-size), so it may not be a problem in practice.  (this is how we already rationalized other minor transient leakage).

If the allocator used by Assembler lives longer than the assembly session, it could be a worse problem.  (I think that's the case for TM, but not TR).
I can't find any other potential bugs but it really needs a "how this works" comment and comments on the key helper methods: 
 - the recursion in gen()
 - how reservations are transferred from a LIR_call to LIR_ret's argument
 - ditto for LIR_param -> LIR_call's args

It looks like inlineState nests correctly, tests needed since this has a high pucker factor.
(In reply to comment #35)
> I can't find any other potential bugs but it really needs a "how this works"
> comment and comments on the key helper methods: 
>  - the recursion in gen()
>  - how reservations are transferred from a LIR_call to LIR_ret's argument
>  - ditto for LIR_param -> LIR_call's args
> 
> tests needed since this has a high pucker factor.

Specifically i meant tests where a() inlines b(), which inlines c() -- gen() recursion more than one level deep.

An assert that protects against impossible inline situations (a() calls a()) would also be good, now or in a followon bug.
Attachment #422811 - Flags: review?(edwsmith) → review-
(In reply to comment #34)
> I'm nervous about this chunk of code in Assembler.cpp case LIR_icall:
> 
>    // Clear the reservation state of the code we are about to inline.
> 
> If any instruction in the body of the inlined function is "live" then it's a
> bug in the inliner, right?  If a previous copy of the function is not yet
> finished being inlined, all hell breaks loose.  The reg/stack reservation
> fields in the LIR instructinos could be nonzero as leftover cruft; if thats
> whats happening, we should zero  them out, but not generate any code or mutate
> AR/RegAlloc, as a side effect.  calling freeResourcesOf() seems dangerous.

If we abort compilation half-way through an inlined chunk, we leave it in a bad state. That's what the loop was for. freeResourceOf() seems a terrible idea. I hope I didn't write that originally. My intent was markAsClear() or something like that.

> 
> nit: don't use "l" for a local variable, hard to read in some fonts.
> 
> * calling gen() recursively is somewhat mindblowing here, but it works, so
> okay.
> 
> * HashMap::remove() does what it says, but doesn't free the hash list nodes. 
> inlining a lot of functions will leak nodes until the outer assembler pass is
> finished and the whole allocator is released.  A fix would be to make a new
> nested label map, modify the label map code to support nested label maps, and
> make the new nested label map use an allocator with the lifetime of gen(), so
> that it will get freed eagerly.  Alternatively HashMap could keep a private
> list of free hashnodes it could reuse.

Not opposed to a separate allocator, however, if we inline the same short fragment a lot we make a lot of malloc calls. Trade-off.

> 
> for a very long compile session the slow leakage would be O(lir-code-size), so
> it may not be a problem in practice.  (this is how we already rationalized
> other minor transient leakage).

I think we already leak all over the place.

> 
> If the allocator used by Assembler lives longer than the assembly session, it
> could be a worse problem.  (I think that's the case for TM, but not TR).

Not sure if we still do, I have to check. If we do, we shouldn't, so not too worried.
(In reply to comment #37)
> freeResourceOf() seems a terrible idea. 

I changed it in my patch; it originally was markAsClear(). When I made that change I thought I had a legitimate reason, but in hindsight, I think I was originally doing the sort of naughtiness that Edwin pointed out... and later changed it, but didn't change it back to markAsClear.
BTW markAsClear() is now called deprecated_markAsClear() and should be avoided.  Use clearReg() and/or clearArIndex() instead.
(In reply to comment #35)
> it really needs a "how this works" comment and comments on the key helper methods: 
>  - the recursion in gen()
>  - how reservations are transferred from a LIR_call to LIR_ret's argument
>  - ditto for LIR_param -> LIR_call's args

Andreas, you have first dibs on this. Want to take a stab at it, or shall I?
 
> It looks like inlineState nests correctly, tests needed since this has a high
> pucker factor.

Indeed, I guess an inlined-function-calling-another-inlined-function. Alternately, we could simply disallow this on principle.
> > tests needed since this has a high pucker factor.

Yeah, took me a while to get all cases right, a few test would be important. The critical part is the state of AR and RegAlloc. It forms a matrix. The outcome depends on whether the value is already in use etc.

> 
> Specifically i meant tests where a() inlines b(), which inlines c() -- gen()
> recursion more than one level deep.
> 
> An assert that protects against impossible inline situations (a() calls a())
> would also be good, now or in a followon bug.

I am not too worried about these cases. There are already a lot of ways in which you can shoot yourself in the foot with nanojit, i.e. not using live correctly. Our IR requires you to understand the structure of the code you generate. If we can add easy asserts to help diagnose problems, we should definitely do that though.
Blocks: 458279
No longer blocks: 458279
Component: JavaScript Engine → Nanojit
Attached patch patchSplinter Review
rebased, untested
Comment on attachment 422811 [details] [diff] [review]
Rebased, tested patch, v4

removed long-dormat review request
Attachment #422811 - Flags: review?(gal)
Edwin, does Nanojit want this?
Summary: TM: Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call → Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call
Product: Core → Core Graveyard
Nanojit has been dead for several years. Its Bugzilla component has been moved to the graveyard (bug 984276).

I checked all the open bugs. They're all uninteresting, so I'm WONTFIXing them all. Apologies for the bugspam.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.