Closed
Bug 522593
Opened 15 years ago
Closed 11 years ago
Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call
Categories
(Core Graveyard :: Nanojit, defect)
Core Graveyard
Nanojit
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: gal, Assigned: gal)
References
Details
Attachments
(4 files, 15 obsolete files)
41.96 KB,
patch
|
Details | Diff | Splinter Review | |
5.37 KB,
patch
|
edwsmith
:
review-
|
Details | Diff | Splinter Review |
14.64 KB,
patch
|
edwsmith
:
review-
|
Details | Diff | Splinter Review |
24.82 KB,
patch
|
Details | Diff | Splinter Review |
I could have sworn I had a bug for this but I can't find it.
The idea here is to add a LIns* field to CallInfo and when assembling we can inline that LIR code in place instead of emitting a call. This avoids the prolog/epilog overhead. At the same time the LIR only stores the call, and it can be matched in our filters so box/unbox elimination works, but when a box really remains on trace, we emit optimized code for it.
Assignee | ||
Updated•15 years ago
|
Assignee: general → gal
Assignee | ||
Comment 1•15 years ago
|
||
Assignee | ||
Comment 2•15 years ago
|
||
Current state. I had to rewrite the reservation code a bit to be more flexible. Inlined LIR code must end with a LIR_[i|f|q]ret, and that ret inherits the reservation of the call. Next up is properly connecting arguments to their definitions when inlining.
Assignee | ||
Comment 3•15 years ago
|
||
Attachment #406563 -
Attachment is obsolete: true
Assignee | ||
Comment 4•15 years ago
|
||
Complete inlining support and basic framework in place to on-the fly generate inline code in the tracer and cache and reuse it (still untested, and no builtin translated so far).
Attachment #406585 -
Attachment is obsolete: true
Assignee | ||
Updated•15 years ago
|
Summary: TM: Support inlining of LIR_calls if the function is available as LIR as well → TM: Support inlining of LIR_calls if the function is available as LIR as well [shaver is very hot for this]
Assignee | ||
Updated•15 years ago
|
Attachment #406613 -
Attachment is patch: true
Attachment #406613 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Updated•15 years ago
|
OS: Mac OS X → All
Hardware: x86 → All
Summary: TM: Support inlining of LIR_calls if the function is available as LIR as well [shaver is very hot for this] → TM: Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call
Assignee | ||
Comment 5•15 years ago
|
||
Complete patch, but not working yet. Here is a test case:
var k = 0, a = [5];
for (var i = 0; i < 10; ++i)
k += a[0];
This calls UnboxDouble for the array access.
Attachment #406613 -
Attachment is obsolete: true
Assignee | ||
Comment 6•15 years ago
|
||
We compile 2 traces. I have to look into that why, but this compiles the code as intended.
Attachment #406617 -
Attachment is obsolete: true
Assignee | ||
Comment 7•15 years ago
|
||
Here is the gist of it. Instead of emitting code to do the call, LIR is directly inlined into the trace. The control-flow merge phi is done with a stack store, which is all sorts of lame, but its probably still better than the epilog/prolog overhead.
----------------------------------- ## END exit block 0x600c218
js_UnboxDouble2 = fcall #js_UnboxDouble ( ld11 )
ialloc2 = ialloc 8
>FP 4(state) 8(ebx) 12(esi) 16(edi) 20-24(ialloc2)
and6 = and iparam2, JSVAL_INT
0005e95e7c mov eax,edi ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
0005e95e7e and eax,1 ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
jt eq7 -> label5
0005e95e81 cmp eax,1 eax(and6) ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
0005e95e84 je 0x5e95e92 ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
ldq4 = ldq iparam2[-2]
0005e95e86 movq xmm1,-2(edi) ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
stqi ialloc2[0] = ldq4
0005e95e8b movq -24(ebp),xmm1 ecx(state) ebx(ld9) esi(sp) xmm0(ld1) xmm1(ldq4)
j -> label4
0005e95e90 jmp 0x5e95ea2 ecx(state) ebx(ld9) esi(sp) xmm0(ld1)
label5:
0005e95e92 [label5] ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
rsh2 = rsh iparam2, 1
0005e95e92 sar edi,1 ecx(state) ebx(ld9) esi(sp) edi(iparam2) xmm0(ld1)
i2f3 = i2f rsh2
0005e95e95 xorpd xmm1,xmm1 ecx(state) ebx(ld9) esi(sp) edi(rsh2) xmm0(ld1)
0005e95e99 cvtsi2sd xmm1,edi ecx(state) ebx(ld9) esi(sp) edi(rsh2) xmm0(ld1)
stqi ialloc2[0] = i2f3
0005e95e9d movq -24(ebp),xmm1 ecx(state) ebx(ld9) esi(sp) xmm0(ld1) xmm1(i2f3)
label4:
0005e95ea2 [label4] ecx(state) ebx(ld9) esi(sp) xmm0(ld1)
ldq3 = ldq ialloc2[0]
0005e95ea2 movq xmm1,-24(ebp) ecx(state) ebx(ld9) esi(sp) xmm0(ld1)
<FP 4(state) 8(ebx) 12(esi) 16(edi)
fret ldq3
fadd2 = fadd ld1, js_UnboxDouble2
0005e95ea7 addsd xmm0,xmm1 ecx(state) ebx(ld9) esi(sp) xmm0(ld1) xmm1(js_UnboxDouble2)
stqi state[748] = fadd2
Assignee | ||
Comment 8•15 years ago
|
||
As the title change indicates I decided to leave the LIR filter pipeline alone and not pass in an extra LIns* parameter to indicate the chunk. The semantics is basically that a filter can decide to annotate the call with a LIR chunk as needed. We might reconsider this later but for now it seems to work quite well.
Assignee | ||
Comment 9•15 years ago
|
||
Remove labels after inlining a chunk in case we inline it again.
Now we die when we try to forward the use of a LIR_param inside the inlined chunk and the definition already has a reservation. This can't happen for the LIR_ret at the end of the inlined chunk since that we clear every time, but the LIR_param value is a problem. Ideally we want to snoop the use of the LIR_param instead of ever allocating a reservation for it but I am not sure thats possible. I might have to copy here if the definition of the LIR_param source already has a reservation.
Attachment #406627 -
Attachment is obsolete: true
Assignee | ||
Comment 10•15 years ago
|
||
Force a register allocation for both participants of a forwarding if the destination is already used and register-copy the value.
Attachment #406741 -
Attachment is obsolete: true
Assignee | ||
Comment 11•15 years ago
|
||
Looks like we are not freeing the activation record entry of a LIR_iparam somewhere.
TEST-PASS | trace-test.js | testEqFalseEmptyString
TEST-PASS | trace-test.js | testIncDec
Assertion failed: frame entry -28 wasn't freed
: _activation.entry[i] == 0 (../nanojit/Assembler.cpp:834)
(gdb) p *_activation.entry[i]
$2 = {
{
lastWord = {
arIndex = 7,
reg = nanojit::UnknownReg,
used = 1,
opcode = nanojit::LIR_iparam
},
dummy = 0x8910007
}
}
(gdb)
Assignee | ||
Comment 12•15 years ago
|
||
Deallocate the forwarding source's reservation.
Attachment #406811 -
Attachment is obsolete: true
Assignee | ||
Comment 13•15 years ago
|
||
Remaining failures:
/Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/basic/testApplyCall.js
/Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/basic/testTableSwitch2.js
/Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/sunspider/check-3d-raytrace.js
/Users/gal/workspace/tracemonkey-repository/js/src/trace-test/tests/sunspider/check-date-format-xparb.js
Assignee | ||
Comment 14•15 years ago
|
||
I am going to get those two lines right eventually.
Attachment #406830 -
Attachment is obsolete: true
Assignee | ||
Updated•15 years ago
|
Attachment #406831 -
Attachment is patch: true
Attachment #406831 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Comment 15•15 years ago
|
||
Attachment #406831 -
Attachment is obsolete: true
Assignee | ||
Comment 16•15 years ago
|
||
I see a very slight speedup (3ms).
Assignee | ||
Comment 17•15 years ago
|
||
Discussed with ed. Patch will land.
Comment 18•15 years ago
|
||
Andreas: cool stuff, but we're gonna stomp on each others toes re: changes to AR. You may want to review my proposed patch for https://bugzilla.mozilla.org/show_bug.cgi?id=473769 since I was planning on landing it Real Soon Now.
Comment 19•15 years ago
|
||
Updated patch 407200 for current NJ tip... compiles, but I haven't tested in TM at all yet (this is for some TR experimentation). Offered with no warranty for now.
Assignee | ||
Comment 20•15 years ago
|
||
Thanks Steven.
Comment 21•15 years ago
|
||
This looks like a cool approach I really want to make use of... one nit I see
with this is that nonlinear inlines pretty much have to use insAlloc() for a
phi nodes, which means we end up doing redundant load/stores unless we're
verrry careful. Any way we could dance around this, you think?
Comment 22•15 years ago
|
||
Updated nanojit patch (again), tweaked and fixed a few minor issues. Tested with TR (see subsequent patch) but not TM.
Attachment #420431 -
Attachment is obsolete: true
Comment 23•15 years ago
|
||
really just a proof-of-concept patch for tamarin, which inlines the intToAtom callout. (on 64-bit this is probably worth doing. on 32-bit it may or may not, careful code measurement would be necessary... this part was really done just to flush out bugs)
Comment 24•15 years ago
|
||
Revised version of 420455 that allows inlining of calls-returning-void (ie, don't assume that the function result is in use)
Attachment #420455 -
Attachment is obsolete: true
Comment 25•15 years ago
|
||
Additive to previous patch: calling markAsClear is inadequate, we need to call freeResourcesOf() to free stack/reg reservations as well.
Attachment #422563 -
Flags: review?(gal)
Comment 26•15 years ago
|
||
Rebased to current NJ; also includes the fix for freeResourcesOf (patch 422563) and an additional fix (LIR_param should break, not return).
I think TR will be using this soon, so I'd like to think about landing it.
Attachment #420647 -
Attachment is obsolete: true
Attachment #422563 -
Attachment is obsolete: true
Attachment #422657 -
Flags: review?(gal)
Attachment #422563 -
Flags: review?(gal)
Comment 27•15 years ago
|
||
Comment on attachment 422657 [details] [diff] [review]
Rebased, tested patch, v3
Adding Edwin.
Attachment #422657 -
Flags: review?(edwsmith)
Comment 28•15 years ago
|
||
updated proof-of-concept for TR usage: implement inlining of intToAtom and uintToAtom for 64-bit (where it's trivial).
Attachment #420456 -
Attachment is obsolete: true
Attachment #422669 -
Flags: review?(edwsmith)
Comment 29•15 years ago
|
||
Uploaded wrong patch. Oops.
Attachment #422657 -
Attachment is obsolete: true
Attachment #422811 -
Flags: review?(edwsmith)
Attachment #422657 -
Flags: review?(gal)
Attachment #422657 -
Flags: review?(edwsmith)
Updated•15 years ago
|
Attachment #422811 -
Flags: review?(gal)
Assignee | ||
Comment 30•15 years ago
|
||
Comment on attachment 422811 [details] [diff] [review]
Rebased, tested patch, v4
>+ }
>+ else
>+ {
>+ // call-returning-void?
>+ NanoAssert(!to->isUsed());
>+ //to->markAsClear();
>+ }
Can you double check this and then fix the comment and remove the dead code in the comment?
Comment 31•15 years ago
|
||
Oops. Yes, that's definitely dead-code that should just be nuked. Want a new patch minus that?
Comment 32•15 years ago
|
||
Comment on attachment 422669 [details] [diff] [review]
intToAtom, uintToAtom for TR
As a test-case for assembly phase inlining, these make sense.
however for the purpose of inlining intToAtom (etc) in general, a better place to do it is in Specializer, so that the LIR of the inlined function body is exposed to the optimization pipeline.
Attachment #422669 -
Flags: review?(edwsmith) → review-
Comment 33•15 years ago
|
||
(In reply to comment #32)
> (From update of attachment 422669 [details] [diff] [review])
> As a test-case for assembly phase inlining, these make sense.
Fair enough -- the primary purpose here was a testcase. If there's a better way to inline these, let's do that instead.
Comment 34•15 years ago
|
||
I'm nervous about this chunk of code in Assembler.cpp case LIR_icall:
// Clear the reservation state of the code we are about to inline.
If any instruction in the body of the inlined function is "live" then it's a bug in the inliner, right? If a previous copy of the function is not yet finished being inlined, all hell breaks loose. The reg/stack reservation fields in the LIR instructinos could be nonzero as leftover cruft; if thats whats happening, we should zero them out, but not generate any code or mutate AR/RegAlloc, as a side effect. calling freeResourcesOf() seems dangerous.
nit: don't use "l" for a local variable, hard to read in some fonts.
* calling gen() recursively is somewhat mindblowing here, but it works, so okay.
* HashMap::remove() does what it says, but doesn't free the hash list nodes. inlining a lot of functions will leak nodes until the outer assembler pass is finished and the whole allocator is released. A fix would be to make a new nested label map, modify the label map code to support nested label maps, and make the new nested label map use an allocator with the lifetime of gen(), so that it will get freed eagerly. Alternatively HashMap could keep a private list of free hashnodes it could reuse.
for a very long compile session the slow leakage would be O(lir-code-size), so it may not be a problem in practice. (this is how we already rationalized other minor transient leakage).
If the allocator used by Assembler lives longer than the assembly session, it could be a worse problem. (I think that's the case for TM, but not TR).
Comment 35•15 years ago
|
||
I can't find any other potential bugs but it really needs a "how this works" comment and comments on the key helper methods:
- the recursion in gen()
- how reservations are transferred from a LIR_call to LIR_ret's argument
- ditto for LIR_param -> LIR_call's args
It looks like inlineState nests correctly, tests needed since this has a high pucker factor.
Comment 36•15 years ago
|
||
(In reply to comment #35)
> I can't find any other potential bugs but it really needs a "how this works"
> comment and comments on the key helper methods:
> - the recursion in gen()
> - how reservations are transferred from a LIR_call to LIR_ret's argument
> - ditto for LIR_param -> LIR_call's args
>
> tests needed since this has a high pucker factor.
Specifically i meant tests where a() inlines b(), which inlines c() -- gen() recursion more than one level deep.
An assert that protects against impossible inline situations (a() calls a()) would also be good, now or in a followon bug.
Updated•15 years ago
|
Attachment #422811 -
Flags: review?(edwsmith) → review-
Assignee | ||
Comment 37•15 years ago
|
||
(In reply to comment #34)
> I'm nervous about this chunk of code in Assembler.cpp case LIR_icall:
>
> // Clear the reservation state of the code we are about to inline.
>
> If any instruction in the body of the inlined function is "live" then it's a
> bug in the inliner, right? If a previous copy of the function is not yet
> finished being inlined, all hell breaks loose. The reg/stack reservation
> fields in the LIR instructinos could be nonzero as leftover cruft; if thats
> whats happening, we should zero them out, but not generate any code or mutate
> AR/RegAlloc, as a side effect. calling freeResourcesOf() seems dangerous.
If we abort compilation half-way through an inlined chunk, we leave it in a bad state. That's what the loop was for. freeResourceOf() seems a terrible idea. I hope I didn't write that originally. My intent was markAsClear() or something like that.
>
> nit: don't use "l" for a local variable, hard to read in some fonts.
>
> * calling gen() recursively is somewhat mindblowing here, but it works, so
> okay.
>
> * HashMap::remove() does what it says, but doesn't free the hash list nodes.
> inlining a lot of functions will leak nodes until the outer assembler pass is
> finished and the whole allocator is released. A fix would be to make a new
> nested label map, modify the label map code to support nested label maps, and
> make the new nested label map use an allocator with the lifetime of gen(), so
> that it will get freed eagerly. Alternatively HashMap could keep a private
> list of free hashnodes it could reuse.
Not opposed to a separate allocator, however, if we inline the same short fragment a lot we make a lot of malloc calls. Trade-off.
>
> for a very long compile session the slow leakage would be O(lir-code-size), so
> it may not be a problem in practice. (this is how we already rationalized
> other minor transient leakage).
I think we already leak all over the place.
>
> If the allocator used by Assembler lives longer than the assembly session, it
> could be a worse problem. (I think that's the case for TM, but not TR).
Not sure if we still do, I have to check. If we do, we shouldn't, so not too worried.
Comment 38•15 years ago
|
||
(In reply to comment #37)
> freeResourceOf() seems a terrible idea.
I changed it in my patch; it originally was markAsClear(). When I made that change I thought I had a legitimate reason, but in hindsight, I think I was originally doing the sort of naughtiness that Edwin pointed out... and later changed it, but didn't change it back to markAsClear.
Comment 39•15 years ago
|
||
BTW markAsClear() is now called deprecated_markAsClear() and should be avoided. Use clearReg() and/or clearArIndex() instead.
Comment 40•15 years ago
|
||
(In reply to comment #35)
> it really needs a "how this works" comment and comments on the key helper methods:
> - the recursion in gen()
> - how reservations are transferred from a LIR_call to LIR_ret's argument
> - ditto for LIR_param -> LIR_call's args
Andreas, you have first dibs on this. Want to take a stab at it, or shall I?
> It looks like inlineState nests correctly, tests needed since this has a high
> pucker factor.
Indeed, I guess an inlined-function-calling-another-inlined-function. Alternately, we could simply disallow this on principle.
Assignee | ||
Comment 41•15 years ago
|
||
> > tests needed since this has a high pucker factor.
Yeah, took me a while to get all cases right, a few test would be important. The critical part is the state of AR and RegAlloc. It forms a matrix. The outcome depends on whether the value is already in use etc.
>
> Specifically i meant tests where a() inlines b(), which inlines c() -- gen()
> recursion more than one level deep.
>
> An assert that protects against impossible inline situations (a() calls a())
> would also be good, now or in a followon bug.
I am not too worried about these cases. There are already a lot of ways in which you can shoot yourself in the foot with nanojit, i.e. not using live correctly. Our IR requires you to understand the structure of the code you generate. If we can add easy asserts to help diagnose problems, we should definitely do that though.
Updated•14 years ago
|
Component: JavaScript Engine → Nanojit
Assignee | ||
Comment 42•14 years ago
|
||
rebased, untested
Comment 43•14 years ago
|
||
Comment on attachment 422811 [details] [diff] [review]
Rebased, tested patch, v4
removed long-dormat review request
Attachment #422811 -
Flags: review?(gal)
Comment 44•13 years ago
|
||
Edwin, does Nanojit want this?
Updated•13 years ago
|
Summary: TM: Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call → Support annotating existing LIR_calls to inline a chunk of LIR instead of a native call
Updated•11 years ago
|
Product: Core → Core Graveyard
Comment 45•11 years ago
|
||
Nanojit has been dead for several years. Its Bugzilla component has been moved to the graveyard (bug 984276).
I checked all the open bugs. They're all uninteresting, so I'm WONTFIXing them all. Apologies for the bugspam.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•