Closed Bug 706472 Opened 14 years ago Closed 14 years ago

IonMonkey: not inlining function call in 3bit-bits-in-byte

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 3 obsolete files)

Testcase 14 years ago Jan de Mooij [:jandem] 451 bytes, application/x-javascript		Details
WIP v0 14 years ago Jan de Mooij [:jandem] 15.80 KB, patch		Details \| Diff \| Splinter Review
WIP v0.2 14 years ago Jan de Mooij [:jandem] 32.07 KB, patch		Details \| Diff \| Splinter Review
Patch 14 years ago Jan de Mooij [:jandem] 37.59 KB, patch	dvander : review+	Details \| Diff \| Splinter Review
Patch v2 14 years ago Jan de Mooij [:jandem] 45.54 KB, patch	mjrosenb : review+	Details \| Diff \| Splinter Review

Jan de Mooij [:jandem]

Assignee

Description

•

14 years ago

Attached file Testcase — Details

Attached is a stand-alone version of bitops-3bit-bits-in-byte. We don't inline the call to fast3bitlookup because ranInference() in canEnterInlinedScript() returns false. A simple workaround is to change the inner loop: -- for(var y=0; y<256; y++) { func(y); } -- to -- for(var y=0; y<256; y++) { func(y); func(y) } -- The unmodified version runs in 32 ms, the second one in 1 ms. The problem is that TimeFunc is compiled and analyzed right before fast3bitlookup, so we don't inline the call. JM+TI does not have this problem because it recompiles after ~10000 iterations to inline calls.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 1

•

14 years ago

Is it too expensive to run inference if we've decided we want to inline something? How is the JM+TI counter implemented?

Brian Hackett [Laid off!]

Comment 2

•

14 years ago

(In reply to David Anderson [:dvander] from comment #1) > Is it too expensive to run inference if we've decided we want to inline > something? How is the JM+TI counter implemented? Doing the analysis eagerly runs into problems because it can change type information which in turn affects previous inlining decisions. This could be controlled for by rechecking that type information once all analysis is done, but instead JM+TI avoids this by only inlining calls once a script gets really hot. Initial non-inlining versions of the code are compiled, and several thousand iterations/calls later the caller will be recompiled and inline applicable calls. If at that point the callee still has not been analyzed then it isn't hot anyways.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 3

•

14 years ago

Where does the counter get stored and incremented?

Brian Hackett [Laid off!]

Comment 4

•

14 years ago

(In reply to David Anderson [:dvander] from comment #3) > Where does the counter get stored and incremented? script->useCount is used both for triggering initial compilation and for inlining recompilation. The counter gets bumped and tested at loop heads and at function entry. The counter is not updated if the function doesn't have any candidates for inlining (the testing for this is stupid, it just greps for CALL opcodes I think).

Jan de Mooij [:jandem]

Assignee

Updated

•

14 years ago

Assignee: general → jandemooij

Status: NEW → ASSIGNED

Jan de Mooij [:jandem]

Assignee

Comment 5

•

14 years ago

Attached patch WIP v0 (obsolete) — Details — Splinter Review

Hackish and incomplete first stab. Bumps the script use count at loop heads, bails out if it's >= some limit and invalidates the script.

Jan de Mooij [:jandem]

Assignee

Comment 6

•

14 years ago

This seems to work well for the testcase in comment 0, but I had to work around some invalidation problem, I'll look into these first.

Jan de Mooij [:jandem]

Assignee

Comment 7

•

14 years ago

Attached patch WIP v0.2 (obsolete) — Details — Splinter Review

Much more complete than the previous patch. x86 is mostly done, but code generation for x64 is a bit more complex - going to finish this tomorrow.

Attachment #584491 - Attachment is obsolete: true

Jan de Mooij [:jandem]

Assignee

Comment 8

•

14 years ago

Attached patch Patch (obsolete) — Details — Splinter Review

Like JM, this bumps the use count at function entry and loop heads and if it's >= some value, we invalidate the script and return to the interpreter. Also removes an invalid LSRA assert: an interval can have a use position at interval->end() and covers() returns false in this case. Since use positions were moved from the virtual register to the interval, I think we should just remove this assert. Works on x86 and x64, but I will add ARM support before landing this.

Attachment #585441 - Attachment is obsolete: true

Attachment #585847 - Flags: review?(dvander)

Sean Stangl [:sstangl]

Comment 9

•

14 years ago

Arbitrarily invalidating scripts worries me because compilation times in IM are going to be significantly greater than in JM, scaling more than quadratically with the size of the script. Could we be smarter, and only try a recompilation if there is a known candidate for inlining that now has type information?

Sean Stangl [:sstangl]

Comment 10

•

14 years ago

Note also that if we return to the script in the interpreter, that can re-enter from an OSR entrypoint, and OSR implies deoptimization. So if we're recompiling for the sake of better optimization, we want to avoid OSR as much as possible.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 11

•

14 years ago

Comment on attachment 585847 [details] [diff] [review] Patch Review of attachment 585847 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ion/TypeOracle.cpp @@ +252,5 @@ > > bool > +TypeInferenceOracle::canInlineCalls() > +{ > + JS_ASSERT(script->getUseCount() < js_IonOptions.usesBeforeInlining); What prevents this function from being called if useCount > usesBeforeInlining? ::: js/src/ion/x86/CodeGenerator-x86.cpp @@ +401,5 @@ > +bool > +CodeGeneratorX86::visitRecompileCheck(LRecompileCheck *lir) > +{ > + Operand addr(gen->info().script()->addressOfUseCount()); > + masm.addl(Imm32(1), addr); This (and the x64 equivalent) will break with moving GC. On x86 you can use the [address + offset] form, and on x64 you can use ScratchReg (to get the script base address into a register and add a relocation marker).

Attachment #585847 - Flags: review?(dvander) → review+

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 12

•

14 years ago

(In reply to Sean Stangl from comment #9) > Arbitrarily invalidating scripts worries me because compilation times in IM > are going to be significantly greater than in JM, scaling more than > quadratically with the size of the script. Could we be smarter, and only try > a recompilation if there is a known candidate for inlining that now has type > information? Brian suggested this in comment #4, it sounds like good follow-up work. (In reply to Sean Stangl from comment #10) > Note also that if we return to the script in the interpreter, that can > re-enter from an OSR entrypoint, and OSR implies deoptimization. So if we're > recompiling for the sake of better optimization, we want to avoid OSR as > much as possible. It's not yet clear how much OSR impacts performance, I would be surprised if it was significant, but we should be able to measure.

Jan de Mooij [:jandem]

Assignee

Comment 13

•

14 years ago

(In reply to Sean Stangl from comment #9) > Could we be smarter, and only try > a recompilation if there is a known candidate for inlining that now has type > information? Currently we recompile only if the script has calls. Scanning the bytecode for inlineable calls seems resaonable, but one problem is that if we cannot inline, we'd still have the recompile checks in the script (and it'd have to check for == instead of >= to avoid bailing out on every iteration).

Brian Hackett [Laid off!]

Comment 14

•

14 years ago

(In reply to David Anderson [:dvander] from comment #11) > ::: js/src/ion/x86/CodeGenerator-x86.cpp > @@ +401,5 @@ > > +bool > > +CodeGeneratorX86::visitRecompileCheck(LRecompileCheck *lir) > > +{ > > + Operand addr(gen->info().script()->addressOfUseCount()); > > + masm.addl(Imm32(1), addr); > > This (and the x64 equivalent) will break with moving GC. On x86 you can use > the [address + offset] form, and on x64 you can use ScratchReg (to get the > script base address into a register and add a relocation marker). It should be safe to treat JSScripts as non-movable within the compiler. They will not be nursery allocated by a generational GC, and all jitcode will be purged before doing a compacting GC. The same holds for Shapes. I believe that it will be possible to never need to worry about moving pointers embedded in jitcode, which should (I think?) avoid the need to distinguish ImmGCPtr from ImmWord. Object-related pointers could only be embedded if the object was not nursery allocated, but that's not a big deal. Embedding object pointers is rare, and would only reasonably happen for objects with singleton types (globals, singleton functions, prototypes, etc.), which could always be allocated in the major heap.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 15

•

14 years ago

> I believe that it will be possible to never need to worry about moving pointers embedded in > jitcode, which should (I think?) avoid the need to distinguish ImmGCPtr from ImmWord. I'd prefer to keep the distinction so we can support potential changes in the future without much work. I also prefer using the relocatable functions where we can, even if the types are immovable at the moment (it's very easy and shouldn't affect performance). We can #if 0 writing the actual relocation tables if they're taking up memory for no reason.

Jan de Mooij [:jandem]

Assignee

Comment 16

•

14 years ago

Attached patch Patch v2 — Details — Splinter Review

Addresses review comments. We decided to bake in the script pointer though, it should be safe per comment 14 and avoids a temp register on x86. Marty, can you review the ARM changes?

Attachment #585847 - Attachment is obsolete: true

Attachment #586941 - Flags: review?(mrosenberg)

Jan de Mooij [:jandem]

Assignee

Comment 17

•

14 years ago

(In reply to David Anderson [:dvander] from comment #11) > > What prevents this function from being called if useCount > > usesBeforeInlining? Yes, we don't really care about the order in which we check these conditions, so I removed the assert.

Marty Rosenberg [:mjrosenb]

Comment 18

•

14 years ago

Comment on attachment 586941 [details] [diff] [review] Patch v2 Review of attachment 586941 [details] [diff] [review]: ----------------------------------------------------------------- I've only reviewed the ARM bits of this patch. Let me know if you had actually wanted me to review any other parts of it. ::: js/src/ion/arm/CodeGenerator-arm.cpp @@ +1314,5 @@ > + // since scripts are never nursery allocated and jitcode will be purged before > + // doing a compacting GC. > + masm.load32(ImmWord(addr), tmp); > + masm.ma_add(Imm32(1), tmp); > + masm.store32(tmp, ImmWord(addr)); I am a bit worried about having ImmWord(addr) in here twice, since this will result in addr being moved into ScratchRegister twice (likely requiring two instructions each).

Attachment #586941 - Flags: review?(mrosenberg) → review+

Jan de Mooij [:jandem]

Assignee

Updated

•

14 years ago

Blocks: 716682

Jan de Mooij [:jandem]

Assignee

Comment 19

•

14 years ago

(In reply to Marty Rosenberg [:Marty] from comment #18) > I am a bit worried about having ImmWord(addr) in here twice, since this will > result in addr being moved into ScratchRegister twice (likely requiring two > instructions each). We talked about this on IRC. A possible fix is to add "add32(Imm32, ImmWord, reg)" to the macro assembler, but Marty is working on a more general fix for this problem by tracking the values stored in the scratch register. It should get rid of the second "move ImmWord into ScratchReg". Filed a follow-up bug (bug 706472) to avoid unnecessary recompilations. http://hg.mozilla.org/projects/ionmonkey/rev/90bb9afe9b90

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Jan de Mooij [:jandem]

Assignee

Comment 20

•

14 years ago

(In reply to Jan de Mooij (:jandem) from comment #19) > Filed a follow-up bug (bug 706472) Embarassing, bug 716682 of course.

You need to log in before you can comment on or make changes to this bug.