Closed Bug 706472 Opened 13 years ago Closed 12 years ago

IonMonkey: not inlining function call in 3bit-bits-in-byte

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 3 obsolete files)

Attached file Testcase
Attached is a stand-alone version of bitops-3bit-bits-in-byte.

We don't inline the call to fast3bitlookup because ranInference() in canEnterInlinedScript() returns false. A simple workaround is to change the inner loop:
--
for(var y=0; y<256; y++) {
    func(y);
}
--
to
--
for(var y=0; y<256; y++) {
    func(y);
    func(y)
}
--
The unmodified version runs in 32 ms, the second one in 1 ms.

The problem is that TimeFunc is compiled and analyzed right before fast3bitlookup, so we don't inline the call. JM+TI does not have this problem because it recompiles after ~10000 iterations to inline calls.
Is it too expensive to run inference if we've decided we want to inline something? How is the JM+TI counter implemented?
(In reply to David Anderson [:dvander] from comment #1)
> Is it too expensive to run inference if we've decided we want to inline
> something? How is the JM+TI counter implemented?

Doing the analysis eagerly runs into problems because it can change type information which in turn affects previous inlining decisions.  This could be controlled for by rechecking that type information once all analysis is done, but instead JM+TI avoids this by only inlining calls once a script gets really hot.  Initial non-inlining versions of the code are compiled, and several thousand iterations/calls later the caller will be recompiled and inline applicable calls.  If at that point the callee still has not been analyzed then it isn't hot anyways.
Where does the counter get stored and incremented?
(In reply to David Anderson [:dvander] from comment #3)
> Where does the counter get stored and incremented?

script->useCount is used both for triggering initial compilation and for inlining recompilation.  The counter gets bumped and tested at loop heads and at function entry.  The counter is not updated if the function doesn't have any candidates for inlining (the testing for this is stupid, it just greps for CALL opcodes I think).
Assignee: general → jandemooij
Status: NEW → ASSIGNED
Attached patch WIP v0 (obsolete) — Splinter Review
Hackish and incomplete first stab. Bumps the script use count at loop heads, bails out if it's >= some limit and invalidates the script.
This seems to work well for the testcase in comment 0, but I had to work around some invalidation problem, I'll look into these first.
Attached patch WIP v0.2 (obsolete) — Splinter Review
Much more complete than the previous patch. x86 is mostly done, but code generation for x64 is a bit more complex - going to finish this tomorrow.
Attachment #584491 - Attachment is obsolete: true
Attached patch Patch (obsolete) — Splinter Review
Like JM, this bumps the use count at function entry and loop heads and if it's >= some value, we invalidate the script and return to the interpreter.

Also removes an invalid LSRA assert: an interval can have a use position at interval->end() and covers() returns false in this case. Since use positions were moved from the virtual register to the interval, I think we should just remove this assert.

Works on x86 and x64, but I will add ARM support before landing this.
Attachment #585441 - Attachment is obsolete: true
Attachment #585847 - Flags: review?(dvander)
Arbitrarily invalidating scripts worries me because compilation times in IM are going to be significantly greater than in JM, scaling more than quadratically with the size of the script. Could we be smarter, and only try a recompilation if there is a known candidate for inlining that now has type information?
Note also that if we return to the script in the interpreter, that can re-enter from an OSR entrypoint, and OSR implies deoptimization. So if we're recompiling for the sake of better optimization, we want to avoid OSR as much as possible.
Comment on attachment 585847 [details] [diff] [review]
Patch

Review of attachment 585847 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/TypeOracle.cpp
@@ +252,5 @@
>  
>  bool
> +TypeInferenceOracle::canInlineCalls()
> +{
> +    JS_ASSERT(script->getUseCount() < js_IonOptions.usesBeforeInlining);

What prevents this function from being called if useCount > usesBeforeInlining?

::: js/src/ion/x86/CodeGenerator-x86.cpp
@@ +401,5 @@
> +bool
> +CodeGeneratorX86::visitRecompileCheck(LRecompileCheck *lir)
> +{
> +    Operand addr(gen->info().script()->addressOfUseCount());
> +    masm.addl(Imm32(1), addr);

This (and the x64 equivalent) will break with moving GC. On x86 you can use the [address + offset] form, and on x64 you can use ScratchReg (to get the script base address into a register and add a relocation marker).
Attachment #585847 - Flags: review?(dvander) → review+
(In reply to Sean Stangl from comment #9)
> Arbitrarily invalidating scripts worries me because compilation times in IM
> are going to be significantly greater than in JM, scaling more than
> quadratically with the size of the script. Could we be smarter, and only try
> a recompilation if there is a known candidate for inlining that now has type
> information?

Brian suggested this in comment #4, it sounds like good follow-up work.

(In reply to Sean Stangl from comment #10)
> Note also that if we return to the script in the interpreter, that can
> re-enter from an OSR entrypoint, and OSR implies deoptimization. So if we're
> recompiling for the sake of better optimization, we want to avoid OSR as
> much as possible.

It's not yet clear how much OSR impacts performance, I would be surprised if it was significant, but we should be able to measure.
(In reply to Sean Stangl from comment #9)
> Could we be smarter, and only try
> a recompilation if there is a known candidate for inlining that now has type
> information?

Currently we recompile only if the script has calls. Scanning the bytecode for inlineable calls seems resaonable, but one problem is that if we cannot inline, we'd still have the recompile checks in the script (and it'd have to check for == instead of >= to avoid bailing out on every iteration).
(In reply to David Anderson [:dvander] from comment #11)
> ::: js/src/ion/x86/CodeGenerator-x86.cpp
> @@ +401,5 @@
> > +bool
> > +CodeGeneratorX86::visitRecompileCheck(LRecompileCheck *lir)
> > +{
> > +    Operand addr(gen->info().script()->addressOfUseCount());
> > +    masm.addl(Imm32(1), addr);
> 
> This (and the x64 equivalent) will break with moving GC. On x86 you can use
> the [address + offset] form, and on x64 you can use ScratchReg (to get the
> script base address into a register and add a relocation marker).

It should be safe to treat JSScripts as non-movable within the compiler.  They will not be nursery allocated by a generational GC, and all jitcode will be purged before doing a compacting GC.  The same holds for Shapes.  I believe that it will be possible to never need to worry about moving pointers embedded in jitcode, which should (I think?) avoid the need to distinguish ImmGCPtr from ImmWord.  Object-related pointers could only be embedded if the object was not nursery allocated, but that's not a big deal.  Embedding object pointers is rare, and would only reasonably happen for objects with singleton types (globals, singleton functions, prototypes, etc.), which could always be allocated in the major heap.
> I believe that it will be possible to never need to worry about moving pointers embedded in
> jitcode, which should (I think?) avoid the need to distinguish ImmGCPtr from ImmWord.

I'd prefer to keep the distinction so we can support potential changes in the future without much work. I also prefer using the relocatable functions where we can, even if the types are immovable at the moment (it's very easy and shouldn't affect performance). We can #if 0 writing the actual relocation tables if they're taking up memory for no reason.
Attached patch Patch v2Splinter Review
Addresses review comments. We decided to bake in the script pointer though, it should be safe per comment 14 and avoids a temp register on x86.

Marty, can you review the ARM changes?
Attachment #585847 - Attachment is obsolete: true
Attachment #586941 - Flags: review?(mrosenberg)
(In reply to David Anderson [:dvander] from comment #11)
>
> What prevents this function from being called if useCount >
> usesBeforeInlining?

Yes, we don't really care about the order in which we check these conditions, so I removed the assert.
Comment on attachment 586941 [details] [diff] [review]
Patch v2

Review of attachment 586941 [details] [diff] [review]:
-----------------------------------------------------------------

I've only reviewed the ARM bits of this patch.  Let me know if you had actually wanted me to review any other parts of it.

::: js/src/ion/arm/CodeGenerator-arm.cpp
@@ +1314,5 @@
> +    // since scripts are never nursery allocated and jitcode will be purged before
> +    // doing a compacting GC.
> +    masm.load32(ImmWord(addr), tmp);
> +    masm.ma_add(Imm32(1), tmp);
> +    masm.store32(tmp, ImmWord(addr));

I am a bit worried about having ImmWord(addr) in here twice, since this will result in addr being moved into ScratchRegister twice (likely requiring two instructions each).
Attachment #586941 - Flags: review?(mrosenberg) → review+
Blocks: 716682
(In reply to Marty Rosenberg [:Marty] from comment #18)
> I am a bit worried about having ImmWord(addr) in here twice, since this will
> result in addr being moved into ScratchRegister twice (likely requiring two
> instructions each).

We talked about this on IRC. A possible fix is to add "add32(Imm32, ImmWord, reg)" to the macro assembler, but Marty is working on a more general fix for this problem by tracking the values stored in the scratch register. It should get rid of the second "move ImmWord into ScratchReg".

Filed a follow-up bug (bug 706472) to avoid unnecessary recompilations.

http://hg.mozilla.org/projects/ionmonkey/rev/90bb9afe9b90
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Jan de Mooij (:jandem) from comment #19)
> Filed a follow-up bug (bug 706472)

Embarassing, bug 716682 of course.
You need to log in before you can comment on or make changes to this bug.