IonMonkey: inline richards :204

RESOLVED FIXED in mozilla36

Status

()

Core
JavaScript Engine
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: h4writer, Assigned: h4writer)

Tracking

(Blocks: 1 bug)

unspecified
mozilla36
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 8 obsolete attachments)

(Assignee)

Description

5 years ago
In richards function :204 isn't inlined. This is enormously short and gets called a lot.

http://alasal.be/ionmonkey/index.php?subject=octane-richards-82d28bdf9317 
:204 gets called 3462534 times
:305 (parent) gets called 3213820 times

There is not really a reason why this shouldn't get inlined.

Reason:
A -> B -> C

A is hot enough to get compiled (usecount 1000). At that time B has only be called 100 times (no loops in B). But that's high enough to get inlined. Now C gets always called when B gets called. So they have the same hotness. But due to parent script heurstic we don't. Parent (i.e. B) needs to have usecount 125 before any of his functions get inlined.

Now not much later B and C are hot enough to get inlined. But we don't recompile anymore. So we are stuck in not inlined C.

Two possible solutions:
1) Enable usecount in ion again and add recompile for A when a callee is hot enough
2) Idea I have been thinking of to create a non-inlined version with high priority and a full inlined version with low priority and switch to improved version as soon as inlined version is done.
(Assignee)

Updated

5 years ago
Blocks: 807188
(Assignee)

Comment 1

5 years ago
(In reply to Hannes Verschore [:h4writer] from comment #0)
> http://alasal.be/ionmonkey/index.php?subject=octane-richards-82d28bdf9317 
> :204 gets called 3462534 times
> :305 (parent) gets called 3213820 times

The parent is :374. Sorry about making this mistake. But the issue is the same. :374 calls :204 just once every execution. So :204/:305 have the same usecount.

Comment 2

5 years ago
Created attachment 801235 [details] [diff] [review]
First approach draft v1

I made a draft for the first solution you suggested. It does inline :204, but I get 3 recompile bailouts and performance is worse than the unpatched version.

Comment 3

5 years ago
Created attachment 801239 [details] [diff] [review]
911738_inline_small_funcs.patch

With this change :204 gets inlined in the first attempt, and I get a ~500 point increase (~2.48%).
(Assignee)

Comment 4

5 years ago
(In reply to Yaron Tausky from comment #3)
> Created attachment 801239 [details] [diff] [review]
> 911738_inline_small_funcs.patch
> 
> With this change :204 gets inlined in the first attempt, and I get a ~500
> point increase (~2.48%).

As explained on IRC I don't think this is a good way to solve it. This increases the amount of inlining, resulting in higher compilation times. There is a balance between inlining and compilation time. If we really want to inline everything we should go for the proposed second solution. But that needs some research and a lot of changes. Still doable though and hopefully I or somebody else can try that in the future.

(In reply to Yaron Tausky from comment #2)
> Created attachment 801235 [details] [diff] [review]
> First approach draft v1
> 
> I made a draft for the first solution you suggested. It does inline :204,
> but I get 3 recompile bailouts and performance is worse than the unpatched
> version.

I saw the following:
unpatched: 26200
patched: 24500

Patched version is indeed slower, but why. I quickly disabled the improvement of recompilation by changing InliningStatus_MaybeLater in makeInliningDecision to InliningStatus_NotInlined. That way we would know if the recompilation is the real issue.
patched + disable recompile: 23000

Interesting. Recompilation is indeed better, but we are still loosing time. But where. The only difference is MIncUseCount that gets added. So I removed that.
patched + disable recompile + disable MIncUseCount: 26000

So the reason is that MIncUseCount is too slow. I don't really see a way to improve the IncUseCount that much, but there is another solution. We only need to decrease the number of times we add this. It only makes sense to add this when usecount is low. If it is already high (e.g. 1000). We won't inline more because it gets even higher. So I tested this with only adding MIncUseCount when usecount is low:
patched + disable recompile + limited adding MIncUseCount: 25500

This seems better. We could possibly decrease the amount of MIncUseCounts it even more. I'll let this for you to research. Now I tried enabling recompile check. We had a 6% improvement with enabling it. Hopefully that works now too:
patched + limited adding MIncUseCount: 27500

I think that is a nice win ;)
(Assignee)

Comment 5

5 years ago
Sorry about the bad English. I should have reread my brain dump, before posting.

If you also get this improvement, I think you can finish up this patch and make sure this doesn't decrease performance on SS/kraken/Octane and mark for review? I think it was mostly done, right?

Comment 6

5 years ago
Nice catch! I'll work on it over the weekend. I'd like to change the code generation a bit, I suspect that right now it only works on x64 (and maybe I can spare a register while I'm at it?).

Comment 7

5 years ago
Created attachment 804986 [details] [diff] [review]
911738_recompile.patch

Indeed I saw the same improvement. Some of the other tests have suffered, though. Here are my results for the unpatched version (averaged over 10 runs, with top and bottom results discarded):
Richards 19803
DeltaBlue 19262
Crypto 21912
RayTrace 23870
EarleyBoyer 17528
RegExp 3438
Splay 13288
NavierStokes 26298
PdfJS 10541
Mandreel 4917
Gameboy 27500
CodeLoad 23811
Box2D 7096
Score (version 8) 14216

And the results for the patched version:
Richards 21033
DeltaBlue 19128
Crypto 21947
RayTrace 23007
EarleyBoyer 17495
RegExp 3400
Splay 12902
NavierStokes 26426
PdfJS 10669
Mandreel 5043
Gameboy 27290
CodeLoad 23998
Box2D 7076
Score (version 8) 14248
So, a slight improvement of the overall score. I'll upload the SS and Kraken results as separate files.
Attachment #801235 - Attachment is obsolete: true
Attachment #801239 - Attachment is obsolete: true
Attachment #804986 - Flags: review?(hv1989)

Comment 8

5 years ago
Created attachment 804987 [details]
911738_ss_comparison.txt

Tested with sunspider-1.0.1.

Comment 9

5 years ago
Created attachment 804990 [details]
911738_kraken_comparison.txt

Tested with kraken-1.1.

I also sent the patch to try: https://tbpl.mozilla.org/?tree=Try&rev=43f2337aa9e8
I don't have a way to test it myself on ARM, though.
(Assignee)

Comment 10

5 years ago
The regression on kraken-ai-astar was increased due to this bug.
Depends on: 916846
(Assignee)

Comment 11

5 years ago
Comment on attachment 804986 [details] [diff] [review]
911738_recompile.patch

Review of attachment 804986 [details] [diff] [review]:
-----------------------------------------------------------------

Most things look good. But can you report numbers on x86 (32bit) and with --parallel-compile=on as argument to js?
That's our current configuration. For the later you need a thread-safe build. If you don't know how to create one, just ask me on irc ;)
(Assignee)

Updated

5 years ago
Depends on: 918278
(Assignee)

Comment 12

5 years ago
Comment on attachment 804986 [details] [diff] [review]
911738_recompile.patch

Review of attachment 804986 [details] [diff] [review]:
-----------------------------------------------------------------

Removing review request. I reran the numbers on v8. We are hitting a regression in raytrace, caused because we are invalidating the script, instead of only requesting a recompile and upon ready insert the script. This should remove the raytrace regression.
Atm. this improves richards with 11%, but decreases raytrace with 9%.

When the depending bug gets fixed, I'll review this.
Attachment #804986 - Flags: review?(hv1989)
With bug 918278 fixed, this should be doable, right?
Flags: needinfo?(hv1989)
(Assignee)

Comment 14

5 years ago
(In reply to Till Schneidereit [:till] from comment #13)
> With bug 918278 fixed, this should be doable, right?

Correct. So now there is already a MRecompileCheck. So or we create a new specific MIR/LIR for this or we adjust MRecompileCheck to have two ways to work... Afterwards in jit/VMFunctions.cpp there is a function Recompile that should do this. The only small adjustment is that a new option to jit/Ion.cpp should get added to "force" a recompile.

(This is not in my immediate todo list. So anybody feel free to pick it up. I can mentor this.)
Flags: needinfo?(hv1989)
Yaron, you wouldn't be interested in taking this up again by any chance?
Flags: needinfo?(yaron.tausky)

Comment 16

5 years ago
I kinda drifted to other things now, but I'll take a look and try it. Might take a few days, though.
Flags: needinfo?(yaron.tausky)
Awesome, thanks!
Yaron, are you still interested in working on this? Otherwise, we should probably turn it into a mentored bug.
Flags: needinfo?(yaron.tausky)

Comment 19

4 years ago
Sorry for the wait. I think it would be better to release this bug and let someone else have a go.
Flags: needinfo?(yaron.tausky)
Assigning to h4writer per IRC discussion
Assignee: general → hv1989
(Assignee)

Comment 21

4 years ago
Created attachment 8418668 [details] [diff] [review]
Add check to recompile when hitting heurstic that disables inlining

Uses the infrastructure set up by optimization levels to force a recompile when we can potentially inline another function.
Attachment #804986 - Attachment is obsolete: true
Attachment #804987 - Attachment is obsolete: true
Attachment #804990 - Attachment is obsolete: true
Attachment #8418668 - Flags: review?(jdemooij)
(Assignee)

Comment 22

4 years ago
Note: richards :204 already gets inlined due to another bugfix. So shouldn't make a big difference there.
(Assignee)

Comment 23

4 years ago
Created attachment 8418669 [details] [diff] [review]
8418668: Add check to recompile when hitting heurstic that disables inlining

Oh forgot to qref
Attachment #8418668 - Attachment is obsolete: true
Attachment #8418668 - Flags: review?(jdemooij)
Attachment #8418669 - Flags: review?(jdemooij)
Comment on attachment 8418669 [details] [diff] [review]
8418668: Add check to recompile when hitting heurstic that disables inlining

Review of attachment 8418669 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.

::: js/src/jit/Ion.cpp
@@ +2047,5 @@
>  }
>  
>  static MethodStatus
>  Compile(JSContext *cx, HandleScript script, BaselineFrame *osrFrame, jsbytecode *osrPc,
> +        bool constructing, ExecutionMode executionMode, bool force = false)

Nit: s/force/forceRecompile to make this argument a bit less ambiguous, also in the callers.

@@ +2118,1 @@
>          return Method_Compiled;

Can we MOZ_ASSERT_IF(osrPc, osrPc == script->ionScript()->osrPc()); here?

@@ +2179,5 @@
>  
> +    // Return the compilation was skipped when the osr pc wasn't adjusted.
> +    scriptIon = GetIonScript(script, SequentialExecution);
> +    if (!scriptIon || pc != scriptIon->osrPc())
> +        return Method_Skipped;

If Compile returns Method_Compiled, there must be an IonScript right?

::: js/src/jit/IonBuilder.cpp
@@ +5172,5 @@
>  
> +    if (target && status == InliningStatus_UseCountToLow) {
> +        MRecompileCheck *check = MRecompileCheck::New(alloc(), target->nonLazyScript(),
> +                                                      optimizationInfo().usesBeforeInlining(),
> +                                                      /* force = */ true);

This means we will bump the callee's use count twice right? Can we just turn it into a check without incrementing it in this case?

::: js/src/jit/IonBuilder.h
@@ +629,5 @@
>      enum InliningStatus
>      {
>          InliningStatus_Error,
>          InliningStatus_NotInlined,
> +        InliningStatus_UseCountToLow,

Nit: s/ToLow/TooLow

@@ +638,5 @@
>      {
>          InliningDecision_Error,
>          InliningDecision_Inline,
> +        InliningDecision_DontInline,
> +        InliningDecision_UseCountToLow

Same here.
Attachment #8418669 - Flags: review?(jdemooij)
(Assignee)

Comment 25

4 years ago
Comment on attachment 8418669 [details] [diff] [review]
8418668: Add check to recompile when hitting heurstic that disables inlining

Review of attachment 8418669 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Ion.cpp
@@ +2118,1 @@
>          return Method_Compiled;

No we can't. Method_Compiled is returned when there is a IonScript. This can be an old IonScript with wrong osrPC().
Callers that want a specific pc (like enterAtBranch) need to check this. This makes code easier and removes the need of "failedState".

@@ +2179,5 @@
>  
> +    // Return the compilation was skipped when the osr pc wasn't adjusted.
> +    scriptIon = GetIonScript(script, SequentialExecution);
> +    if (!scriptIon || pc != scriptIon->osrPc())
> +        return Method_Skipped;

Yes, correct.

::: js/src/jit/IonBuilder.cpp
@@ +5172,5 @@
>  
> +    if (target && status == InliningStatus_UseCountToLow) {
> +        MRecompileCheck *check = MRecompileCheck::New(alloc(), target->nonLazyScript(),
> +                                                      optimizationInfo().usesBeforeInlining(),
> +                                                      /* force = */ true);

Oh right. Since callee->useCount() < 1000 we will still be running in baseline/interpreter, which update the usecount themself.
(Assignee)

Comment 26

4 years ago
Created attachment 8421672 [details] [diff] [review]
Add check to recompile when hitting heurstic that disables inlining
Attachment #8418669 - Attachment is obsolete: true
Attachment #8421672 - Flags: review?(jdemooij)
Comment on attachment 8421672 [details] [diff] [review]
Add check to recompile when hitting heurstic that disables inlining

Review of attachment 8421672 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments below addressed.

::: js/src/jit/CodeGenerator.cpp
@@ +8626,5 @@
> +    AbsoluteAddress useCount = AbsoluteAddress(ins->mir()->script()->addressOfUseCount());
> +    if (ins->mir()->increaseUseCount()) {
> +        masm.loadPtr(useCount, tmp);
> +        masm.add32(Imm32(1), tmp);
> +        masm.storePtr(tmp, useCount);

These should be load32 and store32

::: js/src/jit/Ion.cpp
@@ +2047,5 @@
>  }
>  
>  static MethodStatus
>  Compile(JSContext *cx, HandleScript script, BaselineFrame *osrFrame, jsbytecode *osrPc,
> +        bool constructing, ExecutionMode executionMode, bool force = false)

s/force/forceRecompile

@@ +2157,5 @@
> +
> +    // By default a recompilation doesn't happen on osr mismatch.
> +    // Decide if we want to force a recompilation if this happens too much.
> +    bool force = false;
> +    IonScript *scriptIon = GetIonScript(script, SequentialExecution);

Nit: script->ionScript()

@@ +2176,5 @@
>              ForbidCompilation(cx, script);
>          return status;
>      }
>  
> +    // Return the compilation was skipped when the osr pc wasn't adjusted.

When does Compile return Method_Compiled without the OSR pc matching? Can you add a line to this comment explaining when this happens?

@@ +2178,5 @@
>      }
>  
> +    // Return the compilation was skipped when the osr pc wasn't adjusted.
> +    scriptIon = GetIonScript(script, SequentialExecution);
> +    if (pc != scriptIon->osrPc())

Nit: if (pc != script->ionScript()->osrPc())
Attachment #8421672 - Flags: review?(jdemooij) → review+

Comment 29

4 years ago
This bug (or bug 1008590) appears to have caused a 14.3% regression on Octane-mandreel and a 7.1% regression on Octane-mandreelLatency on machine 11 on AWFY

range from AWFY: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=10c6ee2c7524&tochange=66a520c18efe
(Assignee)

Comment 30

4 years ago
It was this bug. Already under investigating. Probably gonna backout soonish.
One more datapoint, asmjs-apps-zlib-throughput (non-odin) also regressed (by 12%), I assume due to the same reasons as the others mentioned 2 comments back.
https://hg.mozilla.org/mozilla-central/rev/66a520c18efe
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
(Assignee)

Comment 33

4 years ago
Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b729d9a17712
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 34

4 years ago
So the issue is that when recompiling during the linking fase, we are invalidating the old IonScript. As a result we go back to baseline and it takes until the function starts again that we can go back in ionmonkey. This is a big issue for mandreel apparently.
this caused a 2.33% kraken regression on osx 10.8 in talos, when we backed it out the regression went away.
(Assignee)

Updated

4 years ago
Depends on: 1013172
(Assignee)

Comment 36

4 years ago
Created attachment 8459316 [details] [diff] [review]
Delay linking IonBuilder until execution  WIP
(Assignee)

Updated

4 years ago
Depends on: 1047346
(Assignee)

Comment 37

4 years ago
Comment on attachment 8459316 [details] [diff] [review]
Delay linking IonBuilder until execution  WIP

Took its own bug nummer for it: bug 1047346.
If that lands everything has landed to try this again. Locally I don't see the regressions anymore :D
Attachment #8459316 - Attachment is obsolete: true
(Assignee)

Comment 39

4 years ago
Since the error was from another patch. Back to try to confirm and try relanding:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=df6ccb2f0e2d
https://hg.mozilla.org/mozilla-central/rev/a797dcc3dafc
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: mozilla32 → mozilla36
You need to log in before you can comment on or make changes to this bug.