Closed Bug 777583 Opened 9 years ago Closed 9 years ago

IonMonkey: Unregress 3d-raytrace performance

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla18

People

(Reporter: dvander, Assigned: nbp)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ion:p1:fx18])

Attachments

(5 files, 1 obsolete file)

Attached file benchmark
We're losing about 3ms on 3d-raytrace on my machine (13.6ms -> 16.8ms). The attached benchmark attempts to increase the runtime - I'm not sure how representative this is of the original benchmark, but there is clearly a difference:

JM+TI: 440ms
Ion:   644ms
V8:    295ms
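
For scale, the slowdowns implied by these numbers can be worked out directly. This is a minimal sketch: the helper name `slowdown` is made up for illustration, and the inputs are only the timings quoted in this comment.

```javascript
// Hypothetical helper: ratio of a slower timing to a baseline timing.
function slowdown(baselineMs, slowerMs) {
  return slowerMs / baselineMs;
}

// Timings quoted in the comment above, in milliseconds.
const original = slowdown(13.6, 16.8); // 3d-raytrace, before vs. after
const ionVsJM = slowdown(440, 644);    // attached benchmark, JM+TI vs. Ion
const ionVsV8 = slowdown(295, 644);    // attached benchmark, V8 vs. Ion

console.log(original.toFixed(2), ionVsJM.toFixed(2), ionVsV8.toFixed(2));
```

That works out to roughly 1.24x on the original benchmark, and roughly 1.46x against JM+TI and 2.18x against V8 on the attached one.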
Whiteboard: [js:p1:fx18]
Whiteboard: [js:p1:fx18] → [ion:p1:fx18]
Depends on: 780020
Depends on: 780052
Depends on: 781052
Assignee: general → nicolas.b.pierron
Status: NEW → ASSIGNED
Attached file Reduced benchmark.
This benchmark is a reduced test case showing a ~2.75x slowdown on the IonMonkey branch between --no-ion (~1.1ms) and the default JM+Ion pair (~3.0ms) with the same build.  It gets even worse when the commented-out line is used: ~1.2ms to ~4.1ms (a ~3.6x slowdown).

I don't know if this reduced test case is representative of the current performance issue.  I noticed that the performance issue seems to disappear when we increase the loop counter inside the original benchmark.  JM+Ion then performs better than JM alone and better than Ion alone.  This behaviour can be reproduced with the current test case by increasing the loop boundary.

Increasing the loop boundary above 10000 should cause a recompilation from JM to Ion, so the error is likely located in JM when IonMonkey is enabled.
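
To look at how the loop boundary changes the picture, the loop can be timed at several boundaries in one run. This is a minimal sketch and not the attached benchmark: `work`, `runWithBoundary`, and the chosen boundaries are placeholders, and `Date.now()` is used only because it is available everywhere.

```javascript
// Placeholder for the body of the reduced test case; NOT the attached code.
function work(i) {
  return (i * 31) % 7;
}

// Run the placeholder loop up to `boundary`; return elapsed ms and a checksum
// so the loop body cannot be trivially discarded.
function runWithBoundary(boundary) {
  const start = Date.now();
  let checksum = 0;
  for (let i = 0; i < boundary; i++) {
    checksum += work(i);
  }
  return { boundary, elapsedMs: Date.now() - start, checksum };
}

// Boundaries on both sides of the 10000 mark mentioned above.
const results = [5000, 10000, 20000].map(runWithBoundary);
for (const r of results) {
  console.log(r.boundary + ": " + r.elapsedMs + "ms");
}
```

Running the same script with and without --no-ion would then give one timing per boundary for each configuration.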
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #1)
> Created attachment 655133 [details]
> Reduced benchmark.
> 
> Increasing the loop boundary above 10000 should cause a recompilation from
> JM to Ion, so the error is likely located in JM when IonMonkey is enabled.

Correction: the performance issue noticed here is caused by Ion->JM calls after the compilation of SceneIntersect, which happens early because usecounts are incremented by 3 per call (my guess: at function entry and at the loop condition), even though the body of the for loop runs only once per function call.
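
The effect of the per-call increment can be illustrated with a toy model. This is only a sketch of the arithmetic, not engine code: `callsUntilCompile` is a made-up name, the threshold of 10000 is borrowed from the loop boundary mentioned earlier and assumed to be the use count that triggers Ion compilation, and the increment of 3 is the guess stated above.

```javascript
// Toy model (hypothetical, not SpiderMonkey code): count how many calls it
// takes for a script's use count to reach the compilation threshold.
function callsUntilCompile(threshold, incrementPerCall) {
  let useCount = 0;
  let calls = 0;
  while (useCount < threshold) {
    useCount += incrementPerCall;
    calls++;
  }
  return calls;
}

// Assumed threshold of 10000; one increment per call vs. three per call.
const withOne = callsUntilCompile(10000, 1);
const withThree = callsUntilCompile(10000, 3);
console.log(withOne, withThree);
```

Under these assumptions the script crosses the threshold after about a third as many calls, which would be consistent with SceneIntersect already being compiled at a loop boundary of 5000.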

Based on profiling (see below), we spend half of the time in InvokeFunction deciding whether or not to call js::RunScript on the script, while the JM-compiled script itself takes only about 1/3 of the js::RunScript time to execute.

The useCount of TriangleIntersect is high enough for it to be inlined under the new inlining rules for small bytecodes; however, the type oracle refuses the compilation because the arguments have to be guarded with type barriers.

Adding argument type barriers when we are inlining should remove the Ion->JM calls and thus improve performance on this micro-benchmark.  This problem also seems to appear in the original benchmark, with 7 already-hot scripts and 6 not-so-hot scripts (among all the inlinings refused because of argument type barriers).  Given the cost of Ion->JM calls and the short runtime of the small SunSpider benchmarks, this is likely a cause of the slowdown.


====
With a loop limit of 5000 instead of 10000, only SceneIntersect is compiled.  Profile for both loop boundaries:

5000   10000  loop boundary.

25.04% 38.39% EnterIon
 `  1.43%  3.23% PR_SetThreadPrivate [1]
21.66% 30.74% C++ --> Ion
21.02% 29.30% SceneIntersect (Ion)
20.28% 24.25% VM Wrapper
19.99% 23.91% js::ion::InvokeFunction
       …
10.13% 12.40% js::RunScript
 `  1.11%  1.86% js::ion::CanEnter [2]
       …
 3.07%  3.59% C++ --> JM
 2.69%  3.14% TriangleIntersect (JM)
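
The two ratios quoted above can be checked against these rows. This is a minimal sketch: the object layout and the name `shares` are illustrative, the percentages are copied from the table, and it assumes they are inclusive (each row includes the time of its callees).

```javascript
// Percentages copied from the profile above, keyed by loop boundary.
const profile = {
  5000:  { invokeFunction: 19.99, runScript: 10.13, jmScript: 2.69 },
  10000: { invokeFunction: 23.91, runScript: 12.40, jmScript: 3.14 },
};

// Hypothetical helper computing the two ratios discussed in the comment.
function shares(row) {
  return {
    // Fraction of InvokeFunction spent outside js::RunScript.
    outsideRunScript: (row.invokeFunction - row.runScript) / row.invokeFunction,
    // Fraction of js::RunScript spent in the JM-compiled TriangleIntersect.
    jmOfRunScript: row.jmScript / row.runScript,
  };
}

const at5000 = shares(profile[5000]);
const at10000 = shares(profile[10000]);
console.log(at5000, at10000);
```

With these inputs, a little under half of InvokeFunction falls outside js::RunScript at both boundaries, and the JM-compiled script accounts for a bit more than a quarter of js::RunScript.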
Attachment #657467 - Flags: review?(dvander)