IonMonkey & B2G: Ion code is compiled but not run, cause 5 times slow-down on deltablue.

RESOLVED FIXED

Status

()

RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: nbp, Assigned: nbp)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments)

(Assignee)

Description

6 years ago
Created attachment 729763 [details]
debug B2G browser, IONFLAGS=scripts,bailouts

DeltaBlue results on the B2G browser does not match the improvement noticed on Are We Fast Yet when switching from JM to Ion, as reported in Bug 790092.  This improvement is noticeable in the JS shell issued from the same build and ran on the same device when b2g process is stopped.

Running the profiler does not report any useful symbols under JS frames.

No GC seems to be the cause of this slowdown:
* Label for GC stored on the SPS stack, and that I haven't seen any profile containing a GC under a JSScript execution.

* MOZ_GCTIMER=/dev/stdout does not report anything

* Increasing the GC threshold to 150 MB does not seems to improve the stand-alone deltablue benchmark. (by changing the pref in /system/b2g/defaults/pref/user.js)

The only thing which seems weird is in the attached profiled where the usecount of some function getting compiled in Ion are like ~19000 instead of 10240.  Note that this functions don't seems to be the hottest functions in the profile.

The hottest functions being: (incomplete profile)
18.5%  Plan.prototype.execute
12.0%  EqualityConstraint.prototype.execute
 9.4%  Planner.prototype.addPropagate
(Assignee)

Updated

6 years ago
Assignee: general → nicolas.b.pierron
Status: NEW → ASSIGNED
(Assignee)

Comment 1

6 years ago
I tried to monitor different things:

* Printing the usecount at the end of the compilation shows that b2g (both shell & browser) are not using parallel compilation.

* Adding a print inside the path which is calling into the Interrupt check does not print anything during the time in which benchmark is running.

* Compilation & Invalidation spew differs between the browser and the shell.  One function deltablue.js:526 (aka. EqualityConstraint.prototype.execute) is compiled and invalidated in the browser, while it is not even compiled at all in the shell  [This might be it]

My blind guess is that browser is somehow inlining functions but when ran under the browser it is still calling into the function instead of using the inlined version.  I still have to check this hypothesis.

* Browser:
[Scripts] Analyzing script ./deltablue.js:776 (0x44dcd780) (usecount=10242) (maxloopcount=85)
[Scripts] Inlining script ./deltablue.js:768 (0x44dcd670)
[Scripts] Inlining script ./deltablue.js:71 (0x44dcb560)
[Scripts] Inlining script ./deltablue.js:772 (0x44dcd6f8)
[Scripts] Inlining script ./deltablue.js:67 (0x44dcb4d8)
[Scripts] Inlining script ./deltablue.js:327 (0x44dcc450)
[Scripts] Inlining script ./deltablue.js:526 (0x44dcce68)
[Scripts] Inlining script ./deltablue.js:414 (0x44dcc808)
[Scripts] Inlining script ./deltablue.js:407 (0x44dcc780)
[Scripts] Inlining script ./deltablue.js:302 (0x44dcc2b8)
[Scripts] Ending compilation script ./deltablue.js:776 (usecount=10242) (maxloopcount=85) success == 1
[Scripts] Analyzing script ./deltablue.js:526 (0x44dcce68) (usecount=10240) (maxloopcount=0)
[Scripts] Inlining script ./deltablue.js:414 (0x44dcc808)
[Scripts] Inlining script ./deltablue.js:407 (0x44dcc780)
[Scripts] Ending compilation script ./deltablue.js:526 (usecount=10240) (maxloopcount=0) success == 1
[Scripts] Analyzing script ./deltablue.js:768 (0x44dcd670) (usecount=10240) (maxloopcount=0)
[Scripts] Inlining script ./deltablue.js:71 (0x44dcb560)
[Scripts] Ending compilation script ./deltablue.js:768 (usecount=10240) (maxloopcount=0) success == 1
[Invalidate] Start invalidation.
[Invalidate]  No IonScript invalidation.
[Invalidate] Start invalidation.
[Invalidate]  Invalidate ./deltablue.js:776, IonScript 0x45356c00

* Shell:
[Scripts] Analyzing script ./deltablue.js:776 (0x40f3c4d8) (usecount=10242) (maxloopcount=85)
[Scripts] Inlining script ./deltablue.js:768 (0x40f3c3c8)
[Scripts] Inlining script ./deltablue.js:71 (0x40f392b8)
[Scripts] Inlining script ./deltablue.js:772 (0x40f3c450)
[Scripts] Inlining script ./deltablue.js:67 (0x40f39230)
[Scripts] Inlining script ./deltablue.js:327 (0x40f3b1a8)
[Scripts] Inlining script ./deltablue.js:526 (0x40f3bbc0)
[Scripts] Inlining script ./deltablue.js:414 (0x40f3b560)
[Scripts] Inlining script ./deltablue.js:407 (0x40f3b4d8)
[Scripts] Inlining script ./deltablue.js:302 (0x40f3b010)
[Scripts] Ending compilation script ./deltablue.js:776 (usecount=10242) (maxloopcount=85) success == 1
[Invalidate] Start invalidation.
[Invalidate]  No IonScript invalidation.
[Invalidate] Start invalidation.
[Invalidate]  Invalidate ./deltablue.js:776, IonScript 0x4059a800


Note: Strangely, the debug version of the shell is 2 times slower than the debug version of the browser … *@!?  But I don't think I should care about this one, and hope I am would be fixing the right thing.
Summary: [deltablue] IonMonkey & B2G: shell 4 times faster than the b2g browser (on unagi) → [deltablue] IonMonkey & B2G: shell 2 times faster than the b2g browser (on unagi)
(Assignee)

Comment 2

6 years ago
Ok, it seems that the shell does not use parallel compilation by default and that the browser use it by default except that there is no helper thread for it.

The difference reported in comment 1 seems to be caused by the fact that JM does not expect the compilation to finish before resuming JM execution and skip the check of the result which prevent the entry into the Ion code, and thus the code keeps running into JM until it got invalidated.

One other issue which appear here is that while we are running in JM, we might spawn extra compilations of inner functions which might be inlined in the outer function.  Compiling an inlined function seems like a small waste of time, where we could have compiled a non-inlined inner function.
(Assignee)

Comment 3

6 years ago
Created attachment 730481 [details] [diff] [review]
JäegerMonkey: Check if there is an helper thread for Ion compilation.

So far just adding the change in mjit::Compiler::ionCompileHelper() proved to be a huge improvement on deltablue (~ x5.2) and on crypto (~ x2). [on a unagi with 256MB]

This patch also change different locations which are using js_IonOptions.parallelCompilation, I still have to evaluate these changes but this sounds like something that should be done to be consistent.  I will check the performances on a unagi with 256MB of RAM.

This improvement should be visible on all devices with only one core running with JM & IonMonkey enabled.
Attachment #730481 - Flags: review?(dvander)
(Assignee)

Comment 4

6 years ago
Comment on attachment 730481 [details] [diff] [review]
JäegerMonkey: Check if there is an helper thread for Ion compilation.

Review of attachment 730481 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/methodjit/StubCalls.cpp
@@ +771,1 @@
>  }

nit to my-self: Add #include "jsworkers.h" in the header of this file.
Comment on attachment 730481 [details] [diff] [review]
JäegerMonkey: Check if there is an helper thread for Ion compilation.

Good catch.
Attachment #730481 - Flags: review?(dvander) → review+
(Assignee)

Comment 6

6 years ago
Created attachment 732496 [details] [diff] [review]
Reproduce on desktop: Virtually limit the number of core available.

Apparently the previous patch does not seems to be enough anymore. I am able to reproduce the same performance issue on desktop by changing GetCPUCount to always return 1.
(Assignee)

Comment 7

6 years ago
Checking with attachment 732496 [details] [diff] [review] to reproduce this particular Bug on desktop before the patch and after the patch show that the patch is effective at fixing the issue which is related to parallel compilation, and that the new issue might be coming from somewhere else, with a strangely similar out-come.

1-cpu benchmark before attachment 730481 [details] [diff] [review]:
Richards: 12942
DeltaBlue: 4967
Crypto: 8580
RayTrace: 5709
EarleyBoyer: 11980
RegExp: 3000
Splay: 11743
NavierStokes: 22615
PdfJS: 9348
Mandreel: 5421
Gameboy: 13284
CodeLoad: 12714
Box2D: 9304
----
Score: 8958

1-cpu benchmark after attachment 730481 [details] [diff] [review]:
Richards: 12793
DeltaBlue: 12972
Crypto: 16409
RayTrace: 13380
EarleyBoyer: 15600
RegExp: 2959
Splay: 13438
NavierStokes: 23649
PdfJS: 9763
Mandreel: 11887
Gameboy: 10072
CodeLoad: 12862
Box2D: 9700
----
Score: 11711

Based on this conclusion, I will land attachment 730481 [details] [diff] [review] and mark this bug as [leave open] for more investigation.
Whiteboard: [leave open]
(Assignee)

Comment 8

6 years ago
Comment on attachment 730481 [details] [diff] [review]
JäegerMonkey: Check if there is an helper thread for Ion compilation.

https://hg.mozilla.org/integration/mozilla-inbound/rev/12ec336988e5
(Assignee)

Comment 10

6 years ago
Created attachment 733074 [details]
octane deltablue log -- IONFLAGS=scripts,aborts,osi,bailouts b2g.sh

(In reply to Nicolas B. Pierron [:nbp] from comment #6)
> Apparently the previous patch does not seems to be enough anymore.

Now, that the parallel compilation is fixed, I hit GC issues, where I got 3 invalidating GC in deltablue.
(Assignee)

Comment 11

6 years ago
I will mark this bug as closed and open a new Bug related to the GC issue once I got time to investigate more on it.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Summary: [deltablue] IonMonkey & B2G: shell 2 times faster than the b2g browser (on unagi) → IonMonkey & B2G: Ion code is compiled but not run, cause 5 times slow-down on deltablue.
Whiteboard: [leave open]
(Assignee)

Updated

6 years ago
Blocks: 863398
You need to log in before you can comment on or make changes to this bug.