[Sharedstubs] Re-use baseline caches in ionmonkey.

RESOLVED FIXED

Status

RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: h4writer, Assigned: h4writer)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

4 years ago
IonMonkey is built around generating the fastest code possible. To do this, we take type information (TI) and baseline cache information and try to emit the best possible MIR/LIR/assembly for that case. It is impossible to write the best code for every combination of "MIR * TI * baseline cache information" (and even if it were possible, it wouldn't be cost-effective). That's why, for combinations that don't occur often, we take the easiest but slowest solution: just do a vmcall.

The most important/popular combinations currently have their fastpaths, and as a result IonMonkey is quite fast. The issue arises when something pollutes the combination slightly and we no longer have a fastpath: we hit a performance cliff. This is still a major problem in IonMonkey.

E.g.
1) a (int) + b (int) => fastpath
2) a (string) + b (string) => fastpath
3) a (int, string) + b (int, string) => slow (3.8x slower than (1) and 1.5x slower than (2))
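
To make the three cases concrete, here is a minimal sketch of what they could look like in script code. The function names are made up for illustration and the snippet makes no timing claims; it only shows which operand types reach each `+` site.

```javascript
// Hypothetical illustration; these function names are not from the bug.

// (1) This `+` site only ever sees int operands.
function addInts(xs) {
  let total = 0;
  for (const x of xs) total = total + x;
  return total;
}

// (2) This `+` site only ever sees string operands.
function addStrings(xs) {
  let out = "";
  for (const x of xs) out = out + x;
  return out;
}

// (3) The same `+` site sees ints and strings on both sides,
// so neither specialized path covers every call.
function addMixed(pairs) {
  const results = [];
  for (const [a, b] of pairs) results.push(a + b);
  return results;
}

console.log(addInts([1, 2, 3]));
console.log(addStrings(["a", "b"]));
console.log(addMixed([[1, 2], ["a", "b"], [1, "b"]]));
```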

For the most important ones (GETPROP etc.), we already created caches to dynamically add checks in IonMonkey without having to specialize during compilation. But for most opcodes we don't have that, even though baseline already has ICs for all opcodes.

So the idea is to generalize the baseline caches so that their code can be reused in IonMonkey to create caches for all opcodes. (We might not enable them all: sometimes a vmcall could be faster. We need numbers for that.) But in the general case, having a fallback IC instead of a vmcall should be faster.
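
As a rough illustration of the "fallback IC" idea, the toy model below keeps a chain of guarded stubs for an add operation and attaches a new stub when the generic path is reached. This is only a sketch under stated assumptions: `AddIC`, `stubFor`, `maxStubs` and `fallbackHits` are hypothetical names, and the real stubs are generated machine code rather than closures.

```javascript
// Toy model of an inline cache for `a + b`. Every name here is hypothetical;
// none of this is SpiderMonkey code or API.
class AddIC {
  constructor(maxStubs = 4) {
    this.stubs = [];        // attached stubs, tried in order
    this.maxStubs = maxStubs;
    this.fallbackHits = 0;  // number of times the generic path was reached
  }

  call(a, b) {
    // Run the first stub whose guard accepts the operands.
    for (const stub of this.stubs) {
      if (stub.guard(a, b)) return stub.run(a, b);
    }
    return this.fallback(a, b);
  }

  fallback(a, b) {
    this.fallbackHits++;
    // Attach a specialized stub for the operand types just seen, if any.
    const stub = AddIC.stubFor(a, b);
    if (stub !== null && this.stubs.length < this.maxStubs) {
      this.stubs.push(stub);
    }
    // Generic semantics, standing in for the slow path.
    return a + b;
  }

  static stubFor(a, b) {
    // Only same-typed number or string operands get a stub in this sketch.
    for (const type of ["number", "string"]) {
      if (typeof a === type && typeof b === type) {
        return {
          guard: (x, y) => typeof x === type && typeof y === type,
          run: (x, y) => x + y,
        };
      }
    }
    return null;
  }
}

const ic = new AddIC();
ic.call(1, 2);      // generic path, attaches a number stub
ic.call(3, 4);      // handled by the number stub
ic.call("a", "b");  // generic path, attaches a string stub
ic.call("c", "d");  // handled by the string stub
console.log(ic.fallbackHits, ic.stubs.length);
```

In this model a site that sees both ints and strings pays for the generic path once per type combination and then stays on guarded stubs, instead of taking the slow path on every execution.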
(Assignee)

Comment 1

4 years ago
Posted patch: WIP
An outdated WIP that shows how the final system should look.
Assignee: nobody → hv1989
(Assignee)

Updated

4 years ago
Depends on: 1168750
(Assignee)

Updated

4 years ago
Depends on: 1168753, 1168756, 1168757
(Assignee)

Updated

4 years ago
Depends on: 1169213, 1169214
(Assignee)

Updated

4 years ago
Depends on: 1171945
(Assignee)

Updated

4 years ago
Depends on: 1175976
(Assignee)

Updated

4 years ago
Depends on: 1176288
(Assignee)

Comment 2

4 years ago
Status update:
- behind the pref: --ion-shared-stubs=on
- only works on x86/x64; ARM code needs to be added before we try to enable this in Nightly
(Assignee)

Updated

4 years ago
Summary: Re-use baseline caches in ionmonkey. → [Sharedstubs] Re-use baseline caches in ionmonkey.
(Assignee)

Updated

4 years ago
Depends on: 1197604
(Assignee)

Updated

4 years ago
Depends on: 1200560
(Assignee)

Updated

4 years ago
Depends on: 1201810
(Assignee)

Updated

4 years ago
Depends on: 1206051
(Assignee)

Updated

4 years ago
Depends on: 1206066
(Assignee)

Updated

4 years ago
Depends on: 1214508
(Assignee)

Updated

3 years ago
Depends on: 1233343

Comment 3

3 years ago
Please add the "PERF" keyword.
(Assignee)

Updated

3 years ago
Depends on: 1263609, 1241088
(Assignee)

Comment 4

3 years ago
We currently support jsop_binary_arith, jsop_getprop, jsop_pow, jsop_newarray, jsop_newobject, jsop_compare and jsop_bitwise_arith.

In those cases, if we fall back to the slowest path (a VM call), we can now use a shared stub instead and be approx. 2x faster.

VM calls:
h4writer@h4writer-ThinkPad-W530:~/Build/octane2.0$ JIT_OPTION_forceInlineCaches=true JIT_OPTION_disableSharedStubs=true js run.js
Richards: 2824
DeltaBlue: 3930
Crypto: 517
RayTrace: 6988
EarleyBoyer: 8305
RegExp: 1069
Splay: 1895
SplayLatency: 7526
NavierStokes: 544
PdfJS: 3905
Mandreel: 822
MandreelLatency: 3626
Gameboy: 6102
CodeLoad: 12366
Box2D: 3293
zlib: 65614
Typescript: 12534
----
Score (version 9): 3780

Shared Stubs:
h4writer@h4writer-ThinkPad-W530:~/Build/octane2.0$ JIT_OPTION_forceInlineCaches=true js run.js
Richards: 5023
DeltaBlue: 5931
Crypto: 1263
RayTrace: 11470
EarleyBoyer: 10140
RegExp: 1407
Splay: 2548
SplayLatency: 9810
NavierStokes: 1099
PdfJS: 5651
Mandreel: 1628
MandreelLatency: 6719
Gameboy: 10289
CodeLoad: 11938
Box2D: 5876
zlib: 67231
Typescript: 14264
----
Score (version 9): 5708
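
For reference, the ratios between the two runs above can be worked out with a few lines of script. The sketch below copies a subset of the scores from the two listings (Octane scores are higher-is-better); the helper name `speedup` is made up for illustration.

```javascript
// Scores copied from the two runs above (subset of benchmarks plus the total).
const vmCalls = {
  Richards: 2824, Crypto: 517, NavierStokes: 544,
  CodeLoad: 12366, zlib: 65614, Score: 3780,
};
const sharedStubs = {
  Richards: 5023, Crypto: 1263, NavierStokes: 1099,
  CodeLoad: 11938, zlib: 67231, Score: 5708,
};

// Hypothetical helper: ratio of the new score to the old one.
function speedup(before, after) {
  return after / before;
}

const ratios = {};
for (const name of Object.keys(vmCalls)) {
  ratios[name] = speedup(vmCalls[name], sharedStubs[name]).toFixed(2);
}
console.log(ratios);
// Richards 1.78, Crypto 2.44, NavierStokes 2.02,
// CodeLoad 0.97, zlib 1.02, Score 1.51
```

So with inline caches forced on, the total score improves by about 1.51x, while individual benchmarks range from roughly flat (CodeLoad, zlib) to more than 2x (Crypto, NavierStokes).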
(Assignee)

Comment 5

3 years ago
Baseline compiler:
h4writer@h4writer-ThinkPad-W530:~/Build/octane2.0$ js --no-ion run.js
Richards: 1297
DeltaBlue: 1092
Crypto: 2164
RayTrace: 535
EarleyBoyer: 2160
RegExp: 595
Splay: 2983
SplayLatency: 8430
NavierStokes: 3014
PdfJS: 6142
Mandreel: 1466
MandreelLatency: 5129
Gameboy: 8785
CodeLoad: 12607
Box2D: 3683
zlib: 63199
Typescript: 6091
----
Score (version 9): 3413
(Assignee)

Updated

3 years ago
Depends on: 1266092
(Assignee)

Comment 6

3 years ago
I'll call this finished. This works nicely, and further effort should go into CacheIR, which is the better long-term solution.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED