IonMonkey is mainly about generating the fastest code possible. To do this we take type information and Baseline cache information and try to emit the best possible MIR/LIR/assembly for that case. It is impossible to write the best code for every combination of "MIR * TI * Baseline cache information" (and even if it were possible, it wouldn't be cost-effective). That's why, for combinations that don't occur often, we do the easiest but slowest thing: just do a VM call. The most important/popular combinations currently have their fast paths, and as a result IonMonkey is quite fast. The issue is when something pollutes the combination a little bit, so we no longer have a fast path. That is a performance cliff, and it is still a major problem in IonMonkey. E.g.:

1) a (int) + b (int) => fast path
2) a (string) + b (string) => fast path
3) a (int, string) + b (int, string) => slow (3.8x slower than (1) and 1.5x slower than (2))

For the most important ops (GETPROP etc.) we already created caches, to dynamically add checks in IonMonkey without having to specialize during compilation. But for most opcodes we don't have that, even though Baseline already has ICs for all opcodes. So the idea is to generalize the Baseline caches, so that code can get reused in IonMonkey to create caches for all opcodes too. (It's possible we won't enable all of them; sometimes a VM call could be faster, and we need numbers for that.) But in the general case, having a fallback IC instead of a VM call should be faster. A small script illustrating the cliff in (3) is sketched below.
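To make the cliff concrete, here is a minimal illustrative shell script (my own sketch, not an attachment from this bug; the helper name makeLoop, the array size and the iteration counts are arbitrary) that times the same '+' with int-only, string-only and mixed operands:

// Each case gets its own copy of the loop via the Function constructor,
// so its '+' site only ever sees one combination of operand types.
var sink;   // global sink so the add result is observably used (not dead code)

function makeLoop() {
  return Function("xs", "ys",
    "for (var i = 0; i < xs.length; i++) sink = xs[i] + ys[i]; return sink;");
}

var N = 1 << 20;
var ints = [], strs = [], mixed = [];
for (var i = 0; i < N; i++) {
  ints.push(i);
  strs.push("" + i);
  mixed.push((i & 1) ? i : "" + i);  // both int and string flow into the same '+' site
}

var cases = [["int + int", ints], ["string + string", strs], ["mixed + mixed", mixed]];
for (var j = 0; j < cases.length; j++) {
  var loop = makeLoop();
  loop(cases[j][1], cases[j][1]);            // warm up so the loop gets Ion-compiled
  var t0 = Date.now();
  for (var k = 0; k < 20; k++)
    loop(cases[j][1], cases[j][1]);
  print(cases[j][0] + ": " + (Date.now() - t0) + " ms");
}

The absolute timings will of course vary by machine and build; the point is only that the mixed case loses its fast path while the first two keep theirs.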
An outdated WIP that shows how the final system should look.
Assignee: nobody → hv1989
Status update:
- behind the pref: --ion-shared-stubs=on
- only works on x86/x64; code for ARM needs to get added before trying to enable this in Nightly
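For completeness, an example invocation with the pref named above (this is the spelling given in this comment; the JIT_OPTION environment-variable style used in the measurements below is a separate mechanism):

h4writer@h4writer-ThinkPad-W530:~/Build/octane2.0$ js --ion-shared-stubs=on run.js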
Summary: Re-use baseline caches in ionmonkey. → [Sharedstubs] Re-use baseline caches in ionmonkey.
Please add "PERF" key word.
We currently support jsop_binary_arith, jsop_getprop, jsop_pow, jsop_newarray, jsop_newobject, jsop_compare and jsop_bitwise_arith. In those cases, when we would otherwise fall back to the slowest path (a VM call), we can now use a shared stub and be approximately 2x faster.

VM calls:
h4writer@h4writer-ThinkPad-W530:~/Build/octane2.0$ JIT_OPTION_forceInlineCaches=true JIT_OPTION_disableSharedStubs=true js run.js
Richards: 2824
DeltaBlue: 3930
Crypto: 517
RayTrace: 6988
EarleyBoyer: 8305
RegExp: 1069
Splay: 1895
SplayLatency: 7526
NavierStokes: 544
PdfJS: 3905
Mandreel: 822
MandreelLatency: 3626
Gameboy: 6102
CodeLoad: 12366
Box2D: 3293
zlib: 65614
Typescript: 12534
----
Score (version 9): 3780

Shared stubs:
h4writer@h4writer-ThinkPad-W530:~/Build/octane2.0$ JIT_OPTION_forceInlineCaches=true js run.js
Richards: 5023
DeltaBlue: 5931
Crypto: 1263
RayTrace: 11470
EarleyBoyer: 10140
RegExp: 1407
Splay: 2548
SplayLatency: 9810
NavierStokes: 1099
PdfJS: 5651
Mandreel: 1628
MandreelLatency: 6719
Gameboy: 10289
CodeLoad: 11938
Box2D: 5876
zlib: 67231
Typescript: 14264
----
Score (version 9): 5708
Baseline compiler:
h4writer@h4writer-ThinkPad-W530:~/Build/octane2.0$ js --no-ion run.js
Richards: 1297
DeltaBlue: 1092
Crypto: 2164
RayTrace: 535
EarleyBoyer: 2160
RegExp: 595
Splay: 2983
SplayLatency: 8430
NavierStokes: 3014
PdfJS: 6142
Mandreel: 1466
MandreelLatency: 5129
Gameboy: 8785
CodeLoad: 12607
Box2D: 3683
zlib: 63199
Typescript: 6091
----
Score (version 9): 3413
I'll call this finished. This works nicely, and further effort should go into CacheIR, which is better for the long term.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED