186.37 KB, patch
|Details | Diff | Splinter Review|
32.13 KB, patch
|Details | Diff | Splinter Review|
10.95 KB, patch
|Details | Diff | Splinter Review|
Two things I forgot to mention: - 4x on SunSpider; v8-v6 goes from ~200 to ~2000 (10x). Haven't run Kraken on it recently. - The GPR<->FPR moves and zeroDouble make assumptions about the linkage area that there is a double-size block of space nominally reserved for the compiler at r1(16), so those should be marked ABI dependent. I'll do that with any requested changes.
Comment on attachment 601168 [details] [diff] [review] Part 1 (new files): Macro assembler, assembler, OS X trampoline Review of attachment 601168 [details] [diff] [review]: ----------------------------------------------------------------- Wow, great work! The speedups are right inline with what we got for interpreter vs. JM+TI on x86--maybe even a bit better on v8--I don't recall the exact numbers.
(In reply to Cameron Kaiser from comment #0) > I'm not sure if you want the changes to JS in the tree because it is for OS > X/ppc, so this first patch does NOT include: > - Changes to methodjit for our veneer, for MonoIC calls (to add a faux > argument space) and for repatching. > - Changes to the Makefile. Those changes don't sound too extensive, so I'd like to take them. > - Changes to YarrJIT. We have both the regular changes and Ben also cooked > up a 8% win by having YARR use an extra register to track position. This > should also benefit register-rich big RISC systems like SPARC and MIPS, and > might also be useful for ARM. I can file a separate bug for this last part. I try to keep our version of Yarr as close as possible to upstream [*] in order to make it easy to backport their bugfixes to our tree. If you'd like to get the Yarr stuff in our tree, could you file a bug with WebKit first, and try to get them to take it? [*] We don't refresh Yarr from upstream very frequently, so it's more accurate to say "close as possible to *some* upstream rev".
Okay, I'll keep the position register code out and file it upstream. We do have some otherwise vanilla changes to YarrJIT just to get PPC regex JIT to work, though; I'll put that in a separate patch just in case. I'm merging our changes against mozilla-central and will post Part 2 presently (thanks for the quick r+ on part 1). I'll also post a "for checkin" Part 1 with the tweak to correctly flag the GPR<->FPR moves as ABI-dependent.
I'm going to retract part 1 because I have a terrible performance regression between our ESR build and mozilla-central. I did a partial bisect and have this narrowed down to 10->11. The numbers I'm getting on v8-v6 look like this: # 11 interpreter only baseline ../../../obj-ff-dbg/dist/TenFourFox.app/Contents/MacOS/js -f run.js Richards: 127 DeltaBlue: 124 Crypto: 118 RayTrace: 196 EarleyBoyer: 254 RegExp: 247 Splay: 614 ---- Score (version 6): 203 # score in 10.0.3pre /Applications/TenFourFoxG5.app/Contents/MacOS/js -m -n -f run.js Richards: 3096 DeltaBlue: 2521 Crypto: 3114 RayTrace: 1222 EarleyBoyer: 2924 RegExp: 505 Splay: 3037 ---- Score (version 6): 2012 # score in 11 opt G5 build test ../../../obj-ff-dbg/dist/TenFourFox.app/Contents/MacOS/js -m -n -f run.js Richards: 820 DeltaBlue: 1082 Crypto: 125 RayTrace: 1140 EarleyBoyer: 1492 RegExp: 361 Splay: 1661 ---- Score (version 6): 732 With the exception of RayTrace, virtually all the benchmarks suffer badly. Crypto doesn't show any improvement at all. I'm going to bisect further, but do these sound like a particular change or set of changes that may have affected runtime?
(Just thinking out loud: could the write barriers for incremental GC cause changes like this?)
Kindly disregard the spam, it turned out that a build system change between 10 and 11 made our custom settings in the mozconfig not stick (G5 does very badly if not tuned properly). However, I will have a revised part 1 anyway.
Created attachment 602548 [details] [diff] [review] Part 1 (new files): Macroassembler, assembler, trampoline (v2) Revised part 1. The only difference is that I added a standard define for a free zone in the linkage area and made it ABI dependent, and fixed up lea(BaseIndex, RegisterID). I don't know if you need to re-review it.
Created attachment 602549 [details] [diff] [review] Part 2: JS build system changes (except YarrJIT) This patch contains all the needed patches to methodjit/, assembler/jit/, and configure.in. YarrJIT will be under separate cover.
Comment on attachment 602549 [details] [diff] [review] Part 2: JS build system changes (except YarrJIT) Review of attachment 602549 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/methodjit/MachineRegs.h @@ +372,5 @@ > #elif defined(JS_CPU_MIPS) > return 4; > +#elif defined(JS_CPU_PPC_OSX) > + // Yup, we're awesome. > + return 8; :-)
Created attachment 602556 [details] [diff] [review] Part 3: YarrJIT and Makefile.in This patch I'm not 100% sure on -- please double check my changes to Makefile.in, since I'm not sure I wrote it in the most efficient manner (our Makefile obviously is hardcoded to our own configurations). The changes to YarrJIT, besides the prologue and epilogue, are to keep stack pointer-relative addressing out of the linkage area (as written, it stomps all over it, wrecking the PPC frame). I'll get the position tracking register changes accepted upstream with these changes also, but this will get it working here.
(and thanks very much for the fast r+ on 1/2 :)
Created attachment 602571 [details] [diff] [review] Part 3: YarrJIT.cpp and Makefile.in (v2) This looks better.
Created attachment 602616 [details] [diff] [review] Part 3: YarrJIT.cpp, Makefile.in, MacroAssembler.h (v3) It would be even better with the MacroAssembler.h changes.
Comment on attachment 602616 [details] [diff] [review] Part 3: YarrJIT.cpp, Makefile.in, MacroAssembler.h (v3) Review of attachment 602616 [details] [diff] [review]: ----------------------------------------------------------------- The Makefile change looks OK. I like that the changes to Yarr are non-instrusive, so I'm not worried about taking them into our tree even if they don't make it upstream. But can you create some constants for the magic numbers just for style points? You also might want to add some comments to explain a bit more about what problems you're solving with the PPC-specific code, but I guess you're probably the major maintainer of that code, so it's up to you.
I can do that. I'll work the magic constants into PPCAssembler.h, since that is probably the logical place.
I'm also going to have a revised part 1 that streamlines G3/G4 square root and improves V8 performance by 2% on G5/big POWER by rescheduling the Veneer. I'll rebase everything against the tree after bug 691898.
We're significantly revising our code to prepare for IonMonkey, particularly with regards to strict ABI compliance so that the stack is in a sane state at all times. I'm still having some trouble avoiding tricks to get typed arrays to work, so this is not ready yet. In any case, the current set of patches are now all obsolete.
(In reply to Cameron Kaiser from comment #18) > We're significantly revising our code to prepare for IonMonkey, particularly > with regards to strict ABI compliance so that the stack is in a sane state > at all times. I'm still having some trouble avoiding tricks to get typed > arrays to work, so this is not ready yet. In any case, the current set of > patches are now all obsolete. Cool! As you probably have already found out, IonMonkey uses a new MacroAssembler interface. It is possible to build this on top of either the existing MacroAssembler, or by writing an entirely new one on top of PPCAssembler (this is what we did for x86/x64). The biggest changes are around branching (our "Label" is the inverse of the old "Jump") and type safety of Register IDs. Our long-term plan is to replace JM with a new baseline compiler (bug 805241), and that is using the same MacroAssembler interface as IonMonkey. With any luck that'll mostly work for you without significant changes.
> It is possible to build this on top of either the existing MacroAssembler, That would certainly be the easiest way to bootstrap it, but how? Is the ARM version built that way? We had the advantage of being able to crib off the SPARC JaegerMonkey when writing that, but I think we're the trailblazers here.
Updating the description to reflect that this bug has morphed from being about JM+TI about IM+TI.