ppc64le JIT
Categories
(Core :: JavaScript Engine: JIT, enhancement, P5)
Tracking
()
Tracking | Status | |
---|---|---|
firefox120 | --- | affected |
People
(Reporter: spectre, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(3 files)
37.88 KB,
patch
|
rhunt
:
feedback+
|
Details | Diff | Splinter Review |
102.79 KB,
patch
|
jandem
:
feedback+
|
Details | Diff | Splinter Review |
11.57 KB,
patch
|
jandem
:
feedback+
|
Details | Diff | Splinter Review |
As requested in https://discourse.mozilla.org/t/spidermonkey-jit-for-power-isa-ppc64le/122930/3 .
This is the first draft of a JIT to support little-endian 64-bit Power ISA in Firefox. It is currently compatible with POWER9 OpenPOWER systems and passes the full test suite on Linux. The two pieces left are support for POWER8 and enabling the optimizing Wasm JIT (it uses Baseline right now), but it’s good enough to ship and testers are already evaluating it, and we can get those other pieces completed assuming it sticks. This is a descendant of the TenFourFox JIT for 32-bit PowerPC, upgraded to 64-bit and using the true little-endian mode supported on POWER8 and later.
Architecturally, ppc64le’s main differences from x64 are that it uses a link register, it has up to eight independent condition codes plus other condition codes in special purpose registers, it does not have strict store ordering, and it is a fixed-length instruction word (even Power10, which has “double” instructions, still aligns to a 32-bit word). Like MIPS, there is a limited displacement for branches and offsets.
ppc64le’s main differences from ARM64 is that it has up to eight independent condition codes plus other condition codes in special purpose registers, and that the link register is not a GPR (it’s a special purpose register), which requires it to be transferred and pushed separately.
Additionally, the standard ABI for Power ISA requires a linkage area of no less than 32 bytes on the top of the stack for the callee to store the link register, condition register, stack pointer and TOC register. This is dealt with in the port by pulling down dummy frames as necessary at ABI-Ion and ABI-Wasm edges (otherwise things like the Wasm Frame would get walloped, since it sits on top of the stack).
As requested, I'm posting the diffs against js/ for feedback and to determine if there is interest in accepting the port into the tree. If it is, I will pull up all changes to tip and formally request review.
Reporter | ||
Comment 1•11 months ago
|
||
Changes against wasm/. These are mostly the same as other platforms. There are a couple specific tweaks for the ABI.
Reporter | ||
Comment 2•11 months ago
|
||
Again, largely standard and the same as other ports, except for a couple ABI and architecture-specific tweaks.
Reporter | ||
Comment 3•11 months ago
|
||
These are the miscellaneous patches. The only vaguely notable one is that irregexp/ isn't really ABI compliant and needs a couple hacks due to its assumptions about the link register.
Updated•11 months ago
|
for the record
- I am using the JIT-enabled Firefox 115 ESR daily on my workstation for over a month now and no issues
- we (Fedora / Red Hat) are running a CI with daily rebuilds for ppc64le (and s390x, aarch64, ...) so any build issues are detected early
Updated•11 months ago
|
Comment 5•11 months ago
|
||
Additionally, the standard ABI for Power ISA requires a linkage area of no less than 32 bytes on the top of the stack for the callee to store the link register, condition register, stack pointer and TOC register. This is dealt with in the port by pulling down dummy frames as necessary at ABI-Ion and ABI-Wasm edges (otherwise things like the Wasm Frame would get walloped, since it sits on top of the stack).
Could you elaborate on this a bit? Where are you inserting this dummy frame? When calling into wasm or exiting from wasm to native/JIT code? I looked through your wasm/
diff and didn't see anything here (but may have missed something obvious).
Reporter | ||
Comment 6•11 months ago
|
||
(In reply to Ryan Hunt [:rhunt] from comment #5)
Could you elaborate on this a bit? Where are you inserting this dummy frame? When calling into wasm or exiting from wasm to native/JIT code? I looked through your
wasm/
diff and didn't see anything here (but may have missed something obvious).
Most of that I've subsumed into the Power architecture-specific MacroAssembler
and CodeGenerator
. Calls to wasm::SymbolicAddress
in particular need that stack adjustment but I didn't need to make any change within js/src/wasm
to implement it. ResumeFromException
in JitFrames.h
does need a dummy linkage area baked into the struct since it's built on stack, but that's in js/src/jit
.
Comment 7•11 months ago
|
||
Okay, if the ABI changes can be limited to calls in/out of wasm in such a manner that the internal wasm abi doesn't need to change then this looks good to me.
Updated•11 months ago
|
Reporter | ||
Comment 8•11 months ago
|
||
Much obliged, thank you.
Updated•11 months ago
|
Comment 9•11 months ago
|
||
Reporter | ||
Comment 10•11 months ago
|
||
Thanks for the f+ on the other one.
(In reply to Jan de Mooij [:jandem] from comment #9)
I'm not sure I understand this part. Isn't it possible to define float32 vs
double registers like what we do on x86 for example?
Do you mean determine from the ISA side if there's a float in the register, or do something like what x86's PushRegsInMask
does by querying each FloatRegister
's size before store*()
-ing it? If you mean the latter, I suppose that could work, though we would run the risk of an unaligned stack on the other end.
As a counterargument, I note that RISC-V does basically the same as I'm doing, although for the record I don't know why it doesn't need the same change: without this only the lower half got properly loaded, which made the FPR gibberish.
Comment 11•11 months ago
|
||
(In reply to Cameron Kaiser [:spectre] from comment #10)
Do you mean determine from the ISA side if there's a float in the register, or do something like what x86's
PushRegsInMask
does by querying eachFloatRegister
's size beforestore*()
-ing it? If you mean the latter, I suppose that could work, though we would run the risk of an unaligned stack on the other end.As a counterargument, I note that RISC-V does basically the same as I'm doing, although for the record I don't know why it doesn't need the same change: without this only the lower half got properly loaded, which made the FPR gibberish.
Trying to understand how this works: the code we have in tree loads a (4-byte) float if we have a float32 register or an (8-byte) double if it's a double register. If we change this to always load a double on PPC64, where does the conversion from float32 (what's in the register at some point) to double happen?
I'm also asking because this code doesn't have great test coverage from what I remember. It's probably worth checking you hit this path on jit-tests.
Reporter | ||
Comment 12•11 months ago
|
||
Power ISA floating point registers are always doubles internally, so merely loading a 32-bit single precision value will autopromote it to double. You can always round to single precision either with an explicit rounding instruction or the set of single precision math instructions, but the register is still a double (just within single precision range). Actual conversion to a 32-bit single occurs when written to memory with specific instructions.
It does hit this path for us, because I had to make this change to get the test suite to pass (ISTR it was one of the Wasm tests).
Comment 13•11 months ago
|
||
Comment on attachment 9359678 [details] [diff] [review]
Diff against js/src/jit
Ah I see. This makes sense then :)
Reporter | ||
Comment 14•11 months ago
|
||
Many thanks. I will pull these up to tip. Do you (both) want to review those first before the new files?
Comment 15•10 months ago
|
||
(In reply to Cameron Kaiser [:spectre] from comment #14)
Many thanks. I will pull these up to tip. Do you (both) want to review those first before the new files?
It would be best to post everything as a single patch stack, with different patches for the new files and the parts you have here.
Comment 16•4 months ago
|
||
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Description
•