Ion for wasm on ARM64 - base functionality (was: Plan B)
Categories
(Core :: JavaScript: WebAssembly, enhancement, P1)
Tracking
()
People
(Reporter: lth, Assigned: jseward)
References
Details
Attachments
(6 files, 7 obsolete files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
150.00 KB,
application/x-tar
|
Details |
Plan B, in case we need it (to be updated).
Reporter | ||
Comment 1•4 years ago
|
||
Depends on D97477
Reporter | ||
Comment 2•4 years ago
|
||
Some experiments, need test cases + some documentation:
- generate much better code for constant-address accesses
- generate better code when storing zero
Depends on D97478
Updated•4 years ago
|
Updated•4 years ago
|
Reporter | ||
Comment 3•4 years ago
|
||
Depends on D97478
Updated•4 years ago
|
Reporter | ||
Comment 4•4 years ago
|
||
Depends on D100116
Reporter | ||
Comment 5•4 years ago
|
||
Move some SIMD code to jit/shared; copy other SIMD code to jit/arm64
as stubs. SIMD not enabled for ARM64 yet, so all tests continue to
pass with this patch.
Depends on D100654
Reporter | ||
Comment 6•4 years ago
|
||
Depends on D100655
Reporter | ||
Comment 7•4 years ago
|
||
Depends on D100656
Reporter | ||
Comment 8•4 years ago
|
||
Depends on D100657
Reporter | ||
Comment 9•4 years ago
|
||
Depends on D100658
Updated•4 years ago
|
Comment 10•4 years ago
|
||
Comment on attachment 9195178 [details]
Bug 1678097 - Make SIMD code cross-platform
Revision D100655 was moved to bug 1685984. Setting attachment 9195178 [details] to obsolete.
Reporter | ||
Comment 11•4 years ago
|
||
Every operation that updates the SP must be sure to sync the SP. Various
addToStackPtr APIs did not do this.
I'm hoping to get rid of the PSP for wasm code, but until then, this
is the best we have.
Updated•4 years ago
|
Updated•4 years ago
|
Reporter | ||
Comment 12•4 years ago
|
||
Depends on D97478
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Comment 13•4 years ago
|
||
Comment on attachment 9195179 [details]
Bug 1678097 - Enable simd for arm64 in moz.configure
Revision D100656 was moved to bug 1687629. Setting attachment 9195179 [details] to obsolete.
Comment 14•4 years ago
|
||
Comment on attachment 9195182 [details]
Bug 1678097 - ARM64 SIMD WIP
Revision D100659 was moved to bug 1687629. Setting attachment 9195182 [details] to obsolete.
Comment 15•4 years ago
|
||
Comment on attachment 9191525 [details]
Bug 1678097 - Optimize ARM64 memory accesses (sketches, not actually correct)
Revision D98857 was moved to bug 1687630. Setting attachment 9191525 [details] to obsolete.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Reporter | ||
Comment 16•4 years ago
|
||
In Phase 0, both cranelift and ion are available on arm64, and ion
is enabled by default; use --wasm-force-cranelift at the shell to
select cranelift, or set javascript.options.wasm_force_ion to false
in about:config. Phase 0 is appropriate for developers, before
the patch set lands in mozilla-central.
In Phase 1, both compilers are still available but ion is DISABLED by
default; use --wasm-force-ion at the shell to select ion, or make sure
javascript.options.wasm_force_ion is true in about:config. Phase 1 is
appropriate for fuzzing, after the patch set lands in mozilla-central
but before ion is enabled by default. The patch for Phase 1 will
appear on bug 1678097 and will be very small, and MUST land with the
patch for Phase 0.
in Phase 2, cranelift becomes disabled in moz.configure and all the
changes in the present patch are removed again. The patch for Phase 2
will appear on bug 1686626 and will revert Phase 0 + Phase 1, and
additionally update moz.configure.
Updated•4 years ago
|
Updated•4 years ago
|
Reporter | ||
Comment 17•4 years ago
|
||
Julian will take over and make sure this gets cleaned up, reviewed, and ready for landing.
Reporter | ||
Comment 18•4 years ago
•
|
||
Some notes about what I belive remains to be done here, in patch queue order:
Enablement level 0: This is probably clean, but note it disables arm64 SIMD. It needs to be reviewed, but should be reviewed along with the Level 1 patch.
SP/PSP: This needs major work, but possibly not much code. The patch has two aspects at the moment: misc bug fixes for stack related things for arm64, and then misc fixes for dealing with the SP/PSP thing. There's a big comment block in this patch that is the initial attempt at describing global invariants for SP/PSP, but it's not done yet, and there are things I know I don't understand. The SP/PSP affects both JS (baseline+ion) and wasm (ion), as well as irregexp, though not Cranelift or Wasm baseline. (Wasm baseline must be aware of the PSP: It must set up the PSP when making a call, since the callee may be Ion code that depends on the PSP being set.) The PSP code appears brittle. There are syncStackPtr calls where they don't appear to make sense to me, and no such calls where I thought they were needed, so probably I didn't understand the invariants. The first work item here is therefore to understand system-wide invariants and document them and make sure everyone agrees about what they are. The second is then to clean up the patch, get it reviewed.
ARM64 code generation: This is "done" but nbp left many comments (which I have not yet read) and requested re-review, so all that needs to be done.
Add vector registers: This is almost done but (a) I don't quite trust it, because if I enable simd on arm64 and run jit-tests then this patch makes non-simd tests fail, and (b) Yury reported some SIMD-related errors that might be related to the register sets introduced here or might be bailout logic.
There is also more work to be done in the vector register patch: the baseline compiler has a hack to use SIMD without SIMD in the register sets, for reasons described there (search for RABALDR_SIDEALLOC_V128). That hack should be disabled, and normal SIMD register sets should be used. But in addition there's code in arm64/*.cpp (PushRegsInMaskForWasmStubs, PopRegsInMaskForWasmStubs, GetPushSizeInBytesForWasmStubs) related to that hack, and those should be removed, and the regular PushRegsInMask/PopRegsInMask/GetPushSize code should be used instead. This may take some debugging. It's possible all this work should be a separate patch, for ease of review.
Enablement level 1: This adds some code so that ion and cranelift are both present and cranelift is the default while ion can be selected by a command line switch in the shell, this is for fuzzing. This patch must land when the level 0 patch lands. I don't know if this patch is quite baked yet, time will tell.
Reporter | ||
Comment 19•4 years ago
•
|
||
And two final notes:
It would have been interesting to try to avoid the PSP altogether for Ion wasm code. It's not clear why it's needed. There's one MIR/LIR node used by Wasm (visitWasmBuiltinModD) that performs an explicit push; any other push/pop logic is hidden in the masm. Also, getting rid of the PSP probably means making the SP a "normal" register in order for the MoveResolver play along, which means getting rid of JS_HIDDEN_SP, see next. It would for sure be interesting to look into this, but it looks like it could be a lot of work and I would prefer for us just to understand how the PSP is supposed to work and make the code work with it.
Since the ARM64 SP is weird and the SP has a code (63 in vixl) that does not fit in the register set, there's a hack in the JIT, JS_HIDDEN_SP, that is enabled on ARM64, that treats the SP differently and introduces some constraints (that are checked by the type system). I have been toying with removing this restriction, now that we have larger register sets and the SP could have code 32, say, and I have a patch tucked away for that, but there are lots of assumptions in the systems about things not going beyond 32 bits and I never got this fully working. So I think this is a useful exploration but not on the critical path for Ion-on-ARM64, and it does not address the SP/PSP problem either.
Assignee | ||
Comment 20•4 years ago
|
||
Here are some observations, for the current SP-vs-PSP situation. I don't claim these always hold, but I suspect this is pretty close:
-
PSP is "primary", SP is "secondary"
- Most stack refs are PSP-relative. SP-relative is rare and (obviously) only done when we know that SP is aligned.
- Most adjustments (pushes, pops, etc) are done to PSP first.
-
The intent is to keep them the same at (almost) all times
- After a change to PSP, SP is updated (the common case)
- After a change to SP, PSP is updated (the rare case)
- The update always happens within the same basic block
- Hence SP == PSP at all branches, calls and returns
-
(Relating to the system ABI requirement that there are no accesses below SP):
- All PSP-relative accesses are to zero or positive offsets (observed)
- If PSP < SP (very temporary; until resync in the same BB) then no PSP-relative accesses are allowed
Reasoning about this when reading the compiler/assembler sources is made more difficult by the fact that class MacroAssembler
for arm64 has a stack pointer reg that can be changed as the compiler runs, via Get/SetStackPointer64()
. Hence the intent of various compiler code fragments that use the assembler becomes conditional on what the current setting is. It's hard to believe that they are really agnostic about whether they generate SP- or PSP-relative stack-manipulation code. So this might be a feature worth removing. That would require duplicating some functions (Push
, Pop
etc).
Reporter | ||
Comment 21•4 years ago
|
||
It's important to note that SetStackPointer64 is used in very controlled ways. Specifically:
- the vixl MacroAssembler constructor in MacroAssembler-vixl.cpp sets the sp_ to be the PSP, so "use SP+PSP" is the default on all arm64 compilations
- however the moz MacroAssembler for wasm (end of MacroAssembler.cpp) changes this since, until Ion, all wasm code required that we not use SP+PSP, but only SP (while JS remains on SP+PSP)
- and then with ion, to compensate for this, we have to go back to SP+PSP again for ion compilations, hence the code in WasmIonCompile.cpp
- the stubs code and the baseline compiler do not use push/pop at all, in any form, except when it is expressly known that those uses are sp-safe (see WasmPush / WasmPop in the stubs code)
- JS code always uses SP+PSP and all uses of push/pop that are reached from JS compilations should be compatible with that
Beyond that general mess, SetStackPointer64 is used in sections of trampoline code that need to manipulate the raw SP, but these are well controlled.
It looks like a number of uses of GetStackPointer64() should be replaced by either syncStackPtr or initPseudoStackPtr, frankly, since they implement that functionality. Many of the others are simply substitutes for what would be the global StackPointer value on other platforms. And the rest look like they are part of the SP/PSP protocol in more or less complicated ways :-)
Assignee | ||
Comment 22•4 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #20)
Reasoning about this when reading the compiler/assembler sources is made more difficult by the fact that [..]
For example MacroAssemblerCompat::handleFailureWithHandlerTail
in jit/arm64/MacroAssembler-arm64.cpp
. This uses a mixture of direct references to r28
and to GetStackPointer64()
, despite having asserted earlier on in the function that GetStackPointer64().Is(x28)
.
Assignee | ||
Comment 23•4 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #20) (more)
Currently we have that PSP is "primary" and SP is "secondary", meaning that (perhaps apart from at function call boundaries) PSP is always changed first, then the value is copied into SP.
It might give us an easier incremental path to eventually removing PSP entirely if we switched this around. That is:
(1) SP is primary, PSP is secondary
(2) After any assignment to SP, it is copied into PSP
(3) All (non-frame-pointer-based) stack accesses are PSP-relative (as at present)
This would have the effect that:
-
It would reinstate the invariant that on all targets, the "real" SP value is in the ABI-and-or-hardware-mandated stack pointer register.
-
It would give us a simple story about calls and returns:
- for calls to non-JIT generated code (viz, C++ etc), we need no extra copies, because x28 is callee-saved
- for calls to JIT-generated code, we need no extra copies, because of (2) above
-
We could incrementally migrate those parts of the code generator where we know that SP is 16-aligned, to use SP- rather than PSP-relative accesses
One might ask what mechanical checks we can add to ensure correctness, rather than having to verify these invariants by hand indefinitely. Maybe some combination of:
-
in debug builds, compiling-in
assert(SP == PSP)
at critical places -
in debug builds, scanning sections of generated code to ensure
- no SP-relative stack accesses have been created -- for some sections, at least
- every assignment to SP is immediately followed by a copy to x28
Assignee | ||
Comment 24•4 years ago
|
||
WIP patch drop, containing:
02-0-enable-phase0.diff
02-1-arm64-codegen.diff
02-2-deal-with-sp-psp.diff
02-3-enable-phase1.diff
Changes relative to previous:
- The SP-vs-PSP situation has been cleaned up, documented and tested as much as I reasonably can at this point.
- The phase0 and phase1 enablement patches both support
--wasm-compiler={ion,cranelift,baseline}
for testing convenience. Their behaviour is otherwise unchanged.
Next step is to address nbp's initial feedback for 02-1-arm64-codegen.diff
. After that I'll put them back up in Phabricator for review.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Comment 25•4 years ago
|
||
Comment 26•4 years ago
|
||
Backed out for bustages on XPCJSContext.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/e8346137ae4ebf136fb69dd07a5d34e1d35d00c7
Log link: https://treeherder.mozilla.org/logviewer?job_id=333291822&repo=autoland&lineNumber=31298
Comment 27•4 years ago
|
||
Comment 28•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a0ed871de9e1
https://hg.mozilla.org/mozilla-central/rev/b2465e848eaa
https://hg.mozilla.org/mozilla-central/rev/a96f90afc318
https://hg.mozilla.org/mozilla-central/rev/99e10a776a25
Comment 29•4 years ago
|
||
== Change summary for alert #29331 (as of Fri, 19 Mar 2021 10:25:42 GMT) ==
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
125% | unity-webgl | android-hw-p2-8-0-android-aarch64-shippable | nocondprof webrender | 654.55 -> 1,472.51 | |
123% | unity-webgl | android-hw-p2-8-0-android-aarch64-shippable | nocondprof webrender | 659.03 -> 1,471.43 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29331
Assignee | ||
Comment 30•4 years ago
|
||
(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #29)
== Change summary for alert #29331 (as of Fri, 19 Mar 2021 10:25:42 GMT) ==
FTR, these performance changes were discussed in bug 1700295. They (the
perf changes only, nothing else) have since been backed out in bug 1699379,
so that performance on "unity-webgl" has returned to its original level.
Reporter | ||
Comment 31•4 years ago
|
||
Reopening because this is not done.
Reporter | ||
Comment 32•4 years ago
|
||
The vector register patch is still blocking SIMD.
Updated•4 years ago
|
Comment 33•4 years ago
|
||
Comment 34•4 years ago
|
||
bugherder |
Comment 35•4 years ago
|
||
Change the status for beta to have the same as nightly and release.
For more information, please visit auto_nag documentation.
Description
•