Closed Bug 1678097 Opened 4 years ago Closed 3 years ago

Ion for wasm on ARM64 - base functionality (was: Plan B)

Categories

(Core :: JavaScript: WebAssembly, enhancement, P1)

ARM64
All
enhancement

Tracking

()

RESOLVED FIXED
90 Branch
Tracking Status
firefox88 --- fixed
firefox89 --- fixed
firefox90 --- fixed

People

(Reporter: lth, Assigned: jseward)

References

Details

Attachments

(6 files, 7 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
150.00 KB, application/x-tar
Details

Plan B, in case we need it (to be updated).

Depends on: 1678542
Depends on: 1680968

Some experiments, need test cases + some documentation:

  • generate much better code for constant-address accesses
  • generate better code when storing zero

Depends on D97478

Attachment #9191525 - Attachment description: Bug 1678097 - Optimize ARM64 memory accesses (sketches) → Bug 1678097 - Optimize ARM64 memory accesses (sketches, not actually correct)
Depends on: 1683080
Attachment #9188642 - Attachment description: Bug 1678097 - PLAN B (sketches) → Bug 1678097 - PLAN B (functional, modulo SIMD)

Depends on D97478

Attachment #9193921 - Attachment description: Bug 1678097 - Bitset128 + very preliminary adaptation (sketches) → Bug 1678097 - 128-bit register sets on ARM64 (no SIMD regs defined)

Depends on D100116

Move some SIMD code to jit/shared; copy other SIMD code to jit/arm64
as stubs. SIMD not enabled for ARM64 yet, so all tests continue to
pass with this patch.

Depends on D100654

Depends on D100655

Depends on D100656

Depends on D100657

Attached file Bug 1678097 - ARM64 SIMD WIP (obsolete) —

Depends on D100658

Depends on: 1684902
Attachment #9193921 - Attachment description: Bug 1678097 - 128-bit register sets on ARM64 (no SIMD regs defined) → Bug 1678097 - Add vector registers on ARM64

Comment on attachment 9195178 [details]
Bug 1678097 - Make SIMD code cross-platform

Revision D100655 was moved to bug 1685984. Setting attachment 9195178 [details] to obsolete.

Attachment #9195178 - Attachment is obsolete: true

Every operation that updates the SP must be sure to sync the SP. Various
addToStackPtr APIs did not do this.

I'm hoping to get rid of the PSP for wasm code, but until then, this
is the best we have.

Depends on: 1686626
Attachment #9196344 - Attachment description: Bug 1678097 - Always sync the SP (WIP) → Bug 1678097 - Deal with SP/PSP (WIP)
Attachment #9188642 - Attachment description: Bug 1678097 - PLAN B (functional, modulo SIMD) → Bug 1678097 - ARM64 code generation for Wasm (non-SIMD). r?nbp
Summary: Plan B → Ion for wasm on ARM64 - base functionality (was: Plan B)
Blocks: 1687626
Type: task → enhancement
Priority: P3 → P2
Blocks: 1687629

Comment on attachment 9195179 [details]
Bug 1678097 - Enable simd for arm64 in moz.configure

Revision D100656 was moved to bug 1687629. Setting attachment 9195179 [details] to obsolete.

Attachment #9195179 - Attachment is obsolete: true

Comment on attachment 9195182 [details]
Bug 1678097 - ARM64 SIMD WIP

Revision D100659 was moved to bug 1687629. Setting attachment 9195182 [details] to obsolete.

Attachment #9195182 - Attachment is obsolete: true

Comment on attachment 9191525 [details]
Bug 1678097 - Optimize ARM64 memory accesses (sketches, not actually correct)

Revision D98857 was moved to bug 1687630. Setting attachment 9191525 [details] to obsolete.

Attachment #9191525 - Attachment is obsolete: true
Attachment #9195177 - Attachment is obsolete: true
Attachment #9195181 - Attachment is obsolete: true
Attachment #9195180 - Attachment is obsolete: true

In Phase 0, both cranelift and ion are available on arm64, and ion
is enabled by default; use --wasm-force-cranelift at the shell to
select cranelift, or set javascript.options.wasm_force_ion to false
in about:config. Phase 0 is appropriate for developers, before
the patch set lands in mozilla-central.

In Phase 1, both compilers are still available but ion is DISABLED by
default; use --wasm-force-ion at the shell to select ion, or make sure
javascript.options.wasm_force_ion is true in about:config. Phase 1 is
appropriate for fuzzing, after the patch set lands in mozilla-central
but before ion is enabled by default. The patch for Phase 1 will
appear on bug 1678097 and will be very small, and MUST land with the
patch for Phase 0.

in Phase 2, cranelift becomes disabled in moz.configure and all the
changes in the present patch are removed again. The patch for Phase 2
will appear on bug 1686626 and will revert Phase 0 + Phase 1, and
additionally update moz.configure.

Attachment #9197278 - Attachment description: Bug 1678097 - Enable Ion for wasm on ARM64. → Bug 1678097 - Enable Ion for wasm on ARM64 (phase 1).
Attachment #9198169 - Attachment description: Bug 1678097 - Enable Ion for wasm on arm64 (phase 0) → Bug 1678097 - Enable Ion for wasm on arm64 without SIMD (phase 0)

Julian will take over and make sure this gets cleaned up, reviewed, and ready for landing.

Assignee: lhansen → jseward

Some notes about what I belive remains to be done here, in patch queue order:

Enablement level 0: This is probably clean, but note it disables arm64 SIMD. It needs to be reviewed, but should be reviewed along with the Level 1 patch.

SP/PSP: This needs major work, but possibly not much code. The patch has two aspects at the moment: misc bug fixes for stack related things for arm64, and then misc fixes for dealing with the SP/PSP thing. There's a big comment block in this patch that is the initial attempt at describing global invariants for SP/PSP, but it's not done yet, and there are things I know I don't understand. The SP/PSP affects both JS (baseline+ion) and wasm (ion), as well as irregexp, though not Cranelift or Wasm baseline. (Wasm baseline must be aware of the PSP: It must set up the PSP when making a call, since the callee may be Ion code that depends on the PSP being set.) The PSP code appears brittle. There are syncStackPtr calls where they don't appear to make sense to me, and no such calls where I thought they were needed, so probably I didn't understand the invariants. The first work item here is therefore to understand system-wide invariants and document them and make sure everyone agrees about what they are. The second is then to clean up the patch, get it reviewed.

ARM64 code generation: This is "done" but nbp left many comments (which I have not yet read) and requested re-review, so all that needs to be done.

Add vector registers: This is almost done but (a) I don't quite trust it, because if I enable simd on arm64 and run jit-tests then this patch makes non-simd tests fail, and (b) Yury reported some SIMD-related errors that might be related to the register sets introduced here or might be bailout logic.

There is also more work to be done in the vector register patch: the baseline compiler has a hack to use SIMD without SIMD in the register sets, for reasons described there (search for RABALDR_SIDEALLOC_V128). That hack should be disabled, and normal SIMD register sets should be used. But in addition there's code in arm64/*.cpp (PushRegsInMaskForWasmStubs, PopRegsInMaskForWasmStubs, GetPushSizeInBytesForWasmStubs) related to that hack, and those should be removed, and the regular PushRegsInMask/PopRegsInMask/GetPushSize code should be used instead. This may take some debugging. It's possible all this work should be a separate patch, for ease of review.

Enablement level 1: This adds some code so that ion and cranelift are both present and cranelift is the default while ion can be selected by a command line switch in the shell, this is for fuzzing. This patch must land when the level 0 patch lands. I don't know if this patch is quite baked yet, time will tell.

And two final notes:

It would have been interesting to try to avoid the PSP altogether for Ion wasm code. It's not clear why it's needed. There's one MIR/LIR node used by Wasm (visitWasmBuiltinModD) that performs an explicit push; any other push/pop logic is hidden in the masm. Also, getting rid of the PSP probably means making the SP a "normal" register in order for the MoveResolver play along, which means getting rid of JS_HIDDEN_SP, see next. It would for sure be interesting to look into this, but it looks like it could be a lot of work and I would prefer for us just to understand how the PSP is supposed to work and make the code work with it.

Since the ARM64 SP is weird and the SP has a code (63 in vixl) that does not fit in the register set, there's a hack in the JIT, JS_HIDDEN_SP, that is enabled on ARM64, that treats the SP differently and introduces some constraints (that are checked by the type system). I have been toying with removing this restriction, now that we have larger register sets and the SP could have code 32, say, and I have a patch tucked away for that, but there are lots of assumptions in the systems about things not going beyond 32 bits and I never got this fully working. So I think this is a useful exploration but not on the critical path for Ion-on-ARM64, and it does not address the SP/PSP problem either.

Here are some observations, for the current SP-vs-PSP situation. I don't claim these always hold, but I suspect this is pretty close:

  • PSP is "primary", SP is "secondary"

    • Most stack refs are PSP-relative. SP-relative is rare and (obviously) only done when we know that SP is aligned.
    • Most adjustments (pushes, pops, etc) are done to PSP first.
  • The intent is to keep them the same at (almost) all times

    • After a change to PSP, SP is updated (the common case)
    • After a change to SP, PSP is updated (the rare case)
    • The update always happens within the same basic block
    • Hence SP == PSP at all branches, calls and returns
  • (Relating to the system ABI requirement that there are no accesses below SP):

    • All PSP-relative accesses are to zero or positive offsets (observed)
    • If PSP < SP (very temporary; until resync in the same BB) then no PSP-relative accesses are allowed

Reasoning about this when reading the compiler/assembler sources is made more difficult by the fact that class MacroAssembler for arm64 has a stack pointer reg that can be changed as the compiler runs, via Get/SetStackPointer64(). Hence the intent of various compiler code fragments that use the assembler becomes conditional on what the current setting is. It's hard to believe that they are really agnostic about whether they generate SP- or PSP-relative stack-manipulation code. So this might be a feature worth removing. That would require duplicating some functions (Push, Pop etc).

It's important to note that SetStackPointer64 is used in very controlled ways. Specifically:

  • the vixl MacroAssembler constructor in MacroAssembler-vixl.cpp sets the sp_ to be the PSP, so "use SP+PSP" is the default on all arm64 compilations
  • however the moz MacroAssembler for wasm (end of MacroAssembler.cpp) changes this since, until Ion, all wasm code required that we not use SP+PSP, but only SP (while JS remains on SP+PSP)
  • and then with ion, to compensate for this, we have to go back to SP+PSP again for ion compilations, hence the code in WasmIonCompile.cpp
  • the stubs code and the baseline compiler do not use push/pop at all, in any form, except when it is expressly known that those uses are sp-safe (see WasmPush / WasmPop in the stubs code)
  • JS code always uses SP+PSP and all uses of push/pop that are reached from JS compilations should be compatible with that

Beyond that general mess, SetStackPointer64 is used in sections of trampoline code that need to manipulate the raw SP, but these are well controlled.

It looks like a number of uses of GetStackPointer64() should be replaced by either syncStackPtr or initPseudoStackPtr, frankly, since they implement that functionality. Many of the others are simply substitutes for what would be the global StackPointer value on other platforms. And the rest look like they are part of the SP/PSP protocol in more or less complicated ways :-)

(In reply to Julian Seward [:jseward] from comment #20)

Reasoning about this when reading the compiler/assembler sources is made more difficult by the fact that [..]

For example MacroAssemblerCompat::handleFailureWithHandlerTail in jit/arm64/MacroAssembler-arm64.cpp. This uses a mixture of direct references to r28 and to GetStackPointer64(), despite having asserted earlier on in the function that GetStackPointer64().Is(x28).

(In reply to Julian Seward [:jseward] from comment #20) (more)

Currently we have that PSP is "primary" and SP is "secondary", meaning that (perhaps apart from at function call boundaries) PSP is always changed first, then the value is copied into SP.

It might give us an easier incremental path to eventually removing PSP entirely if we switched this around. That is:

(1) SP is primary, PSP is secondary
(2) After any assignment to SP, it is copied into PSP
(3) All (non-frame-pointer-based) stack accesses are PSP-relative (as at present)

This would have the effect that:

  • It would reinstate the invariant that on all targets, the "real" SP value is in the ABI-and-or-hardware-mandated stack pointer register.

  • It would give us a simple story about calls and returns:

    • for calls to non-JIT generated code (viz, C++ etc), we need no extra copies, because x28 is callee-saved
    • for calls to JIT-generated code, we need no extra copies, because of (2) above
  • We could incrementally migrate those parts of the code generator where we know that SP is 16-aligned, to use SP- rather than PSP-relative accesses

One might ask what mechanical checks we can add to ensure correctness, rather than having to verify these invariants by hand indefinitely. Maybe some combination of:

  • in debug builds, compiling-in assert(SP == PSP) at critical places

  • in debug builds, scanning sections of generated code to ensure

    • no SP-relative stack accesses have been created -- for some sections, at least
    • every assignment to SP is immediately followed by a copy to x28
Attached file bug1678097_rev02a.tar

WIP patch drop, containing:

02-0-enable-phase0.diff
02-1-arm64-codegen.diff
02-2-deal-with-sp-psp.diff
02-3-enable-phase1.diff

Changes relative to previous:

  • The SP-vs-PSP situation has been cleaned up, documented and tested as much as I reasonably can at this point.
  • The phase0 and phase1 enablement patches both support --wasm-compiler={ion,cranelift,baseline} for testing convenience. Their behaviour is otherwise unchanged.

Next step is to address nbp's initial feedback for 02-1-arm64-codegen.diff. After that I'll put them back up in Phabricator for review.

Attachment #9198169 - Attachment description: Bug 1678097 - Enable Ion for wasm on arm64 without SIMD (phase 0) → Bug 1678097 - Enable Ion for wasm on arm64 without SIMD (phase 0). r?
Attachment #9196344 - Attachment description: Bug 1678097 - Deal with SP/PSP (WIP) → Bug 1678097 - Deal with SP/PSP (WIP). r?
Attachment #9197278 - Attachment description: Bug 1678097 - Enable Ion for wasm on ARM64 (phase 1). → Bug 1678097 - Enable Ion for wasm on ARM64 (phase 1). r?
Attachment #9198169 - Attachment description: Bug 1678097 - Enable Ion for wasm on arm64 without SIMD (phase 0). r? → Bug 1678097 - Enable Ion for wasm on arm64 without SIMD (phase 0). r=lth.
Attachment #9197278 - Attachment description: Bug 1678097 - Enable Ion for wasm on ARM64 (phase 1). r? → Bug 1678097 - Enable Ion for wasm on ARM64 (phase 1). r=lth.
Attachment #9188642 - Attachment description: Bug 1678097 - ARM64 code generation for Wasm (non-SIMD). r?nbp → Bug 1678097 - ARM64 code generation for Wasm (non-SIMD). r=nbp.
Attachment #9196344 - Attachment description: Bug 1678097 - Deal with SP/PSP (WIP). r? → Bug 1678097 - Deal with SP/PSP (WIP). r=iain.
Attachment #9196344 - Attachment description: Bug 1678097 - Deal with SP/PSP (WIP). r=iain. → Bug 1678097 - Deal with SP/PSP. r=iain.
Depends on: 1697560
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/92b5e5f5b3f7
Enable Ion for wasm on arm64 without SIMD (phase 0).  r=lth.
https://hg.mozilla.org/integration/autoland/rev/04cdd20e2557
ARM64 code generation for Wasm (non-SIMD).  r=nbp.
https://hg.mozilla.org/integration/autoland/rev/0c14d39bd01b
Deal with SP/PSP.  r=iain,lth.
https://hg.mozilla.org/integration/autoland/rev/f3e0ba29ae46
Enable Ion for wasm on ARM64 (phase 1).  r=lth.
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a0ed871de9e1
Enable Ion for wasm on arm64 without SIMD (phase 0).  r=lth.
https://hg.mozilla.org/integration/autoland/rev/b2465e848eaa
ARM64 code generation for Wasm (non-SIMD).  r=nbp.
https://hg.mozilla.org/integration/autoland/rev/a96f90afc318
Deal with SP/PSP.  r=iain,lth.
https://hg.mozilla.org/integration/autoland/rev/99e10a776a25
Enable Ion for wasm on ARM64 (phase 1).  r=lth.

== Change summary for alert #29331 (as of Fri, 19 Mar 2021 10:25:42 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
125% unity-webgl android-hw-p2-8-0-android-aarch64-shippable nocondprof webrender 654.55 -> 1,472.51
123% unity-webgl android-hw-p2-8-0-android-aarch64-shippable nocondprof webrender 659.03 -> 1,471.43

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29331

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #29)

== Change summary for alert #29331 (as of Fri, 19 Mar 2021 10:25:42 GMT) ==

FTR, these performance changes were discussed in bug 1700295. They (the
perf changes only, nothing else) have since been backed out in bug 1699379,
so that performance on "unity-webgl" has returned to its original level.

Flags: needinfo?(jseward)

Reopening because this is not done.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

The vector register patch is still blocking SIMD.

Priority: P2 → P1
Depends on: 1705664
Attachment #9193921 - Attachment description: Bug 1678097 - Add vector registers on ARM64 → Bug 1678097 - Add vector registers on ARM64. r=lth,nbp.
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/472c9de40081
Add vector registers on ARM64.  r=lth,nbp.
Blocks: 1709527
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: 88 Branch → 90 Branch

Change the status for beta to have the same as nightly and release.
For more information, please visit auto_nag documentation.

Regressions: 1757476
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: