Add index masking for 32-bit wasm loads and stores

RESOLVED FIXED in Firefox 60

Status

()

enhancement
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: luke, Assigned: luke)

Tracking

(Blocks 1 bug)

unspecified
mozilla60
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox60 fixed)

Details

Attachments

(1 attachment, 2 obsolete attachments)

On 64-bit, wasm Memory reserves a 6gb guard page and there are no bounds checks so Spectre is not a problem.  On 32-bit ARM, conditional instructions are used instead of branching, and it doesn't appear these are speculated yet.  However, x32 bounds checks require mitigations.
Posted patch spectre-wasm32 (obsolete) — Splinter Review
This patch uses the masm and Ion primitives added by bug 1430602.

I tried to also add index masking to rabldr, but masm.spectreMaskIndex() requires a separate register it can write into and this is rather difficult to do from within prepareMemoryAccess() since, at least for some of the new atomic rmws, if I simply needI32() to get a temp, there are no free GPRs on x32.  Maybe Lars you could help do the baseline patch?
Assignee: nobody → luke
Attachment #8944593 - Flags: review?(lhansen)
Attachment #8944593 - Flags: feedback?(jdemooij)
I'll take care of the baseline patch.  (prepareMemoryAccess should not be allocating registers anyway.)
Comment on attachment 8944593 [details] [diff] [review]
spectre-wasm32

Review of attachment 8944593 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmIonCompile.cpp
@@ +969,5 @@
>      {
>          if (inDeadCode())
>              return nullptr;
>  
> +        checkOffsetAndAlignmentAndBounds(access, SpectreMask::DontNeed, &base);

DontNeed is almost certainly wrong here.  On x86 it is /maybe/ good enough (and it might not be...) /if/ the back-end uses XCHG (and it might not...) but on architectures where exchange is implemented as a LL/SC loop we should worry about speculation through LL.  Effectively, except where you can optimize with XCHG the exchange case is equivalent to the binop case.  If we have to use a loop with cmpxchg the initial load will even be a plain load on most systems and we will speculate through that for sure.
Attachment #8944593 - Flags: review?(lhansen) → review+
(In reply to Lars T Hansen [:lth] from comment #3)
> > +        checkOffsetAndAlignmentAndBounds(access, SpectreMask::DontNeed, &base);
> 
> DontNeed is almost certainly wrong here.

Oops, for some silly reason I forgot Atomics.exchange() returned a value.
Comment on attachment 8944593 [details] [diff] [review]
spectre-wasm32

Review of attachment 8944593 [details] [diff] [review]:
-----------------------------------------------------------------

Just skimmed the patch, but LGTM.
Attachment #8944593 - Flags: feedback?(jdemooij) → feedback+
Less terrible than I had feared.  See commit msg for technical details.

Jan, the interesting bit here for you is the change to the bounds check logic in prepareMemoryAccess, diff chunk marked "-3945,40".
Attachment #8947479 - Flags: review?(luke)
Attachment #8947479 - Flags: feedback?(jdemooij)
Comment on attachment 8947479 [details] [diff] [review]
bug1432345-rabaldr-spectre-boundscheck.patch

Review of attachment 8947479 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +3918,5 @@
> +        bool free;
> +        RegI32 tmp;
> +
> +      public:
> +        SpectreTemp() : free(false) {}

So the callsite reads a bit more explicitly and for a superficial symmetry with the WasmIonCompile.cpp patch, how about instead having a 'static SpectreTemp DontNeed()' and the default ctor is private?

@@ +3938,5 @@
> +                bc->freeI32(tmp);
> +        }
> +
> +        bool isValid() const { return tmp.isValid(); }
> +        RegI32 operator*() const { return tmp; }

Could this assert !spectreIndexMasking() && tmp.isValid()?

@@ +8245,5 @@
> +        RegI32 r = regs.expectLow();
> +        fr.pushPtr(r);
> +        SpectreTemp stemp(this, r);
> +        address = prepareAtomicMemoryAccess(&access, &check, tls, rp, stemp);
> +        fr.popPtr(r);

Ah hah, this is the case that tripped me up when I naively tried this patch.
Attachment #8947479 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #7)
> Comment on attachment 8947479 [details] [diff] [review]
> bug1432345-rabaldr-spectre-boundscheck.patch
> 
> Review of attachment 8947479 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +3938,5 @@
> > +                bc->freeI32(tmp);
> > +        }
> > +
> > +        bool isValid() const { return tmp.isValid(); }
> > +        RegI32 operator*() const { return tmp; }
> 
> Could this assert !spectreIndexMasking() && tmp.isValid()?

Not as such, because spectreIndexMasking() is not definitive, we also need to know what the operation is.  But we can simply assert isValid(), because we'll only deref the spectreTemp if we actually need it for codegen.
Comment on attachment 8947479 [details] [diff] [review]
bug1432345-rabaldr-spectre-boundscheck.patch

Review of attachment 8947479 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +4012,5 @@
> +            masm.wasmBoundsCheck(Assembler::AboveOrEqual, ptr, limit, oldTrap(Trap::OutOfBounds));
> +
> +            if (spectreTemp.isValid()) {
> +                masm.spectreMaskIndex(ptr, limit, *spectreTemp);
> +                masm.and32(*spectreTemp, ptr);

As mentioned on IRC, spectreMaskIndex will do the and32 so you can use *spectreTemp or masm.move32(*spectreTemp, ptr).
Attachment #8947479 - Flags: feedback?(jdemooij) → feedback+
Probably also mentioned in irc but; we should switch this patch to using the cmov masm ops added by bug 1435209.  jan mentioned for wasm we don't need to zero a reg and just cmov length to index because we have the guard page.
Priority: -- → P1
Comment on attachment 8947479 [details] [diff] [review]
bug1432345-rabaldr-spectre-boundscheck.patch

(Bug 1435209 landed.)  Jan pointed out that, for wasm, we could rely on the guard page being present and thus we won't need a temp register; we just cmov length to index if index > length.  That would super-simplify this patch and remove the need for SpectreTemp also probably be a bit faster so could we do that instead?
Attachment #8947479 - Flags: review+
Actually, w/ the temp business out of the way, I'll write the rabaldr patch.
Yes, we should absolutely not introduce the SpectreTemp business if we go the CMOV route.
Wow, the new approach is fantastic: it adds a single cmov and fits in ever-so-neatly with the existing masm.wasmBoundsCheck(), so this tiny patch covers both Ion and Rabaldr with zero changes to Rabaldr.

A few notes:
 - I considered forking off Spectre variants of (M|L)WasmBoundsCheck to reflect the difference that the Spectre variants returned the saturated index, but there seems to be no problem at either the MIR or LIR level with having these single nodes do both.  Jan: in particular, can you see any theoretical problem with LInstructionHelper<1,...> even though there is no output defined?
 - the WasmBCE changes maintain the asserted invariant that MWasmBoundsCheck's for which isRedundant()=true aren't codegen'd.
Attachment #8944593 - Attachment is obsolete: true
Attachment #8947479 - Attachment is obsolete: true
Attachment #8953645 - Flags: review?(jdemooij)
Oof: so I measured the perf impact on Bullet, Fasta, Box2D and Lua-BinaryTrees and all 4 showed a roughly 15% slowdown.  I confirmed:
 - before patch had the same perf as after patch with --spectre-mitigations=off
 - the exact same number of MWasmBoundsChecks are being marked as redundant by WasmBCE

I forgot to mention above: this patch mitigates both loads and stores.
Summary: Add index masking for 32-bit wasm loads → Add index masking for 32-bit wasm loads and stores
(In reply to Luke Wagner [:luke] from comment #14)
> Created attachment 8953645 [details] [diff] [review]
> spectre-wasm32
> 
> Wow, the new approach is fantastic: it adds a single cmov and fits in
> ever-so-neatly with the existing masm.wasmBoundsCheck(), so this tiny patch
> covers both Ion and Rabaldr with zero changes to Rabaldr.

/me rejoices.
Comment on attachment 8953645 [details] [diff] [review]
spectre-wasm32

Review of attachment 8953645 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8953645 - Flags: review?(jdemooij) → review+
(In reply to Luke Wagner [:luke] from comment #15)
> Oof: so I measured the perf impact on Bullet, Fasta, Box2D and
> Lua-BinaryTrees and all 4 showed a roughly 15% slowdown.

Do you know if there's also a slowdown from the bounds check having its own def now?

(In reply to Luke Wagner [:luke] from comment #14)
> Jan: in particular, can you see any
> theoretical problem with LInstructionHelper<1,...> even though there is no
> output defined?

That should be fine. It will default to LDefinition::BogusTemp() and that's ignored by regalloc.
Oh do we need a follow-up for ARM64?
(In reply to Jan de Mooij [:jandem] from comment #19)
> Oh do we need a follow-up for ARM64?

Nevermind, we only care about 32-bit. I thought I saw x86-shared in your patch but it's x86.
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bf401fe9c95c
Baldr: add index masking for 32-bit wasm loads and stores (r=jandem)
Oops, I missed this:

(In reply to Jan de Mooij [:jandem] from comment #18)
> (In reply to Luke Wagner [:luke] from comment #15)
> > Oof: so I measured the perf impact on Bullet, Fasta, Box2D and
> > Lua-BinaryTrees and all 4 showed a roughly 15% slowdown.
> 
> Do you know if there's also a slowdown from the bounds check having its own
> def now?

Great question!  Comparing a pre-patch shell to a post-patch build that has the "cmovCCl()" commented out (so the same regalloc, just no cmov being emitted), they are almost the same (the only slowdown I could find was fasta, with 1%).  Thus, it appears that the overhead is entirely from cmov.
https://hg.mozilla.org/mozilla-central/rev/bf401fe9c95c
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/19c4eedf4d8e
Baldr: remove accidental testing printf (r=me)
awfy wasm 32-bit confirms an overall 15% slowdown.  Looking at v8's 32-bit graphs, they show almost the exact same magnitude drop when they landed their spectre 32-bit cset (much earlier).  To wit:
 - their cset uses a mask loaded from their tls array instead of cmov
 - SM perf *after* this patch matches v8 perf *before* their spectre patch
 - v8 apparently doesn't have 64-bit guard pages enabled and thus they took the same hit on 64-bit
You need to log in before you can comment on or make changes to this bug.