Closed Bug 1377576 Opened 7 years ago Closed 7 years ago

Implement wasm atomics + wait and wake

Categories

(Core :: JavaScript Engine: JIT, enhancement, P1)

Tracking

RESOLVED FIXED in mozilla59
firefox59 --- fixed

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(10 files, 10 obsolete files)

53.75 KB, patch, lth: review+
31.12 KB, patch, lth: review+
22.77 KB, patch, lth: review+
5.16 KB, patch, lth: review+
50.90 KB, patch, sunfish: review+
22.70 KB, patch, bbouvier: review+
59.17 KB, patch, lth: review+
83.99 KB, patch, bbouvier: review+
66.55 KB, patch, lth: review+
11.81 KB, patch, bbouvier: review+
In the best of all worlds we just move our asm.js atomics opcodes over, and implement the missing ones (wake, wait, 64-bit atomics).  In the worst case we'll have subtly different semantics and will need the double set.

The implementation burden isn't necessarily trivial even if we can just move the opcodes.  For example, wasm atomics are lock-free at all sizes, unlike JS atomics.  And 64-bit atomics on 32-bit platforms where we compile with Clang will need special care because Clang does not support 64-bit atomics on those platforms and our compatibility layer will need to be adapted.

wait and wake will presumably just be callouts to the VM.
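
For concreteness, a toy model in plain C++ (not the engine's code) of the semantics such callouts would provide: wait blocks only if the cell still holds the expected value, wake releases up to 'count' sleepers, and a negative count means "wake all".  Timeouts and spurious wakeups are omitted for brevity.

#include <condition_variable>
#include <cstdint>
#include <mutex>

struct Futex {
    std::mutex lock;
    std::condition_variable cv;
    int32_t waiters = 0;

    // Returns 0 ("ok") when woken, 1 ("not-equal") if *addr != expected.
    int wait(int32_t* addr, int32_t expected) {
        std::unique_lock<std::mutex> g(lock);
        if (*addr != expected)
            return 1;
        waiters++;
        cv.wait(g);
        waiters--;
        return 0;
    }

    // Returns the number of waiters woken.
    int64_t wake(int64_t count) {
        std::lock_guard<std::mutex> g(lock);
        int64_t woken = 0;
        for (int32_t n = waiters; n > 0 && count != 0; n--) {
            cv.notify_one();
            woken++;
            if (count > 0)    // only decrement positive counts; negative = all
                count--;
        }
        return woken;
    }
};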

Probably can't test much until we have shared memory support for wasm.

It's likely these new opcodes should be disabled for wasm except on Nightly for a while, until we have the full feature in place.
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Define the thread operations and add them to the wasm verifier.  This patch stands on its own but has some MOZ_CRASH clauses where later patches will fill in other details.

I have renamed "AtomicPrefix" to "ThreadPrefix", and called the operations "ThreadOp", for a couple of reasons.  First, we should expect there to be some ops here that are not, strictly speaking, atomic, such as create thread.  Second, there is already an "AtomicOp" in the jit namespace and it's unnecessarily confusing to have two different things with the same name.

This patch does not quite follow the current draft because it assumes the timeout value for WaitI32 and WaitI64 will be Int64, once the dust settles.  If I'm wrong it's easily remedied.

The existing asm.js atomics are renamed with the infix "Old", following existing convention eg for old-style indirect calls, because the instruction encoding is different.  Once the patch set in this bug has landed I want to change Odin to generate wasm atomic instructions instead, but I did not want to mix that in here.
Attachment #8883724 - Flags: review?(sunfish)
Attached patch bug1377576-text-to-binary.patch (obsolete) — Splinter Review
Text-to-binary support for the thread operations, again with some MOZ_CRASH where later patches will add more functionality, so that the patch stands on its own.

I don't think there are any surprising bits here.  I did opt for the name "AtomicRMW" for Add, Sub, And, Or, Xor, Xchg, over eg "AtomicBinOp"; neither is completely accurate, and RMW reflects what these instructions are called in the wasm spec.  (Though in truth cmpxchg is also called "rmw".)  I have tried to be consistent in this use throughout.
Attachment #8883727 - Flags: review?(sunfish)
Attached patch bug1377576-binary-to-text.patch (obsolete) — Splinter Review
Binary-to-text conversion, ditto.
Attachment #8883728 - Flags: review?(sunfish)
Pretty basic test cases for verification, text-to-binary, and binary-to-text conversion.
Attachment #8883729 - Flags: review?(sunfish)
Comment on attachment 8883724 [details] [diff] [review]
bug1377576-define-atomic-ops.patch

Review of attachment 8883724 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/wasm/binary.js
@@ +454,5 @@
> +// Illegal AtomicPrefix opcodes
> +for (let i = 3; i < 0x10; i++)
> +    checkIllegalPrefixed(AtomicPrefix, i);
> +
> +for (let i = 0x4f; i < 0x100; i++)

The magic numbers in this loop and the previous are a little opaque. Could you either add an extra comment, or perhaps declare them somewhere so that they have names, so that it's clear what these lines are testing?
Attachment #8883724 - Flags: review?(sunfish) → review+
Attachment #8883727 - Flags: review?(sunfish) → review+
Attachment #8883728 - Flags: review?(sunfish) → review+
Attachment #8883729 - Flags: review?(sunfish) → review+
Try run for the first four patches is green, though the new opcode dispatch trees are not inside #ifdef NIGHTLY_BUILD and it's quite possible they ought to be.
Blocks: 1389458
Summary: Implement wasm atomics → Implement wasm atomics + wait and wake
Depends on: 1389471
Depends on: 1146817
A lot of code for implementing 8-byte atomics across many platforms is attached to bug 1131613.
See Also: → 1131613
Priority: -- → P2
Depends on: 1406336
Define atomic ops rebased etc.  See comment 1, comment 5.  Carry r+ from sunfish.
Attachment #8883724 - Attachment is obsolete: true
Attachment #8916577 - Flags: review+
Text-to-binary conversion, rebased etc.  Comment 2.  Carry r+ from sunfish.
Attachment #8883727 - Attachment is obsolete: true
Attachment #8916578 - Flags: review+
Binary-to-text conversion, rebased etc.  Comment 3.  Carry r+ from sunfish.
Attachment #8883728 - Attachment is obsolete: true
Attachment #8916579 - Flags: review+
Test cases for verification and conversion, rebased etc.  Comment 4.  Carry r+ from sunfish.
Attachment #8883729 - Attachment is obsolete: true
Attachment #8916581 - Flags: review+
Changes to assemblers and macroassemblers to support wasm atomic operations, chiefly for 64-bit atomics since we already had 32-bit atomic support.

This is generally completely pedestrian, just add new ops to expose LDREXD/STREXD on ARM, CMPXCHG8B on x86, and CMPXCHGQ/XADDQ/etc on X64.

I have changed the ARM macroassembler to call asMasm().memoryBarrier() instead of calling ma_dmb(), in order to centralize the interpretation of the barrier type in the memoryBarrier() implementation.  It also prefigures some cleanup around memory barriers that will come along before long: right now load and store operations are parameterized by the desired memory barriers but the other atomic ops are not, which is a mess.
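
As an illustration of what "parameterized by the desired memory barriers" means in practice: a seq-cst store on ARM is a plain store bracketed by dmb barriers on both sides.  A portable sketch of that shape, using std::atomic fences rather than the masm API:

#include <atomic>
#include <cstdint>

void storeSeqCst(std::atomic<int32_t>* p, int32_t v) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // barrier before
    p->store(v, std::memory_order_relaxed);               // the plain access
    std::atomic_thread_fence(std::memory_order_seq_cst);  // barrier after
}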
Attachment #8916583 - Flags: review?(sunfish)
General comments about the implementation of atomic operations (next three patches):

OOB handling is as for existing operations: trapping within the guard region, explicit checks outside that.  We get grandfathered into the existing handling for asm.js atomics and SIMD (handled at the beginning of HandleMemoryAccess in WasmSignalHandlers.cpp).  Later we can upgrade this by making the disassemblers understand the atomic accesses properly, but for now I don't think there's a need.

We require alignment checks on atomic accesses.  I have implemented some simple optimizations but no flow-based optimizations, which would be similar to our current BCE.  For the time being I don't think the effort is worth it, since atomic ops are not that common and are fairly expensive.  If we do want flow-based optimizations we can do that as follow-up.
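
A minimal sketch of the simple optimization, assuming byteSize is a power of two (the helper is illustrative, not the patch's API): when the constant offset is itself aligned, ptr + offset is aligned exactly when ptr is, so the check can test the pointer alone instead of forming the effective address.

#include <cstdint>

static bool isUnalignedAccess(uint32_t ptr, uint32_t offset, uint32_t byteSize) {
    uint32_t mask = byteSize - 1;
    if ((offset & mask) == 0)
        return (ptr & mask) != 0;         // pointer-only check suffices
    return ((ptr + offset) & mask) != 0;  // full effective-address check
}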
(See comment 15 for general remarks)

Refactoring and preparatory work for atomic operations.  Notably this refactors wait and wake so that they are callable from both wasm and JS.

Of special note here is that the existing alignment fault (used on ARM in a nonstandard way) is being promoted to a standard fault to signal alignment errors for atomics; also, there is a new 'Throw' fault that is being introduced to propagate errors that were already signaled.  This way the wait and wake callouts can signal errors for alignment and bounds checking and the calling wasm code can just propagate the fault without any complexity.
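
A small model of that convention, with all names hypothetical rather than taken from the patch: the callout reports the error itself and signals failure with a negative return value, and the generated code merely tests the sign and raises the 'Throw' fault, which unwinds without reporting a second error.

#include <cstdint>
#include <cstdio>

static int32_t waitCallout(uint32_t byteOffset) {
    if (byteOffset & 3) {                       // unaligned i32 wait
        fprintf(stderr, "unaligned memory access\n");
        return -1;                              // error already reported
    }
    return 0;                                   // ok
}

int main() {
    if (waitCallout(2) < 0)   // what a branch on the sign bit checks in jitcode
        return 1;             // 'Throw': propagate, do not report again
    return 0;
}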
Attachment #8916585 - Flags: review?(bbouvier)
Attached patch bug1377576-rabaldr.patch (obsolete) — Splinter Review
(See comment 15 for general remarks.)

Changes to Rabaldr to support wasm atomics.  The register constraints for 64-bit atomics on 32-bit CPUs make this a little bit messier than we want, but in the end it's cleaner than I dared hope.
Attachment #8916586 - Flags: review?(bbouvier)
Attached patch bug1377576-baldr.patch (obsolete) — Splinter Review
(See comment 15 for general remarks.)

Baldr changes to support wasm atomics.  Two notable issues:

* I use a fixed register allocation on ARM where it is in principle possible to be more flexible, as Rabaldr is.  We need odd/even register pairs for the doubleword atomics, and I just blindly allocate argreg0/argreg1 and argreg2/argreg3 for the maximum two pairs that I need.  I don't think this is much of an issue for performance; if it is we must see if we can provide a regalloc constraint that will give us the necessary pairs.

* I simplify the code so that when 8/16/32 bit accesses are made but the type is 64 bit, I perform explicit narrowings and zero extends before/after the accesses so that I can just use the existing 32-bit code generation.  Pushing the additional type through that pipeline is possible but did not seem worth the effort at the moment.
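
A sketch of the second bullet's simplification in std::atomic terms (the helper name is mine, not the patch's): narrow the operand up front, run the existing 32-bit atomic path, and zero-extend the result afterwards.

#include <atomic>
#include <cstdint>

static uint64_t atomicAdd32AsI64(std::atomic<uint32_t>* addr, uint64_t operand) {
    uint32_t v = uint32_t(operand);     // explicit narrowing before the access
    uint32_t old = addr->fetch_add(v);  // the existing 32-bit code generation
    return uint64_t(old);               // zero extend back to 64 bits after
}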
Attachment #8916588 - Flags: review?(bbouvier)
Test cases for atomics, for reference - not quite ready for review yet.
As a final note, these patches can't land until the MemoryObject has been cleaned up to handle shared memory, and there are a couple of lines of code in these patches that reference patches on bug 1389464 that you haven't yet seen, but I don't think that should cause much confusion.
Comment on attachment 8916583 [details] [diff] [review]
bug1377576-assemblers.patch

Review of attachment 8916583 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, with one nit assuming there's a reasonable answer :-).

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +5704,4 @@
>  MacroAssembler::wasmLoadI64(const wasm::MemoryAccessDesc& access, Register memoryBase, Register ptr,
>                              Register ptrScratch, Register64 output)
>  {
> +    MOZ_ASSERT_IF(access.isAtomic(), access.byteSize() <= 4);

Why is this asserting that the access size is at most 4 bytes, when this is in `wasmLoadI64`?

::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +722,5 @@
>  void
>  MacroAssembler::wasmLoadI64(const wasm::MemoryAccessDesc& access, Operand srcAddr, Register64 out)
>  {
> +    // Atomic i64 load must use lock_cmpxchg8b.
> +    MOZ_ASSERT_IF(access.isAtomic(), access.byteSize() <= 4);

Ditto.
Attachment #8916583 - Flags: review?(sunfish) → review+
(In reply to Dan Gohman [:sunfish] from comment #21)
> Comment on attachment 8916583 [details] [diff] [review]
> bug1377576-assemblers.patch
> 
> Review of attachment 8916583 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good to me, with one nit assuming there's a reasonable answer :-).
> 
> ::: js/src/jit/arm/MacroAssembler-arm.cpp
> @@ +5704,4 @@
> >  MacroAssembler::wasmLoadI64(const wasm::MemoryAccessDesc& access, Register memoryBase, Register ptr,
> >                              Register ptrScratch, Register64 output)
> >  {
> > +    MOZ_ASSERT_IF(access.isAtomic(), access.byteSize() <= 4);
> 
> Why is this asserting that the access size is at most 4 bytes, when this is
> in `wasmLoadI64`?

Yes, this is confusing :-)  The 'I64' comes from the result type of the operation and the size of the result register, but the access can be narrower, and the access type determines how much we load and whether we sign extend or zero extend to 64 bits.  wasmLoad() is exactly the same, the result type is always I32 but the access can be narrower.  It "should" have been called wasmLoadI32().

Anyway, on 32-bit systems (hence this confusion for ARM and x86) we have to use radically different code for 64-bit atomic loads, hence the asserts here to ensure that a different path was taken.
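
Restated as code (illustrative only): the "I64" names the result width, not the access width.  A narrow access widens after the load, and atomic accesses are always the zero-extending flavor.

#include <cstdint>

static uint64_t widen16ToI64(uint16_t loaded, bool signExtend) {
    return signExtend ? uint64_t(int64_t(int16_t(loaded)))  // i64.load16_s
                      : uint64_t(loaded);                   // i64.load16_u
}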
Makes sense. Thanks!
Comment on attachment 8916585 [details] [diff] [review]
bug1377576-refactoring-and-prep.patch

Review of attachment 8916585 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the long review time.  I'd like to see another version of this patch, because the refactoring doesn't look very appealing at the moment.

::: js/src/builtin/AtomicsObject.cpp
@@ +777,1 @@
>      Rooted<SharedArrayBufferObject*> sab(cx, view->bufferShared());

So the terrible API of atomics_wait_impl is caused by this rooting, if I understand correctly, and it still implies a lot of code duplication between this path and the one in WasmInstance.

Instead, could the rooting happen above the definition of the AutoLockFutexAPI lock?  Then the lock doesn't need to be passed as an outparam and less code is duplicated.

@@ +786,5 @@
> +        r.setString(cx->names().futexOK);
> +        break;
> +      case FutexThread::FutexTimedOut:
> +        r.setString(cx->names().futexTimedOut);
> +        break;

Maybe add a crashing default case here?

@@ +854,3 @@
>  
>      Rooted<SharedArrayBufferObject*> sab(cx, view->bufferShared());
>      SharedArrayRawBuffer* sarb = sab->rawBufferObject();

(same question as in wait regarding the place where the locking happens, and the removal of code duplication in WasmInstance)

@@ +854,5 @@
>  
>      Rooted<SharedArrayBufferObject*> sab(cx, view->bufferShared());
>      SharedArrayRawBuffer* sarb = sab->rawBufferObject();
> +
> +    int32_t icount = count > INT32_MAX ? -1 : int32_t(count);

This looks like a change in functionality, since before one could wake INT32_MAX waiters in a chain of INT32_MAX+1 waiters, but now it will wake them all (possible underflows in the _impl method make me a bit nervous too). Is there some other technical limitation that would prevent such an amount of waiters somewhere else? Otherwise, could we use uint64 here?

::: js/src/builtin/AtomicsObject.h
@@ +169,5 @@
> +
> +    js::UniqueLock<js::Mutex>& unique() { return *unique_; }
> +};
> +
> +MOZ_MUST_USE bool atomics_wait_impl(AutoLockFutexAPI& lock, JSContext* cx,

nit (if you want to keep it as is):

- please put JSContext* first
- please use pointers out-params for AutoLockFutexAPI so call sites show it's being modified
- ditto for timeout

@@ +175,5 @@
> +                                    mozilla::Maybe<mozilla::TimeDuration>& timeout,
> +                                    FutexThread::WaitResult* result);
> +
> +// Count < 0 means "wake all".
> +MOZ_MUST_USE int32_t atomics_wake_impl(AutoLockFutexAPI& lock, SharedArrayRawBuffer* sarb,

ditto for lock

::: js/src/wasm/WasmInstance.cpp
@@ +334,5 @@
> +template<typename T>
> +static int32_t
> +PerformWait(Instance* instance, uint32_t offset, T value, int64_t timeout_ns)
> +{
> +    JSContext* cx = TlsContext.get();

Do we need to check for cx->fx.canWait here?

@@ +338,5 @@
> +    JSContext* cx = TlsContext.get();
> +
> +    mozilla::Maybe<mozilla::TimeDuration> timeout;
> +    if (timeout_ns >= 0)
> +        mozilla::Some(mozilla::TimeDuration::FromMicroseconds(timeout_ns / 1000));

nit: did you forget the assignment part :) (hope this will get revealed by a test!)

@@ +356,5 @@
> +
> +    // Validation should ensure that this does not happen.
> +    MOZ_ASSERT(sarb, "wait is only applicable to shared memory");
> +
> +    SharedMem<T*>(addr) = sarb->dataPointerShared().cast<T*>() + offset;

nit: no parenthesis on the LHS of the assignment? (also pre-existing in AtomicsObjects.cpp where this was probably copied from)
Attachment #8916585 - Flags: review?(bbouvier)
Comment on attachment 8916586 [details] [diff] [review]
bug1377576-rabaldr.patch

Review of attachment 8916586 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch!

That's a lot of new code. I trust your register allocation, especially all the particular cases on x86 etc.

It's always a bit unclear to me why some things that relate to register allocations (usage of EBX byte persona) happen in the codegen functions and not at the higher level.

Also, I think a few tricks with Maybe<> could make the code much more linear (i.e. reduces the number of conditions).

Overall looks good, so r=me.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +579,5 @@
>  
> +    struct AccessCheck {
> +        AccessCheck()
> +          : omitBoundsCheck(false),
> +            omitAlignmentCheck(false),  // Need check neither pointer nor offset

I don't understand this comment.

@@ +580,5 @@
> +    struct AccessCheck {
> +        AccessCheck()
> +          : omitBoundsCheck(false),
> +            omitAlignmentCheck(false),  // Need check neither pointer nor offset
> +            onlyPointerAlignment(false) // When omitAlignmentCheck == false

Do you mean here "Only has meaning when omitAlignmentCheck == false"? Maybe make it an enum then? Otherwise, could it contain a verb? checkOnlyPointerAlignment?

@@ +861,5 @@
> +
> +    static const uint32_t pairLimit = 10;
> +
> +    bool hasGPRPair() {
> +        for ( uint32_t i=0 ; i <= pairLimit ; i += 2 ) {

nits:

- no spaces after/before parens
- spaces between = operands

@@ +862,5 @@
> +    static const uint32_t pairLimit = 10;
> +
> +    bool hasGPRPair() {
> +        for ( uint32_t i=0 ; i <= pairLimit ; i += 2 ) {
> +            if (isAvailable(Register::FromCode(i)) && isAvailable(Register::FromCode(i+1)))

nit: spaces between + operands

@@ +869,5 @@
> +        return false;
> +    }
> +
> +    void allocGPRPair(Register* low, Register* high) {
> +        for ( uint32_t i=0 ; i <= pairLimit ; i += 2 ) {

ditto

@@ +870,5 @@
> +    }
> +
> +    void allocGPRPair(Register* low, Register* high) {
> +        for ( uint32_t i=0 ; i <= pairLimit ; i += 2 ) {
> +            if (isAvailable(Register::FromCode(i)) && isAvailable(Register::FromCode(i+1))) {

ditto

@@ +872,5 @@
> +    void allocGPRPair(Register* low, Register* high) {
> +        for ( uint32_t i=0 ; i <= pairLimit ; i += 2 ) {
> +            if (isAvailable(Register::FromCode(i)) && isAvailable(Register::FromCode(i+1))) {
> +                *low = Register::FromCode(i);
> +                *high = Register::FromCode(i+1);

ditto

@@ +3360,5 @@
>      }
>  
> +
> +    MOZ_MUST_USE bool
> +    supportsRoundInstruction(RoundingMode mode)

Hmm, it would have been nice if code motion were in a separate patch...

@@ +3476,4 @@
>          bceSafe_ &= ~(BCESet(1) << local);
>      }
>  
> +    void memoryAccessPreliminaries(MemoryAccessDesc* access, AccessCheck* check, RegI32 tls,

nit: it'd be nicer if the function's name contained a verb: how about prepareMemoryAccess?

@@ +3500,5 @@
> +        // memoryBase nor boundsCheckLimit from tls.
> +        MOZ_ASSERT_IF(check->omitBoundsCheck, tls == invalidI32());
> +#endif
> +
> +#ifdef JS_CODEGEN_ARM

nit: Can you invert the order of ifdef WASM_HUGE_MEMORY/ARM, so that the ifndef WASM_HUGE below is just the else of the ifdef WASM_HUGE above?

@@ +3685,5 @@
>  
> +#if defined(JS_CODEGEN_X64) || defined(JS_CODEGEN_ARM)
> +
> +# define ATOMIC_PTR(name, access, tls, ptr)                     \
> +    BaseIndex name(HeapReg, ptr, TimesOne, (access)->offset())

nit: wrap (ptr)

@@ +3690,5 @@
> +
> +#elif defined(JS_CODEGEN_X86)
> +
> +# define ATOMIC_PTR(name, access, tls, ptr)                             \
> +    MOZ_ASSERT(tls != invalidI32());                                    \

nit: and (tls) too, and again below

@@ +3707,2 @@
>      {
> +#if defined(JS_CODEGEN_X86)

Shouldn't there be an early crash if used on 64-bit platforms?

Also: why do we put the implementation inline for 64-bit platforms?

@@ +3722,5 @@
> +        popI64ToSpecific(rv);
> +
> +        AccessCheck check;
> +        RegI32 rp = popMemoryAccess(access, &check);
> +        RegI32 tls = maybeLoadTlsForAccess(check);

Random idea: could maybeLoadTlsForAccess return a Maybe<RegI32>, and then we could .map(freeI32) on it instead of `if (tls != invalidI32())`?

@@ +3738,5 @@
> +            freeI32(tls);
> +        freeI32(rp);
> +
> +#if defined(JS_CODEGEN_X86)
> +        freeI32(specific_ecx);

The asymmetry caused by ScratchEBX is a bit unfortunate here. Why is ScratchEBX actually needed?

@@ +3749,5 @@
> +
> +    MOZ_MUST_USE uint32_t
> +    atomicRMWTemps(AtomicOp op, MemoryAccessDesc* access) {
> +#if defined(JS_CODEGEN_X86)
> +        // Handled specially in atomicRMW

Considering:
- there can be at most 1 temporary
- the special casing of x86 is unfortunate
Could atomicRMWTemps return a Maybe<Reg> instead?

@@ +3754,5 @@
> +        if (access->byteSize() == 1)
> +            return 0;
> +#endif
> +#if defined(JS_CODEGEN_X86) || defined(JS_CODEGEN_X64)
> +        return op == AtomicFetchAddOp || op == AtomicFetchSubOp ? 0 : 1;

nit: () around the condition, please?

@@ +6783,5 @@
> +// be omitted can still be simplified by checking only the pointer if the offset
> +// is aligned.
> +//
> +// (In addition, alignment checking of the pointer can be omitted if the pointer
> +// has been checked in dominating code, but we don't do that yet.)

Nice optimization.

@@ +7200,5 @@
> +    MOZ_ASSERT(sig[0] == MIRType::Pointer);
> +
> +    sync();
> +
> +    uint32_t numArgs = sig.length() - 1;

Can you comment that the -1 is for the instance?

@@ +7209,5 @@
> +
> +    ABIArg instanceArg = reservePointerArgument(baselineCall);
> +
> +    startCallArgs(baselineCall, stackArgAreaSize(sig));
> +    for (uint32_t i=1; i < sig.length(); i++) {

nit: spaces around =

@@ +7223,5 @@
> +    endCall(baselineCall, stackSpace);
> +
> +    popValueStackBy(numArgs);
> +
> +    pushReturned(baselineCall, ExprType::I32);

maybe MOZ_ASSERT sig type is I32, to avoid a bad surprise?

@@ +7254,4 @@
>      if (deadCode_)
>          return true;
>  
> +    emitInstanceCall(lineOrBytecode, SigP_, SymbolicAddress::CurrentMemory);

CurrentMemory isn't an InterModule call, but well :)

@@ +7455,5 @@
> +#endif
> +        RegI32 tls = maybeLoadTlsForAccess(check);
> +        size_t temps = atomicRMWTemps(op, &access);
> +        MOZ_ASSERT(temps <= 1);
> +        RegI32 tmp = temps >= 1 ? needI32() : invalidI32();

nit: could simplify the condition to `temps ?` since it's 0 or 1.

(note it could also use the Maybe<> trick described somewhere else)

@@ +7482,4 @@
>      sync();
>  
> +    allocGPR(eax);
> +    ScratchEBX scratch(*this);           // Already allocated

This introduces asymmetry when freeing (see other comment about it below)

@@ +7503,5 @@
> +
> +    memoryAccessPreliminaries(&access, &check, tls, ptr);
> +    masm.movl(Operand(Address(tls, offsetof(TlsData, memoryBase))), memoryBase);
> +
> +    BaseIndex srcAddr(memoryBase, ptr, TimesOne, access.offset());

can you put this just above the call to atomicRMW64?

@@ +7506,5 @@
> +
> +    BaseIndex srcAddr(memoryBase, ptr, TimesOne, access.offset());
> +
> +    masm.Push(ecx);
> +    masm.Push(ebx);

Could we Push(specific_ecx_ebx) here?

@@ +7508,5 @@
> +
> +    masm.Push(ecx);
> +    masm.Push(ebx);
> +
> +    Address value(esp, 0);

can you put this just above the call to atomicRMW64?

@@ +7642,5 @@
> +
> +#ifdef JS_64BIT
> +    RegI64 rv = popI64();
> +    RegI32 rp = popMemoryAccess(&access, &check);
> +    RegI64 rd = rv;

Looks like an x64 constraint, right? (we're under a JS_64BIT ifdef block)

@@ +7677,5 @@
> +
> +    if (type == ValType::I32)
> +        emitInstanceCall(lineOrBytecode, SigPIIL_, SymbolicAddress::WaitI32);
> +    else
> +        emitInstanceCall(lineOrBytecode, SigPILL_, SymbolicAddress::WaitI64);

If it's not done in the iterator's readWait function, can you MOZ_ASSERT(type == ValType::I64)?

@@ +7678,5 @@
> +    if (type == ValType::I32)
> +        emitInstanceCall(lineOrBytecode, SigPIIL_, SymbolicAddress::WaitI32);
> +    else
> +        emitInstanceCall(lineOrBytecode, SigPILL_, SymbolicAddress::WaitI64);
> +    masm.branchTest32(Assembler::Signed, ReturnReg, ReturnReg, trap(Trap::Throw));

Light suggestion for the previous patch, now that I see better what it means: maybe rename Trap::Throw to Trap::AlreadyThrowing or something like this?
Attachment #8916586 - Flags: review?(bbouvier) → review+
(will review the last patch tomorrow)
(In reply to Benjamin Bouvier [:bbouvier] from comment #25)
> Comment on attachment 8916586 [details] [diff] [review]
> bug1377576-rabaldr.patch
> 
> 
> It's always a bit unclear to me why some things that relate to register
> allocations (usage of EBX byte persona) happen in the codegen functions and
> not at the higher level.

Yes, the baseline compiler is a little out of control now, and as you write, some rules / decisions are much more obscure than we want them to be.  I'm working on some refactorings (bug 1336027) and I'll keep this in mind.  There's a tension between creating too much abstract protocol to avoid machine dependencies (what I'm afraid of, because it hurts comprehensibility) and being too lenient in making exceptions from the rules (what you're experiencing the result of and don't want, because it hurts comprehensibility).
Comment on attachment 8916588 [details] [diff] [review]
bug1377576-baldr.patch

Review of attachment 8916588 [details] [diff] [review]:
-----------------------------------------------------------------

Need to have my eyes looking at it again (because I'm pretty foggy right now), but this looks mostly good, thanks.

::: js/src/jit/MIR.h
@@ +14118,5 @@
> +
> +    explicit MWasmAlignmentCheck(MDefinition* index, uint32_t byteSize,
> +                                 wasm::BytecodeOffset bytecodeOffset)
> +      : MUnaryInstruction(classOpcode, index),
> +        byteSize_(byteSize),

maybe we could assert byteSize is a power of two?

@@ +14122,5 @@
> +        byteSize_(byteSize),
> +        bytecodeOffset_(bytecodeOffset)
> +    {
> +        // Alignment check is effectful: it throws for unaligned.
> +        setGuard();

Equivalent alignment checks could be folded, right?

@@ +14143,5 @@
> +        return bytecodeOffset_;
> +    }
> +};
> +
> +

nit: remove one blank line

@@ +14396,4 @@
>          bytecodeOffset_(bytecodeOffset)
>      {
>          setGuard(); // Not removable
> +        setResultType(ScalarTypeToMIRType(access.type()));

rs=me on renaming all these MIR nodes from MAsmJS to MWasm (separate patch please)

::: js/src/jit/x64/CodeGenerator-x64.cpp
@@ +552,5 @@
>  
> +    if (accessType == Scalar::Int64) {
> +        MOZ_ASSERT(!mir->access().isPlainAsmJS());
> +        masm.compareExchange64(srcAddr, Register64(oldval), Register64(newval),
> +                               Register64(ToRegister(ins->output())));

nit: ToRegister64? (and below too)

::: js/src/jit/x64/Lowering-x64.cpp
@@ +338,5 @@
>      MOZ_ASSERT(base->type() == MIRType::Int32);
>  
> +    // No support for 64-bit operations with constants at the masm level.
> +
> +    bool noConstant = ins->access().type() == Scalar::Int64;

nit: I'd invert the meaning to avoid the double-negative in the ternaries below.

@@ +347,5 @@
>      // LOCK OR, or LOCK XOR.
>  
>      if (!ins->hasUses()) {
> +        LAllocation value =
> +            noConstant ? useRegister(ins->value()) : useRegisterOrConstant(ins->value());

nit:

LAllocation value = noConstant
                    ? useRegister(...)
                    : useRegisterOrConstant(...);

@@ +352,2 @@
>          LAsmJSAtomicBinopHeapForEffect* lir =
> +            new(alloc()) LAsmJSAtomicBinopHeapForEffect(useRegister(base), value);

pre-existing: suggestion: use `auto* lir =` on the LHS?

::: js/src/jit/x86/CodeGenerator-x86.cpp
@@ +605,5 @@
> +    MOZ_ASSERT(ToOutRegister64(ins).high == edx);
> +    MOZ_ASSERT(ToOutRegister64(ins).low == eax);
> +
> +    // We must have the same values in ecx:ebx and edx:eax, but we don't care
> +    // what they are.  It's safe to clobber ecx:ebx for sure.

Is it because useRegister() was used during lowering? /me doubting

::: js/src/jit/x86/LIR-x86.h
@@ +338,5 @@
> +    const wasm::MemoryAccessDesc& access() {
> +        return access_;
> +    }
> +    Maybe<AtomicOp> operation() const {
> +        return op_;

Probably better to have a more explicit API. How about:

isExchange() { return !op_; }
operation() { MOZ_ASSERT(!isExchange()) return *op_ }

ditto for ctor: let's have two and build the Maybe ourselves?

or even better, have an enum augmenting AtomicOp by one value ::Exchange

or even better than better, let's have a LWasmAtomicExchangeI64

::: js/src/wasm/WasmIonCompile.cpp
@@ +778,5 @@
> +    // Only sets *mustFold if it also returns true.
> +    bool needAlignmentCheck(MemoryAccessDesc* access, MDefinition* base, bool* mustFold) {
> +        MOZ_ASSERT(!*mustFold);
> +
> +        if (env_.isAsmJS() || !access->isAtomic())

Can you comment that asm.js validation implies explicit alignment?

@@ +801,4 @@
>          // If the offset is bigger than the guard region, a separate instruction
>          // is necessary to add the offset to the base and check for overflow.
> +        //
> +        // Also fold the offset if we have a Wasm atomic access that needs

Maybe it's my foggy sick brain talking, but isn't "folding the offset" the exact opposite of what's done here? (note the negative in front of `JitOptions.wasmFoldOffsets`)

@@ +807,2 @@
>              auto* ins = MWasmAddOffset::New(alloc(), *base, access->offset(), bytecodeOffset());
>              curBlock_->add(ins);

Can you use the newly introduced effectiveAddress function here?

@@ +835,5 @@
> +    bool checkWaitWakeResult(MDefinition* value) {
> +        auto* zero = MConstant::New(alloc(), Int32Value(0), MIRType::Int32);
> +        if (!zero)
> +            return false;
> +        curBlock_->add(zero);

Slight preference for constant(Int32Value), to make it more symmetric with compare() below (which doesn't need the curBlock_->add bits)?

@@ +836,5 @@
> +        auto* zero = MConstant::New(alloc(), Int32Value(0), MIRType::Int32);
> +        if (!zero)
> +            return false;
> +        curBlock_->add(zero);
> +        auto* cond = compare(value, zero, JSOP_LT, MCompare::Compare_Int32);

Out of curiosity, does it always fold into a sign condition check?

@@ -809,5 @@
>              load = MWasmLoad::New(alloc(), memoryBase, base, *access, ToMIRType(result));
>          }
> -
> -        if (load)
> -            curBlock_->add(load);

what about OOM?

@@ -832,5 @@
>              store = MWasmStore::New(alloc(), memoryBase, base, *access, v);
>          }
> -
> -        if (store)
> -            curBlock_->add(store);

ditto

@@ +903,5 @@
> +            auto* cvtOldv = MWrapInt64ToInt32::New(alloc(), oldv, /* bottomHalf = */ true);
> +            curBlock_->add(cvtOldv);
> +            auto* cvtNewv = MWrapInt64ToInt32::New(alloc(), newv, /* bottomHalf = */ true);
> +            curBlock_->add(cvtNewv);
> +            if (!cvtOldv || !cvtNewv)

It's more verbose, but I prefer the usual pattern: new/check/add, even if it's repeated several times. Happens here and in other functions.

@@ +909,5 @@
> +            oldv = cvtOldv;
> +            newv = cvtNewv;
> +        }
> +
> +        checkOffsetAndAlignmentAndBounds(access, &base);

Could we have the checks done before the wrapping? (here and in other functions)

@@ -848,4 @@
>          MWasmLoadTls* memoryBase = maybeLoadMemoryBase();
> -        auto* cas = MAsmJSCompareExchangeHeap::New(alloc(), bytecodeOffset(), memoryBase, base,
> -                                                   *access, oldv, newv, tlsPointer_);
> -        if (cas)

Wasn't it an OOM check too?

@@ +916,5 @@
> +                                                           base, *access, oldv, newv, tlsPointer_);
> +        curBlock_->add(cas);
> +
> +        if (cas && result == ValType::I64 && access->byteSize() <= 4) {
> +            MOZ_ASSERT(!isSignedIntType(access->type()));

Can you hoist this assert into the previous if() above?

@@ +930,5 @@
>      {
>          if (inDeadCode())
>              return nullptr;
>  
> +        if (result == ValType::I64 && access->byteSize() <= 4) {

Maybe there could be a helper function for this wrapping? (it could even have the assert)

@@ +944,5 @@
> +        MInstruction* xchg = MAsmJSAtomicExchangeHeap::New(alloc(), bytecodeOffset(), memoryBase,
> +                                                           base, *access, value, tlsPointer_);
> +        curBlock_->add(xchg);
> +
> +        if (xchg && result == ValType::I64 && access->byteSize() <= 4) {

And one for the extension?

@@ +2786,5 @@
>      if (!f.iter().readOldAtomicBinOp(&addr, &viewType, &op, &value))
>          return false;
>  
> +    MemoryAccessDesc access(viewType, addr.align, addr.offset, Some(f.bytecodeOffset()),
> +                            0, MembarFull, MembarFull);

Here (and below x2) can you comment what the 0 is?
Attachment #8916588 - Flags: review?(bbouvier) → feedback+
New version of the refactoring-and-prep patch (comment 16, comment 24).

I think this addresses all your concerns; by moving things around they are now better encapsulated, there's less code duplication, and the APIs are cleaner.
Attachment #8916585 - Attachment is obsolete: true
Attachment #8921033 - Flags: review?(bbouvier)
Quick update to the refactoring-and-prep patch (comment 29).

This additionally renames Trap::Throw as Trap::ThrowReported as suggested for the baseline compiler patch (didn't see that comment until just now) and also removes a drive-by fix that belongs with the MemoryObject and will be posted later.
Attachment #8921033 - Attachment is obsolete: true
Attachment #8921033 - Flags: review?(bbouvier)
Attachment #8921047 - Flags: review?(bbouvier)
(In reply to Benjamin Bouvier [:bbouvier] from comment #25)
> Comment on attachment 8916586 [details] [diff] [review]
> bug1377576-rabaldr.patch
> 
> Review of attachment 8916586 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> 
> ::: js/src/wasm/WasmBaselineCompile.cpp
> @@ +3707,2 @@
> >      {
> > +#if defined(JS_CODEGEN_X86)
> 
> Shouldn't there be an early crash if used on 64-bit platforms?

(This is about atomicRMW().)  Actually no.  These methods are used even on 64-bit platforms, when the access size is at most four bytes.

> @@ +3722,5 @@
> > +        popI64ToSpecific(rv);
> > +
> > +        AccessCheck check;
> > +        RegI32 rp = popMemoryAccess(access, &check);
> > +        RegI32 tls = maybeLoadTlsForAccess(check);
> 
> Random idea: could maybeLoadTlsForAccess return a Maybe<RegI32>, and then we
> could .map(freeI32) on it instead of `if (tls != invalidI32())`?

Will investigate this in separate patch, probably around bug 1336027.  It's a good idea, and all the guarding on invalidI32() is really annoying me.

> Why is ScratchEBX actually needed?

I guess the comment is too vague.  It's used for clarity: the use of EBX is necessary (it's wired into some instructions), and by exposing that fact, rather than relying on the otherwise unnamed scratch register happening to be EBX, we make it clear that we hold the register.  I'll flesh out the comment.

> @@ +7506,5 @@
> > +
> > +    BaseIndex srcAddr(memoryBase, ptr, TimesOne, access.offset());
> > +
> > +    masm.Push(ecx);
> > +    masm.Push(ebx);
> 
> Could we Push(specific_ecx_ebx) here?

There does not seem to be a Push(Register64) in the masm ABI (not even hidden in the implementations), but apart from that it is a good idea.
(In reply to Benjamin Bouvier [:bbouvier] from comment #28)
> Comment on attachment 8916588 [details] [diff] [review]
> bug1377576-baldr.patch
> 
> Review of attachment 8916588 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +14122,5 @@
> > +        byteSize_(byteSize),
> > +        bytecodeOffset_(bytecodeOffset)
> > +    {
> > +        // Alignment check is effectful: it throws for unaligned.
> > +        setGuard();
> 
> Equivalent alignment checks could be folded, right?

Yes.  I'll add a congruentTo clause; that should be sufficient, I think?

> @@ +14396,4 @@
> >          bytecodeOffset_(bytecodeOffset)
> >      {
> >          setGuard(); // Not removable
> > +        setResultType(ScalarTypeToMIRType(access.type()));
> 
> rs=me on renaming all these MIR nodes to from MAsmJS to MWasm (separate
> patch please)

OK, will create a patch for that to apply on top of this one.

> ::: js/src/jit/x86/CodeGenerator-x86.cpp
> @@ +605,5 @@
> > +    MOZ_ASSERT(ToOutRegister64(ins).high == edx);
> > +    MOZ_ASSERT(ToOutRegister64(ins).low == eax);
> > +
> > +    // We must have the same values in ecx:ebx and edx:eax, but we don't care
> > +    // what they are.  It's safe to clobber ecx:ebx for sure.
> 
> Is it because useRegister() was used during lowering? /me doubting

Not sure what you mean by that.  ecx:ebx are allocated using tempFixed() and should overlap neither the other inputs (which are useRegister(), not useRegisterAtStart()) nor the outputs (which are also fixed allocations).  I've clarified this in a comment.

> @@ +807,2 @@
> >              auto* ins = MWasmAddOffset::New(alloc(), *base, access->offset(), bytecodeOffset());
> >              curBlock_->add(ins);
> 
> Can you use the newly introduced effectiveAddress function here?

Interestingly I cannot, because that method asserts !asm.js, but the code above may be executed for asm.js atomics as well as for wasm atomics.

Does that suggest there is a problem here?  Or are we OK, since asm.js atomics also trap on OOB, and we just need to be a little more subtle about that assertion?

> @@ +836,5 @@
> > +        auto* zero = MConstant::New(alloc(), Int32Value(0), MIRType::Int32);
> > +        if (!zero)
> > +            return false;
> > +        curBlock_->add(zero);
> > +        auto* cond = compare(value, zero, JSOP_LT, MCompare::Compare_Int32);
> 
> Out of curiosity, does it always fold into a sign condition check?

On x64:
[Codegen] testl      %eax, %eax
[Codegen] jge        .Lfrom98

On ARM:
[Codegen]    f48c8090  e3500000       cmp r0, #0
[Codegen]    f48c8094  aa800000       bge  -> 1004f

Hard to see how it could do better on either platform in terms of instruction count or compactness.

> @@ -809,5 @@
> >              load = MWasmLoad::New(alloc(), memoryBase, base, *access, ToMIRType(result));
> >          }
> > -
> > -        if (load)
> > -            curBlock_->add(load);
> 
> what about OOM?

A good question, since MBasicBlock::add() appears not to handle null pointers, but then what about the unconditional code in eg constant() that also does this?

(Ditto for several subsequent comments.)
(In reply to Lars T Hansen [:lth] from comment #32)
> (In reply to Benjamin Bouvier [:bbouvier] from comment #28)
> > Comment on attachment 8916588 [details] [diff] [review]
> > bug1377576-baldr.patch
> > 
> > Review of attachment 8916588 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > @@ +14122,5 @@
> > > +        byteSize_(byteSize),
> > > +        bytecodeOffset_(bytecodeOffset)
> > > +    {
> > > +        // Alignment check is effectful: it throws for unaligned.
> > > +        setGuard();
> > 
> > Equivalent alignment checks could be folded, right?
> 
> Yes.  I'll add a congruentTo clause; that should be sufficient, I think?

Yes, I think so. (easy to check with iongraph)


> > ::: js/src/jit/x86/CodeGenerator-x86.cpp
> > @@ +605,5 @@
> > > +    MOZ_ASSERT(ToOutRegister64(ins).high == edx);
> > > +    MOZ_ASSERT(ToOutRegister64(ins).low == eax);
> > > +
> > > +    // We must have the same values in ecx:ebx and edx:eax, but we don't care
> > > +    // what they are.  It's safe to clobber ecx:ebx for sure.
> > 
> > Is it because useRegister() was used during lowering? /me doubting
> 
> Not sure what you mean by that.  ecx:ebx are allocated using tempFixed() and
> should overlap neither the other inputs (which are useRegister(), not
> useRegisterAtStart()) or the outputs (which are also fixed allocations). 
> I've clarified in a comment.

I think I was confused by useRegister vs atStart, *again*. Clarifying the comment sounds good, thanks.

> 
> > @@ +807,2 @@
> > >              auto* ins = MWasmAddOffset::New(alloc(), *base, access->offset(), bytecodeOffset());
> > >              curBlock_->add(ins);
> > 
> > Can you use the newly introduced effectiveAddress function here?
> 
> Interestingly I cannot, because that method asserts !asm.js, but the code
> above may be executed for asm.js atomics as well as for wasm atomics.
> 
> Does that suggest there is a problem here?  Or are we OK, since asm.js
> atomics also trap on OOB, and we just need to be a little more subtle about
> that assertion?

Well, since you introduced effectiveAddress, I think you have full control over the assertion. Moreover, I think asm.js atomics aren't "plain", since they get a trapping bytecode offset, so the current assertion is correct. Are you seeing something else?

> Hard to see how it could do better on either platform in terms of
> instruction count or compactness.
Great, I was just asking out of curiosity, thanks for checking!

> 
> > @@ -809,5 @@
> > >              load = MWasmLoad::New(alloc(), memoryBase, base, *access, ToMIRType(result));
> > >          }
> > > -
> > > -        if (load)
> > > -            curBlock_->add(load);
> > 
> > what about OOM?
> 
> A good question, since MBasicBlock::add() appears not to handle null
> pointers, but then what about the unconditional code in eg constant() that
> also does this?
> 
> (Ditto for several subsequent comments.)

Right, it's actually the users of these helper functions that are supposed to check for nulliness; if add() can handle nullptr, it can stay like this then. Somebody has to check for OOM at some point, but I can see the Emit* functions seem to do this already. Sorry for the spurious comments.
Priority: P2 → P1
(In reply to Benjamin Bouvier [:bbouvier] from comment #33)
> (In reply to Lars T Hansen [:lth] from comment #32)
> >
> > > @@ -809,5 @@
> > > >              load = MWasmLoad::New(alloc(), memoryBase, base, *access, ToMIRType(result));
> > > >          }
> > > > -
> > > > -        if (load)
> > > > -            curBlock_->add(load);
> > > 
> > > what about OOM?
> > 
> > A good question, since MBasicBlock::add() appears not to handle null
> > pointers, but then what about the unconditional code in eg constant() that
> > also does this?
> > 
> > (Ditto for several subsequent comments.)
> 
> Right, it's actually the users of these helper functions that are supposed
> to check for nulliness; if add() can handle nullptr, it can stay like this
> then. Somebody has to check for OOM at some point, but I can see the Emit*
> functions seem to do this already. Sorry for the spurious comments.

add() cannot handle a nullptr but New is infallible, so the node pointer is never null; the allocator crashes if it fails.  IonBuilder has no null checks at all, nor does most of WasmIonCompile.

Given how large these graphs can be this is likely mobile-hostile, especially for wasm, but it is what it is.  Maybe there's a trick somewhere I haven't seen that bails us out.
Baseline compiler updated (comment 17, comment 25, comment 27, comment 31).  Carrying bbouvier's r+.
Attachment #8916586 - Attachment is obsolete: true
Attachment #8921788 - Flags: review+
Ion compiler updated (comment 18, comment 28, comment 32, comment 33, comment 34).  I think this addresses all concerns except as noted in previous comments.
Attachment #8916588 - Attachment is obsolete: true
Attachment #8921789 - Flags: review?(bbouvier)
Rename from {L,M}AsmJS{Atomic{Binop,Exchange},CompareExchange}Heap to Wasm ditto.  rs=bbouvier, see earlier comment.
Attachment #8921804 - Flags: review+
Comment on attachment 8921047 [details] [diff] [review]
bug1377576-refactoring-and-prep-v3.patch

Review of attachment 8921047 [details] [diff] [review]:
-----------------------------------------------------------------

Looking so much better, thank you for the patch! The wasm instance methods now look very clean.

::: js/src/builtin/AtomicsObject.cpp
@@ +771,5 @@
> +        w.lower_pri = w.back = &w;
> +        sarb->setWaiters(&w);
> +    }
> +
> +    auto retval = cx->fx.wait(cx, lock.unique(), timeout);

Can you use the explicit type instead of auto, here, please?

@@ +863,5 @@
>  
> +    int64_t woken = 0;
> +
> +    FutexWaiter* waiters = sarb->waiters();
> +    if (waiters && count) {

Not checking for count > 0 here anymore?  Since it's become an int64, the check would still be valid.

@@ +875,5 @@
> +            // Overflow will be a problem only in two cases:
> +            // (1) 128-bit systems with substantially more than 2^64 bytes of
> +            //     memory per process, and a very lightweight
> +            //     Atomics.waitAsync().  Obviously a future problem.
> +            // (2) Bugs.

I like that this list is exhaustive, implied by the second bullet mostly :)

::: js/src/jit/arm/Simulator-arm.cpp
@@ +2550,5 @@
>      }
>  }
>  
> +static
> +int64_t

nit: `static int64_t` (goes on the same line)

::: js/src/wasm/WasmInstance.cpp
@@ +385,5 @@
> +
> +    int64_t woken = atomics_wake_impl(instance->sharedMemoryBuffer(), offset, int64_t(count));
> +
> +    // Spec is unclear here, and subject to change.  For now, clamp.
> +    int32_t result = int32_t(Min(woken, int64_t(INT32_MAX)));

nit: only used once, maybe just inline it in the return statement?
Attachment #8921047 - Flags: review?(bbouvier) → review+
Comment on attachment 8921789 [details] [diff] [review]
bug1377576-baldr-v2.patch

Review of attachment 8921789 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks!

::: js/src/jit/arm/CodeGenerator-arm.cpp
@@ +3532,5 @@
> +      case AtomicFetchXorOp: masm.atomicFetchXor64(value, addr, tmp, out); break;
> +      default:               MOZ_CRASH();
> +    }
> +}
> +

nit: blank line at end of file

::: js/src/jit/arm/LOpcodes-arm.h
@@ +25,5 @@
>      _(WasmTruncateToInt64)         \
> +    _(WasmAtomicLoadI64)           \
> +    _(WasmAtomicStoreI64)          \
> +    _(WasmCompareExchangeI64)      \
> +    _(WasmAtomicBinopI64)          \

please add WasmAtomicExchangeI64 here too and use it in place of WasmAtomicBinopI64 with no op

::: js/src/jit/x64/CodeGenerator-x64.cpp
@@ +610,3 @@
>  
> +    if (accessType == Scalar::Int64) {
> +        Register64 val = Register64(ToRegister(value));

nit: ToRegister64(value)

::: js/src/jit/x86/CodeGenerator-x86.cpp
@@ +672,5 @@
> +void
> +CodeGeneratorX86::visitWasmAtomicExchangeI64(LWasmAtomicExchangeI64* ins)
> +{
> +    uint32_t offset = ins->access().offset();
> +    MOZ_ASSERT(offset < wasm::OffsetGuardLimit);

nit: can be removed (done in emitWasmStoreOrExchangeAtomicI64)

::: js/src/wasm/WasmIonCompile.cpp
@@ +782,5 @@
> +        // asm.js accesses are always aligned and need no check.
> +        if (env_.isAsmJS() || !access->isAtomic())
> +            return false;
> +
> +        if (base->isConstant()) {

The comment in bug 1410429 lets me think it applies here too: there could be a MWasmAlignmentCheck::foldsTo method that does pretty much the same thing, with the extra knowledge the compiler has amassed during all the optimization passes before GVN. OK to do later, though.

@@ +939,5 @@
>  
> +        checkOffsetAndAlignmentAndBounds(access, &base);
> +
> +        if (isSmallerAccessForI64(result, access)) {
> +            MOZ_ASSERT(!isSignedIntType(access->type()));

nit: can remove, it's asserted in isSmallerAccessForI64

@@ +968,5 @@
>  
> +        checkOffsetAndAlignmentAndBounds(access, &base);
> +
> +        if (isSmallerAccessForI64(result, access)) {
> +            MOZ_ASSERT(!isSignedIntType(access->type()));

ditto
Attachment #8921789 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #38)
> Comment on attachment 8921047 [details] [diff] [review]
> bug1377576-refactoring-and-prep-v3.patch
> 
> Review of attachment 8921047 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> @@ +863,5 @@
> >  
> > +    int64_t woken = 0;
> > +
> > +    FutexWaiter* waiters = sarb->waiters();
> > +    if (waiters && count) {
> 
> Not checking anymore for count > 0 here? since it's become an int64, the
> check would still look valid.

This code is correct but your comment uncovered a bug elsewhere, thank you :)

Notice that we represent the count of waiters to wake as an int64.  "All waiters" is represented as any negative value.  To implement that, we check count!=0 here and at loop exit, and we only decrement the count if it is positive.  In the old code, the count was a double that was always nonnegative, with infinity representing "all waiters", so there count>0 would be the right test.  But the change in semantics (to using int64) is to benefit wasm.  I suppose I could have kept count as a double, so that I could have continued to use infinity, but done's done.

The bug this uncovered is that the entry point from JS just does int64_t(count), where count is a nonnegative double but can be large or even infinite.  If the magnitude of count is too large this is UB (http://en.cppreference.com/w/cpp/language/implicit_conversion, "Floating-integral conversion"), so I need to do better to get predictable behavior.
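
One way to make that conversion well-defined (a sketch, not necessarily the landed fix): any value at or above 2^63, including +infinity, is unrepresentable in int64_t, so map it to the "wake all" encoding before casting.  This assumes, per the above, that count is already a nonnegative double.

#include <cstdint>

static int64_t toWakeCount(double count) {
    if (count >= 9223372036854775808.0)  // 2^63; also catches +infinity
        return -1;                       // negative means "wake all"
    return int64_t(count);               // now in range, conversion is defined
}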
Blocks: 1413513
(In reply to Benjamin Bouvier [:bbouvier] from comment #39)
> Comment on attachment 8921789 [details] [diff] [review]
> bug1377576-baldr-v2.patch
> 
> ::: js/src/jit/x64/CodeGenerator-x64.cpp
> @@ +610,3 @@
> >  
> > +    if (accessType == Scalar::Int64) {
> > +        Register64 val = Register64(ToRegister(value));
> 
> nit: ToRegister64(value)

Sadly no; ToRegister64 requires an LInt64Allocation, which we don't have (and shouldn't have) here.  The reason for the Register64() cast is the masm interface, which uses the Register64 type for disambiguation.  Otherwise we'd just stick with Register, as most of the 64-bit code does.

> ::: js/src/wasm/WasmIonCompile.cpp
> @@ +782,5 @@
> > +        // asm.js accesses are always aligned and need no check.
> > +        if (env_.isAsmJS() || !access->isAtomic())
> > +            return false;
> > +
> > +        if (base->isConstant()) {
> 
> The comment in bug 1410429 lets me think it applies here too: there could be
> a MWasmAlignmentCheck::foldsTo method that does pretty much the same, with
> extra knowledge that the compiled amassed during all the optimization passes
> before GVN. OK to do later, though.

Filed bug 1413513 to track this opportunity.
Test cases for atomic operations.  Observe:

- verification and parsing were tested by a previous patch on this bug (test-cases-verification-and-conversion)
- inter-agent side effects are tested in connection with bug 1412852 (cloning)
- the memory object functionality is tested in bug 1389464 (shared memory)
- wait and wake operations will be tested by some web-platform-tests (need postMessage and workers)
Attachment #8916589 - Attachment is obsolete: true
Attachment #8924151 - Flags: review?(bbouvier)
Comment on attachment 8924151 [details] [diff] [review]
bug1377576-test-cases-atomics-v2.patch

Review of attachment 8924151 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, and sorry for the long review time! I think the tests are not ideal as is, because they might be hiding side effects of previous operations. I suggest another approach here with more functions, which shouldn't be too hard to handle with JS generating those tests.

::: js/src/jit-test/tests/wasm/atomic.js
@@ +1,5 @@
>  if (!hasWasmAtomics())
>      quit(0);
>  
> +const oob = /index out of bounds/;
> +const align = /unaligned memory access/;

nit: rename it to `unaligned`, maybe?

@@ +195,5 @@
> +	  (drop (${val}.atomic.rmw${view}.or ${addr} (${val}.const 0x67)))
> +	  (drop (${val}.atomic.rmw${view}.xor ${addr} (${val}.const 0xFF)))
> +	  (${val}.atomic.rmw${view}.xchg ${addr} (${val}.const 1))
> +	  (${val}.atomic.load${view} ${addr})
> +	  ${val}.add)

How about having a single function per opcode and testing its result? In this case, it's hard to glance at the code and be convinced that the result of this test isn't only conditioned by the last few opcodes; that is, whatever we'd put before, would we get the same returned result or not? Also, it makes it harder to keep track of what the expected value should be after each operation. Not saying these computations are hard per se; just that they're less obvious than what an assertNum/assertEq would provide.

@@ +237,5 @@
> +	      (${val}.atomic.store${view} ${addr} (${val}.const ${init}))
> +	      (${val}.atomic.rmw${view}.${op} ${addr} (${val}.const ${operand}))
> +	      (${val}.atomic.load${view} ${addr})
> +	      ${val}.add)
> +	     (export "f" 0))`;

Ditto here. There are many ways to have an implementation testing precisely what you want: we could store results in globals, and then have getters read these globals; or have wasm call into an FFI that does the assertions, etc.

@@ +339,5 @@
> +
> +	// Both out-of-bounds and unaligned.  The spec may leave it undefined
> +	// whether we see the OOB message or the unaligned message.  In Firefox,
> +	// the unaligned check comes first.  It doesn't have to be this way: On
> +	// an ARM with strict alignment checking we'll get a SEGV either way and

I don't know how atomic accesses behave in the real world, but for non-atomic accesses a SIGBUS will be raised on unaligned accesses, not a SEGV.

@@ +354,5 @@
> +		assertErrorMessage(() => opModule(type, ext, offset, op)(base), RuntimeError, re);
> +		assertErrorMessage(() => opModuleForEffect(type, ext, offset, op)(base), RuntimeError, re);
> +	    }
> +	    assertErrorMessage(() => cmpxchgModule(type, ext, offset)(base), RuntimeError, re);
> +	}

I like these tests.

@@ +388,5 @@
> +assertErrorMessage(() => wasmEvalText(`(module (memory 1 1 shared)
> +					(func (param i32) (result i32)
> +					 (wake (get_local 0) (i32.const 1)))
> +					(export "" 0))`).exports[""](65536),
> +		   RuntimeError, oob);

Test unaligned for wake too?
Attachment #8924151 - Flags: review?(bbouvier) → feedback+
(In reply to Benjamin Bouvier [:bbouvier] from comment #43)
> Comment on attachment 8924151 [details] [diff] [review]
> bug1377576-test-cases-atomics-v2.patch
> 
> Review of attachment 8924151 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +237,5 @@
> > +	      (${val}.atomic.store${view} ${addr} (${val}.const ${init}))
> > +	      (${val}.atomic.rmw${view}.${op} ${addr} (${val}.const ${operand}))
> > +	      (${val}.atomic.load${view} ${addr})
> > +	      ${val}.add)
> > +	     (export "f" 0))`;
> 
> Ditto here. There are many ways to have an implementation testing precisely
> what you want: we could store results in globals, and then have getters read
> these globals; or have wasm call into an FFI that does the assertions, etc.

I understand (and agree with) your objection to the other rmw cases, but here I'm mystified, since these are quite simple and essentially impossible to misunderstand.  But I'll see what I can do.

> @@ +388,5 @@
> > +assertErrorMessage(() => wasmEvalText(`(module (memory 1 1 shared)
> > +					(func (param i32) (result i32)
> > +					 (wake (get_local 0) (i32.const 1)))
> > +					(export "" 0))`).exports[""](65536),
> > +		   RuntimeError, oob);
> 
> Test unaligned for wake too?

Yeah, I was auditing this patch late last week and found that wake has an obscure and technically unnecessary alignment requirement.  (Also fixed in the verification tests, and in the code, though those are in different patches.)
Updates the test cases.  I believe this addresses all your concerns; it takes a while longer to run but it tests many more combinations of arguments, so overall this seems like a significant improvement.
Attachment #8924151 - Attachment is obsolete: true
Attachment #8925571 - Flags: review?(bbouvier)
Comment on attachment 8925571 [details] [diff] [review]
bug1377576-test-cases-atomics-v3.patch

Review of attachment 8925571 [details] [diff] [review]:
-----------------------------------------------------------------

Some Linus T. programmer once said "If you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.", but I guess this person didn't test as extensively as this patch does. Thanks for the patch, looks good to me; feel free to include correctness tests for load/store too if they're not somewhere else already.

::: js/src/jit-test/tests/wasm/atomic.js
@@ +360,5 @@
> +	       (${type}.const 0))
> +	      (export "" 0))`).exports[""];
> +    },
> +
> +    storeModule(type, ext, offset) {

We don't have correctness tests for loadModule/loadModuleIgnored/storeModule anymore, right? Or they're in a previous patch maybe?
Attachment #8925571 - Flags: review?(bbouvier) → review+
Depends on: 1416206
(In reply to Benjamin Bouvier [:bbouvier] from comment #46)
> Comment on attachment 8925571 [details] [diff] [review]
> bug1377576-test-cases-atomics-v3.patch
> 

> ::: js/src/jit-test/tests/wasm/atomic.js
> @@ +360,5 @@
> > +	       (${type}.const 0))
> > +	      (export "" 0))`).exports[""];
> > +    },
> > +
> > +    storeModule(type, ext, offset) {
> 
> We don't have correctness tests for loadModule/loadModuleIgnored/storeModule
> anymore, right? Or they're in a previous patch maybe?

By correctness tests, you mean validation, text <-> binary, and so on?  They are in this file, but they are in a separate patch, "bug1377576-test-cases-verification-and-conversion-v2.patch".

(If you had something else in mind, please let me know.)
(In reply to Lars T Hansen [:lth] from comment #47)
> (In reply to Benjamin Bouvier [:bbouvier] from comment #46)
> > Comment on attachment 8925571 [details] [diff] [review]
> > bug1377576-test-cases-atomics-v3.patch
> > 
> 
> > ::: js/src/jit-test/tests/wasm/atomic.js
> > @@ +360,5 @@
> > > +	       (${type}.const 0))
> > > +	      (export "" 0))`).exports[""];
> > > +    },
> > > +
> > > +    storeModule(type, ext, offset) {
> > 
> > We don't have correctness tests for loadModule/loadModuleIgnored/storeModule
> > anymore, right? Or they're in a previous patch maybe?
> 
> By correctness tests, you mean validation, text <-> binary, and so on?  They
> are in this file, but they are in a separate patch,
> "bug1377576-test-cases-verification-and-conversion-v2.patch".
> 
> (If you had something else in mind, please let me know.)

I meant testing that these functions behave correctly at times where they're expected to behave correctly (i.e. in bounds and aligned). Probably other tests already implicitly rely on this, but having separate tests would be a nice-to-have. As you prefer.
Blocks: 1416723
(In reply to Benjamin Bouvier [:bbouvier] from comment #48)
> (In reply to Lars T Hansen [:lth] from comment #47)
> 
> I meant testing that these functions behave correctly at times where they're
> expected to behave correctly (i.e. in bounds and aligned). Probably other
> tests already implicitly rely on this, but having separate tests would be a
> nice-to-have. As you prefer.

Oh, interesting oversight on my part.  We *mostly* have tests for these since we *mostly* use the existing load and store paths for the atomic operations and generate identical code (apart from memory barrier instructions), but there are entirely new paths for 64-bit atomic load and store on 32-bit systems where we have to use CMPXCHG8B on x86 and a LDREXD/STREXD loop on ARM.

I'll write some tests; if they turn out to be hairy I'll put them up for a separate review.
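
For the record, the trick both new paths rely on, illustrated with std::atomic rather than the engine's code: a 64-bit atomic load on a 32-bit target is a compare-exchange that replaces the value with itself.  Whether the compare succeeds or fails, 'expected' ends up holding the current memory contents; on x86 this is the LOCK CMPXCHG8B idiom and on ARM it compiles to an LDREXD/STREXD retry loop.

#include <atomic>
#include <cstdint>

static uint64_t atomicLoad64(std::atomic<uint64_t>* p) {
    uint64_t expected = 0;
    p->compare_exchange_strong(expected, expected);  // store only if unchanged
    return expected;                                 // the observed value
}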
Also xchg has very poor coverage in the current test suite.
Also, we were only testing one-byte values.  Adding widening of the operand to the TA width found a bug in the x64 masm, where a movl should have been a movq.
And testing xchg better found a (probably old) bug on ARM, where we were too aggressive with useRegisterAtStart - this is a poor fit for the atomic ops, since they tend to involve LL/SC loops.
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a80521fee641
Define atomic ops, add to verifier and test cases, stub out in compilers. r=sunfish
https://hg.mozilla.org/integration/mozilla-inbound/rev/21fb0572b28c
Define text-to-binary machinery. r=sunfish
https://hg.mozilla.org/integration/mozilla-inbound/rev/e13ccb86b417
Define binary-to-text machinery. r=sunfish
https://hg.mozilla.org/integration/mozilla-inbound/rev/37c0194e2b9a
Test cases for thread ops: verification, text-to-binary, binary-to-text. r=sunfish
https://hg.mozilla.org/integration/mozilla-inbound/rev/de907393db02
Assembler/MacroAssembler support for wasm atomics. r=sunfish
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d224bf532b5
Preparatory refactoring and extensions for wasm atomics. r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/443ff6e4b7f1
Baseline support for wasm atomics. r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/cc6c341c68f8
Ion support for wasm atomics. r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/6415237dbdb0
Test cases for wasm atomics. r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff11b23cf898
Rename from AsmJSAtomic etc to WasmAtomic etc, rs=bbouvier
No longer depends on: 1516738