Closed Bug 1073096 Opened 6 years ago Closed 5 years ago

Atomics and locks for SharedArrayBuffer: asm.js

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 3 obsolete files)

Implementation item: Odin and asm.js support for the Atomics methods.

This bug was forked from bug 979594 because the discussion there was getting unwieldy.  For comments on early versions of the patches that were posted there, see these:

https://bugzilla.mozilla.org/show_bug.cgi?id=979594#c37
https://bugzilla.mozilla.org/show_bug.cgi?id=979594#c44
Changes to OdinMonkey + assembler bits for x86, x64, and "none".  Somewhat updated version of the patch that was previously hanging off bug 979594.
Attachment #8495418 - Flags: review?(luke)
Attached patch bug979594-AtomicsOdinARM.diff (obsolete) — Splinter Review
ARM assembler bits for the OdinMonkey changes.
Attachment #8495419 - Flags: review?(luke)
(Oops, bug numbers are wrong in patch headings and file names, consider it fixed.)
Comment on attachment 8495419 [details] [diff] [review]
bug979594-AtomicsOdinARM.diff

Review of attachment 8495419 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/CodeGenerator-arm.cpp
@@ +1888,5 @@
>        case Scalar::Float32: isFloat = true;   size = 32; break;
>        default: MOZ_CRASH("unexpected array type");
>      }
>  
> +    memoryBarrier(mir->barrierBefore());

Adding barriers to heap accesses does not look like the right approach - if that is what this is doing. These are very performance critical operations.
(In reply to Douglas Crosher [:dougc] from comment #4)
> Comment on attachment 8495419 [details] [diff] [review]
> bug979594-AtomicsOdinARM.diff
> 
> Review of attachment 8495419 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/arm/CodeGenerator-arm.cpp
> @@ +1888,5 @@
> >        case Scalar::Float32: isFloat = true;   size = 32; break;
> >        default: MOZ_CRASH("unexpected array type");
> >      }
> >  
> > +    memoryBarrier(mir->barrierBefore());
> 
> Adding barriers to heap accesses does not look like the right approach - if
> that is what this is doing. These are very performance critical operations.

It is the right approach, but there is a bug in the implementation of the memoryBarrier() function so that it does not properly quash the barrier if the flags are all zero (which it will be for all normal heap accesses).  See the ARM patch hanging off bug 979594.
Comment on attachment 8495418 [details] [diff] [review]
bug979594-AtomicsOdinSupport.diff

Review of attachment 8495418 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is beautiful; I really appreciate how it matches the overall Odin style.  Mostly just nits, but a few comments aren't so I'd appreciate getting to review the revised patch.

::: js/src/asmjs/AsmJSModule.h
@@ +236,5 @@
>      class Global
>      {
>        public:
> +        enum Which { Variable, FFI, ArrayView, SharedArrayView, MathBuiltinFunction, AtomicsBuiltinFunction,
> +                     Constant, SimdCtor, SimdOperation};

I think this has to be rewrapped to 100 chars.

::: js/src/asmjs/AsmJSValidate.cpp
@@ +1301,5 @@
>          unsigned column;
>      };
>  
>      typedef HashMap<PropertyName*, MathBuiltin> MathNameMap;
> +    typedef HashMap<PropertyName*, AtomicsBuiltin> AtomicsNameMap;

I was just about to say "I appreciate you following the style of MathBuiltin here, but how about just using AsmJSAtomicsBuiltinFunction as the value of this map and removing AtomicsBuiltin", but then I remembered that there might actually be Atomics constants if we ever include the futex API into stdlib (Atomics.TIMEDOUT et al).  Is this your intention?

@@ +2716,5 @@
> +        if (inDeadCode())
> +            return nullptr;
> +        MAsmJSLoadHeap *load = MAsmJSLoadHeap::New(alloc(), vt, ptr,
> +                                                   0,
> +                                                   MembarLoadLoad|MembarLoadStore);

Rather than having these barrierBefore/After flag fields, could both MAsmJSLoad/StoreHeap have enum { UNORDERED, SEQ } (or something) parameters?  I'm having a hard time mapping from the abstract requirement that "Atomics.load is done atomically" and why the exact use of Membar here and in Store is the right one; the abstraction isn't very clear to me.  Having the CodeGenerator take care of this (where I can reason in terms of the underlying arch) would I think be easier to understand.

@@ +2718,5 @@
> +        MAsmJSLoadHeap *load = MAsmJSLoadHeap::New(alloc(), vt, ptr,
> +                                                   0,
> +                                                   MembarLoadLoad|MembarLoadStore);
> +        curBlock_->add(load);
> +        // Signal handler must execute a fence.

This comment is a bit cryptic/scary if I hadn't recalled a conversation we had about this elsewhere.  Perhaps you could either leave it, move it to the signal-handler code that does the fence, or explain in more detail (and have the other 3 comments refer to this one)?

@@ +4224,1 @@
>          return f.fail(viewName, "base of array access must be a typed array view name");

How about adding a ModuleCompiler::Global::isAnyArrayView() (otherwise we'll also need ugly curlies around the then-branch :)

@@ +4614,5 @@
> +{
> +    if (!CheckArrayAccess(f, viewName, indexExpr, viewType, pointerDef, needsBoundsCheck))
> +        return false;
> +
> +    // Atomic accesses only on shared integer arrays.

How about "Atomic accesses may only be made of shared integer arrays."?

@@ +4765,5 @@
> +    *type = Type::Signed;
> +    return true;
> +}
> +
> +// On return, *type will be Signed or Void

If noone depends on this invariant (which will be esp true after CheckCoercedAtomicsBuiltinCall reuses CoerceResult), I'd remove the comment.

@@ +4787,5 @@
> +
> +static bool
> +CheckCoercedAtomicsBuiltinCall(FunctionCompiler &f, ParseNode *callNode,
> +                               AsmJSAtomicsBuiltinFunction func, RetType retType,
> +                               MDefinition **def, Type *type)

I think you wrote this patch in the middle of some call-coercion changes and refactoring.  You should be able to reuse CoerceResult() now in this function to avoid the manual retType type-case (see CheckCoercedMathBuiltinCall and CheckCoercedSimdCall).

::: js/src/jit-test/tests/asm.js/testAtomics.js
@@ +22,5 @@
> +
> +    // Load element 0
> +    function do_load() {
> +	var v = 0;
> +	v = atomic_load(i32a, 0)|0;

To wit, stdlib (SIMD, Math, Atomic) calls no longer require callsite coercions; the return value is inferred from the name of the op (as one might've initially hoped :).

::: js/src/jit/MIR.h
@@ +11301,5 @@
>    public:
>      INSTRUCTION_HEADER(AsmJSLoadHeap);
>  
> +    static MAsmJSLoadHeap *New(TempAllocator &alloc, Scalar::Type vt, MDefinition *ptr,
> +                               int barrierBefore=0, int barrierAfter=0)

SM style is spaces before and after operators (= in this case).  Also for MAsmJSStoreHeap::New.

::: js/src/jit/shared/Lowering-x86-shared.cpp
@@ +528,5 @@
> +    // For the 8-bit variants XADD needs a byte register for the
> +    // output only, we can still set up with movl; just pin the output
> +    // to eax (or ebx / ecx / edx).
> +    //
> +    // For AND/OR/XOR we need to use a CMPXCHG loop:

Perhaps a naive question here: the Intel manual seems to indicate that the LOCK prefix can be used for AND, OR and XOR.  Why can they not be used?
Attachment #8495418 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #6)
> Comment on attachment 8495418 [details] [diff] [review]
> bug979594-AtomicsOdinSupport.diff
> 
> Review of attachment 8495418 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/asmjs/AsmJSValidate.cpp
> @@ +1301,5 @@
> >          unsigned column;
> >      };
> >  
> >      typedef HashMap<PropertyName*, MathBuiltin> MathNameMap;
> > +    typedef HashMap<PropertyName*, AtomicsBuiltin> AtomicsNameMap;
> 
> I was just about to say "I appreciate you following the style of MathBuiltin
> here, but how about just using AsmJSAtomicsBuiltinFunction as the value of
> this map and removing AtomicsBuiltin", but then I remembered that there
> might actually be Atomics constants if we ever include the futex API into
> stdlib (Atomics.TIMEDOUT et al).  Is this your intention?

Heh.  Originally there was a provision for constants, because I had intended to inline TIMEDOUT et al.  But that need disappeared because the futex functions are most easily reached via the FFI, and Jukka has another technique for exposing the constant values.  So this is really an oversight, it is code that should have been reverted further than it was, exactly as you suggest.

> 
> @@ +2716,5 @@
> > +        if (inDeadCode())
> > +            return nullptr;
> > +        MAsmJSLoadHeap *load = MAsmJSLoadHeap::New(alloc(), vt, ptr,
> > +                                                   0,
> > +                                                   MembarLoadLoad|MembarLoadStore);
> 
> Rather than having these barrierBefore/After flag fields, could both
> MAsmJSLoad/StoreHeap have enum { UNORDERED, SEQ } (or something) parameters?
> I'm having a hard time mapping from the abstract requirement that
> "Atomics.load is done atomically" and why the exact use of Membar here and
> in Store is the right one; the abstraction isn't very clear to me.  Having
> the CodeGenerator take care of this (where I can reason in terms of the
> underlying arch) would I think be easier to understand.

I won't do quite that, because I think it'll add more complexity than it will take away, but I'll introduce useful abstractions of these bit combinations so that it's easier to specify what's needed.  (For example, there will be predefined bit combinations for BeforeLoad and AfterLoad, ditto for store).

> @@ +4787,5 @@
> > +
> > +static bool
> > +CheckCoercedAtomicsBuiltinCall(FunctionCompiler &f, ParseNode *callNode,
> > +                               AsmJSAtomicsBuiltinFunction func, RetType retType,
> > +                               MDefinition **def, Type *type)
> 
> I think you wrote this patch in the middle of some call-coercion changes and
> refactoring.  

You're right!  It was hell, every update resulted in conflicts.

> You should be able to reuse CoerceResult() now in this
> function to avoid the manual retType type-case (see
> CheckCoercedMathBuiltinCall and CheckCoercedSimdCall).

I will look into that.

> ::: js/src/jit/shared/Lowering-x86-shared.cpp
> @@ +528,5 @@
> > +    // For the 8-bit variants XADD needs a byte register for the
> > +    // output only, we can still set up with movl; just pin the output
> > +    // to eax (or ebx / ecx / edx).
> > +    //
> > +    // For AND/OR/XOR we need to use a CMPXCHG loop:
> 
> Perhaps a naive question here: the Intel manual seems to indicate that the
> LOCK prefix can be used for AND, OR and XOR.  Why can they not be used?

They can, but only as a special case optimization (though the special case may be the common case).  Further explained in bug 979594 comment 55.  The intent is to implement that optimization eventually.
(In reply to Lars T Hansen [:lth] from comment #7)
> I won't do quite that, because I think it'll add more complexity than it
> will take away, but I'll introduce useful abstractions of these bit
> combinations so that it's easier to specify what's needed.  (For example,
> there will be predefined bit combinations for BeforeLoad and AfterLoad,
> ditto for store).

What I'm trying to understand is why, if there are only two ultimate configurations of MAsmJSLoadHeap (synchronized and not) why we need to have the generic before/after flags?  (Same goes for MAsmJSStoreHeap.)
(In reply to Luke Wagner [:luke] from comment #8)
> (In reply to Lars T Hansen [:lth] from comment #7)
> > I won't do quite that, because I think it'll add more complexity than it
> > will take away, but I'll introduce useful abstractions of these bit
> > combinations so that it's easier to specify what's needed.  (For example,
> > there will be predefined bit combinations for BeforeLoad and AfterLoad,
> > ditto for store).
> 
> What I'm trying to understand is why, if there are only two ultimate
> configurations of MAsmJSLoadHeap (synchronized and not) why we need to have
> the generic before/after flags?  (Same goes for MAsmJSStoreHeap.)

The point of the generic before/after flags is that those bits will eventually have to be synthesized anyway because they are feeding into the code generator's emitter for barrier instructions (the barriers are separated from the load/store in the Lowering phase).  Generating those bits at a fairly high level means the mapping does not have to be repeated in each architecture's lowering pass, which is what would otherwise happen.

The names of the barriers (LoadLoad etc) are well established and IMO it's just the specific combination of bits going with the specific part of an operation that can be a bit surprising, which is why I suggested that I abstract the bit sets into predefined values that go with their specific use.
(In reply to Lars T Hansen [:lth] from comment #9)
I see, that makes sense.
Comment on attachment 8495419 [details] [diff] [review]
bug979594-AtomicsOdinARM.diff

Probably should give this review to mjrosenb or dougc.
Attachment #8495419 - Flags: review?(luke)
I believe this addresses all the review comments.  The base patches from bug 979594 have already landed.
Attachment #8495418 - Attachment is obsolete: true
Attachment #8495419 - Attachment is obsolete: true
Attachment #8511000 - Flags: review?(luke)
Attachment #8511002 - Flags: review?(dtc-moz)
Comment on attachment 8511002 [details] [diff] [review]
bug1073096-AtomicsOdinARM.diff

Review of attachment 8511002 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: js/src/jit/arm/CodeGenerator-arm.cpp
@@ +2043,5 @@
> +        Register out = ToRegister(ins->output());
> +        maybeCmpOffset = bo.getOffset();
> +        masm.ma_b(&goahead, Assembler::LessThan);
> +        memoryBarrier(MembarFull);
> +        masm.as_eor(out,out,O2Reg(out));

nit: spaces after the commas.

@@ +2049,5 @@
> +        masm.bind(&goahead);
> +    }
> +    masm.compareExchangeToTypedIntArray(vt == Scalar::Uint32 ? Scalar::Int32 : vt,
> +                                        srcAddr, oldval, newval, InvalidReg,
> +                                        ToAnyRegister(ins->output()));

I am a little curious why ToAnyRegister is needed here, but can explore it myself later.

@@ +2079,5 @@
> +        Register out = ToRegister(ins->output());
> +        maybeCmpOffset = bo.getOffset();
> +        masm.ma_b(&goahead, Assembler::LessThan);
> +        memoryBarrier(MembarFull);
> +        masm.as_eor(out,out,O2Reg(out));

nit: spaces after the commas.
Attachment #8511002 - Flags: review?(dtc-moz) → review+
Comment on attachment 8511000 [details] [diff] [review]
bug1073096-AtomicsOdinSupport.diff

Review of attachment 8511000 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly nits, but I think I found one or two bugs.

::: js/src/asmjs/AsmJSValidate.cpp
@@ +2757,5 @@
>          if (chk == NO_BOUNDS_CHECK || m().usesSignalHandlersForOOB())
>              store->setSkipBoundsCheck(true);
>      }
>  
> +    // The type is a bitwise 'or' of the barrier types in MMemoryBarrier

Instead of passing around ints (which are relatively untyped and require comments like this), what about passing around MMemoryBarrier and defining operator| and operator& for this type?

@@ +2779,5 @@
> +        // the Atomics spec says that any access that does not throw -
> +        // and out-of-range accesses do not throw - must execute the
> +        // fence, so as to not make synchronization dependent on
> +        // whether the access is in range or not.  Also see code in
> +        // AsmJSSignalHandlers.cpp.

IIUC, the load/store cases are fine even without the signal handler fence: if they trap, then the signal handler will advance pc to right after the load/store, which is the fence instruction.  IIUC, it is only cas and the atomic binops where faulting will skip the atomic op.  Anyhow, it seems like this comment would be better placed in CodeGeneratorX64::visit(CompareExchangeHeap|AtomicBinopHeap).

@@ +3724,1 @@
>          *type = Scalar::Int8;

I'd rather not have a function interface that relies on the caller having set *shared = false.  Could you do that at the beginning of the function instead?

@@ +4707,5 @@
>  }
>  
> +static bool
> +CheckSharedArrayAtomicAccess(FunctionCompiler &f, ParseNode *viewName, ParseNode *indexExpr,
> +                             Scalar::Type *viewType, MDefinition** pointerDef, NeedsBoundsCheck *needsBoundsCheck)

'needsBoundsCheck' needs to go on a third line.

@@ +4716,5 @@
> +    // Atomic accesses may be made on shared integer arrays only.
> +
> +    // The global will be sane, CheckArrayAccess checks it.
> +    const ModuleCompiler::Global *global = f.lookupGlobal(viewName->name());
> +    if (global->which() != ModuleCompiler::Global::SharedArrayView)

I think it would be a little strange if we allow both shared and unshared views in an asm.js module.  For one thing, it couldn't possibly validate.  (On that subject, it seems like we're missing a link-time validation check that the given ArrayBuffer's shared-ness matches the view's shared-ness.)

Given all that, could we instead have another flag on AsmJSModule (complementary to hasArrayView_) that indicates the views are shared?  Then, we'd only allow adding views of the same shared-ness, after the first view was added.  In that case, we could avoid introducing a whole separate path for "shared" array views in AsmJSModule, rather, we'd just have "views" (shared or unshared); when anyone cared about sharedness, they'd check the module 'shared views' flag.

@@ +5777,5 @@
> +                               AsmJSAtomicsBuiltinFunction func, RetType retType,
> +                               MDefinition **def, Type *type)
> +{
> +    MDefinition *operand;
> +    Type actualRetType;

Could you name these resultDef/resultType?

::: js/src/jit/x64/CodeGenerator-x64.cpp
@@ +438,5 @@
> +    if (value->isConstant())
> +        masm.atomicBinopToTypedIntArray(op, vt == Scalar::Uint32 ? Scalar::Int32 : vt, Imm32(ToInt32(value)), srcAddr, temp, InvalidReg, ToAnyRegister(ins->output()));
> +    else
> +        masm.atomicBinopToTypedIntArray(op, vt == Scalar::Uint32 ? Scalar::Int32 : vt, ToRegister(value), srcAddr, temp, InvalidReg, ToAnyRegister(ins->output()));
> +    uint32_t after = masm.size();

How is out-of-bounds supposed to work here (esp in the bitop case)?  I was thinking that on fix might be to, until we try to really optimize atomic binops, always do a bounds check.  (Note: this makes the patching logic in AsmJSModule::initHeap, since we can't just use usesSignalHandlersForOOB() anymore.)  Otherwise, we'd need to capture the offsets of the individual operations.

Also, it'd be good to add some out-of-bounds tests that would catch this.

::: js/src/jit/x86/CodeGenerator-x86.cpp
@@ +383,5 @@
>          // base address is known during dynamic linking (AsmJSModule::initHeap).
>          PatchedAbsoluteAddress srcAddr((void *) ptr->toConstant()->toInt32());
> +        bool r = loadAndNoteViewTypeElement(vt, srcAddr, out);
> +        memoryBarrier(ins->mir()->barrierAfter());
> +        return r;

loadAndNoteViewTypeElement always returns 'true' (it only returns 'bool' as a convenience so code can be written 'return loadAndNoteViewTypeElement', but that's probably a silly reason).  So it seems like you'd either ignore the return value or write:  if (!loadAndNoteViewTypeElement(...)) return false;', since there is no reason that we must always call memoryBarrier().

@@ +392,5 @@
>  
> +    if (mir->skipBoundsCheck()) {
> +        bool r = loadAndNoteViewTypeElement(vt, srcAddr, out);
> +        memoryBarrier(ins->mir()->barrierAfter());
> +        return r;

ditto

@@ +537,5 @@
> +    // Add in the actual heap pointer explicitly, to avoid opening up
> +    // the abstraction that is compareExchangeToTypedIntArray at this time.
> +    masm.addl_wide(Imm32(0), ptrReg);
> +    uint32_t after = masm.size();
> +    masm.append(AsmJSHeapAccess(after-4, after, maybeCmpOffset));

I know that the first parameter doesn't really have a meaning in this case -- it's just used to produce deltas with the other two offsets -- but it's kinda weird to pass in (after-4); that points into the middle of the (5-byte) instruction.  I'd either keep it regular (put a 'uint32_t before = masm.size();' before addl_wide) or make the point that it doesn't matter by passing 'after'.  (Same comment applies to visitAsmJSAtomicBinopHeap, and with CodeGeneratorX64.)

@@ +540,5 @@
> +    uint32_t after = masm.size();
> +    masm.append(AsmJSHeapAccess(after-4, after, maybeCmpOffset));
> +
> +    Address memAddr(ToRegister(ptr), 0);
> +    masm.compareExchangeToTypedIntArray(vt == Scalar::Uint32 ? Scalar::Int32 : vt, memAddr, oldval, newval, InvalidReg, ToAnyRegister(ins->output()));

This line is too long.  You could shorten it by hoisting out some stuff, but I was thinking anyway that it'd be nice to have a compareExchangeAsmJS that:
 - took care of the Uint32->Int32 with a comment even to explain why AsmJS is special
 - passed the InvalidReg, again with comment to explain why, by avoiding the Uint32 case, we don't need a second temp.
(Same comment applies to atomicBinopToTypedIntArray.)
Attachment #8511000 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #15)
> Comment on attachment 8511000 [details] [diff] [review]
> bug1073096-AtomicsOdinSupport.diff
> 
> Review of attachment 8511000 [details] [diff] [review]:
> -----------------------------------------------------------------

This is pushback / disagreement / discussion; for the rest just assume I will deal with it one way or another.

> ::: js/src/asmjs/AsmJSValidate.cpp
> @@ +2757,5 @@
> >          if (chk == NO_BOUNDS_CHECK || m().usesSignalHandlersForOOB())
> >              store->setSkipBoundsCheck(true);
> >      }
> >  
> > +    // The type is a bitwise 'or' of the barrier types in MMemoryBarrier
> 
> Instead of passing around ints (which are relatively untyped and require
> comments like this), what about passing around MMemoryBarrier and defining
> operator| and operator& for this type?

I don't see how that would fix anything here, since this comment is on the code that creates the MMemoryBarrier and its argument is an argument to the creation of MMemoryBarrier.

Of course it's possible to cook up an ADT to represent sets of possible memory barrier bits, instead of using an integer bit vector.  Given that most uses will just reference the predefined and named bit vectors that already exist I'm not sure there's much value, unless we have a simple bit-vector type floating around already that can be reused.

> @@ +2779,5 @@
> > +        // the Atomics spec says that any access that does not throw -
> > +        // and out-of-range accesses do not throw - must execute the
> > +        // fence, so as to not make synchronization dependent on
> > +        // whether the access is in range or not.  Also see code in
> > +        // AsmJSSignalHandlers.cpp.
> 
> IIUC, the load/store cases are fine even without the signal handler fence:
> if they trap, then the signal handler will advance pc to right after the
> load/store, which is the fence instruction.

Interesting point.  The next instruction is the fence instruction at the moment.  I'm less certain that it will be the fence instruction if we start optimizing the fences.  (Maybe that's too much blue-sky, but it is desirable, esp for asm.js.)

> ::: js/src/jit/x64/CodeGenerator-x64.cpp
> @@ +438,5 @@
> > +    if (value->isConstant())
> > +        masm.atomicBinopToTypedIntArray(op, vt == Scalar::Uint32 ? Scalar::Int32 : vt, Imm32(ToInt32(value)), srcAddr, temp, InvalidReg, ToAnyRegister(ins->output()));
> > +    else
> > +        masm.atomicBinopToTypedIntArray(op, vt == Scalar::Uint32 ? Scalar::Int32 : vt, ToRegister(value), srcAddr, temp, InvalidReg, ToAnyRegister(ins->output()));
> > +    uint32_t after = masm.size();
> 
> How is out-of-bounds supposed to work here (esp in the bitop case)?  I was
> thinking that on fix might be to, until we try to really optimize atomic
> binops, always do a bounds check.  (Note: this makes the patching logic in
> AsmJSModule::initHeap, since we can't just use usesSignalHandlersForOOB()
> anymore.)  Otherwise, we'd need to capture the offsets of the individual
> operations.

The intent was for the bounds checking always to be enabled for these (and compareExchange).  That does not seem to be implemented properly.

> @@ +537,5 @@
> > +    // Add in the actual heap pointer explicitly, to avoid opening up
> > +    // the abstraction that is compareExchangeToTypedIntArray at this time.
> > +    masm.addl_wide(Imm32(0), ptrReg);
> > +    uint32_t after = masm.size();
> > +    masm.append(AsmJSHeapAccess(after-4, after, maybeCmpOffset));
> 
> I know that the first parameter doesn't really have a meaning in this case
> -- it's just used to produce deltas with the other two offsets -- but it's
> kinda weird to pass in (after-4); that points into the middle of the
> (5-byte) instruction.  I'd either keep it regular (put a 'uint32_t before =
> masm.size();' before addl_wide) or make the point that it doesn't matter by
> passing 'after'.  (Same comment applies to visitAsmJSAtomicBinopHeap, and
> with CodeGeneratorX64.)

OK.  I think the code ended up that way after some strange failures, but it could have been when I was still trying to do bounds checking in-line.  Since that may have to come back, I'll have to reexamine here.
(In reply to Lars T Hansen [:lth] from comment #16)
> I don't see how that would fix anything here, since this comment is on the
> code that creates the MMemoryBarrier and its argument is an argument to the
> creation of MMemoryBarrier.

If the type was MemoryBarrierBits, a comment would not be necessary; that's the "fix": types replacing comments.

> Of course it's possible to cook up an ADT to represent sets of possible
> memory barrier bits, instead of using an integer bit vector.

I don't think you need to cook up a full ADT; I was just thinking adding:
  MemoryBarrierBits operator|(MemoryBarrierBits, MemoryBarrierBits);
  MemoryBarrierBits operator&(MemoryBarrierBits, MemoryBarrierBits);
so that you could basically s/int/MemoryBarrierBits/ and everything else would work the same.
(In reply to Lars T Hansen [:lth] from comment #16)
> (In reply to Luke Wagner [:luke] from comment #15)
> > @@ +2779,5 @@
> > > +        // the Atomics spec says that any access that does not throw -
> > > +        // and out-of-range accesses do not throw - must execute the
> > > +        // fence, so as to not make synchronization dependent on
> > > +        // whether the access is in range or not.  Also see code in
> > > +        // AsmJSSignalHandlers.cpp.
> > 
> > IIUC, the load/store cases are fine even without the signal handler fence:
> > if they trap, then the signal handler will advance pc to right after the
> > load/store, which is the fence instruction.
> 
> Interesting point.  The next instruction is the fence instruction at the
> moment.  I'm less certain that it will be the fence instruction if we start
> optimizing the fences.  (Maybe that's too much blue-sky, but it is
> desirable, esp for asm.js.)

On that note, I've been writing some demo code and the core of a barrier sync looks like this:

	const seq = Atomics.load(this.iab, this.seqLoc);
	if (Atomics.sub(this.iab, this.counterLoc, 1) == 1)
            ...;
	Atomics.futexWait(this.iab, this.seqLoc, seq, Number.POSITIVE_INFINITY);

Since we're sequentially consistent there's a fence after the load and another before the sub (on ARM, anyway - a DMB instruction in either case).  One of those could probably be removed fairly easily, it's straight-line code, modulo the range checks.

I mention this mostly because I see the pattern of back-to-back Atomics operations here and there, suggesting that we do want to look into optimization eventually.
Ah, interesting.  Do you think it would make sense to have separate MIR nodes for fences?
(In reply to Luke Wagner [:luke] from comment #19)
> Ah, interesting.  Do you think it would make sense to have separate MIR
> nodes for fences?

It probably does make sense, with a pass that discards redundant nodes.  I might file a followup bug for that.  I might also want some more experience using the Atomics before I dig into it in earnest.

I know LLVM does some of this (it was mentioned in a brownbag a couple of weeks back as being valuable) and the JSR-133 cookbook page has some high-level notes about it.
This is a simple preliminary patch, it just introduces global operators for operating on MemoryBarrierBits and replaces int throughout the code base by MemoryBarrierBits.
Attachment #8522991 - Flags: review?(luke)
I think this, along with the previous patch, addresses all concerns from the previous review, but there are three points to make:

- I introduced a flag for 'shared view' in AsmJSModule as suggested; please
  check that the resulting logic in AsmJSValidate is in line with what you
  expect.

- Since compareExchange and atomicBinop operations now require range checking
  to be enabled for those operations, the code in initHeap unconditionally
  steps through the array of patch addresses looking for range checking 
  patches.  I considered trying to elaborate the guard by communicating a flag
  from AsmJSValidate back to initHeap, but decided against it.

- I decided against introducing special asm.js abstractions in masm for
  compareExchange and atomicBinop, since the concerns are very local and
  it did not seem worth it (at the moment).
Attachment #8511000 - Attachment is obsolete: true
Attachment #8522993 - Flags: review?(luke)
Comment on attachment 8522991 [details] [diff] [review]
Replace int by MemoryBarrierBits

You may want to check out mozilla/EnumSet.h.
Comment on attachment 8522991 [details] [diff] [review]
Replace int by MemoryBarrierBits

Review of attachment 8522991 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thanks!
Attachment #8522991 - Flags: review?(luke) → review+
Comment on attachment 8522993 [details] [diff] [review]
Odin support for Atomics, including x86, x64, and None

Review of attachment 8522993 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!  Thanks for addressing all the comments.  A few minor bugs below (easily fixed) and some last nits.

::: js/src/asmjs/AsmJSLink.cpp
@@ +567,2 @@
>            case AsmJSModule::Global::ArrayViewCtor:
>              if (!ValidateArrayView(cx, global, globalVal))

I think you need to pass module.isSharedView() to ValidateArrayView so that it can validate shared or not.  Can you add an assertAsmLinkFail test for this?

::: js/src/asmjs/AsmJSModule.h
@@ +1014,3 @@
>          MOZ_ASSERT(!isFinishedWithModulePrologue());
> +        if (pod.hasArrayView_ && (pod.isSharedView_ != isSharedView))
> +            return false;

'return false' here will end up cause validation failure without a validation failure message.  Unfortunately, there is no clean way to report validation failure from AsmJSModule, so could you do this checking in AsmJSValidate and just have this function MOZ_ASSERT_IF(pod.hasArrayView, pod.isSharedView_ == isSharedView)?

@@ +1024,4 @@
>          MOZ_ASSERT(!isFinishedWithModulePrologue());
>          MOZ_ASSERT(field);
> +        if (pod.isSharedView_ && !pod.isSharedView_)
> +            return false;

Same comment comment as above.

::: js/src/asmjs/AsmJSSignalHandlers.cpp
@@ +20,5 @@
>  
>  #include "mozilla/DebugOnly.h"
>  
>  #include "asmjs/AsmJSModule.h"
> +#include "builtin/AtomicsObject.h"

I think this is no longer necessary?  Did the full barriers in signal handlers go away from the reasons mentioned earlier?

::: js/src/asmjs/AsmJSValidate.cpp
@@ +2818,5 @@
> +        bool needsBoundsCheck = chk == NEEDS_BOUNDS_CHECK && !m().usesSignalHandlersForOOB();
> +        MAsmJSLoadHeap *load = MAsmJSLoadHeap::New(alloc(), vt, ptr,
> +                                                   needsBoundsCheck,
> +                                                   MembarBeforeLoad,
> +                                                   MembarAfterLoad);

I think you could fit all the arguments on two lines and it'd still be quite readable (ditto for StoreHeap).

@@ +5856,5 @@
> +                               AsmJSAtomicsBuiltinFunction func, RetType retType,
> +                               MDefinition **def, Type *type)
> +{
> +    MDefinition *operand;
> +    Type actualRetType;

Could you name these 'resultDef' and 'resultType'?

::: js/src/jit/shared/Lowering-x86-shared.cpp
@@ +599,5 @@
> +        value = useFixed(ins->value(), ebx);
> +        if (bitOp)
> +            tempDef = tempFixed(ecx);
> +    }
> +    else {

} else {

::: js/src/jit/x64/CodeGenerator-x64.cpp
@@ +391,5 @@
> +    MOZ_ASSERT(mir->needsBoundsCheck());
> +    {
> +        Label goahead;
> +        CodeOffsetLabel cmp = masm.cmplWithPatch(ToRegister(ptr), Imm32(0));
> +        Register out = ToRegister(ins->output());

Can this decl of 'out' be hoisted up next to those of oldval/newval?  (Same comment in visitAsmJSAtomicBinopHeap and in X86 version)

@@ +392,5 @@
> +    {
> +        Label goahead;
> +        CodeOffsetLabel cmp = masm.cmplWithPatch(ToRegister(ptr), Imm32(0));
> +        Register out = ToRegister(ins->output());
> +        maybeCmpOffset = cmp.offset();

You could save a line and write
  maybeCmpOffset = masm.cmplWithPatch(...).offset();
and I think it'd read just as well.  (Same comment in visitAsmJSAtomicBinopHeap and in X86 and in X86 version)

@@ +440,5 @@
> +        masm.jmp(&rejoin);
> +        masm.bind(&goahead);
> +    }
> +    if (value->isConstant())
> +        masm.atomicBinopToTypedIntArray(op, vt == Scalar::Uint32 ? Scalar::Int32 : vt,

Need { } around multi-line then/else branches (and in X86 version).
Attachment #8522993 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #25)
> Comment on attachment 8522993 [details] [diff] [review]
> Odin support for Atomics, including x86, x64, and None
> 
> Review of attachment 8522993 [details] [diff] [review]:
> -----------------------------------------------------------------

> ::: js/src/jit/x64/CodeGenerator-x64.cpp
> @@ +391,5 @@
> > +    MOZ_ASSERT(mir->needsBoundsCheck());
> > +    {
> > +        Label goahead;
> > +        CodeOffsetLabel cmp = masm.cmplWithPatch(ToRegister(ptr), Imm32(0));
> > +        Register out = ToRegister(ins->output());
> 
> Can this decl of 'out' be hoisted up next to those of oldval/newval?  (Same
> comment in visitAsmJSAtomicBinopHeap and in X86 version)

Not quite.  ins->output is known to be a Register for asm.js, and here I use ToRegister to get at its Register nature.  However compareExchangeToTypedIntArray below takes an AnyRegister, so ins->output must be processed again regardless.  I know this is surprising; dougc had the same comment on the ARM port.  But given those facts, hoisting the declaration would not be an improvement.
I see, makes sense.  Given that, could you sink the definition down to right before its single use in xorl?
Re this fix:
  https://hg.mozilla.org/integration/mozilla-inbound/rev/d64f299df337#l3.198
it seems like if !pod.hasArrayView, you can always just return 'true' because pod.isSharedView_ is necessarily false (because it is initialized to false, but it could just as well have been left uninitialized since it is only allowed to be accessed if hasArrayView is true).
Comment on attachment 8522993 [details] [diff] [review]
Odin support for Atomics, including x86, x64, and None

Review of attachment 8522993 [details] [diff] [review]:
-----------------------------------------------------------------

Some queries noted when rebasing after this patch.

::: js/src/asmjs/AsmJSValidate.cpp
@@ +2815,5 @@
> +        if (inDeadCode())
> +            return nullptr;
> +
> +        bool needsBoundsCheck = chk == NEEDS_BOUNDS_CHECK && !m().usesSignalHandlersForOOB();
> +        MAsmJSLoadHeap *load = MAsmJSLoadHeap::New(alloc(), vt, ptr,

The needsBoundsCheck flag just indicates if the access has been prove to be within bounds. It was not intended to differentiate between the use of inline bounds check and a SIGSEGV handler on the x64. Thus I think it can just be passed along?

@@ +2828,5 @@
> +    {
> +        if (inDeadCode())
> +            return;
> +
> +        bool needsBoundsCheck = chk == NEEDS_BOUNDS_CHECK && !m().usesSignalHandlersForOOB();

And here.

@@ +2842,5 @@
> +        if (inDeadCode())
> +            return nullptr;
> +
> +        // The code generator requires explicit bounds checking for compareExchange.
> +        bool needsBoundsCheck = true;

Same issue here. The compiler can usefully skip the bounds check when the index is proven to be within bounds.

@@ +2855,5 @@
> +        if (inDeadCode())
> +            return nullptr;
> +
> +        // The code generator requires explicit bounds checking for the binops.
> +        bool needsBoundsCheck = true;

And here.

::: js/src/jit/x64/CodeGenerator-x64.cpp
@@ +387,5 @@
> +    Register newval = ToRegister(ins->newValue());
> +
> +    Label rejoin;
> +    uint32_t maybeCmpOffset = AsmJSHeapAccess::NoLengthCheck;
> +    MOZ_ASSERT(mir->needsBoundsCheck());

This is not necessary and the inline bounds check can be usefully skipped when the index has been proven to be within bounds. I assume this was to avoid a sigsegv and if so then using the inline bounds check below when needsBoundsCheck() is true will avoid it.

@@ +427,5 @@
> +    BaseIndex srcAddr(HeapReg, ToRegister(ptr), TimesOne);
> +
> +    Label rejoin;
> +    uint32_t maybeCmpOffset = AsmJSHeapAccess::NoLengthCheck;
> +    MOZ_ASSERT(mir->needsBoundsCheck());

And here.
(In reply to Douglas Crosher [:dougc] from comment #32)
> The needsBoundsCheck flag just indicates if the access has been prove to be
> within bounds. It was not intended to differentiate between the use of
> inline bounds check and a SIGSEGV handler on the x64. Thus I think it can
> just be passed along?

You're right that, in the context of CheckArrayAccess, needsBoundsChecks only means "proven in-bounds".  However, the bool 'needsBoundsChecks' passed to the MIR node literally indicates whether or not to emit inline bounds checks.  This allows Odin to merge "proven in bounds" with "disabled by various dynamic sources".
(In reply to Luke Wagner [:luke] from comment #33)
> (In reply to Douglas Crosher [:dougc] from comment #32)
> > The needsBoundsCheck flag just indicates if the access has been prove to be
> > within bounds. It was not intended to differentiate between the use of
> > inline bounds check and a SIGSEGV handler on the x64. Thus I think it can
> > just be passed along?
> 
> You're right that, in the context of CheckArrayAccess, needsBoundsChecks
> only means "proven in-bounds".  However, the bool 'needsBoundsChecks' passed
> to the MIR node literally indicates whether or not to emit inline bounds
> checks.  This allows Odin to merge "proven in bounds" with "disabled by
> various dynamic sources".

Yes, I see it's a bit more complex now. Unfortunately the MIR flag still means 'proven in-bounds' in some paths. Note that range analysis clears this MIR flag when proven within bounds, so it's not possible to depend on setting the flag when emitting the MIR to enforce a bounds check. If nothing else, it seems a rather confusing conflation.

Perhaps it would be clearer to pass along the 'can use signal handler' flag to the x64 backend, and leave the logic for choosing inline versus sigsegv for oob up to the backend - renaming the MIR flag 'needsBoundsChecks' to 'canSkipBoundsCheck' or 'provenInBounds'?

In any case some of the new paths added in these patches do not use the signal handler for OOB even on the x64. For reference bug 1103450 has a proposed patch and more comments.
You need to log in before you can comment on or make changes to this bug.