Closed Bug 1131613 Opened 5 years ago Closed 5 years ago

Support float32 and float64 atomics

Categories

(Core :: JavaScript Engine: JIT, defect)


Tracking


RESOLVED WONTFIX

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(12 files, 4 obsolete files)

25.21 KB, patch (Waldo: review+)
11.43 KB, patch (Waldo: review+)
33.87 KB, patch (dougc: review+)
3.35 KB, patch (h4writer: review+)
4.39 KB, patch (rankov: review+)
2.45 KB, patch (h4writer: review+)
38.16 KB, patch (dougc: review+)
24.66 KB, patch
20.71 KB, patch
43.25 KB, patch
37.72 KB, patch (h4writer: review+)
40.31 KB, patch (h4writer: review+)
It is trivial-ish (using aliasing) to implement float32 atomics by means of int32 atomics, and thus absurd not to provide float32 atomics directly.

Requires modest support in the VM, the JITs, and in the specs.
Obviously this is trivial only for compareExchange, load, and store - for add and subtract there's typically no direct atomic support in the hardware, as the FPU is involved.
So for add and subtract, this is actually tricky.
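
A rough sketch of the aliasing idea for load, store, and compareExchange, built on the int32 atomics over an aliased Int32Array view (illustration only; the helper names and scratch buffer are mine):

  // A Float32Array and an Int32Array view the same shared memory; the int
  // view carries the atomicity, the float view exists only conceptually.
  var sab = new SharedArrayBuffer(1024);
  var i32 = new Int32Array(sab);

  // Non-atomic scratch space for bit reinterpretation.
  var scratch = new DataView(new ArrayBuffer(4));
  function floatToBits(x) { scratch.setFloat32(0, x); return scratch.getInt32(0); }
  function bitsToFloat(b) { scratch.setInt32(0, b); return scratch.getFloat32(0); }

  function atomicLoadFloat32(i) { return bitsToFloat(Atomics.load(i32, i)); }
  function atomicStoreFloat32(i, v) { Atomics.store(i32, i, floatToBits(v)); return v; }
  function atomicCasFloat32(i, expected, replacement) {
      // Note: the equality the CAS performs is bitwise, so NaN payloads
      // and +0/-0 matter; see the discussion below.
      var old = Atomics.compareExchange(i32, i, floatToBits(expected), floatToBits(replacement));
      return bitsToFloat(old);
  }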

On some hardware, eg x86, a compare-and-swap loop is typically required, and the exit condition for that loop is floating point equality, with all the complications that entails.

On other hardware, eg ARM, a LDREX/STREX loop is typically required, and the exit condition for that loop is simply that the store was successful.

On either platform, the interpreter and JIT must agree on the exit condition, so we can't really have portable code in the interpreter that simply uses a CAS loop.  (The general version of that problem exists already but has been papered over so far by closing our eyes and hoping the C++ compiler will do as the JIT does, see https://bugzilla.mozilla.org/show_bug.cgi?id=1131624#c0.  Something better is needed.)

An alternative is to define Atomics.add as something that uses a CAS loop with a defined equality test; I suspect that may be the way to go but that it's premature.

FWIW, C++11 only defines fetch_add and fetch_sub for integral types; for floats you have to build your own from the provided compare_exchange primitive.
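
For concreteness, here is roughly what such a CAS-loop add looks like for float32, in the same style C++ code uses to build fetch_add from compare_exchange (a sketch only; it assumes i32 is an Int32Array over a SharedArrayBuffer, as in the aliasing sketch above, and the helper names are mine):

  var scratch2 = new DataView(new ArrayBuffer(4));
  function f32ToBits(x) { scratch2.setFloat32(0, x); return scratch2.getInt32(0); }
  function bitsToF32(b) { scratch2.setInt32(0, b); return scratch2.getFloat32(0); }

  function atomicAddFloat32(i32, i, v) {
      for (;;) {
          var oldBits = Atomics.load(i32, i);
          var newBits = f32ToBits(bitsToF32(oldBits) + v);
          // The loop exits when the CAS succeeds, i.e. on *bitwise* equality
          // of oldBits with the cell contents - exactly the "defined
          // equality test" question raised above.
          if (Atomics.compareExchange(i32, i, oldBits, newBits) === oldBits)
              return bitsToF32(oldBits);
      }
  }
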
See Also: → 1131624
(In reply to Lars T Hansen [:lth] from comment #2)

> An alternative is to define Atomics.add as something that uses a CAS loop
> with a defined equality test; I suspect that may be the way to go but that
> it's premature.

We could also use the spinlock for add and subtract on the floating types on all platforms.  However that would force us to use it also for load, store, and compareExchange.  It is probably better to have load, store, and compareExchange run at full speed where possible and pay for a loop only for add and subtract, even if that complicates the loop because the exit condition is tricky.
Just adding fuel to this fire: on platforms that support CAS with two results (cell value and update success), so that we can in principle test the update without comparing values, the natural comparison of the floating-point values for loop exit will presumably be bitwise, not by value, if we want the best performance.  But that's wrong for NaN values and for +0/-0.

In C++ the compare_exchange primitive has two results, allowing the program to use the flag.  Furthermore (and unsurprisingly) it defines equality of CAS as bitwise equality.

I think backing off on add and subtract is the right thing, for now.

Another concern is that C++ code translated to asm.js is going to have to be careful around floating-point CAS since only the old value is returned by Atomics.compareExchange, and a floating-point comparison of that old value with the expected value is not the same as what the C++ code had in mind if it tests the result flag of the CAS.
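
For concreteness, the two places where value comparison and bit comparison disagree, in plain JS terms (illustration only):

  // +0 and -0: value-equal, bit-distinct.  A bitwise CAS expecting +0.0
  // fails when the cell actually holds -0.0.
  var dv = new DataView(new ArrayBuffer(8));
  dv.setFloat64(0, -0);
  console.log(+0 === -0);                  // true  (value equality)
  console.log(dv.getUint32(0) === 0);      // false (DataView reads default to
                                           // big-endian, so word 0 holds the
                                           // sign bit, which is set)

  // NaN: never value-equal to itself, though two NaNs may be bit-identical,
  // and arithmetic may also produce bit-distinct NaNs.
  console.log(NaN === NaN);                // false (value equality)
  console.log(Object.is(NaN, NaN));        // true  (SameValue)
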
I think that having only atomic load, store and CAS for float32 would be fine. I imagine those are straightforward (as aliased operations to 32-bit int addresses)? Actually, for current pthreads support in Emscripten, that is what I emulate, see

https://github.com/juj/emscripten-fastcomp/commit/150c6c74115e6db353f2a0d2cf1fa5e96816c8e6

and

https://github.com/juj/emscripten/commit/69c311196495a77719848aea5afadf6b99af3c2e
(In reply to Jukka Jylänki from comment #5)
> I think that having only atomic load, store and CAS for float32 would be
> fine. I imagine those are straightforward (as aliased operations to 32-bit
> int addresses)?

CAS is not quite unproblematic because it's really a semantic issue how we want to define "equality".  I think, however, that given what these primitives are for and how they are expected to be used the correct definition of equality is "bitwise", and that makes the implementation unproblematic in practice.
Yeah, that's true. Bitwise equality (the NaN payload being undefined) sounds ok. However, even if we did nothing in terms of the spec, I think the current form that the Emscripten/LLVM code emulates will be ok, since, as you mention, it gets us more or less the same thing.
An additional (though slight) complication for CAS and LOAD is that any NaN value has to be canonicalized after loading.

Semantically however, this means that a NaN returned from CAS or LOAD is not bitwise equal to what's in memory, which might lead to problems for further use of CAS.

For straightforward code this will not be a problem, so it's not a big deal, but it'll be worth noting in the spec.
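
A float32 illustration of the hazard (hypothetical helper code; it assumes an engine that canonicalizes NaN on float loads, as SpiderMonkey does for NaN-boxing reasons):

  var sab = new SharedArrayBuffer(4);
  var i32 = new Int32Array(sab);
  var dv = new DataView(new ArrayBuffer(4));

  i32[0] = 0x7fc00001;              // a NaN with a nonzero payload bit

  // A canonicalizing float load returns *a* NaN, but writing that JS value
  // back out typically produces the canonical NaN bit pattern:
  dv.setInt32(0, Atomics.load(i32, 0));
  var seen = dv.getFloat32(0);      // NaN; the engine may canonicalize here
  dv.setFloat32(0, seen);
  var seenBits = dv.getInt32(0);    // then 0x7fc00000, not 0x7fc00001

  // So feeding the loaded value back into a bitwise CAS can fail (or make
  // a retry loop spin) even though "the same NaN" was just read:
  Atomics.compareExchange(i32, 0, seenBits, 0);
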
Summary: Support float32 atomics → Support float32 and float64 atomics
Duplicate of this bug: 1131624
Attached patch WIP: Test cases (obsolete) — Splinter Review
Depends on: 1142593
See Also: → 1155176
The twelve patches in this set together implement float32 and float64 atomics for the interpreter, Ion, and Odin - a relatively large feature.  They are meant to apply to m-i on top of the patches for bug 1154705 and bug 1154714, in that order.  The patches in the set are not quite independent; there are some forward references.

Patch 1: Float32 and Float64 atomics in the interpreter, plus some naming cleanup in AtomicsObject.cpp.
Attachment #8579375 - Attachment is obsolete: true
Attachment #8579376 - Attachment is obsolete: true
Attachment #8593816 - Flags: review?(jwalden+bmo)
Attachment #8593817 - Flags: review?(jwalden+bmo)
This is low-level code to support the Ion changes that will appear shortly.

Included here is also Simulator support for UXTH and UXTB; the need for those was discovered only with the improved test cases in bug 1154704.
Attachment #8593818 - Flags: review?(dtc-moz)
Attached patch Intel support for Plain JS (obsolete) — Splinter Review
This is x86 and x64 support code for the Ion changes that will appear shortly.
Attachment #8593819 - Flags: review?(hv1989)
Stubs to keep the None platform compiling.
Attachment #8593820 - Flags: review?(hv1989)
Stubs to keep the MIPS platform compiling (I hope).
Attachment #8593821 - Flags: review?(branislav.rankov)
These are the Ion changes to support inlining of float32/float64 atomics in plain JS.  (There's also a bugfix for a regalloc usage issue that I discovered with my better tests; two uses of useRegisterAtStart have been changed to useRegister.)

One design choice made here is to handle float32 compareExchange by generating MIR that reinterprets the float32 to an int32, then uses the int32 compareExchange, then reinterprets the resulting int32 bits back as float32.  This looks messy in MCallOptimize.cpp but reduces the amount of code, since every platform realization would have to do just that anyway.  Contrast this with the situation for float64, where different strategies are needed on different platforms.  Anyway, the new nodes for float32->int32 and int32->float32 reinterpretation also come in handy later.
Attachment #8593825 - Flags: review?(hv1989)
These tests are mostly useful when studying the generated code with eg -D in the shell.
Attachment #8593829 - Flags: review?(hv1989)
ARM support for the asm.js patch that will appear shortly.
Attachment #8593830 - Flags: review?(dtc-moz)
Intel support for the asm.js patch that will appear shortly.
Attachment #8593832 - Flags: review?(luke)
This patch also addresses the general return type issue explained in bug 1155176 comment 1, by falling back on TypedArrayLoadType where a value read from the array is returned, and by returning the rhsType, in the case of Atomics.store.
Attachment #8593835 - Flags: review?(luke)
In addition to adding test cases for float32 and float64 this change cleans up the asm.js atomics tests generally and adds tests for unsigned types and for halfword types.
Attachment #8593836 - Flags: review?(luke)
Comment on attachment 8593819 [details] [diff] [review]
Intel support for Plain JS

Review of attachment 8593819 [details] [diff] [review]:
-----------------------------------------------------------------

I think there is an issue for x86 MCompareExchangeFloat64TypedArrayElement where we won't have enough registers...

::: js/src/jit/x64/MacroAssembler-x64.h
@@ +1256,5 @@
>      }
>  
> +    void moveDoubleToInt64Bits(FloatRegister src, Register dest) {
> +        vmovq(src, dest);
> +    }

s/moveDoubleToInt64Bits/moveDoubleToInt64

@@ +1260,5 @@
> +    }
> +
> +    void moveInt64BitsToDouble(Register src, FloatRegister dest) {
> +        vmovq(src, dest);
> +    }

s/moveInt64BitsToDouble/moveInt64ToDouble

::: js/src/jit/x86-shared/Assembler-x86-shared.h
@@ +1711,5 @@
> +          default:
> +            MOZ_CRASH("unexpected operand kind");
> +        }
> +    }
> +#endif

Move this to "jit/x64/Assembler-x64.h"

::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
@@ +1393,5 @@
> +    {
> +        spew("cmpxchgq   %s, " MEM_obs, GPReg64Name(src), ADDR_obs(offset, base, index, scale));
> +        m_formatter.twoByteOp64(OP2_CMPXCHG_GvEw, offset, base, index, scale, src);
> +    }
> +#endif

Can you move this to: "jit/x64/Assembler-x64.h"

::: js/src/jit/x86-shared/MacroAssembler-x86-shared.h
@@ +1292,5 @@
>      }
>  
> +    void moveInt32BitsToFloat(Register src, FloatRegister dest) {
> +        vmovd(src, dest);
> +    }

-> moveInt32ToFloat32
And can you add a description of this instruction saying that it doesn't convert the input, but copies it bit by bit, and that convertInt32ToFloat32 should get used when a value conversion is wanted?

@@ +1296,5 @@
> +    }
> +
> +    void moveFloatToInt32Bits(FloatRegister src, Register dest) {
> +        vmovd(src, dest);
> +    }

moveFloat32ToInt32
And can you add a description of this instruction saying that it doesn't convert the input, but copies it bit by bit, and that convertXXX should get used when a value conversion is wanted?

::: js/src/jit/x86/Lowering-x86.cpp
@@ +200,5 @@
> +    const LDefinition newLo = tempFixed(ebx);
> +    const LUse elements = useRegister(ins->elements());
> +    const LAllocation index = useRegisterOrConstant(ins->index());
> +    const LAllocation oldval = useRegister(ins->oldval());
> +    const LAllocation newval = useRegister(ins->newval());

This will fail!

In x86 we normally have 7 registers available, but when profiler is enabled ebp is also taken. So we can only take 6 registers at a time in x86.

In this particular case we request 8 registers. I assume this isn't failing, since "index" is a constant in the testcases. Can you add a testcase where "index" isn't a constant? That should fail.

So some magic will need to get used to get this working properly. One potential solution is to use useRegisterAtStart for inputs.

::: js/src/jit/x86/MacroAssembler-x86.h
@@ +1007,5 @@
> +        vmovd(src, destLo);
> +        vmovapd(src, ScratchDoubleReg);
> +        vpsrlq(Imm32(32), ScratchDoubleReg, ScratchDoubleReg);
> +        vmovd(ScratchDoubleReg, destHi);
> +    }

moveDoubleToInt32x2
And can you add description about this instruction and say it doesn't convert the input, but copies it bit by bit.

@@ +1013,5 @@
> +    void moveInt32x2BitsToDouble(Register srcHi, Register srcLo, FloatRegister dest) {
> +        vmovd(srcLo, dest);
> +        vmovd(srcHi, ScratchDoubleReg);
> +        vpsllq(Imm32(32), ScratchDoubleReg, ScratchDoubleReg);
> +        vorpd(ScratchDoubleReg, dest, dest);

moveInt32x2ToDouble
And can you add description about this instruction and say it doesn't convert the input, but copies it bit by bit.
Attachment #8593819 - Flags: review?(hv1989)
Comment on attachment 8593820 [details] [diff] [review]
None support for Plain JS

Review of attachment 8593820 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/none/MacroAssembler-none.h
@@ +410,5 @@
>      Register extractBoolean(ValueOperand, Register) { MOZ_CRASH(); }
>      template <typename T> Register extractTag(T, Register) { MOZ_CRASH(); }
>  
> +    void moveFloatToInt32Bits(FloatRegister src, Register dest) { MOZ_CRASH(); }
> +    void moveInt32BitsToFloat(Register src, FloatRegister dest) { MOZ_CRASH(); }

Do we get breakage if this isn't defined?
If yes, please update the names to moveFloat32ToInt32 and moveInt32ToFloat32.
If no, don't add them ;).
Attachment #8593820 - Flags: review?(hv1989) → review+
Attachment #8593829 - Flags: review?(hv1989) → review+
(In reply to Hannes Verschore [:h4writer] from comment #24)
> Comment on attachment 8593819 [details] [diff] [review]
> Intel support for Plain JS
> 
> Review of attachment 8593819 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think there is an issue for x86 MCompareExchangeFloat64TypedArrayElement
> where we won't have enough registers...

I worry about that too but so far I've not run into any problems, despite my best efforts.  The instruction needs at least six integer registers.  I could see address optimization pushing that number higher, though.  (For Odin it seemed worse, so that back-end always calls out to C++ at the moment.)

(More below.)

> ::: js/src/jit/x64/MacroAssembler-x64.h
> @@ +1256,5 @@
> >      }
> >  
> > +    void moveDoubleToInt64Bits(FloatRegister src, Register dest) {
> > +        vmovq(src, dest);
> > +    }
> 
> s/moveDoubleToInt64Bits/moveDoubleToInt64

I will make this change (here and in the later patches) if you want, but the point here is that this is a bit copy, not a value conversion.  In a later remark you ask for a comment clarifying that, so I guess the "Bits" suffix was not clear enough.

> ::: js/src/jit/x86/Lowering-x86.cpp
> @@ +200,5 @@
> > +    const LDefinition newLo = tempFixed(ebx);
> > +    const LUse elements = useRegister(ins->elements());
> > +    const LAllocation index = useRegisterOrConstant(ins->index());
> > +    const LAllocation oldval = useRegister(ins->oldval());
> > +    const LAllocation newval = useRegister(ins->newval());
> 
> This will fail!
> 
> In x86 we normally have 7 registers available, but when profiler is enabled
> ebp is also taken. So we can only take 6 registers at a time in x86.
> 
> In this particular case we request 8 registers. I assume this isn't failing,
> since "index" is a constant in the testcases. Can you add a testcase where
> "index" isn't a constant? That should fail.

I will write more tests and see what falls out of that, but I currently do test with non-constant indices.

Note that oldval, newval, and output are floating point registers, and "only" 6 integer registers are actually being requested.

> So some magic will need to get used to get this working properly. Potential
> solutions is to use useRegisterAtStart for inputs.

Possibly.  I've had bad luck with that in these primitives; the register allocator asserts a lot, so probably I need to be a little more conservative.
Comment on attachment 8593825 [details] [diff] [review]
Ion changes to implement float32/float64 atomics

Review of attachment 8593825 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Though can I have another look, since I requested some (functionality) changes?

::: js/src/jit/CodeGenerator.cpp
@@ +364,5 @@
> +void
> +CodeGenerator::visitInt32BitsToFloat(LInt32BitsToFloat* lir)
> +{
> +    masm.moveInt32BitsToFloat(ToRegister(lir->input()), ToFloatRegister(lir->output()));
> +    if (lir->canonicalizeNaN())

lir->mir()->canonicalizeNaN(), I added a comment about this later in the patch

::: js/src/jit/IonBuilder.h
@@ +906,1 @@
>                                     AtomicCheckResult=DoCheckAtomicResult);

The last argument here doesn't have a name... (Should be 'checkResult')

::: js/src/jit/LIR-Common.h
@@ +3520,5 @@
> +
> +  public:
> +    LIR_HEADER(Int32BitsToFloat)
> +
> +    LInt32BitsToFloat(const LAllocation& input, bool canonicalizeNaN)

Don't give "bool canonicalizeNaN" as an argument. Just define a function:

MInt32ToFloat32* mir() const {
    return mirRaw_->toInt32ToFloat32();
}

And use lir->mir()->canonicalizeNaN() where you need it during codegen.

::: js/src/jit/LOpcodes.h
@@ +162,5 @@
>      _(Float32ToDouble)              \
>      _(DoubleToFloat32)              \
>      _(Int32ToFloat32)               \
> +    _(FloatToInt32Bits)             \
> +    _(Int32BitsToFloat)             \

Please rename to:
Float32ToInt32
Int32ToFloat32

::: js/src/jit/Lowering.cpp
@@ +1846,5 @@
> +
> +void
> +LIRGenerator::visitInt32BitsToFloat(MInt32BitsToFloat* mir)
> +{
> +    define(new(alloc()) LInt32BitsToFloat(useRegister(mir->input()), mir->canonicalizeNaN()), mir);

No need to pass mir->canonicalizeNaN(), see comment in LIR-Common.h

::: js/src/jit/MCallOptimize.cpp
@@ +2779,5 @@
> +        }
> +        if (newval->type() != MIRType_Int32) {
> +            newval = MTruncateToInt32::New(alloc(), newval);
> +            current->add(newval->toInstruction());
> +        }

Ok. I got some comments here:

1) We have IntPolicy: a no-op if the input is Int32, otherwise it tries to unbox, and bails if that's not possible.
2) We have ConvertToInt32Policy: a no-op if the input is Int32, otherwise it tries to convert to Int32; if the conversion would lose information, a bailout occurs.

So (2) is already more relaxed than (1) and could also be used. Now the comment says you want to go even further and do this via truncation. Can you create a TruncateIntPolicy and use that in "MCompareExchangeTypedArrayElement" as the type policy for the 3rd and 4th operands?

This shouldn't be a problem for Scalar::Float32, since the MTruncate will get added between MFloat32toInt32 and MCompareExchangeTypedArrayElement and will become a no-op.

And also for MStoreUnboxedScalar?

@@ +2793,5 @@
> +        MInt32BitsToFloat* i2f = MInt32BitsToFloat::New(alloc(), cas);
> +        current->add(i2f);
> +        current->push(i2f);
> +    } else {
> +        cas->setResultType(arrayType == Scalar::Uint32 ? MIRType_Double : MIRType_Int32);

cas->setResultType(getInlineReturnType());
Should still be correct, right?

::: js/src/jit/MOpcodes.h
@@ +115,5 @@
>      _(TruncateToInt32)                                                      \
>      _(ToString)                                                             \
>      _(ToObjectOrNull)                                                       \
> +    _(FloatToInt32Bits)                                                     \
> +    _(Int32BitsToFloat)                                                     \

Float32ToInt32
Int32ToFloat32

@@ +203,5 @@
>      _(StoreTypedArrayElementHole)                                           \
>      _(StoreTypedArrayElementStatic)                                         \
>      _(CompareExchangeTypedArrayElement)                                     \
> +    _(CompareExchangeFloat64TypedArrayElement)                              \
> +    _(AtomicLoadFloat)                                                      \

Please rename to AtomicLoadFloatingPoint
Attachment #8593825 - Flags: review?(hv1989)
If I'm reading correctly, the float ops are basically equivalent to int ops (equality is bitwise and there isn't atomic arith), so what is the reason for having float ops?  If they have int semantics, it seems more honest in the API to express them as ints.  For float64, you could make the argument that it provides a larger atomic datum, but NaN canonicalization will make it unsuitable for anything but real float64 usage.
Comment on attachment 8593819 [details] [diff] [review]
Intel support for Plain JS

Review of attachment 8593819 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86/Lowering-x86.cpp
@@ +200,5 @@
> +    const LDefinition newLo = tempFixed(ebx);
> +    const LUse elements = useRegister(ins->elements());
> +    const LAllocation index = useRegisterOrConstant(ins->index());
> +    const LAllocation oldval = useRegister(ins->oldval());
> +    const LAllocation newval = useRegister(ins->newval());

So, as Lars explained, two of these 8 registers are floats, so they don't count. As a result only 6 integer registers are requested. So that should be ok ;)
Comment on attachment 8593818 [details] [diff] [review]
ARM support for Plain JS

Review of attachment 8593818 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Lowering-arm.cpp
@@ +690,5 @@
> +// For float32:
> +// We'll use LDR/STR, as int loads/stores are single-copy atomic.
> +//
> +// For float64:
> +// We'll use LDREXD/STREXD for the operation, for now.  On chips where

I assume the simulator already supports LDREXD and STREXD?
Does it also check that the address is double word aligned?
Comment on attachment 8593830 [details] [diff] [review]
ARM support for asm.js

Review of attachment 8593830 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/CodeGenerator-arm.cpp
@@ +2053,5 @@
> +    Label goahead;
> +    BufferOffset bo = masm.ma_BoundsCheck(ptr);
> +    maybeCmpOffset = bo.getOffset();
> +    masm.ma_b(&goahead, Assembler::Below);
> +    memoryBarrier(MembarFull);

Is this barrier only needed when a bounds check is needed?

@@ +2163,5 @@
> +
> +    masm.setupAlignedABICall(3);
> +    masm.passABIArg(ptr);
> +    masm.passABIArg(oldval, MoveOp::DOUBLE);
> +    masm.passABIArg(newval, MoveOp::DOUBLE);

Might there be any utility reordering these arguments to pass the ptr last, so that the doubles can be packed in r0,r1 r2,r3?

::: js/src/jit/arm/Simulator-arm.cpp
@@ +2274,5 @@
>            }
> +          case Args_Double_IntDoubleDouble: {
> +            double dval0, dval1;
> +            int32_t ival;
> +            getFpArgs(&dval0, &dval1, &ival);

Could you double check that the use of getFpArgs is correct?
(In reply to Douglas Crosher [:dougc] from comment #31)
> Comment on attachment 8593830 [details] [diff] [review]
> ARM support for asm.js
> 
> Review of attachment 8593830 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/arm/CodeGenerator-arm.cpp
> @@ +2053,5 @@
> > +    Label goahead;
> > +    BufferOffset bo = masm.ma_BoundsCheck(ptr);
> > +    maybeCmpOffset = bo.getOffset();
> > +    masm.ma_b(&goahead, Assembler::Below);
> > +    memoryBarrier(MembarFull);
> 
> Is this barrier only needed when a bounds check is needed?

Yes, because when the bounds check passes the primitive contains the necessary barriers.

> @@ +2163,5 @@
> > +
> > +    masm.setupAlignedABICall(3);
> > +    masm.passABIArg(ptr);
> > +    masm.passABIArg(oldval, MoveOp::DOUBLE);
> > +    masm.passABIArg(newval, MoveOp::DOUBLE);
> 
> Might there be any utility reordering these arguments to pass the ptr last,
> so that the doubles can be packed in r0,r1 r2,r3?

I don't think that kind of micro-optimization will make much difference here, and the values will have to be rearranged on the C++ side in any case.  It's probably better if they're just on the stack.  I'm open to arguments, if you have them :)

> ::: js/src/jit/arm/Simulator-arm.cpp
> @@ +2274,5 @@
> >            }
> > +          case Args_Double_IntDoubleDouble: {
> > +            double dval0, dval1;
> > +            int32_t ival;
> > +            getFpArgs(&dval0, &dval1, &ival);
> 
> Could you double check that the use of getFpArgs is correct?

I did check before, but I will do so again.
(In reply to Douglas Crosher [:dougc] from comment #30)
> Comment on attachment 8593818 [details] [diff] [review]
> ARM support for Plain JS
> 
> Review of attachment 8593818 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/arm/Lowering-arm.cpp
> @@ +690,5 @@
> > +// For float32:
> > +// We'll use LDR/STR, as int loads/stores are single-copy atomic.
> > +//
> > +// For float64:
> > +// We'll use LDREXD/STREXD for the operation, for now.  On chips where
> 
> I assume the simulator already supports LDREXD and STREXD?

It does (search for "exclusive", you want the "1" case in the subsequent switches).

> Does it also check that the address is double word aligned?

It does - it uses readDW, which contains the necessary check.
(In reply to Hannes Verschore [:h4writer] from comment #24)
> Comment on attachment 8593819 [details] [diff] [review]
> Intel support for Plain JS
> 
> Review of attachment 8593819 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
> @@ +1393,5 @@
> > +    {
> > +        spew("cmpxchgq   %s, " MEM_obs, GPReg64Name(src), ADDR_obs(offset, base, index, scale));
> > +        m_formatter.twoByteOp64(OP2_CMPXCHG_GvEw, offset, base, index, scale, src);
> > +    }
> > +#endif
> 
> Can you move this to: "jit/x64/Assembler-x64.h"

Hannes: I can do that, but the pattern we have right now really is that all calls to m_formatter are in the BaseAssembler, if necessary with ifdefs around them like here.
(In reply to Hannes Verschore [:h4writer] from comment #27)
> Comment on attachment 8593825 [details] [diff] [review]
> Ion changes to implement float32/float64 atomics
> 
> Review of attachment 8593825 [details] [diff] [review]:
> -----------------------------------------------------------------

> ::: js/src/jit/MCallOptimize.cpp
> @@ +2779,5 @@
> > +        }
> > +        if (newval->type() != MIRType_Int32) {
> > +            newval = MTruncateToInt32::New(alloc(), newval);
> > +            current->add(newval->toInstruction());
> > +        }
> 
> Ok. I got some comments here:
> 
> 1) We have IntPolicy: which is no-op if input is Int32 else tries to unbox.
> But bails if not possible
> 2) We have ConvertToInt32Policy: which is no-op if input is Int32 and else
> tries to convert to Int32. If conversion would yield with loss of
> information a bailout occurs.
> 
> So (2) is already more relaxed than (1) so could also be used. Now the
> comment says you want to go even further and do this upon truncation.

Yes, that follows from the spec, which in turn follows ES6 as far as conversion behavior for typed array insertion is concerned.

> Can
> you create a TruncateIntPolicy and use that in
> "MCompareExchangeTypedArrayElement" as type for 3rd and 4th operand?

I will attempt that, it's a nice solution so long as order of evaluation issues do not get in the way.

> This shouldn't be a problem for Scalar::Float32, since the MTruncate will
> get added between MFloat32toInt32 and MCompareExchangeTypedArrayElement and
> will become a no-op.

That sounds plausible, and is easy to test.

> And also for MStoreUnboxedScalar?

No, MStoreUnboxedScalar already has its own type policy that actually contains the same code that I've used in compareExchange (using truncate).  I probably stole the code I have from an older incarnation of that code.
Hannes: Here's the code for compareExchange on x86 with a variable array index.  As you can see, six integer registers are live (edx:eax, ecx:ebx, ebp for the elements pointer, and esi for the index).  I had thought -D would enable the profiler and hence reserve ebp, but in any case, should ebp be taken, edi should be usable for the elements pointer in this situation.

I'll add this test case to the tests that you've already looked at.

[Pointer:NON_GC_THING]
movl       $0xf76c9000, %ebp

[CompareExchangeFloat64TypedArrayElement]
movd       %xmm0, %eax
movapd     %xmm0, %xmm7
psrlq      $32, %xmm7
movd       %xmm7, %edx
movd       %xmm1, %ebx
movapd     %xmm1, %xmm7
psrlq      $32, %xmm7
movd       %xmm7, %ecx
lock
cmpxchg8b  edx:eax, 0x0(%ebp,%esi,8)
movd       %eax, %xmm2
movd       %edx, %xmm7
psllq      $32, %xmm7
orpd       %xmm7, %xmm2
ucomisd    %xmm2, %xmm2
jnp        .Lfrom842
movsd      0xffffffff, %xmm2
(In reply to Hannes Verschore [:h4writer] from comment #27)
> Comment on attachment 8593825 [details] [diff] [review]
> Ion changes to implement float32/float64 atomics
> 
> Review of attachment 8593825 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +2793,5 @@
> > +        MInt32BitsToFloat* i2f = MInt32BitsToFloat::New(alloc(), cas);
> > +        current->add(i2f);
> > +        current->push(i2f);
> > +    } else {
> > +        cas->setResultType(arrayType == Scalar::Uint32 ? MIRType_Double : MIRType_Int32);
> 
> cas->setResultType(getInlineReturnType());
> Should still be correct, right?

Unfortunately it is not still correct.  The reason is that the check in atomicsMeetsPreconditions is looser now: if the view type is an integer type other than Uint32, there is no guard on the inline return type.  Notably the inline return type can be MIRType_Value.  Hence the result type should be set to the precise type in the code quoted above.

I can make the check in atomicsMeetsPreconditions stricter again, but I'm not sure why it should be (arguments welcome, of course).
Originally reviewed in comment 24.

Followups in comment 26, comment 29, comment 34.

I have renamed and commented as suggested but had to choose slightly different names, using a "Move" prefix in several places (LMoveInt32ToFloat32), because there was already an LInt32ToFloat32, namely the conversion.  Ditto for later patches.
Attachment #8593819 - Attachment is obsolete: true
Attachment #8595232 - Flags: review?(hv1989)
Originally reviewed in comment 27.

Followup in comment 35.
Attachment #8593825 - Attachment is obsolete: true
Attachment #8595235 - Flags: review?(hv1989)
(In reply to Luke Wagner [:luke] from comment #28)
> If I'm reading correctly, the float ops are basically equivalent to int ops
> (equality is bitwise and there isn't atomic arith), so what is the reason
> for having float ops?  If they have int semantics, it seems more honest in
> the API to express them as ints.  For float64, you could make the argument
> that it provides a larger atomic datum, but NaN canonicalization will make
> it unsuitable for anything but real float64 usage.

For float32, in asm.js, it does not matter a great deal whether specialized operations are provided, since the int operations can be used, although I don't know that there's a convenient floatbits-to-intbits primitive in asm.js that does not force the translation to go through memory.  So the operations are not quite equivalent and a native implementation will likely beat the emulation.

For float64 it does make a difference, since the only alternative is some sort of spinlocked loop.

NaN and +0/-0 make the semantics complicated; NaN isn't a concern IMO but the fact that CAS(+0.0, v) will fail if the value in the cell is -0.0 is annoying, to say the least.

(In any case C++ provides float32 and float64 atomics with the same semantics as here, and there's at least an argument to be made that we should support the translation well for asm.js.)

For handwritten-JS it seems more useful to me to actually support these operations on floating point arrays directly.

How useful are they?  Probably moderately.  I've implemented these because they aid emscripten and because they round out the feature, so to speak.  Clearly one can imagine the utility of eg a float64 counter that is updated atomically by multiple threads, using CAS, or other ways of communicating floating point data among threads using atomics.
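
For example, a float64 counter of that sort can be built from a 64-bit integer CAS by aliasing (a sketch only, assuming a 64-bit integer CAS is reachable via a BigInt64Array view over the shared buffer; helper names are mine):

  var sab = new SharedArrayBuffer(8);
  var i64 = new BigInt64Array(sab);
  var dv = new DataView(new ArrayBuffer(8));
  function f64ToBits(x) { dv.setFloat64(0, x); return dv.getBigInt64(0); }
  function bitsToF64(b) { dv.setBigInt64(0, b); return dv.getFloat64(0); }

  // Atomically add v to the float64 counter at index i; returns the new value.
  function addToCounter(i, v) {
      for (;;) {
          var oldBits = Atomics.load(i64, i);
          var newBits = f64ToBits(bitsToF64(oldBits) + v);
          if (Atomics.compareExchange(i64, i, oldBits, newBits) === oldBits)
              return bitsToF64(newBits);
      }
  }
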
Comment on attachment 8593821 [details] [diff] [review]
MIPS support for Plain JS

Review of attachment 8593821 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/mips/Lowering-mips.cpp
@@ +567,5 @@
>      MOZ_CRASH("NYI");
>  }
>  
>  void
> +LIRGeneratorMIPS::visitCompareExchangeFloat64TypedArrayElement(MCompareExchangeFloat64TypedArrayElement* ins)

nit: This line is too long

::: js/src/jit/mips/Lowering-mips.h
@@ +102,5 @@
>      void visitSimdSelect(MSimdSelect* ins);
>      void visitSimdSplatX4(MSimdSplatX4* ins);
>      void visitSimdValueX4(MSimdValueX4* ins);
>      void visitCompareExchangeTypedArrayElement(MCompareExchangeTypedArrayElement* ins);
> +    void visitCompareExchangeFloat64TypedArrayElement(MCompareExchangeFloat64TypedArrayElement* ins);

nit: line too long
Attachment #8593821 - Flags: review?(branislav.rankov) → review+
(In reply to Lars T Hansen [:lth] from comment #40)
> although I
> don't know that there's a convenient floatbits-to-intbits primitive in
> asm.js that does not force the translation to go through memory.

There isn't but it's separately valuable and I've been thinking for a while that we should add it as a Math builtin b/c this comes up from time to time.

> How useful are they?  Probably moderately.  I've implemented these because
> they aid emscripten and because they round out the feature, so to speak. 
> Clearly one can imagine the utility of eg a float64 counter that is updated
> atomically by multiple threads, using CAS, or other ways of communicating
> floating point data among threads using atomics.

I ask this b/c I've never seen or heard of any uses of atomic operations on floats.  I wonder if it wouldn't be better to wait until we actually have someone complain (given that, from what Jukka says, Emscripten is able to support this already).  The advantage is minimizing the work for everyone to implement v.1 which enables faster shipping and general surface area reduction.
In e.g. Unity there are extensive uses of 32-bit and 64-bit atomic float loads and stores. In the pthreads branch, the Emscripten compiler has been set to emit custom emulated instructions for these. See https://github.com/juj/emscripten-fastcomp/commit/3f91518e5ef21c67f69651022babddc7f697bc84#diff-7212822d603be752dbb5c5d4619a359eR849 and https://github.com/juj/emscripten/commit/cd721d1aebc3ebc17a2e4c9fa3e56b7e3649fb4c . The biggest drawback of these is most likely performance. For 32-bit floats the emulation via int atomics is probably fine, but native 64-bit atomic ops would be nice to have, so that float64 atomic ops could be sanely implemented.

I think it would be nice to have these ops in the spec for ease of use, although it is not critical.
(In reply to Jukka Jylänki from comment #43)
Ok, I stand corrected.  Just for my own benefit: what are the float atomic ops being used for?
Comment on attachment 8593816 [details] [diff] [review]
Float32 and float64 atomics in the runtime

Review of attachment 8593816 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/AtomicsObject.cpp
@@ +200,5 @@
> +{
> +    DoubleLongOverlay oldval;
> +    DoubleLongOverlay newval;
> +    oldval.d = oldCandidate;
> +    newval.d = newCandidate;

Rather than one-off, ad hoc overlays, I'd prefer you use mozilla::BitwiseCast<>:

  int64_t oldval = mozilla::BitwiseCast<int64_t>(oldCandidate);
  int64_t newval = mozilla::BitwiseCast<int64_t>(newCandidate);

  ... rest of the body, with s/(old|new)val.l/\1val/g...

  return mozilla::BitwiseCast<double>(oldval);

That isolates the bitwise casting to one already-existing location, rather than adding more of them.  |using mozilla::BitwiseCast| is fine if you want it.

@@ +201,5 @@
> +    DoubleLongOverlay oldval;
> +    DoubleLongOverlay newval;
> +    oldval.d = oldCandidate;
> +    newval.d = newCandidate;
> +    int64_t* addr = (int64_t*)viewData + offset;

General preference for constructor-style casts, or failing that static_cast<>, over C-style casts, everywhere in this patch.

@@ +212,5 @@
> +            *addr = newval.l;
> +        buffer->regionLock.release<8>(addr);
> +        oldval.l = x;
> +    }
> +    // Note, canonicalization must take place in the caller if appropriate.

Hmm.  This is going to be...odd...when a compare-exchange operation meant to perform an exchange if the existing value is NaN has no effect even though the existing value *is* NaN (just with different bits).  (I agree with not equating -0 and +0.)  And it's largely impossible to write code to ensure bitwise-identical NaNs appear everywhere.

We could make the comparison use SameValue semantics, if we added a well-predicted branch on |IsNaN(oldCandidate)|, guarding a more elaborate loop-based approach.  Even that overhead should be dwarfed by cost to synchronize.

Should we?  Not all users want instruction-level granularity.  asm.js probably does, but anyone working in plain handwritten JS could easily be surprised.  There *will* be some such people.  Granted, you'd have to access and modify a typed array's underlying buffer.  But I'm not sure I can discard the possibility quite so immediately.

Suppose you wanted compare-and-exchange to use SameValue equality.  Could you use a bitwise version and check for the existing, returned value being NaN and retrying the compare-exchange with that?  Not really, because the returned value might be canonicalized.  I think you'd *have* to do it by examining the underlying bytes.  But you'd have to repeatedly loop in case someone was mutating bytes underneath you.  Very tricky.

I almost wonder if providing both operations might be the answer.  I'm not really convinced providing just bitwise, or just SameValue equality, is the right answer here.  :-\

@@ +225,5 @@
> +    oldval.f = oldCandidate;
> +    newval.f = newCandidate;
> +    oldval.i = jit::AtomicOperations::compareExchangeSeqCst((int32_t*)viewData + offset, oldval.i, newval.i);
> +    // Note, canonicalization must take place in the caller if appropriate.
> +    return oldval.f;

Same BitwiseCast changes.  Oh, and you'll want #include "mozilla/Casting.h" for this.

@@ +246,5 @@
>      bool inRange;
>      if (!GetSharedTypedArrayIndex(cx, idxv, view, &offset, &inRange))
>          return false;
> +
> +    if (view->type() == Scalar::Float32 || view->type() == Scalar::Float64) {

We could share a bunch more code here if ToNumber and inRange were used by everyone, then the non-floating-point code did a separate ToInt32 atop that.  Given that we control the spec, I think we should push that change into the spec, then make it here.

@@ +260,5 @@
> +            return atomics_fence_impl(cx, r);
> +
> +        if (view->type() == Scalar::Float32) {
> +            r.setNumber(JS::CanonicalizeNaN(CompareExchangeSeqCst((float)oldCandidate,
> +                                                                  (float)newCandidate,

Either constructor- or C++-style casts.

@@ +264,5 @@
> +                                                                  (float)newCandidate,
> +                                                                  view->viewData(),
> +                                                                  offset)));
> +        } else {
> +            r.setNumber(JS::CanonicalizeNaN(CompareExchangeSeqCst(oldCandidate,

I think you want setDouble for these two halves, probably.  setNumber would return a varying-type value, when really the contents of these arrays are floating point and should always be represented that way, and never as int32_t.

@@ +303,5 @@
> +LoadSeqCst(void* viewData, uint32_t offset, SharedArrayRawBuffer* buffer)
> +{
> +    DoubleLongOverlay v;
> +    int64_t* addr = (int64_t*)viewData + offset;
> +    if (jit::AtomicOperations::isLockfree8()) {

int64_t* addr = static_cast<int64_t*>(viewData) + offset;
if (jit::AtomicOperations::isLockfree8())
  return BitwiseCast<double>(jit::AtomicOperations::loadSeqCst(addr));

buffer->regionLock.acquire<8>(addr);
int64_t v = *addr;
buffer->regionLock.release<8>(addr);
return BitwiseCast<double>(v);

@@ +367,5 @@
>        }
> +      case Scalar::Float32: {
> +          FloatIntOverlay v;
> +          v.i = jit::AtomicOperations::loadSeqCst((int32_t*)view->viewData() + offset);
> +          r.setNumber(JS::CanonicalizeNaN(v.f));

More BitwiseCast.

@@ +390,5 @@
> +    if (jit::AtomicOperations::isLockfree8()) {
> +        jit::AtomicOperations::storeSeqCst(addr, value.l);
> +    } else {
> +        buffer->regionLock.acquire<8>(addr);
> +        *addr = value.l;

And again BitwiseCast.

@@ +808,5 @@
>  
> +double
> +js::atomics_cmpxchgd_asm_callout(int32_t offset, double oldval, double newval)
> +{
> +    MOZ_ASSERT(!(offset & 7));

Mild preference for (offset % 8) == 0 as more readably implying 8-byte aligned, same elsewhere.

@@ +815,5 @@
> +    SharedArrayRawBuffer* rawBuffer;
> +    GetCurrentAsmJSHeap(&heap, &heapLength, &rawBuffer);
> +    if ((size_t)offset >= heapLength)
> +        return JS::GenericNaN();
> +    return CompareExchangeSeqCst(oldval, newval, heap, offset>>3, rawBuffer);

I assume (without checking) that all these functions take int32_t because they come from JITs, Values, etc.  But that's not really how they treat the argument.  Instead of passing signed integers everywhere, and having to deal with all the quirks of signed integers (such as right-shift being implementation-defined if the LHS is negative), could you either immediately assign them to uint32_t, or make the arguments themselves uint32_t?  Followup is fine.

(I ask this because I really would rather read |offset / 8| here and let the compiler do the optimization, than have readers take a mental speedbump for it.)
Attachment #8593816 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8595232 [details] [diff] [review]
Intel support for Plain JS, v2

Review of attachment 8595232 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x64/LOpcodes-x64.h
@@ +18,5 @@
>      _(ModPowTwoI)                   \
>      _(PowHalfD)                     \
> +    _(CompareExchangeFloat64TypedArrayElement) \
> +    _(AtomicLoadFloatingPoint)      \
> +    _(AtomicStoreFloat)             \

s/AtomicStoreFloat/AtomicStoreFloatingPoint

::: js/src/jit/x86-shared/Assembler-x86-shared.h
@@ +1727,5 @@
> +          default:
> +            MOZ_CRASH("unexpected operand kind");
> +        }
> +    }
> +#endif

This should still get moved to jit/x64/Assembler-x64.h.
Or was there also a reason you didn't do that?
(I agreed about not moving cmpxchgq, but there is no 'JS_CODEGEN_X64' in jit/x86-shared/Assembler-x86-shared.h and we should try to keep it that way. We have these platform-specific Assemblers for that.)

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +3208,5 @@
> +
> +    Label fail;
> +    if (lir->index()->isConstant()) {
> +        Address source(elements, ToInt32(lir->index()) * width);
> +        masm.loadFromTypedArray(vt, source, out, InvalidReg, &fail, /*canonicalize=*/ true);

Style nit: add some breathing space: /* canonicalize = */

@@ +3211,5 @@
> +        Address source(elements, ToInt32(lir->index()) * width);
> +        masm.loadFromTypedArray(vt, source, out, InvalidReg, &fail, /*canonicalize=*/ true);
> +    } else {
> +        BaseIndex source(elements, ToRegister(lir->index()), ScaleFromElemWidth(width));
> +        masm.loadFromTypedArray(vt, source, out, InvalidReg, &fail, /*canonicalize=*/ true);

Style nit: add some breathing space: /* canonicalize = */

::: js/src/jit/x86/LOpcodes-x86.h
@@ +19,5 @@
>      _(ModPowTwoI)               \
>      _(PowHalfD)                 \
> +    _(CompareExchangeFloat64TypedArrayElement) \
> +    _(AtomicLoadFloatingPoint)  \
> +    _(AtomicStoreFloat)         \

s/AtomicStoreFloat/AtomicStoreFloatingPoint
Attachment #8595232 - Flags: review?(hv1989) → review+
Comment on attachment 8595235 [details] [diff] [review]
Ion changes to implement float32/float64 atomics, v2

Review of attachment 8595235 [details] [diff] [review]:
-----------------------------------------------------------------

Maybe InterpretFloat32AsInt32 instead of MoveFloat32ToInt32 (for lir/mir name)?
Attachment #8595235 - Flags: review?(hv1989) → review+
(In reply to Hannes Verschore [:h4writer] from comment #46)
> Comment on attachment 8595232 [details] [diff] [review]
> Intel support for Plain JS, v2
> 
> Review of attachment 8595232 [details] [diff] [review]:
> -----------------------------------------------------------------

+1 on all of those, they are all oversights on my part.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #45)
> Comment on attachment 8593816 [details] [diff] [review]
> Float32 and float64 atomics in the runtime
> 
> Review of attachment 8593816 [details] [diff] [review]:
> -----------------------------------------------------------------

Selected followups, assume +1 or "will try to make that work out" for the rest.

> @@ +212,5 @@
> > +            *addr = newval.l;
> > +        buffer->regionLock.release<8>(addr);
> > +        oldval.l = x;
> > +    }
> > +    // Note, canonicalization must take place in the caller if appropriate.
> 
> Hmm.  This is going to be...odd...when compare-exchange operations
> performing an exchange if the existing value is NaN, have no effect when the
> existing value *is* NaN.  (I agree with not equating -0 and +0.)  And it's
> largely impossible to write code to ensure bitwise-identical NaNs appear
> everywhere.

Ah, so we don't have a canonical (quiet) NaN representation (canonical on one platform, I mean)?  Tricky.  But if, as you say, "largely impossible", then understandable :)

Note that for non-asm.js, all callers of the CompareExchangeSeqCst function immediately canonicalize.  The reason I've not done that for asm.js is that asm.js generally does not seem to do canonicalization (except at the boundary with plain JS, presumably).

> We could make the comparison use SameValue semantics, if we added a
> well-predicted branch on |IsNaN(oldCandidate)|, guarding a more elaborate
> loop-based approach.  Even that overhead should be dwarfed by cost to
> synchronize.

Certainly the overhead is dwarfed if we inline the operation; if we choose to bail out the overhead might be noticeable, but that may not be a dealbreaker if NaN is a corner case (as I expect it is).

> Should we?  Not all users want instruction-level granularity.  asm.js
> probably does, but anyone working in plain handwritten JS, could easily be
> surprised.  There *will* be some such people.  And given Granted, you'd have
> to access and modify a typed array's underlying buffer.  But I'm not sure I
> can discard the possibility quite so immediately.

The assumption is that asm.js wants to translate a CAS on float64 into one of ours, and C++ uses storage equality, for better or worse.  So asm.js wants the current semantics.

I completely agree that it would be surprising to a JS programmer, provided it is the case that our arithmetic operations can produce several different NaN representations on any given CPU.  (Anyone creating a custom NaN representation I'm still not all that worried about.)

> Suppose you wanted compare-and-exchange to use SameValue equality.  Could
> you use a bitwise version and check for the existing, returned value being
> NaN and retrying the compare-exchange with that?  Not really, because the
> returned value might be canonicalized.  I think you'd *have* to do it by
> examining the underlying bytes.  But you'd have to repeatedly loop in case
> someone was mutating bytes underneath you.  Very tricky.

Yes, loop while the bits are NaN, exit when you succeed or the bits are no longer NaN.

On that note, no CPU that I know of supports CAS to floating point registers, everything goes through integer registers (see the subsequent patches for examples) and nothing is canonicalized on load.  We have access to the bits, directly, at this level, at no extra cost.
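
Concretely, a sketch of that bit-level retry loop for float32, in user-level JS terms (helper names are mine; i32 is assumed to be an Int32Array view over the shared memory):

  var dv = new DataView(new ArrayBuffer(4));
  function f32ToBits(x) { dv.setFloat32(0, x); return dv.getInt32(0); }
  function bitsToF32(b) { dv.setInt32(0, b); return dv.getFloat32(0); }
  function isNaNBits(b) {
      return (b & 0x7f800000) === 0x7f800000 && (b & 0x007fffff) !== 0;
  }

  // A CAS that treats *any* expected NaN as matching *any* NaN in the cell.
  function casFloat32SameValueNaN(i32, i, expected, replacement) {
      var newBits = f32ToBits(replacement);
      if (!Number.isNaN(expected))
          return bitsToF32(Atomics.compareExchange(i32, i, f32ToBits(expected), newBits));
      for (;;) {
          var bits = Atomics.load(i32, i);
          if (!isNaNBits(bits))
              return bitsToF32(bits);     // cell no longer holds a NaN: fail
          if (Atomics.compareExchange(i32, i, bits, newBits) === bits)
              return bitsToF32(bits);     // swapped; report the old value
      }
  }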

> I almost wonder if providing both operations might be the answer.  I'm not
> really convinced providing just bitwise, or just SameValue equality, is the
> right answer here.  :-\

There's another dimension.  As specified, compareExchange only has the one return value, namely the old value in the cell.  In C++ it has two return values, the old value and a result for the operation (did it succeed or not).  For integers that bit about operation success is pretty much redundant (and for asm.js multiple return values is not great), but for floating point numbers that may be NaN in plain JS it appears not to be.  So suppose we add a primitive that returns both those values:

  var [flag,cellval] = Atomics.compareExchange2(mem, index, oldval, newval)

Then it would be possible, without inspecting the bits, for the JS program to decide what to do:

  if (!flag && isNaN(cellval) && isNaN(oldval)) {
    // special case
  }

Arguably the programmer would have to be fairly aware to use this strategy in the first place, so perhaps that's not a great argument.  And the programmer still can't update that cell with CAS because the non-canonical NaN has no representation in JS.

I probably agree that for plain JS there ought to be a SameValue-based variant that handles NaN.  I'll make a note on the spec.
Comment on attachment 8593817 [details] [diff] [review]
Test cases for plain JS

Review of attachment 8593817 [details] [diff] [review]:
-----------------------------------------------------------------

Unrelated to this, do we have any tests yet that verify proper synchronization behavior, global sequence order, &c.?

::: js/src/jit-test/tests/atomics/basic-tests.js
@@ +45,1 @@
>  	// val = 0

Arguably you should update all these val= comments with a +c in them.  If you're going for cleaner-looking code, I could buy that rationale.

@@ +68,5 @@
> +	if (!isFloat) {
> +	    // val = 0
> +	    assertEq(Atomics.add(a, x, 3), 0+c); // Don't add c again
> +	    // val = 3
> +	    assertEq(Atomics.sub(a, x, 2), 3+c); // Ditto

With c===0 it hardly seems worthwhile putting +c in *only* these places.

@@ +88,1 @@
>  	// val = 0

Given that here you revert back to no +c, I think that even if cleanliness was the idea behind not adding +c everywhere above, you should probably give it up here.

@@ +121,2 @@
>  	// val = 0
> +	assertEq(gAtomics_compareExchange(a, x, 0, 37+c), 0);

Is this testing a different |this| or something?  Just not entirely sure I see the point of it, but fine.

...oh, looking later on, I guess so.  Guess I wouldn't have bothered, but eh.

I guess all the same comments as before keep applying, here.

@@ +227,5 @@
> +	Atomics.add(a, 0, 1);
> +    }
> +    catch (e) {
> +	thrown = true;
> +	assertEq(e instanceof TypeError, true);

This hits the ReportBadArrayType in atomics_binop_impl, right?  I would (separate patch) add float32/float64 cases to that switch with a custom error message to the effect that "binops not supported on floating-point element types" or somesuch.

@@ +411,5 @@
>  
>      assertEq(sum, k);
> +
> +    // Values that are Uint32 but not Int32 and which trigger
> +    // different paths within the JIT.

Might be worth adding a few non-constant tests as well -- I could see this being converted into int32_t somehow and avoiding the input-is-double JIT code at some time.

@@ +506,5 @@
> +    CLONE(testThrowBinop)(f32);
> +    CLONE(testThrowBinop)(f64);
> +
> +    // Clear this, or the range tests fail
> +    v32[0] = 0;

Maybe add an assert that Atomics.load(a, 0) is 0 at the start of testRangeCAS as documentation?
Attachment #8593817 - Flags: review?(jwalden+bmo) → review+
Dougc: review ping on the patch for plain JS (comment 14, comment 30, comment 33).  Note that the code for UXTH/UXTAH landed separately and need not be reviewed as part of this patch.
Flags: needinfo?(dtc-moz)
Jukka, can you followup to Luke's question in comment 44?  I think he's waiting on a response before proceeding with reviews.
Flags: needinfo?(jujjyl)
(In reply to Lars T Hansen [:lth] from comment #32)
> (In reply to Douglas Crosher [:dougc] from comment #31)
> > Comment on attachment 8593830 [details] [diff] [review]
> > ARM support for asm.js
> > 
> > Review of attachment 8593830 [details] [diff] [review]:
> > -----------------------------------------------------------------

> > ::: js/src/jit/arm/Simulator-arm.cpp
> > @@ +2274,5 @@
> > >            }
> > > +          case Args_Double_IntDoubleDouble: {
> > > +            double dval0, dval1;
> > > +            int32_t ival;
> > > +            getFpArgs(&dval0, &dval1, &ival);
> > 
> > Could you double check that the use of getFpArgs is correct?
> 
> I did check before, but I will do so again.

Ah, it's arguably correct for hardfp but not for softfp (and the argument rearrangement you suggested would have made it correct in both cases).  Will fix this.
Comment on attachment 8593818 [details] [diff] [review]
ARM support for Plain JS

Review of attachment 8593818 [details] [diff] [review]:
-----------------------------------------------------------------

Not scrutinized, but it looks plausible and contained.
Attachment #8593818 - Flags: review?(dtc-moz) → review+
Comment on attachment 8593830 [details] [diff] [review]
ARM support for asm.js

Review of attachment 8593830 [details] [diff] [review]:
-----------------------------------------------------------------

Not scrutinized, but it looks plausible with the argument passing issue addressed, and it's contained. Good to see this support added to the ARM backend, thank you.
Attachment #8593830 - Flags: review?(dtc-moz) → review+
Oh sorry, made a note in the back of my mind to analyze closer where the places are, but then forgot. Took a closer look now and sent email to you guys.
Flags: needinfo?(jujjyl)
(In reply to Lars T Hansen [:lth] from comment #20)
> Created attachment 8593830 [details] [diff] [review]
> ARM support for asm.js
> 
> ARM support for the asm.js patch that will appear shortly.

There are actually other bugs here.  I've started testing in earnest with ARMv6 (only ARMHWCAPS=vfp) and I'm running into issues with the way I use the callout protocol with Float64 returns; defineFixed does not like to be used with a floating-point register.
(In reply to Lars T Hansen [:lth] from comment #57)
> (In reply to Lars T Hansen [:lth] from comment #20)
> > Created attachment 8593830 [details] [diff] [review]
> > ARM support for asm.js
> > 
> > ARM support for the asm.js patch that will appear shortly.
> 
> There are actually other bugs here.  I've started testing in earnest with
> ARMv6 (only ARMHWCAPS=vfp) and I'm running into issues with the way I use
> the callout protocol with Float64 returns, defineFixed does not like to be
> used with a floating-point register.

Turned out this was only a problem with trying to capture the return value of a void instruction.  With that change, and with the proper fix for softfp, the ARM code passes all tests configured both as ARMv6 and ARMv7.
Flags: needinfo?(dtc-moz)
Comment on attachment 8593832 [details] [diff] [review]
Intel support for asm.js

Canceling for now to see if we ultimately end up needing this.
Attachment #8593832 - Flags: review?(luke)
Attachment #8593835 - Flags: review?(luke)
Attachment #8593836 - Flags: review?(luke)
Test case cleanup for asm.js was moved to bug 1175494.
We won't support these.  The generic code in these patches has been recycled in other contexts, and some of the rest will come back if/when we do int64 atomics.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
See Also: → 1377576