Closed Bug 1136226 Opened 5 years ago Closed 3 years ago

Add SIMD.int8x16 and SIMD.int16x8 support to IonMonkey

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: ProgramFOX, Assigned: jolesen)

References

Details

Attachments

(41 files)

28.87 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
9.17 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
12.87 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
26.19 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
19.91 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
60.57 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
5.40 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
6.29 KB, patch
nbp
: review+
Details | Diff | Splinter Review
20.75 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
17.12 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
5.31 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
6.50 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
12.25 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
7.07 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
17.87 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
3.53 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
10.96 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
38.21 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
2.62 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
11.60 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
20.02 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
18.44 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
33.92 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
26.36 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
18.95 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
13.28 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
9.62 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
45.44 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
24.66 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
2.45 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
8.13 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
28.89 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
27.38 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
39.02 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
48.47 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
9.32 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
4.61 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
14.90 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
15.05 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
9.49 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
15.42 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
Bug 1124291 will add support for SIMD.int8x16 and SIMD.int16x8 to the interpreter, but not yet to IonMonkey.

I think this cannot be immediately done after bug 1124291 is finished, based on nbp's comment on bug 1124205 (which is about adding SIMD.float64x2 to IonMonkey):
> This should be added later, once the infrastructure for int32x4 and float32x4 is complete.
Depends on: 1226027
Blocks: 1244117
Emscripten has a small workaround currently in place to ignore its test suite failures since these types don't yet exist for asm.js: https://github.com/juj/emscripten/commit/c1b006d044ec2f47d70a31bafe635fa3f7e7c4a3
Working on this now. I'll be doing the signed and unsigned types simultaneously since most of the code is shared. I am adding asm.js / wasm support during development since asm.js is a good way of writing code generator test cases.
Assignee: nobody → jolesen
We're about to add 16-lane and 8-lane SIMD types to IonMonkey, so we don't want
an enum to identify lanes. Just use an unsigned instead.

Also note that SIMD.js no longer has the .x .y .z .w properties.

Change a few lane < 4 checks to use the number of lanes in the Simd type.
Attachment #8749246 - Flags: review?(sunfish)
Extend the SimdConstant class to handle other 128-bit SIMD vector types.

Remove some unused methods.
Attachment #8749247 - Flags: review?(sunfish)
This adds 8x16 and 16x8 types to MIRType, but it does not enable inlining of
the corresponding SIMD operations yet.

Explicitly disable the inlining of non-4-lane constructor calls until we have
support for them.
Attachment #8749248 - Flags: review?(sunfish)
Rename LIR instructions:

  LInt32x4   -> LSimd128Int
  LFloat32x4 -> LSimd128Float.

These two LIR instructions can be used to materialize 128-bit SIMD vectors of
other geometries too. Also rename the masm.loadConstant{Int,Float}32x4()
functions to indicate that they can be used for other geometries.
Attachment #8749249 - Flags: review?(sunfish)
In both classes, rename enums:

  INT32X4   -> SIMD128INT
  FLOAT32X4 -> SIMD128FLOAT

Catch the new MIRTypes to produce SIMD128INT.
Attachment #8749250 - Flags: review?(sunfish)
Load, store, and move methods are generic for all 128-bit SIMD types. Keep the
difference between float and int vectors since x86 sometimes cares.
Attachment #8749251 - Flags: review?(sunfish)
Replace the existing int32x4Constant and float32x4Constant masm methods with a
generic simd128Constant.
Attachment #8749252 - Flags: review?(sunfish)
We have a lot of SIMD types in Ion now, and there is little benefit in
maintaining a separate bailout kind for each type. Instead, use a single
Bailout_NonSimd bailout kind to indicate that SimdUnbox didn't get the SIMD
type it was expecting.
Attachment #8749253 - Flags: review?(nicolas.b.pierron)
Constructors and factories take lane lists as arrays instead of four separate
lane arguments.
Attachment #8749254 - Flags: review?(bbouvier)
This MIR instruction will be used for splatting all vector shapes.
Attachment #8749255 - Flags: review?(bbouvier)
Support tests using all SIMD types.
Attachment #8749256 - Flags: review?(bbouvier)
Comment on attachment 8749246 [details] [diff] [review]
Remove SimdLane enumeration.

Review of attachment 8749246 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. The x/y/z/w conventions and the enum were added before int8x16/int16x8 were part of SIMD.js.
Attachment #8749246 - Flags: review?(sunfish) → review+
Comment on attachment 8749247 [details] [diff] [review]
Support 8x16 and 16x8 types in SimdConstant.

Review of attachment 8749247 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/SIMD.cpp
@@ +172,5 @@
>      if (!IsVectorObject<V>(v))
>          return ErrorWrongTypeArg(cx, 1, typeDescr);
>  
>      Elem* mem = reinterpret_cast<Elem*>(v.toObject().as<TypedObject>().typedMem());
> +    *out = jit::SimdConstant::Create(mem);

Calling "Create" with a pointer and no length looks unusual. Looking at the code, I see it infers the length from the element type and assuming that the SIMD type is 128-bit. I think I'd suggest calling this function CreateSimd128, so that it's clear here that it makes that assumption.
Attachment #8749247 - Flags: review?(sunfish) → review+
Comment on attachment 8749248 [details] [diff] [review]
Add 16x8 and 8x16 MIRTypes.

Review of attachment 8749248 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonTypes.h
@@ +440,5 @@
>      Bool32x4  = Boolean | (2 << VECTOR_SCALE_SHIFT),
>      Doublex2  = Double  | (1 << VECTOR_SCALE_SHIFT)
>  };
>  
> +static inline MIRType SimdTypeToLaneType(MIRType type);

It looks like it'd be a little tidier to just move SimdTypeToLaneType's definition to this point, rather than just declaring it here.

@@ +618,5 @@
>  
>  static inline bool
>  IsSimdType(MIRType type)
>  {
> +    return ((unsigned(type) >> VECTOR_SCALE_SHIFT) & VECTOR_SCALE_MASK) != 0;

Cool!
Attachment #8749248 - Flags: review?(sunfish) → review+
Comment on attachment 8749249 [details] [diff] [review]
Materialize 8x16 and 16x8 SIMD constants.

Review of attachment 8749249 [details] [diff] [review]:
-----------------------------------------------------------------

On the Float side, we can worry about whether movaps vs movapd matters for float64x2 when we add full support for float64x2, so this looks good.
Attachment #8749249 - Flags: review?(sunfish) → review+
Comment on attachment 8749250 [details] [diff] [review]
Update LDefinition and MoveOp for 8x16 and 16x8.

Review of attachment 8749250 [details] [diff] [review]:
-----------------------------------------------------------------

Ditto for float64x2 and movapd :-).
Attachment #8749250 - Flags: review?(sunfish) → review+
Attachment #8749251 - Flags: review?(sunfish) → review+
Comment on attachment 8749252 [details] [diff] [review]
Add masm.simd128Constant().

Review of attachment 8749252 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
@@ +3399,5 @@
>      {
> +        const uint8_t* b = reinterpret_cast<const uint8_t*>(data);
> +        spew(".simd128 %02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
> +             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
> +             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);

The spew strings are actually valid GAS syntax (bugs aside), and it is occasionally useful this to feed spew output into GAS in order to check encodings (though it does take some work to do this). .simd128 unfortunately isn't a GAS directive.

I kind of like printing it as 4 32-bit int values, and if we use hex then it'd still be pretty usable for int8x16 and int16x8 cases too:

".int 0x%08x,0x%08x,0x%08x,0x%08x" (passing it the data as 4 int32 values)

Other options could be:

".byte 0x%02x,0x%02x,0x%02x,..."

".ascii \x%02x\x%02x\x%02x..."
Attachment #8749252 - Flags: review?(sunfish) → review+
Comment on attachment 8749255 [details] [diff] [review]
Rename MSimdSplatX4 to MSimdSplat.

Review of attachment 8749255 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks!

::: js/src/jit/shared/LIR-shared.h
@@ +200,3 @@
>  {
>    public:
>      LIR_HEADER(SimdSplatX4)

I don't know if you've planned this for a subsequent patch, but feel free to rename LSimdSplatX4 into LSimdSplat too, if this makes sense.
Attachment #8749255 - Flags: review?(bbouvier) → review+
Comment on attachment 8749256 [details] [diff] [review]
Fix jit-test shell for SIMD.

Review of attachment 8749256 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.

::: js/src/jit-test/lib/simd.js
@@ +95,5 @@
>  
> +    assertFunc(Type.extractLane(vec, 0), arr[0]);
> +    assertFunc(Type.extractLane(vec, 1), arr[1]);
> +    assertFunc(Type.extractLane(vec, 2), arr[2]);
> +    assertFunc(Type.extractLane(vec, 3), arr[3]);

Much better!
Attachment #8749256 - Flags: review?(bbouvier) → review+
Comment on attachment 8749254 [details] [diff] [review]
Make MSimdSwizzle and MSimdShuffle length-agnostic.

Review of attachment 8749254 [details] [diff] [review]:
-----------------------------------------------------------------

Much simpler, thank you.

::: js/src/jit/FixedList.h
@@ +84,5 @@
>          MOZ_ASSERT(index < length_);
>          return list_[index];
>      }
>  
> +    T* data() {

See comment in MIR.cpp; i think we can get rid of this one.

::: js/src/jit/MIR.cpp
@@ +1209,3 @@
>  
>      MOZ_ASSERT(numVectors() == 2);
> +    return MSimdShuffle::New(alloc, vector(0), vector(1), lanes.data());

Here and above, could we simply use lanes.begin()? Then I think we can remove the data() helper on the FixedList. It is ok, at least to me, to use "begin()" when meaning "data()" (I think STL does that).

::: js/src/jit/MIR.h
@@ +1972,4 @@
>      {
> +        arity_ = SimdTypeToLength(type);
> +        for (unsigned i = 0; i < arity_; i++) {
> +            MOZ_ASSERT(lanes[i] < 2 * arity_);

This is too lax for swizzle: it should be lanes[i] < arity_. Maybe the value to check against could be passed as a DebugOnly<uint32_t>? Or the assertion could be in the children's constructors.

@@ +1995,5 @@
>  };
>  
>  // Applies a shuffle operation to the input, putting the input lanes as
>  // indicated in the output register's lanes. This implements the SIMD.js
>  // "shuffle" function, that takes one vector and one mask.

Pre-existing: s/shuffle/swizzle/ in this comment

@@ +2141,5 @@
>          // In the balanced case, swap operands if needs be, in order to be able
>          // to do only one vshufps on x86.
> +        unsigned lanesFromLHS = 0;
> +        for (unsigned i = 0; i < arity; i++)
> +            if (lanes[i] < arity)

nit: Not sure what our style guide says about this kind of situation, but I would put braces for the `for` body here, just to ensure we don't fall into a weird openssl-like situation later.

@@ +2145,5 @@
> +            if (lanes[i] < arity)
> +                lanesFromLHS++;
> +
> +        if (lanesFromLHS < arity / 2 || (arity == 4 && lanesFromLHS == 2 &&
> +                                         lanes[0] >= 4 && lanes[1] >= 4)) {

uber-nit: can you put the second part of the || in its own line, please? (or put it first, then put the `lanesFromLHS < arity / 2` on the second line)

@@ +2154,4 @@
>          }
>  
>          // If all lanes come from the same vector, just use swizzle instead.
> +        if (lanesFromLHS == arity)

Nice, it's much simpler to think in terms of arity.
Attachment #8749254 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #22)
> Comment on attachment 8749254 [details] [diff] [review]
> Make MSimdSwizzle and MSimdShuffle length-agnostic.

Thanks, Ben!

> ::: js/src/jit/FixedList.h
> @@ +84,5 @@
> >          MOZ_ASSERT(index < length_);
> >          return list_[index];
> >      }
> >  
> > +    T* data() {
> 
> See comment in MIR.cpp; i think we can get rid of this one.
> 
> ::: js/src/jit/MIR.cpp
> @@ +1209,3 @@
> >  
> >      MOZ_ASSERT(numVectors() == 2);
> > +    return MSimdShuffle::New(alloc, vector(0), vector(1), lanes.data());
> 
> Here and above, could we simply use lanes.begin()? Then I think we can
> remove the data() helper on the FixedList. It is ok, at least to me, to use
> "begin()" when meaning "data()" (I think STL does that).

The begin() and end() methods return iterators, while the data() method returns a pointer. It is common in C++ standard libraries to implement even std::vector iterators as classes since iterators are not guaranteed to support all the same things a pointer can do. Some libraries use extended debug-mode iterators that can perform bounds checking too.

Of course FixedList is our own class, and we can do whatever we want, but I would prefer to keep the separation between iterators and pointers, even though this class uses pointers as its iterators.

> ::: js/src/jit/MIR.h
> @@ +1972,4 @@
> >      {
> > +        arity_ = SimdTypeToLength(type);
> > +        for (unsigned i = 0; i < arity_; i++) {
> > +            MOZ_ASSERT(lanes[i] < 2 * arity_);
> 
> This is too lax for swizzle: it should be lanes[i] < arity_. Maybe the value
> to check against could be passed as a DebugOnly<uint32_t>? Or the assertion
> could be in the children's constructors.

Thanks, good catch. I'll move the checks to the children.
https://hg.mozilla.org/integration/mozilla-inbound/rev/48e12dc46e554777b2048cbf515cd4f808364fb7
Bug 1136226 - Remove SimdLane enumeration. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/55710af96f41086584b3c9065e552ec129f23a96
Bug 1136226 - Support 8x16 and 16x8 types in SimdConstant. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/5de8e799eae3901a46aba497de1601be02c61a1a
Bug 1136226 - Add 16x8 and 8x16 MIRTypes. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/2d6da6b2c98ab18cbb2f071c7e623180d48bd218
Bug 1136226 - Materialize 8x16 and 16x8 SIMD constants. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/c515aeae0c850113c1127a0db4f3a72e8822f71e
Bug 1136226 - Update LDefinition and MoveOp for 8x16 and 16x8. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/f63aa4a372a4af819f067d0aad23cf60a14619f6
Bug 1136226 - Rename 32x4 SIMD masm methods to "Simd128". r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/bb54ae23b22a7434ffef46e9678731338971d1bd
Bug 1136226 - Add masm.simd128Constant(). r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/117800523a0368e2a98d90befbc9c15854c7ce6c
Bug 1136226 - Make MSimdSwizzle and MSimdShuffle length-agnostic. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/a8a0b4464969d0b76cb23556e6bc940d74e481db
Bug 1136226 - Rename MSimdSplatX4 to MSimdSplat. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/f4b3c6c334e9bacbd278dfb4cd5aeee59098ef9d
Bug 1136226 - Fix jit-test shell for SIMD. r=bbouvier
Keywords: leave-open
Comment on attachment 8749253 [details] [diff] [review]
Unify Bailout_NonSimd*Input.

Review of attachment 8749253 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, I was off for a few days.

::: js/src/jit/BaselineBailouts.cpp
@@ +1887,5 @@
>        case Bailout_NonBooleanInput:
>        case Bailout_NonObjectInput:
>        case Bailout_NonStringInput:
>        case Bailout_NonSymbolInput:
> +      case Bailout_NonSimdInput:

Note, we use this bailout to tag a bailout which attempt to unbox a Simd input of one particular type.  Thus we can technically bail on a Bool8x16 and succeed on a Float32x4.

Thus I think this should be renamed as well to something like:
  Bailout_UnexpectedSimdInput
Attachment #8749253 - Flags: review?(nicolas.b.pierron) → review+
These are used in the asm.js / wasm code generator.

Also include addSaturate and subSaturate in the SimdOperation enum. They had
been left out by accident.
Attachment #8750870 - Flags: review?(sunfish)
SIMD constructors with dynamic arguments are translated to a sequence of
MSimdInsertElement instructions, LLVM style. This may change if we need the
optimization.
Attachment #8750874 - Flags: review?(sunfish)
Almost entirely boilerplate.

Note that IsSimdValidOperationType() does not yet accept 8x16 and 16x8 SIMD
operations which means that none of the new code is reachable yet. We'll turn
this on once IonMonkey has full support for the new SIMD types.
Attachment #8750876 - Flags: review?(bbouvier)
Comment on attachment 8750876 [details] [diff] [review]
Asm.js: Add small integer SIMD types.

Review of attachment 8750876 [details] [diff] [review]:
-----------------------------------------------------------------

Very boilerplatey, indeed. Thank you!

::: js/src/asmjs/AsmJS.cpp
@@ +975,5 @@
> +    {
> +        return which_ == Int8x16 || which_ == Uint8x16 || which_ == Int16x8 ||
> +            which_ == Uint16x8 || which_ == Int32x4 || which_ == Uint32x4 ||
> +            which_ == Float32x4 || which_ == Bool8x16 || which_ == Bool16x8 ||
> +            which_ == Bool32x4;

This, or you can add two enum values to Which: FirstSimdType = Int8x16, LastSimdType = Bool32x4, and then just compare FirstSmdType <= which_ <= LastSimdType.

@@ +2518,5 @@
> +        int8_t val[16];
> +        for (size_t i = 0; i < 16; i++, arg = NextNode(arg)) {
> +            uint32_t u32;
> +            JS_ALWAYS_TRUE(IsLiteralInt(m, arg, &u32));
> +            val[i] = int8_t(u32);

There is an implicit coercion here. Should we make this explicit and require that e.g. int8x16 lanes literals must be uint8 values in the first place? Otherwise, you could have pretty weird situations like SIMD.Int8x16(0x7fffffff, etc.). Not too sure about this though.

@@ +2520,5 @@
> +            uint32_t u32;
> +            JS_ALWAYS_TRUE(IsLiteralInt(m, arg, &u32));
> +            val[i] = int8_t(u32);
> +        }
> +        MOZ_ASSERT(arg == nullptr);

I'd prefer MOZ_ASSERT(!arg); here and below, but that's up to your taste.

@@ +2521,5 @@
> +            JS_ALWAYS_TRUE(IsLiteralInt(m, arg, &u32));
> +            val[i] = int8_t(u32);
> +        }
> +        MOZ_ASSERT(arg == nullptr);
> +        NumLit::Which w = type == SimdType::Uint8x16 ? NumLit::Uint8x16 : NumLit::Int8x16;

Pre-existing, but as there's a coercion from SimdType to Type, and Type enum values are the same as NumLit, could we do NumLit::Which w = NumLit::Which(Type(type)); and maybe inline its use?

@@ +2718,5 @@
> +          case SimdOperation::Fn_fromInt8x16Bits:    return Expr::Limit;
> +          default: break;
> +        }
> +        MOZ_FALLTHROUGH;
> +      case SimdType::Int8x16: {

nit, here and below, pre-existing: no need for the {} here

@@ +7532,4 @@
>  #define SET_NATIVE_UINT32X4(op) case SimdOperation::Fn_##op: native = simd_uint32x4_##op; break;
>  #define SET_NATIVE_FLOAT32X4(op) case SimdOperation::Fn_##op: native = simd_float32x4_##op; break;
> +#define SET_NATIVE_BOOL8x16(op) case SimdOperation::Fn_##op: native = simd_bool8x16_##op; break;
> +#define SET_NATIVE_BOOL16x8(op) case SimdOperation::Fn_##op: native = simd_bool16x8_##op; break;

uber nit, but can you please make the capitalization of the "x" consistent here? I actually prefer it to be lowercase, so that would mean changing the preexisting macro names.
Attachment #8750876 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #36)
> Comment on attachment 8750876 [details] [diff] [review]
> Asm.js: Add small integer SIMD types.

Thanks, Ben!

> @@ +2518,5 @@
> > +        int8_t val[16];
> > +        for (size_t i = 0; i < 16; i++, arg = NextNode(arg)) {
> > +            uint32_t u32;
> > +            JS_ALWAYS_TRUE(IsLiteralInt(m, arg, &u32));
> > +            val[i] = int8_t(u32);
> 
> There is an implicit coercion here. Should we make this explicit and require
> that e.g. int8x16 lanes literals must be uint8 values in the first place?
> Otherwise, you could have pretty weird situations like
> SIMD.Int8x16(0x7fffffff, etc.). Not too sure about this though.

I don't really have a strong opinion. Nagy at Microsoft is writing a more formal specification of SIMD in asm.js, maybe we should wait for him?

FWIW, the corresponding JS constructor call has the same coercion via the ToInt8() and ToInt16() functions which extract the low bits. Also note that IsLiteralInt() already coerces negative literals into a u32.

Given the JS precedent, I don't think there is a lot of value in restricting asm.js further than requiring 32-bit literals in the 1.5x range INT_MIN - UINT_MAX, which is what IsLiteralInt() does.

> @@ +2521,5 @@
> > +            JS_ALWAYS_TRUE(IsLiteralInt(m, arg, &u32));
> > +            val[i] = int8_t(u32);
> > +        }
> > +        MOZ_ASSERT(arg == nullptr);
> > +        NumLit::Which w = type == SimdType::Uint8x16 ? NumLit::Uint8x16 : NumLit::Int8x16;
> 
> Pre-existing, but as there's a coercion from SimdType to Type, and Type enum
> values are the same as NumLit, could we do NumLit::Which w =
> NumLit::Which(Type(type)); and maybe inline its use?

It would be safe in this code, but it seems dangerous in general to cast between different enums with partially overlapping ranges. I see it's done in Type::lit(), but that is inside the Type class, and it is from the smaller to the larger enum.

I am mostly worried about what happens if somebody changes NumLit and Type in the future.

> @@ +7532,4 @@
> >  #define SET_NATIVE_UINT32X4(op) case SimdOperation::Fn_##op: native = simd_uint32x4_##op; break;
> >  #define SET_NATIVE_FLOAT32X4(op) case SimdOperation::Fn_##op: native = simd_float32x4_##op; break;
> > +#define SET_NATIVE_BOOL8x16(op) case SimdOperation::Fn_##op: native = simd_bool8x16_##op; break;
> > +#define SET_NATIVE_BOOL16x8(op) case SimdOperation::Fn_##op: native = simd_bool16x8_##op; break;
> 
> uber nit, but can you please make the capitalization of the "x" consistent
> here? I actually prefer it to be lowercase, so that would mean changing the
> preexisting macro names.

He he, clearly most of this patch was made with s/32x4/8x16/g.
Depends on: 1271959
Attachment #8750870 - Flags: review?(sunfish) → review+
Attachment #8750871 - Flags: review?(sunfish) → review+
Attachment #8750872 - Flags: review?(sunfish) → review+
Attachment #8750873 - Flags: review?(sunfish) → review+
Comment on attachment 8750874 [details] [diff] [review]
Wasm: Generate code for small int SIMD constructors.

Review of attachment 8750874 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/WasmIonCompile.cpp
@@ +2578,5 @@
> +        MDefinition* scalar = 0;
> +        if (!f.iter().readSimdCtorArg(ValType::I32, length, i, &scalar))
> +            return false;
> +        val = f.insertElementSimd(val, scalar, i, type);
> +    }

This is missing readSimdCtorArgsEnd and readSimdCtorReturn calls, such as are used in the ValType::I32x4 case below.

@@ +2594,5 @@
> +        MDefinition* scalar = 0;
> +        if (!f.iter().readSimdCtorArg(ValType::I32, length, i, &scalar))
> +            return false;
> +        val = f.insertElementSimd(val, EmitSimdBooleanLaneExpr(f, scalar), i, type);
> +    }

Same here.

@@ +2609,5 @@
>      switch (type) {
> +      case ValType::I8x16:
> +        return EmitSimdChainedCtor(f, MIRType::Int8x16, SimdConstant::SplatX16(0));
> +      case ValType::I16x8:
> +        return EmitSimdChainedCtor(f, MIRType::Int16x8, SimdConstant::SplatX8(0));

Would it make sense to migrate the I32x4 and F32x4 cases to use this same approach, for consistency?
Attachment #8750874 - Flags: review?(sunfish) → review+
Attachment #8750875 - Flags: review?(sunfish) → review+
(In reply to Dan Gohman [:sunfish] from comment #39)
> Comment on attachment 8750874 [details] [diff] [review]
> Wasm: Generate code for small int SIMD constructors.

Thanks, Dan.

> @@ +2609,5 @@
> >      switch (type) {
> > +      case ValType::I8x16:
> > +        return EmitSimdChainedCtor(f, MIRType::Int8x16, SimdConstant::SplatX16(0));
> > +      case ValType::I16x8:
> > +        return EmitSimdChainedCtor(f, MIRType::Int16x8, SimdConstant::SplatX8(0));
> 
> Would it make sense to migrate the I32x4 and F32x4 cases to use this same
> approach, for consistency?

I thought about doing that, but I didn't want to cause any regressions.

For an F32x4 constructor, we currently use three vunpcklps instructions to shuffle things into place. Expanded like above, we would generate four vinsertps instead which is OK, except they require SSE 4.1. Without SSE 4.1, visitSimdInsertElementF falls back to stack bounces.

Similarly, for I32x4 we already generate four vpinsrd with SSE 4.1, but without we do a stack bounce. However, it's a single stack bounce (four 32-bit stores, one 128-bit load) and expanding the constructor into replaceLane calls would create 4 x (128-bit store, 32-bit store, 128-bit load).

For I16x8, it is always safe to expand the constructor into replaceLanes since vpinsrw is available since SSE 2.

For I8x16, we generate pretty bad code without SSE 4.1 for vpinsrb, but the register allocator would choke on a 16-input constructor instruction anyway.

It will be interesting to see if these SIMD constructors are used in real-world code, and how. If they are often used with many constant arguments, there is an opportunity to convert that into replaceLane operations on a constant initial vector.
More tests will be added as operations are implemented.
Attachment #8756049 - Flags: review?(bbouvier)
Use vpinsrw to insert 16x8 lanes. This instruction is available since SSE2, so
it can be used unconditionally.
Attachment #8756051 - Flags: review?(bbouvier)
- Implement 'not' and 'neg' for 8x16 and 16x8 types.
- Rename some 'bitwiseFooX4' masm functions to 'bitwiseFooSimd128'.
- Rename the zeroInt32x4 and zeroFloat32x4 to zeroSimd128{Int,Float}.
- Add support for the paddb/paddw and psubb/psubw SSE2 instructions in the
  assembler.
Attachment #8756052 - Flags: review?(bbouvier)
- Rename LSimdBinaryBitwiseX4 to LSimdBinaryBitwise and use it for all types.
- Add pmullw to the assembler for 16x8 multiplies.

Don't implement 8x16 multiplies just yet. There are no SSE instructions for
that operation, so they need to be synthesied from 16x8 multiplies and shuffles.
Attachment #8756053 - Flags: review?(bbouvier)
These all have corresponding SSE instructions.

The 8x16 shifts don't have SSE instructions, so they will be added by the next
commit.
Attachment #8756054 - Flags: review?(bbouvier)
These operations don't have SSE instructions, so express them in terms of 16x8
shifts in MSimdShift::AddLegalized.
Attachment #8756055 - Flags: review?(bbouvier)
There are no 8x16 SSE multiply instructions, so expand these multiplies in
terms of 16x8 multiplies.
Attachment #8756056 - Flags: review?(bbouvier)
The scalar argument to this operation is expanded into MIR as either -1 or 0 in
an Int32, so the 4-lane splat produces the correct result for 8-lane and
16-lane splats too. Either an all-zeroes vector or an all-ones vector.
Attachment #8756058 - Flags: review?(bbouvier)
Remove a normalizing shift of the boolean selector. Boolean SIMD types are
guaranteed to be represented as all-ones.
Attachment #8756060 - Flags: review?(sunfish)
When we have SSSE3 available, the pshufb instruction can perform any byte-wise
swizzle.

Without SSSE3, fall back to using byte-wise loads and stores to simulate the
swizzle. This applies to CPUs from before 2006.
Attachment #8756061 - Flags: review?(sunfish)
When SSSE3 is available, two pshufb instructions can be combined to form any
shuffle.

Old machines without SSSE3 bounce the two vectors through the stack.
Attachment #8756062 - Flags: review?(sunfish)
Since SSE doesn't have unsigned comparisons, add a bias vector and use the
signed comparisons instead, just like we do for the 32x4 unsigned vectors.

Use 'defineReuseInput' even when SIMD input and output types differ. This is
fine now since the register allocator uses a single Simd128 class for all SIMD
registers.
Attachment #8756063 - Flags: review?(sunfish)
Add a new asm.js test which exercises all the possible bitcast operations as
well as all possible load and store operations, except the partial ones.

Add new Scalar::I8x16, I16x8 enumerators.

Remove the fromUintXXX opcodes from the *_ASMJS macros. This avoids gerating
unwanted wasm opcodes. Include the relevant unsigned bit-casts explicitly where
needed.
Attachment #8756065 - Flags: review?(sunfish)
Add support for inlining addSaturate and subSaturate SIMD operations when
compiling SIMD.js.
Attachment #8756066 - Flags: review?(sunfish)
Use the same strategy as we do for asm.js: Start from an initial vector and
insert one lane at a time.
Attachment #8756067 - Flags: review?(sunfish)
Some MIR instructions have an AddLegalized() static function which much be used
when creating new instructions. This ensures that the proper expansions are
generated.

Make the New() factory method in those instructions private so it isn't used
inadvertently.

De-templatize the inlineSimdBinary<> function since it was only used for two
MIR instructions which are now created differently.
Attachment #8756068 - Flags: review?(sunfish)
Comment on attachment 8756048 [details] [diff] [review]
Asm.js: Enable small integer types.

Review of attachment 8756048 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! Just a technicality: having this patch being pushed before the others create a window in which fuzzers can trip and for which they'll hit MOZ_CRASH(NYI). Can you move this patch in the patch queue to be after all the opcodes have been implemented, please?
Attachment #8756048 - Flags: review?(bbouvier) → review+
Comment on attachment 8756049 [details] [diff] [review]
Asm.js: Test cases for small integer SIMD types.

Review of attachment 8756049 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! Comments apply to both files.

::: js/src/jit-test/tests/asm.js/testSIMD-8x16.js
@@ +5,5 @@
> +
> +// Set to true to see more JS debugging spew.
> +const DEBUG = false;
> +
> +if (!isSimdAvailable() || typeof SIMD === 'undefined') {

The second part is tested in libdir/simd.js and can be removed from here.

@@ +33,5 @@
> +
> +// Linking
> +assertEq(asmLink(asmCompile('glob', USE_ASM + I8x16 + "function f() {} return f"), {SIMD:{Int8x16: SIMD.Int8x16}})(), undefined);
> +assertEq(asmLink(asmCompile('glob', USE_ASM + U8x16 + "function f() {} return f"), {SIMD:{Uint8x16: SIMD.Uint8x16}})(), undefined);
> +assertEq(asmLink(asmCompile('glob', USE_ASM + B8x16 + "function f() {} return f"), {SIMD:{Bool8x16: SIMD.Bool8x16}})(), undefined);

Can you test linking fails against the wrong functions?

@@ +68,5 @@
> +assertAsmTypeFail('glob', USE_ASM + B8x16 + "function f() {var x=b8x16(1,0,0,0);} return f");
> +assertAsmTypeFail('glob', USE_ASM + B8x16 + "function f() {var x=b8x16(1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1.0);} return f");
> +assertAsmTypeFail('glob', USE_ASM + B8x16 + "function f() {var x=b8x16(1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1|0);} return f");
> +assertAsmTypeFail('glob', USE_ASM + B8x16 + "function f() {var x=b8x16(1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1);} return f");
> +assertEq(asmLink(asmCompile('glob', USE_ASM + B8x16 + "function f() {var x=b8x16(1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1);} return f"), this)(), undefined);

Can you test values other than 0, 1 here too, please? (In particular -1)

@@ +74,5 @@
> +// Only signed Int8x16 allowed as return value.
> +assertEqVecArr(asmLink(asmCompile('glob', USE_ASM + I8x16 + "function f() {return i8x16(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);} return f"), this)(),
> +           [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]);
> +assertEqVecArr(asmLink(asmCompile('glob', USE_ASM + I8x16 + I8x16CHK + "function f() {return i8x16chk(i8x16(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16));} return f"), this)(),
> +           [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]);

Can you add a test with a vector that has (INT8X16_MAX + 1) in one component, and check that it gets wrapped?
Attachment #8756049 - Flags: review?(bbouvier) → review+
Comment on attachment 8756050 [details] [diff] [review]
Implement MSimdExtractElement for small integer types.

Review of attachment 8756050 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you! Nice tests.

::: js/src/jit/Lowering.cpp
@@ +4400,5 @@
>              MOZ_ASSERT(ins->signedness() == SimdSign::Unsigned);
>              define(new (alloc()) LSimdExtractElementU2D(use, temp()), ins);
>          } else {
> +            auto* lir = new (alloc()) LSimdExtractElementI(use);
> +#if defined(JS_CODEGEN_X86)

Lowering should not have platform dependent code. I think you will need an implementation in Lowering-x86-shared. Or ignore ARM at the moment and remove this #ifdef, until we implement ARM.

@@ +4407,5 @@
> +            // version of these instructions require a source register that is
> +            // %al, %bl, %cl, or %dl.
> +            // Fix it to %eax since we can't express that constraint better.
> +            if (ins->input()->type() == MIRType::Int8x16) {
> +                defineFixed(lir, ins, LAllocation(AnyRegister(eax)));

As there are other instructions which need eax as fixed but don't have other options, would it make sense to use ebx instead, to maybe lower register pressure in some places?

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +2690,5 @@
> +            signedness = SimdSign::NotApplicable;
> +    } else {
> +        // Extract the relevant 16 bits containing our lane, then shift the
> +        // right 8 bits into place.
> +        emitSimdExtractLane16x8(input, output, lane / 2, SimdSign::Unsigned);

As the last parameter is unused, can you use NotApplicable? I wondered why Unsigned was used here, just to find out it's a trick to not trigger signedness = SimdSign::Signed above.

@@ +2699,5 @@
> +                signedness = SimdSign::NotApplicable;
> +        }
> +    }
> +
> +    // We have the right low 8 bits in \output|, but we may need to fix the high

nit: |output|

@@ +2752,4 @@
>  
> +    switch (length) {
> +      case 4:
> +        emitSimdExtractLane32x4(input, output, mir->lane());

Maybe MOZ_ASSERT(mir->signedness() == Signed)?
Attachment #8756050 - Flags: review?(bbouvier) → review+
Comment on attachment 8756050 [details] [diff] [review]
Implement MSimdExtractElement for small integer types.

Review of attachment 8756050 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry I spotted something else.

::: js/src/jit/x86-shared/Assembler-x86-shared.h
@@ +2115,5 @@
> +    void vpextrb(unsigned lane, FloatRegister src, Register dest) {
> +        MOZ_ASSERT(HasSSE41());
> +        masm.vpextrb_irr(lane, src.encoding(), dest.encoding());
> +    }
> +    void vpextrb(unsigned lane, FloatRegister src, const Operand& dest) {

This variant is actually unused, can we remove it? (unless you need it in a subsequent patch or have plans to use it somehow). In this case, can we remove vpextrb_irm too?
Comment on attachment 8756051 [details] [diff] [review]
Implement MSimdInsertElement for small integer types.

Review of attachment 8756051 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you!

::: js/src/jit-test/tests/asm.js/testSIMD-8x16.js
@@ +128,5 @@
> +for (var i = 0; i < 16; i++) {
> +    var f = replaceI(a, i);
> +    var b = a.slice(0);
> +    b[i] = 0;
> +    assertEqVecArr(f(0), b);

nit: can you use a value more interesting than zero, please? :} Like, a signed one? (applies for the other test too)

::: js/src/jit/x86-shared/Assembler-x86-shared.h
@@ +2098,5 @@
> +    void vpinsrb(unsigned lane, Register src1, FloatRegister src0, FloatRegister dest) {
> +        MOZ_ASSERT(HasSSE41());
> +        masm.vpinsrb_irr(lane, src1.encoding(), src0.encoding(), dest.encoding());
> +    }
> +    void vpinsrb(unsigned lane, const Operand& src1, FloatRegister src0, FloatRegister dest) {

Can we remove this variant? It seems unused. (and vpinsrb_imr will be too)
Attachment #8756051 - Flags: review?(bbouvier) → review+
Comment on attachment 8756052 [details] [diff] [review]
Unary functions for small integer SIMD types.

Review of attachment 8756052 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Muuuuuch boilerplate.

::: js/src/jit-test/tests/asm.js/testSIMD-16x8.js
@@ +180,5 @@
> +
> +function unaryB(opname, lanefunc) {
> +    simdfunc = asmLink(asmCompile('glob', USE_ASM + B16x8 + B16x8CHK +
> +                                  `var fut = b16x8.${opname};` +
> +                                  'function f(v) { v = b16x8chk(v); return fut(v); } return f'), this);

Same question as for the other test file.

::: js/src/jit-test/tests/asm.js/testSIMD-8x16.js
@@ +181,5 @@
> +
> +function unaryB(opname, lanefunc) {
> +    simdfunc = asmLink(asmCompile('glob', USE_ASM + B8x16 + B8x16CHK +
> +                                  `var fut = b8x16.${opname};` +
> +                                  'function f(v) { v = b8x16chk(v); return fut(v); } return f'), this);

I thought only signed integer (non bool) vectors could be returned from asm.js to the outside world? Or is it that the "not" operation return an i8x16? Either way, isn't there something to fix here?

::: js/src/jit/Lowering.cpp
@@ +4556,5 @@
>  
>      // Cannot be at start, as the ouput is used as a temporary to store values.
>      LUse in = use(ins->input());
>  
> +    if (ins->type() == MIRType::Int8x16 || ins->type() == MIRType::Bool8x16) {

Maybe we could switch on ins->type(). Or maybe we don't need to.

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +3571,5 @@
> +        masm.packedSubInt16(in, out);
> +        return;
> +      case MSimdUnaryArith::not_:
> +        masm.loadConstantSimd128Int(allOnes, out);
> +        masm.bitwiseXorSimd128(in, out);

Random question: as not_ is independent from the input's length, could/should we common it out somewhere?
Attachment #8756052 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #66)
> Comment on attachment 8756048 [details] [diff] [review]
> Asm.js: Enable small integer types.
> 
> Review of attachment 8756048 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks! Just a technicality: having this patch being pushed before the
> others create a window in which fuzzers can trip and for which they'll hit
> MOZ_CRASH(NYI). Can you move this patch in the patch queue to be after all
> the opcodes have been implemented, please?

Good point. Currently, the patches implementing operations are also adding asm.test cases to the two test files. I'll roll up all the test cases into one commit and add in this order:

1. Implement new operations.
2. Enable inlining of new types for JS JIT.
3. Enable new types and operations for asm.js
4. Single commit with asm.js tests.

This should keep the tree passing at all intermediate stages without any NYI crashers.
(In reply to Benjamin Bouvier [:bbouvier] from comment #68)
> Comment on attachment 8756050 [details] [diff] [review]
> Implement MSimdExtractElement for small integer types.
> 
> Review of attachment 8756050 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thank you! Nice tests.
> 
> ::: js/src/jit/Lowering.cpp
> @@ +4400,5 @@
> >              MOZ_ASSERT(ins->signedness() == SimdSign::Unsigned);
> >              define(new (alloc()) LSimdExtractElementU2D(use, temp()), ins);
> >          } else {
> > +            auto* lir = new (alloc()) LSimdExtractElementI(use);
> > +#if defined(JS_CODEGEN_X86)
> 
> Lowering should not have platform dependent code. I think you will need an
> implementation in Lowering-x86-shared. Or ignore ARM at the moment and
> remove this #ifdef, until we implement ARM.

Ugh, I though you might say that ;-)

I agree, this is a bit gross. I'll move it to x86-shared.

> @@ +4407,5 @@
> > +            // version of these instructions require a source register that is
> > +            // %al, %bl, %cl, or %dl.
> > +            // Fix it to %eax since we can't express that constraint better.
> > +            if (ins->input()->type() == MIRType::Int8x16) {
> > +                defineFixed(lir, ins, LAllocation(AnyRegister(eax)));
> 
> As there are other instructions which need eax as fixed but don't have other
> options, would it make sense to use ebx instead, to maybe lower register
> pressure in some places?

Yes, that would make sense. Some shift instructions use ecx hardcoded, so ebx is a good choice.

> ::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
> @@ +2690,5 @@
> > +            signedness = SimdSign::NotApplicable;
> > +    } else {
> > +        // Extract the relevant 16 bits containing our lane, then shift the
> > +        // right 8 bits into place.
> > +        emitSimdExtractLane16x8(input, output, lane / 2, SimdSign::Unsigned);
> 
> As the last parameter is unused, can you use NotApplicable? I wondered why
> Unsigned was used here, just to find out it's a trick to not trigger
> signedness = SimdSign::Signed above.

In the case of an odd lane with unsigned extraction, we do use the zero-extension. We end up emitting this code:

    vpextrw lane/2, input, output ; output = 0x0000xxyy
    shrl 8, output                ; output = 0x000000xx

That's a 32-bit right shift, and we can cancel the insertion of a movzbl below.

> @@ +2699,5 @@
> > +                signedness = SimdSign::NotApplicable;
> > +        }
> > +    }
> > +
> > +    // We have the right low 8 bits in \output|, but we may need to fix the high
> 
> nit: |output|
> 
> @@ +2752,4 @@
> >  
> > +    switch (length) {
> > +      case 4:
> > +        emitSimdExtractLane32x4(input, output, mir->lane());
> 
> Maybe MOZ_ASSERT(mir->signedness() == Signed)?

I'm not sure if we can depend on asm.js to always generate that, but at least it shouldn't be NotApplicable.
(In reply to Benjamin Bouvier [:bbouvier] from comment #69)
> Comment on attachment 8756050 [details] [diff] [review]
> Implement MSimdExtractElement for small integer types.
> 
> Review of attachment 8756050 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Sorry I spotted something else.
> 
> ::: js/src/jit/x86-shared/Assembler-x86-shared.h
> @@ +2115,5 @@
> > +    void vpextrb(unsigned lane, FloatRegister src, Register dest) {
> > +        MOZ_ASSERT(HasSSE41());
> > +        masm.vpextrb_irr(lane, src.encoding(), dest.encoding());
> > +    }
> > +    void vpextrb(unsigned lane, FloatRegister src, const Operand& dest) {
> 
> This variant is actually unused, can we remove it? (unless you need it in a
> subsequent patch or have plans to use it somehow). In this case, can we
> remove vpextrb_irm too?

I'm not so sure about this. It makes more sense to me to add all the variants of an opcode once I need one. Then the next guy won't have to worry about which variants of an opcode happen to be implemented already.

Since these are inline functions, I expect that the compiler won't even generate code for them if they're not used.

How do we usually do this? It looked to me like the macro assembler mostly implements the full set of variants.
Comment on attachment 8756055 [details] [diff] [review]
Implement 8x16 SIMD shift operators.

Review of attachment 8756055 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.cpp
@@ +1389,5 @@
> +        // wide = yyxx yyxx yyxx yyxx yyxx yyxx yyxx yyxx
> +
> +        MInstruction* shiftMask = MConstant::New(alloc, Int32Value(7));
> +        addTo->add(shiftMask);
> +        MInstruction* shiftBy = MBitAnd::New(alloc, right, shiftMask);

It looks like I missed this:

   shiftBy->setInt32Specialization();

It appears that `specialization` is left uninitialized if I don't do something like that.
(In reply to Benjamin Bouvier [:bbouvier] from comment #71)
> Comment on attachment 8756052 [details] [diff] [review]
> Unary functions for small integer SIMD types.
> 
> Review of attachment 8756052 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good. Muuuuuch boilerplate.

Such is the way of the macro assembler ;-)

> ::: js/src/jit-test/tests/asm.js/testSIMD-8x16.js
> @@ +181,5 @@
> > +
> > +function unaryB(opname, lanefunc) {
> > +    simdfunc = asmLink(asmCompile('glob', USE_ASM + B8x16 + B8x16CHK +
> > +                                  `var fut = b8x16.${opname};` +
> > +                                  'function f(v) { v = b8x16chk(v); return fut(v); } return f'), this);
> 
> I thought only signed integer (non bool) vectors could be returned from
> asm.js to the outside world? Or is it that the "not" operation return an
> i8x16? Either way, isn't there something to fix here?

We don't allow unsigned vectors because WebAssembly wouldn't be able to tell themapart form signed vectors in its type system. I expect that WebAssembly will have boolean vector types, though, so they are safe to pass around.

For FFI, all SIMD types are banned.

> ::: js/src/jit/Lowering.cpp
> @@ +4556,5 @@
> >  
> >      // Cannot be at start, as the ouput is used as a temporary to store values.
> >      LUse in = use(ins->input());
> >  
> > +    if (ins->type() == MIRType::Int8x16 || ins->type() == MIRType::Bool8x16) {
> 
> Maybe we could switch on ins->type(). Or maybe we don't need to.

That would be good form, I think.

> ::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
> @@ +3571,5 @@
> > +        masm.packedSubInt16(in, out);
> > +        return;
> > +      case MSimdUnaryArith::not_:
> > +        masm.loadConstantSimd128Int(allOnes, out);
> > +        masm.bitwiseXorSimd128(in, out);
> 
> Random question: as not_ is independent from the input's length,
> could/should we common it out somewhere?

We do have a SimdBinaryBitwise opcode, but since there is only the one bitwise unary operation, it doesn't seem worthwhile to add a whole SimdUnaryBitwise instruction.
Attachment #8756060 - Flags: review?(sunfish) → review+
Comment on attachment 8756061 [details] [diff] [review]
Implement swizzle for 8x16 and 16x8 SIMD types.

Review of attachment 8756061 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +3021,5 @@
> +    if (AssemblerX86Shared::HasSSSE3()) {
> +        ScratchSimd128Scope scratch(masm);
> +        masm.loadConstantSimd128Int(SimdConstant::CreateX16(bLane), scratch);
> +        if (input != output)
> +            masm.vmovdqa(input, output);

VEX prefixes are implemented, but not actually enabled yet, so this doesn't matter at the moment, but it would be nice to use masm.reusedInputFloat32x4 here instead of an unconditional copy, so that we don't need a copy when VEX prefixes are enabled.

@@ +3026,5 @@
> +        masm.vpshufb(scratch, output, output);
> +        return;
> +    }
> +
> +    // Worst-case fallback for pre-SSE3 machines. Bounce through memory.

typo; pre-SSSE3 machines.
Attachment #8756061 - Flags: review?(sunfish) → review+
Comment on attachment 8756061 [details] [diff] [review]
Implement swizzle for 8x16 and 16x8 SIMD types.

Review of attachment 8756061 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +3021,5 @@
> +    if (AssemblerX86Shared::HasSSSE3()) {
> +        ScratchSimd128Scope scratch(masm);
> +        masm.loadConstantSimd128Int(SimdConstant::CreateX16(bLane), scratch);
> +        if (input != output)
> +            masm.vmovdqa(input, output);

Or rather, reusedInputInt32x4.
Comment on attachment 8756062 [details] [diff] [review]
Implement shuffle for 8x16 and 16x8 SIMD types.

Review of attachment 8756062 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +3121,5 @@
> +        // Use pshufb instructions to gather the lanes from each source vector.
> +        // A negative index creates a zero lane, so the two vectors can be combined.
> +
> +        // Set scratch2 = lanes from lhs.
> +        masm.vmovdqa(lhs, scratch2);

This can use reusedInputInt32x4

@@ +3129,5 @@
> +        masm.loadConstantSimd128Int(SimdConstant::CreateX16(idx), scratch1);
> +        masm.vpshufb(scratch1, scratch2, scratch2);
> +
> +        // Set output = lanes from rhs.
> +        masm.vmovdqa(rhs, output);

Ditto :)
Attachment #8756062 - Flags: review?(sunfish) → review+
Comment on attachment 8756063 [details] [diff] [review]
Implement compares for 8x16 and 16x8 SIMD types.

Review of attachment 8756063 [details] [diff] [review]:
-----------------------------------------------------------------

Cool. Yay for using defineReuseInput :).
Attachment #8756063 - Flags: review?(sunfish) → review+
Attachment #8756065 - Flags: review?(sunfish) → review+
Comment on attachment 8756066 [details] [diff] [review]
Inline saturating arithmetic operations.

Review of attachment 8756066 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/lib/simd.js
@@ +13,4 @@
>      var arr = [];
>      var [varr, warr] = [simdToArray(v), simdToArray(w)];
>      [varr, warr] = [varr.map(Math.fround), warr.map(Math.fround)];
>      for (var i = 0; i < 4; i++)

It looks like you want to generalize this function to non-4 sizes, but this loop still counts to 4.
Attachment #8756066 - Flags: review?(sunfish) → review+
Attachment #8756067 - Flags: review?(sunfish) → review+
Comment on attachment 8756068 [details] [diff] [review]
Make MIR New factories private where AddLegalized should be used.

Review of attachment 8756068 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!
Attachment #8756068 - Flags: review?(sunfish) → review+
Attachment #8756070 - Flags: review?(sunfish) → review+
Thank you for all your answers and explanations. I agree to all of them, except for one...

(In reply to Jakob Stoklund Olesen [:jolesen] from comment #74)
> (In reply to Benjamin Bouvier [:bbouvier] from comment #69)
> > Comment on attachment 8756050 [details] [diff] [review]
> > Implement MSimdExtractElement for small integer types.
> > 
> > Review of attachment 8756050 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Sorry I spotted something else.
> > 
> > ::: js/src/jit/x86-shared/Assembler-x86-shared.h
> > @@ +2115,5 @@
> > > +    void vpextrb(unsigned lane, FloatRegister src, Register dest) {
> > > +        MOZ_ASSERT(HasSSE41());
> > > +        masm.vpextrb_irr(lane, src.encoding(), dest.encoding());
> > > +    }
> > > +    void vpextrb(unsigned lane, FloatRegister src, const Operand& dest) {
> > 
> > This variant is actually unused, can we remove it? (unless you need it in a
> > subsequent patch or have plans to use it somehow). In this case, can we
> > remove vpextrb_irm too?
> 
> I'm not so sure about this. It makes more sense to me to add all the
> variants of an opcode once I need one. Then the next guy won't have to worry
> about which variants of an opcode happen to be implemented already.
> 
> Since these are inline functions, I expect that the compiler won't even
> generate code for them if they're not used.
> 
> How do we usually do this? It looked to me like the macro assembler mostly
> implements the full set of variants.

Well, it is just a basic golden rule I'm trying to apply: don't add dead code, because it can remain dead for a long time. There was an instance of this I've found in one of the ARM assemblers: the rotation functions have been implemented since the very beginning, but there were no users until we implement rotations in wasm! And it has been like this for some time :-)

I think it is being more pragmatic to not add dead code, of course unless you use it in the patch series, or very soon afterwards. It takes the developer's memory, byte per byte. Plus, if it's not used, it's not tested, so it can have bugs, which might be hard to find for the potential future user.

(Unrelated to this argument: I don't think we can require regalloc that the output of an instruction be in memory. So this function could be used otherwise in codegen, if we manually put something onto the stack or whatnot, but again, if this doesn't happen, please let's just not add dead code)

Or as the Delphi Greek temple front says, "nothing in excess" :).
Comment on attachment 8756053 [details] [diff] [review]
Binary functions for small integer SIMD types.

Review of attachment 8756053 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks!

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +3370,5 @@
> +    FloatRegister lhs = ToFloatRegister(ins->lhs());
> +    Operand rhs = ToOperand(ins->rhs());
> +    FloatRegister output = ToFloatRegister(ins->output());
> +
> +    ScratchSimd128Scope scratch(masm);

That might be pre-existing in other code below, but it seems this is unused.

@@ +3384,5 @@
> +      case MSimdBinaryArith::Op_mul:
> +          // 8x16 mul is a valid operation, but not supported in SSE or AVX.
> +          // The operation is synthesized from 16x8 multiplies by
> +          // MSimdBinaryArith::AddLegalized().
> +          break;

nit: inconsistent indentation here.

@@ +3402,5 @@
> +    FloatRegister lhs = ToFloatRegister(ins->lhs());
> +    Operand rhs = ToOperand(ins->rhs());
> +    FloatRegister output = ToFloatRegister(ins->output());
> +
> +    ScratchSimd128Scope scratch(masm);

ditto

::: js/src/jit/x86-shared/Lowering-x86-shared.cpp
@@ +651,5 @@
> +      case MIRType::Int8x16: {
> +          LSimdBinaryArithIx16* lir = new (alloc()) LSimdBinaryArithIx16();
> +          bool needsTemp =
> +              ins->operation() == MSimdBinaryArith::Op_mul && !MacroAssembler::HasSSE41();
> +          lir->setTemp(0, needsTemp ? temp(LDefinition::SIMD128INT) : LDefinition::BogusTemp());

If I've understood correctly, ins->operation() can't be Op_mul here? So the temp can consistently be a BogusTemp()?
Attachment #8756053 - Flags: review?(bbouvier) → review+
Comment on attachment 8756054 [details] [diff] [review]
Implement 16x8 SIMD shift operators.

Review of attachment 8756054 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! I really like the fact that each type gets their own test file.

::: js/src/jit-test/tests/asm.js/testSIMD-16x8.js
@@ +252,5 @@
> +// Test shift operators.
> +function zip1map(a, s, f) {
> +    let r = [];
> +    for (let i = 0; i < a.length; i++)
> +        r.push(f(a[i], s));

(ubernit: return a.map(x => f(x, s)); )

::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
@@ +2771,5 @@
> +
> +    void vpsllw_ir(uint32_t count, XMMRegisterID src, XMMRegisterID dst)
> +    {
> +        MOZ_ASSERT(count < 16);
> +        shiftOpImmSimd("vpsllw", OP2_PSLLW_UdqIb, ShiftID::vpslld, count, src, dst);

There are different enum values for vpsrld/vpsrlq; I am fine with reusing them. If you feel like so, can you unify ShiftID::vpsrld with ShiftID::vpsrlq (and the same for vpslld with vpsllq, and remove the size suffixes in those enum values). Or it could be done in a follow up bug.
Attachment #8756054 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #83)

> Well, it is just a basic golden rule I'm trying to apply: don't add dead
> code, because it can remain dead for a long time. There was an instance of
> this I've found in one of the ARM assemblers: the rotation functions have
> been implemented since the very beginning, but there were no users until we
> implement rotations in wasm! And it has been like this for some time :-)
> 
> I think it is being more pragmatic to not add dead code, of course unless
> you use it in the patch series, or very soon afterwards. It takes the
> developer's memory, byte per byte. Plus, if it's not used, it's not tested,
> so it can have bugs, which might be hard to find for the potential future
> user.
> 
> Or as the Delphi Greek temple front says, "nothing in excess" :).

I actually agree with all of this. It must be because the code is so boilerplatey that I don't like pruning it. It really should be generated by a script from a few opcode tables.

I'll prune the dead code.
Since most targets don't implement SIMD and don't enable the generation of SIMD
instructions, it makes sense to provide a default implementation of the SIMD
visitor functions that simply crashes with an NYI error.

Remove the many identical copies of these visitors from the non-SIMD targets.
Attachment #8757023 - Flags: review?(bbouvier)
This instruction is used when the shuffle indexes are not compile time
constants.

Give the register allocator permission to spill the index arguments to
GeneralShuffle instructions. There can be up to 16 of them, and they can't all
be registers.

Move visitSimdGeneralShuffle into the x86-specific lowering code since it has
special register allocation requirements for 8-bit lanes.
Attachment #8757024 - Flags: review?(bbouvier)
Ben, I've rearranged the commits as we discussed. You can see the final order in the try push above or here: https://hg.mozilla.org/try/pushloghtml?changeset=32332939081fa174955be2ea12e464d0a8850a56

- All the tests are squashed into one commit that follows enabling the new types.
- I put the "NYI" commit up front so the following commits don't have to touch the arm and mips targets at all.
(In reply to Jakob Stoklund Olesen [:jolesen] from comment #90)
> Ben, I've rearranged the commits as we discussed. You can see the final
> order in the try push above or here:
> https://hg.mozilla.org/try/
> pushloghtml?changeset=32332939081fa174955be2ea12e464d0a8850a56
> 
> - All the tests are squashed into one commit that follows enabling the new
> types.
> - I put the "NYI" commit up front so the following commits don't have to
> touch the arm and mips targets at all.

Fantastic, thank you!
Comment on attachment 8756055 [details] [diff] [review]
Implement 8x16 SIMD shift operators.

Review of attachment 8756055 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you!

::: js/src/jit/MIR.cpp
@@ +1389,5 @@
> +        // wide = yyxx yyxx yyxx yyxx yyxx yyxx yyxx yyxx
> +
> +        MInstruction* shiftMask = MConstant::New(alloc, Int32Value(7));
> +        addTo->add(shiftMask);
> +        MInstruction* shiftBy = MBitAnd::New(alloc, right, shiftMask);

Roger.

@@ +1394,5 @@
> +        addTo->add(shiftBy);
> +
> +        MInstruction* mask =
> +          MSimdConstant::New(alloc, SimdConstant::SplatX8(int16_t(0xff00)), MIRType::Int16x8);
> +        addTo->add(mask);

It seems this could be closer to its use, moved down to...

@@ +1408,5 @@
> +        // even = xx00 xx00 xx00 xx00 xx00 xx00 xx00 xx00
> +        // odd  = yyxx yyxx yyxx yyxx yyxx yyxx yyxx yyxx
> +
> +        // Left-shift: Clear the low bits in `odd` before shifting.
> +        if (op == lsh) {

... right above here.

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +2639,2 @@
>          masm.vmovdqa(input, output);
> +    } else {

nit: no {} for one line if/else bodies.
Attachment #8756055 - Flags: review?(bbouvier) → review+
Comment on attachment 8756056 [details] [diff] [review]
Implement 8x16 SIMD multiplies.

Review of attachment 8756056 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks!

::: js/src/jit/MIR.cpp
@@ +1399,5 @@
> +        // oddR  = 00bb 00bb 00bb 00bb 00bb 00bb 00bb 00bb
> +
> +        // Now do two 16x8 multiplications. We can use the low bits of each.
> +        MInstruction* even = MSimdBinaryArith::AddLegalized(alloc, addTo, evenL, evenR, op);
> +        MInstruction* odd = MSimdBinaryArith::AddLegalized(alloc, addTo, oddL, oddR, op);

We know op == Op_Mul, should we just replace these two instances here with Op_Mul directly?
Attachment #8756056 - Flags: review?(bbouvier) → review+
Comment on attachment 8756057 [details] [diff] [review]
Implement SIMD saturating arithmetic.

Review of attachment 8756057 [details] [diff] [review]:
-----------------------------------------------------------------

Straightforward, thank you.

::: js/src/jit-test/tests/asm.js/testSIMD-16x8.js
@@ +203,5 @@
>      return r
>  }
>  
>  function binaryI(opname, lanefunc) {
>      simdfunc = asmLink(asmCompile('glob', USE_ASM + I16x8 + I16x8CHK +

(see comment in the other test file)

::: js/src/jit-test/tests/asm.js/testSIMD-8x16.js
@@ +217,4 @@
>  }
>  
>  function binaryU(opname, lanefunc) {
>      simdfunc = asmLink(asmCompile('glob', USE_ASM + U8x16 + I8x16 + I8x16CHK + U8x16I8x16 + I8x16U8x16 +

pre-existing, can you add a `let` before simdfunc as well, here and above?

::: js/src/jit/MIR.h
@@ +2413,5 @@
> +  public:
> +    enum Operation
> +    {
> +        addSaturate,
> +        subSaturate,

As the name of the MIR instruction already contains "saturating", what do you think about calling them simply add/sub?

@@ +2441,5 @@
> +        MIRType type = left->type();
> +        MOZ_ASSERT(type == MIRType::Int8x16 || type == MIRType::Int16x8);
> +        setResultType(type);
> +        setMovable();
> +        setCommutative();

subSaturate might not be commutative.

::: js/src/jit/x86-shared/Lowering-x86-shared.cpp
@@ +696,5 @@
> +LIRGeneratorX86Shared::visitSimdBinarySaturating(MSimdBinarySaturating* ins)
> +{
> +    MOZ_ASSERT(IsSimdType(ins->lhs()->type()));
> +    MOZ_ASSERT(IsSimdType(ins->rhs()->type()));
> +    MOZ_ASSERT(IsSimdType(ins->type()));

Can probably ReorderCommutative if the instruction is commutative (which would exhibit the sub marked as commutative bug).
Attachment #8756057 - Flags: review?(bbouvier) → review+
Comment on attachment 8756058 [details] [diff] [review]
Implement Bool8x16.splat and Bool16x8.splat.

Review of attachment 8756058 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thank you. x86 is such a mess.

::: js/src/jit/x86-shared/Assembler-x86-shared.h
@@ +3043,5 @@
> +    void vpshuflw(uint32_t mask, FloatRegister src, FloatRegister dest) {
> +        MOZ_ASSERT(HasSSE2());
> +        masm.vpshuflw_irr(mask, src.encoding(), dest.encoding());
> +    }
> +    void vpshuflw(uint32_t mask, const Operand& src1, FloatRegister dest) {

Only the vpshuflw mask/register/register variant is used. Can you delete this one please? (and take away the BaseAssembler implementations too?)

@@ +3064,5 @@
> +    void vpshufhw(uint32_t mask, FloatRegister src, FloatRegister dest) {
> +        MOZ_ASSERT(HasSSE2());
> +        masm.vpshufhw_irr(mask, src.encoding(), dest.encoding());
> +    }
> +    void vpshufhw(uint32_t mask, const Operand& src1, FloatRegister dest) {

These two are unused too at the moment.

@@ +3085,5 @@
> +    void vpshufb(FloatRegister mask, FloatRegister src, FloatRegister dest) {
> +        MOZ_ASSERT(HasSSSE3());
> +        masm.vpshufb_rr(mask.encoding(), src.encoding(), dest.encoding());
> +    }
> +    void vpshufb(const Operand &mask, FloatRegister src, FloatRegister dest) {

This one is unused at the moment.

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +2599,5 @@
>  }
>  
>  void
> +CodeGeneratorX86Shared::visitSimdSplatX16(LSimdSplatX16* ins)
> +{

Here and in the x8 and x4 methods, can we MOZ_ASSERT that we have the SimdTypeToLength of the input's type is the expected one (here 16).

@@ +2624,5 @@
> +    Register input = ToRegister(ins->getOperand(0));
> +    FloatRegister output = ToFloatRegister(ins->output());
> +    masm.vmovd(input, output);
> +    masm.vpshuflw(0, output, output);
> +    masm.vpshufd(0, output, output);

so we have vpshufw but it only works on 64 bits values, not 128 bits values.

yay x86 ¯\_(ツ)_/¯
Attachment #8756058 - Flags: review?(bbouvier) → review+
Attachment #8756059 - Flags: review?(bbouvier) → review+
Comment on attachment 8757023 [details] [diff] [review]
Provide shared NYI implementations of SIMD visitors.

Review of attachment 8757023 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, thank you.
Attachment #8757023 - Flags: review?(bbouvier) → review+
Comment on attachment 8757024 [details] [diff] [review]
Add general shuffle support for 8x16 and 16x8 shuffles.

Review of attachment 8757024 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for the patch. Great work on all of this.

::: js/src/jit/x86-shared/Lowering-x86-shared.cpp
@@ +947,5 @@
> +#if defined(JS_CODEGEN_X86)
> +        // The temp register must be usable with 8-bit load and store
> +        // instructions, so one of %eax-%edx.
> +        if (ins->type() == MIRType::Int8x16)
> +            t = tempFixed(ebx);

I think clobbering `t` will increase register pressure: there was a virtual register for the previous value of `t` which gets unused. Instead, we could just do

#if defined(JS_CODEGEN_X86)
  t = ins->type() == ... ? tempFixed(ebx) : temp();
#else
  t = temp();
#endif

if you don't mind.

It is a bit sad to have a platform-dependent #ifdef in the shared lowering, but it sounds more compelling than having a platform-specific implementation, since there are so few differences between x86/x64. We could have a platform-specific getTempForGeneralShuffle, but sounds like over-engineering, so let's keep it this way.

@@ +951,5 @@
> +            t = tempFixed(ebx);
> +#endif
> +        lir = new (alloc()) LSimdGeneralShuffleI(t);
> +    } else if (ins->type() == MIRType::Float32x4)
> +        lir = new (alloc()) LSimdGeneralShuffleF(temp());

nit: {} here and below as the `if` has {}

@@ +953,5 @@
> +        lir = new (alloc()) LSimdGeneralShuffleI(t);
> +    } else if (ins->type() == MIRType::Float32x4)
> +        lir = new (alloc()) LSimdGeneralShuffleF(temp());
> +    else
> +        MOZ_CRASH("Unknown SIMD kind when doing a shuffle");

ditto
Attachment #8757024 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #95)
> @@ +2624,5 @@
> > +    Register input = ToRegister(ins->getOperand(0));
> > +    FloatRegister output = ToFloatRegister(ins->output());
> > +    masm.vmovd(input, output);
> > +    masm.vpshuflw(0, output, output);
> > +    masm.vpshufd(0, output, output);
> 
> so we have vpshufw but it only works on 64 bits values, not 128 bits values.
> 
> yay x86 ¯\_(ツ)_/¯

It's probably because of encoding constraints. A 4-way swizzle can be encoded in 4 x 2 bits = 1 byte which seems to be the size of immediate field the SSE instructions support. An 8-way swizzle would require 8 x 3 bits = 3 bytes immediate field which would probably require a new instruction format.

Instead, we get pshufb which can do it all by not using immediate fields.
Keywords: leave-open
https://hg.mozilla.org/integration/mozilla-inbound/rev/1df0548311c90e35e86104dc387bd3ab81dd797f
Bug 1136226 - Provide shared NYI implementations of SIMD visitors. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/f0779cf0f83dc854e97c87466f7109d9b264b48c
Bug 1136226 - Implement MSimdExtractElement for small integer types. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/b45c0a42f19cc86dbeb58f0f0282b788fe132d46
Bug 1136226 - Implement MSimdInsertElement for small integer types. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/62c28a8f7ebf20927d97f809313d3e0c567a064a
Bug 1136226 - Unary functions for small integer SIMD types. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/69ce6e7501086e2492274e352752eee5eeea447d
Bug 1136226 - Binary functions for small integer SIMD types. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/25dc50270a77116f3e679f451e143b0031382cdd
Bug 1136226 - Implement 16x8 SIMD shift operators. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/6ea5fb073f4be87096890646bf58061c90b22fcd
Bug 1136226 - Implement 8x16 SIMD shift operators. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/43882729d51c668fc0e57a7941cfbdc25664422d
Bug 1136226 - Implement 8x16 SIMD multiplies. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/4117a5326ded2a0f6543da9070e7c68c0ba0a172
Bug 1136226 - Implement SIMD saturating arithmetic. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/a17bc6fab38f7beaffa3608ca8ec4a7d660a2bd4
Bug 1136226 - Implement Bool8x16.splat and Bool16x8.splat. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/fa42a25f4124566158f812c0b796360dd239f814
Bug 1136226 - Handle SIMD global variables for x86 / x64. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/5472bbbb12079f4ca2da7fb8048fd0787ef6f200
Bug 1136226 - Implement select for 8x16 and 16x8 SIMD types. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/018f7422c57ec03d65f58802e4cbb6ee2fc25418
Bug 1136226 - Implement swizzle for 8x16 and 16x8 SIMD types. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/7be2feba720f43f6c5df652f9908f7a8c8a39be1
Bug 1136226 - Implement shuffle for 8x16 and 16x8 SIMD types. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/e5479106a7ab7033580c4114d96cd5d0d3c062d2
Bug 1136226 - Add general shuffle support for 8x16 and 16x8 shuffles. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/8e4f48388c600edd5ee5292b014cedb0b8f7672f
Bug 1136226 - Implement compares for 8x16 and 16x8 SIMD types. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/a019522f497b3d852ece421f4e09548e654d1e7e
Bug 1136226 - Inline saturating arithmetic operations. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/500e86461a3cce584c5b8069a2721a95366f8f72
Bug 1136226 - Test loads, stores, and bitcasts. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/b069413fe6b93ada4c38dad2bd6b8f22c2abf8ce
Bug 1136226 - Asm.js: Enable small integer types. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/5721a25757df53c3650889587768f4c6787c7ed1
Bug 1136226 - Asm.js: Test cases for small integer SIMD types. r=bbouvier

https://hg.mozilla.org/integration/mozilla-inbound/rev/b986652cc2606730be2a7c0f43697d8b2a9a2641
Bug 1136226 - Make MIR New factories private where AddLegalized should be used. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/4314ed7441b656c689d98d54996dcd524f80b72a
Bug 1136226 - Inline small int SIMD.js constructor calls. r=sunfish

https://hg.mozilla.org/integration/mozilla-inbound/rev/c19c99878a6076e928690f45b37403b110cd5482
Bug 1136226 - Enable inlining of 8x16 and 16x8 types. r=sunfish
Depends on: 1277796
Depends on: 1278251
Depends on: 1366096
You need to log in before you can comment on or make changes to this bug.