Closed Bug 904913 Opened 8 years ago Closed 7 years ago

Add experimental support for Float32x4, SIMD object and SIMD functions

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: luke, Unassigned)

References

(Depends on 2 open bugs)

Details

Attachments

(2 files, 2 obsolete files)

No description provided.
Oops, hit <Enter> before I finished filling out the bug:

We have a proposal (discussed in bug 894105 comment 5) for adding SIMD that involves a few new objects and functions: Float32x4 (the datum operated on by SIMD operations), a SIMD object, and a bunch of SIMD functions that are properties of the SIMD object.  (Hopefully, SIMD will eventually a module, not an object.)

This bug is to add just the basic VM support behind a pref that is default off on all channels.  The main work is in Ion in bug 894105.
Assignee: general → nsm.nikhil
Comment on attachment 790541 [details] [diff] [review]
Example.

Review of attachment 790541 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Float32x4.cpp
@@ +40,5 @@
> +    NULL
> +};
> +
> +const JSPropertySpec Float32x4::properties[] = {
> +    JS_PSG("w", getW, 0),

Stupid question, why "w" and not "t"?
Comment on attachment 790541 [details] [diff] [review]
Example.

Review of attachment 790541 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Nikhil!  Everything looks pretty nice; I'd like to take another pass with these comments addressed:

::: js/src/builtin/Float32x4.cpp
@@ +21,5 @@
> +
> +using namespace js;
> +
> +Class Float32x4::class_ = {
> +    "Float32x4",

I just learned that the new syntactic convention is that immutable value objects start with lowercase, so this should be 'float32x4'.

@@ +75,5 @@
> +
> +bool
> +Float32x4::construct(JSContext *cx, unsigned int argc, jsval *vp)
> +{
> +    if (!JS_IsConstructing(cx, vp)) {

I think the current consensus wants the opposite: only the function call form is allowed and passing 'new' throws.  (The idea is that 'new' means a new identity with mutable reference semantics, and with float32x4 (and int64 etc) we are creating a value with no unique identity.)

@@ +88,5 @@
> +        !args[0].isNumber() ||
> +        !args[1].isNumber() ||
> +        !args[2].isNumber() ||
> +        !args[3].isNumber()) {
> +        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_FLOAT32X4_BAD_ARGS);

Instead of requiring that they are already numbers, the usual JS thing to do is to apply ToNumber(cx, args[i]) to each argument.

@@ +97,5 @@
> +    if (!obj)
> +        return false;
> +
> +    // The float cast is since the values have to be stored as
> +    // single-precision.

It might be worth explaining a bit more:
"The semantics of Float32x4 is to store 4 float32 values.  However, when any value is read, it is immediately coerced back to a JS number (i.e., float64).  Thus, float32x4(a,b,c,d).x === double(float(ToNumber(a))).  Since the contents of float32x4 are not aliased, we don't actually need to store a float32, we can store the coerced float64, which is good because JS::Value only provides float64."

@@ +111,5 @@
> +Float32x4::getLane(JSContext *cx, unsigned int argc, jsval *vp, unsigned int slot)
> +{
> +    JS_ASSERT(slot < SLOT_COUNT);
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +    JS_ASSERT(Float32x4::is(args.thisv()));

I don't think we can assert that.  Evil JS code can pull off the accessor function with Object.getPropertyDescriptor(obj).get and call it passing any random value as 'this'.  An example of this is if you look at the native getter/setters in vm/Debugger.cpp (grep JS_PSG).

::: js/src/builtin/Float32x4.h
@@ +21,5 @@
> +    static bool is(const Value &v);
> +
> +    static bool construct(JSContext *cx, unsigned int argc, jsval *vp);
> +
> +    static bool getLane(JSContext *cx, unsigned int argc, jsval *vp, unsigned int slot);

You can take out the \n between all these.

::: js/src/jsprototypes.h
@@ +68,5 @@
>      macro(float64,               49,     js_InitBinaryDataClasses) \
>      macro(ArrayType,             50,     js_InitBinaryDataClasses) \
>      macro(StructType,            51,     js_InitBinaryDataClasses) \
>      macro(ArrayTypeObject,       52,     js_InitBinaryDataClasses) \
> +    macro(Float32x4,             53,     js_InitFloat32x4) \

Similarly, float32x4 here too.  Alternatively, do you think float32x4 could be in BinaryData.cpp and be initialized by js_InitBinaryDataClass?
Attachment #790541 - Flags: review?(luke)
Comment on attachment 790942 [details] [diff] [review]
Example.

Review of attachment 790942 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Float32x4.cpp
@@ +101,5 @@
> +                             JSMSG_NOT_CONSTRUCTOR, "Float32x4");
> +        return false;
> +    }
> +
> +    CallArgs args = CallArgsFromVp(argc, vp);

If you're going to check constructing-ness at all, use args.isConstructing().

@@ +103,5 @@
> +    }
> +
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +
> +    if (argc < 4) {

args.length()

@@ +105,5 @@
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +
> +    if (argc < 4) {
> +        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL,
> +                             JSMSG_MORE_ARGS_NEEDED, "float32x4", argc, "s");

args.length()
My one meta-comment is that I think float32x4 should be defined as a binary data constructor. To really take advantage of this in the JIT -- as is the ultimate goal -- it will however be necessary to build on the patches in bug 898347, which are still under review though I expect it will land soon. I would be happy to do that implementation work and expect it can be done relatively quickly, though I don't have a time estimate yet. I will put some effort into it today and try to produce one. There may still be value in landing the current set of patches, and then porting the relevant work over to the new context, I am not sure.
The order or the .x, .y, .z, .w is wrong:

js> var x = float32x4(1,2,3,4)
js> x
({})
js> x.w
1
js> x.x
4
js> x.y
3
js> x.z
2

I would’ve expected .x == 1, .y == 2, .z == 3, .w == 4.

This would be in accordance with the spec:

https://github.com/johnmccutchan/ecmascript_simd
When calling the float32x4() function without arguments, or with less than 4 arguments, the js shell segfaults:

js> var x = float32x4()

will crash the shell.
Comment on attachment 790942 [details] [diff] [review]
Example.

This error message still uses the uppercasing of Float32x4.  It should use "float32x4"

+Float32x4::construct(JSContext *cx, unsigned int argc, jsval *vp)
+{
+    if (JS_IsConstructing(cx, vp)) {
+        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL,
+                             JSMSG_NOT_CONSTRUCTOR, "Float32x4");
+        return false;
+    }
+

Otherwise, you'll get an error message that looks like this:

js> var x = new float32x4(1,2,3,4)
typein:1:4 TypeError: Float32x4 is not a constructor
Assignee: nsm.nikhil → general
Has there been further progress on this issue beyond Nikhil's patch? The work on typed objects is reaching a level of maturity that I could soon investigate adding float32x4 as a kind of type constructor.
This patch has the 3 new builtin objects: float32x4, uint32x4, and SIMD.

The SIMD object has 4 functions for now:
float32x4 SIMD.addf(float32x4 op1, float32x4 op2)
float32x4 SIMD.mulf(float32x4 op1, float32x4 op2)
uint32x4  SIMD.addu(uint32x4  op1, uint32x4  op2)
uint32x4  SIMD.mulu(uint32x4  op1, uint32x4  op2)

With this basic VM support, we are starting the implementation work to get IonMonkey to generate SIMD instructions.
Hi,

These method names are incorrect and do not match the polyfill.

John
Revised based on John's comment:

This patch has the 3 new builtin objects: float32x4, uint32x4, and SIMD.

The SIMD object has 4 functions for now:
float32x4 SIMD.add(float32x4 op1, float32x4 op2)
float32x4 SIMD.mul(float32x4 op1, float32x4 op2)
uint32x4  SIMD.addu32(uint32x4  op1, uint32x4  op2)
uint32x4  SIMD.mulu32(uint32x4  op1, uint32x4  op2)

With this basic VM support, we are starting the implementation work to get IonMonkey to generate SIMD instructions.
Attachment #804727 - Attachment is obsolete: true
Comment on attachment 790541 [details] [diff] [review]
Example.

Review of attachment 790541 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Float32x4.cpp
@@ +97,5 @@
> +    if (!obj)
> +        return false;
> +
> +    // The float cast is since the values have to be stored as
> +    // single-precision.

It is better that we store 4 floats instead of 4 doubles. If we store 4 doubles, the lane access is easier but it is less efficient to unbox/box them into a XMM register. For 4 float32s, a single movups/movaps instruction could do the job.
Hi,

Is there a more up to date patch? The last patch is from September 13th, it has many issues, but before I get into it maybe there is a more to date patch?

Thanks,
John
Hi,

nmatsakis mentioned to use typed object to implement float32x4 and int32x4. After reading http://wiki.ecmascript.org/doku.php?id=harmony:typed_objects, it seems that we only need to initialize:
  const float32x4 = new StructType({ x: float32, y: float32, z:float32, w:float32});
  const int32x4 = new StructType({ x: int32, y: int32, z:int32, w:int32});
when the engine starts a new context.

Thanks
-Haitao
(In reply to haitao from comment #19)
> nmatsakis mentioned to use typed object to implement float32x4 and int32x4.
> After reading http://wiki.ecmascript.org/doku.php?id=harmony:typed_objects,
> it seems that we only need to initialize:
>   const float32x4 = new StructType({ x: float32, y: float32, z:float32,
> w:float32});
>   const int32x4 = new StructType({ x: int32, y: int32, z:int32, w:int32});
> when the engine starts a new context.

This is not quite right. For one thing, the alignment requirements of such type would be the same as float32, but we want a stricter alignment. For another, float32x4 is supposed to be a value type, which struct types are not. The difference can be seen when you embed a float32x4 in another type, e.g.:

    var MyArray = new ArrayType(float32x4, 64);
    var myArray = new MyArray();
    var tmp = myArray[12];
    myArray[12] = float32x4(1, 2, 3, 4);

At the end of this fragment, I believe that `tmp` should have the value `float32x4(0,0,0,0)`. But if we implemented `float32x4` as a `StructType`, then `tmp` would alias `myArray[12]` and hence have the value `float32x4(1,2,3,4)`.
Just a quick note:  The Dart VM and the V8 implementations use unaligned memory access. We do not expect or require 16-byte alignment of the payload data in a float32x4.
(In reply to John McCutchan from comment #21)
Doesn't that produce problems when performing xmm loads?  I was under the impression that those faulted if the address wasn't 16-byte aligned.
(In reply to Luke Wagner [:luke] from comment #22)
> (In reply to John McCutchan from comment #21)
> Doesn't that produce problems when performing xmm loads?  I was under the
> impression that those faulted if the address wasn't 16-byte aligned.

Only when using MOVAPS instructions which require 16-bytes alignment; when using MOVUPS this is not a problem.
movaps will fault when the address isn't 16-byte aligned but movups allows any alignment. The performance of movups is quite high (especially on newer CPUs).
Ah, thanks for explaining.  ARM also has no 16-byte alignment requirements?
Dart VM has all SIMD types fully accelerated on NEON without any alignment restrictions.
Dart VM has all SIMD types fully accelerated on NEON without any alignment restrictions.
(In reply to Luke Wagner [:luke] from comment #25)
> Ah, thanks for explaining.  ARM also has no 16-byte alignment requirements?

Alignment requirements should be optional for NEON instructions, see here for more information:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344e/Cihejdic.html
(In reply to John McCutchan from comment #27)
> Dart VM has all SIMD types fully accelerated on NEON without any alignment
> restrictions.

OK. So are intending to fix the alignment requirements of float32x4 as the same as float32 then?
The specification will not require any alignment of the 128-bits of payload data in a float32x4 or int32x4 value. That is an implementation detail of each VM.
It would be observable via Typed Objects (StructType({x:uint8, y:float32x4})).
If Typed Objects requires a specified alignment, float32 alignment should be fine.
That being said, the SIMD specification is independent of Typed Object specification. It's already been fully accelerated in V8 without the need for typed object based storage. I am not sure why ?Monkey can't support a new native type with 128-bits of storage.
nmatsakis, thanks for the explanation on the difference of value types and struct types.

> This is not quite right. For one thing, the alignment requirements of such
> type would be the same as float32, but we want a stricter alignment. For
> another, float32x4 is supposed to be a value type, which struct types are
> not. The difference can be seen when you embed a float32x4 in another type,
> e.g.:
> 
>     var MyArray = new ArrayType(float32x4, 64);
>     var myArray = new MyArray();
>     var tmp = myArray[12];
>     myArray[12] = float32x4(1, 2, 3, 4);
> 
> At the end of this fragment, I believe that `tmp` should have the value
> `float32x4(0,0,0,0)`. But if we implemented `float32x4` as a `StructType`,
> then `tmp` would alias `myArray[12]` and hence have the value
> `float32x4(1,2,3,4)`.
(In reply to John McCutchan from comment #32)
> If Typed Objects requires a specified alignment, float32 alignment should be
> fine.

One question is what is most forwards compatible. Typed arrays have traditionally
taken a pretty strict view on maintaining data alignment, I believe for maximum
compatibility across hardware. I think we should probably do the same, particularly
since it is mostly inobservable (as you point out) except when `float32x4` values
are mixed with other values.
(In reply to John McCutchan from comment #33)
> That being said, the SIMD specification is independent of Typed Object
> specification. It's already been fully accelerated in V8 without the need
> for typed object based storage. I am not sure why ?Monkey can't support a
> new native type with 128-bits of storage.

The two specs are not fully independent -- going forward, as I understand it,
float32x4 is intended to be a value type constructor, and values types fit into
the framework of typed objects. That said, I agree that the SIMD spec "stands
alone" to some extent and can be implemented separately. But given that we
have relatively mature support for typed objects, there is no reason to do so.
Anyway adding `float32x4` as a new kind of typed object is not a lot of work.
I will try to tackle it today, and certainly I believe it can be done this week.
Depends on: 938728
Yes, good point, float32x4 and int32x4 entries within a typed array should be aligned to 16-byte boundaries.
Depends on: 942311
Depends on: 943766
Depends on: 943767
Depends on: 943769
Based on our last meeting today and from Niko's writeup, I've created 3 bugzilla entries for the first 3 steps from the following list of action items:

1. Add MIRType_float32x4 and MIRType_int32x4
2. Add the following MIR instructions:
   • MPackVector(v1, v2, v3, v4) -> {MIRType_float32x4, MIRType_int32x4}
   • MLoadVector(elements, offset) -> {MIRType_float32x4, MIRType_int32x4}
   • MStoreVector(elements, offset, value) where value : {MIRType_float32x4, MIRType_int32x4}
   • MAddVector(value1, value2) where value1, value2 : {MIRType_float32x4, MIRType_int32x4}
   • MVectorTypedObject(value) -> MIRType_object (similar to MDerivedTypeObject)
3. Calls to SIMD.float32x4.add are inlined specially if:
   • Both operands are typed objects of type T where T = either float32x4 or int32x4
   • In that case, insert MLoadVector for each argument (using existing code that skips indirection as a model – detecting MDerivedTypeObject and MVectorTypedObject and shortcircuit)
   • Insert MAddVector with unboxed operands
   • Insert MVectorTypedObject to box result
4. Loads of typed object values of type float32x4 are modified to do a MLoadVector and MVectorTypedObject
5. Stores into typed object lvalues of type float32x4 where value we are storing from is a MVectorTypedObject would be optimized to avoid intermediate
6. Add appropriate LIR instructions and modify register allocator etc.
Depends on: 946042
Depends on: 947711
Depends on: 956245
Depends on: 956250
I didn't know this until now, so maybe this is new information for some of you too.  Per Agner Fog's instruction tables ( http://www.agner.org/optimize/instruction_tables.pdf ), on Sandy Bridge, Ivy Bridge, and Haswell, MOVAPS and MOVUPS have identical performance.  Nehalem is the last CPU where, MOVUPS had lower reciprocal throughput.

Just an FYI.
Besides MOVAPS and MOVUPS, there is a third kind, which is to use a memory operand folded into an arithmetic instruction. Even with the newest CPUs, the regular SSE encoding (what Intel already appears to be calling the "Legacy" encoding) still requires full alignment.

With the new VEX encoding, memory operands do support unaligned accesses. VEX encodings are supported on Sandy Bridge and later.
Assignee: general → nobody
Should we close this bug? Some SIMD support has already landed in the interpreter. I can't see any reference to uint32x4 in the polyfill nor in our implementation, so I think we're fine here? (we have int32x4 instead)
Flags: needinfo?(luke)
Yeah, it seems like all the work has moved elsewhere.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(luke)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.