904913 - Add experimental support for Float32x4, SIMD object and SIMD functions

Luke Wagner [:luke]

Reporter

Description

•

11 years ago

No description provided.

Luke Wagner [:luke]

Reporter

Comment 1

•

11 years ago

Oops, hit <Enter> before I finished filling out the bug: We have a proposal (discussed in bug 894105 comment 5) for adding SIMD that involves a few new objects and functions: Float32x4 (the datum operated on by SIMD operations), a SIMD object, and a bunch of SIMD functions that are properties of the SIMD object. (Hopefully, SIMD will eventually a module, not an object.) This bug is to add just the basic VM support behind a pref that is default off on all channels. The main work is in Ion in bug 894105.

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Comment 2

•

11 years ago

Attached patch Example. (obsolete) — Details — Splinter Review

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Updated

•

11 years ago

Assignee: general → nsm.nikhil

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Comment 3

•

11 years ago

Comment on attachment 790541 [details] [diff] [review] Example. Luke, as we discussed.

Attachment #790541 - Flags: review?(luke)

Nicolas B. Pierron [:nbp]

Comment 4

•

11 years ago

Comment on attachment 790541 [details] [diff] [review] Example. Review of attachment 790541 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/builtin/Float32x4.cpp @@ +40,5 @@ > + NULL > +}; > + > +const JSPropertySpec Float32x4::properties[] = { > + JS_PSG("w", getW, 0), Stupid question, why "w" and not "t"?

Ivan Jibaja

Comment 5

•

11 years ago

We are trying to follow https://github.com/johnmccutchan/ecmascript_simd

Luke Wagner [:luke]

Reporter

Comment 6

•

11 years ago

Comment on attachment 790541 [details] [diff] [review] Example. Review of attachment 790541 [details] [diff] [review]: ----------------------------------------------------------------- Thanks Nikhil! Everything looks pretty nice; I'd like to take another pass with these comments addressed: ::: js/src/builtin/Float32x4.cpp @@ +21,5 @@ > + > +using namespace js; > + > +Class Float32x4::class_ = { > + "Float32x4", I just learned that the new syntactic convention is that immutable value objects start with lowercase, so this should be 'float32x4'. @@ +75,5 @@ > + > +bool > +Float32x4::construct(JSContext *cx, unsigned int argc, jsval *vp) > +{ > + if (!JS_IsConstructing(cx, vp)) { I think the current consensus wants the opposite: only the function call form is allowed and passing 'new' throws. (The idea is that 'new' means a new identity with mutable reference semantics, and with float32x4 (and int64 etc) we are creating a value with no unique identity.) @@ +88,5 @@ > + !args[0].isNumber() || > + !args[1].isNumber() || > + !args[2].isNumber() || > + !args[3].isNumber()) { > + JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_FLOAT32X4_BAD_ARGS); Instead of requiring that they are already numbers, the usual JS thing to do is to apply ToNumber(cx, args[i]) to each argument. @@ +97,5 @@ > + if (!obj) > + return false; > + > + // The float cast is since the values have to be stored as > + // single-precision. It might be worth explaining a bit more: "The semantics of Float32x4 is to store 4 float32 values. However, when any value is read, it is immediately coerced back to a JS number (i.e., float64). Thus, float32x4(a,b,c,d).x === double(float(ToNumber(a))). Since the contents of float32x4 are not aliased, we don't actually need to store a float32, we can store the coerced float64, which is good because JS::Value only provides float64." @@ +111,5 @@ > +Float32x4::getLane(JSContext *cx, unsigned int argc, jsval *vp, unsigned int slot) > +{ > + JS_ASSERT(slot < SLOT_COUNT); > + CallArgs args = CallArgsFromVp(argc, vp); > + JS_ASSERT(Float32x4::is(args.thisv())); I don't think we can assert that. Evil JS code can pull off the accessor function with Object.getPropertyDescriptor(obj).get and call it passing any random value as 'this'. An example of this is if you look at the native getter/setters in vm/Debugger.cpp (grep JS_PSG). ::: js/src/builtin/Float32x4.h @@ +21,5 @@ > + static bool is(const Value &v); > + > + static bool construct(JSContext *cx, unsigned int argc, jsval *vp); > + > + static bool getLane(JSContext *cx, unsigned int argc, jsval *vp, unsigned int slot); You can take out the \n between all these. ::: js/src/jsprototypes.h @@ +68,5 @@ > macro(float64, 49, js_InitBinaryDataClasses) \ > macro(ArrayType, 50, js_InitBinaryDataClasses) \ > macro(StructType, 51, js_InitBinaryDataClasses) \ > macro(ArrayTypeObject, 52, js_InitBinaryDataClasses) \ > + macro(Float32x4, 53, js_InitFloat32x4) \ Similarly, float32x4 here too. Alternatively, do you think float32x4 could be in BinaryData.cpp and be initialized by js_InitBinaryDataClass?

Attachment #790541 - Flags: review?(luke)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Comment 7

•

11 years ago

Attached patch Example. — Details — Splinter Review

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Updated

•

11 years ago

Attachment #790541 - Attachment is obsolete: true

Jeff Walden [:Waldo]

Comment 8

•

11 years ago

Comment on attachment 790942 [details] [diff] [review] Example. Review of attachment 790942 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/builtin/Float32x4.cpp @@ +101,5 @@ > + JSMSG_NOT_CONSTRUCTOR, "Float32x4"); > + return false; > + } > + > + CallArgs args = CallArgsFromVp(argc, vp); If you're going to check constructing-ness at all, use args.isConstructing(). @@ +103,5 @@ > + } > + > + CallArgs args = CallArgsFromVp(argc, vp); > + > + if (argc < 4) { args.length() @@ +105,5 @@ > + CallArgs args = CallArgsFromVp(argc, vp); > + > + if (argc < 4) { > + JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, > + JSMSG_MORE_ARGS_NEEDED, "float32x4", argc, "s"); args.length()

Niko Matsakis [:nmatsakis]

Comment 9

•

11 years ago

My one meta-comment is that I think float32x4 should be defined as a binary data constructor. To really take advantage of this in the JIT -- as is the ultimate goal -- it will however be necessary to build on the patches in bug 898347, which are still under review though I expect it will land soon. I would be happy to do that implementation work and expect it can be done relatively quickly, though I don't have a time estimate yet. I will put some effort into it today and try to produce one. There may still be value in landing the current set of patches, and then porting the relevant work over to the new context, I am not sure.

Peter Jensen

Comment 10

•

11 years ago

The order or the .x, .y, .z, .w is wrong: js> var x = float32x4(1,2,3,4) js> x ({}) js> x.w 1 js> x.x 4 js> x.y 3 js> x.z 2 I would’ve expected .x == 1, .y == 2, .z == 3, .w == 4. This would be in accordance with the spec: https://github.com/johnmccutchan/ecmascript_simd

Peter Jensen

Comment 11

•

11 years ago

When calling the float32x4() function without arguments, or with less than 4 arguments, the js shell segfaults: js> var x = float32x4() will crash the shell.

Peter Jensen

Comment 12

•

11 years ago

Comment on attachment 790942 [details] [diff] [review] Example. This error message still uses the uppercasing of Float32x4. It should use "float32x4" +Float32x4::construct(JSContext *cx, unsigned int argc, jsval *vp) +{ + if (JS_IsConstructing(cx, vp)) { + JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, + JSMSG_NOT_CONSTRUCTOR, "Float32x4"); + return false; + } + Otherwise, you'll get an error message that looks like this: js> var x = new float32x4(1,2,3,4) typein:1:4 TypeError: Float32x4 is not a constructor

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Updated

•

11 years ago

Assignee: nsm.nikhil → general

Niko Matsakis [:nmatsakis]

Comment 13

•

11 years ago

Has there been further progress on this issue beyond Nikhil's patch? The work on typed objects is reaching a level of maturity that I could soon investigate adding float32x4 as a kind of type constructor.

Ivan Jibaja

Comment 14

•

11 years ago

Attached patch SIMD, Float32x4, Uint32x4 builtin objects (obsolete) — Details — Splinter Review

This patch has the 3 new builtin objects: float32x4, uint32x4, and SIMD. The SIMD object has 4 functions for now: float32x4 SIMD.addf(float32x4 op1, float32x4 op2) float32x4 SIMD.mulf(float32x4 op1, float32x4 op2) uint32x4 SIMD.addu(uint32x4 op1, uint32x4 op2) uint32x4 SIMD.mulu(uint32x4 op1, uint32x4 op2) With this basic VM support, we are starting the implementation work to get IonMonkey to generate SIMD instructions.

John McCutchan

Comment 15

•

11 years ago

Hi, These method names are incorrect and do not match the polyfill. John

Ivan Jibaja

Comment 16

•

11 years ago

Attached patch SIMD, Float32x4, Uint32x4 builtin objects — Details — Splinter Review

Revised based on John's comment: This patch has the 3 new builtin objects: float32x4, uint32x4, and SIMD. The SIMD object has 4 functions for now: float32x4 SIMD.add(float32x4 op1, float32x4 op2) float32x4 SIMD.mul(float32x4 op1, float32x4 op2) uint32x4 SIMD.addu32(uint32x4 op1, uint32x4 op2) uint32x4 SIMD.mulu32(uint32x4 op1, uint32x4 op2) With this basic VM support, we are starting the implementation work to get IonMonkey to generate SIMD instructions.

Attachment #804727 - Attachment is obsolete: true

haitao

Comment 17

•

11 years ago

Comment on attachment 790541 [details] [diff] [review] Example. Review of attachment 790541 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/builtin/Float32x4.cpp @@ +97,5 @@ > + if (!obj) > + return false; > + > + // The float cast is since the values have to be stored as > + // single-precision. It is better that we store 4 floats instead of 4 doubles. If we store 4 doubles, the lane access is easier but it is less efficient to unbox/box them into a XMM register. For 4 float32s, a single movups/movaps instruction could do the job.

John McCutchan

Comment 18

•

11 years ago

Hi, Is there a more up to date patch? The last patch is from September 13th, it has many issues, but before I get into it maybe there is a more to date patch? Thanks, John

haitao

Comment 19

•

11 years ago

Hi, nmatsakis mentioned to use typed object to implement float32x4 and int32x4. After reading http://wiki.ecmascript.org/doku.php?id=harmony:typed_objects, it seems that we only need to initialize: const float32x4 = new StructType({ x: float32, y: float32, z:float32, w:float32}); const int32x4 = new StructType({ x: int32, y: int32, z:int32, w:int32}); when the engine starts a new context. Thanks -Haitao

Niko Matsakis [:nmatsakis]

Comment 20

•

11 years ago

(In reply to haitao from comment #19) > nmatsakis mentioned to use typed object to implement float32x4 and int32x4. > After reading http://wiki.ecmascript.org/doku.php?id=harmony:typed_objects, > it seems that we only need to initialize: > const float32x4 = new StructType({ x: float32, y: float32, z:float32, > w:float32}); > const int32x4 = new StructType({ x: int32, y: int32, z:int32, w:int32}); > when the engine starts a new context. This is not quite right. For one thing, the alignment requirements of such type would be the same as float32, but we want a stricter alignment. For another, float32x4 is supposed to be a value type, which struct types are not. The difference can be seen when you embed a float32x4 in another type, e.g.: var MyArray = new ArrayType(float32x4, 64); var myArray = new MyArray(); var tmp = myArray[12]; myArray[12] = float32x4(1, 2, 3, 4); At the end of this fragment, I believe that `tmp` should have the value `float32x4(0,0,0,0)`. But if we implemented `float32x4` as a `StructType`, then `tmp` would alias `myArray[12]` and hence have the value `float32x4(1,2,3,4)`.

John McCutchan

Comment 21

•

11 years ago

Just a quick note: The Dart VM and the V8 implementations use unaligned memory access. We do not expect or require 16-byte alignment of the payload data in a float32x4.

Luke Wagner [:luke]

Reporter

Comment 22

•

11 years ago

(In reply to John McCutchan from comment #21) Doesn't that produce problems when performing xmm loads? I was under the impression that those faulted if the address wasn't 16-byte aligned.

Gabriele Svelto [:gsvelto]

Comment 23

•

11 years ago

(In reply to Luke Wagner [:luke] from comment #22) > (In reply to John McCutchan from comment #21) > Doesn't that produce problems when performing xmm loads? I was under the > impression that those faulted if the address wasn't 16-byte aligned. Only when using MOVAPS instructions which require 16-bytes alignment; when using MOVUPS this is not a problem.

John McCutchan

Comment 24

•

11 years ago

movaps will fault when the address isn't 16-byte aligned but movups allows any alignment. The performance of movups is quite high (especially on newer CPUs).

Luke Wagner [:luke]

Reporter

Comment 25

•

11 years ago

Ah, thanks for explaining. ARM also has no 16-byte alignment requirements?

John McCutchan

Comment 26

•

11 years ago

Dart VM has all SIMD types fully accelerated on NEON without any alignment restrictions.

John McCutchan

Comment 27

•

11 years ago

Dart VM has all SIMD types fully accelerated on NEON without any alignment restrictions.

Gabriele Svelto [:gsvelto]

Comment 28

•

11 years ago

(In reply to Luke Wagner [:luke] from comment #25) > Ah, thanks for explaining. ARM also has no 16-byte alignment requirements? Alignment requirements should be optional for NEON instructions, see here for more information: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344e/Cihejdic.html

Niko Matsakis [:nmatsakis]

Comment 29

•

11 years ago

(In reply to John McCutchan from comment #27) > Dart VM has all SIMD types fully accelerated on NEON without any alignment > restrictions. OK. So are intending to fix the alignment requirements of float32x4 as the same as float32 then?

John McCutchan

Comment 30

•

11 years ago

The specification will not require any alignment of the 128-bits of payload data in a float32x4 or int32x4 value. That is an implementation detail of each VM.

Luke Wagner [:luke]

Reporter

Comment 31

•

11 years ago

It would be observable via Typed Objects (StructType({x:uint8, y:float32x4})).

John McCutchan

Comment 32

•

11 years ago

If Typed Objects requires a specified alignment, float32 alignment should be fine.

John McCutchan

Comment 33

•

11 years ago

That being said, the SIMD specification is independent of Typed Object specification. It's already been fully accelerated in V8 without the need for typed object based storage. I am not sure why ?Monkey can't support a new native type with 128-bits of storage.

haitao

Comment 34

•

11 years ago

nmatsakis, thanks for the explanation on the difference of value types and struct types. > This is not quite right. For one thing, the alignment requirements of such > type would be the same as float32, but we want a stricter alignment. For > another, float32x4 is supposed to be a value type, which struct types are > not. The difference can be seen when you embed a float32x4 in another type, > e.g.: > > var MyArray = new ArrayType(float32x4, 64); > var myArray = new MyArray(); > var tmp = myArray[12]; > myArray[12] = float32x4(1, 2, 3, 4); > > At the end of this fragment, I believe that `tmp` should have the value > `float32x4(0,0,0,0)`. But if we implemented `float32x4` as a `StructType`, > then `tmp` would alias `myArray[12]` and hence have the value > `float32x4(1,2,3,4)`.

Niko Matsakis [:nmatsakis]

Comment 35

•

11 years ago

(In reply to John McCutchan from comment #32) > If Typed Objects requires a specified alignment, float32 alignment should be > fine. One question is what is most forwards compatible. Typed arrays have traditionally taken a pretty strict view on maintaining data alignment, I believe for maximum compatibility across hardware. I think we should probably do the same, particularly since it is mostly inobservable (as you point out) except when `float32x4` values are mixed with other values.

Niko Matsakis [:nmatsakis]

Comment 36

•

11 years ago

(In reply to John McCutchan from comment #33) > That being said, the SIMD specification is independent of Typed Object > specification. It's already been fully accelerated in V8 without the need > for typed object based storage. I am not sure why ?Monkey can't support a > new native type with 128-bits of storage. The two specs are not fully independent -- going forward, as I understand it, float32x4 is intended to be a value type constructor, and values types fit into the framework of typed objects. That said, I agree that the SIMD spec "stands alone" to some extent and can be implemented separately. But given that we have relatively mature support for typed objects, there is no reason to do so. Anyway adding `float32x4` as a new kind of typed object is not a lot of work. I will try to tackle it today, and certainly I believe it can be done this week.

Niko Matsakis [:nmatsakis]

Updated

•

11 years ago

Depends on: 938728

John McCutchan

Comment 37

•

11 years ago

Yes, good point, float32x4 and int32x4 entries within a typed array should be aligned to 16-byte boundaries.

Niko Matsakis [:nmatsakis]

Updated

•

11 years ago

Depends on: 942311

Ivan Jibaja

Updated

•

11 years ago

Depends on: 943766

Ivan Jibaja

Updated

•

11 years ago

Depends on: 943767

Ivan Jibaja

Updated

•

11 years ago

Depends on: 943769

Ivan Jibaja

Comment 38

•

11 years ago

Based on our last meeting today and from Niko's writeup, I've created 3 bugzilla entries for the first 3 steps from the following list of action items: 1. Add MIRType_float32x4 and MIRType_int32x4 2. Add the following MIR instructions: • MPackVector(v1, v2, v3, v4) -> {MIRType_float32x4, MIRType_int32x4} • MLoadVector(elements, offset) -> {MIRType_float32x4, MIRType_int32x4} • MStoreVector(elements, offset, value) where value : {MIRType_float32x4, MIRType_int32x4} • MAddVector(value1, value2) where value1, value2 : {MIRType_float32x4, MIRType_int32x4} • MVectorTypedObject(value) -> MIRType_object (similar to MDerivedTypeObject) 3. Calls to SIMD.float32x4.add are inlined specially if: • Both operands are typed objects of type T where T = either float32x4 or int32x4 • In that case, insert MLoadVector for each argument (using existing code that skips indirection as a model – detecting MDerivedTypeObject and MVectorTypedObject and shortcircuit) • Insert MAddVector with unboxed operands • Insert MVectorTypedObject to box result 4. Loads of typed object values of type float32x4 are modified to do a MLoadVector and MVectorTypedObject 5. Stores into typed object lvalues of type float32x4 where value we are storing from is a MVectorTypedObject would be optimized to avoid intermediate 6. Add appropriate LIR instructions and modify register allocator etc.

Ivan Jibaja

Updated

•

11 years ago

Depends on: 946042

Dan Gohman [:sunfish]

Updated

•

11 years ago

Depends on: 947711

haitao

Updated

•

11 years ago

Depends on: 956245

haitao

Updated

•

11 years ago

Depends on: 956250

Chad Austin

Comment 39

•

11 years ago

I didn't know this until now, so maybe this is new information for some of you too. Per Agner Fog's instruction tables ( http://www.agner.org/optimize/instruction_tables.pdf ), on Sandy Bridge, Ivy Bridge, and Haswell, MOVAPS and MOVUPS have identical performance. Nehalem is the last CPU where, MOVUPS had lower reciprocal throughput. Just an FYI.

Dan Gohman [:sunfish]

Comment 40

•

11 years ago

Besides MOVAPS and MOVUPS, there is a third kind, which is to use a memory operand folded into an arithmetic instruction. Even with the newest CPUs, the regular SSE encoding (what Intel already appears to be calling the "Legacy" encoding) still requires full alignment. With the new VEX encoding, memory operands do support unaligned accesses. VEX encodings are supported on Sandy Bridge and later.

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Assignee: general → nobody

Benjamin Bouvier [:bbouvier] (inactive)

Comment 41

•

10 years ago

Should we close this bug? Some SIMD support has already landed in the interpreter. I can't see any reference to uint32x4 in the polyfill nor in our implementation, so I think we're fine here? (we have int32x4 instead)

Flags: needinfo?(luke)

Luke Wagner [:luke]

Reporter

Comment 42

•

10 years ago

Yeah, it seems like all the work has moved elsewhere.

Status: NEW → RESOLVED

Closed: 10 years ago

Flags: needinfo?(luke)

Resolution: --- → FIXED

Example. 11 years ago Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?) 10.58 KB, patch		Details \| Diff \| Splinter Review
Example. 11 years ago Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?) 11.33 KB, patch		Details \| Diff \| Splinter Review
SIMD, Float32x4, Uint32x4 builtin objects 11 years ago Ivan Jibaja 22.06 KB, patch		Details \| Diff \| Splinter Review
SIMD, Float32x4, Uint32x4 builtin objects 11 years ago Ivan Jibaja 22.06 KB, patch		Details \| Diff \| Splinter Review