Closed
Bug 904913
Opened 11 years ago
Closed 10 years ago
Add experimental support for Float32x4, SIMD object and SIMD functions
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: luke, Unassigned)
References
(Depends on 2 open bugs)
Details
Attachments
(2 files, 2 obsolete files)
11.33 KB,
patch
|
Details | Diff | Splinter Review | |
22.06 KB,
patch
|
Details | Diff | Splinter Review |
No description provided.
Reporter | ||
Comment 1•11 years ago
|
||
Oops, hit <Enter> before I finished filling out the bug:
We have a proposal (discussed in bug 894105 comment 5) for adding SIMD that involves a few new objects and functions: Float32x4 (the datum operated on by SIMD operations), a SIMD object, and a bunch of SIMD functions that are properties of the SIMD object. (Hopefully, SIMD will eventually a module, not an object.)
This bug is to add just the basic VM support behind a pref that is default off on all channels. The main work is in Ion in bug 894105.
Assignee: general → nsm.nikhil
Comment on attachment 790541 [details] [diff] [review]
Example.
Luke, as we discussed.
Attachment #790541 -
Flags: review?(luke)
Comment 4•11 years ago
|
||
Comment on attachment 790541 [details] [diff] [review]
Example.
Review of attachment 790541 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/builtin/Float32x4.cpp
@@ +40,5 @@
> + NULL
> +};
> +
> +const JSPropertySpec Float32x4::properties[] = {
> + JS_PSG("w", getW, 0),
Stupid question, why "w" and not "t"?
Comment 5•11 years ago
|
||
We are trying to follow https://github.com/johnmccutchan/ecmascript_simd
Reporter | ||
Comment 6•11 years ago
|
||
Comment on attachment 790541 [details] [diff] [review]
Example.
Review of attachment 790541 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks Nikhil! Everything looks pretty nice; I'd like to take another pass with these comments addressed:
::: js/src/builtin/Float32x4.cpp
@@ +21,5 @@
> +
> +using namespace js;
> +
> +Class Float32x4::class_ = {
> + "Float32x4",
I just learned that the new syntactic convention is that immutable value objects start with lowercase, so this should be 'float32x4'.
@@ +75,5 @@
> +
> +bool
> +Float32x4::construct(JSContext *cx, unsigned int argc, jsval *vp)
> +{
> + if (!JS_IsConstructing(cx, vp)) {
I think the current consensus wants the opposite: only the function call form is allowed and passing 'new' throws. (The idea is that 'new' means a new identity with mutable reference semantics, and with float32x4 (and int64 etc) we are creating a value with no unique identity.)
@@ +88,5 @@
> + !args[0].isNumber() ||
> + !args[1].isNumber() ||
> + !args[2].isNumber() ||
> + !args[3].isNumber()) {
> + JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_FLOAT32X4_BAD_ARGS);
Instead of requiring that they are already numbers, the usual JS thing to do is to apply ToNumber(cx, args[i]) to each argument.
@@ +97,5 @@
> + if (!obj)
> + return false;
> +
> + // The float cast is since the values have to be stored as
> + // single-precision.
It might be worth explaining a bit more:
"The semantics of Float32x4 is to store 4 float32 values. However, when any value is read, it is immediately coerced back to a JS number (i.e., float64). Thus, float32x4(a,b,c,d).x === double(float(ToNumber(a))). Since the contents of float32x4 are not aliased, we don't actually need to store a float32, we can store the coerced float64, which is good because JS::Value only provides float64."
@@ +111,5 @@
> +Float32x4::getLane(JSContext *cx, unsigned int argc, jsval *vp, unsigned int slot)
> +{
> + JS_ASSERT(slot < SLOT_COUNT);
> + CallArgs args = CallArgsFromVp(argc, vp);
> + JS_ASSERT(Float32x4::is(args.thisv()));
I don't think we can assert that. Evil JS code can pull off the accessor function with Object.getPropertyDescriptor(obj).get and call it passing any random value as 'this'. An example of this is if you look at the native getter/setters in vm/Debugger.cpp (grep JS_PSG).
::: js/src/builtin/Float32x4.h
@@ +21,5 @@
> + static bool is(const Value &v);
> +
> + static bool construct(JSContext *cx, unsigned int argc, jsval *vp);
> +
> + static bool getLane(JSContext *cx, unsigned int argc, jsval *vp, unsigned int slot);
You can take out the \n between all these.
::: js/src/jsprototypes.h
@@ +68,5 @@
> macro(float64, 49, js_InitBinaryDataClasses) \
> macro(ArrayType, 50, js_InitBinaryDataClasses) \
> macro(StructType, 51, js_InitBinaryDataClasses) \
> macro(ArrayTypeObject, 52, js_InitBinaryDataClasses) \
> + macro(Float32x4, 53, js_InitFloat32x4) \
Similarly, float32x4 here too. Alternatively, do you think float32x4 could be in BinaryData.cpp and be initialized by js_InitBinaryDataClass?
Attachment #790541 -
Flags: review?(luke)
Attachment #790541 -
Attachment is obsolete: true
Comment 8•11 years ago
|
||
Comment on attachment 790942 [details] [diff] [review]
Example.
Review of attachment 790942 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/builtin/Float32x4.cpp
@@ +101,5 @@
> + JSMSG_NOT_CONSTRUCTOR, "Float32x4");
> + return false;
> + }
> +
> + CallArgs args = CallArgsFromVp(argc, vp);
If you're going to check constructing-ness at all, use args.isConstructing().
@@ +103,5 @@
> + }
> +
> + CallArgs args = CallArgsFromVp(argc, vp);
> +
> + if (argc < 4) {
args.length()
@@ +105,5 @@
> + CallArgs args = CallArgsFromVp(argc, vp);
> +
> + if (argc < 4) {
> + JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL,
> + JSMSG_MORE_ARGS_NEEDED, "float32x4", argc, "s");
args.length()
Comment 9•11 years ago
|
||
My one meta-comment is that I think float32x4 should be defined as a binary data constructor. To really take advantage of this in the JIT -- as is the ultimate goal -- it will however be necessary to build on the patches in bug 898347, which are still under review though I expect it will land soon. I would be happy to do that implementation work and expect it can be done relatively quickly, though I don't have a time estimate yet. I will put some effort into it today and try to produce one. There may still be value in landing the current set of patches, and then porting the relevant work over to the new context, I am not sure.
Comment 10•11 years ago
|
||
The order or the .x, .y, .z, .w is wrong:
js> var x = float32x4(1,2,3,4)
js> x
({})
js> x.w
1
js> x.x
4
js> x.y
3
js> x.z
2
I would’ve expected .x == 1, .y == 2, .z == 3, .w == 4.
This would be in accordance with the spec:
https://github.com/johnmccutchan/ecmascript_simd
Comment 11•11 years ago
|
||
When calling the float32x4() function without arguments, or with less than 4 arguments, the js shell segfaults:
js> var x = float32x4()
will crash the shell.
Comment 12•11 years ago
|
||
Comment on attachment 790942 [details] [diff] [review]
Example.
This error message still uses the uppercasing of Float32x4. It should use "float32x4"
+Float32x4::construct(JSContext *cx, unsigned int argc, jsval *vp)
+{
+ if (JS_IsConstructing(cx, vp)) {
+ JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL,
+ JSMSG_NOT_CONSTRUCTOR, "Float32x4");
+ return false;
+ }
+
Otherwise, you'll get an error message that looks like this:
js> var x = new float32x4(1,2,3,4)
typein:1:4 TypeError: Float32x4 is not a constructor
Assignee: nsm.nikhil → general
Comment 13•11 years ago
|
||
Has there been further progress on this issue beyond Nikhil's patch? The work on typed objects is reaching a level of maturity that I could soon investigate adding float32x4 as a kind of type constructor.
Comment 14•11 years ago
|
||
This patch has the 3 new builtin objects: float32x4, uint32x4, and SIMD.
The SIMD object has 4 functions for now:
float32x4 SIMD.addf(float32x4 op1, float32x4 op2)
float32x4 SIMD.mulf(float32x4 op1, float32x4 op2)
uint32x4 SIMD.addu(uint32x4 op1, uint32x4 op2)
uint32x4 SIMD.mulu(uint32x4 op1, uint32x4 op2)
With this basic VM support, we are starting the implementation work to get IonMonkey to generate SIMD instructions.
Comment 15•11 years ago
|
||
Hi,
These method names are incorrect and do not match the polyfill.
John
Comment 16•11 years ago
|
||
Revised based on John's comment:
This patch has the 3 new builtin objects: float32x4, uint32x4, and SIMD.
The SIMD object has 4 functions for now:
float32x4 SIMD.add(float32x4 op1, float32x4 op2)
float32x4 SIMD.mul(float32x4 op1, float32x4 op2)
uint32x4 SIMD.addu32(uint32x4 op1, uint32x4 op2)
uint32x4 SIMD.mulu32(uint32x4 op1, uint32x4 op2)
With this basic VM support, we are starting the implementation work to get IonMonkey to generate SIMD instructions.
Attachment #804727 -
Attachment is obsolete: true
Comment 17•11 years ago
|
||
Comment on attachment 790541 [details] [diff] [review]
Example.
Review of attachment 790541 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/builtin/Float32x4.cpp
@@ +97,5 @@
> + if (!obj)
> + return false;
> +
> + // The float cast is since the values have to be stored as
> + // single-precision.
It is better that we store 4 floats instead of 4 doubles. If we store 4 doubles, the lane access is easier but it is less efficient to unbox/box them into a XMM register. For 4 float32s, a single movups/movaps instruction could do the job.
Comment 18•11 years ago
|
||
Hi,
Is there a more up to date patch? The last patch is from September 13th, it has many issues, but before I get into it maybe there is a more to date patch?
Thanks,
John
Comment 19•11 years ago
|
||
Hi,
nmatsakis mentioned to use typed object to implement float32x4 and int32x4. After reading http://wiki.ecmascript.org/doku.php?id=harmony:typed_objects, it seems that we only need to initialize:
const float32x4 = new StructType({ x: float32, y: float32, z:float32, w:float32});
const int32x4 = new StructType({ x: int32, y: int32, z:int32, w:int32});
when the engine starts a new context.
Thanks
-Haitao
Comment 20•11 years ago
|
||
(In reply to haitao from comment #19)
> nmatsakis mentioned to use typed object to implement float32x4 and int32x4.
> After reading http://wiki.ecmascript.org/doku.php?id=harmony:typed_objects,
> it seems that we only need to initialize:
> const float32x4 = new StructType({ x: float32, y: float32, z:float32,
> w:float32});
> const int32x4 = new StructType({ x: int32, y: int32, z:int32, w:int32});
> when the engine starts a new context.
This is not quite right. For one thing, the alignment requirements of such type would be the same as float32, but we want a stricter alignment. For another, float32x4 is supposed to be a value type, which struct types are not. The difference can be seen when you embed a float32x4 in another type, e.g.:
var MyArray = new ArrayType(float32x4, 64);
var myArray = new MyArray();
var tmp = myArray[12];
myArray[12] = float32x4(1, 2, 3, 4);
At the end of this fragment, I believe that `tmp` should have the value `float32x4(0,0,0,0)`. But if we implemented `float32x4` as a `StructType`, then `tmp` would alias `myArray[12]` and hence have the value `float32x4(1,2,3,4)`.
Comment 21•11 years ago
|
||
Just a quick note: The Dart VM and the V8 implementations use unaligned memory access. We do not expect or require 16-byte alignment of the payload data in a float32x4.
Reporter | ||
Comment 22•11 years ago
|
||
(In reply to John McCutchan from comment #21)
Doesn't that produce problems when performing xmm loads? I was under the impression that those faulted if the address wasn't 16-byte aligned.
Comment 23•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #22)
> (In reply to John McCutchan from comment #21)
> Doesn't that produce problems when performing xmm loads? I was under the
> impression that those faulted if the address wasn't 16-byte aligned.
Only when using MOVAPS instructions which require 16-bytes alignment; when using MOVUPS this is not a problem.
Comment 24•11 years ago
|
||
movaps will fault when the address isn't 16-byte aligned but movups allows any alignment. The performance of movups is quite high (especially on newer CPUs).
Reporter | ||
Comment 25•11 years ago
|
||
Ah, thanks for explaining. ARM also has no 16-byte alignment requirements?
Comment 26•11 years ago
|
||
Dart VM has all SIMD types fully accelerated on NEON without any alignment restrictions.
Comment 27•11 years ago
|
||
Dart VM has all SIMD types fully accelerated on NEON without any alignment restrictions.
Comment 28•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #25)
> Ah, thanks for explaining. ARM also has no 16-byte alignment requirements?
Alignment requirements should be optional for NEON instructions, see here for more information:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344e/Cihejdic.html
Comment 29•11 years ago
|
||
(In reply to John McCutchan from comment #27)
> Dart VM has all SIMD types fully accelerated on NEON without any alignment
> restrictions.
OK. So are intending to fix the alignment requirements of float32x4 as the same as float32 then?
Comment 30•11 years ago
|
||
The specification will not require any alignment of the 128-bits of payload data in a float32x4 or int32x4 value. That is an implementation detail of each VM.
Reporter | ||
Comment 31•11 years ago
|
||
It would be observable via Typed Objects (StructType({x:uint8, y:float32x4})).
Comment 32•11 years ago
|
||
If Typed Objects requires a specified alignment, float32 alignment should be fine.
Comment 33•11 years ago
|
||
That being said, the SIMD specification is independent of Typed Object specification. It's already been fully accelerated in V8 without the need for typed object based storage. I am not sure why ?Monkey can't support a new native type with 128-bits of storage.
Comment 34•11 years ago
|
||
nmatsakis, thanks for the explanation on the difference of value types and struct types.
> This is not quite right. For one thing, the alignment requirements of such
> type would be the same as float32, but we want a stricter alignment. For
> another, float32x4 is supposed to be a value type, which struct types are
> not. The difference can be seen when you embed a float32x4 in another type,
> e.g.:
>
> var MyArray = new ArrayType(float32x4, 64);
> var myArray = new MyArray();
> var tmp = myArray[12];
> myArray[12] = float32x4(1, 2, 3, 4);
>
> At the end of this fragment, I believe that `tmp` should have the value
> `float32x4(0,0,0,0)`. But if we implemented `float32x4` as a `StructType`,
> then `tmp` would alias `myArray[12]` and hence have the value
> `float32x4(1,2,3,4)`.
Comment 35•11 years ago
|
||
(In reply to John McCutchan from comment #32)
> If Typed Objects requires a specified alignment, float32 alignment should be
> fine.
One question is what is most forwards compatible. Typed arrays have traditionally
taken a pretty strict view on maintaining data alignment, I believe for maximum
compatibility across hardware. I think we should probably do the same, particularly
since it is mostly inobservable (as you point out) except when `float32x4` values
are mixed with other values.
Comment 36•11 years ago
|
||
(In reply to John McCutchan from comment #33)
> That being said, the SIMD specification is independent of Typed Object
> specification. It's already been fully accelerated in V8 without the need
> for typed object based storage. I am not sure why ?Monkey can't support a
> new native type with 128-bits of storage.
The two specs are not fully independent -- going forward, as I understand it,
float32x4 is intended to be a value type constructor, and values types fit into
the framework of typed objects. That said, I agree that the SIMD spec "stands
alone" to some extent and can be implemented separately. But given that we
have relatively mature support for typed objects, there is no reason to do so.
Anyway adding `float32x4` as a new kind of typed object is not a lot of work.
I will try to tackle it today, and certainly I believe it can be done this week.
Comment 37•11 years ago
|
||
Yes, good point, float32x4 and int32x4 entries within a typed array should be aligned to 16-byte boundaries.
Comment 38•11 years ago
|
||
Based on our last meeting today and from Niko's writeup, I've created 3 bugzilla entries for the first 3 steps from the following list of action items:
1. Add MIRType_float32x4 and MIRType_int32x4
2. Add the following MIR instructions:
• MPackVector(v1, v2, v3, v4) -> {MIRType_float32x4, MIRType_int32x4}
• MLoadVector(elements, offset) -> {MIRType_float32x4, MIRType_int32x4}
• MStoreVector(elements, offset, value) where value : {MIRType_float32x4, MIRType_int32x4}
• MAddVector(value1, value2) where value1, value2 : {MIRType_float32x4, MIRType_int32x4}
• MVectorTypedObject(value) -> MIRType_object (similar to MDerivedTypeObject)
3. Calls to SIMD.float32x4.add are inlined specially if:
• Both operands are typed objects of type T where T = either float32x4 or int32x4
• In that case, insert MLoadVector for each argument (using existing code that skips indirection as a model – detecting MDerivedTypeObject and MVectorTypedObject and shortcircuit)
• Insert MAddVector with unboxed operands
• Insert MVectorTypedObject to box result
4. Loads of typed object values of type float32x4 are modified to do a MLoadVector and MVectorTypedObject
5. Stores into typed object lvalues of type float32x4 where value we are storing from is a MVectorTypedObject would be optimized to avoid intermediate
6. Add appropriate LIR instructions and modify register allocator etc.
Comment 39•11 years ago
|
||
I didn't know this until now, so maybe this is new information for some of you too. Per Agner Fog's instruction tables ( http://www.agner.org/optimize/instruction_tables.pdf ), on Sandy Bridge, Ivy Bridge, and Haswell, MOVAPS and MOVUPS have identical performance. Nehalem is the last CPU where, MOVUPS had lower reciprocal throughput.
Just an FYI.
Comment 40•11 years ago
|
||
Besides MOVAPS and MOVUPS, there is a third kind, which is to use a memory operand folded into an arithmetic instruction. Even with the newest CPUs, the regular SSE encoding (what Intel already appears to be calling the "Legacy" encoding) still requires full alignment.
With the new VEX encoding, memory operands do support unaligned accesses. VEX encodings are supported on Sandy Bridge and later.
Assignee | ||
Updated•10 years ago
|
Assignee: general → nobody
Comment 41•10 years ago
|
||
Should we close this bug? Some SIMD support has already landed in the interpreter. I can't see any reference to uint32x4 in the polyfill nor in our implementation, so I think we're fine here? (we have int32x4 instead)
Flags: needinfo?(luke)
Reporter | ||
Comment 42•10 years ago
|
||
Yeah, it seems like all the work has moved elsewhere.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(luke)
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•