On the testcase in the url field, we spend 11% of the time doing SetElem on typed arrays (canvas imagedata) and another 2% of the time doing GetElem.
Created attachment 487055 [details] [diff] [review] part 1: ICs for reading from typed arrays
The second test case in bug 601988 goes from 3.1s to 1.7s. v8 gets 2.6s. bug 604905's test case: pre -m -j: (10x10) = 1ms (50x50) = 4ms (100x100) = 13ms (500x500) = 364ms (1000x1000) = 1500ms post -m -j: (10x10) = 1ms (50x50) = 3ms (100x100) = 10ms (500x500) = 283ms (1000x1000) = 1160ms v8: (10x10) = 1ms (50x50) = 2ms (100x100) = 2ms (500x500) = 52ms (1000x1000) = 191ms
Comment on attachment 487055 [details] [diff] [review] part 1: ICs for reading from typed arrays Do we already have tests for 32-bit typed arrays, especially the unsigned ones? We need to have those for coverage on the conversion optimizations. >+static inline int >+TypedArraySlotWidth(js::TypedArray *tarray) Maybe this should be a method of TypedArray? >+LookupStatus >+GetElementIC::attachTypedArray(JSContext *cx, JSObject *obj, const Value &v, Value *vp) I really liked the comments within this function--they made it much easier to review. >+ // Return uncacheable so GetElement/CallElement will fetch the value. >+ return Lookup_Uncacheable; Ick. This would probably cause confusion someday. How about a new return value that indicates a cache was generated but we still need to do the lookup? Or pull the lookup into a helper function that can be called from here and return Lookup_Cacheable?
Created attachment 487066 [details] [diff] [review] part 1.1: ICs for reading from typed arrays forgot to hg add the int32 tests New return value for LookupStatus would require auditing all its callers so this patch just fetches the value.
dvander, what do things look like on the bug 604905 testcase if you do what the beginning of bug 604905 comment 17 says?
+1 Should definitely be a priority.
Created attachment 489397 [details] [diff] [review] fix x64 SingleByteRegs The "set" part of this patch - forthcoming - changes slightly how the assembler handles single-byte registers, since int8/uint8 arrays stress that path more than usual. It looks like x64 restricts its single-byte register range, but the spec says that with a REX prefix (which the macro assembler emits), all sixteen registers have single-byte aliases.
Created attachment 489402 [details] [diff] [review] WIP, setting integer arrays The set part of this patch turned out to be significantly harder, but everything is in place to finish this up very soon. The problem is that when assigning to a typed array some kind of conversion might need to happen, and rather than give up and roll over - we do the conversion inline, like the tracer. But this requires potentially making a call and/or mutating the RHS, which we haven't done in an IC so far. So a good part of this patch is expanding the call abstractions introduced in bug 609222 to accommodate more advanced use. Remaining: float32/float64 arrays.
Created attachment 489434 [details] [diff] [review] SetElementIC changes
Comment on attachment 489434 [details] [diff] [review] SetElementIC changes Vlad, mind double-checking the typed-array coercion logic?
Comment on attachment 489434 [details] [diff] [review] SetElementIC changes Looks quite reasonable to me -- I don't know the jaeger jit that well, but nothing jumped out at me.
Created attachment 489590 [details] [diff] [review] rolled up patch applies against tip + bug 609222
Created attachment 489879 [details] [diff] [review] rolled up patch v2 testSetTypedIntArray started failing after bug 592976 landed. It looks like the increased register pressure found an edge case where no single byte registers were available. This patch (rolled up, sorry, qfolded everything yesterday) changes StoreToTypedArray to account for that.
Comment on attachment 489879 [details] [diff] [review] rolled up patch v2 This patch kind of made my head explode, but I do have an understanding of all the cases that are laid out and their correctness WRT the tests. Only a few minor comments: - The call ABI stuff is cool -- hand't looked at that before. - We can probably use a store64 for the ImmDoubles in the masm interface. - Would be nice to have a commend on addressOfExtra that it uses a constant displacement based on stackOffset and so all space must be already-allocated before calling it. - Would be cool to assert TempRegs fits in that 16b volatile bitmask - I think it's |temp regs - arg regs| instead of union (symmetric difference). Nice patch -- leading the way for even *more* insane ICs!
Created attachment 501500 [details] [diff] [review] rebased
Created attachment 501501 [details] [diff] [review] rebased, right patch
Not that I can really follow what's going on, but was this intended: 5.48 + OP2_CVTSS2SD_VsdEd = 0x5A, 5.49 + OP2_CVTSD2SS_VsdEd = 0x5A, (ie both enumerations the same value)?
It's okay. CVTSS2SD is F3 0F 5A and CVTSD2SS is F2 0F 5A. In cvtss2sd_rr the prefix F3 is used and in cvtsd2ss_rr the prefix F2.
I'm getting a 5% perf boost in my code now. I'd declare this a success. :D