JM: Add specialized codepaths for typed arrays

RESOLVED FIXED

Status

()

Core
JavaScript Engine
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: bz, Assigned: dvander)

Tracking

(Blocks: 3 bugs)

Trunk
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-tracemonkey, URL)

Attachments

(2 attachments, 7 obsolete attachments)

On the testcase in the url field, we spend 11% of the time doing SetElem on typed arrays (canvas imagedata) and another 2% of the time doing GetElem.
Blocks: 599009

Updated

7 years ago
Blocks: 590379

Updated

7 years ago
Blocks: 601988
Blocks: 604905

Updated

7 years ago
Blocks: 497458
(Assignee)

Updated

7 years ago
Depends on: 602641
(Assignee)

Updated

7 years ago
Depends on: 607293
(Assignee)

Updated

7 years ago
Assignee: general → dvander
Status: NEW → ASSIGNED
(Assignee)

Comment 1

7 years ago
Created attachment 487055 [details] [diff] [review]
part 1: ICs for reading from typed arrays
Attachment #487055 - Flags: review?(dmandelin)
(Assignee)

Comment 2

7 years ago
The second test case in bug 601988 goes from 3.1s to 1.7s. v8 gets 2.6s.

bug 604905's test case:

pre -m -j:
(10x10) = 1ms
(50x50) = 4ms
(100x100) = 13ms
(500x500) = 364ms
(1000x1000) = 1500ms
 
post -m -j:
(10x10) = 1ms
(50x50) = 3ms
(100x100) = 10ms
(500x500) = 283ms
(1000x1000) = 1160ms

v8:
(10x10) = 1ms
(50x50) = 2ms
(100x100) = 2ms
(500x500) = 52ms
(1000x1000) = 191ms
Comment on attachment 487055 [details] [diff] [review]
part 1: ICs for reading from typed arrays

Do we already have tests for 32-bit typed arrays, especially the unsigned ones? We need to have those for coverage on the conversion optimizations.

>+static inline int
>+TypedArraySlotWidth(js::TypedArray *tarray)

Maybe this should be a method of TypedArray?

>+LookupStatus
>+GetElementIC::attachTypedArray(JSContext *cx, JSObject *obj, const Value &v, Value *vp)

I really liked the comments within this function--they made it much easier to review.

>+    // Return uncacheable so GetElement/CallElement will fetch the value.
>+    return Lookup_Uncacheable;

Ick. This would probably cause confusion someday. How about a new return value that indicates a cache was generated but we still need to do the lookup? Or pull the lookup into a helper function that can be called from here and return Lookup_Cacheable?
Attachment #487055 - Flags: review?(dmandelin)
(Assignee)

Comment 4

7 years ago
Created attachment 487066 [details] [diff] [review]
part 1.1: ICs for reading from typed arrays

forgot to hg add the int32 tests

New return value for LookupStatus would require auditing all its callers so this patch just fetches the value.
Attachment #487066 - Flags: review?(dmandelin)
Attachment #487066 - Flags: review?(dmandelin) → review+
dvander, what do things look like on the bug 604905 testcase if you do what the beginning of bug 604905 comment 17 says?

Comment 6

7 years ago
+1
Should definitely be a priority.
(Assignee)

Updated

7 years ago
Depends on: 609222
(Assignee)

Comment 7

7 years ago
Created attachment 489397 [details] [diff] [review]
fix x64 SingleByteRegs

The "set" part of this patch - forthcoming - changes slightly how the assembler handles single-byte registers, since int8/uint8 arrays stress that path more than usual.

It looks like x64 restricts its single-byte register range, but the spec says that with a REX prefix (which the macro assembler emits), all sixteen registers have single-byte aliases.
Attachment #489397 - Flags: review?(sstangl)
(Assignee)

Comment 8

7 years ago
Created attachment 489402 [details] [diff] [review]
WIP, setting integer arrays

The set part of this patch turned out to be significantly harder, but everything is in place to finish this up very soon. The problem is that when assigning to a typed array some kind of conversion might need to happen, and rather than give up and roll over - we do the conversion inline, like the tracer.

But this requires potentially making a call and/or mutating the RHS, which we haven't done in an IC so far. So a good part of this patch is expanding the call abstractions introduced in bug 609222 to accommodate more advanced use.

Remaining: float32/float64 arrays.

Updated

7 years ago
Attachment #489397 - Flags: review?(sstangl) → review+
(Assignee)

Comment 9

7 years ago
Created attachment 489434 [details] [diff] [review]
SetElementIC changes
Attachment #489434 - Flags: review?(cdleary)
(Assignee)

Comment 10

7 years ago
Comment on attachment 489434 [details] [diff] [review]
SetElementIC changes

Vlad, mind double-checking the typed-array coercion logic?
Attachment #489434 - Flags: review?(vladimir)
(Assignee)

Updated

7 years ago
Depends on: 610901
Comment on attachment 489434 [details] [diff] [review]
SetElementIC changes

Looks quite reasonable to me -- I don't know the jaeger jit that well, but nothing jumped out at me.
Attachment #489434 - Flags: review?(vladimir) → review+
(Assignee)

Comment 12

7 years ago
Created attachment 489590 [details] [diff] [review]
rolled up patch

applies against tip + bug 609222
Attachment #487055 - Attachment is obsolete: true
Attachment #489402 - Attachment is obsolete: true
(Assignee)

Comment 13

7 years ago
Created attachment 489879 [details] [diff] [review]
rolled up patch v2

testSetTypedIntArray started failing after bug 592976 landed. It looks like the increased register pressure found an edge case where no single byte registers were available. This patch (rolled up, sorry, qfolded everything yesterday) changes StoreToTypedArray to account for that.
Attachment #489434 - Attachment is obsolete: true
Attachment #489590 - Attachment is obsolete: true
Attachment #489879 - Flags: review?(cdleary)
Attachment #489434 - Flags: review?(cdleary)

Updated

7 years ago
Blocks: 612019
Comment on attachment 489879 [details] [diff] [review]
rolled up patch v2

This patch kind of made my head explode, but I do have an understanding of all the cases that are laid out and their correctness WRT the tests. Only a few minor comments:

- The call ABI stuff is cool -- hand't looked at that before.
- We can probably use a store64 for the ImmDoubles in the masm interface.
- Would be nice to have a commend on addressOfExtra that it uses a constant displacement based on stackOffset and so all space must be already-allocated before calling it.
- Would be cool to assert TempRegs fits in that 16b volatile bitmask
- I think it's |temp regs - arg regs| instead of union (symmetric difference).

Nice patch -- leading the way for even *more* insane ICs!
Attachment #489879 - Flags: review?(cdleary) → review+
Blocks: 619761
(Assignee)

Comment 15

7 years ago
Created attachment 501500 [details] [diff] [review]
rebased
Attachment #487066 - Attachment is obsolete: true
Attachment #489397 - Attachment is obsolete: true
(Assignee)

Comment 16

7 years ago
Created attachment 501501 [details] [diff] [review]
rebased, right patch
Attachment #501500 - Attachment is obsolete: true
(Assignee)

Comment 17

7 years ago
http://hg.mozilla.org/tracemonkey/rev/5bb0f4c62370
Whiteboard: fixed-in-tracemonkey

Comment 18

7 years ago
Not that I can really follow what's going on, but was this intended:

    5.48 +        OP2_CVTSS2SD_VsdEd  = 0x5A,
    5.49 +        OP2_CVTSD2SS_VsdEd  = 0x5A,

(ie both enumerations the same value)?
It's okay. CVTSS2SD is F3 0F 5A  and CVTSD2SS is F2 0F 5A. In cvtss2sd_rr the prefix F3 is used and in cvtsd2ss_rr the prefix F2.
http://hg.mozilla.org/mozilla-central/rev/5bb0f4c62370
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED

Comment 21

7 years ago
I'm getting a 5% perf boost in my code now. I'd declare this a success. :D

Updated

7 years ago
Depends on: 624483
Depends on: 625141

Updated

7 years ago
Blocks: 628170
No longer blocks: 628170
Depends on: 628170

Updated

7 years ago
Blocks: 652346
You need to log in before you can comment on or make changes to this bug.