Step 2 of Typed Array creation speedups

RESOLVED FIXED in mozilla8

Status

()

Core
JavaScript Engine
--
enhancement
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: nsm, Assigned: nsm)

Tracking

(Blocks: 1 bug)

unspecified
mozilla8
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 13 obsolete attachments)

83.37 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
49.80 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Build Identifier: 

This set of patches applies on top of bug 656519
and introduces the following improvements.

The TypedArray C++ object held by the JSObject has been dropped in favour of putting all properties in inline slots in the JSObject representing the JS TypedArray. Some changes in the TypedArray public API to hide implementation details. This includes updates to TM and JM to generate proper instructions.
Creation time has reduced even more. The actual Array Buffer is still in a JSObject for the Array Buffer and not inlined for really small arrays.

Another patch fixes JM to generate code for 'length' property access on TypedArray which was missing before. This is primarily responsible for faster access times when length is referenced in a loop.

The uses of the API in Firefox have been fixed, and passes try server - http://tbpl.mozilla.org/?tree=Try&rev=d449d210cc8d

Benchmarks:
[access] http://pastebin.mozilla.org/1230046
[creation] http://pastebin.mozilla.org/1230047
[large-array-creation] http://pastebin.mozilla.org/1230049
[bocoup] http://pastebin.mozilla.org/1230050
[skinning] http://pastebin.mozilla.org/1230052

System:
Darwin macironik.local 10.7.4 Darwin Kernel Version 10.7.4: Mon Apr 18 21:24:17 PDT 2011; root:xnu-1504.14.12~3/RELEASE_X86_64 x86_64
8G RAM, 2.3 Ghz quad core i7

Without patches
--------------
Access (3 run mean)
--------------
Plain and simple
42.0363333333
--------------
TM
2.86466666667
--------------
JM
14.9576666667
--------------
TM, JM (with loop profiling)
14.9523333333
--------------
Creation (3 run mean)
--------------
Plain and simple
2.29966666667
--------------
TM
1.806
--------------
JM
1.95866666667
--------------
TM, JM (with loop profiling)
1.816
--------------
Large arrays (3 run mean)
--------------
Plain and simple
5.34633333333
--------------
TM
4.51233333333
--------------
JM
5.29433333333
--------------
TM, JM (with loop profiling)
4.65266666667
--------------
Bocoup (1 run) (time in ms, smaller is better)
--------------
Plain and simple
Write: 9736
Read: 11742
Loop Copy: 13262
Slice Copy: 192
--------------
TM
Write: 618
Read: 539
Loop Copy: 816
Slice Copy: 199
--------------
JM
Write: 1279
Read: 1450
Loop Copy: 1878
Slice Copy: 200
--------------
TM, JM (with loop profiling)
Write: 691
Read: 1510
Loop Copy: 840
Slice Copy: 203
--------------
Skinning (Skinned vertices per second) (1 run) (larger is better)
--------------
Plain and simple
294895.676099
--------------
TM
8700845.82085
--------------
JM
1374831.25382
--------------
TM, JM (with loop profiling)
8030652.68065


With patches
--------------
Access (3 run mean)
--------------
Plain and simple
42.4646666667
--------------
TM
2.856
--------------
JM
7.53233333333
--------------
TM, JM (with loop profiling)
7.608
--------------
Creation (3 run mean)
--------------
Plain and simple
1.37533333333
--------------
TM
0.947
--------------
JM
1.075
--------------
TM, JM (with loop profiling)
0.954333333333
--------------
Large arrays (3 run mean)
--------------
Plain and simple
4.64133333333
--------------
TM
4.56033333333
--------------
JM
4.61
--------------
TM, JM (with loop profiling)
4.64733333333
--------------
Bocoup (1 run) (time in ms, smaller is better)
--------------
Plain and simple
Write: 8997
Read: 11066
Loop Copy: 12567
Slice Copy: 191
--------------
TM
Write: 610
Read: 541
Loop Copy: 794
Slice Copy: 192
--------------
JM
Write: 1237
Read: 1429
Loop Copy: 1931
Slice Copy: 197
--------------
TM, JM (with loop profiling)
Write: 673
Read: 1492
Loop Copy: 818
Slice Copy: 190
--------------
Skinning (Skinned vertices per second) (1 run) (larger is better)
--------------
Plain and simple
285527.916846
--------------
TM
8877042.95704
--------------
JM
1448983.59264
--------------
TM, JM (with loop profiling)
8140000.0


Reproducible: Always
Created attachment 539271 [details] [diff] [review]
Store typed array properties in inline slots
Attachment #539271 - Flags: review?(mrbkap)
Created attachment 539272 [details] [diff] [review]
Fast length access in JM
Created attachment 539273 [details] [diff] [review]
Fix TypedArray API changes

Order of patch application is:
1. Apply patches from 656519
2. Apply "Store typed array properties in inline slots"
3. Apply "Fast length access in JM"
4. Apply "Fix TypedArray API changes"
Jandem, how does this affect bug 663485?
(In reply to comment #4)
> Jandem, how does this affect bug 663485?

Bug 663485 inlines JSOP_LENGTH on typed arrays with TI enabled, the length patch here does this in the IC, it's good to have both. The "inline slots" patch does affect bug 663485 but, looking at the JM/TM changes, rebasing will be straight-forward.

Nikhil, in bug 663485 we want to optimize typed array access based on type inference. In this case we don't have an actual typed array but we know the typed array type (int8, uint16 etc). The only (minor) change outside JM is changing slotWidth() so that we can calculate the slot width based on the type, without needing an actual typed array object.

Are you planning other typed array changes?

Updated

6 years ago
Assignee: general → nsm.nikhil
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Btw I don't think this will affect the performance numbers in bug 663485 much since we'll have to generate the same number of instructions. I think this is expected since bug 656519 and this bug are mostly about speeding up array *creation*. It's hard to say though without measuring.
Jandem,

There are no more major planned changes in the TypedArray APIs or implementation. One thing remaining is to try and store the ArrayBuffer data in the inline slots of the TypedArray when the size is really small. This should not affect JIT or TI since the constructors will handle making the data field point to the right place.
Blocks: 650657
Depends on: 656519
Created attachment 539857 [details] [diff] [review]
Store typed array properties in inline slots

I may have changed some things due to bug 664577, just being safe by updating
Attachment #539271 - Attachment is obsolete: true
Attachment #539271 - Flags: review?(mrbkap)
Created attachment 539859 [details] [diff] [review]
Fast length access in JM
Attachment #539272 - Attachment is obsolete: true
Created attachment 539860 [details] [diff] [review]
Fix TypedArray API changes

Update to use TypedArray non-inline public API
Attachment #539273 - Attachment is obsolete: true
Attachment #539857 - Flags: review?(mrbkap)
Attachment #539859 - Flags: review?(mrbkap)
Attachment #539860 - Flags: review?(mrbkap)
Created attachment 540894 [details] [diff] [review]
Store typed array properties in inline slots
Attachment #539857 - Attachment is obsolete: true
Attachment #539857 - Flags: review?(mrbkap)
Attachment #540894 - Flags: review?(mrbkap)
Created attachment 540896 [details] [diff] [review]
Fast length access in JM

updated to apply against latest tracemonkey
Attachment #539859 - Attachment is obsolete: true
Attachment #539859 - Flags: review?(mrbkap)
Attachment #540896 - Flags: review?(mrbkap)
Created attachment 540901 [details] [diff] [review]
Fix TypedArray API changes

updated to apply against latest tracemonkey. nsContentUtils.cpp has dropped structure clone and the use of typed arrays.
Attachment #539860 - Attachment is obsolete: true
Attachment #539860 - Flags: review?(mrbkap)
Attachment #540901 - Flags: review?(mrbkap)
bump
Comment on attachment 540894 [details] [diff] [review]
Store typed array properties in inline slots

Review of attachment 540894 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good. Why is getField necessary, though? It seems like you could use getSlot with regular JS values.

::: js/src/jstypedarray.cpp
@@ +909,5 @@
>              return NULL;
>  
> +        *(uint32*)getField(obj, FIELD_TYPE) = ArrayTypeID();
> +
> +        do {

I don't think you need this do {} while(0); loop here. You can just use braces for scoping.

@@ +1702,5 @@
>      NULL,           /* call        */                                          \
>      NULL,           /* construct   */                                          \
>      NULL,           /* xdrObject   */                                          \
>      NULL,           /* hasInstance */                                          \
> +    _typedArray::obj_trace,           /* trace       */                                          \

I think this backslash is no longer properly lined up (though this patch view doesn't show it).

::: js/src/jstypedarray.h
@@ +173,5 @@
>      static JSBool obj_getAttributes(JSContext *cx, JSObject *obj, jsid id, uintN *attrsp);
>  
>      static JSBool obj_setAttributes(JSContext *cx, JSObject *obj, jsid id, uintN *attrsp);
>  
> +    static void * getField(const JSObject *obj, int field);

Nit (here and below): the * should be next to the function name.

::: js/src/jstypedarrayinlines.h
@@ +51,5 @@
>      return *((JSUint32*) obj->slots);
>  }
>  
>  inline uint8 *
>  ArrayBuffer::getDataOffset(JSObject *obj) {

Pre-existing, but the { should be on its own line.

@@ +65,5 @@
> +    return (void *) base;
> +}
> +
> +inline JSUint32
> +TypedArray::getLength(const JSObject *obj) { return *((JSUint32 *) getField(obj, FIELD_LENGTH)); }

Even for these small inline functions, we should probably use the

functionname(args)
{
    return stuff;
}

style.
Attachment #540894 - Flags: review?(mrbkap) → review+
Comment on attachment 540896 [details] [diff] [review]
Fast length access in JM

Passing this review off to dmandelin since I don't know enough JaegerMonkey stuff to do it.
Attachment #540896 - Flags: review?(mrbkap) → review?(dmandelin)

Updated

6 years ago
Attachment #540901 - Flags: review?(mrbkap) → review+
Comment on attachment 540896 [details] [diff] [review]
Fast length access in JM

Review of attachment 540896 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nit fixed.

::: js/src/methodjit/PolyIC.cpp
@@ +853,5 @@
> +        masm.loadObjClass(pic.objReg, pic.shapeReg);
> +        Jump notArray = masm.testClass(Assembler::NotEqual, pic.shapeReg, obj->getClass());
> +
> +        masm.loadPtr(Address(pic.objReg, offsetof(JSObject, slots)), pic.objReg);
> +        masm.load32(Address(pic.objReg, sizeof(js::Value)*js::TypedArray::FIELD_LENGTH), pic.objReg);

Nit: the js:: aren't needed here because of the 'using namespace js' in this file.
Attachment #540896 - Flags: review?(dmandelin) → review+
Created attachment 545264 [details] [diff] [review]
Store typed array properties in inline slots

Fixes nits, removes getField.

Properties are stored in slots except pointers to buffer and data which are stored raw due to two-byte alignment requirement. (patch has details)

JM improvement patch has been folded into this patch after nit fix, approved by dmandelin.
Attachment #540894 - Attachment is obsolete: true
Attachment #540896 - Attachment is obsolete: true
Attachment #545264 - Flags: review?(mrbkap)
Created attachment 545267 [details] [diff] [review]
Fix TypedArray API changes

Drop fields here as well.
Attachment #540901 - Attachment is obsolete: true
Attachment #545267 - Flags: review?(mrbkap)
Created attachment 545269 [details] [diff] [review]
Fix TypedArray API changes
Attachment #545267 - Attachment is obsolete: true
Attachment #545267 - Flags: review?(mrbkap)
Attachment #545269 - Flags: review?(mrbkap)

Updated

6 years ago
Attachment #545264 - Flags: review?(mrbkap) → review+

Updated

6 years ago
Attachment #545269 - Flags: review?(mrbkap) → review+
Created attachment 549527 [details] [diff] [review]
Store typed array properties in inline slots

properties are stored in slots, but as slots (using jsval methods) instead of treating it as raw memory.
byteOffset is not directly added but done every time.
TM and JM have been modified appropriately
Attachment #545264 - Attachment is obsolete: true
Attachment #549527 - Flags: review?(mrbkap)
Created attachment 549529 [details] [diff] [review]
Store typed array properties in inline slots

Attached the wrong patch before.
Attachment #549527 - Attachment is obsolete: true
Attachment #549527 - Flags: review?(mrbkap)
Attachment #549529 - Flags: review?(mrbkap)
Created attachment 549533 [details] [diff] [review]
Fix TypedArray API changes
Attachment #545269 - Attachment is obsolete: true
Attachment #549533 - Flags: review?(mrbkap)
Blake, I took a look at the interdiff, and while it seems plausible as adjustments to Bill's patchwork, it's pretty clear to me that without having looked closely at the other 80K of changes I shouldn't do an interdiff review here.  :-\  (Much as I might want to say good enough and do it anyway.)  So I'll wait for you to get back to this on Monday, I think.

Updated

6 years ago
Attachment #549529 - Flags: review?(mrbkap) → review+

Updated

6 years ago
Attachment #549533 - Flags: review?(mrbkap) → review+

Comment 25

6 years ago
Backed out while investigating a webgl permaorange on mozilla-inbound.
And since things went green after the backouts, this bug does seem to be the regressor.

Since backing out the two changes attributed to this bug resulted in bustage, the eventual changeset to fix this should include both this bug and bug 667047, in a single revision, looks like.
I can reproduce the two failures on my Windows box.  The failures are in:

http://hg.mozilla.org/integration/mozilla-inbound/file/8e4e181a9ce9/content/canvas/test/webgl/conformance/draw-elements-out-of-bounds.html

on lines 83 and 87.  If I turn on webgl.verbose and futz with the test, the first problem gives this verbose error:

DrawElements: bound vertex attribute buffers do not have sufficient size for given indices from the bound element array

That corresponds to:

http://hg.mozilla.org/mozilla-central/annotate/982a5835fba1/content/canvas/src/WebGLContextGL.cpp#l1558

The second problem hits that as well, but I don't know WebGL nearly well enough to say whether the munging I did to that test file to get a usable line number from it would mean that second error message is bogus.  (I copied the evalstr for line 83 into a new line inserted between 82/83, and I did the same for 87 between 86/87.)
For the first problem, that case is hit when checked_maxIndexInSubArrayPlusOne is 4 and maxAllowedCount is 3.  That's because checked_maxIndexPlusOne is 4.  That's because maxIndex is 3.  And I'm guessing based on the name that that really should be 2.  Still looking at this, fumbling my way through slightly.  :-)
Okay, I think the reason for my guess was wrong, but I think I see the problem.  It looks like mBoundElementArrayBuffer->FindMaxUbyteElement(), if I trace a little, is returning 3 because mBoundElementArrayBuffer->mData is pointed at the start of new Uint8Array([3, 0, 1, 2]).subarray(1), rather than one into it.
Which points to WebGLContext::BufferData_array(WebGLenum target, JSObject *wa, WebGLenum usage), which points to this:

    GLenum error = CheckedBufferData(target,
                                     JS_GetTypedArrayByteLength(wa),
                                     JS_GetTypedArrayData(wa),
                                     usage);

Which says that JS_GetTypedArrayData is either not returning the right thing, or it's not being used correctly or something.  Blake isn't sure exactly what the semantics of this stuff was supposed to be, offhand, and says he's going to figure it out soon.  :-)
I think JS_GetTypedArrayData isn't returning the right thing, looking at the patches that landed here a little more closely.  Blake said he'd check for certain after a dentist appointment this afternoon.
http://hg.mozilla.org/mozilla-central/rev/794fe3408d3a
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
http://hg.mozilla.org/mozilla-central/rev/025d0712bfce
Part of the typed array IC changes here look broken to me.

    1.32 +    Address data_base(objReg, sizeof(Value) * js::TypedArray::FIELD_DATA);
    1.33 +    masm.loadPrivate(data_base, objReg);
    1.34 +
    1.35 +    JSObject *tarray = js::TypedArray::getTypedArray(obj);
    1.36 +    int shift = js::TypedArray::slotWidth(tarray);
    1.37 +
    1.38 +    int byteOffset = js::TypedArray::getByteOffset(tarray);
    1.39 +    masm.addPtr(Imm32(byteOffset), objReg);

There has only been a clasp guard previously, and the byte offset of a typed array can vary across instances of the same kind of array.

Why doesn't the typed array have a pointer directly to the base of its data, so that the offset doesn't need to be added every single time the array is accessed?  Also, is there documentation on how typed arrays are laid out internally?  I'm trying to merge this stuff into the TI branch and grubbing through the code to figure out what's going on is annoying.
Depends on: 677743

Comment 35

6 years ago
This patch breaks pdf.js. We don't have a reduced test case, but something is clearly borked here. I will back out and post instructions how to reproduce the failure (its pretty easy).

Comment 36

6 years ago
Go to this page. Works with an 8/3 nightly, broken with this patch (paper isn't rendered).

andreasgal.github.com/pdf.js/web/viewer.html

Backing out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 37

6 years ago
Actually, I am knee deep in conflict hell with the backout so I will file a dependent bug instead.

Updated

6 years ago
Blocks: 677854

Updated

6 years ago
No longer blocks: 677854

Updated

6 years ago
Depends on: 677854

Updated

6 years ago
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED

Updated

5 years ago
Depends on: 736609
You need to log in before you can comment on or make changes to this bug.