Closed
Bug 580752
Opened 15 years ago
Closed 15 years ago
TM: optimize setelem
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: gal, Assigned: n.nethercote)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(2 files, 3 obsolete files)
18.79 KB,
patch
|
Details | Diff | Splinter Review | |
18.79 KB,
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
We currently do 2809193 array stores in SS. Each of these is a builtin call. 2493752 of these (88%) overwrite an existing value and don't require a proto walk to deal with holes. Only 29876 of these calls grows the array (1%). We should be able to emit a lot more specialized code here.
![]() |
||
Comment 1•15 years ago
|
||
Does JM need a version of this bug?
Not yet, I think - we have one on a specific array-related slowdown though, bug 580355
![]() |
Assignee | |
Comment 3•15 years ago
|
||
(In reply to comment #0)
> 2493752 of these (88%) overwrite an existing value and don't require a proto
> walk to deal with holes. Only 29876 of these calls grows the array (1%). We
> should be able to emit a lot more specialized code here.
That would be nice but do you have any suggestions how?
Reporter | ||
Comment 4•15 years ago
|
||
4.5% speedup on SS
Comment 5•15 years ago
|
||
Comment on attachment 459305 [details] [diff] [review]
patch
>diff --git a/js/src/jsarray.cpp b/js/src/jsarray.cpp
>--- a/js/src/jsarray.cpp
>+++ b/js/src/jsarray.cpp
>@@ -482,16 +482,30 @@ SetArrayElement(JSContext *cx, JSObject
> if (!IndexToId(cx, obj, index, NULL, idr.addr(), JS_TRUE))
> return JS_FALSE;
> JS_ASSERT(!JSID_IS_VOID(idr.id()));
>
> Value tmp = v;
> return obj->setProperty(cx, idr.id(), &tmp);
> }
>
>+#ifdef JS_TRACER
>+JSBool JS_FASTCALL
>+js_EnsureDenseArrayCapacity(JSContext *cx, JSObject *obj, jsint i)
>+{
>+ jsuint u = jsuint(i);
>+ jsuint capacity = obj->getDenseArrayCapacity();
>+ if ((u >= capacity) && (INDEX_TOO_SPARSE(obj, u) || !obj->ensureDenseArrayElements(cx, u + 1)))
>+ return false;
>+ return true;
Those gratuitous parens won't save you :-P, and they just help hide the fact that you have a silent failure (false return with no error reported or exception thrown).
It would be much more clear that the false return means an error if you invert and put predicted/prefetched best foot forward:
if (u < capacity)
return true;
if (INDEX_TOO_SPARSE(obj, u))
return true;
return obj->ensureDenseArrayElements(cx, u + 1);
Righteous speedup!
/be
![]() |
Assignee | |
Comment 6•15 years ago
|
||
I'll look at this in the morning. In the meantime, a short description of the patch would be nice!
Reporter | ||
Comment 7•15 years ago
|
||
The patch inlines the array set code. I branch around a call to the grow call, which is only needed in 1% of the invocations. This is about 5% slower than guarding but manages with a single trace (guarding would need a 2nd trace). This patch depends on the removal of MinLenCap and Count.
![]() |
Assignee | |
Comment 8•15 years ago
|
||
I wanted to try this patch. I applied it on top of the patches from bug 580846 and bug 580877 but it didn't apply cleanly. I think you have a patch removing js_UnboxDouble in your patch stack below this one?
If you could update this patch to account for changes to the two preceding bugs that'd be great! I really want to see what the LIR looks like.
Reporter | ||
Comment 9•15 years ago
|
||
Attachment #459305 -
Attachment is obsolete: true
Reporter | ||
Updated•15 years ago
|
Attachment #460393 -
Flags: review?(nnethercote)
Reporter | ||
Comment 10•15 years ago
|
||
With this patch I am seeing sub 400ms times on my laptop now. Exciting.
============================================
RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total: 398.8ms +/- 0.4%
--------------------------------------------
![]() |
Assignee | |
Comment 11•15 years ago
|
||
Patch still doesn't apply cleanly to TM tip. Do you still have UnboxDouble removal code in there?
Reporter | ||
Comment 12•15 years ago
|
||
Yes, dependent bug is linked.
![]() |
Assignee | |
Comment 13•15 years ago
|
||
The patch for bug 580877 has landed on TM. The patch for bug 582081 has not, but AFAICT isn't what's causing this patch to fail. I suspect the patch for bug 580534 is the problem, ie. that you have it in your workspace, but you haven't marked that as blocking this bug.
In general, if I write a patch that applies on top of a patch in another bug, I'll mention that and explain the sequencing. I won't expect a reviewer to work out which of my current in-flight patches need to be applied first, and I don't think others should have to work that out. Am I being unreasonable?
Reporter | ||
Comment 14•15 years ago
|
||
Sorry. Other patch should be in your queue too. I meant to mark it as dependent. Not at my desktop right now.
Reporter | ||
Updated•15 years ago
|
Summary: TM: optimize array stores → TM: optimize setelem
Reporter | ||
Comment 15•15 years ago
|
||
Attachment #460393 -
Attachment is obsolete: true
Attachment #460393 -
Flags: review?(nnethercote)
Reporter | ||
Updated•15 years ago
|
Attachment #460984 -
Flags: review?(nnethercote)
Reporter | ||
Comment 16•15 years ago
|
||
580534 must be applied first
![]() |
Assignee | |
Comment 17•15 years ago
|
||
Comment on attachment 460984 [details] [diff] [review]
patch
I get 20 trace-tests failures on x86 and 28 on x86-64. I applied the patch on top of the patch from 580534. I'll review again when these failures are fixed.
Attachment #460984 -
Flags: review?(nnethercote) → review-
![]() |
Assignee | |
Comment 18•15 years ago
|
||
This bug now depends on bug 585525.
![]() |
Assignee | |
Comment 19•15 years ago
|
||
There's a bug in the patch -- Array.length isn't updated when required. I'm working on fixing it now.
![]() |
Assignee | |
Comment 20•15 years ago
|
||
I'm attaching two patches. Both pass trace-tests. They only differ in how
they handle setting a hole element -- the first one deals with it entirely
on-trace, the second one falls back to a helper function. The performance
of the two is very similar, about a 1.045x speedup on 32-bit (with TM+JM),
and less on 64-bit.
I'm leaning towards the second one because I find the LIR generated by the
first one long-winded and hard to read, and also because there's potential
to improve the code generated when we have a C call on a slow path.
Here's example code before either patch:
# isDenseArray?
ldi2 = ldi.o $global0[4]
clasp = immi 0x8338a00
guard(class is Array) = eqi ldi2, clasp/*0x8338a00*/
xf2: xf guard(class is Array) -> pc=0x94e1c82 imacpc=(nil) sp+24 rp+0 (GuardID=002)
# set elem, via a C call
allocp1 = allocp 8
tag = immi 0xffff0005
sti.o allocp1[4] = tag/*0xffff0005*/
sti.o allocp1[0] = ATOM_TO_STRING(atom)/*0x833cee8*/
js_Array_dense_setelem1 = calli.all #js_Array_dense_setelem ( cx $global0 ldi1 allocp1 )
eqi1 = eqi js_Array_dense_setelem1, immi3/*0*/
xt1: xt eqi1 -> pc=0x94e1c82 imacpc=(nil) sp+24 rp+0 (GuardID=003)
Here's an example with the first patch:
# isDenseArray?
ldi6 = ldi.o $global0[4]
clasp = immi 0x83389e0
guard(class is Array) = eqi ldi6, clasp/*0x83389e0*/
xf5: xf guard(class is Array) -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=002)
# check capacity, grow if necessary
capacity = ldi.o $global0[40]
ltui2 = ltui ldi1, capacity
jt ltui2 -> label3
js_EnsureDenseArrayCapacity1 = calli.all #js_EnsureDenseArrayCapacity ( cx $global0 ldi1 )
eqi2 = eqi js_EnsureDenseArrayCapacity1, immi8/*0*/
xt3: xt eqi2 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=003)
# get address of the element
label3:
dslots = ldi.o $global0[24]
immi7 = immi 3
lshi1 = lshi ldi1, immi7/*3*/
addi1 = addi dslots, lshi1
# if element is a hole...
JSVAL_TAG_MAGIC = immi 0xffff0004
ldi5 = ldi.o addi1[4]
eqi1 = eqi ldi5, JSVAL_TAG_MAGIC/*0xffff0004*/
jf eqi1 -> label2
# ...check the proto chain has no indexed properties...
ldi4 = ldi.o $global0[16]
immi4 = immi 0
guard(proto-not-null) = eqi ldi4, immi4/*0*/
xt2: xt guard(proto-not-null) -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=004)
objShape = ldi.o ldi4[12]
immi6 = immi 212
guard(shape) = eqi objShape, immi6/*212*/
xf4: xf guard(shape) -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=005)
ldi3 = ldi.o ldi4[16]
guard(proto-not-null)2 = eqi ldi3, immi4/*0*/
xt1: xt guard(proto-not-null)2 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=006)
objShape2 = ldi.o ldi3[12]
immi5 = immi 173
guard(shape)2 = eqi objShape2, immi5/*173*/
xf3: xf guard(shape)2 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=007)
ldi2 = ldi.o ldi3[16]
guard(proto-not-null)3 = eqi ldi2, immi4/*0*/
xf2: xf guard(proto-not-null)3 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=008)
# ...and update .length if necessary
length = ldi.o $global0[32]
ltui1 = ltui ldi1, length
jt ltui1 -> label2
immi3 = immi 1
addi2 = addi ldi1, immi3/*1*/
sti.o $global0[32] = addi2
# actually do the setelem
label2:
tag = immi 0xffff0005
sti.o addi1[4] = tag/*0xffff0005*/
sti.o addi1[0] = ATOM_TO_STRING(atom)/*0x833cec8*/
It would be nice if that proto-chain walk could be optimized somehow.
Here's an example of the hole-related part with the second patch:
# if element is a hole...
# ...check the proto chain has no indexed properties...
# ...and update .length if necessary
JSVAL_TAG_MAGIC = immi 0xffff0004
ldi2 = ldi.o addi1[4]
eqi2 = eqi ldi2, JSVAL_TAG_MAGIC/*0xffff0004*/
jf eqi2 -> label2
hasNoIndexedProperties = calli.all #js_Array_dense_setelem_hole ( cx $global0 mulxovi1 )
immi3 = immi 0
eqi1 = eqi hasNoIndexedProperties, immi3/*0*/
xt1: xt eqi1 -> pc=0x9369c89 imacpc=(nil) sp+24 rp+0 (GuardID=006)
This patch also removes the dependency on bug 580534, which wasn't
fundamental but just a reflection of Gal's queue at a particular point in
time.
Here are Cachegrind results. The timings reflected these fairly closely, but
with a bit more noise. I don't include crypto-aes in my Cachegrind results
because its non-deterministic but it gets about a 1.04x speedup on timings.
---------------------------------------------------------------
| millions of instructions executed |
| total | on-trace (may overestimate) |
---------------------------------------------------------------
| 106.795 109.805 (0.973x) | 44.765 45.176 (0.991x) | 3d-cube
| 56.590 46.433 (1.219x) | 22.315 24.778 (0.901x) | 3d-morph
| 116.183 116.985 (0.993x) | 43.885 43.909 (0.999x) | 3d-raytrace
| 73.028 73.026 (------) | 15.220 15.220 (------) | access-binary-
| 130.540 108.369 (1.205x) | 79.768 89.534 (0.891x) | access-fannkuc
| 37.455 37.454 (------) | 16.430 16.430 (------) | access-nbody
| 59.788 49.928 (1.197x) | 27.009 29.741 (0.908x) | access-nsieve
| 15.956 15.954 (------) | 2.863 2.863 (------) | bitops-3bit-bi
| 43.480 43.479 (------) | 30.281 30.281 (------) | bitops-bits-in
| 22.978 22.977 (------) | 10.219 10.219 (------) | bitops-bitwise
| 61.859 54.633 (1.132x) | 36.551 40.406 (0.905x) | bitops-nsieve-
| 49.009 49.008 (------) | 19.438 19.438 (------) | controlflow-re
| 47.551 47.398 (1.003x) | 5.916 6.110 (0.968x) | crypto-md5
| 30.243 30.060 (1.006x) | 6.741 6.952 (0.970x) | crypto-sha1
| 98.330 98.329 (------) | 23.053 23.053 (------) | date-format-to
| 83.673 83.671 (------) | 9.747 9.747 (------) | date-format-xp
| 53.289 53.287 (------) | 30.317 30.317 (------) | math-cordic
| 29.439 29.438 (------) | 6.137 6.137 (------) | math-partial-s
| 30.573 30.599 (0.999x) | 12.941 12.973 (0.998x) | math-spectral-
| 58.519 58.517 (------) | 34.592 34.592 (------) | regexp-dna
| 39.539 39.538 (------) | 9.220 9.220 (------) | string-base64
| 116.996 115.498 (1.013x) | 24.787 25.125 (0.987x) | string-fasta
| 138.698 138.697 (------) | 17.588 17.588 (------) | string-tagclou
| 180.813 181.014 (0.999x) | 21.896 21.896 (------) | string-unpack-
| 51.365 51.363 (------) | 8.346 8.346 (------) | string-validat
-------
| 1732.702 1685.473 (1.028x) | 560.038 580.064 (0.965x) | all
For the ones like 3d-cube where we do worse it's due to extra time in
Nanojit compiling the additional code.
![]() |
Assignee | |
Comment 21•15 years ago
|
||
Assignee: gal → nnethercote
Attachment #460984 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #471425 -
Flags: feedback?(gal)
![]() |
Assignee | |
Comment 22•15 years ago
|
||
Attachment #471426 -
Flags: feedback?(gal)
![]() |
Assignee | |
Comment 23•15 years ago
|
||
Something else worth mentioning: this patch might make the patch in bug 542905 worthwhile, as it exposes more internal array access operations to CSE, but there are lots of branches involved.
![]() |
Assignee | |
Comment 24•15 years ago
|
||
I'm nominating this for blocking2.0 because a 4% speedup on SunSpider for a single patch is pretty impressive.
blocking2.0: --- → ?
Reporter | ||
Comment 25•15 years ago
|
||
Nick thanks so much for stealing this from me. You rock.
![]() |
Assignee | |
Updated•15 years ago
|
Attachment #471426 -
Flags: feedback?(gal) → review?(gal)
![]() |
Assignee | |
Comment 26•15 years ago
|
||
(In reply to comment #25)
> Nick thanks so much for stealing this from me. You rock.
Thanks! Do you want to review? It's mostly your code so it shouldn't take long :)
Comment 27•15 years ago
|
||
Andreas is on varcaton or something like that (I didn't understand what he was talking about), so someone else might want to steal the review.
Reporter | ||
Comment 28•15 years ago
|
||
Comment on attachment 471426 [details] [diff] [review]
njn's 2nd patch (against TM 52722:b360dd204dc6)
>-static JS_ALWAYS_INLINE JSBool FASTCALL
>-dense_grow(JSContext* cx, JSObject* obj, jsint i, const Value &v)
>+JSBool FASTCALL
>+js_Array_dense_setelem_hole(JSContext* cx, JSObject* obj, jsint i)
> {
>- JS_ASSERT(obj->isDenseArray());
>+ if (js_PrototypeHasIndexedProperties(cx, obj))
>+ return false;
>
>- /*
>- * Let the interpreter worry about negative array indexes.
>- */
>- JS_ASSERT((MAX_DSLOTS_LENGTH > MAX_DSLOTS_LENGTH32) == (sizeof(intptr_t) != sizeof(uint32)));
>- if (MAX_DSLOTS_LENGTH > MAX_DSLOTS_LENGTH32) {
>- /*
>- * Have to check for negative values bleeding through on 64-bit machines only,
>- * since we can't allocate large enough arrays for this on 32-bit machines.
>- */
>- if (i < 0)
>- return JS_FALSE;
>- }
>-
>- /*
>- * If needed, grow the array as long it remains dense, otherwise fall off trace.
>- */
> jsuint u = jsuint(i);
>- jsuint capacity = obj->getDenseArrayCapacity();
>- if ((u >= capacity) && (INDEX_TOO_SPARSE(obj, u) || !obj->ensureDenseArrayElements(cx, u + 1)))
>- return JS_FALSE;
>-
>- if (obj->getDenseArrayElement(u).isMagic()) {
>- if (js_PrototypeHasIndexedProperties(cx, obj))
>- return JS_FALSE;
>-
>- if (u >= obj->getArrayLength())
>- obj->setArrayLength(u + 1);
>- }
>-
>- obj->setDenseArrayElement(u, v);
>- return JS_TRUE;
>+ if (u >= obj->getArrayLength())
>+ obj->setArrayLength(u + 1);
>+ return true;
We could inline this and then only call the above check. Separate bug if you prefer or no bug if you think its stupid.
Attachment #471426 -
Flags: review?(gal) → review+
![]() |
Assignee | |
Comment 29•15 years ago
|
||
I figure if we have to call a C function, there's no point doing part of the work in LIR, that's just more LIR instructions to process.
Reporter | ||
Comment 30•15 years ago
|
||
We could easily inline that though. For arrays its always exactly 2 loops. There is a loop for this in the array read path.
![]() |
Assignee | |
Updated•15 years ago
|
Attachment #471425 -
Flags: feedback?(gal)
![]() |
Assignee | |
Comment 31•15 years ago
|
||
(In reply to comment #30)
> We could easily inline that though. For arrays its always exactly 2 loops.
> There is a loop for this in the array read path.
See comment 20 -- I liked the C call better. It's slower when we hit a hole, but we end up with a lot less LIR to compile. For SunSpider inlining the code makes some tests slightly faster and some slightly slower, and it ends up a wash.
http://hg.mozilla.org/tracemonkey/rev/03fbde5c25b6
Whiteboard: fixed-in-tracemonkey
Updated•15 years ago
|
blocking2.0: ? → ---
![]() |
Assignee | |
Comment 32•15 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•