Closed
Bug 580752
Opened 14 years ago
Closed 14 years ago
TM: optimize setelem
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: gal, Assigned: n.nethercote)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(2 files, 3 obsolete files)
18.79 KB,
patch
|
Details | Diff | Splinter Review | |
18.79 KB,
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
We currently do 2809193 array stores in SS. Each of these is a builtin call. 2493752 of these (88%) overwrite an existing value and don't require a proto walk to deal with holes. Only 29876 of these calls grows the array (1%). We should be able to emit a lot more specialized code here.
Comment 1•14 years ago
|
||
Does JM need a version of this bug?
Not yet, I think - we have one on a specific array-related slowdown though, bug 580355
Assignee | ||
Comment 3•14 years ago
|
||
(In reply to comment #0) > 2493752 of these (88%) overwrite an existing value and don't require a proto > walk to deal with holes. Only 29876 of these calls grows the array (1%). We > should be able to emit a lot more specialized code here. That would be nice but do you have any suggestions how?
Reporter | ||
Comment 4•14 years ago
|
||
4.5% speedup on SS
Comment 5•14 years ago
|
||
Comment on attachment 459305 [details] [diff] [review] patch >diff --git a/js/src/jsarray.cpp b/js/src/jsarray.cpp >--- a/js/src/jsarray.cpp >+++ b/js/src/jsarray.cpp >@@ -482,16 +482,30 @@ SetArrayElement(JSContext *cx, JSObject > if (!IndexToId(cx, obj, index, NULL, idr.addr(), JS_TRUE)) > return JS_FALSE; > JS_ASSERT(!JSID_IS_VOID(idr.id())); > > Value tmp = v; > return obj->setProperty(cx, idr.id(), &tmp); > } > >+#ifdef JS_TRACER >+JSBool JS_FASTCALL >+js_EnsureDenseArrayCapacity(JSContext *cx, JSObject *obj, jsint i) >+{ >+ jsuint u = jsuint(i); >+ jsuint capacity = obj->getDenseArrayCapacity(); >+ if ((u >= capacity) && (INDEX_TOO_SPARSE(obj, u) || !obj->ensureDenseArrayElements(cx, u + 1))) >+ return false; >+ return true; Those gratuitous parens won't save you :-P, and they just help hide the fact that you have a silent failure (false return with no error reported or exception thrown). It would be much more clear that the false return means an error if you invert and put predicted/prefetched best foot forward: if (u < capacity) return true; if (INDEX_TOO_SPARSE(obj, u)) return true; return obj->ensureDenseArrayElements(cx, u + 1); Righteous speedup! /be
Assignee | ||
Comment 6•14 years ago
|
||
I'll look at this in the morning. In the meantime, a short description of the patch would be nice!
Reporter | ||
Comment 7•14 years ago
|
||
The patch inlines the array set code. I branch around a call to the grow call, which is only needed in 1% of the invocations. This is about 5% slower than guarding but manages with a single trace (guarding would need a 2nd trace). This patch depends on the removal of MinLenCap and Count.
Assignee | ||
Comment 8•14 years ago
|
||
I wanted to try this patch. I applied it on top of the patches from bug 580846 and bug 580877 but it didn't apply cleanly. I think you have a patch removing js_UnboxDouble in your patch stack below this one? If you could update this patch to account for changes to the two preceding bugs that'd be great! I really want to see what the LIR looks like.
Reporter | ||
Comment 9•14 years ago
|
||
Attachment #459305 -
Attachment is obsolete: true
Reporter | ||
Updated•14 years ago
|
Attachment #460393 -
Flags: review?(nnethercote)
Reporter | ||
Comment 10•14 years ago
|
||
With this patch I am seeing sub 400ms times on my laptop now. Exciting. ============================================ RESULTS (means and 95% confidence intervals) -------------------------------------------- Total: 398.8ms +/- 0.4% --------------------------------------------
Assignee | ||
Comment 11•14 years ago
|
||
Patch still doesn't apply cleanly to TM tip. Do you still have UnboxDouble removal code in there?
Reporter | ||
Comment 12•14 years ago
|
||
Yes, dependent bug is linked.
Assignee | ||
Comment 13•14 years ago
|
||
The patch for bug 580877 has landed on TM. The patch for bug 582081 has not, but AFAICT isn't what's causing this patch to fail. I suspect the patch for bug 580534 is the problem, ie. that you have it in your workspace, but you haven't marked that as blocking this bug. In general, if I write a patch that applies on top of a patch in another bug, I'll mention that and explain the sequencing. I won't expect a reviewer to work out which of my current in-flight patches need to be applied first, and I don't think others should have to work that out. Am I being unreasonable?
Reporter | ||
Comment 14•14 years ago
|
||
Sorry. Other patch should be in your queue too. I meant to mark it as dependent. Not at my desktop right now.
Reporter | ||
Updated•14 years ago
|
Summary: TM: optimize array stores → TM: optimize setelem
Reporter | ||
Comment 15•14 years ago
|
||
Attachment #460393 -
Attachment is obsolete: true
Attachment #460393 -
Flags: review?(nnethercote)
Reporter | ||
Updated•14 years ago
|
Attachment #460984 -
Flags: review?(nnethercote)
Reporter | ||
Comment 16•14 years ago
|
||
580534 must be applied first
Assignee | ||
Comment 17•14 years ago
|
||
Comment on attachment 460984 [details] [diff] [review] patch I get 20 trace-tests failures on x86 and 28 on x86-64. I applied the patch on top of the patch from 580534. I'll review again when these failures are fixed.
Attachment #460984 -
Flags: review?(nnethercote) → review-
Assignee | ||
Comment 18•14 years ago
|
||
This bug now depends on bug 585525.
Assignee | ||
Comment 19•14 years ago
|
||
There's a bug in the patch -- Array.length isn't updated when required. I'm working on fixing it now.
Assignee | ||
Comment 20•14 years ago
|
||
I'm attaching two patches. Both pass trace-tests. They only differ in how they handle setting a hole element -- the first one deals with it entirely on-trace, the second one falls back to a helper function. The performance of the two is very similar, about a 1.045x speedup on 32-bit (with TM+JM), and less on 64-bit. I'm leaning towards the second one because I find the LIR generated by the first one long-winded and hard to read, and also because there's potential to improve the code generated when we have a C call on a slow path. Here's example code before either patch: # isDenseArray? ldi2 = ldi.o $global0[4] clasp = immi 0x8338a00 guard(class is Array) = eqi ldi2, clasp/*0x8338a00*/ xf2: xf guard(class is Array) -> pc=0x94e1c82 imacpc=(nil) sp+24 rp+0 (GuardID=002) # set elem, via a C call allocp1 = allocp 8 tag = immi 0xffff0005 sti.o allocp1[4] = tag/*0xffff0005*/ sti.o allocp1[0] = ATOM_TO_STRING(atom)/*0x833cee8*/ js_Array_dense_setelem1 = calli.all #js_Array_dense_setelem ( cx $global0 ldi1 allocp1 ) eqi1 = eqi js_Array_dense_setelem1, immi3/*0*/ xt1: xt eqi1 -> pc=0x94e1c82 imacpc=(nil) sp+24 rp+0 (GuardID=003) Here's an example with the first patch: # isDenseArray? ldi6 = ldi.o $global0[4] clasp = immi 0x83389e0 guard(class is Array) = eqi ldi6, clasp/*0x83389e0*/ xf5: xf guard(class is Array) -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=002) # check capacity, grow if necessary capacity = ldi.o $global0[40] ltui2 = ltui ldi1, capacity jt ltui2 -> label3 js_EnsureDenseArrayCapacity1 = calli.all #js_EnsureDenseArrayCapacity ( cx $global0 ldi1 ) eqi2 = eqi js_EnsureDenseArrayCapacity1, immi8/*0*/ xt3: xt eqi2 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=003) # get address of the element label3: dslots = ldi.o $global0[24] immi7 = immi 3 lshi1 = lshi ldi1, immi7/*3*/ addi1 = addi dslots, lshi1 # if element is a hole... JSVAL_TAG_MAGIC = immi 0xffff0004 ldi5 = ldi.o addi1[4] eqi1 = eqi ldi5, JSVAL_TAG_MAGIC/*0xffff0004*/ jf eqi1 -> label2 # ...check the proto chain has no indexed properties... ldi4 = ldi.o $global0[16] immi4 = immi 0 guard(proto-not-null) = eqi ldi4, immi4/*0*/ xt2: xt guard(proto-not-null) -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=004) objShape = ldi.o ldi4[12] immi6 = immi 212 guard(shape) = eqi objShape, immi6/*212*/ xf4: xf guard(shape) -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=005) ldi3 = ldi.o ldi4[16] guard(proto-not-null)2 = eqi ldi3, immi4/*0*/ xt1: xt guard(proto-not-null)2 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=006) objShape2 = ldi.o ldi3[12] immi5 = immi 173 guard(shape)2 = eqi objShape2, immi5/*173*/ xf3: xf guard(shape)2 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=007) ldi2 = ldi.o ldi3[16] guard(proto-not-null)3 = eqi ldi2, immi4/*0*/ xf2: xf guard(proto-not-null)3 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=008) # ...and update .length if necessary length = ldi.o $global0[32] ltui1 = ltui ldi1, length jt ltui1 -> label2 immi3 = immi 1 addi2 = addi ldi1, immi3/*1*/ sti.o $global0[32] = addi2 # actually do the setelem label2: tag = immi 0xffff0005 sti.o addi1[4] = tag/*0xffff0005*/ sti.o addi1[0] = ATOM_TO_STRING(atom)/*0x833cec8*/ It would be nice if that proto-chain walk could be optimized somehow. Here's an example of the hole-related part with the second patch: # if element is a hole... # ...check the proto chain has no indexed properties... # ...and update .length if necessary JSVAL_TAG_MAGIC = immi 0xffff0004 ldi2 = ldi.o addi1[4] eqi2 = eqi ldi2, JSVAL_TAG_MAGIC/*0xffff0004*/ jf eqi2 -> label2 hasNoIndexedProperties = calli.all #js_Array_dense_setelem_hole ( cx $global0 mulxovi1 ) immi3 = immi 0 eqi1 = eqi hasNoIndexedProperties, immi3/*0*/ xt1: xt eqi1 -> pc=0x9369c89 imacpc=(nil) sp+24 rp+0 (GuardID=006) This patch also removes the dependency on bug 580534, which wasn't fundamental but just a reflection of Gal's queue at a particular point in time. Here are Cachegrind results. The timings reflected these fairly closely, but with a bit more noise. I don't include crypto-aes in my Cachegrind results because its non-deterministic but it gets about a 1.04x speedup on timings. --------------------------------------------------------------- | millions of instructions executed | | total | on-trace (may overestimate) | --------------------------------------------------------------- | 106.795 109.805 (0.973x) | 44.765 45.176 (0.991x) | 3d-cube | 56.590 46.433 (1.219x) | 22.315 24.778 (0.901x) | 3d-morph | 116.183 116.985 (0.993x) | 43.885 43.909 (0.999x) | 3d-raytrace | 73.028 73.026 (------) | 15.220 15.220 (------) | access-binary- | 130.540 108.369 (1.205x) | 79.768 89.534 (0.891x) | access-fannkuc | 37.455 37.454 (------) | 16.430 16.430 (------) | access-nbody | 59.788 49.928 (1.197x) | 27.009 29.741 (0.908x) | access-nsieve | 15.956 15.954 (------) | 2.863 2.863 (------) | bitops-3bit-bi | 43.480 43.479 (------) | 30.281 30.281 (------) | bitops-bits-in | 22.978 22.977 (------) | 10.219 10.219 (------) | bitops-bitwise | 61.859 54.633 (1.132x) | 36.551 40.406 (0.905x) | bitops-nsieve- | 49.009 49.008 (------) | 19.438 19.438 (------) | controlflow-re | 47.551 47.398 (1.003x) | 5.916 6.110 (0.968x) | crypto-md5 | 30.243 30.060 (1.006x) | 6.741 6.952 (0.970x) | crypto-sha1 | 98.330 98.329 (------) | 23.053 23.053 (------) | date-format-to | 83.673 83.671 (------) | 9.747 9.747 (------) | date-format-xp | 53.289 53.287 (------) | 30.317 30.317 (------) | math-cordic | 29.439 29.438 (------) | 6.137 6.137 (------) | math-partial-s | 30.573 30.599 (0.999x) | 12.941 12.973 (0.998x) | math-spectral- | 58.519 58.517 (------) | 34.592 34.592 (------) | regexp-dna | 39.539 39.538 (------) | 9.220 9.220 (------) | string-base64 | 116.996 115.498 (1.013x) | 24.787 25.125 (0.987x) | string-fasta | 138.698 138.697 (------) | 17.588 17.588 (------) | string-tagclou | 180.813 181.014 (0.999x) | 21.896 21.896 (------) | string-unpack- | 51.365 51.363 (------) | 8.346 8.346 (------) | string-validat ------- | 1732.702 1685.473 (1.028x) | 560.038 580.064 (0.965x) | all For the ones like 3d-cube where we do worse it's due to extra time in Nanojit compiling the additional code.
Assignee | ||
Comment 21•14 years ago
|
||
Assignee: gal → nnethercote
Attachment #460984 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #471425 -
Flags: feedback?(gal)
Assignee | ||
Comment 22•14 years ago
|
||
Attachment #471426 -
Flags: feedback?(gal)
Assignee | ||
Comment 23•14 years ago
|
||
Something else worth mentioning: this patch might make the patch in bug 542905 worthwhile, as it exposes more internal array access operations to CSE, but there are lots of branches involved.
Assignee | ||
Comment 24•14 years ago
|
||
I'm nominating this for blocking2.0 because a 4% speedup on SunSpider for a single patch is pretty impressive.
blocking2.0: --- → ?
Reporter | ||
Comment 25•14 years ago
|
||
Nick thanks so much for stealing this from me. You rock.
Assignee | ||
Updated•14 years ago
|
Attachment #471426 -
Flags: feedback?(gal) → review?(gal)
Assignee | ||
Comment 26•14 years ago
|
||
(In reply to comment #25) > Nick thanks so much for stealing this from me. You rock. Thanks! Do you want to review? It's mostly your code so it shouldn't take long :)
Comment 27•14 years ago
|
||
Andreas is on varcaton or something like that (I didn't understand what he was talking about), so someone else might want to steal the review.
Reporter | ||
Comment 28•14 years ago
|
||
Comment on attachment 471426 [details] [diff] [review] njn's 2nd patch (against TM 52722:b360dd204dc6) >-static JS_ALWAYS_INLINE JSBool FASTCALL >-dense_grow(JSContext* cx, JSObject* obj, jsint i, const Value &v) >+JSBool FASTCALL >+js_Array_dense_setelem_hole(JSContext* cx, JSObject* obj, jsint i) > { >- JS_ASSERT(obj->isDenseArray()); >+ if (js_PrototypeHasIndexedProperties(cx, obj)) >+ return false; > >- /* >- * Let the interpreter worry about negative array indexes. >- */ >- JS_ASSERT((MAX_DSLOTS_LENGTH > MAX_DSLOTS_LENGTH32) == (sizeof(intptr_t) != sizeof(uint32))); >- if (MAX_DSLOTS_LENGTH > MAX_DSLOTS_LENGTH32) { >- /* >- * Have to check for negative values bleeding through on 64-bit machines only, >- * since we can't allocate large enough arrays for this on 32-bit machines. >- */ >- if (i < 0) >- return JS_FALSE; >- } >- >- /* >- * If needed, grow the array as long it remains dense, otherwise fall off trace. >- */ > jsuint u = jsuint(i); >- jsuint capacity = obj->getDenseArrayCapacity(); >- if ((u >= capacity) && (INDEX_TOO_SPARSE(obj, u) || !obj->ensureDenseArrayElements(cx, u + 1))) >- return JS_FALSE; >- >- if (obj->getDenseArrayElement(u).isMagic()) { >- if (js_PrototypeHasIndexedProperties(cx, obj)) >- return JS_FALSE; >- >- if (u >= obj->getArrayLength()) >- obj->setArrayLength(u + 1); >- } >- >- obj->setDenseArrayElement(u, v); >- return JS_TRUE; >+ if (u >= obj->getArrayLength()) >+ obj->setArrayLength(u + 1); >+ return true; We could inline this and then only call the above check. Separate bug if you prefer or no bug if you think its stupid.
Attachment #471426 -
Flags: review?(gal) → review+
Assignee | ||
Comment 29•14 years ago
|
||
I figure if we have to call a C function, there's no point doing part of the work in LIR, that's just more LIR instructions to process.
Reporter | ||
Comment 30•14 years ago
|
||
We could easily inline that though. For arrays its always exactly 2 loops. There is a loop for this in the array read path.
Assignee | ||
Updated•14 years ago
|
Attachment #471425 -
Flags: feedback?(gal)
Assignee | ||
Comment 31•14 years ago
|
||
(In reply to comment #30) > We could easily inline that though. For arrays its always exactly 2 loops. > There is a loop for this in the array read path. See comment 20 -- I liked the C call better. It's slower when we hit a hole, but we end up with a lot less LIR to compile. For SunSpider inlining the code makes some tests slightly faster and some slightly slower, and it ends up a wash. http://hg.mozilla.org/tracemonkey/rev/03fbde5c25b6
Whiteboard: fixed-in-tracemonkey
Updated•14 years ago
|
blocking2.0: ? → ---
Assignee | ||
Comment 32•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/03fbde5c25b6
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•