580752 - TM: optimize setelem

Reporter

Description

•

14 years ago

We currently do 2809193 array stores in SS. Each of these is a builtin call. 2493752 of these (88%) overwrite an existing value and don't require a proto walk to deal with holes. Only 29876 of these calls grows the array (1%). We should be able to emit a lot more specialized code here.

Boris Zbarsky [:bzbarsky]

Comment 1

•

14 years ago

Does JM need a version of this bug?

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 2

•

14 years ago

Not yet, I think - we have one on a specific array-related slowdown though, bug 580355

Nicholas Nethercote [inactive]

Assignee

Comment 3

•

14 years ago

(In reply to comment #0)
> 2493752 of these (88%) overwrite an existing value and don't require a proto
> walk to deal with holes. Only 29876 of these calls grows the array (1%). We
> should be able to emit a lot more specialized code here.

That would be nice but do you have any suggestions how?

Andreas Gal :gal

Reporter

Comment 4

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

4.5% speedup on SS

Andreas Gal :gal

Reporter

Updated

•

14 years ago

Depends on: 580877

Brendan Eich [:brendan]

Comment 5

•

14 years ago

Comment on attachment 459305 [details] [diff] [review]
patch

>diff --git a/js/src/jsarray.cpp b/js/src/jsarray.cpp
>--- a/js/src/jsarray.cpp
>+++ b/js/src/jsarray.cpp
>@@ -482,16 +482,30 @@ SetArrayElement(JSContext *cx, JSObject 
>     if (!IndexToId(cx, obj, index, NULL, idr.addr(), JS_TRUE))
>         return JS_FALSE;
>     JS_ASSERT(!JSID_IS_VOID(idr.id()));
> 
>     Value tmp = v;
>     return obj->setProperty(cx, idr.id(), &tmp);
> }
> 
>+#ifdef JS_TRACER
>+JSBool JS_FASTCALL
>+js_EnsureDenseArrayCapacity(JSContext *cx, JSObject *obj, jsint i)
>+{
>+    jsuint u = jsuint(i);
>+    jsuint capacity = obj->getDenseArrayCapacity();
>+    if ((u >= capacity) && (INDEX_TOO_SPARSE(obj, u) || !obj->ensureDenseArrayElements(cx, u + 1)))
>+        return false;
>+    return true;

Those gratuitous parens won't save you :-P, and they just help hide the fact that you have a silent failure (false return with no error reported or exception thrown).

It would be much more clear that the false return means an error if you invert and put predicted/prefetched best foot forward:

    if (u < capacity)
        return true;
    if (INDEX_TOO_SPARSE(obj, u))
        return true;
    return obj->ensureDenseArrayElements(cx, u + 1);    

Righteous speedup!

/be

Nicholas Nethercote [inactive]

Assignee

Comment 6

•

14 years ago

I'll look at this in the morning.  In the meantime, a short description of the patch would be nice!

Andreas Gal :gal

Reporter

Comment 7

•

14 years ago

The patch inlines the array set code. I branch around a call to the grow call, which is only needed in 1% of the invocations. This is about 5% slower than guarding but manages with a single trace (guarding would need a 2nd trace). This patch depends on the removal of MinLenCap and Count.

Nicholas Nethercote [inactive]

Assignee

Comment 8

•

14 years ago

I wanted to try this patch.  I applied it on top of the patches from bug 580846 and bug 580877 but it didn't apply cleanly.  I think you have a patch removing js_UnboxDouble in your patch stack below this one?

If you could update this patch to account for changes to the two preceding bugs that'd be great!  I really want to see what the LIR looks like.

Andreas Gal :gal

Reporter

Comment 9

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Attachment #459305 - Attachment is obsolete: true

Andreas Gal :gal

Reporter

Updated

•

14 years ago

Depends on: 582081

Andreas Gal :gal

Reporter

Updated

•

14 years ago

Attachment #460393 - Flags: review?(nnethercote)

Andreas Gal :gal

Reporter

Comment 10

•

14 years ago

With this patch I am seeing sub 400ms times on my laptop now. Exciting.

============================================
RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                  398.8ms +/- 0.4%
--------------------------------------------

Nicholas Nethercote [inactive]

Assignee

Comment 11

•

14 years ago

Patch still doesn't apply cleanly to TM tip.  Do you still have UnboxDouble removal code in there?

Andreas Gal :gal

Reporter

Comment 12

•

14 years ago

Yes, dependent bug is linked.

Nicholas Nethercote [inactive]

Assignee

Comment 13

•

14 years ago

The patch for bug 580877 has landed on TM.  The patch for bug 582081 has not, but AFAICT isn't what's causing this patch to fail.  I suspect the patch for bug 580534 is the problem, ie. that you have it in your workspace, but you haven't marked that as blocking this bug.

In general, if I write a patch that applies on top of a patch in another bug, I'll mention that and explain the sequencing.  I won't expect a reviewer to work out which of my current in-flight patches need to be applied first, and I don't think others should have to work that out.  Am I being unreasonable?

Andreas Gal :gal

Reporter

Comment 14

•

14 years ago

Sorry. Other patch should be in your queue too. I meant to mark it as dependent. Not at my desktop right now.

Andreas Gal :gal

Reporter

Updated

•

14 years ago

Summary: TM: optimize array stores → TM: optimize setelem

Andreas Gal :gal

Reporter

Updated

•

14 years ago

Depends on: 580534

Andreas Gal :gal

Reporter

Comment 15

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Attachment #460393 - Attachment is obsolete: true

Attachment #460393 - Flags: review?(nnethercote)

Andreas Gal :gal

Reporter

Updated

•

14 years ago

Attachment #460984 - Flags: review?(nnethercote)

Andreas Gal :gal

Reporter

Comment 16

•

14 years ago

580534 must be applied first

Nicholas Nethercote [inactive]

Assignee

Comment 17

•

14 years ago

Comment on attachment 460984 [details] [diff] [review]
patch

I get 20 trace-tests failures on x86 and 28 on x86-64.  I applied the patch on top of the patch from 580534.  I'll review again when these failures are fixed.

Attachment #460984 - Flags: review?(nnethercote) → review-

Nicholas Nethercote [inactive]

Assignee

Comment 18

•

14 years ago

This bug now depends on bug 585525.

Nicholas Nethercote [inactive]

Assignee

Comment 19

•

14 years ago

There's a bug in the patch -- Array.length isn't updated when required.  I'm working on fixing it now.

Nicholas Nethercote [inactive]

Assignee

Comment 20

•

14 years ago

I'm attaching two patches.  Both pass trace-tests.  They only differ in how
they handle setting a hole element -- the first one deals with it entirely
on-trace, the second one falls back to a helper function.  The performance
of the two is very similar, about a 1.045x speedup on 32-bit (with TM+JM),
and less on 64-bit.

I'm leaning towards the second one because I find the LIR generated by the
first one long-winded and hard to read, and also because there's potential
to improve the code generated when we have a C call on a slow path.


Here's example code before either patch:

      # isDenseArray?
      ldi2 = ldi.o $global0[4]
      clasp = immi 0x8338a00
      guard(class is Array) = eqi ldi2, clasp/*0x8338a00*/
      xf2: xf guard(class is Array) -> pc=0x94e1c82 imacpc=(nil) sp+24 rp+0 (GuardID=002)

      # set elem, via a C call
      allocp1 = allocp 8
      tag = immi 0xffff0005
      sti.o allocp1[4] = tag/*0xffff0005*/
      sti.o allocp1[0] = ATOM_TO_STRING(atom)/*0x833cee8*/
      js_Array_dense_setelem1 = calli.all #js_Array_dense_setelem ( cx $global0 ldi1 allocp1 )
      eqi1 = eqi js_Array_dense_setelem1, immi3/*0*/
      xt1: xt eqi1 -> pc=0x94e1c82 imacpc=(nil) sp+24 rp+0 (GuardID=003)


Here's an example with the first patch:

      # isDenseArray?
      ldi6 = ldi.o $global0[4]
      clasp = immi 0x83389e0
      guard(class is Array) = eqi ldi6, clasp/*0x83389e0*/
      xf5: xf guard(class is Array) -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=002)

      # check capacity, grow if necessary
      capacity = ldi.o $global0[40]
      ltui2 = ltui ldi1, capacity
      jt ltui2 -> label3
      js_EnsureDenseArrayCapacity1 = calli.all #js_EnsureDenseArrayCapacity ( cx $global0 ldi1 )
      eqi2 = eqi js_EnsureDenseArrayCapacity1, immi8/*0*/
      xt3: xt eqi2 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=003)

      # get address of the element
      label3:
      dslots = ldi.o $global0[24]
      immi7 = immi 3
      lshi1 = lshi ldi1, immi7/*3*/
      addi1 = addi dslots, lshi1

      # if element is a hole...
      JSVAL_TAG_MAGIC = immi 0xffff0004
      ldi5 = ldi.o addi1[4]
      eqi1 = eqi ldi5, JSVAL_TAG_MAGIC/*0xffff0004*/
      jf eqi1 -> label2

      # ...check the proto chain has no indexed properties...
      ldi4 = ldi.o $global0[16]
      immi4 = immi 0
      guard(proto-not-null) = eqi ldi4, immi4/*0*/
      xt2: xt guard(proto-not-null) -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=004)
      objShape = ldi.o ldi4[12]
      immi6 = immi 212
      guard(shape) = eqi objShape, immi6/*212*/
      xf4: xf guard(shape) -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=005)
      ldi3 = ldi.o ldi4[16]
      guard(proto-not-null)2 = eqi ldi3, immi4/*0*/
      xt1: xt guard(proto-not-null)2 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=006)
      objShape2 = ldi.o ldi3[12]
      immi5 = immi 173
      guard(shape)2 = eqi objShape2, immi5/*173*/
      xf3: xf guard(shape)2 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=007)
      ldi2 = ldi.o ldi3[16]
      guard(proto-not-null)3 = eqi ldi2, immi4/*0*/
      xf2: xf guard(proto-not-null)3 -> pc=0x9858c82 imacpc=(nil) sp+24 rp+0 (GuardID=008)

      # ...and update .length if necessary
      length = ldi.o $global0[32]
      ltui1 = ltui ldi1, length
      jt ltui1 -> label2
      immi3 = immi 1
      addi2 = addi ldi1, immi3/*1*/
      sti.o $global0[32] = addi2

      # actually do the setelem
      label2:
      tag = immi 0xffff0005
      sti.o addi1[4] = tag/*0xffff0005*/
      sti.o addi1[0] = ATOM_TO_STRING(atom)/*0x833cec8*/

It would be nice if that proto-chain walk could be optimized somehow.


Here's an example of the hole-related part with the second patch:

      # if element is a hole...
      # ...check the proto chain has no indexed properties...
      # ...and update .length if necessary
      JSVAL_TAG_MAGIC = immi 0xffff0004
      ldi2 = ldi.o addi1[4]
      eqi2 = eqi ldi2, JSVAL_TAG_MAGIC/*0xffff0004*/
      jf eqi2 -> label2
      hasNoIndexedProperties = calli.all #js_Array_dense_setelem_hole ( cx $global0 mulxovi1 )
      immi3 = immi 0
      eqi1 = eqi hasNoIndexedProperties, immi3/*0*/
      xt1: xt eqi1 -> pc=0x9369c89 imacpc=(nil) sp+24 rp+0 (GuardID=006)


This patch also removes the dependency on bug 580534, which wasn't
fundamental but just a reflection of Gal's queue at a particular point in
time.

Here are Cachegrind results.  The timings reflected these fairly closely, but
with a bit more noise.  I don't include crypto-aes in my Cachegrind results
because its non-deterministic but it gets about a 1.04x speedup on timings.

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | on-trace (may overestimate)  |
---------------------------------------------------------------
|   106.795   109.805 (0.973x) |    44.765    45.176 (0.991x) | 3d-cube
|    56.590    46.433 (1.219x) |    22.315    24.778 (0.901x) | 3d-morph
|   116.183   116.985 (0.993x) |    43.885    43.909 (0.999x) | 3d-raytrace
|    73.028    73.026 (------) |    15.220    15.220 (------) | access-binary-
|   130.540   108.369 (1.205x) |    79.768    89.534 (0.891x) | access-fannkuc
|    37.455    37.454 (------) |    16.430    16.430 (------) | access-nbody
|    59.788    49.928 (1.197x) |    27.009    29.741 (0.908x) | access-nsieve
|    15.956    15.954 (------) |     2.863     2.863 (------) | bitops-3bit-bi
|    43.480    43.479 (------) |    30.281    30.281 (------) | bitops-bits-in
|    22.978    22.977 (------) |    10.219    10.219 (------) | bitops-bitwise
|    61.859    54.633 (1.132x) |    36.551    40.406 (0.905x) | bitops-nsieve-
|    49.009    49.008 (------) |    19.438    19.438 (------) | controlflow-re
|    47.551    47.398 (1.003x) |     5.916     6.110 (0.968x) | crypto-md5
|    30.243    30.060 (1.006x) |     6.741     6.952 (0.970x) | crypto-sha1
|    98.330    98.329 (------) |    23.053    23.053 (------) | date-format-to
|    83.673    83.671 (------) |     9.747     9.747 (------) | date-format-xp
|    53.289    53.287 (------) |    30.317    30.317 (------) | math-cordic
|    29.439    29.438 (------) |     6.137     6.137 (------) | math-partial-s
|    30.573    30.599 (0.999x) |    12.941    12.973 (0.998x) | math-spectral-
|    58.519    58.517 (------) |    34.592    34.592 (------) | regexp-dna
|    39.539    39.538 (------) |     9.220     9.220 (------) | string-base64
|   116.996   115.498 (1.013x) |    24.787    25.125 (0.987x) | string-fasta
|   138.698   138.697 (------) |    17.588    17.588 (------) | string-tagclou
|   180.813   181.014 (0.999x) |    21.896    21.896 (------) | string-unpack-
|    51.365    51.363 (------) |     8.346     8.346 (------) | string-validat
-------
|  1732.702  1685.473 (1.028x) |   560.038   580.064 (0.965x) | all


For the ones like 3d-cube where we do worse it's due to extra time in
Nanojit compiling the additional code.

Nicholas Nethercote [inactive]

Assignee

Comment 21

•

14 years ago

Attached patch njn's 1st patch — Details — Splinter Review

Assignee: gal → nnethercote

Attachment #460984 - Attachment is obsolete: true

Status: NEW → ASSIGNED

Attachment #471425 - Flags: feedback?(gal)

Nicholas Nethercote [inactive]

Assignee

Comment 22

•

14 years ago

Attached patch njn's 2nd patch (against TM 52722:b360dd204dc6) — Details — Splinter Review

Attachment #471426 - Flags: feedback?(gal)

Nicholas Nethercote [inactive]

Assignee

Comment 23

•

14 years ago

Something else worth mentioning:  this patch might make the patch in bug 542905 worthwhile, as it exposes more internal array access operations to CSE, but there are lots of branches involved.

Nicholas Nethercote [inactive]

Assignee

Comment 24

•

14 years ago

I'm nominating this for blocking2.0 because a 4% speedup on SunSpider for a single patch is pretty impressive.

blocking2.0: --- → ?

Andreas Gal :gal

Reporter

Comment 25

•

14 years ago

Nick thanks so much for stealing this from me. You rock.

Nicholas Nethercote [inactive]

Assignee

Updated

•

14 years ago

Attachment #471426 - Flags: feedback?(gal) → review?(gal)

Nicholas Nethercote [inactive]

Assignee

Comment 26

•

14 years ago

(In reply to comment #25)
> Nick thanks so much for stealing this from me. You rock.

Thanks!  Do you want to review?  It's mostly your code so it shouldn't take long :)

Mike Shaver (:shaver emeritus)

Comment 27

•

14 years ago

Andreas is on varcaton or something like that (I didn't understand what he was talking about), so someone else might want to steal the review.

Andreas Gal :gal

Reporter

Comment 28

•

14 years ago

Comment on attachment 471426 [details] [diff] [review]
njn's 2nd patch (against TM 52722:b360dd204dc6)

>-static JS_ALWAYS_INLINE JSBool FASTCALL
>-dense_grow(JSContext* cx, JSObject* obj, jsint i, const Value &v)
>+JSBool FASTCALL
>+js_Array_dense_setelem_hole(JSContext* cx, JSObject* obj, jsint i)
> {
>-    JS_ASSERT(obj->isDenseArray());
>+    if (js_PrototypeHasIndexedProperties(cx, obj))
>+        return false;
> 
>-    /*
>-     * Let the interpreter worry about negative array indexes.
>-     */
>-    JS_ASSERT((MAX_DSLOTS_LENGTH > MAX_DSLOTS_LENGTH32) == (sizeof(intptr_t) != sizeof(uint32)));
>-    if (MAX_DSLOTS_LENGTH > MAX_DSLOTS_LENGTH32) {
>-        /*
>-         * Have to check for negative values bleeding through on 64-bit machines only,
>-         * since we can't allocate large enough arrays for this on 32-bit machines.
>-         */
>-        if (i < 0)
>-            return JS_FALSE;
>-    }
>-
>-    /*
>-     * If needed, grow the array as long it remains dense, otherwise fall off trace.
>-     */
>     jsuint u = jsuint(i);
>-    jsuint capacity = obj->getDenseArrayCapacity();
>-    if ((u >= capacity) && (INDEX_TOO_SPARSE(obj, u) || !obj->ensureDenseArrayElements(cx, u + 1)))
>-        return JS_FALSE;
>-
>-    if (obj->getDenseArrayElement(u).isMagic()) {
>-        if (js_PrototypeHasIndexedProperties(cx, obj))
>-            return JS_FALSE;
>-
>-        if (u >= obj->getArrayLength())
>-            obj->setArrayLength(u + 1);
>-    }
>-
>-    obj->setDenseArrayElement(u, v);
>-    return JS_TRUE;
>+    if (u >= obj->getArrayLength())
>+        obj->setArrayLength(u + 1);
>+    return true;

We could inline this and then only call the above check. Separate bug if you prefer or no bug if you think its stupid.

Attachment #471426 - Flags: review?(gal) → review+

Nicholas Nethercote [inactive]

Assignee

Comment 29

•

14 years ago

I figure if we have to call a C function, there's no point doing part of the work in LIR, that's just more LIR instructions to process.

Andreas Gal :gal

Reporter

Comment 30

•

14 years ago

We could easily inline that though. For arrays its always exactly 2 loops. There is a loop for this in the array read path.

Nicholas Nethercote [inactive]

Assignee

Updated

•

14 years ago

Attachment #471425 - Flags: feedback?(gal)

Nicholas Nethercote [inactive]

Assignee

Comment 31

•

14 years ago

(In reply to comment #30)
> We could easily inline that though. For arrays its always exactly 2 loops.
> There is a loop for this in the array read path.

See comment 20 -- I liked the C call better.  It's slower when we hit a hole, but we end up with a lot less LIR to compile.  For SunSpider inlining the code makes some tests slightly faster and some slightly slower, and it ends up a wash.

http://hg.mozilla.org/tracemonkey/rev/03fbde5c25b6

Whiteboard: fixed-in-tracemonkey

Robert Sayre

Updated

•

14 years ago

blocking2.0: ? → ---

Nicholas Nethercote [inactive]

Assignee

Comment 32

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/03fbde5c25b6

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

patch 14 years ago Andreas Gal :gal 14.92 KB, patch		Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal 14.73 KB, patch		Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal 16.33 KB, patch	n.nethercote : review-	Details \| Diff \| Splinter Review
njn's 1st patch 14 years ago Nicholas Nethercote [inactive] 18.79 KB, patch		Details \| Diff \| Splinter Review
njn's 2nd patch (against TM 52722:b360dd204dc6) 14 years ago Nicholas Nethercote [inactive] 18.79 KB, patch	gal : review+	Details \| Diff \| Splinter Review