Closed Bug 596026 Opened 14 years ago Closed 14 years ago

js_GetPropertyHelper() called too many times for string-fasta.js

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 2 obsolete files)

Under JM+TM, js_GetPropertyHelper() is called 175,155 times.  cdleary says that there's a PIC that should be catching these but it seems it is not.  The properties being accessed are just normal ones:  IUB.a, HomoSap.t, etc.
Okay, so without the getelem PIC, we hit the stub like so:

getelem stub (in cycles):
       Mean:              1109.57
        Min:                  811
        Max:               222309
     StdDev:              2342.99
      Count:               175152
      Total:          1.94343e+08
  ms @ 3GHz:              64.7811

With the PIC enabled, we hit it thusly:

getelem stub (in cycles):
       Mean:              1047.41
        Min:                  849
        Max:               214416
     StdDev:              1808.62
      Count:                27769
      Total:          2.90856e+07
  ms @ 3GHz:               9.6952

So we're getting 84% of all of them, but that's another 16% (25k misses) we might be able to pick up for 10ms! Looking into why we're hitting the stub now.
Ugh, nevermind those numbers, accidentally was measuring a debug build.
I've measured this before in bug 578761 comment 48 -- either something changed 

getelem stub (in cycles):
       Mean:              175.835
        Min:                  130
        Max:               223732
     StdDev:              2129.62
      Count:                27769
      Total:          4.88275e+06
  ms @ 3GHz:              1.62758

Increasing PIC entries to 32:

getelem stub (in cycles):
       Mean:               2638.5
        Min:                 1323
        Max:                 3954
     StdDev:               1860.4
      Count:                    2
      Total:                 5277
  ms @ 3GHz:             0.001759

We can do this for a 1-3ms win if we think it's worth it -- if so, I'll run the rest of the suite to make sure it's not negatively affected.

Still, this only accounts for 30k of the 175k that Nick is seeing -- will set some breakpoints.
(In reply to comment #3)
> I've measured this before in bug 578761 comment 48 -- either something changed 

... or I was somehow using the slow call count profiler improperly.
A PIC is never big enough for the biggest object you might for-in over. We can tune for SS but really, PIC is not exactly the right tool here. We want something based on the enumeration protocol and a bit of compile-time analysis.

What about bug 580138 (someone should dup bug 560989)?

See also bug 593931 (which is not really about fasta if we do the right kind of prediction).

/be
The issue with TM+JM seems to be that TM is taking over and we never get to really exercise the JM GETELEM PIC. Nick and I are working on adding a PIC-like table-lookup mechanism to jstracer.cpp:GetPropertyByName to avoid this overhead and maybe make things a bit easier on trace learning.
Attached patch Tracer PIC for GetPropByName. (obsolete) — Splinter Review
Nick and I whipped this up. SunSpider comparison said ~17% speedup on fasta with 32 entries, ~10% on fasta with 16 entries.
Attachment #475231 - Flags: review?(jorendorff)
Attached patch Tracer PIC for GetPropByName. (obsolete) — Splinter Review
Remove debugging crud.
Attachment #475231 - Attachment is obsolete: true
Attachment #475231 - Flags: review?(jorendorff)
Last quick fixup before review, I promise. :-) Got rid of _JS_CTYPE_VOID_STAR in favor of _JS_CTYPE_PICTABLE.
Attachment #475235 - Attachment is obsolete: true
Attachment #475243 - Flags: review?(jorendorff)
Cachegrind results on my Mac, which has a lot of extra system overhead than my Linux box -- if I ran these on my linux box the improvements/degradations would be greater.

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | on-trace (may overestimate)  |
---------------------------------------------------------------
|   188.022   188.060 (------) |    48.598    48.598 (------) | 3d-cube
|   136.549   136.548 (------) |    25.517    25.517 (------) | 3d-morph
|   196.263   196.280 (------) |    44.925    44.924 (------) | 3d-raytrace
|   122.554   122.979 (0.997x) |    14.796    14.796 (------) | access-binary-
|   190.659   190.633 (------) |    93.423    93.423 (------) | access-fannkuc
|   116.424   116.413 (------) |    17.692    17.692 (------) | access-nbody
|   126.109   126.109 (------) |    28.379    28.379 (------) | access-nsieve
|    95.092    95.096 (------) |     3.618     3.618 (------) | bitops-3bit-bi
|   124.478   124.479 (------) |    32.909    32.909 (------) | bitops-bits-in
|   103.514   103.519 (------) |    12.380    12.380 (------) | bitops-bitwise
|   130.887   130.887 (------) |    38.377    38.377 (------) | bitops-nsieve-
|   109.375   109.381 (------) |    17.894    17.894 (------) | controlflow-re
|   125.267   125.278 (------) |     6.666     6.666 (------) | crypto-md5
|   108.621   108.618 (------) |     7.532     7.532 (------) | crypto-sha1
|   182.161   182.160 (------) |    22.689    22.689 (------) | date-format-to
|   165.617   162.990 (1.016x) |     9.728     9.753 (0.998x) | date-format-xp
|   130.774   130.775 (------) |    31.463    31.463 (------) | math-cordic
|   110.582   110.588 (------) |     6.460     6.460 (------) | math-partial-s
|   109.770   109.753 (------) |    14.025    14.025 (------) | math-spectral-
|   138.422   138.433 (------) |    36.117    36.117 (------) | regexp-dna
|   118.406   118.412 (------) |    10.927    10.927 (------) | string-base64
|   197.259   182.823 (1.079x) |    23.935    24.111 (0.993x) | string-fasta
|   227.677   227.660 (------) |    19.436    19.436 (------) | string-tagclou
|   264.846   265.821 (0.996x) |    29.674    29.674 (------) | string-unpack-
|   133.538   133.537 (------) |     8.839     8.839 (------) | string-validat
-------
|  3652.880  3637.243 (1.004x) |   606.012   606.209 (------) | all
>+    *shapeOut = NULL;
>     /* Convert string indices to integers if appropriate. */

Style nit: Blank line before a comment.

>     protoIndex = js_LookupPropertyWithFlags(cx, aobj, id, cx->resolveFlags,
>                                             &obj2, &prop);
>+    *holderOut = obj2;
>     if (protoIndex < 0)
>         return JS_FALSE;

Nit: Move the olderOut line after the error check, please.

>     shape = (Shape *) prop;
>+    *shapeOut = shape;
> 
>     if (getHow & JSGET_CACHE_RESULT) {
>         JS_ASSERT_NOT_ON_TRACE(cx);
>         JS_PROPERTY_CACHE(cx).fill(cx, aobj, 0, protoIndex, obj2, shape);
>     }

Hmm. It'd be nice, since we've got out params now, to move that property
cache fill to the caller and get rid of JSGET_CACHE_RESULT. ...but maybe
that would be bad.

>+}
>+
>+
>+JSBool

Style nit: delete the extra blank line.

>+struct PICTableEntry
>+{
>+    uint32  shape;
>+    jsid    id;
>+    uint32  slot;
>+};

Does this waste a few bytes on 64-bit platforms?

>-    if (!RootedStringToId(cx, namep, &id) || !obj->getProperty(cx, id, vp)) {
>+    if (!RootedStringToId(cx, namep, &id))
>+        return false;
>+    
>+    /* Duplicated from JSObject::getProperty. */
>+    PropertyIdOp op = obj->getOps()->getProperty;
>+    if (op)
>+        return op(cx, obj, id, vp);

For both these ifs, you need to SetBuiltinError iff an error occurred.

Add a test for that using a proxy or a getter to make the op call fail.

>+    /* Try to hit in the cache. */
>+    {
>+        uint32 slot;
>+        if (picTable->scan(obj->shape(), id, &slot)) {
>+            *vp = obj->getSlot(slot);
>+            return cx->tracerState->builtinStatus == 0;
>+        }
>+    }

Style nit: Lose the extra braces?

>+    /* Only update the table when the object is the holder of the property. */
>+    if (obj == holder) {

Also avoid updating the PIC when shape->slot == SHAPE_INVALID_SLOT, and
add a test for that case (if there isn't one already).

r=me with these changes. This is a very cute patch.
Attachment #475243 - Flags: review?(jorendorff) → review+
The slight degradations in instruction counts are because js_GetPropertyHelperWithShape() is returning (by reference) additional values.
You could avoid this (see the JM PICs) by using FindProperty to fill.
(In reply to comment #13)
> You could avoid this (see the JM PICs) by using FindProperty to fill.

That's a good idea, but it won't help with the degradations -- those ones don't hit GetPropertyByName, it's just that js_GetPropertyName() has become slightly more expensive.
Bumping the PIC size to 32 changes the string-fasta and total results as follows:

 ...
|   197.259   173.529 (1.137x) |    23.935    24.110 (0.993x) | string-fasta
 ...
|  3652.863  3627.924 (1.007x) |   606.012   606.209 (------) | all


All the others were unchanged.
Comment on attachment 475243 [details] [diff] [review]
Tracer PIC for GetPropByName.

>+bool
>+js_GetPropertyHelperWithShape(JSContext *cx, JSObject *obj, jsid id,
>+                              uintN getHow, Value *vp,
>+                              const Shape **shapeOut, JSObject **holderOut)


Too many args and a mostly (all but one bit ;-) wasted return type -- suggest returning Shape * directly for some of those instruction count losses one back.

>+    /* Only update the table when the object is the holder of the property. */
>+    if (obj == holder) {
>+        /*
>+         * Note: we insert the non-normalized id into the table so you don't need to
>+         * normalize it before hitting in the table (faster lookup).
>+         */
>+        picTable->update(obj->shape(), id, shape->slot);

But update just fills the PIC, there's no self-organization on a scan hit, and once full you're stuck with possibly irrelevant cache "lines".

Better to wrap around and use MRU. You'd fill at the cursor and start scanning just after it, wrapping around and skipping (initially) free entries.

Don't tune for SS too much and declare victory. We have a chance with this kind of approach to do better than non-self-organizing branch exits (trace trees), we should exploit it.

/be
Argh, s/one back/won back/ in last comment!

(Are textareas updating badly when you type? Is this bug 130078 fallout? Argh!)

/be
(In reply to comment #16)
> Too many args and a mostly (all but one bit ;-) wasted return type -- suggest
> returning Shape * directly for some of those instruction count losses one back.

We should be able to force inline that function to avoid the overhead (DCE the unused arg stores).

> But update just fills the PIC, there's no self-organization on a scan hit, and
> once full you're stuck with possibly irrelevant cache "lines".

This is the way JM PICs work. :-) In the JM PICs dmandelin measured that balancing didn't yield a speedup, but it may be different for our tables, since we don't invalidate the I$ as a result of balancing.

> Don't tune for SS too much and declare victory.

Will file a follow up bug to try balancing (so long as this patch doesn't regress V8) in an attempt to iterate early and often.
(In reply to comment #18)
> (In reply to comment #16)
> > Too many args and a mostly (all but one bit ;-) wasted return type -- suggest
> > returning Shape * directly for some of those instruction count losses one back.
> 
> We should be able to force inline that function to avoid the overhead (DCE the
> unused arg stores).

I'd rather a simpler signature with one fewer out params (Out params :-/) to fiddle with.

> > But update just fills the PIC, there's no self-organization on a scan hit, and
> > once full you're stuck with possibly irrelevant cache "lines".
> 
> This is the way JM PICs work. :-)

I know. This is a different animal, not a PIC based on literal property name (Self/Smalltalk message selector), or a likely repeated array index? See first para in comment 5.

> Will file a follow up bug to try balancing (so long as this patch doesn't
> regress V8) in an attempt to iterate early and often.

Thanks,

/be
(Curse this bug 130078 fallout, my typing results in even more typos than usual!)
http://hg.mozilla.org/mozilla-central/rev/62545b0da5bc
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-tracemonkey
Fileup fodder filed as bug 597099.
Status: RESOLVED → UNCONFIRMED
Ever confirmed: false
Resolution: FIXED → ---
Not sure why the status changed, but changing back to resolved/fixed.
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Depends on: 601434
Depends on: 643444
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: