Last Comment Bug 707515 - (ObjShrink) sunspider-string-fasta regression
: (ObjShrink) sunspider-string-fasta regression
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal with 1 vote (vote)
: mozilla11
Assigned To: Brian Hackett (:bhackett)
:
Mentors:
Depends on:
Blocks: ObjectShrink
  Show dependency treegraph
 
Reported: 2011-12-04 09:15 PST by Brian Hackett (:bhackett)
Modified: 2012-02-01 13:58 PST (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (1.12 KB, patch)
2011-12-07 09:06 PST, Brian Hackett (:bhackett)
luke: review+
Details | Diff | Review

Description Brian Hackett (:bhackett) 2011-12-04 09:15:19 PST
It looks like a recent objshrink change made string-fasta really slow.  SS on a recent trunk build vs. JM tip:

Trunk:

============================================
RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                 248.7ms +/- 2.9%
--------------------------------------------

  3d:                   41.4ms +/- 1.5%
    cube:               16.0ms +/- 2.1%
    morph:              10.6ms +/- 3.5%
    raytrace:           14.8ms +/- 2.0%

  access:               23.2ms +/- 4.1%
    binary-trees:        3.7ms +/- 13.0%
    fannkuch:            8.4ms +/- 4.4%
    nbody:               5.4ms +/- 6.8%
    nsieve:              5.7ms +/- 8.5%

  bitops:               13.5ms +/- 5.1%
    3bit-bits-in-byte:   0.9ms +/- 25.1%
    bits-in-byte:        5.1ms +/- 4.4%
    bitwise-and:         3.7ms +/- 9.3%
    nsieve-bits:         3.8ms +/- 7.9%

  controlflow:           2.7ms +/- 12.8%
    recursive:           2.7ms +/- 12.8%

  crypto:               21.5ms +/- 3.2%
    aes:                11.4ms +/- 3.2%
    md5:                 6.2ms +/- 4.9%
    sha1:                3.9ms +/- 13.5%

  date:                 34.9ms +/- 3.9%
    format-tofte:       19.3ms +/- 4.3%
    format-xparb:       15.6ms +/- 4.9%

  math:                 20.3ms +/- 4.4%
    cordic:              3.4ms +/- 10.9%
    partial-sums:       13.3ms +/- 3.6%
    spectral-norm:       3.6ms +/- 10.3%

  regexp:               15.5ms +/- 4.5%
    dna:                15.5ms +/- 4.5%

  string:               75.7ms +/- 4.5%
    base64:              6.1ms +/- 3.7%
    fasta:               8.1ms +/- 7.7%
    tagcloud:           23.9ms +/- 3.0%
    unpack-code:        27.3ms +/- 7.7%
    validate-input:     10.3ms +/- 3.4%

JM:


============================================
RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                  304.3ms +/- 0.9%
--------------------------------------------

  3d:                    41.4ms +/- 1.2%
    cube:                16.2ms +/- 1.9%
    morph:               10.3ms +/- 3.4%
    raytrace:            14.9ms +/- 1.5%

  access:                21.0ms +/- 2.3%
    binary-trees:         3.1ms +/- 7.3%
    fannkuch:             8.0ms +/- 0.0%
    nbody:                5.2ms +/- 5.8%
    nsieve:               4.7ms +/- 7.3%

  bitops:                13.1ms +/- 6.0%
    3bit-bits-in-byte:    0.9ms +/- 25.1%
    bits-in-byte:         5.0ms +/- 0.0%
    bitwise-and:          3.5ms +/- 10.8%
    nsieve-bits:          3.7ms +/- 9.3%

  controlflow:            2.8ms +/- 10.8%
    recursive:            2.8ms +/- 10.8%

  crypto:                21.4ms +/- 2.8%
    aes:                 11.7ms +/- 3.0%
    md5:                  5.8ms +/- 5.2%
    sha1:                 3.9ms +/- 5.8%

  date:                  35.0ms +/- 1.9%
    format-tofte:        19.6ms +/- 1.9%
    format-xparb:        15.4ms +/- 2.4%

  math:                  19.8ms +/- 3.7%
    cordic:               3.6ms +/- 10.3%
    partial-sums:        12.8ms +/- 3.5%
    spectral-norm:        3.4ms +/- 10.9%

  regexp:                15.0ms +/- 3.9%
    dna:                 15.0ms +/- 3.9%

  string:               134.8ms +/- 1.1%
    base64:               6.0ms +/- 0.0%
    fasta:               66.2ms +/- 1.4%
    tagcloud:            25.9ms +/- 0.9%
    unpack-code:         26.4ms +/- 1.4%
    validate-input:      10.3ms +/- 3.4%

The 55.6ms total difference is all in the 58.1ms difference on string-fasta.  This is recent, I didn't see this when comparing SS scores between the branches a week or two ago.
Comment 1 blechri 2011-12-07 03:05:10 PST
I'm seeing the same regression happening between the 2011-12-01 and the 12-04 Nightly build.
Comment 2 Brian Hackett (:bhackett) 2011-12-07 09:06:23 PST
Created attachment 579716 [details] [diff] [review]
patch

Because of the teleporting optimization, the entire prototype chain was getting marked with an uncacheable proto, which will end up inhibiting pretty much all iterator caching.  Using generateOwnShape() all the time ends up creating too many dictionary objects, so some balance needs to be struck.  Using generateOwnShape on objects with singleton types takes care of the regression and seems to avoid a lot of dictionary creation during normal browsing, but this is still pretty unsatisfying and is really just a temporary fix until either bug 707717 gets fixed or the prototype is moved to BaseShape.
Comment 3 Luke Wagner [:luke] 2011-12-07 17:11:11 PST
How about just removing the teleporting optimization already?  This is like the 5th gross-but-it-solves-this-perf-regression hack.
Comment 4 Brian Hackett (:bhackett) 2011-12-07 18:46:06 PST
(In reply to Luke Wagner [:luke] from comment #3)
> How about just removing the teleporting optimization already?  This is like
> the 5th gross-but-it-solves-this-perf-regression hack.

That is definitely a third alternative to the two I listed above.  I really want to remove the teleporting optimization but need time to collect data and determine whether doing so is a good idea.  In the meantime, this is a fire which needs to be put out.
Comment 5 David Mandelin [:dmandelin] 2011-12-07 19:17:13 PST
(In reply to Brian Hackett (:bhackett) from comment #4)
> (In reply to Luke Wagner [:luke] from comment #3)
> > How about just removing the teleporting optimization already?  This is like
> > the 5th gross-but-it-solves-this-perf-regression hack.
> 
> That is definitely a third alternative to the two I listed above.  I really
> want to remove the teleporting optimization but need time to collect data
> and determine whether doing so is a good idea.  In the meantime, this is a
> fire which needs to be put out.

Yes, we need to be careful. Access time is constant with proto chain length in all browsers. So if that's true even without teleporting, then we of course could just remove it, but that's not likely.
Comment 6 Brendan Eich [:brendan] 2011-12-07 21:00:40 PST
(In reply to Luke Wagner [:luke] from comment #3)
> How about just removing the teleporting optimization already?  This is like
> the 5th gross-but-it-solves-this-perf-regression hack.

Maybe this is the last of five, and someone shoulda read jorendorff's doc [1] first :-P.

Really, if an optimization matters, then the code has to preserve its invariants. If that is hard, then welcome to JS! It's not enough to bemoan complexity if there's no simpler way to achieve the same performance. Is there?

For the long chains Vitek et al. [2] found by instrumenting JS execution of popular web apps, getting rid of teleportation could regress performance. Or perhaps the long chains are rare outliers, and the quartiles shown in Figure 6 are short enough that checking every shape is tolerable. Hard to know without measuring.

Generally, lookups dominate hard cases such as __proto__ assignment, and hard cases make bad law. So in principle, we should not sacrifice performance in the lookup path for simplicity in the abnormal cases. But it's hard to know what is "abnormal".

I believe Dart is touted because its Smalltalk-style classes are not subject to shadowing and foreshadowing that can invalidate PICs in the absence of teleportation or checking every shape along the chain. So I bet one can show JS performance in V8 suffering with long chains where Dart does not suffer. It would be interesting to find that microbenchmark and run it.

/be

[1] https://developer.mozilla.org/En/SpiderMonkey/Internals/Property_cache
[2] http://sss.cs.purdue.edu/projects/dynjs/pldi275-richards.pdf
Comment 7 Brian Hackett (:bhackett) 2011-12-08 06:30:42 PST
The hope is that TI has reduced the pressure on teleporting to handle accesses with long prototype chains.  With TI we can statically resolve such accesses to a single object or even a specific value, with no shape checks.  We just don't know how often this actually holds; across all getprop accesses (on a breadth of websites, measured a couple months back) we statically resolve about 40% of accesses, and in principle this should be higher for accesses going through a prototype.
Comment 8 Luke Wagner [:luke] 2011-12-08 11:45:48 PST
(In reply to Brendan Eich [:brendan] from comment #6)
> Maybe this is the last of five, and someone shoulda read jorendorff's doc
> [1] first :-P.

These haven't been correctness bugs,

> Generally, lookups dominate hard cases such as __proto__ assignment, and
> hard cases make bad law. So in principle, we should not sacrifice
> performance in the lookup path for simplicity in the abnormal cases.

they have been performance bugs because the hard cases aren't as rare.

> Really, if an optimization matters, then the code has to preserve its
> invariants.

Whether it matters is the issue under question.  I didn't say "after measuring whether it actually buys us anything" in my comment b/c Brian and I have already talked about this issue and I know that he plans to do such a measurement.  My comment was just "how about sooner rather than later" because there is a very real possibility that "later" may become "never" and we will just be left a sprinkling of magic code that only makes sense in the context of the browser+benchmark suite.

I'll r+ since in 12 days ObjShrink moves to aurora and it would be nice to use that 12 days for baking, not new-patch-writing.
Comment 9 Luke Wagner [:luke] 2011-12-08 11:47:03 PST
Comment on attachment 579716 [details] [diff] [review]
patch

Please provide a comment explaining the tradeoff being made and the browser use case.
Comment 10 Brian Hackett (:bhackett) 2011-12-08 19:24:13 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/592d5034c8a9
Comment 11 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-12-09 02:34:54 PST
There's a good chance that this improved Dromaeo (CSS) by 2-3%, based on tree-management mails.
Comment 12 Ed Morley [:emorley] 2011-12-10 20:46:11 PST
https://hg.mozilla.org/mozilla-central/rev/592d5034c8a9

Note You need to log in before you can comment on or make changes to this bug.