Closed Bug 707515 Opened 13 years ago Closed 13 years ago

(ObjShrink) sunspider-string-fasta regression

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla11

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Attachments

(1 file)

It looks like a recent objshrink change made string-fasta really slow. SS on a recent trunk build vs. JM tip: Trunk: ============================================ RESULTS (means and 95% confidence intervals) -------------------------------------------- Total: 248.7ms +/- 2.9% -------------------------------------------- 3d: 41.4ms +/- 1.5% cube: 16.0ms +/- 2.1% morph: 10.6ms +/- 3.5% raytrace: 14.8ms +/- 2.0% access: 23.2ms +/- 4.1% binary-trees: 3.7ms +/- 13.0% fannkuch: 8.4ms +/- 4.4% nbody: 5.4ms +/- 6.8% nsieve: 5.7ms +/- 8.5% bitops: 13.5ms +/- 5.1% 3bit-bits-in-byte: 0.9ms +/- 25.1% bits-in-byte: 5.1ms +/- 4.4% bitwise-and: 3.7ms +/- 9.3% nsieve-bits: 3.8ms +/- 7.9% controlflow: 2.7ms +/- 12.8% recursive: 2.7ms +/- 12.8% crypto: 21.5ms +/- 3.2% aes: 11.4ms +/- 3.2% md5: 6.2ms +/- 4.9% sha1: 3.9ms +/- 13.5% date: 34.9ms +/- 3.9% format-tofte: 19.3ms +/- 4.3% format-xparb: 15.6ms +/- 4.9% math: 20.3ms +/- 4.4% cordic: 3.4ms +/- 10.9% partial-sums: 13.3ms +/- 3.6% spectral-norm: 3.6ms +/- 10.3% regexp: 15.5ms +/- 4.5% dna: 15.5ms +/- 4.5% string: 75.7ms +/- 4.5% base64: 6.1ms +/- 3.7% fasta: 8.1ms +/- 7.7% tagcloud: 23.9ms +/- 3.0% unpack-code: 27.3ms +/- 7.7% validate-input: 10.3ms +/- 3.4% JM: ============================================ RESULTS (means and 95% confidence intervals) -------------------------------------------- Total: 304.3ms +/- 0.9% -------------------------------------------- 3d: 41.4ms +/- 1.2% cube: 16.2ms +/- 1.9% morph: 10.3ms +/- 3.4% raytrace: 14.9ms +/- 1.5% access: 21.0ms +/- 2.3% binary-trees: 3.1ms +/- 7.3% fannkuch: 8.0ms +/- 0.0% nbody: 5.2ms +/- 5.8% nsieve: 4.7ms +/- 7.3% bitops: 13.1ms +/- 6.0% 3bit-bits-in-byte: 0.9ms +/- 25.1% bits-in-byte: 5.0ms +/- 0.0% bitwise-and: 3.5ms +/- 10.8% nsieve-bits: 3.7ms +/- 9.3% controlflow: 2.8ms +/- 10.8% recursive: 2.8ms +/- 10.8% crypto: 21.4ms +/- 2.8% aes: 11.7ms +/- 3.0% md5: 5.8ms +/- 5.2% sha1: 3.9ms +/- 5.8% date: 35.0ms +/- 1.9% format-tofte: 19.6ms +/- 1.9% format-xparb: 15.4ms +/- 2.4% math: 19.8ms +/- 3.7% cordic: 3.6ms +/- 10.3% partial-sums: 12.8ms +/- 3.5% spectral-norm: 3.4ms +/- 10.9% regexp: 15.0ms +/- 3.9% dna: 15.0ms +/- 3.9% string: 134.8ms +/- 1.1% base64: 6.0ms +/- 0.0% fasta: 66.2ms +/- 1.4% tagcloud: 25.9ms +/- 0.9% unpack-code: 26.4ms +/- 1.4% validate-input: 10.3ms +/- 3.4% The 55.6ms total difference is all in the 58.1ms difference on string-fasta. This is recent, I didn't see this when comparing SS scores between the branches a week or two ago.
I'm seeing the same regression happening between the 2011-12-01 and the 12-04 Nightly build.
Attached patch patchSplinter Review
Because of the teleporting optimization, the entire prototype chain was getting marked with an uncacheable proto, which will end up inhibiting pretty much all iterator caching. Using generateOwnShape() all the time ends up creating too many dictionary objects, so some balance needs to be struck. Using generateOwnShape on objects with singleton types takes care of the regression and seems to avoid a lot of dictionary creation during normal browsing, but this is still pretty unsatisfying and is really just a temporary fix until either bug 707717 gets fixed or the prototype is moved to BaseShape.
Assignee: general → bhackett1024
Attachment #579716 - Flags: review?(luke)
How about just removing the teleporting optimization already? This is like the 5th gross-but-it-solves-this-perf-regression hack.
(In reply to Luke Wagner [:luke] from comment #3) > How about just removing the teleporting optimization already? This is like > the 5th gross-but-it-solves-this-perf-regression hack. That is definitely a third alternative to the two I listed above. I really want to remove the teleporting optimization but need time to collect data and determine whether doing so is a good idea. In the meantime, this is a fire which needs to be put out.
(In reply to Brian Hackett (:bhackett) from comment #4) > (In reply to Luke Wagner [:luke] from comment #3) > > How about just removing the teleporting optimization already? This is like > > the 5th gross-but-it-solves-this-perf-regression hack. > > That is definitely a third alternative to the two I listed above. I really > want to remove the teleporting optimization but need time to collect data > and determine whether doing so is a good idea. In the meantime, this is a > fire which needs to be put out. Yes, we need to be careful. Access time is constant with proto chain length in all browsers. So if that's true even without teleporting, then we of course could just remove it, but that's not likely.
(In reply to Luke Wagner [:luke] from comment #3) > How about just removing the teleporting optimization already? This is like > the 5th gross-but-it-solves-this-perf-regression hack. Maybe this is the last of five, and someone shoulda read jorendorff's doc [1] first :-P. Really, if an optimization matters, then the code has to preserve its invariants. If that is hard, then welcome to JS! It's not enough to bemoan complexity if there's no simpler way to achieve the same performance. Is there? For the long chains Vitek et al. [2] found by instrumenting JS execution of popular web apps, getting rid of teleportation could regress performance. Or perhaps the long chains are rare outliers, and the quartiles shown in Figure 6 are short enough that checking every shape is tolerable. Hard to know without measuring. Generally, lookups dominate hard cases such as __proto__ assignment, and hard cases make bad law. So in principle, we should not sacrifice performance in the lookup path for simplicity in the abnormal cases. But it's hard to know what is "abnormal". I believe Dart is touted because its Smalltalk-style classes are not subject to shadowing and foreshadowing that can invalidate PICs in the absence of teleportation or checking every shape along the chain. So I bet one can show JS performance in V8 suffering with long chains where Dart does not suffer. It would be interesting to find that microbenchmark and run it. /be [1] https://developer.mozilla.org/En/SpiderMonkey/Internals/Property_cache [2] http://sss.cs.purdue.edu/projects/dynjs/pldi275-richards.pdf
The hope is that TI has reduced the pressure on teleporting to handle accesses with long prototype chains. With TI we can statically resolve such accesses to a single object or even a specific value, with no shape checks. We just don't know how often this actually holds; across all getprop accesses (on a breadth of websites, measured a couple months back) we statically resolve about 40% of accesses, and in principle this should be higher for accesses going through a prototype.
(In reply to Brendan Eich [:brendan] from comment #6) > Maybe this is the last of five, and someone shoulda read jorendorff's doc > [1] first :-P. These haven't been correctness bugs, > Generally, lookups dominate hard cases such as __proto__ assignment, and > hard cases make bad law. So in principle, we should not sacrifice > performance in the lookup path for simplicity in the abnormal cases. they have been performance bugs because the hard cases aren't as rare. > Really, if an optimization matters, then the code has to preserve its > invariants. Whether it matters is the issue under question. I didn't say "after measuring whether it actually buys us anything" in my comment b/c Brian and I have already talked about this issue and I know that he plans to do such a measurement. My comment was just "how about sooner rather than later" because there is a very real possibility that "later" may become "never" and we will just be left a sprinkling of magic code that only makes sense in the context of the browser+benchmark suite. I'll r+ since in 12 days ObjShrink moves to aurora and it would be nice to use that 12 days for baking, not new-patch-writing.
Comment on attachment 579716 [details] [diff] [review] patch Please provide a comment explaining the tradeoff being made and the browser use case.
Attachment #579716 - Flags: review?(luke) → review+
There's a good chance that this improved Dromaeo (CSS) by 2-3%, based on tree-management mails.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla11
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: