[meta] Improve performance on Ben Galbraith's linked bubblemark

NEW
Unassigned

Status

()

Core
JavaScript Engine
9 years ago
4 years ago

People

(Reporter: bz, Unassigned)

Tracking

(Depends on: 1 bug, {meta, perf})

Trunk
x86
Mac OS X
meta, perf

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(6 attachments, 1 obsolete attachment)

On the url in the url field, when set to 128 balls, we get about 23fps on my machine.  Safari 4 gets closer to 80.

I profiled this on m-c, and the time split on a high level (ignoring the time spent on the URI-classifier thread, which seems to be 7% of the profiler hits) is:

22% in js_Interpret (we mostly fail to trace this testcase, at least in part because of non-stub getters/setters, but there are other issues too)

15% under js_MonitorLoopEdge, breaking down as:
  10% js_ExecuteTree (largely LeaveTree, JS_ArenaAllocate,
      BuildNativeStackFrame, etc).
   2% js_CheckEntryTypes
   1% Running jitted code
   2% various other small stuff (attempting to extend trees, etc)

10% under js_GetPropertyHelper, breaking down as:
   2% self
   4% js_FillPropertyCache
   4% js_LookupPropertyWithFlags (in that function, and under it
      in js_SearchScope)

8% under js_SetPropertyHelper, breaking down as:
   3.5% DOM SetTop
   3.5% DOM SetLeft
   1% js_LookupPropertyWithFlags and js_FillPropertyCache

6% under js_ValueToNumber, mostly calling js_strtod.

4% under js_FullTestPropertyCache

2% under js_SetProperty

2% under js_ValueToString (mostly calling js_NumberToString)

2% allocating jsdoubles

2% js_FindPropertyHelper

~5% in other smaller js things (js_NativeGet, js_fun_call, math_abs, js_GetProperty, etc).

Non-JS bits:

15% painting
4% style recomputation
2% reflow
Created attachment 382902 [details]
First js file for benchmark
Created attachment 382903 [details]
Second js file for benchmark
Created attachment 382905 [details]
HTML for benchmark: run me!
Oh, the attached JS differs from the original in one important way.  In the first js file, this part:

  // process collisions
  for (i=0; i<_this._N; i++) {
    for (var j=i+1; j<_this._N; j++) {
      _this._ballsO[i].doCollide(_this._ballsO[j]);
    }
  }

differs: the original testcase has just |j=i+1|, with no |var| (so j is a global variable).
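
For illustration only, here is a minimal sketch of the difference between a |var j| loop counter and an undeclared one. The function names are hypothetical and none of this is taken from the benchmark files; the Function constructor is used just so the undeclared assignment runs as non-strict code wherever the snippet is pasted. It also shows how many doCollide-style pair visits the nested loop makes for N == 128.

```javascript
// Hypothetical illustration, not code from the benchmark.
// Code built with the Function constructor is always non-strict, so the
// undeclared assignment to j creates a property on the global object.
const loopWithGlobalJ = new Function("n", `
  var count = 0;
  for (var i = 0; i < n; i++) {
    for (j = i + 1; j < n; j++) {   // no |var|: j is a global
      count++;
    }
  }
  return count;
`);

function loopWithLocalJ(n) {
  var count = 0;
  for (var i = 0; i < n; i++) {
    for (var j = i + 1; j < n; j++) { // |var|: j is local to this function
      count++;
    }
  }
  return count;
}

const pairsGlobal = loopWithGlobalJ(128); // N*(N-1)/2 pair visits
const pairsLocal = loopWithLocalJ(128);   // same work, local counter
const leakedJ = globalThis.j;             // left behind by the first loop
```

Both loops visit the same 128*127/2 pairs; only the first leaves a j behind on the global object.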
Filed bug 497789 on the other thing that makes us fall off trace here.  It does so on the O(N^2) part of the benchmark, which seems to dominate for N==128 per the profile data above.
Keywords: perf
Created attachment 383136 [details]
CSS for benchmark
Attachment #382905 - Attachment is obsolete: true
Created attachment 383137 [details]
Benchmark, with correct CSS.  Run this.
So as an experiment, I tried just commenting out the guard that leads to the aborts in bug 497789.  That helps a bit: fps goes from 23 to 50.... for the first 10-15 seconds.  Then it collapses back to 23.  And there are more "inner tree is trying to grow" aborts.  Maybe the commenting out is just too hacky to really work...
Created attachment 384557 [details]
JS shell version for ease of debugging the JS parts of this

This is a totally nonminimal JS shell version of the testcase.  It still outputs fps, and gcs every so often just like the browser.

This testcase gets 60fps or so until the first gc on m-c right now, then drops off to 45fps.  With the patch in bug 497789 it goes up to 400fps or so... until the first gc.  Then it drops to about 45fps.  That's what comment 9 is about.  I'll be filing a bug on that.
OK, with the fix for bug 497789 (and all other pending patches for bugs blocking this one applied) the new profile looks like:

js-related stuff:
7% running jitted code, boxing and unboxing doubles, etc.
5% js_ValueToString
4% getting .style off nodes (JS-wrapping, unwrapping, etc).
   Slimwrapper might help.
2.5% interpreter time from js_fun_call.  Might get better once we trace
     getters/setters.
1% other.

Total js-related: 20% or so.  A big improvement over the 68% in comment 0.

Non-js-related:
36% painting (see bug 498579, though it's not happening as much in this
    profile; compositor might help here)
15% setting style.top/left (at least 3/4 under DeclarationChanged)
10% style recomputation (see bug 479655)
4% reflow
3% js_LookupPropertyWithFlags

The remaining 10% or so looks like mostly profiler artifacts (dtrace_get_cpu_int_stack_top, I'm looking at you).
Duplicate of this bug: 512250
Quick update: We now hit 40fps on trunk here (compared to 28fps back in June).  If I hack the JS to avoid bug 497789, we hit 68fps.

Safari 4 on the same hardware is at 90fps on the original testcase; 95fps on the hacked one.  Chrome is at 130fps, but cheating on the timeouts.

With that hack, 15% of the time is spent in jit-generated code or libmozjs.  Also some xpconnect-ish time around.  So pretty similar to comment 11...
QA Contact: general → brendan
In case it wasn't clear, the 68fps cap from comment 13 was due to bug 528208.  With that fixed, and still with the hackaround for bug 497789 in place, m-c is at 100fps on my machine.  If I drop the interval clamping in core to below 10ms the fps actually goes down; I think the timer thread is screwing that over somehow.  We could try more balls to see whether we can get a useful head-to-head with Chrome.  I'll probably do that once bug 497789 lands and reprofile.
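
As a rough sanity check on why a 10ms interval clamp lines up with 100fps, here is a back-of-the-envelope sketch. It assumes an idealized model in which each frame takes the larger of the clamp and the per-frame work, and it ignores timer-thread latency entirely (which, per the above, clearly matters in practice). The function name and the sample frame costs are made up for illustration.

```javascript
// Idealized model (an assumption, not how the timer code actually behaves):
// a frame cannot start sooner than clampMs after the previous one, and it
// cannot finish faster than frameCostMs.
function effectiveFps(frameCostMs, clampMs) {
  const frameIntervalMs = Math.max(frameCostMs, clampMs);
  return 1000 / frameIntervalMs;
}

const clampedFps = effectiveFps(5, 10);    // clamp dominates: 1000 / 10
const unclampedFps = effectiveFps(5, 0);   // work dominates: 1000 / 5
const slowFrameFps = effectiveFps(20, 10); // work dominates: 1000 / 20
```

Under this model, any testcase whose per-frame work is under 10ms tops out at 100fps, so the clamp has to go before the fps number says anything about the cost of the work.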
OK, I redid a profile with the current patch in bug 497789 and the DOM's rate-limiting on timeouts removed (so we could go over the 100fps we were hitting).

General breakdown:

js-related:
1% quickstubs glue setting style.top/left
9% js_NumberToStringWithBase
4.5% js_UnboxDouble
3.5% getting .style (unwrapping this, tearoffs, wrapping the decl)
1% js_ConcatStrings
0.5% js_BoxDouble
12% jit-generated code
1% js_ValueToString
0.4% in js_Interpret (yay!)

Total JS-related: 33%

non-js-related:
23.5% setting style.left/top
14% processing restyles
3% reflow
21% painting

Total non-JS-related: 62%

The remaining 5% looks to mostly be cocoa widgetry stuff or something.

We'll likely win a few % here if painting moves to refresh driver; about 1/7 of the restyling time was happening off WillPaint.

In general, the time spent under the setTimeout call is about 58% of total, with JS accounting for a bit over half of that.
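
For anyone wanting to check the arithmetic, a small sketch that adds up the percentages listed in this comment. The one assumption here (not stated in the profile) is that essentially all of the JS-related time falls under the setTimeout call; the variable names are arbitrary.

```javascript
// Percentages copied from the breakdown in this comment.
const jsRelated = [1, 9, 4.5, 3.5, 1, 0.5, 12, 1, 0.4];
const nonJsRelated = [23.5, 14, 3, 21];

const sum = (values) => values.reduce((total, v) => total + v, 0);

const jsTotal = sum(jsRelated);       // reported above as 33%
const nonJsTotal = sum(nonJsRelated); // reported above as 62%

// Assumption: all JS-related time is under the setTimeout call (58% of total).
const underSetTimeout = 58;
const jsShareOfSetTimeout = jsTotal / underSetTimeout;
```

With that assumption the JS share of the setTimeout time comes out between one half and six tenths, which matches "a bit over half".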

In the style attr setting there's some COM stuff to be killed off; zwol is working on that.
Oh, and when not running under the profiler, for the comment 15 setup we hit 133fps on my machine.  Chrome hits about 148fps.
QA Contact: brendan → general
For what it's worth, current numbers for the comment 15 setup on my current machine are:

m-c: 175fps
Opera: 116fps
Chrome: 180fps
(Assignee)

Updated

4 years ago
Assignee: general → nobody