Slow canvas performance on this testcase

Status: NEW
Assignee: Unassigned
Product: Core
Component: Canvas: 2D
Reported: 6 years ago
Modified: 6 years ago
Reporter: Stephen Bannasch
Version: 7 Branch
Hardware: x86 Mac OS X
(Reporter)

Description

6 years ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.51.22 (KHTML, like Gecko) Version/5.1.1 Safari/534.51.22

Steps to reproduce:

Created a computational simulation for exploring Self-Organized Criticality.

http://stepheneb.github.com/avalanche2d-js/avalanche2d.html

source: https://github.com/stepheneb/avalanche2d-js

There are three main parts:

1) A computational engine
2) A canvas visualization of an array of desks with 0..n folders
3) A graph that displays the average number of folders per desk for the array.

This specific simulation is an adaptation of a NetLogo model.

The point of this research is to understand the performance limitations and capabilities of modern browsers for creating and running computational models and visualizations. This work has uncovered many useful patterns for writing computationally intensive JavaScript/HTML5 applications. It seems clear that modern browsers are a very powerful platform for writing the kinds of applications we used to write in Java.




Actual results:

Firefox runs the current simulation at approximately 25% the speed of Chrome 14.

See benchmark results in the readme here: https://github.com/stepheneb/avalanche2d-js

The benchmarks report the average number of model-steps/s over a run of 5000 model-steps. The model automatically stops at 5000 model-steps.

The benchmarks include data for running the model both with and without the Canvas visualization and the real-time grapher.

Firefox 7.0.1 and Nightly are both about 25% as fast as Chrome.


Expected results:

Ideally performance should be similar to Chrome 14.

Comment 1

6 years ago

I assume the actual steps to reproduce are to load http://stepheneb.github.com/avalanche2d-js/avalanche2d.html and select the "Go" radio button?

If so, and using the default settings (100x100 desk array size, visualization+graph rendering, Uint8Array), I get numbers like so on Mac OS X 10.6:

  Firefox 7: 2265.5 (steps/s)
  Current trunk nightly: 5417.1 (steps/s)
  Chrome dev build (16.x): 7680.5 (steps/s)
  
If I turn off the rendering part, these become:

  Firefox 7: 3043.2 (steps/s)
  Current trunk nightly: 9124.1 (steps/s)
  Chrome dev build (16.x): 8741.3 (steps/s)

Though all the numbers are very noisy (about 10% or more noise).

If I enable either the visualization or the graph we drop down to about the first set of numbers....

Stephen, how likely is this page to stick around in its current form?

In any case, it seems like the main bottleneck at this point for nightly is in fact the canvas bits.

Brian, do you want to look into whatever JS issues are left here and file bugs as needed?
Status: UNCONFIRMED → NEW
Component: General → General
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → general
Summary: Slow Computational Performance → Slow canvas and simulation performance on this testcase
Component: General → Canvas: 2D
QA Contact: general → canvas.2d

Comment 2

6 years ago

A profile of the testcase with the rendering bits (both visualization and graph) enabled shows:

  7% kernel-level GL stuff
 15% painting
 31% JIT-generated JS code
  7% array_shift
  7% Non-quickstubbed gets of window.self
  5% under mjit::ic::Call (some of this is canvas stuff, note).
  3% under mjit::stubs::SlowCall (likewise)
  8% stroking on canvas (about 2/3 the path ops and 1/3 invalidation)
  
and various array ops, stubs::Pos, js_Date, stubs::sub, BeginPath, etc.

If I collapse all the libmozjs stuff, then 53% is JS engine code, 8% is canvas stroking, 7% is window.self, 1% is setAttribute on elements, and some minor stuff for getImageData, beginPath, putImageData, etc.

Looks like the window.self gets all happen from this sort of pattern in the grapher:

  y.call(self, pts[0].y, 0)

Is there a reason that's using "self" and not just "window" or better yet "null" as the first argument to call() there?
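
For illustration, here is a minimal sketch of why the receiver matters (this is not the actual grapher code; the scale function and data are made up). If the scale ignores `this`, all three calls return the same value, but the first forces a get of window.self on every call:

  // Illustrative only: a D3-style scale function that ignores `this`.
  var y = function (value, index) { return 200 - value * 10; };
  var pts = [{ x: 0, y: 3 }];

  y.call(self, pts[0].y, 0);  // resolves window.self on every call
  y.call(null, pts[0].y, 0);  // same result, no window.self lookup
  y(pts[0].y, 0);             // simplest form when `this` is unused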

Comment 3

6 years ago

The other interesting thing is the "Animation loop timing measurements for this browser" number, which is rather different for us and Chrome. What is that measuring?
(Reporter)

Comment 4

6 years ago
I normally run two tests: model, visualization, and grapher AND just the model.

It makes sense to describe these results as a pair of numbers, so I might describe your Nightly results as 5417/9124.

Model, Canvas Visualization, Graph:

1) Close other CPU intensive processes
2) open page: http://stepheneb.github.com/avalanche2d-js/avalanche2d.html
3) press "Go" and wait for 5000 model-steps to complete
4) Press "Reset"
5) press "Go" and wait for 5000 model-steps to complete
6) record value displayed in "rate: nnnn (steps/s)"

Just Model -- no Canvas Visualization or Graph

This should mainly just test the computational speed.

1) Close other CPU intensive processes
2) open page: http://stepheneb.github.com/avalanche2d-js/avalanche2d.html
3) Uncheck "Visualization" and "Graph"
4) press "Go" and wait for 5000 model-steps to complete
5) Press "Reset"
6) press "Go" and wait for 5000 model-steps to complete
7) record value displayed in "rate: nnnn (steps/s)"

I run the test twice because it sometimes seems to run slower the first time.

One reason the result is noisy is that the computational algorithm has chaotic aspects.

A model step consists of randomly "dropping" a folder on one of the 'Bureaucrats' desks. If that Bureaucrat has more than 3 folders they take four folders from their desk and distribute them to their neighbors. In this process any Bureaucrat to whom a folder has been distributed must be checked to see if they have more than 3 folders ... and so on. Only when NO Bureaucrat has more than 3 folders is the model-step over. Bureaucrats on the edges throw their folders over the edge and these are lost.
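
For reference, a rough sketch of one model step as described above (this is not the avalanche2d-js source; the names and the queue-based traversal are illustrative). `folders` is a flat array of folder counts for an nx-by-ny grid of desks; folders distributed past the edge of the grid are simply lost:

  function modelStep(folders, nx, ny) {
    // drop a folder on a random desk
    var index = Math.floor(Math.random() * nx * ny);
    folders[index] += 1;
    var queue = [index];
    while (queue.length > 0) {
      var i = queue.pop();
      if (folders[i] <= 3) continue;          // this Bureaucrat is fine
      folders[i] -= 4;                        // take four folders off the desk
      var x = i % nx, y = Math.floor(i / nx);
      // hand one folder to each in-grid neighbor; edge folders fall off and are lost
      if (x > 0)      { folders[i - 1]  += 1; queue.push(i - 1); }
      if (x < nx - 1) { folders[i + 1]  += 1; queue.push(i + 1); }
      if (y > 0)      { folders[i - nx] += 1; queue.push(i - nx); }
      if (y < ny - 1) { folders[i + nx] += 1; queue.push(i + nx); }
      if (folders[i] > 3) queue.push(i);      // may still need to distribute again
    }
  }

With the default settings this would be something like modelStep(new Uint8Array(100 * 100), 100, 100); the step only ends once no desk holds more than 3 folders.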

The graph shows the average folders per desk over time. The sharp drops are in effect "Avalanches".

Another reason is that I am dynamically adjusting how much work I do before each requestAnimFrame to try to use about 80% of the available time. Normally this would be considered "unfriendly" behavior for a web page, but this page is specifically designed to find out how fast a computational simulation can be run. In a real version of this app the user would select what speed, size, and resolution to run the model at -- and if they chose a large, complex model to run fast, presumably they are OK with the larger power usage that would entail.

I just updated my Nightly and I measured 5549/11136 on my Mac OS X 10.6.8, 2.66 GHz Core i7 system. Chrome 14 yesterday was: 9578/12594. I hadn't measured Nightly for a few days, and it now seems much closer to Chrome than before ...

To me this new data indicates that the computational performance is close to Chrome's, and you are correct that the slowdown now appears to be more in the Canvas performance.

While the graphing framework uses D3.js (which uses SVG), the "real-time graphing" is done to a Canvas object that is layered over the plotting section of the SVG graph. This is used while the model is running. When the model stops, the Canvas object is placed behind the SVG graph and the SVG graph is re-drawn.
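
A hedged sketch of that layering (element ids and function names are made up; the real page may do this differently). Both elements would be absolutely positioned over the same plot area, so z-index decides which one is on top:

  var plotCanvas = document.getElementById("plot-canvas");
  var svgGraph   = document.getElementById("svg-graph");

  function startRealtimeGraph() {
    plotCanvas.style.zIndex = "2";   // canvas above the SVG while the model runs
    svgGraph.style.zIndex   = "1";
  }

  function stopRealtimeGraph(redrawSvgGraph) {
    plotCanvas.style.zIndex = "1";   // drop the canvas behind the SVG
    svgGraph.style.zIndex   = "2";
    redrawSvgGraph();                // D3 redraws the full SVG graph once
  }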

The graphics performance can be further distinguished by running the model with "Visualization" and "Graph" on, then with just the "Visualization" on, and then finally with just the Model running.

Here's the data I just collected using this pattern:

  Nightly:  6975/8802/10638
  Chrome Canary: 10183/11682/13089

I think I am done for a while making performance changes to the application.

I was quite surprised at how much performance I was able to get.

This graph shows the performance change over the last two weeks. The Y axis is a log scale!

https://img.skitch.com/20111023-q8rkqfkhynifqhtj6yefm86aaw.jpg

If it would be helpful I could also put up a static copy of the current site that wouldn't change.
(Reporter)

Comment 5

6 years ago
(In reply to Boris Zbarsky (:bz) from comment #2)
> Looks like the window.self gets all happen from this sort of pattern in the
> grapher:
> 
>   y.call(self, pts[0].y, 0)
> 
> Is there a reason that's using "self" and not just "window" or better yet
> "null" as the first argument to call() there?

No reason other than that was the same form used in the D3.js example I based the initial work on.

Comment 6

6 years ago

> I am dynamically adjusting how much work I do before each requestAnimFrame to try and
> use about 80% of the available time.

Hmm.  How are you determining the "available time"?  Just new Date() and seeing until some threshold is passed?  If so, which threshold?

> No reason other than that was the same form used in the D3.js example

Ah.  Passing either "window" or "null" would be somewhat faster, for what it's worth.

Did you see comment 3?
(Reporter)

Comment 7

6 years ago
(In reply to Boris Zbarsky (:bz) from comment #3)
> The other interesting thing is the "Animation loop timing measurements for
> this browser" number, which is rather different for us and Chrome.  What is
> that measuring?

I'm still learning about how requestAnimFrame works and how best to do this -- the intent is to use a large percentage of the available cycles but not all of them -- and to dynamically adjust this. I have the impression (w/o direct evidence yet) that running a browser on a phone or tablet might not call back every 16ms.

One hypothesis is that this algorithm ends up using a smaller percentage of the cpu time available than when the same code is running on Chrome.

My use case might be a good example of where setImmediate (https://github.com/NobleJS/setImmediate) would be useful.
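
For context, the core idea behind setImmediate-style shims is roughly the postMessage trick sketched below (simplified; the queueImmediate name is made up and the real NobleJS shim handles many more cases). A callback queued this way runs on the next turn of the event loop, without the ~4 ms clamp on nested setTimeout(fn, 0) and without waiting for the next animation frame:

  var immediateQueue = [];

  window.addEventListener("message", function (event) {
    // Only react to our own marker messages posted to this window.
    if (event.source === window && event.data === "run-immediate") {
      var fn = immediateQueue.shift();
      if (fn) fn();
    }
  }, false);

  function queueImmediate(fn) {
    immediateQueue.push(fn);
    window.postMessage("run-immediate", "*");
  }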

Here's what I wrote a couple of days ago about this question.

My latest implementation is designed to render during every screen refresh -- and only render what is necessary for every screen refresh.

It definitely makes sense to only do the work of rendering what can be seen, at the maximum screen refresh rate.

To get the line graph to appear I still render it for every model step.

However, for the canvas visualization of the folder count on the array of desks, I am only rendering once per screen repaint.

There is a bunch of fiddling around required to do the maximum amount of work in one screen repaint cycle.

I dynamically adjust the maximum time I spend in the runModelLoop to try to always end at about 70-80% of the interval between screen refreshes.

Nominally the screen refresh rate is 60 Hz, about 17ms.

The browser can choose to make this interval longer. I think this is likely to happen on phones and tablets. It will also happen when the web page is in a tabbed window that is not being displayed (in this situation my algorithm for dynamically adjusting will be more greedy than a good web citizen should be).

Here's some data from a run in Chrome:


  avalanches: 5000
  rate: 9041.6 (steps/s)
  last sample time: 0 (ms)
  loop compute maximum: 20
  anim loop timing: 23
  folders: 2.122 (ave)

The variables that pertain to this discussion are:

  loop compute maximum: 20

This is how long the last model loop took. A model loop always includes at least one, and usually more than one, model step.

  anim loop timing: 23

This is the most recent time between callbacks from the browser to my code that executes the runModel loop.

So at the end of that run I was:

  * Computing and rendering in my runModelLoop until 20 ms were up.
  * Returning from my runModel loop to the browser.
  * Then getting called 3 ms later to start the runModelLoop all over again.

And I was rendering at 44 times/s (why this is less than 60 I'm not certain right now).

This is tricky because there are model steps which take more than one screen refresh interval -- this happens in big avalanches.
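
Putting the pieces of this comment together, here is a hedged sketch of the loop being described (function and variable names are illustrative, not the avalanche2d-js source, and the sketch ignores the long-step case just mentioned). Each requestAnimationFrame callback measures the gap since the previous callback (the "anim loop timing" number), renders once, then keeps running model steps until roughly 70-80% of that gap has been used (the "loop compute maximum" number):

  // Stand-ins for the real model and rendering code:
  var stepsRun = 0;
  function modelStep()          { stepsRun += 1; }
  function renderFolderCanvas() { /* draw the desk array to the canvas */ }
  function renderGraphPoint()   { /* plot average folders per desk */ }
  function finished()           { return stepsRun >= 5000; }

  var modelLoopGoal = 0.75;              // aim to use ~75% of the frame interval
  var lastCallbackTime = Date.now();

  function runModelLoop() {
    var start = Date.now();
    var animLoopTiming = start - lastCallbackTime;   // time between callbacks
    lastCallbackTime = start;

    modelStep();
    renderFolderCanvas();                // canvas visualization: once per repaint
    renderGraphPoint();                  // line graph: once per model step

    // Keep stepping, skipping the canvas render, until the budget is used up.
    var budget = animLoopTiming * modelLoopGoal;
    while (!finished() && Date.now() - start < budget) {
      modelStep();
      renderGraphPoint();
    }
    var loopComputeMaximum = Date.now() - start;     // displayed as "loop compute maximum"

    if (!finished()) {
      window.requestAnimationFrame(runModelLoop);
    }
  }

  window.requestAnimationFrame(runModelLoop);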
(Reporter)

Comment 8

6 years ago
(In reply to Boris Zbarsky (:bz) from comment #6)
> > I am dynamically adjusting how much work I do before each requestAnimFrame to try and
> > use about 80% of the available time.
> 
> Hmm.  How are you determining the "available time"?  Just new Date() and
> seeing until some threshold is passed?  If so, which threshold?

Yes.

See: measureAnimmationLoop()
  https://github.com/stepheneb/avalanche2d-js/blob/master/avalanche2d.html#L508

which calls: runModelLoop();
  https://github.com/stepheneb/avalanche2d-js/blob/master/avalanche2d.html#L469

This runs the modelStep once and, if the elapsed time is less than model_loop_time, runs the model step again (without rendering the Canvas visualization again).

> > No reason other than that was the same form used in the D3.js example
> 
> Ah.  Passing either "window" or "null" would be somewhat faster, for what
> it's worth.
> 
> Did you see comment 3?

yes ...  typing as fast as I can ;-)

Thanks for your help!
(Reporter)

Comment 9

6 years ago
(In reply to Stephen Bannasch from comment #8)
> (In reply to Boris Zbarsky (:bz) from comment #6)
> > > I am dynamically adjusting how much work I do before each requestAnimFrame to try and
> > > use about 80% of the available time.
> > 
> > Hmm.  How are you determining the "available time"?  Just new Date() and
> > seeing until some threshold is passed?  If so, which threshold?
> 
> Yes.
> 
> See: measureAnimmationLoop()
>  
> https://github.com/stepheneb/avalanche2d-js/blob/master/avalanche2d.html#L508



That's a red herring! It's only used at startup for informational purposes.

modelGo();
https://github.com/stepheneb/avalanche2d-js/blob/master/avalanche2d.html#L381

is where the process starts; it sets up a callback to runModelLoop().
(Reporter)

Comment 10

6 years ago
This is where the model_loop_goal is dynamically adjusted:

https://github.com/stepheneb/avalanche2d-js/blob/master/avalanche2d.html#L489
(Reporter)

Comment 11

6 years ago
Computational performance plays a larger role than it might appear.

There's a non-linearity in the system because I am only rendering the Canvas visualization once for every time the runModelLoop() is called by the requestAnimFrame() callback.

So if 5000 model-steps with just the Canvas visualization on takes one second then avalanche2d.displayFolderCanvas(); will execute 60 times.

If avalanche2d.displayFolderCanvas(); takes 5ms then 60 take 300ms. So that means (discounting the time spent not in the callback) that the model itself on average takes 700ms/5000 => 140 us.

What if the model instead averages 170us?

170us * 5000 = 850ms 

Add to that the initial 60 avalanche2d.displayFolderCanvas(); executions for 300ms and you get 1250ms ... but because we're spending another 250ms (at least) we need to add in another 15 avalanche2d.displayFolderCanvas(); executions for another 75ms. So that leaves an intermediate total of 1375.

So with a 5ms estimate for the Canvas visualization a 21% increase in the time it takes to run the model results in a 38% increase in the time it takes to run 5000 model steps.

I don't actually know how long displayFolderCanvas(); takes -- that would be interesting info.
(Reporter)

Comment 12

6 years ago
(In reply to Stephen Bannasch from comment #11)
> executions for another 75ms. So that leaves an intermediate total of 1375.

basic math error ...

actually: 1325 => 33%

Comment 13

6 years ago

> There's a non-linearity in the system

Thank you for doing that analysis! That's the sort of thing I was after in trying to understand how the throttling works.

So yeah, seems like having slightly slower JS performance could just lead us to have lots more canvas ops (which may also be slower, of course)....
(Reporter)

Comment 14

6 years ago
I've updated the deployed code so that when you run the model by itself, no callbacks are used. This removes a variable from the measurements for running the model alone.

Here are some updated measurements:

Avalanche2D-JS Benchmarks

Model-steps/s over 5000 model-steps, visualization and graphing ON

Browser                               Run 1     Run 2     Run 3     Run 4     Run 5     Average   Comparison
-------------------------------------------------------------------------------------------------------------
Chrome 17.0.920.1 canary              9345.8    9523.8    9276.4    10040.2   10204.1   9678      100%
Chrome 14.0.835.186                   8210.2    9009      9058      9276.4    9191.2    8949      92%
Firefox Nightly 10.0a1 (2011-10-28)   5861.7    6784.3    7042.3    7278      7052.2    6804      70%
Firefox 7.0.1                         2161.7    2395.8    2356.3    2343      2401.5    2332      24%

Model-steps/s over 5000 model-steps, visualization and graphing OFF

Browser                               Run 1     Run 2     Run 3     Run 4     Run 5     Average   Comparison
-------------------------------------------------------------------------------------------------------------
Chrome 17.0.920.1 canary              15822.8   15723.3   16129     16077.2   16233.8   15997     100%
Chrome 14.0.835.186                   15243.9   14619.9   15151.5   14970.1   14925.4   14982     94%
Firefox Nightly 10.0a1 (2011-10-28)   12626.3   12787.7   13089     12658.2   12468.8   12726     80%
Firefox 7.0.1                         3728.6    3776.4    3742.5    3652.3    3663      3713      23%

page:      http://stepheneb.github.com/avalanche2d-js/avalanche2d.html                                                                         
commit:    https://github.com/stepheneb/avalanche2d-js/commit/f8c4f24df                                                                        
date:      10/28/11                                                                     
computer:  Mac OS X 10.6.8, MacBook Pro, 2.66 GHz Intel Core i7
Actually, I'm going to split this into two bugs given that data.  Leaving this as the canvas bug; going to file a separate jseng bug.
Summary: Slow canvas and simulation performance on this testcase → Slow canvas performance on this testcase
(Reporter)

Comment 17

6 years ago
I've updated the code a bit.

Basically I made the canvas rendering part of the avalanche2d.model object and am keeping a persistent copy of the canvas, context, ImageData, and CanvasPixelArray. I am also pre-rendering the alpha values in the CanvasPixelArray so that these bytes don't need to be written during regular rendering.

  https://github.com/stepheneb/avalanche2d-js/commit/608030b
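
A hedged sketch of that optimization (not the code from the commit above; function and parameter names are made up): the ImageData is created once, every alpha byte is pre-filled with 255, and each repaint only writes the RGB bytes before calling putImageData():

  function makeFolderRenderer(canvas, nx, ny) {
    var ctx = canvas.getContext("2d");
    var imageData = ctx.createImageData(nx, ny);   // persistent ImageData
    var pixels = imageData.data;                   // CanvasPixelArray / Uint8ClampedArray

    // Pre-render the alpha channel once so it never needs to be written again.
    for (var i = 3; i < pixels.length; i += 4) {
      pixels[i] = 255;
    }

    return function render(folders, colorTable) {
      // One pixel per desk; colorTable maps a folder count to [r, g, b].
      for (var d = 0, p = 0; d < folders.length; d++, p += 4) {
        var color = colorTable[folders[d]];
        pixels[p]     = color[0];
        pixels[p + 1] = color[1];
        pixels[p + 2] = color[2];
        // pixels[p + 3] is already 255
      }
      ctx.putImageData(imageData, 0, 0);
    };
  }

The sketch assumes one canvas pixel per desk; the real visualization may scale the image when drawing.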

There is now a google doc with updated timings here:

https://docs.google.com/spreadsheet/ccc?key=0AtvlFoSBUC5kdDM4cHJhU0I4NGg5TjlWRkQ0ZExpcGc

In general, this gives an increase of about 10% when running the model, visualization, and grapher.