really slow on earthmodel testcase/benchmark

RESOLVED WORKSFORME

Status

8 years ago
6 years ago

People

(Reporter: vlad, Unassigned)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

From a WebGL mailing list post:

----
Using JavaScript Typed Arrays: 
  http://visual-demos.dev.concord.org/seasons/earth/model2d.html

Using regular JavaScript Arrays:
  http://visual-demos.dev.concord.org/seasons/earth/model2d-reg-arrays.html

                                      steps per second
browser/version                       Regular Arrays       Typed Arrays
------------------------------------------------------------------------
Minefield 4.0b12pre (2011-02-22):     17.2                 13.3
WebKit 79303:                         15.5                 27.5
Chrome 9.0.597.102:                   67.6                 42.2
Chrome 10.0.648.82 beta:              109.8                38.4

I ran my tests on a MacBook Pro, Mac OS X 10.6.6, Intel Core i7 2.66 GHz.

The code is available here: https://github.com/stepheneb/seasons

More specifically:
  https://github.com/stepheneb/seasons/blob/master/earth/model2d-reg-arrays.html
  https://github.com/stepheneb/seasons/blob/master/earth/model2d.html
---

Note that having typed arrays be slower is suspect to begin with...
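To check the typed-vs-regular expectation outside the full model, a minimal micro-benchmark sketch follows. It is not taken from the seasons code; fillAndSum, timeIt, and the array size are made up for illustration, and the timings it prints will vary by engine and machine.

```javascript
// Hypothetical micro-benchmark: same loop body over a regular Array and a
// Float64Array. None of these names come from the seasons test case.
function fillAndSum(arr) {
  const n = arr.length;
  for (let i = 0; i < n; i++) arr[i] = i * 0.5;   // array sets
  let sum = 0;
  for (let i = 0; i < n; i++) sum += arr[i];      // array gets
  return sum;
}

function timeIt(label, arr, repeats) {
  const t0 = Date.now();
  let sum = 0;
  for (let r = 0; r < repeats; r++) sum = fillAndSum(arr);
  const ms = Date.now() - t0;
  console.log(label + ": " + ms + "ms");
  return { ms: ms, sum: sum };
}

const N = 100000; // arbitrary size chosen for this sketch
const regular = timeIt("regular Array", new Array(N).fill(0), 20);
const typed = timeIt("Float64Array", new Float64Array(N), 20);
```

Both runs compute the same sum, so any difference in the printed times comes from how the engine handles the two array kinds.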
Looking into this.  First off, the testcase uses:

    paintInterval = setInterval("renderModelStep()", 0);

for benchmarking.  Additionally, inside renderModelStep(), it sets innerHTML to a string on every step.  None of that is helping us.  Changing things to use postMessage didn't change the numbers much.  I wrote a simple benchmark function in model2d.html:

    function do_benchmark() {
        var model = new model2d.Model2D();

        // Time 100 model steps with no timers or DOM updates inside the loop.
        var t0 = (new Date()).getTime();
        for (var i = 0; i < 100; ++i) {
            model2d.addHotSpot(model, 50.0);
            model.nextStep();
        }
        var t1 = (new Date()).getTime();

        // Report once, after timing has stopped.
        step_count.innerHTML = '100 steps: ' + (t1 - t0) + 'ms';
    }

Tie it to a button somewhere.  This gives me 3000 ms in Minefield and 2500 ms in Chrome 10.  The 2500 ms number is ~40fps, which fits well with the 38fps above (slightly higher because this loop skips setInterval and the per-step innerHTML set).  Minefield is at ~33fps, which is over double what I get with the original benchmark (~15/~16), leading me to believe that the innerHTML set is killing us the most.
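For reference, the conversion from "100 steps took T ms" to steps per second is just steps divided by elapsed seconds. A minimal sketch; stepsPerSecond is a helper name made up here, not something in the test case:

```javascript
// Convert "N steps took T milliseconds" into steps per second.
function stepsPerSecond(steps, elapsedMs) {
  return steps / (elapsedMs / 1000);
}

const minefield = stepsPerSecond(100, 3000); // the Minefield timing quoted above
const chrome10 = stepsPerSecond(100, 2500);  // the Chrome 10 timing quoted above
console.log(minefield.toFixed(1), chrome10.toFixed(1)); // prints "33.3 40.0"
```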

Ok, now on to why we're at 3000 ms.  addHotSpot() does a very simple loop with some misc math and array setting.  The meat is in nextStep().  Note that sunShine() calls shootAtAngle(), which has an early return before all the meat, so there are no Photon constructors in loops etc.

A lot of the meat is in fluidsolver.solve() -- commenting that out, I get 924ms with Minefield, and 1168ms with Chrome10 (that is, we end up being faster, and 2/3 of the time is in fluidsolver).

Both those functions seem to be largely math and array sets/gets, and some property gets from |this|.  I don't have a debug build, but I wonder if we're just aborting due to too long of a trace?
Created attachment 514432 [details]
standalone benchmark

standalone benchmark.  can be trivially turned into shell testcase if needed.
(Also, our first run is generally ridiculously slow -- more indication of too-long tracer abort?)
Framerate goes from 14 to 23 after enabling methodjit_always (or disabling the tracer). Maybe a regression from bug 631951. Seems bad.
In the shell this is even more dramatic:

-m -j   : 100 steps: 6187ms (16 fps)
-m -j -a: 100 steps: 3262ms (31 fps)
Two problems:

1) JM does not compile JSOP_CASE. In function FluidSolver2D.prototype.applyBuoyancy we have cases like this: case model2d.BUOYANCY_AVERAGE_COLUMN. This is bug 628073.

Working around this wins 6 fps with -m only (from 31 to 37)

2) This JM abort makes -m -j -p much slower than -m -j -p -a. Without the abort there is not much difference.
Created attachment 514444 [details]
Modified shell test case

For the attached file:

-m -j -p   : 38 fps
-m -j -p -a: 93 fps

After commenting out line 939:

-m -j -p   : 93 fps
-m -j -p -a: 93 fps

Does not slow down when working around the JM abort by changing model2d.BUOYANCY_AVERAGE_ALL to 0 and model2d.BUOYANCY_AVERAGE_COLUMN to 1.
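The workaround can be sketched as follows. This is an illustration only: the model2d object below is a stand-in with the two constants assumed to be 0 and 1 (as the substitution above implies), and the function names and return strings are hypothetical, not taken from applyBuoyancy.

```javascript
// Stand-in for the constants in the test case (values assumed to be 0 and 1).
const model2d = { BUOYANCY_AVERAGE_ALL: 0, BUOYANCY_AVERAGE_COLUMN: 1 };

// Original shape: case labels are property lookups, so they are not
// compile-time constants and have to be compared at run time.
function describeWithLookups(mode) {
  switch (mode) {
    case model2d.BUOYANCY_AVERAGE_ALL: return "average-all";
    case model2d.BUOYANCY_AVERAGE_COLUMN: return "average-column";
    default: return "unknown";
  }
}

// Workaround shape: the same switch with literal integer labels.
function describeWithLiterals(mode) {
  switch (mode) {
    case 0: return "average-all";
    case 1: return "average-column";
    default: return "unknown";
  }
}

console.log(describeWithLookups(1), describeWithLiterals(1));
```

The two functions behave identically; only the form of the case labels differs, which is what the workaround changes.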
I noticed that, in Jan's test case, there is also a huge 32-bit/64-bit difference. When running it with -m -j -p -a, I get these numbers:
  32-bit: 44fps
  64-bit: 98fps
These are for the same machine, of course. Normally we're faster on 32-bit. I guess we generate better typed array code on 64-bit? Or maybe it's register allocation?

I'll look into the -a difference, which seems unrelated.
Depends on: 636219
I filed bug 636219 for the -a/no -a issue. There's a patch for it in that bug.
No longer depends on: 636219
Depends on: 636219
I like 93fps! How do we get 93fps?
Can you confirm that bill's fix in bug 636219 gives us the good FPS?
(In reply to comment #10)
> I like 93fps! How do we get 93fps?

I hacked the test case a bit to locate the problem. But still, the patch should help a lot.
Do we need a separate bug on the innerHTML issue?  It shouldn't generally take 16ms to set innerHTML once!
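Independent of how fast a single innerHTML set is, a benchmark can keep display updates out of the hot loop by reporting only every N steps. A minimal sketch, assuming a hypothetical runSteps helper; in the page, the report callback would be the innerHTML set, while here it only records a string so the sketch runs without a DOM:

```javascript
// Call step() `total` times, but call report() only every `reportEvery`
// steps (and on the final step). All names here are hypothetical.
function runSteps(step, total, reportEvery, report) {
  for (let i = 1; i <= total; i++) {
    step();
    if (i % reportEvery === 0 || i === total) report(i);
  }
}

let stepsDone = 0;
const reports = [];
runSteps(
  function () { stepsDone++; },                  // stand-in for one model step
  100,
  25,
  function (i) { reports.push("step " + i); }    // stand-in for the DOM update
);
console.log(stepsDone, reports.length); // prints "100 4"
```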
Created attachment 516196 [details]
Shell version of benchmark

Bug 636219 has been fixed; attaching the unmodified shell test case. Fixing bug 628073 should win another 23%.
Attachment #514444 - Attachment is obsolete: true
Blocks: 467263

Comment 15

8 years ago
On my MacBook Pro the speed I get running Vladimir's stand-alone benchmark (https://bug636096.bugzilla.mozilla.org/attachment.cgi?id=514432) has increased from 22 to 31 model steps per second.

Comment 16

8 years ago
Interesting that after updating to the latest Minefield Vladimir's benchmark  slowed by 25% on my MacBook Pro 10.6.7

https://bug636096.bugzilla.mozilla.org/attachment.cgi?id=514432

Minefield 4.0pre23            100 steps: 2452ms (41 fps)
Minefield 6.0a1 (2011-04-21)  100 steps: 3114ms (32 fps)
(In reply to comment #16)
> Interesting that after updating to the latest Minefield Vladimir's benchmark 
> slowed by 25% on my MacBook Pro 10.6.7
> 
> https://bug636096.bugzilla.mozilla.org/attachment.cgi?id=514432
> 
> Minefield 4.0pre23            100 steps: 2452ms (41 fps)
> Minefield 6.0a1 (2011-04-21)  100 steps: 3114ms (32 fps)

Would you be able to bisect that using Tracemonkey nightly builds?
Or even just m-c builds; I'm seeing no perf change between 2011-04-01 and now, and both are 30% faster than 2011-03-01 builds.
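Bisecting over a list of builds amounts to a binary search for the first build where the benchmark is slow. A minimal sketch under stated assumptions: the build labels and the isSlow predicate are placeholders (in practice isSlow would mean running the benchmark in that build by hand), the oldest build is assumed fast, the newest slow, and the slowdown is assumed to persist once introduced.

```javascript
// builds: ordered oldest to newest. isSlow(build): hypothetical predicate.
// Assumes builds[0] is fast and the last build is slow.
function firstSlowBuild(builds, isSlow) {
  let lo = 0;                  // index of a build known to be fast
  let hi = builds.length - 1;  // index of a build known to be slow
  while (hi - lo > 1) {
    const mid = (lo + hi) >> 1;
    if (isSlow(builds[mid])) hi = mid; else lo = mid;
  }
  return builds[hi];
}

// Placeholder labels, not real nightly builds.
const builds = ["b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7"];
const found = firstSlowBuild(builds, function (b) {
  return builds.indexOf(b) >= 5; // pretend the slowdown landed in b5
});
console.log(found); // prints "b5"
```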
TI does really well here. Just added the regular array version to awfy-assorted to make sure it won't regress. I had to reduce the number of steps to make it run faster, which does not otherwise affect the results.

-m -n   :  157 ms
-m      :  770 ms
-m -j -p:  770 ms
d8      :  814 ms
-j      : 3480 ms
We're not in tracer-territory, speed-wise, but we're a lot faster than the competition:

Web:
Safari 6.0.5:            100 steps: 1922ms (52 fps)
Chrome 30.0.1568.2 dev:  100 steps: 521ms (192 fps)
Nightly 2013-07-22:      100 steps: 382ms (262 fps)

Shell:
jsc (from above Safari): 100 steps: 1988ms (50 fps)
d8 3.18.5:               100 steps: 369ms (271 fps)
SpiderMonkey:            100 steps: 362ms (276 fps)


I'm going to declare victory.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WORKSFORME

Comment 21

6 years ago
We've integrated my initial spike developing a JavaScript version of Energy2D into our Lab framework: http://lab.concord.org

Here are two Interactives running different Energy2D models in the Interactive Browser page.

Benard Cell (mainly convective solver):
http://lab.concord.org/interactives.html#interactives/energy2d/imported/benard-cell.json

Material Conduction (only conductive solver):
http://lab.concord.org/interactives.html#interactives/energy2d/htb/S3A1.json

Near the bottom of the Interactive Browser page is a tool for running benchmarks for the selected Interactive.

The benchmark tool produces data for rate in steps/s for graphics (just rendering), model-only, model+graphics, and fps (model+graphics integrated with the browser anim frame callback). There is one additional column "fps-webgl". We've also implemented the physics solvers and renderers in WebGL.

Results: Chrome 30.0.1572.0 is about 4-5 times faster than Firefox Nightly 25.0 (2013-07-23)

http://lab.concord.org/interactives.html#interactives/energy2d/imported/benard-cell.json

Web:

browser                      model (steps/s)    fps      fps-webgl
----------------------------------------------------------------------
Chrome 30.0.1572.0 canary:   30.6               27.5     25.8
Nightly 25.0 (2013-07-23):    4.8                5.3     19.3

http://lab.concord.org/interactives.html#interactives/energy2d/htb/S3A1.json

Web:

browser                      model (steps/s)    fps      fps-webgl
----------------------------------------------------------------------
Chrome 30.0.1572.0 canary:   91.3               46.3     44.5
Nightly 25.0 (2013-07-23):   22.3               14.5     24.5

NOTE: these tests were run on a 2010 MacBook Pro. Chrome 30.0.1572.0 no longer supports WebGL on this computer so the fps-webgl data reported for Chrome are actually just a second test for the JavaScript solvers and renderers.

Should I make a new performance issue for these results?
Interesting, thanks for the update. It would be great if you could file two new bugs: one for the webgl part, and one for the js part. These are most likely interesting to different teams within Mozilla. (Please CC me.)

As a quick note, my results on a 2012 rMBP @ 2.7 GHz are quite different from yours:

benard-cell:
browser                      graphics  model  model+graphics  fps   fps-webgl
-----------------------------------------------------------------------------
Chrome 30.0.1568.2 dev:      4173.9    44.9   45.3            20.5  58.3
Nightly 25.0 (2013-07-23):   3125.0    40.3   40.4            34.5  60.0


S3A1:
browser                      graphics  model  model+graphics  fps   fps-webgl
-----------------------------------------------------------------------------
Chrome 30.0.1568.2 dev:      2779.9    138.3  143.7           45.3  57.0
Nightly 25.0 (2013-07-23):   1398.5    121.9  111.7           34.5  60.0


I find it a bit puzzling that we're slower at graphics, model and model+graphics, but get higher fps. I don't fully understand what's being tested, though, so that might have an easy explanation.