Last Comment Bug 508716 - Fluid dynamics simulator is slow
: Fluid dynamics simulator is slow
Status: RESOLVED FIXED
: meta
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal with 2 votes (vote)
: ---
Assigned To: general
:
Mentors:
http://nerget.com/fluidSim/
Depends on: 505591 510554 516264 536564 542002 592631 593270 593342
Blocks: 467263
  Show dependency treegraph
 
Reported: 2009-08-05 18:26 PDT by Boris Zbarsky [:bz]
Modified: 2011-09-08 22:15 PDT (History)
23 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
First JS file (9.84 KB, text/plain)
2009-08-05 18:29 PDT, Boris Zbarsky [:bz]
no flags Details
Second js file (3.71 KB, text/plain)
2009-08-05 18:29 PDT, Boris Zbarsky [:bz]
no flags Details
Testcase HTML (6.62 KB, text/html)
2009-08-05 18:30 PDT, Boris Zbarsky [:bz]
no flags Details
patch to test (24.35 KB, patch)
2010-01-25 00:35 PST, Brendan Eich [:brendan]
no flags Details | Diff | Splinter Review
Shell test case (19.33 KB, text/plain)
2010-09-01 18:47 PDT, David Mandelin [:dmandelin]
no flags Details

Description Boris Zbarsky [:bz] 2009-08-05 18:26:48 PDT
I get 12-13fps in t-m; Safari 4 gets closer to 40fps.

Stats:

recorder: started(116), aborted(104), completed(15), different header(1), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(4), blacklisted(26)
monitor: triggered(27), exits(27), type mismatch(0), global mismatch(4)

Typical abort:

abort: 6918: fp->scopeChain is not global or active call object
Abort recording of tree http://nerget.com/fluidSim/pressure.js:290@137 at http://nerget.com/fluidSim/pressure.js:291@138: name.
Comment 1 Boris Zbarsky [:bz] 2009-08-05 18:29:08 PDT
Created attachment 392846 [details]
First JS file
Comment 2 Boris Zbarsky [:bz] 2009-08-05 18:29:53 PDT
Created attachment 392847 [details]
Second js file
Comment 3 Boris Zbarsky [:bz] 2009-08-05 18:30:39 PDT
Created attachment 392848 [details]
Testcase HTML
Comment 4 Boris Zbarsky [:bz] 2009-08-05 18:35:31 PDT
Oh, and a profile shows 55% of the time spent in (not under) js_Interpret.  Another 11% allocating doubles, another 15% on js_NativeGet, esp with Call objects.

That adds up to 80% or so.  17% is painting, and 3% is sort of here and there (2/3 of that is minor stuff under js_Interpret, the remaining 1/3 is event loop overhead, I think).

In any case, this is definitely in the right component, at least until we get this tracing...
Comment 5 Boris Zbarsky [:bz] 2009-08-10 20:49:52 PDT
I also see a lot of:

Abort recording of tree https://bug508716.bugzilla.mozilla.org/attachment.cgi?id=392847:47@96 at https://bug508716.bugzilla.mozilla.org/attachment.cgi?id=392847:49@109: Inner tree is trying to grow, abort outer recording.

Line 49 is:

   var d = field.getDensity(x, y) * 255 / 5;

where |field| comes from this:

        displayFunc(new Field(dens, u, v));

and Field has a getDensity function.  I have no idea why that's causing us trace aborts; perhaps something akin to bug 497789?
Comment 6 Boris Zbarsky [:bz] 2009-08-11 22:12:20 PDT
Or some sort of branding thing going on, perhaps?  Brendan?  The relevant guard seems to be :

    About to try emitting guard code for SideExit=0x17bb1ac exitType=BRANCH
    shape = ld map[32]
    #0x8756
    guard_kshape = eq shape, #0x8756
    xf1512: xf guard_kshape -> pc=0x160ded9 imacpc=0x0 sp+8 rp+0

The other branch exit I see some of is:

Abort recording of tree https://bug508716.bugzilla.mozilla.org/attachment.cgi?id=392846:62@120 at https://bug508716.bugzilla.mozilla.org/attachment.cgi?id=392846:64@138: Inner tree is trying to grow, abort outer recording.

Line 64 is:

  var lastRow = (j - 1) * rowSize;

Relevant guard:

    About to try emitting guard code for SideExit=0x17478e0 exitType=OVERFLOW
    eq134 = eq mul19, callDepth
    xt165: xt eq134 -> pc=0x16156d6 imacpc=0x0 sp+16 rp+0
Comment 7 Andreas Gal :gal 2009-08-11 23:46:22 PDT
The 2nd guard looks like upvar access. Paging dmandelin.
Comment 8 David Mandelin [:dmandelin] 2009-08-12 17:38:38 PDT
'Inner tree is trying to grow, abort outer recording.' is usually not a problem. In the normal case, we back off from tracing the outer loop for 32 iterations, by which time the inner tree hopefully has stabilized. We should be able to get the outer loop after one or a few rounds of that.

The abort in comment 0 looks like the returned closures abort.
Comment 9 Boris Zbarsky [:bz] 2009-08-12 19:08:56 PDT
Yes, the data in comment 6 is with the returned closures patch applied (which makes the abort in comment 0 go away).

In this case, we don't stabilize as far as I can see; I'm seeing the trying to grow aborts all through the log, and it's certainly running long enough that the outer tree ends up blacklisted as a result of all the aborts.
Comment 10 David Mandelin [:dmandelin] 2009-08-12 19:22:36 PDT
Hmmm. One thing I thought of is that we might generate so much code that we end up OOMing the code buffer and throwing it all away, but it sounds like you are saying that's not happening. I wonder if the inner loop code is so branchy that it can't stabilize in the number of backoffs we give. Based on:

    About to try emitting guard code for SideExit=0x17bb1ac exitType=BRANCH
    shape = ld map[32]
    #0x8756
    guard_kshape = eq shape, #0x8756
    xf1512: xf guard_kshape -> pc=0x160ded9 imacpc=0x0 sp+8 rp+0

This looks like a guard from a property access. If a zillion shapes go through, then we would try to grow a lot of branch exits.
Comment 11 Boris Zbarsky [:bz] 2009-08-12 19:44:55 PDT
Yeah, that was my gut reaction too.  Hence wondering about bug 497789 or branding or some such...
Comment 12 Boris Zbarsky [:bz] 2009-12-12 02:44:19 PST
dvander, how does trace merging behave on this one?
Comment 13 Brendan Eich [:brendan] 2010-01-21 11:58:21 PST
The fix for bug 536564 seems to have helped quite a bit: with the latest tm nightly I get >30fps with black canvas and >26fps upon stirring up some green.

Boris, can you regen some stats? Or perhaps someone can make a shell testcase. wish (along with Oliver) we had TraceVis built in...

/be
Comment 14 Brendan Eich [:brendan] 2010-01-21 11:59:49 PST
(In reply to comment #11)
> Yeah, that was my gut reaction too.  Hence wondering about bug 497789

No .prototype in either .js file, so 497789 is not relevant.

> or branding or some such...

Indeed bug 536564 was all about unbranding to despecialize module pattern and power constructor (if that's what it is called) pattern code.

/be
Comment 15 Brendan Eich [:brendan] 2010-01-21 12:14:34 PST
I should probably give this bug to someone. Volunteers?

/be
Comment 16 Boris Zbarsky [:bz] 2010-01-21 12:42:12 PST
I don't see any impact from bug 536564's fix on this testcase in terms of fps.  I see about 22-30fps (depending on amount of green) both before and after that fix here.  I'll try to get some stats.
Comment 17 Boris Zbarsky [:bz] 2010-01-21 12:48:01 PST
I'm still seeing the abort from comment 5 on t-m tip.
Comment 18 Boris Zbarsky [:bz] 2010-01-21 13:01:56 PST
And in particular, it's not clear to me why bug 536564 would lead to less branding here, offhand...
Comment 19 Brendan Eich [:brendan] 2010-01-21 13:33:40 PST
(In reply to comment #18)
> And in particular, it's not clear to me why bug 536564 would lead to less
> branding here, offhand...

brendan: bz: we see this.foo = function... where the function is not a null closure, we unbrand this
brendan: first thing in the [Field] ctor


Comment 5 does not demonstrate shape regeneration per new Field, and I confirmed via dis('-r', FluidField) that Field unbrands |this|. Need to see what guards are exiting trace.

/be
Comment 20 Boris Zbarsky [:bz] 2010-01-21 13:41:20 PST
OK.  So if I look at the trace exits, the vast majority are one of:

leaving trace at http://nerget.com/fluidSim/pressure-display.js:49@124, op=call, lr=0x16a439c, exitType=BRANCH, sp=4, calldepth=0, cycles=0
leaving trace at http://nerget.com/fluidSim/pressure-display.js:49@115, op=callprop, lr=0x1652224, exitType=BRANCH, sp=1, calldepth=0, cycles=0
leaving trace at http://nerget.com/fluidSim/pressure-display.js:49@124, op=call, lr=0x16b3c04, exitType=BRANCH, sp=4, calldepth=0, cycles=0
leaving trace at http://nerget.com/fluidSim/pressure-display.js:49@115, op=callprop, lr=0x169c4a4, exitType=BRANCH, sp=1, calldepth=0, cycles=0

The relevant side exits seem to be:

    About to try emitting guard code for SideExit=0x16a439c exitType=BRANCH
...
    callee_obj->getPrivate() = int 357565760
    ld1 = ld $stack14[16]
    eq1 = eq ld1, callee_obj->getPrivate()
    xf1: xf eq1 -> pc=0x1637ae8 imacpc=0x0 sp+32 rp+0 (GuardID=001)
    About to try emitting guard code for SideExit=0x16a439c exitType=BRANCH
    OBJ_GET_PARENT(cx, callee_obj) = int 357591200
    ld2 = ld $stack14[12]
    eq2 = eq ld2, OBJ_GET_PARENT(cx, callee_obj)
    xf2: xf eq2 -> pc=0x1637ae8 imacpc=0x0 sp+32 rp+0 (GuardID=002)


    About to try emitting guard code for SideExit=0x1652224 exitType=BRANCH
    shape = ld map[4]
    37285 = int 37285
    guard_kshape = eq shape, 37285
    xf3: xf guard_kshape -> pc=0x1637adf imacpc=0x0 sp+8 rp+0 (GuardID=003)

0x16b3c04 is the function-identity and callee parent guard again, and 0x169c4a4 is another kshape guard.

This is all on t-m tip.
Comment 21 Boris Zbarsky [:bz] 2010-01-21 13:42:28 PST
When I say "vast majority", btw, of 135k total exits, about 91k are on the callee_obj guard and about 37k are on the kshape guard.
Comment 22 Boris Zbarsky [:bz] 2010-01-21 13:57:35 PST
OK.  The callee guard that fails is the Call object identity guard.  See TraceRecorder::guardCallee the second guard it puts in.

If I comment out that code block I see a bit of a win on the testase (maybe from 30fps to 36fps) and BRANCH exits more or less disappear from the log (283 exits out of a total of 10500).
Comment 23 Boris Zbarsky [:bz] 2010-01-21 14:29:59 PST
With that code block commented out, profile looks like:

30% painting
58% under js_Interpret, almost all of it on trace.  None inside js_Interpret.
4% vm_fault stuff


Of the time under js_Interpret, about half is actually jit-generated code.  The rest is js_DoubleToInt32 (10% of overall time), js_Array_dense_setelem_double (10% of overall time, about half creating new gcobjects), 6% js_UnboxDouble.  Everything else is minor.

With the code block left in, however, time in js_Interpret is 5.5%.  Time under js_Interpret is 62%.  So dropping this guard and not exiting on it would give us something like 10% faster performance (which is about reflected in the "just slightly higher" fps).

To win more than that we need to at least stop all that double-to-int-and-back thrashing.  And speed up the painting part (need separate bugs).
Comment 24 Boris Zbarsky [:bz] 2010-01-21 15:01:34 PST
OK.  That parent guard is bug 510554.
Comment 25 Brendan Eich [:brendan] 2010-01-25 00:35:52 PST
Created attachment 423311 [details] [diff] [review]
patch to test

This improves the all-or-nothing nature of flat closures by making an escaping function (funarg) flat if any of its upvars is invariant with initialization that dominates the closure. Any upvars not flattened must be accessed via Call objects but this wins (big, I think) over bailing to the all-upvars-need-Call-homes state.

I will clean up and incorporate any testing feedback before soliciting review.

/be
Comment 26 Boris Zbarsky [:bz] 2010-01-25 07:48:29 PST
That patch may help a bit.  Hard to tell; would need to profile; not a high priority right this second.  But it should definitely go into a bug that blocks this one, not in this tracking bug.
Comment 27 Brendan Eich [:brendan] 2010-01-25 09:16:48 PST
(In reply to comment #26)
> That patch may help a bit.

With the parent guard elimination that depends on the changes in this patch, it should help a lot. But I need to add the elimination logic ;-)).

> ... it should definitely go into a bug that blocks
> this one, not in this tracking bug.

Will file, thanks for reminding me.

/be
Comment 28 Brendan Eich [:brendan] 2010-01-25 14:44:32 PST
Comment on attachment 423311 [details] [diff] [review]
patch to test

See bug 542002.

/be
Comment 29 Brendan Eich [:brendan] 2010-08-31 17:47:53 PDT
dmandelin got 48fps today (on tm tip, I think -- JaegerMonkey landed there earlier today). Want to get up to high 50s like the competition. ;-)

/be
Comment 30 Boris Zbarsky [:bz] 2010-08-31 20:07:24 PDT
Over here, after the merge, I see 38fps on TM, and about 49-50 in Chrome and Safari.  Opera latest is at 54.

Doing a profile now.
Comment 31 Boris Zbarsky [:bz] 2010-08-31 21:38:37 PDT
Sorry that took a bit; shark seems to need the frame pointer to produce useful calltree output, so I had to recompile first.

Looks like we're at about (on Mac; on Windows/Linux this might all be different because painting is so much of the profile):

55% painting (argb32_mark and argb32_image for the most part; mostly painting
    the canvas layer, but some painting of some thebes layer too, and a few
    percent in display list construction and such).
 5% Gecko event processing (I was mousing about and whatnot)
40% JS engine.

The JS engine time splits up as follows (all percentages that follow are percentages of JS engine time):

86.6% in jm-generated symbol-less code.
 5.2% under stubs::BitOr (half self time, half calling ValueToECMAInt32Slow)
   2% under PutImageData
 1.2% under stubs::SetElem (looks like for the image data)

Oh, all the above was without supervisor callstacks.  If I put those back I get the usual 3.5% of vm_fault and whatnot.

Upshot:  The only way to make this faster is to generate something different codewise on the JM end and/or make the painting faster.

Unfortunately, shark is not all that much use for the jit-generated stuff; it seems to have one entry per instruction or something, for the most part...  The most common ones are (all self times, obviously):

1.7% add byte ptr [eax], al
1.5% jmp 0x4c86e883
1.5% jmp 0x1886e987
1.4% jmp 0x7486e22f
1.4% jmp 0xb886ea4d
1.3% a 9.5KB chunk of code.  Looks like your typical code; lots of cmp, jmp,
     mov.
1.2% sldt dword ptr [eax]
1.1% sldt dword ptr [eax]
1.0% jmp 0xb086e4fa
0.7% mov dword ptr [eax], eax

Anyway, the upshot is that we have jmps, we presumably have context switches (the sldt), and that's about all I can tel from this tool.
Comment 32 Oliver Hunt 2010-08-31 22:11:59 PDT
I suspect the bitor is '<expr> | 0' to convert a double to int, in which case simply exiting to the slowcase is probably very expensive relative to simply doing cvttsd2si (and associated checks)
Comment 33 Jan de Mooij [:jandem] (PTO until July 31) 2010-09-01 02:41:49 PDT
(In reply to comment #32)
> I suspect the bitor is '<expr> | 0' to convert a double to int, in which case
> simply exiting to the slowcase is probably very expensive relative to simply
> doing cvttsd2si (and associated checks)

Thanks, I will file a bug.
Comment 34 Boris Zbarsky [:bz] 2010-09-01 06:29:06 PDT
The patch in bug 592631 gets us to 42fps or so on my hardware (depending on how much I move the mouse; can go as low as 37 and as high as 44).
Comment 35 David Mandelin [:dmandelin] 2010-09-01 12:37:04 PDT
I Sharked Safari on this and got 47% of time in JavaScriptCore. Using your 50fps measurement, that corresponds to a JS run time of 9.4ms/frame.

Our "old" (before the x|0 patch) score of 38 fps corresponds with 40% of time in JS gives a JS run time of 10.5ms/frame.

I estimate our JS run time to be 8.8-9.2ms/frame after the x|0 patch, depending on whether I (a) use the fact that we spent 5.2% there before and compute 10.5ms/frame*(1-5.2/40) or (b) use the new fps of 42 with an estimate new JS fraction of 37%.

Anyway, at this point it seems more likely that we can improve our score by making the non-JS parts faster. But we should still make a shell benchmark and do a more direct comparison of the JS times to see if there we are taking any direct losses there.
Comment 36 David Mandelin [:dmandelin] 2010-09-01 18:47:59 PDT
Created attachment 471373 [details]
Shell test case

Here's a shell version. The timing loop is all the way at the bottom if you want to tweak it.
Comment 37 David Mandelin [:dmandelin] 2010-09-02 12:42:57 PDT
Results from the shell benchmark, time taken to render each frame (in a run of 1000 frames), all 32-bit builds on OSX10.6:

  TM tip                6.8 ms
  TM tip + bug 592631   6.5 ms
  JSC                   7.0 ms
Comment 38 Jan de Mooij [:jandem] (PTO until July 31) 2010-09-02 12:46:07 PDT
One problem here is that x/y in JM and JSC does not convert the result to int32 if it's 0. V8 does division differently (they try idiv first), so they end up with ints everywhere which is obviously faster. I'm working on a patch to speedup JSOP_DIV by making our double-to-int logic work when the result is 0.
Comment 39 Boris Zbarsky [:bz] 2010-09-02 12:49:18 PDT
The problem is that there are two different 0s, right?  So you have to be a little careful with that....

In any case, dmandelin's numbers suggest that painting is the place we really need to improve here at this point.
Comment 40 Jan de Mooij [:jandem] (PTO until July 31) 2010-09-02 12:55:04 PDT
(In reply to comment #39)
> The problem is that there are two different 0s, right?  So you have to be a
> little careful with that....
Yes..

> In any case, dmandelin's numbers suggest that painting is the place we really
> need to improve here at this point.
Sure. I don't know how much this jsop_div thing helps. dmandelin's numbers show that SM is fast enough here. Maybe my patch is not worth it and I will just drop it then.
Comment 41 Brendan Eich [:brendan] 2010-09-02 13:07:34 PDT
Jan: don't give up on account of just one benchmark! What bug are you using for your idiv work?

/be
Comment 42 David Mandelin [:dmandelin] 2010-09-02 13:50:01 PDT
Yeah, I wouldn't give up on idiv yet, either. Although the problem does seem to be more on the gfx side, your x|0 patch makes a very noticeable difference here. Any JS wins we can find are worthwhile.
Comment 43 Boris Zbarsky [:bz] 2010-09-02 15:09:11 PDT
Right; I wasn't trying to discourage you.  The 0 thing has come up on other testcases too, so if we can fix it, great.  I just don't expect it to give us 10fps here.  ;)
Comment 44 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-09-02 19:52:50 PDT
On my Mac, we spend 78% of the time in and under JS_CallFunctionValue on trunk. I guess that's just lack of JM. We spend 16% of the time (i.e. most of the rest) in BasicCanvasLayer::Paint just copying the canvas surface to the window, upscaled by by 4.

Now the interesting thing is that BasicCanvasLayer::Paint uses EXTEND_PAD --- which is correct, but for cairo-quartz this always means taking a path involving CGPatterns which is quite likely to be ... not as fast as it could be.
Comment 45 Boris Zbarsky [:bz] 2010-09-02 19:59:30 PDT
> I guess that's just lack of JM. 

Yeah.

Is there a way to deal with the CGPattern/EXTEND_PAD thing?
Comment 46 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-09-02 21:45:12 PDT
Yes. By treating it as EXTEND_NONE (which is completely reasonable, because as it turns out the current code is treating it as EXTEND_REPEAT, which is much more wrong), things get a little bit better. From 16% of my profile I get down to 13.5%. That's taken off about 15% of the paint time, actually more since I get more FPS now. Not great, but worth having. Filed bug 593270.

It's not clear that we can do much better than that using Quartz, though. Now we're just doing a bog-standard CGContextDrawImage, which does scaling, and there's no caching available since the image is different every time.

The good news is that we have some bigger guns in our graphics arsenal now. With D2D, D3D9 or GL enabled, the canvas compositing happens on the GPU. Indeed, in my Mac build if I set MOZ_ACCELERATED=11, painting goes from 13.5% to 2% of the profile.

So I would really like to know how we perform in a JM build with MOZ_ACCELERATED=11 and/or D2D enabled!
Comment 47 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-09-03 06:12:18 PDT
(In reply to comment #46)
> The good news is that we have some bigger guns in our graphics arsenal now.
> With D2D, D3D9 or GL enabled, the canvas compositing happens on the GPU.
> Indeed, in my Mac build if I set MOZ_ACCELERATED=11, painting goes from 13.5%
> to 2% of the profile.

Actually that was overselling the benefit, because on trunk GL-layers, the main thread ends up blocked waiting for the GPU for a while when we call glFlush at the end of each paint. In the profile I did, Shark hadn't sampled the main thread while it was blocked. Fixing that shows that GL layers isn't really making any difference to this benchmarks on Mac.

We shouldn't actually have to call glFlush and block though. If I remove it, performance is great, although unfortunately nothing actually renders on the screen. I filed bug 593342 to look into that.
Comment 48 Boris Zbarsky [:bz] 2010-09-03 07:59:02 PDT
> So I would really like to know how we perform in a JM build with
> MOZ_ACCELERATED=11 and/or D2D enabled!

JM+TM + bug 592631, no moz_accel: 42fps, going down to 39 if I mouse
JM+TM + bug 592631, moz_accel: 55fps, going down to 53 if I mouse

This is without doing anything interesting with glFlush... ;)

"Let's ship it".
Comment 49 Boris Zbarsky [:bz] 2010-09-03 08:05:31 PDT
Note, btw, that CPU usage is about 60% in the latter test, and 97% in the former.  So we're definitely blocking on the flush; it's just _still_ faster than doing the composite ourselves.  At least on my hardware.

Just grinding out the numbers, without accel we take 24ms per frame, of which about half is painting.  With accel we take 18ms per frame.  So accel cut the painting time in half.  If not having to glFlush will further cut it by a factor of 3 (good approximation per comment 46), then we'll be at 14ms per frame or 71fps, with no changes to js engine.  At that point the idiv fix is likely to give a measurable improvement (since even a 2% change will be several fps).
Comment 50 Jan de Mooij [:jandem] (PTO until July 31) 2010-09-04 06:20:50 PDT
(In reply to comment #43)
> Right; I wasn't trying to discourage you.  
No problem.

> The 0 thing has come up on other
> testcases too, so if we can fix it, great.
Problem is that the most recent version no longer does division; it multiplies instead.. Do you remember one of these other benchmarks where this matters? 

Fortunately, my current WIP-patch is very small and hackish. I can polish it when I know it helps somewhere.
Comment 51 Boris Zbarsky [:bz] 2010-09-04 07:24:59 PDT
> Do you remember one of these other benchmarks where this matters? 

Sorry, not offhand....  :(
Comment 52 Brendan Eich [:brendan] 2010-09-04 10:39:38 PDT
Jeff: this bug is meta, although against JS Engine (for want of a "fluidsim" or _sui generis_ component). Please file a painting perf bug and make it block this one. Thanks,

/be
Comment 53 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-09-04 14:51:02 PDT
We don't need a new bug, I filed a bug for the only clear painting issue here.
Comment 54 Brendan Eich [:brendan] 2010-09-04 16:36:46 PDT
(In reply to comment #53)
> We don't need a new bug, I filed a bug for the only clear painting issue here.

Ok, then [painting-perf] goes there, not here.

/be
Comment 55 Konstantin Svist 2010-12-10 20:45:53 PST
Is this still being worked on? I've been checking back on the nighly builds since post @http://blog.mozilla.com/dmandelin/2010/09/08/presenting-jagermonkey/
The primary demo works great (~45-47FPS), but when the alternate drawing mode is used, frame rate takes a dive (~4-15FPS).

P.S. MOZ_ACCELERATED=11 doesn't work here yet (Linux)
Comment 56 Boris Zbarsky [:bz] 2010-12-10 21:13:23 PST
I don't know that anyone's profiled the alternate drawing mode yet.

On Linux, if your drivers are good canvas is already accelerated via XRender.  OF course most drivers on Linux are not good....
Comment 57 Boris Zbarsky [:bz] 2010-12-10 21:14:12 PST
On the other hand, I dunno whether Render accelerates the canvas upscale, which is the bottleneck here.  It might not.
Comment 58 Boris Zbarsky [:bz] 2010-12-10 21:57:36 PST
For what it's worth, on a Mac I see the primary demo at 65fps and the alternate drawing mode at 55fps in a current trunk build.  Chrome 9 dev on the same hardware is 55fps and 45fps respectively, for comparison.

So I'd guess your Linux issues are all graphics.
Comment 59 Konstantin Svist 2010-12-10 23:19:50 PST
Just retested
Fx (Dec 10 build, from http://nightly.mozilla.org/js-preview.html):
* primary mode: 44-47 FPS
* alternate mode: 3-15 FPS

You're right, it finally works with MOZ_ACCELERATED=11! Unfortunately, it works slower :(
* primary mode: 41-44 FPS
* alternate mode: 2-10 FPS


Chromium 8.0.560.0 (from http://repos.fedorapeople.org/repos/spot/chromium/fedora-14/):
* primary mode: 35 FPS (no variation while moving the mouse)
* alternate mode: 15-24 FPS (animation appears smooth, even at 15)

I also tried Chromium with --disable-accelerated-compositing (it's apparently enabled by default), but no major variation in speed
Comment 60 Konstantin Svist 2010-12-16 01:28:13 PST
So, should I file another bug? Or is it a lost cause?
Comment 61 Boris Zbarsky [:bz] 2010-12-16 10:32:03 PST
Probably worth filing one on the Linux graphics issue, yes.  Make sure to include information about your Xorg in the bug report!
Comment 62 Konstantin Svist 2010-12-17 19:13:28 PST
Added bug 620065
Comment 63 Jan de Mooij [:jandem] (PTO until July 31) 2011-05-25 14:18:17 PDT
This does very well with TI. 50+ fps in the browser with default resolution 64, trunk is somewhere between 40-45. With resolution 128 TI+JM gets 35+ fps, slightly faster than Chrome, trunk is at 25 fps.

For the shell test case:

-m -n    : 366 ms
d8       : 405 ms
-m -j -p : 512 ms 
-m       : 721 ms
-j       : 950 ms
Comment 64 Brendan Eich [:brendan] 2011-06-09 11:13:01 PDT
Are we fast yet?

I'm going to throw this back into the pool. Jan, anyone: feel free to take and drive if there are short-term wins. TI is trying to land, so mark FIXED when you can -- no point keeping an unfixable bug around.

/be
Comment 65 Jan de Mooij [:jandem] (PTO until July 31) 2011-09-08 06:31:49 PDT
(In reply to Brendan Eich [:brendan] from comment #64)
> TI is trying to land, so mark FIXED
> when you can -- no point keeping an unfixable bug around.

Just tested this on m-c: 60+ fps and a bit faster than Chrome. Nothing to do here.
Comment 66 Boris Zbarsky [:bz] 2011-09-08 07:31:03 PDT
But it's still only 6fps at "Resolution: 512".

Not sure it's worth worrying about, but if we can get _that_ fast, that would rock.  ;)
Comment 67 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-09-08 22:15:25 PDT
Plus ours looks better than Chrome's because we use bilinear scaling to scale up the canvas.

Note You need to log in before you can comment on or make changes to this bug.