Closed Bug 1334672 Opened 8 years ago Closed 7 years ago

Investigate performance of Firefox on Freeciv 3D benchmark

Categories

(Core :: JavaScript Engine, defect)

51 Branch
x86_64
Windows 10
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: andreas.rosdal, Unassigned)

References

()

Details

User Agent: Mozilla/5.0 (Linux; Android 7.1.1; Nexus 6P Build/N4F26I) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.91 Mobile Safari/537.36

Steps to reproduce:

The open source game Freeciv now has a new 3D WebGL version playable on https://play.freeciv.org . The game uses the three.js 3D engine. Currently the performance and framerate is significantly better in Chrome than Firefox. I'm using the latest stable version of both browsers on Windows 10. 

There is a benchmark of the WebGL renderer, which is available here: Go to https://play.freeciv.org then click singleplayer 3D WebGL, then Customize game - game settings - webgl - benchmark. The benchmark will take about one minute and return a fps score. Typically Chrome will get about twice as good fps score as Firefox. Multiple users of the game have reported similar performance with WebGL on Firefox compared to Chrome on this game.

I hope this can be used to improve the WebGL performance in Firefox.
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64
So I I'm getting pretty wildly different times on this benchmark on Mac: I got 10fps, then 18fps, then 11fps (Chrome got 17fps in the one run I did there).

For the 11fps run a profile of the mainthread time looks about like this, out of a total time of about 26s:

2.3s garbage collection from refresh ticks calling GCRuntime::notifyDidPaint.
1.9s painting via refresh tick.
1.2s calling requestAnimationFrame callbacks.

I see all three of those happening from both VsyncChild::RecvNotify and CompositorBridgeChild::RevDidComposite.  Is that expected?  Or are we ticking the refresh driver too often?
Flags: needinfo?(vladimir)
Then we also have:

4.47s IonGetPropertyIC::update (at least half atomizing strings)
4.3s SetPropertyIC::update (mostly SetObjectElement)
1.55s processing websocket events (mostly running JS).
0.5s calls to obj_defineProperties from jitcode.
0.3s jit::DoCallFallback 
0.3s getting clientWidth on HTML elements and doing the resulting layout.
0.3s js::GetIteratorObject
0.3s js::jit::MallocWrapper
0.2s GC/CC from timers firing
0.2s various dom methods being called via GenericBindingMethod (i.e. not from ion code, probably).
0.2s jit::IonCompileScriptForBaseline
0.2s jit::DoGetPropGeneric
0.2s setting innerHTML
0.1s jit::LazyLinkTopActivation

And then the webgl bits:

1.1s WebGL bindBuffer (about half GLContextCGL::MakeCurrentImpl and half the underlying
     gleBindBufferObject_Thread).
0.9s WebGL uniformMatrix4fv (again, over half making current, but only 1/10 glUniformMatrix4fv_Comp;
     there's also 1/6 of this time spent in ValidatUniformMatrixArraySetter and some other minor bits).
0.7s WebGL vertexAttribPointer (about 2/3 making current).
0.6s WebGL drawArrays.  About equal splits between making current, DrawArrays_check, and the underlying
     glDrawArraysInstanced_PackThread.
0.3s WebGL uniformMatrix3fv (again, half making current and some validation).

There's some long tail too, but the above really does account for almost all the time.

So my initial reaction is that modulo whatever is going on with the refresh driver the main problems here are the JS gets/sets that miss the IC, at least at first glance.  Certainly there's no way to reduce time by 2x by making the GL stuff any faster...

Jan, do you know who could take a look at that?
Status: UNCONFIRMED → NEW
Component: Untriaged → JavaScript Engine
Ever confirmed: true
Flags: needinfo?(jdemooij)
Product: Firefox → Core
And I know the MakeCurrent for GL thing has come up before, at least on Mac.  I don't see an obvious bug tracking it, though...  Jeff, do you know of one?
Flags: needinfo?(jgilbert)
> I see all three of those happening from both VsyncChild::RecvNotify and CompositorBridgeChild::RevDidComposite.

Specifically, VsyncChild::RecvNotify calls VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyVsync which calls TickRefreshDriver which calls (somewhat indirectly) RefreshDriverTimer::Tick, etc, which calls nsRefreshDriver::Tick on a bunch of refresh drivers.

For the other, CompositorBridgeChild::RecvDidComposite calls TabChild::DidComposite calls ClientLayerManager::DidComposite calls nsRefreshDriver::NotifyTransactionCompleted calls nsRefreshDriver::FinishedWaitingForTransaction calls nsRefreshDriver::DoRefresh which does a Tick().  

I guess this is all working as expected, since FinishedWaitingForTransaction will only Tick() if we earlier bailed out of a Tick() due to waiting for paints...  So unlikely that we're over-ticking.
Summary: Performance of WebGL in Firefox on Freeciv 3D benchmark → Investigatep performance of Firefox on Freeciv 3D benchmark
Summary: Investigatep performance of Firefox on Freeciv 3D benchmark → Investigate performance of Firefox on Freeciv 3D benchmark
Thanks for investigating this! I hope that WebGL 2 will improve the performance of this game, due to improved instancing support.
The benchmark results can be a bit variable due to network latency. When running locally on my developer machine the randomness goes away, but the tendency of Chrome being faster is still there.
Tom, do you happen to have a build with your CacheIR logging patch? This might be a good testcase.

> 4.47s IonGetPropertyIC::update (at least half atomizing strings)
> 4.3s SetPropertyIC::update (mostly SetObjectElement)

Hm this suggests we probably have a megamorphic GetElem/SetElem and disable the IonGetPropertyIC. Annoying to fix especially if most of the time is spent atomizing...
Flags: needinfo?(evilpies)
This is indeed highly polymorphic for GetProp at least, we have locations where we try to attach some NativeSlot IC over 1000 times, especially also idempotent stubs. GetElem seems to run into sparse elements a lot, or something else that prevents us from optimizing a simple int32 access. SetProp is also not pretty, but without NativeAddSlot in CacheIR it's hard to say if we actually fail to attach stubs.

I pushed my CacheIR patches to try https://treeherder.mozilla.org/#/jobs?repo=try&revision=9bbd56493bb703573f4afa6bb97ef004f9633619 so at some point there should be some builds.
Flags: needinfo?(evilpies)
> This is indeed highly polymorphic for GetProp at least

What do the strings coming through look like?  No matter what we do, this sort of polymorphic access might be worth fixing in general in the page to improve performance in all engines...
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #1)
> I see all three of those happening from both VsyncChild::RecvNotify and
> CompositorBridgeChild::RevDidComposite.  Is that expected?  Or are we
> ticking the refresh driver too often?

:mchang can probably answer this best.  In my abandoned vsync rewrite in bug 1184283 I consolidated all this to only ever tick off vsync, because I think I had the same questions.
Flags: needinfo?(vladimir) → needinfo?(mchang)
The GetProp/GetElem IC hits the slow path millions of times here:

  (b){b=b.uuid;var c=a[b];void 0===c&&(c={},a[b]=c);return c}

I think that's this function in three.js: https://github.com/mrdoob/three.js/blob/bf75413322057d8803afac240a59abc7d94f1d20/src/renderers/webgl/WebGLProperties.js#L11-L25

uuid is a UUID string so this lookup is polymorphic and it's difficult to optimize this for JS engines. We could make this a bit faster, but I don't expect huge wins from that. Does Freeciv really need to call this function so many times or could it cache the results somehow?
Flags: needinfo?(jdemooij)
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #4)
> > I see all three of those happening from both VsyncChild::RecvNotify and CompositorBridgeChild::RevDidComposite.
> 
> Specifically, VsyncChild::RecvNotify calls
> VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyVsync which calls
> TickRefreshDriver which calls (somewhat indirectly)
> RefreshDriverTimer::Tick, etc, which calls nsRefreshDriver::Tick on a bunch
> of refresh drivers.
> 
> For the other, CompositorBridgeChild::RecvDidComposite calls
> TabChild::DidComposite calls ClientLayerManager::DidComposite calls
> nsRefreshDriver::NotifyTransactionCompleted calls
> nsRefreshDriver::FinishedWaitingForTransaction calls
> nsRefreshDriver::DoRefresh which does a Tick().  
> 
> I guess this is all working as expected, since FinishedWaitingForTransaction
> will only Tick() if we earlier bailed out of a Tick() due to waiting for
> paints...  So unlikely that we're over-ticking.

This is correct. If you're seeing nsRefreshDriver::DoRefresh, then it means when vsync is ticking the refresh driver normally, it won't actually tick since the refresh driver can't tick more than one frame ahead of the compositor [1]. 

[1] http://searchfox.org/mozilla-central/source/layout/base/nsRefreshDriver.cpp#1742
Flags: needinfo?(mchang)
So is this bit from my profile:

4.47s IonGetPropertyIC::update (at least half atomizing strings)
4.3s SetPropertyIC::update (mostly SetObjectElement)

pretty much all the uuid bits?
Flags: needinfo?(jdemooij)
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #12)
> So is this bit from my profile:
> 
> 4.47s IonGetPropertyIC::update (at least half atomizing strings)
> 4.3s SetPropertyIC::update (mostly SetObjectElement)
> 
> pretty much all the uuid bits?

I think so. I'll land bug 1328140 in a few days and check if it has any effect on this.
Depends on: 1328140
Thanks for making this improvement! I will give it a try on a nightly build.
For what it's worth, now I'm seeing 12s time or so (instead of 26).  The IC situation is much better; maybe a total of 2s for the getprop IC, and less than 300ms for the setprop IC (this seems too good to be true!).

Anyway, definitely worth remeasuring.
These are the latest benchmark results from my testing:

Chrome 57:    39fps
Firefox 52.0.1:  31fps
Firefox nightly 55.0a1 (2017-03-23): 33fps

This is running Freeciv-web locally on Windows 10.

These is still a significant difference between Chrome and Firefox.
Microsoft Edge: 27 fps
Explorer 11: 18 fps
Yeah this is much better now. In a profile I see we're spending a few % doing megamorphic lookups in  GetNativeDataProperty(ByValue), but that path is faster than what we did before (the IC fallback path).

Atomization in GetNativeDataPropertyByValue is not being fully inlined which is silly, I'll file a bug for that. There's also bug 1346217 and time doing GC, but at this point time in DOM/WebGL/Graphics is probably also becoming an issue.
Flags: needinfo?(jdemooij)
Depends on: 1350760
Andreas do you see any improvements with the latest Nightly? Bug 1350760 should have improved this a bit.

(Also you might want to set the JSGC_DISABLE_POISONING=1 environment var to match non-Nightly builds a bit better.)
Yes, this is much better in the nightly build of Firefox. You may close this bugreport now if you want to. Thanks!

This is the result of my latest benchmark:

Chrome 57:   44 fps
Firefox nightly:   42 fps
Edge:   39 fps
Internet Explorer 11:   28 fps
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(jgilbert)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.