Open Bug 1498485 Opened 3 years ago Updated 1 year ago

hubs.mozilla.com frame rate with an empty scene is too slow in FxR on Oculus Go

Categories

(Core :: JavaScript Engine, defect, P3)

Unspecified
Android
defect

Tracking

()

Tracking Status
geckoview62 --- wontfix
geckoview63 --- wontfix
firefox-esr60 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- affected

People

(Reporter: cpeterson, Unassigned)

References

(Depends on 3 open bugs, )

Details

(Whiteboard: [geckoview:fxr:p1][webvr][qf:p2:responsiveness])

Mozilla Hubs is a key scenario for FxR, but today we can't hit frame rate even when rendering an empty scene.

https://hubs.mozilla.com/
Lars says: "The Google Pixel 1 is an equivalent device to the Oculus Go and would make a good baseline. There are some platform differences around CPU throttling (e.g., affecting the Gecko media stack), but from what the team has been telling me, the perf is roughly the same."
63=wontfix because FxR 1.1 will ship GV 64.
Jeff and Bas, here is the FxR bug about poor Hubs performance on the Oculus Go.
Flags: needinfo?(jgilbert)
Flags: needinfo?(bas)
Here's the empty plane:
http://bit.ly/2zljQxi
Flags: needinfo?(jgilbert)
Priority: -- → P3
Duplicate of this bug: 1494710
(As in bug 1498484, it looks like the Gecko profile is missing symbols here, too.  Might be handy to recapture & be sure they get resolved/uploaded/whatever.)
Flags: needinfo?(jgilbert)
(In reply to Jeff Gilbert [:jgilbert] from comment #5)
> Here's the empty plane:
> http://bit.ly/2zljQxi

It does look like a fair amount of work here is coming from WebGL, in particular some texturing business shows up quite clearly. I don't see any reason why we'd be a lot slower than chromium here though :s.
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #8)
> It does look like a fair amount of work here is coming from WebGL, in
> particular some texturing business shows up quite clearly. I don't see any
> reason why we'd be a lot slower than chromium here though :s.

Does Gecko on a phone (like the Google Pixel 1, whose hardware specs are comparable to the Oculus Go's) have the same WebGL hot spots? Or does this problem appear to be unique to the Oculus Go?
(In reply to Bas Schouten (:bas.schouten) from comment #8)
> (In reply to Jeff Gilbert [:jgilbert] from comment #5)
> > Here's the empty plane:
> > http://bit.ly/2zljQxi
> 
> It does look like a fair amount of work here is coming from WebGL, in
> particular some texturing business shows up quite clearly. I don't see any
> reason why we'd be a lot slower than chromium here though :s.

You're right, thanks for checking me. I was reading the profile wrong.
Assignee: nobody → jgilbert
Flags: needinfo?(jgilbert)
Is the performance difference reproducible in a regular Fennec or GeckoView-example build? From those, we can get profiles with symbols, which should make the analysis here easier. Or is the difference only visible in VR mode?
What's the performance gap between Chrome and Firefox here?
Flags: needinfo?(jgilbert)
Sounds very much like bug 1463904; there's not much we could do there, as it's related to task priorities.
See Also: → 1463904
(In reply to Chris Peterson [:cpeterson] from comment #1)
> Lars says: "The Google Pixel 1 is an equivalent device to the Oculus Go and
> would make a good baseline. There are some platform differences around CPU
> throttling (e.g., affecting the Gecko media stack), but from what the team
> has been telling me, the perf is roughly the same."

hubs.mozilla.com is pretty smooth on my Pixel 1, pretty much consistently showing 60fps. When I move a lot, it occasionally drops to 50fps, but it's still very usable.
Ok, so I hadn't noticed that if you narrow it to "WebGL", it also renormalizes the percentages. As you can see from the profile color stack, it's mostly yellow (JS), not green (graphics) or blue (DOM). Of 30,908ms total, filtering by "WebGL" yields 3,849ms (12.4%).

Of that 3,849ms, drawElements() and clear() are 1,441ms (37%) and 496ms (12%) respectively, but only 340ms and 57ms of that is outside the driver.

There just doesn't seem like a lot of optimization opportunity here.

I took a profile of Nightly Fennec on the spec-equivalent Pixel 1 XL running in mono fullscreen, and there did seem to be more Graphics load by proportion:
https://perfht.ml/2zRFhX5

Also interesting: eglCreateImage and eglCreateSync are taking 400ms out of 9,400ms total, or about 4%, on Fennec, but that's not relevant to this bug or FxR, since VR uses SwapBuffers.
Flags: needinfo?(jgilbert)
64=wontfix because FxR 1.1 is using GV 65 and this issue doesn't block Focus 8.0 from using GV 64.
Should we move this to js?
Flags: needinfo?(jgilbert)
Component: Graphics → JavaScript Engine
Whiteboard: [geckoview:fxr:p1][webvr][qf] → [geckoview:fxr:p1][webvr][qf:p2:responsiveness]
I was going to comment about this being an ARM64-not-having-ion issue.  But the arch in question is ARM32 and Ion definitely shows up in the profile.

Looking at the stack map in inverted mode, I am struck that most of the leaf nodes in the profile stacks, as I scan through, look like they lead into libxul.so and platform code. I really need to see what's going on in the leaves of these profiles. Is it calling into JS impl code, or WebGL code, or other stuff?

Can we get a profile with symbols enabled?
Flags: needinfo?(jgilbert) → needinfo?(cpeterson)
See Also: → arm64-ion
See Also: arm64-ion
(In reply to Kannan Vijayan [:djvj] from comment #17)
> Looking at the stack map in inverted mode, I am struck that most of the leaf
> nodes in profile stacks, as I scan through.. look like they lead into
> libxul.so and platform.  I really need to see what's going on in the leaves
> of these profiles.  Is it calling into JS impl code, or webgl code, or other
> stuff?
> 
> Can we get a profile with symbols enabled?

Lars, is there an FxR engineer that would know about profiling FxR on the Oculus Go headset? Can we get FxR's debug symbols so the Gecko Profiler can symbolicate libxul.so code?
Flags: needinfo?(cpeterson) → needinfo?(larsberg)
All that's needed is to run public FxR and pull the profile. I can do this real quick.
Flags: needinfo?(larsberg)
Empty plane:
http://bit.ly/2AyqFfw

Atrium:
http://bit.ly/2Ax92gq

Atrium looks overwhelmingly JS to me, with just a sliver of green GFX.
Kannan, please see new profiles above.
Flags: needinfo?(kvijayan)
Ok I spent a couple of hours looking at the latest two profiles.

The first one is interesting.  There's no clear _single_ story here, but a few things stand out.  The first is that Ion and Baseline compilation account for 5.6% of the TOTAL execution time across the entire profile.  There's another 1% spent in IonCacheIR compilation, bringing it up to 6.6% of total time.  Another 1.6% in ReprotectRegion (bad stacks leave it unmoored, but it almost certainly comes from compiler code) brings us to 8.2% of compile-related time.

The rest of the profile is a grab bag of stuff, mostly related to interpreter + slowpath execution (property lookups, calls, etc.), gc, and painting / layout/etc.


The second one is more interesting.  A full 9% of the time across the entire profile is in ReprotectRegion.  This is remarkable, and can be attributed mostly to compiles and partly to GCs.  GC doesn't account for the bulk of the calls, however, and is dwarfed by the respective compilers' use of ReprotectRegion.

There are a couple of very clear signals coming out of this:

1. ReprotectRegion, specifically W^X memory protection, needs to be dealt with.  I don't see any worker threads in the profile, so I'm not sure if we didn't capture them or GeckoView simply is not using them in this case.

2. BaselineCompiles need to be dealt with.  Baseline compiles are showing up heavily in profiles and we can improve greatly if we are more fine-grained about the hot code we compile (i.e. cold blocks in hot functions should not be compiled).

3. Once again, there should be general improvements from a faster interpreter - lower overhead of Interp <=> JIT transitions, and simply faster overall performance.
Flags: needinfo?(kvijayan)
Sharing this profile as well, as asked in https://github.com/MozillaReality/FirefoxReality/issues/878

This one is specifically about hitching while spawning media (ducks) in Hubs. See the above github issue for repro steps.

https://perfht.ml/2BbdqSx
(In reply to Kannan Vijayan [:djvj] from comment #22)
> The rest of the profile is a grab bag of stuff, mostly related to
> interpreter + slowpath execution (property lookups, calls, etc.), gc, and
> painting / layout/etc.

Could you clarify this point a bit more? Are there any resources / best practices to avoid slowpath execution?
> Could you clarify this point a bit more? Are there any resources / best practices to avoid slowpath execution?

Nothing obvious I could suggest.  The early-phase performance issue is something that simply requires faster execution of cold JS before it warms up.  Jan's compiled interpreter (https://bugzilla.mozilla.org/show_bug.cgi?id=1499324) should help here quite a bit.


I took a look at the latest profile just now.  The key thing that stands out to me is in the Markers tab.  At ~6s, ~12s, ~18s - roughly six-second increments, there seems to be a GC that wipes out all of our compiled scripts, which we subsequently recompile.

It strongly appears that around every 6 seconds we are throwing away all of our compiled code, then hitting the interpreter, then recompiling with Baseline, then recompiling with Ion, and so on.

This is apparent when we do an inverse view of the samples and notice that ReprotectRegion, associated with recompilation of Baseline scripts, Ion scripts, and Ion ICs, is the single most prominent item.

Here is a general list of things that probably relate to improving this:

1. Stop throwing away code on GC, or at least be smarter about it and keep recently executed code.  I'm needinfoing jonco about this.

2. When we do throw code away, we can recover faster if our interpreter is faster.  That's primarily going to be Jan's interpreter, which is likely to take a while, probably landing sometime in 2019.

3. A longstanding issue is that our codegen on ARM is very poor.  CraneLift is a long-term thing that should improve things here, but I don't know what the timeframe for it is.

4. We _need_ to get ReprotectRegion and mprotect off the main thread.  We are spending 4% of our total time in this call alone.  

5. I suspect that our blind scheduling of heavyweight compiles on background threads is hurting us here - as the Oculus Go would have limited resources, and that's precisely where background compiles end up bottlenecking and taking a long time.  The scheduling issue needs to be investigated - I suspect but cannot confirm that we are losing a lot of potential performance because of this.

On the plus side, it turns out that event queue scheduling is likely to become a priority for 2019, as it's responsible for a significant chunk of our page-load performance issues.

I will remember to raise this general issue with scheduling of JS background tasks when we discuss how to staff and implement the scheduler work we will need to do in 2019.
Depends on: 1514113
(In reply to Kannan Vijayan [:djvj] from comment #25)
> 4. We _need_ to get ReprotectRegion and mprotect off the main thread.  We
> are spending 4% of our total time in this call alone.  

Since that sounds like an important and discrete task, I filed new bug 1514113.
(In reply to Kannan Vijayan [:djvj] from comment #25)
> Here are the general list of things that probably relate to improving this:
> 
> 1. Stop throwing away code on GC, or at least be smarter about it and keep
> recently executed code.  I'm needinfoing jonco about this.

                               ^^^^^^^^^^^^^^^^^
Flags: needinfo?(jcoppeard)
Depends on: 1514281
I filed bug 1514281 for this.
Flags: needinfo?(jcoppeard)

Jeff took this bug for investigation when we thought the problem was in the gfx area. It appears to be a JS issue.

Assignee: jgilbert → nobody
Depends on: 1537879
Depends on: 1537951
Depends on: 1537957
Depends on: 1537961
Depends on: 1537967
Depends on: 1538260
Depends on: 1537550
Depends on: 1536672
Depends on: 1412202