Open
Bug 1498485
Opened 6 years ago
Updated 6 months ago
hubs.mozilla.com frame rate with an empty scene is too slow in FxR on Oculus Go
Categories
(Core :: JavaScript Engine, defect, P3)
Tracking
()
NEW
Performance Impact: medium
People
(Reporter: cpeterson, Unassigned)
References
(Depends on 3 open bugs)
Details
(Keywords: perf:responsiveness, Whiteboard: [geckoview:fxr:p1][webvr])
Mozilla Hubs is a key scenario for FxR, but today we can't hit the target frame rate even when rendering an empty scene.
https://hubs.mozilla.com/
Reporter
Comment 1•6 years ago
Lars says: "The Google Pixel 1 is an equivalent device to the Oculus Go and would make a good baseline. There are some platform differences around CPU throttling (e.g., affecting the Gecko media stack), but from what the team has been telling me, the perf is roughly the same."
Reporter
Comment 2•6 years ago
63=wontfix because FxR 1.1 will ship GV 64.
status-firefox63:
--- → wontfix
Comment hidden (typo)
Reporter
Comment 4•6 years ago
Jeff and Bas, here is the FxR bug about poor Hubs performance on the Oculus Go.
Updated•6 years ago
Flags: needinfo?(jgilbert)
Flags: needinfo?(bas)
Updated•6 years ago
Priority: -- → P3
Comment 7•6 years ago
(As in bug 1498484, it looks like the Gecko profile is missing symbols here, too. Might be handy to recapture & be sure they get resolved/uploaded/whatever.)
Flags: needinfo?(jgilbert)
Comment 8•6 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #5)
> Here's the empty plane:
> http://bit.ly/2zljQxi
It does look like a fair amount of work here is coming from WebGL, in particular some texturing business shows up quite clearly. I don't see any reason why we'd be a lot slower than chromium here though :s.
Flags: needinfo?(bas)
Reporter
Comment 9•6 years ago
(In reply to Bas Schouten (:bas.schouten) from comment #8)
> It does look like a fair amount of work here is coming from WebGL, in
> particular some texturing business shows up quite clearly. I don't see any
> reason why we'd be a lot slower than chromium here though :s.
Does Gecko on a phone (like the Google Pixel 1, whose hardware specs are comparable to the Oculus Go's) have the same WebGL hot spots? Or does this problem appear to be unique to the Oculus Go?
Comment 10•6 years ago
(In reply to Bas Schouten (:bas.schouten) from comment #8)
> (In reply to Jeff Gilbert [:jgilbert] from comment #5)
> > Here's the empty plane:
> > http://bit.ly/2zljQxi
>
> It does look like a fair amount of work here is coming from WebGL, in
> particular some texturing business shows up quite clearly. I don't see any
> reason why we'd be a lot slower than chromium here though :s.
You're right, thanks for checking me. I was reading the profile wrong.
Assignee: nobody → jgilbert
Flags: needinfo?(jgilbert)
Comment 11•6 years ago
Is the performance difference reproducible in a regular Fennec or GeckoView-example build? From those, we can get profiles with symbols, which should make the analysis here easier. Or is the difference only visible in VR mode?
What's the performance gap between Chrome and Firefox here?
Flags: needinfo?(jgilbert)
Comment 12•6 years ago
Sounds very much like bug 1463904. There's not much we could do there, as it has to do with task priorities.
See Also: → 1463904
Comment 13•6 years ago
(In reply to Chris Peterson [:cpeterson] from comment #1)
> Lars says: "The Google Pixel 1 is an equivalent device to the Oculus Go and
> would make a good baseline. There are some platform differences around CPU
> throttling (e.g., affecting the Gecko media stack), but from what the team
> has been telling me, the perf is roughly the same."
hubs.mozilla.com is pretty smooth on my Pixel 1, pretty much consistently showing 60fps. When I move a lot, it occasionally drops to 50fps, but it's still very usable.
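For reference, the frame rates mentioned above translate into per-frame time budgets as sketched below. This is only an illustration; the helper name `frame_budget_ms` is made up for this example and is not part of any profiler or Gecko API.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# The two frame rates reported for the Pixel 1 in this comment.
for fps in (60, 50):
    print(f"{fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

Dropping from 60fps to 50fps means each frame takes roughly 3.3 ms longer than the 60fps budget allows.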
Comment 14•6 years ago
Ok, so I didn't notice that if you narrow it to "WebGL", it also renormalizes the percentages. As you can see from the profile color stack, it's mostly yellow (js), not green (graphics) or blue (dom). Of 30,908ms total, filtering by "WebGL" yields 3,849ms (12.4%).
Of that 3,849ms, drawElements() and clear() are 1,441ms (37%) and 496ms (12%), but only 340ms and 57ms respectively of that is outside the driver.
There just doesn't seem to be a lot of optimization opportunity here.
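The arithmetic behind that conclusion can be re-derived with a short sketch. The millisecond figures are the ones quoted in this comment; the variable and function names are illustrative only. Shares are recomputed from the rounded millisecond values, so they may differ from the quoted percentages in the last digit.

```python
# Figures quoted in this comment (milliseconds).
TOTAL_MS = 30_908          # whole profile
WEBGL_MS = 3_849           # samples matching the "WebGL" filter

# call name -> (total ms in the call, ms of that spent outside the driver)
CALLS = {
    "drawElements": (1_441, 340),
    "clear": (496, 57),
}


def share(part_ms: float, whole_ms: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part_ms / whole_ms


webgl_share = share(WEBGL_MS, TOTAL_MS)

# Time in these two calls that is spent in Gecko rather than in the driver,
# i.e. the portion that Gecko-side optimization could affect at all.
outside_driver_ms = sum(outside for _total, outside in CALLS.values())
outside_driver_share = share(outside_driver_ms, TOTAL_MS)

print(f"WebGL share of profile: {webgl_share:.1f}%")
print(f"Outside-driver time in drawElements()+clear(): {outside_driver_ms} ms "
      f"({outside_driver_share:.1f}% of the profile)")
```

Under these numbers, the non-driver part of the two hottest WebGL calls is a little over 1% of the whole profile, which is consistent with the remark that there is little to optimize on the graphics side.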
I took a profile of Nightly Fennec on the spec-equivalent Pixel 1 XL running in mono fullscreen, and there did seem to be more Graphics load by proportion:
https://perfht.ml/2zRFhX5
Also interesting is that eglCreateImage and eglCreateSync are taking 400ms out of 9,400ms total, or about 4%, on Fennec, but that's not relevant to this bug or FxR, since VR uses SwapBuffers.
Flags: needinfo?(jgilbert)
Reporter
Comment 15•6 years ago
64=wontfix because FxR 1.1 is using GV 65 and this issue doesn't block Focus 8.0 from using GV 64.
Updated•6 years ago
Component: Graphics → JavaScript Engine
Whiteboard: [geckoview:fxr:p1][webvr][qf] → [geckoview:fxr:p1][webvr][qf:p2:responsiveness]
Comment 17•6 years ago
I was going to comment about this being an ARM64-not-having-ion issue. But the arch in question is ARM32 and Ion definitely shows up in the profile.
Looking at the stack map in inverted mode, I am struck that most of the leaf nodes in profile stacks, as I scan through.. look like they lead into libxul.so and platform. I really need to see what's going on in the leaves of these profiles. Is it calling into JS impl code, or webgl code, or other stuff?
Can we get a profile with symbols enabled?
Flags: needinfo?(jgilbert) → needinfo?(cpeterson)
See Also: → arm64-ion
Reporter
Comment 18•6 years ago
(In reply to Kannan Vijayan [:djvj] from comment #17)
> Looking at the stack map in inverted mode, I am struck that most of the leaf
> nodes in profile stacks, as I scan through.. look like they lead into
> libxul.so and platform. I really need to see what's going on in the leaves
> of these profiles. Is it calling into JS impl code, or webgl code, or other
> stuff?
>
> Can we get a profile with symbols enabled?
Lars, is there an FxR engineer that would know about profiling FxR on the Oculus Go headset? Can we get FxR's debug symbols so the Gecko Profiler can symbolicate libxul.so code?
Flags: needinfo?(cpeterson) → needinfo?(larsberg)
Comment 19•6 years ago
All that's needed is to run public FxR and pull the profile. I can do this real quick.
Flags: needinfo?(larsberg)
Comment 20•6 years ago
Empty plane:
http://bit.ly/2AyqFfw
Atrium:
http://bit.ly/2Ax92gq
Atrium looks overwhelmingly JS to me, with just a sliver of green GFX.
Comment 22•6 years ago
Ok I spent a couple of hours looking at the latest two profiles.
The first one is interesting. There's no clear _single_ story here, but a few things stand out. The first is that Ion and Baseline compilation account for 5.6% of the TOTAL execution time across the entire profile. There's another 1% spent in IonCacheIR compilation, bringing it up to 6.6% of total time. Another 1.6% in ReprotectRegion (bad stacks leave it unmoored, but it almost definitely comes from compiler code) takes compile-related time to 8.2% of the total.
The rest of the profile is a grab bag of stuff, mostly related to interpreter + slowpath execution (property lookups, calls, etc.), gc, and painting / layout/etc.
The second one is more interesting. A full 9% of the time across the entire profile is in ReprotectRegion. This is incredible, and can be attributed mostly to compiles and partly to GCs. GC doesn't account for the bulk of the calls, however, and is dwarfed by the respective compilers using ReprotectRegion.
There are a couple of very clear signals coming out of this:
1. ReprotectRegion, specifically W^X memory protection, needs to be dealt with. I don't see any worker threads in the profile, so I'm not sure if we didn't capture them or GeckoView simply is not using them in this case.
2. Baseline compiles need to be dealt with. They are showing up heavily in profiles, and we can improve greatly if we are more fine-grained about the hot code we compile (i.e., cold blocks in hot functions should not be compiled).
3. Once again, there should be general improvements from a faster interpreter - lower overhead of Interp <=> JIT transitions, and simply faster overall performance.
Flags: needinfo?(kvijayan)
Comment 23•6 years ago
Sharing this profile as well, as asked in https://github.com/MozillaReality/FirefoxReality/issues/878
This one is specifically about hitching while spawning media (ducks) in Hubs. See the above github issue for repro steps.
https://perfht.ml/2BbdqSx
Comment 24•6 years ago
(In reply to Kannan Vijayan [:djvj] from comment #22)
> The rest of the profile is a grab bag of stuff, mostly related to
> interpreter + slowpath execution (property lookups, calls, etc.), gc, and
> painting / layout/etc.
Could you clarify this point a bit more? Are there any resources / best practices to avoid slowpath execution?
Comment 25•6 years ago
> Could you clarify this point a bit more? Are there any resources / best practices to avoid slowpath execution?
Nothing obvious I could suggest. The early-phase performance issue is something that simply requires faster execution of cold JS before it warms up. Jan's compiled interpreter (https://bugzilla.mozilla.org/show_bug.cgi?id=1499324) should help here quite a bit.
I took a look at the latest profile just now. The key thing that stands out to me is in the Markers tab. At ~6s, ~12s, ~18s - roughly six-second increments, there seems to be a GC that wipes out all of our compiled scripts, which we subsequently recompile.
It strongly looks like, roughly every 6 seconds, we are throwing away all of our compiled code, then hitting the interpreter, then recompiling with Baseline, then recompiling with Ion, and so on.
This is apparent when we do an inverse view of the samples and notice that ReprotectRegion, associated with recompilation of Baseline scripts, Ion scripts, and Ion ICs, is the single most prominent item.
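One way to sanity-check the "roughly every six seconds" observation is to look at the gaps between GC marker timestamps. The sketch below is only an illustration: the timestamps are made-up values resembling the ~6s, ~12s, ~18s markers, and the helper names and the 15% tolerance are assumptions, not anything exported by the Gecko Profiler.

```python
from statistics import mean


def gc_intervals(marker_times_s):
    """Return the gaps, in seconds, between consecutive GC markers."""
    ordered = sorted(marker_times_s)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(marker_times_s, tolerance=0.15):
    """Return True if every gap is within `tolerance` (a fraction) of the mean gap."""
    gaps = gc_intervals(marker_times_s)
    if len(gaps) < 2:
        return False  # not enough markers to call anything periodic
    average = mean(gaps)
    return all(abs(gap - average) <= tolerance * average for gap in gaps)


# Hypothetical marker timestamps (seconds), for illustration only.
markers = [6.1, 12.0, 18.2, 24.1]
print("mean gap (s):", round(mean(gc_intervals(markers)), 2))
print("periodic:", looks_periodic(markers))
```

A steady gap like this would point to a timer-driven GC rather than one triggered by allocation pressure, which is worth confirming against the GC reasons shown in the Markers tab.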
Here is the general list of things that probably relate to improving this:
1. Stop throwing away code on GC, or at least be smarter about it and keep recently executed code. I'm needinfoing jonco about this.
2. When we do throw code away, we can recover faster if our interpreter is faster. That's primarily going to be Jan's interpreter, which is likely to take a while, probably landing sometime in 2019.
3. A longstanding issue is that our codegen on ARM is very poor. Cranelift is a long-term thing that should improve things here, but I don't know what the timeframe for it is.
4. We _need_ to get ReprotectRegion and mprotect off the main thread. We are spending 4% of our total time in this call alone.
5. I suspect that our blind scheduling of heavyweight compiles on background threads is hurting us here - as the Oculus Go would have limited resources, and that's precisely where background compiles end up bottlenecking and taking a long time. The scheduling issue needs to be investigated - I suspect but cannot confirm that we are losing a lot of potential performance because of this.
On the plus side, it turns out that event queue scheduling is likely to become a priority for 2019, as it's responsible for a significant chunk of our page-load performance issues.
I will remember to raise this general issue with scheduling of JS background tasks when we discuss how to staff and implement the scheduler work we will need to do in 2019.
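To make the cost of item 1 concrete, here is a toy model of a single hot function moving through interpreter, Baseline, and Ion tiers, with a GC at the end of each interval. It is a sketch under loud assumptions: the warm-up thresholds, the call counts, and the function name `count_compiles` are all hypothetical and are not SpiderMonkey's actual values or APIs.

```python
# Hypothetical warm-up thresholds, chosen for illustration only.
BASELINE_THRESHOLD = 10
ION_THRESHOLD = 100


def count_compiles(calls_per_interval, intervals, discard_on_gc):
    """Simulate one hot function and return (compiles, interpreted_calls).

    A GC runs at the end of every interval. If `discard_on_gc` is True, the GC
    throws away the compiled code and resets the warm-up counter, so the
    function has to climb the tiers again.
    """
    warmup = 0
    tier = "interp"
    compiles = 0
    interpreted_calls = 0

    for _ in range(intervals):
        for _ in range(calls_per_interval):
            warmup += 1
            if tier == "interp":
                interpreted_calls += 1
                if warmup >= BASELINE_THRESHOLD:
                    tier = "baseline"
                    compiles += 1
            elif tier == "baseline" and warmup >= ION_THRESHOLD:
                tier = "ion"
                compiles += 1
        if discard_on_gc:
            tier = "interp"
            warmup = 0

    return compiles, interpreted_calls


print("discard on GC:", count_compiles(1000, 3, discard_on_gc=True))
print("keep code    :", count_compiles(1000, 3, discard_on_gc=False))
```

In this model every discarding GC costs another full round of interpreted calls plus a Baseline and an Ion compile per hot function, and each of those compiles also goes through ReprotectRegion, which is why keeping recently executed code should help on both counts.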
Reporter
Comment 26•6 years ago
(In reply to Kannan Vijayan [:djvj] from comment #25)
> 4. We _need_ to get ReprotectRegion and mprotect off the main thread. We
> are spending 4% of our total time in this call alone.
Since that sounds like an important and discrete task, I filed new bug 1514113.
Comment 27•6 years ago
(In reply to Kannan Vijayan [:djvj] from comment #25)
> Here are the general list of things that probably relate to improving this:
>
> 1. Stop throwing away code on GC, or at least be smarter about it and keep
> recently executed code. I'm needinfoing jonco about this.
^^^^^^^^^^^^^^^^^
Flags: needinfo?(jcoppeard)
Comment 29•6 years ago
Jeff took this bug for investigation when we thought the problem was in the gfx area. It appears to be a JS issue.
Assignee: jgilbert → nobody
Updated•3 years ago
Performance Impact: --- → P2
Keywords: perf:responsiveness
Whiteboard: [geckoview:fxr:p1][webvr][qf:p2:responsiveness] → [geckoview:fxr:p1][webvr]
Updated•2 years ago
Severity: normal → S3