Closed Bug 1498484 Opened 7 years ago Closed 1 year ago

Sketchfab frame rate is too slow in FxR on Oculus Go

Categories

(Core :: JavaScript: GC, defect, P2)

Unspecified
Android
defect

Tracking

()

RESOLVED INCOMPLETE
Performance Impact low
Tracking Status
geckoview62 --- wontfix
geckoview63 --- wontfix
firefox-esr60 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- fix-optional
firefox66 --- affected

People

(Reporter: cpeterson, Unassigned)

References

()

Details

(Whiteboard: [geckoview:fxr:p1][webvr])

Rendering a basic Sketchfab model is a key scenario for FxR, but we can't hit the target frame rate today.
Lars says: "The Google Pixel 1 is an equivalent device to the Oculus Go and would make a good baseline. There are some platform differences around CPU throttling (e.g., affecting the Gecko media stack), but from what the team has been telling me, the perf is roughly the same."
See Also: → 1498485
Andre says: "Go to Sketchfab.com and test the first model shown on the page - used to be a robot, now is some sort of orange Sheep/Bull (Sheebul?). The perf issues were specific to Oculus Go, but we can't compare to the VIVE Focus easily because it has different hardware (the Qualcomm 835 versus the 820 in the Go)."
63=wontfix because FxR 1.1 will ship GV 64.
Jeff and Bas, here is the FxR bug about poor Sketchfab performance on the Oculus Go.
Flags: needinfo?(jgilbert)
Flags: needinfo?(bas)
FxR Profile: http://bit.ly/2zhnXuk. I think this was just the Sketchfab home page, with a floating robot model.
Flags: needinfo?(jgilbert)
Component: Graphics → JavaScript Engine
Whiteboard: [geckoview:fxr:p1][webvr][qf] → [geckoview:fxr:p1][webvr][qf:js:investigate]
Jeff: that profile doesn't have C++ symbols for anything in libxul.so -- see e.g. https://perfht.ml/2znUOxs -- so it looks like they failed to upload. It might be useful to try capturing again and to make sure the symbols get resolved before clicking the "share" link (it sometimes takes a minute or two).
Flags: needinfo?(jgilbert)
(Note: if this is a local build, I think you have to build with ac_add_options --enable-debug-symbols (or --enable-debug) in your mozconfig, in order for symbols to be available.)
(In reply to Daniel Holbert [:dholbert] from comment #7)
> (Note: if this is a local build, I think you have to build with ac_add_options --enable-debug-symbols (or --enable-debug) in your mozconfig, in order for symbols to be available.)

It was built with --enable-debug-symbols --disable-debug.
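For reference, a minimal mozconfig fragment combining the options discussed above (the --enable-optimize line is an assumption, added so the profile reflects release-like behavior) might look like:

```
# Keep native stack symbols without a full debug build
ac_add_options --enable-debug-symbols
ac_add_options --disable-debug
# Assumed: keep optimizations so profiles resemble release builds
ac_add_options --enable-optimize
```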
Flags: needinfo?(jgilbert) → needinfo?(dholbert)
Cool. You might just need to give it a little more time before hitting the "share" button then. (There should be a little notification at the top of the profile display that says "Symbolicating call stacks..." or something like that. This took 10-15 seconds for me in a local build just now -- though it also blocks the "share" button from displaying, so I'm not entirely sure what could've gone wrong in your shared profile.) Stock nightly may be more reliable, if it's possible for you to use that, though local builds with --enable-debug-symbols should work, too, IIUC. If they're not, you should ping markus or someone else who works on gecko profiler integration to see what's going on.
Flags: needinfo?(dholbert)
Unfortunately this won't work - symbolication of local Android builds is currently not implemented. I've filed bug 1504101 for the next step there. For now, the only way to obtain profiles with symbols for Android builds is to have the symbols exist on the mozilla symbol server. There are two ways to achieve that:
1. Profile an official Nightly. Symbols for Nightlies are automatically uploaded to the mozilla symbol server.
2. Push to try and run a "Sym" job for your build. Then download the try build, run it on your Android device, and profile that build. The Sym job will have uploaded your build's symbols to the mozilla symbol server.
Interestingly enough, our desktop performance appears to be better than Chrome's for the Sketchfab front page model (and 1 or 2 others), though the profiles look very different from what I'm seeing in Jeff's (although the lack of symbols makes it hard to tell). What I can see is that there's a bunch of invalidates coming from a small area of the page's JS code, and Ion compiles are clearly showing up in the profile. I'm guessing there's certainly room for JS improvements here. What do you think, Kannan?
Flags: needinfo?(bas) → needinfo?(kvijayan)
64=wontfix because FxR 1.1 is using GV 65 and this issue doesn't block Focus 8.0 from using GV 64.
Flags: needinfo?(kvijayan)
That was of the model at the top of the homepage. This is in VR mode: https://perfht.ml/2AdvvyP
Things that stand out to me:
1. There's a 25-second GCMajor going on in the background.
2. There are long, blocking Baseline compilations -- about 110ms. Some of this will get solved long-term by us moving Baseline off-thread, but it's probably a sign that the function that's being compiled is *way* too large. It looks like most of the time is spent within inlining.
3. There are large stacks, and basically nothing has a "self" time. Ion is interspersed with Baseline nodes. Because there's a cost from the trampoline of switching back and forth between Ion and Baseline, it's likely that a bunch of time is spent in trampoline code, not actually doing anything useful, only to execute a very short Ion instruction. One way to fix this would be to cut down on the amount of generic code, and do inlining in the source. Generally: reduce stack depth and have less generic plumbing code.
4. Looking at the "Stack Chart" of a ~1-second jank, it looks like 9 FrameRequestCallbacks were handled during that single run of RunFrameRequestCallbacks. They all seem to have similar stacks, but return to the root. I'm not sure what these are all doing, but it looks like a lost-cause scheduling issue, where we keep missing deadlines until we finally find a callback that resolves in 33ms, and only then draw.
I suspect you'll want to comment on this, Kannan. Points 2 and 3 at least.
Flags: needinfo?(kvijayan)
Thanks for the heads up, Bas. I actually had a half-composed but unsubmitted response on the page when I loaded the tab from your bugmail link.

> 2. There are long, blocking Baseline compilations -- about 110ms. Some of this will get solved long-term by us moving Baseline off-thread, but it's probably a sign that the function that's being compiled is *way* too large. It looks like most of the time is spent within inlining.

If long baseline compilations are an issue, then the problem is not inlining - it would simply be very large scripts being compiled. One of the things we can build once Jan's baseline-compiled interpreter lands is a more granular baseline compiler. The ABI of the baseline interpreter will match the baseline JIT ABI, and transitioning between them is virtually free (a single jump). Shifting to basic-block baseline JIT compilation at that point should be straightforward: have a collection of BaselineScript pointers per JSScript (e.g. renamed to BaselineBlockScript), one per basic-block entrypoint - an uninitialized array that's allocated when the JSScript gets hot enough, at which point it's guaranteed to be in compiled-interpreter mode. The only change to the interpreter would be adding transitions at every jump target. The savings we'd get from JIT code would help both performance and memory usage.

On point 3: the overhead of transitioning between the interpreter and JIT code is certainly high now, but should be much lower once jandem's new interpreter lands. The ABIs will be matched and there should be far less jumping around.
Flags: needinfo?(kvijayan)
Priority: -- → P1
What would be the next step to move this bug forward? Waiting for the Baseline Interpreter? Isn't the fact that we have huge Baseline compilations in the profile just related to the 25 seconds of GC? Could it be different GC settings causing more shrinking GCs to discard Baseline code?
Flags: needinfo?(kvijayan)
I don't actually have a good understanding of what GC stuff is happening given the profile. I can see a major GC running for most of the profile starting at the very beginning and running until about 60% through the execution (check the marker chart). However, I don't see baseline recompiles happening often. Also, I don't know what those periodic grey regions in the profile are - there is no stack information available there at all - what are we doing? Are those major GC incremental marking pauses?
Flags: needinfo?(kvijayan) → needinfo?(pbone)
This response has a NI for jonco, just for one specific bit.

(In reply to Kannan Vijayan [:djvj] from comment #19)
> I don't actually have a good understanding of what GC stuff is happening given the profile. I can see a major GC running for most of the profile starting at the very beginning and running until about 60% through the execution (check the marker chart).

Both profiles are missing GCMajor events: the first profile is missing one at the beginning, and the second one at the end. GCSlice events only occur within a GCMajor, so where the slices start again in the second profile is where there should be a new GCMajor.

One thing we can probably do is drive the GC slice budget down lower (and correspondingly lower the threshold for how often a slice should run). This should help with the GC causing skipped frames.

There also seem to be some slow madvise calls. We could experiment with whether skipping these helps on this platform. These occur in the rather long 158ms GC pause between marking and sweeping.

(That's the easy stuff.) There's probably more that can be done about that rather long GC pause between marking and sweeping; mostly we're already aware of it, and some of it may be very difficult to remove.

Jonco, take a look at https://perfht.ml/2SFNZQD, this part of the 2nd profile. There's a bunch of stuff there that we can't yet yield in, plus the madvise calls from arena_t::Purge. I know some of this we're already working on, but how would we prioritize it WRT other perf issues?

On the marker chart, all the slow RunFrameRequestCallback events occur within the GCMajor. That's suspicious, but I don't know if it's a coincidence, or if it's something like barriers, or cache effects, or something else slowing down the mutator. I'd like to know if that happens in other profiles, but I don't think we have enough information to understand why.
> However, I don't see baseline recompiles happening often. Also, I don't know what those periodic grey regions in the profile are - there is no stack information available there at all - what are we doing? Are those major GC incremental marking pauses?

I don't think I found the same grey regions, because I do see stack information there (a whole bunch of different things on the stacks, mostly DOM but also IPC, system calls and graphics). So I'm not sure.

nbp wrote:
> Isn't the fact that we have huge Baseline compilation in the profile just related to the 25 seconds of GC? Could it be different GC settings causing more shrinking GC to discard Baseline code?

I'm not sure either. The GCs in the profile aren't compacting GCs (pretty sure). Maybe some compacting/shrinking was done earlier that's not captured here. While there are probably GC things we could do, I'm not sure of their priority compared to other stuff in this profile.
Flags: needinfo?(pbone) → needinfo?(jcoppeard)

(In reply to Paul Bone [:pbone] from comment #20)
In the profile in comment 13 we've got an incremental GC that lasts for 25 seconds. That's way too long and we need to look at ways to mitigate this, for example increasing the slice budget if the GC runs for too long (see bug 1434542). However, most GC slices were reasonable and within budget, with the exception of the first GC slice. That one spends a lot of time freeing memory in minor GC. We can probably do that off-thread (bug 1518193), although that may lead to lock contention on allocation.

Also the profile seems to suggest that nsJSContext::GarbageCollectNow is calling arena_t::Purge, but I can't find any evidence of that in the code. This is called from MozJemalloc::jemalloc_free_dirty_pages which is run on memory pressure events. If we're in a low memory situation then we can expect degraded performance.

> take a look at https://perfht.ml/2SFNZQD

35% of this is in arena_t::Purge again.

8% of this is in gray marking which is now incremental, so the situation may be slightly better now. 10% is in marking weak references which Steve is working on making incremental (bug 1167452). ~8% is discarding JIT code and there's a bug for throwing less of that away (bug 1514281).

(In reply to Kannan Vijayan [:djvj] from comment #19)

> Also, I don't know what those periodic grey regions in the profile are - there is no stack information available there at all - what are we doing?

Kannan, I'm not sure what you're referring to here.

Flags: needinfo?(jcoppeard)

(In reply to Jon Coppeard (:jonco) from comment #21)

> Also the profile seems to suggest that nsJSContext::GarbageCollectNow is calling arena_t::Purge, but I can't find any evidence of that in the code. This is called from MozJemalloc::jemalloc_free_dirty_pages

Oh, this is also called when deallocating memory. I'm still not sure why the stack looks like it does, but it turns out that any free can cause jemalloc to decide it needs to decommit unused memory which can take a long time. It would be great if this could happen on another thread, or incrementally, or without holding a lock the entire time.

Component: JavaScript Engine → JavaScript: GC
Priority: P1 → P2

Already highlighted in email, but this is a critical issue to resolve for Firefox Reality and we would very much like this to be a P1 if at all possible. Andreas Bovens is aware of the discussions around this topic as well.

Let me look into this.

No longer blocks: 1528330

I just tested Sketchfab with the latest FxR and performance has actually become worse. Gecko still becomes completely unresponsive, to the point where it is unable to load a new page or even go back.

Randall, do you have a profile from today's test? It would be good to see whether it's the same problems as before or new ones.

Especially since some things have changed in the GC to make it more incremental.

Thanks.

Flags: needinfo?(rbarker)

I haven't been able to get a profile. Since Gecko becomes unresponsive for so long the remote profiler loses connection and doesn't capture anything.

Flags: needinfo?(rbarker)

Oh damn, sorry, I should have realised that.

Performance Impact: --- → ?
Whiteboard: [geckoview:fxr:p1][webvr][qf:js:investigate] → [geckoview:fxr:p1][webvr]
Performance Impact: ? → P3
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE