Closed Bug 1537957 Opened 5 years ago Closed 5 years ago

Too much idle time on the main thread between requestAnimationFrame callbacks

Categories

(Core :: WebVR, enhancement)

Hardware: All
OS: Android
Type: enhancement
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WONTFIX
Tracking Status: firefox68 --- affected

People

(Reporter: mstange, Assigned: kip)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxr-ww])

In this profile from a VR scene on the Oculus Go, we spend 15% of the time waiting for the next UpdateDisplayInfo call to come in: https://perfht.ml/2YjS5kC

It would be better to spend that time running JavaScript code for the next frame, unless we're backed up in GPU work.

This profile should show some of our idle time in the immersive mode view of cubes-aframe with 500 nodes on the Go device: https://perfht.ml/2TmlI0Z. The marker to search for is SubmitFrameAtVRDisplayHost.

So the idle time on the main thread corresponds exactly to the SubmitFrameAtVRDisplayHost time in the compositor thread.

That's not too surprising - we wait for FxR to initiate the next frame before we return to the main thread to fire the callback.

However, why do we have to wait for FxR to signal us for the next frame? Do we do this because we want to use the most up-to-date pose information?

From what I can tell, the current setup means we only ever have a single frame in-flight at any time. For optimum throughput, we should instead strive to have two frames in flight: One frame being worked on by JavaScript, and one frame being submitted on the FxR side. So could we use slightly-out-of-date pose information instead?

Preparing the next frame in advance should eliminate the idle time and increase FPS.
Saving 15% of per-frame-time will increase the FPS number by 17.6%. In the profile from comment 0 we were running at 20.1 FPS when we could have been running at 23.7 FPS.
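The arithmetic behind those figures can be checked with a quick back-of-the-envelope calculation (numbers taken from the profile above; the constants are from this bug, everything else is just arithmetic):

```javascript
// Back-of-the-envelope check of the claim above: reclaiming 15% of each
// frame's time raises FPS by 1 / 0.85 - 1 ≈ 17.6%.
const measuredFps = 20.1;  // from the profile in comment 0
const idleFraction = 0.15; // idle share of each frame

const frameTimeMs = 1000 / measuredFps;
const busyTimeMs = frameTimeMs * (1 - idleFraction);
const potentialFps = 1000 / busyTimeMs;
const gainPercent = (potentialFps / measuredFps - 1) * 100;

console.log(potentialFps.toFixed(1)); // ≈ 23.6 FPS
console.log(gainPercent.toFixed(1));  // ≈ 17.6 % gain
```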

Whiteboard: [fxr-ww]

Kip, are you able to add this to your queue?

Flags: needinfo?(kgilbert)

The question to resolve here is whether a strategy of having two frames in flight (one processing JS and one submitting to the VR API) would impose an unacceptable increase in motion-to-photon latency, versus the 15% saving from not idling on the main thread. Assuming that pose data is sampled at the start of scenegraph processing, the double-buffered approach has a best-case pose-to-submission latency of twice the frame interval, which is considerably worse than the current latency. But given that we're well below the native 72Hz refresh rate anyway, I can't see why we're idling for that long in the current case.
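To put rough numbers on that trade-off (an illustrative calculation assuming the Go's native 72 Hz refresh; these are not measured values):

```javascript
// Illustrative latency arithmetic for the trade-off above, assuming the
// native 72 Hz refresh. In the best case, a double-buffered pipeline submits
// with a pose that is one extra frame interval older than single-buffered.
const refreshHz = 72;
const frameIntervalMs = 1000 / refreshHz;

const singleBufferedMs = 1 * frameIntervalMs; // pose sampled and submitted same frame
const doubleBufferedMs = 2 * frameIntervalMs; // best-case pose-to-submission latency

console.log(frameIntervalMs.toFixed(1));  // ≈ 13.9 ms per vsync
console.log(doubleBufferedMs.toFixed(1)); // ≈ 27.8 ms with two frames in flight
```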

(In reply to Markus Stange [:mstange] from comment #2)

> So the idle time on the main thread corresponds exactly to the SubmitFrameAtVRDisplayHost time in the compositor thread.
>
> That's not too surprising - we wait for FxR to initiate the next frame before we return to the main thread to fire the callback.
>
> However, why do we have to wait for FxR to signal us for the next frame? Do we do this because we want to use the most up-to-date pose information?
>
> From what I can tell, the current setup means we only ever have a single frame in-flight at any time. For optimum throughput, we should instead strive to have two frames in flight: One frame being worked on by JavaScript, and one frame being submitted on the FxR side. So could we use slightly-out-of-date pose information instead?
>
> Preparing the next frame in advance should eliminate the idle time and increase FPS.
> Saving 15% of per-frame-time will increase the FPS number by 17.6%. In the profile from comment 0 we were running at 20.1 FPS when we could have been running at 23.7 FPS.

This could warrant some investigation; however, the precise time to start frame rendering must be controlled by the VR compositor in the OS / runtime. The reason is that, unlike regular vsync, VR runtimes try to dynamically shift the rendering activity toward the end of the cycle and deliver a more up-to-date headset pose prediction, which is more accurate than the prediction made immediately after the last vsync. Many of these runtimes (e.g., Oculus VR desktop, Oculus Go) have a notion of "underlapped rendering", in which GPU activity ping-pongs between the VR compositor and the VR rendering during the vsync interval. The content (e.g., WebGL) rendering is often scheduled to start 2-4 ms before the next VBlank and to be composited in the following frame (overlapping rendering with the vblank interval, but not completing by the end of VBlank). This would look strange in a profile, but is intentional behavior.
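As a rough illustration of that late-start scheduling (the function name and the 3 ms completion margin are assumptions for illustration; real runtimes pick these values dynamically):

```javascript
// Hypothetical sketch of "start rendering as late as possible": given the
// next vblank time and an estimated render duration, pick a start time so
// rendering finishes shortly before vblank, keeping the pose prediction
// fresh. The 3 ms margin is an assumption, not any runtime's actual value.
function pickRenderStartMs(nextVblankMs, estimatedRenderMs, marginMs = 3) {
  return Math.max(0, nextVblankMs - marginMs - estimatedRenderMs);
}

// With a 13.9 ms vsync interval and a 6 ms render estimate, rendering would
// start about 4.9 ms into the interval rather than immediately after vsync.
console.log(pickRenderStartMs(13.9, 6).toFixed(1)); // ≈ 4.9
```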

In Bug 1466702, I am significantly refactoring (almost rewriting) much of the code involved here. I would suggest doing a profile after that lands and see if it has any effect.

If, however, there is too much delay between the VR compositor's signal and the WebGL rendering code starting to render, that would be an area in need of optimization.

May I suggest that the actionable task for this bug be to expose some markers to the Gecko Profiler identifying these key parts of the render cycle? Perhaps it would also be possible to collect some telemetry from this at runtime?

Depends on: 1466702
Flags: needinfo?(kgilbert)

One more issue that should be watched out for...

Some VR runtimes function as "closed-loop" systems. They use the timing of the WebVR "SubmitFrame" call to determine explicitly how long the content rendering actually took. This is used as a feedback mechanism to dynamically adjust the frame timing so that frames are not dropped while still needing as little re-projection as possible to correct for older pose prediction values. If we are adding latency to these "SubmitFrame" calls, or getting any timestamp values wrong, this could result in less ideal scheduling of the frame start times.
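The closed-loop idea can be sketched abstractly (this is an assumed illustration of the feedback mechanism, not the algorithm any particular runtime uses; all names and constants are made up):

```javascript
// Toy closed-loop frame timer: smooth the measured render durations (taken
// from SubmitFrame timing) and use the estimate to schedule the next frame's
// start as late as safely possible. Names and constants are illustrative.
function makeFrameTimer(vsyncMs, marginMs = 2, alpha = 0.2) {
  let estimateMs = vsyncMs / 2; // initial guess at render duration
  return {
    observe(renderMs) {
      // Exponential moving average of observed render times.
      estimateMs = alpha * renderMs + (1 - alpha) * estimateMs;
    },
    nextStartOffsetMs() {
      // Start late enough for a fresh pose, early enough to finish in time.
      return Math.max(0, vsyncMs - marginMs - estimateMs);
    },
  };
}

const timer = makeFrameTimer(13.9);
timer.observe(10); // a slow frame pushes the next start earlier
console.log(timer.nextStartOffsetMs().toFixed(2));
```

A late or mistimed SubmitFrame would feed bad `renderMs` values into such a loop, which is the failure mode the paragraph above warns about.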

Unfortunately, the successor to the WebVR spec, WebXR, has dropped the explicit "SubmitFrame" call -- the browser will be required to infer the time of frame completion from the WebGL calls. Fortunately, we can overload the function of the WebGL "Commit" call when using WebVR in web workers, which should be a best practice for VR content once it is supported.

(In reply to Markus Stange [:mstange] from comment #2)

Thanks again for looking into this in such detail -- These are all great observations and could lead to some optimization opportunities for sure!

> So the idle time on the main thread corresponds exactly to the SubmitFrameAtVRDisplayHost time in the compositor thread.
>
> That's not too surprising - we wait for FxR to initiate the next frame before we return to the main thread to fire the callback.
>
> However, why do we have to wait for FxR to signal us for the next frame? Do we do this because we want to use the most up-to-date pose information?

Precisely; see my comment 5 above for details.

> From what I can tell, the current setup means we only ever have a single frame in-flight at any time. For optimum throughput, we should instead strive to have two frames in flight: One frame being worked on by JavaScript, and one frame being submitted on the FxR side. So could we use slightly-out-of-date pose information instead?

The underlying VR compositor is usually already 2 frames ahead, though possibly fewer on mobile platforms. IMHO, we should probably not use a pose that is farther behind than that, as it already depends heavily on prediction algorithms and visual re-projection to compensate for prediction error.

> Preparing the next frame in advance should eliminate the idle time and increase FPS.
> Saving 15% of per-frame-time will increase the FPS number by 17.6%. In the profile from comment 0 we were running at 20.1 FPS when we could have been running at 23.7 FPS.

What I would suggest to WebGL VR engine developers is to perform CPU-side culling, physics, audio, and gameplay logic in the gaps between RAF callbacks. Some rendering can also occur outside of RAF, if it is not view-dependent. Real-time reflection probes, shadow map generation, and GPU-side skinned meshes are good candidates for such optimization.
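A minimal sketch of that advice, with assumed names (`defer`, `onFrame`, and the task bodies are placeholders for illustration): keep the RAF callback to view-dependent rendering, and drain a queue of deferred work once rendering has been handed off.

```javascript
// Keep the RAF callback to view-dependent rendering only; non-view-dependent
// work (culling, physics, audio, gameplay) runs from a deferred queue in the
// gap before the next RAF. All names here are illustrative placeholders.
const deferred = [];
function defer(task) {
  deferred.push(task);
}

function onFrame(render) {
  const log = [];
  render(log); // view-dependent work only
  while (deferred.length > 0) {
    deferred.shift()(log); // the gap before the next RAF
  }
  return log;
}

defer((log) => log.push("physics"));
defer((log) => log.push("audio"));
const order = onFrame((log) => log.push("render"));
console.log(order); // render first, then the deferred work
```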

I would like to know what kinds of signals/events the browser could emit to enable such intervals to be used more effectively. We could bring these findings back to the W3C immersive-web group to inform the WebXR spec development.

I would like to capture one more thought...

If WebGPU allows for command buffer recycling, then command buffer generation could be performed outside the RAF callbacks. As little as possible should be done during RAF. With recycled or pre-generated command buffers, the ideal renderer would do little more than update some uniform values and execute command buffers that have already paid the driver overhead of parsing and validation.
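Abstracting away any particular GPU API, the record-once/replay-cheaply pattern looks roughly like this (all names are illustrative; WebGPU's actual mechanism for reusable recorded commands may differ):

```javascript
// Toy version of command-list recycling: the expensive parsing/validation
// happens once at record time; per-frame work is reduced to supplying new
// uniform values and replaying. Not a real GPU API -- names are made up.
class RecordedCommandList {
  constructor(commands) {
    // Imagine driver-side parsing and validation paid once, here.
    this.commands = commands;
  }
  replay(uniforms) {
    // Per-frame: only uniform values change; commands are reused as-is.
    return this.commands.map((cmd) => cmd(uniforms));
  }
}

const list = new RecordedCommandList([
  (u) => `draw(scene, mvp=${u.mvp})`,
  (u) => `draw(hands, mvp=${u.mvp})`,
]);

console.log(list.replay({ mvp: "frame42" }));
```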

Assignee: nobody → kgilbert

Web developers have window.requestIdleCallback at their disposal. It would be helpful for Fernando's team to audit three.js, Babylon.js, A-Frame, and other libraries to see if there is low-hanging fruit to fix.

https://developer.mozilla.org/en-US/docs/Web/API/Window/requestIdleCallback
https://developer.mozilla.org/en-US/docs/Web/API/Background_Tasks_API#Example
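For context, the pattern those MDN pages describe looks roughly like this (the browser's IdleDeadline object is faked with an assumed fixed budget so the sketch is self-contained outside a browser):

```javascript
// Sketch of the requestIdleCallback pattern: do queued work in slices and
// yield whenever the idle deadline runs out. The deadline is faked here;
// the task bodies are illustrative placeholders.
const workQueue = [() => "audit three.js", () => "audit Babylon.js"];
const completed = [];

function processQueue(deadline) {
  while (workQueue.length > 0 && deadline.timeRemaining() > 0) {
    completed.push(workQueue.shift()());
  }
  // In a browser: if (workQueue.length) requestIdleCallback(processQueue);
}

// Fake IdleDeadline granting ~5 ms of budget, for illustration.
processQueue({ timeRemaining: () => 5, didTimeout: false });
console.log(completed); // both tasks fit within the fake budget
```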

Flags: needinfo?(fserrano)

(In reply to Christopher Van Wiemeersch [:cvan] from comment #9)

> Web developers have window.requestIdleCallback at their disposals. It would be helpful for Fernando's team to audit three.js, Babylon.js, A-Frame, and other libraries to see if there are low-hanging fruit to fix.
>
> https://developer.mozilla.org/en-US/docs/Web/API/Window/requestIdleCallback
> https://developer.mozilla.org/en-US/docs/Web/API/Background_Tasks_API#Example

For some context on window.requestIdleCallback and WebXR:

https://github.com/immersive-web/webvr/issues/26

There is still open discussion about whether WebXR needs a separate window.requestIdleCallback (or whether it should affect the existing window.requestIdleCallback). In particular, we need to determine when the browser should fire the callback while VR and the 2D display are simultaneously rendering at different frame rates.

Given the above discussion, I propose we close this as WONTFIX for now. There is some useful discussion here that could be captured in a separate issue for content frameworks, plus follow-up work on further performance profiling.

Flags: needinfo?(kgilbert)

Closing as per Comment 11

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(kgilbert)
Resolution: --- → WONTFIX
Flags: needinfo?(fserrano)