Closed Bug 1668878 Opened 4 years ago Closed 4 years ago

When webrender misses a vsync on android we try too hard to catch up and end up waiting in glClear

Categories

(Core :: Graphics: WebRender, defect)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
83 Branch
Tracking Status
firefox83 --- fixed

People

(Reporter: jnicol, Assigned: jnicol)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

At least, I think that is what is happening. Profile (scrolling quickly on mozilla.org on a pixel 2): https://share.firefox.dev/3ikGQkt

Sometimes a render takes longer than a vsync interval, eg due to a texture cache resize (glCopyImageSubData) or shader compilation taking a long time. On Android we will then see a number of laggy frames after this one, and the profiler shows a lot of time spent in glClear(). I see this on both Adreno and Mali.

If I comment out the glClear calls in webrender, then the time shifts to glDrawElements. Basically the first call which writes to the main framebuffer. I assume we're attempting to render too many frames ahead, so we must wait in glClear for a buffer to become available.

I added some logging and this is how things seem to fall apart:

  • The renderer thread is busy due to eg a long glCopyImageSubData
  • WebRenderBridgeParent::CompositeToTarget gets called from vsync. We have 2 pending frames, so it sets mSkippedComposite = true
  • A frame is finished rendering, we call NotifyDidRender() which calls CompositeIfNeeded() which calls CompositeToTarget() because mSkippedComposite == true.
  • We call another CompositeToTarget() on the next vsync, and then each vsync after that, but each time glClear now takes something like 13ms.
  • Eventually things settle back to normal, maybe it takes another really long function like glLinkProgram to break the cycle.

If I comment out the call to CompositeToTarget() in CompositeIfNeeded() (we still have to set mSkippedComposite = false; otherwise we'll never do a composite again) then none of this seems to happen. But presumably we do indeed want to try to catch up composite there, but our mechanism for not getting too far ahead is failing somewhere.

Would be very interested to hear thoughts/suggestions!

Does setting eglSwapInterval(0) help with this?

It seems like the point of the catch up composite is to produce two frames in a single vsync interval, so that we don't ever actually display the slow frame and instead display one we started more recently (with newer APZ data etc).

The behaviour you're describing sounds like the implementation is forcing every swapBuffers to be presented, and will just block and wait for the next vsync interval if we try get ahead.

It looks like we don't set eglSwapInterval anywhere, so it's probably at the default of 1, which I think matches what you see. Setting it to 0 (assuming EGL_MIN_SWAP_INTERVAL allows us to) might help get closer to the behaviour we intend.

That said, the idea of a catch up composite seems a bit risky in general to me. Rendering 'completing' only means that we've called swapped buffers and flushed all of our work to the GPU, not that it's actually completed. It seems fairly likely that attempting to start a new frame mid-vsync wouldn't actually finish in time to replace the late frame, and we'll just be weirdly out of time with vsync for a while. Frame scheduling is hard.

Note that we also don't implement RenderCompositor::WaitForGPU on RenderCompositorEGL.

In case you are looking into solving this with eglSwapInterval, you might want to have EGL_EXT_swap_control_tear (https://github.com/KhronosGroup/EGL-Registry/pull/113) in mind:

    This extension extends EGL by allowing late swaps to occur without
    synchronization to the video frame. This per-surface attribute reduces the
    visual stutter on late frames and reduces the stall on subsequent frames.

Unfortunately it's not yet finished, i.e. it will take a while until it will be available in the wild :/

I had tried adding an implementation for RenderCompositorEGL::WaitForGPU(), calling eglClientWaitSync() for a sync created in the previous frame's InsertFrameDoneSync(). It seemed to return immediately every time, regardless of the swap interval. Even waiting on the frame that had just been swapped, it always returned within a millisecond or two. Not sure if I'm doing something wrong or if that's to be expected. Maybe it waits on the flush rather than the GPU actually finishing, or maybe because we're CPU bound rather than GPU bound on the pages I'm testing...

Anyway, eglSwapInterval(0) does indeed seem to fix the issue. So I guess we can try that if we do indeed need to do catch up composites. But let's try just removing the catch up composites first and see what happens.

For the record it looks like catch up composites were added in bug 1510899, and we weren't entirely convinced they were the right thing to do at the time.

(In reply to Jamie Nicol [:jnicol] from comment #4)

Anyway, eglSwapInterval(0) does indeed seem to fix the issue.

Do we know what this does behind the scenes? Does it cause tearing? If not, does it allocate extra buffers as needed?

(In reply to Jamie Nicol [:jnicol] from comment #4)

I had tried adding an implementation for RenderCompositorEGL::WaitForGPU(), calling eglClientWaitSync() for a sync created in the previous frame's InsertFrameDoneSync(). It seemed to return immediately every time, regardless of the swap interval. Even waiting on the frame that had just been swapped, it always returned within a millisecond or two. Not sure if I'm doing something wrong or if that's to be expected. Maybe it waits on the flush rather than the GPU actually finishing, or maybe because we're CPU bound rather than GPU bound on the pages I'm testing...

It does sound like your tests were not GPU bound. I think the main question we need to answer is, if you make this change, do we ever block in glClear()? I would expect all the blocking time to move from glClear() to eglClientWaitSync().

But let's try just removing the catch up composites first and see what happens.

Doesn't this have the potential to cause "stuck" rendering if nothing else causes a composite? CompositeToTarget is called because we have something we need to put on the screen, for example a new WR display list, a scroll position update, or a new video frame. If we simply ignore some calls because we're currently busy, doesn't this mean that the new display list / scroll position / video frame won't make it onto the screen until something else changes?

Severity: -- → S3

Do we know what this does behind the scenes? Does it cause tearing? If not, does it allocate extra buffers as needed?

I cannot see any tearing, from my limited testing. Every android device I've seen is triple-buffered, so would that mean it can just swap the middle and backbuffers? However, I can see that the buffer age can now be 4 or sometimes even 5, whereas it was consistently 3 when the swap interval is 1. This is slightly unexpected, I was guessing it might now sometimes be 2 if my swapping-middle-and-backbuffer theory was true. Maybe it means it's allocating extra buffers. (Side note: Currently we only track ages up to 3 for partial present, so if we went this route we might need to extend that.)

It does sound like your tests were not GPU bound. I think the main question we need to answer is, if you make this change, do we ever block in glClear()? I would expect all the blocking time to move from glClear() to eglClientWaitSync().

Yes, even after adding the WaitForGPU implementation, we block in glClear. (Actually, since I filed this, bug 1575765 has landed and we now block when querying the buffer age rather than in glClear.)

Doesn't this have the potential to cause "stuck" rendering if nothing else causes a composite?

I think we will still composite the frame, but we will just wait until the next vsync notification. (And if there is a newer frame available by that time we might skip to it, so that specific frame might not be composited, but a frame will definitely be.) I need to confirm this behaviour though.

(In reply to Jamie Nicol [:jnicol] from comment #7)

It does sound like your tests were not GPU bound. I think the main question we need to answer is, if you make this change, do we ever block in glClear()? I would expect all the blocking time to move from glClear() to eglClientWaitSync().

Yes, even after adding the WaitForGPU implementation, we block in glClear. (Actually, since I filed this, bug 1575765 has landed and we now block when querying the buffer age rather than in glClear.)

Thanks for testing! This is a rather disappointing result. Could you share a profile that shows this?

Actually, no, that makes sense. If eglSwapInterval(0) makes a difference, this means that the blocking is not because we're submitting too much GPU work, it's because we're submitting frames more quickly than Vsync (after too much GPU work caused a one-time delay in the pipeline). So eglClientWaitSync() won't block, because the GPU isn't busy. But glClear() blocks because the buffer of frame N-2 is still held until vsync.

Here's a testcase where we submit too much GPU work: http://tests.themasta.com/too-much-gpu-work.html . My Moto G5 blocks in eglSwapBuffers() (this is a slightly outdated Fenix build).

I think you must be right about it causing rendering to get stuck: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0ccda9a8588e72e05f48b515875f2c3c0dd3996

There are some waiting for MozAfterPaint failures.

I think you need to replace mSkippedComposite=true with mCompositorScheduler->ScheduleComposition() so that we know that we still need a new composite on the next vsync.

(In reply to Matt Woodrow (:mattwoodrow) from comment #12)

I think you need to replace mSkippedComposite=true with mCompositorScheduler->ScheduleComposition() so that we know that we still need a new composite on the next vsync.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f89ca2bc3fbe087ca3045fedea4eadaab695d85

Yep, that seems to fix the failures, thanks!

(In reply to Markus Stange [:mstange] from comment #9)

Actually, no, that makes sense. If eglSwapInterval(0) makes a difference, this means that the blocking is not because we're submitting too much GPU work, it's because we're submitting frames more quickly than Vsync (after too much GPU work caused a one-time delay in the pipeline). So eglClientWaitSync() won't block, because the GPU isn't busy. But glClear() blocks because the buffer of frame N-2 is still held until vsync.

Okay, I think that makes sense to me too. So WaitForGPU() should help in cases where the GPU is overwhelmed, so we should implement it, but it is a separate issue to this bug?

(In reply to Jamie Nicol [:jnicol] from comment #14)

So WaitForGPU() should help in cases where the GPU is overwhelmed, so we should implement it, but it is a separate issue to this bug?

I think so, yes. And it might not actually have any effect, if eglSwapBuffers() already enforces backpressure in the same way.

Flags: needinfo?(matt.woodrow)

My only guess is that we're now scheduling a new composition (from the compositor thread) each time we determine that the render thread is still busy. Since the compositor thread doesn't really do any work, then this might mean it wakes up a lot more in the ASAP mode used for talos tests, attempting to composite, failing, and then requesting another composite.

The old code only attempted a new composite in response to something happening from the render thread, so there might be less wasted wakeups.

Flags: needinfo?(matt.woodrow)

You could try calling ScheduleComposition from WebRenderBridgeParent::CompositeIfNeeded (reverting most of your patch), to get the same behaviour in ASAP mode?

Yep, that did the trick! https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=04788f1a0a38d6cf4b98b0ebc224e658fc648154&newProject=try&newRevision=8268d26729fe35ea30914356a2b7c36e24e92e66&framework=1

With that change, scrolling and zooming feels much better on Android. I'll put a patch up now. However, as I said in comment 7, now we dequeue the surface in eglQuerySurface(EGL_BUFFER_AGE) rather than glClear(), which occurs much earlier in the frame: In RenderCompositorEGL::BeginFrame() rather than Renderer::composite_simple() or Renderer::draw_frame(). As a result, profiles still show some time blocking for a surface (though it's like 5% vs 50%).

There's no reason why we need to query the buffer age so early, we can wait until composite_simple() instead. I'll write a patch for that too.

See Also: → 1670014

Currently, when webrender fails to render a frame on time, it will
attempt to catch up by rendering the next frame immediately (part way
through the vsync period) rather than delaying until the next
vsync. The idea is that even though the first frame missed vsync, the
next frame can hopefully still be rendered on time.

However, on Android, because of the default EGL swap interval of zero,
the driver blocks rather than allowing 2 frames to be rendered in a
single vsync. This guarantees that the next frame will also miss its
vsync, and so on. Other platforms may avoid this by setting the swap
interval to zero. While that was an option on Android too, it was felt
that the premise of catch up composites was still flawed: a
late-started composite is unlikely to meet its deadline, even if the
driver does not block. Better to cut our losses and wait for the next
vsync.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/741d451756e1 Stop doing catch-up composites on webrender. r=mattwoodrow
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch
Regressions: 1670313
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: