1668878 - When webrender misses a vsync on android we try too hard to catch up and end up waiting in glClear

Assignee

Description

•

4 years ago

At least, I think that is what is happening. Profile (scrolling quickly on mozilla.org on a pixel 2): https://share.firefox.dev/3ikGQkt

Sometimes a render takes longer than a vsync interval, eg due to a texture cache resize (glCopyImageSubData) or shader compilation taking a long time. On Android we will then see a number of laggy frames after this one, and the profiler shows a lot of time spent in glClear(). I see this on both Adreno and Mali.

If I comment out the glClear calls in webrender, then the time shifts to glDrawElements. Basically the first call which writes to the main framebuffer. I assume we're attempting to render too many frames ahead, so we must wait in glClear for a buffer to become available.

I added some logging and this is how things seem to fall apart:

The renderer thread is busy due to eg a long glCopyImageSubData
WebRenderBridgeParent::CompositeToTarget gets called from vsync. We have 2 pending frames, so it sets mSkippedComposite = true
A frame is finished rendering, we call NotifyDidRender() which calls CompositeIfNeeded() which calls CompositeToTarget() because mSkippedComposite == true.
We call another CompositeToTarget() on the next vsync, and then each vsync after that, but each time glClear now takes something like 13ms.
Eventually things settle back to normal, maybe it takes another really long function like glLinkProgram to break the cycle.

If I comment out the call to CompositeToTarget() in CompositeIfNeeded() (we still have to set mSkippedComposite = false; otherwise we'll never do a composite again) then none of this seems to happen. But presumably we do indeed want to try to catch up composite there, but our mechanism for not getting too far ahead is failing somewhere.

Would be very interested to hear thoughts/suggestions!

Matt Woodrow (:mattwoodrow)

Comment 1

•

4 years ago

Does setting eglSwapInterval(0) help with this?

It seems like the point of the catch up composite is to produce two frames in a single vsync interval, so that we don't ever actually display the slow frame and instead display one we started more recently (with newer APZ data etc).

The behaviour you're describing sounds like the implementation is forcing every swapBuffers to be presented, and will just block and wait for the next vsync interval if we try get ahead.

It looks like we don't set eglSwapInterval anywhere, so it's probably at the default of 1, which I think matches what you see. Setting it to 0 (assuming EGL_MIN_SWAP_INTERVAL allows us to) might help get closer to the behaviour we intend.

That said, the idea of a catch up composite seems a bit risky in general to me. Rendering 'completing' only means that we've called swapped buffers and flushed all of our work to the GPU, not that it's actually completed. It seems fairly likely that attempting to start a new frame mid-vsync wouldn't actually finish in time to replace the late frame, and we'll just be weirdly out of time with vsync for a while. Frame scheduling is hard.

Matt Woodrow (:mattwoodrow)

Comment 2

•

4 years ago

Note that we also don't implement RenderCompositor::WaitForGPU on RenderCompositorEGL.

Robert Mader [:rmader]

Comment 3

•

4 years ago

In case you are looking into solving this with eglSwapInterval, you might want to have EGL_EXT_swap_control_tear (https://github.com/KhronosGroup/EGL-Registry/pull/113) in mind:

    This extension extends EGL by allowing late swaps to occur without
    synchronization to the video frame. This per-surface attribute reduces the
    visual stutter on late frames and reduces the stall on subsequent frames.

Unfortunately it's not yet finished, i.e. it will take a while until it will be available in the wild :/

Jamie Nicol [:jnicol]

Assignee

Comment 4

•

4 years ago

I had tried adding an implementation for RenderCompositorEGL::WaitForGPU(), calling eglClientWaitSync() for a sync created in the previous frame's InsertFrameDoneSync(). It seemed to return immediately every time, regardless of the swap interval. Even waiting on the frame that had just been swapped, it always returned within a millisecond or two. Not sure if I'm doing something wrong or if that's to be expected. Maybe it waits on the flush rather than the GPU actually finishing, or maybe because we're CPU bound rather than GPU bound on the pages I'm testing...

Anyway, eglSwapInterval(0) does indeed seem to fix the issue. So I guess we can try that if we do indeed need to do catch up composites. But let's try just removing the catch up composites first and see what happens.

Jamie Nicol [:jnicol]

Assignee

Comment 5

•

4 years ago

For the record it looks like catch up composites were added in bug 1510899, and we weren't entirely convinced they were the right thing to do at the time.

Markus Stange [:mstange]

Comment 6

•

4 years ago

(In reply to Jamie Nicol [:jnicol] from comment #4)

Anyway, eglSwapInterval(0) does indeed seem to fix the issue.

Do we know what this does behind the scenes? Does it cause tearing? If not, does it allocate extra buffers as needed?

(In reply to Jamie Nicol [:jnicol] from comment #4)

I had tried adding an implementation for RenderCompositorEGL::WaitForGPU(), calling eglClientWaitSync() for a sync created in the previous frame's InsertFrameDoneSync(). It seemed to return immediately every time, regardless of the swap interval. Even waiting on the frame that had just been swapped, it always returned within a millisecond or two. Not sure if I'm doing something wrong or if that's to be expected. Maybe it waits on the flush rather than the GPU actually finishing, or maybe because we're CPU bound rather than GPU bound on the pages I'm testing...

It does sound like your tests were not GPU bound. I think the main question we need to answer is, if you make this change, do we ever block in glClear()? I would expect all the blocking time to move from glClear() to eglClientWaitSync().

But let's try just removing the catch up composites first and see what happens.

Doesn't this have the potential to cause "stuck" rendering if nothing else causes a composite? CompositeToTarget is called because we have something we need to put on the screen, for example a new WR display list, a scroll position update, or a new video frame. If we simply ignore some calls because we're currently busy, doesn't this mean that the new display list / scroll position / video frame won't make it onto the screen until something else changes?

Andrew Osmond [:aosmond] (he/him)

Updated

•

4 years ago

Severity: -- → S3

Jamie Nicol [:jnicol]

Assignee

Comment 7

•

4 years ago

Do we know what this does behind the scenes? Does it cause tearing? If not, does it allocate extra buffers as needed?

I cannot see any tearing, from my limited testing. Every android device I've seen is triple-buffered, so would that mean it can just swap the middle and backbuffers? However, I can see that the buffer age can now be 4 or sometimes even 5, whereas it was consistently 3 when the swap interval is 1. This is slightly unexpected, I was guessing it might now sometimes be 2 if my swapping-middle-and-backbuffer theory was true. Maybe it means it's allocating extra buffers. (Side note: Currently we only track ages up to 3 for partial present, so if we went this route we might need to extend that.)

It does sound like your tests were not GPU bound. I think the main question we need to answer is, if you make this change, do we ever block in glClear()? I would expect all the blocking time to move from glClear() to eglClientWaitSync().

Yes, even after adding the WaitForGPU implementation, we block in glClear. (Actually, since I filed this, bug 1575765 has landed and we now block when querying the buffer age rather than in glClear.)

Doesn't this have the potential to cause "stuck" rendering if nothing else causes a composite?

I think we will still composite the frame, but we will just wait until the next vsync notification. (And if there is a newer frame available by that time we might skip to it, so that specific frame might not be composited, but a frame will definitely be.) I need to confirm this behaviour though.

Markus Stange [:mstange]

Comment 8

•

4 years ago

(In reply to Jamie Nicol [:jnicol] from comment #7)

It does sound like your tests were not GPU bound. I think the main question we need to answer is, if you make this change, do we ever block in glClear()? I would expect all the blocking time to move from glClear() to eglClientWaitSync().

Yes, even after adding the WaitForGPU implementation, we block in glClear. (Actually, since I filed this, bug 1575765 has landed and we now block when querying the buffer age rather than in glClear.)

Thanks for testing! This is a rather disappointing result. Could you share a profile that shows this?

Markus Stange [:mstange]

Comment 9

•

4 years ago

Actually, no, that makes sense. If eglSwapInterval(0) makes a difference, this means that the blocking is not because we're submitting too much GPU work, it's because we're submitting frames more quickly than Vsync (after too much GPU work caused a one-time delay in the pipeline). So eglClientWaitSync() won't block, because the GPU isn't busy. But glClear() blocks because the buffer of frame N-2 is still held until vsync.

Markus Stange [:mstange]

Comment 10

•

4 years ago

Here's a testcase where we submit too much GPU work: http://tests.themasta.com/too-much-gpu-work.html . My Moto G5 blocks in eglSwapBuffers() (this is a slightly outdated Fenix build).

Jamie Nicol [:jnicol]

Assignee

Comment 11

•

4 years ago

I think you must be right about it causing rendering to get stuck: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0ccda9a8588e72e05f48b515875f2c3c0dd3996

There are some waiting for MozAfterPaint failures.

Matt Woodrow (:mattwoodrow)

Comment 12

•

4 years ago

I think you need to replace mSkippedComposite=true with mCompositorScheduler->ScheduleComposition() so that we know that we still need a new composite on the next vsync.

Jamie Nicol [:jnicol]

Assignee

Comment 13

•

4 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #12)

I think you need to replace mSkippedComposite=true with mCompositorScheduler->ScheduleComposition() so that we know that we still need a new composite on the next vsync.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f89ca2bc3fbe087ca3045fedea4eadaab695d85

Yep, that seems to fix the failures, thanks!

Jamie Nicol [:jnicol]

Assignee

Comment 14

•

4 years ago

(In reply to Markus Stange [:mstange] from comment #9)

Actually, no, that makes sense. If eglSwapInterval(0) makes a difference, this means that the blocking is not because we're submitting too much GPU work, it's because we're submitting frames more quickly than Vsync (after too much GPU work caused a one-time delay in the pipeline). So eglClientWaitSync() won't block, because the GPU isn't busy. But glClear() blocks because the buffer of frame N-2 is still held until vsync.

Okay, I think that makes sense to me too. So WaitForGPU() should help in cases where the GPU is overwhelmed, so we should implement it, but it is a separate issue to this bug?

Markus Stange [:mstange]

Comment 15

•

4 years ago

(In reply to Jamie Nicol [:jnicol] from comment #14)

So WaitForGPU() should help in cases where the GPU is overwhelmed, so we should implement it, but it is a separate issue to this bug?

I think so, yes. And it might not actually have any effect, if eglSwapBuffers() already enforces backpressure in the same way.

Jamie Nicol [:jnicol]

Assignee

Comment 16

•

4 years ago

I see a couple of macos talos regressions from disabling catch up composites: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fcbdb422a4959f7e35936c61a764988db1dda8d8&newProject=try&newRevision=ecd4e8ce72d46bd43886e648f285e0ef0299fdd0&framework=1

Any ideas what could be causing them, Matt?

Flags: needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Comment 17

•

4 years ago

My only guess is that we're now scheduling a new composition (from the compositor thread) each time we determine that the render thread is still busy. Since the compositor thread doesn't really do any work, then this might mean it wakes up a lot more in the ASAP mode used for talos tests, attempting to composite, failing, and then requesting another composite.

The old code only attempted a new composite in response to something happening from the render thread, so there might be less wasted wakeups.

Flags: needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Comment 18

•

4 years ago

You could try calling ScheduleComposition from WebRenderBridgeParent::CompositeIfNeeded (reverting most of your patch), to get the same behaviour in ASAP mode?

Jamie Nicol [:jnicol]

Assignee

Comment 19

•

4 years ago

Yep, that did the trick! https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=04788f1a0a38d6cf4b98b0ebc224e658fc648154&newProject=try&newRevision=8268d26729fe35ea30914356a2b7c36e24e92e66&framework=1

With that change, scrolling and zooming feels much better on Android. I'll put a patch up now. However, as I said in comment 7, now we dequeue the surface in eglQuerySurface(EGL_BUFFER_AGE) rather than glClear(), which occurs much earlier in the frame: In RenderCompositorEGL::BeginFrame() rather than Renderer::composite_simple() or Renderer::draw_frame(). As a result, profiles still show some time blocking for a surface (though it's like 5% vs 50%).

There's no reason why we need to query the buffer age so early, we can wait until composite_simple() instead. I'll write a patch for that too.

Jamie Nicol [:jnicol]

Assignee

Updated

•

4 years ago

Comment 20

•

4 years ago

Attached file Bug 1668878 - Stop doing catch-up composites on webrender. r?mattwoodrow — Details

Currently, when webrender fails to render a frame on time, it will
attempt to catch up by rendering the next frame immediately (part way
through the vsync period) rather than delaying until the next
vsync. The idea is that even though the first frame missed vsync, the
next frame can hopefully still be rendered on time.

However, on Android, because of the default EGL swap interval of zero,
the driver blocks rather than allowing 2 frames to be rendered in a
single vsync. This guarantees that the next frame will also miss its
vsync, and so on. Other platforms may avoid this by setting the swap
interval to zero. While that was an option on Android too, it was felt
that the premise of catch up composites was still flawed: a
late-started composite is unlikely to meet its deadline, even if the
driver does not block. Better to cut our losses and wait for the next
vsync.

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → jnicol

Status: NEW → ASSIGNED

Pulsebot

Comment 21

•

4 years ago

Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/741d451756e1 Stop doing catch-up composites on webrender. r=mattwoodrow

Cristina Coroiu [:ccoroiu]

Comment 22

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/741d451756e1

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox83: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 83 Branch

Jamie Nicol [:jnicol]

Assignee

Updated

•

4 years ago

Regressions: 1670313