Closed Bug 1877429 Opened 4 months ago Closed 3 months ago

Intermittent Flickering on a Codepen canvas demo (https://codepen.io/frontsideup/pen/mgNNvx)

Categories

(Core :: Graphics: Canvas2D, defect)

defect

Tracking

()

VERIFIED FIXED
124 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox122 --- unaffected
firefox123 --- wontfix
firefox124 --- verified
firefox125 --- verified

People

(Reporter: mayankleoboy1, Assigned: aosmond)

References

(Regression, )

Details

(Keywords: perf-alert, regression)

Crash Data

Attachments

(5 files)

Go to https://codepen.io/frontsideup/pen/mgNNvx

AR: the demo flickers.. Sometimes all the lines will disappear and only the red line is visible. Try resizing the demo-pane if it doesnt repro.
ER: Not so

Regression range:
Bug 1870488 - Part 3. Make OffscreenCanvas use PersistentBufferProvider on the display pipeline. r=lsalzman

Differential Revision: https://phabricator.services.mozilla.com/D195124

Attached file about:support
Summary: Flickering on a Codepen canvas demo (https://codepen.io/frontsideup/pen/mgNNvx) → Intermittent Flickering on a Codepen canvas demo (https://codepen.io/frontsideup/pen/mgNNvx)
Flags: needinfo?(aosmond)

Maybe try opening the demo in two tabs and then drag one of them out such that both demo tabs are in the foreground.

Set release status flags based on info from the regressing bug 1870488

Does the fix in bug 1877010 help here? I fixed a bit of bugginess related to offscreen canvas in there.

Severity: -- → S3
Flags: needinfo?(mayankleoboy1)
See Also: → 1877010

(In reply to Lee Salzman [:lsalzman] from comment #4)

Does the fix in bug 1877010 help here? I fixed a bit of bugginess related to offscreen canvas in there.

I tried the autoland build, and that did not fix this bug.

Flags: needinfo?(mayankleoboy1)

screencapture. The funny thing is that as soon as i start the screencapture, the flashes decrease, and the the "all lines disappear except teh red one" thing completely stops happening.

FWIW, this repros with d2d-canvas, gpu-canvas and even no-remote canvas.

Andrew, any guesses why occasionally some frame in offscreen canvas may fail to present here? I thought I cleaned up some incidental issues with bug 1877010 related to remote + offscreen canvas that may cause similar sadness when erroneous forwarder transaction tracking is involved, but it seems like even Skia offscreen canvas is producing this too?

Attached video skia-canvas-bad.m4v
Attached video VID_20240202_130755.mp4

"Good" version from a build of 1Oct2024. This is using the default backend (i.e. d2d-canvas) for ease of demonstration.

(In reply to Mayank Bansal from comment #7)

FWIW, this repros with d2d-canvas, gpu-canvas and even no-remote canvas.

OffscreenCanvas right now always uses Skia. We haven't landed the patches to make it use GPU/WebGL or D2D based canvases.

If I comment out, it stops the flickering and partial draws for me and seems to render the same as Chrome. I'm not sure if this is because the timing is different or if it is somehow related to the filters.

$.filter = `hue-rotate(${hColor}deg) opacity(100%)`;

It appears if I delay on each stroke call, I can reproduce this as well, so removing the filter must just change the timing sensitivity.

I think something is going wrong with the readlocks.

If I sync flush the ImageBridgeChild thread event queue in OffscreenCanvasDisplayHelper::CommitFrameToCompositor, the problem goes away:
https://searchfox.org/mozilla-central/rev/6b8a3f804789fb865f42af54e9d2fef9dd3ec74d/dom/canvas/OffscreenCanvasDisplayHelper.cpp#284

The readlock for the compositor is only acquired in TextureClient::OnForwardedToHost:
https://searchfox.org/mozilla-central/rev/6b8a3f804789fb865f42af54e9d2fef9dd3ec74d/gfx/layers/client/TextureClient.cpp#862

On the ImageBridgeChild thread, which we dispatched to asynchronously. If we took too long handling requestAnimationFrame, we may have run out of budget and the next callback gets queued immediately/very soon after. Then we can reuse a buffer we just sent for presentation.

Ideally we can avoid blocking on the ImageBridgeChild thread, and simply increment the read count for the compositor in advance while on the main/DOM worker thread for OffscreenCanvas.

Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Flags: needinfo?(aosmond)

When OffscreenCanvas::CommitFrameToCompositor uses the non-remote
texture canvas path with Skia, it uses ImageBridgeChild for compositing.
When ImageContainer::SetCurrentImages is called, there was an
intermediate state where the relevant textures were not yet marked as
read only for the compositor's consumption, because the event to do so
was dispatched asynchronously to the ImageBridgeChild thread. If the
owning thread of the canvas (main or DOM worker) ran immediately after
CommitFrameToCompositor, then we could run into texture reuse since
nothing marked the texture yet as being used for compositing. This had
the end result of sometimes displaying back buffer textures currently
being used for drawing on the display pipeline.

This patch makes it so that we mark OffscreenCanvas textures as read
only for the compositor before dispatching, and releasing the lock
either when we swap the images in the ImageContainer (winning the race
with ImageBridgeChild), or after the compositor has finished with it
(losing the race, if any, with ImageBridgeChild).

Additionally, to handle better the case where we run out of buffers, we
need to implement ImageBridgeChild::SyncWithCompositor, to be analogous
to how WebRenderBridgeChild::SyncWithCompositor works. We achieve this
by calling from ImageBridgeChild back into the appropriate
WebRenderBridgeChild based on the window ID associated with the canvas,

It also adds a new pref, gfx.offscreencanvas.shared-provider, which
allows one to switch between PersistentBufferProviderShared and Basic.
The latter of which is used if we fallback from using shared buffers if
it takes too long to get the shared buffers back from the compositor.

The performance in the codepen demo is pretty terrible for non-OffscreenCanvas reasons. There is low hanging fruit on the software filter pathways I can fix to make this usable on at least a powerful machine. That will land in another bug. My changes had the potential to appear in this test case, as it stresses the pathways I added, but it barely shows up in my local profiling.

Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/efdf3af4f7fe
Prevent offscreen canvas2d updates from racing with compositing. r=gfx-reviewers,lsalzman

Preliminary talos results suggest this should be a pretty big win for Linux (and possibly OSX/Android), and smaller for Windows. The reason appears to be that non-Windows wasn't getting PersistentBufferProviderShared at all and was always falling back to PersistentBufferProviderBasic. This also may improve a few Windows numbers, but to a lesser extent, since it seemed to be falling back less.

Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 124 Branch

Latest nightly fixes the issue for me, albeit it is very slow (separate problem).

Duplicate of this bug: 1879687

Comment on attachment 9378836 [details]
Bug 1877429 - Prevent offscreen canvas2d updates from racing with compositing.

Beta/Release Uplift Approval Request

  • User impact if declined: OffscreenCanvas display/stability issues
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Once the boilerplate/plumbing is stripped away, the patch is fairly simple. We rarely hit the edge case of needing to call SyncWithCompositor, but if we do, it uses the exact same mechanism for OffscreenCanvas as we do for HTMLCanvasElement. If it turns out we still have issues with PersistentBufferProviderShared, it also provides an escape hatch to use PersistentBufferProviderBasic which is much more simple/basic.
  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9378836 - Flags: approval-mozilla-beta?

I realize we might have missed the typical window, but strong consideration should be given for uplifting in a dot release if there is a driver for one.

Copying crash signatures from duplicate bugs.

Crash Signature: [@ mozilla::layers::TextureClient::Unlock]

Comment on attachment 9378836 [details]
Bug 1877429 - Prevent offscreen canvas2d updates from racing with compositing.

Our beta cycle is over, it has almost no bake time on nightly and we build our 123 release candidate tomorrow, this feels risky for the benefit it brings, let's have it ride the 124 train, thanks.

Attachment #9378836 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Comment on attachment 9378836 [details]
Bug 1877429 - Prevent offscreen canvas2d updates from racing with compositing.

Moving to our mozilla-release request queue to reevaluate for the planned 123.0.1 dot release (as per comment #25)

Attachment #9378836 - Flags: approval-mozilla-release?

(In reply to Pulsebot from comment #19)

Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/efdf3af4f7fe
Prevent offscreen canvas2d updates from racing with compositing.
r=gfx-reviewers,lsalzman

Perfherder has detected a browsertime performance improvement from efdf3af4f7fe
== Change summary for alert #41396 (as of Thu, 08 Feb 2024 00:06:25 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
40% offscreencanvas_webcodecs_worker_2d_h264 offscreencanvas_webcodecs_worker_2d_h264 Mean time across 100 frames: linux1804-64-shippable-qr e10s fission stylo webrender 12.25 -> 7.37
39% offscreencanvas_webcodecs_main_2d_h264 offscreencanvas_webcodecs_main_2d_h264 Mean time across 100 frames: linux1804-64-shippable-qr e10s fission stylo webrender 12.10 -> 7.36
39% offscreencanvas_webcodecs_worker_2d_h264 offscreencanvas_webcodecs_worker_2d_h264 Mean time across 100 frames: linux1804-64-shippable-qr e10s fission stylo webgl-ipc webrender 12.27 -> 7.47
39% offscreencanvas_webcodecs_worker_2d_vp9 offscreencanvas_webcodecs_worker_2d_vp9 Mean time across 100 frames: linux1804-64-shippable-qr e10s fission stylo webgl-ipc webrender 12.23 -> 7.46
39% offscreencanvas_webcodecs_worker_2d_h264 offscreencanvas_webcodecs_worker_2d_h264 Mean time across 100 frames: linux1804-64-shippable-qr e10s fission stylo webrender-sw 11.90 -> 7.26
... ... ... ... ...
13% offscreencanvas_webcodecs_worker_2d_av1 offscreencanvas_webcodecs_worker_2d_av1 Mean time across 100 frames: windows10-64-shippable-qr e10s fission stylo webrender-sw 13.47 -> 11.75

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=41396

Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Comment on attachment 9378836 [details]
Bug 1877429 - Prevent offscreen canvas2d updates from racing with compositing.

That's a significant amount of code changed for an S3 and we only had one crash in 123, let's have it ride the 124 train. Thanks.

Attachment #9378836 - Flags: approval-mozilla-release? → approval-mozilla-release-

I was able to reproduce the issue on Win10x64 using FF build 124.0a1(20240130214506).
Verified as fixed on Win10x64/Mac12.6 using FF builds 124.0b5 and 125.0a1(2024-02-28).

Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: