Open Bug 1884493 Opened 2 years ago Updated 1 year ago

Browser completely freezes when trying to capture the profile of a Canvas demo

Categories

(Core :: Graphics: CanvasWebGL, defect, P2)


Tracking Status
firefox-esr115 --- unaffected
firefox123 --- wontfix
firefox124 --- wontfix
firefox125 --- fix-optional

People

(Reporter: mayankleoboy1, Assigned: jgilbert)

References

(Regression, )

Details

(Keywords: regression)

Attachments

(3 files, 1 obsolete file)

  1. Use latest Nightly (should have d2d-canvas enabled by default)
  2. Enable the Firefox profiler. Select the "Graphics" preset.
  3. Go to https://www.fxhash.xyz/generative/slug/algorhythm-1
  4. Start the profiler (i.e. start the recording phase of profiling)
  5. Click on "Run" on the demo page
  6. Let the demo run for 20-30 seconds
  7. Capture the profile by either clicking the profiler icon in the toolbar or pressing "Ctrl + Shift + 2"

AR (actual result): The whole browser completely freezes.
ER (expected result): The browser stays responsive. Previously, it remained responsive even while the demo continued to run in a background tab.

Bisection points to:
Bug 1863914: Use multiple shmem buffers for remote canvas recording. r=aosmond
Differential Revision: https://phabricator.services.mozilla.com/D193207

Flags: needinfo?(bobowencode)
Attached file about:support

Set release status flags based on info from the regressing bug 1863914

Severity: -- → S2
Flags: needinfo?(lsalzman)
Assignee: nobody → bobowencode
Priority: -- → P1

The root cause seems to be a very long running GL thread that is blocking certain things in the GPU process.
I've attached a stack and it seems to be very similar to this each time I refresh.
Before bug 1863914 the canvas ring buffer would fill and block the content process.
After bug 1863914 it seems to get further but then rendering/compositing (or something) gets blocked in the GPU process, so everything seems to hang.
It doesn't seem to be related to the profiling directly.

It frees up eventually once the GL work finishes.
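
The difference described above (a full ring buffer stalling the producer versus multiple buffers letting it run ahead) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Gecko's actual recording code: the function `simulate`, its parameters, and the per-tick rates are all hypothetical, and "producer" and "consumer" stand in loosely for the content process and the GPU process.

```python
def simulate(produce_per_tick, consume_per_tick, ticks, capacity=None):
    """Toy model of a command queue between a producer and a slower consumer.

    capacity=None models buffers that can keep growing (assumption: roughly
    the "multiple shmem buffers" case). An integer capacity models a
    fixed-size ring buffer that stalls the producer when it is full.

    Returns (commands_produced, max_backlog, producer_stalled_ticks).
    """
    backlog = 0
    produced = 0
    max_backlog = 0
    stalled_ticks = 0
    for _ in range(ticks):
        # Producer side: write as many commands as the buffer has room for.
        if capacity is None:
            written = produce_per_tick
        else:
            written = min(produce_per_tick, capacity - backlog)
        if written < produce_per_tick:
            stalled_ticks += 1  # back-pressure: the producer had to wait
        backlog += written
        produced += written
        max_backlog = max(max_backlog, backlog)
        # Consumer side: drain a fixed number of commands per tick.
        backlog -= min(backlog, consume_per_tick)
    return produced, max_backlog, stalled_ticks


bounded = simulate(10, 2, 5, capacity=16)
unbounded = simulate(10, 2, 5)
print("bounded:  ", bounded)
print("unbounded:", unbounded)
```

In this model the bounded queue stalls the producer on most ticks and the backlog never exceeds the capacity, while the unbounded queue never stalls the producer but leaves the consumer with a steadily growing backlog. That matches the observation that the hang appears to have moved from the content process to the GPU process rather than gone away.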

Flags: needinfo?(bobowencode) → needinfo?(aosmond)
Attachment #9390808 - Attachment is obsolete: true

This is still marked as tracking for 124. How likely is it that we're going to fix it? We're releasing next week.
(Also, if I may: Do we really want to treat this as S2 if this affects the profiler? Not entirely a very end user feature in my personal opinion)

I think our options are to mark 124 as wontfix and/or downrank

Flags: needinfo?(bobowencode)

(In reply to Frederik Braun [:freddy] (reo duty for Fx124) from comment #6)
> This is still marked as tracking for 124. How likely is it that we're going to fix it? We're releasing next week.
> (Also, if I may: Do we really want to treat this as S2 if this affects the profiler? Not entirely a very end user feature in my personal opinion)
>
> I think our options are to mark 124 as wontfix and/or downrank

I think it is this particular demo that is causing large amounts of blocking work in the GPU process.
Even earlier than bug 1863914 the profile takes a very long time to load.
I've bisected that to:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c2341f2f621212e9f3f7a8fdeddd4b72bc33c551&tochange=cf0ddbec5963b58732cc2ce17a8064ea31d45fb1

Bug 1852145 moved some things to the CanvasRenderThread, so I'm guessing it is that thread that is getting blocked by the GL work.
I've no real idea as to how many things might trigger this scenario (other than canvas demos).

Flags: needinfo?(bobowencode)
Regressed by: 1852145

Bob, anything we can do to progress this one? Do you think this remains valid as an S2 severity?

Flags: needinfo?(bobowencode)

I should have unassigned myself here, sorry.
I don't really have anything more to add than comment 5.
I don't think bug 1863914 is the underlying cause here.

Assignee: bobowencode → nobody
Flags: needinfo?(bobowencode)

Marking as S3: this won't affect many users, since triggering it requires both the profiler and specific content.

Severity: S2 → S3

Mayank, are you still able to reproduce? I tried the STR on nightly with gpu-canvas, and it was fine.
https://share.firefox.dev/3KmjknW

Maybe it is worse on Windows?

I see we spend most of our time on the main thread reading back from HTMLCanvasElement containing a WebGL context, and then reuploading it. We could probably do a lot better for this scenario.

Flags: needinfo?(aosmond) → needinfo?(mayankleoboy1)
  1. GPU-canvas + no-compositor-block:
    Browser UI (switching between tabs, right-click etc.): fluid
    UI of Profiler page (scrolling, selecting tracks etc.): unresponsive

  2. GPU-canvas:
    Browser UI (switching between tabs, right-click etc.): completely stuck
    UI of Profiler page (scrolling, selecting tracks etc.): completely stuck

  3. D2d-canvas + no-compositor-block:
    The browser continued to run after all tabs were closed and had to be force quit. In general, it feels as if one of the Firefox processes doesn't know to stop when the demo tab is closed.
    Browser UI (switching between tabs, right-click etc.): fluid
    UI of Profiler page (scrolling, selecting tracks etc.): page keeps on "loading"

  4. D2d-canvas:
    Browser UI (switching between tabs, right-click etc.): completely stuck
    UI of Profiler page (scrolling, selecting tracks etc.): completely stuck

Flags: needinfo?(mayankleoboy1)
Flags: needinfo?(lsalzman)
Flags: needinfo?(aosmond)

Andrew, I messed with the prefs (due to the landing of bug 1898650), so my earlier comments were incorrect. I have now updated them, so you may want to reread them.

Summary:

  1. Both gpu-canvas and d2d-canvas completely freeze for me.
  2. The recent no-compositor-blocking thingy works really well in keeping the browser UI fluid.
  3. The no-compositor-blocking thingy does not help with scrolling/selecting/interacting with the profile page while the demo runs.
    3.1 Even if you close the demo tab, the GPU process keeps on chugging for a long time, as if it doesn't understand that the demo tab has been closed.

I wonder if Bug 1899231 comment 8 might be related to the problem.

Priority: P1 → --

This did not get fixed by the patch in bug 1899231.
With either gpu-canvas or D2d-canvas, let the demo run for 20-30 seconds. Then:

  1. Interacting with the profiler is still blocked.
  2. When you close the demo tab, the GPU process keeps on running.

At one point, WebGLParent::RecvDispatchCommands() did not return for 265 seconds, which seemed to block canvas2d tasks.
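
To illustrate why one long call can starve unrelated work, here is a minimal sketch of head-of-line blocking on a single worker thread. It assumes, as guessed earlier in this bug, that the WebGL dispatch and the canvas2d tasks are serviced by the same thread. The function `fifo_start_delays`, the task names, and the 1-second canvas2d durations are hypothetical. Only the 265-second figure comes from the observation above.

```python
def fifo_start_delays(tasks):
    """Compute how long each task waits before starting on one FIFO worker.

    tasks: list of (name, enqueue_time_s, duration_s) in enqueue order.
    Returns {name: seconds spent waiting in the queue}.
    """
    clock = 0  # time at which the worker next becomes free
    delays = {}
    for name, enqueued_at, duration in tasks:
        # A task starts once it has been enqueued AND the worker is free.
        start = max(clock, enqueued_at)
        delays[name] = start - enqueued_at
        clock = start + duration
    return delays


delays = fifo_start_delays([
    ("webgl_dispatch", 0, 265),  # the long-running call
    ("canvas2d_a", 1, 1),        # hypothetical short canvas2d task
    ("canvas2d_b", 2, 1),        # hypothetical short canvas2d task
])
print(delays)
```

In this model both short tasks wait roughly as long as the long call runs, even though each needs only a second of work, which is consistent with the whole UI appearing frozen until the GL work finishes.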

Kelsey, can you comment on this bug?

Flags: needinfo?(jgilbert)
Component: Graphics: Canvas2D → Graphics: CanvasWebGL

There is a webgl.linkProgram call that is taking an extremely long time.

Flags: needinfo?(jgilbert)
Flags: needinfo?(aosmond)
Assignee: nobody → jgilbert
Priority: -- → P2
No longer blocks: gfx-triage