Closed Bug 1768495 Opened 2 years ago Closed 1 year ago

Compositor CSS animations keep being sampled in fully occluded windows

Categories

(Core :: Graphics: Layers, defect, P3)

defect

Tracking

()

RESOLVED FIXED
112 Branch
Performance Impact medium
Tracking Status
firefox112 --- disabled
firefox113 --- disabled
firefox114 --- disabled
firefox115 --- disabled
firefox116 --- fixed

People

(Reporter: florian, Assigned: bradwerth)

References

(Blocks 2 open bugs)

Details

(Keywords: perf:resource-use)

Attachments

(4 files, 3 obsolete files)

Steps to reproduce (on a platform where detecting window occlusion is supported, I think that's Mac and Windows) :

  • load a page with a CSS animation that can run on the compositor, eg. data:text/html,<head><style>body{animation: fadein 1s infinite;} @keyframes fadein{from{opacity: 0;}}</style><body>Text
  • cover the window entirely with another window.
  • see that the CPU usage barely drops (I checked that using about:processes in another window).

When the same tab with CSS animation is put in the background by opening another tab in the same window, the CPU use drops significantly.

After doing some debugging, I noticed that removing the nsIRemoteTab.preserveLayers call at https://searchfox.org/mozilla-central/rev/c7183440fe8ad391e2ab2e990229dd85fac318d3/browser/base/content/tabbrowser.js#5636-5639 makes the animations stop when the window is occluded.

When the window is occluded, the refresh driver ticks are throttled in the content process, only the compositor/renderer threads remain active at 60Hz.

Example profile where I made the animated tab inactive first by occluding the window (the IPCs related to sending vsync notifications to the child process stop, but the renderer and compositor activity continues) and another time by switching to an about:blank tab (this time the compositor and renderer thread activity stopped): https://share.firefox.dev/3kVgGIY

Is it a bug of the PreserveLayers code that in addition to keeping the layers it keeps the animations active, or should the front-end not call preserveLayers when a window is fully occluded?

Flags: needinfo?(emilio)

I'm confused, how's that call the relevant one? Shouldn't we have a tab switcher, in which case the event is handled here? https://searchfox.org/mozilla-central/rev/c7183440fe8ad391e2ab2e990229dd85fac318d3/browser/modules/AsyncTabSwitcher.jsm#852

Flags: needinfo?(emilio)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)

I'm confused, how's that call the relevant one? Shouldn't we have a tab switcher, in which case the event is handled here?

My understanding is that an AsyncTabSwitcher object is created whenever a tab switch is initiated, and it self detroys before sending the TabSwitchDone event, typically 300ms after the new tab became visible. (This behavior is visible in this part of the profile: https://share.firefox.dev/3LYRQUx)

Huh, that's weird... But yeah I think preserveLayers() should be avoided, that keeps compositing. But I don't know why it's there to begin with.

I guess it comes from bug 1098131... Does minimizing show the same effect, actually? Or does something else stop the compositor?

Severity: -- → S3
Priority: -- → P3

(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)

I guess it comes from bug 1098131... Does minimizing show the same effect, actually? Or does something else stop the compositor?

Bug 1580117 landed Mac-specific code to pause the compositor in minimized windows.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #3)

Huh, that's weird... But yeah I think preserveLayers() should be avoided, that keeps compositing. But I don't know why it's there to begin with.

When I remove the preserveLayers(), the content area becomes blank while the window is occluded, this is visible in the screenshots of this profile: https://share.firefox.dev/3N2YGZ7

That's likely what bug 1098131 worked around by adding the preserveLayers call. Could we instead suppress the paints as soon as the window is occluded?

(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)

I guess it comes from bug 1098131... Does minimizing show the same effect, actually? Or does something else stop the compositor?

I tested on Windows, here's a profile: https://share.firefox.dev/3PbA4zd We keep compositing when the window is minimized, but the minimized window is resized to a 1px height, so the screenshots look empty during that time.

Blocks: 1771902

(In reply to Florian Quèze [:florian] from comment #5)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)

I guess it comes from bug 1098131... Does minimizing show the same effect, actually? Or does something else stop the compositor?

Bug 1580117 landed Mac-specific code to pause the compositor in minimized windows.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #3)

Huh, that's weird... But yeah I think preserveLayers() should be avoided, that keeps compositing. But I don't know why it's there to begin with.

When I remove the preserveLayers(), the content area becomes blank while the window is occluded, this is visible in the screenshots of this profile: https://share.firefox.dev/3N2YGZ7

That's likely what bug 1098131 worked around by adding the preserveLayers call. Could we instead suppress the paints as soon as the window is occluded?

Looks like the needinfo chain broke here, re-adding.

Flags: needinfo?(emilio)

This wastes enough CPU / power that I don't think the severity could be less than S2. Also nominating for performance triage.

Severity: S3 → S2
Performance Impact: --- → ?

I'm not so familiar with the compositor side of things, but presumably? We have compositor-pausing machinery on various platforms, IIRC.

preserveLayers() keeping compositing animations should affect Android as well (they always preserveLayers() iirc). Jamie, do you know if/how Android pauses the compositor of background tabs?

Flags: needinfo?(emilio)

Err, see comment above Jamie.

Flags: needinfo?(jnicol)

To pause the compositor the java layer calls in to SyncPauseCompositor() which will result in CompositorBridgeParent::PauseComposition() being called on the compositor thread. This will then call through to RendererThread::Pause() on the render thread. RenderThread::UpdateAndRender() then checks whether it is paused here and skips rendering if so.

However, this only skips rendering frames, but not generating the frames (which I think would be when we sample compositor animations). I'm afraid I don't know whether we have a mechanism for stopping the rest of the work the compositor thread (and other internal webrender threads) perform. I think the ClearCachedResources() command should take care of clearing compositor animations, but not sure under what circumstances it is called. Hope that helps!

Flags: needinfo?(jnicol)

I wonder which platforms are affected here - on Wayland the WaylandVsyncSource stops sending NotifyVsync() events in this case which IIUC would avoid generating the frame.

(In reply to Robert Mader [:rmader] from comment #12)

I wonder which platforms are affected here - on Wayland the WaylandVsyncSource stops sending NotifyVsync() events in this case which IIUC would avoid generating the frame.

I don't think I tested Linux. From what I remember, on Mac we keep compositing when a window is occluded but stop compositing when the window is minimized. On Windows we keep compositing for CSS animations both when the window is occluded and when it is minimized.

The Performance Priority Calculator has determined this bug's performance priority to be P2. If you'd like to request re-triage, you can reset the Performance flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] macOS [x] Linux [x] Android
[x] Causes severe resource usage
[x] Bug affects multiple sites

Performance Impact: ? → medium

I'll try to figure this out.

Assignee: nobody → bwerth

There doesn't appear to be a way to check the non-occluded region, so any
part of the bounding rectangle of the window will resume the compositor.

macOS fixup seems easy; I'll try to create fixes for the other platforms also.

(In reply to Brad Werth [:bradwerth] from comment #17)

macOS fixup seems easy; I'll try to create fixes for the other platforms also.

Would it be possible to do this in cross platform code, eg. https://searchfox.org/mozilla-central/rev/08362489086b10de96e7a199b267ea5504c01583/xpfe/appshell/AppWindow.cpp#2992 ?

(In reply to Florian Quèze [:florian] from comment #18)

Would it be possible to do this in cross platform code, eg. https://searchfox.org/mozilla-central/rev/08362489086b10de96e7a199b267ea5504c01583/xpfe/appshell/AppWindow.cpp#2992 ?

I'll attempt that instead.

(In reply to Brad Werth [:bradwerth] from comment #19)

(In reply to Florian Quèze [:florian] from comment #18)

Would it be possible to do this in cross platform code, eg. https://searchfox.org/mozilla-central/rev/08362489086b10de96e7a199b267ea5504c01583/xpfe/appshell/AppWindow.cpp#2992 ?

I'll attempt that instead.

I'm not sure this can be done purely in AppWindow. It seems to me that each widget window class has its own widget that hosts the CompositorBridgeChild. For example, for macOS it's the last child of the window's content view. That's specific enough that each widget window class should really manipulate it directly, rather than trying to replicate that logic within AppWindow. I'll build out the patches for each widget window class and if somebody can find a more centralized way to do it, maybe that will come out in the review process.

(In reply to Jamie Nicol [:jnicol] from comment #11)

To pause the compositor the java layer calls in to SyncPauseCompositor() which will result in CompositorBridgeParent::PauseComposition() being called on the compositor thread.

Getting the call through to CompositorBridgeParent::PauseComposition() is sufficient to reduce the CanvasThread traffic on other platforms, and that appears to be the majority of the CPU work noted in comment 0. Maybe there's more CPU usage to target, but we can definitely get this piece. Meanwhile, GeckoView doesn't seem to have any existing concept of occlusion. Can you sketch a model of how to enable occlusion detection on Android?

Flags: needinfo?(jnicol)

Just making sure I'm on the right page.. Occlusion in this case refers to the OS window being occluded because another window is in front of it (or it is minimised)? I don't think windows being in front of eachother is really a thing on Android. Perhaps on some samsung tablets you can have multiple windows overlapping eachother, but I'm not sure that's exposed via an API or whether we care enough anyway. For the equivalent of a window being minimised, SyncPauseCompositor() will be called when an app is no longer visible, and SyncResumeResizeCompositor() will be called when it becomes visible again.

Flags: needinfo?(jnicol)

Getting the call through to CompositorBridgeParent::PauseComposition() is sufficient to reduce the CanvasThread traffic on other platforms

By this do you mean it does not reduce the CanvasThread traffic on Android, or just that you haven't tested?

Flags: needinfo?(bwerth)

(In reply to Jamie Nicol [:jnicol] from comment #25)

Getting the call through to CompositorBridgeParent::PauseComposition() is sufficient to reduce the CanvasThread traffic on other platforms

By this do you mean it does not reduce the CanvasThread traffic on Android, or just that you haven't tested?

Thanks for the info. I meant that I haven't tested it, so I don't know. Also, I misspoke: the traffic is on the Render thread. Really, I'm just responding to the note in comment 14 that this bug affects all platforms. It's possible that that is also a misstatement and this is not an issue on Android.

I have a new version of the patch that pauses/resumes the compositor when CanonicalBrowsingContext::RecomputeAppWindowVisibility decides the window is visible or not. So that should be a cross-platform solution, though I'm not sure under what conditions that function is hit on Android.

Flags: needinfo?(bwerth)
Attachment #9316427 - Attachment is obsolete: true
Attachment #9316659 - Attachment is obsolete: true
Attachment #9316660 - Attachment is obsolete: true
Attachment #9316815 - Attachment description: Bug 1768495 Part 1: Add a method nsIWidget::PauseOrResumeCompositor, and override it for nsCocoaWindow. → Bug 1768495 Part 1: Add a method nsIWidget::PauseOrResumeCompositor, and use it in nsCocoaWindow.
Attachment #9316815 - Attachment description: Bug 1768495 Part 1: Add a method nsIWidget::PauseOrResumeCompositor, and use it in nsCocoaWindow. → Bug 1768495 Part 1: Add a method nsIWidget::PauseOrResumeCompositor, and
Attachment #9316815 - Attachment description: Bug 1768495 Part 1: Add a method nsIWidget::PauseOrResumeCompositor, and → Bug 1768495 Part 1: Add a method nsIWidget::PauseOrResumeCompositor, and override it in nsCocoaWindow.
Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/74c1903a5df6
Part 1: Add a method nsIWidget::PauseOrResumeCompositor, and override it in nsCocoaWindow. r=gfx-reviewers,emilio,ahale
https://hg.mozilla.org/integration/autoland/rev/ac0c0d68e31d
Part 2: Pause or resume compositor when CanonicalBrowsingContext recomputes its visibility. r=emilio
https://hg.mozilla.org/integration/autoland/rev/6a48a9418fb3
Part 3: Remove redundant pause-on-minimize in nsCocoaWindow, remove pref. r=gfx-reviewers,ahale

Backed out for causing reftest failures in 1478035.html

Flags: needinfo?(bwerth)

This failure of the 1478035.html test is super weird. The crash log shows that the test is hitting JS errors in the anti-tracking cookie PurgeTrackerService and in the remote settings Utils. I can't explain why the code changes would cause those JS errors. The code probably doesn't cause these errors. I see there's a resolved incomplete intermittent orange Bug 1584357. I can't see the logs from that Bug to determine if they are the same logs here. Cristian, is this just another expression of Bug 1584357, which needs to be re-opened so that this code can land?

Flags: needinfo?(bwerth) → needinfo?(ctuns)

Yes, local testing has shown that there's some kind of interference with the crashtests that I don't understand yet. I'll try to sort it out and make the patches safe to land.

Flags: needinfo?(bwerth)

I'm still not sure why manipulating the compositor causes the cookie purge tracker to crash. I'm adding local logging to the purge tracker to see what the change is in origin attributes that is causing this. Ultimately, the purge tracker should be wrapping this call in a try-catch block and if I can figure out the right way to do that, I'll open a new Bug on that issue and mark it as a blocker.

I've figured this out but I'm not sure how to fix it yet. This timeout is happening because the crashtests are allowing the sending of the idle-daily event, which triggers the PurgeTrackerService, which then fails. None of these tests should be sending the idle-daily event, which is disabled by setting the idle.lastDailyNotification pref to -1, which is done for most of the test harnesses but not for crashtests. I'm trying to figure out how to fix this, which obviously should be a separate otherwise-unrelated Bug, but which will lead to timeouts in crashtests until it is fixed.

(In reply to Brad Werth [:bradwerth] from comment #35)

Ultimately, the purge tracker should be wrapping this call in a try-catch block and if I can figure out the right way to do that, I'll open a new Bug on that issue and mark it as a blocker.

I have filed Bug 1818782 on this issue. It shouldn't be a blocker because the real fix will be ensuring that the idle-daily event is never sent and the PurgeTrackerService is never activated during crashtests. Still working on that Bug.

See Also: → 1818782
Depends on: 1818787

It looks like preventing the idle-daily event is not sufficient to solve this Bug. Local testing with the patch for Bug 1818787 applied shows that the crashtest still times out, and a different idle event is triggering after over 3 minutes. So the fix will require something to ensure that a requestAnimationFrame is always successful and won't fail silently if the compositor is paused.

No longer depends on: 1818787
See Also: → 1818787
Blocks: 1819154

This crashtest relies upon a requestAnimationFrame firing even when the window is
occluded, which is something that we don't yet handle.

There are a total of 10 crashtests that timeout on requestAnimationFrame with this patch applied. I'll update the final part of the patch to set the pref on each of those tests individually, but that's a lot of tests. I'm removing Bug 1819154 as a dependency, because if it was fixed first, this wouldn't be necessary.

No longer blocks: 1819154
See Also: → 1819154
Attachment #9320092 - Attachment description: Bug 1768495 Part 4: Disable window occlusion tracker pref on a problematic crashtest. → Bug 1768495 Part 4: Disable window occlusion tracker pref on crashtests that rely on a running compositor.
Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/425fd934f385
Part 4: Disable window occlusion tracker pref on crashtests that rely on a running compositor. r=emilio

Ugh, I screwed up the landing request because Part 4 wasn't parented correctly. It can land independently from parts 1 through 3, which should then work with part 4 already in place.

Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/21c21d6a428a
Part 1: Add a method nsIWidget::PauseOrResumeCompositor, and override it in nsCocoaWindow. r=gfx-reviewers,emilio,ahale
https://hg.mozilla.org/integration/autoland/rev/52f7a952958b
Part 2: Pause or resume compositor when CanonicalBrowsingContext recomputes its visibility. r=emilio
https://hg.mozilla.org/integration/autoland/rev/1d55841947c3
Part 3: Remove redundant pause-on-minimize in nsCocoaWindow, remove pref. r=gfx-reviewers,ahale
Regressions: 1819502
Blocks: 1786258
Duplicate of this bug: 1567052
Regressions: 1828587

bug 1828587 backed this out of beta and release for 113.0b7 and 112.0.2 dot release

Backout link for beta: https://hg.mozilla.org/releases/mozilla-beta/rev/e6bc65cb318a
Backout link for release: https://hg.mozilla.org/releases/mozilla-release/rev/6b8f889688ed

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Since it's still in Nightly I think we should keep this closed, but feel free to reopen if you disagree.

Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Regressions: 1830753
See Also: → 1834246
See Also: → 1836699
Regressions: 1839109
See Also: → 1864255
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: