After dragging a tab with an off-main-thread animation into a new window, the old window keeps compositing at 60fps forever
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox100 | --- | fixed |
People
(Reporter: mstange, Assigned: sotaro)
References
(Blocks 2 open bugs)
Details
(Keywords: perf:resource-use)
Attachments
(3 files)
Steps to reproduce:
- Open the attached testcase in a tab and switch to that tab.
- Drag the tab out of the tab bar to create a new window from it.
- Close that new window.
Expected results:
The parent window should stop triggering composites.
Actual results:
Dragging the tab out of the window prints the following in debug builds:
[Parent 30464, Compositor] ###!!! ASSERTION: Tried to delete invalid animation: 'Error', file /Users/mstange/code/mozilla/gfx/layers/wr/OMTASampler.cpp:204
And from that point on, the original window will trigger composites indefinitely and use up CPU and power.
Comment 1•3 years ago
|
||
Thank you Markus for filing this bug. This sounds quite bad. I will investigate what's going on. That said, I am going to do NI to sotaro as well since I believe he's more familiar with the code in question rather then I.
Comment 2•3 years ago
|
||
I did accidentally drop my NI. (I thought I did uncheck it, but it seems not).
Comment 3•3 years ago
|
||
Moving into WebRender component since I believe this is more or less an issue in WebRender backend.
And I think the stuck is probably due to an assertion FocusState::IsCurrent. I don't see the stuck on release builds. Bug 1659439 is the closest issue for the assertion because the STR in bug 1659439 comment 0 involves dragging tab between browser windows.
Assignee | ||
Comment 4•3 years ago
|
||
I tried to reproduce the problem on Windows, Linux and MacOS. But I could not reproduce the problem, is there something I am missing? Does the problem happen also on release build?
Reporter | ||
Comment 5•3 years ago
|
||
I was able to reproduce it in a regular Nightly build on macOS with a fresh profile.
Comment 6•3 years ago
|
||
Reproduces on a Linux opt build with a "WRRende~ckend#1" thread and the Renderer thread continuing to use 4-5% of a cpu each, and firefox and Compositor, "BHMgr Monitor", GLXVsyncThread, threads using lesser cpu.
Disabling (temporarily or not) compositing in the window manager (and forcing an invalidation through a focus change) is sufficient to stop the cpu usage.
Assignee | ||
Comment 7•3 years ago
•
|
||
Ah, I also misunderstood the symptom like :hiro.
From Matrix,
gosh! I think I was misunderstanding the stuck mean. I was thinking it's stuck at somewhere our composition work so that no further composition happens, but in fact it means it keeps compositing, right? karlt's comment realized me. :)
Assignee | ||
Comment 8•3 years ago
•
|
||
It seems that we failed to remove all animation by WebRenderBridgeParent::UpdateWebRender(). The ClearAnimationResources() seemed not not remove all animations.
Comment 9•3 years ago
|
||
Thanks to karlt's comment above and thanks to miko's help to understand our display list building, I think now I understand what's going on.
- When a tab is moved to a new browser window, OMTASampler::ClearActiveAnimations gets called via AdoptChild IPC call
- in the mean time, the display item for the animation in question persists
- then in the first paint, CreateOrRecycleWebRenderUserData returns a new WebRenderAnimationData instance rather than re-using the previous one, thus we assign a new animation id for the animation, thus we do call DeleteCompositorAnimations with the previous animation id, then the "Tried to delete invalid animation" happens
I'd think, given that display items persist after moving between browser windows, we don't need to call ClearActiveAnimations at 1)?
Comment 10•3 years ago
|
||
FWIW, though I haven't further investigated about the stuck issue, my guess is that when the tab is detached from the original browser window, we don't clear the animation in question, thus the OMTASample instance for the original window keeps the animation id in question. I suppose sotaro has already noticed it though.
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 11•3 years ago
|
||
(I've rephrased the bug title and description, sorry for the ambiguous wording!)
Assignee | ||
Comment 12•3 years ago
|
||
In CompositorBridgeParent::RecvAdoptChild(), NotifyChildCreated() is called before WebRenderBridgeParent::UpdateWebRender() call. It is not good.
In the UpdateWebRender(), WebRenderBridgeParent::ClearAnimationResources() is called. But in ClearAnimationResources(), OMTASampler is got from new CompositorBridgeParent. In this case, OMTASampler needs to be got from old CompositorBridgeParent.
Assignee | ||
Comment 13•3 years ago
|
||
The NotifyChildCreated() needs to be called after the UpdateWebRender() in CompositorBridgeParent::RecvAdoptChild().
Comment 14•3 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #12)
In the UpdateWebRender(), WebRenderBridgeParent::ClearAnimationResources() is called. But in ClearAnimationResources(), OMTASampler is got from new CompositorBridgeParent. In this case, OMTASampler needs to be got from old CompositorBridgeParent.
It sounds to me that some APZ related functions have the same issue, WebRenderBridgeParent::UpdateAPZFocusState for example. Does it cause bug 1659439? In the APZ cases we don't clear resources thingie in RecvAdoptChild() though. CCing botond just FYI.
Comment 15•3 years ago
|
||
Gah, CCing botond was dropped by an comment collision.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 16•3 years ago
|
||
Assignee | ||
Comment 17•3 years ago
|
||
Comment 18•3 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #14)
It sounds to me that some APZ related functions have the same issue, WebRenderBridgeParent::UpdateAPZFocusState for example. Does it cause bug 1659439?
I think it's a potential diagnosis for bug 1659439, yeah. Thanks for making the connection. I commented on that bug so we don't forget if we work on it.
Comment 19•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 20•3 years ago
|
||
bugherder |
Comment 21•3 years ago
|
||
Hi Sotaro, thanks for fixing this! I don't see a test included in the patch (maybe it's elsewhere?), and I think it would be nice to ensure this doesn't regress. I've done a quick attempt at writing a browser chrome mochitest for this fix that I'm attaching as a proof of concept to show what I had in mind. I verified locally that it passes with your fix included and fails without, but haven't pushed this to try.
Did you have thoughts about testing that I didn't see?
Assignee | ||
Comment 22•3 years ago
|
||
Thank you! I did not create a test for it. For testing the situation, it might be better to check omta value directly than "ChromeUtils.vsyncEnabled()".
:hiro knows omta very well.
:hiro, is it possible to create a test that check omta value directly for testing the situation?
Comment 23•3 years ago
|
||
I don't think we need to make sure the OMTA value there. What the mochitest Florian posted is trying to looks basically reasonable to me.
(Current ChromeUtils.vsyncEnabled() returns a global state, if it returns a per-window state, it would be nice? I am totally unsure though)
Reporter | ||
Updated•3 years ago
|
Comment 24•3 years ago
|
||
It looks like the needinfo chain stopped somehow. :sotaro, are you taking care of the test?
Comment 25•3 years ago
|
||
I think it would be nice that Florian pushes the mochitest to phabricator to ask review. I am happy to do r+ on that.
Comment 26•3 years ago
|
||
Note that writing this kind of browser mochitest would be much appreciated, specifically for Gecko core developers.
Comment 27•3 years ago
|
||
Florian landed the mochitest in bug 1764812. Thank you, Florian!
Updated•3 years ago
|
Description
•