Constant GPU load on Google Photos (opacity 0 spinners)

RESOLVED FIXED in Firefox 66

Status

()

P3
normal
RESOLVED FIXED
5 months ago
4 days ago

People

(Reporter: Tobias.Marty, Assigned: mattwoodrow)

Tracking

(Depends on: 1 bug, Blocks: 1 bug, {regression})

64 Branch
mozilla66
x86_64
Windows 10
regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox66 fixed)

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

5 months ago
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0

Steps to reproduce:

Open the timeline on Google Photos. Alternatively try adding photos to an album. 


Actual results:

Noticed that WebRender is causing a constant GPU load of 50-60% on my Intel HD 630, even though the page seems static. According to the profiler the page is being 
refreshed at 60 fps. 

Sometimes this persists when you switch to another tab, so Google Photos isn't visible anymore. 
If you scroll further down in the timeline this stops. 
If you close the tab and reopen it, the load is mostly gone, too. 


Expected results:

With D3D11 renderer the page isn't being refreshed, but is indeed static. Same in Google Chrome.
(Reporter)

Updated

5 months ago
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64
I can reproduce this. It seems like there are some spinner animations that are constantly running, it looks like these are culled because of having an opacity of 0 in non-WebRender Gecko.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Markus, do you have an idea what might be doing this culling in non-WebRender?
Flags: needinfo?(mstange)
I think FLB might not be descending into zero-opacity nsDisplayOpacity items. But I can't find the check that would be responsible for such behavior.
(And if FLB deosn't detect any changes, then we don't composite or maybe we don't even send a transaction.)
Flags: needinfo?(mstange)
Matt do you know?
Flags: needinfo?(matt.woodrow)
(Assignee)

Comment 5

5 months ago
(In reply to Markus Stange [:mstange] from comment #3)
> I think FLB might not be descending into zero-opacity nsDisplayOpacity
> items. But I can't find the check that would be responsible for such
> behavior.
> (And if FLB deosn't detect any changes, then we don't composite or maybe we
> don't even send a transaction.)

So, the expectation is that nsDisplayOpacity::mForEventsAndPluginsOnly is true for zero-opacity items that got created to ensure the relevant hit testing items are in the display list.

FrameLayerBuilder then skips processing for content items within a zero-opacity container, and we indeed detect at the FLB level that no changes have happened, and we skip sending a transaction entirely.

Interestingly, it looks like bug 1436409 removed the code that would set mForEventsAndPluginsOnly to true, so we instead now just skip the nsDisplayOpacity entirely (seems like a bug where we wouldn't respect event settings within a zero-opacity container, any ideas kats?).

Since there's no zero-opacity item at all, FLB still detects no changes, and never forwards to the compositor. WebRender doesn't have that infrastructure, so has no idea that the new display list is identical to the old and gets a transaction + composite.

There's an opportunity for RDL to detect this and stop it earlier in the pipeline, but it fails because of this code: https://searchfox.org/mozilla-central/rev/fcfb479e6ff63aea017d063faa17877ff750b4e5/layout/painting/RetainedDisplayListBuilder.cpp#1372

I don't remember why we needed that, but it would be nice if we could not, since we should be able to detect the no-op display list change much more often. You don't happen to remember do you Miko (bug 1429932 wasn't much help)?
Flags: needinfo?(mikokm)
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(kats)
(In reply to Matt Woodrow (:mattwoodrow) from comment #5)
> Interestingly, it looks like bug 1436409 removed the code that would set
> mForEventsAndPluginsOnly to true, so we instead now just skip the
> nsDisplayOpacity entirely (seems like a bug where we wouldn't respect event
> settings within a zero-opacity container, any ideas kats?).

Indeed, you are correct. Although by the time bug 1436409 landed it was already busted. I've filed bug 1501382 for this issue with a testcase and regression range.
Flags: needinfo?(kats)
(In reply to Matt Woodrow (:mattwoodrow) from comment #5)
> There's an opportunity for RDL to detect this and stop it earlier in the
> pipeline, but it fails because of this code:
> https://searchfox.org/mozilla-central/rev/
> fcfb479e6ff63aea017d063faa17877ff750b4e5/layout/painting/
> RetainedDisplayListBuilder.cpp#1372
> 
> I don't remember why we needed that, but it would be nice if we could not,
> since we should be able to detect the no-op display list change much more
> often. You don't happen to remember do you Miko (bug 1429932 wasn't much
> help)?

I guess the assumption here has been that if we end up with a top-level dirty rect or have frames with SC dirty rect set, then something in the display list has changed, and it is no longer identical.
This was before the change to prefer the old items, which could improve this situation, if we can reliably detect that the resulting list is the same.
Flags: needinfo?(mikokm)
Blocks: 1386669
Priority: -- → P3
Summary: Constant GPU load on Google Photos → Constant GPU load on Google Photos (opacity 0 spinners)
(Assignee)

Updated

3 months ago
Assignee: nobody → matt.woodrow
(Assignee)

Updated

3 months ago
Depends on: 1501382

Comment 10

3 months ago
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/551e3cbe82ba
Don't record a display list mutation based on the partial build rect, rely on comparisons during merging. r=mstange
https://hg.mozilla.org/integration/autoland/rev/b232989d707c
Cull items within opacity:0 containers when merging with retained display lists. r=mstange

Comment 11

3 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/551e3cbe82ba
https://hg.mozilla.org/mozilla-central/rev/b232989d707c
Status: NEW → RESOLVED
Last Resolved: 3 months ago
status-firefox66: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Code landed from this bug is showing up at the top of the crash stacks for bug 1514528 and it landed right around the time the crashes started. Should this be backed out?
Blocks: 1514528
Flags: needinfo?(matt.woodrow)
No longer blocks: 1514528
Depends on: 1514528
Backed out for causing topcrash bug 1514528.
https://hg.mozilla.org/mozilla-central/rev/36a2f27ecc47
Status: RESOLVED → REOPENED
status-firefox66: fixed → ---
Flags: needinfo?(matt.woodrow)
Resolution: FIXED → ---
Target Milestone: mozilla66 → ---
(Assignee)

Comment 16

3 months ago
Fixed the bug, and included the crashtest from bug 1514544. Looks like I confused phabricator in the process though. :(
Attachment #9030869 - Attachment description: Bug 1500864 - Don't record a display list mutation based on the partial build rect, rely on comparisons during merging. r?miko → Bug 1500864 - Don't record a display list mutation based on the partial build rect, rely on comparisons during merging. r?mstange
Attachment #9030870 - Attachment description: Bug 1500864 - Cull items within opacity:0 containers when merging with retained display lists. r?miko → Bug 1500864 - Cull items within opacity:0 containers when merging with retained display lists. r?mstange
Attachment #9031773 - Attachment is obsolete: true
Attachment #9031774 - Attachment is obsolete: true

Comment 17

3 months ago
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/65697c3c1afe
Don't record a display list mutation based on the partial build rect, rely on comparisons during merging. r=mstange
https://hg.mozilla.org/integration/autoland/rev/0f2d638c5c8f
Cull items within opacity:0 containers when merging with retained display lists. r=mstange

Comment 18

3 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/65697c3c1afe
https://hg.mozilla.org/mozilla-central/rev/0f2d638c5c8f
Status: REOPENED → RESOLVED
Last Resolved: 3 months ago3 months ago
status-firefox66: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla66

Hello,

I have tried reproducing the issue (by setting gfx.webrender.all to true on the reported nightly build from 2018-10-22), but was unsuccessful in doing so. I have tried this on Ubuntu 16.04 as well as Windows 10. Tested using an account that had no data in google photos as well as a personal account containing a lot of photos, but the process would not remain stuck at 50-60%. At best, it reached 54% but then it went down.

Is there any particularity for the machine this occurred on?
If not,@Tobias we might need help in verifying the fix.

Flags: needinfo?(Tobias.Marty)
(Reporter)

Comment 20

11 days ago

I can't reproduce this anymore. Looks fixed to me.

Flags: needinfo?(Tobias.Marty)
Keywords: regression
Depends on: 1536151
You need to log in before you can comment on or make changes to this bug.