Closed Bug 1354080 Opened 4 years ago Closed 4 years ago

Excessive CPU usage while the connecting/loading spinners are showing

Categories

(Core :: DOM: Content Processes, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: jfkthame, Assigned: mconley)

References

Details

(Whiteboard: [qf:p1])

Attachments

(1 file)

From bug 759252:
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #104)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #103)
> > Do we care about the power usage regression that this will cause?
> 

> If you're asking about only when it is visible, it is a more nuanced
> question since it won't be visible for the whole time that Firefox is
> running. At this point, is it large enough of an issue to prioritize it?

I've been noticing this repeatedly in Nightly during the last few days: when the connecting/loading throbber is running, Nightly's CPU usage goes through the roof -- in Activity Monitor on macOS, the %CPU utilization of the main Nightly process goes from a "resting" value of < 10% to somewhere in the 150-200% range. (Note that this is separate from the CPU usage of the content process that's actually loading the page; AFAICS, this is basically just due to the chrome process animating the throbber.)

Moreover, if the throbber remains active for an extended period (e.g. several seconds trying to initiate a connection on slow/flaky wifi, or just a very slow-loading page) the system's fans start to run pretty high.

IMO, this isn't a good user experience; we need to either figure out how to reduce the CPU (and hence power) usage of these throbbers, or revert the theme change that introduced them.
Here's a relevant profile: https://perfht.ml/2nHtjai.
An approach very similar to this, using a sliding SVG filmstrip to generate animations, is planned for Photon work. It's important that this gets attention ASAP so that we can shift our strategy for implementing the animations if this route will not be sustainable. Marking [qf] in the whiteboard to get this on to the Quantum Flow teams triage.
Whiteboard: [qf]
Looked at the profile today with jrmuizel. We've got two questions, which might be the subject of two separate bugs here:

1) Why are we calling the refresh driver and TabParent::DidRefresh so often?

2) Is it necessary for each refresh driver tick to be so expensive inside TabParent::GetChildProcessOffset?

Hey mattwoodrow - any ideas for (1)? Is it expected that spinning the tab throbbers at 60fps on the compositor will cause such frequent ticks of the refresh driver?

For kats - any ideas for (2)? hg blame suggests that TabParent::DidRefresh and friends are things you wrote. Is there any low-hanging fruit in there for making DidRefresh cheaper?
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(bugmail)
And to echo what jaws said in comment 2, a sizeable chunk of the Photon project aims to move as much of our animation off of the main thread and onto the compositor as possible. We should figure out what's going on here as soon as possible.
Blocks: 1352085
(In reply to Mike Conley (:mconley) from comment #3)
> 2) Is it necessary for each refresh driver tick to be so expensive inside
> TabParent::GetChildProcessOffset?
> 
> For kats - any ideas for (2)? hg blame suggests that TabParent::DidRefresh
> and friends are things you wrote. Is there any low-hanging fruit in there
> for making DidRefresh cheaper?

So, I did add the TabParent::DidRefresh code, back in bug 1153023. The reason it was needed was because in Gaia there was code that would set CSS transforms on the child browser area, and we needed to propagate that to the child process via the UpdateDimensions IPC message. However there was no event we could listen for on the parent side to find out when the CSS transform changed, so we just did on every refresh. Calling GetChildProcessOffset wasn't supposed to be that expensive.

That being said, I see two options on how to fix this:
(1) remove the DidRefresh entirely. If firefox never sets CSS transforms on the content browser windows (I suspect we only did this on B2G, and don't do it on desktop/mobile) then this code will never actually get exercised and it can be removed. I don't know if there are add-ons that might do this though.
(2) Try to optimize the GetChildProcessOffset code somehow. It seems like most of the time is being spent in a QI call which I find kind of odd. I'm not sure which QI call it is but I feel like pretty much none of the code in GetChildProcessOffset should require a nontrivial QI.
Flags: needinfo?(bugmail)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> (1) remove the DidRefresh entirely. If firefox never sets CSS transforms on
> the content browser windows (I suspect we only did this on B2G, and don't do
> it on desktop/mobile) then this code will never actually get exercised and
> it can be removed. I don't know if there are add-ons that might do this
> though.

Yes, pretty sure we don't do this, and add-ons doing it would break anyway as we drop support for XUL add-ons.
(In reply to Dão Gottwald [::dao] from comment #6)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> > (1) remove the DidRefresh entirely. If firefox never sets CSS transforms on
> > the content browser windows (I suspect we only did this on B2G, and don't do
> > it on desktop/mobile) then this code will never actually get exercised and
> > it can be removed. I don't know if there are add-ons that might do this
> > though.
> 
> Yes, pretty sure we don't do this, and add-ons doing it would break anyway
> as we drop support for XUL add-ons.

Concur on desktop... does anyone know about Android?
(In reply to :Gijs from comment #7)
> Concur on desktop... does anyone know about Android?

Paging sebastian
Flags: needinfo?(s.kaspari)
Actually we don't even have e10s enabled on Android, so it's a moot point. Feel free to rip it out.
Flags: needinfo?(s.kaspari)
Okay great, we can get rid of DidRefresh, and that'll help here.

But the other question remains: is it necessary for us to be ticking the refresh driver so often in the parent process when doing this kind of animation? Hoping mattwoodrow can weigh in on that one.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> (1) remove the DidRefresh entirely. If firefox never sets CSS transforms on
> the content browser windows (I suspect we only did this on B2G, and don't do
> it on desktop/mobile) then this code will never actually get exercised and
> it can be removed. I don't know if there are add-ons that might do this
> though.

I think this is what we should do. I also think that the child process offset should not respect transforms, and pretend the browser element is at the non-transformed position - otherwise we could get into an inconsistent state without this updating mechanism in case somebody does end up transforming a remote browser element. But if we don't respect the transform, then we don't need to update any values when it changes.

Events that are sent to the child process should go through the transforms on the parent process side. And screen-relative coordinates inside the child process should be the ones they'd be if there was no transform.
The refresh driver ticking at 60fps is the expected behaviour (for a foreground tab/the parent process).

We still want to notify the animation system that time is progressing, but we expect it to bail out fairly quickly and not do any real work if all the present animations are running on the compositor.

This looks like it's working correctly as we see some time being spent in 'CanThrottle' underneath the DocumentTimeline::WillRefresh part.
Flags: needinfo?(matt.woodrow)
Assignee: nobody → mconley
Comment on attachment 8855541 [details]
Bug 1354080 - Stop doing needless work in the parent process every time the refresh driver ticks.

https://reviewboard.mozilla.org/r/127374/#review130086

Make sure to do a try push before landing!
Attachment #8855541 - Flags: review?(bugmail) → review+
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4b43dc3ad001
Stop doing needless work in the parent process every time the refresh driver ticks. r=kats
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #14)
> 
> Make sure to do a try push before landing!

Cripes. I did not see that part. :/ Whoops. Well, push looks green so far - hopefully I've dodged a bullet.
https://hg.mozilla.org/mozilla-central/rev/4b43dc3ad001
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Hey jfkthame, are you able to test tomorrow's Nightly (or a local build with the above revision) and see if CPU usage has improved?
Flags: needinfo?(jfkthame)
(In reply to Mike Conley (:mconley) from comment #19)
> Hey jfkthame, are you able to test tomorrow's Nightly (or a local build with
> the above revision) and see if CPU usage has improved?

Yes, it seems considerably improved... looks like we're back to around 130% CPU usage for the chrome process while the spinner is showing. (That still seems like a lot of CPU just to keep a small spinner moving, but at least it's no longer a big regression from where we were previously.)
Flags: needinfo?(jfkthame)
Hey mattwoodrow, would you consider the remaining tp5o % Processor Time regression is likely us waking up the main thread more often when we tick the refresh driver with our 60fps throbber?
Flags: needinfo?(mconley) → needinfo?(matt.woodrow)
The main-thread wakeup rate should be the same. The higher framerate on the compositor will contribute to more processor time for sure though.
Flags: needinfo?(matt.woodrow)
Component: Theme → DOM: Content Processes
Product: Firefox → Core
Target Milestone: Firefox 55 → mozilla55
(In reply to Matt Woodrow (:mattwoodrow) from comment #23)
> The main-thread wakeup rate should be the same. The higher framerate on the
> compositor will contribute to more processor time for sure though.

Is there a bug tracking fixing the remaining regression?
Flags: needinfo?(mconley)
Whiteboard: [qf] → [qf:p1]
See Also: → 1357093
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #24)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #23)
> > The main-thread wakeup rate should be the same. The higher framerate on the
> > compositor will contribute to more processor time for sure though.
> 
> Is there a bug tracking fixing the remaining regression?

I filed bug 1357093 for this, and the other Talos regressions from the new throbber animations.
Flags: needinfo?(mconley)
You need to log in before you can comment on or make changes to this bug.