Closed Bug 1432858 Opened 6 years ago Closed 3 years ago

CPU Spikes

Categories
Component: Core :: General
Version: 60 Branch
Type: defect
Priority: Not set
Severity: critical

Tracking
Status: RESOLVED INCOMPLETE
Performance Impact: none
Tracking Status: firefox60 --- affected

People
(Reporter: skoch13, Assignee: Unassigned, NeedInfo)

Details
Keywords: hang

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0
Build ID: 20180124100321

Steps to reproduce:

Just using the browser: opening/switching tabs, etc. Nothing special.

Specs (it's actually an MSI GP72VR 7RFX Leopard Pro with an NVMe SSD): Win 10 1709 with the latest updates, GeForce GTX 1060, i7-7700HQ, 16 GB RAM, Samsung 960 EVO 256 GB SSD

Profile: https://perfht.ml/2BqBIpE
Video: https://youtu.be/nEVeifA292Q 


Actual results:

CPU spikes, laptop freezes.


Expected results:

At work, the browser works flawlessly even though the work PC is weaker than this gaming monster.
Severity: normal → critical
Keywords: hang
Skoch, could you tell us whether this behavior is a regression for your configuration? In other words, how did Firefox 57 / Firefox 58 / Firefox 59 fare on your configuration?
Meanwhile, moving this issue to Core :: General.
Component: Untriaged → General
Product: Firefox → Core
(In reply to Adrian Florinescu [:AdrianSV] from comment #1)
> Skoch, could you tell us whether this behavior is a regression for your
> configuration? In other words, how did Firefox 57 / Firefox 58 / Firefox 59
> fare on your configuration?
> Meanwhile, moving this issue to Core :: General.

Hello Adrian,

Thank you for your attention! I can confirm that on 57/58 the behavior is the same: every interaction with the browser leads to CPU spikes and laptop micro-freezes, while 59 (as you saw in the video) just uses 100% of the CPU.

In addition, on Linux on the same laptop Firefox works just as expected.

Thank you
Mike, Florian - can you take a look at the profile and decide who to NI on this profile?
Flags: needinfo?(mconley)
Flags: needinfo?(florian)
There are at least 2 content processes where activity stream is taking a LOT of CPU: https://perfht.ml/2nvO7Ui
Flags: needinfo?(florian) → needinfo?(edilee)
On the main thread, we have a lot of PAPZCTreeManager::Msg_ReceiveMouseInputEvent sync IPC, and it's the main cause of jank, blocking the main thread for more than 3.3s: https://perfht.ml/2nzxiIb

Ehsan, I see you mentioned this sync IPC several times on your Quantum Flow Engineering Newsletters, is there a bug on file tracking it?
Flags: needinfo?(ehsan)
k88hudson, anything in particular jump out from the profile? https://perfht.ml/2nvO7Ui

Here's the bundle to match up line numbers:
https://hg.mozilla.org/mozilla-central/file/549d78378587/browser/extensions/activity-stream/data/content/activity-stream.bundle.js

I do see 51% (1734ms) running time, with 1039ms self time, in resource://activity-stream/vendor/react-dom.js

Within that, the largest Activity Stream bundle item is bundle.js:3975, taking 9% (321ms) in Card.componentDidMount, which does the image-loaded detection to fade in the image. I'm guessing that setting Image().src is getting its time attributed to the promise here:
https://hg.mozilla.org/mozilla-central/file/549d78378587/browser/extensions/activity-stream/data/content/activity-stream.bundle.js#l3917

Stepping back from this 9%, though, the react-dom self time is approximately 31% of the overall time. Any ideas about that?
Flags: needinfo?(edilee) → needinfo?(khudson)
We also have permitUnload calls blocking the main thread (because content processes are busy and don't reply immediately) for about 300ms: https://perfht.ml/2nxDqAs

It seems to be the permitUnload call at https://searchfox.org/mozilla-central/rev/eeb7190f9ad6f1a846cd6df09986325b3f2c3117/browser/base/content/browser.js#1068
I don't think activity stream registers beforeunload listeners, so this is probably a waste.

Mike, I wonder if we should move https://searchfox.org/mozilla-central/rev/eeb7190f9ad6f1a846cd6df09986325b3f2c3117/browser/base/content/tabbrowser.xml#3172-3182 to the browser binding to avoid more permitUnload calls. Or maybe make the permitUnload method implement this check and return early if there's no beforeunload listener.
jrmuizel and I looked at this last Friday during The Joy of Profiling: Episode 19[1].

Findings:

* Shipping web components might help with the YouTube script hangs, since we're doing Polymer stuff in there.
* Hit-testing for APZ is blocked by the compositor which is doing a lot of work sometimes and can’t respond right away.
* Not graphics bound, and the beefy graphics card is not being used (it's using the integrated one).

Ultimately, we need better tools for users to identify what’s going on here. We've requested some things[2] from the perf.html team, and will revisit this profile once those issues are resolved.

[1]: https://air.mozilla.org/the-joy-of-profiling-episode-19/
[2]: https://github.com/devtools-html/perf.html/issues/758, https://github.com/devtools-html/perf.html/issues/759, https://github.com/devtools-html/perf.html/issues/760
Flags: needinfo?(mconley)
(In reply to Florian Quèze [:florian] from comment #5)
> On the main thread, we have a lot of
> PAPZCTreeManager::Msg_ReceiveMouseInputEvent sync IPC, and it's the main
> cause of jank, blocking the main thread for more than 3.3s:
> https://perfht.ml/2nzxiIb
> 
> Ehsan, I see you mentioned this sync IPC several times on your Quantum Flow
> Engineering Newsletters, is there a bug on file tracking it?

I've discussed this problem a few times with kats and we talked about it again yesterday, and this time we got to a possible idea on how to solve this issue that he's going to try out in bug 1441324!
Depends on: 1441324
Flags: needinfo?(ehsan)
Now that bug 1441324 is fixed, the PAPZCTreeManager stuff should be less of an issue; it might be worth getting a new profile on Nightly to see what the current state is.
Nothing jumps out at me from that profile, sorry :(
Flags: needinfo?(khudson)
Reporter - can you reprofile on a latest Nightly based on comment 10 please?
Flags: needinfo?(skoch13)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #12)
> Reporter - can you reprofile on a latest Nightly based on comment 10 please?

Hello, will do.
Thanks
Flags: needinfo?(skoch13)
Hello everyone,

Here is the new profile on the latest Nightly: https://perfht.ml/2GH4eKl
Thank you! Can you confirm that while performing the steps to collect the new profile you were experiencing the same problems you described in your original comment, "CPU spikes, laptop freezes."?
Flags: needinfo?(skoch13)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #15)
> Thank you! Can you confirm that while performing the steps to collect the
> new profile you were experiencing the same problems you described in your
> original comment, "CPU spikes, laptop freezes."?

Hey,

Unfortunately, yes :(
Flags: needinfo?(skoch13)
Whiteboard: [qf]
Thank you!

Back to you Florian and Mike with a new profile.
Flags: needinfo?(mconley)
Flags: needinfo?(florian)
I'm seeing a bunch of activity from the Grammarly WebExtension in that profile in both the content processes and (I believe) the parent process. Does disabling this add-on help?
Flags: needinfo?(mconley) → needinfo?(skoch13)
(In reply to skoch13 from comment #14)
> Hello everyone,
> 
> Here is the new profile on the latest Nightly: https://perfht.ml/2GH4eKl

In this profile, a lot of the main-process main-thread samples are still in APZ code: https://perfht.ml/2uOiuw8

kats, does it look to you like the fixes from bug 1441324 are having their intended effect?
Flags: needinfo?(florian) → needinfo?(bugmail)
(In reply to Florian Quèze [:florian] from comment #19)
> kats, does it look to you like the fixes from bug 1441324 are having their
> intended effect?

Yeah, it looks like it's working. The APZ stuff on the main thread is no longer blocking on the compositor, it's blocking on the GPU process main thread which is basically idle (other than handling the incoming APZ requests).
Flags: needinfo?(bugmail)
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #18)
> I'm seeing a bunch of activity from the Grammarly WebExtension in that
> profile in both the content processes and (I believe) the parent process.
> Does disabling this add-on help?

Hello,

I can try removing the extension; however, I can confirm that all other installations work fine with that extension. Should I retry the test?
Flags: needinfo?(skoch13)
(In reply to skoch13 from comment #21)
> Hello,
> 
> I can try removing the extension; however, I can confirm that all other
> installations work fine with that extension. Should I retry the test?

Yes, please.
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #22)
> (In reply to skoch13 from comment #21)
> > Hello,
> > 
> > I can try removing the extension; however, I can confirm that all other
> > installations work fine with that extension. Should I retry the test?
> 
> Yes, please.

Hello,

Here is the new profile without Grammarly: https://perfht.ml/2GJrFQ5
Flags: needinfo?(mconley)
Hey skoch13,

Unfortunately, I'm not able to derive much actionable data from this profile. I can see you browsing to a bunch of different sites (Twitter and YouTube being two of them), and there are periodic transform animations happening on the compositor. There was a big 5.5-second pause just before you captured the profile, during which it looks like almost everything locked up. Unfortunately, I can't tell what that was.

I'm sorry I can't be much more help here. :/ I've queued your profile for additional analysis.
Flags: needinfo?(mconley)
Okay, at least I tried ¯\_(ツ)_/¯

Thank you all!
Whiteboard: [qf] → [qf-]
Just in case: even running Linux inside a virtual machine works pretty well. I suppose MS Windows is the key.

Hi skoch13, does this issue still occur on your end? Are you still seeing these issues with our latest Firefox builds?

Flags: needinfo?(skoch13)

Hi, I'll close this ticket as RESOLVED INCOMPLETE due to a lack of information. Please feel free to reopen it if you consider it necessary.

Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Performance Impact: --- → -
Whiteboard: [qf-]