Closed Bug 1430820 Opened 6 years ago Closed 6 years ago

Very high CPU with Facebook (Profile Included)

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla61
Performance Impact ?
Tracking Status
firefox61 --- fixed

People

(Reporter: mdppatrick, Assigned: jfkthame)

References

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/604.4.7 (KHTML, like Gecko) Version/11.0.2 Safari/604.4.7

Steps to reproduce:

Firefox: Nightly 59.0a1 (2018-01-16) (64-bit)
OS: macOS High Sierra 10.13.2
Processor: 1.1 GHz Intel Core M
Memory: 8 GB 1600 MHz DDR3
Graphics: Intel HD Graphics 5300 1536 MB
Model: MacBook (Retina, 12-inch, Early 2015)
Add-ons: None. *Fresh install* of Firefox Nightly.

Opened Facebook in a tab and tried to use Facebook like a normal person. That's it. Yep. Nothing to it.


Actual results:

YouTube video: https://youtu.be/3tVGarfH7v8
Profile: https://perfht.ml/2DlmvLv

Waited over 10 seconds for Facebook to load completely, and then it hangs a lot when scrolling. CPU usage almost never goes down below 80%. No other tabs are active.


Expected results:

I should have been able to load Facebook like any other browser, but sadly this is not the case. Facebook loads perfectly well on any other browser, so it is not an issue with my computer. My MacBook is also only 2 years old and has never had any technical issues. It should be able to load Facebook just fine.
:kats, :mstange - can you take a look at this profile? Is it similar to bug 1425916?

Based on the video it stays within 80-120% of CPU utilization after loading and even when scrolling is limited.

The only piece of stack that I noticed as weird is:

```
mach_msg_trap
___CFRunLoopServiceMachPort
___CFRunLoopRun
_CFRunLoopRunSpecific
RunCurrentEventLoopInMode
ReceiveNextEventCommon
_BlockUntilNextEventMatchingListInModeWithFilter
_DPSNextEvent
-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]
-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
-[NSApplication run]
nsAppShell::Run()
XRE_RunAppShell()
MessageLoop::Run()
XRE_InitChildProcess(int, char**, XREChildData const*)
XRE_InitChildProcess
main
start
(root)

taking 90%
```
Flags: needinfo?(mstange)
Flags: needinfo?(bugmail)
See Also: → 1425916
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #1)
> The only piece of stack that I noticed as weird is:

This stack is the idle stack, i.e. that thread is just waiting.

Some notes:
- the profile indicates that the compositor thread is spending around 29% of the time stuck in SwapBuffers, at least in the first half of the profile (which is the scrolling mostly occurred, according to the video). This would explain the janky scrolling on the facebook news feed page.
- content process #4 (which seems to be one running facebook) is missing data for the first half of the profile, probably because the buffer was to small to collect data for the entire time. we only have data for the second half, which seems to be mostly the page load on the K/J Pop Lovers Group page (per video).
- the profile data that is available for content process #4 indicates a lot of time spent in content JS:
-- 22% of time is spent inside rAF callbacks
-- 11% of time spent in gecko paint code
-- 20% of time is spent in setTimeout callbacks
- the main process main thread seems to be mostly idle, as you noted

It's not clear to me why the Activity Monitor is reporting such a high CPU usage for the main process; based on the profile the main thread isn't doing very much and the compositor thread would also be blocked a good chunk of the time so it shouldn't be that busy.

One thing that would help when taking these kinds of recordings is to adjust the Activity Monitor update frequency to be higher. By default it updates every 5 seconds which is not very frequent, but if you go to "Update Frequency" under the View menu you can change it to update every second. That will provide a more precise measure.
Flags: needinfo?(bugmail)
Also, just to emphasize how much of the content process time is spent in JS: of the 27217 samples in the profile for that thread, 19974 were inside js::RunScript, which is around 73%.
Status: UNCONFIRMED → NEW
Ever confirmed: true
User Agent: OS & 32/64 bit : MacOS High Sierra 10.13.3 (beta) & 64 bit 

MacBook Pro (13-inch, Mid 2012)
Processor - 2.5 GHz Intel Core i5
Memory - 8 GB 1600 MHz DDR3
Graphics - Intel HD Graphics 4000 1536 MB

I confirm that this bug is reproducible on a system with the above specifications

[bugday-201801]
@Mike, can you please have a look at the provided performance report (comment 0) in order to see if there's any actionable item here?
Flags: needinfo?(mconley)
Here's the relevant section of the profile: https://perfht.ml/2nOlNMU

I'm having trouble deriving much of value out of this, beyond the fact that there's just a ton of JS being run by the Facebook page, and we're not running it quickly enough. sfink, does anything jump out at you?
Flags: needinfo?(mconley) → needinfo?(sphink)
Not sure if it's related, but having https://www.nytimes.com/ as the active tab makes the "Firefox CP Web Content" process consume 50-60% CPU for me on my Mac with macOS 10.13.3 and Firefox 58.0.1. Switching away from the New York Times tab makes the process go down to consume about 1% instead.

Hmm… Just noticed CPU usage when having the New York Times site in front also uses quite a lot of CPU in Safari (30-40%). Seems to be something extraordinary going that site, so not related to Firefox it seems.
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #6)
> Here's the relevant section of the profile: https://perfht.ml/2nOlNMU
> 
> I'm having trouble deriving much of value out of this, beyond the fact that
> there's just a ton of JS being run by the Facebook page, and we're not
> running it quickly enough. sfink, does anything jump out at you?

I see some JS running under SchedulerGroup::Runnable::Run that is reflowing like crazy due to HTMLElementBinding::get_offsetHeight.

But more time is in an extremely deep callstack (680 frames deep is where the time starts dropping below 0.5sec), with a bunch of React/flux stuff:

ba/ug.prototype.render
a/r.prototype.render
a/i.prototype.__emitToSubscription
a/i.prototype.emit
a/i.prototype.__invokeOnDispatch
a/i.prototype.$FluxStore3
a/j.prototype.__invokeCallback
r/this.ufiCentralUpdatesSubscriptions<
a/l.prototype.$Arbiter5
a/i.prototype.__emitToSubscription
a/l.prototype.inform
a/k.prototype.$ServerJS11
a/s.prototype._displayPagelet
a/s.prototype._displayPageletHandler
a/h.prototype.$CallbackDependencyManager6
a/h.prototype.satisfyPersistentDependency
a/l.prototype.inform
a/s.prototype._tryRenderingNextPhase
a/s.prototype._onPhaseDisplayEnd
a/h.prototype.$CallbackDependencyManager6
a/h.prototype.satisfyPersistentDependency
a/s.prototype._displayPagelet/K<
mozilla::dom::EventHandlerNonNull::Call
mozilla::EventDispatcher::DispatchDOMEvent
mozilla::css::SheetLoadData::FireLoadEvent

Many, many intermediate frames elided. Hm, but now I notice that's all from FireLoadEvent; perhaps I shouldn't have picked the first hang.

Ok, the next interval I looked at https://perf-html.io/public/2897502f346102bddf89ebdedbfd0507b3a709f8/calltree/?hiddenThreads=2-3-4&range=50.9055_85.9273~71.1758_75.7883&thread=5&threadOrder=0-2-3-4-5-1&v=2 is another reflow case underneath a/t.prototype.getStoriesInView

For now, I'm seeing more reflow than actual JS. I miss having the categories. Who should I redirect reflow to?
Flags: needinfo?(sphink) → needinfo?(mconley)
Thanks - I agree, I didn't look close enough, and reflow seems to be the more dominant culprit here. dholbert, anything actionable stand out?
Flags: needinfo?(mconley) → needinfo?(dholbert)
From the interval in comment 8, this chunk in particular looks bad: https://perfht.ml/2CcNkS1
It's a 4.8-second refresh driver tick, which is pretty much all categorized as "scripts", but the script time is all spent in restyle/reflow (for a .offsetHeight read).

I see:
 a) a pretty-long restyle (167ms), which is mostly frame construction.
 b) and then a quite-long reflow (2,748ms), which is mostly extremely deeply nested nsBlockFrame reflow.  Seems like a fully-page reflow with everything dirty, or something similarly-painful.
 c) And then: another short restyle (5ms) and a series of shorter (~30ms) script-triggered reflows, which are collectively another ~1 second.

The worst thing here is "b", but I assume it's being triggered by the restyle in "a".  So my main question at this point is: what was the user doing at this point in the profile, and what property/properties were changed to trigger this expensive restyle/reflow?  (If I hover the "style" bar from part "a", I see "First invalidated" stack which seems to point to an image decode finishing. That probably wouldn't directly have been what triggered the full-page dirty reflow, though, I imagine...)


mdppatrick, could you clarify what sorts of things you were doing during the profile?  (particularly at this point towards the end)  e.g. were you just scrolling your timeline and that's it, or were you also clicking into photo albums and doing stuff like that?
Flags: needinfo?(dholbert) → needinfo?(mdppatrick)
I was just scrolling. The only time that I had clicked on something was to go to the Facebook group.
Flags: needinfo?(mdppatrick)
If I'm not mistaken the video part matching the profile is https://youtu.be/3tVGarfH7v8?t=1m07s
Component: Untriaged → General
Product: Firefox → Core
Daniel, based on comment 11 from the reporter, are there any actionable further steps for this issue?
Flags: needinfo?(dholbert)
Probably next step is for a gecko layout engineer to spend a few hours trying to reproduce the hang in comment 10 (by following the reporter's STR & scrolling through a feed, per comment 11), and trying to figure out why the restyle/reflow in comment 10 is so expensive. (Maybe we're marking more frames as dirty than we need to, or something like that.)
Flags: needinfo?(dholbert)
Whiteboard: [qf]
FWIW, it seems to me majority of the time in reflow is in gfxMacFont. From the link in comment 10, the total time spent is 4,204ms, in which 3,025ms is handling gfxMacFont somehow. There might be some issue around font shaping stuff.
Jonathan, do you know if this is something that would also impact us on a windows machine.  Not sure if you are aware of this problem. thanks!
Flags: needinfo?(jfkthame)
This is probably mac-specific, as it seems to be time spent creating Core Text font instances, which is a mac-only backend.

Normally, gfxCoreTextShaper holds onto a CTFontRef to use for its shaping operations, so the cost of creating that font instance should be amortized over all the ShapeText calls on a given font. However, there are some cases where it needs to use a CTFont with a non-standard set of features applied (to disable ligatures when letter-spacing is in effect, or to work around some Apple bugs in certain scripts):
https://searchfox.org/mozilla-central/rev/7e663b9fa578d425684ce2560e5fa2464f504b34/gfx/thebes/gfxCoreTextShaper.cpp#138-163

The expectation was that these cases will be uncommon enough for the perf cost of creating a temporary font instance wouldn't be too bad; but perhaps that's not true. Sure enough: checking a facebook feed, I see "letter-spacing: -0.01em" being applied to the Apple system font (SF Text), which gets shaped with Core Text because it's AAT rather than OpenType.

Ouch. That'll be causing an awful lot of churn here in the Core Text font world.

So I think we should make gfxCoreTextShaper cache the modified-features font instances it uses, rather than treat them as throwaways in ShapeText. I guess that should help a lot here.
Flags: needinfo?(jfkthame)
Comment on attachment 8963574 [details] [diff] [review]
Cache any required feature-modified Core Text font instances in gfxCoreTextShaper, rather than re-creating them for each shaping call

This avoids the creation of a temporary CTFont for each ShapeText call when non-default features (such as disable-ligatures, for letter-spacing) are in effect.
Attachment #8963574 - Flags: review?(jmuizelaar)
Can you confirm that this patch makes a performance difference?
Flags: needinfo?(jfkthame)
Facebook is in the tp6 set. I wonder if we'd see impact there in the tp6 talos test.

Hey rwood, do you know how recent the Facebook snapshot is in the tp6 set?
Flags: needinfo?(rwood)
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #21)
> Facebook is in the tp6 set. I wonder if we'd see impact there in the tp6
> talos test.
> 
> Hey rwood, do you know how recent the Facebook snapshot is in the tp6 set?

Yessir, the current tp6 facebook mitmproxy recording was made approx 11 months ago.
Flags: needinfo?(rwood)
Attachment #8963574 - Flags: review?(jmuizelaar) → review+
(In reply to Jeff Muizelaar [:jrmuizel] from comment #20)
> Can you confirm that this patch makes a performance difference?

Profiling a local build, scrolling down facebook feed as quickly as I can, I got around 19% CPU under gfxCoreTextShaper::ShapeText without the patch; doing the same thing with a patched build, it was down to 0.5%. So that looks like a win.

(I'm not sure it'll show up significantly in talos, as it probably just loads a facebook page but doesn't try to scroll through lots of the timeline so that more and more text is constantly getting shaped...)
Flags: needinfo?(jfkthame)
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/619174f7144f
Cache any required feature-modified Core Text font instances in gfxCoreTextShaper, rather than re-creating them for each shaping call. r=jrmuizel
https://hg.mozilla.org/mozilla-central/rev/619174f7144f
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Assignee: nobody → jfkthame
Flags: needinfo?(mstange)
Performance Impact: --- → ?
Whiteboard: [qf]
You need to log in before you can comment on or make changes to this bug.