tab switching to gmail with theme is very slow on mac

Status: NEW
Assignee: Unassigned
Priority: P3
Severity: normal
Opened: a year ago
Updated: 9 months ago

People: (Reporter: bkelly, Unassigned, NeedInfo)
Tracking: (Depends on: 1 bug, {perf})
Firefox Tracking Flags: (Not tracked)
Details: (Whiteboard: [qf:p3] [e10s-multi:-] [gfx-noted])

I'm running 55.0a1 (2017-05-17) (64-bit) on a MacBook Pro (15-inch, 2016).  I'm seeing pretty bad tab switch times when going to gmail.  See this profile:

https://perfht.ml/2quKq4w

This is for switching from example.com in one tab to my mozilla mail in another tab.  It shows:

* 586ms GC major... not sure if this is contiguous or many slices
* 18ms to build layers
* 131ms rasterize
* 91ms layer transaction
* 21ms composite and many other large composites
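
To put those paint-related numbers against a frame budget, here is a rough sketch. It assumes a 60 Hz display (about 16.7 ms per frame), which is not stated in this report, and it looks at each phase on its own because the phases may run on different threads. The GC entry is left out since it is unclear whether it is contiguous. The names `phases_ms`, `FRAME_BUDGET_MS`, and `frames_consumed` are made up for illustration.

```python
# Phase durations (ms) copied from the list above.
phases_ms = {
    "build layers": 18,
    "rasterize": 131,
    "layer transaction": 91,
    "composite": 21,
}

# Assumption: 60 Hz refresh, so one frame lasts 1000 / 60 ms.
FRAME_BUDGET_MS = 1000 / 60


def frames_consumed(duration_ms, budget_ms=FRAME_BUDGET_MS):
    """Return how many frame budgets a phase of the given length spans."""
    return duration_ms / budget_ms


overruns = {name: frames_consumed(ms) for name, ms in phases_ms.items()}

for name, frames in overruns.items():
    print(f"{name}: {phases_ms[name]} ms is about {frames:.1f} frames")
```

Under that assumption every listed phase exceeds a single frame on its own, with rasterization the largest.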

This is with the "dark" gmail theme applied.  If I remove the theme it seems somewhat better:

https://perfht.ml/2quHaGf

I realize this may be an issue with the theme, but from watching some of my family members, themes seem to be popular.

Filing this in graphics because of all the missed frame budgets there.  It's unclear to me if the GC major is a problem or not.
Jeff Muizelaar [:jrmuizel]

Comment 1
The large rasterize time is caused by creating a whole bunch of new tiles. I'm guessing this might have become worse with multi-e10s because we either need to keep the tile pool per process live and take the memory hit or recreate it when you switch from one content process to the other.
Olli Pettay [:smaug]

Comment 2
That long GC is from another child process, no?
(Also, that GC has several slices, and is just waiting for input between them)
Looks like the UI for showing GCs is quite misleading. It looks as if there was something processed all the time, yet when hovering one can see the slices.
(Reporter)

Comment 3

a year ago
(In reply to Olli Pettay [:smaug] from comment #2)
> That long GC is from another child process, no?
> (Also, that GC has several slices, and is just waiting for input between them)
> Looks like the UI for showing GCs is quite misleading. It looks as if there
> was something processed all the time, yet when hovering one can see the
> slices.

OK, let's ignore the GC parts of the profile, then.  That was my confusion with the profiler.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #1)
> The large rasterize time is caused by creating a whole bunch of new tiles.
> I'm guessing this might have become worse with multi-e10s because we either
> need to keep the tile pool per process live and take the memory hit or
> recreate it when you switch from one content process to the other.

FWIW, I saw this repeatedly when switching back and forth between the same two tabs.  So this wasn't something like a process I had not viewed in a long time.

I'll add some multi-e10s folks to the CC list since there is a suspicion this is worse with multi-e10s.  I'll try to retest with fewer content processes as well.
(Reporter)

Comment 4

a year ago
Note, since I'm only seeing this on mac it may not be showing up in our multi-e10s experiment data.  My understanding is we had a low number of samples from the experiment on mac.
(Reporter)

Comment 5

a year ago
I retested:

1 content process: https://perfht.ml/2ruVNqo
2 content processes: https://perfht.ml/2ruKaj3

Not a huge difference, although the single-e10s case seems about 20ms faster.  I closed all my other tabs this time, so I really only had 2 processes.

Also I noticed that this pause mainly triggers when I have the mail list open.  If I have the message body opened, then tab switching seems fast.

Are we maybe creating too many layers for the message list?
Jeff Muizelaar [:jrmuizel]

Comment 6
So even in the 1 content process case we're still allocating a bunch of tiles so maybe we're just not keeping enough around. I'll see if I can dig up something that will give us more information on the size of the tile pool and the size requested.
Flags: needinfo?(jmuizelaar)
Whiteboard: [qf] → [qf] [e10s-multi:?]
Comment 7
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #5)
> I retested:
> 
> 1 content process: https://perfht.ml/2ruVNqo
> 2 content processes: https://perfht.ml/2ruKaj3

For the high level overview the difference is:
nsDisplayList::PaintRoot regressed from 129ms to 138ms

Mainly from ClientTiledPaintedLayer::InvalidateRegion which regressed from 9ms to 16ms
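
As a quick check on how much of the overall change the invalidation step accounts for, here is a small sketch using only the four numbers quoted above. The helper name `regression` is made up for illustration, and the arithmetic assumes the two measurements are directly comparable, which a single pair of profiles cannot guarantee.

```python
def regression(before_ms, after_ms):
    """Return (absolute delta in ms, relative change in percent)."""
    delta = after_ms - before_ms
    return delta, 100.0 * delta / before_ms


# Numbers from the 1-process vs 2-process profiles above.
paint_root = regression(129, 138)   # nsDisplayList::PaintRoot
invalidate = regression(9, 16)      # ClientTiledPaintedLayer::InvalidateRegion

# Share of the PaintRoot delta explained by InvalidateRegion.
share = 100.0 * invalidate[0] / paint_root[0]

print(f"PaintRoot: +{paint_root[0]} ms ({paint_root[1]:.1f}%)")
print(f"InvalidateRegion: +{invalidate[0]} ms ({invalidate[1]:.1f}%)")
print(f"InvalidateRegion share of the delta: {share:.0f}%")
```

On these figures InvalidateRegion contributes 7 of the 9 ms, so the total moves by only a few percent while that one function nearly doubles.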

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)
> So even in the 1 content process case we're still allocating a bunch of
> tiles so maybe we're just not keeping enough around. I'll see if I can dig
> up something that will give us more information on the size of the tile pool
> and the size requested.

Yeah, this is what it looks like: the regression is mostly coming from _moz_pixman_region32_init_rects (from ClientTiledPaintedLayer::InvalidateRegion).

Since we will have to decide if we should block the release or not on this, could you give us an ETA on this work? Also, do you think it will be an upliftable change?
Priority: -- → P3
Whiteboard: [qf] [e10s-multi:?] → [qf] [e10s-multi:?] [gfx-noted]
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(jmuizelaar)
Whiteboard: [qf] [e10s-multi:?] [gfx-noted] → [qf] [e10s-multi:-] [gfx-noted]
Whiteboard: [qf] [e10s-multi:-] [gfx-noted] → [qf:p3] [e10s-multi:-] [gfx-noted]
(Reporter)

Comment 8

a year ago
I guess I'm surprised tab switch on mac wouldn't be considered a higher qf priority.  I know it's not where our main user population is, but a lot of web developers use it.

Anyway, here's a profile of switching between two bugzilla tabs:

https://perfht.ml/2sbwlqj

Rasterization and LayerTransaction still completely blow our frame budget, even though it's not as bad as the gmail case.

Jeff, are there some prefs or constants I can play with to see if it helps on my machine?
Markus Stange [:mstange]

Comment 9
Increasing layers.tile-initial-pool-size or layers.tile-pool-unused-size might help.
Markus Stange [:mstange]

Comment 10
Flipping layers.componentalpha.enabled to false should also help.

And bug 1265824 should eliminate the rest of the slowness.
(Reporter)

Comment 11

a year ago
(In reply to Markus Stange [:mstange] from comment #9)
> Increasing layers.tile-initial-pool-size or layers.tile-pool-unused-size
> might help.

These did not help.  Here are some profiles using different values:

layers.tile-initial-pool-size=50
https://perfht.ml/2s6CFlU

layers.tile-initial-pool-size=75
https://perfht.ml/2s6VLIA

layers.tile-initial-pool-size=100
https://perfht.ml/2s6y2se

layers.tile-initial-pool-size=100
layers.tile-pool-unused-size=50
https://perfht.ml/2s6RqFh
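
For anyone wanting to repeat these runs, here is a minimal sketch that renders the four combinations above as `user_pref(...)` lines for a profile's user.js file. Applying them through user.js is an assumption for illustration (the values above may just as well have been set in about:config), and the helper `render_user_pref` and the `tested` list are made-up names.

```python
def render_user_pref(name, value):
    """Format one pref as a user.js line."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        literal = "true" if value else "false"
    elif isinstance(value, int):
        literal = str(value)
    else:
        # Quote strings, escaping backslashes and double quotes.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        literal = f'"{escaped}"'
    return f'user_pref("{name}", {literal});'


# The four combinations profiled above, in order.
tested = [
    {"layers.tile-initial-pool-size": 50},
    {"layers.tile-initial-pool-size": 75},
    {"layers.tile-initial-pool-size": 100},
    {"layers.tile-initial-pool-size": 100, "layers.tile-pool-unused-size": 50},
]

lines = [[render_user_pref(n, v) for n, v in combo.items()] for combo in tested]

for i, combo_lines in enumerate(lines, start=1):
    print(f"// combination {i}")
    print("\n".join(combo_lines))
```

The same helper handles boolean prefs, so `render_user_pref("layers.componentalpha.enabled", False)` produces a line with the literal `false`.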
(Reporter)

Comment 12

a year ago
(In reply to Markus Stange [:mstange] from comment #10)
> Flipping layers.componentalpha.enabled to false should also help.

This helped a lot!

https://perfht.ml/2s6zVVJ
https://perfht.ml/2s6JZ15

Is this something we would consider flipping to false by default?  What is the long term fix here?
Flags: needinfo?(mstange)
(Reporter)

Comment 13

a year ago
The profiles in comment 12 had the layers.tile.* prefs set to 100/50.  Here is one with those reset:

https://perfht.ml/2s6zjiQ

An improvement, but maybe not quite as large.
Comment 14
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #12)
> Is this something we would consider flipping to false by default?  What is
> the long term fix here?

The medium term fix is to disable component alpha on HiDPI displays (originally bug 941095, has been discussed in bug 1366618 again). The long term fix is webrender, which will not need component alpha layers in order to support subpixel text anti-aliasing.
Flags: needinfo?(mstange)
(Reporter)

Updated

a year ago
Depends on: 1366618
(Reporter)

Comment 15

a year ago
OK, thanks.  I had to set layers.componentalpha.enabled to false on my 2016 MacBook Pro.  It felt very sluggish without the pref change and I noticed it on a daily basis.
(Reporter)

Comment 16

a year ago
This is the issue we spoke about this morning where Firefox Nightly feels sluggish on my 2016 MacBook Pro.
Flags: needinfo?(milan)
Comment 17
Let's see where the conversation goes in bug 1366618.
Flags: needinfo?(milan)