Serious regression in startup time with many lots of tabs in session restore

RESOLVED FIXED

Status

()

Firefox
Tabbed Browser
RESOLVED FIXED
2 years ago
2 months ago

People

(Reporter: glandium, Unassigned)

Tracking

({perf})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
(Putting in tabbrowser, but this could be plenty of different things)

I've found that there was a recent perceived regression recently (as in, since a few releases ; I usually run aurora, but stuck at the beginning of the cycle, then switch to the first beta before switching to the next aurora ; I'm currently on 45b2 and will go on 46) from my use of many tabs, but I hadn't quantify it so far. So I created a new profile and filled with 1000+ (1105, exactly) tabs over 3 windows, all of which are about:newtab except 1 on about:startup (I installed the addon), and 1 on about:tabs (I installed the addon). Both of those are the selected tab on one window, so after session restore, I have one window with 600+ tabs and about:startup selected, one window with 400+ tabs and about:newtab selected, and one windows with about:tabs. Only those 3 tabs are loaded.

Then I used mozregression --launch <date> -p /path/to/that/profile, which, as I understand it, creates a fresh clone of the profile for every launch of the browser.

I ran it for 2016-02-02 (47.0a1), and the sessionRestored timestamp was around 40s.
I ran it for 2015-02-02 (38.0a1), and the sessionRestored timestamp was around 30s.

That would put my perceived regression in that range. I haven't gone through a bisection yet, but I went further back in time, and was shocked:

2014-02-02 (29.0a1): 20s
2013-02-02 (21.0a1): ... holy shit batman, less than five freaking seconds.

So we went, in 3 years, from more or less bearable to "I don't want to have to restart this browser". Sadly, since we can't unload tabs, after a while, a session with 1000+ tabs and ~50 loaded ends up sucking CPU and being slow overall.

Some more data points:

[2013-02-02 (21.0a1): <5s]
2013-06-02 (24.0a1): 7s
2013-07-02 (25.0a1): 8s
2013-08-02 (25.0a1): 11s
2013-09-02 (26.0a1): 7s (woohoo!)
2013-10-02 (27.0a1): 7s
2013-11-02 (28.0a1): 8s
2013-12-02 (28.0a1): 9s first time the tabs look like australis.
2014-01-02 (29.0a1): 9s
[2014-02-02 (29.0a1): 20s]
2014-03-02 (30.0a1): 21s
2014-04-02 (31.0a1): 24s
2014-05-02 (32.0a1): 24s
2014-06-02 (32.0a1): 23s
2014-07-02 (33.0a1): 23s
2014-08-02 (34.0a1): 22s
2014-09-02 (35.0a1): 24s
2014-10-02 (35.0a1): 27s
2014-11-02 (36.0a1): 28s
2014-12-02 (37.0a1): 31s
2015-01-02 (37.0a1): 33s
[2015-02-02 (38.0a1): 30s]
2015-03-02 (39.0a1): 30s
2015-04-02 (40.0a1): 30s
2015-05-02 (40.0a1): 85s (hey, what am I complaining about, it has been worse than it currently is!)
2015-06-02 (41.0a1): 39s
2015-07-02 (42.0a1): 40s
2015-08-02 (42.0a1): 43s
2015-09-02 (43.0a1): 40s
2015-10-02 (44.0a1): 45s
2015-11-02 (45.0a1): 39s
2015-12-02 (45.0a1): 46s
2016-01-02 (46.0a1): 47s
[2016-02-02 (47.0a1): 40s]

Note those numbers don't tell the whole story either, because even after the sessionRestored event, you can still have to wait before the UI is actually responsive, which /feels/ proportional to the time spent restoring the session (i.e. the longer the session restore, the longer it takes after it for the UI to be responsive). The time it takes to start displaying the various windows varies, too (as well as how they are painted).

(And I'm not even talking about the time it takes to close that session)

All timings from Linux64 builds.

Note the numbers starting from 2015-05-02 up to 2015-11-02 are funky, the UI was unresponsive like crazy looooooong after that, and it looks like the content process crashed in a lot of those startups, I didn't look why.

When I say the numbers don't tell the whole story, here is what I mean, if I take the last built I tested, which is 2016-02-02, the timeline of it starting up looks like this, approximately:
- 5s: First window (600+ tabs) appears
- 11s: Throbber appears in one tab and the arrow to scroll the tab bar is lit
- 29s: Second window (1 tab) appears
- 35s: Third window (400+ tabs) appears
- 40s: Throbber appears in one tab and the arrow to scroll the tab bar is lit
- 57s: Content appears in first window
- 58s: Content appears in second window
- 59s: Content appears in third window
- 59s: UI is responsive.

Between each of these steps everything is frozen. Throbbers don't move.
The sessionRestore timer in about:startup seems to correspond to the throbber appearing in the third window. As you can see, that's not when things are good for the user.
Keywords: perf
(Reporter)

Comment 1

2 years ago
Created attachment 8715860 [details]
Flame graph

Here is a flame graph I got with perf ; sadly, this includes both startup and shutdown.
(Reporter)

Comment 2

2 years ago
http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html here's a description of what the flame graph represents.
Would it be possible to get a Gecko Performance profile from your case as well?

Cc'ing jesup who was mentioning slow session restore earlier in #developers.
Flags: needinfo?(mh+mozilla)
(Reporter)

Comment 4

2 years ago
Probably better, a profile with the Gecko profiler: https://cleopatra.io/#report=4535fdc5b3a7695decb7947af664c8156335c7b1
(Reporter)

Updated

2 years ago
Flags: needinfo?(mh+mozilla)
(Reporter)

Comment 5

2 years ago
Actually not better, it seems like it misses most of the startup...
I'm seeing a number of samples inside what appears to be SDK code. Can we see what add-ons you have enabled?
(Reporter)

Comment 7

2 years ago
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #6)
> I'm seeing a number of samples inside what appears to be SDK code. Can we
> see what add-ons you have enabled?

About startup, and Tab stats, as stated in comment 0, none of which use the SDK, and, well the gecko profiler, which does.
(Reporter)

Comment 8

2 years ago
Okay, this one is more complete, time-wise: https://cleopatra.io/#report=741550c32537364521c6243de6ce12b92ba6021b

Let me see if I can manage to get one with 1ms intervals with the same timespan.
(Reporter)

Comment 9

2 years ago
(In reply to Mike Hommey [:glandium] from comment #8)
> Okay, this one is more complete, time-wise:
> https://cleopatra.io/#report=741550c32537364521c6243de6ce12b92ba6021b

It bugs me that we're spending so much time on WindowDestroyedEvent::Run during startup... What windows are being destroyed when restoring a session, and how many of them to justify spending 36% of the profile on that?
(Reporter)

Comment 10

2 years ago
So this pretty much is a dupe of bug 1135214, although considering the information in comment 0, it's probably better to dupe in the other direction.

(In reply to Mike Hommey [:glandium] from comment #9)
> (In reply to Mike Hommey [:glandium] from comment #8)
> > Okay, this one is more complete, time-wise:
> > https://cleopatra.io/#report=741550c32537364521c6243de6ce12b92ba6021b
> 
> It bugs me that we're spending so much time on WindowDestroyedEvent::Run
> during startup... What windows are being destroyed when restoring a session,
> and how many of them to justify spending 36% of the profile on that?

This part is bug 989816.
(Reporter)

Comment 11

2 years ago
(In reply to Mike Hommey [:glandium] from comment #10)
> > It bugs me that we're spending so much time on WindowDestroyedEvent::Run
> > during startup... What windows are being destroyed when restoring a session,
> > and how many of them to justify spending 36% of the profile on that?
> 
> This part is bug 989816.

And that's probably the first dip around FF29 in comment 0 data.
(Reporter)

Comment 12

2 years ago
And as per comments in bug 989816, I get down to 22s with a simple (but dangerous) patch, which unfold 2 years of regressions.
(Reporter)

Comment 13

2 years ago
Here is a new profile with bug 989816 out of the way: https://cleopatra.io/#report=1b52d5ac33b5530f853d904a53d86807e1509a89
(Reporter)

Comment 14

2 years ago
Death by thousand cuts... in the form of many .init functions called from content-sessionStore.js, content.js, browser-content.js, tab-content.js, etc. for each and every tab.
(Reporter)

Comment 15

2 years ago
My guess is that those are the same reason each of those non-loaded tabs are sucking memory. So, in fact, I'd expect fixing one of the problems to fix both...

Comment 16

2 years ago
(In reply to Mike Hommey [:glandium] from comment #15)
> My guess is that those are the same reason each of those non-loaded tabs are
> sucking memory. So, in fact, I'd expect fixing one of the problems to fix
> both...

Folks are working on making the browser creation here lazy in bug 906076. That will likely fix quite a lot of this. Unfortunately it's also quite likely it would get "unfixed" by lots of add-ons (ie quite some add-ons might need to be updated to not iterate over browsers in a way that triggers creating them for non-loaded tabs).

Comment 17

2 years ago
That said, it's also possible that using a single-process script instead of a separate script for each tab could be an option for some of the objects you're looking at.

Comment 18

2 months ago
How are things with release 56 and pre-release 57? 

The step from 54 to 55 should have brought significant improvements, provided you have `true` for 

browser.sessionstore.restore_on_demand

– and so on.
(Reporter)

Comment 19

2 months ago
Heh, I forgot I had filed this bug. It is, indeed, very much better since 54 or 55.
Status: NEW → RESOLVED
Last Resolved: 2 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.