Bug 394492 - SessionStore API performance issues with large number of windows and tabs
Status: NEW
Whiteboard: [Snappy:P2]
Keywords: meta, perf
Product: Firefox
Classification: Client Software
Component: Session Restore
Version: unspecified
Platform: All All
Importance: -- normal with 13 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
QA Contact:
Mentors:
Duplicates: 471089 498179
Depends on:
Blocks: 447581 sessionRestoreJank 396375 402768
Reported: 2007-08-31 13:58 PDT by Michael Kraft [:morac]
Modified: 2015-10-31 17:36 PDT (History)
CC List: 27 users
ttaubert: firefox‑backlog+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Zipped sessionstore.js with 24 windows and 228 tabs. (7.47 KB, application/zip)
2007-08-31 18:50 PDT, Michael Kraft [:morac]
no flags
Starting memory use (268.15 KB, image/jpeg)
2007-08-31 18:52 PDT, Michael Kraft [:morac]
no flags
Memory use after 24 window and 228 tabs restored (170.34 KB, image/jpeg)
2007-08-31 18:56 PDT, Michael Kraft [:morac]
no flags
Memory use after all but 1 tab/window closed and browser data cleared (272.26 KB, image/jpeg)
2007-08-31 18:57 PDT, Michael Kraft [:morac]
no flags
A good example of a sessionstore.js file that causes severe performance issues (158.10 KB, application/zip)
2007-09-01 00:27 PDT, Michael Kraft [:morac]
no flags
JProf of startup with the last testcase (1.54 MB, text/html)
2011-07-01 23:14 PDT, [:jesup] on pto until 2016/8/1 Randell Jesup
no flags

Description Michael Kraft [:morac] 2007-08-31 13:58:41 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6

I've been getting a number of bug reports against the Session Manager extension detailing issues loading sessions with large numbers of windows and tabs.  The following issues have been documented:

1. Slow load time.  This can also cause JavaScript timeout prompts when pages take too long to load.  When a session is loaded, it appears to open all the windows at once and then load one tab in each window in turn, repeating until every tab has been loaded.  This doesn't seem very efficient, and it would probably make more sense to load one window at a time.

2. Not all tabs/windows will load if there are a large number of windows and tabs in a session.  This seems to happen haphazardly. May be related to #1 above or the memory issue below.

Something else I've noticed is that if I load a session with a fair number of windows and tabs it may load relatively quickly and only use about 100 MB of memory.  If I load it again, it will load slower and Firefox will use more memory.  Load it again and it will use even more memory.  I managed to get a nearly clean copy of Firefox to use around 1 GB of memory just by repeating this process (even after closing all open windows/tabs).  This seems to indicate a memory leak, though it is probably in the browser code and not the SessionStore code.  

See also the following:
https://www.mozdev.org/bugs/show_bug.cgi?id=17664
https://www.mozdev.org/bugs/show_bug.cgi?id=17553


I do realize that the processing power and memory size will limit Firefox's ability to restore tabs and windows, but I ran tests on a fairly powerful machine and still had issues so it appears the functionality could be optimized.


Reproducible: Sometimes

Steps to Reproduce:
1. Open tons of windows and tabs and then restart Firefox
Actual Results:  
Takes a very long time to load, even on a fairly fast machine with lots of memory.  Sometimes not all tabs/windows will load.

Expected Results:  
All tabs/windows should be restored in a reasonable amount of time.

I ran my tests on 2 different machines:

Pentium 4 3 GHz - 1 GB RAM
Pentium M 1.86 GHz - 2 GB RAM

Test results were similar.
Comment 1 Simon Bünzli 2007-08-31 14:34:52 PDT
Please file one bug per individual issue and make them block this meta-bug.

What will also help is if you could attach a sessionstore.js which causes trouble as a reference.
Comment 2 Michael Kraft [:morac] 2007-08-31 18:50:53 PDT
Created attachment 279193 [details]
Zipped sessionstore.js with 24 windows and 228 tabs.

Well I had a very good one that someone sent me, but it contained personal data so I ended up erasing it.  I put together something on my own that's not as good, but it does have 24 windows and 228 tabs.  I'm trying to get someone to send me a better example.

When I tested this in Firefox 2.0.0.6 with no plugins (searching for plugins disabled) or addons installed, it loads (eventually). 

After it loaded, I closed out all the windows and tabs and cleared all the private data, and Firefox was using 75 MB of memory (63 MB VM).  That compares with a starting browser state of 24 MB (13 MB VM), and 222 MB (211 MB VM) with all the windows/tabs open.  As I mentioned, I think this is a memory leak in Firefox itself and not the Session Store API, but there are a large number of memory leak bugs already open.
Comment 3 Michael Kraft [:morac] 2007-08-31 18:52:48 PDT
Created attachment 279194 [details]
Starting memory use
Comment 4 Michael Kraft [:morac] 2007-08-31 18:56:39 PDT
Created attachment 279195 [details]
Memory use after 24 window and 228 tabs restored
Comment 5 Michael Kraft [:morac] 2007-08-31 18:57:34 PDT
Created attachment 279196 [details]
Memory use after all but 1 tab/window closed and browser data cleared
Comment 6 Michael Kraft [:morac] 2007-09-01 00:27:06 PDT
Created attachment 279228 [details]
A good example of a sessionstore.js file that causes severe performance issues

Someone sent me this stored session.  It only has 10 windows with a total of 96 tabs, but it nearly brings Firefox to its knees because it is about 2.1 MB in size (mostly seems to be cookies).
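
To check where the size of a stored session actually goes, something like the following could be used. This is only a rough sketch in Python, not anything from the SessionStore code: it assumes the file parses as JSON with a top-level "windows" list whose entries carry "tabs" and "cookies" lists, and the attached file may need conversion first if it is not strict JSON. The function name summarize_session is made up for illustration.

```python
import json


def summarize_session(text):
    """Count windows and tabs and estimate how much of the file is cookies.

    Assumes (hypothetically) a JSON layout of
    {"windows": [{"tabs": [...], "cookies": [...]}, ...]}.
    """
    session = json.loads(text)
    windows = session.get("windows", [])
    tab_count = sum(len(w.get("tabs", [])) for w in windows)
    # Re-serialize each cookie list to estimate its share of the file size.
    cookie_chars = sum(len(json.dumps(w.get("cookies", []))) for w in windows)
    return {
        "windows": len(windows),
        "tabs": tab_count,
        "cookie_chars": cookie_chars,
        "total_chars": len(text),
    }
```

Comparing cookie_chars against total_chars would show whether cookies really dominate a session like this one.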
Comment 7 Wayne Mery (:wsmwk, NI for questions) 2007-12-08 14:57:56 PST
Lots of JS tabs can be problematic, IMO.

Might networking load cause problems, both external (as in causing a server to throttle you) and internal (as in Windows or Gecko not being efficient enough for the load)?  I have an example that needs retesting.  On big session restarts on a beefy laptop (wireless to cable modem) I generally have no problem.  But when it was logged in through a VPN, forcing all traffic through the VPN box and a different DNS server - potentially slowing the pipe, as it were - I had trouble with restore.  Many of my tabs failed to load, perhaps because of DNS timeouts; these tabs gave a message to the effect of site or server not found.

Another thought - might it be possible, in extreme cases, to prompt the user for how much to restore in order to improve startup time?  For example, ignore tab history, don't load images (à la the Netscape days), etc.
Comment 8 Simon Bünzli 2008-12-27 16:50:59 PST
*** Bug 471089 has been marked as a duplicate of this bug. ***
Comment 9 Simon Bünzli 2009-06-15 12:07:22 PDT
*** Bug 498179 has been marked as a duplicate of this bug. ***
Comment 10 Michael Kraft [:morac] 2009-08-16 20:58:13 PDT
Would throttling the number of windows/tabs that open per second work here?  Currently the restoreWindow function just opens the windows and restores things as fast as it possibly can.  If some kind of artificial, user-settable delay were added, I think that would satisfy cases where people either don't have powerful enough machines or fast enough connections to load all windows and tabs nearly simultaneously.

Another possibility would be to open all the windows and tabs just as it's done now, but instead of actually loading the tabs, simply store the session state in the tab and have a button which when clicked restores the tab's state.  Basically it would behave similar to the reload button page that shows up when a tab fails to load, except in this case when the button is pressed it would restore the session data into the tab.
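
The two ideas above can be sketched roughly as follows. This is a minimal illustration in Python, not SessionStore code: the Tab class, throttled_restore, and lazy_restore are hypothetical names, and Tab.restore() merely stands in for actually loading a page.

```python
import time
from dataclasses import dataclass


@dataclass
class Tab:
    url: str
    loaded: bool = False  # False: only the session state is stored in the tab

    def restore(self):
        """Stand-in for really loading the page (hypothetical)."""
        self.loaded = True


def throttled_restore(tabs, delay_seconds, sleep=time.sleep):
    """Restore tabs one at a time, pausing delay_seconds between them."""
    restored = 0
    for index, tab in enumerate(tabs):
        if index > 0:
            sleep(delay_seconds)  # the user-settable delay
        tab.restore()
        restored += 1
    return restored


def lazy_restore(tabs, selected_index):
    """Restore only the selected tab; the rest wait until the user asks."""
    tabs[selected_index].restore()
    return [tab for tab in tabs if not tab.loaded]
```

With the second approach, each tab returned by lazy_restore would show a restore button and call restore() only when the button is pressed.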
Comment 11 Robert Bradbury 2009-08-16 22:51:04 PDT
It is questionable whether Bug #498179 should be marked as a duplicate of this bug due to (a) time difference (> 1 year); (b) Firefox/Mozilla version changes (2.0 --> 3.0); and (c) different operating systems (Windows vs. Linux).

It is also true that if one uses "old" or "variable" URLs (news pages, etc) in the problematic sessions, it may be difficult or impossible to reproduce the problematic sessions (the load a URL places on the network and/or CPU can vary widely over time).

In particular, I would cite 2 differences in Firefox 3.0 which tend to make the problem even worse than it was in 2.0: (1) There now appears to be a "timer" on JavaScript that attempts to detect scripts that are running amok.  This timer appears to be a major contributor to CPU use (at least if one monitors poll() and/or gettimeofday() calls) -- there are one or more bugs filed that deal with this problem; and (2) NoScript (which is usually active in my sessions) has become quite a bit more complex and potentially more CPU intensive than it was a year ago.

With all of the above as caveats, I generally agree with Michael on possible solutions.  These may include:

(1) Mostly static tab-state gifs (Unaccessed / Partially loaded / Complete) which indicate tab load state.  These could be nothing more than a colored square which changes color/brightness as the estimated page load completes. Minimize the X communications and CPU load as much as possible during session restarts -- the network & CPU can be maxed for 5-15 minutes for large sessions -- no need to make additional work (and increase global warming) just for "eye candy".

(2) There should be user-constrainable (prefs.js) settings on *both* max-CPU use and max-network use.  I believe that Opera now has user-determined constraints on network use, which is an advantage in my book over Firefox.  When I restore a complex session, priority should be last-in, first-out (pages most recently accessed should appear first), except that an active tab (and secondarily the window it is in) should move to the "head" of the pending network & CPU queues.  Being able to simply constrain session restores to 1 active network connection (DNS lookups and page I/O) would go a long way towards fixing this problem until a more permanent solution could be developed (if, in addition, the spinners are "static" on all non-active tabs).

(3) The third level should be to postpone all non-essential CPU / network activities until the reload is complete.  No "active" plug-ins / extensions / Javascripts *until* all first level page loads are complete.  By first level I mean the top level "essential" URLs to give the user an idea of a page contents -- i.e. text, maybe style sheets, then images.  Once those are all complete you can activate Javascript, extensions, News tickers (RSS feeds), etc.

It should be noted (relating to #'s 2 & 3) that Opera seems to be way ahead of Firefox in these areas in that it seems to now have the concept of "effective" browsing for slow connections (with user control over how this works).  It should be fairly obvious that "slow" connections can also equate to "congested" connections (3G & 4G phone networks come to mind).  If one is browsing google result pages one is largely going to know whether the page is of interest by looking at the first paragraph or two (the same is true for academic abstracts) -- the goal for both active browsing as well as session restores should be to get as much critical information on the screen as fast as possible and leave the bells and whistles as background (as time/bandwidth permits) pursuits.
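
The ordering proposed in (2) can be sketched with a priority queue. This is only an illustration in Python under stated assumptions: restore_order is a hypothetical name, and each tab is assumed to carry a last-accessed timestamp, which may not match what the session data actually records.

```python
import heapq


def restore_order(tabs, active_url):
    """Return URLs in restore order.

    tabs is a list of (url, last_accessed) pairs. The active tab goes
    first; the rest follow most recently accessed first (last-in, first-out).
    """
    heap = []
    for url, last_accessed in tabs:
        not_active = 0 if url == active_url else 1
        # Negate the timestamp so that newer tabs sort ahead of older ones.
        heapq.heappush(heap, (not_active, -last_accessed, url))
    order = []
    while heap:
        order.append(heapq.heappop(heap)[2])
    return order
```

Draining this queue one entry at a time would correspond to the "1 active network connection" constraint described above.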
Comment 12 Michael Kraft [:morac] 2009-11-09 19:43:00 PST
It would be nice to get some movement on this bug, though if throttling isn't going to be used, then to really fix it properly would probably require threading.
Comment 13 Dietrich Ayala (:dietrich) 2009-11-12 20:39:28 PST
Can someone please use something like Shark or DTrace or CodeAnalyst to figure out where time is *actually* being spent loading the testcase? That's the first step to making progress here.

https://developer.mozilla.org/Profiling_with_AMD_CodeAnalyst
https://developer.mozilla.org/en/Profiling_JavaScript_with_Shark
https://wiki.mozilla.org/Performance/Optimizing_JavaScript_with_DTrace
Comment 14 Michael Kraft [:morac] 2009-11-12 23:14:58 PST
Unfortunately I don't think any of those will run on my machine which is a Windows machine with an Intel processor.
Comment 15 Dietrich Ayala (:dietrich) 2009-11-12 23:31:30 PST
CodeAnalyst should still work.
Comment 16 Robert Bradbury 2010-03-03 10:38:01 PST
It may be worth reading Chrome Issues #32061, #32165 and #30933, which deal with the same problem -- namely that session restores pay no attention to the resources they use and the load they place on the system.

My short way to fix this:
1) Disable all Javascript "running" until all pages are loaded.
2) Change the "spinner"/"throbber" from an active icon to a staged set of static gifs (display a different image at different stages of the load process).
3) Set a maximum active thread limit which is constrained by the network bandwidth (e.g. dial-up << DSL << cable << FIOS+).  One could expand this to include a max-CPU limit (Firefox load <= X% (where X=60-80?) of available CPU).

This will prevent CPU time from being wasted on starting network connections, spinners or Javascripts which *will* result in timeouts.  Continuation of the current model -- starting up all windows/tabs/threads and letting them run concurrently -- is guaranteed to waste CPU/network resources; until that problem is resolved this is likely to remain problematic.  I regularly produce sessions that take 15+ minutes to restore -- I could easily produce one that would take an hour or two (I've got a lot of swap space on my machine) [1].

One problem with attempting to reproduce this using "non-local" pages is that one has no control over the connection timeout settings on the web servers.  Web servers will time out (and effectively hang up) web connections which are open but not responding (as will be the case with a busy session restore).  One can imagine that busy web servers set these connection timeouts to shorter periods than less loaded web servers.  But the server managers are free to vary these timeouts on an hourly, daily or monthly basis, so precise reproduction of this problem is very unlikely.  Using local pages is unlikely to produce the same symptoms, as the DNS lookups and page downloads are likely to be very fast, and one will saturate the network link for a brief period followed by saturation of the CPU while the pages are redrawn.

There is one interesting difference between chrome and firefox in this area -- a chrome restore is somewhat less likely to saturate the network link, I suspect because the multi-process startup/switching tends to generate some periods where fewer network requests are pending.

1. It is also worth noting that people would care a *lot* less how long a session restore took if there were a concept of a "priority" (active) window/tab which pre-empted all the other windows/tabs/threads.  I don't care if a session restore takes an hour as long as the new window I just created behaves like it's in a brand new browser session.
Comment 17 [:jesup] on pto until 2016/8/1 Randell Jesup 2011-07-01 23:14:11 PDT
Created attachment 543578 [details]
JProf of startup with the last testcase

jprof of startup with profile from attachment 279228 [details]; trunk build pulled today.

Not sure I'd categorize it as "severe performance problems" with current builds
Comment 18 Michael Kraft [:morac] 2011-07-02 07:16:54 PDT
A number of performance features have been added since this bug was filed such as progressive tab loading and increased JavaScript performance.  Performance also depends on CPU speed and amount of free RAM.  On lower end machines, I can still bring Firefox trunk loads to a grinding halt.
Comment 19 Dietrich Ayala (:dietrich) 2011-11-28 14:28:59 PST
Since the most recent comment, we've moved to a model where tabs are not actually loaded until accessed. Michael, can you test with a Nightly build?

Taras: I'm removing the "P1" here because the data from Test Pilot shows that test-cases like this are *far* outside the norm.
Comment 20 (dormant account) 2011-11-28 16:02:45 PST
(In reply to Dietrich Ayala (:dietrich) from comment #19)
> Since the most recent comment, we've moved to a model where tabs are not
> actually loaded until accessed. Michael, can you test with a Nightly build?
> 
> Taras: I'm removing the "P1" here because the data from Test Pilot shows
> that test-cases like this are *far* outside the norm.

Can we get telemetry probes to confirm this? Test pilot is far from representative of overall population, telemetry is slightly better and will get better once we do it by default on nightlies.
Comment 21 Paul O'Shannessy [:zpao] (not reading much bugmail, email directly) 2011-11-28 17:27:20 PST
(In reply to Taras Glek (:taras) from comment #20)
> (In reply to Dietrich Ayala (:dietrich) from comment #19)
> > Since the most recent comment, we've moved to a model where tabs are not
> > actually loaded until accessed. Michael, can you test with a Nightly build?
> > 
> > Taras: I'm removing the "P1" here because the data from Test Pilot shows
> > that test-cases like this are *far* outside the norm.
> 
> Can we get telemetry probes to confirm this? Test pilot is far from
> representative of overall population, telemetry is slightly better and will
> get better once we do it by default on nightlies.

Yea, let's figure out what exactly we need and make that happen in bug 671041 (if it makes sense), though I'm wont to believe the Test Pilot numbers Dietrich is talking about.
Comment 22 (dormant account) 2011-12-01 12:21:29 PST
Dietrich, I'd like to keep this as P2 until we have telemetry data that can show otherwise. Marking as P2 because it shouldn't block other work, but it would be nice to have this.
Comment 23 Tim Taubert [:ttaubert] 2014-04-23 07:24:29 PDT
This is a meta bug that doesn't go into the backlog.
Comment 24 Tim Taubert [:ttaubert] 2014-04-23 07:29:52 PDT
Taras, David, what do you both think about closing this bug? We restore on demand by default for a while now and since this bug was filed sessionstore also started to load at most three tabs concurrently when restoring a multi-window session. I'm not sure what telemetry measurements exactly we were talking about here but I don't know of any cases where single tabs don't load or show timeout dialogs since we have cascaded restore.
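
For reference, the "at most three tabs concurrently" behavior can be illustrated as below. This is a sketch in Python using asyncio, not the actual sessionstore implementation: restore_all is a hypothetical name, and the short sleep stands in for a real page load.

```python
import asyncio


async def restore_all(urls, max_concurrent=3):
    """Load every URL, with at most max_concurrent loads in flight at once.

    Returns the loaded URLs and the peak number of simultaneous loads seen.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def load(url):
        nonlocal in_flight, peak
        async with semaphore:  # wait here while three loads are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the network load
            in_flight -= 1
        return url

    loaded = await asyncio.gather(*(load(url) for url in urls))
    return loaded, peak
```

The peak value returned here is the kind of number that would help confirm or refute reports of more than three tabs loading at the same time.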
Comment 25 (dormant account) 2014-04-23 10:23:44 PDT
(In reply to Tim Taubert [:ttaubert] from comment #24)
> Taras, David, what do you both think about closing this bug? We restore on
> demand by default for a while now and since this bug was filed sessionstore
> also started to load at most three tabs concurrently when restoring a
> multi-window session. I'm not sure what telemetry measurements exactly we
> were talking about here but I don't know of any cases where single tabs
> don't load or show timeout dialogs since we have cascaded restore.

This is more of a Vladan question.
Comment 26 Vladan Djeric (:vladan) 2014-04-24 14:37:51 PDT
glandium used to keep very large sessions until recently and he says that he has seen more than 3 or 4 tabs being loaded simultaneously, as well as some tabs not loading at all (with "server not found" errors)

I think this bug is still valid. Can we try reproducing this again?
Comment 27 Tim Taubert [:ttaubert] 2014-04-25 05:03:05 PDT
Ok, let's put this into the backlog and find some time to investigate whether this is still valid and reproducible.
