Closed Bug 394492 Opened 17 years ago Closed 5 years ago

SessionStore API performance issues with large number of windows and tabs

Categories

(Firefox :: Session Restore, defect)

defect
Not set
normal

Tracking

()

RESOLVED INACTIVE

People

(Reporter: morac, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: meta, perf, Whiteboard: [Snappy:P2][fxperf])

Attachments

(7 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6

I've been getting a number of bug reports against the Session Manager extension detailing issues loading sessions with large numbers of windows and tabs.  The following issues have been documented:

1. Slow load time.  This can also cause JavaScript timeout prompts when pages take too long to load.  When a session is loaded, it appears to open all the windows at once and then load one tab in each window in turn, repeating until all windows have been loaded.  This doesn't seem very efficient, and it would probably make more sense to load one window at a time.

2. Not all tabs/windows will load if there are a large number of windows and tabs in a session.  This seems to happen haphazardly. May be related to #1 above or the memory issue below.

Something else I've noticed is that if I load a session with a fair number of windows and tabs it may load relatively quickly and only use about 100 MB of memory.  If I load it again, it will load slower and Firefox will use more memory.  Load it again and it will use even more memory.  I managed to get a nearly clean copy of Firefox to use around 1 GB of memory just by repeating this process (even after closing all open windows/tabs).  This seems to indicate a memory leak, though it is probably in the browser code and not the SessionStore code.  

See also the following:
https://www.mozdev.org/bugs/show_bug.cgi?id=17664
https://www.mozdev.org/bugs/show_bug.cgi?id=17553


I do realize that the processing power and memory size will limit Firefox's ability to restore tabs and windows, but I ran tests on a fairly powerful machine and still had issues so it appears the functionality could be optimized.


Reproducible: Sometimes

Steps to Reproduce:
1. Open tons of windows and tabs and then restart Firefox
Actual Results:  
Takes a very long time to load, even on a fairly fast machine with lots of memory.  Sometimes not all tabs/windows will load.

Expected Results:  
All tabs/windows should be restored in a reasonable amount of time.

I ran my tests on 2 different machines:

Pentium 4 3 GHz - 1 GB RAM
Pentium M 1.86 GHz - 2 GB RAM

Test results were similar.
Please file one bug per individual issue and make them block this meta-bug.

What will also help is if you could attach a sessionstore.js which causes trouble as a reference.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: meta
Well I had a very good one that someone sent me, but it contained personal data so I ended up erasing it.  I put together something on my own that's not as good, but it does have 24 windows and 228 tabs.  I'm trying to get someone to send me a better example.

When I tested this in Firefox 2.0.0.6 with no plugins (searching for plugins disabled) or addons installed, it loads (eventually). 

After it loaded, I closed out all the windows and tabs and cleared all the private data, and Firefox was using 75 MB of memory (63 MB VM).  That compares with a starting browser state of 24 MB (13 MB VM) and with 222 MB (211 MB VM) when all the windows/tabs were open.  As I mentioned, I think this is a memory leak in Firefox itself and not the SessionStore API, but there are a large number of memory leak bugs already open.
Attached image Starting memory use
Someone sent me this stored session.  It only has 10 windows with a total of 96 tabs, but it nearly brings Firefox to its knees because it is about 2.1 MB in size (mostly seems to be cookies).
Blocks: 396375, 402768
Keywords: perf
lots of js tabs can be problematic IMO

Might networking load cause problems, both external (as in causing a server to throttle you) and internal (as in Windows or Gecko not being efficient enough for the load)?  I have an example that needs retesting.  On big session restarts on a beefy laptop (wireless to cable modem) I generally have no problem.  But when it was logged in through a VPN, forcing all traffic through the VPN box and a different DNS server, potentially slowing the pipe as it were, I had trouble with restore.  Many of my tabs failed to load, perhaps because of DNS timeouts; these tabs gave a message to the effect of site or server not found.

another thought - might it be possible, in extreme cases, to prompt the user for how much to restore in order to improve startup time?  For example ignore tab history, don't load images (ala the netscape days), etc
OS: Windows XP → All
Hardware: PC → All
Would throttling the number of windows/tabs that open per second work here?  Currently the restoreWindow function just opens the windows and restores things as fast as it possibly can.  If some kind of artificial, user-settable delay were added, I think that would satisfy cases where people either don't have powerful enough machines or fast enough connections to load all windows and tabs nearly simultaneously.
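
The throttling idea above can be sketched roughly as follows. This is an illustration in Python, not the browser's JavaScript and not the actual SessionStore code; `restore_throttled`, `open_tab` and `delay_seconds` are hypothetical names, with `delay_seconds` standing in for the user-settable delay.

```python
import time
from typing import Callable, Iterable, List


def restore_throttled(
    tabs: Iterable[str],
    open_tab: Callable[[str], None],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Open tabs one at a time, pausing between consecutive opens."""
    opened: List[str] = []
    for index, url in enumerate(tabs):
        if index > 0:
            # Hypothetical user-settable delay; skipped before the first tab.
            sleep(delay_seconds)
        open_tab(url)
        opened.append(url)
    return opened
```

The `sleep` parameter is injectable only so the pacing can be checked without waiting; a real implementation would more likely schedule the next open on a timer than block.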

Another possibility would be to open all the windows and tabs just as it's done now, but instead of actually loading the tabs, simply store the session state in the tab and have a button which when clicked restores the tab's state.  Basically it would behave similar to the reload button page that shows up when a tab fails to load, except in this case when the button is pressed it would restore the session data into the tab.
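
A minimal sketch of that "store the state, restore on click" idea, again in Python and under assumed names (`PendingTab`, `saved_state`, `restore` and the `load` callback are all hypothetical, not SessionStore API):

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PendingTab:
    """A tab that holds its saved session state but has not loaded it yet."""

    saved_state: dict
    loaded_state: Optional[dict] = None

    @property
    def is_loaded(self) -> bool:
        return self.loaded_state is not None

    def restore(self, load: Callable[[dict], dict]) -> dict:
        """Load the saved state on first use; later calls reuse the result."""
        if self.loaded_state is None:
            self.loaded_state = load(self.saved_state)
        return self.loaded_state
```

Opening the window then only creates `PendingTab` objects, which is cheap; the expensive `load` call happens when the user presses the restore button for that tab.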
It is questionable whether Bug #498179 should be marked as a duplicate of this bug due to (a) time difference (> 1 year); (b) Firefox/Mozilla version changes (2.0 --> 3.0); and (c) different operating systems (Windows vs. Linux).

It is also true that if one uses "old" or "variable" URLs (news pages, etc) in the problematic sessions, it may be difficult or impossible to reproduce the problematic sessions (the load a URL places on the network and/or CPU can vary widely over time).

In particular, I would cite 2 differences in Firefox 3.0 which tend to make the problem even worse than it was in 2.0.  (1) There now appears to be a "timer" on Javascripts that attempts to detect Javascripts that are running amok.  This timer appears to be a major contributor to CPU use (at least if one monitors poll() and/or gettimeofday() calls) -- there are one or more bugs filed that deal with this problem; and (2) NoScript (which is usually active in my sessions) has become quite a bit more complex and potentially more CPU intensive than it was a year ago.

With all of the above as caveats, I generally agree with Michael on possible solutions.  These may include:

(1) Mostly static tab-state gifs (Unaccessed / Partially loaded / Complete) which indicate tab load state.  These could be nothing more than a colored square which changes color/brightness as the estimated page load completes. Minimize the X communications and CPU load as much as possible during session restarts -- the network & CPU can be maxed for 5-15 minutes for large sessions -- no need to make additional work (and increase global warming) just for "eye candy".

(2) There should be user-constrainable (prefs.js) settings on *both* max-CPU use and max-network use.  I believe that Opera now has user determined constraints on the network use which is an advantage in my book over Firefox.  When I restore a complex session, priority should be last-in, first-out (pages most recently accessed should appear first), except that an active tab (and secondarily the window it is in) should move to the "head" of the pending network & CPU queues.  Being able to simply constrain session restores to 1 active network connection (DNS lookups and page I/O's) would go a long way towards fixing this problem until a more permanent solution could be developed (if in addition the spinners are "static" on all non-active tabs).

(3) The third level should be to postpone all non-essential CPU / network activities until the reload is complete.  No "active" plug-ins / extensions / Javascripts *until* all first level page loads are complete.  By first level I mean the top level "essential" URLs to give the user an idea of a page contents -- i.e. text, maybe style sheets, then images.  Once those are all complete you can activate Javascript, extensions, News tickers (RSS feeds), etc.
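
The ordering proposed in (2) can be sketched as a sort key. This is only an illustration in Python; `SavedTab`, its fields and `restore_order` are hypothetical names, and `last_accessed` is assumed to be a timestamp where a larger value means more recent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SavedTab:
    url: str
    window_id: int
    last_accessed: float  # assumed timestamp; larger means more recent
    is_active: bool = False


def restore_order(tabs: List[SavedTab]) -> List[SavedTab]:
    """Active tab first, then its window's tabs, then the rest.

    Within each group the most recently accessed tab comes first.
    """
    active_windows = {tab.window_id for tab in tabs if tab.is_active}

    def key(tab: SavedTab) -> Tuple[int, float]:
        if tab.is_active:
            group = 0
        elif tab.window_id in active_windows:
            group = 1
        else:
            group = 2
        return (group, -tab.last_accessed)

    return sorted(tabs, key=key)
```

Feeding this list to a single worker that loads one tab at a time would correspond to the "1 active network connection" constraint.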

It should be noted (relating to #'s 2 & 3) that Opera seems to be way ahead of Firefox in these areas in that it seems to now have the concept of "effective" browsing for slow connections (with user control over how this works).  It should be fairly obvious that "slow" connections can also equate to "congested" connections (3G & 4G phone networks come to mind).  If one is browsing google result pages one is largely going to know whether the page is of interest by looking at the first paragraph or two (the same is true for academic abstracts) -- the goal for both active browsing as well as session restores should be to get as much critical information on the screen as fast as possible and leave the bells and whistles as background (as time/bandwidth permits) pursuits.
It would be nice to get some movement on this bug, though if throttling isn't going to be used, then to really fix it properly would probably require threading.
Can someone please use something like Shark or DTrace or CodeAnalyst to figure out where time is *actually* being spent loading the testcase? That's the first step to making progress here.

https://developer.mozilla.org/Profiling_with_AMD_CodeAnalyst
https://developer.mozilla.org/en/Profiling_JavaScript_with_Shark
https://wiki.mozilla.org/Performance/Optimizing_JavaScript_with_DTrace
Whiteboard: [tsnap][ts]
Unfortunately I don't think any of those will run on my machine which is a Windows machine with an Intel processor.
CodeAnalyst should still work.
Blocks: 447581
It may be worth reading Chrome Issues #32061, #32165 and #30933, which deal with the same problem -- namely that session restores pay no attention to the resources they use and the load they place on the system.

My short way to fix this:
1) Disable all Javascript "running" until all pages are loaded.
2) Change the "spinner"/"throbber" from an active icon to a staged set of static gifs (display a different image at different stages of the load process).
3) Set a maximum active thread limit which is constrained by the network bandwidth (e.g. dial-up << DSL << cable << FIOS+).  One could expand this to include a max-CPU limit (Firefox load <= X% (where X=60-80?) of available CPU).

This will prevent CPU time from being wasted on starting network connections, spinners or Javascripts which *will* result in timeouts.  Continuation of the current model -- starting up all windows/tabs/threads and letting them run concurrently -- is guaranteed to waste CPU/network resources; until that problem is resolved this is likely to remain problematic.  I regularly produce sessions that take 15+ minutes to restore -- I could easily produce one that would take an hour or two (I've got a lot of swap space on my machine) [1].

One problem with attempting to reproduce this using "non-local" pages is that one has no control over the connection timeout settings on the web servers.  Web servers will time out (and effectively hang up) web connections which are open but not responding (as will be the case with a busy session restore).  One can imagine that busy web servers set these connection timeouts to shorter periods than less loaded web servers.  But the server managers are free to vary these timeouts on an hourly, daily or monthly basis, so precise reproduction of this problem is very unlikely.  Using local pages is unlikely to provide the same symptoms, as the DNS lookups and page downloads are likely to be very fast and one will saturate the network link for a brief period followed by saturation of the CPU while the pages are redrawn.

There is one interesting difference between chrome and firefox in this area -- a chrome restore is somewhat less likely to saturate the network link, I suspect because the multi-process startup/switching tends to generate some periods where fewer network requests are pending.

[1] It is also worth noting that people would care a *lot* less how long a session restore took if there were a concept of a "priority" (active) window/tab which pre-empted all the other windows/tabs/threads.  I don't care if a session restore takes an hour as long as the new window I just created behaves like it's in a brand new browser session.
jprof of startup with profile from attachment 279228 [details]; trunk build pulled today.

Not sure I'd categorize it as "severe performance problems" with current builds
A number of performance features have been added since this bug was filed such as progressive tab loading and increased JavaScript performance.  Performance also depends on CPU speed and amount of free RAM.  On lower end machines, I can still bring Firefox trunk loads to a grinding halt.
Whiteboard: [tsnap][ts] → [snappy]
Whiteboard: [snappy] → [Snappy:P1]
Since the most recent comment, we've moved to a model where tabs are not actually loaded until accessed. Michael, can you test with a Nightly build?

Taras: I'm removing the "P1" here because the data from Test Pilot shows that test-cases like this are *far* outside the norm.
Whiteboard: [Snappy:P1] → [Snappy]
(In reply to Dietrich Ayala (:dietrich) from comment #19)
> Since the most recent comment, we've moved to a model where tabs are not
> actually loaded until accessed. Michael, can you test with a Nightly build?
> 
> Taras: I'm removing the "P1" here because the data from Test Pilot shows
> that test-cases like this are *far* outside the norm.

Can we get telemetry probes to confirm this? Test pilot is far from representative of overall population, telemetry is slightly better and will get better once we do it by default on nightlies.
(In reply to Taras Glek (:taras) from comment #20)
> (In reply to Dietrich Ayala (:dietrich) from comment #19)
> > Since the most recent comment, we've moved to a model where tabs are not
> > actually loaded until accessed. Michael, can you test with a Nightly build?
> > 
> > Taras: I'm removing the "P1" here because the data from Test Pilot shows
> > that test-cases like this are *far* outside the norm.
> 
> Can we get telemetry probes to confirm this? Test pilot is far from
> representative of overall population, telemetry is slightly better and will
> get better once we do it by default on nightlies.

Yea, let's figure out what exactly we need and make that happen in bug 671041 (if it makes sense), though I'm wont to believe the Test Pilot numbers Dietrich is talking about.
Dietrich I'd like to keep this as P2 until we have telemetry data that can show otherwise. Marking as P2 because it shouldn't block other work, but it would be nice to have this.
Whiteboard: [Snappy] → [Snappy:P2]
Flags: firefox-backlog?
This is a meta bug that doesn't go into the backlog.
Flags: firefox-backlog? → firefox-backlog-
Taras, David, what do you both think about closing this bug? We restore on demand by default for a while now and since this bug was filed sessionstore also started to load at most three tabs concurrently when restoring a multi-window session. I'm not sure what telemetry measurements exactly we were talking about here but I don't know of any cases where single tabs don't load or show timeout dialogs since we have cascaded restore.
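
For readers unfamiliar with the "at most three tabs concurrently" behaviour mentioned above, here is a rough Python sketch of a concurrency cap. It is not the sessionstore implementation; `restore_with_limit`, `load_tab` and `max_concurrent` are hypothetical names, and the peak counter exists only to make the cap observable.

```python
import asyncio
from typing import Awaitable, Callable, List


async def restore_with_limit(
    urls: List[str],
    load_tab: Callable[[str], Awaitable[None]],
    max_concurrent: int = 3,
) -> int:
    """Load every tab, never running more than max_concurrent loads at once.

    Returns the highest number of loads observed in flight.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def load_one(url: str) -> None:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while max_concurrent loads are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await load_tab(url)
            finally:
                in_flight -= 1

    await asyncio.gather(*(load_one(url) for url in urls))
    return peak
```

As each load finishes it releases its slot and the next waiting tab starts, which is roughly the cascading behaviour described in the comment.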
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(dteller)
Whiteboard: [Snappy:P2] → [Snappy:P2] [tracking]
(In reply to Tim Taubert [:ttaubert] from comment #24)
> Taras, David, what do you both think about closing this bug? We restore on
> demand by default for a while now and since this bug was filed sessionstore
> also started to load at most three tabs concurrently when restoring a
> multi-window session. I'm not sure what telemetry measurements exactly we
> were talking about here but I don't know of any cases where single tabs
> don't load or show timeout dialogs since we have cascaded restore.

This is more of a Vladan question.
Flags: needinfo?(vdjeric)
Flags: needinfo?(taras.mozilla)
glandium used to keep very large sessions until recently and he says that he has seen more than 3 or 4 tabs being loaded simultaneously, as well as some tabs not loading at all (with "server not found" errors)

I think this bug is still valid. Can we try reproducing this again?
Flags: needinfo?(vdjeric) → needinfo?(ttaubert)
Ok, let's put this into the backlog and find some time to investigate whether this is still valid and reproducible.
Flags: needinfo?(ttaubert)
Flags: needinfo?(dteller)
Flags: firefox-backlog-
Flags: firefox-backlog+
Whiteboard: [Snappy:P2] [tracking] → [Snappy:P2]
Whiteboard: [Snappy:P2] → [Snappy:P2][fxperf]
I habitually work with hundreds of tabs per session (recently around a thousand, down from ~1,500 a few months ago). 

This is the type of bug to which I'd like to contribute, however from my point of view it'll not make sense until bugs such as this are progressed: 

1381922 - Allow modifying/restoring back-forward history for each tab
<https://bugzilla.mozilla.org/show_bug.cgi?id=1381922>

>  Blocks: Session_managers

– followed by a reasonable period, maybe a few months, for things to form around new (or reintroduced) capabilities. 

Also I'm on Tier-3 FreeBSD so no Gecko Profiler, and so on.
Ehrm.. I'm not sure about what happens when you (I understand?) *want* to load every tab at once

But at least since bug 1345090 landed, even 2000 not-currently-loaded tabs are a breeze.
Ehrm indeed :-) 

Before the 2017-08-08 release of Firefox 55 my sessions – with no greater than 54.x (I'm certain, because it's pretty much releases-only on FreeBSD) – were around 500 tabs. 

With my choices of extensions, sessions with _too many_ more (than 500 tabs) were memorably impractical. 

I never imagined finding a need to share this type of screenshot :-) – and I simply haven't got around to weeding these oldest sessions – but for posterity, here's a Session Manager set with 477 tabs across ten windows, the July before 55 …
(In reply to mirh from comment #29)

> … even 2000 not-currently-loaded tabs are a breeze.

Yes and no, YMMV. 

Inarguably, 55 then 57 brought the greatest leaps – most noticeable when _extension-free_ but that is, for me, non-realistic. (So, off-topic, I'm with Waterfox for now.) 

Fast forward to 60. When I last spent some time experimenting with something approaching a realistic set of extensions, things were (for me) not close enough to breezy with ~1,050-tab sessions. 

tl;dr 60 seemed somehow more of a drag than 59 but I couldn't easily get a hands-on sense of why that might be (and here's not the place to go into detail). That was three weeks ago. 

----

Partly off-topic, here's that point from three weeks ago: https://www.reddit.com/r/waterfox/comments/8iyfef/-/ ONLY if anyone would like to chime in, on Reddit, with advice on how best to reuse, in Firefox 60, a session that's written by Waterfox 56.2.x.
If you are handling sessions in that way, then I'm 99% sure you should follow bug 1427928, and ask the corresponding extension's developer to check for lazy loading of tabs, best current practices and whatnot. 

I believe "SessionStore" itself is fine: when I use the built-in save mechanism instead, I don't have any particular problems. 
And probably this decennial issue has run its course.
Blocks: ss-SM
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INACTIVE