Closed Bug 396375 Opened 17 years ago Closed 16 years ago

Firefox restart consumes 100% of CPU for an hour or more.

Categories: Firefox :: General (defect); x86 Linux; priority not set; severity major
Tracking: VERIFIED INVALID
People: reporter robert.bradbury; unassigned
Details: keywords perf
Attachments: 1 file

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en; rv:1.8.1.6) Gecko/20070801 Epiphany/2.18 Firefox/2.0.0.6
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en; rv:1.8.1.6) Gecko/20070801 Epiphany/2.18 Firefox/2.0.0.6

I restarted Firefox today, and it reloaded 67+ windows with 400+ tabs. Under normal conditions I expect the restart to take 15+ minutes (because, lord knows, the developers don't care about efficiency, or about the idea that pages might be prioritized based on which window is frontmost on your desktop). Firefox has now consumed 1 hour and 11 minutes of CPU time, and it may finally be starting to behave normally. So it may "get 'er done", but it's not an experience I'd write home about.

The most interesting question is whether the startup process was delayed by an hour or more due to interactions with NoScript and with vendors attempting to shove music down one's throat. I will be attaching the transcript from the console to this bug, and I would like people who understand it to explain what is going on. I would also point out that the typical Windows user (or even Linux user), who may not have access to the console logs, is clueless about what Firefox may or may not be doing on their machine.

Reproducible: Sometimes

Steps to Reproduce:
1. Restart a complex Firefox session and measure the real time and CPU time it takes.

Actual Results:  
It can take an hour or more and various shuffling of windows to get a functional Firefox.

Expected Results:  
Firefox should come up immediately. Any new Firefox window should have top priority. It should not take an hour or more to produce a "functional" Firefox. I'm filing this bug report using Epiphany because I do not know whether Firefox is usable or trustworthy.

There seem to be three separate problems here:

1) Session store takes 15 minutes to get you a usable browser window when you have 67+ windows with 400+ tabs.  There's already a metabug on this issue, bug 394492.

2) Once, it took 71 minutes instead of 15. There's not much information to go on here, especially if you can't reproduce it. If you see this kind of thing again, try using a "sampling" tool to find out which functions Firefox is hanging in.

For both of these problems, I suggest not keeping so many windows and tabs open :)

3) Vague paranoia about "not knowing what Firefox is doing".

I'm not sure what you want us to do here. Is it the hanging that worried you, or something else?

Jesse, I understand, and I will need to read the bug report you refer to. I do have the saved session, and I suspect the problem may be related to one of the indirect sites loaded during the session restore trying to shove an MP3 down my throat. The console log clearly showed an attempt to start a plug-in for the MP3, and Firefox's CPU usage only declined after I explicitly killed the subprocess. (Broken-pipe messages followed on the console.) My guess is a "run amok" Firefox process shoving data into the MP3 player (Totem?). I do not understand this at all, given that I have NoScript active, so in theory nothing should be able to shove MP3s into my system; but I'm naive with respect to these topics.

Unfortunately, I've rebooted the machine since then and have to attempt to recreate this to get the precise console logs.

With respect to running 60-100 windows and 400+ tabs: big minds require big spaces to operate in, and the software should be designed to handle that. Before this reboot, my system had been up for over a month; Firefox should be able to match that.

I'm not familiar with the method NoScript uses to disable plugins, but you might be interested to know that Firefox 3 lets you disable plugins through the add-on manager window.

I think I've seen bug reports asking for "let me see which plugins are using CPU/memory" or "let me see which tabs are using CPU/memory", but I can't find them now.  Bug 269685 is close.  Either of those would be hard to implement, though; it's easier to detect that kind of thing from an external application such as a sampler.  (And usually, bug reports are about a specific web page rather than a collection of 400 web pages trying to open at the same time, so it's not as hard to create a reduced testcase.)

If you still have problems after bug 394492 is fixed, they'll probably be easier to track down simply because Firefox won't be trying to load all those pages at once.
Depends on: 394492
Keywords: perf
Hello gentlemen,

I may have a clue as to what is causing this bug. In short, in the failure mode the thread listed as timeGetSystemTime appears to be thrashing with another thread (i.e., excessive context switching). But first, some background...

I regularly run into Firefox hogging CPU resources to the point that Firefox (and sometimes the overall system) becomes unusable. A lesser manifestation of this occurs when there are still a few MIPS left for other processes to keep running. The trouble becomes acute when Firefox CPU usage is in the 95 to 100 percent range and is SUSTAINED until I KILL Firefox or close nearly all of its windows.

There are actually two facets of the problem: One, and the main one that led me to your posts, is that Firefox becomes a SEVERE CPU HOG. The other is that Firefox retains excessive memory. I'm not sure if the memory hogging is related to CPU hogging, but I see a clue that it is (I'll get to that in a minute).

I am a HEAVY user of windows, and when Firefox hasn't gone into the crippled state I can be very productive. I am talking upwards of 45 windows open simultaneously, but with no tabs. I used to have as many as 90 Firefox windows open, but now I use Internet Explorer to share the "window burden" (that is, I try to limit Firefox to 45 windows, and will run IE up to a maximum of 30 windows or so). That generally works better than just using Firefox alone with so many windows.

Another thing I do that may be unusual is that I leave Firefox (and other processes) running for days at a time. I put my PC into "suspend" mode at night so I won't have to spend tons of time reestablishing my working environment every day.

I use the free Windows tool Process Explorer ("PE"), which, if you don't know it, is very useful for identifying problems like the one I am documenting now (for the tool set, go to Microsoft's site and search for Sysinternals).

Anyway, when my PC gets slow, I hover over the PE monitor icon in my Systray. When I discover that Firefox has gone CPU wild (e.g. 95 ~ 100%), I open up Firefox's Properties/Threads in PE. There, you can see that the CPU is thrashing between two threads:

timeGetSystemTime 
jpeg_fdct_islow.

I don't know much about the latter function, but here's the quick fix I've found that INSTANTLY RESTORES normal operation to Firefox:

Suspend the timeGetSystemTime thread.

If you do that using PE, CPU usage instantly and magically drops to a low single-digit number, and Firefox and the rest of the system become usable again. I can work fine after suspending the time-function thread. The only other research I've done on the time function is that timeGetSystemTime has more overhead than the comparable function timeGetTime (ref: http://www.drbob42.com/delphi/perform.htm).

Back to the memory-leak aspect...
When Firefox was in the CPU-hog state, I previously tried closing windows one at a time until the CPU was "de-hogged." For that to happen, I had to close all but, say, 16 windows, at which point CPU usage abruptly dropped to a reasonable level. However, although I watched memory use gradually decline too (again via PE), Firefox still retained too much memory, even when I closed ALL BUT ONE Firefox window.

So now, when Firefox goes CPU nuts, I:
1) Suspend the timeGetSystemTime thread via PE;
2) Keep working until Firefox's memory usage grows way too big (e.g. 700 MB) or the system otherwise isn't working well;
3) Close the Firefox windows I don't really need to re-establish my core working environment;
4) Wait a couple of minutes so that Firefox won't "remember" to reopen the extra windows I just closed (i.e. after the next kill/restart steps);
5) KILL Firefox;
6) Restart Firefox.

Hope this info helps...

Jaime


Jaime, let us try to be clear about this problem. Since you appear to be working under Windows and I am working under Linux, that may be difficult.

I believe there are two different problem conditions under discussion.
1. CPU usage and slow response when restarting large Firefox sessions.
   (And that is the topic of *this* bug #396375).
2. CPU usage and/or memory usage of general Firefox sessions. This is an ongoing problem. My current Firefox session has 47 windows and 382 tabs, and between Firefox and X, I'm consuming 40-50% of the available CPU time on a Pentium IV Prescott. But if I'm reading Jaime's messages properly, he is having the same problems under Windows.

But for the sake of clarity let us keep these problems separate.
1) There is a restart CPU consumption problem.
2) There is an ongoing runtime CPU consumption problem.


If you wish to identify or file an "ongoing CPU consumption" bug report [1], I would be happy to subscribe to that as well.


1. I would be surprised if there is not already an ongoing bug on this topic, since the problem has existed for years.

(In reply to comment #6)
> 2. CPU usage and/or memory usage of general Firefox sessions. This is an ongoing problem.

FYI.
Bug 389620 is an example of this kind of issue.

For the sake of clarity, let us be clear.

It does not appear that Bug 389620 is related to this. That appears to be much more of a memory allocation/consumption problem.

Now, mind you, Firefox does have a severe in/out-of-main-memory consumption problem, which on my machine I manage by constraining the Firefox process to less than 70% of main memory and less than 60% of the CPU time.

This bug is specifically about CPU consumption during a Firefox restart (a restart that becomes necessary because CPU/memory consumption grows excessive over time).

There are two other distinct bugs
1) Excessive memory use in Firefox (which has a very long history).
2) Excessive CPU consumption by a running Firefox whose threads are largely inactive.

When one has NoScript in operation (so Internet scammers and corporate marketing bots cannot use one's CPU cycles), Firefox+X should not be consuming 30-50% of the CPU. Yet that is what I routinely see when running 40+ windows and 400+ tabs.

It is quite possible that this is the same bug documented, with a test case, as Bug #413390. A Firefox restart will attempt to open a lot of windows at the same time; Bug #413390 does the same when INTERVAL is set too low in the shell script.

There is another problem (though it doesn't eat CPU time) when one of the windows being reloaded creates a pop-up requesting some kind of approval (POSTDATA, unrecognized security codes, etc.) and it gets buried under other windows. But I do not believe that involves "maxed" CPU usage.

Yeah, when you're trying to load a webpage that doesn't exist (like aeiveos.com), it can take a while for the system to return the DNS results as it struggles to get that information.

Now, now, now. aeiveos.com is a valid web address (it's currently 68.239.4.70). There are two problems: (a) its IP address changes every few weeks because Verizon doesn't allow you to pin one of their dynamic IP addresses down as a static IP address; and (b) Verizon filters HTTP traffic on port 80 unless you are willing to pay them for a commercial DSL line.

Pings to www.aeiveos.com should work fine unless the machine is down for maintenance or it is in one of the time windows when the dynamic IP address is being reallocated and the upper-level DNS caches need to be updated. HTTP access to www.aeiveos.com should work for external requests if one uses port 8080 rather than port 80. Verizon doesn't block port 8080 (or at least didn't), and since the robots seem to be accessing pages from the Apache server without difficulty, I'll assume it is still unblocked.

Right now, aeiveos.com consistently just times out.

You've got quite a testcase on your hands: 400 URLs to pick from. Without the ability within FF to see which tab is sucking up the CPU, have you figured out a methodology yet to narrow the field from 400 to a handful?

Have you ever copied off your session state, and when one of your restarts went wild, were you able to produce the same results a second time using the copied session state?

Also, with so many URLs, and without having reproduced the problem on trunk, it seems to me very likely that you are hitting at least several bad bugs that are already fixed.

In other words, if you were to address both of the above ideas by showing a session state that reproducibly triggers the problem on trunk, you might attract some interest to your issue.

Attached is an intermediate-sized session file for a typical Firefox session after a week or two of active use. It has 54 windows and 351 tabs [1]. It is not unusual for me to run 70+ windows and 700+ tabs. It takes 12-15 minutes of 100% CPU usage to restart this file under Firefox 3.0pre (compiled from CVS 29 Mar 2008) on a Pentium 4 Prescott with nothing else going on under Linux.

After it has finally completed loading all of the tabs, it still imposes a 30% CPU usage on the machine (i.e. Firefox shows up as the top CPU user (in top) consuming anywhere from 30-35% of the CPU).  This is in spite of the fact that *all* firefox windows, except the one being used to file this bug report are minimized under Gnome.

It should be noted that this bug has been expanded a bit in Bug #402768.

1. Note: the original sessionstore file had to be trimmed a bit to fit within the 300 KB attachment limit.

In all seriousness, and speaking in a purely unofficial capacity, I doubt this is an issue that is high on the list, currently or in the near future. You seem to fail to grasp what resuming such a session entails. When you resume a session that large, you're requesting several hundred megabytes of data from potentially several hundred servers. First, you have limited bandwidth on your end, with a limited number of connections per server and globally in Firefox. Further, Firefox must manage all this incoming data, parse it, render it, etc. This will consume significant amounts of CPU time, just as recalculating millions of interconnected cells in a spreadsheet, creating a detailed 3D image render, compiling a large program (such as Firefox, KDE, etc.), encoding/transcoding large videos, or other such tasks would.

In short, you're asking Firefox to sidestep the limits placed upon it by your processor, RAM, hard disk, and Internet connection, and somehow recreate gigabytes of data in unrealistic timeframes. Each tab places upon a browser, any browser, almost the same demands as a new window (less UI). Imagine recreating 400 browser windows on startup, or 400 PDFs, or a thousand word documents. I think your big mind is failing to grasp the size of the request, and the limits your computer and connection (and the internet as a whole) put on the fulfillment of that request.

Resolving INVALID because it's irrational and unrealistic at the current state of computing ability. Feel free to revisit this in 5 years when we have 32 cores, terabytes of RAM, and multi gigabit connections.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
Btw, 12 minutes / 343 tabs = 2 seconds per tab.  That seems like what I'd expect.
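As a quick check of that per-tab figure, using the 343-tab count quoted here and the 12-15 minute restore time reported for the attached session:

```latex
\frac{12 \times 60\ \text{s}}{343\ \text{tabs}} \approx 2.1\ \text{s/tab},
\qquad
\frac{15 \times 60\ \text{s}}{343\ \text{tabs}} \approx 2.6\ \text{s/tab}
```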
Just in case Grey Hodge was unclear: saving a session doesn't actually SAVE the complete current state of your Firefox memory, cache and rendered pages. It just saves a list of which URLs are in which tab and what the history of each tab is. When you restore, Firefox can retrieve that data and bring you quickly back to where you were when you closed. Thus reopening a session is a lot of work, as Firefox has to re-render each tab, each tab has to be reprocessed by relevant extensions, etc. etc. etc. The browser also has to process all the directives it gets when it loads pages, like pages that offer downloads or pages that spawn new windows, and then those new pages have to be processed. This is particularly bad if you have extensions that act on the whole of every page, like NoScript, Greasemonkey and Firebug.

Firefox may take a couple of seconds to load a page normally, so while you were building up those 400 tabs, you used up 800 seconds that you didn't even notice. When you load all 400 tabs at once, however, it's like compressing all those 2-second waits into one massive, computer-blowing chunk.
Grey/Jesse/CCW, please read comment #6 under Bug #402768. There I explain that it is *not* network- or disk-bound, nor should it be if the cache is working reasonably well. This is entirely a CPU usage problem: it is 10-15 minutes of *CPU* time. Having NoScript active significantly reduces the amount of CPU time consumed, so I assume the problem is in part due to unfettered JavaScript during the restore process.

So, one possible solution would be to disable all JavaScript activity until all pages are completely loaded, i.e. all scripts are held on a "to be activated" list until *after* all non-JavaScript pages are completely redrawn. The problem with this involves the limited number of sites I typically allow to use JavaScript (primarily gmail.com and amazon.com, which use it reasonably).

More importantly, Firefox cannot effectively be used *while* all the background windows/tabs are loading (due to the excessive CPU usage). The "window (tab) on top" should always be positioned/scheduled so it preempts all other windows and tabs. That way one could still use the browser while all the background reloading is taking place. For this to work, Firefox really needs its own scheduler (which would also be useful for stopping the activity of specific tabs) [1]. Given improper and excessive use of JavaScript (see bug #413390 for an example of how to hang Firefox by enabling JavaScript), it seems reasonable to provide an option that prevents it from running at all during a session reload.

1. I want to be able to control Firefox windows/tabs the way the Gnome System Monitor allows me to control processes.  This will become an ongoing and increasing problem as browsers become increasingly like operating systems.
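The two proposals above (parking scripts on a "to be activated" list until loading finishes, and letting the tab on top preempt background reloads) can be sketched as a toy scheduler. This is a minimal illustration under stated assumptions, not how Firefox's session restore actually works: `RestoreScheduler`, `add_load`, `add_script` and `focus` are hypothetical names, and each unit of work is reduced to a label so the resulting order is visible.

```python
import heapq
import itertools

FOREGROUND, BACKGROUND = 0, 1  # lower value is served first


class RestoreScheduler:
    """Hypothetical restore scheduler (illustration only, not Firefox's design).

    Page loads wait in a priority heap; script execution for a tab is parked
    on a "to be activated" list and released only after every load is done.
    """

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority
        self._deferred_scripts = []

    def add_load(self, tab, priority=BACKGROUND):
        heapq.heappush(self._heap, (priority, next(self._seq), tab))

    def add_script(self, tab):
        self._deferred_scripts.append(tab)

    def focus(self, tab):
        # Re-queue the tab at foreground priority. Its older background entry
        # stays in the heap but is skipped once the tab has been loaded.
        self.add_load(tab, FOREGROUND)

    def run(self):
        """Drain the queues and return the order in which work would execute."""
        loaded, trace = set(), []
        while self._heap:
            _, _, tab = heapq.heappop(self._heap)
            if tab in loaded:
                continue  # stale duplicate left behind by focus()
            loaded.add(tab)
            trace.append(("load", tab))
        trace.extend(("script", tab) for tab in self._deferred_scripts)
        self._deferred_scripts.clear()
        return trace
```

For example, queueing loads for tabs "a" and "b", a script for "a", and then focusing "b" yields `[("load", "b"), ("load", "a"), ("script", "a")]`. In a real browser `run()` would process one item per event-loop turn, so that `focus()` could be called between steps while the restore is still in progress.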
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
(In reply to comment #17)
> Grey/Jesse/CCW, Please read comment #6 under Bug #402768.  There I explain is
> is *not* network or disk bound.  Nor should it be if the Cache is working
> reasonably well.  This is entirely a CPU usage problem -- it is 10-15 minutes,
> of *CPU* time.  Having NoScript active significantly reduces the amount of CPU
> time consumed, 

please quantify
I run the GNOME system monitor in my top panel, tracking CPU, memory, network, swap and disk usage. These are normally all very low unless I'm doing system recompiles or Firefox is reloading a session. If I am reloading a very old, complex session, I may get a fairly high level of network (DSL) activity for the first 5-10 minutes of a restore (presumably the cache is old and/or incomplete). But the network activity eventually drops to zero, while CPU activity, both while the network is active and afterwards, remains pegged at 90-100%. In "top", my top two processes are firefox-bin and X, generally flipping back and forth, each consuming 30-40% of the CPU. When I loaded a session without NoScript (as I did yesterday, with the session I submitted as an attachment), firefox-bin used much more CPU, and once all the sessions were loaded, Firefox remained at 30-35% of the CPU. Today, on the other hand, I'm running 3.0pre *with* NoScript on the same session with a few more windows, and firefox-bin consumes less than 5% of the CPU when it is idle.

My general experience is that during large session reloads, the network activity ceases long before the 100% CPU usage does. This is consistent with the fact that if I page through all the windows/tabs during the 100% CPU usage phase, many of them have titles (indicating they have at least started to parse the basic HTML page), but the rest of the page is blank and the spinners indicating that the browser is "working" are spinning. The bar indicating the fraction of the page that is complete is also usually somewhere in the middle of its range.
(In reply to comment #18)
> Having NoScript active significantly reduces the amount of CPU time consumed, 
> 
> please quantify

My quantify request (for time, %, amounts, numbers, etc.) is specific to the change in behavior you see when using NoScript. Specifically (but feel free to add anything you feel is pertinent):

1. what is the effect on how long it takes to recover
2. over that time period, what is the average cpu usage compared to without noscript
3. cpu time (number of seconds) for both, with and without
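One way to collect those three numbers on Linux is to sample the firefox-bin process's cumulative CPU time from `/proc/<pid>/stat` (the utime and stime fields, counted in clock ticks). This is a minimal sketch that assumes Linux with procfs. The 10% idle threshold and the three consecutive quiet samples used to decide that the restore has "finished" are arbitrary, hypothetical choices. It measures only the one process, so X's share has to be measured separately by running the script against the X server's PID.

```python
import os
import sys
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second


def cpu_seconds(pid):
    """User + system CPU seconds used so far by process `pid` (Linux /proc only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) may contain spaces, so split after the last ')'.
    rest = data.rpartition(")")[2].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15 in proc(5)
    return (utime + stime) / CLK_TCK


def watch_restore(pid, interval=5.0, idle_percent=10.0, idle_samples=3):
    """Poll `pid` until its CPU use stays below `idle_percent` (of one core)
    for `idle_samples` consecutive intervals.

    Returns (wall_seconds, cpu_seconds, average_percent). The wall time
    includes the final quiet samples, so it slightly overstates recovery time.
    """
    start_wall = last_wall = time.monotonic()
    start_cpu = last_cpu = cpu_seconds(pid)
    quiet = 0
    while quiet < idle_samples:
        time.sleep(interval)
        now_wall, now_cpu = time.monotonic(), cpu_seconds(pid)
        percent = 100.0 * (now_cpu - last_cpu) / (now_wall - last_wall)
        quiet = quiet + 1 if percent < idle_percent else 0
        last_wall, last_cpu = now_wall, now_cpu
    wall = last_wall - start_wall
    cpu = last_cpu - start_cpu
    return wall, cpu, 100.0 * cpu / wall


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python watch_restore.py <firefox-bin pid>
    wall, cpu, avg = watch_restore(int(sys.argv[1]))
    print(f"wall {wall:.0f} s, CPU {cpu:.0f} s, average {avg:.0f}% of one core")
```

Starting it right after launching the restore, once with NoScript enabled and once without, against the same saved session, gives directly comparable figures for items 1-3.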
You have the attachment and can run the test yourself. It might be good to run it once to get the pages into the cache, then once without NoScript, then once with NoScript. On my machine it is definitely 10+ minutes of 100% CPU usage (presumably split between firefox-bin and X); it's a roughly even split without NoScript. With NoScript it is significantly faster (perhaps 15-20% of the real time without NoScript), so much so that I was moderately shocked. I do not believe it was that much faster in 2.0. But that is why I would argue for either (a) disabling JavaScript entirely during session restores (or giving the user a setting that allows doing so), or (b) pushing any previously opened pages (not pages opened by the user during the restore) that require JavaScript to the end of the window/tab activity queue.

The rules should be: (a) restore the open page in front of my face first; (b) restore the least costly pages next, i.e. the smallest, with the fewest images, etc.; (c) restore the most complicated pages, especially those with large numbers of scripts, last.
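Rules (a)-(c) amount to a sort key over the tabs being restored. A minimal sketch, assuming each tab carries cost estimates: `Tab`, `image_count`, `byte_size`, `script_count` and `restore_order` are hypothetical names, and a real browser would have to obtain those estimates somehow (for example from a previous visit). The sketch ranks script count above image count and size because JavaScript is the cost singled out in this comment; that ordering is a choice, not something the rules fix.

```python
from dataclasses import dataclass


@dataclass
class Tab:
    url: str
    foreground: bool = False  # (a) the page in front of the user's face
    image_count: int = 0      # hypothetical cost estimates, e.g. from a
    byte_size: int = 0        # previous visit or the cache
    script_count: int = 0     # (c) heavily scripted pages go last


def restore_order(tabs):
    """Sort tabs into the restore order proposed by rules (a)-(c)."""
    return sorted(
        tabs,
        key=lambda t: (
            not t.foreground,  # False sorts before True: foreground first
            t.script_count,    # fewest scripts first, most scripts last
            t.image_count,     # then fewest images
            t.byte_size,       # then smallest pages
        ),
    )
```

Given a foreground tab with 9 scripts, a scripted background tab, and two script-free background tabs of different sizes, the foreground tab comes first, then the two script-free tabs smallest-first, and the scripted background tab last.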

Sorry I can't be more helpful but as I'm generally used to these restores taking 10-15 minutes I wasn't paying much attention to the with vs. without NoScript situation (I think 3.0pre says it doesn't even work with NoScript so I wasn't expecting a difference).
One only needs to test Trunk behavior. 
1. almost any possible fix is going to be based on trunk
2. What happens in 2.0 is only helpful if for some reason 2.0 is better than trunk at some subset of behavior.

I don't plan to recreate any of this, as the restore doesn't especially bother me on any of my 4 machines, even with large restores (10-20 windows, each with 5-15 tabs). The UI is responsive.

Sir, I was referring to YOUR bandwidth being the limiting factor, not to mention the number of server connections I mentioned, and other factors. This bug is far too broad for anyone to take apart and debug; it is at least half a dozen actual bugs rolled into one. We don't work that way. Each issue needs to be filed separately for any chance of it being tackled. Further, 15 minutes to restore several hundred tabs isn't terribly unreasonable when opening several hundred tabs at once.

Re-resolving as invalid. Please feel free to boil this down into a set of individual bugs rather than broad "stuff doesn't work well" bugs, which will never be touched anyway.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
Status: RESOLVED → VERIFIED