Closed Bug 458592 Opened 16 years ago Closed 12 years ago

Firefox consumes 90+% of CPU doing nothing

Categories

(Core :: Networking, defect)

1.9.0 Branch
x86
Linux
defect
Not set
major

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: robert.bradbury, Unassigned)

References

()

Details

(Keywords: perf, Whiteboard: [needs retest with trunk][dupme])

Attachments

(9 files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3pre) Gecko/2008091617 Firefox/3.0.3pre
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3pre) Gecko/2008091617 Firefox/3.0.3pre
Large Firefox sessions (25-53 windows; 324-553 tabs) cause the CPU to become 80-100% active. Over 12 hours of Linux uptime, 2 firefox-3.0.3pre sessions consumed nearly 7 hours of CPU time doing extremely little (the system was up overnight and almost all Firefox windows were minimized). X consumed another 4 hours of CPU time (there were a few VTs running, but the primary X programs were Firefox). The Gnome System-monitor typically shows 2 firefox programs consuming ~25% of the CPU each, with ~33% being consumed by X. The CPU usage graph indicates ~60% of the CPU is user time with ~40% system time. Interestingly enough, if one runs program compilation processes (even at nice -19), the "niced" programs will out-compete the firefox/X processes and consume 30-60% of the CPU.
This is on a Linux 2.6.26-gentoo-r1 system configured with CONFIG_HIGH_RES_TIMERS=y. It is useful to note that the Intel "powertop" utility is reporting that CPU wakeups are dominated by firefox in the "futex_wait (hrtimer_wakeup)" system processes. I have looked at the kernel code, but it is unclear to me *what* firefox system calls (poll(), select(), thread switching, etc.) may be leading to this behavior.
This is *important* because with Firefox exhibiting this behavior, it is difficult to impossible for the CPU to enter power-saving modes (implying that Firefox may be responsible for wasteful consumption of millions (billions?) of watt-hours per year).
A year or more ago (Firefox-2.X) I did not observe this behavior. It may be a combination of (a) including CONFIG_HIGH_RES_TIMERS in Linux (for me as of linux-2.6.23); (b) changes in Firefox 3.0; (c) somewhat more Javascript usage in some of the tabs in these Firefox sessions. I normally run with Noscript, only enabling Javascript for safe sites (certain academic sites & gmail). It should be noted that in the larger session cited above, the CPU usage becomes so excessive that Gmail essentially becomes unusable and there are extremely lengthy delays bringing up new windows/tabs.
Reproducible: Always
Steps to Reproduce:
1. Start Firefox with a large session (it is easy to reproduce these by simply opening all of the sites pointed to from the first 5-10 pages of digg.com). The sessions I am using will be attached and can be used as alternatives.
2. Watch the gnome system-monitor (or top) for high CPU usage. One can also use ps to watch CPU accumulation on firefox-bin & X.
3. A sure sign of the problem is when the gnome system-monitor in one's top panel indicates CPU usage is ~100% (even with all the windows minimized).
4. Alternatively, one can run Intel's powertop (which is a package under Gentoo and some of the other Linux versions) and see firefox causing 30+% of the wakeups (in futex_wait (hrtimer_wakeup) if one has the high resolution timer option enabled).
Actual Results: Firefox (and X) eat up all the CPU time, prevent the CPU from going into power-saving modes and negatively impact performance of Firefox as well as other programs.
Expected Results: Firefox, esp. with all windows minimized, should consume minimal CPU time.
It should be noted that by using the system-monitor to "stop" the mal-behaved firefox processes one will see the CPU usage drop significantly into the range one would expect on an unloaded system. So the *problem* is with Firefox. Listed in preceding text. Additional information (sessions & CPU usage traces) will be attached.
15 minute trace of 1 minute "slots" of top CPU users, including a Firefox session (~50 windows with ~550 tabs) that has been up for perhaps a week or more. Galeon is being used for Gmail because the firefox session is unusable for gmail (gmail is a very heavy Javascript application). Ignore the top claim that there are 39 users (that may simply be the number of gnome-terminal processes running, doing nothing). This is a single user desktop (Pentium IV Prescott 2.8 GHz w/ 1.5GB memory).
This is a trace similar to the previous trace with the exception that *all* Firefox windows in all workspaces are minimized. This has little if any impact on CPU usage. To provide maximal user responsiveness combined with minimal power consumption, Firefox should be "down prioritizing" both X-windows (tab) updates and Javascript threads for "minimized" windows. These priorities should be shifted back to "high priority" when a minimized window is brought back as the foreground (top) application on the screen. Given the current emphasis on "green" applications (and the effort Intel and AMD are putting into low power CPUs), that effort is pointless if the applications do not adopt the same mind set. This machine serves as a very low activity web server as well as a desktop, so it is running a low bandwidth apache2 server as well as Firefox, OpenOffice, gnome-terminal, etc. sessions across several workspaces which may be active for weeks or even months (if they are reliable enough not to crash). In theory the machine should be able to throttle back the CPU much of the time. Currently firefox is at the top of the list preventing that.
Large firefox session that causes excessive CPU consumption. This session has 53 windows and 553 tabs. Although a large session, I have run sessions as large as 91 windows and 680 tabs. Firefox becomes pretty unresponsive and X starts consuming lots of CPU time at that point. It should be noted that lots of these tabs contain pointers to NCBI/NLM abstracts. NCBI is a "light" javascript user in that Javascript must be enabled to take advantage of some features. The excessive CPU consumption generated by this site may not be present if the session is started with Javascript completely disabled. I am not sure I tested this due to the long time (perhaps 15+ minutes) it takes to restart this session (yet another bug). I believe I have filed previous bug reports about excessive Javascript usage (even for inactive windows) dragging the CPU into the ground. I believe one can do this by simply starting dozens of home pages of a typical Javascript using site (e.g. www.nytimes.com).
This is an intermediate sized firefox session (25 windows/324 tabs) with many sites used for genomics research studies, some of which require Javascript and/or Java, so these sites may be enabled in Noscript. A session this size is still "reasonably" responsive (i.e. you can start new windows & tabs & switch between them in a reasonably timely fashion). This is even when the other "large" Firefox session is active (2 firefox processes are running on the machine with different profiles). The larger session (~2x larger) is relatively unresponsive. It is probably possible to use gmail in the medium session, but using it in the larger session is quite painful (in terms of waiting and waiting and waiting).
do you see this problem with current beta of 3.1? http://www.mozilla.com/en-US/firefox/all-beta.html
Keywords: perf
I haven't compiled 3.1 yet, but if the way Firefox handles polling its open sockets / channels hasn't changed I doubt it will be fixed. The problem seems to be associated with issuing a poll() call on 8-12 sockets/channels which seems to return either immediately or almost immediately; Firefox then does a few gettimeofday() calls (usually 2 or 3) and immediately goes back into polling. I think I've seen a very strange error coming back when it attempts to read some of the sockets (perhaps EAGAIN?). It is common for me to have a single firefox session up for 1-2 weeks and I am wondering what happens if the server on the other end of a socket gets rebooted. Aren't you going to have a socket that will always return errors? If so, then Firefox needs to detect such errors and remove that socket from the list of sockets being polled.
It would also be nice if there were a way of knowing to what sites there are open channels, because I don't think there is a way to get this information from /proc/#/fd. All one seems to be able to tell is whether you have a pipe or a socket, not where they go.
And finally, there should be a way to set the timeout on polling. I believe it is currently set to 100ms and people might want to allow a much higher value to allow systems to go into power-saving mode. In fact the poll timeout value should be "scalable" such that if I leave firefox up for an evening or a weekend, it keeps extending the poll time until such time as I return and activate a firefox window. That would allow the OS to implement the maximum power saving capabilities.
A final possibility could involve "ftp:" connections -- it is not unusual for me to have one or more ftp: connections, and the servers that I use will terminate the session after a few minutes of idle time.
If anyone works on this please document what files are involved. My current impression is that it is "nsprpub/pr/src/md/unix/uxwrap.c" but it might also be "nsprpub/pr/src/pthreads/ptio.c" or a couple of other possibilities (the gtk library, etc.).
Now, one of the more interesting aspects of this problem is that the Firefox CPU usage isn't "locked": I can run a "niced" build (emerge under Gentoo Linux) and the CPU time will move from Firefox to the niced processes. When the build is finished the CPU time moves back to Firefox (which is not "niced"). I think this may be a consequence of Firefox issuing so many system calls, which may allow other processes, even niced processes, to grab the CPU. However, the fact that Firefox will consume most of the available CPU does make the system "sluggish".
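The "scalable" poll timeout and the removal of dead sockets suggested above could look roughly like the following. This is only a sketch of the idea in Python, not Firefox, NSPR or glib code: the names IdleBackoff and poll_once, the 60 second ceiling and the doubling rule are all hypothetical, and the 100ms base is simply the value the comment believes is in use.

```python
import os
import select

BASE_TIMEOUT_MS = 100     # interval the comment believes is used (unverified)
MAX_TIMEOUT_MS = 60_000   # hypothetical ceiling for a long-idle session


class IdleBackoff:
    """Double the poll timeout after each idle wakeup; reset on activity."""

    def __init__(self, base_ms=BASE_TIMEOUT_MS, max_ms=MAX_TIMEOUT_MS):
        self.base_ms = base_ms
        self.max_ms = max_ms
        self.timeout_ms = base_ms

    def record(self, had_events):
        if had_events:
            self.timeout_ms = self.base_ms
        else:
            self.timeout_ms = min(self.timeout_ms * 2, self.max_ms)
        return self.timeout_ms


def poll_once(poller, backoff):
    """Poll once with the current timeout and return the live events.

    Descriptors that report only an error or hang-up are unregistered so
    that they cannot make every later poll() return immediately.
    """
    live = []
    for fd, mask in poller.poll(backoff.timeout_ms):
        broken = mask & (select.POLLERR | select.POLLNVAL)
        hung_up = (mask & select.POLLHUP) and not (mask & select.POLLIN)
        if broken or hung_up:
            poller.unregister(fd)
        else:
            live.append((fd, mask))
    backoff.record(bool(live))
    return live
```

With this shape, an idle session wakes up at 100ms, 200ms, 400ms and so on up to the ceiling, and the first real event brings the timeout back to the base value.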
Some examples of stack traces in Firefox poll() calls -- these were done during a firefox "startup" and may not be directly related to this bug. They however provide some background for thinking about when and where poll() is called and what it does.
Stack traces of gettimeofday() calls which Firefox appears to do all too frequently after a poll() returns. These too were done during a Firefox startup and may not be directly related to this bug. They serve as a reference for people to think about when and why gettimeofday() is called (which is an inexpensive system call but still consumes CPU time going from user mode into system mode and back) when a better strategy may be to setup a timer within the Firefox memory space which keeps the time of day (bearing in mind that one does not want to consume CPU incrementing a timer when Firefox has been idle for 3+ hours). To my mind there is very little outside of real time video/audio feeds that needs to be monitored with 100ms accuracy.
I hope you didn't go through all that trouble, and not use 3.1beta or 3.2 alpha. Also, if you are convinced this is a networking issue, then a better starting point for this is core+networking, not FF+general. it will get almost no attention here in general
Refer forward to bug #481678 which provides further discussion regarding the poll() / gettimeofday() problems.
Component: General → Networking
Product: Firefox → Core
QA Contact: general → networking
Version: unspecified → 1.9.0 Branch
Wayne, I generally run 1-2 months behind current releases. In part this is due to when Gentoo (Linux) releases such upgrades and in part due to when/if I decide to compile a more recent upgrade from the firefox sources. As explained previously I may often run a version of Firefox for 1-2 weeks -- so I *will not* be upgrading on a nightly basis. (In large part due to the extensive time Firefox takes to restart a complex session (15+ minutes of CPU time), which I have previously documented.) And I reject your comment regarding the 3.1 beta / 3.2 alpha (in part because I haven't had time to run them and in part because you provide me with zero evidence that the UNIX/Linux files that manage the poll()/gettimeofday() calls have changed). If the problem is in the OS interface and you haven't changed that, then I have no reason to believe that your changing the javascript system (which is what has been promoted with respect to 3.x) will make any significant difference! In large part because I have Noscript installed and most of my windows *are* *not* running Javascript. So any changes you make in that forum mean zilch to me.
Ray, is this bug a reincarnation of bug 396375, which is closed invalid? If not, then ... Gavin perhaps can set us on a productive path, and also offer perspective on whether gettimeofday or poll are realistic sources of trouble, or related to another bug. Also, if what you assert about gettimeofday and poll were true to the extent that you see it, then everyone would have a problem. So it's very hard to buy the argument that something is drastically wrong with these functions or the use of them. Again, I'm far more inclined to suspect discrete problems that you are resistant to peruse (bug 481678 comment 7). Also, session restore can have perf problems after restore that don't take long to show, and didn't exist prior to restore. So we shouldn't assume that someone else loading your large sessions is a productive path.
p.s. you have tested this started in safe mode, correct?
I generally try to run firefox with --sync. I have not recently seen the error but I'm running with up-to-date gtk libraries which may contain my modification to gdk/gdkwindow.c:gdk_window_new() where I changed if (parent != NULL) --> if (parent != NULL && window != NULL) (line 377 in gdk+-2.14.3) that seemed to eliminate the segmentation violations, though perhaps not the "window unexpectedly destroyed" problem. I do not know what "safe" mode is.
I am currently running a limited strace on firefox looking at open(), poll() and bind() calls -- which produces a very large file after a while, in large part due to the neverending stream of poll()s. I have confirmed that gmail definitely opens an additional channel that gets poll()ed. The problem is that I don't know what the URLs are for the other 11 channels being polled. Tracing bind() doesn't help. What I *really* need is somebody to tell me what firefox source function is handed the URL for parsing so I can set up a gdb trace on that function and watch the socket() calls that follow. Then I can make a stab at matching the URLs to the file handles socket() returns and to the file descriptor arguments to poll() -- and then I could produce the minimal test case that people seem to want.
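On the question of where the polled descriptors lead: on Linux each socket link in /proc/<pid>/fd reads "socket:[inode]", and for TCP over IPv4 that inode can be looked up in /proc/net/tcp to get the remote address and port. That gives an IP and port rather than a URL, but it may narrow things down. The following is a rough sketch under those assumptions; the function names are hypothetical, it ignores IPv6 (/proc/net/tcp6) and UNIX sockets, and the address decoding assumes a little-endian host such as x86.

```python
import os
import re
import socket
import struct


def parse_proc_net_tcp(text):
    """Map socket inode -> (remote_ip, remote_port) from /proc/net/tcp text."""
    endpoints = {}
    for line in text.splitlines()[1:]:           # first line is the header
        fields = line.split()
        if len(fields) < 10:
            continue
        hex_ip, hex_port = fields[2].split(":")  # rem_address column
        # The kernel prints the address as a host-order 32-bit hex value;
        # "<I" assumes a little-endian host.
        ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
        endpoints[int(fields[9])] = (ip, int(hex_port, 16))
    return endpoints


def socket_fds(pid):
    """Map fd number -> socket inode for one process via /proc/<pid>/fd."""
    result = {}
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue                              # fd closed while scanning
        match = re.fullmatch(r"socket:\[(\d+)\]", target)
        if match:
            result[int(name)] = int(match.group(1))
    return result


def remote_endpoints(pid):
    """Map fd number -> (remote_ip, remote_port) for a process's TCP sockets."""
    with open("/proc/net/tcp") as handle:
        table = parse_proc_net_tcp(handle.read())
    return {fd: table[inode]
            for fd, inode in socket_fds(pid).items() if inode in table}
```

Calling remote_endpoints() with the firefox-bin PID and comparing the keys against the fd= values in the strace poll() lines would show which remote hosts are behind the polled channels.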
Depends on: 477850
This bug IS still present in firefox 3.6.12 running under Ubuntu 10.10. I will attach an archive of files taken for a Firefox session running on an HP Pavilion 512n with 380 MB of memory (~5 yr. old machine) and a Celeron processor running at 1.4 GHz. Take a look at the 5 second strace file included in the archive. It is still poll-ing/gettimeofday-ing endlessly, wasting CPU time in the process. This is with NoScript, Adblock and Flashblock installed, 10 windows & 63 tabs open (about the max one can run on this machine before swapping causes firefox performance to degrade to uselessness). Gmail is not running, nor are any flash videos playing. All firefox windows are minimized so firefox *should* be doing NOTHING.
I will assert that you will NOT fix this problem until you investigate how the poll calls are set up in gtk/gdk/glib, change the *default* system library poll() function (the libraries have hooks to allow this), and change the timeout values on the various channels being polled to reflect the amount of actual user use of the browser (not the use the "server" might like to impose). If I go to bed for 6-8 hours and leave the session running it should accumulate minimal CPU time (all the windows are idle). The longer I leave a session "idle", the longer one should see between system calls that do *nothing*.
I believe it is not true that flash is the only user of gettimeofday calls. I believe they are also used by the gdk/gtk/glib poll functions. If however a majority of the gettimeofday calls *are* being issued by non-executing flash plugins then a way should be found to make flash a separate process (as is the case with chrome). That way one can run a background process that periodically kills CPU hogging flash activities. The current problem with the firefox architecture is that due to its multi-threaded single-process nature it is difficult to determine the responsible parties without a lot of complex debugging.
I believe this goes back to a bug I filed long ago which requested an internal firefox "ps" with Thread-id/cpu-use/memory-use/URL matching capability. This is found to some extent in chrome's Task Manager capability but even that doesn't go far enough in terms of determining what causes browser performance (CPU | swapping) issues. Note: the archive attached does not display the 90% problem per se though it does display using ~3.5 hours of CPU time in 21 hours of runtime (at least 8-10 hours of which the browser was not being used). The situation involving 90+% use of the CPU typically involves several dozen windows and several hundred tabs and cannot be achieved on the current hardware without producing excessive swapping (which prevents CPU accumulation). Simple step to change the status of this bug from UNCONFIRMED to CONFIRMED -- start a moderately simple firefox under Linux, leave it idle for minutes or hours, then run strace on it and observe the system calls which *do nothing*.
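To put numbers on the "system calls which do nothing" step, one option is to tally the calls in a short strace capture (strace's own -c option prints a similar summary). The following is a small sketch that counts syscall names in saved strace output; the function name tally_syscalls is hypothetical, and the regular expression only covers plain lines, "[pid N]" prefixed lines from -f, and simple hh:mm:ss timestamps.

```python
import re
from collections import Counter

# Optional "[pid N] " or bare pid prefix, optional hh:mm:ss[.usec] timestamp,
# then the syscall name immediately followed by "(".
SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?"
    r"(\w+)\("
)


def tally_syscalls(lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (file name is an example): python tally.py firefox.strace
    with open(sys.argv[1]) as handle:
        for name, count in tally_syscalls(handle).most_common(10):
            print(f"{count:8d}  {name}")
```

If an idle, minimized browser shows poll and gettimeofday at the top of this list with thousands of calls over a few seconds, that would be consistent with the behavior described above.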
(In reply to comment #19)
> If however a majority of the gettimeofday calls *are* being issued by
> non-executing flash plugins then a way should be found to make flash a separate
> process (as is the case with chrome).
Plugins are already running in separate processes. http://kb.mozillazine.org/Plugin-container_and_out-of-process_plugins
Further note as pertains to this bug and Bug #504990. I do have a rather complex Seamonkey 2.0.10 session running on another machine. An extended ps is:
 PID PPID UID %CPU %MEM     SZ    RSS CLS SCH       TIME     ELAPSED F NI RTPRIO S WCHAN  COMMAND       CMD
7780 7776   0 26.5  8.0 446436 248880  TS   0 5-20:01:18 21-23:52:59 4  0      - S futex_ seamonkey-bin /usr/lib/seamonkey/seamonkey-bin --sync
As can be seen, the process has been running for nearly 22 days and has consumed almost 6 days of CPU time. Yet the process has not been touched (I'm in a different state) in almost 20 days. While the process has no apparent Flash children it does have an acroread child (which in theory should be displaying a "static" document). In these cases seamonkey should have detected that it was idle and stopped the acroread process from accumulating ~18 hours of CPU time. Though one could argue that Seamonkey is consuming CPU in an idle state at 5+x the rate of Acroread.
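As a cross-check of the figures in that ps line, the TIME and ELAPSED columns can be converted to seconds and divided. This is a small worked example; the helper name ps_time_to_seconds is hypothetical and the two values are copied from the ps output above.

```python
def ps_time_to_seconds(value):
    """Convert ps's [[dd-]hh:]mm:ss format to a number of seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    while len(parts) < 3:          # pad a missing hours field with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


cpu = ps_time_to_seconds("5-20:01:18")        # TIME column
elapsed = ps_time_to_seconds("21-23:52:59")   # ELAPSED column
print(f"{cpu} s CPU / {elapsed} s elapsed = {100 * cpu / elapsed:.1f}%")
```

This works out to 504078 s of CPU over 1900379 s of wall-clock time, i.e. 26.5%, which agrees with the %CPU column.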
Technical details. It would appear (IMO) that no one has really bothered to strace Firefox/Seamonkey and ask what all these system calls are doing and whether they are really necessary, in particular with respect to polling. The default poll function is g_poll (in glib), which directly calls the system poll() function. The source file is glib/gpoll.c. It is possible to enable a more complex polling system by replacing the g_poll function with another one, perhaps calling ppoll, using the function g_main_context_set_poll_func(), which is defined in glib/gmain.c. I would suggest that people diagnosing this problem compile a debug glib with G_MAIN_POLL_DEBUG and set up the associated environment variable to collect insight into what polling is being done.
The proper solution involves lengthening timeout periods as the browser is increasingly idle -- one has to ignore "noise" from servers / plugins. The only thing which should cause the browser to resume normal operations is activity in minimized windows. Ideally polls should be tuned to the requirement of each specific file descriptor -- and a user should be able to shut down undesired server push content. The general perspective is that "it's my browser, my machine and my electricity" and it should be doing what I want -- which means not wasting CPU cycles or electricity.
I am still seeing this problem with Firefox 4 (beta 9) on Ubuntu 10.10. I ran Firefox from a script that redirects STDERR to a file, and there were zillions of messages from Acroread -- including over a million instances of GLib-GObject-WARNING **: invalid uninstantiatable type `(null)' in cast to `GtkObject' I note that I'm running an old Acroread (8.1.3). I'm going to try upgrading to the current available Acroread (9.4.1-1maverick1) and see if that improves things.
I am having a similar issue on 32-bit Windows XP Pro with Firefox versions 7 and 8. CPU usage hovers around 100% of a CPU core and stays there. Firefox itself is responsive (menus etc.) but no URL can be visited and no web pages get loaded. The issue is quite random and does not need a single web page to be loaded. Usually killing Firefox and restarting cures it temporarily, but it soon returns.
Ivo, maybe this can help : <http://blog.bonardo.net/2011/09/30/is-your-firefor-freezing-at-regular-intervals>. The cause might be different, but the workaround might also help in this case.
Thanks, I added it. For additional information: page loading etc. will eventually commence, but before that can happen the Firefox CPU usage must drop below 100%, and according to my testing that can take several minutes. I cannot attribute the issue to plugins or addons; it is something within Firefox.
Am I supposed to run that addon daily? If not then that didn't fix that issue for me.
(In reply to ivo from comment #28)
> Am I supposed to run that addon daily? If not then that didn't fix that
> issue for me.
nope :) ... the fix for bug 686025 is in version 8. do you still see the problem?
Flags: needinfo?(ivo)
Whiteboard: [needs retest with trunk][dupme] → [closeme 2012-12-21][needs retest with trunk][dupme]
Resolved per whiteboard
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Flags: needinfo?(ivo)
Resolution: --- → INCOMPLETE
Whiteboard: [closeme 2012-12-21][needs retest with trunk][dupme] → [needs retest with trunk][dupme]
I'm not sure this is all that helpful but I add here instead of filing a new report. Similar symptom [ FF wants nearly half a dual-core machine's cycles ] on a different platform [ OSX 10.6 ]. Attachment is the Process Monitor sample. FF's usage dropped during each of the polling periods then jumped back up to the high 90s. Memory usage wasn't obscene. 2GB RAM, MacBook Air was only 6GB into swap (on its SSD) and not hitting that all that much. FF was reported as using CPU of around 96%. The little machine became much more responsive after FF was killed.