Bug 213637 - Mozilla runs at ~100% cpu usage after network connection is interrupted, changing network/LAN, switching from wireless to non-wireless, undocking
Status: RESOLVED FIXED
Keywords: fixed1.8.1.8, perf
Product: Core
Classification: Components
Component: General
Version: Trunk
Platform: x86 Windows XP
Importance: -- major with 8 votes
Target Milestone: mozilla1.9alpha5
Assigned To: Nobody; OK to take it and work on it
Duplicates: 82728 229961 251111 271636 287994 290963 299202 321565 327050 350883 355024 355157 365371
Depends on: 376643
 
Reported: 2003-07-23 18:58 PDT by Juergen Stein
Modified: 2011-05-16 12:15 PDT
CC: 48 users
mtschrep: blocking1.9+
dveditz: wanted1.8.1.x+


Attachments
Proposed patch (7.06 KB, patch)
2007-05-19 15:29 PDT, Wan-Teh Chang
darin.moz: superreview+
Patch with debugging output (do not check in) (7.83 KB, patch)
2007-05-21 10:34 PDT, Wan-Teh Chang
no flags
Proposed patch v1.1 (as checked in) (7.06 KB, patch)
2007-05-30 10:34 PDT, Wan-Teh Chang
cbiesinger: review+
Use nsAutoLock. Remove mThreadEventLock. (6.76 KB, patch)
2007-08-25 21:17 PDT, Wan-Teh Chang
cbiesinger: review+
Patch for MOZILLA_1_8_BRANCH (4.19 KB, patch)
2007-08-26 15:14 PDT, Wan-Teh Chang
cbiesinger: review+
dveditz: approval1.8.1.8+

Description Juergen Stein 2003-07-23 18:58:11 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624

I have a laptop with a wireless connection to the router/internet. Every time I
have Mozilla open and remove the wireless card, Mozilla uses 100% CPU. The same
thing happens after Windows XP comes back from hibernate or standby.
Mozilla itself is still responding, and it is still usable for surfing and so on,
but it runs at 100% CPU. Restarting Mozilla is the only way to fix it.

I use Mozilla 1.4 final, full install (skinned, sidebar open, lots of links,
lots of webpages open... I disabled all that, same problem).

Reproducible: Always

Steps to Reproduce:
1. start windows
2. start mozilla
3. hibernate
4. turn computer on


Actual Results:  
5. mozilla hangs

Expected Results:  
idle around :)
Comment 1 Juergen Stein 2003-07-23 19:00:14 PDT
I tried it on two different wireless cards now, same problem
Comment 2 Juergen Stein 2003-07-23 19:12:40 PDT
I played around and figured out the following:

I use the ad blocker Ad Muncher (www.admuncher.com). If it is turned off (not
running), the problem is gone. So I checked whether Opera or Internet Explorer have the
same problem, because Ad Muncher is active there too, and, surprise, no
problems there.

Ad Muncher somehow hooks in a DLL and inserts JavaScript. Since the problem
is there even with no page loaded, I guess it's somewhere in the DLL. But I can't
tell if it is Mozilla or Ad Muncher... it's strange.

But it gets better: same Mozilla version, Ad Muncher on the desktop
computer... hibernating and standby work fine, no hang. Unplugging the
network cable also works fine... so somehow the wireless card, Ad Muncher and
Mozilla won't work together.
Comment 3 Tom Wynne 2003-10-21 11:51:54 PDT
This is also happening to me (running Firebird .6.1) but I'm not using the ad
program you mention.  Steps to reproduce:

1.  Start Firebird
2.  Enter standby
3.  Resume XP
4.  Entire system crawls.
Comment 5 Asa Dotzler [:asa] 2004-02-24 23:58:16 PST
Unable to reproduce with Firefox 0.8 or Mozilla 1.7 Alpha on windowsXP
Comment 6 Majken Connor [:Kensie] 2006-06-20 15:24:27 PDT
Alright so this is the oldest instance of the cpu bug from hibernate that I could find, so reopening and duping to this one.

This is still happening on *all* mozilla products - Firefox, Thunderbird and Sunbird for myself.

It's always caused by the internet connection being dropped, but it doesn't happen *every* time the connection is dropped. Something must be interrupted while trying to use a connection and then get caught in a loop. On my systems a connection can't be re-established until the offending program is closed. As reported in the original comment, the programs themselves are responsive and usable; the only symptoms are high CPU usage and the inability to reconnect. I've also reproduced this by killing my connection by hand (not just by entering standby/hibernate), although I haven't yet found an activity that reliably triggers it when the connection is killed during it.

For reference's sake, I think a lot of the people discussing in bug 265172 are having this problem, and not the Flash problem (it can't be Flash if it's happening in the other apps).
Comment 7 Majken Connor [:Kensie] 2006-06-20 15:26:58 PDT
*** Bug 229961 has been marked as a duplicate of this bug. ***
Comment 8 Majken Connor [:Kensie] 2006-06-20 15:27:21 PDT
*** Bug 271636 has been marked as a duplicate of this bug. ***
Comment 9 Majken Connor [:Kensie] 2006-06-20 15:27:56 PDT
*** Bug 287994 has been marked as a duplicate of this bug. ***
Comment 10 Majken Connor [:Kensie] 2006-06-20 15:28:27 PDT
*** Bug 290963 has been marked as a duplicate of this bug. ***
Comment 11 Majken Connor [:Kensie] 2006-06-20 15:28:50 PDT
*** Bug 327050 has been marked as a duplicate of this bug. ***
Comment 12 Sean Coates 2006-06-20 18:29:46 PDT
Note: I've seen this happen in Komodo, too, so I agree: all gecko apps seem to be affected.
Comment 13 Keith S. 2006-06-20 20:09:19 PDT
Okay, so hopefully all the votes on the numerous duplicate bugs aren't lost here, but it should be clear that a large number of people are experiencing this problem. Many people can reproduce it regularly, but not everyone can reproduce it consistently (for instance, I have two machines, one with AdMuncher and one without, and the problem happens on both; also, this problem reproduces consistently for me with Firefox, but never with Thunderbird).

It should be noted in this bug what was noted in other bugs:
- This often occurs when a dialup connection is dropped, as well as losing a network connection.
- It may be reproducible any time you disable your network adapter, wireless or not (for instance, when taking a laptop from a docked to an undocked configuration)
- It is reproducible in safe mode with a clean profile
Comment 14 Sergey Bykov 2006-07-19 11:25:41 PDT
I confirm that this bug happens with FF 1.5.0.4 without any extensions, on WinXP SP2 without any updates.
Thunderbird 1.5.0.4 doesn't reproduce this bug!

I have AdMuncher and Outpost Firewall.
Comment 15 Keith S. 2006-07-19 12:52:24 PDT
Sergey,

Juergen indicated that when he has AdMuncher disabled, the problem does not reproduce. I have AdMuncher, but when I disabled it, the problem DID reproduce. Can you test this and update with your findings, so we can break the tie? :)

Thanks.
Comment 16 Majken Connor [:Kensie] 2006-07-19 19:59:25 PDT
I don't have admuncher at all, nor do others that are seeing this issue.  Does admuncher connect to something at intervals?

As for the people not seeing it in thunderbird, how often do you have tb set to autocheck for more mail?  Maybe set it to check every minute and see if it still doesn't reproduce.  Maybe there is another problem?
Comment 17 Bill Wood 2006-07-20 13:16:08 PDT
Try using FlashBlock - many people who had a 100% CPU bug found that using FlashBlock resolved it.  Don't leave any Flash-enabled windows open when hibernating or going on standby.  Flash has a bug in which it tries to run every event fired since the last time the CPU was active.  It's fixed in the latest beta version, I believe.
Comment 18 Majken Connor [:Kensie] 2006-07-20 14:30:50 PDT
flash doesn't explain why it happens in sunbird and thunderbird.
Comment 19 Stuart Parmenter 2006-08-10 16:18:04 PDT
we want steps to reproduce.  please file separate bugs for each set of steps.
Comment 20 Keith S. 2006-08-10 16:46:02 PDT
Stuart,

You can find explicit steps to reproduce this problem in bug 327050.

Keith
Comment 21 Majken Connor [:Kensie] 2006-08-10 23:57:18 PDT
stuart - what are the different sets of steps that should be filed differently? There is a flash issue that has its own bug that iirc is fixed now and is a different issue from the one reported.  Or were you wanting separate bugs per app?
Comment 22 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-08-11 00:02:18 PDT
This bug contains several sets of steps to reproduce.  Some involve extensions.  Some do not.  Some involve plug-ins.  Some do not.

Chances are that there are several different issues involved.  There should be one bug per separate issue; this bug can probably be used to track them.
Comment 23 Majken Connor [:Kensie] 2006-08-11 00:15:37 PDT
ahh I see, the reporter is also the same person as commented that disabling admuncher resolved the issue, so we have to assume this is a report on admuncher causing the issue rather than relying on the original steps to reproduce in comment 0.  So while it's probable that admuncher is just exacerbating the flaw in firefox we can't dupe these to each other until we find that out for sure.
Comment 24 Keith S. 2006-08-23 22:53:39 PDT
After doing some more investigation on bug 327050, I found that I was able to reproduce this particular issue with Ad Muncher by doing the following:

1) Start Ad Muncher 4.6 (latest production version)
2) Start Firefox
3) Exit Ad Muncher

At this point, the CPU starts to tailspin. I created an HTTP debug log (using the latest nightly), and after I exited Ad Muncher it contained about 20 MB worth of this:

1880[1283f18]: STS poll iter [1]
1880[1283f18]:   calling PR_Poll [active=0 idle=0]
1880[1283f18]:     timeout = -1 milliseconds
1880[1283f18]:     ...returned after 0 milliseconds

This problem does NOT reproduce for me with the latest Ad Muncher 4.7 Beta.
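The log above shows PR_Poll called with an infinite timeout (-1) yet returning after 0 ms on every iteration, with no active or idle sockets in the list. The sketch below is a POSIX analogy, not NSPR or Windows code. It uses a Unix socketpair as a stand-in for the loopback connection behind a pollable event, closes one end, and counts how often poll() with a -1 timeout still returns at once. The function name countImmediateWakeups is hypothetical.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Simulates an event socket whose peer has gone away and counts how many
// times poll() with an infinite timeout (-1) returns immediately anyway.
// Returns -1 if the socket pair cannot be created.
int countImmediateWakeups(int iterations) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;
    close(fds[1]);  // the "other end" of the connection disappears

    int wakeups = 0;
    for (int i = 0; i < iterations; ++i) {
        pollfd pfd{};
        pfd.fd = fds[0];
        pfd.events = POLLIN;
        // Like "timeout = -1 milliseconds" in the log: this should block
        // until someone signals, but the dead connection is always readable.
        if (poll(&pfd, 1, -1) != 1 || (pfd.revents & POLLIN) == 0) break;
        char byte;
        // read() reports end-of-file (0 bytes), so there is no event to
        // acknowledge and the next poll() returns immediately again.
        if (read(fds[0], &byte, 1) != 0) break;
        ++wakeups;
    }
    close(fds[0]);
    return wakeups;
}
```

On Linux or macOS, countImmediateWakeups(5) would be expected to return 5. A service loop that waits on such a descriptor without detecting the broken state spins in the same way, which matches the repeated "returned after 0 milliseconds" entries.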
Comment 25 Robert Krawitz 2006-11-27 18:42:42 PST
I'm seeing something very similar with Firefox 2.0 on Linux (SUSE 10.0, to be precise).  The problem is specific to 2.0 (it doesn't happen with any of the 1.5 releases, including 1.5.0.8, and I don't remember it with 1.0 either) and is severe enough that I've had to go back to 1.5, because I can't use the computer for much of anything else.

What's interesting here is that I have two computers with identical Firefox configurations (indeed, I copied my Firefox directory, both the installation and my .mozilla/firefox directory, from one to the other) and otherwise software loads as identical as I can manage.  The problem happens on my laptop, which is configured to always go through a proxy, but not on my home server, which uses a direct connection.

More specifically, what I've observed is that when I suspend to disk and resume my laptop, Firefox immediately starts consuming 100% of the CPU.  Firefox stays responsive, and the X server doesn't accumulate any unusual amount of CPU time.  In other situations, I find that Firefox is consuming CPU time (usually between 10 and 30%), and as I open more tabs the amount of CPU time it consumes increases (it doesn't tend to decrease as I close tabs).  It grows worse over time, and at some point it jumps to 99-100% CPU utilization (I don't know exactly what triggers it, but it isn't only suspend/resume).

One interesting thing I noticed this weekend is that I was scanning a lot of images, which is very CPU-intensive.  While I was doing this, Firefox tended to jump to full CPU utilization very quickly, no matter how few tabs I opened.  It appears that once this happens it never gets out of that state, although forcing a restart (e. g. by installing or updating an add-on) does temporarily push the CPU utilization down.

On my server I find that Firefox tends to consume a smaller amount of background CPU time, typically more like 5%, and it never seems to go to 100%.

I tried disabling every extension except for AdBlock+ and NoScript, and it made no difference at all.  I use both of these extensions under 1.5 with no problems at all.  As it happens, I copied my configuration files from my laptop to my server (I'm using a proxy.pac file for the server configuration, so this works).  I'm using the stock versions of 1.5 and 2.0 from mozilla.com.

As I stated, both systems are running SUSE 10.0 with kernel 2.6.13-15.12-default (the SUSE kernel).  I'm using KDE 3.5.5 (from SUSE packages).  My laptop has a P4 1.8 GHz with 1.5 GB of RAM; my server has an Athlon 64 3000+ with 2 GB RAM.  The network inside our house is a 100 Mbit ethernet with fixed IP addresses, and I'm running Squid 2.5.STABLE10.5.2 on the server.  I tried disabling proxy pipelining and proxy persistent connections, which had no effect.

I once trapped it under gdb and I vaguely recall having seen PR_Poll show up a lot.

If anyone has any tests they'd like me to run I'm happy to do so; I'd like to see this bug stomped.  It's a most unpleasant one.
Comment 26 Robert Krawitz 2006-11-27 18:45:45 PST
I should add that subject to compatibility issues I'm using essentially the same add-ons and theme (LittleFox) in both 1.5 and 2.0.
Comment 27 Robert Krawitz 2006-11-27 19:12:07 PST
One other thing: the Athlon 64 system is actually running in 32 bit mode (just like the P4).
Comment 28 Robert Krawitz 2007-01-08 13:04:00 PST
This is a bit strange.

I'm not seeing this problem any more: Firefox 2.0 isn't gobbling computrons out of control.

However, if I use the GNOME control center to adjust the font sizes (the default fonts are much bigger than I'd like), the problem comes back.

As it happens, I'm running KDE, not GNOME.  I have no idea where to poke around in GNOME if I try to change any of this stuff.  I'd like to help debug it, but I'm out of ideas where to go.  There appears to be some kind of interaction between GNOME/GTK fonts and KDE/QT fonts that I simply don't understand.
Comment 29 Wayne Mery (:wsmwk, NI for questions) 2007-01-08 13:12:36 PST
Keith, Lucy,

Has it been determined yet whether an extension is clearly involved? And could someone update the summary with such details (yea or nay, either way)?  thanks

I commented in bug 327050 of possible duplicates and related bugs.
Comment 30 Scott 2007-01-08 13:58:02 PST
*** Bug 251111 has been marked as a duplicate of this bug. ***
Comment 31 Keith S. 2007-01-08 22:10:05 PST
I've reproduced it with no extensions, however I haven't seen it reproduce since making those changes to my wireless network adapter drivers. It hasn't resurfaced since going to Firefox 2, either.
Comment 32 OstGote! 2007-01-13 16:14:34 PST
*** Bug 355024 has been marked as a duplicate of this bug. ***
Comment 33 OstGote! 2007-01-13 16:15:22 PST
*** Bug 356911 has been marked as a duplicate of this bug. ***
Comment 34 OstGote! 2007-01-13 16:21:42 PST
*** Bug 352457 has been marked as a duplicate of this bug. ***
Comment 35 OstGote! 2007-01-13 16:21:46 PST
*** Bug 321565 has been marked as a duplicate of this bug. ***
Comment 36 OstGote! 2007-01-13 16:42:33 PST
*** Bug 355157 has been marked as a duplicate of this bug. ***
Comment 37 OstGote! 2007-01-15 15:22:17 PST
*** Bug 299202 has been marked as a duplicate of this bug. ***
Comment 38 phalgand 2007-01-17 09:22:40 PST
a test comment
Comment 39 Ross Roberts 2007-01-23 10:29:00 PST
any progress on this? Is there anything a non-programmer-but-semi-techie person can provide? I have this happening on both my work desktop (WinXP) and main home PCs (WinXP AND Linux), with or without extensions. I did strace the occurrence on my Linux box at home and did note an endless/looping stream of gettimeofday() and (iirc) in_futex calls just hammering the CPU. And the home PC is a typical grey box running 24/7, not a laptop or a host losing network connectivity or going in/out of suspend/hibernation. This is becoming quite debilitating on my work PC; I'm soon to resort to an older portable Firefox or (heaven forbid) Opera. I never had this before 2.x.

The only triggering factor in my case seems to be time: after the browser has been running for an hour or so, CPU utilization simply skyrockets.
Comment 40 Majken Connor [:Kensie] 2007-01-23 11:19:53 PST
Ross, are those machines using wireless internet? if so what wireless card do you have and what is the version on the firmware/drivers?
Comment 41 Keith S. 2007-01-23 11:54:30 PST
Ross,

Also, do you have a firewall or ad blocker installed on either machine? Long ago, before I logged this bug, I did experience this problem when running a buggy ad blocker. Basically, it was acting as a system-level proxy, intercepting HTTP traffic, but somehow munging it before it got to Firefox, causing it to go into a tailspin. So anything that looks at HTTP traffic at the system level can also be suspect here.

Ditto for anything at the browser level (i.e.: extensions), but I'm guessing you already ruled those out.
Comment 42 Ross Roberts 2007-01-24 15:14:22 PST
majken@: home pc has wired ethernet and sits running 24/7, either in winXP or Slackware11. The comments were relative to Slack11..

Keith: I have tried pulling out all extensions but will do so again to absolutely verify. I do regularly use adblocker in addition to external proxies (squid on external host, winXP instances also pass through proxomitron to further scrub www traffic).

thx for the hints.. will post more on the results. i'm also pulling down the sources on my linux box and will try building a debug setup to see what it looks like..
Comment 43 Ross Roberts 2007-01-24 15:15:57 PST
majken@: duh, sorry.. the work PC is a Dell Latitude D600 with an Intel ipw2100 wireless chipset. Suspend/resume almost ALWAYS drives Firefox wild.. and I saw the mentions of that before.
Comment 44 Keith S. 2007-01-24 21:31:52 PST
Interesting -- that's exactly the same configuration I had. When I upgraded the ipw drivers to 10.5, the problem went away on that machine.

Could be worth looking into.
Comment 45 mani 2007-02-11 09:38:06 PST
Maybe it would be a good idea to mark it.
Comment 46 mr 2007-03-25 03:59:32 PDT
A while after I reported these problems (in bug 355157) I upgraded to Firefox 2.0, and the problem is still here. Just wanted to add that to my reports.
Comment 47 Mathieu Gagné 2007-03-25 11:03:04 PDT
Hi,

> suspend/resume almost ALWAYS drives firefox wild.. and i saw
> the mentions of that before.

I used to use Wireless connection and hibernate mode. Since I use a wired connection, the problem is gone.

Might be related to the wireless connection and the hibernate mode.
Comment 48 Henrik Skupin (:whimboo) 2007-03-25 11:23:23 PDT
That's not a problem on Mac OS X so far. It seems that some drivers on Windows are going crazy. Can anyone here see it on Linux?

Mozilla/5.0 (Macintosh; U; Intel Mac OS X; de; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Comment 49 Majken Connor [:Kensie] 2007-03-25 13:24:45 PDT
mr@analogue.org - can you try upgrading the firmware on your wireless card? A few of us have and the problem has gone away.
Comment 50 mr 2007-03-25 15:31:30 PDT
(In reply to comment #49)
> mr@analogue.org - can you try upgrading the firmware on your wireless card? A
> few of us have and the problem has gone away.

I did that in late 2006, and the problem probably went away for Firefox 1.5.
But now it is back in 2.0, and Windows shows the latest firmware version number for my wireless network card.
Comment 51 Majken Connor [:Kensie] 2007-03-25 15:45:15 PDT
If you use 1.5.0.x now, will it have the same problem?  It'd be good to know the firmware versions as well.
Comment 52 whorfin 2007-04-07 15:37:17 PDT
I believe I've found a root cause of this, unrelated to network issues.

I reported it as Bug 371879, with Bug 376643 representing what I am quite sure
is the core problem.  Note that this is NOT related to network connectivity,
376643 relates to massive CPU consumption based on the current behavior
of setInterval, which doesn't set an interval but rather sets a number of
invocations based on the difference between initial invocation and current wall-clock time.
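The cost of the catch-up behavior described above can be sketched as arithmetic. Both functions are hypothetical models and do not reproduce Gecko's timer code. One fires once for every interval missed while the machine was asleep. The other coalesces the whole gap into a single firing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "catch-up" policy: after resuming, fire once for every
// interval that elapsed while the machine was asleep.
std::int64_t catchUpFirings(std::int64_t sleptMs, std::int64_t intervalMs) {
    assert(intervalMs > 0 && sleptMs >= 0);
    return sleptMs / intervalMs;
}

// Hypothetical "coalescing" policy: however long the gap, fire at most
// once and then resume the normal schedule from the current time.
std::int64_t coalescedFirings(std::int64_t sleptMs, std::int64_t intervalMs) {
    assert(intervalMs > 0 && sleptMs >= 0);
    return sleptMs >= intervalMs ? 1 : 0;
}
```

For example, after an 8-hour hibernation (28,800,000 ms) with a 10 ms interval, the catch-up model queues 2,880,000 callbacks, while the coalescing model fires once.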

It would probably be a good idea to decouple this bug into hibernate/resume
and network-related threads?
Comment 53 Tomas Pospisek 2007-04-26 09:18:35 PDT
After waking up from suspend, I'm seeing something similar to this as well.

I can close all tabs, still firefox remains at 20% CPU usage. Nothing moves though.

OS: Debian testing (>etch)
Firefox aka Iceweasel: 2.0.0.3

I'm using Debian's Firefox at the moment, but I have been seeing this behavior for ages now, even before Debian renamed its shipped Firefox, with FF 1.5 and now FF 2.0.

*t
Comment 54 jieming wang 2007-05-10 06:08:10 PDT
Saw 99% CPU usage with 2.0.0.3 on Windows XP. This is reproducible every time by keeping a Firefox session open and switching to a different network (e.g., from home wireless to office wireless).
Comment 55 Wan-Teh Chang 2007-05-16 11:20:04 PDT
I received new information from a Firefox user.  Mr. Zhan reported
that the TCP connection over the loopback address that we use to
implement PR_NewPollableEvent is broken when a PC switches between
wired LAN and wireless LAN.  With that information, I have this
theory, using MOZILLA_1_8_BRANCH source code:
  The broken loopback connection in the NSPR pollable event
  (mThreadEvent, polled as mPollList[0]) causes
  nsSocketTransportService::Run to return from
  nsSocketTransportService::Poll.  Since we only handle the
  PR_POLL_READ flag set in mPollList[0].out_flags and we
  don't check the return value of PR_WaitForPollableEvent:

    //
    // service the event queue (mPollList[0].fd == mThreadEvent)
    //
    if (n == 0)
        active = ServiceEventQ();
    else if (mPollList[0].out_flags == PR_POLL_READ) {
        // acknowledge pollable event (wait should not block)
        PR_WaitForPollableEvent(mThreadEvent);
        active = ServiceEventQ();
    }

  It is possible that
  1. mPollList[0].out_flags is not equal to PR_POLL_READ.  It
     may have other flags set, or
  2. PR_WaitForPollableEvent returns PR_FAILURE.

  By the way, that test should probably be written as
  (mPollList[0].out_flags & PR_POLL_READ) != 0, and we should
  probably also check the return value of PR_SetPollableEvent.
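The following minimal C++ sketch illustrates the flag test discussed above. The constants are assumed to mirror NSPR's PR_POLL_READ (0x1), PR_POLL_ERR (0x8) and PR_POLL_HUP (0x20). They are redefined locally so the example compiles without NSPR, and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror NSPR's prio.h values; redefined so this compiles alone.
constexpr std::uint16_t kPollRead = 0x1;
constexpr std::uint16_t kPollErr  = 0x8;
constexpr std::uint16_t kPollHup  = 0x20;

// The original test: true only when READ is the *only* flag set.
bool eventReadyExact(std::uint16_t outFlags) {
    return outFlags == kPollRead;
}

// The suggested test: true whenever READ is among the flags set.
bool eventReadyMasked(std::uint16_t outFlags) {
    return (outFlags & kPollRead) != 0;
}

// An error or hang-up flag on the pollable event means it is broken and
// must be handled rather than silently skipped.
bool eventBroken(std::uint16_t outFlags) {
    return (outFlags & (kPollErr | kPollHup)) != 0;
}
```

Suppose out_flags is kPollRead | kPollErr. The equality test is then false, so in the quoted loop neither branch would run and the event would never be acknowledged. PR_Poll would then return again immediately.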

Could someone who can reproduce this bug and attach a debugger
to Firefox verify my theory?
Comment 56 Wan-Teh Chang 2007-05-19 15:29:34 PDT
Created attachment 265381 [details] [diff] [review]
Proposed patch

This patch may be incomplete but is an improvement over the current
code.  I only have a Windows development environment on a desktop
PC, so I can't make it hibernate or switch between wired and wireless
networks.  I can only simulate the error condition by closing the
TCP loopback/localhost connection in the pollable event using
Sysinternals' TCPView tool.  This patch can recover from such a
simulated bad pollable event.  If recovery fails, it falls
back on "busy wait", which wastes some CPU cycles but works.

I'd appreciate it if someone could test this patch under the real conditions
that cause this bug.
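A minimal sketch of the recovery strategy described above follows, under stated assumptions. FakePollableEvent, EventFactory, AcknowledgeEvent and PollTimeoutMs are hypothetical names, not NSPR or Necko APIs, and the 25 ms busy-wait interval is only an illustrative value. When acknowledging the event fails, the sketch discards the broken event and creates a new one. If creation also fails, the loop polls with a short finite timeout instead of blocking.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for an NSPR pollable event (not the NSPR API).
struct FakePollableEvent {
    bool broken = false;
    // Plays the role of PR_WaitForPollableEvent: fails once the
    // underlying loopback connection has been torn down.
    bool Wait() const { return !broken; }
};

using EventPtr = std::unique_ptr<FakePollableEvent>;
using EventFactory = std::function<EventPtr()>;

// Illustrative timeouts: -1 means "block until signaled"; the finite
// value is the periodic wake-up used when no event exists ("busy wait").
constexpr int kBlockMs = -1;
constexpr int kBusyWaitMs = 25;

// Called when poll() reports the event as readable. If acknowledging it
// fails, replace the broken event; the factory may return null, which
// leaves the loop without an event.
void AcknowledgeEvent(EventPtr& event, const EventFactory& create) {
    assert(event);                // only a live event is ever polled
    if (event->Wait()) return;    // normal path: nothing to repair
    event = create();             // old event is destroyed on assignment
}

// Block indefinitely only while a working event exists.
int PollTimeoutMs(const EventPtr& event) {
    return event ? kBlockMs : kBusyWaitMs;
}
```

The fallback costs a few wake-ups per second. That is far cheaper than an infinite-timeout poll that returns immediately on every iteration.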

Other issues with my patch:

1. I have to create a new lock, mThreadEventLock, to protect other
threads' access to mThreadEvent.  So this class has two locks now:
mLock and mThreadEventLock.  This means the name mLock is a little
confusing.  We probably should rename it.  (mLock protects mShuttingDown
only, even though it seems to be intended to also protect mInitialized.)

2. I call PR_Lock and PR_Unlock on mThreadEventLock.  I wonder if
I should use nsAutoLock instead.

3. I didn't add new code to handle mPollList[0].out_flags with flags
other than PR_POLL_READ set.  Perhaps we should at least change
  mPollList[0].out_flags == PR_POLL_READ
to
  (mPollList[0].out_flags & PR_POLL_READ) != 0
Comment 57 Wan-Teh Chang 2007-05-21 10:34:51 PDT
Created attachment 265527 [details] [diff] [review]
Patch with debugging output (do not check in)

I figured out how to enable hibernation and installed a
USB wireless card on my desktop PC, but I still can't
reproduce this bug.

If you can reproduce this bug and know how to build the
Firefox trunk, please apply this patch and build Firefox.
For your convenience, I provided a Firefox binary with this patch applied at
https://kuix.de/mozilla/wtc/20070521/.

This patch causes Firefox to write debugging output to
the file C:\pollevent.log.

Please collect debugging output as follows.
1. Shut down Firefox.
2. Remove C:\pollevent.log, if it exists.
3. Start up your Firefox build with this patch applied.
4. Go through the procedure to reproduce this bug.  My
   patch may eliminate the 100% CPU usage, which is
   fine.
5. Shut down Firefox.
6. Use a regular Firefox build or another browser to
   attach the file C:\pollevent.log to this bug.

I will analyze the debugging output and see if my
patch handles the real error condition properly.
Thank you for your help.
Comment 58 Wan-Teh Chang 2007-05-24 15:59:36 PDT
Comment on attachment 265381 [details] [diff] [review]
Proposed patch

Mr. Zhan ran my test Firefox build, and the debugging output
he sent me showed that my patch enabled Firefox to recover from
the real error condition successfully.  So I'm now formally
requesting code reviews.

Notes to the reviewers:

1. Read comment 55 for the cause of the 100% cpu usage, and
comment 56 for a description of the patch.

2. Verify that mThreadEventLock provides thread-safe access
to mThreadEvent, which may change now.

3. Under normal conditions, this patch incurs locking
overhead only for the threads that call PR_SetPollableEvent.
Please make sure this locking overhead is acceptable, that
is, PR_SetPollableEvent is not called frequently.

4. Verify that mThreadEventLock is not leaked.

5. Should I use nsAutoLock to follow the coding style?

Thank you.
Comment 59 Darin Fisher 2007-05-30 08:49:20 PDT
Comment on attachment 265381 [details] [diff] [review]
Proposed patch

looks good to me, sr=darin
Comment 60 Wan-Teh Chang 2007-05-30 10:34:29 PDT
Created attachment 266634 [details] [diff] [review]
Proposed patch v1.1 (as checked in)

I checked in the patch on the Mozilla trunk for Firefox 3 Alpha 5.
I fixed a spelling error in the comment: "hybernation".

Christian, I'd still appreciate your review because I'm interested
in proposing this patch for the MOZILLA_1_8_BRANCH.
Comment 61 Wan-Teh Chang 2007-06-01 21:51:19 PDT
*** Bug 327050 has been marked as a duplicate of this bug. ***
Comment 62 Wan-Teh Chang 2007-06-01 22:03:37 PDT
*** Bug 365371 has been marked as a duplicate of this bug. ***
Comment 63 Wan-Teh Chang 2007-06-01 22:04:44 PDT
*** Bug 350883 has been marked as a duplicate of this bug. ***
Comment 64 Worcester12345 2007-06-03 20:39:27 PDT
(In reply to comment #60)
> Created an attachment (id=266634) [details]
> Proposed patch v1.1 (as checked in)
> 
> I checked in the patch on the Mozilla trunk for Firefox 3 Alpha 5.
> I fixed a spelling error in the comment: "hybernation".
> 
> Christian, I'd still appreciate your review because I'm interested
> in proposing this patch for the MOZILLA_1_8_BRANCH.

Then, to be consistent with other bugs, shouldn't this be reopened or have a whiteboard entry indicating this?
Comment 65 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-06-03 20:44:05 PDT
No.  To be consistent with other bugs, neither should happen.
Comment 66 Worcester12345 2007-06-03 21:05:34 PDT
Oh, I was thinking something like Bug 294307 – seamonkey mailnews doing too-permissive content policy checks

Thanks though. For some reason, I thought there was a protocol to fixing bugs on trunk yet leaving them open for branch after "baking time" on the trunk. Must have gotten too much sun yesterday. Sorry.
Comment 67 Daniel Veditz [:dveditz] 2007-08-21 15:31:40 PDT
Won't hold a branch release for this, but if you get the review and this is still a branch problem you can request approval1.8.1.7?
Comment 68 Howard Rosen 2007-08-22 08:42:07 PDT
Hi There,

I am a VERY non-techie but have been pulling my hair out over this exact problem. I am running XP with FF 2.0, and at least once a day it hangs with the CPU at 100%. Reading the thread, it seems that some of you have figured out the problem, but I honestly can't understand how to resolve it. It seems that the answer for now is to run an earlier version of FF. Is that correct? If any of you could shed some light, it would be greatly appreciated. Thanks.
Comment 69 atlantisician 2007-08-22 09:51:09 PDT
I >am< a techie, but must confess that the exchanges here are still mostly gibberish.  So let me change to common language :-)))

I have the impression that this bug is now solved.
If so, when will it be incorporated in FF?
Mozilla & Netscape have the same problem.
Will these browsers be fixed also?
Comment 70 Christian :Biesinger (don't email me, ping me on IRC) 2007-08-22 11:54:05 PDT
Comment on attachment 266634 [details] [diff] [review]
Proposed patch v1.1 (as checked in)

+        PR_Lock(mThreadEventLock);

please use nsAutoLock instead of manually calling PR_Lock/PR_Unlock

Is there a particular reason for not reusing mLock to also protect mThreadEvent?
Comment 71 Majken Connor [:Kensie] 2007-08-22 14:56:30 PDT
Howard, atlantisician:  The fix has been checked into builds that will become firefox 3.  Currently work is going on to make sure the patch is safe for branch and then if it is, it will be released in a 2.0.0.x security update.

Many users having problems have found that simply updating their wireless card's firmware gets them around the problem.  Please try that in the meantime.

If you need help doing so, or have other questions, please use the support channels: http://mozilla.com/support
Comment 72 Worcester12345 2007-08-23 01:00:16 PDT
perf keyword
Comment 73 Howard Rosen 2007-08-24 07:48:56 PDT
Thanks Majken, I'll give it a try. As an FYI, I uninstalled Ad-ware/Ad-Watch 2007, and though the problem remains, it now only lasts for about 30 seconds before resolving.
Comment 74 Wan-Teh Chang 2007-08-25 21:17:37 PDT
Created attachment 278263 [details] [diff] [review]
Use nsAutoLock.  Remove mThreadEventLock.

I used a separate lock (mThreadEventLock) to protect mThreadEvent
because I didn't fully understand which members mLock protects, and
because I didn't want to create lock contention for mLock
inadvertently.

Yesterday I studied the use of mLock.  It is used by the main
thread to protect its writes to mShuttingDown, and by the socket
thread to protect its reads of mShuttingDown.  (The main thread
is the only thread that changes mShuttingDown.)  So mLock is
lightly used, and it should be fine to use mLock to protect
mThreadEvent also.

This patch uses nsAutoLock instead of PR_Lock/PR_Unlock and
removes mThreadEventLock.  Please verify my updated comments
for mThreadEvent that only the *socket thread* may change
mThreadEvent (when the socket thread detects that mThreadEvent
is broken).
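The locking scheme described above can be sketched with standard C++ primitives. The sketch assumes std::mutex and std::lock_guard as stand-ins for the lock and nsAutoLock used in the actual patch. SocketService, Event, PostWakeup and ReplaceEvent are hypothetical names. The point is that one lock guards the event pointer, only the socket thread replaces it, and other threads take the same lock before signaling it.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical event type; Signal() stands in for PR_SetPollableEvent.
struct Event {
    int signals = 0;
    bool Signal() { ++signals; return true; }
};

// Hypothetical service: one mutex guards the event pointer. std::lock_guard
// plays the role of nsAutoLock, releasing the lock when it goes out of scope.
class SocketService {
public:
    SocketService() : event_(std::make_unique<Event>()) {}

    // Any thread: wake the socket thread. In this sketch, returns false when
    // there is no event and the socket thread is relying on its poll timeout.
    bool PostWakeup() {
        std::lock_guard<std::mutex> guard(lock_);
        return event_ ? event_->Signal() : false;
    }

    // Socket thread only: replace a broken event. Holding the same lock
    // keeps PostWakeup() from touching the old event mid-replacement.
    void ReplaceEvent(std::unique_ptr<Event> fresh) {
        std::lock_guard<std::mutex> guard(lock_);
        event_ = std::move(fresh);
    }

    // Signals delivered to the current event, or -1 if there is none.
    int SignalCount() {
        std::lock_guard<std::mutex> guard(lock_);
        return event_ ? event_->signals : -1;
    }

private:
    std::mutex lock_;
    std::unique_ptr<Event> event_;  // guarded by lock_
};
```

The guard releases the lock in its destructor, so no early return can leave the lock held. That is the usual argument for nsAutoLock over paired PR_Lock/PR_Unlock calls.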
Comment 75 Christian :Biesinger (don't email me, ping me on IRC) 2007-08-26 13:02:09 PDT
Comment on attachment 278263 [details] [diff] [review]
Use nsAutoLock.  Remove mThreadEventLock.

looks good. r=biesi
Comment 76 Wan-Teh Chang 2007-08-26 13:16:09 PDT
Comment on attachment 278263 [details] [diff] [review]
Use nsAutoLock.  Remove mThreadEventLock.

Thanks.  I checked in this patch on the Mozilla trunk for 1.9 M8.
Comment 77 Wan-Teh Chang 2007-08-26 15:14:06 PDT
Created attachment 278333 [details] [diff] [review]
Patch for MOZILLA_1_8_BRANCH

MOZILLA_1_8_BRANCH doesn't have mLock.  It has mEventQLock,
which already protects the main thread's accesses to mThreadEvent.
So I simply use mEventQLock to protect the socket thread's
change to mThreadEvent.
Comment 78 Wan-Teh Chang 2007-08-27 10:22:04 PDT
Comment on attachment 278333 [details] [diff] [review]
Patch for MOZILLA_1_8_BRANCH

Code Reviews:
The patch for the trunk was reviewed by Darin Fisher and Christian Biesinger.
After I adapted the patch for the 1.8 branch, Christian Biesinger reviewed it
again.

Testing:
Version 1 of the patch has been tested on the trunk since 2007-05-30, and
the final version of the patch, which reuses an existing lock instead of
creating a new lock, was checked in on the trunk on 2007-08-26.

Risk:
The patch may create contention for the lock inadvertently.
The patch for the 1.8 branch is much simpler than the patch for the trunk
because the original code already protects the mThreadEvent member with
a lock.  Therefore, the risk of the patch for the 1.8 branch is actually
lower than the risk of the patch for the trunk.
Comment 79 Daniel Veditz [:dveditz] 2007-09-07 11:48:58 PDT
Comment on attachment 278333 [details] [diff] [review]
Patch for MOZILLA_1_8_BRANCH

As annoying as this problem is, we've lived with it for several years. Do we really want to risk deadlocks in a security update? FF3 is just around the corner, so why not wait?
Comment 80 Tomas Pospisek 2007-09-07 12:53:43 PDT
(In reply to comment #79)
> (From update of attachment 278333 [details] [diff] [review])
> As annoying as this problem is, we've lived with it for several years do we
> really want to risk deadlocks in a security update? FF3 is just around the
> corner, why not wait?

Umm, well. Once upon a time I was running Firefox alongside Konqueror. That changed, however, once suspend to disk/RAM started working on my Linux box: FF is just too much of a PITA for me with suspend. Now I use FF only rarely, when some site doesn't render well in Konqueror, or to manage bookmarks.

It'd be nice if FF3 would be "usable" for my kind again.
Comment 81 Daniel Veditz [:dveditz] 2007-09-11 22:34:05 PDT
The bug claims FF3 is fixed -- please give a current nightly or the upcoming "M8" alpha release a try. If this patch does not solve the problem for you we need to take a deeper look.

My comment was with regard to the request to ship it in a Firefox 2.0.0.x security update. I would love to make the problem better, but I'm not willing to tolerate much regression risk for a non-security problem. Getting confirmation that the trunk is working better would go a long way to allay our concerns (even though the branch patch is slightly different from the trunk patch).
Comment 82 whorfin 2007-09-12 12:30:28 PDT
(In reply to comment #81)
> The bug claims FF3 is fixed -- please give a current nightly or the upcoming
> "M8" alpha release a try. If this patch does not solve the problem for you we
> need to take a deeper look.

Let's be more specific about "the problem" that the patch fixes.  Is it
strictly network-related?  I would think it has to be, since Bug 376643 is a clear cause of 100% CPU churn after wakeup if ANY JavaScript setInterval calls had been made.  That bug also claims to be "fixed for 3".

I tried the most recent nightly of firefox-3.0a8pre, and it crashed before even fully starting, so I'm done until there's a stable version someone will point to and say "this, this build right here, has solved your problem".  I'll be delighted to do some testing.
Comment 83 Pablo 2007-09-13 03:06:11 PDT
In case it may help.

My problem was related to the VPN connection. Every time I started Aventail (VPN client), Firefox ran up to 100% CPU usage.

I just applied Wan-Teh Chang's patch and the problem disappeared.
Comment 84 Wan-Teh Chang 2007-09-13 09:49:02 PDT
(In reply to comment #79)
Dan, the risk is lock contention, not deadlock.  Lock contention
means a lock is used so often that most of the time a thread
has to wait to acquire the lock (because some other thread is
holding the lock).  Lock contention is a performance issue.
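The distinction above (contention makes threads wait and slows them down; deadlock makes them wait forever) can be made concrete by counting how often an acquisition finds the lock unavailable. This is a hypothetical measurement sketch, not anything from the Mozilla patch: `ContentionStats`, `LockCounting`, and `RunWorkers` are illustrative names, and `std::mutex::try_lock` is used as the probe.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Tallies of lock acquisitions. A high contended/total ratio means lock
// contention: throughput suffers, but every thread still gets the lock
// eventually. Under deadlock, m.lock() below would simply never return.
struct ContentionStats {
    std::atomic<long> total{0};
    std::atomic<long> contended{0};
};

// Acquires m, recording whether the first attempt failed. Note that
// std::mutex::try_lock may also fail spuriously, so "contended" is an
// upper bound on real contention.
void LockCounting(std::mutex& m, ContentionStats& stats) {
    ++stats.total;
    if (!m.try_lock()) {
        ++stats.contended;
        m.lock();  // block until the current holder releases it
    }
}

// Runs `threads` workers; each acquires the shared mutex `iterations`
// times and increments a shared counter while holding it.
long RunWorkers(int threads, int iterations, ContentionStats& stats) {
    std::mutex m;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                LockCounting(m, stats);
                ++counter;  // protected by m
                m.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

The contended count varies from run to run with scheduling; the final counter does not, because every increment happens under the lock.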

Let me clarify the risk of this patch.
- Patch for the trunk: small risk of lock contention
- Patch for the 1.8 branch: *no* risk of lock contention

In comment 78 I didn't make the second point clear.  The patch
for the 1.8 branch is much safer, and it won't create lock
contention.

It is hard to assess how critical this bug is.  Many users
reported that they encountered this bug (or its duplicates),
but only two (a Mr. Zhan and Pablo) have verified that my fix
worked for them.
Comment 85 ANdy 2007-09-13 10:24:19 PDT
I'm not too familiar with where to get specific versions of the files to test (I just reported one of the duplicate bugs), but if you point me to an exe file that I can install on my laptop, I can test whether the issue is resolved for me. Feel free to e-mail me separately with instructions if needed.
Comment 86 Daniel Veditz [:dveditz] 2007-09-13 14:04:34 PDT
Comment on attachment 278333 [details] [diff] [review]
Patch for MOZILLA_1_8_BRANCH

approved for 1.8.1.7, a=dveditz
Comment 87 Wan-Teh Chang 2007-09-14 19:56:17 PDT
I checked in the patch (attachment 278333 [details] [diff] [review]) on the MOZILLA_1_8_BRANCH
for 1.8.1.7.

To test whether the issue is fixed for you, please download a 1.8
(= Firefox 2.0.0.x) nightly build from
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla1.8/

Please make sure the nightly build was produced after I checked in
the fix.  It is best to get a nightly build produced on Sunday,
2007-09-16 or later.

Comment 88 kitchin 2007-09-16 11:45:10 PDT
The keyword should be changed to fixed1.8.1.8, right?  The Firefox 2.0.0.7 emergency release candidate did not get this patch.
Comment 89 Henrik Skupin (:whimboo) 2007-09-16 11:54:12 PDT
Correct. Can anyone verify that it is fixed on the 1.8 branch?
Comment 90 Tomas Pospisek 2007-09-18 01:58:03 PDT
I have installed 2.0.0.7pre and indeed there's no more CPU eating after a resume. Thanks for the fix!
Comment 91 Ronald Rossi 2007-09-18 06:08:10 PDT
Folks,

I had not searched for this bug until yesterday. I have been experiencing this high CPU usage after suspend for a very long time. I had high hopes when I read about 2.0.0.7pre, but coming out of suspend this morning, Firefox was using 98% of the CPU time. Well, I mean Bon Echo 2.0.0.7pre. I have not removed all the add-ons, so maybe one of them is contributing.
Comment 92 whorfin 2007-09-18 12:42:29 PDT
Ronald:

This is why I've been asking for clarification of what this patch addresses.  I just tried Bon Echo 2.0.0.8pre, and I can make it consume 100%
CPU coming out of suspend.  This is easily demonstrated via the mechanisms
of Bug 376643 [look for the test cases and explanation].  All this says is
that Bug 376643 is not addressed by this patch [assuming this patch is in
2.0.0.8pre, as suggested above].  So let's be explicit: this bug is on a path to address issues related to network connections, not the more general "100% CPU on wakeup from hibernate" problem, which is _also_ caused by Bug 376643.

Right?

Perhaps the bug description should be edited to reflect this.
Comment 93 Christian :Biesinger (don't email me, ping me on IRC) 2007-09-18 13:25:21 PDT
Correct: the problem addressed by Bug 376643 is independent of this one, even though the symptoms may be the same.
Comment 94 Wayne Mery (:wsmwk, NI for questions) 2007-09-18 14:21:02 PDT
Right, wtc's comments 55 through 58 are explicit that this addresses network interruption, even though hibernate was mentioned very early in this bug (comment 3, for example). In fact, I don't recall that wtc was ever able to cause the problem by going into hibernate. Modifying summary.

So if your problem was sleep/suspend/standby/hibernate ...

* If the problem is gone in a trunk build (and not because you updated Flash Player) and you came from one of the bugs below [1], you probably want to dupe your bug to Bug 376643 and request in Bug 376643 that the patch be landed on the branch for FF 2; it is an obnoxious bug for laptop users.

* If the problem is NOT gone on trunk or 2.0.0.7pre, you probably want to undupe your bug and await further comment in it.

[1] These were duped between 2007-01-08 and 2007-01-15, immediately after I changed the summary of this bug. I am certain some of them shouldn't have been duped here:
 Bug 356911 
 Bug 355157 (has the most dupes)
 Bug 355024 
 Bug 352457
 Bug 321565 
 Bug 299202
Comment 95 kitchin 2007-09-19 12:32:05 PDT
You mean 2.0.0.8pre, not 2.0.0.7pre.  Any 2.0.0.8pre build has the checkin for this bug, which landed on that branch at 2007-09-14 19:44.
Comment 96 Wayne Mery (:wsmwk, NI for questions) 2008-04-29 10:31:48 PDT
*** Bug 82728 has been marked as a duplicate of this bug. ***
