Closed Bug 500791 Opened 15 years ago Closed 13 years ago

frequent, multi-second lags and pauses

Categories

(Firefox :: General, defect)

3.5 Branch
x86
Linux
defect
Not set
major

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: mozilla-bugzilla, Unassigned)

Details

(Whiteboard: [Snappy])

Attachments

(4 files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b4pre) Gecko/20090401 Ubuntu/9.04 (jaunty) Shiretoko/3.5b4pre
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b4pre) Gecko/20090401 Ubuntu/9.04 (jaunty) Shiretoko/3.5b4pre

After running for a while, Firefox gets very laggy.  After a couple days, these hangs slowly grow long enough to have complete "gee, did I have the browser window focused?  Oh, there it scrolls" trains of thought.  This manifests as:

Lag, in the middle of scrolling, especially with the mouse wheel or keyboard, though sometimes by dragging the scroll bar as well;
Long delays before there's any response on many UI events (eg, clicking on another tab);
animated GIFs pause;
flash videos freeze while playing;
typing input stops, whether in the address bar, search box, or in a form, and then catches up after a few seconds;
100% CPU consumption by the firefox[-3.5] process during the lag;
user gradually uses more profanity.

Watching "strace -p `ps -C firefox-3.5 -o pid=` -e '!poll,gettimeofday,read,futex,writev,select'" shows that the lag always occurs during a flurry of madvise(xx, yy, MADV_DONTNEED) calls, sometimes preceded by writing files to the profile.  Based on strace, these events are frequently triggered by: mouse in, or click on a link, or click on a tab, or begin scrolling.

Here's a backtrace from breakpointing at an madvise call during mouse wheel scrolling:

#0  0xb7d3ae30 in madvise () from /lib/tls/i686/cmov/libc.so.6
#1  0x0805500c in ?? ()
#2  0x08056e14 in free ()
#3  0xb7eab231 in operator delete () from /usr/lib/libstdc++.so.6
#4  0xb7430891 in nsCycleCollector::Collect (this=0x2e000, aTryCollections=1) at nsCycleCollector.cpp:323
#5  0xb7430973 in nsCycleCollector_collect () at nsCycleCollector.cpp:2904
#6  0xb702bcb2 in nsJSContext::CC () at nsJSEnvironment.cpp:3427
#7  0xb702bdba in nsJSContext::MaybeCC (aHigherProbability=0) at nsJSEnvironment.cpp:3495
#8  0xb702c163 in nsUserActivityObserver::Observe (this=0xb26e9f20, aSubject=0x0, 
    aTopic=0xb75a8ea2 "user-interaction-active", aData=0x0) at nsJSEnvironment.cpp:287
#9  0xb73ff000 in nsObserverList::NotifyObservers (this=0xb27e333c, aSubject=0x0, 
    aTopic=0xb75a8ea2 "user-interaction-active", someData=0x0) at nsObserverList.cpp:128
#10 0xb73ff2c2 in nsObserverService::NotifyObservers (this=0xb79ec8e0, aSubject=0x0, 
    aTopic=0xb75a8ea2 "user-interaction-active", someData=0x0) at nsObserverService.cpp:181
#11 0xb6f47992 in nsUITimerCallback::Notify (this=0xb2527430, aTimer=0xb251dfa0) at nsEventStateManager.cpp:323
#12 0xb74279b8 in nsTimerImpl::Fire (this=0xb251dfa0) at nsTimerImpl.cpp:423
#13 0xb7427a2d in nsTimerEvent::Run (this=0xb00e8640) at nsTimerImpl.cpp:512
#14 0xb7424d14 in nsThread::ProcessNextEvent (this=0xb79e0970, mayWait=1, result=0xbfbfd924) at nsThread.cpp:510
#15 0xb73f567c in NS_ProcessNextEvent_P (thread=0xc, mayWait=1) at nsThreadUtils.cpp:227
#16 0xb73778b8 in nsBaseAppShell::Run (this=0xb54170b0) at nsBaseAppShell.cpp:170
#17 0xb7253574 in nsAppStartup::Run (this=0xb4246670) at nsAppStartup.cpp:192
#18 0xb6c1e6be in XRE_main (argc=3, argv=0xbfc010a4, aAppData=0xb791d980) at nsAppRunner.cpp:3297
#19 0x08049b8b in ?? ()
#20 0xb7c70775 in __libc_start_main (main=0x80497b4 <strcmp@plt+252>, argc=3, ubp_av=0xbfc010a4, init=0x8058f90, 
    fini=0x8058f80, rtld_fini=0xb7ff2870 <_dl_fini>, stack_end=0xbfc0109c) at libc-start.c:220
#21 0x080496f1 in ?? ()

Note: this backtrace is from a time when the lag is not happening, due to a recently-restarted firefox.  I'll upload a better one when I get it.

Just at a guess, there may be two related bugs here:  #1, excessive memory fragmentation or inefficient garbage collection; #2, garbage collection being scheduled at the worst possible time (IE, on user-interaction-active events).

Reproducible: Always

Steps to Reproduce:
1. Start Firefox.  Web surf for 15 minutes to 2 days with normal heavy-web-user habits, until lag becomes pronounced.
2. Open a smooth, looping, animated GIF:  http://www.fur.com/~keepiru/DancingPeaks.gif
3. Use strace command above
4. Don't touch anything, and see madvise calls correlate with long pauses in the animation
5. Try generating events (mouse in, click tabs), and observe madvise calls / UI lag



This bug has persisted across Ubuntu 8.10 and 9.04 with Firefox 3.0 to 3.5b4 (Ubuntu builds), and is reproducible with Firefox 3.5rc${recent} from Mozilla.org.
Same problem on XP SP3 with Firefox 3.0.x to 3.5 RC3. Pauses/lags trigger heavy disk usage. The issue is less grave with 3.5 RC3 so far.
Severity: normal → major
Version: unspecified → 3.5 Branch
Attached file Good backtrace
Attaching a better backtrace.  This one comes with a warranty.

It was created by:  Start firefox -g -d gdb; use normally for a few hours; lag starts appearing; go to a long web page and scroll back and forth with the mouse wheel; when the scrolling freezes, quickly ^C gdb and backtrace.

This is on my own build of FIREFOX_3_5rc3_RELEASE .
Similar to this I see a loop that starts as below. This results in an unresponsive ff3.5. I have some HTML that reproduces this on linux/windows. Unfortunately my gdb keeps segfaulting when trying to trace. I cannot post the HTML here due to privacy concerns.

gettimeofday({1247218492, 228521}, NULL) = 0
futex(0xb5c70580, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailab
le)
futex(0xb5c70580, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0xb5c70580, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailab
futex(0xb5c70580, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0xb5c70580, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailab
futex(0xb5c70580, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0xb5c70580, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailab
futex(0xb5c70580, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0xb5c70580, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailab
futex(0xb5c70580, FUTEX_WAKE_PRIVATE, 1) = 0
gettimeofday({1247218499, 756868}, NULL) = 0
madvise(0xa4c44000, 4096, 0x4 /* MADV_??? */) = 0
madvise(0xa4c41000, 8192, 0x4 /* MADV_??? */) = 0
madvise(0xa4c3f000, 4096, 0x4 /* MADV_??? */) = 0
madvise(0xa4c3d000, 4096, 0x4 /* MADV_??? */) = 0
madvise(0xa4c3a000, 8192, 0x4 /* MADV_??? */) = 0
madvise(0xa4c37000, 8192, 0x4 /* MADV_??? */) = 0
madvise(0xa4c34000, 8192, 0x4 /* MADV_??? */) = 0
madvise(0xa4c31000, 8192, 0x4 /* MADV_??? */) = 0
madvise(0xa4c2e000, 8192, 0x4 /* MADV_??? */) = 0
madvise(0xa4c2b000, 8192, 0x4 /* MADV_??? */) = 0
madvise(0xa4c29000, 4096, 0x4 /* MADV_??? */) = 0
can you strip the HTML of private info and post it?
Certainly, I hope that it still causes the problem when stripped. Expect this early next week as I don't have access to that from home.
Attached image lag measurements
I enabled COLLECT_TIME_DEBUG, and collected some statistics.  Y = Collect() time in milliseconds.  X = garbage collection run number (not linear time).  The red points are individual GC runs.  The blue line is a 100-point average.

The flat area from ~600 - ~1200 is a period when I was not using the browser.
Attached image more lag measurements
In this series, the break in the line is where I restarted the browser.
As request i have stripped the html that causes this problem for me. This has been checked on w3.org and appears valid. I should also note that i shortened the table to just 1000 entries from 2500. So the loop does now seem to break after some processing. When at 2500 it never I never waited past 5 minutes.

Expect the browser to be totally unresponsive for a few minutes. This is tested on Windows and Linux (Debian/Lenny).
This does not affect firefox 3.0.
"When at 2500 it never seemed to break the loop, I never waited past 5 minutes."

oops...
What would help move this bug forward?  I've had the same issue now ever since 3.5 came out on multiple versions of Fedora.  If there was any other reasonable browser under Linux I would have dumped this already the problem is so damn annoying.

And why is it still listed as UNCONFIRMED?  There's now 4 people reporting it, and I wouldn't be surprised if some of the dups of the whole "1000 checkboxes in a form" bug aren't actually this issue.
I don't see the problem any more in 3.6 or 3.7...
I see it very clearly in 3.6.3 (Ubuntu 10.04).  Any chance this is limited to single-core systems?  I'm on an ASUS Eee 1005HA netbook.

See also https://bugs.launchpad.net/ubuntu/+source/firefox-3.0/+bug/383435
See also:
Bug 445261 - Frequent CPU Spikes to 100%
Bug 330135 - CPU 100% with random freezes (several seconds to over 7 minutes) when loading page with many selects  
Bug 277547 - Firefox freezes or hangs frequently and temporarily at zero or low cpu - HT / hyperthread / multiple processor only - NOT resolved by BIOS update
I see the exact same behavior on Win XP with 4.0b12pre; I previously thought this was due to little availability (total 228 mb), but this also happens using only Firefox in the whole os for few minutes; the most notable symptoms are keyboard input and flash freezing, long (20 seconds) tab switching even in light web pages.
Whiteboard: [Snappy]
The initial reports are problems caused by the cycle collector, which has changed a lot since 2009.  Most of the other reports are very old and don't have very specific information, so I'm going to close this.  Pietro, if this is still a problem in Firefox 8, you can reopen this and we can try to get some more specific information.  Thanks.
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Resolution: --- → INCOMPLETE
I don't know, I'm now using Iceweasel 8 on debian and it does not happen at all; the computer i used is broken, so I can't tell for sure.
Okay, thanks for the information!
Resolution: INCOMPLETE → WORKSFORME
I have
This issue is egregiously bad --  http://sprunge.us/ghDP

To give an idea what that is -- say I have 3 tabs - A,B and C 

I move from A to B it does madive MADV_DONTNEED and B to C same. Sorry, but this is really inane given that I will be accessing A again and that will cause massive slowdown.

First I thought this may be GC related thing. I have disabled it -- setting  javascript.options.gc_on_memory_pressure to false. But that didn't help.

To quote the madvise man page 

"
MADV_DONTNEED
               Do not expect access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.)  Subsequent accesses of pages in this range will suc‐
               ceed, but will result either in reloading of the memory contents from the underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
"

So when I move from one tab to another, do you really expect that tab won't be used at all ? Also, focus on the 'Subsequent' accesses.


I don't have a debug build, so I cant give backtrace, but after I build with one I may be able to do something myself.
 
     
To reproduce,
strace -p <firefox_pid> -e 'trace=madvise'

Why I straced in first place ?
I was noticing slow downs while changing tabs when my system was under moderate memory pressure.
Raghavendra Prabhu, please file a new bug for this issue.  It sounds like something separate from the original one.  Post a link to it in this bug when you are done.  Thanks!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: