Closed
Bug 665997
Opened 13 years ago
Closed 12 years ago
General lagginess in recent firefox builds
Categories
(Core :: Widget: Gtk, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: catlee, Unassigned)
Details
(Keywords: perf)
Attachments
(2 files)
Firefox has been unbearably laggy in recent weeks. Firefox 5.0 final as well as 7.0 nightlies are affected. Symptoms include:
* Pauses while typing into text input fields (like this bug report)
* Pauses when switching between tabs
* Pauses when scrolling
* Pauses when navigating awesomebar results
* Slow clipboard access. e.g. I copy something in firefox with Ctrl-C, and then paste in vim or my terminal with Ctrl-V, and it takes a second or two to show up.
Most of the pauses above are between 2-3 seconds.
I'm running 64-bit linux (debian testing/unstable), on an ext4 filesystem mounted on an encrypted drive. Kernel version is currently 2.6.38, glibc is 2.13-4.
bhearsum sees some of the above symptoms also, but not all of them.
If there's any debugging or profiling information I can provide, let me know, this is driving me crazy.
I took a screencast of the delay in text input:
http://people.mozilla.org/~catlee/lag.ogv
Comment 1•13 years ago
|
||
Thanks, Chris. Looks bad. I have occasionally noticed a little delay in typing, 2-3 characters behind, but I had thought it would always at a time when the browser was busy with something such as page load or location bar completions.
Is it reproducible enough for you to get a regression range?
Can you run a system-monitor tool such as gkrellm or xosview, please, to try to determine whether pauses are associated with cpu use, disk activity, or network activity?
Comment 2•13 years ago
|
||
I have been seeing some issues as well, Wordpress is really bad.
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0a1) Gecko/20110621 Firefox/7.0a1. Mainly the pauses when typing in text areas
Reporter | ||
Comment 3•13 years ago
|
||
so far it seems like lags correspond with a spike to 100% cpu usage in one of my cores, as well as disk write activity.
Reporter | ||
Comment 4•13 years ago
|
||
nothing obvious in strace output. lots of lines like this:
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN|POLLPRI}, {fd=49, events=POLLIN}, {fd=73, events=POLLIN}, {fd=23, events=POLLIN}], 9, 0) = 0 (Timeout) <0.000021>
read(3, 0x7fa51106e074, 4096) = -1 EAGAIN (Resource temporarily unavailable) <0.000014>
Comment 5•13 years ago
|
||
Polling like that is probably expected from the event loop.
Excluding these calls might reduce noise: strace -T -e '!poll,read'
Can you comment on whether the issue gets worse over time?
Any evidence of memory leaks?
Running with MOZ_INSTRUMENT_EVENT_LOOP=1 in the environment will log whenever an event takes longer than 50 ms to process.
This would help diagnose whether one thing is taking too long, or whether there are just so many events that painting never happens.
Observations from lag.ogv:
There is a flashing icon in the desktop status-bar indicating that the server is responsive.
The flashing mozilla icon on an app tab that stops flashing, indicating that Firefox is not drawing in general. (It's not specific to the text input.)
Reporter | ||
Comment 6•13 years ago
|
||
Sample output:
MOZ_EVENT_TRACE sample 1308797266260 2912
MOZ_EVENT_TRACE sample 1308797269695 119
MOZ_EVENT_TRACE sample 1308797270594 83
MOZ_EVENT_TRACE sample 1308797271876 58
MOZ_EVENT_TRACE sample 1308797272789 198
MOZ_EVENT_TRACE sample 1308797273153 210
MOZ_EVENT_TRACE sample 1308797273345 141
MOZ_EVENT_TRACE sample 1308797273645 95
MOZ_EVENT_TRACE sample 1308797273884 137
MOZ_EVENT_TRACE sample 1308797274148 213
MOZ_EVENT_TRACE sample 1308797274327 76
MOZ_EVENT_TRACE sample 1308797274829 93
MOZ_EVENT_TRACE sample 1308797274944 64
MOZ_EVENT_TRACE sample 1308797275106 60
MOZ_EVENT_TRACE sample 1308797275297 89
MOZ_EVENT_TRACE sample 1308797275447 98
MOZ_EVENT_TRACE sample 1308797284175 2912
MOZ_EVENT_TRACE sample 1308797287561 121
MOZ_EVENT_TRACE sample 1308797291678 87
MOZ_EVENT_TRACE sample 1308797295795 88
MOZ_EVENT_TRACE sample 1308797297324 50
MOZ_EVENT_TRACE sample 1308797298577 79
MOZ_EVENT_TRACE sample 1308797298721 92
MOZ_EVENT_TRACE sample 1308797299439 54
MOZ_EVENT_TRACE sample 1308797302496 2904
MOZ_EVENT_TRACE sample 1308797305876 116
MOZ_EVENT_TRACE sample 1308797309991 86
MOZ_EVENT_TRACE sample 1308797314108 87
MOZ_EVENT_TRACE sample 1308797320388 2863
Reporter | ||
Comment 7•13 years ago
|
||
I have a hunch this is i/o related. It's much much worse when I'm compiling something like firefox.
Reporter | ||
Comment 8•13 years ago
|
||
** (firefox:22827): WARNING **: Serious fd usage error 15
** (firefox:22827): WARNING **: Serious fd usage error 11
** (firefox:22829): WARNING **: Serious fd usage error 15
** (firefox:22829): WARNING **: Serious fd usage error 11
MOZ_EVENT_TRACE sample 1308799513602 90452
fd 11,15 are:
firefox 30288 catlee 11u unix 0xffff880115b52d80 0t0 4036225 socket
firefox 30288 catlee 15u unix 0xffff880037a63740 0t0 4036227 /tmp/orbit-catlee/linc-7650-0-3e77f1ef5fb8
Reporter | ||
Comment 9•13 years ago
|
||
And some stats from strace:
[catlee] ~/mozilla/mozilla-central (hg)-[default]-% strace -p 30288 -c
Process 30288 attached - interrupt to quit
^CProcess 30288 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
62.08 0.028002 32 883 73 futex
37.88 0.017088 3 4925 poll
0.03 0.000015 0 6182 4852 read
0.00 0.000000 0 192 write
0.00 0.000000 0 2 close
0.00 0.000000 0 58 4 stat
0.00 0.000000 0 26 lseek
0.00 0.000000 0 22 mmap
0.00 0.000000 0 1 mprotect
0.00 0.000000 0 14 munmap
0.00 0.000000 0 605 writev
0.00 0.000000 0 90 madvise
0.00 0.000000 0 45 getsockname
0.00 0.000000 0 1 clone
0.00 0.000000 0 1 sched_setscheduler
0.00 0.000000 0 2 sched_get_priority_max
0.00 0.000000 0 2 sched_get_priority_min
0.00 0.000000 0 1 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.045105 13052 4929 total
Are so many failed read calls normal? Attaching strace shows these two lines repeating over and over:
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=36, events=POLLIN}, {fd=35, events=POLLIN}, {fd=22, events=POLLIN}], 9, 0) = 0 (Timeout)
read(4, 0x7f75d5b9d074, 4096) = -1 EAGAIN (Resource temporarily unavailable)
Why do we read from fd 4 if poll timed out?
Excluding poll and read from strace shows mostly futex calls
Reporter | ||
Comment 10•13 years ago
|
||
Comment 11•13 years ago
|
||
Nothing jumps out from that other than it's mostly js::Interpret(). No GC hits.
It would help to know what you were doing when you captured it, or to run a longer sequence (or shorter period: JP_PERIOD=0.002, or both)
Comment 12•13 years ago
|
||
Also, what's the environment: # of tabs, what are they on, etc.
Hmmm. "on an ext4 filesystem mounted on an encrypted drive" -- Perhaps when typing into a test box it writes to the profile, or reads from it? Compiles would make that slower, especially on an encrypted drive.
Comment 13•13 years ago
|
||
(In reply to comment #6)
> Sample output:
> MOZ_EVENT_TRACE sample 1308797266260 2912
Yes, some long pauses of about 2.9 seconds.
(In reply to comment #8)
> ** (firefox:22829): WARNING **: Serious fd usage error 15
>
> ** (firefox:22829): WARNING **: Serious fd usage error 11
> MOZ_EVENT_TRACE sample 1308799513602 90452
>
> fd 11,15 are:
> firefox 30288 catlee 11u unix 0xffff880115b52d80 0t0 4036225 socket
> firefox 30288 catlee 15u unix 0xffff880037a63740 0t0 4036227
> /tmp/orbit-catlee/linc-7650-0-3e77f1ef5fb8
A 90 second period of unresponsiveness - Wow.
I don't understand the ORBit error.
http://git.gnome.org/browse/ORBit2/tree/linc2/src/linc-connection.c#n929
I don't know why the file descriptor would be invalid.
Is 22829 actually the process id of the main firefox process?
(Wondering whether something funny is happening after fork.)
But I've seen that before, so I'm not expecting it to be directly related to lagginess.
Can you disable all plugins please to confirm they are not involved?
Comment 14•13 years ago
|
||
(In reply to comment #9)
> [catlee] ~/mozilla/mozilla-central (hg)-[default]-% strace -p 30288 -c
>
> Process 30288 attached - interrupt to quit
> ^CProcess 30288 detached
> % time seconds usecs/call calls errors syscall
> 100.00 0.045105 13052 4929 total
Time here is system (in kernel) cpu time, which is minimal, but this doesn't tell us about wall clock time from waiting.
>
> Are so many failed read calls normal?
Many EAGAIN "errors" would be expected with non-blocking IO.
> Attaching strace shows these two lines
> repeating over and over:
>
> poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=9,
> events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=14,
> events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=36,
> events=POLLIN}, {fd=35, events=POLLIN}, {fd=22, events=POLLIN}], 9, 0) = 0
> (Timeout)
> read(4, 0x7f75d5b9d074, 4096) = -1 EAGAIN (Resource temporarily
> unavailable)
>
> Why do we read from fd 4 if poll timed out?
Yes, that does seem unnecessary, but it happens even when things are working fine.
fd 4 seems to be the connection to the X server. I expect some calls like XNextEvent() would read from the connection at least sometimes.
I would expect these polls to pause during the period of unresponsiveness.
Can you tell whether that is what happens?
Comment 15•13 years ago
|
||
CCing Taras as he might have some good hints for watching disk access.
Reporter | ||
Comment 16•13 years ago
|
||
Reporter | ||
Comment 17•13 years ago
|
||
(In reply to comment #12)
> Also, what's the environment: # of tabs, what are they on, etc.
I've got 26 tabs across 5 tab groups plus 6 app tabs. My app tabs are gmail, greader, google cal, zimbra, remember the milk, and yammer. Most of my other tabs are pretty normal static html pages.
Reporter | ||
Comment 18•13 years ago
|
||
> Can you disable all plugins please to confirm they are not involved?
I'm currently running with -safe-mode and have disabled all plugins and I'm still experiencing this. (incidentally, running with -safe-mode doesn't disable plugins)
> I would expect these polls to pause during the period of unresponsiveness.
> Can you tell whether that is what happens?
I don't notice any difference in poll times in the unresponsive periods. Most polls complete in well under 0.01 seconds.
I just noticed that firefox is spamming stderr (fd 2) with
(firefox-bin:15907): Gdk-WARNING **: XID collision, trouble ahead
This isn't showing up in my console for some reason, but is being redirected to a global log I have catching output in my ~/.xsession file. This file is growing at about 3.8kb/s. I'll try redirecting this to /dev/null.
Reporter | ||
Comment 19•13 years ago
|
||
> This isn't showing up in my console for some reason, but is being redirected
> to a global log I have catching output in my ~/.xsession file. This file is
> growing at about 3.8kb/s. I'll try redirecting this to /dev/null.
Redirecting stderr reducing the spam in my global X log file, but doesn't improve the lagginess.
Comment 20•13 years ago
|
||
Well, it's doing mostly JS stuff. A bunch goes through:
27.6% XPCWrappedNative::CallMethod()
The top of the biggest stack is PrepareAndDispatch() at 70% leading to JS::Interpret() and JS::Invoke(). The XRE_main event loop runs (14.x%), and does some GC, but GC is a small percentage (<4%) of CPU time so I doubt that's your problem. Around 8% is in chatting with gtk.
So barring some of this JS code doing more than expected, I see no real hotspots. The next step may help: try using JP_REALTIME. This will sample off wall-clock time, so if it's being slow in a system or other library call (but not chewing CPU) it should catch it. Note that you want to not spend time idle more than you have to during the test when done this way, or find a way to start and stop it programatically using JS (but since this is a user-interaction bug I doubt that would work.)
Comment 21•13 years ago
|
||
Note that you start collection in JP_REALTIME via kill -ABRT not -PROF (I should fix that; we know it's realtime...)
Comment 22•13 years ago
|
||
(In reply to comment #18)
> > I would expect these polls to pause during the period of unresponsiveness.
> > Can you tell whether that is what happens?
>
> I don't notice any difference in poll times in the unresponsive periods.
> Most polls complete in well under 0.01 seconds.
What I meant (but didn't say clearly) was that I would expect no polling or at least fewer polls in the unresponsive times.
> I just noticed that firefox is spamming stderr (fd 2) with
> (firefox-bin:15907): Gdk-WARNING **: XID collision, trouble ahead
>
> This isn't showing up in my console for some reason, but is being redirected
> to a global log I have catching output in my ~/.xsession file.
That was a known issue with Flash Player, but Flash should be running out of process (not firefox-bin) and is disabled anyway.
A break point in g_log would enable getting a backtrace.
If it is appearing in ~/.xsession rather than the console that started a Firefox, it suggests that there is another Firefox running, unless there's some environment variable we're using to determine error output.
Reporter | ||
Comment 23•13 years ago
|
||
So I wrote a little script that monitors page faults per second (as reported by ps -o min_flt), here's some sample output:
0
12
3
0
0
0
3
0
0
0
0
1439
4591
1912
15351
5
22
524
Periods of lagginess correspond with the spike in these soft page faults.
Reporter | ||
Comment 24•13 years ago
|
||
firefox-bin is currently using 2,328,676 bytes of virtual memory (vsz) and 1,348,904 bytes resident.
free reports:
total used free shared buffers cached
Mem: 3920404 3605112 315292 0 62068 827464
-/+ buffers/cache: 2715580 1204824
Swap: 5242876 493860 4749016
Comment 25•13 years ago
|
||
I experience this issue with one tab open. it sits at 110% of cpu, destroys my battery life.switching back to the release (fx 5.0) everything goes back to normal.
Comment 26•13 years ago
|
||
btw: I noticed this increase in CPU usage on 2011-06-24 or 2011-06-25 in the AM
Comment 27•13 years ago
|
||
:ddahl -- that sounds like a different bug than is reported here. Please file a separate bug report and please include the URL that one tab is on, and if it occurs if you start Firefox in safe mode (to rule out issues with extensions/flash/etc). Thanks.
Comment 28•13 years ago
|
||
bug 667250 may well be related or the same bug. I have a clear profile on that one.
Reporter | ||
Comment 29•13 years ago
|
||
So my session store recently got corrupted, so I now have far fewer tabs open, and performance is much better.
Comment 30•13 years ago
|
||
catlee: see also bug 667250
Comment 31•12 years ago
|
||
(In reply to Chris AtLee [:catlee] from comment #29)
> So my session store recently got corrupted, so I now have far fewer tabs
> open, and performance is much better.
do you mean far fewer than 32? (comment 25 - FWIW 32 is peanuts, I typically have >100 on a marginal vista laptop, and the only appreciable perf hit is GC and CC)
better as in fine? Or better as in, improved but still bad?
Flags: needinfo?(catlee)
Keywords: perf
Reporter | ||
Comment 32•12 years ago
|
||
I haven't had this problem in a while. I suspect it was due to bug 705597.
Status: NEW → RESOLVED
Closed: 12 years ago
Flags: needinfo?(catlee)
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•