High CPU on Linux even when in the background, possibly due to frequent polling

NEW
Unassigned

Status

()

defect
--
major
10 years ago
2 years ago

People

(Reporter: aravind, Unassigned)

Tracking

({perf})

Trunk
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [needs profile][Power:P3])

Attachments

(7 attachments)

Reporter

Description

10 years ago
I have 20 to 30 tabs open.  I have been noticing recently that my firefox process stays consistently at around 6% to 10% CPU (for the 3.5.* nightlies) and that jumps up to around 12% CPU (for the 3.6.* nightlies).  I do have about 8 addons installed, but I don't think they are causing this.  I tried starting up firefox in safe mode - disabling all the addons, and that actually seems to aggravate the problem.

A typical strace shows a lot of messages like this.

read(3, 0xb5e53058, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}], 8, 0) = 0 (Timeout)
gettimeofday({1249428217, 525577}, NULL) = 0
read(3, 0xb5e53058, 4096)               = -1 EAGAIN (Resource temporarily unavailable)

I tried a lsof on the process in this state and I noticed those polls were all to some pipes (FIFOs) or to stuff in the gnome-orb directory in /tmp/orbit-aravind.  I don't know enough about that stuff to figure out what is being polled and why.

Another test I did was to start a brand new firefox profile with a single table.  I noticed that the CPU usage in this case is pretty minimal but when I strace the process I still see it polling stuff and getting "Resource not found" messages every so often.  I would think that a firefox process not being actively used should just sit there waiting for X events..  I disabled the safe browsing checks so it shouldn't be downloading anything from google etc.

Let me know what additional information I can get you guys.
Reporter

Comment 1

10 years ago
Per jst (swilshers) suggestion, I tried setting browser.sessionstore.enabled to false and restarted my browser.  That didn't make any difference.  I still see the constant polling (and the resource unavailable messgaes in the strace).
if you think this is a recent regression, can you try 20090723 nightly?
Keywords: perf
Reporter

Comment 3

10 years ago
(In reply to comment #2)
> if you think this is a recent regression, can you try 20090723 nightly?

I don't think its anything recent.  I am willing to try that nightly just to rule it out though.  Trying the builds from ftp.mozilla.org:/pub/firefox/nightly/2009-07-23-03-mozilla-1.9.1/
Reporter

Comment 4

10 years ago
Nope, same problem with the build from that day.

Updated

10 years ago
Duplicate of this bug: 516227

Comment 6

10 years ago
I've been having this problem since at least the early 3.0 builds, and I think back into the 2.x days as well.

I have a few hundred tabs open, so the problem that I run into is significnatly stronger thatn what is described here, but the same basic problem

it does seem to trigger more reliably when there are more tabs open
This happens to me with just 7 or so tabs open. I rarely see Firefox using less than 20% of my CPU.
Reporter

Comment 9

9 years ago
Nope, none of those seem to be complaining about the same thing we are.  Most of the bugs there seem to be problems with specific use cases, but this bug is about a general problem with running firefox - the constant POLLs seem to be there no matter what you do.

Comment 10

8 years ago
Firefox 6.0.2/OpenSuse 11.4 - that bug is still here. :(

It doesn't depends on number of tabs. It constantly eats cpu with EAGAIN even in safe mode with just one empty tab...


gettimeofday({1317022142, 188180}, NULL) = 0
read(4, 0xb465b058, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN}, {fd=25, events=POLLIN}], 9, 0) = 0 (Timeout)
gettimeofday({1317022142, 188583}, NULL) = 0
read(4, 0xb465b058, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN}, {fd=25, events=POLLIN}], 9, 0) = 0 (Timeout)
gettimeofday({1317022142, 188978}, NULL) = 0
read(4, 0xb465b058, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN}, {fd=25, events=POLLIN}], 9, 0) = 0 (Timeout)
gettimeofday({1317022142, 189314}, NULL) = 0
read(4, 0xb465b058, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN}, {fd=25, events=POLLIN}], 9, -1^C <unfinished ...>

Comment 11

8 years ago
I can confirm this bug as well

Firefox 3.6.20 on Gentoo

This is with a freshly built firefox.  No Java, flash, extensions, addons.  Just starting firefox and loading a google search home page and my cpu is cranked at 99%.  

CPU is a single core and load average hits about 7-12 and then stabilizes.   

I can post my strace if desired but it basically is the same as what Sergey reported.

Comment 12

8 years ago
Upon further review. I discovered that my kernel wasn't compiled with the CONFIG_FUTEX option.  Kernel version 2.6.39.3
Can someone who is experiencing this issue run the profiler and attach a link here to their results? See https://developer.mozilla.org/en-US/docs/Performance/Reporting_a_Performance_Problem for instructions.
Flags: needinfo?
Flags: needinfo? → needinfo?(paul)

Comment 14

6 years ago
Jared Wein, this bug exists for each version of firefox for a very long time on all computers. You can see it yourself. You don't need special tools to see it.

Here is my dump from Virtual Boxed Kubuntu 13.04.

1. Open http://www.google.ru/ in Firefox & Google chrome.
2. Run top
top - 18:14:47 up 20 min,  3 users,  load average: 0.61, 0.80, 0.54
Tasks: 165 total,   1 running, 164 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  7.4 sy,  0.0 ni, 91.5 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   2984424 total,  2754428 used,   229996 free,   378960 buffers
KiB Swap:        0 total,        0 used,        0 free,  1659416 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                                                 
 7478 sergey    20   0 1254m 185m  37m S   6.3  6.4   0:23.11 firefox                                                                 
 1050 root      20   0  141m  76m  19m S   5.3  2.6   1:11.13 Xorg                                                                    
 2350 sergey    20   0  124m  27m  21m S   4.0  0.9   0:53.95 konsole                                                                 

I don't see google chrome here, it works cool. But firefox eats a lot of resources, even more then konsole+top!!

3. strace firefox shows a lot of redundant system calls... 

I'm affraid this bug will be never be fixed until new browser fork.

(Ku-home:~)$ sudo strace -p 7478
Process 7478 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 1
clock_gettime(CLOCK_MONOTONIC, {1491, 824188500}) = 0
read(24, "\372", 1)                     = 1
clock_gettime(CLOCK_MONOTONIC, {1491, 825447390}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 825728292}) = 0
gettimeofday({1371824339, 484272}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 825992424}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 826499725}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 826746249}) = 0
futex(0xb72dfb08, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb72dfb04, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
gettimeofday({1371824339, 485341}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 827039170}) = 0
recv(4, 0xb729a058, 4096, 0)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=24, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN}], 8, 0) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {1491, 827320072}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 827387433}) = 0
recv(4, 0xb729a058, 4096, 0)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=24, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN}], 8, 0) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {1491, 827616906}) = 0
gettimeofday({1371824339, 486063}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 827780975}) = 0
recv(4, 0xb729a058, 4096, 0)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=24, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN}], 8, 0) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {1491, 828001784}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 828131754}) = 0
recv(4, 0xb729a058, 4096, 0)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=24, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN}], 8, -1) = 1 ([{fd=24, revents=POLLIN}])
clock_gettime(CLOCK_MONOTONIC, {1491, 840264498}) = 0
read(24, "\372", 1)                     = 1
clock_gettime(CLOCK_MONOTONIC, {1491, 840942577}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 840997360}) = 0
gettimeofday({1371824339, 499416}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 841074224}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 841119783}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 841185466}) = 0
futex(0xb72dfb08, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb72dfb04, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
gettimeofday({1371824339, 499745}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 841457145}) = 0
recv(4, 0xb729a058, 4096, 0)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=24, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN}], 8, 0) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {1491, 841713451}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 841781930}) = 0
recv(4, 0xb729a058, 4096, 0)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=24, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN}], 8, 0) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {1491, 842009726}) = 0
gettimeofday({1371824339, 500457}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 842146124}) = 0
recv(4, 0xb729a058, 4096, 0)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=24, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN}], 8, 0) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {1491, 842359107}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 842428145}) = 0
recv(4, 0xb729a058, 4096, 0)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=24, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=37, events=POLLIN}], 8, -1) = 1 ([{fd=24, revents=POLLIN}])
clock_gettime(CLOCK_MONOTONIC, {1491, 855752139}) = 0
read(24, "\372", 1)                     = 1
clock_gettime(CLOCK_MONOTONIC, {1491, 855904189}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 855942481}) = 0
gettimeofday({1371824339, 514365}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 856105712}) = 0
clock_gettime(CLOCK_MONOTONIC, {1491, 856157141}) = 0
Flags: needinfo?(paul)

Comment 15

6 years ago
I see the same thing, a fast loop of EAGAIN polling

on my work desktop this eats up ~10% cpu, but occasionally it spins up much higher (sometimes pegging a CPU at 100%)

recvfrom(4, 0x7f86d64d4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=289, events=POLLIN}, {fd=11, events=POLLIN}, {fd=23, events=POLLIN}], 6, 0) = 0 (Timeout)
recvfrom(4, 0x7f86d64d4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=289, events=POLLIN}, {fd=11, events=POLLIN}, {fd=23, events=POLLIN}], 6, 0) = 0 (Timeout)
recvfrom(4, 0x7f86d64d4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=289, events=POLLIN}, {fd=11, events=POLLIN}, {fd=23, events=POLLIN}], 6, 0) = 0 (Timeout)
recvfrom(4, 0x7f86d64d4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=289, events=POLLIN}, {fd=11, events=POLLIN}, {fd=23, events=POLLIN}], 6, 0) = 0 (Timeout)
futex(0x7f86c4a1868c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f86c4a18688, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
recvfrom(4, 0x7f86d64d4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=289, events=POLLIN}, {fd=11, events=POLLIN}, {fd=23, events=POLLIN}], 6, 0) = 0 (Timeout)
recvfrom(4, 0x7f86d64d4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=289, events=POLLIN}, {fd=11, events=POLLIN}, {fd=23, events=POLLIN}], 6, 0) = 0 (Timeout)
recvfrom(4, 0x7f86d64d4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=289, events=POLLIN}, {fd=11, events=POLLIN}, {fd=23, events=POLLIN}], 6, 0) = 0 (Timeout)
recvfrom(4, 0x7f86d64d4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=289, events=POLLIN}, {fd=11, events=POLLIN}, {fd=23, events=POLLIN}], 6, 0) = 0 (Timeout)
futex(0x7f86c4a1868c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f86c4a18688, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
recvfrom(4, 0x7f86d64d4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=289, events=POLLIN}, {fd=11, events=POLLIN}, {fd=23, events=POLLIN}], 6, 0) = 1 ([{fd=21, revents=POLLIN}])

Comment 16

6 years ago
I've seen this forever, but am still seeing in in Aurora (23.0a2)
Duplicate of this bug: 774690
I can reproduce this problem on Macbook Air 2013

There is a striking difference in Amperage and expected battery time reported by MacOS and iStat Menus.

Without Firefox launched I get ~10 hours of battery life and Amperage reported is around ~420 mA.

With Firefox Nightly launched (fresh profile), with one tab on about:home, the Amperage jumps to ~1000 mA and battery life drops to below 4h.

All that time Firefox is reported to use ~10% of CPU.

(In reply to Jared Wein [:jaws] from comment #13)
> Can someone who is experiencing this issue run the profiler and attach a
> link here to their results? See
> https://developer.mozilla.org/en-US/docs/Performance/
> Reporting_a_Performance_Problem for instructions.

I installed the profiler, pressed "stop", "start", let it sit for 15 sec and then pressed "analyze".

http://people.mozilla.com/~bgirard/cleopatra/#report=93663ec84bc01b5d90b490deca3da6765f39be8f
Taras, is it possible for us to slow down the event loop when Firefox is idle?
Flags: needinfo?(taras.mozilla)
I'm going to resolve this bug as a dupe of bug 892023 only because there's such a low content/"me too" ratio here.

Let's get profiler data and other reproducible steps there
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 892023

Comment 21

6 years ago
Bug 892023 is talking about battery life.

Here we have numerous traces showing the very tight EAGAIN loop

If Firefox gets into such a loop, it will result in very poor battery life, but that is minor compared to the fact that it hoggs the CPU and makes everything in the system (including Firefox) perform badly.

That Bug seems to be talking about firefox when it's operating normally.
ah, ok, reopening then.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(In reply to Jared Wein [:jaws] from comment #19)
> Taras, is it possible for us to slow down the event loop when Firefox is
> idle?

I know nothing of mac event handling. BenWa likes to profile that
Flags: needinfo?(taras.mozilla)

Comment 24

6 years ago
several of us reporting are reporting abut running this on Linux, not Mac

on my system (Ubuntu 13.104 Firefox 23.0a2), lsof shows filehandle 4 is a unix socket 

dlang@dlang-desktop2:~$ lsof |grep firefox |grep " 4[a-z] "
firefox    4117            dlang    4u     unix 0x0000000000000000       0t0     64198 socket
firefox    4117  4214      dlang    4u     unix 0x0000000000000000       0t0     64198 socket
firefox    4117 26656      dlang    4u     unix 0x0000000000000000       0t0     64198 socket
firefox   19053            dlang    4u     unix 0x0000000000000000       0t0  18284466 socket
firefox   19053 22913      dlang    4u     unix 0x0000000000000000       0t0  18284466 socket
firefox   19053 22914      dlang    4u     unix 0x0000000000000000       0t0  18284466 socket
firefox   19053 22915      dlang    4u     unix 0x0000000000000000       0t0  18284466 socket
firefox   19053 22916      dlang    4u     unix 0x0000000000000000       0t0  18284466 socket
firefox   19053 22917      dlang    4u     unix 0x0000000000000000       0t0  18284466 socket
firefox   19053 22918      dlang    4u     unix 0x0000000000000000       0t0  18284466 socket
firefox   19053 22919      dlang    4u     unix 0x0000000000000000       0t0  18284466 socket
firefox   19053 22920      dlang    4u     unix 0x0000000000000000       0t0  18284466 socket


This looks to me like it's the different threads in firefox trying to talk to each other or something along those lines. This is a very rapid poll. I could understand doing a poll, but the fact that it's getting EAGAIN continuously seems to point to a problem of some sort

dlang@dlang-desktop2:~$ sudo strace -ttt -p 4117 2>&1 |head
Process 4117 attached - interrupt to quit
1373584753.277458 restart_syscall(<... resuming interrupted call ...>) = 1
1373584753.285141 read(21, "\372", 1)   = 1
1373584753.285390 recvfrom(4, 0x7f491fad4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
1373584753.285454 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=23, events=POLLIN}, {fd=64, events=POLLIN}, {fd=11, events=POLLIN}], 6, 0) = 0 (Timeout)
1373584753.285514 recvfrom(4, 0x7f491fad4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
1373584753.285551 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=23, events=POLLIN}, {fd=64, events=POLLIN}, {fd=11, events=POLLIN}], 6, 0) = 0 (Timeout)
1373584753.285602 recvfrom(4, 0x7f491fad4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
1373584753.285639 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=21, events=POLLIN}, {fd=23, events=POLLIN}, {fd=64, events=POLLIN}, {fd=11, events=POLLIN}], 6, 0) = 0 (Timeout)
1373584753.285688 recvfrom(4, 0x7f491fad4074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

Comment 25

6 years ago
(In reply to Jared Wein [:jaws] from comment #13)
> Can someone who is experiencing this issue run the profiler and attach a
> link here to their results? See
> https://developer.mozilla.org/en-US/docs/Performance/
> Reporting_a_Performance_Problem for instructions.

looking at the gecko profiler in the add-ons ( https://addons.mozilla.org/en-us/firefox/addon/gecko-profiler/ ), I get a message that it's not available for my platform, and I also see messages that it doesn't work with anything newer than 21

Is there any other way to get the needed info?

Updated

6 years ago
Status: REOPENED → NEW
Flags: needinfo?(david)
Whiteboard: [needs profile]

Comment 28

6 years ago
Jup, same issue here, Profiling was done: http://people.mozilla.org/~bgirard/cleopatra/#report=50e79ea4feae9dcb1de890220ca703dd6dd598ab
The issue is popping up sporadically, and and stays until I restart FireFox.
Flags: needinfo?(david)

Comment 30

5 years ago
Looks like this is making no progress. It appears that all the necessary information has been attached, and its assigned to no one.

I have a high-end i7 workstation that frequently burns 12-15% of one core. The activity shown with strace is the same as above (recvfrom() = -1 EAGAIN).
System details:
 - Thunderbird 24.3.0
 - Fedora 19, kernel 3.13.6-100.fc19.x86_64
Looks like Cycle Collector-related stuff.
Component: General → XPCOM
Product: Firefox → Core

Comment 32

5 years ago
Not sure if this is helpful, but in my strace output, I found that the failing poll was on fd=4 which looked like it was waiting on X11 events. Initially it polls for events and gets success,fail,success,fail... which seems normal:

socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0) = 4
connect(4, {sa_family=AF_LOCAL, sun_path=@"/tmp/.X11-unix/X0"}, 20) = 0
getpeername(4, {sa_family=AF_LOCAL, sun_path=@"/tmp/.X11-unix/X0"...}, [20]) = 0
uname({sys="Linux", node="localhost.nvr.com", ...}) = 0
access("/tmp/kde-toddb/xauth-500-_0", R_OK) = 0
open("/tmp/kde-toddb/xauth-500-_0", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0664, st_size=172, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb5eaa1f000
read(5, "\0\0\0\4\300\250\1e\0\0010\0\22MIT-MAGIC-COOKIE-1\0"..., 4096) = 172
close(5)                                = 0
munmap(0x7fb5eaa1f000, 4096)            = 0
getsockname(4, {sa_family=AF_LOCAL, NULL}, [2]) = 0
fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
poll([{fd=4, events=POLLIN|POLLOUT}], 1, 4294967295) = 1 ([{fd=4, revents=POLLOUT}])
writev(4, [{"l\0\v\0\0\0\22\0\20\0\0\0", 12}, {"", 0}, {"MIT-MAGIC-COOKIE-1", 18}, {"\0\0", 2}, {"\16V\6U\204\377\\\256i\263\312\331B\356\5.", 16}, {"", 0}], 6) = 48
recvfrom(4, 0x7fb5ea786368, 8, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}], 1, 4294967295) = 1 ([{fd=4, revents=POLLIN}])
recvfrom(4, "\1\0\v\0\0\0\322\5", 8, 0, NULL, NULL) = 8
recvfrom(4, "\340\2\256\0\0\0 \4\377\377\37\0\0\1\0\0\16\0\377\377\1\7\0\0  \10\377\0\0\0\0"..., 5960, 0, NULL, NULL) = 5960
poll([{fd=4, events=POLLIN|POLLOUT}], 1, 4294967295) = 1 ([{fd=4, revents=POLLOUT}])
writev(4, [{"b\0\5\0\f\0\0\0BIG-REQUESTS", 20}], 1) = 20
poll([{fd=4, events=POLLIN}], 1, 4294967295) = 1 ([{fd=4, revents=POLLIN}])
recvfrom(4, "\1\0\1\0\0\0\0\0\1\205\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 4096, 0, NULL, NULL) = 32
poll([{fd=4, events=POLLIN|POLLOUT}], 1, 4294967295) = 1 ([{fd=4, revents=POLLOUT}])
writev(4, [{"\205\0\1\0", 4}], 1)       = 4
poll([{fd=4, events=POLLIN}], 1, 4294967295) = 1 ([{fd=4, revents=POLLIN}])
recvfrom(4, "\1\0\2\0\0\0\0\0\377\377?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 4096, 0, NULL, NULL) = 32
recvfrom(4, 0x7fb5ea7c8074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)


... but then it goes over the edge and repeats this several times per second without letting up.  It appears that it gets response from the X server, but it is as if the two are beating against each other:

writev(4, [{"\17\0\2\0]\2\0\0", 8}, {NULL, 0}, {"", 0}], 3) = 8
poll([{fd=4, events=POLLIN}], 1, 4294967295) = 1 ([{fd=4, revents=POLLIN}])
recvfrom(4, "\1\0\20\0\361\0\0\0]\2\0\0\0\0\0\0\361\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0, NULL, NULL) = 996
recvfrom(4, 0x7fb5ea7c8074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(4, 0x7fb5ea7c8074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN|POLLOUT}], 1, 4294967295) = 1 ([{fd=4, revents=POLLOUT}])

Comment 33

5 years ago
for me it appears more likely that this is the dbus socket, this is assuming I am reading the output of lsof correctly. Is there a way to tell firefox not to use dbus?

recvmsg(4, 0x7fff67603aa0, 0)           = -1 EAGAIN (Resource temporarily unavailable)


$ lsof -n |grep ^firefox |grep  -w 4[rwu]
firefox   14409            dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
firefox   14409 14530      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
firefox   14409 14533      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket

$ lsof |grep 4434682
firefox   14409            dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
gdbus     14409 14415      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
gmain     14409 14416      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
dconf     14409 14417      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Gecko_IOT 14409 14418      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Socket    14409 14419      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
JS        14409 14420      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
JS        14409 14421      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Hang      14409 14422      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
BgHangMan 14409 14423      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Cache2    14409 14424      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Timer     14409 14425      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Network   14409 14427      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
DOM       14409 14428      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
DOM       14409 14429      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Analysis  14409 14430      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Analysis  14409 14431      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Cache     14409 14432      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
HTML5     14409 14433      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
ImageDeco 14409 14436      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
MediaMana 14409 14443      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
mozStorag 14409 14445      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
mozStorag 14409 14446      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
localStor 14409 14447      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
mozStorag 14409 14448      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
threaded- 14409 14449      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Cert      14409 14450      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Proxy     14409 14451      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
URL       14409 14454      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
mozStorag 14409 14455      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
mozStorag 14409 14456      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
mozStorag 14409 14467      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Image     14409 14474      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
firefox   14409 14530      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
firefox   14409 14533      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
Media     14409 14534      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
source:sr 14409 14537      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
aqueue:sr 14409 14538      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
DOM       14409 14548      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
mozStorag 14409 14576      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
threaded- 14409 14792      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
mozStorag 14409 15903      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket
DNS       14409 16274      dlang    4u     unix 0x0000000000000000       0t0    4434682 socket

Comment 34

5 years ago
I just ran strace and had this exact same problem.  Was wondering where the CPU cycles were being wasted and I'm glad to see I'm not the only one experiencing this.  I'm getting constant EAGAIN (Resource temporarily unavailable) messages and its wasting CPU.

Comment 35

5 years ago
I'm not sure whether this information is useful, but I've experienced this issue on multiple Thunderbird versions.  While 24.4 and a beta version both had significant CPU consumption, reverting to 24.0 improved the behavior significantly.  The loop still occurs in the strace, but it happens much less frequently.  Something evidently different in 24.0 appears to mitigate the issue.

Comment 36

5 years ago
Currently running 30.0a2 (22-04), Arch Linux 3.14.1-1. i7 2.4Ghz (boost 3.4). Firefox uses 2 - 5% idle, and once every minute it jumps for a htop cycle to cpu use anywhere between 20 - 50%. It seems a little more mild than some of you, but regardless my strace is full off "poll(fd=....".

Perhaps interesting to note, even though these polls seem to be "bad", when I run a powertop of 1min on my idle system, Firefox uses just 1% cpu on average. So perhaps the problem is not at all as big as it might seem. Especially because average cpu use is a sketchy calculation in itself to measure what I think is called "load".

Let me know if I can do anything further to help!

Updated

5 years ago
Flags: needinfo?(rvitillo)

Comment 37

5 years ago
many of us who are reporting this are only looking for the polling after firefox peggs a core, so in our cases it is taking a lot of cpu.

We may be mistaken at the cause, and it may be something other than this obvious polling, but this isn't "not as big as it seems"
I don't see any significant CPU consumption on my Linux machine during idle. I could see though quite many calls to poll (at least one every 5 seconds).

The "culprit" seems to be http://dxr.mozilla.org/mozilla-central/source/widget/gtk/nsAppShell.cpp#36 which is set by http://dxr.mozilla.org/mozilla-central/source/widget/gtk/nsAppShell.cpp#87 with g_main_context_set_poll_func(). Since we are basically just wrapping glib's poll function while glib is calling it, is not clear to me yet what we can do about it.
Flags: needinfo?(rvitillo)

Comment 39

5 years ago
The problem is not in function, problem is in usage. The obvious fix to set timeout parameter to 10000ms, it's definetly fix useless looping, but can't say anything about side effects.

Comment 40

5 years ago
even setting a short timeout will have a huge effect if this is busy-looping.

a 10s timeout is likely to be very noticable, but a 10ms one wouldn't, and would limit this to 100 loops/sec, which would probably have far less of an impact
The main-thread runs onProcessNextEvent which internally processes also the native events by executing glib’s event loop iterations which poll a bunch of fds and handle events from X11, dbus etc. 

I still can’t reproduce the high (>0) CPU usage during idle and I want to make sure that the problem is the polling in the first place. Could you please use perf on a build with debug symbols? I.e. start firefox, wait a couple of minutes and then collect a profile with:
perf record —c 100 g -p $(pgrep firefox)

and inspect it with:
perf report

It might also be worthwhile to run perf on the whole system (-a option).

Comment 43

5 years ago
this is not a problem that is directly triggerable. It's frequent, but we can't trigger it on demand.

If you want us to run a debug version and be able to run the profiler, then please make these easily available.

I tried using the profiler a while ago, but it wouldn't run with the aurora build at the time and I wasn't willing to invest the time to figure out what was wrong.

Same thing about getting a debug version. I'm not going to compile it myself, but Since I run Aurora from the Ubuntu PPA, if someone were to make a debugging PPA, I'd be happy to use it instead.

Please help us report the info you need by making it easier for us to do so.

Comment 44

5 years ago
David, 10 sec is good, it'll reveal side effects more quickly.

Roberto, sorry, don't have linux nearby, can not check right now. But if strace still shows this line:

  recvfrom(4, 0x7fb5ea7c8074, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

It looks like as serious program's defect for me! If there are any external events on any file handler the poll will returns immediately. If there is only reason to check gettimeofday() after EAGAIN - it's something badly programmed.

It should use different model
  int num=TIMEOUT_MESSAGE; usleep(wait_for); write(4,&num,sizeof(num));

not forever-loop masquaraded with poll(..,0):
    gettimeofday(&tv,&tz);
    if (tv.tv_sec>=wait_for.tv_sec && tv.tv_usec>=wait_for.tv_usec) timeoutOccured=1;
(In reply to David Lang from comment #43)
> I tried using the profiler a while ago, but it wouldn't run with the aurora
> build at the time and I wasn't willing to invest the time to figure out what
> was wrong.

Could you please give the Gecko profiler another shot? That seems like the easiest option. Thanks!
(In reply to Sergey from comment #44)
> David, 10 sec is good, it'll reveal side effects more quickly.
> 
> Roberto, sorry, don't have linux nearby, can not check right now. But if
> strace still shows this line:
> 
>   recvfrom(4, 0x7fb5ea7c8074, 4096, 0, 0, 0) = -1 EAGAIN (Resource
> temporarily unavailable)
> 
> It looks like as serious program's defect for me! 

That call is invoked within an event loop iteration of glib on a dbus related socket.

> If there is only reason to check gettimeofday() after EAGAIN - it's something badly
> programmed.

I don't see any calls to gettimeofday in my strace log.

We really need to analyze the problem with a profiler. Logs from strace are helpful in finding correlations but without breaking the cycles down by functions we can’t infer anything about causation. Since I can't reproduce it on my machine any help with collecting profiling reports is more than welcome.

What I think we have here is an artefact caused by a stream of events. For instance, when idling on about:blank I see a poll/recvfrom about every 5 seconds with strace. On the other hand if I open https://bug987212.bugzilla.mozilla.org/attachment.cgi?id=8395737, which triggers a bug that causes the refresh driver to activate every ~15ms, the frequency of the poll/recvfrom increases drastically. But If you would investigate that issue with a profiler it would tell you very clearly that the problem is in the refresh driver, not in the code related to poll & recvfrom.

Comment 47

5 years ago
gecko profiler output. for some reason it would never give me a url when I did the normal share thing, so I saved it to disk and am attaching it.
So some googleapi/plusone.js thing takes lots of cpu time.

Another interesting one, which might require some tweaking in Firefox is session saver.
Could you file a followup bug about that?
Something like "Investigate if Session Saver runs too often" and make that bug block this one.


I assume you profiled using a release version. Could you perhaps try nightly. That would give
better profile information (since it doesn't use pseudo stack information).
The profile does show some poll calls, but because of missing stack trace information it isn't
unfortunately too useful. So, if you could test Nightly. http://nightly.mozilla.org/

Comment 49

5 years ago
profile from nightly build
David, thanks for the profiles. Could you give us some more context, i.e.:

- How many tabs and which websites did you have open at the time you collected the profile? 
- What was the CPU usage reported by top at that time? 

Note that you should turn off the Gecko profiler before looking at the CPU usage reported by top or a similar tool. 

One of the pages you had open might have been performing some legitimate background activity causing a noticeable effect on the CPU consumption.

Comment 51

5 years ago
I don't know how many tab and windows I had open, I'd have to go through and count (but a few dozen windows and couple hundred tabs)

I had been running Aurora, I did an upgrade to the latest version, and performance went through the floor.

This is running the exact same tabs and windows as before the upgrade

if I switch away from firefox to a window running top, it takes 20+ seconds for the top display to stabilize with fire
fox eating ~70% cpu, during this stabilization time, I see xorg eating a large amount of cpu (up to 90%) with firefox
usually the next most busy item.

At this point it commonly takes several seconds from when I hit a control in firefox to the time that it takes effect.

David Lang
(In reply to David Lang from comment #51)
> I had been running Aurora, I did an upgrade to the latest version, and
> performance went through the floor.

In that case, it is possible that one of your addons has compatibility issues in the new version of Aurora. Can you try starting Firefox in safe mode[1] to check if that's the case?

Alternatively, you could also try using the "Reset Profile" feature accessible through the about:support page. It creates a new profile and only copies the absolute essentials from your current profile. You can always return to the original profile, which is left untouched.


[1]: https://support.mozilla.org/en-US/kb/troubleshoot-firefox-issues-using-safe-mode

Comment 53

5 years ago
I have been having this problem for years, I've tried safe mode, I've tried resetting my profile, they have not helped. This bug has been around for 5 years, reported by many people (and given the hassles of reporting, experienced by lots more), we've tried a lot of things over the years.

I assume that you will not accept this and I will go through this yet again.

A few posts up it sounded like you actually has a suspect as to what could be going on here, now it sounds like we're back at square one. :-(
(In reply to David Lang from comment #53)
> I have been having this problem for years, I've tried safe mode, I've tried
> resetting my profile, they have not helped. This bug has been around for 5
> years, reported by many people (and given the hassles of reporting,
> experienced by lots more), we've tried a lot of things over the years.

Ok, thanks for the explanation.

> 
> I assume that you will not accept this and I will go through this yet again.

Why wouldn't I accept that?

> 
> A few posts up it sounded like you actually has a suspect as to what could
> be going on here, now it sounds like we're back at square one. :-(

It seems you think that there is a unified "you" (or, from my perspective, "us") that makes decisions and assessments here. That's not the case: we're all engineers trying to shed light on the situation. I apologize for rehashing things that you clearly already ruled out, that wasn't my intention.
David, I beg to differ, I think we are making some progress here thanks to your inputs. I believe that what you are seeing might be due to a combination of known power gremlins (https://bugzilla.mozilla.org/show_bug.cgi?id=962573) , possibly unknown bugs and legitimate work being performed on the background. For instance, granted I recall correctly, this bug https://bugzilla.mozilla.org/show_bug.cgi?id=969325 causes Xorg's usage to spike on instagram.

Note that just because a page looks static, it doesn't necessarily mean that it's not performing any background work and the conditioned probability of that happening given a couple of hundred tabs open is quite high. Any activity that generates many events ends up causing a storm of poll/recvfrom with strace, i.e. the reason why this bug has been open for so long is that it's just too generic.

Unfortunately given the sheer amounts of tabs you have open we can't do much until we pinpoint which pages are causing this behavior. And once we identified some of the pages that cause high CPU consumption, it might very well be that the background activity they are performing just has to be done.
In other words, it would be great if you could keep an eye on your CPU consumption while you are browsing and if you happen to notice that it spikes after opening a specific page, then please report it. We are aware that some pages behave bad in terms of CPU (see power bugs above) and we are addressing those issues.

Comment 56

5 years ago
@till, I assumed that you wouldn't accept that because engineers usually don't ;-)

@roberto, the list of sites that I have open is actually pretty static. I'm one of those people who use open tabs as a sort of 'high priority bookmark', so many of those tabs are ones that I've had open for a long time.

If there's any way to identify which tab is using a lot of CPU, I'd be thrilled.

In the past, I've found that this bug has been more likely to be triggered with a lot of tabs than with any particular site being open. I've frequently been able to get out of the problem by closing some tabs and restarting firefox. It hasn't mattered much _which_ tabs I close (all-static tabs like lwn.net with ads disabled have frequently been enough)

If this FD is the communication to X, then given the high CPU of X.org when I switch screens, I wonder if the problem is triggered by something like running out of video ram to hold the window/tab contents?

Comment 57

5 years ago
Came across something interesting. poll...fd=4...; recvmsg...EAGAIN (Resource temporarily unavailable); are in all graphical applications on my Arch system. 

Before I go into that. The Gecko profile just shows some __poll_nocancel :? libc.so.6; taking 40 - 80% of ...whatever gets measured. I couldn't find any further information on what, or where this came from. Again if this does seem to be something firefox can do something about, at least for its own operation, let me know what I can do.

My CPU problem also seems resolved. I messed with some addons, and updated to 31, I don't recall anything else, but since my last post firefox now does not exceed 20% at any point while idle. Hovering around 2% mostly. The jumps are probably simply pages in the background doing the JS stuff. JS things fired by timers were the most intensive according to gecko after the strange poll thing.

In any case! It seems clear that the excessive calls problem is a lot broader at least on my system. I suspect the bumblebee workaround to get my optimus dGPU and iGPU playing nice, is not playing nice itself with some other part of the system. I'll also try to see if changing graphics drivers / using hardware acc. in firefox changes anything for at least firefox itself.
David, 

Could you tell which extensions you have installed on your profile? I tried reproducing the bug with your inputs but so far I had no luck. Your profile seems to suggest some activity related to one or more google plus buttons though (Bug 972473).

Yoric, there seems to be some session store related activity in his profile, do you think that it might be related to what David is experiencing considering the large amount of tabs open?
Flags: needinfo?(dteller)
I see lots of Google activity but I can't seem to find any trace of Session Restore in either profile. Where do you see it, Roberto?
Flags: needinfo?(dteller)
Yoric, I was referring to the following trace:

    ConvertUTF16toUTF8::write(char16_t const*, unsigned int) ??:?
    this.Utils<.makeURI() @ Utils.jsm:14
    EnterJIT
    js::RunScript
    SessionCookiesInternal._extractHostsFromEntry() @ SessionCookies.jsm:164
    SessionCookiesInternal._extractHostsFromEntry() @ SessionCookies.jsm:164
    SessionCookiesInternal.getHostsForWindow() @ SessionCookies.jsm:98
    SessionCookiesInternal.update() @ SessionCookies.jsm:54
    EnterJIT
    SessionCookiesInternal.update() @ SessionCookies.jsm:54
    this.SessionCookies<.update() @ SessionCookies.jsm:27
    js::RunScript
    SessionStoreInternal.getCurrentState() @ SessionStore.jsm:2029
    EnterJIT
    SessionStoreInternal.getCurrentState() @ SessionStore.jsm:2029
    this.SessionStore.getCurrentState() @ SessionStore.jsm:264
    js::RunScript
    SessionSaverInternal._saveState() @ SessionSaver.jsm:185
    SessionSaverInternal._saveStateAsync() @ SessionSaver.jsm:235
    SessionSaverInternal.runDelayed/this._timeoutID<() @ SessionSaver.jsm:151
    js::RunScript
    setTimeout_timer() @ Timer.jsm:28
    js::RunScript
    Timer::Fire
    Startup::XRE_Main
    (root)
Flags: needinfo?(david)

Comment 61

5 years ago
flashblock 1.5.17
gecko profiler 1.13.2
noscript 2.6.8.24rc2
oldbar 1.2
test pilot 1.2.3
ubunto firefox modifications 2.8 (disabled)


IcedTea web plugin (using IcedTea-web 1.5 (1.5-1ubuntu1)
shockwave flash 11.2.202.350
shockwave flash 12.1.r720
itunes application detector (disabled)

I don't know why there are two shockwave flash plugins, I'll disable the older one, but I doubt it will make much idfference
Flags: needinfo?(david)

Comment 62

5 years ago
I've disabled the old flash, oldbar, and test pilot plugins to simplify things (and it looks like today's batch of upgrades is going to update flash anyway)

Comment 63

5 years ago
one thing that I remembered sometimes seemed to help in the past was to shrink the size of sessionstore.js, either by closeing tabs or by erasing history

I tried closing a bunch of tabs, but it was still very slow, and a profile was 12M (too large to upload, but I have it if needed), I cleared history and things seem a bit better for now, and a new profile is only 7M

someone mentioned that they thought they saw some sessionstore stuff in the profile, I would be very suspicious of it.

I have a bunch of sessionstore files archived, many from times when I was having big problems with firefox (mostly of this type), sizes are:

rw------- 1 dlang dlang       0 Oct 28  2008 sessionstore-1.js
-rw------- 1 dlang dlang    4096 Oct 28  2008 sessionstore-2.js
-rw------- 1 dlang dlang  763900 Feb 12  2011 sessionstore.js.22
-rw------- 1 dlang dlang  296726 Feb 12  2011 sessionstore.js.21
-rw------- 1 dlang dlang  775776 Feb 12  2011 sessionstore.js.7
-rw------- 1 dlang dlang  604574 Feb 12  2011 sessionstore.js.del
-rw------- 1 dlang dlang  842071 Feb 12  2011 sessionstore.js.del10
-rw------- 1 dlang dlang  630567 Feb 12  2011 sessionstore.js.del11
-rw------- 1 dlang dlang  367880 Feb 12  2011 sessionstore.js.del13
-rw------- 1 dlang dlang  503272 Feb 12  2011 sessionstore.js.del14
-rw------- 1 dlang dlang  833865 Feb 12  2011 sessionstore.js.del16
-rw------- 1 dlang dlang 1011202 Feb 12  2011 sessionstore.js.del15
-rw------- 1 dlang dlang  794325 Feb 12  2011 sessionstore.js.del17
-rw-r--r-- 1 dlang dlang  784767 Feb 12  2011 sessionstore.js.del18
-rw------- 1 dlang dlang  539840 Feb 12  2011 sessionstore.js.del2
-rw-r--r-- 1 dlang dlang 1088975 Feb 12  2011 sessionstore.js.del19
-rw------- 1 dlang dlang  472198 Feb 12  2011 sessionstore.js.del3
-rw-r--r-- 1 dlang dlang 1037033 Feb 12  2011 sessionstore.js.del20
-rw------- 1 dlang dlang  836027 Feb 12  2011 sessionstore.js.del4
-rw------- 1 dlang dlang  734145 Feb 12  2011 sessionstore.js.del5
-rw------- 1 dlang dlang  666499 Feb 12  2011 sessionstore.js.del6
-rw------- 1 dlang dlang  649572 Feb 12  2011 sessionstore.js.del8
-rw------- 1 dlang dlang  692539 Feb 12  2011 sessionstore.js.del9
-rw------- 1 dlang dlang 1217245 May  2  2011 sessionstore.js.del23
-rw------- 1 dlang dlang 3761725 Nov 13  2011 sessionstore.del24
-rw------- 1 dlang dlang 2139395 Nov 13  2011 sessionstore.del25
-rw------- 1 dlang dlang 2948463 Nov 17  2012 sessionstore.del26
-rw------- 1 dlang dlang 2761747 Nov 17  2012 sessionstore.del27
-rw------- 1 dlang dlang 1687487 Apr 20  2013 sessionstore.js.28
-rw------- 1 dlang dlang 1346489 Oct  3  2013 sessionstore.del29
-rw------- 1 dlang dlang 1158491 Apr 28 19:57 sessionstore.del30
-rw------- 1 dlang dlang  397085 May 13 22:42 sessionstore.js
Ah, yeah, I see the sessionstore activity accounting for ~1% CPU. While it could explain spikes, it should work only every ~15 seconds, so if the process burns CPU continuously, that's probably not the main explanation.
Also, a sessionstore.js of ~400kb should be quite reasonable.

Comment 66

5 years ago
you missed that the 400KB sessionstore.js is after I closed about half my tabs and cleared history (and the problem diminished significantly). Normally my sessionstore.hs is more like the others listed, at 1MB or larger

Comment 67

5 years ago
Ubuntu 14.04 x64
Firefox 32.0.3
After 6 hours firefox constantly loads about 10-15% CPU, uses 1.5G of memory, freezes scrolling and typing. Even after closing all tabs freezing continues.

strace -C -p $(pidof firefox)
Many messages:
read(32, "\372", 1)                     = 1
recvmsg(4, 0x7fff80f0a600, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=12, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN}, {fd=32, events=POLLIN}, {fd=34, events=POLLIN}, {fd=14, events=POLLIN|POLLPRI}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fff80f0a600, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=12, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN}, {fd=32, events=POLLIN}, {fd=34, events=POLLIN}, {fd=14, events=POLLIN|POLLPRI}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fff80f0a600, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=12, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN}, {fd=32, events=POLLIN}, {fd=34, events=POLLIN}, {fd=14, events=POLLIN|POLLPRI}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fff80f0a600, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=12, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN}, {fd=32, events=POLLIN}, {fd=34, events=POLLIN}, {fd=14, events=POLLIN|POLLPRI}], 7, 4294967295) = 1 ([{fd=32, revents=POLLIN}])

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.00    0.003816          25       150           poll
  2.39    0.000095           1       157       151 recvmsg
  0.75    0.000030           1        37           read
  0.23    0.000009           1         9           futex
  0.23    0.000009           9         1           restart_syscall
  0.20    0.000008           3         3           write
  0.20    0.000008           1         6           writev
No longer blocks: 1077652
(In reply to Olli Pettay [:smaug] from comment #48)
> So some googleapi/plusone.js thing takes lots of cpu time.

bug 1015903

Comment 69

5 years ago
Firefox 33.0 on Ubuntu 14.04

I am seeing the exact same issue as vokuznetsov86@yandex.ru, but with cpu usage going to 150% (2 core). I have closed all tabse but one blank one, still churnning through CPU.

I know that closing that last tab will solve my issue, but it will also be back in a few days.

What other diagnostics can we run to help pinpointing this?

Comment 70

5 years ago
So I closed that last tab and got (note the process doesn't die, it stays stuck on the last futex, chewing my cpus):

```
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 0 (Timeout)
futex(0x7f1b98ee9040, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f1b98ee9040, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f1b5097e98c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f1b5097e928, 3856) = 1
futex(0x7f1b5097e870, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 1 ([{fd=27, revents=POLLIN}])
read(27, "\372", 1)                     = 1
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 0 (Timeout)
futex(0x7f1b98ee9040, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f1af7ade8dc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f1af7ade878, 10) = 1
futex(0x7f1af7ade500, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 1 ([{fd=27, revents=POLLIN}])
read(27, "\372", 1)                     = 1
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 0 (Timeout)
futex(0x7f1b98ee9040, FUTEX_WAKE_PRIVATE, 1) = 1
munmap(0x7f1b237ff000, 8392704)         = 0
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 0 (Timeout)
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, futex(0x7f1b98ee9040, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f1b98ee9040, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f1b5097e98c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f1b5097e928, 3856) = 1
futex(0x7f1b5097e870, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 1 ([{fd=27, revents=POLLIN}])
read(27, "\372", 1)                     = 1
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 0 (Timeout)
futex(0x7f1b98ee9040, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f1af7ade8dc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f1af7ade878, 10) = 1
futex(0x7f1af7ade500, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 1 ([{fd=27, revents=POLLIN}])
read(27, "\372", 1)                     = 1
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 0 (Timeout)
futex(0x7f1b98ee9040, FUTEX_WAKE_PRIVATE, 1) = 1
munmap(0x7f1b237ff000, 8392704)         = 0
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 0 (Timeout)
recvmsg(4, 0x7fffde2b56b0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN}, {fd=27, events=POLLIN}, {fd=9, events=POLLIN|POLLPRI}, {fd=26, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 0 (Timeout)
futex(0x7f1b97cdbe8c, FUTEX_WAIT_PRIVATE, 1, NULL
```

At this point, it's stuck, CPU usage is still at 150%. Once I kill it (simple kill, no -9):

```
) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4118, si_uid=1001} ---
unlink("/export/home/yves/.mozilla/firefox/mfohne4x.default/lock") = 0
close(11)                               = 0
rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7f1b983e4340}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
tgkill(31115, 31115, SIGTERM)           = 0
--- SIGTERM {si_signo=SIGTERM, si_code=SI_TKILL, si_pid=31115, si_uid=1001} ---
+++ killed by SIGTERM +++
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.05   37.456973        4513      8300       472 futex
  2.55    1.005900         123      8182           poll
  2.14    0.844023      844023         1           restart_syscall
  0.09    0.034559           4      8690      8175 recvmsg
  0.07    0.026227         170       154        11 read
  0.06    0.025495       12748         2           fsync
  0.01    0.005163          40       130           write
  0.01    0.004052        1351         3           rmdir
  0.00    0.001550           0      4001           madvise
  0.00    0.001522         190         8           getdents
  0.00    0.001144           2       471           writev
  0.00    0.000104           5        22           fcntl
  0.00    0.000047           3        17        13 access
  0.00    0.000044           6         8         2 open
  0.00    0.000038          19         2           rename
  0.00    0.000033           2        17           lstat
  0.00    0.000029           2        12           fstat
  0.00    0.000020           1        14           stat
  0.00    0.000019           1        15           close
  0.00    0.000015           5         3           unlink
  0.00    0.000013          13         1           rt_sigaction
  0.00    0.000013           3         4           openat
  0.00    0.000012           1        24           munmap
  0.00    0.000011           1        15           lseek
  0.00    0.000004           4         1           rt_sigprocmask
  0.00    0.000003           3         1           tgkill
  0.00    0.000000           0         5           mmap
  0.00    0.000000           0         2           clone
  0.00    0.000000           0         2           getrusage
------ ----------- ----------- --------- --------- ----------------
100.00   39.407013                 30107      8673 total

```
Yves, if you manage to reproduce the issue, could you share a report from the Gecko Profiler? See https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem. Thanks!

Comment 72

5 years ago
With Nightly 37.0a1 (2014-12-23), I get the following output from strace:

[pid 22584] futex(0x7fbf72039e4c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7fbf71e05f50, 931628 <unfinished ...>
[pid 22601] <... futex resumed> )       = 0
[pid 22601] futex(0x7fbf71e05f50, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 22601] futex(0x7fbf72039e4c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 931629, {1419372058, 363883000}, ffffffff <unfinished ...>
[pid 22584] <... futex resumed> )       = 1
[pid 22584] recvmsg(4, 0x7fffddedb4d0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 22584] poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=18, events=POLLIN}, {fd=19, events=POLLIN}, {fd=12, events=POLLIN}, {fd=24, events=POLLIN}], 6, 0) = 0 (Timeout)
[pid 22584] futex(0x7fbf72039e4c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7fbf71e05f50, 931630 <unfinished ...>
[pid 22601] <... futex resumed> )       = 0
[pid 22601] futex(0x7fbf71e05f50, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 22601] futex(0x7fbf72039e4c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 931631, {1419372058, 364398000}, ffffffff <unfinished ...>
[pid 22584] <... futex resumed> )       = 1
[pid 22584] recvmsg(4, 0x7fffddedb4d0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 22584] poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=18, events=POLLIN}, {fd=19, events=POLLIN}, {fd=12, events=POLLIN}, {fd=24, events=POLLIN}], 6, 0) = 0 (Timeout)
[pid 22584] futex(0x7fbf72039e4c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7fbf71e05f50, 931632 <unfinished ...>
[pid 22601] <... futex resumed> )       = 0
[pid 22601] futex(0x7fbf71e05f50, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 22601] futex(0x7fbf72039e4c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 931633, {1419372058, 364912000}, ffffffff <unfinished ...>
[pid 22584] <... futex resumed> )       = 1
[pid 22584] recvmsg(4, 0x7fffddedb4d0, 0) = -1 EAGAIN (Resource temporarily unavailable)

CPU usage is 100%.

22584 is the main firefox process, 22601 is BgHangManager.

I just opened ff (with 16 tabs open from last session, but I only clicked into
one or two). I then installed the Gecko profiler, and without a restart,
clicked analyze. During this whole time, I had mysqld running at 100%, but
it's probably unrelated.

about:support json info: http://yhager.com/~yuval/about-support.json

Gecko profile: http://yhager.com/~yuval/lOLwY4ZL.bin (this is ~19MB, I was
unable to upload it using the share button).
Could you retry with Gecko profiler.
I would install it, restart browser, stop the profiler, then start it, analyze and share.
And don't try to make too large profile.

(And please make the profile when you don't have other processes taking 100% cpu, especially if those might be IO heavy. Or generate a separate profile when such process is running.)

Comment 74

5 years ago
Okay, will try again when it happens again, and update here. Since last time, I've disabled the "Lazarus Form Recovery" extension, as I think it might have been the culprit in my case.

Comment 75

4 years ago
My attention was brought to this bug via LWN. For me Firefox using 30% is *normal*.

Anyway, I can fill in some of the missing details that might explain some of the traces above. Firstly, for others using strace, please use the -tt -f flags, otherwise the output is not useful.

Like above, my strace looks as follows, repeated endlessly:

3099  09:01:36.716125 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=19, events=POLLIN}, {fd=18, events=POLLIN}, {fd=43, events=POLLIN}, {fd=12, events=POLLIN}], 6, 0) = 0 (Timeout)
3099  09:01:36.717104 recvmsg(4,  <unfinished ...>
3099  09:01:36.717175 <... recvmsg resumed> 0x7fffd49ce3d0, 0) = -1 EAGAIN (Resource temporarily unavailable)
3116  09:01:36.717212 write(20, "\372", 1 <unfinished ...>
3099  09:01:36.717265 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=19, events=POLLIN}, {fd=18, events=POLLIN}, {fd=43, events=POLLIN}, {fd=12, events=POLLIN}], 6, 0 <unfinished ...>
3116  09:01:36.717323 <... write resumed> ) = 1
3099  09:01:36.717360 <... poll resumed> ) = 1 ([{fd=19, revents=POLLIN}])
3099  09:01:36.717459 read(19, "\372", 1) = 1
3099  09:01:36.717584 recvmsg(4, 0x7fffd49ce3d0, 0) = -1 EAGAIN (Resource temporarily unavailable)
3099  09:01:36.717705 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=19, events=POLLIN}, {fd=18, events=POLLIN}, {fd=43, events=POLLIN}, {fd=12, events=POLLIN}], 6, 0) = 0 (Timeout)
3099  09:01:36.717862 recvmsg(4, 0x7fffd49ce3d0, 0) = -1 EAGAIN (Resource temporarily unavailable)
3099  09:01:36.717985 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=19, events=POLLIN}, {fd=18, events=POLLIN}, {fd=43, events=POLLIN}, {fd=12, events=POLLIN}], 6, 0) = 0 (Timeout)
3116  09:01:36.718667 write(20, "\372", 1) = 1
3099  09:01:36.718869 recvmsg(4, 0x7fffd49ce3d0, 0) = -1 EAGAIN (Resource temporarily unavailable)
3099  09:01:36.719007 poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=19, events=POLLIN}, {fd=18, events=POLLIN}, {fd=43, events=POLLIN}, {fd=12, events=POLLIN}], 6, 0) = 1 ([{fd=19, revents=POLLIN}])
3099  09:01:36.719167 read(19, "\372", 1) = 1

What's causing all the wakeups is thread 3116 doing a write every millisecond. So what is thread 3116? Here is the GDB stacktrace with a breakpoint on the write syscall:

$ bt
#0  0x00007fcd9fdb935d in write () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fcd9a5f7f1d in nsAppShell::ScheduleNativeEventCallback (this=<optimised out>) at /build/buildd/firefox-35.0.1+build1/widget/gtk/nsAppShell.cpp:150
#2  0x00007fcd9a5dc286 in nsBaseAppShell::OnDispatchedEvent (this=0x14, thr=0x7fcd8aefec5f) at /build/buildd/firefox-35.0.1+build1/widget/xpwidgets/nsBaseAppShell.cpp:235
#3  0x00007fcd99a02a52 in nsThread::PutEvent (this=0x7fcd9eef6530, aEvent=0x7fcd5ce7abd8, aTarget=<optimised out>) at /build/buildd/firefox-35.0.1+build1/xpcom/threads/nsThread.cpp:509
#4  0x00007fcd99a05557 in nsTimerImpl::PostTimerEvent (aTimerRef=...) at /build/buildd/firefox-35.0.1+build1/xpcom/threads/nsTimerImpl.cpp:768
#5  0x00007fcd99a00d83 in TimerThread::Run (this=0x7fcd9eee8a10) at /build/buildd/firefox-35.0.1+build1/xpcom/threads/TimerThread.cpp:279
#6  0x00007fcd99a0333d in nsThread::ProcessNextEvent (this=0x7fcd8affc270, aMayWait=<optimised out>, aResult=0x7fcd8aefeddf) at /build/buildd/firefox-35.0.1+build1/xpcom/threads/nsThread.cpp:830
#7  0x00007fcd99a13fe9 in NS_ProcessNextEvent (aThread=<optimised out>, aMayWait=aMayWait@entry=0x0) at /build/buildd/firefox-35.0.1+build1/xpcom/glue/nsThreadUtils.cpp:265
#8  0x00007fcd99bc8b3c in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0x7fcd908c7f40, aDelegate=0x7fcd9ee8bf40) at /build/buildd/firefox-35.0.1+build1/ipc/glue/MessagePump.cpp:339
#9  0x00007fcd99bba6eb in RunHandler (this=0x7fcd9ee8bf40) at /build/buildd/firefox-35.0.1+build1/ipc/chromium/src/base/message_loop.cc:223
#10 MessageLoop::Run (this=this@entry=0x7fcd9ee8bf40) at /build/buildd/firefox-35.0.1+build1/ipc/chromium/src/base/message_loop.cc:197
#11 0x00007fcd99a02e8f in nsThread::ThreadFunc (aArg=0x7fcd8affc270) at /build/buildd/firefox-35.0.1+build1/xpcom/threads/nsThread.cpp:350
#12 0x00007fcd9e7d8478 in _pt_root (arg=0x7fcd908da100) at /build/buildd/firefox-35.0.1+build1/nsprpub/pr/src/pthreads/ptthread.c:212
#13 0x00007fcd9fdb2182 in start_thread (arg=0x7fcd8aeff700) at pthread_create.c:312
#14 0x00007fcd9f2d0efd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

So, it's some kind of timer thread. What's curious is that it's not while putting the event on the queue, but rather while it's calling mObserver->OnDispatchedEvent.

I tried working what the timer is, but couldn't work out where to look. Going up to TimerThread::Run and printing the timer variable produced the following. It means nothing to me though. Perhaps there is another way to see what timer events are being fired.

Breakpoint 4, TimerThread::Run (this=0x7fcd9eee8a10) at /build/buildd/firefox-35.0.1+build1/xpcom/threads/TimerThread.cpp:252
252	/build/buildd/firefox-35.0.1+build1/xpcom/threads/TimerThread.cpp: No such file or directory.
gdb-peda$ p timer
$10 = (nsTimerImpl *) 0x7fcd424a6a60
gdb-peda$ p *timer
$11 = {
  <nsITimer> = {
    <nsISupports> = {
      _vptr.nsISupports = 0x7fcd9d07bfd0 <vtable for nsTimerImpl+16>
    }, <No data fields>}, 
  members of nsTimerImpl: 
  mRefCnt = {
    static isThreadSafe = 0x1, 
    mValue = {
      <mozilla::detail::AtomicBaseIncDec<unsigned long, (mozilla::MemoryOrdering)2>> = {
        <mozilla::detail::AtomicBase<unsigned long, (mozilla::MemoryOrdering)2>> = {
          mValue = {
            <std::__atomic_base<unsigned long>> = {
              _M_i = 0x2
            }, <No data fields>}
        }, <No data fields>}, <No data fields>}
  }, 
  mEventTarget = {
    <nsCOMPtr_base> = {
      mRawPtr = 0x7fcd9eef6530
    }, <No data fields>}, 
  mClosure = 0x0, 
  mCallback = {
    c = 0x7fcd99ff4e81 <CCTimerFired(nsITimer*, void*)>, 
    i = 0x7fcd99ff4e81 <CCTimerFired(nsITimer*, void*)>, 
    o = 0x7fcd99ff4e81 <CCTimerFired(nsITimer*, void*)>
  }, 
  mTimerCallbackWhileFiring = {
    <nsCOMPtr_base> = {
      mRawPtr = 0x0
    }, <No data fields>}, 
  mCallbackType = 0x2, 
  mType = 0x1, 
  mFiring = 0x0, 
  mArmed = 0x1, 
  mCanceled = 0x0, 
  mGeneration = 0x10f3ce, 
  mDelay = 0x190, 
  mTimeout = {
    mValue = 0x148a4f2edad8
  }, 
  mStart = {
    mValue = 0x0
  }, 
  mStart2 = {
    mValue = 0x0
  }, 
  static sDeltaSum = 0, 
  static sDeltaSumSquared = 0, 
  static sDeltaNum = 0
}

Hope this helps,
That log points the finger at the cycle collector (CCTimerFired).
CC timer should run only if something had created possible garbage.

Martijn, could you perhaps test Nightly and install Gecko profiler[1] and share the
profile?




[1] https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler
Martijn, you might want to also set javascript.options.mem.log to true in about:config
to see why CC seems to run so often. The log is printed to Browser Console.
You could then copy the log to a file and attach to this bug.

Comment 79

4 years ago
Ok, so just to avoid confusion I went back to my GDB debugger and got it to print out the various values of timer->mCallback every time it fires. The CCTimerFired appears but it's not the most common, that's nsGlobalWindow::TimerCallback(nsITimer*, void*). However, gdb is so slow I can't rule out it interfering with the process.

I'm trying the nightly but just trying to type a comment here crashes all the tabs. I can verify that I'm getting 30% usage CPU even when all tabs are crashed, which I find impressive. I can also verify the tab recovery works pretty well :/.

Anyway, what I'm going to try and do is start the nightly and just leave the profiler running while doing nothing, since the process seems fairly flaky.
Any chance you could try a nightly build and the Gecko profiler and share the profile?
Do that with e10s disabled to get similar to release builds behavior.


nsGlobalWindow::TimerCallback(nsITimer*, void*) hints that some page or addon is using
lots of setTimeout or setInterval.

Comment 81

4 years ago
Ok, so I got a nightly build with the profiler. However, when I click the share button it says "Uploading", "waiting for server side compression", "Error 0 occurred uploading your file". Which appears to mean it didn't work. I've saved it locally (9MB) and I'll try again in the morning.

Another data point though, when I ran the nightly without any plugins it did indeed to go zero usage, though reenabling the profiler added quite a bit of overhead by itself.

Comment 82

4 years ago
Ok, I finally got an upload to succeed:

http://people.mozilla.org/~bgirard/cleopatra/#report=0cc2f9fce8d90d593fb8dbc6a5aa2dddf042dc98&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A540966,%22end%22%3A569890%7D%5D

Though I'm not sure it will help. It shows the time is going into the timer thread, but we knew that already. It looks like whatever the timer is firing is so cheap it doesn't show up.
That is very odd looking profile. Never seen anything like it.

Nathan, does that ring any bells?
Flags: needinfo?(nfroyd)

Comment 84

4 years ago
What I'm going to try to do next (when I get time) is figure out where in the code to drop a breakpoint to find out what is scheduling the timer and get a backtrace of that. But to me it appears that the overhead of the timer is by itself so high that firing a timer every millisecond means high CPU usage, even if the thing being triggered is a NOP.

Any tips on where I should be looking?
Martijn, I have used this patch [1] in the past to print the name of the callbacks invoked by the timer thread. This works if the callbacks are exported in the dynamic symbol table, which on OSX are by default. Thanks for looking into it!

[1] https://gist.github.com/vitillo/8937398
(In reply to Martijn from comment #82)
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=0cc2f9fce8d90d593fb8dbc6a5aa2dddf042dc98&filter=%5B%7B%22type%22%3A%2
> 2RangeSampleFilter%22,%22start%22%3A540966,%22end%22%3A569890%7D%5D

(In reply to Olli Pettay [:smaug] (review overload. No more review requests before Feb 8, please) from comment #83)
> That is very odd looking profile. Never seen anything like it.
> 
> Nathan, does that ring any bells?

I apparently don't see the same profile you see.  That profile (in the filtered range linked above or in the complete view) shows me 94% of the time being spent in poll().  Is that what you see?
Flags: needinfo?(nfroyd) → needinfo?(bugs)
I see the same, and I've never seen that kind of profile on 64bit linux.
But indeed, perhaps it is just that I don't have the symbols for libc
Flags: needinfo?(bugs)

Comment 88

4 years ago
the straces that people have posted show a thread in a constant polling loop (with -EAGAIN), the question has always been what does firefix think it's doing. the profiles that have been posted haven't seemed to help
I have seen similar profiles when I was tracking power issues (see Bug 962573); timer callbacks with few instructions might not show up in profiles. In most cases I was able to find the culprit by following comment 85.

Comment 90

4 years ago
Profiler trace from another high cpu incident:

http://yves.zioup.com/mozilla/A+wX7EDR/A+wX7EDR.bin.lz

Comment 91

4 years ago
Another profile trace of the same incident, with more "red" in it"

http://people.mozilla.org/~bgirard/cleopatra/#report=a96339d10e182daa9eb4e31d4812c6728c0ec317

Comment 92

4 years ago
blank_window.out: strace of a blank firefox window doing nothing consuming 100% cpu.

closing_the_window.out: strace of the fireforx process as I close the window, and the process survives.

no_window_process_eating_100_percent_cpu_and_crash_on_it_s_own.out: strace of the process once the window is closed and until firefox crashes.

Comment 93

4 years ago
Firefox 36.0 on Ubuntu 14.04

100% CPU even after closing all the tab but on blank one.
Process left around after the windows was closed.
The process crashed as I started to strace it.

I have captured a Gecko Profiler profile:
http://people.mozilla.org/~bgirard/cleopatra/#report=daf67af64eabc7eb2bf2bb3025301844ab40426b

and 3 strace of the process (attached to this ticket):
https://bugzilla.mozilla.org/attachment.cgi?id=8572710&action=edit

Comment 94

4 years ago
(In reply to rvitillo@mozilla.com from comment #71)
> Yves, if you manage to reproduce the issue, could you share a report from
> the Gecko Profiler? See
> https://developer.mozilla.org/en-US/docs/Mozilla/Performance/
> Reporting_a_Performance_Problem. Thanks!

Please see my latest crassh report above:
https://bugzilla.mozilla.org/show_bug.cgi?id=508427#c93
(In reply to Yves Dorfsman from comment #93)
> I have captured a Gecko Profiler profile:
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=daf67af64eabc7eb2bf2bb3025301844ab40426b
This one is probably about "Page speed" addon (or is it perhaps "ProCon Latte Content Filter" since also that seems to have ComponentCollectorService) or about mdn.net page.

Comment 96

4 years ago
strace files of when firefox was 100% CPU (pagespeed disabled).

See comment#96

Comment 97

4 years ago
(In reply to rvitillo@mozilla.com from comment #71)
> Yves, if you manage to reproduce the issue, could you share a report from
> the Gecko Profiler? See
> https://developer.mozilla.org/en-US/docs/Mozilla/Performance/
> Reporting_a_Performance_Problem. Thanks!

(In reply to Olli Pettay [:smaug] from comment #95)
> (In reply to Yves Dorfsman from comment #93)
> This one is probably about "Page speed" addon (or is it perhaps "ProCon
> Latte Content Filter" since also that seems to have
> ComponentCollectorService) or about mdn.net page.


100% CPU
killed plugin_container
still at 100% cpu

Firefox 36.0.1 on Ubuntu 14.04 x64
PageSpeed disabled

strace files:
https://bugzilla.mozilla.org/attachment.cgi?id=8581647

Gecko profile:
http://people.mozilla.org/~bgirard/cleopatra/#report=cb91e6f7fb13b7ef5c006ac542e85a2c91f3d537

Comment 98

4 years ago
I was also affected by this bug so today I tried to eliminate the issue by disabling some extensions. After disabling/enabling 30 extensions back and forth, I found 2 offending extensions: Self-Destructing Cookies and Pushbullet. Disabling those 2 extensions helped getting rid of tons of poll syscall and now my Firefox sits quietly in background.

I'm sure those are not the only faulty ones but I recommend checking if you have them installed & enabled.
Filed as bug 1167833 to auto-detect such add-ons. This shouldn't be too hard, once the add-on performance monitoring infrastructure is firmly in place.

Comment 100

4 years ago
This has been killing me now for maybe three months.  It may have existed to some extent prior to that, but maybe not too.  I generally don't update utilities so I could have had whatever firefox somes with FC20.
 
Symptoms similar to those described above: for no particular reason firefox sucks up 1.5GB of memory and spins away about 20% of my cpu capacity.

I noticed that I could often get some temporary improvement by killing plugin-container.  typically when experiencing the problem minimizing all windows does not mitigate the problem.  I've found it necessary to kill FF and restart it.

I read the thread here and went disabled all extensions and have all plugins set to ask.  I've authorized no plug in (at this moment).  The only exception is cisco openh624 which is set to is never.  Killed and restarted firefox.

In this configuration I can express/unexpress the problem depending on which URL are exposed.  Right now firefox is using ~5% (4-7) memory usage seems to get down to the high five hundred meg then work up to the high seven hundreds, then click down.  In the time it's taken me to type this the CPU ticked up to 13% briefly.  I have three windows open.  One is the add-on manager (one tab), one has two tabs (market watch and google search). The third window is this one (one tab) bugzilla.mozilla.org.

In the two tab window, the hidden tab has the marketwatch site.  The exposed tab is the search page "firefox chewing up 20% of my cpu time".  As soon as I expose the marketwatch tab my usage jumps into the twenty some odd percent.  Hide that tab and it calms back down to the 'normal' condition above.  I can actually hear the difference in my CPU fan speed.  It is not necessary to dismiss the tab with the marketwatch site - only to unexpose it.

Expose MW, twenty.  Hide MW three.  I often have the issue (nearly always, in fact) and I don't typically have this one site exposed, so I think there is some technique used on that site which is also used other places.  

I'm guessing here, but this site uses auto refresh?

----
Linux wichita.pacificbreeze 3.19.8-100.fc20.x86_64 #1 SMP Tue May 12 17:08:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

   FF 38.0.5
dmi
   BIOS Information
	   Vendor: Award Software International, Inc.
	   Version: F11
           Release Date: 05/17/2010
   Product Name: GA-MA785GM-US2H F1
   Processor Information
	Socket Designation: Socket M2
	Type: Central Processor
	Family: Athlon
	Manufacturer: AMD
	ID: 23 0F 10 00 FF FB 8B 17
    Handle 0x0008, DMI type 6, 12 bytes
    Memory Module Information
	Socket Designation: A2
	Bank Connections: 3
	Current Speed: 32 ns
	Type: Other Unknown EDO
	Installed Size: 2048 MB (Double-bank Connection)
	Enabled Size: 2048 MB (Double-bank Connection)
	Error Status: OK

    Handle 0x0009, DMI type 6, 12 bytes
    Memory Module Information
	Socket Designation: A3
	Bank Connections: 4
	Current Speed: 32 ns
	Type: Other Unknown EDO
	Installed Size: 2048 MB (Double-bank Connection)
	Enabled Size: 2048 MB (Double-bank Connection)
(In reply to douglas from comment #100)
> I noticed that I could often get some temporary improvement by killing
> plugin-container.
Sounds like a Flash issue, assuming you were using Flash and not other plugins.



> In the two tab window, the hidden tab has the marketwatch site.  The exposed
> tab is the search page "firefox chewing up 20% of my cpu time".  As soon as
> I expose the marketwatch tab my usage jumps into the twenty some odd
> percent. 
What is marketwatch? URL? http://www.marketwatch.com/ or something else?
Based on Gecko profiler profile http://www.marketwatch.com/ seems to run some rather heavy script all the time.
http://www.marketwatch.com/  seems to be rather cpu heavy also on other browsers.
(Testing on 64bit Fedora 22)

Comment 103

4 years ago
Yes: marketwatch.com.

I visited the mozilla site and they recommended adding ad blocker, script blocker and flash blocker plug-ins.  This accomplished I can not use marketwatch.  That's a loss - however by selectively allowing only the host site to run scripts FF is now usable for most other sites.

Flash is an issue.  However, with the blockers installed the thing behaves far more sanely.  Now on a heavy flash site the container chews up a lot of time, especially when establishing connections.  Firefox rarely uses much CPU and does not suck up all memory.  Even when the system monitor says the container is using 20% the rest of the system (and other FF tabs/windows) remain usable.  Prior to the blockers the browser would crush my whole system.  I've long suspected that browsers have used way too aggressive locking and can windup as uninterruptible.

So I'm guessing your intuition (in the following comment) is correct and the issue is scripting related.

In my day job I'm doomed to various windows machines and I use firefox there as well.  No issues.  I briefly used Chrome on this box and at the same revision levels for everything.  No issues.  Then Chrome updated and all new windows come up full screen.  Evidently no way to tweak this : yum remove google-chrome.
We also came across the described behavior (cf., https://forge.univention.org/bugzilla/show_bug.cgi?id=38866). I post this here, as the issue is reproducible with one of our VM images (https://www.univention.com/downloads/ucs-download/preinstalled-vm-images/download-vm-image/) which contain a Debian based server system.

Some words to understand the context... after the boot process, we launch Firefox in a plain X session to allow the user to configure the server system via a web based wizard. And this automatically launched Firefox instance for the setup process shows the same behavior which is described here.

The best way to access the system to inspect the Firefox process would be either by hitting CTRL+SHIFT+K when the wizard starts or via SSH with the user root when the setup process has been finished and the "successful installation" page is being shown (in the next step, Firefox is being shut down).

I hope this piece of information may help to identify the problem.

Comment 105

4 years ago
The problem comes from a bad/closed socket after talking with the local X server:
connect(4, {sa_family=AF_LOCAL, sun_path=@"/tmp/.X11-unix/X0"}, 20) = 0
getpeername(4, {sa_family=AF_LOCAL, sun_path=@"/tmp/.X11-unix/X0"}, [20]) = 0
...
recvmsg(4, 0xbff835b4, 0)               = -1 EAGAIN (Resource temporarily unavailable)

It is possible if plugins exist and you are not in safe mode for this to be descriptor 22 (or other).
To see if your problem is the same, use the simple repeat and trace steps below.

Firefox: 40.0.3
Fedora release 22 (Twenty Two), PAE i686, everything is up to date (dnf)

Not having pawed through the code, but seeing this annoyance for a while now I ran:
$ mv .mozilla/ .mozilla.old
$ strace firefox -safe-mode > FF.out 2>&1

Let Firefox sit idle for 10-30sec+, then quit.

The goal is to find what descriptor 4 is (was).

In less I searched for the first repeating 'Resource temporarily unavailable', in my case descriptor 4.
Now just back up from that point to find out what 4 is/was.

You see opens/closes using 4 like (which aren't it):
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=2492, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb775e000
read(4, "# Locale name alias data base.\n#"..., 4096) = 2492
read(4, "", 4096)                       = 0
close(4)                                = 0

The problem comes from talking to the X server..
futex(0xb62c4c7c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0) = 4             <-- not this instance
connect(4, {sa_family=AF_LOCAL, sun_path="/run/user/1001/:0"}, 20) = -1 ENOENT (No such file or directory)
close(4)                                = 0                   <-- 4 is closed here
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0) = 4             <- this is it!!!!!!!!!!!!!!!!!!!
connect(4, {sa_family=AF_LOCAL, sun_path=@"/tmp/.X11-unix/X0"}, 20) = 0
getpeername(4, {sa_family=AF_LOCAL, sun_path=@"/tmp/.X11-unix/X0"}, [20]) = 0
uname({sysname="Linux", nodename="<removed>", ...}) = 0
access("/run/user/1001/gdm/Xauthority", R_OK) = 0
open("/run/user/1001/gdm/Xauthority", O_RDONLY) = 5
fstat64(5, {st_mode=S_IFREG|0700, st_size=120, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7736000
read(5, "\1\0\0\20jtosh.aftinc.net\0\0\0\22MIT-MAGI"..., 4096) = 120
close(5)                                = 0
munmap(0xb7736000, 4096)                = 0
getsockname(4, {sa_family=AF_LOCAL, NULL}, [2]) = 0
fcntl64(4, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
poll([{fd=4, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
writev(4, [{"l\0\v\0\0\0\22\0\20\0\0\0", 12}, {"", 0}, {"MIT-MAGIC-COOKIE-1", 18}, {"\0\0", 2}, {"\321\326(I\245<\22\247\342k\271\305tJ\350T
", 16}, {"", 0}], 6) = 48
recv(4, "\1\0\v\0\0\0\362\0", 8, 0)     = 8
recv(4, "\360\216\262\0\0\0\240\2\377\377\37\0\0\1\0\0\16\0\377\377\1\7\0\0  \10\377\0\0\0\0"..., 968, 0) = 968
poll([{fd=4, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
writev(4, [{"b\0\5\0\f\0\0\0BIG-REQUESTS", 20}], 1) = 20
poll([{fd=4, events=POLLIN}], 1, -1)    = 1 ([{fd=4, revents=POLLIN}])
recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"\1\0\1\0\0\0\0\0\1\205\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 4096}], msg_controllen=0, m
sg_flags=0}, 0) = 32
poll([{fd=4, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
writev(4, [{"\205\0\1\0", 4}], 1)       = 4
poll([{fd=4, events=POLLIN}], 1, -1)    = 1 ([{fd=4, revents=POLLIN}])
recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"\1\0\2\0\0\0\0\0\377\377?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 4096}], msg_controllen=0, 
msg_flags=0}, 0) = 32
recvmsg(4, 0xbff835b4, 0)               = -1 EAGAIN (Resource temporarily unavailable)
...
(and from now on descriptor 4 continues to repeat the error until FF is stopped).
Whiteboard: [needs profile] → [needs profile][Power]
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #85)
> Martijn, I have used this patch [1] in the past to print the name of the
> callbacks invoked by the timer thread. This works if the callbacks are
> exported in the dynamic symbol table, which on OSX are by default. Thanks
> for looking into it!
> 
> [1] https://gist.github.com/vitillo/8937398

A version of this code has been landed in the tree. See https://developer.mozilla.org/en-US/docs/Mozilla/Performance/TimerFirings_logging for how to use it.
Summary: Firefox process chews through CPU even when in the background (or save my laptop battery) → High CPU on Linux even when in the background, possibly due to frequent polling
Whiteboard: [needs profile][Power] → [needs profile][Power:P3]

Comment 107

4 years ago
i don't think it is timer related because when idle (blank page and in background) in --safe-mode and with NSPR_LOG_MODULES=TimerFirings:4 it doesnt show more than 1-2 firings per few seconds, yet, strace shows constant FLOOD of:
recvmsg(5, 0xbf8ddcb8, 0)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN}, {fd=30, events=POLLIN}], 8, 0) = 0 (Timeout)

and by FLOOD i mean tens/hundreds of those per SECOND. and as previously mentioned they come because of bad/closed X socket!

Comment 108

4 years ago
forgot to mention: firefox 43.0.2, linux x86. also, strace ran with -r (relative time between syscalls it suggests its not hundreds but thousands/tens of thousands of calls per second):
     0.000972 recvmsg(4, 0xbfbb4d98, 0) = -1 EAGAIN (Resource temporarily unavailable)
     0.000151 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=9, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=28, events=POLLIN}, {fd=29, events=POLLIN}], 8, 0) = 0 (Timeout)
     0.000186 recvmsg(4, 0xbfbb4d68, 0) = -1 EAGAIN (Resource temporarily unavailable)
     0.000122 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=9, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=28, events=POLLIN}, {fd=29, events=POLLIN}], 8, 0) = 0 (Timeout)
     0.000190 recvmsg(4, 0xbfbb4d98, 0) = -1 EAGAIN (Resource temporarily unavailable)
     0.000099 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=9, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=28, events=POLLIN}, {fd=29, events=POLLIN}], 8, 0) = 0 (Timeout)
     0.000212 recvmsg(4, 0xbfbb4d68, 0) = -1 EAGAIN (Resource temporarily unavailable)

Comment 109

4 years ago
(In reply to Jeff Buhrt from comment #105)
> The problem comes from a bad/closed socket after talking with the local X
> server:
> 
> [...]
>
> It is possible if plugins exist and you are not in safe mode for this to be
> descriptor 22 (or other).
> To see if your problem is the same, use the simple repeat and trace steps
> below.
> 
> Firefox: 40.0.3
> Fedora release 22 (Twenty Two), PAE i686, everything is up to date (dnf)
> 
> Not having pawed through the code, but seeing this annoyance for a while now
> I ran:
> $ mv .mozilla/ .mozilla.old
> $ strace firefox -safe-mode > FF.out 2>&1
> 
> [...]
>
> (and from now on descriptor 4 continues to repeat the error until FF is
> stopped).

I've got the exact same behaviour on Fedora 21 (last updated today) on Firefox 42.0, x86_64 running Xorg 1.16.3-2.fc21. Since the problem occurs with a fresh ~/.mozilla and Firefox in safe-mode, I assume it's safe to say that it should be regarded as a bug in Firefox?

I'm happy to test my environment but updated to Fedora 23, as well as on other distributions/environments running in VMs if that helps anyone.

Comment 110

4 years ago
(In reply to Tim Cambrant from comment #109)
> (In reply to Jeff Buhrt from comment #105)
> > The problem comes from a bad/closed socket after talking with the local X
> > server:
> > 
> > [...]
> >
> > It is possible if plugins exist and you are not in safe mode for this to be
> > descriptor 22 (or other).
> > To see if your problem is the same, use the simple repeat and trace steps
> > below.
> > 
> > Firefox: 40.0.3
> > Fedora release 22 (Twenty Two), PAE i686, everything is up to date (dnf)
> > 
> > Not having pawed through the code, but seeing this annoyance for a while now
> > I ran:
> > $ mv .mozilla/ .mozilla.old
> > $ strace firefox -safe-mode > FF.out 2>&1
> > 
> > [...]
> >
> > (and from now on descriptor 4 continues to repeat the error until FF is
> > stopped).
> 
> I've got the exact same behaviour on Fedora 21 (last updated today) on
> Firefox 42.0, x86_64 running Xorg 1.16.3-2.fc21. Since the problem occurs
> with a fresh ~/.mozilla and Firefox in safe-mode, I assume it's safe to say
> that it should be regarded as a bug in Firefox?
> 
> I'm happy to test my environment but updated to Fedora 23, as well as on
> other distributions/environments running in VMs if that helps anyone.

Tim,

I would assume for now it is a Firefox bug. The problem/fix could be inside a Gnome/X11 library function, but I haven't went to the point to building FF and isolating where the problem is occurring. 

I assume you also still see the looping on Fedora 23 (like I do)? [I see ~19-30% CPU when FF is idle on Fedora 4.2.8-300.fc23.x86_64 SMP, 4 core here). What about the other Linux also?

Comment 111

4 years ago
Anyone know how to make Firefox use GTK2 vs GTK3? Or anyone have a system with only GTK2 installed along with a new firefox? 

I am starting to think this looping might be related to GTK3 re: https://bugzilla.mozilla.org/show_bug.cgi?id=627699 
The reason I discovered this is Eclipse on Fedora 23 started using ~50% of a core when idle after an update.

a) Normal/default Eclipse eclipse-platform-4.5.1-6.fc23.x86_64:
[pid 26284] recvmsg(15, {msg_name(0)=NULL, msg_iov(1)=[{"\1\0Z\0\0\0\0\0\1\201@\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 32
[pid 26284] recvmsg(15, 0x7ff07ca58160, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 26284] recvmsg(15, 0x7ff07ca58160, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 26284] poll([{fd=15, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=15, revents=POLLOUT}])
[pid 26284] writev(15, [{"\201\0\1\0", 4}, {NULL, 0}, {"", 0}], 3) = 4
[pid 26284] poll([{fd=15, events=POLLIN}], 1, -1) = 1 ([{fd=15, revents=POLLIN}])

b) Workaround for Eclipse: Using 'eclipse --launcher.GTK_version 2' per https://hedayatvk.wordpress.com/2015/07/16/eclipse-problems-on-fedora-22/ now Eclipse is back to the normal 0.3% CPU. Without the GTK2 option it is back to 50%+ CPU.

I am making a reference in https://bugzilla.mozilla.org/show_bug.cgi?id=627699 to this ticket.

Comment 112

4 years ago
(In reply to Jeff Buhrt from comment #111)
> Anyone know how to make Firefox use GTK2 vs GTK3? Or anyone have a system
> with only GTK2 installed along with a new firefox? 
> 
> I am starting to think this looping might be related to GTK3 re:
> https://bugzilla.mozilla.org/show_bug.cgi?id=627699 
> The reason I discovered this is Eclipse on Fedora 23 started using ~50% of a
> core when idle after an update.
> 
> a) Normal/default Eclipse eclipse-platform-4.5.1-6.fc23.x86_64:
> [pid 26284] recvmsg(15, {msg_name(0)=NULL,
> msg_iov(1)=[{"\1\0Z\0\0\0\0\0\1\201@\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
> 0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 32
> [pid 26284] recvmsg(15, 0x7ff07ca58160, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
> [pid 26284] recvmsg(15, 0x7ff07ca58160, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
> [pid 26284] poll([{fd=15, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=15,
> revents=POLLOUT}])
> [pid 26284] writev(15, [{"\201\0\1\0", 4}, {NULL, 0}, {"", 0}], 3) = 4
> [pid 26284] poll([{fd=15, events=POLLIN}], 1, -1) = 1 ([{fd=15,
> revents=POLLIN}])
> 
> b) Workaround for Eclipse: Using 'eclipse --launcher.GTK_version 2' per
> https://hedayatvk.wordpress.com/2015/07/16/eclipse-problems-on-fedora-22/
> now Eclipse is back to the normal 0.3% CPU. Without the GTK2 option it is
> back to 50%+ CPU.
> 
> I am making a reference in
> https://bugzilla.mozilla.org/show_bug.cgi?id=627699 to this ticket.

For cross reference the note I made is https://bugzilla.mozilla.org/show_bug.cgi?id=627699#c302

Comment 113

4 years ago
I experience the same "busy polling" both in Firefox 43.0.4 and in Nightly.
I downloaded Nightly, created fresh profile, opened a page (google.pl it was), killed the page process (plugin-container -- Nightly apparently loads also page content in the container, not only plugins).

I experience continuous 25% CPU usage while the browser sits idle showing me the "page crashed" page stub. Profile @ https://cleopatra.io/#report=003c3047b6277e3a847e9a3414e4a0c52c0f8b17&selection=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14

The polled fds look like
/proc/10258/fd/12 -> pipe:[87881]
/proc/10258/fd/15 -> socket:[88605]
/proc/10258/fd/4 -> socket:[87877]
/proc/10258/fd/42 -> pipe:[88612]
/proc/10258/fd/5 -> anon_inode:[eventfd]

Now:

* interesting might be the fact that I experience this on one account but not the other on my machine. Completely fresh account (guest) exhibits the problem. There is no intentional configuration difference between those (non-fresh) accounts, I created and configured both quite alike.

* forgive my ignorance, but why the poll() syscall is called with timeout=0? `man poll' says 0 means "return immediately", so it's seems natural that the process is busy polling instead of hanging on it. I saw this zero timeout in the strace output attached by others earlier. Is it possible to know where this 0 value comes from?

Comment 114

3 years ago
For comparison, I strace'd the firefox process behaving correctly (in my case -- this is just on the other account, should that matter). The poll is called with timeout 4294967295 (max int) and it wakes up every 1-5 seconds (thus -- due to an event). So, my uneducated guess is that the 0 poll timeout is part of the bug (if not the cause itself).

Can someone fluent in FF source tell what the poll timeout should be and how can it come out as 0?

Comment 115

3 years ago
I have this too. I am not sure if it this or something else that is making my Firefox laggy and ignoring some of user input too. Using -c summary function of of strace for 60 seconds, here's my 'idle' Firefox 45.0b1 on Ubuntu:

mace@macentu:~$ sudo strace -c -p $(pidof firefox) & sleep 60; sudo kill $(pidof strace)
[1] 7285
strace: Process 6730 attached
strace: Process 6730 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.72    0.336605          16     21177           poll
  4.32    0.016595           2      9861        82 futex
  4.12    0.015805           1     27867     21158 recvmsg
  3.48    0.013366           2      6736           writev
  0.23    0.000900           0      2118           read
  0.07    0.000284           1       205           madvise
  0.05    0.000179           1       268         8 stat64
  0.00    0.000000           0         1           restart_syscall
  0.00    0.000000           0       142           write
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         2         1 rt_sigreturn
  0.00    0.000000           0         7           mmap2
------ ----------- ----------- --------- --------- ----------------
100.00    0.383734                 68385     21249 total
mace@macentu:~$

Comment 116

3 years ago
This issue was opened in 2009.  The issue is so common that many sites advertise advertise that Firefox will give unacceptable performance.  There are 115 comments many listing exact methods of reproducing the issue and many others point out exactly the issue.

I will summarize the issue.  The main poll loop does not check for file descriptor closed events rending the polling loop a while true be busy loop.  The Firefox team is too busy changing fonts and widgets to actually care that the product is unusable.

I've long gone to Chrome. but several times since then I've pulled the latest FF on an updated Fedora distribution and demonstrated the problem immediately.  It makes you wonder what browser the developers use.
(In reply to douglas from comment #116)
> I will summarize the issue.  The main poll loop does not check for file
> descriptor closed events rending the polling loop a while true be busy loop.
> The Firefox team is too busy changing fonts and widgets to actually care
> that the product is unusable.

That summary is wrong.

Partly because the people working on the browser's UI aren't the people working on the underlying platform, and the latter are very much focused on fixing issues like this.

> I've long gone to Chrome. but several times since then I've pulled the
> latest FF on an updated Fedora distribution and demonstrated the problem
> immediately.  It makes you wonder what browser the developers use.

And partly, the explanation is probably in this last sentence: your experience is not universal. A substantial part of the platform developers use Linux as their main, or only, OS. If they had the same experience you have, you're entirely right: they wouldn't use Firefox. They do, however, because they don't have that experience.

I realize that that doesn't help the people affected by this issue - and I'm sorry to say that actually offering a solution is outside of my area of expertise - but neither do snide remarks and wrong, vastly simplifying, "summaries".

Comment 118

3 years ago
(In reply to Till Schneidereit [:till] from comment #117)
> (In reply to douglas from comment #116)
> > I will summarize the issue.  The main poll loop does not check for file
> > descriptor closed events rending the polling loop a while true be busy loop.
> > The Firefox team is too busy changing fonts and widgets to actually care
> > that the product is unusable.
> 
> That summary is wrong.
> 
> Partly because the people working on the browser's UI aren't the people
> working on the underlying platform, and the latter are very much focused on
> fixing issues like this.

which part of the summary are you disagreeing with? the technical part that is the first plausible thing I've seen posted here in several years? or his explanation of why it's not fixed yet?
 
rather than complain about his evaluation of why the developers haven't fixed this bug, how about explaining what you think the problem is if his evaluation of the loop not handling socket disconnect events isn't valid?

> > I've long gone to Chrome. but several times since then I've pulled the
> > latest FF on an updated Fedora distribution and demonstrated the problem
> > immediately.  It makes you wonder what browser the developers use.
> 
> And partly, the explanation is probably in this last sentence: your
> experience is not universal. A substantial part of the platform developers
> use Linux as their main, or only, OS. If they had the same experience you
> have, you're entirely right: they wouldn't use Firefox. They do, however,
> because they don't have that experience.

And on Linux, I have been having this problem for many years. But Developers have very different usage patterns for the product they are developing than other users. We've seen that having large numbers of windows with a lot of history makes this more common, Developers who restart things frequently aren't going to have either of those. When people like me who have 200-400 tabs open across a couple dozen windows say so, the almost universal reaction from developers has been "nobody does that"

But back to technical issues. Is there a way to pause firefox and then use a debugger or something to 'finish closing' a socket and make it so that the main loop stops polling it? If someone can identify how to do this (including a patched version of firefox with an ugly hack that we can run instead of the main version), then we can test and see if that solves the symptoms. If so, then the developers should be able to start looking for what race condition can cause that (or instrument a version to give more detail when problems happen) and we can finally make progress
Cc:ing Karl in case he has cycles to look into this issue. I for one use Firefox on Linux on a daily basis and have never ran into this, but clearly there's something going on here...

Comment 120

3 years ago
for me, I have 2 dozen windows open (across different screens/desktops), and if I don't aggressivly prune things, it's not unusual to find that I have 400+ tabs open, almost none to sites with active content. If I let the history grow to the point where the sessionstore.js grows over 1M, I am almost certain to run into grief before very long. This on a 2-core laptop with 8G ram (T61)

I don't at all believe that there is a hard limit that I am running into, it's not that deterministic. But it would make a lot of sense that stretching things as I do, both on the firefox side, and on the Xorg side, that I make some low probability event much more common, either in causing X or dbus or something to run out of resources and close a socket to firefox and firefox to not notice that it happens.

Comment 121

3 years ago
for me this issue is showing ON A FRESH PROFILE + SAFE MODE + ONE TAB (fresh start, don't need to open any site). as someone noticed earlier it might have to do something with gtk3 and system setup in general, which triggers this particular bug in firefox.
(In reply to David Lang from comment #118)
> which part of the summary are you disagreeing with? the technical part that
> is the first plausible thing I've seen posted here in several years? or his
> explanation of why it's not fixed yet?

The latter. The fact that bugs in completely unrelated areas get fixed in the interim doesn't mean that this bug has been prioritized lower than those bugs. If you think about it, that doesn't make any sense at all: how would people changing fonts, who're probably frontend engineers or even designers, be capable of fixing this bug?

The more important point is, however, that it's a false assumption that this bug is so clearly the most important thing to work on. If the experience you, douglas and a number of other people have would affect all or even just a substantial fraction of all Linux users, then it would be. And since a substantial fraction of our engineers who'd be most qualified to fix this issues would be *directly affected* by it if it were so common, it would long be fixed.

Nick Nethercote explained this much more clearly than I could, so I'll just link to his blog post:
https://blog.mozilla.org/nnethercote/2014/01/02/yeinu-your-experience-is-not-universal/

To be clear, just as Nick, I also don't think it's ok for serious issues like this one to affect anyone at all. We don't have an unlimited amount of resources, though, so we have to prioritize things.
>  
> rather than complain about his evaluation of why the developers haven't
> fixed this bug, how about explaining what you think the problem is if his
> evaluation of the loop not handling socket disconnect events isn't valid?

I wish I could! Sadly, I don't know the first thing about socket event handling, so I'm not even remotely qualified to comment on this. (Just as I'm not qualified to change fonts, fwiw.)

> And on Linux, I have been having this problem for many years. But Developers
> have very different usage patterns for the product they are developing than
> other users. We've seen that having large numbers of windows with a lot of
> history makes this more common, Developers who restart things frequently
> aren't going to have either of those. When people like me who have 200-400
> tabs open across a couple dozen windows say so, the almost universal
> reaction from developers has been "nobody does that"

I know that I'm not alone in having hundreds to thousands of tabs open at any point: if anything that's much more in line with common developers' usage patterns than with average users'.
It would be really helpful if people can identify call stacks associated with these rapid poll/recvmsg calls.  One way to do that is using GDB (at least 7.8 or so):

# If plugin-container is taking up CPU, then this should be
# gdb --pid $PLUGIN_CONTAINER_PID /path/to/plugin-container
# instead
#
$ gdb --pid $FIREFOX_PID /path/to/firefox/binary
(gdb) catch syscall recvmsg
(gdb) commands 1
> where
> p $rax
> c
> end
(gdb) catch syscall poll
(gdb) commands 2
> where
> p $rdx
> c
> end
(gdb) continue

and let that run for a little while (~10s should produce plenty of output).  You'll need debug symbols for all the relevant libraries (looking at my own firefox, that's X11 libraries, gtk, gdk, glib, and libc); if you see "??" in your stacks, that's a sign you probably need more debug symbol packages.  I think you should be able to get symbols for Firefox's libxul without doing anything special, but stack traces are probably valuable even without libxul symbols.

Comment 124

3 years ago
(In reply to Nathan Froyd [:froydnj] from comment #123)
> It would be really helpful if people can identify call stacks associated
> with these rapid poll/recvmsg calls.  

Doesn't  https://cleopatra.io/#report=003c3047b6277e3a847e9a3414e4a0c52c0f8b17&selection=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 (from comment 113) present some kind of a call stack?
(In reply to Piotr Findeisen from comment #124)
> (In reply to Nathan Froyd [:froydnj] from comment #123)
> > It would be really helpful if people can identify call stacks associated
> > with these rapid poll/recvmsg calls.  
> 
> Doesn't 
> https://cleopatra.io/
> #report=003c3047b6277e3a847e9a3414e4a0c52c0f8b17&selection=0,1,2,3,4,5,6,7,8,
> 9,10,11,12,13,14 (from comment 113) present some kind of a call stack?

Indeed it does.  For some reason I thought it was much older than a week or two and was unlikely to still have the necessary data.

What's probably happening is that something is pushing Gecko runnables into the main thread's event loop, which triggers:

http://mxr.mozilla.org/mozilla-central/source/widget/gtk/nsAppShell.cpp#206

which would wake up GTK's main poll-driven event loop.  The next question would be who's sending all those runnables when the browser is quiescent (e.g. comment 121 says this behavior is visible on a fresh profile with no tabs, which is not something I see locally).  I can easily imagine that if you have lots of web pages open, enough events are coming through that poll does get woken up quite a bit...

I think the recvmsg problems discussed upthread are different, since those are related to X itself and not GTK's event loop.

Comment 126

3 years ago
Note it is likely the 'X' problem is GTK3 caused. The polling is likely inside a library and maybe not Firefox itself, unless it is a Firefox set timeout value=0 as noted above.
If you see my note above about Eclipse it loops the same way as Firefox when using GTK3, but when forced back to GTK2 it is << 1% CPU. Either way fix what you find and I guess we see if there is a different X loop as well. Thanks!

For a stack dump of the GTK3 problem as requested on Fedora 4.3.3-301.fc23.x86_64:

(gdb) catch syscall recvmsg
Catchpoint 1 (syscall 'recvmsg' [47])
(gdb) commands 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>where
>p $rax
>c
>end
(gdb) catch syscall poll
Catchpoint 2 (syscall 'poll' [7])
(gdb) commands 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>> where
>> p $rdx
>> c
>> end
>
>end
(gdb) commands 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>where
>p $rdx
>c
>end
(gdb) c
Continuing.
[Switching to Thread 0x7ff6aebff700 (LWP 10806)]

Catchpoint 2 (call to syscall poll), 0x00007ff6d6502fdd in poll () at ../sysdeps/unix/syscall-template.S:84
84	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
#0  0x00007ff6d6502fdd in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ff65cebd8a1 in poll_func () from /lib64/libpulse.so.0
#2  0x00007ff65ceaeec1 in pa_mainloop_poll () from /lib64/libpulse.so.0
#3  0x00007ff65ceaf55e in pa_mainloop_iterate () from /lib64/libpulse.so.0
#4  0x00007ff65ceaf610 in pa_mainloop_run () from /lib64/libpulse.so.0
#5  0x00007ff65cebd836 in thread () from /lib64/libpulse.so.0
#6  0x00007ff65cc5dc78 in internal_thread_func () from /usr/lib64/pulseaudio/libpulsecommon-7.1.so
#7  0x00007ff6d727360a in start_thread (arg=0x7ff6aebff700) at pthread_create.c:334
#8  0x00007ff6d650ea4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
$1 = 4294967295
[Switching to Thread 0x7ff6d7687740 (LWP 10660)]

Catchpoint 1 (call to syscall recvmsg), 0x00007ff6d727bf7d in recvmsg () at ../sysdeps/unix/syscall-template.S:84
84	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
#0  0x00007ff6d727bf7d in recvmsg () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ff6d05ff3e7 in _xcb_in_read (c=c@entry=0x7ff6bac02000) at xcb_in.c:982
#2  0x00007ff6d05ffcd8 in poll_for_next_event (c=0x7ff6bac02000, queued=queued@entry=0) at xcb_in.c:709
#3  0x00007ff6d05ffcf7 in xcb_poll_for_event (c=<optimized out>) at xcb_in.c:718
#4  0x00007ff6d4a70838 in poll_for_event (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:257
#5  0x00007ff6d4a7099b in poll_for_response (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:289
#6  0x00007ff6d4a70cad in _XEventsQueued (dpy=dpy@entry=0x7ff6d62f9000, mode=mode@entry=2) at xcb_io.c:363
#7  0x00007ff6d4a626a7 in XPending (dpy=0x7ff6d62f9000) at Pending.c:55
---Type <return> to continue, or q <return> to quit---
#8  0x00007ff6d51d7ab1 in gdk_check_xpending (display=<optimized out>) at gdkeventsource.c:266
#9  gdk_event_source_check (source=0x7ff6d62a4380) at gdkeventsource.c:303
#10 0x00007ff6d2806bb1 in g_main_context_check (context=context@entry=0x7ff6bac0f110, max_priority=2147483647, 
    fds=fds@entry=0x7ff68c1e1700, n_fds=n_fds@entry=6) at gmain.c:3681
#11 0x00007ff6d2807110 in g_main_context_iterate (context=context@entry=0x7ff6bac0f110, block=block@entry=1, 
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3837
#12 0x00007ff6d280727c in g_main_context_iteration (context=0x7ff6bac0f110, context@entry=0x0, may_block=1)
    at gmain.c:3901
#13 0x00007ff6c8b2223e in nsAppShell::ProcessNextNativeEvent (this=<optimized out>, mayWait=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/gtk/nsAppShell.cpp:208
#14 0x00007ff6c8b05793 in nsBaseAppShell::DoProcessNextNativeEvent (this=this@entry=0x7ff6b1a3f0a0, 
    mayWait=<optimized out>) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:138
#15 0x00007ff6c8b079ba in nsBaseAppShell::OnProcessNextEvent (this=0x7ff6b1a3f0a0, thr=0x7ff6baccb180, 
    mayWait=true) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:289
#16 0x00007ff6c79ba03f in nsThread::ProcessNextEvent (this=0x7ff6baccb180, aMayWait=<optimized out>, 
    aResult=0x7ffced2b7707) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/threads/nsThread.cpp:922
#17 0x00007ff6c79d6d61 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=aMayWait@entry=true)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/glue/nsThreadUtils.cpp:277
#18 0x00007ff6c7baa112 in mozilla::ipc::MessagePump::Run (this=0x7ff6bacce240, aDelegate=0x7ff6d62930b0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/glue/MessagePump.cpp:127
#19 0x00007ff6c7b9a0e4 in MessageLoop::RunHandler (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:227
#20 MessageLoop::Run (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:201
#21 0x00007ff6c8b03e21 in nsBaseAppShell::Run (this=0x7ff6b1a3f0a0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:156
#22 0x00007ff6c90addd3 in nsAppStartup::Run (this=0x7ff6b1a360b0)
---Type <return> to continue, or q <return> to quit---
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/components/startup/nsAppStartup.cpp:281
#23 0x00007ff6c90eae59 in XREMain::XRE_mainRun (this=this@entry=0x7ffced2b79a8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4288
#24 0x00007ff6c90eb10f in XREMain::XRE_main (this=this@entry=0x7ffced2b79a8, argc=argc@entry=1, 
    argv=argv@entry=0x7ffced2b8ec8, aAppData=aAppData@entry=0x7ffced2b7bb8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4381
#25 0x00007ff6c90eb35a in XRE_main (argc=1, argv=0x7ffced2b8ec8, aAppData=0x7ffced2b7bb8, aFlags=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4483
#26 0x00005590dc424371 in do_main (argc=1, argv=0x7ffced2b8ec8, xreDirectory=0x7ff6d62a0a80)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:212
#27 0x00005590dc423a72 in main (argc=1, argv=0x7ffced2b8ec8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:399
$2 = -38

Catchpoint 1 (returned from syscall recvmsg), 0x00007ff6d727bf7d in recvmsg ()
    at ../sysdeps/unix/syscall-template.S:84
84	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
#0  0x00007ff6d727bf7d in recvmsg () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ff6d05ff3e7 in _xcb_in_read (c=c@entry=0x7ff6bac02000) at xcb_in.c:982
#2  0x00007ff6d05ffcd8 in poll_for_next_event (c=0x7ff6bac02000, queued=queued@entry=0) at xcb_in.c:709
#3  0x00007ff6d05ffcf7 in xcb_poll_for_event (c=<optimized out>) at xcb_in.c:718
#4  0x00007ff6d4a70838 in poll_for_event (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:257
#5  0x00007ff6d4a7099b in poll_for_response (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:289
#6  0x00007ff6d4a70cad in _XEventsQueued (dpy=dpy@entry=0x7ff6d62f9000, mode=mode@entry=2) at xcb_io.c:363
#7  0x00007ff6d4a626a7 in XPending (dpy=0x7ff6d62f9000) at Pending.c:55
#8  0x00007ff6d51d7ab1 in gdk_check_xpending (display=<optimized out>) at gdkeventsource.c:266
#9  gdk_event_source_check (source=0x7ff6d62a4380) at gdkeventsource.c:303
---Type <return> to continue, or q <return> to quit---
#10 0x00007ff6d2806bb1 in g_main_context_check (context=context@entry=0x7ff6bac0f110, max_priority=2147483647, 
    fds=fds@entry=0x7ff68c1e1700, n_fds=n_fds@entry=6) at gmain.c:3681
#11 0x00007ff6d2807110 in g_main_context_iterate (context=context@entry=0x7ff6bac0f110, block=block@entry=1, 
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3837
#12 0x00007ff6d280727c in g_main_context_iteration (context=0x7ff6bac0f110, context@entry=0x0, may_block=1)
    at gmain.c:3901
#13 0x00007ff6c8b2223e in nsAppShell::ProcessNextNativeEvent (this=<optimized out>, mayWait=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/gtk/nsAppShell.cpp:208
#14 0x00007ff6c8b05793 in nsBaseAppShell::DoProcessNextNativeEvent (this=this@entry=0x7ff6b1a3f0a0, 
    mayWait=<optimized out>) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:138
#15 0x00007ff6c8b079ba in nsBaseAppShell::OnProcessNextEvent (this=0x7ff6b1a3f0a0, thr=0x7ff6baccb180, 
    mayWait=true) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:289
#16 0x00007ff6c79ba03f in nsThread::ProcessNextEvent (this=0x7ff6baccb180, aMayWait=<optimized out>, 
    aResult=0x7ffced2b7707) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/threads/nsThread.cpp:922
#17 0x00007ff6c79d6d61 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=aMayWait@entry=true)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/glue/nsThreadUtils.cpp:277
#18 0x00007ff6c7baa112 in mozilla::ipc::MessagePump::Run (this=0x7ff6bacce240, aDelegate=0x7ff6d62930b0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/glue/MessagePump.cpp:127
#19 0x00007ff6c7b9a0e4 in MessageLoop::RunHandler (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:227
#20 MessageLoop::Run (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:201
#21 0x00007ff6c8b03e21 in nsBaseAppShell::Run (this=0x7ff6b1a3f0a0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:156
#22 0x00007ff6c90addd3 in nsAppStartup::Run (this=0x7ff6b1a360b0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/components/startup/nsAppStartup.cpp:281
#23 0x00007ff6c90eae59 in XREMain::XRE_mainRun (this=this@entry=0x7ffced2b79a8)
---Type <return> to continue, or q <return> to quit---
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4288
#24 0x00007ff6c90eb10f in XREMain::XRE_main (this=this@entry=0x7ffced2b79a8, argc=argc@entry=1, 
    argv=argv@entry=0x7ffced2b8ec8, aAppData=aAppData@entry=0x7ffced2b7bb8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4381
#25 0x00007ff6c90eb35a in XRE_main (argc=1, argv=0x7ffced2b8ec8, aAppData=0x7ffced2b7bb8, aFlags=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4483
#26 0x00005590dc424371 in do_main (argc=1, argv=0x7ffced2b8ec8, xreDirectory=0x7ff6d62a0a80)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:212
#27 0x00005590dc423a72 in main (argc=1, argv=0x7ffced2b8ec8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:399
$3 = 4096

Catchpoint 1 (call to syscall recvmsg), 0x00007ff6d727bf7d in recvmsg () at ../sysdeps/unix/syscall-template.S:84
84	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
#0  0x00007ff6d727bf7d in recvmsg () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ff6d05ff3e7 in _xcb_in_read (c=c@entry=0x7ff6bac02000) at xcb_in.c:982
#2  0x00007ff6d05ffcd8 in poll_for_next_event (c=0x7ff6bac02000, queued=queued@entry=0) at xcb_in.c:709
#3  0x00007ff6d05ffcf7 in xcb_poll_for_event (c=<optimized out>) at xcb_in.c:718
#4  0x00007ff6d4a70838 in poll_for_event (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:257
#5  0x00007ff6d4a7099b in poll_for_response (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:289
#6  0x00007ff6d4a70cad in _XEventsQueued (dpy=dpy@entry=0x7ff6d62f9000, mode=mode@entry=2) at xcb_io.c:363
#7  0x00007ff6d4a626a7 in XPending (dpy=0x7ff6d62f9000) at Pending.c:55
#8  0x00007ff6d51d7ab1 in gdk_check_xpending (display=<optimized out>) at gdkeventsource.c:266
#9  gdk_event_source_check (source=0x7ff6d62a4380) at gdkeventsource.c:303
#10 0x00007ff6d2806bb1 in g_main_context_check (context=context@entry=0x7ff6bac0f110, max_priority=2147483647, 
    fds=fds@entry=0x7ff68c1e1700, n_fds=n_fds@entry=6) at gmain.c:3681
#11 0x00007ff6d2807110 in g_main_context_iterate (context=context@entry=0x7ff6bac0f110, block=block@entry=1, 
---Type <return> to continue, or q <return> to quit---
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3837
#12 0x00007ff6d280727c in g_main_context_iteration (context=0x7ff6bac0f110, context@entry=0x0, may_block=1)
    at gmain.c:3901
#13 0x00007ff6c8b2223e in nsAppShell::ProcessNextNativeEvent (this=<optimized out>, mayWait=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/gtk/nsAppShell.cpp:208
#14 0x00007ff6c8b05793 in nsBaseAppShell::DoProcessNextNativeEvent (this=this@entry=0x7ff6b1a3f0a0, 
    mayWait=<optimized out>) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:138
#15 0x00007ff6c8b079ba in nsBaseAppShell::OnProcessNextEvent (this=0x7ff6b1a3f0a0, thr=0x7ff6baccb180, 
    mayWait=true) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:289
#16 0x00007ff6c79ba03f in nsThread::ProcessNextEvent (this=0x7ff6baccb180, aMayWait=<optimized out>, 
    aResult=0x7ffced2b7707) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/threads/nsThread.cpp:922
#17 0x00007ff6c79d6d61 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=aMayWait@entry=true)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/glue/nsThreadUtils.cpp:277
#18 0x00007ff6c7baa112 in mozilla::ipc::MessagePump::Run (this=0x7ff6bacce240, aDelegate=0x7ff6d62930b0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/glue/MessagePump.cpp:127
#19 0x00007ff6c7b9a0e4 in MessageLoop::RunHandler (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:227
#20 MessageLoop::Run (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:201
#21 0x00007ff6c8b03e21 in nsBaseAppShell::Run (this=0x7ff6b1a3f0a0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:156
#22 0x00007ff6c90addd3 in nsAppStartup::Run (this=0x7ff6b1a360b0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/components/startup/nsAppStartup.cpp:281
#23 0x00007ff6c90eae59 in XREMain::XRE_mainRun (this=this@entry=0x7ffced2b79a8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4288
#24 0x00007ff6c90eb10f in XREMain::XRE_main (this=this@entry=0x7ffced2b79a8, argc=argc@entry=1, 
    argv=argv@entry=0x7ffced2b8ec8, aAppData=aAppData@entry=0x7ffced2b7bb8)
---Type <return> to continue, or q <return> to quit---
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4381
#25 0x00007ff6c90eb35a in XRE_main (argc=1, argv=0x7ffced2b8ec8, aAppData=0x7ffced2b7bb8, aFlags=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4483
#26 0x00005590dc424371 in do_main (argc=1, argv=0x7ffced2b8ec8, xreDirectory=0x7ff6d62a0a80)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:212
#27 0x00005590dc423a72 in main (argc=1, argv=0x7ffced2b8ec8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:399
$4 = -38

Catchpoint 1 (returned from syscall recvmsg), 0x00007ff6d727bf7d in recvmsg ()
    at ../sysdeps/unix/syscall-template.S:84
84	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
#0  0x00007ff6d727bf7d in recvmsg () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ff6d05ff3e7 in _xcb_in_read (c=c@entry=0x7ff6bac02000) at xcb_in.c:982
#2  0x00007ff6d05ffcd8 in poll_for_next_event (c=0x7ff6bac02000, queued=queued@entry=0) at xcb_in.c:709
#3  0x00007ff6d05ffcf7 in xcb_poll_for_event (c=<optimized out>) at xcb_in.c:718
#4  0x00007ff6d4a70838 in poll_for_event (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:257
#5  0x00007ff6d4a7099b in poll_for_response (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:289
#6  0x00007ff6d4a70cad in _XEventsQueued (dpy=dpy@entry=0x7ff6d62f9000, mode=mode@entry=2) at xcb_io.c:363
#7  0x00007ff6d4a626a7 in XPending (dpy=0x7ff6d62f9000) at Pending.c:55
#8  0x00007ff6d51d7ab1 in gdk_check_xpending (display=<optimized out>) at gdkeventsource.c:266
#9  gdk_event_source_check (source=0x7ff6d62a4380) at gdkeventsource.c:303
#10 0x00007ff6d2806bb1 in g_main_context_check (context=context@entry=0x7ff6bac0f110, max_priority=2147483647, 
    fds=fds@entry=0x7ff68c1e1700, n_fds=n_fds@entry=6) at gmain.c:3681
#11 0x00007ff6d2807110 in g_main_context_iterate (context=context@entry=0x7ff6bac0f110, block=block@entry=1, 
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3837
#12 0x00007ff6d280727c in g_main_context_iteration (context=0x7ff6bac0f110, context@entry=0x0, may_block=1)
---Type <return> to continue, or q <return> to quit---
    at gmain.c:3901
#13 0x00007ff6c8b2223e in nsAppShell::ProcessNextNativeEvent (this=<optimized out>, mayWait=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/gtk/nsAppShell.cpp:208
#14 0x00007ff6c8b05793 in nsBaseAppShell::DoProcessNextNativeEvent (this=this@entry=0x7ff6b1a3f0a0, 
    mayWait=<optimized out>) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:138
#15 0x00007ff6c8b079ba in nsBaseAppShell::OnProcessNextEvent (this=0x7ff6b1a3f0a0, thr=0x7ff6baccb180, 
    mayWait=true) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:289
#16 0x00007ff6c79ba03f in nsThread::ProcessNextEvent (this=0x7ff6baccb180, aMayWait=<optimized out>, 
    aResult=0x7ffced2b7707) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/threads/nsThread.cpp:922
#17 0x00007ff6c79d6d61 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=aMayWait@entry=true)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/glue/nsThreadUtils.cpp:277
#18 0x00007ff6c7baa112 in mozilla::ipc::MessagePump::Run (this=0x7ff6bacce240, aDelegate=0x7ff6d62930b0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/glue/MessagePump.cpp:127
#19 0x00007ff6c7b9a0e4 in MessageLoop::RunHandler (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:227
#20 MessageLoop::Run (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:201
#21 0x00007ff6c8b03e21 in nsBaseAppShell::Run (this=0x7ff6b1a3f0a0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:156
#22 0x00007ff6c90addd3 in nsAppStartup::Run (this=0x7ff6b1a360b0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/components/startup/nsAppStartup.cpp:281
#23 0x00007ff6c90eae59 in XREMain::XRE_mainRun (this=this@entry=0x7ffced2b79a8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4288
#24 0x00007ff6c90eb10f in XREMain::XRE_main (this=this@entry=0x7ffced2b79a8, argc=argc@entry=1, 
    argv=argv@entry=0x7ffced2b8ec8, aAppData=aAppData@entry=0x7ffced2b7bb8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4381
#25 0x00007ff6c90eb35a in XRE_main (argc=1, argv=0x7ffced2b8ec8, aAppData=0x7ffced2b7bb8, aFlags=<optimized out>)
---Type <return> to continue, or q <return> to quit---
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4483
#26 0x00005590dc424371 in do_main (argc=1, argv=0x7ffced2b8ec8, xreDirectory=0x7ff6d62a0a80)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:212
#27 0x00005590dc423a72 in main (argc=1, argv=0x7ffced2b8ec8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:399
$5 = 3104

Catchpoint 1 (call to syscall recvmsg), 0x00007ff6d727bf7d in recvmsg () at ../sysdeps/unix/syscall-template.S:84
84	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
#0  0x00007ff6d727bf7d in recvmsg () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ff6d05ff3e7 in _xcb_in_read (c=c@entry=0x7ff6bac02000) at xcb_in.c:982
#2  0x00007ff6d05ffcd8 in poll_for_next_event (c=0x7ff6bac02000, queued=queued@entry=0) at xcb_in.c:709
#3  0x00007ff6d05ffcf7 in xcb_poll_for_event (c=<optimized out>) at xcb_in.c:718
#4  0x00007ff6d4a70838 in poll_for_event (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:257
#5  0x00007ff6d4a7099b in poll_for_response (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:289
#6  0x00007ff6d4a70cad in _XEventsQueued (dpy=dpy@entry=0x7ff6d62f9000, mode=mode@entry=2) at xcb_io.c:363
#7  0x00007ff6d4a626a7 in XPending (dpy=0x7ff6d62f9000) at Pending.c:55
#8  0x00007ff6d51d7ab1 in gdk_check_xpending (display=<optimized out>) at gdkeventsource.c:266
#9  gdk_event_source_check (source=0x7ff6d62a4380) at gdkeventsource.c:303
#10 0x00007ff6d2806bb1 in g_main_context_check (context=context@entry=0x7ff6bac0f110, max_priority=2147483647, 
    fds=fds@entry=0x7ff68c1e1700, n_fds=n_fds@entry=6) at gmain.c:3681
#11 0x00007ff6d2807110 in g_main_context_iterate (context=context@entry=0x7ff6bac0f110, block=block@entry=1, 
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3837
#12 0x00007ff6d280727c in g_main_context_iteration (context=0x7ff6bac0f110, context@entry=0x0, may_block=1)
    at gmain.c:3901
#13 0x00007ff6c8b2223e in nsAppShell::ProcessNextNativeEvent (this=<optimized out>, mayWait=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/gtk/nsAppShell.cpp:208
---Type <return> to continue, or q <return> to quit---
#14 0x00007ff6c8b05793 in nsBaseAppShell::DoProcessNextNativeEvent (this=this@entry=0x7ff6b1a3f0a0, 
    mayWait=<optimized out>) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:138
#15 0x00007ff6c8b079ba in nsBaseAppShell::OnProcessNextEvent (this=0x7ff6b1a3f0a0, thr=0x7ff6baccb180, 
    mayWait=true) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:289
#16 0x00007ff6c79ba03f in nsThread::ProcessNextEvent (this=0x7ff6baccb180, aMayWait=<optimized out>, 
    aResult=0x7ffced2b7707) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/threads/nsThread.cpp:922
#17 0x00007ff6c79d6d61 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=aMayWait@entry=true)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/xpcom/glue/nsThreadUtils.cpp:277
#18 0x00007ff6c7baa112 in mozilla::ipc::MessagePump::Run (this=0x7ff6bacce240, aDelegate=0x7ff6d62930b0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/glue/MessagePump.cpp:127
#19 0x00007ff6c7b9a0e4 in MessageLoop::RunHandler (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:227
#20 MessageLoop::Run (this=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/ipc/chromium/src/base/message_loop.cc:201
#21 0x00007ff6c8b03e21 in nsBaseAppShell::Run (this=0x7ff6b1a3f0a0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:156
#22 0x00007ff6c90addd3 in nsAppStartup::Run (this=0x7ff6b1a360b0)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/components/startup/nsAppStartup.cpp:281
#23 0x00007ff6c90eae59 in XREMain::XRE_mainRun (this=this@entry=0x7ffced2b79a8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4288
#24 0x00007ff6c90eb10f in XREMain::XRE_main (this=this@entry=0x7ffced2b79a8, argc=argc@entry=1, 
    argv=argv@entry=0x7ffced2b8ec8, aAppData=aAppData@entry=0x7ffced2b7bb8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4381
#25 0x00007ff6c90eb35a in XRE_main (argc=1, argv=0x7ffced2b8ec8, aAppData=0x7ffced2b7bb8, aFlags=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/toolkit/xre/nsAppRunner.cpp:4483
#26 0x00005590dc424371 in do_main (argc=1, argv=0x7ffced2b8ec8, xreDirectory=0x7ff6d62a0a80)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:212
---Type <return> to continue, or q <return> to quit---
#27 0x00005590dc423a72 in main (argc=1, argv=0x7ffced2b8ec8)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/browser/app/nsBrowserApp.cpp:399
$6 = -38

Catchpoint 1 (returned from syscall recvmsg), 0x00007ff6d727bf7d in recvmsg ()
    at ../sysdeps/unix/syscall-template.S:84
84	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
#0  0x00007ff6d727bf7d in recvmsg () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ff6d05ff3e7 in _xcb_in_read (c=c@entry=0x7ff6bac02000) at xcb_in.c:982
#2  0x00007ff6d05ffcd8 in poll_for_next_event (c=0x7ff6bac02000, queued=queued@entry=0) at xcb_in.c:709
#3  0x00007ff6d05ffcf7 in xcb_poll_for_event (c=<optimized out>) at xcb_in.c:718
#4  0x00007ff6d4a70838 in poll_for_event (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:257
#5  0x00007ff6d4a7099b in poll_for_response (dpy=dpy@entry=0x7ff6d62f9000) at xcb_io.c:289
#6  0x00007ff6d4a70cad in _XEventsQueued (dpy=dpy@entry=0x7ff6d62f9000, mode=mode@entry=2) at xcb_io.c:363
#7  0x00007ff6d4a626a7 in XPending (dpy=0x7ff6d62f9000) at Pending.c:55
#8  0x00007ff6d51d7ab1 in gdk_check_xpending (display=<optimized out>) at gdkeventsource.c:266
#9  gdk_event_source_check (source=0x7ff6d62a4380) at gdkeventsource.c:303
#10 0x00007ff6d2806bb1 in g_main_context_check (context=context@entry=0x7ff6bac0f110, max_priority=2147483647, 
    fds=fds@entry=0x7ff68c1e1700, n_fds=n_fds@entry=6) at gmain.c:3681
#11 0x00007ff6d2807110 in g_main_context_iterate (context=context@entry=0x7ff6bac0f110, block=block@entry=1, 
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3837
#12 0x00007ff6d280727c in g_main_context_iteration (context=0x7ff6bac0f110, context@entry=0x0, may_block=1)
    at gmain.c:3901
#13 0x00007ff6c8b2223e in nsAppShell::ProcessNextNativeEvent (this=<optimized out>, mayWait=<optimized out>)
    at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/gtk/nsAppShell.cpp:208
#14 0x00007ff6c8b05793 in nsBaseAppShell::DoProcessNextNativeEvent (this=this@entry=0x7ff6b1a3f0a0, 
    mayWait=<optimized out>) at /usr/src/debug/firefox-43.0.3/firefox-43.0.3/widget/nsBaseAppShell.cpp:138
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) q
A debugging session is active.

	Inferior 1 [process 10660] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/lib64/firefox/firefox, process 10660

Comment 127

3 years ago
(In reply to mace from comment #115)

Complementing myself above in comment #115 (https://bugzilla.mozilla.org/show_bug.cgi?id=508427#c115)

> mace@macentu:~$ sudo strace -c -p $(pidof firefox) & sleep 60; sudo kill
> ...
> % time     seconds  usecs/call     calls    errors syscall
> ...
>  87.72    0.336605          16     21177           poll
> ...
>   4.12    0.015805           1     27867     21158 recvmsg
> ...
> 100.00    0.383734                 68385     21249 total

Here is the output after closing Firefox, logging out, logging in again and starting Firefox, and visiting each of my ~10 tabs in 4 windows and waiting for them to be rendered:

mace@macentu:~$ sudo strace -c -p $(pidof firefox) & sleep 60; sudo kill $(pidof strace)
[1] 14613
strace: Process 13730 attached
strace: Process 13730 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.27    0.101851         194       525           poll
  0.35    0.000354           1       485         2 futex
  0.14    0.000146           2        91         1 read
  0.13    0.000135           0       631       538 recvmsg
  0.11    0.000110           0       268         8 stat64
  0.00    0.000000           0         1           restart_syscall
  0.00    0.000000           0        17           write
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           clone
  0.00    0.000000           0        92           writev
  0.00    0.000000           0         1           rt_sigreturn
  0.00    0.000000           0         1           mmap2
  0.00    0.000000           0        16           madvise
------ ----------- ----------- --------- --------- ----------------
100.00    0.102596                  2130       549 total

So, from (21177+27867)/68385 down to (525+631)/2130 calls for (poll+recvmsg)/total. I am not sure how to be helpful. I love Firefox (and all the fantastic stuff Mozilla does) and want to keep using it.

Comment 128

3 years ago
I am getting -1 EAGAIN errors for a whole lot of recvmsg(2) calls for other programs too. Chrome, xeyes, xman, xterm are some of the ones I tried.

If I trace Firefox with "sudo strace -p $(pidof firefox)"

> recvmsg(4, 0xbffef7e4, 0)               = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=78, events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLIN}])
> futex(0xab95be08, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xab949100, 268284) = 1
> recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"U\2G\353v\337-\0\3\4\4\0\0\0\0\0\0\0\0\4\4\4\4\4\0\0\3\37%\2\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 160
> recvmsg(4, 0xbffef7e4, 0)               = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=78, events=POLLIN}], 6, 0) = 0 (Timeout)
> futex(0xab95be08, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xab949100, 268286) = 1
> poll([{fd=4, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
> writev(4, [{"\31\0\v\0\177\0\0\0\0\0\30\0! \0\0\177\0\0\0004\1\0\0A\1\0\0R\1\0\0"..., 176}, {NULL, 0}, {"", 0}], 3) = 176
> recvmsg(4, 0xbffef714, 0)               = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(4, 0xbffef7e4, 0)               = -1 EAGAIN (Resource temporarily unavailable)

Which, if I understand anything, are coming from file descriptor number 4. If I run "sudo lsof -E -p $(pidof firefox)", I will find:

> firefox 7530 mace    4u     unix 0xf1d04dc0      0t0   83170 type=STREAM ->INO=84019 5012,Xorg,36u

Then, if I run "sudo lsof -E -p $(pidof Xorg) |grep '36u'", I get:

> Xorg    5012 root   36u     unix 0xf1d05080      0t0      84019 @/tmp/.X11-unix/X0 type=STREAM ->INO=83170 7530,firefox,4u

Further, I can run "ls -l /tmp/.X11-unix/X0", I get:

> srwxrwxrwx 1 root root 0 helmi  8 19:43 /tmp/.X11-unix/X0

Then I don't know how to proceed, or if I am tracing anything useful anymore. I've tried to strace processes as they start, receive an fd which will start producing the -1 EAGAIN errors (which vary, often 3, 4, 12, 19 or 20), but I don't know enough to understand what is the problem. If the file descriptor closed, and a pointer remains open? Or is there something wrong with /tmp/.X11-unix/X0?

How can I help?
Poll may be called with zero or non-zero timeouts at different stages of the event loop.  If called with zero timeout, it is just asking whether there are pending events.  If there are no pending events, there will eventually be a poll call with a non-zero timeout indicates a wait for the next event.

If all timeouts are zero then essentially something is telling the loop that
there are events to be processed.

Comment 75 identifies that poll finds fd 19 ready for a read and that is
consistent with the read end of a pipe, which is available for reading after writes to fd 20.  If that is the kind of scenario, then similar info from a bt from
nsBaseAppShell::OnDispatchedEvent() will be helpful.

It seems likely however, that different people here are observing quite
different issues.

In comment 8, for example, the poll with non-zero timeout is woken by fd 4,
the X server connection.  One message is read from the connection, and then
there are no pending messages in subsequent recvmsg calls in the part of the
trace quoted.  Is it usually fd 4 returned from the poll (when poll returns
non-zero)?

EAGAIN from recvmsg is not surprising as it is a normal state when there no
new messages.
Note that Chrome has similar issues: https://bugs.chromium.org/p/chromium/issues/detail?id=554201

The poll/recvmsg calls seem to be the way xcb talks to the X server (see e.g. https://bugs.chromium.org/p/chromium/issues/detail?id=550961#c27), but it still seems that Thunderbird does it much more than other UI programs.

E.g. when you move the cursor around over Sublime Text 3 (with the setting `"caret_style": "wide"` so that it doesn't refresh the UI all the time for the smooth cursor), it seems there is way less CPU usage and poll calls than when you do the same over Thunderbird.

Comment 131

3 years ago
I may have some observations that could shed some light on this problem for others.  I have noticed this same issue of mostly idle on about:blank firefox instances accumulating implausible amounts of CPU time over hours and days.  I found this thread and narrowed some things down.  First, I did an about:config browser.sessionrestore.interval tweak to set it to 2 minutes to get that activity out of the way.  Then I noticed that there were twice/minute polls to all my font files.  I am on Gentoo Linux and every 30 seconds 100 stat()s were firing off.  The fix for that is the "rescan" setting in /etc/fonts/fonts.conf.  By default that was 30. I never change that stuff - so I cranked it to 86400 seconds (1 day).

Even with all the above, though, for a fresh profile with --safe-mode I was still seeing around 60 cycles/second of poll/recv/read cycles - this is even when the firefox window was not exposed (i.e. on another virtual desktop).  That was about all the "system call" level activity going on.  So, I ran that under perf record.  perf report --hierarchy gave some high scores to "nsRefreshDriver::Tick" (and "SoftwareVsyncTh" related to the software vsync thread).

A little hunting around layout/base/nsRefreshDriver.cpp in the source code to firefox and I found "layout.frame_rate" which defaults to 60 frames/second (if set to -1 in the about:config page).  If in about.config I set layout.frame_rate to 10 then I would get about 10 wake ups per second. [ I am counting a wakup as a whole cycle of those poll()s, recvfrom()s and a read("\372") that you see in strace. ]  Set layout.frame_rate to 3 and you get 3 wakeup/sec.

With lower rate settings I get very slow/laggy UI responsiveness (for things like typing a new URI in the URI bar).  A rate of 15 is pretty usable, though, and yields about 4X less aggressive wake ups/second -- which for laptop battery people may be helpful.  So, at least on my system and I suspect others this seems to be a the "knob" to control the effect.  It also may answer some questions raised earlier about "which timer thread" and so forth.

These observations may beg a bunch of other questions related to firefox rendering internals which I am not expert on, e.g. why refresh timers have to fire when the app is iconified/on another desktop, etc., how fundamental this is to the whole rendering engine and so on.  It just seemed like it might bridge the gap between the "end-user language" and developer "beats me what you're complaining about" domains.  Unless all this only works the way I described above for me.

Comment 132

3 years ago
Charle layout.frame_rate=15 really helps on Fedora 24. I went from ~24% CPU with 3 idle Firefox's to to 2-8%!

...

> 
> A little hunting around layout/base/nsRefreshDriver.cpp in the source code
> to firefox and I found "layout.frame_rate" which defaults to 60
> frames/second (if set to -1 in the about:config page).  If in about.config I
> set layout.frame_rate to 10 then I would get about 10 wake ups per second. [
> I am counting a wakup as a whole cycle of those poll()s, recvfrom()s and a
> read("\372") that you see in strace. ]  Set layout.frame_rate to 3 and you
> get 3 wakeup/sec.
> 
> With lower rate settings I get very slow/laggy UI responsiveness (for things
> like typing a new URI in the URI bar).  A rate of 15 is pretty usable,
> though, and yields about 4X less aggressive wake ups/second -- which for
> laptop battery people may be helpful.  
...
(In reply to Alex from comment #36)
> Currently running 30.0a2 (22-04), Arch Linux 3.14.1-1. i7 2.4Ghz (boost
> 3.4). Firefox uses 2 - 5% idle, and once every minute it jumps for a htop
> cycle to cpu use anywhere between 20 - 50%. It seems a little more mild than
> some of you, but regardless my strace is full off "poll(fd=....".

I see this behaviour for Thunderbird 45 on Ubuntu 16.04; currently every 20 seconds past the full minute, there's a 50% CPU spike for 2 seconds.

Comment 134

3 years ago
Setting "layout.frame_rate = 24" seems to have eliminated random high cpu usage in Firefox for me.

Note that I upgrade frequently (currently on version 48), so one of the upgrade could have brought a fix for it too.
I've attached a screenshot of the system calls an idle Firefox does on my machine (you can do this by e.g. `apt-get install sysdig` on Ubuntu 16.04, and running `sudo csysdig`, then finding your firefox process - usually at the top as per this bug, pressing enter, F2 for "Views" and selecting "System Calls"2).

In text, rounded:

1000 per second futex
 900 per second recvmsg
 900 per second poll
 400 per second getrusage
 200 per second read
 200 per second write
  30 per second sysdigevent
(In reply to Yves Dorfsman from comment #134)
> Setting "layout.frame_rate = 24" seems to have eliminated random high cpu
> usage in Firefox for me.

mail, does this help you with Thunderbird?
Flags: needinfo?(mail)
Comment hidden (typo)
Comment hidden (typo)
Comment hidden (typo)

Comment 140

2 years ago
This has not removed it for me (CentOS 7.3 Firefox 54.01 64 bit).

strace gives me this endless output and it scrolls pretty fast.

(how is one making this text a code thing?)

```
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 4294967295) = 1 ([{fd=28, revents=POLLIN}])
read(28, "\372", 1)                     = 1
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 4294967295) = 1 ([{fd=28, revents=POLLIN}])
read(28, "\372", 1)                     = 1
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={37, 279230}, ru_stime={15, 618929}, ...}) = 0
write(13, "\0", 1)                      = 1
write(13, "\0", 1)                      = 1
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={37, 279874}, ru_stime={15, 619199}, ...}) = 0
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 4294967295) = 1 ([{fd=28, revents=POLLIN}])
read(28, "\372", 1)                     = 1
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 4294967295) = 1 ([{fd=28, revents=POLLIN}])
read(28, "\372", 1)                     = 1
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 4294967295) = 1 ([{fd=28, revents=POLLIN}])
read(28, "\372", 1)                     = 1
write(13, "\0", 1)                      = 1
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 4294967295) = 1 ([{fd=28, revents=POLLIN}])
read(28, "\372", 1)                     = 1
write(13, "\0", 1)                      = 1
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=57, events=POLLIN}], 7, 0) = 0 (Timeout)
recvmsg(4, 0x7fffb6819dc0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
```
OK, some update (thanks Wayne Mery for pinging me):

I am now on Thunderbird 52.2.1 as is in Ubuntu 16.04 and things look a bit better.

Large CPU spikes once per minute are either one or much shorter now so that they are less visible.
I suspect that has to do with the sessionrestore topic, which was discussed also here for Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1304389

Regarding fonts stat()ing, I see it happen every 30 seconds in strace (for example grep for "type1/gfonts"), but timings (strace -t -t -t) show that the 152 fonts that it stats are done in 12 ms total each time. So 12 ms every 30 seconds isn't a real problem.

The layout.frame_rate setting certainly brings down the number of syscalls but using it isn't nice because low UI FPS when using a program are a pain.

It seems we have established (especially comments 127 and 130) that the poll/recvmsg is how X applications talk to the X server.

To give some more detail, this is an strace of a gnome-terminal with a flashing cursor that flashes every ~0.6 seconds (sudo strace -t -t -t -f -p $(pidof gnome-terminal-server)):

[pid 12497] 1501520214.564856 recvmsg(4, 0x7ffebb2e33f0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12497] 1501520214.566188 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)
[pid 12497] 1501520214.567313 poll([{fd=4, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
[pid 12497] 1501520214.567470 writev(4, [{"5 \4\0004D\300\3uB\300\3\f\0\23\0\213\4\6\0005D\300\0034D\300\3&\0\0\0"..., 304}, {NULL, 0}, {"", 0}], 3) = 304
[pid 12497] 1501520214.567675 recvmsg(4, 0x7ffebb2e33f0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12497] 1501520214.567785 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=14, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=21, events=POLLIN}], 7, 598) = 0 (Timeout)

[pid 12497] 1501520215.166758 recvmsg(4, 0x7ffebb2e33f0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12497] 1501520215.167876 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)
[pid 12497] 1501520215.168985 poll([{fd=4, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
[pid 12497] 1501520215.169140 writev(4, [{"5 \4\0006D\300\3uB\300\3\f\0\23\0\213\4\6\0007D\300\0036D\300\3&\0\0\0"..., 332}, {NULL, 0}, {"", 0}], 3) = 332
[pid 12497] 1501520215.169324 recvmsg(4, 0x7ffebb2e33f0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12497] 1501520215.169519 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=14, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=21, events=POLLIN}], 7, 598) = 0 (Timeout)

[pid 12497] 1501520215.768489 recvmsg(4, 0x7ffebb2e33f0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12497] 1501520215.769896 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)
[pid 12497] 1501520215.770731 poll([{fd=4, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
[pid 12497] 1501520215.770927 writev(4, [{"5 \4\0008D\300\3uB\300\3\f\0\23\0\213\4\6\0009D\300\0038D\300\3&\0\0\0"..., 304}, {NULL, 0}, {"", 0}], 3) = 304
[pid 12497] 1501520215.771073 recvmsg(4, 0x7ffebb2e33f0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12497] 1501520215.771322 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=14, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=21, events=POLLIN}], 7, 598) = 0 (Timeout)

Notably, when the window is not focused and the user doesn't do anything, then *all* strace output stops, so gnome-terminal isn't doing any idle IO. Interestingly though, there is strace output for it when the user presses a key (for example Ctrl), even when the gnome-terminal window isn't focused, which suggests that gnome-terminal gets global key press notifications.

Doing the same with xterm reveals that it doens't do that.

What it does show though is this strange use of recvmsg on Window focus:

501520751.933500  recvmsg(3, 0x7ffd9d1acc00, 0) = -1 EAGAIN (Resource temporarily unavailable)
1501520751.933563 recvmsg(3, 0x7ffd9d1aca00, 0) = -1 EAGAIN (Resource temporarily unavailable)
1501520751.933618 recvmsg(3, 0x7ffd9d1aca00, 0) = -1 EAGAIN (Resource temporarily unavailable)
1501520751.933669 poll([{fd=3, events=POLLIN}], 1, 0) = 0 (Timeout)

Here recvmgs is called 3 times in arow with the same result. Not sure why a program (or the X lib) would do that. But thunderbird displays similar behaviour:

1501520882.093979 recvmsg(4, 0x7fff1dcd9890, 0) = -1 EAGAIN (Resource temporarily unavailable)
1501520882.094150 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=66, events=POLLIN}], 5, 0) = 0 (Timeout)
1501520882.094312 recvmsg(4, 0x7fff1dcd9890, 0) = -1 EAGAIN (Resource temporarily unavailable)
1501520882.094504 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=66, events=POLLIN}], 5, 0) = 0 (Timeout)
1501520882.094788 recvmsg(4, 0x7fff1dcd9890, 0) = -1 EAGAIN (Resource temporarily unavailable)
1501520882.094940 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=66, events=POLLIN}], 5, 0) = 0 (Timeout)
1501520882.095176 recvmsg(4, 0x7fff1dcd9890, 0) = -1 EAGAIN (Resource temporarily unavailable)
1501520882.095343 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=66, events=POLLIN}], 5, -1 <unfinished ...>

3 calls to recvmsg (each with EAGAIN), 3 seemingly pointless calls to poll() with 0-timeout, and only the 4th poll() uses timeout -1 (for infinite wait until some event happens, which makes sense).

All of that happens even when Thunderbird runs in "Work offline" mode, which I recommend everone to check out in order to separate network issues from plain Xorg drawing issues.

Even in "Work offline" mode, there's quite a lot going on in strace, but there's a lot more when you're online.

It seems to me that a good way forward with this bug would be to investigate where all the syscalls come from (especially the ones that look redundant), and to check if Thunderbird does any kind of unnecessary idle redrawing (gnome-terminal and xterm show that this can apparently completely be avoided).
Flags: needinfo?(mail)
You need to log in before you can comment on or make changes to this bug.