Closed Bug 1415387 Opened 4 years ago Closed 4 years ago
High CPU usage in Main Process
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0 Build ID: 20171027085204 Steps to reproduce: It happens just by visiting randomn (afaik) pages. The profile I link later has been taken in a new profile in Nightly build 20171107220115 just by visiting https://perf-html.io/ to install the profiler extension. Actual results: High CPU usage in Main Process. It uses 100% of a thread of a core for more than 30sec. Expected results: It shouldn't use a full thread of a core, especially for more that 30sec. Captured profile: https://perfht.ml/2AiFnWh mozregression result: 8:28.13 INFO: Last good revision: 04d539bc9bca98f22ca51de00a40eceb516405e3 8:28.13 INFO: First bad revision: e15196e25f9e783b2c04689936074990908170b0 8:28.13 INFO: Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=04d539bc9bca98f22ca51de00a40eceb516405e3&tochange=e15196e25f9e783b2c04689936074990908170b0
Has Regression Range: --- → yes
Component: Untriaged → Networking
Product: Firefox → Core
Nick, bug 1402014 is the regression.
Priority: -- → P1
Oh "cool", it's causing poll() to suck up a ton of CPU. *And* I can't reproduce! Reporter, can you capture a socket log when this happens? You can follow the directions at https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging Also, meanwhile, ni mcmanus to see if he has any offhand ideas about why the patch in question (which is mostly null checks) would cause poll() to spin. Best I can think is the socket is in some funky condition, and we SetWriteCallbacks() which causes the socket to be put into the poll list, and poll doesn't like that. But that's (1) not a very good guess, and (2) not very helpful even if it's right, because it's so vauge.
Flags: needinfo?(hurley) → needinfo?(BM-2cTcNTGnqPSPLe8pLzfeykcebHkDP2BB6G)
I think you are right (and its vague :().. but maybe that will lend some insight into the overall problem. I'd backout the patch for the moment and we can both take another look at it with this result in mind.
Ugh :( Alright, I've got to run out for an appointment, but I'll back out when I'm back. (Or you can feel free to back out, if you want to get that done ASAP.)
I tried to upload it here, but it was larger than allowed. Here is the requested HTTP log: https://www.dropbox.com/s/v5nhzq6ee7jjl22/HTTP_logging.tar.gz?dl=0 I should note that, I had trouble reproducing it in https://perf-html.io/ as in my original report, so this is from youtube.
Flags: needinfo?(BM-2cTcNTGnqPSPLe8pLzfeykcebHkDP2BB6G) → needinfo?(hurley)
Landed the backout in https://hg.mozilla.org/integration/mozilla-inbound/rev/cc25abc4eab682a2c9881cf0d81343bc28ef34f0 tgn-ff - thanks for the log. I'll start taking a look to see if there's anything in there to point to a root cause. I'll also continue trying to reproduce locally (since it seems to be intermittent at best).
Since I can reproduce it quite easily but no one else can, I thought that providing some profiles of the 'Socket Thread' might be useful. All the following profiles were captured using build 20171108221316 and a new profile. 1) This (https://perfht.ml/2hV5KLd) profile captured, shortly after FF started, by visiting youtube (site's main page). 2) A while after (1) CPU usage dropped to normal, but the issue appeared again "spontaneously" (ie. I had the YT tab open and I was interacting with the previously mentioned profile, but I didn't (re)load any page) and captured in this (https://perfht.ml/2hTkpGO) profile. For comparison, the following profiles captured while the issue was _NOT_ present (and having some profile tabs open). i) Visiting youtube: https://perfht.ml/2hV6hwH ii) Doing nothing: https://perfht.ml/2hUspaE
I have the same issue. Here's a capture from a fresh firefox profile: https://perfht.ml/2AmKdSH ~10s, opened twitter. CPU 100% ~60s, CPU returns to normal ~70s, opened reddit. CPU 100% If you need any other captures just say the word :)
I also experienced this behaviour from time to time, it started a few days ago. Please see the sysprof profiles attached.
We've backed out some changed that hopefully fixed this. Reporter, can you still reproduce?
Status: UNCONFIRMED → NEW
Ever confirmed: true
No, the issue has stopped for quite a few days now. Also, the comments between (and including) 24 and 33 in Bug 1382886 were referring to the same issue.
fixed by back out bug 1402014.
Some people signaled at least bugs 1415691 and 1415735 as possible dupes of this one. I can't read bug 1402014 to consider if this may be related or not. If someone here can tell about it?
Setting a needinfo regarding my previous comment so that you can confirm if this was dupe or not. Please first one to check and able to confirm or reject the dupes remove others needinfo.
I don't know. It depends if the people who reported those other bugs are still seeing the problem.
You need to log in before you can comment on or make changes to this bug.