High CPU usage of parent process when network isn't good
Categories
(Core :: Networking, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox93 | --- | fixed |
People
(Reporter: lilydjwg, Assigned: dragana)
References
Details
(Whiteboard: [necko-triaged])
Attachments
(1 file)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Steps to reproduce:
- use a pac to proxy some requests via socks5 (remote dns)
- open a couple of tabs, use some extensions that sends background requests (e.g. Google Keep, GMail, Mastodon, the Inoreader addon)
- wait until at least one request hangs because the proxy connects but receives no traffic and keeps waiting
Actual results:
The parent process starts to consume more than 100% CPU for some time. I notice because the fan starts spinning wildly.
When it happens, the UI is hardly responsive. I can only click on the profiler's icon and wait until firefox calms down.
Expected results:
No high CPU usage. If the request hangs, firefox should just wait until it times out. Firefox should remain responsive.
This also happens when a new tab request hangs but that's less likely for me. This has being happening for at least several days.
I don't know if the proxy plays part in this. If the request isn't going via the proxy, I won't know it hangs. I'm still looking for other clues on how it reproduces.
Comment 1•3 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Networking' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Comment 2•3 years ago
|
||
Based on several profiles lilydjwg shared with me, it seems that the socket thread dispatches lots of net::nsHttpActivityDistributor::ObserveActivityWithArgs
to the main thread, and then the main thread spent lots of time in handling them as well as going through the normal event loop over and over again spending time on checking native events as well.
Assignee | ||
Updated•3 years ago
|
It seems that a proxy is not necessary for it to happen. I've been notified by my fan even when there is no proxy connection. lsof shows only a TCP connection to 52.88.186.163 (state is ESTABLISHED) which I believe belongs to Sync or Telemetry.
I can hardly get any more details on this. Firefox isn't responsive when it's happening and I don't get source line information from libxul.so debug symbols. This has affected me and several other people in two Telegram groups I'm in.
This doesn't only happen when idle. It happens a lot when I'm using Firefox. E.g. at exactly this moment while I'm typing. Please help!
(English is not my native language, please excuse typing errors.
A proxy is definitely not necessary --- a slow network can reproduce it several times a day. And it seems that it is more likely to happen when Firefox is in background. I'm using awesomewm, and most of the time when I switch to another tag and switch back, I would discover that Firefox freezes.
I'm not familiar with Firefox or networking stuffs, so... sorry that I don't know what other information I should provide.
This happens in clean profile and safe mode as well. My steps to reproduce:
- prepare a good proxy and a bad proxy that connects but doesn't send back anything
- start the good proxy
- start firefox, configure its proxy to the good proxy
- open some tab, wait some time
- stop the good proxy and replace it with the bad proxy. Requests now keep waiting for data
- firefox starts freezing and eating CPU. If not, open a new tab to send out some requests
The above proxies are socks5 proxies and used to control network. This problem doesn't happen when either the following is true:
- firefox can never connect to the network
- all packets are dropped (so using iptables to reproduce won't work)
- ServerHello is received
similar issue here Bug 1719046
Comment 7•3 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+11 from comment #2)
Based on several profiles lilydjwg shared with me, it seems that the socket thread dispatches lots of
net::nsHttpActivityDistributor::ObserveActivityWithArgs
to the main thread, and then the main thread spent lots of time in handling them as well as going through the normal event loop over and over again spending time on checking native events as well.
net::nsHttpActivityDistributor::ObserveActivityWithArgs
this should be only called when the network panel in devtool is shown. Could you check if you can still reproduce this without opening devtool?
Also, we recently fixed bug 1720079 that could cause high cpu loading. Could you try to use the latest nightly to reproduce this again? Thanks.
I've made a new profiler with latest nightly + an empty profile. The issue is still there. I didn't open devtools.
https://share.firefox.dev/2UTLbVJ
Comment 9•3 years ago
•
|
||
(In reply to lilydjwg from comment #8)
I've made a new profiler with latest nightly + an empty profile. The issue is still there. I didn't open devtools.
https://share.firefox.dev/2UTLbVJ
Well, it turns out that every transaction send HttpActivity
to something else. Not sure if this is the problem of this high cpu usage.
I've created a build that disable sending HttpActivity
. Could you give this build a try?
Thanks.
p.s., the build is created from this try push
Reporter | ||
Comment 10•3 years ago
|
||
Still not good: https://share.firefox.dev/3hHhLDq
It uses more CPU. 300%+ for parent and 200%+ for a Web Content for some seconds at start. On my second try the content process calms down later: https://share.firefox.dev/36L6tre
Comment 13•3 years ago
|
||
Could you try to get a http log? Please start logging after you start the bad proxy
, otherwise the log file would be too big. Thanks.
Reporter | ||
Comment 14•3 years ago
|
||
It is still very large (785M). I've compressed it with xz and uploaded here: https://transfer.sh/1hfEmXO/fxnet-log.txt-main.1791000.moz_log.xz
Comment 15•3 years ago
|
||
Hi Reporter,
I'd like to ask for more information.
- Does this issue happen on previous firefox versions (e.g., ESR 78)? If not, could you try to use mozregression to find out which patch causes this issue?
- Does this issue only happen with socks proxy? If yes, could you try to create a new http log with adding
SOCKS:5
toMOZ_LOG
env?
Thanks.
Reporter | ||
Comment 16•3 years ago
|
||
- Yes. It's weird that I can reproduce it but I didn't have this issue in the past.
- Probably. I didn't succeeded to reproduce it with direct connections and iptables.
New log: https://transfer.sh/1vP8C6n/fxnet-log2.txt-main.2146335.moz_log.xz
Comment 17•3 years ago
|
||
Same here. I am in China,.
When Firefox start random freezing, generally I am opened ad Google Search page, I am using a socks5 proxy to access Google services.
This bug happen since Firefox 89, is rather annoying.
Comment 18•3 years ago
|
||
For those who can reproduce this bug, please help to use mozregression to find out the problem. This would be the fastest way. Thanks.
Comment 19•3 years ago
|
||
There is an easy way to reproduce it
- Install qv2ray and new a qv2ray config(just type something, it should be wrong).
- Open some website by Firefox such as twitch, youtube or others.
- Config Firefox to let it connect with socks.
- And you can got it.
Reporter | ||
Comment 20•3 years ago
|
||
strace shows that, Firefox is poll
ing the socks5 connection fd. It returns POLLOUT
and then Firefox tries to recvfrom
the fd. This keeps repeating.
Following the code, I find that setting security.tls.enable_0rtt_data
to false works around this issue. I believe this commit is relevant.
Comment 21•3 years ago
|
||
(In reply to lilydjwg from comment #20)
strace shows that, Firefox is
poll
ing the socks5 connection fd. It returnsPOLLOUT
and then Firefox tries torecvfrom
the fd. This keeps repeating.Following the code, I find that setting
security.tls.enable_0rtt_data
to false works around this issue. I believe this commit is relevant.
Thanks for this information. Based on the commit you provided, I've created this test build. Could you give this a test?
Another question: did you find this commit by mozregression?
Could you also use this build to get another http log? Please add pipnss:5
to the MOZ_LOG
env. Thanks.
Reporter | ||
Comment 22•3 years ago
|
||
Tested. It still happens. The log is here: https://transfer.sh/1wN7ZNk/fxnet-log3.txt-main.2371822.moz_log.xz
And no, mozregression doesn't work. I tested multiple versions. I chrooted to earlier system backups. I downgraded a system inside my virtual machine to half a year ago. I downgraded the proxy software (v2ray). It reproduced every time.
I read the log and searched the code, and found that commit which was recent and touched the code. I suspect that it might be that Google changed their server settings (this issue happens when connecting to Google's servers; I don't remember if there were other cases).
Reporter | ||
Comment 23•3 years ago
|
||
I confirm that it happens with Google, but not for Baidu, Wikipedia and this bugzilla. And it doesn't happen when Firefox has never successfully connected to Google since process start.
Reporter | ||
Comment 24•3 years ago
|
||
I kind of reproduced this issue without using a proxy, but not easy anyway.
- Disable HTTP/3
- Visit www.google.com once
- Block ServerHello packets somehow. I wrote a NFQUEUE program to do that
- Wait until the TCP connection to Google is closed (about:networking#sockets no longer shows it)
- Refresh the google tab
The difference is that the Socket Thread uses 100% but other threads are calm. In the proxy case, Socket Thread 100%, BHMgr Monitor ~50% and the whole process ~250% (for recent nightlies. Earlier it was about ~140%).
Comment 25•3 years ago
|
||
Could you try this build and see if there is any improvement?
Note that this is just a mitigation for this problem. I added a check: if a connection waiting for 0RTT response for more than 5s, the connection will be restarted without 0RTT.
Please also get a http log with this build.
Thanks.
Reporter | ||
Comment 26•3 years ago
|
||
200% CPU usage, 100% for Socket Thread and the other 100% for main thread. It doesn't change after 5s.
The log is larger because I wait longer: https://transfer.sh/1IccUpx/fxnet-log4.txt-main.163995.moz_log.xz
Comment 28•3 years ago
|
||
Hello developers and guys, I always use Firefox 78 ESR. @Kershaw Chang
I see people reporting bugs ( this one and 1719046, 1703232 ). Allow me to report my similar problem.
Recently (about 1 or 2 months) I encounter Firefox lagging/UI not responding/high CPU (for 2 minutes) problem (just like the video in 1719046) .
I tried disabling some addons. It definitely helped but not completely solved.
Many of them said it usually happend when using proxy or in bad network connection. I feel the same.
People also said such problem is since Firefox 87 or 89, which is quite consistent with my problem time. So I think there's probability that my problem is related to this bug.
(In reply to Kershaw Chang [:kershaw OOO until Aug 9th] from comment #15)
Comment 29•3 years ago
|
||
Hi, I've created a test build based on this patch.
Could you check if things are improved?
Thanks.
Reporter | ||
Comment 30•3 years ago
|
||
It still happens. 200% CPU usage, 100% for Socket Thread and the other 100% for main thread. Compared to nightly 2021-08-09, 50-60% less CPU usage as the BHMgr Monitor thread is quiet.
The proxy fd is still being polled POLLIN|POLLOUT|POLLPRI
and returns POLLOUT
, then a recvfrom
appears and gets EAGAIN
.
New log file: https://transfer.sh/1tIdmIC/fxnet-log4.txt-main.1974308.moz_log.xz
Comment 31•3 years ago
|
||
Thanks for testing this.
However, it seems that 0RTT is not used at all according to the log. Could you try to enable 0RTT and test again?
Also, could you explain how you tested this briefly? Did you block the server hello packet from www.google.com permanently?
Reporter | ||
Comment 32•3 years ago
|
||
Weird. Every time I test, I start with an empty profile, and the 0RTT pref is enabled by default.
I'm using the steps in comment 5, by switching proxies. The "block server hello" method (in comment 24) takes more time because I need to wait the connection to be dropped. I've uploaded the program I use here: https://github.com/lilydjwg/drop-serverhello
Assignee | ||
Comment 33•3 years ago
|
||
Patch for bug 1382886 landed. Can you try to reproduce the issue? If you can reproduce the issue please make a http log.
Thanks!
Reporter | ||
Comment 34•3 years ago
|
||
Nothing seems to have changed, log https://transfer.sh/29U/fxnet-log5.txt-main.3179748.moz_log.xz
nightly version 2021-08-19 (zh-CN variant).
Assignee | ||
Comment 35•3 years ago
|
||
Thank you for the log. I think I know what is causing this:
- you are using proxy, therefore Firefox(necko) does not try to use 0RTT, but the nss library still enables 0RTT
- my patch from bug 1382886 mitigates the problem in case that necko knows that 0RTT is used, butt when proxy is use necko does not check that.
I will extend the fix from bug 1382886 to proxies as well.
Assignee | ||
Comment 36•3 years ago
|
||
Currently we do not use the early-data when a proxy is used, but we do not disable it in the nss layer.
Comment 37•3 years ago
|
||
Comment 38•3 years ago
|
||
Backed out for causing bustages in build-fuzzing.
Backout link: https://hg.mozilla.org/integration/autoland/rev/4bf4379056766929dece43246db61770a4ac90df
Assignee | ||
Comment 39•3 years ago
|
||
Comment 40•3 years ago
|
||
Comment 41•3 years ago
|
||
bugherder |
Description
•