Open Bug 1107251 Opened 10 years ago Updated 4 months ago

TCP client connections (e.g., HTTP) hang with moderate CPU load when server is not reachable (e.g., misconfigured proxy)

Categories

(MailNews Core :: General, defect, P3)

Tracking

(Not tracked)

People

(Reporter: it, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [necko-triaged][necko-priority-review])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:33.0) Gecko/20100101 Firefox/33.0
Build ID: 20141030112145

Steps to reproduce:

Opening a new bug report on FF since older related ones (like 5+ years old bug 482642) got neglected.
Simply configure a non-existing http proxy, e.g., IP address 1.2.3.4 and port 5. Try (re)loading any web page.

Actual results:

Firefox hangs trying to connect, with high CPU load, wasting battery on mobile devices. BTW, this happens similarly with Thunderbird (see e.g. bug 919485), as reported 7+ years ago.

Expected results:

Both Firefox and Thunderbird must not use busy loop(s) waiting for any connection. They should wait for a while, with negligible CPU load, and then time out gracefully.
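For readers unfamiliar with the distinction drawn in the expected results, the following minimal Python sketch contrasts a busy wait with a blocking wait on a socket. It is not Firefox or Thunderbird code. The helper names `cpu_fraction`, `busy_wait` and `blocking_wait` are hypothetical, and a local socket pair stands in for a connection that never answers.

```python
import select
import socket
import time

def cpu_fraction(wait_fn, sock, seconds):
    """Return CPU time used by wait_fn divided by the wall-clock time it took."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    wait_fn(sock, seconds)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.monotonic() - wall_start
    return cpu_used / wall_used

def busy_wait(sock, seconds):
    """Poll with a zero timeout in a tight loop; this keeps a CPU core busy."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        readable, _, _ = select.select([sock], [], [], 0)
        if readable:
            return True
    return False

def blocking_wait(sock, seconds):
    """Sleep in the OS until data arrives or the timeout expires."""
    readable, _, _ = select.select([sock], [], [], seconds)
    return bool(readable)
```

Calling `cpu_fraction(busy_wait, sock, 0.5)` and `cpu_fraction(blocking_wait, sock, 0.5)` on one end of a `socket.socketpair()` that receives no data should show the first value far above the second. The exact figures depend on the machine.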
I am pretty sure that the issue is a fundamental one, since it is also observable for Thunderbird (and likely any other Mozilla product; BTW, there is finally some movement on the related FF bug 919485) for all sorts of HTTP connections. Likely it is even independent of proxy use. Thus solving it should be a huge gain for many (if not most) users.
Severity: normal → major
OS: Mac OS X → All
Priority: -- → P2
Product: Firefox → Core
Hardware: x86 → All
(Priority is not a field for users to set) You might start by providing a URL to a profile https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem
Flags: needinfo?(mueller8)
Keywords: perf
Priority: P2 → --
Blocks: 919485
The CPU usage is around 15-20% for the first page that is tried to load (on my Macbook Air running the latest FF and TB release versions). When FF tries loading several pages simultaneously, the figures increase only marginally. Unfortunately I cannot afford the time to install the trunk version and use profiling tools for more detailed results. Yet any interested developer could easily do so.
Flags: needinfo?(mueller8)
Summary: HTTP(S) connections hang with high CPU load when proxy is not reachable → TCP client connections (e.g., HTTP) hang with moderate CPU load when server is not reachable (e.g., misconfigured proxy)
(In reply to David von Oheimb from comment #3)
> Unfortunately I cannot afford the time to install the trunk version and use
> profiling tools for more details results. Yet any developer interested could
> do for himself easily.

Except, of course, if they can't reproduce... I certainly can't on my Windows machine, which only uses 0-2% throughout. My Mac shows the behaviour mentioned in comment #3.

The reason this issue isn't getting much attention is that there isn't enough detail and there are hundreds of thousands of other issues demanding our attention. Not only that, the number of users that sit around refreshing pages on misconfigured proxy settings is tiiiiiiiiny. So comment #0 rather overstates the importance of this bug.

If you think this is important for you, you should at least make a case as to why that is so, for a large number of users, and/or provide more detail (such as a profile and/or a more detailed indication of what you think Firefox's code on OS X is doing wrong).
Flags: needinfo?(mueller8)
OS: All → Mac OS X
As I stated before, the problem is not restricted to Mac OS. I just reproduced it for both Windows and Linux.

Please note that proxy misconfiguration is just one of many ways of reproducing the very general issue. Sure it's a rather uncommon and artificial one, but it is very convenient since it allows reproducing the issue consistently and very simply, even when everything else (network, servers, etc.) works well.

There are certainly dozens of other ways of suffering from that busy loop on network connection bug, in particular:
* non-responsive (web) servers, or
* a flaky mobile/wireless/etc. network connection.

The CPU load can become much higher in case multiple connections are hanging, for instance:
* restoring a session with many pages, or
* refreshing all tabs of a window at the same time.
Flags: needinfo?(mueller8)
OS: Mac OS X → All
(In reply to David von Oheimb from comment #5)
> As I stated before, the problem is not restricted to Mac OS.
> I just reproduced it for both Windows and Linux.

So why can't I reproduce this on Windows? On what version of Windows are you testing? What version of Firefox? Have you tried just one machine or also others? On what kind of network environment is this machine / are these machines? (home, office, wifi/wired, ...)
Component: Untriaged → Networking: HTTP
Flags: needinfo?(mueller8)
I don't think there is a busy loop anywhere. It's more likely you are experiencing some kind of level-triggered condition (POLLERR maybe?) that isn't being cleared efficiently, so the level persists and the code that is supposed to deal with it is called again and again and again.

I've tried the STR on Firefox linux64 with a debug build and it behaves normally for me (i.e. no CPU drain).

If you grab an HTTP log, that will show in detail what the network thread is doing. If it's networking that is spinning, it ought to show up there: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
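A minimal Python sketch of what a level-triggered condition that is not cleared looks like in practice. This is an illustration only, not necko code. It uses plain readability on a local socket pair instead of POLLERR, and the names `count_immediate_wakeups` and `demo` are hypothetical.

```python
import select
import socket

def count_immediate_wakeups(sock, attempts):
    """Count how many zero-timeout polls report sock as readable."""
    wakeups = 0
    for _attempt in range(attempts):
        readable, _, _ = select.select([sock], [], [], 0)
        if readable:
            wakeups += 1
    return wakeups

def demo():
    a, b = socket.socketpair()
    try:
        b.send(b"x")                        # raises the "readable" level on a
        select.select([a], [], [], 1.0)     # wait until the byte has arrived
        # A handler that never drains the socket is woken on every poll.
        before = count_immediate_wakeups(a, 5)
        a.recv(1)                           # clearing the condition stops the wakeups
        after = count_immediate_wakeups(a, 5)
        return before, after
    finally:
        a.close()
        b.close()
```

With `select`, which is level-triggered, every poll reports the socket as ready for as long as the pending byte is unread. An event loop whose handler fails to clear such a condition would therefore spin even though it contains no explicit busy loop.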
Thanks Gijs and Patrick for your questions and comments.

In this bug report I refer to the latest publicly available Firefox version, which is 34.0.

Interestingly, I could not reproduce the problem on a different machine running Windows 7 Starter (32-bit). Yet I can still reproduce it on Windows 7 Enterprise 64-bit (CPU use increase about 10% on a dual-core machine) and on a virtual machine on the same host running Ubuntu Linux 12.04 LTS (CPU use increase about 30% on a single-core VM).

In a minute I'll add a network log produced in the way mentioned by Patrick on my Win7 Enterprise 64 machine with Firefox starting up with HTTP proxy misconfigured as 1.2.3.4 and port 5.
Flags: needinfo?(mueller8)
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #2)
> (Priority is not a field for users to set)
>
> You might start by providing a URL to a profile
> https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem

These instructions ask you to use a nightly build. But I don't believe that's still an absolute requirement. Please try a release build. Or if you no longer reproduce, please update and close the bug. Thanks
Flags: needinfo?(mueller8)
Sorry that it took a while until I noticed this modified info request. Here is the profiling data: https://cleopatra.io/#report=23163a60ebdeefc8b96278eb6a995d29f5479540 It was a bit tricky to produce because the profiler uses online web access while for the test itself I had to change the network options to use a non-existent proxy; hopefully I was able to switch back early enough for the online analysis tool to get the relevant recent profiling data.
Flags: needinfo?(mueller8)
Whiteboard: [necko-backlog]
(In reply to David von Oheimb from comment #11)
> Sorry that it took a while until I noticed this modified info request.
> Here is the profiling data:
> https://cleopatra.io/#report=23163a60ebdeefc8b96278eb6a995d29f5479540

The profile cites most time spent in js::DirectProxyHandler::get. A significant amount of time also involves the Ghostery add-on.
Thanks for analyzing. This confirms my suspicion that there is a busy wait when using an http proxy. The problem exists also without Ghostery; here is a profile with this add-on disabled: https://cleopatra.io/#report=4a07e9fa4007769f8bb6b3a5cc81778283c43e81
(In reply to David von Oheimb from comment #9)
> Created attachment 8544862 [details]
> network access log of Firefox 34.0 starting up with HTTP proxy misconfigured
> as 1.2.3.4 and port 5 on Win7 Enterprise 64bit

Are the log and profile helpful, or is more information needed from David?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(mcmanus)
I will take a look at the log.
Flags: needinfo?(mcmanus)
Any results from looking at the log? The issue appears the same as in bug 919485 (for FF) and could be related to bug 683651 (for TB).
The issue is still present with 50.1.0 and ESR 45.6.0 (at least on Win7 64, which I just verified).

CPU load is needlessly increased when a non-reachable proxy such as 1.2.3.4 is used, until the connection times out after 20 (or some 120 for ESR) seconds. The same happens when trying to get a page directly (without needing to use a proxy), except that in this case the timeout is just 10 seconds.

BTW, I wonder where these timeouts are set. I cannot find them in the configuration editor, so they seem hard-coded somewhere.

As with TB (see bug 683651, comment 35), the different timeouts observed for FF not only indicate that there is more than one busy loop somewhere down in the network layer, but should also be of good help in spotting them.
(In reply to David von Oheimb from comment #17)
> The issue is still present with 50.1.0 and ESR 45.6.0 (at least on Win7 64,
> which I just verified).
>
> CPU load is needlessly increased when a non-reachable proxy such as 1.2.3.4
> is used until the connection timeouts after 20 (or some 120 for ESR)
> seconds. Similarly when trying to get a page directly (without needing to
> use a proxy), except that in this case the timeout is just 10 seconds.
>
> BTW, I wonder where these timeouts are set. I cannot find them in the
> configuration editor, so they seem hard-coded somewhere.
>
> Similarly to TB (see bug 683651, comment 35) also for FF the different
> timeouts observed not only indicate that there is even more than one busy
> loop somewhere down in the network layer, but also should be of good help
> spotting them.

Sockets are timed out by the kernel, FF does not enforce timeouts.
Flags: needinfo?(dd.mozilla)
> Sockets are timed out by the kernel, FF does not enforce timeouts.

Maybe, but when I instead use wget or curl to reach http://1.2.3.4/ (when no proxy is needed), or to reach an existing site through an unreachable proxy, they show a different timeout behavior - and much more importantly, they do not misuse the CPU and drain the battery while waiting/blocking.
Moreover, the issue occurs on different systems.

So FF (and TB) is to blame for making my CPU sweat and eating my battery, not the system/kernel.
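For comparison, here is a minimal Python sketch of a client-side connection attempt that waits without spinning and applies its own timeout instead of relying on the kernel's. It is an illustration under stated assumptions, not the way wget, curl or Firefox are implemented. The function name `connect_with_timeout` and its string return values are hypothetical.

```python
import errno
import os
import select
import socket

def connect_with_timeout(host, port, timeout):
    """Start a non-blocking TCP connect to an IPv4 address and wait for the outcome.

    The thread sleeps inside select() until the connect completes, fails,
    or the application-level timeout (in seconds) expires.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            return "error: " + os.strerror(err)
        # Writability (or an exceptional condition) signals that the attempt ended.
        _, writable, failed = select.select([], [sock], [sock], timeout)
        if not writable and not failed:
            return "timeout"
        # SO_ERROR tells whether the finished attempt succeeded.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return "connected" if err == 0 else "error: " + os.strerror(err)
    finally:
        sock.close()
```

A call such as `connect_with_timeout("1.2.3.4", 5, 10.0)` gives up after at most ten seconds of waiting, and the waiting itself should consume next to no CPU time. Whether it reports a timeout or an error depends on the local network.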
(In reply to David von Oheimb from comment #19)
> > Sockets are timed out by the kernel, FF does not enforce timeouts.
>
> Maybe, but when I use instead wget or curl trying to reach http://1.2.3.4/
> (when no proxy is needed) or reach an existing site when an unreachable
> proxy is used, they show a different timeout behavior -
> and much more importantly, they do not misuse the CPU and drain the battery
> while waiting/blocking.
> Moreover, the issue occurs in different systems.
>
> So FF (and TB) is to blame making my CPU sweat and eating my battery, not
> the system/kernel.

I was not blaming the kernel for cpu usage, I was just saying that FF does not have any timeouts parameter and it is not closing sockets, the kernel does that. This statement does not have anything to do with CPU usage!
Analyzing the log:
The socket thread sleeps most of the time. From creating the socket until the timeout took ~21s, and the thread was asleep for 20953ms of that.
There is not much activity on the main thread from nsHttpHandler or nsHttpConnectionMgr either.

This is not a necko bug. necko does nothing the whole time.

Comment 12 mentions js::DirectProxyHandler::get. I searched for js::DirectProxyHandler::get in the current repository, but it does not exist any more.

Wayne, can you reproduce this with a newer version?
Do you know where js::DirectProxyHandler::get belongs? I cannot find it in dxr.mozilla.org
Flags: needinfo?(dd.mozilla) → needinfo?(vseerror)
(In reply to Dragana Damjanovic [:dragana] from comment #21)
> Analyzing ...
> Wayne, can you reproduce this with a newer version?

redirecting to David - can you reproduce with a nightly build? https://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/

> Do you know where js::DirectProxyHandler::get belongs? I cannot find it in dxr.mozilla.org

DirectProxyHandler substantially changed in bug 1269928 and then was removed in bug 1296324.
Also mentioned in https://mzl.la/2iTEpec and bug 1203977, Bug 1265242, bug 1169513, bug 934251
Flags: needinfo?(vseerror) → needinfo?(mueller8)
Wayne, at first I could not run the latest Firefox nightly build (firefox-53.0a1.en-US.mac-x86_64.sdk.tar.bz2) on my Mac because the OS X version 10.8.6 I use is no longer (officially) supported by Firefox and some other software, which is quite a bummer in itself. Yet tweaking the OS version number given in FirefoxNightly.app/Contents/Info.plist made it run.

The bug is still there - FF wastes 10% of a CPU core chewing on http://1.2.3.4/

Note that, as I wrote already in comment 5, this bug is reproducible even without using a proxy. (Thus, as expected, it has not been solved by any changes w.r.t. bug 1269928 and bug 1296324).

As I wrote already several times w.r.t. various related Bugzilla reports,
anyone should be able to reproduce this bug by trying to connect to an unreachable IP address such as 1.2.3.4.
and since this sort of bug has been around for many years, I'm pretty confident it has not been fixed recently ;-)
Flags: needinfo?(mueller8)
Oops, I forgot to delete the last three lines of my above comment; please disregard these.
See Also: → 683651

Bug 1650632 contains some new performance profiles, recorded with TB 91.4.0, e.g. https://share.firefox.dev/3DXO74R

It is a shame that this fundamental major bug has been present for at least 7 years (likely much longer, since the beginning of Mozilla)
and still no developer has taken care to fix it.

Since this is the experience also with many many other Mozilla bugs, part of which are 15+ years old,
I am currently considering the ultimate user-perspective bug fix: switch to a different email client.
Evolution seems to be an interesting alternative.

Hello, I have tried to reproduce the issue with Firefox 97.0a1 (2021-12-13) on Windows 10, MacOS 10.15 and Ubuntu 20. Marking this issue as RESOLVED->WORKSFORME as there seems to be no activity on this issue and the issue is not reproducible on the latest Firefox version.

If the issue is still valid please feel free to reopen it or file another bug.

Have a nice day!

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME

Negritas Sergiu, who are you to close others' bug reports that you obviously have no clue about?
You are wrong. The bug persists, with both the latest Thunderbird, and Firefox.

I just confirmed the bug again with latest nightly, 97.0a1.
It occurs both when trying to load pages such as http://1.2.3.4
and when manually setting the proxy to an unreachable address such as 1.2.3.4.

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

Sorry David for closing this issue.

But as I have mentioned I have tried to reproduce the issue after configuring a proxy as the one mentioned in the description, but in my case I haven't seen a high CPU usage other than normal on the mentioned OS's.

Sorry again for the inconvenience created.

Have a nice day!

(In reply to Negritas Sergiu from comment #32)

> Sorry David for closing this issue.

Okay.

Here is a further performance profile that hopefully helps pinning down the root cause: https://share.firefox.dev/32bNese

(In reply to David von Oheimb from comment #33)

> (In reply to Negritas Sergiu from comment #32)
>
> > Sorry David for closing this issue.
>
> Okay.
>
> Here is a further performance profile that hopefully helps pinning down the root cause: https://share.firefox.dev/32bNese

50% of the js activity is under refreshWindow chrome://cardbook/content/wdw_cardbook.js
Also significant (10% ?) GC/CC between such things as nsObserverService::NotifyObservers cycle-collector-forget-skippable and js::GCRuntime::markUntilBudgetExhausted via nsJSContext::GarbageCollectNow INTER_SLICE_GC

So maybe the CPU usage has moved.

Does your CPU load occur if cardbook has been removed?

Flags: needinfo?(nl0)

Sorry that I did not notice your reply earlier.

Indeed, disabling the CardBook extension significantly lowered the idle CPU usage - thanks a lot for this hint!
What a crappy extension, which I had installed once but had not really used.
Yet even with all add-ons disabled, the idle CPU load averages to 3 - 5 % according to 'top', which is still too much.

And when I replace the real name/address of the IMAP server of any of my mail accounts by the IP address 1.2.3.4,
which simulates intermittent network connection or server availability,
the CPU load rises to some 30 - 35 % as long as "Connection to 1.2.3.4..." is shown, i.e., until a connection timeout occurs (after 90 seconds).

Again, this should be easy to reproduce.

Flags: needinfo?(nl0)

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: major → --

(In reply to David von Oheimb from comment #35)

> And when I replace the real name/address of the IMAP server of any of my mail accounts by the IP address 1.2.3.4,
> which simulates intermittent network connection or server availability,
> the CPU load rises to some 30 - 35 % as long as "Connection to 1.2.3.4..." is shown, i.e., until a connection timeout occurs (after 90 seconds).

David, as far as I understand, you are reproducing this with Thunderbird, right?
Are you able to do it with Firefox?

Flags: needinfo?(it)

Sorry for my late reply, I did not notice your request for info until now.

Yes, I am able to reproduce needlessly high CPU load on hanging connections also with a current Firefox (version 112),
by setting up proxy 1.2.3.4 as stated in my OP above, which meanwhile is 8.5 years back.
With Firefox meanwhile "only" 15% of a CPU core gets wasted on my current Linux machine,
apparently due to some silly busy waiting.

With current Thunderbird the waste is even about 60% - see also bug 1830641

Flags: needinfo?(it)

Related bug reports include also bug 683651

David - Can you get a profile and upload it? Thanks!

Status: REOPENED → NEW
Flags: needinfo?(it)

For the related bug report https://bugzilla.mozilla.org/show_bug.cgi?id=683651 I already spent the effort to provide profiles several times,
which were not used to fix the bug. So I'm sufficiently frustrated to spend no more effort on this.

Why don't you produce the profile yourself? And as I mentioned in Bugzilla meanwhile certainly at least 15 times,
there is an easy way to reproduce the issue, such that anyone analyzing this type of bug can produce relevant profiles him/herself.
It should be reproducible at least on Linux (e.g., Debian) and MacOS.

Flags: needinfo?(it)
Blocks: 1830641
See Also: → 1752641

Moving bug to Core/Networking: Proxy.

Component: Networking: HTTP → Networking: Proxy

Investigations in Bug 683651.

Whiteboard: [necko-backlog] → [necko-triaged][necko-priority-review]

Assessed as a front end bug, moving to Mailnews Core:General for triage.

Component: Networking: Proxy → General
Product: Core → MailNews Core