Firefox stops working after 900 connections when using NTLM proxy

RESOLVED FIXED in Firefox -esr52

Status

defect
--
major
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: will69, Assigned: u408661)

Tracking

({regression})

53 Branch
mozilla55
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(relnote-firefox 53+, firefox-esr52 53+ fixed, firefox53 blocking fixed, firefox54 fixed, firefox55 fixed)

Details

(Whiteboard: [necko-active][ntlm])

Attachments

(2 attachments)

(Reporter)

Description

2 years ago
STR: Surf to http://vrr.de or any other web site via a proxy using NTLM auth. Press Ctrl-F5 until spinner stops turning clockwise and turns gray. Firefox is now effectively dead and needs to be restarted.

There is a setting "network.http.max-connections=900" . Firefox stops retrieving web content after 900 connections. 900 *active* connections total are listed in about:networking . A single page view accounts for about 25 active connections. Connections are not closed or reused.

A restart of Firefox resets the connection counter. A higher value for the above setting mitigates the problem.
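
As a rough back-of-the-envelope sketch of the numbers above: if every connection opened for a page view stays "active" and is never closed or reused, the limit is reached after a fixed number of page views. The function name and the leak model below are illustrative assumptions, not Firefox code.

```python
def page_views_until_stall(max_connections: int, leaked_per_view: int) -> int:
    """Return how many full page views fit under the connection cap.

    Assumes (as observed in this report) that every connection opened for a
    page view stays active forever, so the count only ever grows.
    """
    if leaked_per_view <= 0:
        raise ValueError("leaked_per_view must be positive")
    return max_connections // leaked_per_view


if __name__ == "__main__":
    # 900 is the network.http.max-connections value from the report;
    # 25 is the reporter's estimate of connections per page view.
    print(page_views_until_stall(900, 25))
```

With the reported values this gives 36 page views, which is consistent with Firefox stalling after a few dozen reloads. Raising the limit only raises that number; it does not stop the leak.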

mozregression yields bug #1346392 as the only culprit.
Assignee: nobody → hurley
Flags: needinfo?(hurley)
Whiteboard: [necko-active]
(Assignee)

Comment 1

2 years ago
I'm not confident this is the case, but I'm wondering if this is another symptom of bug 1355858 - that bug causes us to dial back to NTLM-speaking servers & proxies with h2, so it could be the case that we end up opening a ton of unneeded connections that (for whatever reason) never get reclaimed. (That's something we should look into separately, btw, even if my harebrained theory pans out.)

Reporter - would it be possible for you to try the builds at https://archive.mozilla.org/pub/firefox/try-builds/hurley@mozilla.com-56dd3152ccfb352e1f46a97b2a8064101a13fb43/ and see if the issue persists? If it doesn't, then we can mark this as a duplicate of that bug, otherwise there will be more investigation that needs to happen.
Flags: needinfo?(hurley) → needinfo?(will69)
(Reporter)

Comment 2

2 years ago
Your test build improves the situation somewhat: Again referring to http://vrr.de, 32 connections are established, 30 are moved from "active" to "idle" almost immediately, two remain active and are stuck. Those two even remain active when the tab is closed. The idle connections are freed completely after a considerable amount of time (minutes rather than seconds).

Surfing the web for some time, almost every host leaves some connections in "active" state that do not get reclaimed. Even Firefox internal connections to Mozilla and Google hosts get stuck that way. Setting "network.http.max-connections=10" instead of 900, Firefox goes into lockdown state using telemetry and other internal connections alone.

With Firefox 52, this is not the case. All connections get reclaimed and after some time there are only zeroes in the "active" and "idle" columns.

BTW, "about:networking" shows two lines for every host, differing only in the number of active and idle connections. With Firefox 52 it shows only one entry per host.
Flags: needinfo?(will69)
Duplicate of this bug: 1361350
I just want to make a note that I have seen what sounds like the
same thing as will69 and basically cannot use the proxy server
with FF 53.

The difference in my case is that I am using kerberos
authentication (negotiate-auth).  So it seems that it is not NTLM
itself that is the issue but rather something in common for both
types of authentication.
(Reporter)

Updated

2 years ago
Has Regression Range: --- → yes
Has STR: --- → yes
Keywords: regression
Summary: Regression: Firefox stops working after 900 connections when using NTLM proxy → Firefox stops working after 900 connections when using NTLM proxy

Comment 5

2 years ago
I experience the same behavior in FF 53 and 53.0.2. Increasing network.http.max-connections does not seem to be a solution, not even a temporary one.

Comment 6

2 years ago
Hello, could this bug be related?

https://bugzilla.mozilla.org/show_bug.cgi?id=1363030

Comment 7

2 years ago
I'm facing the same or a similar problem. I'm behind a corporate proxy that uses NTLM, and Firefox seems to gradually stop working after it's been open for a while. Once Firefox is "dead", when I try to visit most pages, the tab shows "Connecting..." with a counterclockwise spinner forever. I can still visit some pages (perhaps pages that Firefox still has a connection to?).

This is on Windows 7. I first noticed the problem in 53.0.2 (32-bit). I built version 55.0a1 (2017-05-10) (32-bit), and that one also has the problem.

I've looked at what Firefox is doing in Fiddler. (I still have the problem when I'm not using Fiddler.) The normal sequence of events when Firefox opens an HTTPS tunnel is:

* Firefox sends a CONNECT request. It receives a 407 response.
* Firefox sends another CONNECT request with "Proxy-Authorization: Negotiate". It receives another 407 response.
* Firefox sends another CONNECT request with "Proxy-Authorization: NTLM". It receives a 407 response with "Proxy-Authenticate: NTLM".
* Firefox sends one last CONNECT request with "Proxy-Authorization: NTLM". It receives a 200 response.

When Firefox is "dead", only the first two steps happen; Firefox never issues the third CONNECT request (the first one that says NTLM).
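
The sequence above can be sketched as a small trace checker. This is only a model of what Fiddler showed, assuming each exchange is reduced to the scheme sent in Proxy-Authorization (or None) and the status code received; the names `EXPECTED`, `handshake_progress` and `is_stalled` are made up for illustration.

```python
from typing import List, Optional, Tuple

# One exchange: (scheme sent in Proxy-Authorization or None, status received).
Exchange = Tuple[Optional[str], int]

# The normal four-step sequence for opening an HTTPS tunnel, as observed.
EXPECTED: List[Exchange] = [
    (None, 407),         # plain CONNECT, proxy asks for auth
    ("Negotiate", 407),  # Negotiate attempt is rejected
    ("NTLM", 407),       # first NTLM message, proxy answers with a challenge
    ("NTLM", 200),       # final NTLM message, tunnel established
]


def handshake_progress(trace: List[Exchange]) -> int:
    """Count how many leading steps of the expected sequence the trace matches."""
    matched = 0
    for observed, expected in zip(trace, EXPECTED):
        if observed != expected:
            break
        matched += 1
    return matched


def is_stalled(trace: List[Exchange]) -> bool:
    """A trace is stalled if it never reaches the final 200 response."""
    return handshake_progress(trace) < len(EXPECTED)


if __name__ == "__main__":
    dead_trace = EXPECTED[:2]  # only the first two steps happen
    print(handshake_progress(dead_trace), is_stalled(dead_trace))
```

In the "dead" state the trace stops after step 2, so the checker reports a progress of 2 and flags the handshake as stalled.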

Procmon.exe and tcpview.exe both show that Firefox isn't even attempting to open any new TCP connections. (It does send data over a TCP connection that it has to itself, but I'm guessing this is unrelated.)

For me, about:networking doesn't show 900 connections. I looked at that page after Firefox died once, and it listed 125 hosts with a total of 539 active connections. I then tried changing network.http.max-connections to 9000, but Firefox died again after that. That time, about:networking listed 61 hosts with 180 active connections.

I have a debugger so I can step through the source code, but I have no clue where to look.

Comment 8

2 years ago
I'm experiencing the same issue running 53.0.2 (64-Bit) on Windows 10.

Comment 9

2 years ago
We use a proxy server at our organization.

We use Firefox, Chrome, and IE. Chrome, IE, and Firefox 52 all work fine through the proxy.

After updating Firefox to the latest version (53.0.2), we have problems: after 30 minutes or more, the browser no longer opens or refreshes pages. Tabs just show a blank page and the refresh icon. If we close and reopen the browser, the problem recurs after about 30 minutes.
The problem is described in bug 1360574 and bug 1363030.

Firefox 53.0.2 doesn't reset the value of network.http.connections after tabs (open sites) are closed:

http://my.jetscreenshot.com/demo/20170511-3ckm-35kb

Once the value reaches "network.http.max-connections=900", Firefox stops working.

The problem doesn't happen on PCs with a direct connection to the Internet.

Increasing "network.http.max-connections" does not seem to be a solution, not even a temporary one.

We need help, because this affects many of our users.

Thanks for your attention.
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1359864
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1360008
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1360604
Duplicate of this bug: 1363030

Comment 14

2 years ago
[Tracking Requested - why for this release]:
the issue sounds like a major regression for users in an affected environment
Duplicate of this bug: 1364048
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Comment 16

2 years ago
Reporter, I have a build that at least has a chance of fixing the issue. However, I need some help in testing it, would it be possible for you to try one of the builds from https://archive.mozilla.org/pub/firefox/try-builds/hurley@mozilla.com-bf0b722756c1608522b505f98411ce9191bf564e/ to see if it fixes the issue?

Thanks!
Flags: needinfo?(will69)

Comment 17

2 years ago
(In reply to [:philipp] from comment #14)
> [Tracking Requested - why for this release]:
> the issue sounds like a major regression for users in an affected environment

Firefox ESR 52 is also affected.
I'm running Firefox ESR 52.1.1 (Windows 32-bit), and it seems I hit the exact same issue (bug 1364048).
(Reporter)

Comment 18

2 years ago
Nailed it! This fixes the issue on Win7/32 and Win7/64.

Thank you very much to everybody who contributed to this!

Can you let this ride along with a dot release should there be one?
Flags: needinfo?(will69)

Comment 19

2 years ago
Testing the build provided in comment 16 on Windows 7 64-bit.

It looks like it fixes the issue for me.
The number of active connections listed in about:networking stays low even after loading several pages in multiple tabs.

If the fix proves to be good, will it be committed into Firefox ESR 52.x release as well?
(Assignee)

Comment 20

2 years ago
Excellent, thanks! I'm going to clean up the patch and fix the commit message, get it reviewed, and then commit + request uplifts to beta, esr, and release (for any other potential regular 52.x we do in the future).
(Assignee)

Updated

2 years ago
Blocks: 1345392
Whiteboard: [necko-active] → [necko-active][ntlm]
(Assignee)

Updated

2 years ago
Blocks: 1346392
No longer blocks: 1345392

Comment 23

2 years ago
mozreview-review
Comment on attachment 8867214 [details]
Bug 1360574 - Detect & handle connection auth at the http/2 layer.

https://reviewboard.mozilla.org/r/138790/#review142076

::: netwerk/protocol/http/Http2Session.cpp:1394
(Diff revision 1)
>                                                       mDecompressBuffer,
>                                                       mFlatHTTPResponseHeaders,
>                                                       httpResponseCode);
>    if (rv == NS_ERROR_ABORT) {
> -    LOG(("Http2Session::ResponseHeadersComplete ConvertResponseHeaders aborted\n"));
> -    if (mInputFrameDataStream->IsTunnel()) {
> +    LOG(("Http2Session::ResponseHeadersComplete %p ConvertResponseHeaders aborted\n", this));
> +    // This means the stream found connection-oriented auth. Treat this like we

I don't understand why this whole isTunnel() path is going away - isn't it still needed?

::: netwerk/protocol/http/Http2Session.cpp:1395
(Diff revision 1)
>                                                       mFlatHTTPResponseHeaders,
>                                                       httpResponseCode);
>    if (rv == NS_ERROR_ABORT) {
> -    LOG(("Http2Session::ResponseHeadersComplete ConvertResponseHeaders aborted\n"));
> -    if (mInputFrameDataStream->IsTunnel()) {
> -      rv = gHttpHandler->ConnMgr()->CancelTransactions(
> +    LOG(("Http2Session::ResponseHeadersComplete %p ConvertResponseHeaders aborted\n", this));
> +    // This means the stream found connection-oriented auth. Treat this like we
> +    // got a reset or goaway with HTTP_1_1_REQUIRED.

just a reset - not a goaway (which would be whole session)

::: netwerk/protocol/http/Http2Stream.cpp:999
(Diff revision 1)
>    mTxInlineFrameUsed = Http2Session::kFrameHeaderBytes;
>    mTxStreamFrameSize = dataLength;
>  }
>  
> +static bool
> +HasConnectionBasedAuth(nsHttpResponseHead &head, nsHttpAtom headerName)

I think we can do this much less expensively directly in Http2Decompressor::OutputHeader, which already compares each header name rather than going through this extra header parsing from serialization step.

I think if that returned NS_ERROR_NET_RESET that could be an unambiguous signal to do the retry stuff.. and leave the NS_ERROR_ABORT path untouched.

wdyt?
Attachment #8867214 - Flags: review?(mcmanus)

Comment 24

2 years ago
mozreview-review
Comment on attachment 8867213 [details]
Bug 1360574 - Backout original fix to bug 1346392

https://reviewboard.mozilla.org/r/138788/#review142082
Attachment #8867213 - Flags: review?(mcmanus) → review+
(Assignee)

Comment 25

2 years ago
(In reply to Patrick McManus [:mcmanus] from comment #23)
> Comment on attachment 8867214 [details]
> Bug 1360574 - Detect & handle connection auth at the http/2 layer.
> 
> https://reviewboard.mozilla.org/r/138790/#review142076
> 
> ::: netwerk/protocol/http/Http2Session.cpp:1394
> (Diff revision 1)
> >                                                       mDecompressBuffer,
> >                                                       mFlatHTTPResponseHeaders,
> >                                                       httpResponseCode);
> >    if (rv == NS_ERROR_ABORT) {
> > -    LOG(("Http2Session::ResponseHeadersComplete ConvertResponseHeaders aborted\n"));
> > -    if (mInputFrameDataStream->IsTunnel()) {
> > +    LOG(("Http2Session::ResponseHeadersComplete %p ConvertResponseHeaders aborted\n", this));
> > +    // This means the stream found connection-oriented auth. Treat this like we
> 
> I don't understand why this whole isTunnel() path is going away - isn't it
> still needed?

Nope. Check the commit message - you removed the code in ConvertResponseHeaders that returns NS_ERROR_ABORT a while back (bug 1037082), so that was dead code.

> ::: netwerk/protocol/http/Http2Session.cpp:1395
> (Diff revision 1)
> >                                                       mFlatHTTPResponseHeaders,
> >                                                       httpResponseCode);
> >    if (rv == NS_ERROR_ABORT) {
> > -    LOG(("Http2Session::ResponseHeadersComplete ConvertResponseHeaders aborted\n"));
> > -    if (mInputFrameDataStream->IsTunnel()) {
> > -      rv = gHttpHandler->ConnMgr()->CancelTransactions(
> > +    LOG(("Http2Session::ResponseHeadersComplete %p ConvertResponseHeaders aborted\n", this));
> > +    // This means the stream found connection-oriented auth. Treat this like we
> > +    // got a reset or goaway with HTTP_1_1_REQUIRED.
> 
> just a reset - not a goaway (which would be whole session)

Right, good catch.

> ::: netwerk/protocol/http/Http2Stream.cpp:999
> (Diff revision 1)
> >    mTxInlineFrameUsed = Http2Session::kFrameHeaderBytes;
> >    mTxStreamFrameSize = dataLength;
> >  }
> >  
> > +static bool
> > +HasConnectionBasedAuth(nsHttpResponseHead &head, nsHttpAtom headerName)
> 
> I think we can do this much less expensively directly in
> Http2Decompressor::OutputHeader, which already compares each header name
> rather than going through this extra header parsing from serialization step.

That would certainly be less expensive, I'll give it a shot.

> I think if that returned NS_ERROR_NET_RESET that could be an unambiguous
> signal to do the retry stuff.. and leave the NS_ERROR_ABORT path untouched.
> 
> wdyt?

So we'd probably just remove the NS_ERROR_ABORT path entirely (see above) and replace it with one that handles NS_ERROR_NET_RESET. I'll have to be careful with the return values - the decompressor does some funky things with return values and killing the session in certain cases, as well as masking various return values at different stages. Shouldn't be too nasty, though.
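
To illustrate the idea under review (not the actual patch, which is C++ in netwerk/protocol/http), here is a minimal Python sketch: look at each response header as it is emitted, and if an auth challenge names a connection-based scheme, signal that the request should be retried over HTTP/1.1. The header and scheme lists, the function names and the returned strings are assumptions made for this example, and only the first scheme in each header value is examined.

```python
from typing import Iterable, Tuple

# Assumed for this sketch: schemes that authenticate the connection itself.
CONNECTION_BASED_SCHEMES = ("ntlm", "negotiate")
# Assumed for this sketch: headers that can carry an auth challenge.
AUTH_CHALLENGE_HEADERS = ("www-authenticate", "proxy-authenticate")


def has_connection_based_auth(headers: Iterable[Tuple[str, str]]) -> bool:
    """Return True if any challenge header starts with NTLM or Negotiate."""
    for name, value in headers:
        if name.lower() not in AUTH_CHALLENGE_HEADERS:
            continue
        parts = value.split(None, 1)  # scheme is the first token
        scheme = parts[0].lower() if parts else ""
        if scheme in CONNECTION_BASED_SCHEMES:
            return True
    return False


def next_action(headers: Iterable[Tuple[str, str]]) -> str:
    """Hypothetical decision: reset the stream and retry on HTTP/1.1, or carry on."""
    if has_connection_based_auth(headers):
        return "reset-and-retry-http1"
    return "continue-http2"


if __name__ == "__main__":
    print(next_action([(":status", "407"), ("proxy-authenticate", "NTLM")]))
    print(next_action([(":status", "401"), ("www-authenticate", "Basic realm=x")]))
```

Doing the check per header, as suggested in the review, avoids re-parsing the serialized response just to find the challenge, and a dedicated return value keeps the retry signal separate from other failures.
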
Hello,
I've tested the build at https://archive.mozilla.org/pub/firefox/try-builds/hurley@mozilla.com-bf0b722756c1608522b505f98411ce9191bf564e/
and I confirm that it also fixes my use case, described in duplicate bug 1359864.

BR

Comment 28

2 years ago
mozreview-review
Comment on attachment 8867214 [details]
Bug 1360574 - Detect & handle connection auth at the http/2 layer.

https://reviewboard.mozilla.org/r/138790/#review142116
Attachment #8867214 - Flags: review?(mcmanus) → review+

Comment 30

2 years ago
Pushed by hurley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b99dc310ac08
Backout original fix to bug 1346392 r=mcmanus
https://hg.mozilla.org/integration/autoland/rev/9051132444a0
Detect & handle connection auth at the http/2 layer. r=mcmanus
(Assignee)

Comment 31

2 years ago
Comment on attachment 8867213 [details]
Bug 1360574 - Backout original fix to bug 1346392

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: continues to fix ntlm breakage, a feature widely used in enterprise (ntlm, not the breakage)
User impact if declined: Firefox will stop talking to the internet after a while and need to be restarted
Fix Landed on Version: 55
Risk to taking this patch (and alternatives if risky): low - backs out a previous "fix" for another bug, and fixes it in a similar but un-broken way
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1346392
[User impact if declined]: see esr request above
[Is this code covered by automated tests?]: nope. the joys of our ntlm implementation currently preclude that.
[Has the fix been verified in Nightly?]: not yet - verified by reporters of this and duplicate bugs through a try build.
[Needs manual test from QE? If yes, steps to reproduce]: nope 
[List of other uplifts needed for the feature/fix]: other patch in this bug
[Is the change risky?]: nope
[Why is the change risky/not risky?]: backs out an old fix for another bug, and the other patch in this bug fixes it without breaking other things :)
[String changes made/needed]: none
Attachment #8867213 - Flags: approval-mozilla-release?
Attachment #8867213 - Flags: approval-mozilla-esr52?
Attachment #8867213 - Flags: approval-mozilla-beta?
(Assignee)

Comment 32

2 years ago
Comment on attachment 8867214 [details]
Bug 1360574 - Detect & handle connection auth at the http/2 layer.

See approval request comment for the other patch - same information applies to this one, as well. Both patches need to land (this one second) for uplifts to be of any use.
Attachment #8867214 - Flags: approval-mozilla-release?
Attachment #8867214 - Flags: approval-mozilla-esr52?
Attachment #8867214 - Flags: approval-mozilla-beta?

Comment 33

2 years ago
I can confirm that the try build fixes the issue for me.
Situation before: tons of open connections in about:networking, especially on particular websites.
Situation with try build: just a few connections, which get closed pretty quickly.
https://hg.mozilla.org/mozilla-central/rev/9051132444a0
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8867213 [details]
Bug 1360574 - Backout original fix to bug 1346392

This fixes a regression in which Firefox stops working after 900 connections when using an NTLM proxy; the fix was verified by the reporter. Beta54+. Should be in 54 beta 8.
Attachment #8867213 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8867214 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(Reporter)

Comment 36

2 years ago
Today's Nightly fixes the problem on Windows and Linux. (Mac OS not tested.) Thanks again!
We do have a 53.0.3 dot release coming up this week. So we could include this fix. 
I didn't have one planned for ESR. From talking with Nicholas sounds like we may also need an ESR dot release.
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #31)
> [Is this code covered by automated tests?]: nope. the joys of our ntlm
> implementation currently preclude that.
> [Has the fix been verified in Nightly?]: not yet - verified by reporters of
> this and duplicate bugs through a try build.
> [Needs manual test from QE? If yes, steps to reproduce]: nope 

Setting qe-verify- based on Nicholas' assessment on manual testing needs.
Flags: qe-verify-

Comment 40

2 years ago
I confirm that it is fixed in Firefox 54b8. Thank you very much.
Comment on attachment 8867213 [details]
Bug 1360574 - Backout original fix to bug 1346392

Let's uplift this to m-r and esr52 for a dot release.
Attachment #8867213 - Flags: approval-mozilla-release?
Attachment #8867213 - Flags: approval-mozilla-release+
Attachment #8867213 - Flags: approval-mozilla-esr52?
Attachment #8867213 - Flags: approval-mozilla-esr52+
Attachment #8867214 - Flags: approval-mozilla-release?
Attachment #8867214 - Flags: approval-mozilla-release+
Attachment #8867214 - Flags: approval-mozilla-esr52?
Attachment #8867214 - Flags: approval-mozilla-esr52+
Included in 53.0.3 and 52.1.2esr relnotes.

Comment 45

2 years ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #38)
> We do have a 53.0.3 dot release coming up this week. So we could include
> this fix. 
> I didn't have one planned for ESR. From talking with Nicholas sounds like we
> may also need an ESR dot release.

Hello, Liz

Can you tell us the exact release date of 53.0.3?
We use the stable version of Firefox, and this affects many users in our organization.
Duplicate of this bug: 1366174
Duplicate of this bug: 1365302

Updated

2 years ago
Depends on: 1366824

Comment 48

2 years ago
Hello. I am on FF54 beta 11 here, behind an NTLM firewall, and I had this problem massively with v53 and early v54, though I used to "fix" it with Fiddler. The latest v54 seemed to be fine until today, when I got all the symptoms again (on v54 b11). A regression, a fluke?

Comment 49

2 years ago
This bug seems to be back on the latest FF55, or at least the same behavior is: it starts loading a few pages, then hangs, and the spinner stays on (black).
Only happens behind corporate proxy.
(In reply to Leonardo Cervo from comment #49)
> This bug seems to be back on latest FF55, at least the same behavior: start
> loading a few pages then hangs and the spinner is always on (black).
> Only happens behind corporate proxy.

Nick, do you think you can confirm this?
Flags: needinfo?(hurley)

Comment 51

2 years ago
Actually I confirmed it's related to bug 1388370.
The symptoms are the same, sorry for the spam.

Comment 52

2 years ago
Thanks, Leonardo. If you're affected by the issue yourself and can reproduce it fairly easily, could you help us debug the issue in the other report?
(Assignee)

Updated

2 years ago
Flags: needinfo?(hurley)

Comment 53

2 years ago
Sure, how can I help?
Depends on: 1411193
No longer depends on: 1411193