Closed Bug 1360574 Opened 7 years ago Closed 7 years ago

Firefox stops working after 900 connections when using NTLM proxy

Categories

(Core :: Networking, defect)

53 Branch
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
relnote-firefox --- 53+
firefox-esr52 53+ fixed
firefox53 blocking fixed
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: will69, Assigned: u408661)

Details

(Keywords: regression, Whiteboard: [necko-active][ntlm])

Attachments

(2 files)

STR: Surf to http://vrr.de or any other web site via a proxy using NTLM auth. Press Ctrl-F5 until the spinner stops turning clockwise and turns gray. Firefox is now effectively dead and needs to be restarted. There is a setting "network.http.max-connections=900". Firefox stops retrieving web content after 900 connections. 900 *active* connections in total are listed in about:networking. A single page view accounts for about 25 active connections. Connections are not closed or reused. A restart of Firefox resets the connection counter. A higher value for the above setting mitigates the problem. mozregression yields bug #1346392 as the only culprit.
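The arithmetic in the report can be checked with a toy model. This is only a sketch, not Firefox code: the function name `PageLoadsUntilStall` and its parameters are made up for illustration, and it assumes that every page load leaves a fixed number of connections stuck in the "active" state, as the reporter observed (about 25 per page view against a cap of 900).

```cpp
#include <cassert>

// Toy model (hypothetical, not Firefox code): counts how many page loads fit
// under a global connection cap when `leaked_per_load` connections stay
// "active" after every load and are never reclaimed.
int PageLoadsUntilStall(int max_connections, int leaked_per_load) {
    if (leaked_per_load <= 0) {
        return -1;  // nothing leaks, so leaking alone never reaches the cap
    }
    int active = 0;
    int loads = 0;
    // A load succeeds only while its connections still fit under the cap.
    while (active + leaked_per_load <= max_connections) {
        active += leaked_per_load;
        ++loads;
    }
    return loads;
}
```

With the reported numbers, `PageLoadsUntilStall(900, 25)` gives 36 page loads before the cap is reached, which is consistent with a handful of Ctrl-F5 presses being enough to stall the browser.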
Assignee: nobody → hurley
Flags: needinfo?(hurley)
Whiteboard: [necko-active]
I'm not confident this is the case, but I'm wondering if this is another symptom of bug 1355858 - that bug causes us to dial back to NTLM-speaking servers & proxies with h2, so it could be the case that we end up opening a ton of unneeded connections that (for whatever reason) never get reclaimed. (That's something we should look into separately, btw, even if my harebrained theory pans out.) Reporter - would it be possible for you to try the builds at https://archive.mozilla.org/pub/firefox/try-builds/hurley@mozilla.com-56dd3152ccfb352e1f46a97b2a8064101a13fb43/ and see if the issue persists? If it doesn't, then we can mark this as a duplicate of that bug, otherwise there will be more investigation that needs to happen.
Flags: needinfo?(hurley) → needinfo?(will69)
Your test build improves the situation somewhat: Again referring to http://vrr.de, 32 connections are established, 30 are moved from "active" to "idle" almost immediately, two remain active and are stuck. Those two even remain active when the tab is closed. The idle connections are freed completely after a considerable amount of time (minutes rather than seconds). Surfing the web for some time, almost every host leaves some connections in "active" state that do not get reclaimed. Even Firefox internal connections to Mozilla and Google hosts get stuck that way. Setting "network.http.max-connections=10" instead of 900, Firefox goes into lockdown state using telemetry and other internal connections alone. With Firefox 52, this is not the case. All connections get reclaimed and after some time there are only zeroes in the "active" and "idle" columns. BTW, "about:networking" shows two lines for every host, differing only in the number of active and idle connections. With Firefox 52 it shows only one entry per host.
Flags: needinfo?(will69)
I just want to make a note that I have seen what sounds like the same thing as will69 and basically cannot use the proxy server with FF 53. The difference in my case is that I am using Kerberos authentication (negotiate-auth). So it seems that it is not NTLM itself that is the issue but rather something common to both types of authentication.
Has Regression Range: --- → yes
Has STR: --- → yes
Keywords: regression
Summary: Regression: Firefox stops working after 900 connections when using NTLM proxy → Firefox stops working after 900 connections when using NTLM proxy
I experience the same behavior in FF 53 and 53.0.2. Increasing network.http.max-connections does not seem to be a solution, not even a temporary one.
I'm facing the same or a similar problem. I'm behind a corporate proxy that uses NTLM, and Firefox seems to gradually stop working after it's been open for a while. Once Firefox is "dead", when I try to visit most pages, the tab shows "Connecting..." with a counterclockwise spinner forever. I can still visit some pages (perhaps pages that Firefox still has a connection to?). This is on Windows 7. I first noticed the problem in 53.0.2 (32-bit). I built version 55.0a1 (2017-05-10) (32-bit), and that one also has the problem.

I've looked at what Firefox is doing in Fiddler. (I still have the problem when I'm not using Fiddler.) The normal sequence of events when Firefox opens an HTTPS tunnel is:
* Firefox sends a CONNECT request. It receives a 407 response.
* Firefox sends another CONNECT request with "Proxy-Authorization: Negotiate". It receives another 407 response.
* Firefox sends another CONNECT request with "Proxy-Authorization: NTLM". It receives a 407 response with "Proxy-Authenticate: NTLM".
* Firefox sends one last CONNECT request with "Proxy-Authorization: NTLM". It receives a 200 response.

When Firefox is "dead", only the first two steps happen; Firefox never issues the third CONNECT request (the first one that says NTLM). Procmon.exe and tcpview.exe both show that Firefox isn't even attempting to open any new TCP connections. (It does send data over a TCP connection that it has to itself, but I'm guessing this is unrelated.)

For me, about:networking doesn't show 900 connections. I looked at that page after Firefox died once, and it listed 125 hosts with a total of 539 active connections. I then tried changing network.http.max-connections to 9000, but Firefox died again after that. That time, about:networking listed 61 hosts with 180 active connections. I have a debugger so I can step through the source code, but I have no clue where to look.
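The four-step CONNECT exchange described in the comment above can be written down as a small checker for a captured trace. This is a sketch only: the `Exchange` struct, the `TunnelState` enum and `ClassifyTunnel` are hypothetical names, not part of Firefox or Fiddler, and the expected sequence is simply the one this commenter observed behind their proxy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One CONNECT round trip as seen in a proxy trace (hypothetical type).
struct Exchange {
    std::string proxy_authorization;  // scheme sent by the client, "" if none
    int status;                       // status code returned by the proxy
};

enum class TunnelState { Established, StalledBeforeNtlm, Incomplete, Unexpected };

// Compares a trace against the sequence reported in the comment:
// (none, 407) -> (Negotiate, 407) -> (NTLM, 407) -> (NTLM, 200).
TunnelState ClassifyTunnel(const std::vector<Exchange>& seen) {
    static const std::vector<Exchange> kExpected = {
        {"", 407}, {"Negotiate", 407}, {"NTLM", 407}, {"NTLM", 200}};
    if (seen.size() > kExpected.size()) {
        return TunnelState::Unexpected;
    }
    for (std::size_t i = 0; i < seen.size(); ++i) {
        if (seen[i].proxy_authorization != kExpected[i].proxy_authorization ||
            seen[i].status != kExpected[i].status) {
            return TunnelState::Unexpected;
        }
    }
    if (seen.size() == kExpected.size()) {
        return TunnelState::Established;
    }
    // The "dead" state in the comment: only the first two steps happen.
    if (seen.size() == 2) {
        return TunnelState::StalledBeforeNtlm;
    }
    return TunnelState::Incomplete;
}
```

A trace that ends after the Negotiate attempt is classified as `StalledBeforeNtlm`, which corresponds to the symptom reported here: the third CONNECT request is never sent.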
I'm experiencing the same issue running 53.0.2 (64-Bit) on Windows 10.
We use a proxy server at our organization, with Firefox, Chrome, and IE as browsers. Chrome, IE, and Firefox 52 work fine through the proxy. After updating Firefox to the latest version (53.0.2), we have problems: after 30 minutes (or more), the browser no longer opens or refreshes pages. It just shows a blank page and a refresh icon on the tabs. If we close and reopen the browser, the problem repeats after 30 minutes. The problem is described in Bug 1360574 and Bug 1363030. Firefox 53.0.2 doesn't reset the value of network.http.connections after tabs (open sites) are closed: http://my.jetscreenshot.com/demo/20170511-3ckm-35kb And after the value reaches "network.http.max-connections=900", Firefox stops working. The problem doesn't happen on PCs with a direct connection to the Internet. Increasing "network.http.max-connections" does not seem to be a solution, not even a temporary one. We need help, because this affects many users. Thanks for your attention.
[Tracking Requested - why for this release]: the issue sounds like a major regression for users in an affected environment
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter, I have a build that at least has a chance of fixing the issue. However, I need some help in testing it, would it be possible for you to try one of the builds from https://archive.mozilla.org/pub/firefox/try-builds/hurley@mozilla.com-bf0b722756c1608522b505f98411ce9191bf564e/ to see if it fixes the issue? Thanks!
Flags: needinfo?(will69)
(In reply to [:philipp] from comment #14) > [Tracking Requested - why for this release]: > the issue sounds like a major regression for users in an affected environment Firefox ESR 52 is also affected. I'm running Firefox ESR 52.1.1 (Windows 32-bit), and it seems I hit the exact same issue (bug 1364048).
Nailed it! This fixes the issue on Win7/32 and Win7/64. Thank you very much to everybody who contributed to this! Can you let this ride along with a dot release should there be one?
Flags: needinfo?(will69)
Testing the build provided in comment 16 on Windows 7 64-bit. It looks like it fixes the issue for me. The number of active connections listed in about:networking stays low even after loading several pages in multiple tabs. If the fix proves to be good, will it be committed to a Firefox ESR 52.x release as well?
Excellent, thanks! I'm going to clean up the patch and fix the commit message, get it reviewed, and then commit + request uplifts to beta, esr, and release (for any other potential regular 52.x we do in the future).
Blocks: 1345392
Whiteboard: [necko-active] → [necko-active][ntlm]
Blocks: 1346392
No longer blocks: 1345392
Comment on attachment 8867214 [details]
Bug 1360574 - Detect & handle connection auth at the http/2 layer.

https://reviewboard.mozilla.org/r/138790/#review142076

::: netwerk/protocol/http/Http2Session.cpp:1394
(Diff revision 1)
> mDecompressBuffer,
> mFlatHTTPResponseHeaders,
> httpResponseCode);
> if (rv == NS_ERROR_ABORT) {
> - LOG(("Http2Session::ResponseHeadersComplete ConvertResponseHeaders aborted\n"));
> - if (mInputFrameDataStream->IsTunnel()) {
> + LOG(("Http2Session::ResponseHeadersComplete %p ConvertResponseHeaders aborted\n", this));
> + // This means the stream found connection-oriented auth. Treat this like we

I don't understand why this whole isTunnel() path is going away - isn't it still needed?

::: netwerk/protocol/http/Http2Session.cpp:1395
(Diff revision 1)
> mFlatHTTPResponseHeaders,
> httpResponseCode);
> if (rv == NS_ERROR_ABORT) {
> - LOG(("Http2Session::ResponseHeadersComplete ConvertResponseHeaders aborted\n"));
> - if (mInputFrameDataStream->IsTunnel()) {
> - rv = gHttpHandler->ConnMgr()->CancelTransactions(
> + LOG(("Http2Session::ResponseHeadersComplete %p ConvertResponseHeaders aborted\n", this));
> + // This means the stream found connection-oriented auth. Treat this like we
> + // got a reset or goaway with HTTP_1_1_REQUIRED.

just a reset - not a goaway (which would be whole session)

::: netwerk/protocol/http/Http2Stream.cpp:999
(Diff revision 1)
> mTxInlineFrameUsed = Http2Session::kFrameHeaderBytes;
> mTxStreamFrameSize = dataLength;
> }
>
> +static bool
> +HasConnectionBasedAuth(nsHttpResponseHead &head, nsHttpAtom headerName)

I think we can do this much less expensively directly in Http2Decompressor::OutputHeader, which already compares each header name rather than going through this extra header parsing from serialization step. I think if that returned NS_ERROR_NET_RESET that could be an unambiguous signal to do the retry stuff.. and leave the NS_ERROR_ABORT path untouched. wdyt?
Attachment #8867214 - Flags: review?(mcmanus)
Attachment #8867213 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #23)
> Comment on attachment 8867214 [details]
> Bug 1360574 - Detect & handle connection auth at the http/2 layer.
>
> https://reviewboard.mozilla.org/r/138790/#review142076
>
> ::: netwerk/protocol/http/Http2Session.cpp:1394
> (Diff revision 1)
> > mDecompressBuffer,
> > mFlatHTTPResponseHeaders,
> > httpResponseCode);
> > if (rv == NS_ERROR_ABORT) {
> > - LOG(("Http2Session::ResponseHeadersComplete ConvertResponseHeaders aborted\n"));
> > - if (mInputFrameDataStream->IsTunnel()) {
> > + LOG(("Http2Session::ResponseHeadersComplete %p ConvertResponseHeaders aborted\n", this));
> > + // This means the stream found connection-oriented auth. Treat this like we
>
> I don't understand why this whole isTunnel() path is going away - isn't it
> still needed?

Nope. Check the commit message - you removed the code in ConvertResponseHeaders that returns NS_ERROR_ABORT a while back (bug 1037082), so that was dead code.

> ::: netwerk/protocol/http/Http2Session.cpp:1395
> (Diff revision 1)
> > mFlatHTTPResponseHeaders,
> > httpResponseCode);
> > if (rv == NS_ERROR_ABORT) {
> > - LOG(("Http2Session::ResponseHeadersComplete ConvertResponseHeaders aborted\n"));
> > - if (mInputFrameDataStream->IsTunnel()) {
> > - rv = gHttpHandler->ConnMgr()->CancelTransactions(
> > + LOG(("Http2Session::ResponseHeadersComplete %p ConvertResponseHeaders aborted\n", this));
> > + // This means the stream found connection-oriented auth. Treat this like we
> > + // got a reset or goaway with HTTP_1_1_REQUIRED.
>
> just a reset - not a goaway (which would be whole session)

Right, good catch.

> ::: netwerk/protocol/http/Http2Stream.cpp:999
> (Diff revision 1)
> > mTxInlineFrameUsed = Http2Session::kFrameHeaderBytes;
> > mTxStreamFrameSize = dataLength;
> > }
> >
> > +static bool
> > +HasConnectionBasedAuth(nsHttpResponseHead &head, nsHttpAtom headerName)
>
> I think we can do this much less expensively directly in
> Http2Decompressor::OutputHeader, which already compares each header name
> rather than going through this extra header parsing from serialization step.

That would certainly be less expensive, I'll give it a shot.

> I think if that returned NS_ERROR_NET_RESET that could be an unambiguous
> signal to do the retry stuff.. and leave the NS_ERROR_ABORT path untouched.
>
> wdyt?

So we'd probably just remove the NS_ERROR_ABORT path entirely (see above) and replace it with one that handles NS_ERROR_NET_RESET. I'll have to be careful with the return values - the decompressor does some funky things with return values and killing the session in certain cases, as well as masking various return values at different stages. Shouldn't be too nasty, though.
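The idea discussed in this review, spotting connection-based auth while each header is being emitted rather than re-parsing the serialized headers, can be illustrated with a standalone sketch. This is not the patch that landed and it does not use any Necko types: `IsConnectionBasedAuthHeader` and `ToLower` are hypothetical helpers, and treating NTLM and Negotiate as the connection-based schemes is an assumption based on the reports in this bug. For simplicity it only looks at the first challenge in the header value.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Lower-cases an ASCII string so comparisons are case-insensitive.
static std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical per-header check: returns true when an authenticate header
// advertises a scheme assumed here to be connection-based (NTLM, Negotiate).
// A caller could then signal "retry this request on HTTP/1.1".
bool IsConnectionBasedAuthHeader(const std::string& name,
                                 const std::string& value) {
    const std::string n = ToLower(name);
    if (n != "www-authenticate" && n != "proxy-authenticate") {
        return false;
    }
    // The auth scheme is the first whitespace-delimited token of the value.
    const std::string v = ToLower(value);
    const std::size_t start = v.find_first_not_of(" \t");
    if (start == std::string::npos) {
        return false;
    }
    const std::size_t end = v.find_first_of(" \t", start);
    const std::string scheme =
        v.substr(start, end == std::string::npos ? std::string::npos : end - start);
    return scheme == "ntlm" || scheme == "negotiate";
}
```

Doing the check per header keeps the cost to a couple of string comparisons on a header name that is already being inspected, which is the saving the reviewer was after.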
Hello, I've tested build https://archive.mozilla.org/pub/firefox/try-builds/hurley@mozilla.com-bf0b722756c1608522b505f98411ce9191bf564e/ and I confirm that it also fixes my use case explained in duplicate bug 1359864. BR
Comment on attachment 8867214 [details] Bug 1360574 - Detect & handle connection auth at the http/2 layer. https://reviewboard.mozilla.org/r/138790/#review142116
Attachment #8867214 - Flags: review?(mcmanus) → review+
Pushed by hurley@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b99dc310ac08 Backout original fix to bug 1346392 r=mcmanus https://hg.mozilla.org/integration/autoland/rev/9051132444a0 Detect & handle connection auth at the http/2 layer. r=mcmanus
Comment on attachment 8867213 [details]
Bug 1360574 - Backout original fix to bug 1346392

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: continues to fix ntlm breakage, a feature widely used in enterprise (ntlm, not the breakage)
User impact if declined: firefox will stop talking to the internet after a while and need to be restarted
Fix Landed on Version: 55
Risk to taking this patch (and alternatives if risky): low - backs out a previous "fix" for another bug, and fixes it in a similar but un-broken way
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1346392
[User impact if declined]: see esr request above
[Is this code covered by automated tests?]: nope. the joys of our ntlm implementation currently preclude that.
[Has the fix been verified in Nightly?]: not yet - verified by reporters of this and duplicate bugs through a try build.
[Needs manual test from QE? If yes, steps to reproduce]: nope
[List of other uplifts needed for the feature/fix]: other patch in this bug
[Is the change risky?]: nope
[Why is the change risky/not risky?]: backs out an old fix for another bug, and the other patch in this bug fixes it without breaking other things :)
[String changes made/needed]: none
Attachment #8867213 - Flags: approval-mozilla-release?
Attachment #8867213 - Flags: approval-mozilla-esr52?
Attachment #8867213 - Flags: approval-mozilla-beta?
Comment on attachment 8867214 [details] Bug 1360574 - Detect & handle connection auth at the http/2 layer. See approval request comment for the other patch - same information applies to this one, as well. Both patches need to land (this one second) for uplifts to be of any use.
Attachment #8867214 - Flags: approval-mozilla-release?
Attachment #8867214 - Flags: approval-mozilla-esr52?
Attachment #8867214 - Flags: approval-mozilla-beta?
I can confirm that the try build fixes the issue for me. Situation before: tons of open connections in about:networking, especially on particular websites. Situation with try build: just a few connections, which get closed pretty quickly.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8867213 [details] Bug 1360574 - Backout original fix to bug 1346392 Fixes a regression where Firefox stops working after 900 connections when using an NTLM proxy; verified by the reporter. Beta54+. Should be in 54 beta 8.
Attachment #8867213 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8867214 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Today's Nightly fixes the problem on Windows and Linux. (Mac OS not tested.) Thanks again!
We do have a 53.0.3 dot release coming up this week. So we could include this fix. I didn't have one planned for ESR. From talking with Nicholas sounds like we may also need an ESR dot release.
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #31) > [Is this code covered by automated tests?]: nope. the joys of our ntlm > implementation currently preclude that. > [Has the fix been verified in Nightly?]: not yet - verified by reporters of > this and duplicate bugs through a try build. > [Needs manual test from QE? If yes, steps to reproduce]: nope Setting qe-verify- based on Nicholas' assessment on manual testing needs.
Flags: qe-verify-
I confirm that it is fixed in Firefox 54b8. Thank you very much.
Comment on attachment 8867213 [details] Bug 1360574 - Backout original fix to bug 1346392 Let's uplift this to m-r and esr52 for a dot release.
Attachment #8867213 - Flags: approval-mozilla-release?
Attachment #8867213 - Flags: approval-mozilla-release+
Attachment #8867213 - Flags: approval-mozilla-esr52?
Attachment #8867213 - Flags: approval-mozilla-esr52+
Attachment #8867214 - Flags: approval-mozilla-release?
Attachment #8867214 - Flags: approval-mozilla-release+
Attachment #8867214 - Flags: approval-mozilla-esr52?
Attachment #8867214 - Flags: approval-mozilla-esr52+
Included in 53.0.3 and 52.1.2esr relnotes.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #38) > We do have a 53.0.3 dot release coming up this week. So we could include > this fix. > I didn't have one planned for ESR. From talking with Nicholas sounds like we > may also need an ESR dot release. Hello, Liz. Can you tell us the exact release date of 53.0.3? We use the stable version of Firefox, and this issue affects many users in our organization.
Depends on: 1366824
Hello. I am on FF54 beta 11 here, I am behind an NTLM firewall and I had this problem massively with v53 and early v54. I used to "fix" it with Fiddler though. Latest v54 seemed to be fine. Till today, when I got all the symptoms again (on v54 beta 11). A regression, a fluke?
This bug seems to be back on latest FF55, at least the same behavior: it starts loading a few pages, then hangs, and the spinner is always on (black). Only happens behind a corporate proxy.
(In reply to Leonardo Cervo from comment #49) > This bug seems to be back on latest FF55, at least the same behavior: start > loading a few pages then hangs and the spinner is always on (black). > Only happens behind corporate proxy. Nick, do you think you can confirm this?
Flags: needinfo?(hurley)
Actually I confirmed it's related to bug 1388370. The symptoms are the same, sorry for the spam.
thanks leonardo, if you're affected by the issue yourself & can fairly easily reproduce it could you help us debug the issue in the other report?
Flags: needinfo?(hurley)
Sure, how can I help?
Depends on: 1411193
No longer depends on: 1411193