Bug 762162 - Cannot connect to Gmail and Twitter with SPDY on
Status: RESOLVED FIXED
Whiteboard: [spdy][qa?]
Product: Core
Classification: Components
Component: Networking: HTTP
Version: 13 Branch
Platform: All All
Importance: -- normal
Target Milestone: mozilla17
Assigned To: Patrick McManus [:mcmanus] PTO until Sep 6
QA Contact: Virgil Dicu [:virgil] [QA]
Mentors:
Duplicates: 762025
Depends on: 770264 775508 775515
Blocks: 775813 780522
Reported: 2012-06-06 11:33 PDT by Matt Grimes [:Matt_G]
Modified: 2012-08-27 04:47 PDT

Attachments
patch 0 (9.58 KB, patch)
2012-07-19 07:25 PDT, Patrick McManus [:mcmanus] PTO until Sep 6
no flags
patch v1 (13.08 KB, patch)
2012-07-20 06:00 PDT, Patrick McManus [:mcmanus] PTO until Sep 6
honzab.moz: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+

Description Matt Grimes [:Matt_G] 2012-06-06 11:33:23 PDT
This issue came in on the SUMO forums today. You can find the post here:

https://support.mozilla.org/en-US/questions/928731

The system details are listed in the ticket as well. Let me know if you need more information. I can reach out to the user for more details.
Comment 1 Alex Keybl [:akeybl] 2012-06-06 11:35:36 PDT
Tracking for release of FF13/14, and adding qawanted to see if we're able to reproduce.
Comment 2 [:Cww] 2012-06-06 11:38:34 PDT
This may not be a 13-specific thing, in spite of that user's claims.  I can't open gmail on nightly starting in the last 3 days on a Mac.
Comment 3 [:Cww] 2012-06-06 12:52:44 PDT
After a suggestion from Jet: I can't repro in private browsing, only when I'm not in private browsing mode.
Comment 4 [:Cww] 2012-06-06 17:53:07 PDT
I cleared all my google cookies and cache and it's still broken.  Next stop: new profile, I guess.
Comment 5 Matt Grimes [:Matt_G] 2012-06-07 13:11:13 PDT
We've got several more threads popping up on SUMO for this issue. It doesn't seem to be isolated to XP as initially thought, so we might want to change the title. I've got reports that it is impacting both gmail and twitter. I'm requesting more information. 

You can find the threads here:
https://support.mozilla.org/en-US/questions/928888
https://support.mozilla.org/en-US/questions/928182

This post and a few others seem to indicate there might be a network.http.keep-alive tie-in to the issue:
https://support.mozilla.org/en-US/questions/928888#answer-339456
Comment 6 [:Cww] 2012-06-07 13:23:14 PDT
Ok, here's my matrix:

http.keep-alive true, spdy true: not working EXCEPT it works in private browsing. (Yes, I tried clearing cache, cookies, etc etc)
http.keep-alive false, spdy true: not working EXCEPT in private browsing
http.keep-alive true, spdy false: works.
http.keep-alive true, spdy false: works.

I do sometimes get this (except in the private browsing case):

Some Gmail features have failed to load due to an Internet connectivity problem. If this problem persists, try reloading the page, or using the basic HTML version. Learn More.
Comment 7 [:philipp] 2012-06-07 13:40:39 PDT
*** Bug 762650 has been marked as a duplicate of this bug. ***
Comment 8 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-07 13:51:27 PDT
Cww - can you get a clean profile (all default config) and produce an HTTP NSPR log of the failure?

https://developer.mozilla.org/en/HTTP_Logging

The non-default keep-alive=false issue can be worked under a different (less important) bug.

I'm actually going on vacation the next 10 days, so some patience might be required.

Also, if you can try a debug nightly then you can get the SSL session key you are using and include that with a wireshark capture corresponding to the NSPR log. That would be key information as well. https://developer.mozilla.org/en/NSS_Key_Log_Format
Comment 9 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-07 13:58:13 PDT
track any issues with keep-alive=false over in bug 762650

we'll use this bug for tracking failures with the default config, such as :Cww indicates. Best not to confuse them. Obviously accessing gmail WFM so a STR (perhaps environmental) would be good - but the logging from comment 8 would also be helpful.
Comment 10 Josh Aas 2012-06-07 14:16:18 PDT
Honza - Patrick may be unavailable for a week or so, can you investigate here? We need a resolution quickly.
Comment 11 [:Cww] 2012-06-08 09:46:56 PDT
I can't make the problem happen with a clean profile.
Comment 12 [:Cww] 2012-06-08 10:15:09 PDT
And something changed and now I can't repro with my regular profile.  I wonder if it was the browser restart that did it.
Comment 13 Jason Smith [:jsmith] 2012-06-08 15:48:02 PDT
My vague memory remembers that QA saw a dup of this bug in an unconfirmed bug triage. Let me see if I can find it...
Comment 14 Jason Smith [:jsmith] 2012-06-08 15:51:02 PDT
(In reply to Jason Smith [:jsmith] from comment #13)
> My vague memory remembers that QA saw a dup of this bug in an unconfirmed
> bug triage. Let me see if I can find it...

Found it. It's a different bug, but also has to do with gmail. bug 758453
Comment 15 Jason Smith [:jsmith] 2012-06-08 15:52:27 PDT
(In reply to Jason Smith [:jsmith] from comment #14)
> (In reply to Jason Smith [:jsmith] from comment #13)
> > My vague memory remembers that QA saw a dup of this bug in an unconfirmed
> > bug triage. Let me see if I can find it...
> 
> Found it. It's a different bug, but also has to do with gmail. bug 758453

Although that bug indicates it's unrelated to SPDY now.
Comment 16 Honza Bambas (:mayhemer) 2012-06-08 17:02:19 PDT
@Cww: please see https://bugzilla.mozilla.org/show_bug.cgi?id=762650#c4

I'm not able to reproduce with keep-alive = true.  Cww, if you are, please give us more info (http log is the best).

(In reply to [:Cww] from comment #12)
> And something changed and now I can't repro with my regular profile.  I
> wonder if it was the browser restart that did it.

If you were changing the prefs at run-time (w/o browser restart) it could potentially influence the test and you could just experience bug 762650.  If you cannot reproduce anymore, then the test results reported in comment 6 could be non-precise.
Comment 17 [:Cww] 2012-06-08 18:41:59 PDT
Hm... so I can't repro but I was running with default settings when this first showed up.
Comment 18 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-08 19:59:22 PDT
(In reply to [:Cww] from comment #17)
> Hm... so I can't repro but I was running with default settings when this
> first showed up.

were you using FF13 at that time? I believe there is an unrelated issue that is only on FF15/FF16.
Comment 19 [:Cww] 2012-06-08 20:13:18 PDT
I'm on nightly. Have a bug for the "unrelated issue"?
Comment 20 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-08 20:39:40 PDT
(In reply to [:Cww] from comment #19)
> I'm on nightly. Have a bug for the "unrelated issue"?

bug 762025
Comment 21 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-08 20:42:41 PDT
matt_g - now that :Cww has been identified as a nightly user, are there confirmed cases of FF13 users with default prefs having this issue? (or if not confirmed - suspected?)

If so - can honza or I get in touch with such a user?

If not, then I think it makes sense to dup this to 762650 and reopen if necessary.
Comment 22 [:philipp] 2012-06-10 06:25:52 PDT
so far, i have seen two forum users who reportedly had connection issues with SPDY & all network.http. settings on default

https://support.mozilla.org/questions/928182#answer-339495
https://support.mozilla.org/questions/928888#answer-340283
Comment 23 juan becerra [:juanb] 2012-06-12 13:45:13 PDT
I've spent some time trying to reproduce this on a Win7 machine using Fx13, and I haven't been able to see this problem. Given the last comments in bug 762650, do we still need qawanted on this bug?
Comment 24 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-12 18:00:40 PDT
(In reply to juan becerra [:juanb] from comment #23)
> I've spent some time trying to reproduce this on a Win7 machine using Fx13,
> and I haven't been able to see this problem. Given the last comments in bug
> 762650, do we still need qawanted on this bug?

given comment 22, yes. bug 762650 is about a specific pref configuration that comment 22 says is not in play for this bug.
Comment 25 Alex Keybl [:akeybl] 2012-06-14 09:29:45 PDT
(In reply to Patrick McManus [:mcmanus] from comment #24)
> given comment 22, yes. bug 762650 is about a specific pref configuration
> that comment 22 says is not in play for this bug.

Do you have any suggestions for attempting to reproduce? Blindly logging into Gmail/Twitter with the same configuration/connection doesn't sound like it'll get us any more leads.
Comment 26 [:philipp] 2012-06-15 08:32:13 PDT
the user from my second link in comment 22 has provided more details about his situation: 
adding attachments in gmail does not work with spdy enabled (so this seems to be related to upload) & only when the win xp laptop is connected via a HUAWEI k3565 mobile broadband usb dongle. when the same machine is accessing gmail on a different network attachments in gmail will work normally with spdy enabled.
Comment 27 Alex Keybl [:akeybl] 2012-06-20 16:23:20 PDT
(In reply to philipp from comment #26)
> the user from my second link in comment 22 has provided more details about
> his situation: 
> adding attachments in gmail does not work with spdy enabled (so this seems
> to be related to upload) & only when the win xp laptop is connected via a
> HUAWEI k3565 mobile broadband usb dongle. when the same machine is accessing
> gmail on a different network attachments in gmail will work normally with
> spdy enabled.

Patrick - does that make sense to you? Does certain networking HW not work properly with SPDY?

We can order the USB dongle to reproduce the issue, in that case.
Comment 28 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-20 17:36:59 PDT
(In reply to Alex Keybl [:akeybl] from comment #27)
> (In reply to philipp from comment #26)
> > the user from my second link in comment 22 has provided more details about
> > his situation: 
> > adding attachments in gmail does not work with spdy enabled (so this seems
> > to be related to upload) & only when the win xp laptop is connected via a
> > HUAWEI k3565 mobile broadband usb dongle. when the same machine is accessing
> > gmail on a different network attachments in gmail will work normally with
> > spdy enabled.
> 
> Patrick - does that make sense to you? Does certain networking HW not work
> properly with SPDY?

It would be really weird, but is plausible if the hardware is broken in a peculiar way. See a problem chrome had in a similar vein: http://code.google.com/p/chromium/issues/detail?id=69813
> 
> We can order the USB dongle to reproduce the issue, in that case.

it's worth trying.

I'm trying to get a handle on how many people are seeing this problem, using default prefs, in 13 (or 14) as opposed to nightly/aurora. I'm aware of 2. Is that a massive mis-statement, and if we've got more can I contact them directly somehow?
Comment 29 Matt Grimes [:Matt_G] 2012-06-20 17:48:12 PDT
There are definitely more than 2. There are at least 55 me too votes this week. I've asked if any users are willing to work with us directly. I'll let you know if I get any offers.
Comment 30 [:philipp] 2012-06-21 11:27:35 PDT
another case with the same symptoms (adding attachments to gmail won't work with spdy enabled), but very different configuration: 
win7 64bit, firefox 13.0.1, router: D-Link DR655, ISP: Cox Communications, security: ms security essentials
https://support.mozilla.org/questions/930282
Comment 31 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-21 11:40:57 PDT
(In reply to philipp from comment #30)
> another case with the same symptoms (adding attachments to gmail won't work
> with spdy enabled), but very different configuration: 
> win7 64bit, firefox 13.0.1, router: D-Link DR655, ISP: Cox Communications,
> security: ms security essentials
> https://support.mozilla.org/questions/930282

Ha! I have the exact same hardware :(
Comment 32 Ioana (away) 2012-06-22 06:11:28 PDT
Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0 (20120619191901)
Firefox 14.0 beta 8, Windows 7 32-bit, Sonic Wall + switch, ISP: RDS (I'm not using any routers).

I couldn't reproduce this bug at all on the above environment. I couldn't reproduce it on the 13.0 RC either.
Comment 33 Alex Keybl [:akeybl] 2012-06-24 12:34:27 PDT
(In reply to Patrick McManus [:mcmanus] from comment #31)
> Ha! I have the exact same hardware :(

Given our inability to reproduce with other affected HW, and the fact that we may not actually be able to hook up to 2G/3G using the HUAWEI k3565 in America, I'm going to hold off on ordering the device.
Comment 34 Matt Grimes [:Matt_G] 2012-06-27 09:00:27 PDT
I've got a SUMO user experiencing this issue who is willing to work one on one. I'll email Patrick his contact information.
Comment 35 bugz 2012-06-28 04:46:25 PDT
I'm having this exact bug. Restarting Firefox fixed it temporarily for me, but then I started googling for a solution when it happened again. All network.http.* settings are default. Changing spdy to disabled, it immediately starts working (even if the tab was trying to load while changing the settings).

So, yeah, can't access gmail at all with spdy enabled.
Win7 64bit, Firefox 13.0.1, non-wireless connection (fiber AFAIK)

I have a ton of extensions so usually I'd just blame them, but since disabling spdy fixes it immediately and others have the same bug, I don't think it's that.
Comment 36 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-28 05:37:44 PDT
(In reply to bugz from comment #35)
> I'm having this exact bug. Restarting Firefox fixed it temporarily for me,
> but then I started googling for a solution when it happened again. All

I'm sorry about the trouble you're seeing - but I really appreciate that you've stepped forward! There is some important data I need to see from someone having the problem, I hope you can help.

First, please do the experiment with disabling your addons and seeing if that helps. No matter where the blame lies, it could certainly be a problem with that interaction and I'd want to dig into it as best as I could.

Second, I'd love to see an HTTP LOG that covers the period of when this happens.. It's a little bit of a pain to do, but a fair amount of time it contains enough information to at least define a next step of investigation.. see https://developer.mozilla.org/en/HTTP_Logging for instructions.

Lastly, I'm pretty interested to know if this is reproducible on the beta firefox. The beta does connection dispatch in a different way and it's possible that is the problem area. The easiest way to install the beta is just from a zip (http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/14.0b9-candidates/build1/win32/en-US/firefox-14.0b9.zip).. you can run it right from the expanded folder (as long as your other firefox is shut down) and then just remove the folder when you're done.

Thanks for any help you can provide!
Comment 37 bugz 2012-06-28 12:18:50 PDT
It's not a big deal, I was just a bit confused at first but I think I can live without spdy for a while ;) I actually did post in hope to provide some help with catching the bug. Unfortunately I don't have the time to do extensive testing right now. But I do have a question: how can I be sure that restarting Firefox doesn't affect it if I try for example reproducing it without addons or with HTTP logging? Since that fixed it last time for a while.

As I still have the current session open with the bug happening, I did a quick test and if I go to https://mail.google.com/mail/ Firefox doesn't open any new connections (as seen from process explorer) and Wireshark doesn't log anything from the google IP range. If I go to http://mail.google.com/mail/ though, it logs a "Moved Temporarily" response, goes through https://accounts.google.com and then hangs the same way, so the logging at least works and process explorer shows the connections correctly.
Comment 38 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-28 12:51:28 PDT
(In reply to bugz from comment #37)
> It's not a big deal, I was just a bit confused at first but I think I can
> live without spdy for a while ;) I actually did post in hope to provide some
> help with catching the bug. Unfortunately I don't have the time to do
> extensive testing right now. But I do have a question: how can I be sure
> that restarting Firefox doesn't affect it if I try for example reproducing
> it without addons or with HTTP logging? Since that fixed it last time for a
> while.
> 

re addons: you are right it would be ambiguous. I had misunderstood you to indicate that it reliably reproduced at least fairly quickly for you.

re: logging - that won't fix anything, it will just log stuff and I want to see the log with the error. But if it takes a long time to produce then that isn't a really great option either as the logs can be quite verbose and slow.

> As I still have the current session open with the bug happening, I did a
> quick test and if I go to https://mail.google.com/mail/ Firefox doesn't open
> any new connections (as seen from process explorer) and Wireshark doesn't
> log anything from the google IP range. If I go to
> http://mail.google.com/mail/ though, it logs a "Moved Temporarily" response,
> goes through https://accounts.google.com and then hangs the same way, so the
> logging at least works and process explorer shows the connections correctly.

that's interesting information. It supports my basic assumption about the area of code that is failing - the new connection logic. and that indeed is a bit different in beta/14 so there is hope there even if I don't know what the bug might be in particular.

can you use the netstat command in windows to see if you've got any connections open to mail.google.com port 443? (I know it isn't making any new ones, I want to know if there is an old one.. maybe in a strange state).
Comment 39 bugz 2012-06-28 13:27:49 PDT
(In reply to Patrick McManus [:mcmanus] from comment #38)
> can you use the netstat command in windows to see if you've got any
> connections open to mail.google.com port 443? (I know it isn't making any
> new ones, I want to know if there is an old one.. maybe in a strange state).

nope, no mail.google.com or anything related.
Comment 40 Patrick McManus [:mcmanus] PTO until Sep 6 2012-06-28 15:19:08 PDT
here are two possible sources in my mind

#1 is the spdy restriction code that tries to make sure we only have 1 spdy connection open at a time to the same host in order to maximize the multiplexing opportunities

#2 is the general AtActiveConnectionLimit code that limits us to 6 conns per host/port

both of those would cause us to not open a new connection when a new transaction is presented to necko. So they fit the facts.

I favor the interpretation of #1 just because I can't see anything special about #2 in this context. But I'll admit that's mostly gut.

In the context of #1, there are 2 paths to the restriction being enforced.. in the first one there are (what we believe) to be live connection[s].. one of them has to pass the "candirectlyactivate" check to enforce the restriction.. anything that had been zombied for a long time would fail this due to the ping timer code (not to mention that it should be actively taken down when the socket went away)..

the second path to the restriction being enforced is the presence of a halfopen connection (i.e. a tcp connection in progress).. those aren't verified in any way and they don't have explicit timers on them. For that reason I find it more plausible that somehow one of those objects, which should be rather ephemeral, is getting stuck.

At least that's my leading theory - and I need some kind of theory. (though I'd rather have an HTTP log :))

Here's what we can speculatively do quite safely and backport:

* add some JS console based warning level logging into these paths. The only reporters we have either can't get HTTP logs or can't reliably reproduce so the size of an HTTP log makes them untenable.

* abandon half opens at some point (even 2 minutes).. we've already got the timer tick to do it with. 

Barring objection, I'll put together that patch.
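
To make the two theories in comment 40 concrete, here is a minimal sketch of the gating decision. It is a toy model, not the necko code: HalfOpen, HostEntry, RestrictedBySpdy, AtConnectionLimit and MayOpenNewConnection are hypothetical names, only the half-open path of theory #1 is modelled, and counting half opens toward the 6-per-host cap is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are not the real necko classes.
struct HalfOpen {
    bool connected = false;   // true once the TCP handshake has completed
};

struct HostEntry {
    std::vector<HalfOpen> halfOpens;  // connection attempts in progress
    std::size_t activeConns = 0;      // established connections
    bool usingSpdy = false;           // host is known to speak SPDY
};

constexpr std::size_t kMaxConnsPerHost = 6;  // limit quoted in theory #2

// Theory #1 (half-open path only): for a SPDY host, any attempt that has
// not finished connecting blocks further connection attempts.
bool RestrictedBySpdy(const HostEntry& ent) {
    if (!ent.usingSpdy) return false;
    for (const HalfOpen& h : ent.halfOpens)
        if (!h.connected) return true;
    return false;
}

// Theory #2: the generic per-host cap (assumed to include half opens).
bool AtConnectionLimit(const HostEntry& ent) {
    return ent.activeConns + ent.halfOpens.size() >= kMaxConnsPerHost;
}

bool MayOpenNewConnection(const HostEntry& ent) {
    return !RestrictedBySpdy(ent) && !AtConnectionLimit(ent);
}
```

In this model a single half open that never completes keeps MayOpenNewConnection() false for a SPDY host indefinitely, which matches the symptom of no new connections being attempted.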
Comment 41 Virgil Dicu [:virgil] [QA] 2012-06-29 01:13:17 PDT
Removing qawanted for the moment, as there is not much QA can do here now - Ioana could not reproduce with the network configuration we have here (comment 32). Juan either in comment 23.
Comment 42 Honza Bambas (:mayhemer) 2012-06-29 08:12:26 PDT
Add logging.  This is my suggestion.  Console may not be the best option, since it cannot be simply filtered.  Having about:network that lists at least pending transactions, how long they are pending and reason (!) they are pending is IMO better.  It should be very simple to introduce this code, you can just enumerate the arrays of http conn manager, everything is there!

The complete list of features:
- list pending halfopens and their state (what IP does it connect to right now)
- list open connections that are active and idle (how long they are idle, etc)
- list spdy sessions and streams + their states and times of being open
- list pending and running transactions, why are pending (what piece of code decided to let a trans wait)

Simple enums of these in tabbed sections is a luxury already.  Having a usage tree would be awesome.  That is what I would like to do in bug 765694.  Anything basic we start here can be a base for it.
Comment 43 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-02 08:11:37 PDT
(In reply to Matt G, Mozilla SUMO (irc: Matt_G) from comment #34)
> I've got a SUMO user experiencing this issue who is willing to work one on
> one. I'll email Patrick his contact information.

this user became wfm before he could add any new information.
Comment 44 Matt Grimes [:Matt_G] 2012-07-09 16:52:16 PDT
Hey Patrick. I've got a couple more volunteers willing to help. I'll forward you their information as soon as I have it.
Comment 45 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-10 05:51:01 PDT
bug 770264 landed something that could help debug this even without an NSPR log. That's on FF16 and FF15 only.

To use it if this happens to you - open the JS console and then open about:config. Find network.http.diagnostics and flip it to true - that will trigger a dump of a whole bunch of stuff to the JS console. paste that here!

(a google summer of code project is working on better way with an about:network page, but this gives us something to work with right now.)
Comment 46 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-19 07:01:01 PDT
*** Bug 762025 has been marked as a duplicate of this bug. ***
Comment 47 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-19 07:25:01 PDT
Created attachment 643839 [details] [diff] [review]
patch 0

as we saw in 762025, under extremely rare circumstances it's possible for a half open to never resolve itself naturally - nsSocketTransport appears to not be calling OnOutputStreamReady for it. If that happens RestrictConnections() will stay restricted - causing a deadlock to the host.

This is some pretty simple code that after a high timeout will close the pending connection and if it still isn't resolved a few seconds later will abandon it. I've set the timer at 90 seconds - so it's well clear of any normal operating conditions, but I've tested it at very low values and everything works as you would expect.

This should defensively get us out of this particular hole. The code is just additive, so it's a lot safer to backport than any changes to nsSocketTransport, which are kind of scary to contemplate. (but obviously we still need to look there - it's pretty hard to figure out exactly what happens in all states of socketTransport when it gets an error).
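
The close-then-abandon behaviour described for patch 0 could be sketched roughly as follows. PendingConnect, TimeoutTick and kAbandonGraceMs are made-up names for illustration; the 5000 ms grace period is the value that appears in the review of this patch, and the real code operates on half-open socket objects rather than this struct.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a pending connection attempt.
struct PendingConnect {
    double startedMs = 0;   // when the attempt began
    bool closed = false;    // a close has been requested on the transport
};

constexpr double kAbandonGraceMs = 5000;  // "a few seconds later"

// One timer tick: request a close for attempts older than maxConnectMs,
// and drop those still present kAbandonGraceMs after that.
// Returns the number of attempts abandoned on this tick.
std::size_t TimeoutTick(std::vector<PendingConnect>& pending,
                        double nowMs, double maxConnectMs) {
    std::size_t abandoned = 0;
    for (std::size_t index = pending.size(); index > 0; ) {
        --index;
        const double delta = nowMs - pending[index].startedMs;
        if (delta > maxConnectMs)
            pending[index].closed = true;
        if (delta > maxConnectMs + kAbandonGraceMs) {
            pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(index));
            ++abandoned;
        }
    }
    return abandoned;
}
```

With a 90 second limit, an attempt started at time 0 would be closed on the first tick after 90000 ms and removed on the first tick after 95000 ms, at which point the host entry is no longer held back by it.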
Comment 48 Honza Bambas (:mayhemer) 2012-07-19 15:14:27 PDT
Comment on attachment 643839 [details] [diff] [review]
patch 0

Review of attachment 643839 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +1729,5 @@
>  void
>  nsHttpConnectionMgr::StartedConnect()
>  {
>      mNumActiveConns++;
> +    ActivateTimeoutTick();

I don't see an opposite Deactivate call for this one.

@@ +2203,5 @@
> +    if (ent->mHalfOpens.Length()) {
> +        TimeStamp now = TimeStamp::Now();
> +        double maxConnectTime = gHttpHandler->ConnectTimeout();  /* in milliseconds */
> +
> +        for (PRInt32 index = ent->mHalfOpens.Length() - 1; index >= 0; --index) {

for (PRUint32 index = ent->mHalfOpens.Length(); index > 0; )
{
  --index;
  ..
}

@@ +2215,5 @@
> +                if (half->SocketTransport())
> +                    half->SocketTransport()->Close(NS_ERROR_NET_TIMEOUT);
> +                if (half->BackupTransport())
> +                    half->BackupTransport()->Close(NS_ERROR_NET_TIMEOUT);
> +            }

I'd rather see this encapsulated in nsHalfOpenSocket.  CheckTimeout(now, maxTime, maxTimeToAbandone)?  But up to you.

@@ +2219,5 @@
> +            }
> +
> +            // If this half open hangs around for 5 seconds after we've closed() it
> +            // then just abandon the socket.
> +            if (delta > maxConnectTime + 5000) {

Maybe have a pref for this (5000) too?

@@ +2563,5 @@
> +        mEnt->mHalfOpens.RemoveElement(this);
> +        if (!mEnt->UnconnectedHalfOpens())
> +            // in case this reverted RestrictConnections()
> +            gHttpHandler->ConnMgr()->ProcessPendingQForEntry(mEnt);
> +    }

Have a method for this?  Could be useful on other places too.
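
The reverse loop suggested in this review (start at the length, decrement inside the body) is a common way to walk an array backwards with an unsigned index. A small standalone illustration, using std::vector and a hypothetical helper name rather than the Mozilla array type:

```cpp
#include <cstddef>
#include <vector>

// Removes every element equal to `value`, walking from the back so that
// erasing does not shift elements that have not been visited yet.
std::size_t RemoveAllFromBack(std::vector<int>& items, int value) {
    std::size_t removed = 0;
    for (std::size_t index = items.size(); index > 0; ) {
        --index;  // decrement first, so the unsigned index never wraps below 0
        if (items[index] == value) {
            items.erase(items.begin() + static_cast<std::ptrdiff_t>(index));
            ++removed;
        }
    }
    return removed;
}
```

The loop condition is checked before the decrement, so an empty container is handled without any signed/unsigned conversion of the length.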
Comment 49 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-19 18:32:00 PDT
> ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
> @@ +1729,5 @@
> >  void
> >  nsHttpConnectionMgr::StartedConnect()
> >  {
> >      mNumActiveConns++;
> > +    ActivateTimeoutTick();
> 
> I don't see an opposite Deactivate call for this one.

The corresponding function is RecvdConnect and it already contains a call to ConditionallyStopReadTimeoutTick()

ActivateTimeoutTick wasn't needed here before this patch because the tick only referenced mActiveConns.. now of course it references mHalfOpens so it needs to be live while those are in progress too. For the vast majority of cases of course this will make no difference because the connections are so much faster than the tick that the tick will generally see the length of half opens as 0.

in a different patch we can rename readtimeouttick to just timeouttick .. I don't want to do it here because I think we will want to backport this one and that would touch a lot more lines.
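
The pairing described here (StartedConnect arms the tick, RecvdConnect conditionally stops it) can be pictured with a toy model. TickModel and its members are hypothetical; the method names echo the thread, but the bodies are guesses at the behaviour, not the actual nsHttpConnectionMgr implementation.

```cpp
// Toy model of the timer lifecycle; not the real connection manager.
class TickModel {
public:
    // A connection attempt begins: count it and make sure the tick runs.
    void StartedConnect() {
        ++mNumActiveConns;
        mTickArmed = true;
    }

    // A connection attempt finishes: uncount it and stop the tick if idle.
    void RecvdConnect() {
        if (mNumActiveConns > 0) --mNumActiveConns;
        ConditionallyStopTick();
    }

    bool TickArmed() const { return mTickArmed; }

private:
    // The tick only needs to run while something is outstanding.
    void ConditionallyStopTick() {
        if (mNumActiveConns == 0) mTickArmed = false;
    }

    unsigned mNumActiveConns = 0;
    bool mTickArmed = false;
};
```

Under this model the tick stays armed for as long as any attempt is outstanding, which is what lets it notice a half open that never completes.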
Comment 50 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-19 19:31:20 PDT
(In reply to Honza Bambas (:mayhemer) from comment #48)
> 
> @@ +2219,5 @@
> > +            }
> > +
> > +            // If this half open hangs around for 5 seconds after we've closed() it
> > +            // then just abandon the socket.
> > +            if (delta > maxConnectTime + 5000) {
> 
> Maybe have a pref for this (5000) too?

I decided not to.. we're really looking for "a few seconds later" and I don't see any situation where it should be locally tweaked.
Comment 51 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-20 06:00:49 PDT
Created attachment 644278 [details] [diff] [review]
patch v1
Comment 52 Honza Bambas (:mayhemer) 2012-07-21 15:07:19 PDT
Comment on attachment 644278 [details] [diff] [review]
patch v1

Review of attachment 644278 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab with these comments and remaining comments from comment 48.

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +1729,5 @@
>  void
>  nsHttpConnectionMgr::StartedConnect()
>  {
>      mNumActiveConns++;
> +    ActivateTimeoutTick();

Add a comment that RecvdConnect() turns it off again.

::: netwerk/protocol/http/nsHttpHandler.cpp
@@ +1149,5 @@
> +    if (PREF_CHANGED(HTTP_PREF("connection-timeout"))) {
> +        rv = prefs->GetIntPref(HTTP_PREF("connection-timeout"), &val);
> +        if (NS_SUCCEEDED(rv))
> +            // the pref is in seconds, but the variable is in milliseconds
> +            mConnectTimeout = clamped(val, 1, 0xffff) * 1000;

There is NSPR const for the 1000.
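
As a rough illustration of the seconds-to-milliseconds conversion under review, with the magic number replaced by a named constant: kMsecPerSec and ConnectTimeoutMsFromPref are hypothetical names, and the constant is defined locally as a stand-in for the NSPR one the reviewer alludes to (presumably PR_MSEC_PER_SEC, which is an assumption here).

```cpp
#include <algorithm>
#include <cstdint>

// Local stand-in for the NSPR milliseconds-per-second constant.
constexpr std::int32_t kMsecPerSec = 1000;

// Converts a pref value given in seconds to milliseconds, clamping the
// seconds to the range [1, 0xffff] before multiplying.
std::int32_t ConnectTimeoutMsFromPref(std::int32_t prefSeconds) {
    const std::int32_t seconds =
        std::min<std::int32_t>(std::max<std::int32_t>(prefSeconds, 1), 0xffff);
    return seconds * kMsecPerSec;
}
```

Clamping before multiplying keeps the result within 32 bits (at most 65535000) and guarantees a non-zero timeout even if the pref is set to 0 or a negative value.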
Comment 53 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-23 16:20:08 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/7039771a5329

if there is a non-connection-throttle problem mixed in here too (there might be) we'll open a different issue.
Comment 54 Ed Morley [:emorley] 2012-07-24 02:58:36 PDT
https://hg.mozilla.org/mozilla-central/rev/7039771a5329
Comment 55 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-27 08:33:43 PDT
Comment on attachment 644278 [details] [diff] [review]
patch v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): original spdy problem
User impact if declined: rare situation where spdy-hosts cannot be connected to again without a browser restart. (twitter/google/soon FB). probably linked to network disruption (perhaps as simple as laptop sleep mode).
Testing completed (on m-c, etc.): on m-c since 7/24
Risk to taking this patch (and alternatives if risky): moderate risk. it's fairly small and defensive in nature. should be a nop if the critical condition does not arrive.
String or UUID changes made by this patch: none
Comment 56 Alex Keybl [:akeybl] 2012-07-27 10:52:20 PDT
Comment on attachment 644278 [details] [diff] [review]
patch v1

[Triage Comment]
Given moderate risk, let's land on Aurora ahead of next week's Beta.
Comment 57 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-27 17:43:14 PDT
ff16
https://hg.mozilla.org/releases/mozilla-aurora/rev/068e4ca57007
Comment 58 Alex Keybl [:akeybl] 2012-07-30 12:29:25 PDT
Comment on attachment 644278 [details] [diff] [review]
patch v1

[Triage Comment]
No new surprises on nightly/aurora, so let's uplift to beta in time for tomorrow's go to build. Please land asap.
Comment 59 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-30 13:24:10 PDT
https://hg.mozilla.org/releases/mozilla-beta/rev/0db5f9580a05
Comment 60 Virgil Dicu [:virgil] [QA] 2012-08-17 06:07:58 PDT
Nobody could reproduce the original issue on our end, but I can confirm that google services and twitter with spdy enabled (gmail, docs, calendar, google plus etc.) work fine in 15b5 - with our hardware, at least. Navigated through given options, sent e-mails, loaded long posts on Google plus and basically walked around other google services.

Checked with Ubuntu, Mac and Windows 7. I'll remove the verifyme keyword, but will wait for someone who encountered the problem or Sumo guys to report this as fixed with the status flag.

If someone has some extra pointers for verification, I will be happy to give it a try.
Comment 61 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-08-17 09:46:13 PDT
I'll flag this one as [qa?] just in case there is follow-up for QA.
Comment 62 Patrick McManus [:mcmanus] PTO until Sep 6 2012-08-27 04:47:36 PDT
this was backed out of ff15 as it required follow-on fixes that weren't done on beta.

backout
https://hg.mozilla.org/releases/mozilla-beta/rev/ac27ec3a105c

followon fixes bug https://bugzilla.mozilla.org/show_bug.cgi?id=780522
