Closed Bug 762162 Opened 12 years ago Closed 12 years ago

Cannot connect to Gmail and Twitter with SPDY on

Categories

(Core :: Networking: HTTP, defect)

13 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17
Tracking Status
            Tracked  Status
firefox13   +        ---
firefox14   +        wontfix
firefox15   +        wontfix
firefox16   ---      fixed
firefox17   ---      fixed

People

(Reporter: Matt_G, Assigned: mcmanus)

References

Details

(Whiteboard: [spdy][qa?])

Attachments

(1 file, 1 obsolete file)

This issue came in on the SUMO forums today. You can find the post here: https://support.mozilla.org/en-US/questions/928731 The system details are listed in the ticket as well. Let me know if you need more information. I can reach out to the user for more details.
Tracking for release of FF13/14, and adding qawanted to see if we're able to reproduce.
Keywords: qawanted
This may not be a 13-specific thing, in spite of that user's claims. I can't open gmail on nightly starting in the last 3 days on a Mac.
After a suggestion from Jet: I can't repro in private browsing; it only happens when I'm not in private browsing mode.
I cleared all my google cookies and cache and it's still broken. Next stop: new profile, I guess.
We've got several more threads popping up on SUMO for this issue. It doesn't seem to be isolated to XP as initially thought, so we might want to change the title. I've got reports that it is impacting both gmail and twitter. I'm requesting more information. You can find the threads here: https://support.mozilla.org/en-US/questions/928888 https://support.mozilla.org/en-US/questions/928182 This post and a few others seem to indicate there might be a network.http.keep-alive tie-in to the issue: https://support.mozilla.org/en-US/questions/928888#answer-339456
Ok, here's my matrix:
- http.keep-alive true, spdy true: not working EXCEPT it works in private browsing. (Yes, I tried clearing cache, cookies, etc etc)
- http.keep-alive false, spdy true: not working EXCEPT in private browsing
- http.keep-alive true, spdy false: works.
- http.keep-alive true, spdy false: works.
I do sometimes get this (except in the private browsing case): "Some Gmail features have failed to load due to an Internet connectivity problem. If this problem persists, try reloading the page, or using the basic HTML version. Learn More."
Cww - can you get a clean profile (all default config) and produce an HTTP NSPR log of the failure? https://developer.mozilla.org/en/HTTP_Logging The non-default keep-alive=false issue can be worked under a different (less important) bug. I'm actually going on vacation for the next 10 days, so some patience might be required. Also, if you can try a debug nightly then you can get the SSL session key you are using and include that with a wireshark capture corresponding to the NSPR log. That would be key information as well. https://developer.mozilla.org/en/NSS_Key_Log_Format
Track any issues with keep-alive=false over in bug 762650; we'll use this bug for tracking failures with the default config, such as :Cww indicates. Best not to confuse them. Obviously accessing gmail WFM so an STR (perhaps environmental) would be good - but the logging from comment 8 would also be helpful.
Component: General → Networking: HTTP
Product: Firefox → Core
QA Contact: general → networking.http
Whiteboard: [spdy]
Honza - Patrick may be unavailable for a week or so, can you investigate here? We need a resolution quickly.
Assignee: nobody → honzab.moz
Summary: SPDY on Win XP SP3 cannot connect to Gmail → Cannot connect to Gmail with SPDY on
OS: Mac OS X → All
I can't make the problem happen with a clean profile.
And something changed and now I can't repro with my regular profile. I wonder if it was the browser restart that did it.
My vague memory remembers that QA saw a dup of this bug in an unconfirmed bug triage. Let me see if I can find it...
(In reply to Jason Smith [:jsmith] from comment #13) > My vague memory remembers that QA saw a dup of this bug in an unconfirmed > bug triage. Let me see if I can find it... Found it. It's a different bug, but also has to do with gmail. bug 758453
(In reply to Jason Smith [:jsmith] from comment #14) > (In reply to Jason Smith [:jsmith] from comment #13) > > My vague memory remembers that QA saw a dup of this bug in an unconfirmed > > bug triage. Let me see if I can find it... > > Found it. It's a different bug, but also has to do with gmail. bug 758453 Although that bug indicates it's unrelated to SPDY now.
@Cww: please see https://bugzilla.mozilla.org/show_bug.cgi?id=762650#c4 I'm not able to reproduce with keep-alive = true. Cww, if you are, please give us more info (http log is the best). (In reply to [:Cww] from comment #12) > And something changed and now I can't repro with my regular profile. I > wonder if it was the browser restart that did it. If you were changing the prefs at run-time (w/o browser restart) it could potentially influence the test and you could just experience bug 762650. If you cannot reproduce anymore, then the test results reported in comment 6 could be non-precise.
Hm... so I can't repro but I was running with default settings when this first showed up.
(In reply to [:Cww] from comment #17) > Hm... so I can't repro but I was running with default settings when this > first showed up. were you using FF13 at that time? I believe there is an unrelated issue that is only on FF15/FF16.
I'm on nightly. Have a bug for the "unrelated issue"?
(In reply to [:Cww] from comment #19) > I'm on nightly. Have a bug for the "unrelated issue"? bug 762025
matt_g - now that :Cww has been identified as a nightly user, are there confirmed cases of FF13 users with default prefs having this issue? (or if not confirmed - suspected?) If so - can honza or I get in touch with such a user? If not, then I think it makes sense to dup this to 762650 and reopen if necessary.
Hardware: x86 → All
Summary: Cannot connect to Gmail with SPDY on → Cannot connect to Gmail and Twitter with SPDY on
so far, i have seen two forum users who reportedly had connection issues with SPDY & all network.http. settings on default https://support.mozilla.org/questions/928182#answer-339495 https://support.mozilla.org/questions/928888#answer-340283
I've spent some time trying to reproduce this on a Win7 machine using Fx13, and I haven't been able to see this problem. Given the last comments in bug 762650, do we still need qawanted on this bug?
(In reply to juan becerra [:juanb] from comment #23) > I've spent some time trying to reproduce this on a Win7 machine using Fx13, > and I haven't been able to see this problem. Given the last comments in bug > 762650, do we still need qawanted on this bug? given comment 22, yes. bug 762500 is about a specific pref configuration that comment 22 says is not in play for this bug.
(In reply to Patrick McManus [:mcmanus] from comment #24) > given comment 22, yes. bug 762500 is about a specific pref configuration > that comment 22 says is not in play for this bug. Do you have any suggestions for attempting to reproduce? Blindly logging into Gmail/Twitter with the same configuration/connection doesn't sound like it'll get us any more leads.
the user from my second link in comment 22 has provided more details about his situation: adding attachments in gmail does not work with spdy enabled (so this seems to be related to upload) & only when the win xp laptop is connected via a HUAWEI k3565 mobile broadband usb dongle. when the same machine is accessing gmail on a different network attachments in gmail will work normally with spdy enabled.
(In reply to philipp from comment #26) > the user from my second link in comment 22 has provided more details about > his situation: > adding attachments in gmail does not work with spdy enabled (so this seems > to be related to upload) & only when the win xp laptop is connected via a > HUAWEI k3565 mobile broadband usb dongle. when the same machine is accessing > gmail on a different network attachments in gmail will work normally with > spdy enabled. Patrick - does that make sense to you? Does certain networking HW not work properly with SPDY? We can order the USB dongle to reproduce the issue, in that case.
(In reply to Alex Keybl [:akeybl] from comment #27) > (In reply to philipp from comment #26) > > the user from my second link in comment 22 has provided more details about > > his situation: > > adding attachments in gmail does not work with spdy enabled (so this seems > > to be related to upload) & only when the win xp laptop is connected via a > > HUAWEI k3565 mobile broadband usb dongle. when the same machine is accessing > > gmail on a different network attachments in gmail will work normally with > > spdy enabled. > > Patrick - does that make sense to you? Does certain networking HW not work > properly with SPDY? It would be really weird, but is plausible if the hardware is broken in a peculiar way. See a problem chrome had in a similar vein: http://code.google.com/p/chromium/issues/detail?id=69813 > > We can order the USB dongle to reproduce the issue, in that case. its worth trying. I'm trying to get a handle on how many people are seeing this problem, using default prefs, in 13 (or 14) as opposed to nightly/aurora. I'm aware of 2. Is that a massive mis-statement, and if we've got more can I contact them directly somehow?
There are definitely more than 2. There are at least 55 me too votes this week. I've asked if any users are willing to work with us directly. I'll let you know if I get any offers.
another case with the same symptoms (adding attachments to gmail won't work with spdy enabled), but very different configuration: win7 64bit, firefox 13.0.1, router: D-Link DR655, ISP: Cox Communications, security: ms security essentials https://support.mozilla.org/questions/930282
(In reply to philipp from comment #30) > another case with the same symptoms (adding attachments to gmail won't work > with spdy enabled), but very different configuration: > win7 64bit, firefox 13.0.1, router: D-Link DR655, ISP: Cox Communications, > security: ms security essentials > https://support.mozilla.org/questions/930282 Ha! I have the exact same hardware :(
Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0 (20120619191901) Firefox 14.0 beta 8, Windows 7 32-bit, Sonic Wall + switch, ISP: RDS (I'm not using any routers). I couldn't reproduce this bug at all on the above environment. I couldn't reproduce it on the 13.0 RC either.
(In reply to Patrick McManus [:mcmanus] from comment #31)
> Ha! I have the exact same hardware :(

Given our inability to reproduce with other affected HW, and the fact that we may not actually be able to hook up to 2G/3G using the HUAWEI k3565 in America, I'm going to hold off on ordering the device.
I've got a SUMO user experiencing this issue who is willing to work one on one. I'll email Patrick his contact information.
I'm having this exact bug. Restarting Firefox fixed it temporarily for me, but then I started googling for a solution when it happened again. All network.http.* settings are default. Changing spdy to disabled, it immediately starts working (even if the tab was trying to load while changing the settings). So, yeah, can't access gmail at all with spdy enabled. Win7 64bit, Firefox 13.0.1, non-wireless connection (fiber AFAIK) I have a ton of extensions so usually I'd just blame them, but since disabling spdy fixes it immediately and others have the same bug, I don't think it's that.
(In reply to bugz from comment #35)
> I'm having this exact bug. Restarting Firefox fixed it temporarily for me,
> but then I started googling for a solution when it happened again. All

I'm sorry about the trouble you're seeing - but I really appreciate that you've stepped forward! There is some important data I need to see from someone having the problem, I hope you can help.

First, please do the experiment with disabling your addons and seeing if that helps. No matter where the blame lies, it could certainly be a problem with that interaction and I'd want to dig into it as best as I could.

Second, I'd love to see an HTTP LOG that covers the period of when this happens. It's a little bit of a pain to do, but a fair amount of the time it contains enough information to at least define a next step of investigation. See https://developer.mozilla.org/en/HTTP_Logging for instructions.

Lastly, I'm pretty interested to know if this is reproducible on the beta firefox. The beta does connection dispatch in a different way and it's possible that is the problem area. The easiest way to install the beta is just from a zip (http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/14.0b9-candidates/build1/win32/en-US/firefox-14.0b9.zip). You can run it right from the expanded folder (as long as your other firefox is shut down) and then just remove the folder when you're done.

Thanks for any help you can provide!
It's not a big deal, I was just a bit confused at first but I think I can live without spdy for a while ;) I actually did post in hope to provide some help with catching the bug. Unfortunately I don't have the time to do extensive testing right now. But I do have a question: how can I be sure that restarting Firefox doesn't affect it if I try for example reproducing it without addons or with HTTP logging? Since that fixed it last time for a while. As I still have the current session open with the bug happening, I did a quick test and if I go to https://mail.google.com/mail/ Firefox doesn't open any new connections (as seen from process explorer) and Wireshark doesn't log anything from the google IP range. If I go to http://mail.google.com/mail/ though, it logs a "Moved Temporarily" response, goes through https://accounts.google.com and then hangs the same way, so the logging at least works and process explorer shows the connections correctly.
(In reply to bugz from comment #37)
> It's not a big deal, I was just a bit confused at first but I think I can
> live without spdy for a while ;) I actually did post in hope to provide some
> help with catching the bug. Unfortunately I don't have the time to do
> extensive testing right now. But I do have a question: how can I be sure
> that restarting Firefox doesn't affect it if I try for example reproducing
> it without addons or with HTTP logging? Since that fixed it last time for a
> while.

re addons: you are right it would be ambiguous. I had misunderstood you to indicate that it reliably reproduced at least fairly quickly for you.

re: logging - that won't fix anything, it will just log stuff and I want to see the log with the error. But if it takes a long time to produce then that isn't a really great option either as the logs can be quite verbose and slow.

> As I still have the current session open with the bug happening, I did a
> quick test and if I go to https://mail.google.com/mail/ Firefox doesn't open
> any new connections (as seen from process explorer) and Wireshark doesn't
> log anything from the google IP range. If I go to
> http://mail.google.com/mail/ though, it logs a "Moved Temporarily" response,
> goes through https://accounts.google.com and then hangs the same way, so the
> logging at least works and process explorer shows the connections correctly.

that's interesting information. It supports my basic assumption about the area of code that is failing - the new connection logic. and that indeed is a bit different in beta/14 so there is hope there even if I don't know what the bug might be in particular.

can you use the netstat command in windows to see if you've got any connections open to mail.google.com port 443? (I know it isn't making any new ones, I want to know if there is an old one.. maybe in a strange state).
(In reply to Patrick McManus [:mcmanus] from comment #38) > can you use the netstat command in windows to see if you've got any > connections open to mail.google.com port 443? (I know it isn't making any > new ones, I want to know if there is an old one.. maybe in a strange state). nope, no mail.google.com or anything related.
here are two possible sources in my mind:

#1 is the spdy restriction code that tries to make sure we only have 1 spdy connection open at a time to the same host in order to maximize the multiplexing opportunities

#2 is the general AtActiveConnectionLimit code that limits us to 6 conns per host/port

both of those would cause us to not open a new connection when a new transaction is presented to necko. So they fit the facts. I favor the interpretation of #1 just because I can't see anything special about #2 in this context. But I'll admit that's mostly gut.

In the context of #1, there are 2 paths to the restriction being enforced. In the first one there are (what we believe to be) live connection[s]; one of them has to pass the "candirectlyactivate" check to enforce the restriction. Anything that had been zombied for a long time would fail this due to the ping timer code (not to mention that it should be actively taken down when the socket went away). The second path to the restriction being enforced is the presence of a halfopen connection (i.e. a tcp connection in progress). Those aren't verified in any way and they don't have explicit timers on them. For that reason I find it more plausible that somehow one of those objects, which should be rather ephemeral, is getting stuck.

At least that's my leading theory - and I need some kind of theory. (though I'd rather have an HTTP log :))

Here's what we can speculatively do quite safely and backport:
* add some JS console based warning level logging into these paths. The only reporters we have either can't get HTTP logs or can't reliably reproduce, so the size of an HTTP log makes them untenable.
* abandon half opens at some point (even 2 minutes).. we've already got the timer tick to do it with.

Barring objection, I'll put together that patch.
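To make the theory in the comment above concrete, here is a minimal sketch of the two restriction paths. It is not the necko implementation: HalfOpen, HostEntry and the field names are hypothetical stand-ins, and only the names RestrictConnections and UnconnectedHalfOpens are borrowed from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are not the real necko types.
struct HalfOpen {
    bool connected = false;  // true once the TCP handshake has completed
};

struct HostEntry {
    bool usingSpdy = false;           // host is known to speak SPDY
    std::size_t activeSpdyConns = 0;  // live connections that passed a liveness check
    std::vector<HalfOpen> halfOpens;  // TCP connections still in progress

    // Counts half-opens whose connect has not completed yet.
    std::size_t UnconnectedHalfOpens() const {
        std::size_t n = 0;
        for (const HalfOpen& h : halfOpens)
            if (!h.connected) ++n;
        return n;
    }
};

// Returns true when no new connection may be opened to this host.
// Path 1: a live SPDY connection exists, so new work should multiplex on it.
// Path 2: a connect is already in flight; nothing checks that it ever finishes.
bool RestrictConnections(const HostEntry& ent) {
    if (!ent.usingSpdy) return false;
    return ent.activeSpdyConns > 0 || ent.UnconnectedHalfOpens() > 0;
}
```

In this model a half-open that never completes keeps RestrictConnections() returning true forever, which would match the reported symptom of no new sockets being opened to the host until a restart.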
Removing qawanted for the moment, as there is not much QA can do here now - Ioana could not reproduce with the network configuration we have here (comment 32), and neither could Juan in comment 23.
Keywords: qawanted (removed)
Add logging. This is my suggestion. Console may not be the best option, since it cannot be simply filtered. Having about:network that lists at least pending transactions, how long they are pending and reason (!) they are pending is IMO better. It should be very simple to introduce this code, you can just enumerate the arrays of http conn manager, everything is there!

The complete list of features:
- list pending halfopens and their state (what IP does it connect to right now)
- list open connections that are active and idle (how long they are idle, etc)
- list spdy sessions and streams + their states and times of being open
- list pending and running transactions, why are pending (what piece of code decided to let a trans wait)

Simple enums of these in tabbed sections is a luxury already. Having a usage tree would be awesome. That is what I would like to do in bug 765694. Anything basic we start here can be a base for it.
(In reply to Matt G, Mozilla SUMO (irc: Matt_G) from comment #34) > I've got a SUMO user experiencing this issue who is willing to work one on > one. I'll email Patrick his contact information. this user became wfm before he could add any new information.
Depends on: 770264
Hey Patrick. I've got a couple more volunteers willing to help. I'll forward you their information as soon as I have it.
bug 770264 landed something that could help debug this even without an NSPR log. That's on FF16 and FF15 only. To use it if this happens to you - open the JS console and then open about:config. Find network.http.diagnostics and flip it to true - that will trigger a dump of a whole bunch of stuff to the JS console. Paste that here! (a google summer of code project is working on a better way with an about:network page, but this gives us something to work with right now.)
Depends on: 775508
Depends on: 775515
Assignee: honzab.moz → mcmanus
Attached patch patch 0 (obsolete) — Splinter Review
as we saw in 762025, under extremely rare circumstances it's possible for a half open to never resolve itself naturally - nsSocketTransport appears to not be calling OnOutputStreamReady for it. If that happens RestrictConnections() will stay restricted - causing a deadlock to the host. This is some pretty simple code that after a high timeout will close the pending connection and if it still isn't resolved a few seconds later will abandon it. I've set the timer at 90 seconds - so it's well clear of any normal operating conditions, but I've tested it at very low values and everything works as you would expect. This should defensively get us out of this particular hole. The code is just additive, so it's a lot safer to backport than any changes to nsSocketTransport, which are kind of scary to contemplate. (but obviously we still need to look there - it's pretty hard to figure out exactly what happens in all states of socketTransport when it gets an error).
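The close-then-abandon behaviour described above can be modelled roughly as follows. This is a sketch under assumptions, not the attached patch: PendingConnect, TimeoutTick and kAbandonGraceMs are hypothetical names, and the only values taken from the thread are the 90 second timeout and the 5 second grace period.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a pending connect; not the real nsHalfOpenSocket.
struct PendingConnect {
    double startedMs = 0;  // when the connect was started
    bool closed = false;   // stage 1 already asked the transport to close
};

// Grace period between closing a stuck connect and abandoning it outright.
constexpr double kAbandonGraceMs = 5000;

// One timer tick over all pending connects. Returns how many were abandoned.
std::size_t TimeoutTick(std::vector<PendingConnect>& pending,
                        double nowMs, double maxConnectMs) {
    assert(maxConnectMs >= 0);
    std::size_t abandoned = 0;
    for (auto it = pending.begin(); it != pending.end();) {
        const double delta = nowMs - it->startedMs;
        if (delta > maxConnectMs + kAbandonGraceMs) {
            // Stage 2: the close never resolved it, so drop the entry.
            it = pending.erase(it);
            ++abandoned;
            continue;
        }
        if (delta > maxConnectMs) {
            // Stage 1: close; a healthy transport would now report the error.
            it->closed = true;
        }
        ++it;
    }
    return abandoned;
}
```

With maxConnectMs set to 90000, a connect started at time 0 would be closed on the first tick after 90 seconds and dropped on the first tick after 95 seconds, so a normal connect that completes quickly never reaches either stage.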
Attachment #643839 - Flags: review?(honzab.moz)
Comment on attachment 643839 [details] [diff] [review] patch 0

Review of attachment 643839 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +1729,5 @@
> void
> nsHttpConnectionMgr::StartedConnect()
> {
> mNumActiveConns++;
> + ActivateTimeoutTick();

I don't see an opposite Deactivate call for this one.

@@ +2203,5 @@
> + if (ent->mHalfOpens.Length()) {
> + TimeStamp now = TimeStamp::Now();
> + double maxConnectTime = gHttpHandler->ConnectTimeout(); /* in milliseconds */
> +
> + for (PRInt32 index = ent->mHalfOpens.Length() - 1; index >= 0; --index) {

for (PRUint32 index = ent->mHalfOpens.Length(); index > 0;) { --index; .. }

@@ +2215,5 @@
> + if (half->SocketTransport())
> + half->SocketTransport()->Close(NS_ERROR_NET_TIMEOUT);
> + if (half->BackupTransport())
> + half->BackupTransport()->Close(NS_ERROR_NET_TIMEOUT);
> + }

I'd rather see this encapsulated in nsHalfOpenSocket. CheckTimeout(now, maxTime, maxTimeToAbandone)? But up to you.

@@ +2219,5 @@
> + }
> +
> + // If this half open hangs around for 5 seconds after we've closed() it
> + // then just abandon the socket.
> + if (delta > maxConnectTime + 5000) {

Maybe have a pref for this (5000) too?

@@ +2563,5 @@
> + mEnt->mHalfOpens.RemoveElement(this);
> + if (!mEnt->UnconnectedHalfOpens())
> + // in case this reverted RestrictConnections()
> + gHttpHandler->ConnMgr()->ProcessPendingQForEntry(mEnt);
> + }

Have a method for this? Could be useful in other places too.
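The loop shape suggested in the review (unsigned index, decrement at the top of the body) is a general idiom for walking an array backwards while removing elements. A minimal standalone illustration, using std::vector and a hypothetical RemoveAllReverse helper rather than the Mozilla array types:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Removes every element equal to `victim`, scanning from the back.
// With an unsigned index, `index >= 0` is always true, so the loop tests
// `index > 0` and decrements as the first statement of the body instead.
std::size_t RemoveAllReverse(std::vector<int>& items, int victim) {
    std::size_t removed = 0;
    for (std::size_t index = items.size(); index > 0;) {
        --index;  // index is now a valid position in [0, size)
        assert(index < items.size());
        if (items[index] == victim) {
            items.erase(items.begin() + static_cast<std::ptrdiff_t>(index));
            ++removed;
        }
    }
    return removed;
}
```

Scanning from the back means an erase only shifts elements that have already been visited, so no element is skipped and the index never needs adjusting after a removal.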
Attachment #643839 - Flags: review?(honzab.moz)
> ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
> @@ +1729,5 @@
> > void
> > nsHttpConnectionMgr::StartedConnect()
> > {
> > mNumActiveConns++;
> > + ActivateTimeoutTick();
>
> I don't see an opposite Deactivate call for this one.

The corresponding function is RecvdConnect and it already contains a call to ConditionallyStopReadTimeoutTick().

ActivateTimeoutTick wasn't needed here before this patch because the tick only referenced mActiveConns.. now of course it references mHalfOpens so it needs to be live while those are in progress too. For the vast majority of cases of course this will make no difference because the connections are so much faster than the tick that the tick will generally see the length of half opens as 0.

in a different patch we can rename readtimeouttick to just timeouttick .. I don't want to do it here because I think we will want to backport this one and that would touch a lot more lines.
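The pairing described in this reply can be pictured with a small model. This is a simplified sketch, not the connection manager: TickController and its members are hypothetical, and the stop condition (no connects outstanding) is an assumption about what ConditionallyStopReadTimeoutTick() checks.

```cpp
#include <cassert>

// Hypothetical model of the tick lifecycle around connect start and finish.
class TickController {
public:
    // A connect was started: count it and make sure the tick is running.
    void StartedConnect() {
        ++mNumActiveConns;
        mTickActive = true;
    }

    // A connect finished: uncount it and stop the tick if nothing is left.
    void RecvdConnect() {
        assert(mNumActiveConns > 0);
        --mNumActiveConns;
        ConditionallyStopTick();
    }

    bool TickActive() const { return mTickActive; }

private:
    // Assumed stop condition: no connects outstanding.
    void ConditionallyStopTick() {
        if (mNumActiveConns == 0) mTickActive = false;
    }

    unsigned mNumActiveConns = 0;
    bool mTickActive = false;
};
```

The point of the model is only that every StartedConnect() is balanced by a RecvdConnect(), so the tick is guaranteed to be live for as long as a connect is in progress and can go idle afterwards.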
Blocks: 775813
(In reply to Honza Bambas (:mayhemer) from comment #48)
> > @@ +2219,5 @@
> > + }
> > +
> > + // If this half open hangs around for 5 seconds after we've closed() it
> > + // then just abandon the socket.
> > + if (delta > maxConnectTime + 5000) {
>
> Maybe have a pref for this (5000) too?

I decided not to.. we're really looking for "a few seconds later" and I don't see any situation where it should be locally tweaked.
Attached patch patch v1 Splinter Review
Attachment #643839 - Attachment is obsolete: true
Attachment #644278 - Flags: review?(honzab.moz)
Comment on attachment 644278 [details] [diff] [review] patch v1

Review of attachment 644278 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab with these comments and remaining comments from comment 48.

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +1729,5 @@
> void
> nsHttpConnectionMgr::StartedConnect()
> {
> mNumActiveConns++;
> + ActivateTimeoutTick();

Add a comment that RecvdConnect() turns it off again.

::: netwerk/protocol/http/nsHttpHandler.cpp
@@ +1149,5 @@
> + if (PREF_CHANGED(HTTP_PREF("connection-timeout"))) {
> + rv = prefs->GetIntPref(HTTP_PREF("connection-timeout"), &val);
> + if (NS_SUCCEEDED(rv))
> + // the pref is in seconds, but the variable is in milliseconds
> + mConnectTimeout = clamped(val, 1, 0xffff) * 1000;

There is an NSPR const for the 1000.
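The seconds-to-milliseconds conversion in the reviewed hunk can be sketched outside the tree as follows. ConnectTimeoutMs and kMsecPerSec are hypothetical names; kMsecPerSec stands in for the NSPR constant the reviewer mentions (presumably PR_MSEC_PER_SEC) so that the sketch builds without NSPR headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for the NSPR constant, so the sketch builds without NSPR headers.
constexpr std::uint32_t kMsecPerSec = 1000;

// Converts a timeout pref given in seconds to milliseconds,
// clamping to [1, 0xffff] seconds first, as the reviewed hunk does.
std::uint32_t ConnectTimeoutMs(std::int32_t prefSeconds) {
    const std::int32_t clampedSeconds =
        std::min<std::int32_t>(std::max<std::int32_t>(prefSeconds, 1), 0xffff);
    assert(clampedSeconds >= 1 && clampedSeconds <= 0xffff);
    return static_cast<std::uint32_t>(clampedSeconds) * kMsecPerSec;
}
```

Clamping before multiplying keeps the result within 32 bits (at most 65535000 ms) and turns a zero or negative pref value into a 1 second timeout rather than disabling the check.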
Attachment #644278 - Flags: review?(honzab.moz) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/7039771a5329 if there is a non-connection-throttle problem mixed in here too (there might be) we'll open a different issue.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Comment on attachment 644278 [details] [diff] [review] patch v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): original spdy problem
User impact if declined: rare situation where spdy-hosts cannot be connected to again without a browser restart. (twitter/google/soon FB). probably linked to network disruption (perhaps as simple as laptop sleep mode).
Testing completed (on m-c, etc.): on m-c since 7/24
Risk to taking this patch (and alternatives if risky): moderate risk. it's fairly small and defensive in nature. should be a nop if the critical condition does not arise.
String or UUID changes made by this patch: none
Attachment #644278 - Flags: approval-mozilla-beta?
Attachment #644278 - Flags: approval-mozilla-aurora?
Comment on attachment 644278 [details] [diff] [review] patch v1 [Triage Comment] Given moderate risk, let's land on Aurora ahead of next week's Beta.
Attachment #644278 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 644278 [details] [diff] [review] patch v1 [Triage Comment] No new surprises on nightly/aurora, so let's uplift to beta in time for tomorrow's go to build. Please land asap.
Attachment #644278 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Blocks: 780522
Keywords: verifyme
Nobody could reproduce the original issue on our end, but I can confirm that google services and twitter with spdy enabled (gmail, docs, calendar, google plus etc.) work fine in 15b5 - with our hardware, at least. Navigated through given options, sent e-mails, loaded long posts on Google plus and basically walked around other google services. Checked with Ubuntu, Mac and Windows 7. I'll remove the verifyme keyword, but will wait for someone who encountered the problem or Sumo guys to report this as fixed with the status flag. If someone has some extra pointers for verification, I will be happy to give it a try.
Keywords: verifyme (removed)
QA Contact: virgil.dicu
I'll flag this one as [qa?] just in case there is follow-up for QA.
Whiteboard: [spdy] → [spdy][qa?]
this was backed out of ff15 as it required follow-on fixes that weren't done on beta. backout: https://hg.mozilla.org/releases/mozilla-beta/rev/ac27ec3a105c follow-on fixes bug: https://bugzilla.mozilla.org/show_bug.cgi?id=780522