Closed Bug 724519 Opened 10 years ago Closed 10 years ago
syn retry causes the connection to hang
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0) Gecko/20100101 Firefox/10.0

Steps to reproduce:
1. Start Firefox with a new profile.
2. Go to "https://addons.mozilla.org/en-US/firefox/" and open any extension's page (e.g. "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/"). (It could actually happen on any website, but this is the only one where I found it easy to reproduce.)
3. Leave that tab open.
4. Make sure the startup option is set to "Show my windows and tabs from last time".
5. Clear the disk cache and shut down Firefox.
6. Start Firefox again.
7. While the first tab is loading "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/" (from the example), immediately open a new tab and go to any website (your homepage would be fine). You must do this before the first tab finishes loading, or the bug doesn't reproduce.

Reproducible: Environment dependent

Setting "network.http.connection-retry-timeout" to "0" (disabled) seems to fix the problem.

This has been bugging me for quite a long time, and I researched it thoroughly. What I found is that it is more likely to happen on a wireless LAN. The difficulty is that it doesn't always happen, even in an environment where the bug can occur: sometimes it reproduces and sometimes it doesn't.

I thought the affected component would be "Networking", but I selected "General" because I couldn't find it in the list. Sorry in advance if I made a mistake; this is my first time filing a bug.
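For anyone who wants to apply the workaround above to a test profile without going through about:config, here is a minimal sketch. It assumes the profile directory path is known (the `profile_dir` argument and the helper name `disable_syn_retry` are hypothetical) and relies only on the standard `user.js` mechanism, which Firefox reads at startup.

```python
from pathlib import Path

PREF_NAME = "network.http.connection-retry-timeout"


def disable_syn_retry(profile_dir):
    """Append a user.js line that sets the retry timeout to 0 (disabled).

    profile_dir is an assumption: pass the path of the profile under test.
    """
    user_js = Path(profile_dir) / "user.js"
    line = f'user_pref("{PREF_NAME}", 0);\n'
    existing = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    if line not in existing:
        # Make sure the new pref starts on its own line.
        if existing and not existing.endswith("\n"):
            existing += "\n"
        user_js.write_text(existing + line, encoding="utf-8")
    return user_js
```

Calling the helper twice on the same profile leaves a single entry, so it should be safe to re-run between reproduction attempts.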
I forgot to add what the bug actually does. When the bug happens, Firefox stops loading and doesn't resume until you close the tab that caused the stall (in the example above, "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/").
Quoting a comment from bug 592284:
"It seems this causes some problems. It is hard to reproduce (takes a while to manifest), but since this landed I ended up with a sort of hanged connection state a few times. When this happens, all loading tabs stay as "Connecting...". Only way to get things moving is to close the tab that caused the stall, but there is no way to find out which it was, so it's a matter of luck (ie closing one by one, until it starts working again). Today it did hang because of a download, the download was running fine (and I tried IE, and that was loading just fine too), but nothing else could be loaded in Minefield until the download finished (20 minutes)."
Component: General → Networking: HTTP
Product: Firefox → Core
QA Contact: general → networking.http
Appending more information to the STR.

7. While the first tab is loading "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/" (from the example), immediately open a new tab and go to any website (your homepage would be fine). If it loads fine, repeat the same thing over and over (open another tab and try again). You must do this before the first tab finishes loading, or the bug doesn't reproduce.

Sometimes just a few tabs (1 to 3) won't be enough to reproduce the bug. In that case, leave a few more tabs with any websites open in addition to the first tab that is supposed to cause the bug, and start over from Step 5.

I found that Firefox sometimes takes an unnaturally long time to load the "reviews" on the add-on page, and when this happens the bug almost always reproduces. I don't know how to trigger that yet, but it is more likely when the page is loaded for the first time after Firefox starts with its cache cleared.
Another update on the STR, regarding the part "immediately open up a new tab and go to any website(your homepage would be fine)": "about:home" does not reproduce the bug, so I recommend changing your homepage to "Google.com", or whatever your country's Google is, to make it easier to reproduce.
@Marc42410 - thanks for filing the bug and putting such energy into the STR. Unfortunately, despite spending the morning trying to reproduce this, I cannot yet confirm it myself. I've tried it on Windows 7, I've tried it on Linux, I've tried it with syn-retry set to a hair trigger (1), set to the default, and set to double (500). I've tried it on wired, I've tried it on wireless, I've even tried it on a packet-shaped network that added all kinds of latency and packet drops. I feel like a Dr. Seuss book :) It all works OK for me. Would you be able to reproduce with HTTP logging enabled? ( https://developer.mozilla.org/en/HTTP_Logging ) That might be our best path forward.
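As a rough illustration of what enabling HTTP logging involves, the sketch below builds the environment Firefox would be started with. The variable names `NSPR_LOG_MODULES` and `NSPR_LOG_FILE` and the module list are given as recalled from the logging instructions of that era, so treat them as assumptions and check the linked page; the helper name `build_logging_env` and the log path are hypothetical.

```python
import os

# Assumption: module list as recalled from the HTTP logging instructions
# of that era; verify against the linked page before relying on it.
LOG_MODULES = "timestamp,nsHttp:5,nsSocketTransport:5,nsHostResolver:5"


def build_logging_env(log_path, base_env=None):
    """Return a copy of the environment with HTTP logging variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["NSPR_LOG_MODULES"] = LOG_MODULES
    env["NSPR_LOG_FILE"] = str(log_path)
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.Popen` together with the path to the Firefox binary on the test machine; the log file then grows while the STR are repeated.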
Can you also try it with Nightly? Your test case involves SSL, and there are some pretty interesting SSL changes between the release the bug is filed against and Nightly.
This is an HTTP log for Bug 724519 which includes a reproduction of the bug.
@Patrick - Thank you so much for your efforts, and for telling me how to get an HTTP log. Thanks to you, I think I've caught it in the act. I attached the log file, so please take a look when you have the time. I did this test with "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/" and "http://www.google.co.jp/" (Google.com redirects to co.jp because I'm in Japan). The test follows the STR, but at the end I closed "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/" in order to record how closing it resolves the stall, though you will probably see that when you check the log. As for Nightly, I haven't been able to reproduce it after half an hour of testing, but since this bug comes and goes (as I wrote, "sometimes it works and sometimes it doesn't"), I can't be sure it was fixed. I'll do more testing and post a follow-up later.
@Patrick - thanks again. I need to give the log more study, but is there any chance you can try your repro on Nightly? One of the issues at hand is the interaction of SSL and various threading/blocking states (especially involving OCSP). If this is resolved because of those changes (which mostly happened on 12, iirc) then we can just be done with it. My guess is this isn't really syn-retry specific, but that the feature is putting pressure on one of the connection counters and triggering a deadlock.
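To make the guess about connection counters concrete, here is a toy model of how a capped counter could produce the reported symptom. It is purely illustrative and is not Firefox's connection manager: the class `ToyConnectionManager`, the limit of 2, and the idea that a stalled tab holds both its original and its backup connection slot are all assumptions made for the sketch.

```python
class ToyConnectionManager:
    """Toy model only; not how Firefox actually tracks connections."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.slots = []    # owner name for each occupied slot
        self.pending = []  # owners waiting for a free slot

    def request(self, owner):
        """Take a slot if one is free; otherwise queue the request."""
        if len(self.slots) < self.max_connections:
            self.slots.append(owner)
            return True
        self.pending.append(owner)
        return False

    def close_owner(self, owner):
        """Release every slot held by owner, then serve queued requests."""
        self.slots = [o for o in self.slots if o != owner]
        while self.pending and len(self.slots) < self.max_connections:
            self.slots.append(self.pending.pop(0))


mgr = ToyConnectionManager(max_connections=2)
mgr.request("addons-tab")                # original attempt, assumed stalled
mgr.request("addons-tab")                # assumed backup attempt from syn-retry
blocked = not mgr.request("google-tab")  # no slot left, so it queues
mgr.close_owner("addons-tab")            # closing the stalled tab frees both
```

In this model the second tab stays queued until the stalled owner is closed, which matches the observed behavior that closing the offending tab gets loading moving again.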
This HTTP log includes another reproduction case for Bug 724519.
I repeated the STR on both Firefox 10.0.1 and a Nightly build. Firefox 10.0.1 reproduced the bug 4 times out of 10 (4/10). Nightly 13.0a1 (2/10) reproduced the bug 0 times out of 30 (0/30). I also accidentally found another page where the bug can reproduce, so I made another HTTP log and attached it. I don't know whether this one has anything to do with SSL like the previous case, but I tested this page on Nightly following the STR, repeated it 10 times, and the bug didn't reproduce. I would call this "fixed". What do you think?
The comment glitched. "Nightly 13.0a1(2/10) reproduced the <missing bug number> times out of 30(0/30)." The correct sentence is "Nightly 13.0a1(2/10) reproduced zero times out of 30(0/30)."
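A quick back-of-the-envelope check of the numbers above: if Nightly still had the 4/10 reproduction rate seen on 10.0.1, how likely would 30 clean runs in a row be? The sketch assumes each run is independent and that 4/10 is a fair estimate of the release rate, which is a rough assumption given only 10 runs; the function name `prob_zero_hits` is hypothetical.

```python
def prob_zero_hits(rate, trials):
    """Probability of seeing no reproductions in `trials` independent runs."""
    return (1.0 - rate) ** trials


release_rate = 4 / 10                 # Firefox 10.0.1: 4 reproductions in 10 runs
p = prob_zero_hits(release_rate, 30)  # Nightly: 0 reproductions in 30 runs
print(f"{p:.1e}")
```

Under those assumptions the chance is on the order of 2 in 10 million, so the 0/30 result is hard to explain as luck alone.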
excellent - thanks for all the help.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED