Closed Bug 614677 Opened 14 years ago Closed 14 years ago

Connection is reset message appears intermittently after landing of bug 592284

Categories

(Core :: Networking: HTTP, defect)

x86
All
defect
Not set
normal

Tracking

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: danne.da, Assigned: mcmanus)

Details

(Keywords: regression, Whiteboard: [http-conn])

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b8pre) Gecko/20101124 Firefox/4.0b8pre
Build Identifier: 

Ever since bug 592284 landed on trunk I've been getting the "The Connection Was Reset" message, but only intermittently. It doesn't seem specific to any site, but it appears to be more frequent on sites with a high ping, perhaps around 250 ms and above.

I've not been able to reproduce it at will.


Reproducible: Sometimes
Confirmed on Win7 as well. Lots of people seeing it on mozillazine; http://forums.mozillazine.org/viewtopic.php?p=10161971#p10161971

I'm seeing the problem on the mozillazine site itself.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Mac OS X → All
Version: unspecified → Trunk
I have not seen this at all. Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b8pre) Gecko/20101124 Firefox/4.0b8pre (Windows 7 Pro x64).
Status: NEW → UNCONFIRMED
Ever confirmed: false
Confirmed. >Win 7x64 latest nightly.
I'm traveling for the holiday and won't be able to look at it thoroughly until Monday. I haven't seen it myself, but I'll spend some time with mozillazine.
Would the reporter try this build:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-2be20026e175/

That build removes just the reclaim logic associated with surplus extra connections. If that does successfully isolate the problem we can build something a little more nuanced for merge.
The root cause here seems to be that nsSocketTransport::IsAlive(), which is called before a persistent connection is used, does not detect a server-generated FIN as I had thought it did.

I need to figure out if that is a bug in IsAlive() or not.
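
For illustration only: one common way for a liveness check to notice a peer's FIN on an idle socket is a non-blocking peek, where a zero-length read signals an orderly close. The sketch below is generic Python, not the nsSocketTransport::IsAlive() implementation; `peer_has_closed` is a hypothetical name, and a local socket pair stands in for a real server connection.

```python
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Return True if the peer has closed and no unread data remains.

    Note: this switches the socket to non-blocking mode.
    """
    sock.setblocking(False)
    try:
        # MSG_PEEK inspects pending data without consuming it.
        data = sock.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        # Nothing to read and no FIN seen: the connection still looks alive.
        return False
    except ConnectionResetError:
        # The peer reset the connection outright.
        return True
    # An empty read means the peer shut down its side (FIN).
    return data == b""


# A connected local pair stands in for an idle persistent connection.
client, server = socket.socketpair()
alive_before = peer_has_closed(client)
server.close()  # the "server" closes first
closed_after = peer_has_closed(client)
client.close()
```

A check like this can only report what has already arrived locally, so a FIN or RST still in flight would go unnoticed until the next send.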

I have a patch that works around the issue by setting the isReused flag on the connection. That lets the http transaction send it a second time if it gets a RST the first time. This makes sense as a way to deal with the inherent reuse race condition, so I'll keep it no matter what, but it would be better to avoid that path if we know the FIN has already arrived.
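
The retry-on-reset idea described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual patch: `Connection`, `send_with_reuse_retry`, and `open_fresh` are hypothetical names, and the reset is simulated with a flag.

```python
class Connection:
    """Hypothetical stand-in for a pooled HTTP connection."""

    def __init__(self, is_reused, fail_with_reset):
        self.is_reused = is_reused
        self._fail_with_reset = fail_with_reset

    def send(self, request):
        if self._fail_with_reset:
            # Simulates the server having closed the idle connection.
            raise ConnectionResetError("connection reset by peer")
        return f"200 OK for {request}"


def send_with_reuse_retry(request, conn, open_fresh):
    """Send once; on a reset, retry one time, but only for reused connections."""
    try:
        return conn.send(request)
    except ConnectionResetError:
        if not conn.is_reused:
            # A reset on a brand-new connection is a real error: report it.
            raise
        # The reused connection probably lost the reuse race; try a fresh one.
        return open_fresh().send(request)
```

With this shape, a reset on a connection marked as reused is retried quietly, while a reset on a new connection still surfaces to the caller.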
(In reply to comment #6)
> The root cause here seems to be that nsSocketTransport::IsAlive(), which is
> called before a persistent connection is used, does not detect a
> server-generated FIN as I had thought it did.
>

you can scratch this comment - I mistakenly drew this conclusion due to how I was using my debugger. My err. I think the patch I will post shortly addresses the problem.
track the patch for this in 613977
Depends on: 613977
(In reply to comment #8)
> track the patch for this in 613977

Does this mean that the try build you posted in the 613977 comments should solve bug 614677? Because it doesn't for me.
(In reply to comment #9)
> (In reply to comment #8)
> > track the patch for this in 613977
> 
> > Does this mean that the try build you posted in the 613977 comments should
> > solve bug 614677? Because it doesn't for me.

yes, it should. Sorry to hear you're still having problems.

Does the build in Comment 5 of this bug solve the problem for you? (that's not a real fix, but it will help identify what might be causing the issue).

Do you have any advice on steps to reproduce? For instance, an earlier commenter mentioned forums.mozillazine.org and that was helpful to me even though it wasn't an exact recipe.
Yes, the build in Comment 5 helps a lot. I have just reinstalled it so that, after some testing, I can tell you whether it solves the problem completely or only partially. I will let you know.

I can't give exact steps to reproduce the bug, but I have noticed that it often happens when I open several tabs at almost the same time (for example, when opening different links in new tabs to read).

And as mentioned before, when THE tab is hanging, all the others are "waiting".
I'm sorry to be so imprecise.

Hope it helps (a little ;-)
Well, unfortunately, some bad news: even with the Comment 5 build, it still happens a lot after some browsing time. Maybe that's a clue?
To be more precise, it mainly gets stuck at "Connecting..." every time, and not necessarily when opening a new tab (but often).
(In reply to comment #13)
> To be more precise, it mainly gets stuck at "Connecting..." every time, and
> not necessarily when opening a new tab (but often).

other than the retry timeout, do you have any non-default configuration networking preferences? I'm thinking specifically about the various connection maximums, but any non-standard ones would be interesting.
I've identified a case where the backup socket can exceed global connection limits in a way that will lock up some of the normal socket allocations (at least until the connection manager times out some idle ones).

the fix means getting an nshttpconnection() object from the server manager when the backup socket is created, subject to its limits, and not just when it is being recycled.

That would certainly explain the lingering weirdness.
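
The accounting change described above can be pictured with a small sketch: the backup socket has to obtain a slot from the manager at creation time, so it can never push the total past the global limit. This is a simplified Python model with hypothetical names (`ConnectionManager`, `try_acquire`, `start_backup_socket`), not the Necko code.

```python
class ConnectionManager:
    """Hypothetical manager that enforces a global connection limit."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.active = 0

    def try_acquire(self):
        """Reserve one connection slot; return False if the limit is reached."""
        if self.active >= self.max_connections:
            return False
        self.active += 1
        return True

    def release(self):
        """Give a slot back when a connection closes."""
        if self.active > 0:
            self.active -= 1


def start_backup_socket(manager):
    """Create a backup socket only if the manager grants a slot for it."""
    if not manager.try_acquire():
        # Over the limit: skip the backup instead of starving normal sockets.
        return None
    return object()  # placeholder for a real socket


manager = ConnectionManager(max_connections=2)
manager.try_acquire()                   # the primary connection takes a slot
first = start_backup_socket(manager)    # fits within the limit
second = start_backup_socket(manager)   # refused: the limit is reached
```

Counting the backup at creation, rather than only when it is recycled, keeps the manager's view of the total accurate at all times.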
In: https://bugzilla.mozilla.org/show_bug.cgi?id=613977#c31

there is a new patch and URL for a try server build (which is just getting started as I write this) which I hope resolves the lingering problems. Please give it a try when it's ready.
Blocks: 592284
Keywords: regression
Hope I'm posting in the right thread this time.

This tryserver build seems to be an improvement over its predecessor. After several hours of use, I haven't seen any connection reset messages. I also haven't seen any of the noticeable tab hangs I was getting before.
(In reply to comment #18)
> Hope I'm posting in the right thread this time.
> 
> This tryserver build seems to be an improvement over its predecessor. After
> several hours of use, I haven't seen any connection reset messages. I also
> haven't seen any of the noticeable tab hangs I was getting before.

great!

zouk?
Sorry for the late reply, Patrick. I wanted to be sure before giving you my impressions from testing, and they are also good so far!
I no longer see any "stuck" tabs or other connection problems (so far, again).
For me too, this tryserver build is a BIG improvement. Thanks a lot for your continued work! :-)
If anything changes, I will let you know.
Just wanted to add that this build is still working nicely, and I'm not going to update it until the patch is included in an official nightly build ;-)
Sorry to bother you, Patrick. I have a question because I'm not very familiar with the process for getting a patch included.
When can we expect to see your work included in an official nightly release?
(In reply to comment #22)
> Sorry to bother you, Patrick. I have a question because I'm not very
> familiar with the process for getting a patch included.
> When can we expect to see your work included in an official nightly
> release?

the patch in 613977 needs a review and also needs to be approved for gecko 2.0. It is flagged as needing both, so folks will get there as soon as they can. I think I read that the list of candidate blockers for 2.0 is being re-triaged today.

so I think the answer is "soonish"
I'm not sure this is the same problem, but since the landing of https://bugzilla.mozilla.org/show_bug.cgi?id=592284 I can't use the browser for more than 4 or 5 hours. I have to restart it because it becomes unable to connect to anything; it gets stuck in the looking up or connecting phase. It looks like it's out of sockets.
Status: UNCONFIRMED → NEW
blocking2.0: --- → ?
Ever confirmed: true
Disables underlying feature by setting default of 
pref("network.http.connection-retry-timeout", 0);

If you have a local value for this pref the feature will still be enabled.
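
To illustrate why a locally set value keeps the feature on even after the default changes, here is a minimal Python sketch of default-versus-user pref resolution. It assumes a user-set value takes precedence over the shipped default and that a timeout of 0 means the feature is off; the helper names and the example value 250 are hypothetical, and this is not the Firefox preferences code.

```python
# Shipped default after this change (taken from the pref line above).
DEFAULT_PREFS = {"network.http.connection-retry-timeout": 0}


def get_int_pref(name, user_prefs):
    """Return the user's value if one is set, else the shipped default."""
    return user_prefs.get(name, DEFAULT_PREFS[name])


def retry_feature_enabled(user_prefs):
    """Assumption: a timeout of 0 disables the feature."""
    return get_int_pref("network.http.connection-retry-timeout", user_prefs) > 0


default_profile = {}  # no local value: the new default of 0 applies
custom_profile = {"network.http.connection-retry-timeout": 250}  # hypothetical local value
```

Under these assumptions, `retry_feature_enabled(default_profile)` is false while `retry_feature_enabled(custom_profile)` stays true, which matches the caveat in the comment above.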

a=shaver
Keywords: checkin-needed
Assignee: nobody → mcmanus
Pref turned off for beta8:

  http://hg.mozilla.org/mozilla-central/rev/0a9e64523c06

Should this bug be closed now?  Or is there work here beyond what bug 613977 will fix?
Thanks Jason, that should fix it for default configs - the rest can live in 613977. Relieved to see it make it in b8.
Status: NEW → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
blocking2.0: ? → final+
Whiteboard: [http-conn]