Closed Bug 782570 Opened 12 years ago Closed 12 years ago

FF14 does not close network connections correctly (no TCP FIN packet sent)

Categories

(Core :: Networking: HTTP, defect)

14 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla17
Tracking Status
firefox14 --- wontfix
firefox15 --- wontfix
firefox16 --- wontfix
firefox17 --- verified

People

(Reporter: dpp, Assigned: mcmanus)

Details

Attachments

(2 files)

Attached file ff14-tcp-fin-bug.zip
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.57 Safari/537.1

Steps to reproduce:

open connection (XMLHttpRequest or EventSource)
wait for data
close connection
>> TCP FIN packet should be sent <<

to demostrate


Actual results:

Firefox 14
root@dpp-thinkpad /home/dpp/nginx/html> tcpdump -n -i lo 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
12:38:13.268730 IP 127.0.0.1.46503 > 127.0.0.1.80: Flags [P.], seq 49322342:49322678, ack 1914234413, win 256, options [nop,nop,TS val 697211607 ecr 697200172], length 336
12:38:13.308646 IP 127.0.0.1.80 > 127.0.0.1.46503: Flags [.], ack 336, win 256, options [nop,nop,TS val 697211647 ecr 697211607], length 0
12:38:14.271288 IP 127.0.0.1.80 > 127.0.0.1.46503: Flags [P.], seq 1:265, ack 336, win 256, options [nop,nop,TS val 697212609 ecr 697211607], length 264
12:38:14.271314 IP 127.0.0.1.46503 > 127.0.0.1.80: Flags [.], ack 265, win 256, options [nop,nop,TS val 697212609 ecr 697212609], length 0
12:38:15.270952 IP 127.0.0.1.80 > 127.0.0.1.46503: Flags [P.], seq 265:301, ack 336, win 256, options [nop,nop,TS val 697213609 ecr 697212609], length 36
12:38:15.270966 IP 127.0.0.1.46503 > 127.0.0.1.80: Flags [.], ack 301, win 256, options [nop,nop,TS val 697213609 ecr 697213609], length 0
12:38:15.271175 IP 127.0.0.1.46503 > 127.0.0.1.80: Flags [R.], seq 336, ack 301, win 256, options [nop,nop,TS val 697213609 ecr 697213609], length 0
^C
7 packets captured
14 packets received by filter
0 packets dropped by kernel
root@dpp-thinkpad /home/dpp/nginx/html>


Expected results:

Firefox 13
root@dpp-thinkpad /home/dpp/nginx/html> tcpdump -n -i lo 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
12:39:42.232148 IP 127.0.0.1.46548 > 127.0.0.1.80: Flags [P.], seq 4236839640:4236839976, ack 4085474941, win 256, options [nop,nop,TS val 697300570 ecr 697293162], length 336
12:39:42.271645 IP 127.0.0.1.80 > 127.0.0.1.46548: Flags [.], ack 336, win 256, options [nop,nop,TS val 697300610 ecr 697300570], length 0
12:39:43.235369 IP 127.0.0.1.80 > 127.0.0.1.46548: Flags [P.], seq 1:265, ack 336, win 256, options [nop,nop,TS val 697301573 ecr 697300570], length 264
12:39:43.235394 IP 127.0.0.1.46548 > 127.0.0.1.80: Flags [.], ack 265, win 256, options [nop,nop,TS val 697301573 ecr 697301573], length 0
12:39:43.729816 IP 127.0.0.1.46548 > 127.0.0.1.80: Flags [F.], seq 336, ack 265, win 256, options [nop,nop,TS val 697302068 ecr 697301573], length 0
12:39:43.730045 IP 127.0.0.1.80 > 127.0.0.1.46548: Flags [F.], seq 265, ack 337, win 256, options [nop,nop,TS val 697302068 ecr 697302068], length 0
12:39:43.730084 IP 127.0.0.1.46548 > 127.0.0.1.80: Flags [.], ack 266, win 256, options [nop,nop,TS val 697302068 ecr 697302068], length 0
^C
7 packets captured
14 packets received by filter
0 packets dropped by kernel
root@dpp-thinkpad /home/dpp/nginx/html>
Thanks for the report - that's interesting. The issue isn't really the lack of fin, but rather the rst that prempts it, yes?

almost certainly firefox is closing the connection in 14 and finding some unconsumed data on the socket - that generates the resest. two questions: what is the data, and why is ff14 initiating the close instead of the server as in your ff13 trace.

it would help to have a plug and play str (test case) for this as the server data seems to play a role.
This is really about lack of FIN - back-end can't detect that connection is actually closed. RST is a result of back-end writing to closed socket.

In FF13 client is initiating close as you can see in logs above:
12:39:43.729816 IP 127.0.0.1.46548 > 127.0.0.1.80: Flags [F.], seq 336, ack 265, win 256, options [nop,nop,TS val 697302068 ecr 697301573], length 0

As you can see in attached sample of nodejs backend - data is just a ping message once a second. Live example: http://my.dpp.su/badoo/ff14-fin-bug/
This is minimal sample to reproduce the problem we found streaming data to client.

Internally, we have `one connection` limit in application, that's why when client calls close() then open() it stops working in FF14 because back-end discards second connection (not knowing that previous one is actually closed) due to app logic.
OS: Linux → All
Hardware: x86_64 → All
related bug https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/1031750 (but this affects all OS)
By the way, FF15beta does not have this bug
xhr is to a never ending chunked response and the js contains this:

setTimeout(function(){t.abort();}, 1500);

in ff13/15/17 it seems to work and generates a normal fin from firefox at that t=~1500. in ff14 that close doesn't happen and when new data comes in from the server at t=2100 a close is triggered but that new data is apparently not fully consumed which makes the rst happen.

q 1: what's up with the timer and is it already really fixed?

q 2: can we drain data off of http sockets before actively closing them to lower the chances of a rst. Note that this is a common, but racy, strategy and if you've got a never ending response stream there is always going to be some chance of a rst.
Component: Networking: HTTP → Preferences: Backend
(In reply to Patrick McManus [:mcmanus] from comment #4)

> q 1: what's up with the timer and is it already really fixed?

Even though this is WFM on 15, I'm glad I looked into it - there's a real bug here.

It is related to the wider use of pipeline datastructures in FF14.

The change from 760608 (included in FF15) means we use those datastructures a little less - your XHR code is one of those cases so you saw the regression disappear. But the bug is still present in the pipeline code - it probably has been there for years - and we still use those data structures in lots of places in >= 15 (yes, even when pipelining is disabled) (just not with xhr or some other similar things). So there is a fix that needs to be made. Thanks!

The issue is basically that a canceled transaction from a pipeline lets the connection hang until activity if that cancelation reduces the pipeline to 0 transactions. (i.e. if it consists only of 1 transaction to start with).

I'll put together a patch.
Assignee: nobody → mcmanus
Assignee: mcmanus → nobody
Component: Preferences: Backend → Networking: HTTP
thanks for the participation!
we'll have to fix it some how on backend - 150mln worldwide instant messagening network is affected...
instead of closing new connections as we do for correct situations we will ping old ones and wait about a second for rst to understand which connection is really alive
Attached patch patch 0Splinter Review
Assignee: nobody → mcmanus
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #653004 - Flags: review?(jduell.mcbugs)
Attachment #653004 - Flags: review?(jduell.mcbugs) → review+
https://hg.mozilla.org/mozilla-central/rev/a1d9dc563731
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
@dpp, can you please confirm this is fixed with the latest Firefox 17 builds from aurora.mozilla.org? Thanks.
yes, it is fixed (i have checked firefox-17.0a2.en-US.linux-x86_64)
(In reply to dpp from comment #12)
> yes, it is fixed (i have checked firefox-17.0a2.en-US.linux-x86_64)

Thanks a lot for helping with the verification.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: