Bug 809984 - HTTP Connections drain socket input before close to avoid RST
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Networking: HTTP
Version: unspecified
Hardware: x86_64 Linux
Importance: -- normal
Target Milestone: mozilla19
Assigned To: Patrick McManus [:mcmanus]
QA Contact: Jason Duell [:jduell] (needinfo me)
Mentors:
Depends on:
Blocks:
 
Reported: 2012-11-08 11:57 PST by Patrick McManus [:mcmanus]
Modified: 2012-11-09 07:31 PST
CC: 3 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch 0 (1.84 KB, patch)
2012-11-08 12:02 PST, Patrick McManus [:mcmanus]
cbiesinger: review+

Description Patrick McManus [:mcmanus] 2012-11-08 11:57:13 PST
If you close a socket with data pending on it in the kernel (but unconsumed by necko), an RST is generated to the peer.

That's a problem when the peer receives that RST while it still has unconsumed data in its own kernel buffers - that data is typically dumped.
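(Editorial sketch, not part of the bug report: a minimal, hypothetical POSIX demo of the behavior described above. The side that calls close() while unread bytes are still queued in its kernel receive buffer emits an RST instead of a normal FIN, and the peer then typically loses whatever it had not yet consumed; exact behavior is OS- and timing-dependent. Error handling is omitted and the sleeps are arranged so the RST tends to win the race.)

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main() {
  int lsn = socket(AF_INET, SOCK_STREAM, 0);
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;                       // kernel picks an ephemeral port
  bind(lsn, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
  socklen_t len = sizeof(addr);
  getsockname(lsn, reinterpret_cast<sockaddr*>(&addr), &len);
  listen(lsn, 1);

  if (fork() == 0) {                       // peer: the side that loses data
    int c = socket(AF_INET, SOCK_STREAM, 0);
    connect(c, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    write(c, "ack", 3);                    // data the other side never reads
    sleep(2);                              // let the RST arrive before we read
    char buf[64];
    ssize_t n = recv(c, buf, sizeof(buf), 0);
    if (n < 0)
      printf("peer: recv failed: %s (final message lost)\n", strerror(errno));
    else
      printf("peer: got %zd bytes of the final message\n", n);
    _exit(0);
  }

  int s = accept(lsn, nullptr, nullptr);   // closer: plays necko's role here
  sleep(1);                                // make sure "ack" is queued, unread
  write(s, "GOAWAY", 6);                   // our final message
  close(s);                                // unread "ack" in the buffer => RST
  wait(nullptr);
  return 0;
}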

Consider a SPDY session that wants to finish up because:

0] we read a GOAWAY from the peer;

1] we generate our own GOAWAY frame with information that matters to the other server (last good stream id and reason code);

2] we send that frame;

3] we close the socket.

But that [0] GOAWAY was probably followed by an SSL Application Alert announcing that the SSL session is being shut down in one direction. That alert data is still sitting on our socket buffer when [3] happens, and an RST is immediately generated.

And the race is on: if the server's kernel receives the RST before the userspace application on the server consumes the GOAWAY, then the GOAWAY information is lost :(

This problem isn't limited to SPDY (the SSL Alert message occurs with plain HTTPS too), but the synchronous nature of HTTP/1 makes it unlikely that these disorderly shutdowns impact anything beyond debugging.

I'm going to propose working around this at the HTTP layer, not the transport layer. My patch does it with a non-blocking short read of the socket we're about to close, to pump any pending data (within reason) up from the kernel. The HTTP fix is only a 99% fix - it leaves a race condition alive that the socket transport layer could close completely with a real lingering close - but I'm not convinced that silently absorbing leftover data is the right behavior for every arbitrary protocol.
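(Editorial sketch, not the actual necko patch: a generic POSIX illustration of the workaround - drain whatever is already queued with bounded non-blocking reads, then close, so the close produces a FIN rather than an RST. The helper name is hypothetical, and the 64KB cap mirrors the limit Patrick settles on in comment 5.)

#include <sys/socket.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical helper: pull (and discard) any bytes already queued on the
// socket without blocking, so that the following close() sends a FIN instead
// of an RST.  The cap keeps us from looping forever if the peer keeps sending
// as fast as we can read (the concern raised in comments 4 and 5).
void DrainAndClose(int fd) {
  constexpr size_t kMaxDrain = 64 * 1024;
  char buf[4096];
  size_t total = 0;
  while (total < kMaxDrain) {
    ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
    if (n <= 0)          // 0: peer already sent FIN; <0: EWOULDBLOCK or error
      break;
    total += static_cast<size_t>(n);   // the data itself is thrown away
  }
  close(fd);
}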
Comment 1 Patrick McManus [:mcmanus] 2012-11-08 12:02:26 PST
Created attachment 679773 [details] [diff] [review]
patch 0
Comment 2 Honza Bambas (:mayhemer) 2012-11-08 12:23:16 PST
Proper use of shutdown() in Necko would be nicer IMO.
Comment 3 Patrick McManus [:mcmanus] 2012-11-08 12:36:25 PST
(In reply to Honza Bambas (:mayhemer) from comment #2)
> Proper use of shutdown() in Necko would be nicer IMO.

Of course, but I already addressed this obliquely. PSM doesn't even hook shutdown(), HTTP shouldn't have to deal with timers and the like to make sure this completes, and it's not clear this is a cross-protocol requirement. It gets complicated quickly.

We need to take the easy win on stuff like this and move on without feeling guilty about it.
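(Editorial sketch, not from the bug: roughly what the shutdown()-based alternative from comment 2 would look like in generic POSIX terms. A real implementation in necko would need to be non-blocking, use timers, and be plumbed through PSM, which is the complexity comment 3 is pointing at.)

#include <sys/socket.h>
#include <unistd.h>

// Hypothetical lingering close: half-close the write side, then keep reading
// until the peer closes its side, so nothing is left unread and no RST is sent.
void LingeringClose(int fd) {
  shutdown(fd, SHUT_WR);            // send our FIN; we will write nothing more
  char buf[4096];
  while (recv(fd, buf, sizeof(buf), 0) > 0) {
    // discard; we only read so the receive queue is empty when we close
  }
  close(fd);                        // peer has finished (or errored): clean close
}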
Comment 4 Christian :Biesinger (don't email me, ping me on IRC) 2012-11-08 12:46:20 PST
Comment on attachment 679773 [details] [diff] [review]
patch 0

Why the total < 8000 check?

I'd move at least the SetEventSink call before the read, because there is no point in generating the progress callbacks here.
Comment 5 Patrick McManus [:mcmanus] 2012-11-08 17:51:17 PST
(In reply to Christian :Biesinger (don't email me, ping me on IRC) from comment #4)
> Comment on attachment 679773 [details] [diff] [review]
> patch 0
> 
> Why the total < 8000 check?
> 

The kernel is executing in parallel, so it's conceivable this would never complete if something could fill the buffer as fast as we can read from it. If you assume the kernel gets higher priority than userspace on a really stressed system, that starts to sound plausible enough to at least put some kind of limit in there. I made it 64KB in my update.
Comment 6 Patrick McManus [:mcmanus] 2012-11-08 18:13:53 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3fa8d88ab2d
