Closed Bug 92224 Opened 21 years ago Closed 20 years ago

8.2.4 Client Behavior if Server Prematurely Closes Connection

Categories

(Core :: Networking: HTTP, defect, P2)

x86
Windows 2000
defect

Tracking

()

RESOLVED FIXED
mozilla0.9.4

People

(Reporter: ian, Assigned: darin.moz)

References

()

Details

(Keywords: testcase, Whiteboard: [Hixie-P2])

Attachments

(5 files)

If an HTTP/1.1 client sends a request which includes a request body, but which 
does not include an Expect request-header field with the "100-continue" 
expectation, and if the client is not directly connected to an HTTP/1.1 origin 
server, and if the client sees the connection close before receiving any status 
from the server, the client SHOULD retry the request. If the client does retry 
this request, it MAY use the following "binary exponential backoff" algorithm 
to be assured of obtaining a reliable response:

      1. Initiate a new connection to the server

      2. Transmit the request-headers

      3. Initialize a variable R to the estimated round-trip time to the
         server (e.g., based on the time it took to establish the
         connection), or to a constant value of 5 seconds if the round-
         trip time is not available.

      4. Compute T = R * (2**N), where N is the number of previous
         retries of this request.

      5. Wait either for an error response from the server, or for T
         seconds (whichever comes first)

      6. If no error response is received, after T seconds transmit the
         body of the request.

      7. If client sees that the connection is closed prematurely,
         repeat from step 1 until the request is accepted, an error
         response is received, or the user becomes impatient and
         terminates the retry process.

If at any point an error status is received, the client

      - SHOULD NOT continue and

      - SHOULD close the connection if it has not completed sending the
        request message.

Currently, we try to rerequest the resource about a gazillion times a second, 
resulting in my server's error log growing to an unearthly (and expensive) size.

STEPS TO REPRODUCE
   See http://www.hixie.ch/tests/adhoc/http/error/nph-001.cgi
   This test script simulates a failure on the server side: it doesn't return
   anything (not even headers). This kind of failure occurs, for example, when
   an nph script has a bug. (This is how I found the bug in the first place.)

ACTUAL RESULTS
   The test script aborts after 100 requests are sent in less than 25 seconds.

EXPECTED RESULTS
   See above for a quote from the HTTP spec. Other browsers don't handle this
   in a particularly stellar way either, but they at least don't overload the
   server like we do:

     WinIE6-pre: Shows a "cannot find server or DNS error" message after sending
       2 requests back to back.
     Opera5: Sends 5 requests over a period of 3 seconds then shows an alert 
       dialog saying "Connection closed by remote server".
     MacIE5: Displays an error dialog after sending 2 reqests back to back.
     iCab2.51-pre: Sends 1 request and then does nothing (no user feedback).
     Escape4.7: Displays a blank page after sending 3 requests back to back.
     Lynx2.8.3-pre: Displays 2 status line alerts after sending 1 request.

This is a particularly important bug because it is only likely to manifest 
itself when the server is in trouble, and it results in the server being in even
more trouble. Basically, it makes the server-side "drop all incoming connections
in an effort to save ourselves" tactic backfire horribly.

(Note: If you are trying this in other browsers, make sure to clear their cache
between each attempt. Some browsers do not follow the spec's recommendation of
not caching query-based GETs.)
Keywords: testcase
Whiteboard: [Hixie-P2]
-> darin, -> major

darin, we could use mReuseCount to determine if this is a case where wewant to
detect this for timed out keep-alives, if we incremented that variable at some
point.
Assignee: neeti → darin
Severity: normal → major
bbaetz: as we talked about that won't work since NSS is using this "feature" for
handling TLS intolerant servers.

i'd like to start by simply limiting the number of consecutive restarts.  what's
a good number?  perhaps no more than 2 consecutive restart attempts?

-> 0.9.4
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.4
The spec says:

... "and if the client is not directly connected to an HTTP/1.1 origin server,"
What does this mean? If we haven't got a connection, we can't know if it was 1.1
or not. Is this section only meant to apply if we're talking to proxies. Does it
imply that if we were to do fallback to different servers, we SHOULD retry the
original proxy at least once? (That makes sense, I guess)

I'd say 2 would be fine, if we're not going to do exponential backoff. If we
are, maybe 5? Or maybe try for n seconds (however many retries that involves,
following this algorithm), up to a maximum number of requests.

From Ian's findings, 2-3 would be enough though. We should check if the
behaviour is different for proxy servers.
*** Bug 92685 has been marked as a duplicate of this bug. ***
Priority: -- → P2
btw: the exponential backoff algorithm described in section 8.2.4 applies to
requests which contain a message body.  however, this bug deals with GET
requests.  the "restart feature" currently implemented doesn't really have
anything to do with section 8.2.4.  instead, it simply has to do with the way
TCP/IP works... that is, we may be able to successfully write to a socket which
the server has closed, but will only discover this when we try to read. 
therefore, the "restart feature" is necessary, especially when using persistent
connections.

we have to be careful when limiting the number of restart attempts b/c we have
to allow for each idle connection to fail in this manner.  since we currently
allow up to 8 persistent connections per server, and because PSM is utilizing
this "feature", i'm going to submit a patch which limits the number of restarts
to be no more than 10 (i'd like to allow an additional restart just for good
measure).  this patch will include a preference to configure this number.
Keywords: patch
+    PRUint16      MaxRequestAttempts() { return (PRUint16) mMaxRequestAttempts; }

Why the cast? Yes, this will never overflow, but still, store it with the same
type that we'll be using it as. Or just return it as a 32 bit number.
Ditto for mHttpVersion (Although that was there before)
yeah.. just be'in a bit sloppy.  wanted to avoid adding a local variable when
calling GetIntPref, but of course that's really not the right trade-off ;-)
I'm not too sure about the change to use bitfields - wouldn't this be slower?

I'll apply the patch tomorrow, but apart from that it looks good.
yeah.. actually, using bitfields is more costly (an extra instruction on x86). 
perhaps it not worth the savings of a couple bytes.
r=bbaetz if you run the patch through john taylor's tests.
results from jtaylor's page loading test shows minor to no improvement.
http://g.mcom.com/~jtaylor/necko/bin/userData.pl?user=darin@netscape.com
sr=mscott
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
*** Bug 90976 has been marked as a duplicate of this bug. ***
QA Contact: benc → tever
You need to log in before you can comment on or make changes to this bug.