Bug 248827 - Support HTTP/1.1 408 response code [was: 408 request timeout on a used, persistent, keep-alive connection is mistakenly used as the response on a subsequent request]
Status: RESOLVED FIXED
Keywords: helpwanted
Product: Core
Classification: Components
Component: Networking: HTTP
Version: Trunk
Platform: All All
Importance: -- major
Target Milestone: mozilla1.8beta2
Assigned To: Darin Fisher
QA Contact: Patrick McManus [:mcmanus]
URL: https://bankieren.rabobank.nl/rib/
Duplicates: 243495
Reported: 2004-06-27 11:01 PDT by Ruud van Melick
Modified: 2006-03-12 17:40 PST
CC List: 2 users


Attachments
mozilla nsHttp:5 logfile that demonstrates the problem (38.36 KB, text/plain)
2004-06-27 11:11 PDT, Ruud van Melick
no flags
v1 patch (6.23 KB, patch)
2005-04-29 22:55 PDT, Darin Fisher
no flags
nsHttp:5,nsSocketTransport:5 logfile for first patch giving segfault (56.32 KB, text/plain)
2005-04-30 08:01 PDT, Ruud van Melick
no flags
v2 patch (4.93 KB, patch)
2005-04-30 15:22 PDT, Darin Fisher
cbiesinger: review+
bzbarsky: superreview+
shaver: approval1.8b2+
shaver: approval1.8b3+

Description Ruud van Melick 2004-06-27 11:01:45 PDT
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040624 Debian/1.7-2
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040624 Debian/1.7-2

What appears to happen is this:
1. GET https://bankieren.rabobank.nl/rib/
2. 200 OK response
3. Persistent, keep-alive connection remains open
4. 90 seconds later, the server sends a 408 request timeout
5. POST https://bankieren.rabobank.nl/rib/qslad.htm (by pressing the 'ga verder'
button)
6. Instead of actually doing a POST request, the 408 response body (received in
step 4) is shown immediately. I've used sslsniff
(http://www.thoughtcrime.org/ie.html) to check when the 408 response arrives
(after 90 sec.) and to see if a POST request was sent (it isn't).


Reproducible: Always
Steps to Reproduce:
0. Make sure keep-alive is enabled in the preferences
1. Visit: https://bankieren.rabobank.nl/rib/
2. Fill the input fields with numbers (normally accountnr + access code)
3. Wait 90 seconds (server side request timeout)
4. Press the "ga verder" button (= continue)

Actual Results:  
An error message appears on an orange background:
"Welkom bij de Rabobank. Om technische redenen is het door u gekozen onderdeel
van de site momenteel niet bereikbaar."

[translated: Welcome to the Rabobank. For technical reasons the section of the
site you chose is currently unreachable.]

Expected Results:  
That page is meant for logging in on your personal bank account information, so
if the account-number/access-code combination entered was correct, you, the
visitor, are logged in and see your account information, etc.
And of course if the user/pass combination entered is incorrect, you are once
again asked to log in and an appropriate error code/message is shown at the top
of the login page.

Some questions arise. Bear in mind that I know nothing of Mozilla internals, so
I'm just thinking aloud.

Q: May the server send a 408 request timeout if it wishes to close the
persistent keep-alive connection?
A: RFC 2616, section 8.1.4 says: "When a client or server wishes to time-out it
SHOULD issue a graceful close on the transport connection" [1]. On the
ietf-http-wg mailing list I found two relevant messages that clarify the
'graceful close' part: "close the TCP connection, don't RESET it." [2] and "The
phrase means that the agent should try to close its connection in a way that
lets any packets in transit reach their destination." [3]
I'm not sure if this implies that sending a "408 Request Timeout" response is
not allowed for persistent, keep-alive connections.

Q: Should a "408 Request Timeout" response be ignored by the client if all
previous requests sent over that same persistent connection were answered and
the only reason for keeping the connection alive is for possible future requests?
A: I think such a response can (should?) be safely ignored by the client,
because (I assume) the server closes the persistent connection, and the client
closes it too immediately after receiving the 408 response.

Q: How can it be that a request (POST in this case, but I've seen it happen with
GET requests also) is not sent, but instead an unexpected server response that
was received earlier is used as a response to that request? 
A: ... ? 

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.4
[2] http://lists.w3.org/Archives/Public/ietf-http-wg-old/1998JanApr/0475.html
[3] http://lists.w3.org/Archives/Public/ietf-http-wg-old/1998JanApr/0614.html

PS. I've set severity to 'Major' because the website that demonstrates this
problem is the largest Internet banking website in Europe (if I'm informed
correctly) and people experience this problem very often when using the default
settings in Mozilla and Firefox.
Comment 1 Ruud van Melick 2004-06-27 11:11:25 PDT
Created attachment 151822 [details]
mozilla nsHttp:5 logfile that demonstrates the problem

While testing, I've used the default cache and keep-alive settings. Disabling
all cache settings I could find (about:config, filter on 'cache') did not
affect the result. I did switch off image loading, to keep the logfile as short
as possible.

Something I forgot to mention: after pressing the 'ga verder' button on the
website, which should issue a POST request, the 408 response is shown, BUT...
in the address bar, the correct URL is shown (for the POST request):
https://bankieren.rabobank.nl/rib/qslad.htm
Comment 2 Christian :Biesinger (don't email me, ping me on IRC) 2004-06-27 11:27:15 PDT
>Q: How can it be that a request (POST in this case, but I've seen it happen with
>GET requests also) is not sent

that's trivially explained. mozilla does not read from the socket when it does
not expect any data. then, it sends the request. only then does it start
reading, and see the server's 408 response.
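The sequence described here (which the tcpdump trace in comment 3 later corroborates) can be replayed locally. The Python sketch below is only an illustration under stated assumptions: it uses a local socket pair instead of the real TLS connection to the bank, and the 408 and POST bytes are made up for the demonstration. The "server" writes a 408 and half-closes its side; the "client", which never read from the idle socket, still writes its POST and then reads the stale 408 as if it were the reply.

```python
import socket

# Illustrative bytes only; not captured from the real site.
STALE_408 = (b"HTTP/1.1 408 Request Timeout\r\n"
             b"Connection: close\r\n"
             b"Content-Length: 0\r\n\r\n")
POST_REQUEST = (b"POST /rib/qslad.htm HTTP/1.1\r\n"
                b"Host: bankieren.rabobank.nl\r\n"
                b"Content-Length: 0\r\n\r\n")


def reuse_idle_connection():
    """Replay the failure; return (request seen by server, response seen by client)."""
    client, server = socket.socketpair()
    try:
        # The earlier 200 OK exchange is assumed complete, so the connection
        # is idle. The server times out: it writes a 408 and sends FIN
        # (half-close), which leaves the client side in CLOSE_WAIT.
        server.sendall(STALE_408)
        server.shutdown(socket.SHUT_WR)

        # The client has not read from the idle socket, so it still treats
        # the connection as reusable and writes the next request.
        client.sendall(POST_REQUEST)

        # Only the server's sending direction is closed, so the request
        # still reaches the server.
        received = b""
        while len(received) < len(POST_REQUEST):
            chunk = server.recv(4096)
            if not chunk:
                break
            received += chunk

        # The first bytes the client now reads are the stale 408, which it
        # attributes to the POST it just sent.
        response = b""
        while True:
            chunk = client.recv(4096)
            if not chunk:  # EOF: the server's FIN
                break
            response += chunk
        return received, response
    finally:
        client.close()
        server.close()


received, response = reuse_idle_connection()
print(response.split(b"\r\n", 1)[0])  # b'HTTP/1.1 408 Request Timeout'
```

The point of the half-close is that writing on a CLOSE_WAIT connection still succeeds, so nothing at the send step warns the client that the next bytes it reads predate its request.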

from [2]:
>	i) "When a client or server wishes to time-out, it SHOULD issue a
>	   graceful close on the transport connection".
>    
>	Does this imply some sort of action at the http level? 
>	That is, should a 4xx (or 5xx) response be sent? 
>
>No, this is about the transport connection.  Since some people believe
>that other transport protocols besides TCP might be used, there was
>some pressure to avoid specific discussions about TCP here.

that seems to be pretty clear.

>the 408 response is shown, BUT...
>in the address bar, the correct URL is shown (for the POST request):

yes. mozilla thinks that the server responded with 408 to the POST request.


probably mozilla should follow the SHOULD clause of rfc 2616:
>                                      Clients and servers SHOULD both
>   constantly watch for the other side of the transport close, and
>   respond to it as appropriate.
Comment 3 Ruud van Melick 2004-06-27 15:43:45 PDT
Contrary to what I first thought happened, it looks like the POST request *is*
sent. I can't be sure, because of the SSL-encryption; I'm just looking at the
TCP flags and packet sizes:

tcpdump shows:
0 seconds: TCP connection is established and the page is loaded
90 seconds: Server sends 408 response, ends with a packet where the FIN flag is
set, client sends ACK, so the connection is now in CLOSE_WAIT state (confirmed
by checking with netstat).
120 seconds: I submit the form, client sends an 897-byte packet, followed by a
263-byte packet with FIN flag set. Server responds with ACK.

The last 2 packets the client (mozilla) sends possibly contain the POST request.
I do wonder why these are sent over a connection that has already been in
CLOSE_WAIT state for 30 seconds. And also: why did the client not send a FIN
immediately after receiving a FIN from the server?

I also looked at what happened if I didn't submit the form after the 408 response
was received: after a bit more than 300 seconds, mozilla closes the connection
with a RST packet.

Those 300 seconds are the value set for network.http.keep-alive.timeout, so I
tried setting that to 60 seconds, which makes mozilla time out the connection
before the server does. When the timeout occurs, mozilla sends a FIN packet,
server reacts with ACK/FIN, mozilla sends ACK, so the connection is immediately
closed. I expected the same thing to happen in situations where it is the server
who sends the first FIN packet.

I would like to emphasize that although the test case used here ends with a POST
request, the exact same thing would have happened on a GET request (and probably
other request types), but I can't find an easy test case for that situation.
Comment 4 Darin Fisher 2004-06-28 14:15:36 PDT
RFC 2616 is very limited in what it has to say about a 408 response code:

   10.4.9 408 Request Timeout

      The client did not produce a request within the time that the server
      was prepared to wait. The client MAY repeat the request without
      modifications at any later time.

It does not say that the client MUST repeat the request, and it is not very
specific about when a server might send a 408 response.

We have code that re-sends requests on new connections when certain errors
occur.  Perhaps we should just add 408 as another trigger for such cases.
Comment 5 Ruud van Melick 2004-06-29 12:32:07 PDT
(In reply to comment #4)
> We have code that re-sends requests on new connections when certain errors
> occur.  Perhaps we should just add 408 as another trigger for such cases.

In this case the 408 error occurs *before* sending the request for the first
time. If mozilla can spot the error early enough and can react by opening a new
connection, re-sending is not needed. This is especially important for POST
requests:

  "Non-idempotent methods or sequences MUST NOT be automatically retried,
   although user agents MAY offer a human operator the choice of retrying
   the request(s)."                   -- RFC 2616, section 8.1.4

I think Christian (comment #2) is right when suggesting mozilla follows:

  "Clients and servers SHOULD both constantly watch for the other side
   of the transport close, and respond to it as appropriate."
                                      -- RFC 2616, section 8.1.4

Is it possible to first check if the server closed the connection (by sending a
408 or otherwise) before sending the next request? 

As I understand it, when using keep-alive without pipelining, a new request is
not sent over the persistent connection until the response to the previous
request has been received. If, after receiving the response, something else
follows, that's probably the server sending an error message and/or closing the
connection, right?

When pipelining is enabled, the previous paragraph should still apply to at
least POST requests (and other non-idempotent methods):

  "Clients SHOULD NOT pipeline requests using non-idempotent methods or
   non-idempotent sequences of methods (see section 9.1.2). [...]
   A client wishing to send a non-idempotent request SHOULD wait to send
   that request until it has received the response status for the previous
   request."                          -- RFC 2616, section 8.1.2.2
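Read together, the two RFC 2616 rules quoted here suggest a simple gate for writing a request on a connection that still has responses outstanding. The Python sketch below is a simplification, not Mozilla's logic: the method set follows section 9.1.2 (GET, HEAD, PUT and DELETE, plus OPTIONS and TRACE, which SHOULD have no side effects), the awaiting_status counter is a hypothetical parameter, and non-idempotent sequences of otherwise idempotent methods are not modelled.

```python
# Idempotent methods per RFC 2616, section 9.1.2.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"})


def may_send_now(method: str, awaiting_status: int) -> bool:
    """Decide whether a request may be written to a persistent connection now.

    awaiting_status: number of earlier requests on this connection whose
    response status line has not been received yet (hypothetical counter).
    """
    if awaiting_status == 0:
        # Nothing outstanding: any method may be sent.
        return True
    # Pipelining behind outstanding requests is only allowed for idempotent
    # methods; a POST must wait for the previous response status.
    return method.upper() in IDEMPOTENT_METHODS


print(may_send_now("POST", 1))  # False
print(may_send_now("GET", 2))   # True
```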

I assume that errors (i.e. server closing the connection) occurring while the
persistent connection is actively used are already detected soon enough by
mozilla. That leaves errors that occur when a persistent connection is
temporarily *not* actively used (active = busy sending requests, waiting for or
receiving requested responses).

SUMMARY: if all previous requests (pipelined or not) on a persistent connection
have been completely responded to, then before sending a new request over that
same connection, mozilla should first check if the server closed the connection
and open a new connection if necessary.

Re-sending a request should only be necessary if the 408 error arrives while the
request is being sent (or in response to a request), but not when the 408 is
received before the request is first sent.
Comment 6 Darin Fisher 2004-06-29 15:31:00 PDT
The challenge is that we do not read data from the sockets until we have sent a
request.  Idle sockets are monitored to see if they have closed, but we do not
monitor them to see if the server has sent us a 408.  Fixing this requires some
non-trivial changes IMO.  That doesn't mean we shouldn't do it, but it means
that it will take a fair amount of work to implement.

I agree that we should not solve this by re-issuing requests.
Comment 7 Christian :Biesinger (don't email me, ping me on IRC) 2004-09-03 05:25:36 PDT
(In reply to comment #6)
> The challenge is that we do not read data from the sockets until we have sent a
> request.  Idle sockets are monitored to see if they have closed, but we do not
> monitor them to see if the server has sent us a 408.  Fixing this requires some
> non-trivial changes IMO.

hm... shouldn't it be possible to notice that an idle socket has data available
to be read (PR_POLL_READ flag for PR_Poll), and if so, assume some unexpected
thing happened and close the socket? also, maybe the http code should check if
unexpected data is left in the socket before issuing a new request on it?
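Christian's suggestion can be sketched with the Python standard library, with select.select standing in for NSPR's PR_Poll with PR_POLL_READ. This is a hedged illustration, not the approach the eventual patch took (see comment 15): a zero-timeout poll tells whether an idle socket is readable at all, and a one-byte MSG_PEEK then separates a plain close (EOF) from unexpected data such as a 408. The function names and return labels are hypothetical.

```python
import select
import socket


def idle_socket_state(sock: socket.socket) -> str:
    """Classify an idle keep-alive socket without blocking.

    Returns "idle" if nothing is pending, "closed" if the peer has closed
    the connection, or "unexpected-data" if bytes (for example a 408
    response) arrived while no request was outstanding.
    """
    readable, _, _ = select.select([sock], [], [], 0)  # zero timeout: just poll
    if not readable:
        return "idle"
    try:
        # Peek so the pending bytes stay in the buffer (e.g. for logging).
        peeked = sock.recv(1, socket.MSG_PEEK)
    except ConnectionResetError:
        return "closed"
    return "closed" if peeked == b"" else "unexpected-data"


def reusable(sock: socket.socket) -> bool:
    # Only a quiet socket is reused; otherwise open a new connection.
    return idle_socket_state(sock) == "idle"
```

A check like this narrows the race but cannot close it: the server may still time out between the poll and the write, so the client must also recover when a 408 arrives after the request has been sent.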
Comment 8 Darin Fisher 2005-04-29 21:48:38 PDT
*** Bug 243495 has been marked as a duplicate of this bug. ***
Comment 9 Darin Fisher 2005-04-29 22:55:39 PDT
Created attachment 182236 [details] [diff] [review]
v1 patch

This is a prototype patch.  I could really use some help testing this.
Comment 10 Ruud van Melick 2005-04-30 08:01:35 PDT
Created attachment 182270 [details]
nsHttp:5,nsSocketTransport:5 logfile for first patch giving segfault

I applied the patch to Firefox 1.0.3 in Debian (apt-src install
mozilla-firefox; [apply patch]; apt-src build mozilla-firefox) and it gave a
segmentation fault when clicking a link/form-button on a webpage after the 408
response is received. Logfile attached. Clicking earlier (before 408 timeout is
ever sent) still works fine.
Comment 11 Darin Fisher 2005-04-30 12:28:59 PDT
Thanks for giving this patch a try.  Investigating...
Comment 12 Darin Fisher 2005-04-30 12:30:44 PDT
I think the problem is that the HTTP connection object is not getting closed.
Comment 13 Darin Fisher 2005-04-30 15:22:20 PDT
Created attachment 182294 [details] [diff] [review]
v2 patch

OK, this patch actually works. The changes are the following:

Have nsHttpConnection::OnHeadersAvailable check for 408 response.  If it finds
it, then close the socket transport with NS_ERROR_NET_RESET.  This causes the
connection to stop feeding data to the transaction (we don't want it to read
the response body included with the 408 response), and it causes the connection
to close the transaction with error NS_ERROR_NET_RESET.  That error code is
special as it tells the transaction to try to repeat the request on a new
connection.  Finally, we make OnHeadersAvailable set the 'reset' flag upon
return to instruct the transaction to reset its state in preparation for a new
response.  I added code to that section to clear the mSentData and
mReceivedData flags since the transaction wouldn't be repeated if those flags
are set.  The only other use for the transaction reset code is HTTPS proxy
(CONNECT method handling), and this shouldn't affect that in any negative way. 
It makes sense to say that the transaction has not received or sent data yet
since it will be receiving and sending data for a fresh transaction over a
fresh connection.
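The control flow described above can be modelled in a few lines. The Python below is a toy model, not Mozilla code: Transaction, on_headers_available and dispatch are hypothetical names loosely mirroring the transaction, nsHttpConnection::OnHeadersAvailable and the restart path; NS_ERROR_NET_RESET is a plain string stand-in; and each "connection" is reduced to the status code it would return. It shows why the flags must be cleared: without the reset, a transaction that has already sent data would not be repeated.

```python
from dataclasses import dataclass

NS_ERROR_NET_RESET = "NS_ERROR_NET_RESET"  # stand-in for the real error code


@dataclass
class Transaction:
    method: str
    sent_data: bool = False      # models mSentData
    received_data: bool = False  # models mReceivedData

    def reset(self) -> None:
        # Clear both flags so the transaction counts as fresh when it is
        # repeated over a fresh connection.
        self.sent_data = False
        self.received_data = False

    def can_restart(self) -> bool:
        # A transaction is not repeated once either flag is set.
        return not (self.sent_data or self.received_data)


def on_headers_available(status: int, txn: Transaction):
    """Return the error the connection closes with, or None to keep going."""
    if status == 408:
        # Stop feeding the 408 body to the transaction, reset its state and
        # close with the "try again on a new connection" error.
        txn.reset()
        return NS_ERROR_NET_RESET
    return None


def dispatch(txn: Transaction, status_per_connection):
    """Run txn over successive fresh connections; return (status, connections used)."""
    for used, status in enumerate(status_per_connection, start=1):
        txn.sent_data = True      # request written to this connection
        txn.received_data = True  # response headers read
        error = on_headers_available(status, txn)
        if error == NS_ERROR_NET_RESET and txn.can_restart():
            continue              # repeat the request on the next connection
        return status, used
    raise RuntimeError("no connection produced a final response")


print(dispatch(Transaction("POST"), [408, 200]))  # (200, 2)
```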
Comment 14 Ruud van Melick 2005-05-01 14:32:59 PDT
Patch v2 works for me. Thank you!
Comment 15 Darin Fisher 2005-05-01 14:53:09 PDT
So, what this patch doesn't do is make the browser periodically check
the socket for a 408 response.  Instead, it just makes us recover well when we
encounter a 408 as the "next response" from the server when we try to issue a
request over a connection.  That's sufficient to satisfy the requirements of the
RFC I think.
Comment 16 Ruud van Melick 2005-05-01 16:59:04 PDT
Using this patch, does the browser first send the request, then check for a 408
response and (if the 408 response did occur) resend the request?

If this is true, then that's fine according to RFC2616 in section 8.1.4:

   "This means that clients, servers, and proxies MUST be able
    to recover from asynchronous close events. Client software
    SHOULD reopen the transport connection and retransmit the
    aborted sequence of requests without user interaction so
    long as the request sequence is idempotent (see section 9.1.2)."

But that section continues by stating:

   "Non-idempotent methods or sequences MUST NOT be automatically
    retried, although user agents MAY offer a human operator the
    choice of retrying the request(s)."

If my assumption about how the patch works is correct, then the patch violates
the above section of the RFC for POST requests (which are automatically retried
as well).
Comment 17 Darin Fisher 2005-05-01 18:27:43 PDT
> If this is true, then that's fine according to RFC2616 in section 8.1.4:

Section 8.1.4 does not mention the 408 error code.  In fact, Mozilla follows the
recommendations of that section (in particular w.r.t. non-idempotent requests)
when it comes to premature TCP closes and resets.  In the case of an explicit
408, however, I would argue that those guidelines are not applicable.  Why? 
Because the server has already told us that it was not interested in the request.

From section 10.4.9 "408 Request Timeout":

   The client did not produce a request within the time that the server
   was prepared to wait. The client MAY repeat the request without
   modifications at any later time.

That's all RFC 2616 has to say about the 408 response code.  It seems to support
the solution I've taken here.  We are repeating the request without modification
based on what this section says we may do.
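The distinction drawn here between a premature close (section 8.1.4) and an explicit 408 (section 10.4.9) amounts to a small retry policy. This Python function is a hedged restatement of the argument in this comment, not code from the patch; the failure labels are invented for the example, and the idempotent set follows RFC 2616, section 9.1.2.

```python
# Idempotent methods per RFC 2616, section 9.1.2.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"})


def should_auto_retry(method: str, failure: str) -> bool:
    """Decide whether to repeat a request without asking the user.

    failure: "408" for an explicit 408 Request Timeout, or
    "premature-close" for a TCP close/reset before any response.
    """
    if failure == "408":
        # Section 10.4.9: the client MAY repeat the request without
        # modification; the 408 says the server did not take the request.
        return True
    if failure == "premature-close":
        # Section 8.1.4: non-idempotent requests MUST NOT be retried
        # automatically, since the server may already have processed them.
        return method.upper() in IDEMPOTENT_METHODS
    return False
```

Under this policy the POST from this bug is repeated after a 408, but would not be repeated after a bare connection reset.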
Comment 18 Christian :Biesinger (don't email me, ping me on IRC) 2005-05-04 15:50:48 PDT
Comment on attachment 182294 [details] [diff] [review]
v2 patch

>So, what this patch doesn't do is it doesn't make the browser periodically check
>the socket for a 408 response.

wouldn't it be possible to poll the socket for readable data, even when we're
supposed to be idle? I suppose this approach here is simpler (although it sends
a request in cases where we could know that it can't succeed)
Comment 19 Boris Zbarsky [:bz] (still a bit busy) 2005-05-09 19:21:46 PDT
Comment on attachment 182294 [details] [diff] [review]
v2 patch

>Index: nsHttpConnection.cpp
>+    // trigger the transactions 'restart' mechanism.  We tell it 

transaction's

sr=bzbarsky.  Nice fix!
Comment 20 Mike Shaver (:shaver -- probably not reading bugmail closely) 2005-05-09 21:41:48 PDT
Comment on attachment 182294 [details] [diff] [review]
v2 patch

a=shaver for 1.8b3, and 1.8b2 if you want it there.  It sounds like our
behaviour in the face of a 408 is already a bit busted, and this looks
well-contained to those cases.
Comment 21 Darin Fisher 2005-05-09 23:10:23 PDT
Here, I naively thought 1.1a was over (wasn't last Friday the plan?)... anyways,
fixed-on-trunk for 1.8b2 / 1.1a.
Comment 22 Syzop 2005-12-02 05:45:37 PST
Thanks guys! Good to see this fixed! ;)
I always went crazy when trying to wire some money using my (this) bank site; I didn't realize it could be a FF bug ;p.

Confirmed to be fixed by going to the transaction screen, filling in info, waiting 2 minutes, and then continuing; it worked fine.
