38.36 KB, text/plain
56.32 KB, text/plain
4.93 KB, patch
|Details | Diff | Splinter Review|
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040624 Debian/1.7-2 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040624 Debian/1.7-2 What appears to happen is this: 1. GET https://bankieren.rabobank.nl/rib/ 2. 200 OK response 3. Persistent, keep-alive connection remains open 4. 90 seconds later, the server sends a 408 request timeout 5. POST https://bankieren.rabobank.nl/rib/qslad.htm (by pressing the 'ga verder' button) 6. Instead of actually doing a POST request, the 408 request body (received in step 5) is shown immediately. I've used sslsniff (http://www.thoughtcrime.org/ie.html) to check when the 408 response arrives (after 90 sec.) and to see if a POST request was sent (it isn't). Reproducible: Always Steps to Reproduce: 0. Make sure keep-alive is enabled in the preferences 1. Visit: https://bankieren.rabobank.nl/rib/ 2. Fill the input fields with numbers (normally accountnr + access code) 3. Wait 90 seconds (server side request timeout) 4. Press the "ga verder" button (= continue) Actual Results: An error message appears on an orange background: "Welkom bij de Rabobank. Om technische redenen is het door u gekozen onderdeel van de site momenteel niet bereikbaar." [translated: Welcome to the Rabobank. For technical reasons the section of the site you chose is momentarily unreachable.] Expected Results: That page is meant for logging in on your personal bank account information, so if the account-number/access-code combination entered was correct, you, the visitor, are logged in and see you account information etc. And of course if the user/pass combination entered is incorrect, you are once again asked to log in and an appropriate error code/message is shown at the top of the login page. Some questions arise. Bear in mind that I know nothing of Mozilla internals, so I'm just thinking aloud. Q: May the server send a 408 request timeout if it wishes to close the persistent keep-alive connection? A: RFC 2616, section 8.14 says: "When a client or server wishes to time-out it SHOULD issue a graceful close on the transport connection" . On the ietf-http-wg mailinglist I found two relevant messages that clarify the 'graceful close' part: "close the TCP connection, don't RESET it."  and "The phrase means that the agent should try to close its connection in a way that lets any packets in transit reach their destination."  I'm not sure if this implies that sending a "408 Request Timeout" response is not allowed for persistent, keep-alive connections. Q: Should a "408 Request Timeout" response be ignored by the client if all previous requests sent over that same persistent connection were answered and the only reason for keeping the connection alive is for possible future requests? A: I think such a response can (should?) be safely ignored by the client, because (i assume) the persistent connection is closed by the server and the client immediately after receiving the 408 response. Q: How can it be that a request (POST in this case, but I've seen it happen with GET requests also) is not sent, but instead an unexpected server response that was received earlier is used as a response to that request? A: ... ?  http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.4  http://lists.w3.org/Archives/Public/ietf-http-wg-old/1998JanApr/0475.html  http://lists.w3.org/Archives/Public/ietf-http-wg-old/1998JanApr/0614.html PS. I've set severity to 'Major' because the website that demonstrates this problem is the largest Internet banking website in Europe (if I'm informed correctly) and people experience this problem very often when using the default settings in Mozilla and Firefox.
Created attachment 151822 [details] mozilla nsHttp:5 logfile that demonstrates the problem While testing, I've used the default cache and keep-alive settings. Disabling all cache settings I could find (about:config, filter on 'cache') did not affect the result. I did switch off image loading, to keep the logfile as short as possible. Something I forgot to mention: after pressing the 'ga verder' button on the website, which should issue a POST request, the 408 response is shown, BUT... in the address bar, the correct URL is shown (for the POST request): https://bankieren.rabobank.nl/rib/qslad.htm
>Q: How can it be that a request (POST in this case, but I've seen it happen with >GET requests also) is not sent that's trivially explained. mozilla does not read from the socket when it does not expect any data. then, it sends the request. only then does it start reading, and see the server's 408 response. from : > i) "When a client or server wishes to time-out, it SHOULD issue a > graceful close on the transport connection". > > Does this imply some sort of action at the http level? > That is, should a 4xx (or 5xx) response be sent? > >No, this is about the transport connection. Since some people believe >that other transport protocols besides TCP might be used, there was >some pressure to avoid specific discussions about TCP here. that seems to be pretty clear. >the 408 response is shown, BUT... >in the address bar, the correct URL is shown (for the POST request): yes. mozilla thinks that the server responded with 408 to the POST request. probably mozilla should follow the SHOULD clause of rfc 2616: > Clients and servers SHOULD both > constantly watch for the other side of the transport close, and > respond to it as appropriate.
Contrary to what I first thought happened, it looks like the POST request *is* sent. I can't be sure, because of the SSL-encryption; I'm just looking at the TCP flags and packet sizes: tcpdump shows: 0 seconds: TCP connection is established and the page is loaded 90 seconds: Server sends 408 response, ends with a packet where the FIN flag is set, client sents ACK, so the connection is now in CLOSE_WAIT state (confirmed by checking with netstat). 120 seconds: I submit the form, client sends a 897 byte packet, followed by a 263 byte packet with FIN flag set. Server responds with ACK. The last 2 packets the client (mozilla) sends possibly contain the POST request. I do wonder why these are sent using a connection that is already in CLOSE_WAIT state for 30sec? And also: why did the client did not send a FIN in response immediately after receiving a FIN from the server? I also looked at what happened if I didn't submit the form after the 408 request was received: after a bit more than 300 seconds, mozilla closes the connection with a RST packet. Those 300 seconds are the value set for network.http.keep-alive.timeout, so I tried setting that to 60 seconds, which makes the mozilla timeout the connection before the server does. When the timeout occurs, mozilla sends a FIN packet, server reacts with ACK/FIN, mozilla sends ACK, so the connection is immediately closed. I expected the same thing to happen in situations where it is the server who sends the first FIN packet. I would like to emphasize that although the test case used here ends with a POST request, the exact same thing would have happened on a GET request (and probably other request types), but I can't find an easy test case for that situation.
RFC 2616 is very limited in what it has to say about a 408 response code: 10.4.9 408 Request Timeout The client did not produce a request within the time that the server was prepared to wait. The client MAY repeat the request without modifications at any later time. It does not say that the client MUST repeat the request, and it is not very specific about when a server might send a 408 response. We have code that re-sends requests on new connections when certain errors occur. Perhaps we should just add 408 as another trigger for such cases.
(In reply to comment #4) > We have code that re-sends requests on new connections when certain errors > occur. Perhaps we should just add 408 as another trigger for such cases. In this case the 408 error occurs *before* sending the request for the first time. If mozilla can spot the error early enough and can react by opening a new connection, re-sending is not needed. This is especially important for POST requests: "Non-idempotent methods or sequences MUST NOT be automatically retried, although user agents MAY offer a human operator the choice of retrying the request(s)." -- RFC 2616, section 8.1.4 I think Christian (comment #2) is right when suggesting mozilla follows: "Clients and servers SHOULD both constantly watch for the other side of the transport close, and respond to it as appropriate." -- RFC 2616, section 8.1.4 Is is possible to first check if the server closed the connection (by sending a 408 or otherwise) before sending the next request? As I understand it, when using keep-alive without pipelining, a new request is not sent over the persistent connection until the response to the previous request has been received. If, after receiving the response, something else follows, that's probably the server sending an error message and/or closing the connection, right? When pipelining is enabled, the previous paragraph should still apply to at least POST requests (and other non-idempotent methods): "Clients SHOULD NOT pipeline requests using non-idempotent methods or non-idempotent sequences of methods (see section 9.1.2). [...] A client wishing to send a non-idempotent request SHOULD wait to send that request until it has received the response status for the previous request. -- RFC 2616, section 22.214.171.124 I assume that errors (i.e. server closing the connection) occurring while the persistent connection is actively used are already detected soon enough by mozilla. That leaves errors that occur when a persistent connection is temporarily *not* actively used (active = busy sending requests, waiting for or receiving requested responses). SUMMARY: if all previous requests (pipelined or not) on a persistent connection have been completely responded to, then before sending a new request over that same connection, mozilla should first check if the server closed the connection and open a new connection if necessary. Re-sending a request should only necessary if the 408 error arrives while the request is being sent (or in response to a request), but not when the 408 is received before the request is first sent.
The challenge is that we do not read data from the sockets until we have sent a request. Idle sockets are monitored to see if they have closed, but we do not monitor them to see if the server has sent us a 408. Fixing this requires some non-trivial changes IMO. That doesn't mean we shouldn't do it, but it means that it will take a fair amount of work to implement. I agree that we should not solve this by re-issuing requests.
(In reply to comment #6) > The challenge is that we do not read data from the sockets until we have sent a > request. Idle sockets are monitored to see if they have closed, but we do not > monitor them to see if the server has sent us a 408. Fixing this requires some > non-trivial changes IMO. hm... shouldn't it be possible to notice that an idle socket has data available to be read (PR_POLL_READ flag for PR_Poll), and if so, assume some unexpected thing happened and close the socket? also, maybe the http code should check if unexpected data is left in the socket before issuing a new request on it?
*** Bug 243495 has been marked as a duplicate of this bug. ***
Created attachment 182236 [details] [diff] [review] v1 patch This is a prototype patch. I could really use some help testing this.
Created attachment 182270 [details] nsHttp:5,nsSocketTransport:5 logfile for first patch giving segfault I applied the patch to Firefox 1.0.3 in Debian (apt-src install mozilla-firefox; [apply patch]; apt-src build mozilla-firefox) and it gave a segmentation fault when clicking a link/form-button on a webpage after the 408 response is received. Logfile attached. Clicking earlier (before 408 timeout is ever sent) still works fine.
Thanks for giving this patch a try. Investigating...
I think the problem is that the HTTP connection object is not getting closed.
Created attachment 182294 [details] [diff] [review] v2 patch OK, this patch actually works. The changes are the following: Have nsHttpConnection::OnHeadersAvailable check for 408 response. If it finds it, then close the socket transport with NS_ERROR_NET_RESET. This causes the connection to stop feeding data to the transaction (we don't want it to read the response body included with the 408 response), and it causes the connection to close the transaction with error NS_ERROR_NET_RESET. That error code is special as it tells the transaction to try to repeat the request on a new connection. Finally, we make OnHeadersAvailable set the 'reset' flag upon return to instruct the transaction to reset its state in preparation for a new response. I added code to that section to clear the mSentData and mReceivedData flags since the transaction wouldn't be repeated if those flags are set. The only other use for the transaction reset code is HTTPS proxy (CONNECT method handling), and this shouldn't affect that in any negative way. It makes sense to say that the transaction has not received or sent data yet since it will be receiving and sending data for a fresh transaction over a fresh connection.
Patch v2 works for me. Thank you!
So, what this patch doesn't do is it doesn't make the browser periodically check the socket for a 408 response. Instead, it just makes us recover well when we encounter a 408 as the "next response" from the server when we try to issue a request over a connection. That's sufficient to satisfy the requirements of the RFC I think.
Using this patch, does the browser first send the request, then check for a 408 response and (if the 408 response did occur) resend the request? If this is true, then that's fine according to RFC2616 in section 8.1.4: "This means that clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction so long as the request sequence is idempotent (see section 9.1.2)." But that section continues by stating: "Non-idempotent methods or sequences MUST NOT be automatically retried, although user agents MAY offer a human operator the choice of retrying the request(s)." If my assumption about how the patch works is correct, then the patch violates the above section of the RFC for POST requests (which are automatically retried as well).
> If this is true, then that's fine according to RFC2616 in section 8.1.4: Section 8.1.4 does not mention the 408 error code. In fact, Mozilla follows the recommendations of that section (in particular w.r.t. non-idempotent requests) when it comes to premature TCP closes and resets. In the case of an explicit 408, however, I would argue that those guidelines are not applicable. Why? Because the server has already told us that it was not interested in the request. From section 10.4.9 "408 Request Timeout": The client did not produce a request within the time that the server was prepared to wait. The client MAY repeat the request without modifications at any later time. That's all RFC 2616 has to say about the 408 response code. It seems to support the solution I've taken here. We are repeating the request without modification based on what this section says we may do.
Comment on attachment 182294 [details] [diff] [review] v2 patch >So, what this patch doesn't do is it doesn't make the browser periodically check >the socket for a 408 response. wouldn't it be possible to poll the socket for readable data, even when we're supposed to be idle? I suppose this approach here is simpler (although it sends a request in cases where we could know that it can't succeed)
Comment on attachment 182294 [details] [diff] [review] v2 patch >Index: nsHttpConnection.cpp >+ // trigger the transactions 'restart' mechanism. We tell it transaction's sr=bzbarsky. Nice fix!
Comment on attachment 182294 [details] [diff] [review] v2 patch a=shaver for 1.8b3, and 1.8b2 if you want it there. It sounds like our behaviour in the face of a 408 is already a bit busted, and this looks well-contained to those cases
Here, I naively thought 1.1a was over (wasn't last Friday the plan?)... anyways, fixed-on-trunk for 1.8b2 / 1.1a.
Thanks guys! Good to see this fixed! ;) I've always been going crazy when trying to wire some money using my (this) bank site, didn't realize it could be a FF bug ;p. Confirmed to be fixed by going to the transaction screen, filling in info, waiting 2 minutes, and then continueing, worked fine.