Closed Bug 365898 Opened 18 years ago Closed 18 years ago

SSL handshake timeout is too short

Categories

(Core :: Security: PSM, defect)

1.8 Branch
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: litzung.chen, Assigned: KaiE)

References

Details

(Keywords: verified1.8.1.4)

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Avant Browser; Avant Browser)
Build Identifier: FireFox 2.0

On some embedded systems with a low-power CPU running an SSL web server, if the server uses a longer RSA key (e.g. 1024 bit), the decryption during the SSL handshake key exchange takes a long time. If the decryption takes too long (more than about 7 seconds), or the network is too busy, FF 2.0 will not connect to the web server because of a timeout. Is it possible to extend the timeout value?

Reproducible: Always

Steps to Reproduce:
1.
2.
3.

Extend the SSL handshake timeout.
Version: unspecified → 2.0 Branch
I am also interested in a fix for this issue. Is there a value that any spec dictates? I would like a value of not less than 15 seconds, to support embedded web servers with slow CPUs. Thanks, Babu
I have located the problem, which starts with Firefox 2.0 RC1, in the file "mozilla\security\manager\ssl\src\nsNSSIOLayer.cpp". The timeout is hardcoded as 8 seconds. We have confirmed this timeout from our Ethereal traces. I would like to take ownership of this bug and fix it. I am new to Mozilla development, so if someone else can make that single-line change, that would be great. I recommend changing the timeout from 8 to 30 seconds. I could not find any spec-enforced values in the SSL specs, but 30 seconds is a reasonable value for my issue here. Other browsers such as IE, FF 1.5, Opera etc. do not enforce such a timeout (from our Ethereal traces). The existing code snippet in nsNSSIOLayer.cpp is:

  #define HANDSHAKE_TIMEOUT_SECONDS 8

  PRBool nsNSSSocketInfo::HandshakeTimeout()
  {
    if (!mHandshakeInProgress)
      return PR_FALSE;

    return ((PRIntervalTime)(PR_IntervalNow() - mHandshakeStartTime)
            > PR_SecondsToInterval(HANDSHAKE_TIMEOUT_SECONDS));
  }
Assignee: nobody → kengert
Component: Security → Security: PSM
Product: Firefox → Core
QA Contact: firefox
Version: 2.0 Branch → 1.8 Branch
We introduced this timeout recently, with bug 340359, when we encountered a new class of broken web servers that do not respond at all when we connect to them with our latest security protocols. I'm ok with increasing the timeout. Maybe the current value also causes problems over slower networks. Nelson, Bob, are you ok with a timeout of 30 seconds?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Yes, from the Ethereal traces we have, 30 seconds should work with our web server. Thanks, Babu
I think a 30-second timeout will be suitable to support embedded web servers with slow CPUs. Do Nelson and Bob agree with a timeout of 30 seconds? If it is OK, I would like to know when the fix will be available. Thanks.
I think 30 seconds will be painfully long for users who experience the more common case of a TLS-intolerant server that hangs upon seeing a TLS client hello. I'm reluctant to make the user experience much worse for the majority for the benefit of the minority, in this case. Maybe we should have a pref for the timeout time, so that users of these unusually slow servers can adjust their browsers accordingly.
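A minimal sketch of what a pref-driven timeout could look like. This is an illustration only, not PSM code: the pref name, the in-memory PrefStore, and the clamping bounds are all assumptions made up for this example; only the 8-second default comes from this bug.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical pref name, invented for this sketch (not a real Firefox pref).
static const char* const kTimeoutPref = "security.ssl.handshake_timeout_seconds";

// Default taken from the value discussed in this bug; the bounds are assumptions.
static const int kDefaultTimeoutSeconds = 8;
static const int kMinTimeoutSeconds = 1;
static const int kMaxTimeoutSeconds = 120;

// Stand-in for a preference service: pref name -> integer value.
typedef std::map<std::string, int> PrefStore;

// Returns the user's value if set (clamped to sane bounds), else the default.
int HandshakeTimeoutSeconds(const PrefStore& prefs)
{
  PrefStore::const_iterator it = prefs.find(kTimeoutPref);
  if (it == prefs.end())
    return kDefaultTimeoutSeconds;

  return std::max(kMinTimeoutSeconds, std::min(it->second, kMaxTimeoutSeconds));
}
```

With this shape, most users keep the short default, while users of unusually slow servers could raise the value themselves.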
Making it a settable preference is a good idea. However, I think we should increase the "default" timeout to maybe at least 20 seconds. Does anyone know if IE7 or Opera use such a timeout? These browsers seem not to have any problem with our web servers. Actually, we have a large number of these servers out in the field. I don't have the exact number, but it is several thousand. In the Windows world we could live with IE7, but Firefox is the only supported browser for Linux clients. So it would be very useful if you could fix this issue in the next Firefox release. If this fix can be rolled out into a nightly build, we can test those web servers which have problems with TLS. I can volunteer to test a few. Thanks, Babu
(In reply to comment #6)
> I think 30 seconds will be painfully long for users who experience the more
> common case of a TLS-intolerant server that hangs upon seeing a TLS client
> hello. I'm reluctant to make the user experience much worse for the
> majority for the benefit of the minority, in this case.

Are we sure it is common? Might the slowness actually be more common for people who try to connect over a slow wire? Might Firefox actually be switching back to the older security on such connections? This is a new question that came to my mind when thinking about this. If the answer is yes, I'm in favor of increasing the timeout.

> Maybe we should
> have a pref for the timeout time, so that users of these unusually slow
> servers can adjust their browsers accordingly.

I suppose Babu is looking for a fix earlier than Firefox 3. Not sure we'll be allowed to introduce a new pref on the stable branch for Firefox 2.0.0.x releases.

Alternative idea: Would you be able to change your web server implementation? Does the SSL protocol allow for some "i need more time" response from the server? The server could then send an intermediate response before starting the RSA calculation; this would prevent us from detecting an initial timeout. If you can manage to have the server send out at least one single byte over the wire, prior to your calculation, you will be fine.
(In reply to comment #7)
> Making it as a settable preference is a good idea. However, I think we should
> increase the "default" timeout to may be atleast 20 seconds. Any one knows if
> IE7 or Opera use such a timeout, because these browsers seem not to have any
> problem with our web servers.

The new class of broken servers that we encountered in bug 340359 is related to our changes to use "TLS hello extensions". I do not know whether IE7 or Opera are making use of that. If they do not, they have not yet needed to work around it. But I expect that sooner or later they will implement the extensions and run into the same problem.

> If this fix can be rolled out into a nightly build we can test those web
> servers which have problem with TLS. I can volunteer testing a few.

Once we have produced and submitted a fix, it will appear in nightly builds first. Watch this bug for a "resolved fixed" change.
It appears, on Windows XP SP 2, IE 7 sends out a V2 hello message. Opera 9.02 uses a V3 hello message, but it does not include TLS extensions.
I made a private Firefox build based on 2.0.0.1 source with a 30-second timeout. It fixed my issue. So far I don't see any problem after testing with several web sites. I will keep you all posted if I see any issues. Thanks, Babu
Kai, do you know what it does on Vista? IE 7 on Windows XP does not have TLS extension support (that crypto code is in the OS, not in the browser).

Vista does use hello extensions, so if a site has problems with Firefox TLS extensions, it is likely to have problems when Vista is released as well.

So the real question here is which is more common: TLS intolerance that results in a hang (rather than a rejected connection), or servers that are slow in decrypting an RSA key. I think the former is primarily SSL2 servers. The latter (for any real key sizes used) should also be pretty rare. A standard PC back in 1990 could do 3 RSA ops/sec (1024). Most modern boxes can do hundreds of ops/sec. Even a modern smart card can do a 1024 bit RSA op in about a second. Seven seconds for a 1024 bit RSA op doesn't sound like a commercially viable box to me.

Anyway, if the TLS-intolerant hang rate is extremely low, then an increased timeout may be in order, especially if the timeout results in no connection (rather than a retry as SSL3 without extensions).

bob
If the embedded system hardware is poor, with a slower 40 MHz CPU and 1 MB SRAM, and more than 20 complex threads are running simultaneously (including the HTTPS server), the RSA decryption (1024) can be slow. This might be a special case, but I believe some commercial embedded systems face the same issue.
(In reply to comment #12)
> Kai, Do you know that it does on VISTA? IE 7 on Windows XP does not have the
> TLS extension support (that crypto code is in the OS, not in the browser).
>
> Vista does use hello extensions, so if a site has problems with Firefox TLS,
> extensions, it is likely to have problems when Vista is released as well.

Ok, I just installed the final version of Vista in a virtual machine, and used IE7 and ssltap to inspect a connection to https://cfspart.impots.gouv.fr/

I confirm that IE7 does send TLS hello extensions with a TLS client hello version 3.1. The connection stalls for 2 minutes. Not sure who closes the connection after that time. Then IE7 opens a new connection with a client hello version 3.0, no TLS hello extensions. This connection succeeds. But IE7 is not very smart: it does not remember the server behaviour, so I get the 2-minute delay repeatedly.

> Anyway if the TLS intolerant hang rate is extremely low, then an increased
> timeout is may be in order, especially if the timeout results in no connection
> (rather than a retry as SSL3 without extensions).

Sorry, I don't understand. After the timeout we retry with TLS disabled, SSL 3 enabled, using a v2 compatible hello. How can the timeout "result in no connection"? I think it is likely that we get a connection and response on the second attempt (after the timeout).
Hi Kai, Will you be able to put this fix in Firefox 2.0.0.2 release? We wanted to message to our customers how long they will have to use Firefox 1.5, before going to FF2.0, to support the products which are affected by this issue. I have tested both Linux and Windows builds of Firefox 2.0 with timeout change to 30 seconds and did not find any problems so far. I tested it with some bank sites and other secure sites as well and no issues. Thanks, Babu
(In reply to comment #8)
> (In reply to comment #6)
> > I think 30 seconds will be painfully long for users who experience the more
> > common case of a TLS-intolerant server that hangs upon seeing a TLS client
> > hello. I'm reluctant to make the user experience much worse for the
> > majority for the benefit of the minority, in this case.
> Are we sure it is common?
> Might the slowness be actually more common for people who try to connect over a
> slow wire? Might Firefox actually be switching back to the older security on
> such connections? This is a new question that came to my mind, when thinking
> about this. If the answer is yes, I'm in favor of increasing the timeout.
> > Maybe we should
> > have a pref for the timeout time, so that users of these unusually slow
> > servers can adjust their browsers accordingly.
> I suppose Babu is looking for a fix earlier than Firefox 3. Not sure we'll be
> allowed to introduce a new pref on the stable branch for Firefox 2.0.0.x
> releases.
> Alternative idea:
> Would you be able to change your web server implementation?
> Does the SSL protocol allow for some "i need more time" response from the
> server?

I checked the SSL protocol spec and couldn't find any such alert or message format. Does anybody know about this?

> The server could then send an intermediate response before starting the RSA
> calculation, this would prevent from detecting an initial timeout.
> If you can manage to have the server send out at least one single byte over the
> wire, prior to your calculation, you will be fine.

I tried to send a packet (e.g. server hello) before starting the RSA decryption, but Firefox pops up an alert saying it got an incorrect or unexpected message (Error code: -12245), and the SSL handshake processing stops. At this point, Firefox will not accept any packet other than "Change Cipher Spec".
I was waiting for Bob Relyea to make a comment, as I had not fully understood his recommendation; see comment 14 part 2. We are probably too late for 2.0.0.2; we can try to get this into 2.0.0.3.
Babu, Litzung, I have a question. Are your slow servers using invalid certificates? That is, when you connect to your server, does Firefox bring up any "cert warning" dialogs? (If one of your servers is reachable on the public internet, it would be great if you could send me hostname/port - thanks)
I think when working on this, we should not simply increase the timeout. I propose we do an additional change. The timeout is only necessary on the initial attempt - while we hope the server might be able to understand TLS. As soon as we have concluded the server is TLS intolerant, we no longer need to timeout and retry. This only-timeout-until-we-assume-TLS-intolerance would actually be sufficient as a fix for your slow servers, assuming they work with the older handshake protocols. But it might still be a good idea to extend the timeout, to allow to use most recent crypto protocols as often as possible.
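A rough sketch of the only-timeout-until-we-assume-TLS-intolerance idea, assuming a simplified per-connection state. The struct, its fields, and the function name are hypothetical and chosen for this example; this is not the patch that was attached to bug 368126.

```cpp
#include <cassert>
#include <chrono>

typedef std::chrono::seconds Seconds;

// Hypothetical per-connection state; the names are illustrative, not PSM's.
struct HandshakeState {
  bool inProgress;   // a handshake is currently running
  bool tlsAttempt;   // true on the initial attempt, false after falling back
  Seconds elapsed;   // time spent in the current handshake

  HandshakeState() : inProgress(false), tlsAttempt(true), elapsed(0) {}
};

// The timeout only applies while we still hope the server understands TLS.
// Once we have fallen back, a slow server gets as long as it needs.
bool HandshakeTimedOut(const HandshakeState& state, Seconds limit)
{
  if (!state.inProgress)
    return false;
  if (!state.tlsAttempt)
    return false;
  return state.elapsed > limit;
}
```

Under this rule, a server that needs 13 seconds would time out once on the initial attempt and then complete on the retry, instead of timing out on every attempt.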
Thanks Kai. Do you know when 2.0.0.3 will be released? Is 2.0.0.2 already code frozen? If you can make the default timeout to at least 20 seconds that would fix my issue, even though I prefer 30sec timeout, to allow simultaneous client login. That being a 1 line change, you could probably put in 2.0.0.2 and the preference setting can be implemented in 2.0.0.3. I just noticed your new mail. The certificate shows a warning but it is "host mismatch" warning. I tried permanently installing the certificate into the browser and the issue is still there. It tries to connect for several minutes after accepting the certificate. About public access, my server is within the company proxy and so it is difficult to put outside. Litzung, will you be able to put the server in a public IP address? Thanks, Babu
(In reply to comment #20)
> Do you know when 2.0.0.3 will be released?

No.

> Is 2.0.0.2 already code frozen?

Not yet, but we don't have a patch yet, we don't have agreements, and we don't have code reviews yet.

> If you can make the default timeout to at least
> 20 seconds that would fix my issue, even though I prefer 30sec timeout, to
> allow simultaneous client login. That being a 1 line change, you could probably
> put in 2.0.0.2 and the preference setting can be implemented in 2.0.0.3.
> I just noticed your new mail. The certificate shows a warning but it is "host
> mismatch" warning. I tried permanently installing the certificate into the
> browser and the issue is still there.

Ok, the host mismatch warning is always shown, even with the cert installed and trusted.

> It tries to connect for several minutes after accepting the certificate.

This is new information. But based on my comment 19 I now understand what happens. We retry each time with that shorter timeout! We really should not; we should only use a timeout for the initial attempt.
Yes, that is true. I just cleared my certificate cache so the browser can keep asking me about "Domain Name Mismatch". With that change, I get the domain name mismatch exactly every 13 seconds. And then it keeps trying to connect on every "ok" select. The time it takes is exactly 13 seconds.
I assume that you are quick in dismissing the "domain name mismatch" dialog. I assume it does not have a big impact on the overall timeout. Given the fact that it takes 13 seconds for you to get the dialog again, I assume our timeout value is counting with no dialog shown, most of the time. I'm mentioning this because of Nelson's claim in bug 368126. While he is right that the timeout-still-counting-while-warning-shown is a problem for some users, it is not a problem for you. I'm no longer in favor of a pref for the timeout value. In my opinion, a total timeout time of 30 seconds should be ok, and I propose to use a value of "25" in the code. I think such servers are rare. And they are behaving badly. We should not disable TLS as quickly as we currently do. And given that other browser vendors don't even handle that at all, I believe we are fine. I have produced a patch for both issues and attached it to bug 368126.
Yes, I started the stop watch after clicking ok on the "domain mismatch" dialog. Thanks for the patch. I will watch for the nightly build with this fix and then test it with my embedded web server. Can this fix get in 2.0.0.2? - Babu
Depends on: 368126
Here are my thoughts on 25 seconds. I think that every FF user has encountered one or more TLS-intolerant https servers at some time. Many (perhaps thousands of) FF users encounter one daily. Today, all those users simply perceive that the TLS-intolerant server is just slow to get started, taking ~8 seconds longer than most other https servers. The TLS intolerance workaround works well enough today that most users have no idea that it even exists.

With this proposed change to 25 seconds, I predict that the users will stop perceiving that their TLS-intolerant servers are merely slow starting, and will instead perceive that they no longer work. I think most users are likely to give up before 25 seconds elapse. If a large percentage of the users who today benefit from the TLS-intolerance workaround change their behavior, and stop/abandon attempts to use their TLS-intolerant servers before 25 seconds elapse, then the TLS-intolerance workaround will have become completely ineffective for them.

I think that the most likely result of this proposed change is that a large number of FF users will have a worse/degraded perception of FF as a result, and will complain "before FF version x.x.x.x, I could reach my merchant's site, but now I can't". It would be FINE if that resulted in loss of traffic to TLS-intolerant servers, and consequently pressured those server owners to get better servers, but I am afraid it will simply drive FF users away from FF instead.

This is a FF usability issue, and IMO this decision should be undertaken with SOME input from FF's usability folks. Having said that, I will leave it in others' hands to make the decision.
Nelson, my understanding is: most TLS-intolerant servers respond in some way immediately, and we run into one of the SSL error codes. My understanding is that most people don't even notice there is a delay, because whenever libSSL returns an error code that is clearly in the TLS-intolerance list, we *immediately* retry. No delay. So, we are not increasing the delay for the majority of other TLS-intolerant sites. We only increase the delay for that new class of "server stalls forever" problem that was first reported in this bug, and which I hope is rare. Also note that according to my tests, other browsers don't even retry at all, but wait forever.
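To illustrate the distinction being made here, a minimal sketch (not the PSM code; the enums and the function name are made up for this example). A server that answers with an intolerance error is retried at once regardless of the timeout value; only a server that never answers is affected by raising the timeout.

```cpp
#include <cassert>

// Hypothetical classification of what the first (TLS) attempt has produced so far.
enum class HandshakeOutcome { Completed, IntoleranceError, OtherError, NoResponseYet };

// Hypothetical next step for the client.
enum class NextStep { UseConnection, RetryWithoutTls, ReportError, KeepWaiting };

NextStep DecideNextStep(HandshakeOutcome outcome, int elapsedSeconds, int timeoutSeconds)
{
  switch (outcome) {
    case HandshakeOutcome::Completed:
      return NextStep::UseConnection;
    case HandshakeOutcome::IntoleranceError:
      // Error is in the TLS-intolerance list: retry immediately, no delay.
      return NextStep::RetryWithoutTls;
    case HandshakeOutcome::OtherError:
      return NextStep::ReportError;
    case HandshakeOutcome::NoResponseYet:
      // Stalling server: this is the only case where the timeout value matters.
      return elapsedSeconds > timeoutSeconds ? NextStep::RetryWithoutTls
                                             : NextStep::KeepWaiting;
  }
  return NextStep::ReportError;
}
```

For example, after 10 seconds of silence the sketch retries with an 8-second timeout but keeps waiting with a 25-second one, while an intolerance error leads to an immediate retry with either value.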
Nelson, in order to support my arguments a bit more: in the past, there was no timeout at all. Not at the PSM level. Only if one of the following errors was returned was an automatic retry attempted, without any delay:

  static PRBool
  isTLSIntoleranceError(PRInt32 err, PRBool withInitialCleartext)
  {
    switch (err)
    {
      case PR_CONNECT_RESET_ERROR:
        if (!withInitialCleartext)
          return PR_TRUE;
        break;

      case PR_END_OF_FILE_ERROR:
      case SSL_ERROR_BAD_MAC_ALERT:
      case SSL_ERROR_BAD_MAC_READ:
      case SSL_ERROR_HANDSHAKE_FAILURE_ALERT:
      case SSL_ERROR_HANDSHAKE_UNEXPECTED_ALERT:
      case SSL_ERROR_CLIENT_KEY_EXCHANGE_FAILURE:
      case SSL_ERROR_ILLEGAL_PARAMETER_ALERT:
      case SSL_ERROR_NO_CYPHER_OVERLAP:
      case SSL_ERROR_BAD_SERVER:
      case SSL_ERROR_BAD_BLOCK_PADDING:
      case SSL_ERROR_UNSUPPORTED_VERSION:
      case SSL_ERROR_PROTOCOL_VERSION_ALERT:
      case SSL_ERROR_RX_MALFORMED_FINISHED:
      case SSL_ERROR_BAD_HANDSHAKE_HASH_VALUE:
      case SSL_ERROR_DECODE_ERROR_ALERT:
      case SSL_ERROR_RX_UNKNOWN_ALERT:
        return PR_TRUE;
    }

    return PR_FALSE;
  }

As an example, we had introduced

  + case SSL_ERROR_HANDSHAKE_UNEXPECTED_ALERT:
  + case SSL_ERROR_CLIENT_KEY_EXCHANGE_FAILURE:

with bug 335859. When I go to https://bugzilla.mozilla.org/show_bug.cgi?id=335859 and trace PSM's behaviour, I can still see we run into the TLS intolerance detection. But there is no delay. It doesn't make a difference whether the code uses a handshake timeout of 8 or more seconds, because this is not reached.
In the previous comment 27 my intention was to point you to https://login.pizzapizza.ca/login.html as an example TLS intolerant server, where the timeout value does not make any difference.
I perform a large amount of work over SSL. I frequently receive the error response "Error establishing an encrypted connect to.....Error Code - 8187". This error has nothing to do with the site I am browsing. I DO have all required certificates, in the event I am required to store the CA root certificates. Many sites that serve their entire content over SSL, or that switch from HTTP to HTTPS traffic for secure content, will almost invariably deliver this message; however, the session will ultimately be established after an abundance of the above warnings.
This bug has been fixed on trunk and 1.8 branch, fixed1.8.1.4, and is expected to be in Firefox 2.0.0.4
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1.4
Resolution: --- → FIXED
Babu or Litzung: Can you please test this again with the latest 1.8 nightly build (http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla1.8/) to verify that it is fixed? If things look good, please change the fixed1.8.1.4 keyword to verified1.8.1.4. Thanks!
It appears the issue has been resolved for me via updates; however, I feel the original requirement for the inclusion of the time_out value is invalid and requires re-assessment, not modification of the value. RE: #3. Nominal redundancy should be applicable. "TLS-intolerance list, we *immediately* retry. No delay." In the presence of a NoACK or RTS response, we display "server not responding" and drop the packet. In the presence of a returned error, we *immediately* retry. Basically, we are going to get either a valid response, no response, or an error. If there is an error, we retry, etc. We cannot just say "you're taking too long", for who of us is to say what IS too long? At some stage we ARE going to get a NoACK or RTS, and then we drop to the error response. If the user gets fed up waiting, let them terminate the request!
Thanks, everyone. I tested with the 2.0.0.4 nightly from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-04-18-04-mozilla1.8/firefox-2.0.0.4pre.en-US.win32.installer.exe and the embedded web server that used to have the problem now works. Let me know when the final release of 2.0.0.4 will be available. - Babu
v.fixed on 1.8 branch based on Babu's comment #33.
Not sure how to reopen a bug, but _please_ reopen this bug. The timeout in itself is a bug. It does not matter whether it is set to 8 seconds or 25 seconds. During an SSL/TLS connection, Firefox may need to access an external PKCS#11 device (smartcard, USB key, biometric authentication device, etc.). By setting a timeout on the WHOLE handshake, you severely restrict the use of these devices.

If you really want to keep the timeout to avoid issues with very old servers, what needs to be done, IMHO, is to set this timeout on the server reply, not on the whole handshake. Or at least, to stop the timeout while interacting with the PKCS#11 device.

If this is too complicated to be done fast, please at least set the timeout to one minute. I believe that even a device which does biometric authentication coupled with a password can manage within that time, and that one can justify to an end user that they have to complete the process in one minute.

Regards,
-- Julien
Agree with comment #35. This is a comms issue, and provided the browser does not receive a NOACK, it should continue for just as long as comms dictate. There is no arbitrary timeout value. I also have no clue how to reopen the bug, but I request that it be reopened.
Sorry, guys. There are many flawed SSL servers out there that accept the connection, accept the client's initial SSL "hello" message, and then hang. They keep the connection open, but they never respond, and never finish the handshake. They are so common that they were considered a major source of browser user dissatisfaction. So, the handshake timeouts are here to stay.
Nelson, I'm fine with the handshake timeout staying, but it is not in the right place. It introduces a bug as major as the one it fixes: e.g. it mostly prevents the usage of smartcards. I've taken a quick look at the code, and saw that there was already a "hack" to reset the timeout when the bad cert dialog was shown. That's good, but we also need to reset this timeout for the "choose cert" UI and after the dialog with the PKCS#11 token.

I insist that this is a really practical bug. For those of you using smartcard authentication, it is easy to test: if you take more than 8 (or 25 in 2.0.0.4) seconds to

1) Choose the right certificate from the dialog
2) Grab your smartcard
3) Insert it in the reader
4) Input your PIN

then the TLS connection will fail again and again and again...
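One way to picture the request above is a timer that only counts time spent waiting on the server and is paused while the user or a PKCS#11 token is busy. The following is a minimal sketch under that assumption; the class and its methods are hypothetical, elapsed time is passed in explicitly to keep the example self-contained, and it does not reflect how PSM actually tracks the handshake start time.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical pausable handshake timer, invented for this illustration.
class HandshakeTimer {
public:
  typedef std::chrono::milliseconds Duration;

  explicit HandshakeTimer(Duration limit)
    : mLimit(limit), mCounted(0), mPaused(false) {}

  // Record that `d` of wall-clock time has passed. Time that passes while
  // the timer is paused (dialog shown, token busy) is not counted.
  void Advance(Duration d) { if (!mPaused) mCounted += d; }

  void Pause()  { mPaused = true; }   // e.g. "choose cert" UI or PIN prompt opens
  void Resume() { mPaused = false; }  // e.g. dialog dismissed, token done

  bool TimedOut() const { return mCounted > mLimit; }

private:
  Duration mLimit;
  Duration mCounted;
  bool mPaused;
};
```

With an 8-second limit, a user who spends a full minute choosing a certificate and entering a PIN would not trip the timeout, while a server that stays silent for more than 8 counted seconds still would.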