[Please clear the security flag if you don't think it is needed.] [Note that everything here applies to CKM_SSL3_MASTER_KEY_DERIVE as well.] In non-bypass mode, libssl derives a master key from the premaster secret using the PKCS#11 CKM_TLS_MASTER_KEY_DERIVE mechanism. Then it checks the version number that the mechanism returns, in order to detect version rollback attacks. If it detects a version rollback, it generates a new random PMS and then derives another master secret. So, when the version fields do not match, two master secrets are derived, but when the version fields do match, only one is derived. Since deriving the master secret takes a measurable amount of time, an attacker can time how long it takes for the server to respond to its ClientKeyExchange message, and use this information to determine if there was a version mismatch. Also, if version rollback detection hasn't been disabled, then libssl should treat a NULL version returned from PKCS#11 module as a mismatched version, instead of skipping the version check. Similarly, ssl3_GenerateRSAPMS may take a measurable amount of time on its own, so in the bypass case, libssl cannot wait until RSA decryption/unwrapping failed in order to call it; instead, it must always call ssl3_GenerateRSAPMS, regardless of whether the decryption/unwrapping succeeded. I believe the correct way of handling this is: 1. Always generate a random premaster secret with ssl3_GenerateRSAPMS. 2. Attempt to unwrap/decrypt the PMS. If that succeeds, throw away the random PMS. Otherwise, use the random PMS as the PMS. 3. Always generate a random master secret. 4. Derive the master key from the premaster secret. 5. If the version check is disabled or the versions match, throw away the random master key. Otherwise, throw away the derived master key and use the random master key in its place. Note that this analysis assumes that ssl3_GenerateRandom and CKM_TLS_MASTER_KEY_DERIVE take a measurable amount of time to execute. But, if they *don't* take a measurable amount of time then there is no performance penalty to doing things the way I described. http://tools.ietf.org/html/rfc5246#section-188.8.131.52 Klima, V., Pokorny, O., Rosa, T., "Attacking RSA-based Sessions in SSL/TLS", http://eprint.iacr.org/2003/052/, March 2003.
Generating an alternative PMS and MS for every session would be far too costly in terms of performance. However, it might be reasonable to pre-generate a pool of alternative Master Secrets to be used when version rollback is detected. Because no additional key generation/derivation would be needed *at the time when the rollback is discovered*, it would defeat this attack, and probably perform better than the present code.
Summary: Problems with RSA premaster secret version check → Attack against servers doing RSA version rollback check
1. A similar kind of timing attack can be done against the fallback from triple-bypass to double-bypass and against the fallback from bypass to non-bypass mode (that fallback is usually disabled). To prevent timing attacks like this, libssl needs to fully commit to a particular bypass mode before trying to decode the encoded PMS. If it commits to triple bypass then it only needs to generate a fake PMS. If it commits to double bypass then it only needs to generate a fake MS. I believe this will eliminate the need to generate one of the fake keys, because you'll only need a fake PMS *or* a fake MS, never both. Also, AFAICT, the random key has to be from the same PKCS#11 slot that would have decrypted/unwrapped/derived the PMS/MS, or else an attacker can analyze the difference in performance between the PKCS#11 implementations. 2. Is generating the fake keys actually slow? How does the performance of generating a random 48-byte key compare to the generation of a 16-byte IV for *each record* using PK11_GenerateRandom in TLS 1.1/1.2 for CBC ciphersuites? With the patches I've seen in the "implement TLS 1.1" bug, it won't be unusual for libssl to be using PK11_GenerateRandom to generate hundreds or thousands of bytes of random data per connection. So, 48 bytes more seems relatively minor. 3. During this kind of attack the server would likely be flooded with bad PMSs, so the pool size would effectively be reduced to zero and you'd be generating the fake keys synchronously with the requests. Consequently, I don't think there's anything that can be done to avoid generating at least random key per session. 4. I think I've found some problems with softoken's CKM_TLS_MASTER_SECRET_DERIVE when used in conjunction with other key derivation mechanisms and key wrapping/unwrapping, where it will leak all the bytes of a key to the calling application (not just the first two), even when the PMS is marked as sensitive, and even when the key is not a premaster secret (i.e. it looks like you can extract the master secret out of softoken even in FIPS mode). I haven't verified this yet but it looks very plausible. It could be that a new PKCS#11 interface for TLS is needed to be FIPS compliant, as I described in bug 577019. Obviously, that would have an impact on the resolution of this bug too. 5. Probably the simplest mitigation would be to avoid doing the version rollback check if the negotiated version is the highest the server supports. I haven't thought about this too much though.
Whiteboard: [sg:high] → [sg:high] server-side bug, clients like Firefox are OK
I do not think it is productive to define a PKCS#11 interface for TLS 1.2 without being aware of this issue; the current draft PKCS#11 interface will also be prone to such timing attacks with the same costly countermeasures. A new style of PKCS#11 interface can avoid the timing attacks with cheaper countermeasures. So, email@example.com and others working on the PKCS#11 interface for TLS 1.2 (bug 577019) should be added to the CC list of this bug.
One important difference between this attack and Bleichenbacher's attack of 10 years ago is that this is purely a timing attack, not an attack based on a difference in the way the server responds to the client. The timing variations are ENTIRELY due to the time taken to generate false pre-master secrets in the case where the so-called RSA-encrypted pre-master secret fails to decrypt to a correct value. Therefore, it MUST be the case that that generation takes measurable time, or else this timing attack would fail. I'm not sure why you suggest that a random master secret is needed. The time required to generate the master secret from the pre-master secret is constant regardless of which pre-master secret (real or "random") is used, so I see no TIMING benefit to generating a random master secret in addition to generating a random pre-master secret. I continue to believe that there are other ways to eliminate that measurable time cost of generating a false pre-master secret each time an RSA decryption produces an unacceptable PMS. Regarding the suggestion that the "pool size would effectively be reduced to zero", I see no reason the false PMSes cannot be reused. I'm not sure that more than one randomly generated PMS is needed in the lifetime of any one server process. You just need alternative bits to feed into the PMS-to-MS derivation, which (as stated above) is time constant. I proposed a pool just to add slightly more variance. I might go so far as to propose always using a single randomly generated byte to select one of the false PMSes from the pool. We've got lots of randomly generated bytes lying around. We could have the PRNG generate one more than is needed when it generates the server-random string, and use the extra for this purpose. That would obviate extra calls to the PRNG.
I am sorry I was unclear. There are multiple bugs relating to timing attacks in the server-side RSA key exchange. Above, I mentioned at least three different timing-related bugs that exist in the current code; there may be even more. For example, the PKCS#11 specification doesn't specify which error conditions may be reported *before* timing-critical operations are performed, and which error conditions may be reported *after* timing critical operations (the PRF in particular) are performed. It could very well be that it isn't possible to write portable (between PKCS#11 modules) code that is resistant to timing attacks. If you look at the code in pkcs11c.c, you can see that some error conditions which are detected/reported before the PRF is applied, and some error conditions are detected/reported after the PRF is applied. (In reply to comment #4) > One important difference between this attack and Bleichenbacher's attack of > 10 years ago is that this is purely a timing attack, not an attack based on > a difference in the way the server responds to the client. The timing > variations are ENTIRELY due to the time taken to generate false pre-master > secrets in the case where the so-called RSA-encrypted pre-master secret > fails to decrypt to a correct value. There are several causes of timing variations. Some are due to generating fake premaster secrets differing numbers of times. Others are caused by applying the PRF differing numbers of times. And, at least one is caused by using different PKCS#11 modules depending on the inputs given by the client/attacker. > I'm not sure why you suggest that a random master secret is needed. The time > required to generate the master secret from the pre-master secret is constant > regardless of which pre-master secret (real or "random") is used, so I see > no TIMING benefit to generating a random master secret in addition to > generating a random pre-master secret. Usually one master secret is derived. But, in the event of version rollback detection, two master secrets are derived (= one extra application of the PRF for 48 bytes out output = multiple applications of the iterated HMAC-SHA1/HMAC-MD5 and/or HMAC-SHA256/HMAC-384). > I continue to believe that there are other ways to eliminate that measurable > time cost of generating a false pre-master secret each time an RSA decryption > produces an unacceptable PMS. Regarding the suggestion that the "pool size > would effectively be reduced to zero", I see no reason the false PMSes cannot > be reused. I'm not sure that more than one randomly generated PMS is needed > in the lifetime of any one server process. This might be true. AFAICT, nobody has ever analyzed the effects of using a premaster secret more than once as the key for the PRF. But, I am pretty sure sure that such a strategy of reusing the same fake premaster secret will absolutely not work if the server supports Google's proposed Snap Start extension to TLS, because the server cannot rely on the randomness of the server random value for part of the handshake. Also, if you have a pool of fake premaster/master secrets, they must be loaded in the PKCS#11 module at all times, because it can take a significant amount of time to load them into the module. And, the way that pk11wrap multiplexes sessions on a slot (using state saving/restoration) and the way it distributes operations across slots all have to be carefully checked. That is one reason why, in bug 577019, I am advocating a new style of TLS mechanisms that are designed in a way where the PKCS#11 module can handle all the timing-critical operations. (The other reason is that the CKM_TLS_MASTER_KEY_DERIVE and CKM_SSL3_MASTER_KEY_DERIVE mechanisms can be used in conjunction with other mechanisms and operations to extract entire keys--even sensitive keys and even keys that are not related to SSL/TLS at all--out of the PKCS#11 module. AFAICT, this is a serious problem with softoken but it seems to be an inherent design flaw in the way the mechanism lets the application extract the two first bytes of any 48-byte secret key. I will try to file another bug with the details of this, when I have spare time.)
> in the event of version rollback detection, two master secrets are derived If true, that seems like a bug. Version rollback detection should occur immediately after recovery of the PMS, before the MS is derived from it. The MS should be derived only once, from the chosen (real or fake) PMS. (I'm ignoring bypass for this analysis)
Search for "Generate a faux master secret in the same slot as the old one."
I see. Even though the PMS derivation decrypts the version number used for the roll-back detection, that operation does not output the version number. Instead, it is output later as a side effect of the subsequent master-secret derivation. That's MOST unfortunate. It's an aspect of the PKCS#11 API design that I'd forgotten after not touching it for many years. The PKCS#11 "triple bypass" code has a clear advantage here, because it has access to the version number immediately after the PMS is derived, and need not attempt to derive the Master Secret before detecting a version roll-back.
I noticed that the check of the PKCS padding also had data-dependent timing. This patch is not production-ready, but it shows that it is possible to implement a new unwrapping mechanism (CKM_NSS_TLS_RSA_PREMASTER_SECRET) that checks the version number and the PKCS padding in constant time and that implements the error processing recommended in the TLS 1.2 specification . Using this mechanism instead of the normal CKM_RSA_PKCS mechanism and then using CKM_TLS_MASTER_KEY_DERIVE_DH instead of CKM_TLS_MASTER_KEY_DERIVE will prevent some of the timing attacks I mentioned above--in particular, the one I was originally reporting. I propose this mechanism (or a similar one) should be add to the PKCS#11 standard, and that CKM_TLS_MASTER_KEY_DERIVE & CKM_SSL3_MASTER_KEY_DERIVE should be deprecated. Now would be a good time to do that as new mechanisms need to be added to the standard anyway to support TLS 1.2; see Bug 577019. This is why I suggested only one kind of master key derivation in my patch in that bug. I also think that the current implementation of NSC_UnwrapKey should be changed in the following way: if the template includes CKA_VALUE_LEN, then constant-time check of the padding (similar to the one in this patch) should be done for all mechanisms. For PKCS#1 1.5 padding and similar padding mechanisms where the key length can be determined from the padding, the call should fail if the CKA_VALUE_LEN value doesn't match exactly the actual length of the wrapped key. The current way that NSC_UnwrapKey deals with these scenerios is unsafe, IMO. I think the PKCS#11 standard allows the behaviors I am suggesting for the existing mechanisms but I don't know what things would break. Regardless, even this safer implementation of the existing mechanisms would not be sufficient to implement the RFC5246-recommended semantics for key unwrapping. Feedback would be greatly appreciated.  http://tools.ietf.org/html/rfc5246#section-184.108.40.206
This version fixes a misleading comment and included my changes to pkcs11n.h. Just to be clear, this proposed mechanism would work for all TLS versions, not just TLS 1.2+ like the other ones added to pkcs11n.h.
Fixed obvious bug (forgot to set bsize) and clarified operator precedence.
Assigning to Brian since there seems to be progress here.
QA Contact: libraries → brian
Assignee: nobody → brian
QA Contact: brian → libraries
Dan, I don't think I'm the best person to do this now that I'm @ Mozilla as this is effectively dead code for our applications and apparently performance-critical to the server products that actually use it. (In the long run this code should be completely cut out of the build for the version of libssl that ships in Firefox and Thunderbird on platforms on which we don't use a system-provided NSS.) That said, I am happy to assist in any way, including providing feedback on the correctness of any patch.
Assignee: bsmith → nobody
Whiteboard: [sg:high] server-side bug, clients like Firefox are OK → [sg:nse] server-side bug, clients like Firefox are OK
Comment on attachment 471093 [details] [diff] [review] v2: Example of new constant-time RSA unwrapping mechanism for Softoken Clearing feedback request as I don't intend to work on this.
Summary: Attack against servers doing RSA version rollback check → Server side of TLS RSA key exchange is vulnerable to timing attacks
Ryan, please prepare a gallon of bleach for your eyes before attempting to read the patches I attached to this bug. But, according to the description they purport to at least partially resolve some of the timing issues. They are basically untested and may be completely wrong.
Brian: Eye bleach aside, it's very similar to a patch I have locally, which I will finish up 'soon' :)
Duplicate of this bug: 1166795
I'm opening this up since it's been almost 5 years since I reported it (before I was at Mozilla).
You need to log in before you can comment on or make changes to this bug.