Closed Bug 1278434 Opened 3 years ago Closed 3 years ago

SSL_ERROR_SESSION_KEY_GEN_FAILURE occurs if 256-bit AES-GCM is enabled and the server supports TLS Extended Master Secret extension

Categories

(NSS :: Libraries, defect, major)


Tracking

(firefox49+ fixed, firefox50 fixed)

VERIFIED FIXED

People

(Reporter: masayuki, Assigned: ttaubert)

References

Details

(Keywords: regression)

Attachments

(2 files)

When I access HTTPS content, I see an SSL_ERROR_SESSION_KEY_GEN_FAILURE error in the latest Nightly. I also see a lot of broken images in HTTPS content. I guess that this is the same bug.

The Windows (x86) build in https://ftp.mozilla.org/pub/firefox/nightly/2016/06/2016-06-05-03-02-15-mozilla-central/ doesn't have this bug, but builds from https://ftp.mozilla.org/pub/firefox/nightly/2016/06/2016-06-06-03-02-19-mozilla-central/ onward are broken.

I've tested in two Win10 environments: one is the release build, the other is build 14342.
For example, I see this error in the popup that opens when I click Firefox's "Share this page" button.
Can you reproduce this with AES-256-GCM cipher suites disabled?
(Flip prefs starting with "security.ssl3." and including "aes_256_gcm" to disable AES-256-GCM cipher suites.)
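The pref selection described here can be sketched as a small filter. This is only an illustration: the list of pref names is a hypothetical sample (the real set for a given build is in about:config), and `prefs_to_disable` and `as_user_js` are helper names made up for this sketch. The output uses the `user_pref(...)` syntax of a profile's user.js file.

```python
# Hypothetical sample of pref names; check about:config for the real list.
SAMPLE_PREFS = [
    "security.ssl3.ecdhe_rsa_aes_128_gcm_sha256",
    "security.ssl3.ecdhe_rsa_aes_256_gcm_sha384",
    "security.ssl3.ecdhe_ecdsa_aes_256_gcm_sha384",
    "security.tls.version.max",
]


def prefs_to_disable(pref_names):
    """Return prefs that start with "security.ssl3." and contain "aes_256_gcm"."""
    return [
        name
        for name in pref_names
        if name.startswith("security.ssl3.") and "aes_256_gcm" in name
    ]


def as_user_js(pref_names):
    """Render one user.js line per pref, setting each pref to false."""
    return "\n".join(f'user_pref("{name}", false);' for name in pref_names)


if __name__ == "__main__":
    print(as_user_js(prefs_to_disable(SAMPLE_PREFS)))
```

Flipping the same prefs by hand in about:config has the same effect; the script only makes the selection rule explicit.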
Flags: needinfo?(masayuki)
I've tested on twitter.com. Then, disabling "security.ssl3.ecdhe_rsa_aes_256_gcm_sha384" fixes the issue.
Flags: needinfo?(masayuki)
Component: Networking → Security: PSM
The direct regression bug would be bug 975832, but it only turned on NSS features.
I guess NSS implements AES-256-GCM wrongly.
Needinfo'ing people who worked on bug 923089.
Blocks: 975832, 923089
Flags: needinfo?(martin.thomson)
Flags: needinfo?(kaie)
Flags: needinfo?(emaldona)
[Tracking Requested - why for this release]:
Firefox 49 has this bug. NSS 3.25 should have the fix eventually.
The lesson here is that we need to do some interoperability testing before turning a feature on.  We should back out bug 1277255 until we understand what the problem is.  This should be relatively easy to work out.
Flags: needinfo?(martin.thomson)
How can we do the interoperability test without enabling the feature? I'd rather flip the pref instead of backing out the entire change, just like TLS 1.3.
Talk to mwobensmith about what he does for testing.  And yes, flipping the default on the pref would be OK.
Comment #9.
Flags: needinfo?(mwobensmith)
After testing it seems like security.ssl3.ecdhe_rsa_aes_256_gcm_sha384 works fine in the latest nightly on Mac, Win7, and Windows Server 2012. I can't speak for Win10 though.
Hmm, that means that mwobensmith's tests wouldn't have picked this up: he typically runs on a Mac.  I will try myself with Win10 when I get home and report.
(In reply to Martin Thomson [:mt:] from comment #7)
> The lesson here is that we need to do some interoperability testing before
> turning a feature on.  We should back out bug 1277255 until we understand
> what the problem is.  This should be relatively easy to work out.

When working on bug 923089, Kai did test interoperability with GnuTLS (both as client and server) with most ciphers, ecdhe_rsa_aes_256_gcm_sha384 included. That being said, I haven't had time yet to do extensive tests. I do plan to finish them today/tomorrow.
If this is platform specific, then we wouldn't have caught it.  Specifically, if it is x86 assembler on Windows, then we have a problem.  I have a win10 machine, and downloaded the x86 build, but can't reproduce the problem, probably because I don't know what sites would accept 256-bit AES-GCM.  Twitter picks 128-bit for me.
Thanks to Franziskus who pointed out that you can disable 128-bit ciphers to get twitter to use 256-bit.  Both 32 and 64 bit builds work fine for me on twitter.com.
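To check from outside the browser whether a site will negotiate 256-bit AES-GCM when no 128-bit suite is offered, something like the following sketch may help. Note the swap: it uses Python's `ssl` module (OpenSSL), not NSS, so it cannot reproduce the NSS error itself; it only shows which suite the server picks. The helper names are made up for this sketch, and the suite name is the OpenSSL spelling of ecdhe_rsa_aes_256_gcm_sha384.

```python
import socket
import ssl

# OpenSSL name for TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384.
AES256_GCM_SUITE = "ECDHE-RSA-AES256-GCM-SHA384"


def aes256_only_context():
    """Build a client context that offers only the AES-256-GCM suite for TLS 1.2."""
    ctx = ssl.create_default_context()
    # Cap at TLS 1.2 so TLS 1.3 suites are not negotiated instead.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers(AES256_GCM_SUITE)
    return ctx


def negotiated_cipher(host, port=443, timeout=10.0):
    """Connect to host and return the name of the negotiated cipher suite."""
    ctx = aes256_only_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.cipher()[0]
```

Calling `negotiated_cipher("twitter.com")` requires network access; if the server does not accept the suite, the handshake raises `ssl.SSLError` rather than returning a name.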
Per comment #12.
Flags: needinfo?(mwobensmith)
Nakano-san, could you provide a more specific STR? We are having difficulty reproducing your issue.
Flags: needinfo?(masayuki)
Could you please also provide an about:support config dump? Your config settings must be non-default as Twitter would usually negotiate an AES-128 suite, at least it does on all the machines we tested.
Attached file trouble shooting info
Here is my troubleshooting information. I removed some redundant or private information:
* printer settings
* path in a pref
* extension.* which were created by Nightly Tester Tools (legacy prefs to disable add-ons with version check).
* crash reports

And I'm really confused. Don't you have any problems?

For example, when I just open twitter.com (logged in), I randomly see broken user icons. Then, when I reload twitter.com, some CSS files are not applied.

As another example, when I try to download from ftp.mozilla.org, I see the error while navigating its directories.

Finally, I see this error when I open the "Share this page" panel. The area that shows the SNS icons (which may be in an <iframe>) shows the error message.


I can reproduce this bug in safe mode too. I'm using ESET Smart Security 9; I temporarily disabled its "network protection", but I could still reproduce this bug. My network router is an Aterm WG1800HP.
Flags: needinfo?(masayuki)
> security.ssl3.ecdhe_rsa_aes_256_gcm_sha384: false

Oh, I set this to work around this bug.
I've run cipher interoperability tests covering RSA, RSA-DHE, RSA-ECDHE, ECDSA-ECDHE and DSS-DHE key exchange with AES-CBC, AES-GCM and 3DES-CBC ciphers, using TLSv1.1 and TLSv1.2, each with and without client certificates. The tests ran against GnuTLS 3.3.22 and OpenSSL 1.0.1 on 64-bit Linux with current (as of writing) NSPR and NSS.

All connections were successful.

That doesn't look to me like a simple bug in the AES-256-GCM implementation...
(In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) (working slowly due to injured) from comment #19)
> And I'm really confused. Don't you have any problems?
> 
> For example, just I open twitter.com (with login), I saw broken icons of
> users randomly. And then, reload twitter.com, some CSS files are not applied.
> 
> Another example, I try to download from ftp.mozilla.org, I see the error at
> navigating its directories.
> 
> Finally, I see this error when I open "Share this page" panel. Then, the
> area which shows SNS icons (which may be in an <iframe>) shows the error
> message.
> 
> 
> I can reproduce this bug in safe mode too. And although I'm using ESET Smart
> Security 9, I tried to disable "network protection" of it temporarily, but I
> reproduced this bug. My network router is Aterm WG1800HP.

Could you reproduce it using other browsers (Chrome canary 53, IE 11 on Windows 10, or Edge)?
I tested briefly with Canary 53 and Edge (both with default settings), but I cannot reproduce a similar bug on twitter.com or ftp.mozilla.org.
Okay, I've found the cause!

My testing wasn't thorough enough, sorry. When I turn off the "HTTPS scanning" feature of ESET Smart Security's firewall, I can no longer reproduce this bug. But I'm not sure why it affects only Nightly and not Canary or Edge...
I advise you to contact ESET.
(In reply to Masatoshi Kimura [:emk] from comment #26)
> I advise you to contact ESET.

Yeah, if it's really their bug.


Additionally, somebody should check whether a similar bug occurs with other major security software. Does MoCo have such a team? I don't know...

Anyway, we shouldn't enable it in release builds until we confirm that no major security software has problems with it.
Summary: SSL_ERROR_SESSION_KEY_GEN_FAILURE occurs with a lot of websites in the latest Nightly build → SSL_ERROR_SESSION_KEY_GEN_FAILURE occurs with a lot of websites if 256-bit AES-GCM is enabled and ESET Smart Security's HTTPS scanning is enabled
(In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) (working slowly due to injured) from comment #27)
> Yeah, if it's really their bug.

I think ESET is better placed than we are to tell whether this is their fault: they can read our source, while we can't read theirs.
FYI ESET had a Chrome-specific bug in the past.
https://bugs.chromium.org/p/chromium/issues/detail?id=345512
I ran canary on today's Aurora and found the error there. This is with the defaults; no special prefs were set. So, dunno if it's exactly the same as this bug, or if it indicates another problem.

From the Alexa top sites list, we have 16 failures for ssl_error_session_key_gen_failure. See the link for more details (sort by error message):

https://tlscanary.mozilla.org/runs/2016-06-10-10-09-40/index.htm

FWIW, in the future, if anyone has any doubt about a change - whether or not it requires a pref flip - get in touch and I'm always happy to do a canary run.
BTW, that run was done on a Mac. I can do it on Linux if requested also.
OK, this was our fault.

1. tls_ComputeExtendedMasterSecretInt() assumes that the hash mechanism is SHA-256, which is wrong for AES-256-GCM cipher suites (their PRF hash is SHA-384):
   https://dxr.mozilla.org/nss/rev/e5b31e62b46d16fbe13bc43429b4086e7cb03500/nss/lib/ssl/ssl3con.c#4039

2. NSC_DeriveKey fails a consistency check because the hash length does not match:
   https://dxr.mozilla.org/nss/rev/e5b31e62b46d16fbe13bc43429b4086e7cb03500/nss/lib/softoken/pkcs11c.c#6407
   https://dxr.mozilla.org/nss/rev/e5b31e62b46d16fbe13bc43429b4086e7cb03500/nss/lib/softoken/pkcs11c.c#6415

3. ssl3_ComputeMasterSecretFinish fails and sets SSL_ERROR_SESSION_KEY_GEN_FAILURE:
   https://dxr.mozilla.org/nss/rev/e5b31e62b46d16fbe13bc43429b4086e7cb03500/nss/lib/ssl/ssl3con.c#3884

Probably ESET's local server prefers AES-256-GCM and uses Extended Master Secret.
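The failure chain above can be modelled in a few lines of Python. This is a rough sketch, not NSS code: the function names are made up, the length check only imitates the consistency check mentioned in step 2, and the key derivation follows the TLS 1.2 PRF and the Extended Master Secret definition (master secret = PRF(pre-master secret, "extended master secret", session hash), truncated to 48 bytes). The `session_hash_name` override is a hypothetical knob used to simulate the hard-coded SHA-256 assumption.

```python
import hashlib
import hmac

# PRF hash used by each TLS 1.2 cipher suite in this sketch.
SUITE_PRF_HASH = {
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256": "sha256",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384": "sha384",
}


def p_hash(hash_name, secret, seed, length):
    """TLS 1.2 P_hash expansion: iterate HMAC until enough bytes are produced."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hash_name).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hash_name).digest()
    return out[:length]


def extended_master_secret(suite, pre_master_secret, handshake_messages,
                           session_hash_name=None):
    """Derive a 48-byte extended master secret for the given suite.

    session_hash_name overrides the hash used for the session hash; passing
    "sha256" for a SHA-384 suite simulates the bug described above.
    """
    prf_hash = SUITE_PRF_HASH[suite]
    session_hash = hashlib.new(session_hash_name or prf_hash,
                               handshake_messages).digest()
    expected_len = hashlib.new(prf_hash).digest_size
    # Imitation of the consistency check: the session hash must match the PRF hash size.
    if len(session_hash) != expected_len:
        raise ValueError(
            f"session hash is {len(session_hash)} bytes, expected {expected_len}"
        )
    seed = b"extended master secret" + session_hash
    return p_hash(prf_hash, pre_master_secret, seed, 48)
```

With the default arguments both suites derive a secret; forcing `session_hash_name="sha256"` on the AES-256-GCM suite yields a 32-byte session hash where 48 bytes are expected, so the derivation is rejected, which mirrors how the handshake ends in a key generation failure.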
Assignee: nobody → nobody
Component: Security: PSM → Libraries
Flags: needinfo?(kaie)
Flags: needinfo?(emaldona)
Product: Core → NSS
Summary: SSL_ERROR_SESSION_KEY_GEN_FAILURE occurs with a lot of websites if 256-bit AES-GCM is enabled and ESET Smart Security's HTTPS scanning is enabled → SSL_ERROR_SESSION_KEY_GEN_FAILURE occurs with a lot of websites if 256-bit AES-GCM is enabled and the server supports TLS Extended Master Secret extension
Target Milestone: --- → 3.25
Version: Trunk → trunk
OS: Windows 10 → All
Hardware: x86 → All
Summary: SSL_ERROR_SESSION_KEY_GEN_FAILURE occurs with a lot of websites if 256-bit AES-GCM is enabled and the server supports TLS Extended Master Secret extension → SSL_ERROR_SESSION_KEY_GEN_FAILURE occurs if 256-bit AES-GCM is enabled and the server supports TLS Extended Master Secret extension
Great find, I'll take care of this.
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
I've tested with zip builds and it looks like everything works fine! Thank you!
Flags: needinfo?(masayuki)
Attachment #8762259 - Flags: review?(franziskuskiefer) → review+
https://hg.mozilla.org/projects/nss/rev/bc8f3d07c064
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Duplicate of this bug: 1279774
For the record, I was able to reproduce the issue on the NSS server side too, and I confirmed that Tim's fix solves the problem there.

The test script is here: https://github.com/tomato42/tlsfuzzer/blob/master/scripts/test-extended-master-secret-extension.py
ttaubert:

Thank you for your fix. I have a question: when is the NSS change going to be in mozilla-central? I'd like to know when I should change the settings to test it on m-c again.
Flags: needinfo?(ttaubert)
(In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) (not in London) from comment #40)
> when is the NSS change going
> to be in mozilla-central? I'd like to know when I should change the settings
> to test it on m-c again.

Please follow bug 1277255. NSS_3.25_RC0.patch will include this fix.
Flags: needinfo?(ttaubert)
For the record, the fix has been merged to m-c:
https://hg.mozilla.org/mozilla-central/rev/bb5316a4c7c2
I confirmed the fix for this bug on a Nightly build. Thank you very much!
Status: RESOLVED → VERIFIED
This should already be in 49 from the upgrade to 3.25 in bug 1277255.