Closed Bug 1462303 Opened 2 years ago Closed 2 years ago
_RX _MALFORMED _SERVER _HELLO on HN and You Tube (TLS 1 .3)
46 bytes, text/x-phabricator-request
|Details | Review|
We are seeing occasional reports of SSL_RX_MALFORMED_SERVER_HELLO on TLS 1.3-hosted sites with Fx60. Here's the available data: - These sites are running BoringSSL - It's intermittent: if you reload, it works - At least one user reports this error when they are pretty sure they are not behind a middlebox. I saw it myself on an airplane, but again, a reload fixed it. - We didn't get any reports of this with Fx59 even though it had TLS 1.3 on. There are no super-obvious changes to the NSS code that would cause this, though our current best bet is the error is coming from the session-id comparison code in compat mode: https://searchfox.org/nss/rev/535129af9c575342f8cd1fb227da7fc5722f3acd/lib/ssl/ssl3con.c#6165 We are trying to get a PCAP, but I think the next step is to run NSS (or better yet, Firefox) in a loop and try to reproduce.
We also need to measure the actual error rate we're getting here, and see if it's worth scaling back TLS 1.3 on 60 while we sort this out.
We're working on a Firefox loop based on the Canary + tcpdump. Assigning to me.
Assignee: nobody → jjones
Priority: -- → P1
ekr found the bug in theory, I reproduced with a test case and there is a fix inbound. Let's talk about uplift. It's a small change, but one that will prevent some ugly errors during TLS 1.3 rollout.
Assignee: jjones → martin.thomson
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.38
Version: trunk → 3.35
Ok, the case for/against uplift. This IS user visible. It will manifest with a scary warning once (for each user and each site). Reloading the page (or the browser) will make it go away. It only happens when the site upgrades to TLS 1.3, which we've seen a lot of recently. It only happens if the user has an open session during the upgrade. This is a pure interop issue. There is no security risk here, just an ugly warning. So... I would say that we want to uplift to Beta and - provided that release management are ok - wait for the next regular build of 60 to catch the change. I can start the process of getting the NSS releases ready in case someone wants to move.
(In reply to Eric Rescorla (:ekr) from comment #1) > We also need to measure the actual error rate we're getting here, and see if > it's worth scaling back TLS 1.3 on 60 while we sort this out. Is someone working on getting that data? Should this block rolling out 60 further in the mean time?
Comment on attachment 8976993 [details] Bug 1462303 - Allow TLS 1.3 compat mode when attempting to resume TLS 1.2, r?ekr Tim Taubert [:ttaubert] has approved the revision. https://phabricator.services.mozilla.com/D1310
Attachment #8976993 - Flags: review+
I am running a Spark job to try to collect this,
(In reply to Eric Rescorla (:ekr) from comment #9) > I am running a Spark job to try to collect this, But now that we have diagnosed it, I expect it to be pretty rare and given that we know a reload fixes it, I'm tempted not to block. What do others think?
(In reply to Eric Rescorla (:ekr) from comment #10) > (In reply to Eric Rescorla (:ekr) from comment #9) > > I am running a Spark job to try to collect this, > > But now that we have diagnosed it, I expect it to be pretty rare and given > that we know a reload fixes it, I'm tempted not to block. What do others > think? Looking at the patch -- and assuming I understand it correctly -- I don't see a need to block deployment. I vote to continue.
Spark doesn't show that 60 is worse than 59, so I don't think there is a need to throttle 60. I suggest that we do as MT says and trial if for a little while on Beta and then look at promoting to Release.
3.37: https://hg.mozilla.org/projects/nss/rev/2f5e8b4bf89729ea5ee18c089e684d4e8138b557 3.36: https://hg.mozilla.org/projects/nss/rev/ff35a3edaafb1586d839fca8cdcae53dcdf02ee9 I'll need help cutting (NSS) releases. The process is not nearly automated enough for my liking.
This is included in NSS 3.36.2  and 3.37.1 , and landed in m-c this morning .  https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/NSS_3.36.2_release_notes  https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/NSS_3.37.1_release_notes  https://hg.mozilla.org/mozilla-central/rev/6d3ee860c038
Hey J.C., is there a good telemetry probe that we can monitor on release 60 to get a sense of how severe this issue is? Thanks!
(In reply to Ritu Kothari (:ritu) from comment #15) > Hey J.C., is there a good telemetry probe that we can monitor on release 60 > to get a sense of how severe this issue is? Thanks! I believe this is bucket 29  of SSL_HANDSHAKE_RESULT . Currently 0.01% of Nightly 62 errors. Graph: https://screenshots.firefox.com/ai8yzHlOVbK06VYn/telemetry.mozilla.org  https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/security/nss/lib/ssl/sslerr.h#70  https://mzl.la/2GJfiCM
Do we know what the story was for Release 58 (no TLS 1.3)?
(In reply to Eric Rescorla (:ekr) from comment #17) > Do we know what the story was for Release 58 (no TLS 1.3)? No idea. It looks like the background noise for that one is ~1k-4k for most release cycles. (I'm not seeing Release 58 as an option on t.m.o anymore, but Beta 58 has ~4k errors noted, which is 0%. I'm guessing you're looking at stmo?)
You need to log in before you can comment on or make changes to this bug.