Open Bug 1590870 Opened 5 years ago Updated 2 years ago

TLS 1.3 Downgrade Sentinel causing breakage

Categories

(NSS :: Libraries, defect, P2)

Tracking

(firefox72+ wontfix)

People

(Reporter: bugs, Unassigned)

References

(Regression)

Details

(Keywords: regression)

Tested on 2019-09-01, 2019-10-21 and 2019-10-23 in clean profiles (that is, with security.certerrors.mitm.auto_enable_enterprise_roots set to true). In all of them, instead of enterprise MitM support being auto-enabled, I get SSL_ERROR_RX_MALFORMED_SERVER_HELLO when accessing google.com, and security.enterprise_roots.enabled remains false.

Also, while repeatedly deleting and recreating clean profiles, I sometimes got SSL_ERROR_RX_MALFORMED_SERVER_HELLO on the mozilla.org welcome pages, but not always. In addition, if I visited a site that works (such as facebook.com) after accessing google.com, I got the more detailed HSTS error screen, with MOZILLA_PKIX_ERROR_MITM_DETECTED under Advanced and, again, security.enterprise_roots.enabled remaining false. This behaviour was not consistent either: sometimes the page reloaded itself properly and set security.enterprise_roots.enabled to true. However, this did NOT fix the Google error.

I'm kind of puzzled by Google's special behaviour; it's as if its certificate has a higher importance than even HSTS and mozilla.org. I was also able to reproduce this with: https://gmail.com, https://youtube.com, https://duckduckgo.com, https://google.fr, https://apple.com

https://bing.com, https://mozilla.org, https://en.wikipedia.org and https://reddit.com worked without issues (once enterprise roots were auto-activated or manually enabled).

Trying various random sites, I didn't see any obvious pattern. Some worked and some didn't, but those that didn't never did, no matter how many refreshes, clean profiles, or enterprise root settings I tried. The error was always SSL_ERROR_RX_MALFORMED_SERVER_HELLO.

P.S. I panicked that this was related to the bug #1570222 fixes, but it totally wasn't: it predates them, and Firefox 70 works great on our domain now. It might, of course, still be related to the bug that triggered that regression (bug #1551177).

Can you attach a packet trace of a handshake for which you get the error SSL_ERROR_RX_MALFORMED_SERVER_HELLO?

Flags: needinfo?(bugs)

What would be the best way to go about that? Just install Wireshark? Nothing else? Anything built into Firefox?

Flags: needinfo?(bugs)

Yes, Wireshark would work.

Hi. I installed Wireshark on this VM and got a pcap of the communication between the device and google.com.
However, I don't want to attach the pcap to this public report. You weren't on IRC, so I emailed your mozilla.com address from this email address, and I haven't received a response yet. You can contact me on IRC (as nemo) or by replying to the email.
Thanks.

I tried to respond to that email but it looks like your email server might be down. Here's what I wrote:

You can email me the packet trace directly if you're comfortable with that. Otherwise, you could use https://mozilla.github.io/mozregression/quickstart.html to narrow down when this changed. That would at least give us an idea of where to start looking.

Flags: needinfo?(kyberneticist)

Yeah, I was pretty upset. Just after I'd written that update on Thursday, I was on the phone resolving an unrelated issue they'd also broken during the recent account changes, and a supposed Tier 3 in a completely different department broke what they'd just fixed the week before. Big "Noooooooooooooo" from me. The irritating thing is that it used to take 5 minutes to fix, but now they insist on letting issues stew for 3 days before I can call them again to resolve it. I'm trying not to name my ISP, but I'm pretty unhappy with the past few weeks. Anyway, I've temporarily moved my email address to Gmail.

Flags: needinfo?(kyberneticist)

Oh, and in case I was ambiguous above: I emailed you the link to the pcap at the same time, from the alternate email account.

Looking at the packet trace, this seems to be a lower-level issue (i.e. NSS).
Do you mind if I share that packet trace with other NSS engineers to try and figure out what the issue is?

Flags: needinfo?(kyberneticist)
Assignee: nobody → nobody
Component: Security: PSM → Libraries
Product: Core → NSS
QA Contact: jjones
Version: Trunk → other

No, I don't mind.

Flags: needinfo?(kyberneticist)

The priority flag is not set for this bug.
:jcj, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jjones)

Sorry for the delay here nemo.

What's happening according to the packet trace is that you're encountering the TLS 1.3 downgrade sentinel. The Google server's serverRandom field looks like this:

0000   e2 0d 45 6e 55 07 e3 ed b9 a3 61 f9 c6 ad 6c ca   ..EnU.....a...l.
0010   ec c7 17 03 44 4f 57 4e 47 52 44 01               ....DOWNGRD.

where those last 8 bytes are equal to our tls13_downgrade_random: https://searchfox.org/nss/source/lib/ssl/ssl3con.c#397
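For readers unfamiliar with the mechanism: RFC 8446 (section 4.1.3) has a TLS 1.3-capable server that negotiates TLS 1.2 set the last 8 bytes of ServerHello.random to 44 4F 57 4E 47 52 44 01 ("DOWNGRD" followed by 0x01), or to the same bytes ending in 0x00 for TLS 1.1 and below. A TLS 1.3 client that negotiated a lower version must abort if it sees either value, which is how the handshakes in this bug ended in SSL_ERROR_RX_MALFORMED_SERVER_HELLO. A minimal Python sketch of that client-side check follows; it is an illustration, not NSS's implementation, and the function and constant names are hypothetical.

```python
# Minimal sketch of the RFC 8446 (section 4.1.3) client-side downgrade check.
# Not NSS code; all names here are hypothetical.

TLS1_3 = 0x0304  # TLS 1.3 version number

# Last 8 bytes of ServerHello.random from a TLS 1.3-capable server that
# negotiated TLS 1.2 ...
DOWNGRADE_TLS12 = b"DOWNGRD\x01"  # 44 4f 57 4e 47 52 44 01
# ... or TLS 1.1 and below.
DOWNGRADE_TLS11 = b"DOWNGRD\x00"  # 44 4f 57 4e 47 52 44 00


def must_abort_for_downgrade(negotiated_version: int, server_random: bytes) -> bool:
    """Return True if a TLS 1.3 client must abort the handshake."""
    if len(server_random) != 32:
        raise ValueError("ServerHello.random must be 32 bytes")
    if negotiated_version >= TLS1_3:
        return False  # TLS 1.3 was negotiated; the sentinel check does not apply
    return server_random[-8:] in (DOWNGRADE_TLS12, DOWNGRADE_TLS11)


if __name__ == "__main__":
    # Last 12 bytes of the serverRandom dump above, padded to 32 bytes with zeros.
    tail = bytes.fromhex("ecc71703444f574e47524401")
    sample = bytes(32 - len(tail)) + tail
    print(must_abort_for_downgrade(0x0303, sample))     # True: TLS 1.2 + sentinel
    print(must_abort_for_downgrade(0x0303, bytes(32)))  # False: no sentinel
```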

This is the indication that a middlebox forced a downgrade from TLS 1.3. We just turned this on by default for Firefox 72 in Bug 1576790; I'm afraid this bug was mentioned there, but we neglected to get back to you here.

You can fix this on your profile(s) by changing the relevant preference, security.tls.hello_downgrade_check, back to false in about:config. We haven't decided whether to enable this by default overall, since your situation shows it does cause breakage.

I'm re-titling this bug to generally discuss the situation around that preference, and whether we want to, perhaps, avoid letting it ride to Release at this time.

Flags: needinfo?(jjones)
OS: Windows → All
Priority: -- → P2
Regressed by: 1576790
Hardware: Unspecified → All
Summary: Accessing certain sites (ex. google.com) fails with SSL_ERROR_RX_MALFORMED_SERVER_HELLO in Nightly with enterprise MitM → TLS 1.3 Downgrade Sentinal causing breakage
Summary: TLS 1.3 Downgrade Sentinal causing breakage → TLS 1.3 Downgrade Sentinel causing breakage

Ouch. I have no idea where P2 falls on the priority list, nor how widespread this issue is, but I can tell you 100k people are behind the enterprise MitM that is currently broken on half the internet with Firefox Nightlies.

Just FYI 😉

Oh, thanks for the detailed explanation. What I didn't immediately pick up was that I was encountering this, I guess, because I'm testing Nightlies, and that you guys have no intention of enabling it any time soon.

Maybe, if you do enable it, you could have a special case for enterprise MitM, i.e. when security.enterprise_roots.enabled or security.enterprise_roots.auto_enabled is true?

Anyway, thanks. I can confirm that with that disabled, everything is back to normal, i.e. with my connections seamlessly man-in-the-middled ☺

nemo, glad that worked. Can you help us get in touch with the organization in question? Maybe email me if it's sensitive? The middlebox is doing a Bad Thing here, forwarding the server's original serverRandom. If they're going to intercept your TLS session, we'd like them to do it the safest way possible.
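To make the failure mode concrete, here is a toy Python model of an interception proxy that speaks TLS 1.2 to both the browser and a TLS 1.3-capable origin server. It is not NSS or any vendor's code, all function names are hypothetical, and it models only the random values, not a real handshake. It assumes the proxy's upstream ClientHello caps the version at TLS 1.2, so the origin writes the sentinel; forwarding that random to the browser then trips the browser's check, while a freshly generated random does not.

```python
import os

# RFC 8446 sentinel a TLS 1.3-capable server writes into the last 8 bytes
# of ServerHello.random when it negotiates TLS 1.2.
SENTINEL_TLS12 = b"DOWNGRD\x01"


def upstream_server_random(negotiated_tls13: bool) -> bytes:
    """Random chosen by a TLS 1.3-capable origin server (e.g. google.com)."""
    if negotiated_tls13:
        return os.urandom(32)
    # Pushed down to TLS 1.2 by the proxy's ClientHello, so the server
    # marks the downgrade in the last 8 bytes.
    return os.urandom(24) + SENTINEL_TLS12


def proxy_random_for_client(upstream_random: bytes, forward: bool) -> bytes:
    """Random the intercepting proxy puts in its own ServerHello to the browser."""
    if forward:
        # Flawed behaviour described in this bug: reuse the origin's value.
        return upstream_random
    # Correct behaviour: the proxy terminates the browser's TLS session,
    # so it generates a fresh random of its own.
    return os.urandom(32)


def client_accepts(server_random: bytes, negotiated_tls13: bool) -> bool:
    """A TLS 1.3 client checks for the sentinel only below TLS 1.3.
    (The TLS 1.1-and-below sentinel is omitted here for brevity.)"""
    return negotiated_tls13 or server_random[-8:] != SENTINEL_TLS12


if __name__ == "__main__":
    # The proxy speaks TLS 1.2 on both legs; the origin supports TLS 1.3.
    origin = upstream_server_random(negotiated_tls13=False)
    print(client_accepts(proxy_random_for_client(origin, forward=True), False))  # False
    print(client_accepts(proxy_random_for_client(origin, forward=False), False))
```

The same model shows why the protection is compatible with interception done correctly: the sentinel only reaches the browser when the proxy copies a random it did not generate.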

And MT, should we consider deferring another cycle on the change in https://hg.mozilla.org/mozilla-central/rev/df10f6e28030?

Flags: needinfo?(mt)

(I work on Chrome, not Firefox, but I'm planning on reenabling this check later in Chrome too.)

What product is the MITM proxy? We're aware of flaws in Palo Alto Networks products and Cisco Firepower firewalls which trigger this kind of issue. Both vendors have since shipped fixes. The versions are as follows.

Palo Alto Networks:
PAN-OS 8.1 must be ≥ 8.1.4
PAN-OS 8.0 must be ≥ 8.0.14
PAN-OS 7.1 must be ≥ 7.1.21

Cisco:
Firmware 6.2.3 must be ≥ 6.2.3.4
Firmware 6.2.2 must be ≥ 6.2.2.5
Firmware 6.1.0 must be ≥ 6.1.0.7
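The version floors above lend themselves to a quick inventory check. A hedged Python sketch: the data is copied from this comment, the product keys and function names are hypothetical, it assumes plain dotted-numeric version strings (no hotfix suffixes), and a branch that isn't listed returns None rather than a guess.

```python
from __future__ import annotations

# Minimum fixed release per affected branch, copied from the comment above.
# Keys are branch prefixes; values are the first release with the fix.
MIN_FIXED = {
    "PAN-OS": {
        (8, 1): (8, 1, 4),
        (8, 0): (8, 0, 14),
        (7, 1): (7, 1, 21),
    },
    "Cisco Firepower": {
        (6, 2, 3): (6, 2, 3, 4),
        (6, 2, 2): (6, 2, 2, 5),
        (6, 1, 0): (6, 1, 0, 7),
    },
}


def parse_version(text: str) -> tuple[int, ...]:
    """'8.1.4' -> (8, 1, 4). Assumes plain dotted integers."""
    return tuple(int(part) for part in text.split("."))


def is_patched(product: str, version: str) -> bool | None:
    """True/False for a listed branch, None if the branch is not listed."""
    ver = parse_version(version)
    for branch, minimum in MIN_FIXED.get(product, {}).items():
        if ver[: len(branch)] == branch:
            # Tuple comparison is element-wise, and (6, 2, 3) < (6, 2, 3, 4).
            return ver >= minimum
    return None


if __name__ == "__main__":
    print(is_patched("PAN-OS", "8.1.3"))             # False
    print(is_patched("Cisco Firepower", "6.2.3.4"))  # True
```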

If there are other affected vendors, that would be very useful information.

As J.C. noted, forwarding the server's original random value is not okay. Random values in security protocols are there for a reason, so this flaw invalidates all security analysis for TLS when used by this proxy. Administrators should treat these out-of-date proxies as a security risk, rather than merely a compatibility risk.

After talking to David about this, I think we want to make sure our SUMO documentation is up to date (thanks to David for sharing this info). However, the Chrome delays are driven by engineering on their end, not grounded in a direct need. That there remain networks that do bad things is unfortunate, but we should be OK provided that we:

a) clearly document what the ideal solution is, namely patching these systems with fixes that are now more than a year old
b) clearly document what the workaround is (set the pref to false)
c) clearly explain how to set the pref in enterprise policy to avoid individual tweaks

The best option is clearly (a), as there are likely a raft of other fixes that have been added to this software in the last year. Anyone setting the pref will be exposed to attack until we are able to disable the pref entirely, which, based on experience, could take a long time.

This is a clear security risk, and these interception boxes are not doing the right thing at all. By doing this they are not only harming security for their own users; by allowing this practice to continue, we would be harming all Firefox users. Yes, this is unpleasant, but if we retain the pref for the near term (at least until the next ESR), we should be OK.

I am not opposed to documenting the workaround or making the error page more specific, but the right default here is protecting people from downgrade attack.

Flags: needinfo?(mt)

I've talked with the reporter, and since about:config is blocked on the network by enterprise policy [1], there's no user workaround available. Obviously the enterprise can fix the policy, but apparently there's no official support for Firefox users. I'm asking the reporter to file a network trouble ticket linking to this bug, to Chromium/David's bug [0], and to David's comment #15 about Chromium's plans.

That said, I think we need to treat this like WebPKI security breakages and try to turn it on at the same schedule as Chromium does, so we aren't stranding enterprise users. I'll start an out-of-band email thread.

[0] https://bugs.chromium.org/p/boringssl/issues/detail?id=226
[1] https://github.com/mozilla/policy-templates#blockaboutconfig

To follow up, I've emailed the affected enterprise IT department, since it appears they have adjusted the Chrome feature flag so that Chrome works, and they need to do the same for Firefox.

Anyway, I'm open to leaving this on into Beta and see if we get more reports, and we'll see if we can get a nice response out of the enterprise IT department.

Slight correction: https://bugs.chromium.org/p/chromium/issues/detail?id=996894 is the bug you want to follow for Chromium. I should probably close the BoringSSL one as it's implemented in the TLS library itself. (Honestly I'd forgotten about that one when I filed the Chromium one. :-) )

Might be good to keep an eye on this in beta.

Do we need to allow this to be toggled via GPO / policies.json?

Flags: needinfo?(mozilla)

Do we need to allow this to be toggled via GPO / policies.json?

Sounds like we do.

I'm having trouble understanding exactly what to do based on this bug. Just allow security.tls.hello_downgrade_check via policy in 72?

Flags: needinfo?(mozilla)

Yes, set security.tls.hello_downgrade_check to false.

Can you help find someone to take this on now, if we're planning to fix it in 72 beta? Thanks.

Flags: needinfo?(jjones)

Mike, can you tell me how to make that policy change? Or take this?

Flags: needinfo?(jjones) → needinfo?(mozilla)

I have this in https://bugzilla.mozilla.org/show_bug.cgi?id=1588183 as a preference. It just missed going into 72 last week. I'll get it in today or tomorrow and request uplift.

Flags: needinfo?(mozilla)

(In reply to J.C. Jones [:jcj] (he/him) from comment #18)

Anyway, I'm open to leaving this on into Beta and see if we get more reports, and we'll see if we can get a nice response out of the enterprise IT department.

Have we seen more reports? Are we ok with shipping 72 as-is at this point?

Flags: needinfo?(jjones)

I have seen no further reports or discussion. I think we can consider this wontfix for 72.

Flags: needinfo?(jjones)

As near as I can determine, there's no intention to fix this on the internal network, and the issue I filed was closed, so Mozilla Firefox will most likely cease to function if this is pushed to 72.

Given that Firefox does not have Chrome's leverage at this point, maybe you should consider disabling the sentinel check when security.enterprise_roots.auto_enabled is true, since enterprise MitM has presumably been forced on at that point and the SSL communications are probably half-broken anyway.

But up to you guys of course.

On the Chrome side, we're currently targeting Chrome 81 (around March 2020) to finish shipping this. We'd already long shipped this for known roots. This removes the bypass for user-installed roots. I believe Safari has also already enabled this in the latest versions of macOS and iOS.
https://chromestatus.com/feature/5128354539765760

This downgrade protection is perfectly compatible with enterprise MITM setups, so disabling it for all configurations means some enterprise environments will needlessly lose some of TLS's security features.

The incompatibility only arises if the products implemented TLS incorrectly and failed to generate a random value for the ServerHello.random field. This is incorrect and, independent of this change, a security risk for your network. Random values in security protocols are there for a reason. Setting them incorrectly invalidates all security analysis of the TLS protocol and means the implementation may be vulnerable to attacks.

Is the product one of the ones listed in https://bugzilla.mozilla.org/show_bug.cgi?id=1590870#c15? If so, the administrators simply need to update it to one of the versions listed. If not, it would be very useful to learn what the affected product is.

No idea; there was no follow-up. I did point them to this bug. They do have the Chrome sentinel check disabled; they're just unlikely to maintain any Firefox config. A great deal in Firefox was disabled recently (dev tools, private browsing, about:config) as part of a global administrative profile, but that may have been done automatically by some tool. Requesting an explicit sentinel disable for Firefox does not seem to interest them, since it is not part of the standard image.

Just a minor update: as of Firefox 72.0.1, all internal Firefox installs are broken with no workaround (since about:config is not accessible).

You might let your IT team know that they can configure this preference via policy.

https://github.com/mozilla/policy-templates/blob/master/README.md#preferences

security.tls.hello_downgrade_check

If they have a global admin profile, they should be able to do this.
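For IT teams taking this route, a sketch of generating the policies.json entry in Python. Assumptions: it uses the object form of the "Preferences" policy ("Value" plus "Status") as documented in the current policy-templates README; older Firefox releases accepted a simpler mapping of pref name to value, so check the README for the version you deploy. The "locked" status and the helper name are illustrative choices. As discussed above, patching the middlebox is the preferred fix; setting this pref to false leaves users exposed to downgrade attacks.

```python
import json

# Hypothetical helper. The pref name comes from this bug; the policy shape
# follows the "Preferences" section of the policy-templates README.
PREF = "security.tls.hello_downgrade_check"


def downgrade_check_policy(enabled: bool) -> dict:
    """Build a policies.json document that sets (and locks) the pref."""
    return {
        "policies": {
            "Preferences": {
                PREF: {
                    "Value": enabled,
                    "Status": "locked",  # users cannot change it in about:config
                }
            }
        }
    }


if __name__ == "__main__":
    # Workaround only: False disables the TLS 1.3 downgrade protection.
    print(json.dumps(downgrade_check_policy(False), indent=2))
```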

Yeah, that's what I tried back in November, with no success (and it's what they had done long ago for Chrome).

However, there's good news in this case at least. Five days after the Firefox rollout, the equipment was patched and fixed. Perhaps there were more Firefox users than they'd thought.

And at least it got the problem corrected properly.

Has Regression Range: --- → yes
Severity: normal → S3