Open Bug 1323141 Opened 8 years ago Updated 2 years ago

tryLater OCSP response causes hard failure when stapled

Categories

(Core :: Security: PSM, defect, P3)

People

(Reporter: mt, Unassigned)

Details

(Whiteboard: [psm-backlog])

Attachments

(1 file)

This is probably a WONTFIX, but I want to get this on the record.

www.isc.org is currently stapling an OCSP response that includes a tryLater response.  That's insane, but...

Chrome connects happily.  However Firefox aborts the load.

All the OCSP prefs are at their default.  Critically, I don't have security.OCSP.require enabled.  It's only when I set security.ssl.enable_ocsp_stapling to false that it loads.

As I type this in, the problem has corrected itself, so while I did test Edge (it loaded), I don't know if the problem was already fixed or not.
Ryan, do you have any comments on this? I can't find much discussion of this particular issue, that is, what to do when a stapled OCSP response is invalid for whatever reason (which may, of course, depend on what that reason is).
Flags: needinfo?(ryan.sleevi)
re: Comment #1: I suspect that's a broader conversation the Mozilla security team would want to have, and y'all do you, but a few thoughts:

1) From the Chrome side, we don't (currently) validate the OCSP stapled response for anything. We're mostly just grabbing it for SCTs, but we also push it to 'the OS' (aka either slap it to NSS or pass it to Windows).
2) When I say "slap it to NSS", we're still using the libpkix path in Chrome, and so it may simply be a behaviour divergence with moz::pkix
3) There's a philosophical debate about what to do if we get stapled junk
  a) One view says that unless we're in a must-staple mode, then stapling junk is no different than receiving junk from the CA, and we should treat it the same (e.g. "tryLater" is thus ignored, unless a positive response is required - such as the case for EV historically)
  b) Another view says that Postel's Law is more of a suggestion, and to help the ecosystem, we should abort the load. Servers should know not to staple junk, because as long as servers are stapling junk sometimes, then must-staple is impractical to deploy (since using must-staple with a junk-stapling server == broken users)
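
The two views in (3) can be written down as a toy decision function. This is only a sketch under stated assumptions: the names `Stapled` and `should_abort` and the flags `must_staple`, `require_positive`, and `strict` are hypothetical, and the code does not reflect any browser's or library's actual logic.

```python
from enum import Enum


class Stapled(Enum):
    """What the server stapled, reduced to the cases discussed above (hypothetical model)."""
    NONE = "none"        # no stapled response at all
    GOOD = "good"        # successful response, certificate status good
    REVOKED = "revoked"  # successful response, certificate status revoked
    ERROR = "error"      # stapled "junk", e.g. a tryLater error response


def should_abort(stapled, must_staple, require_positive, strict):
    """Return True if the page load should be aborted.

    strict=False models view (a): stapled junk is treated like junk from the CA.
    strict=True models view (b): stapled junk always aborts the load.
    """
    if stapled is Stapled.REVOKED:
        return True
    if stapled is Stapled.GOOD:
        return False
    if stapled is Stapled.NONE:
        # Nothing stapled: in this simplified model that only matters
        # when the certificate demands a staple.
        return must_staple
    # stapled is Stapled.ERROR
    if strict:
        return True
    # View (a): ignore the junk unless a positive response is required.
    return must_staple or require_positive
```

With no must-staple and no requirement for a positive response, `should_abort(Stapled.ERROR, False, False, strict=False)` lets the load proceed, while the same call with `strict=True` aborts it.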

We (Chrome) haven't decided on a-vs-b, but I am very much a believer in the b-camp. However, whether or not b is practical depends on the failure rate/stapled junk rate, and Emily Stark was working to explore and gather metrics there in conjunction with her expect-staple explorations. I'm not sure what the current status of that is, though.

I think Firefox is doing the right thing, instinctively, but I don't believe that was a data-driven or experimentally-supported change, so you might find it difficult to justify that if you find it breaks users. Best I can do for now is to cheer.

Does that help?
Flags: needinfo?(ryan.sleevi)
Cheering does help.

This must be rare, but obviously not rare enough.  If we could expand our telemetry to cover the error code, we might be able to make an informed decision.

One very important piece of context... Check out the default for Apache, which is - as I said - bananas: https://httpd.apache.org/docs/2.4/mod/mod_ssl.html#sslstaplingreturnrespondererrors
Priority: -- → P3
Whiteboard: [psm-backlog]

This bug in Firefox is keeping us, as a hosting operator running Apache, from activating server-side OCSP stapling, which is a useful privacy feature and is almost a mandatory requirement if you serve websites for Dutch governmental organizations.

TryLater is described by OCSP RFC 6960, and is a fully valid OCSP response, albeit one that does not provide a clear yes or no decision. The Apache behaviour is therefore certainly not more or less bananas than the RFC for OCSP is bananas.

If security.OCSP.require is false/off (the default) and the certificate does NOT have the Must-Staple extension there seems no compelling reason to stop the pageload. Indeed, if there is already an up to date OCSP response in the internal Firefox cache, then Firefox doesn't stop the pageload. After all, if OCSP.require is off, then it is an answer that you want, but not to the point of stopping the presses if it is not answered that instant.

Furthermore, the browser is now presenting a failure page to the user that is useless for roughly 99.9999% of all users, while users of other browsers just see the site loaded. You may argue that it is done for the security of the user, but clearly if someone was using a certificate that has just been retracted for nefarious reasons, that someone would let the server return no staple, and Firefox would happily load the page, so this blocking behaviour with a TryLater response makes no sense to me.

Not only should Firefox accept a TryLater response when security.OCSP.require is off, it should also accept all other possible non-authoritative responses of an OCSP responder as defined by the RFC, because sooner or later they will get stapled to a server response, relayed from the original responder of the certificate authority.

Surely, with the present focus on privacy, it should really be in Firefox's interest to help server operators activate OCSP stapling.

RFC for OCSP: https://tools.ietf.org/html/rfc6960

> TryLater is described by OCSP RFC 6960, and is a fully valid OCSP response, albeit one that does not provide a clear yes or no decision.

tryLater is clearly defined in Section 2.3 of RFC 6960 as an "error message", just like internalError and malformedRequest and so on.
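
For reference, the status codes in question can be sketched as follows. The enum values follow the `OCSPResponseStatus` definition in RFC 6960; the helper names `response_status` and `is_error`, and the constant `TRY_LATER_DER`, are hypothetical. The parser is a deliberately minimal illustration that reads only the outer SEQUENCE header and the leading ENUMERATED, and does no signature or freshness checking.

```python
from enum import IntEnum


class OCSPResponseStatus(IntEnum):
    """Top-level OCSP response status values from RFC 6960."""
    SUCCESSFUL = 0
    MALFORMED_REQUEST = 1
    INTERNAL_ERROR = 2
    TRY_LATER = 3
    # value 4 is not used
    SIG_REQUIRED = 5
    UNAUTHORIZED = 6


def response_status(der: bytes) -> OCSPResponseStatus:
    """Read responseStatus from a DER-encoded OCSPResponse (minimal sketch)."""
    if len(der) < 2 or der[0] != 0x30:
        raise ValueError("not a DER SEQUENCE")
    pos = 1
    first = der[pos]
    pos += 1
    if first & 0x80:
        # Long-form length: the low 7 bits give the number of length octets.
        pos += first & 0x7F
    if len(der) < pos + 3 or der[pos:pos + 2] != b"\x0a\x01":
        raise ValueError("expected a one-byte ENUMERATED responseStatus")
    return OCSPResponseStatus(der[pos + 2])


def is_error(status: OCSPResponseStatus) -> bool:
    """Everything except 'successful' is an unsigned error message."""
    return status is not OCSPResponseStatus.SUCCESSFUL


# A bare tryLater response: SEQUENCE { ENUMERATED 3 }, with no responseBytes.
TRY_LATER_DER = bytes.fromhex("30030a0103")
```

Here `response_status(TRY_LATER_DER)` yields `OCSPResponseStatus.TRY_LATER`, which `is_error` classifies as an error rather than as an answer about the certificate.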

> The Apache behaviour is therefore certainly not more or less bananas than the RFC for OCSP is bananas.

Note that the Apache documentation for this "feature" says it is for "unsuccessful stapling related OCSP queries". It is not bananas for a security protocol to define error message codes for responses. The only bananas thing here is the default setting for mod_ssl's SSLStaplingReturnResponderErrors directive, but the value of the directive can be easily changed in the config file.

First off, I am sorry for reusing the word bananas. It was silly of me, and it clouds the issue, my apologies.

Let me put it like this:

As a server operator I like the idea of OCSP stapling. It removes the need for browsers to be individually hammering OCSP responders, it increases the probability of the browser having an up to date OCSP response available, and it brings important privacy benefits. So, even while it means an extra networking effort on the part of the server, and with that the risks of introducing failure modes, the benefits for the end-user outweigh those disadvantages and OCSP stapling code has duly been added to major webserver software. And apart from that, customers that have their site hosted demand the feature, and governmental bodies in the Netherlands are currently putting the pressure on to have this feature enabled.

We as a hosting provider had this feature enabled in Apache for a while, but had to retract it because the configuration was unstable, mainly when OCSP responders were being DoS'ed. As it turns out after further testing, we can enable it and have it degrade without too much loss of site uptime, but this will mean that Firefox users are at risk of being shut out.

There can be very long arguments about whether the Apache software is correct or not in its use of a TryLater response, and about what policy clients should apply in deciding what to accept, and I certainly have an opinion about that, as you probably noticed. However, the fact is that setting SSLStaplingReturnResponderErrors to off in Apache 2.4 will NOT suppress the TryLater response that is sent when Apache can't reach a DoS'ed or dysfunctional OCSP responder. And leaving SSLStaplingReturnResponderErrors on is not an option, because it means an unacceptable performance degradation in those situations, automatically turning a DoS of the OCSP responder into a DoS of the hosted website.

I am pushing for a change in the Apache 2.4 code, one that is already in the alpha code, so that Apache does not return a TryLater response when SSLStaplingReturnResponderErrors is set to off. However, the reality is that this change, even if it is adopted quickly, has to land in LTS distributions first, and that means that most web servers running Apache won't be able to use it until about 2023 at the earliest. At the same time, we as a hosting provider are looking to enable this within the coming year. As it stands, Firefox users, of which I personally am one, will then lose out. And for the benefit of what?

Also, if you can, help push for different behaviour in Apache: https://bz.apache.org/bugzilla/show_bug.cgi?id=60182

Severity: normal → S3