+++ This bug was initially created as a clone of Bug #1368433 +++ bug 1368433 (and many, many duplicates) report being unable to reach a vast swath of Microsoft cloud-hosted servers due to incorrect stapled OCSP responses. The quick fix in that bug is to disable stapling. To avoid cross-conversations in that largely operational "stop the pain" bug I'm opening this to consider what we should do about stapling in Firefox 54 (shipping in two weeks!) and beyond. In bug 1368433 comment 26 ekr suggests making stapling "soft-fail" as Chrome does, so let's start with that. The advantage is that stapling can succeed and then we get the performance and privacy benefits we sought. If it fails because the server set their automation up incorrectly it fails over to the CA's server who presumably are better at that kind of thing. The disadvantage is that if stapling is broken the server likely won't ever know because the site will just work. We should at the very least spit an error into the web console to give sites a fighting chance. Do we need to distinguish between error reasons? If the stapled response is an outright "REVOKED" status and otherwise completely validly signed can we hard-fail on that, at least? AIUI that's not the kind of error MS had.
Ultimate decision-making and implementation probably falls on Keeler.
Tracking for 54 and 55. We may also want to ship a fix for this in ESR52.
We already more or less disregard expired stapled OCSP responses, so we may be able to also disregard responses from expired OCSP responders.
Assignee: nobody → dkeeler
Priority: -- → P1
Whiteboard: [parity-Chrome][parity-Edge] → [parity-Chrome][parity-Edge][psm-assigned]
Since the big problem was fixed by Microsoft I don't think we need to consider this a blocker for 54 anymore. Keeler, anything happening here for 55/56? Are we simply wontfixing this for 54 and sticking with the status quo for now?
The crypto engineering team has been discussing this, and I think we're leaning towards keeping this behavior in that Firefox displays an error when it encounters unacceptable stapled OCSP information (the benefit being that otherwise Microsoft might have never discovered there was a problem). We will probably need to change the error classification such that it's overridable, though, because otherwise users just have to switch browsers to complete their tasks.
I'm not enthusiastic about making it overridable because there are *other* errors where overriding is very dangerous and users have trouble distinguishing them. Why not instead allow the connection to proceed but provide an infobar that shows the problem or some other non-modal warning. I'm fairly concerned about being the only browser that has this behavior where other browsers succeed... NI-ing Panos.
We have been trying to avoid infobars in general as a notification mechanism, but it may make sense in this case and there is at least some precedent. A somewhat similar situation occurs with a captive portal, where we display an error page when failing to connect to a site, but use an infobar in other tabs that are still functioning. Needinfo-ing jsavory who works on security and bram who designed our error pages for the UX perspective.
I think using the infobar makes sense in this case. If we maintain our own or use Chrome’s CRLSets, would it help make these infobars appear less? Photon has a design for the infobar, as seen here: https://mozilla.invisionapp.com/share/7DC3VVUJZ#/229252107_Notification_-_Default – see the next two pages, as well. If we decide to go with it, there are three broad things to consider. #1 What’s the nature of this problem? This will determine the infobar style that we’ll use in Photon. If informational or not dangerous (like captive portal) then use the grey/white bar. If potentially dangerous, use yellow. If definitely dangerous (e.g. you’ve just ignored a security warning), use red. #2 Is there a way to phrase this problem that’s accurate, friendly and readable? Keep in mind that most users won’t read the infobar, and those who read it won’t care to finish if it’s jargony. There’s a possibility to explain more by using a “Learn more…” link that leads to SUMO, for example. #3 What’s the action that we want to encourage our users to do? There should be one. This is why we choose to use the infobar, after all. Is it to report the problem? Is it to go back to safety? If it’s just a notification that’s like saying “Just so you know”, and the only action is a close “x” button, we should consider not interrupting the user at all.
Sounds like this is still under discussion, not something that going to ship in 55 and likely not 56. I'll drop tracking for now but please re-nom if we start planning to ship this as a new feature.
Assignee: dkeeler → nobody
Priority: P1 → P2
Whiteboard: [parity-Chrome][parity-Edge][psm-assigned] → [parity-Chrome][parity-Edge][psm-backlog]
I guess whatever ends up happening here won't be 52esr material.
Summary: re-think OCSP stapling for Fx54 and later → re-think strictness of OCSP stapling given that other browsers aren't as strict
Duplicate of this bug: 1390117
Duplicate of this bug: 1408727
Duplicate of this bug: 1409013
David Keeler has marked bug 1408727 as duplicate to this, but in this thread there is no description of the impact of the FF "strictness" let me summarize - On Sunday (15.Oct.2017) Firefox has blocked for about 24 hours random internet sites. These sites are running on different web servers(IIS, AkamaiGHost, cloudflare-nginx ...) These sites have HTTPS certificates from different CA (GeoTrust, Symantec, thawte). The certificates were valid (in common sense of valid - trusted, not in CRL, can be validated by OCSP). There was no way regular user to access the blocked sites using Firefox (except if he is smart enough to set security.ssl.enable_ocsp_stapling=false). The site owners couldn't do anything to help the affected users (now there is a recommendation from symantec to restart web service on the impacted server to force caching to renew). This bug is 5 months old, are there any plans to be fixed?
We're working on addressing this.
Assignee: nobody → dkeeler
Priority: P2 → P1
Whiteboard: [parity-Chrome][parity-Edge][psm-backlog] → [parity-Chrome][parity-Edge][psm-assigned]
Comment on attachment 8927072 [details] bug 1368868 - give up on ocsp stapling strictness because we can't have nice things https://reviewboard.mozilla.org/r/198290/#review203760 Time for another flag, I think... but this patch looks good, and let's not hold it up. We should nominate it for uplift to 58 beta, since I think we're missing the merge. ::: security/certverifier/NSSCertDBTrustDomain.cpp:424 (Diff revision 1) > + // Stapled OCSP response present but invalid for a small number of reasons > + // CAs/servers commonly get wrong. This will be treated similarly to an > + // expired stapled response. > + mOCSPStaplingStatus = CertVerifier::OCSP_STAPLING_INVALID; > + MOZ_LOG(gCertVerifierLog, LogLevel::Debug, > + ("NSSCertDBTrustDomain: stapled OCSP response: " Dan's recommendation was that we print to the console here, somehow. Please open a follow-up to figure out how to do that.
Attachment #8927072 - Flags: review?(jjones) → review+
Comment on attachment 8927072 [details] bug 1368868 - give up on ocsp stapling strictness because we can't have nice things https://reviewboard.mozilla.org/r/198290/#review203760 Thanks! I filed a follow-up bug for the console warning. I also added some more tests - could you have another look? > Dan's recommendation was that we print to the console here, somehow. Please open a follow-up to figure out how to do that. Filed bug 1416327.
ni? to ensure you see comment 21.
Comment on attachment 8927072 [details] bug 1368868 - give up on ocsp stapling strictness because we can't have nice things https://reviewboard.mozilla.org/r/198290/#review204128 You're right, these are really good tests to add.
Had to add a minor pref fixup to the modified tests for android: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc43e3f39633 https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc1700eb7437 Thanks for the reviews!
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/3c298625e967 give up on ocsp stapling strictness because we can't have nice things r=jcj
Please request Beta approval when you're comfortable doing so.
Approval Request Comment [Feature/Bug causing the regression]: ocsp stapling strictness (long-present feature) [User impact if declined]: potentially unable to access some sites that other browsers can access [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: yes [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: low-to-medium [Why is the change risky/not risky?]: The code change is minimal, but this is in a complicated, security-sensitive area of the code. There is a precedent for the change we made, however, so I'm fairly confident in it. [String changes made/needed]: none
Comment on attachment 8929555 [details] [diff] [review] patch for beta Let's take that in 58. It is a lost cause for now as it is hurting us :/
Attachment #8929555 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to David Keeler [:keeler] (pto until 27th) from comment #29) > [Is this code covered by automated tests?]: yes > [Needs manual test from QE? If yes, steps to reproduce]: no Marking this issue as qe-verify- based that is is covered by automation and manual testing is not required.
8 months ago
Duplicate of this bug: 1497491
You need to log in before you can comment on or make changes to this bug.