Closed
Bug 929617
Opened 11 years ago
Closed 11 years ago
validity period for stapled OCSP responses is too short (SEC_ERROR_OCSP_OLD_RESPONSE with nginx)
Categories
(Core :: Security: PSM, defect)
Core
Security: PSM
Tracking
()
RESOLVED
FIXED
mozilla28
People
(Reporter: keeler, Assigned: keeler)
References
Details
Attachments
(1 file, 3 obsolete files)
23.42 KB,
patch
|
keeler
:
review+
bajaj
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
As far as I can tell, the validity period for an OCSP response is 24 hours[1] (in terms of its "thisUpdate" time[2]). This means that servers have to update their stapled responses once a day. We have evidence that sites like twitter, tumblr, and soundcloud aren't doing this (bug 929063, bug 929068). Given that we follow the spec[3] and terminate connections when we receive an "unsatisfactory" stapled response, this means these sites are broken when ocsp stapling is enabled.
[1] https://mxr.mozilla.org/mozilla-central/source/security/nss/lib/certhigh/ocsp.c#4333
[2] http://tools.ietf.org/html/rfc6960#section-2.4
[3] http://tools.ietf.org/html/rfc6066#section-8
Assignee | ||
Comment 1•11 years ago
|
||
Slight amendment: the validity period for an OCSP response with no "nextUpdate" value is 24 hours. The validity period for an OCSP response with both "thisUpdate" and "nextUpdate" is that time interval {thisUpdate, nextUpdate} (with a little wiggle room for clock skew).
Assignee | ||
Comment 2•11 years ago
|
||
According to http://tools.ietf.org/html/rfc5019#section-2.2.4, "When pre-producing OCSPResponse messages, the responder MUST set the thisUpdate, nextUpdate, and producedAt times", so there's that.
Comment 3•11 years ago
|
||
I think we misunderstood the requirement from section-2.2.4: "When pre-producing OCSPResponse messages, the responder MUST set the thisUpdate, nextUpdate, and producedAt times as follows:" What happens when not pre-producing, but rather producing on demand?
Section 4 is the authoritative place:
http://tools.ietf.org/html/rfc5019#section-4
"Clients MUST check for the existence of the nextUpdate field and MUST
ensure the current time, expressed in GMT time as described in
Section 2.2.4, falls between the thisUpdate and nextUpdate times. If
the nextUpdate field is absent, the client MUST reject the response.
"If the nextUpdate field is present, the client MUST ensure that it is
not earlier than the current time. If the current time on the client
is later than the time specified in the nextUpdate field, the client
MUST reject the response as stale."
It is not clear whether stapled OCSP responses must conform to RFC 5019's requirements. Also it is unclear whether CAs in our program are required to conform to RFC 5019. The baseline requirements say "OCSP responses MUST conform to RFC2560 and/or RFC5019." The phrase "and/or" is problematic and should be replaced with "and."
Because of this lack of clarify, it is likely that some server implementations of OCSP stapling will be too liberal in trusting OCSP responses where nextUpdate is missing--i.e. they may not refresh these OCSP responses as frequently as we require them to be refreshed. This creates a big interoperability issue. We should ask our CA partners to ensure that they set nextUpdate on all of their OCSP responses. I suggest that we recommend to them that the set nextUpdate = thisUpdate + 48 hours. This balances freshness against potential DoS issues.
Comment 4•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Assignee | ||
Comment 5•11 years ago
|
||
This patch ignores stapled OCSP responses that are expired (meaning that we fall back to OCSP fetching). It depends on the patch in bug 938805 that is near landing.
Assignee | ||
Updated•11 years ago
|
Attachment #8336398 -
Flags: review?(cviecco)
Comment 6•11 years ago
|
||
Comment on attachment 8336398 [details] [diff] [review]
patch
Review of attachment 8336398 [details] [diff] [review]:
-----------------------------------------------------------------
looks good to me.
::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +880,5 @@
> if (rv != SECSuccess) {
> + // Due to buggy servers that will staple expired OCSP responses
> + // (see for example http://trac.nginx.org/nginx/ticket/425),
> + // don't terminate the connection if the stapled response is expired.
> + // We will fall back to actively fetching revocation information.
better to(?):
Will fall back to the non-stapled revocation setup
Attachment #8336398 -
Flags: review?(cviecco) → review+
Comment 7•11 years ago
|
||
Comment on attachment 8336398 [details] [diff] [review]
patch
Review of attachment 8336398 [details] [diff] [review]:
-----------------------------------------------------------------
::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +880,5 @@
> if (rv != SECSuccess) {
> + // Due to buggy servers that will staple expired OCSP responses
> + // (see for example http://trac.nginx.org/nginx/ticket/425),
> + // don't terminate the connection if the stapled response is expired.
> + // We will fall back to actively fetching revocation information.
s/actively //.
@@ +888,5 @@
> + }
> +
> + // Reset the error state, because we're going to continue the connection.
> + rv = SECSuccess;
> + PR_SetError(savedError, 0);
You don't need these lines that set rv and call PR_SetError. Nobody will read rv or PR_GetError() below.
::: security/manager/ssl/tests/unit/head_psm.js
@@ +65,5 @@
> .getService(Ci.nsINSSErrorsService);
> return nssErrorsService.getXPCOMFromNSSError(statusNSS);
> }
>
> +function _getLibraryFunction(functionName, libraryName) {
Name this with _getLibraryFunctionWithNoArguments?
@@ +101,5 @@
> + "ssl3");
> + if (SSL_ClearSessionCache() != 0) {
> + throw "Failed to clear SSL session cache";
> + }
> +}
Should we be calling this function in test_ocsp_stapling.js like we call clearOCSPCache?
::: security/manager/ssl/tests/unit/test_ocsp_stapling.js
@@ +30,5 @@
> + // Server Error is actually faster than attempting to connect to a closed
> + // port.
> + let fakeOCSPResponder = new HttpServer();
> + fakeOCSPResponder.registerPrefixHandler("/", function(request, response) {
> + response.setStatusLine(request.httpVersion, 500, "Internal Server Error");
I haven't thought about it deeply for all cases, but I suspect that this OCSP responder should never be accessed in these tests. Let me know if I am misunderstanding something here.
For example, let's say the server stapled a "malformed" response. We shouldn't be re-verifying the certificate in the cert error runnable because we aren't going to let the user override that malformed response error anyway. In CreateCertErrorRunnable or similar, there is already a case where we bail out early in the error was CERT_ERROR_REVOKED_CERTIFICATE. I suspect we need to add some more cases there.
If what I'm saying above is true, then really what we should be doing here is verifying that the OCSP responder is never accessed, for each test case.
::: security/manager/ssl/tests/unit/test_ocsp_stapling_expired.js
@@ +16,5 @@
> + aExpectedOCSPRequests) {
> + add_connection_test(aHost, aExpectedResult,
> + function() {
> + clearOCSPCache();
> + clearSessionCache();
Shouldn't be be calling clearSessionCache similarly in test_ocsp_stapling.js too?
@@ +21,5 @@
> + gCurrentOCSPResponse = aOCSPResponseToServe;
> + gOCSPRequestCount = 0;
> + },
> + function() {
> + do_check_eq(gOCSPRequestCount, 1);
Why would the request count be 1 for the first test case, where the OCSP response is valid? Shouldn't it be zero for that case?
@@ +32,5 @@
> + let args = [ ["good", "localhostAndExampleCom", "unused" ],
> + ["expiredresponse", "localhostAndExampleCom", "unused"],
> + ["oldvalidperiod", "localhostAndExampleCom", "unused"],
> + ["revoked", "localhostAndExampleCom", "unused"] ];
> + let ocspResponses = generateOCSPResponses(args, "tlsserver");
*** see below.
@@ +45,5 @@
> + ocspResponder.start(8080);
> +
> + add_tls_server_setup("OCSPStaplingServer");
> + add_ocsp_test("ocsp-stapling-expired.example.com", Cr.NS_OK,
> + ocspResponses[0]);
Can we clarify in the names that it is the *OCSP response* that is expired, and not any cert? It is very unclear what these test cases are actually testing. It might be helpful to add a comment describing the scenerio.
Instead of referring to these responses as ocspResponses[0], ocspResponses[1], could we name them at the point I labeled "***" above? e.g.:
const goodResponse = ocspResponses[0];
const expiredResponse = ocspResponses[1];
...
This would make the test much easier to review and much easier for us to modify/debug later.
Attachment #8336398 -
Flags: review?(brian) → review-
Assignee | ||
Comment 8•11 years ago
|
||
Thanks for the review. I've made the changes you pointed out. I have one concern in particular that I'll point out as a comment on the patch.
Attachment #8336398 -
Attachment is obsolete: true
Attachment #8337993 -
Flags: review?(brian)
Assignee | ||
Comment 9•11 years ago
|
||
Comment on attachment 8337993 [details] [diff] [review]
patch v2
Review of attachment 8337993 [details] [diff] [review]:
-----------------------------------------------------------------
::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +448,5 @@
>
> + // The certificate was revoked or there was an OCSP error, so
> + // don't do anything else
> + if (defaultErrorCodeToReport == SEC_ERROR_REVOKED_CERTIFICATE ||
> + defaultErrorCodeToReport == SEC_ERROR_BAD_DATABASE ||
This is the code we get when we try to verify an OCSP response signed by a certificate not in the database. I imagine there are other reasons defaultErrorCodeToReport could be SEC_ERROR_BAD_DATABASE - is this behavior correct in those cases as well as for the OCSP one?
Comment 10•11 years ago
|
||
Comment on attachment 8337993 [details] [diff] [review]
patch v2
Review of attachment 8337993 [details] [diff] [review]:
-----------------------------------------------------------------
Please make the blacklist/whitelist change I suggest below, unless that breaks something.
::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +456,5 @@
> + defaultErrorCodeToReport == SEC_ERROR_OCSP_TRY_SERVER_LATER ||
> + defaultErrorCodeToReport == SEC_ERROR_OCSP_REQUEST_NEEDS_SIG ||
> + defaultErrorCodeToReport == SEC_ERROR_OCSP_UNAUTHORIZED_REQUEST ||
> + defaultErrorCodeToReport == SEC_ERROR_OCSP_UNKNOWN_CERT ||
> + defaultErrorCodeToReport == SEC_ERROR_OCSP_MALFORMED_RESPONSE) {
Instead of having this be a blacklist, should we make it a whitelist of the error codes that we will allow a cert error override for?:
case SEC_ERROR_UNKNOWN_ISSUER:
case SEC_ERROR_CA_CERT_INVALID:
case SEC_ERROR_UNTRUSTED_ISSUER:
case SEC_ERROR_EXPIRED_ISSUER_CERTIFICATE:
case SEC_ERROR_UNTRUSTED_CERT:
case SEC_ERROR_INADEQUATE_KEY_USAGE:
case SEC_ERROR_CERT_SIGNATURE_ALGORITHM_DISABLED:
case SSL_ERROR_BAD_CERT_DOMAIN:
case SEC_ERROR_EXPIRED_CERTIFICATE:
It seems like the original call to CertVerifier::VerifyCert must have returned one of these error codes in order for us to be able to successfully allow the cert override. (And, if we wouldn't allow the cert override in the first place, there's no reason to call CertVerifier::VerifyCert again.) I think this would resolve your issue with SEC_ERROR_BAD_DATABASE more generally.
To answer your question more directly though, SEC_ERROR_BAD_DATABASE isn't an overridable error, so we can short-circuit in that case, as you have done.
Nit: If it is not too much trouble, this change should be done in its own bug or at least its own patch. It doesn't need its own tests.
Attachment #8337993 -
Flags: review?(brian) → review+
Assignee | ||
Comment 11•11 years ago
|
||
Thanks for the quick review. Spun off bug 943115, carrying over r+.
https://tbpl.mozilla.org/?tree=Try&rev=6decd0019e12
Attachment #8337993 -
Attachment is obsolete: true
Attachment #8338107 -
Flags: review+
Assignee | ||
Comment 12•11 years ago
|
||
Had to tweak the library loading because of a difference on windows. Carrying over r+.
Here's a recent try run: https://tbpl.mozilla.org/?tree=Try&rev=21c85ab20d15
(the android xpcshell failure has also been addressed in this patch)
Attachment #8338107 -
Attachment is obsolete: true
Attachment #8338892 -
Flags: review+
Assignee | ||
Comment 13•11 years ago
|
||
Comment 14•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Assignee | ||
Comment 15•11 years ago
|
||
Comment on attachment 8338892 [details] [diff] [review]
patch v2.2
[Approval Request Comment]
Bug caused by (feature/regressing bug #): needed to uplift bug 887321 (OCSP stapling telemetry) to beta
User impact if declined: no OCSP stapling telemetry on beta
Testing completed (on m-c, etc.): landed in 28, no problems since then
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #8338892 -
Flags: approval-mozilla-beta?
Updated•11 years ago
|
Summary: validity period for stapled OCSP responses is too short → validity period for stapled OCSP responses is too short (SEC_ERROR_OCSP_OLD_RESPONSE with nginx)
Comment 16•11 years ago
|
||
(In reply to David Keeler (:keeler) from comment #15)
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): needed to uplift bug 887321 (OCSP
> stapling telemetry) to beta
Besides the OCSP stapling telemetry, this bug fixes a regression from bug 700698, because some websites now occasionally stop working in cases where they were consistently working previously. I, and several users, have run into websites that have this issue. The problem is widespread because it is a bug in nginx, a very widely-deployed web server.
> User impact if declined: no OCSP stapling telemetry on beta
Also, some websites that were working fine in Firefox 25 will work less consistently in Firefox 27+ (and Firefox 26).
> Testing completed (on m-c, etc.): landed in 28, no problems since then
Also, we have automated unit tests for this fix and related functionality to ensure that it is safe. (See the patch.)
> Risk to taking this patch (and alternatives if risky): low
> String or IDL/UUID changes made by this patch: none
I agree with these two.
Blocks: 700698
Updated•11 years ago
|
Attachment #8338892 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 17•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/171046c2495f
(landed with bug 887321, bug 943115, bug 938805, bug 932519, bug 934327, and bug 930209)
status-firefox27:
--- → fixed
status-firefox28:
--- → fixed
Comment 18•9 years ago
|
||
I did uninstall the old firefox and download new one without any change. And mozilla homepage and wikipedia should load correctly. but no SEC_ERROR_OCSP_OLD_RESPONSE
Assignee | ||
Comment 19•9 years ago
|
||
(In reply to sigvard from comment #18)
> I did uninstall the old firefox and download new one without any change. And
> mozilla homepage and wikipedia should load correctly. but no
> SEC_ERROR_OCSP_OLD_RESPONSE
Please file a new bug: https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=Security%3A%20PSM
You need to log in
before you can comment on or make changes to this bug.
Description
•