Closed Bug 933109 Opened 11 years ago Closed 11 years ago

"Unknown" OCSP responses are being cached in memory, preventing us from receiving a "Good" response

Categories

(NSS :: Libraries, defect, P1)

Tracking

(firefox25 wontfix, firefox26 wontfix, firefox27 fixed, firefox28 fixed, firefox29 fixed, firefox-esr17 wontfix, firefox-esr2427+ fixed, b2g18 affected, b2g-v1.1hd affected, b2g-v1.2 affected)

RESOLVED FIXED
3.15.4
Tracking Status
firefox25 --- wontfix
firefox26 --- wontfix
firefox27 --- fixed
firefox28 --- fixed
firefox29 --- fixed
firefox-esr17 --- wontfix
firefox-esr24 27+ fixed
b2g18 --- affected
b2g-v1.1hd --- affected
b2g-v1.2 --- affected

People

(Reporter: briansmith, Assigned: briansmith)

References

(Blocks 1 open bug)

Details

(Whiteboard: [see links to user reports in comment 4][qa-])

Attachments

(3 files, 5 obsolete files)

See bug 745747 comment 24 and whatever discussion follows that.

We need to make sure that if we've received an "Unknown" response and cached it, that when we do OCSP fetching we try to fetch a response again, even if the "Unknown" response hasn't expired yet. If the fetch fails, then return SEC_ERROR_OCSP_UNKNOWN_CERT.

If we don't do this, then some sites will get DoS'd temporarily when they first get issued a cert and the CA's OCSP responder returns "Unknown" for that cert due to not (yet) being aware of its issuance. This is happening surprisingly often in the real world, based on reports from Twitter and Hacker News that I've seen. It is happening more frequently now because of the baseline requirements requiring CAs to return "Unknown" for unknown certs, where many of them were previously returning "Good" for unknown certs. It is especially bad because SEC_ERROR_OCSP_UNKNOWN_CERT is not overridable.

We should fix this soon if it is at all reasonable. To fix it right away in Firefox would mean to implement what I suggest in bug 745747 comment 24 (or similar) in NSS's classic cert verification code. For EV, we'd have to do the same for libpkix. It would be more important to do the classic mode fix first, since most certs are not EV.
Sid, can you suggest somebody to work on this? I have too much high-priority stuff on my plate right now. Thanks.
Flags: needinfo?(sstamm)
Keeler offered to take this.
Assignee: nobody → dkeeler
Status: NEW → ASSIGNED
Flags: needinfo?(sstamm)
It's not a PSM patch, but for the time being, perhaps the most effective solution would consist of not caching unknown responses in any case (neither from POST replies nor stapled ones), as illustrated by the attached patch. cert_RememberOCSPProcessingFailure will then prevent further requests for that cert for 1h by default.
Assignee: dkeeler → brian
See Also: → 943815
The fix for this bug may also need to handle the "extended revoked" status in a similar way. See bug 943815 for a description of "extended revoked."
Depends on: 943952
Comment on attachment 8335124 [details] [diff] [review]
Treat replies with "unknown" status as temporary failures

(In reply to Kaspar Brand from comment #6)
> Created attachment 8335124 [details] [diff] [review]
> Treat replies with "unknown" status as temporary failures

This patch probably doesn't behave as you had intended.

Firefox runs in relaxed OCSP mode. (Ignoring OCSP network errors.)

When executing for the first time, the OCSP code will return a failure (because of the unknown status).

When executing again (e.g. when reloading the page), the cache entry created by cert_RememberOCSPProcessingFailure will be found. However, because of the relaxed OCSP mode, the remembered failure will be ignored.

As a result, I believe, this patch has the effect to ignore the unknown status and proceed anyway (at least on the second and subsequent load attempts).
Attachment #8335124 - Flags: review-
I think a better patch, with less unexpected side effects, would be:

Cache the unknown status response, but cache it for a very short amount of time. Five minutes?
It seems the logic in PKIX_RevocationChecker_Check is stricter than the classic code. Even with an unknown status locally cached, with the cache item still fresh, it enforces an OCSP retrieval from network anyway, because the cache item is treated as having an invalid status.
Attached patch patch v3, work in progress (obsolete) — Splinter Review
Similar issues affected GlobalSign (and thus CloudFlare, SoundCloud, etc.) today. So, even more reason why we need to make sure this patch gets into Firefox 27.
There is no working patch yet.
Component: Security: PSM → Libraries
Product: Core → NSS
Target Milestone: mozilla28 → 3.15.4
Version: Trunk → trunk
Attached patch ocsp-unknown-caching-fix.patch (obsolete) — Splinter Review
This patches makes the following changes:

1. 'Unknown' responses are always considered stale.

2. When we encounter a stale 'Unknown' or 'Revoked' response, we will continue to try to fetch a newer one like before. However, if that fetch fails, and hard fail mode is not enabled, then we will fall back to the stale 'Unknown' or 'Revoked' response.

3. We will now never replace an 'Unknown' or 'Revoked' response with a negative (failure) cache entry like before; instead, an 'Unknown' or 'Revoked' cache entry can only be replaced by a valid signed entry.
Attachment #8335124 - Attachment is obsolete: true
Attachment #8339540 - Attachment is obsolete: true
Attachment #8350814 - Flags: review?(wtc)
I had some trouble doing the testing with the NSS OCSP testing code that Kai started in this bug.

For now, I have written these Gecko tests to verify that the behavior is as we intend. The tests verify that:

1. Unknown responses are still put into the cache.
2. We use cached Unknown responses when an OCSP fetch fails.
3. A newer 'Good' response fetched from the network will take precedence over an existing 'Unknown' response in the cache.
4. A newer 'Good' response fetched from the network will replace an existing 'Unknown' response in the cache and that the newly-cached 'Good' response will be used to satisfy future requests.
Attachment #8350816 - Flags: review?(dkeeler)
Comment on attachment 8350814 [details] [diff] [review]
ocsp-unknown-caching-fix.patch

Review of attachment 8350814 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc. The patch seems reasonable, but it may be prudent to review it more.
I'll try to review it again tomorrow. You can check it in first.

::: lib/certhigh/ocsp.c
@@ +758,5 @@
> +
> +    /* Work around broken OCSP responders that return unknown responses for
> +     * certificates, especially certificates that were just recently issued.
> +     * Forcing cacheItem->nextFetchAttemptTime to zero will cause us to attempt
> +     * to update the cache entry every time we look at it.

The comment says "forcing cacheItem->nextFetchAttemptTime to zero", but
the code below doesn't do that. Is the comment wrong? Or does the comment
describe code in some other function?

@@ +761,5 @@
> +     * Forcing cacheItem->nextFetchAttemptTime to zero will cause us to attempt
> +     * to update the cache entry every time we look at it.
> +     */
> +    if (fresh && cacheItem->certStatusArena &&
> +        cacheItem->certStatus.certStatusType == ocspCertStatus_unknown) {

Why is it necessary to test cacheItem->certStatusArena?

@@ +765,5 @@
> +        cacheItem->certStatus.certStatusType == ocspCertStatus_unknown) {
> +        fresh = PR_FALSE;
> +    }
> +
> +    OCSP_TRACE(("OCSP ocsp_IsCacheItemFresh: %d\n", fresh));

Move this OCSP_TRACE statement outside PR_ExitMonitor.

@@ +4981,2 @@
>              rv = SECSuccess;
> +        } else if (ocsp_IsCacheItemFresh(cacheItem)) {

Nit: we can call ocsp_IsCacheItemFresh(cacheItem) at line 4973 and save its return value in a local variable.

@@ +4983,2 @@
>              /*
>               * No status cached, the previous attempt failed.

This comment needs to be updated because the condition here is now "no status cached, and cache item is fresh".

@@ +4997,1 @@
>          }

Should we add an "else" branch here and set *cacheFreshness to ocspStale?

Or we can just do
    *cacheFreshness = ocsp_IsCacheItemFresh(cacheItem) ? ocspFresh : ocspStale;
at line 4973.

::: lib/certhigh/ocspi.h
@@ +40,5 @@
>                                   PRTime time,
>                                   PRBool addServiceLocator,
>                                   CERTCertificate *signerCert);
>  
> +typedef enum { ocspFresh, ocspStale, ocspMissing } OCSPFreshness;

Nit: the enums will get assigned the values 0, 1, 2. It would be nice to let ocspMissing get the 0 value, by listing it first.
Attachment #8350814 - Flags: review?(wtc) → review+
Comment on attachment 8350816 [details] [diff] [review]
unknown-caching-tests.patch for Gecko

Review of attachment 8350816 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, but I'm surprised the tests in test_ocsp_stapling_expired.js need to be removed.

::: security/manager/ssl/tests/unit/test_ocsp_stapling_expired.js
@@ -63,5 @@
>    // server staples an OCSP response with a recent signature but with an
>    // out-of-date validity period. The certificate has not expired.
>    add_ocsp_test("ocsp-stapling-expired.example.com", Cr.NS_OK,
> -                ocspResponseGood);
> -  add_ocsp_test("ocsp-stapling-expired-fresh-ca.example.com", Cr.NS_OK,

Why do your changes affect the tests in this file?

@@ -83,2 @@
>    add_test(function() { ocspResponder.stop(run_next_test); });
> -  add_test(check_ocsp_stapling_telemetry);

I'm almost certain this doesn't need to be removed - just update the expected telemetry values, below.

::: security/manager/ssl/tests/unit/test_ocsp_unknown_caching.js
@@ +14,5 @@
> +  let ocspResponder = new HttpServer();
> +  ocspResponder.registerPrefixHandler("/", function(request, response) {
> +    ++fetchCount;
> +
> +    dump("fetchCount: " + fetchCount + "\n");

dump isn't handled by the test infrastructure output - use do_print if you think it's important that this information be available (and wherever else you use dump)

@@ +81,5 @@
> +                      clearSessionCache);
> +  add_test(function() { do_check_eq(fetchCount, 2); run_next_test(); });
> +
> +  add_test(function() { ocspResponder.stop(run_next_test); run_next_test(); });
> +

Do you think it's useful to check the stapling telemetry at the end of the test?
Attachment #8350816 - Flags: review?(dkeeler) → review-
David, it was a mistake to include the changes to OCSPCommon.cpp and test_ocsp_stapling_expired.js in my patch. I had made those changes when I was trying to diagnose a bug in my patch. I have removed those changes from the patch.

I updated the patch to use do_print() instead of dump().

I did not add the OCSP stapling telemetry to the patch. Tests should try to test one thing when possible. I think we have enough testing of the telemetry.
Attachment #8350855 - Flags: review?(dkeeler)
Attachment #8350816 - Attachment is obsolete: true
Comment on attachment 8350814 [details] [diff] [review]
ocsp-unknown-caching-fix.patch

Review of attachment 8350814 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/certhigh/ocsp.c
@@ +758,5 @@
> +
> +    /* Work around broken OCSP responders that return unknown responses for
> +     * certificates, especially certificates that were just recently issued.
> +     * Forcing cacheItem->nextFetchAttemptTime to zero will cause us to attempt
> +     * to update the cache entry every time we look at it.

My patch originally did force cacheItem->nextFetchAttemptTime to zero, which probably has the same effect, but I changed it to work the way the patch currently works when debugging the patch. I think the current way is clearer.

@@ +761,5 @@
> +     * Forcing cacheItem->nextFetchAttemptTime to zero will cause us to attempt
> +     * to update the cache entry every time we look at it.
> +     */
> +    if (fresh && cacheItem->certStatusArena &&
> +        cacheItem->certStatus.certStatusType == ocspCertStatus_unknown) {

My understanding is that the members of cacheItem->certStatus are undefined when certStatusArena is NULL, based on how other code that accesses certStatusArena is written and documented.

@@ +765,5 @@
> +        cacheItem->certStatus.certStatusType == ocspCertStatus_unknown) {
> +        fresh = PR_FALSE;
> +    }
> +
> +    OCSP_TRACE(("OCSP ocsp_IsCacheItemFresh: %d\n", fresh));

Instead, I removed the PR_EnterMonitor and PR_ExitMonitor calls, because this function is always called with the monitor entered already.

@@ +4981,2 @@
>              rv = SECSuccess;
> +        } else if (ocsp_IsCacheItemFresh(cacheItem)) {

I did something very similar.

@@ +4983,2 @@
>              /*
>               * No status cached, the previous attempt failed.

I believe nothing has changed here. This block was previously already guarded by ocsp_IsCacheItemFresh(cacheItem). Please see if the new version of the code is clearer.

@@ +4997,1 @@
>          }

Effectively, that is what I have done.

::: lib/certhigh/ocspi.h
@@ +40,5 @@
>                                   PRTime time,
>                                   PRBool addServiceLocator,
>                                   CERTCertificate *signerCert);
>  
> +typedef enum { ocspFresh, ocspStale, ocspMissing } OCSPFreshness;

Done.
Review comments addressed.
Attachment #8350814 - Attachment is obsolete: true
Attachment #8350858 - Flags: review?(wtc)
Comment on attachment 8350858 [details] [diff] [review]
ocsp-unknown-caching-fix.patch [v2]

Review of attachment 8350858 [details] [diff] [review]:
-----------------------------------------------------------------

I reviewed the patch again, this time very carefully.
I understand the patch much better now.

I have some new comments.

::: lib/certhigh/ocsp.c
@@ +800,5 @@
> +     * response to replace it.
> +     */
> +    if (!single && cacheItem && cacheItem->certStatusArena &&
> +        (cacheItem->certStatus.certStatusType == ocspCertStatus_revoked ||
> +         cacheItem->certStatus.certStatusType == ocspCertStatus_unknown)) {

Please explain why we should skip the
ocsp_FreshenCacheItemNextFetchAttemptTime call at the end of the
function. I agree it is OK to skip the ocsp_CheckCacheSize call.

I don't need a detailed explanation. A simple confirmation that
you considered this issue would be good enough.

@@ +4946,5 @@
>   * in a failure, missingResponseError shows the error code of
>   * that failure.
>   */
>  SECStatus
> +ocsp_GetCachedOCSPResponseStatus(CERTOCSPCertID *certID,

Please update the comment for this function. In particular,
"Return value SECFailure means: not found or not fresh" is no
longer true.

@@ +4976,5 @@
>              if (*rvOcsp != SECSuccess) {
>                  *missingResponseError = PORT_GetError();
>              }
>              rv = SECSuccess;
> +        } else if (*cacheFreshness == ocspFresh) {

I now think this should be changed back to just "} else {".

This way, all cache items will get the same treatment whether they
contain a certStatus or an error. Right now the patch leaves this
code in a half-changed state, making it hard to understand.

I don't understand this code as well as you do, so I may have
missed something subtle.

@@ +5088,5 @@
>                                         &certIDWasConsumed, 
>                                         &rvOcsp);
>      if (rv != SECSuccess) {
> +        if (ocsp_FetchingFailureIsVerificationFailure()) {
> +            rvOcsp = SECFailure;

Should we call PORT_SetError here? I guess
ocsp_GetOCSPStatusFromNetwork has set the error code. If so, there
is still a small risk that ocsp_FetchingFailureIsVerificationFailure
may modify the error code.

@@ +5091,5 @@
> +        if (ocsp_FetchingFailureIsVerificationFailure()) {
> +            rvOcsp = SECFailure;
> +        } else if (cachedResponseFreshness == ocspStale &&
> +                   (cachedErrorCode == SEC_ERROR_OCSP_UNKNOWN_CERT ||
> +                    cachedErrorCode == SEC_ERROR_REVOKED_CERTIFICATE)) {

Do we also need to check that rv == SECSuccess &&
rvOcsp != SECSuccess? Or is that implied by a nonzero
cachedErrorCode?

::: lib/certhigh/ocspi.h
@@ +40,5 @@
>                                   PRTime time,
>                                   PRBool addServiceLocator,
>                                   CERTCertificate *signerCert);
>  
> +typedef enum { ocspMissing, ocspFresh, ocspStale } OCSPFreshness;

We probably should name this enum type OCSPCacheFreshness, with values ocspCacheMissing, ocspCacheFresh, and ocspCacheStale.

The reason is that "ocspMissing" is ambiguous. It can mean "the cache item is missing" or "we have a cache item but the cert status is missing (i.e., we have an error entry)".
Attachment #8350858 - Flags: review?(wtc)
Attachment #8350858 - Flags: review+
Comment on attachment 8350814 [details] [diff] [review]
ocsp-unknown-caching-fix.patch

Review of attachment 8350814 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/certhigh/ocsp.c
@@ +4983,2 @@
>              /*
>               * No status cached, the previous attempt failed.

You are right. The comment "No status cached" refers to the fact
that cacheItem->certStatusArena is NULL, which is still true.
Comment on attachment 8350858 [details] [diff] [review]
ocsp-unknown-caching-fix.patch [v2]

Review of attachment 8350858 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/certhigh/ocsp.c
@@ +800,5 @@
> +     * response to replace it.
> +     */
> +    if (!single && cacheItem && cacheItem->certStatusArena &&
> +        (cacheItem->certStatus.certStatusType == ocspCertStatus_revoked ||
> +         cacheItem->certStatus.certStatusType == ocspCertStatus_unknown)) {

ocsp_FreshenCacheItemNextFetchAttemptTime would cause us to extend the amount of time we consider the possibly-already-stale existing entry to be fresh upon failure to update it. AFAICT, that's not what we want; we want to keep trying to update it on failure.

@@ +4946,5 @@
>   * in a failure, missingResponseError shows the error code of
>   * that failure.
>   */
>  SECStatus
> +ocsp_GetCachedOCSPResponseStatus(CERTOCSPCertID *certID,

Done. SECFailure now means that the function failed in a fatal way. It will now return SECSuccess for missing, fresh, or stale entries. Please see the new documentation in the new patch.

@@ +4976,5 @@
>              if (*rvOcsp != SECSuccess) {
>                  *missingResponseError = PORT_GetError();
>              }
>              rv = SECSuccess;
> +        } else if (*cacheFreshness == ocspFresh) {

The full content of the else block is:

  } else if (*cacheFreshness == ocspFresh) {
      /*
       * No status cached, the previous attempt failed.
       * If OCSP is required, we never decide based on a failed attempt 
       * However, if OCSP is optional, a recent OCSP failure is
       * an allowed good state.
       */
      if (!ignoreGlobalOcspFailureSetting &&
          OCSP_Global.ocspFailureMode == 
              ocspMode_FailureIsNotAVerificationFailure) {
          rv = SECSuccess;
          *rvOcsp = SECSuccess;
      }
      *missingResponseError = cacheItem->missingResponseError;
  }

Note that we must NOT set *rvOCSP = SECSuccess when the error entry is stale. Otherwise, a failure to retrieve an OCSP response would permanently disable OCSP fetching for that certificate.

Consequently, I did not make the change you suggested. However, I did move the (*cacheFreshness == ocspFresh) test from the outer else clause to the inner block's if:

            if (*cacheFreshness == ocspFresh &&
                !ignoreGlobalOcspFailureSetting &&
                OCSP_Global.ocspFailureMode == 
                    ocspMode_FailureIsNotAVerificationFailure) {
                *rvOcsp = SECSuccess;
            }

I also factored out the (rv = SECSuccess) statement.

@@ +5088,5 @@
>                                         &certIDWasConsumed, 
>                                         &rvOcsp);
>      if (rv != SECSuccess) {
> +        if (ocsp_FetchingFailureIsVerificationFailure()) {
> +            rvOcsp = SECFailure;

I made this change (defensive programming).

@@ +5091,5 @@
> +        if (ocsp_FetchingFailureIsVerificationFailure()) {
> +            rvOcsp = SECFailure;
> +        } else if (cachedResponseFreshness == ocspStale &&
> +                   (cachedErrorCode == SEC_ERROR_OCSP_UNKNOWN_CERT ||
> +                    cachedErrorCode == SEC_ERROR_REVOKED_CERTIFICATE)) {

This is much clearer in the new patch, because we know the return value from ocsp_GetCachedOCSPResponseStatus will be SECSuccess if we've gotten to this point. Also, we know that ocspResponseFreshness was either ocspMissing or ocspStale because if it were ocspFresh then we would have returned early before attempting the fetch.

::: lib/certhigh/ocspi.h
@@ +40,5 @@
>                                   PRTime time,
>                                   PRBool addServiceLocator,
>                                   CERTCertificate *signerCert);
>  
> +typedef enum { ocspMissing, ocspFresh, ocspStale } OCSPFreshness;

I think "Cache" is redundant because it is implied by "freshness". What else would we be checking freshness of, if not a cache entry?

In the new patch, ocspMissing is now unambiguous; it means there was no cache entry.
Attachment #8350858 - Attachment is obsolete: true
Attachment #8350913 - Flags: review?(wtc)
Comment on attachment 8350913 [details] [diff] [review]
ocsp-unknown-caching-fix.patch [v3]

Review of attachment 8350913 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc. I didn't read your response to my previous review comments.
I just review this patch directly.

You just need to fix the comment problems. The patch needs to converge
if we want it in NSS 3.15.4. I think you wrote a very good patch.
This is not an easy bug.

::: lib/certhigh/ocsp.c
@@ +4936,5 @@
>  {
>      return ocsp_CertHasGoodStatus(single->certStatus, time);
>  }
>  
> +/* SECFailure means a fatal error occurred.

"fatal error" is not clear. I can say "certificate revoked" is a
fatal error, but that's not what you meant. Perhaps you can just
say it's an invalid argument (SEC_ERROR_INVALID_ARGS).

@@ +4946,5 @@
>   * in a failure, missingResponseError shows the error code of
>   * that failure.
> + * cacheFreshness is ocspCacheMissing if no entry was found,
> + *                   ocspCacheFresh if a fresh entry was found, or
> + *                   ocspCacheStale if a stale entry was found.

The three enum values in this comment should be changed back
to ocspMissing, ocspFresh, and ocspStale. No need to rename them.

@@ +4990,2 @@
>                  OCSP_Global.ocspFailureMode == 
>                      ocspMode_FailureIsNotAVerificationFailure) {

I guess you think ignoring OCSP failure should require a
fresh cache item. This seems reasonable. It also matches the
comment "if OCSP is optional, a recent OCSP failure is an
allowed good state". So at least we don't need to update the
comment.

Note that whether we check *cacheFreshness == ocspFresh or not
here doesn't matter to the call site in CERT_CheckOCSPStatus
because it will use the rvOcsp output argument only when
cachedResponseFreshness == ocspFresh.

NOTE: the call site in CERT_CacheOCSPResponseFromSideChannel
probably should not ignore a cached OCSP failure. I think that
call site should pass ignoreGlobalOcspFailureSetting=PR_TRUE.
But that is a pre-existing problem and you can ignore it in
this bug report.
Attachment #8350913 - Flags: review?(wtc) → review+
(In reply to Wan-Teh Chang from comment #25)
> "fatal error" is not clear. I can say "certificate revoked" is a
> fatal error, but that's not what you meant. Perhaps you can just
> say it's an invalid argument (SEC_ERROR_INVALID_ARGS).

Done.

> The three enum values in this comment should be changed back
> to ocspMissing, ocspFresh, and ocspStale. No need to rename them.

Done.

> NOTE: the call site in CERT_CacheOCSPResponseFromSideChannel
> probably should not ignore a cached OCSP failure. I think that
> call site should pass ignoreGlobalOcspFailureSetting=PR_TRUE.
> But that is a pre-existing problem and you can ignore it in
> this bug report.

I filed bug 952808 and checked in the fix for it in a separate patch.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment on attachment 8350855 [details] [diff] [review]
unknown-caching-tests.patch for Gecko [v2]

Review of attachment 8350855 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.

::: security/manager/ssl/tests/unit/test_ocsp_unknown_caching.js
@@ +3,5 @@
> +// License, v. 2.0. If a copy of the MPL was not distributed with this
> +// file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +"use strict";
> +
> +let fetchCount = 0;

suggestion: I find it useful to prefix global variables with "g" (so "gFetchCount" in this case)
Attachment #8350855 - Flags: review?(dkeeler) → review+
> @@ +765,5 @@
> > +        cacheItem->certStatus.certStatusType == ocspCertStatus_unknown) {
> > +        fresh = PR_FALSE;
> > +    }
> > +
> > +    OCSP_TRACE(("OCSP ocsp_IsCacheItemFresh: %d\n", fresh));
>
> Instead, I removed the PR_EnterMonitor and PR_ExitMonitor calls, because this function is
> always called with the monitor entered already.


It would be better to keep calls to Monitors. It reminds the programmer that protection is necessary. If you remove the call, future programmers might get into trouble, if they add calls to the function, without noticing the requirement for locking.
(In reply to Brian Smith from comment #14)
> 
> 2. When we encounter a stale 'Unknown' or 'Revoked' response, we will
> continue to try to fetch a newer one like before. However, if that fetch
> fails, and hard fail mode is not enabled, then we will fall back to the
> stale 'Unknown' or 'Revoked' response.
> 
> 3. We will now never replace an 'Unknown' or 'Revoked' response with a
> negative (failure) cache entry like before; instead, an 'Unknown' or
> 'Revoked' cache entry can only be replaced by a valid signed entry.


It worries me that you allow a response with a good status to override a previously received revoked status.

I'm worried about replay attacks during the hours after a revocation took place.

An attacker who hacked a compromised cert might replay a good status message (e.g. stapled into a MITMed connection).

In my understanding of past discussions in NSS meetings, there was consensus that status revoked should always stick.
(In reply to Kai Engert (:kaie) from comment #29)
> It would be better to keep calls to Monitors. It reminds the programmer that
> protection is necessary. If you remove the call, future programmers might
> get into trouble, if they add calls to the function, without noticing the
> requirement for locking.

Entering/exiting monitors has a cost, so it is better for performance to avoid doing so when reasonable. Also, entering/exiting monitors tends to make the code harder to read by encouraging us to use "goto" and friends.

Before removing the monitor usage in that function, I looked at other functions and found ones like ocsp_SetCacheItemResponse that also do not enter/exit the monitor. So, I think the reasoning I used to justify my change is already being used in ocsp.c.
(In reply to Kai Engert (:kaie) from comment #30)
> (In reply to Brian Smith from comment #14)
> > 
> > 2. When we encounter a stale 'Unknown' or 'Revoked' response, we will
> > continue to try to fetch a newer one like before. However, if that fetch
> > fails, and hard fail mode is not enabled, then we will fall back to the
> > stale 'Unknown' or 'Revoked' response.
> > 
> > 3. We will now never replace an 'Unknown' or 'Revoked' response with a
> > negative (failure) cache entry like before; instead, an 'Unknown' or
> > 'Revoked' cache entry can only be replaced by a valid signed entry.
> 
> It worries me that you allow a response with a good status to override a
> previously received revoked status.

In the PKIX model, some revocations can be undone. In particular, a certificate can be revoked with "on hold" status and then un-revoked. Also, in RFC 6960 there is the "extended revoked" status where 

> An attacker who hacked a compromised cert might replay a good status message
> (e.g. stapled into a MITMed connection).

Stapling an older Good response will not work, because CERT_CacheOCSPResponseFromSideChannel looks like this:

    rv = ocsp_GetDecodedVerifiedSingleResponseForID(handle, certID, cert,
						    time, pwArg,
						    encodedResponse,
						    &decodedResponse,
						    &singleResponse);
    if (rv == SECSuccess) {
	rvOcsp = ocsp_SingleResponseCertHasGoodStatus(singleResponse, time);
	/* Cache any valid singleResponse, regardless of status. */
	ocsp_CacheSingleResponse(certID, singleResponse, &certIDWasConsumed);

ocsp_CacheSingleResponse calls ocsp_


> 
> In my understanding of past discussions in NSS meetings, there was consensus
> that status revoked should always stick.
[Sorry, accidentally submitted my comment too early]

...

CERT_CacheOCSPResponseFromSideChannel calls ocsp_CacheSingleResponse which calls ocsp_CreateOrUpdateCacheEntry which does check that the new OCSP response is newer than the older one (based on thisUpdate). So, stapling a Good OCSP response will not allow an attacker to override a cached Revoked response unless the Good response is newer. And, if the Good response is newer then the certificate should be considered Good, on the presumption that the certificate was revoked either as "on hold" or "extended revoked / unknown certificate".

However, I do think that there is a problem in ocsp_GetOCSPStatusFromNetwork. It calls ocsp_CacheSingleResponse to cache the new response, which means we'll never replace a newer entry with an older one. However, ocsp_GetOCSPStatusFromNetwork doesn't return the newer (cached) status instead of the older (just-fetched) status, which seems like a legit bug. I filed bug 956838 for this.
No longer depends on: 943952
Comment on attachment 8350855 [details] [diff] [review]
unknown-caching-tests.patch for Gecko [v2]

https://hg.mozilla.org/integration/mozilla-inbound/rev/75e5396d0847

Tests checked into mozilla-inbound. I did s/fetchCount/gFetchCount/g
Attachment #8350855 - Flags: checked-in+
Thanks Wes.

I relanded this with the missing fail-if = os == 'android' in the xpcshell.ini:

https://hg.mozilla.org/integration/mozilla-inbound/rev/d59351efd413
Flags: needinfo?(brian)
Flags: in-testsuite+
Given in-testsuite coverage this fix will not be manually verified by QA. If you believe this warrants extra QA attention please nominate for testing by removing this whiteboard tag and adding the verifyme keyword. Please also provide any details you have that may inform our testing.
Whiteboard: [see links to user reports in comment 4] → [see links to user reports in comment 4][qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: