Closed Bug 915932 Opened 6 years ago Closed 6 years ago

Cache OCSP responses received from the network when insanity::pkix is the certificate verifier

Categories

(Core :: Security: PSM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: briansmith, Assigned: keeler)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 8 obsolete files)

14.86 KB, patch
cviecco
: review+
Details | Diff | Splinter Review
11.07 KB, patch
cviecco
: review+
Details | Diff | Splinter Review
47.77 KB, patch
cviecco
: review+
Details | Diff | Splinter Review
4.29 KB, patch
cviecco
: review+
Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #915931 +++
Assignee: nobody → brian
No longer depends on: 871954
Assignee: brian → dkeeler
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla30
I highly suggest that this be implemented within NSSCertDBTrustDomain.cpp and not within insanity::pkix itself.

Also, currently security/certverifier/* doesn't depend on anything except insanity::pkix, NSS, and NSPR, and it would be great to keep it that way. However, if it is much simpler to use Gecko stuff to get this bug finished sooner, that is OK with me--especially if only things from mfbt/* are used.

It isn't important to implement caching of failures now (caching the fact that the OCSP responder failed to give a resaonable response, so that we avoid hitting it for a while). It is more urgent to implement caching of good, revoked, and unknown responses, and I suggest that we prioritize that first. I'm not sure if we should bother with the caching of failures at all, but if do need to then we should that in a separate bug that does NOT block bug 915930.
No longer blocks: 921885
Priority: -- → P2
Attached patch patch (obsolete) — Splinter Review
For the telemetry parts of the tests to pass, this requires the patch I'm going to post in bug 969048.
Attachment #8382608 - Flags: review?(brian)
Comment on attachment 8382608 [details] [diff] [review]
patch

(Also, just to be clear, this patch depends on the policy and ev patches that are inbound right now.)
Comment on attachment 8382608 [details] [diff] [review]
patch

Review of attachment 8382608 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/OCSPCache.cpp
@@ +107,5 @@
> +  }
> +  return PLDHashOperator::PL_DHASH_NEXT;
> +}
> +
> +#define OCSP_CACHE_MAX_ENTRIES 25

Brian - I just saw your comment in another review about using const over #define, so I'll be sure to fix this.
Comment on attachment 8382608 [details] [diff] [review]
patch

Review of attachment 8382608 [details] [diff] [review]:
-----------------------------------------------------------------

Reviewed everything except OCSPCache.cpp and OCSPCache.h.

::: security/certverifier/NSSCertDBTrustDomain.cpp
@@ +39,5 @@
>  } // unnamed namespace
>  
>  NSSCertDBTrustDomain::NSSCertDBTrustDomain(SECTrustType certDBTrustType,
>                                             OCSPFetching ocspFetching,
> +                                           OCSPCache* ocspCache,

NIT: Better to have it be a reference instead of a pointer?

@@ +211,5 @@
> +             ("NSSCertDBTrustDomain: cached OCSP response: good"));
> +      return SECSuccess;
> +    }
> +    cachedResponseErrorCode = PR_GetError();
> +    if (cachedResponseErrorCode == SEC_ERROR_REVOKED_CERTIFICATE) {

What about "unknown"? Add a comment about it.

@@ +217,5 @@
> +             ("NSSCertDBTrustDomain: cached OCSP response: revoked"));
> +      return rv;
> +    }
> +    if (cachedResponseErrorCode == SEC_ERROR_OCSP_OLD_RESPONSE) {
> +      mOCSPCache->Remove(cert);

We shouldn't remove an expired Unknown response until we have another Good/Revoked/Unknown response to replace it.

@@ +309,5 @@
> +       "VerifyEncodedOCSPResponse"));
> +    return SECSuccess;
> +  }
> +
> +  if (mOCSPFetching != FetchOCSPForDVSoftFail) {

Don't we want to cache the response if it was Unknown or Revoked here?

::: security/certverifier/NSSCertDBTrustDomain.h
@@ +54,5 @@
>      FetchOCSPForEV = 3,
>      LocalOnlyOCSPForEV = 4,
>    };
>    NSSCertDBTrustDomain(SECTrustType certDBTrustType, OCSPFetching ocspFetching,
> +                       OCSPCache* ocspCache, void* pinArg);

NIT: reference instead of pointer?

@@ +78,5 @@
>  
>  private:
>    const SECTrustType mCertDBTrustType;
>    const OCSPFetching mOCSPFetching;
> +  OCSPCache* mOCSPCache; // non-owning!

NIT: Reference?

::: security/insanity/include/insanity/pkix.h
@@ +104,5 @@
>                                    const CERTCertificate* issuerCert);
>  
> +// The optional parameter validUntil will be the time until which the encoded
> +// response is considered trustworthy. If the response is not trustworthy,
> +// validUntil will be 0.

Add a comment that only successful responses (Good/Unknown/Revoked) are considered trustworthy.

::: security/manager/ssl/tests/unit/head_psm.js
@@ +21,5 @@
>  const SEC_ERROR_BASE = Ci.nsINSSErrorsService.NSS_SEC_ERROR_BASE;
>  const SSL_ERROR_BASE = Ci.nsINSSErrorsService.NSS_SSL_ERROR_BASE;
>  
>  // Sort in numerical order
> +const SEC_ERROR_BAD_DER                                 = SEC_ERROR_BASE +   9;

I'm worried about this, because SEC_ERROR_BAD_DER shouldn't leak out from OCSP response verification. Instead SEC_ERROR_OCSP_MALFORMED_RESPONSE should be returned. Otherwise the caller to BuildCertChain won't know if the bad DER is in the OCSP response or in the certificate.

::: security/manager/ssl/tests/unit/test_ocsp_caching.js
@@ +106,5 @@
>    add_connection_test("ocsp-stapling-none.example.com", Cr.NS_OK,
>                        clearSessionCache);
>    add_test(function() { do_check_eq(gFetchCount, 1); run_next_test(); });
>  
> +  // TODO: implement this for insanity

Please add (bug XXXXXX) to the TODO.

::: security/manager/ssl/tests/unit/test_ocsp_stapling.js
@@ +95,5 @@
>    add_ocsp_test("ocsp-stapling-unknown.example.com",
>                  getXPCOMStatusFromNSS(SEC_ERROR_OCSP_UNKNOWN_CERT), true);
>    add_ocsp_test("ocsp-stapling-good-other.example.com",
> +                getXPCOMStatusFromNSS(
> +                  useInsanity ? SEC_ERROR_BAD_DER

Please file a follow-up bug to fix the error code returned so that it is not SEC_ERROR_BAD_DER.

@@ +111,5 @@
>    );
>    add_ocsp_test("ocsp-stapling-empty.example.com",
> +                getXPCOMStatusFromNSS(
> +                  useInsanity ? SEC_ERROR_BAD_DER
> +                              : SEC_ERROR_OCSP_MALFORMED_RESPONSE), true);

Ditto.
Comment on attachment 8382608 [details] [diff] [review]
patch

Review of attachment 8382608 [details] [diff] [review]:
-----------------------------------------------------------------

Initial review focused only the OCSPCache.*

::: security/certverifier/OCSPCache.cpp
@@ +38,5 @@
> +  SECItem* tmp = ::SECITEM_AllocItem(nullptr, nullptr, aCert->derIssuer.len +
> +                                                       aCert->serialNumber.len);
> +  if (!tmp) {
> +    return 0;
> +  }

I dislike mallocs, could this be turned into a stack variable and be OK of caching up to 2K worth of data?

@@ +49,5 @@
> +    return 0;
> +  }
> +  uint64_t retval = 0;
> +  memcpy(&retval, buf, sizeof(uint64_t));
> +  ScopedPtr<char, PORT_Free_string> cn(CERT_GetCommonName(&aCert->subject));

if (retval == 0 ) {
  retval = 1; // to distinguish between error and an unfortunate cert
}
?

@@ +67,5 @@
> +  if (certIDHash == 0) {
> +    return nullptr;
> +  }
> +
> +  ScopedPtr<char, PORT_Free_string> cn(CERT_GetCommonName(&aCert->subject));

put it behind  #ifdef PR_LOGGING?

@@ +75,5 @@
> +  if (!present) {
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("OCSPCache::Get: %s not in cache\n", cn.get()));
> +    return nullptr;
> +  }

you are not handling clashes in the hashtable. I would use store the complete fp in the entry and then recheck here.

@@ +94,5 @@
> +}
> +
> +// Find the least-recently used entry.
> +PLDHashOperator
> +FindLRUEntry(const uint64_t& certIDHash, OCSPCacheEntry& entry,

make it static

@@ +107,5 @@
> +  }
> +  return PLDHashOperator::PL_DHASH_NEXT;
> +}
> +
> +#define OCSP_CACHE_MAX_ENTRIES 25

I think this is too little. visiting 9 distinct ev sites could start the eviction process. I would make this 64 if not 128.

@@ +116,5 @@
> +{
> +  PR_ASSERT(aOCSPResponse);
> +  PR_ASSERT(aCert);
> +
> +  ScopedPtr<char, PORT_Free_string> cn(CERT_GetCommonName(&aCert->subject));

put this around #ifdef PR_LOGGING?

@@ +118,5 @@
> +  PR_ASSERT(aCert);
> +
> +  ScopedPtr<char, PORT_Free_string> cn(CERT_GetCommonName(&aCert->subject));
> +
> +  if (aValidUntil < PR_Now()) {

keep PR_Now in a local variable as you use it later.

@@ +125,5 @@
> +            aValidUntil));
> +    return;
> +  }
> +
> +  MutexAutoLock lock(mMutex);

move mutex to after getCertID (to where the critical section actually starts)

or even better, make the

@@ +139,5 @@
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("OCSPCache::Put: %s already in cache - replacing\n", cn.get()));
> +    SECITEM_FreeItem(entry.mOCSPResponse, true);
> +  } else {
> +    entry.mCertIDHash = certIDHash;

dont you have to do this too for the ispresent case?

@@ +148,5 @@
> +  }
> +  entry.mLastUsedTime = PR_Now();
> +  entry.mValidUntil = aValidUntil;
> +
> +  if (!present && mOCSPTable.Count() == OCSP_CACHE_MAX_ENTRIES) {

>= to be paranoid?

@@ +155,5 @@
> +    mOCSPTable.Enumerate(FindLRUEntry, &toRemove);
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("OCSPCache::Put: too full - evicting an entry\n"));
> +    SECITEM_FreeItem(toRemove.mOCSPResponse, true);
> +    mOCSPTable.Remove(toRemove.mCertIDHash);

I dont like that once you reach the limit you pay the enumeration penalty all the time. Could this be improved to remove X entries (where X is small?) (LRU-2 LRU-4?) maybe even delete up to X epired entries? Maybe put the whole flush_recent in a separate private function?

@@ +168,5 @@
> +OCSPCache::Remove(const CERTCertificate* aCert)
> +{
> +  PR_ASSERT(aCert);
> +
> +  MutexAutoLock lock(mMutex);

again move mutex to closer to where you actually use it.

::: security/certverifier/OCSPCache.h
@@ +20,5 @@
> +public:
> +  SECItem* mOCSPResponse;
> +  PRTime mLastUsedTime;
> +  PRTime mValidUntil;
> +  uint64_t mCertIDHash;

why do you need this here? I would replace this for the whole hash result to allow to have a much lesser degree of clashing on (get recheck against the whole fp, if you find a clash on sha384 congrats!)

@@ +28,5 @@
> +{
> +public:
> +  OCSPCache()
> +    : mMutex("OCSPCache-mutex")
> +    , mOCSPTable(8)

8 seems too small. But since we have no statistics on the number of sites it is OK.

@@ +42,5 @@
> +  SECItem* Get(const CERTCertificate* aCert, PLArenaPool* aArena = nullptr);
> +
> +  // Inserts or replaces a cached response corresponding to the given
> +  // certificate. Makes a copy of the data, so the caller is still
> +  // responsible for the memory backing aOCSPResponse.

confusing comment.

@@ +43,5 @@
> +
> +  // Inserts or replaces a cached response corresponding to the given
> +  // certificate. Makes a copy of the data, so the caller is still
> +  // responsible for the memory backing aOCSPResponse.
> +  void Put(const SECItem* aOCSPResponse, const CERTCertificate* aCert,

why void functions? I prefer functions that return error code so that at least I have the option to check for it.
Attachment #8382608 - Flags: review?(cviecco) → review-
Comment on attachment 8382608 [details] [diff] [review]
patch

Review of attachment 8382608 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/NSSCertDBTrustDomain.cpp
@@ +174,4 @@
>    if (stapledOCSPResponse) {
>      PR_ASSERT(endEntityOrCA == MustBeEndEntity);
>      SECStatus rv = VerifyEncodedOCSPResponse(*this, cert, issuerCert, time,
> +                                             stapledOCSPResponse, &validUntil);

It seems like a good idea to write a VerifyAndMaybeCacheEncodedOCSPResponse method that calls VerifyEncodedOCSPResponse and then caches the response if we're supposed to cache it. That way, we avoid some of this duplicate  logic and make things clearer.

@@ +217,5 @@
> +             ("NSSCertDBTrustDomain: cached OCSP response: revoked"));
> +      return rv;
> +    }
> +    if (cachedResponseErrorCode == SEC_ERROR_OCSP_OLD_RESPONSE) {
> +      mOCSPCache->Remove(cert);

Expanding on my previous comment: does this give us the semantics we want? We need to verify the OCSP response, and if the OCSP response is a valid Revoked/Unknown response *in all ways except expiration time*, then we want to cache the response. But, otherwise, unless it is a 100% valid Good response, we want to reject the response.

In particular, it isn't clear to me that when VerifyEncodedOCSPResponse returns SEC_ERROR_OLD_RESPONSE, that all the other checks were done and succeeded. See the expiration-related changes in the patch for bug 973268 and also see the changes in bug 933109.

@@ +251,5 @@
>    ScopedPtr<char, PORT_Free_string>
>      url(CERT_GetOCSPAuthorityInfoAccessLocation(cert));
>  
>    if (!url) {
>      if (stapledOCSPResponse) {

What if cachedResponseErrorCode != 0 here?

@@ +279,5 @@
>    }
>  
>    const SECItem* response(CERT_PostOCSPRequest(arena.get(), url.get(),
>                                                 request));
>    if (!response) {

if (cachedResponseErrorCode != 0) here, that means that we know the cert is either revoked or unknown status, right? In that case, would it be better to return the "revoked" or "unknown" error code instead of whatever error code CERT_PostOCSPRequest returned? It isn't clear which is better.

::: security/certverifier/OCSPCache.cpp
@@ +1,5 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> +/* vim: set ts=8 sts=2 et sw=2 tw=80: */
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

Since this doesn't seem to depend on any Gecko stuff except nsDataHashtable, and it is brand new code, I think we should eventually move this to insanity::pkix itself, so the license would change to Apache 2.0. We can eventually replace the use of nsDataHashtable with a non-XPCOM data structure if/when we want to move the code. But, it is hard to change the license. In particular, I'm hoping to eventually share insanity::pkix with other software, particularly servers that haven't implemented OCSP stapling yet. This cache seems like it may become useful to them, if we were to do that.

@@ +29,5 @@
> +}
> +
> +// For the purposes of this cache, a certificate will be identified by the
> +// first 8 bytes of the SHA-384 hash of the DER encoding of the issuer's DN
> +// and the certificate's serial number.

An 8-byte identifier means that collisions are possible/likely. We should explain here how/why those collisions are not a security issue. That is, we should explain how/why a collision here isn't something that an attacker could use to replace a revoked/unknown response with a good response.

However, if we have such collision protection, couldn't we use a simpler/faster hash function? Why not just use the last 8 bytes of the serial number or something very simple/fast? We'd only need to use a secure hash function if we were trying to avoid storing the entire OCSP response in the cache.

@@ +49,5 @@
> +    return 0;
> +  }
> +  uint64_t retval = 0;
> +  memcpy(&retval, buf, sizeof(uint64_t));
> +  ScopedPtr<char, PORT_Free_string> cn(CERT_GetCommonName(&aCert->subject));

#ifdef PR_LOGGING
  if (PR_LOG_TEST(gCertVerifierLog, PR_LOG_DEBUG)) {
    ScopedPtr<char, PORT_Free_string> cn(CERT_GetCommonName(&aCert->subject));
    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
           ("OCSPCache::GetCertIDHash: %s -> %lu\n", cn.get(), retval));
  }
#endif

I suggest that you wrap this logging logic into a function to reduce duplicated code.

@@ +77,5 @@
> +           ("OCSPCache::Get: %s not in cache\n", cn.get()));
> +    return nullptr;
> +  }
> +
> +  PRTime now = PR_Now();

You should never call PR_Now() in this code. Instead, the caller needs to pass in the time to all of these functions. This way, all of the validation logic ends up using a single value of "now" for all checks.

@@ +81,5 @@
> +  PRTime now = PR_Now();
> +  if (entry.mValidUntil < now) {
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("OCSPCache::Get: %s expired - removing\n", cn.get()));
> +    mOCSPTable.Remove(certIDHash);

Shouldn't remove Revoked or Unknown entries until they are replaced with a newer entry.

@@ +107,5 @@
> +  }
> +  return PLDHashOperator::PL_DHASH_NEXT;
> +}
> +
> +#define OCSP_CACHE_MAX_ENTRIES 25

I agree that it seems too small. I would try to use the same size as we are currently using with the NSS OCSP cache.

@@ +151,5 @@
> +
> +  if (!present && mOCSPTable.Count() == OCSP_CACHE_MAX_ENTRIES) {
> +    OCSPCacheEntry toRemove;
> +    toRemove.mLastUsedTime = 0;
> +    mOCSPTable.Enumerate(FindLRUEntry, &toRemove);

This should only evict Good responses, not Unknown or Revoked responses. Or, at a minimum, Good entries should be evicted before Unknown and Revoked.

@@ +155,5 @@
> +    mOCSPTable.Enumerate(FindLRUEntry, &toRemove);
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("OCSPCache::Put: too full - evicting an entry\n"));
> +    SECITEM_FreeItem(toRemove.mOCSPResponse, true);
> +    mOCSPTable.Remove(toRemove.mCertIDHash);

Right. Eventually (quickly) Put() becomes an O(n) operation. If Put() is going to be an O(n) operation then why not just use an array that is sorted in MRU/LRU order? Especially if we're going to have <= 100 or 1,000 entries, I bet it would be faster, especially when you consider the memory savings and memory cache effects.

::: security/certverifier/OCSPCache.h
@@ +1,5 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> +/* vim: set ts=8 sts=2 et sw=2 tw=80: */
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

Same question regarding licensing and location for the header as for the .cpp.

@@ +23,5 @@
> +  PRTime mValidUntil;
> +  uint64_t mCertIDHash;
> +};
> +
> +class OCSPCache

Document thread safety.

@@ +38,5 @@
> +  // Returns a copy of the cached response corresponding to the given
> +  // certificate (i.e. the caller is responsible for the memory returned).
> +  // The memory will be allocated in the given PLArenaPool if specified.
> +  // Returns null if there is no such cached response.
> +  SECItem* Get(const CERTCertificate* aCert, PLArenaPool* aArena = nullptr);

Do we ever NOT pass in an arena? If we always pass in an arena then we might as well make it mandatory.

@@ +42,5 @@
> +  SECItem* Get(const CERTCertificate* aCert, PLArenaPool* aArena = nullptr);
> +
> +  // Inserts or replaces a cached response corresponding to the given
> +  // certificate. Makes a copy of the data, so the caller is still
> +  // responsible for the memory backing aOCSPResponse.

I don't think you need to include "so the caller is still responsible for the memory backing aOCSPresponse." That is assumed by default unless a function is documented to transfer ownership.

@@ +43,5 @@
> +
> +  // Inserts or replaces a cached response corresponding to the given
> +  // certificate. Makes a copy of the data, so the caller is still
> +  // responsible for the memory backing aOCSPResponse.
> +  void Put(const SECItem* aOCSPResponse, const CERTCertificate* aCert,

It should be void unless we are going to actually check the results, which we don't do. Since it is using nsDataHashtable, the process will crash on OOM so there are no other interesting errors to return.
Attachment #8382608 - Flags: review?(brian) → review-
Attached patch patch v2 (obsolete) — Splinter Review
Thanks for the reviews. I switched out the hash table implementation for a sorted linked list.
Attachment #8382608 - Attachment is obsolete: true
Attachment #8383353 - Flags: review?(brian)
Comment on attachment 8383353 [details] [diff] [review]
patch v2

Review of attachment 8383353 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not sure a linked list is going to be better than an array of pointers considering there are only 1024 entries. In particular, the poor memory locality of linked lists means that every entry will probably be on a different page of memory, which means you'll be touching 512 (1024/2) pages of memory on average (probably much less, because MRU). That's got to be much more expensive than copying 2048 bytes (4*1024/2) of data in a single page on average (probably much less, because MRU).

Perhaps it is worth having a meeting about so we can reach a conclusion with minimal effort.

Also, could you split the patch into two parts: NSSCertDBTrustDomain.* + OCSPCache.* separated from everything else. Then we can r+ the "everything else" part. and focus reviews on the core part.

::: security/certverifier/OCSPCache.cpp
@@ +136,5 @@
> +#endif
> +}
> +
> +SECItem*
> +OCSPCache::Get(const CERTCertificate* aCert, PLArenaPool* aArena)

Let's say you have two CA certificates with the same subject name. Then you could have this:

CA1 -> EE1 (issuer=CA1, serial=1)
CA1 -> EE2 (issuer=CA1, serial=1)

It seems like the proper way to deal with that is to match (issuer public key, issuer name, serial number). This is why CertID is defined the way that it is (issuerKeyHash, issuerNameHash, serial number).
Comment on attachment 8383353 [details] [diff] [review]
patch v2

Clearing the review pending discussion with keeler.
Attachment #8383353 - Flags: review?(brian)
Comment on attachment 8383353 [details] [diff] [review]
patch v2

Review of attachment 8383353 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/OCSPCache.cpp
@@ +44,5 @@
> +  if (mIssuer) {
> +    SECITEM_FreeItem(mIssuer, true);
> +  }
> +  if (mNext) {
> +    delete mNext;

This is really subtle. I does not feel natural that you delete the certs that point from it down. I would change this to have an explicit loop on the deletion of the ocsp cache.

@@ +77,5 @@
> +}
> +
> +OCSPCacheEntry*
> +OCSPCache::GetInternal(const CERTCertificate* aCert,
> +                       const MutexAutoLock& /* aProofOfLock */)

Since you always do a get internal followed but a PushFron (if found) would it not better to have instead:
FindAndMoveToHead combinint the two

on get that would be all you needed, (plus log)
on push if found you can do then an in-place replacement of the ocsp response

@@ +107,5 @@
> +}
> +
> +void
> +OCSPCache::RemoveInternal(OCSPCacheEntry *aEntry,
> +                          const MutexAutoLock& /* aProofOfLock */)

Since you are always removing the tail, why not have that instead?
popTail( const MutexAutoLock& /* aProofOfLock */); (assuming you to the findAndtofront)

::: security/certverifier/OCSPCache.h
@@ +34,5 @@
> +  SECItem* mOCSPResponse;
> +  // Rather than keep a copy of the entire certificate, just keep a copy
> +  // of the DER encodings of the serial number and issuer.
> +  SECItem* mSerialNumber;
> +  SECItem* mIssuer;

no expiration time means you cannot easily flush old entries in the table. But this is minor.

@@ +43,5 @@
> +
> +// OCSPCache can store and retrieve encoded OCSP responses. Each response
> +// is keyed on the certificate that purportedly corresponds to it (where
> +// certificates are distinguished based on serial number and issuer).
> +// A maximum of 1024 distinct entries can be stored. OCSPCache is thread-safe.

This works, but 
1. The ocsp responses I see in the wild are 500 bytes in size and public keys ~ 256 which would imply the cache will grow to almost 1M in size (which I think is too high for firefox OS, specially the 128MB one)
2. The because of the way this patch uses memory, I think you will destroy the l1 caches in many devices as the caching table gets full
3. Even on success a complete verification is done.

Would it not be better to instead cache the OCSP verifcation results? keyed by the full sha256 of the certificate der (cert-derCert) (32 bytes) and including status(4 bytes) + expiration (4bytes) + next +prev in a single array? (~= 58KB for 1024 entries (and contiguous)?
The only issue going this way is that we will always compute a sha256 value, but that is really fast even on slow arm.

@@ +75,5 @@
> +                      const MutexAutoLock& aProofOfLock);
> +  void LogWithCert(const char* aMessage, const CERTCertificate* aCert);
> +
> +  Mutex mMutex;
> +  OCSPCacheEntry* mEntries;

mHead?
Attachment #8383353 - Flags: review?(cviecco) → review-
This patch does introduce a dependency on mfbt (specifically, Mutex.h and Vertex.h), but I don't think that's unreasonable.
Attachment #8383353 - Attachment is obsolete: true
Attachment #8384943 - Flags: review?(brian)
Comment on attachment 8384943 [details] [diff] [review]
patch v3 part 1 of 2: introduce OCSPCache

Review of attachment 8384943 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/OCSPCache.cpp
@@ +25,5 @@
> +
> +void
> +SHA384Hash(uint8_t (&buf)[SHA384_LENGTH], const SECItem& data)
> +{
> +  SECStatus rv = PK11_HashBuf(SEC_OID_SHA384, buf, data.data, data.len);

sha 256 is enough and is fater on mobile devices.

@@ +149,5 @@
> +OCSPCache::MakeMostRecentlyUsed(size_t aIndex,
> +                                const MutexAutoLock& /* aProofOfLock */)
> +{
> +  Entry entry = mEntries[aIndex];
> +  mEntries.erase(mEntries.begin() + aIndex);

This is linear witht the number of entries moved. So on average this will be linear.

@@ +214,5 @@
> +      }
> +    }
> +    // Well, we tried, but apparently everything has been revoked.
> +    if (mEntries.length() == MaxEntries) {
> +      mEntries.erase(mEntries.begin());

ouch this is o(1024)

::: security/certverifier/OCSPCache.h
@@ +30,5 @@
> +// OCSPCache can store and retrieve encoded OCSP responses. Each response
> +// is keyed on the certificate that purportedly corresponds to it (where
> +// certificates are distinguished based on serial number, issuer, and
> +// issuer public key). A maximum of 1024 distinct entries can be stored.
> +// OCSPCache is thread-safe.

did not we agree that we only need to store the certid, errorcode and valid-till values?

@@ +71,5 @@
> +    // Rather than keep a copy of the whole certificate, we keep the DER encoding
> +    // of the serial number and a SHA-384 hash of the issuer name and issuer key.
> +    SECItem* mSerialNumber;
> +    uint8_t mIssuerNameHash[SHA384_LENGTH];
> +    uint8_t mIssuerKeyHash[SHA384_LENGTH];

Why not just the certid?

@@ +88,5 @@
> +
> +  Mutex mMutex;
> +  static const size_t MaxEntries = 1024;
> +  // Sorted with the most-recently-used entry at the end.
> +  Vector<Entry> mEntries;

since you are using a vector (entry size * 2 max size) = (500 + 48 +48 + 20 ) * 2048 =~ 1e6 which for the 128MB is too much.
Attachment #8384943 - Flags: review?(cviecco) → review-
Comment on attachment 8384944 [details] [diff] [review]
patch v3 part 2 of 2: cache OCSP responses for insanity::pkix

Review of attachment 8384944 [details] [diff] [review]:
-----------------------------------------------------------------

tentative r+. Please answer the question I ask below.

::: security/certverifier/NSSCertDBTrustDomain.cpp
@@ +332,5 @@
> +  if (rv == SECSuccess || error == SEC_ERROR_REVOKED_CERTIFICATE ||
> +      error == SEC_ERROR_OCSP_UNKNOWN_CERT) {
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("NSSCertDBTrustDomain: caching OCSP response"));
> +    mOCSPCache.Put(encodedResponse, cert, issuerCert, error);

How do we prevent an older response from replacing a newer response?

::: security/manager/ssl/tests/unit/test_ocsp_caching.js
@@ +106,5 @@
>    add_connection_test("ocsp-stapling-none.example.com", Cr.NS_OK,
>                        clearSessionCache);
>    add_test(function() { do_check_eq(gFetchCount, 1); run_next_test(); });
>  
> +  // TODO (bug 977865): implement this for insanity

NIT: no space after TODO.

::: security/manager/ssl/tests/unit/test_ocsp_stapling.js
@@ +128,5 @@
> +  do_check_eq(histogram.counts[0], 1 * 0); // histogram bucket 0 is unused
> +  do_check_eq(histogram.counts[1], 1 * 1); // 1 connection with a good response
> +  do_check_eq(histogram.counts[2], 1 * 14); // 14 connections with no stapled resp.
> +  do_check_eq(histogram.counts[3], 1 * 0); // 0 connections with an expired response
> +  do_check_eq(histogram.counts[4], 1 * 11); // 11 connections with bad responses

Nit: Remove the "1 * "?

::: security/manager/ssl/tests/unit/test_ocsp_stapling_expired.js
@@ +110,5 @@
> +  do_check_eq(histogram.counts[0], 1 * 0); // histogram bucket 0 is unused
> +  do_check_eq(histogram.counts[1], 1 * 0); // 0 connections with a good response
> +  do_check_eq(histogram.counts[2], 1 * 0); // 0 connections with no stapled resp.
> +  do_check_eq(histogram.counts[3], 1 * 9); // 8 connections with an expired response
> +  do_check_eq(histogram.counts[4], 1 * 0); // 0 connections with bad responses

Ditto: remove the "1 * ".
Attachment #8384944 - Flags: review?(brian) → review+
Comment on attachment 8384943 [details] [diff] [review]
patch v3 part 1 of 2: introduce OCSPCache

Review of attachment 8384943 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/OCSPCache.cpp
@@ +25,5 @@
> +
> +void
> +SHA384Hash(uint8_t (&buf)[SHA384_LENGTH], const SECItem& data)
> +{
> +  SECStatus rv = PK11_HashBuf(SEC_OID_SHA384, buf, data.data, data.len);

I already asked him to use SHA-384 when he and I previously talked about it. The advantage of SHA-384 is that this cache code will never be the weak link. The perf is unlikely to matter. If the perf hit turns out to be an issue then we can re-evaluate it.

@@ +26,5 @@
> +void
> +SHA384Hash(uint8_t (&buf)[SHA384_LENGTH], const SECItem& data)
> +{
> +  SECStatus rv = PK11_HashBuf(SEC_OID_SHA384, buf, data.data, data.len);
> +  PR_ASSERT(rv == SECSuccess);

Don't assume rv == SECSuccess. Instead, have SHA384Hash return an error code and check it.

@@ +35,5 @@
> +  , mErrorCode()
> +  , mSerialNumber(nullptr)
> +{
> +  memset(mIssuerNameHash, 0, sizeof(mIssuerNameHash));
> +  memset(mIssuerKeyHash, 0, sizeof(mIssuerKeyHash));

Are these memsets actually necessary? If not, remove them, or make them debug-only.

Why not combine mIssuerNameHash, mIssuerKeyHash, and mSerialNumber into a single hash of SHA-384(issuerName || issuerKey || serialNumber)? Then you can memcpy entries and the entries will be smaller, plus no need for writing destructor, copy constructor, and assignment operator.

@@ +39,5 @@
> +  memset(mIssuerKeyHash, 0, sizeof(mIssuerKeyHash));
> +}
> +
> +OCSPCache::Entry::Entry(const Entry& other)
> +  : mOCSPResponse(SECITEM_DupItem(other.mOCSPResponse))

This is going to be very expensive when copying entries around.

Like cviecco suggested, it seems like we could get rid of mOCSPResponse completely now, replacing it with just the PRErrorCode from VerifyEncodedOCSPResponse.

@@ +124,5 @@
> +  uint8_t issuerKeyHash[SHA384_LENGTH];
> +  SHA384Hash(issuerNameHash, aIssuerCert->derSubject);
> +  SHA384Hash(issuerKeyHash, aIssuerCert->derPublicKey);
> +
> +  for (int32_t i = mEntries.length() - 1; i >= 0; i--) {

Document why we are searching backwards.

@@ +137,5 @@
> +void
> +OCSPCache::LogWithCerts(const char* aMessage, const CERTCertificate* aCert,
> +                        const CERTCertificate* aIssuerCert)
> +{
> +#ifdef PR_LOGGING

You should also do a runtime check that the log module is enabled to avoid doing this expensive work when the log module isn't enabled. See my comment about this on your previous patch for how to do it.

@@ +149,5 @@
> +OCSPCache::MakeMostRecentlyUsed(size_t aIndex,
> +                                const MutexAutoLock& /* aProofOfLock */)
> +{
> +  Entry entry = mEntries[aIndex];
> +  mEntries.erase(mEntries.begin() + aIndex);

I think it is OK, as long as copying is made fast and as long as the number of entries is small. Let's try the simplest thing and then measure it to see if it ends up being expensive.

@@ +158,5 @@
> +OCSPCache::Get(const CERTCertificate* aCert,
> +               const CERTCertificate* aIssuerCert, PLArenaPool* aArena)
> +{
> +  PR_ASSERT(aCert);
> +  PR_ASSERT(aIssuerCert);

PR_Assert(aArena)?

@@ +192,5 @@
> +      LogWithCerts("OCSPCache::Put(%s, %s) already in cache as revoked - "
> +                   "not replacing", aCert, aIssuerCert);
> +    } else {
> +      LogWithCerts("OCSPCache::Put(%s, %s) already in cache - replacing",
> +                   aCert, aIssuerCert);

We shouldn't replace a newer entry with an older one.

@@ +213,5 @@
> +        break;
> +      }
> +    }
> +    // Well, we tried, but apparently everything has been revoked.
> +    if (mEntries.length() == MaxEntries) {

If we have 1024 revoked entries then we can just fail in a way that certificate verification will fail. Obviously there are serious issues if we've encountered 1024 different revoked certificates.

::: security/manager/ssl/tests/gtest/OCSPCacheTest.cpp
@@ +1,3 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> +/* vim: set ts=8 sts=2 et sw=2 tw=80: */
> +/* TODO: license */

Don't forget the license block.

Please split OCSPCacheTest into a separate patch for the purposes of facilitating easier review.
Attachment #8384943 - Flags: review?(brian) → review-
Comment on attachment 8384944 [details] [diff] [review]
patch v3 part 2 of 2: cache OCSP responses for insanity::pkix

Review of attachment 8384944 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/CertVerifier.h
@@ +88,5 @@
>        /*optional*/ const SECItem* stapledOCSPResponse,
>        /*optional out*/ insanity::pkix::ScopedCERTCertList* validationChain,
>        /*optional out*/ SECOidTag* evOidPolicy);
> +
> +  OCSPCache mOCSPCache;

I dont like this being part of the certverifier. As everytime we change any of the flags all the cache gets flushed.

I think there should be a pointer inside certverifier (part of the constructor) and it should be unique and initialized by nsNSSCeritifcateDB.
Attachment #8384944 - Flags: review?(cviecco) → review-
(In reply to Camilo Viecco (:cviecco) from comment #17)
> I dont like this being part of the certverifier. As everytime we change any
> of the flags all the cache gets flushed.

We agreed on IRC that this would be re-visited later.
Attachment #8384943 - Attachment is obsolete: true
Attachment #8384944 - Attachment is obsolete: true
Attachment #8385717 - Flags: review?(brian)
Comment on attachment 8385717 [details] [diff] [review]
patch v4 part 1 of 3: introduce OCSP cache

Review of attachment 8385717 [details] [diff] [review]:
-----------------------------------------------------------------

Almost there. Mostly minor nits

::: security/certverifier/OCSPCache.cpp
@@ +33,5 @@
> +
> +typedef ScopedPtr<PK11Context, PK11_DestroyContext_true> ScopedPK11Context;
> +
> +SECStatus
> +IDHash(uint8_t (&buf)[SHA384_LENGTH], const CERTCertificate* aCert,

static

@@ +82,5 @@
> +}
> +
> +OCSPCache::OCSPCache()
> +  : mMutex("OCSPCache-mutex")
> +{

why not resize the container to 128 or something like that here?

@@ +176,5 @@
> +
> +  if (index >= 0) {
> +    if (mEntries[index]->mErrorCode == SEC_ERROR_REVOKED_CERTIFICATE) {
> +      LogWithCerts("OCSPCache::Put(%s, %s) already in cache as revoked - "
> +                   "not replacing", aCert, aIssuerCert);

I would just return early here (to simplyfy if/else logic below), there is no need to push to front (ie optimize for the revoked case)

@@ +195,5 @@
> +
> +  if (mEntries.length() == MaxEntries) {
> +    LogWithCerts("OCSPCache::Put(%s, %s) too full - evicting an entry", aCert,
> +                 aIssuerCert);
> +    for (Entry** toEvict = mEntries.begin(); toEvict < mEntries.end();

Should this not be toEvict != mEntries.end() (instead of toEvict <mEntries.end()) for STL containters? (why not use an actual iterator?)

@@ +206,5 @@
> +      }
> +    }
> +    // Well, we tried, but apparently everything has been revoked or unknown.
> +    // Returning SEC_ERROR_REVOKED_CERTIFICATE causes whatever certificate
> +    // verification is currently underway to fail.

Why do we want this instead of a silent failure? (hopefully somtime in the future some space will be freed?) IE. the certificate currently being validated could be valid. But if you think failing is necessary I would prefer unkniown error as a failure code.

@@ +214,5 @@
> +    }
> +  }
> +
> +  Entry* newEntry = new Entry(aCert, aIssuerCert, aErrorCode, aThisUpdate,
> +                              aValidThrough);

Check for null allocation

::: security/certverifier/OCSPCache.h
@@ +26,5 @@
> +#include "prerror.h"
> +
> +namespace mozilla { namespace psm {
> +
> +// OCSPCache can store and retrieve encoded OCSP responses. Each response

s/responses/verification results/g

@@ +87,5 @@
> +
> +  Mutex mMutex;
> +  static const size_t MaxEntries = 1024;
> +  // Sorted with the most-recently-used entry at the end.
> +  Vector<Entry*> mEntries;

I think caching the entry directly is cleaner, but I am fine with this.
Attachment #8385717 - Flags: review?(brian) → review-
Comment on attachment 8385718 [details] [diff] [review]
patch v4 part 2 of 3: gtest for OCSP cache

Review of attachment 8385718 [details] [diff] [review]:
-----------------------------------------------------------------

very good.

::: security/manager/ssl/tests/gtest/OCSPCacheTest.cpp
@@ +43,5 @@
> +PutAndGet(mozilla::psm::OCSPCache& cache, CERTCertificate* subject,
> +          CERTCertificate* issuer,
> +          PRErrorCode error, PRTime time)
> +{
> +  SECStatus rv = cache.Put(subject, issuer, error, time, time);

I would change this to be have a different values for the times (probably far from each other)

@@ +74,5 @@
> +
> +  CERTCertificate issuer;
> +  MakeFakeCert(&issuer, "CN=issuer1", "issuer1", "000", "key000");
> +  PRTime timeIn = PR_Now();
> +  for (int i = 0; i < 1024; i++) {

make all thse magic numbers a const or a #define

@@ +121,5 @@
> +    snprintf(serialBuf, sizeof(serialBuf), "%04d", i);
> +    MakeFakeCert(&subject, subjectBuf, "issuer1", serialBuf, "key000");
> +    PutAndGet(cache, &subject, &issuer, 0, timeIn + i);
> +  }
> +  CERTCertificate evictedSubject;

Missing a makefakecert?
Attachment #8385718 - Flags: review?(cviecco) → review+
Comment on attachment 8385717 [details] [diff] [review]
patch v4 part 1 of 3: introduce OCSP cache

Review of attachment 8385717 [details] [diff] [review]:
-----------------------------------------------------------------

Moving train made me clear the wrong box. Rerequesting review from brian as I mistakenyly cleared his flag and not mine.
Attachment #8385717 - Flags: review?(cviecco) → review?(brian)
Comment on attachment 8385717 [details] [diff] [review]
patch v4 part 1 of 3: introduce OCSP cache

Review of attachment 8385717 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/OCSPCache.cpp
@@ +206,5 @@
> +      }
> +    }
> +    // Well, we tried, but apparently everything has been revoked or unknown.
> +    // Returning SEC_ERROR_REVOKED_CERTIFICATE causes whatever certificate
> +    // verification is currently underway to fail.

This could be used a a DoS point for users. Buy 1024 certs, ask them to be revoked, and put a page with links to them. voila no more ssl/hsts sites for you. (imagine this on ffox where a 'restart' is a restarting the phone. There

::: security/certverifier/OCSPCache.h
@@ +87,5 @@
> +
> +  Mutex mMutex;
> +  static const size_t MaxEntries = 1024;
> +  // Sorted with the most-recently-used entry at the end.
> +  Vector<Entry*> mEntries;

ACtually, the way you use this is equivalent to use an STL List, why not use that instead?
(In reply to Camilo Viecco (:cviecco) from comment #25)
> ::: security/certverifier/OCSPCache.cpp
> @@ +206,5 @@
> > +      }
> > +    }
> > +    // Well, we tried, but apparently everything has been revoked or unknown.
> > +    // Returning SEC_ERROR_REVOKED_CERTIFICATE causes whatever certificate
> > +    // verification is currently underway to fail.
> 
> This could be used a a DoS point for users. Buy 1024 certs, ask them to be
> revoked, and put a page with links to them. voila no more ssl/hsts sites for
> you. (imagine this on ffox where a 'restart' is a restarting the phone.

I agree, but consider the case where you buy 1024 certs and revoke them all, so that you can ensure that any revocation of a stolen certificate is pushed out of the cache! That's a worse attack because it enables MitM instead of just DoS.

FWIW, I recommended it be done this way.

(In reply to Camilo Viecco (:cviecco) from comment #25)
> ACtually, the way you use this is equivalent to use an STL List, why not use
> that instead?

We can look into using STL list in a follow-up. The main issue with using STL is that STL is designed with the assumption that errors are reported with exceptions, so you have to be extremely careful about how you use it to avoid silent failures due to exceptions being disabled. MFBT was designed with the idea that there are no exceptions, so it seems like a safer choice now.
Comment on attachment 8385720 [details] [diff] [review]
patch v4 part 3 of 3: cache OCSP responses

Review of attachment 8385720 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/CertVerifier.h
@@ +88,5 @@
>        /*optional*/ const SECItem* stapledOCSPResponse,
>        /*optional out*/ insanity::pkix::ScopedCERTCertList* validationChain,
>        /*optional out*/ SECOidTag* evOidPolicy);
> +
> +  OCSPCache mOCSPCache;

file follow up bug regarding lifecycleof cache.

::: security/certverifier/NSSCertDBTrustDomain.cpp
@@ +212,5 @@
> +      cachedResponseErrorCode = SEC_ERROR_OCSP_OLD_RESPONSE;
> +    }
> +  } else {
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("NSSCertDBTrustDomain: no cached OCSP response"));

in serval places below it is assumed that on failure of getting a cache result the value of cachedResponseErrorCode is not an actual error. Please set the values here to guarantee that this does not happen

@@ +242,5 @@
>    }
>  
>    if (mOCSPFetching == LocalOnlyOCSPForEV) {
> +    PR_SetError(cachedResponseErrorCode != 0 ? cachedResponseErrorCode
> +                                             : SEC_ERROR_OCSP_UNKNOWN_CERT, 0);

This is hard to parse. Could you use a local variable or realing the values.

@@ +333,5 @@
> +  SECStatus rv = VerifyEncodedOCSPResponse(*this, cert, issuerCert, time,
> +                                           encodedResponse, &thisUpdate,
> +                                           &validThrough);
> +  PRErrorCode error = (rv == SECSuccess ? 0 : PR_GetError());
> +  if (rv == SECSuccess || error == SEC_ERROR_REVOKED_CERTIFICATE ||

error == 0 instead of rv == SECSuccess

@@ +339,5 @@
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("NSSCertDBTrustDomain: caching OCSP response"));
> +    if (mOCSPCache.Put(cert, issuerCert, error, thisUpdate, validThrough)
> +          != SECSuccess) {
> +      return SECFailure;

I dont think we should fail if the put in cache fail. A warning yes.

::: security/certverifier/NSSCertDBTrustDomain.h
@@ +84,3 @@
>    const SECTrustType mCertDBTrustType;
>    const OCSPFetching mOCSPFetching;
> +  OCSPCache& mOCSPCache; // non-owning!

I prefer the *notation as it makes it clearer that there is no localownership

::: security/insanity/lib/pkixocsp.cpp
@@ +50,3 @@
>  };
>  
>  class Context

This is an unfortunate name, but alas this is not the place to solve it.
Attachment #8385720 - Flags: review?(cviecco) → review-
Comment on attachment 8385717 [details] [diff] [review]
patch v4 part 1 of 3: introduce OCSP cache

Review of attachment 8385717 [details] [diff] [review]:
-----------------------------------------------------------------

Very good. It would be great to get an interdiff of the changes you make to address the review comments, or a new copy of the patch to review again.

::: security/certverifier/OCSPCache.cpp
@@ +30,5 @@
> +{
> +  PK11_DestroyContext(context, true);
> +}
> +
> +typedef ScopedPtr<PK11Context, PK11_DestroyContext_true> ScopedPK11Context;

Please file a follow-up bug move security::psm::Digest to security/insanity/Digest.h and use it here instead. Not worth blocking progress here though.

@@ +33,5 @@
> +
> +typedef ScopedPtr<PK11Context, PK11_DestroyContext_true> ScopedPK11Context;
> +
> +SECStatus
> +IDHash(uint8_t (&buf)[SHA384_LENGTH], const CERTCertificate* aCert,

I agree that functions that are not called from outside the file should be static.

Also, I suggest we name this CertIDHash. It would be more clear.

I also suggest that you create a typedef for uint8_t[SHA384_LENGTH] and use it consistently here.

@@ +39,5 @@
> +{
> +  ScopedPK11Context context(PK11_CreateDigestContext(SEC_OID_SHA384));
> +  if (!context) {
> +    return SECFailure;
> +  }

It is important when you calculate a secure hash of multiple items that the items are clearly delimited. Example of badness:
   
   a1 = "cat";   a2 = "cats";
   b1 = "small"; b2 = "mall";

Notice that (a1 || b1) == (a2 || b2) so H(a1 || b1) == H(a2 || b2). Not good.

One way to resolve this is to encode the length as a prefix to each part. Since you are hashing the der encoding of the subject name and the der encoding of the public key, this seems to be taken care of. But, you should note that this is being addressed in this way.

@@ +49,5 @@
> +                     aCert->serialNumber.len);
> +  if (rv != SECSuccess) {
> +    return rv;
> +  }
> +  rv = PK11_DigestOp(context.get(), aIssuerCert->derSubject.data,

In pkixocsp.cpp we use aCert->derIssuer instead of aIssuerCert->derSubject, with the following comment:

  // From http://tools.ietf.org/html/rfc6960#section-4.1.1:
  // "The hash shall be calculated over the DER encoding of the
  // issuer's name field in the certificate being checked."

It seems like we should be consistent with pkixocsp.cpp and the spec here.

@@ +55,5 @@
> +  if (rv != SECSuccess) {
> +    return rv;
> +  }
> +  rv = PK11_DigestOp(context.get(), aIssuerCert->derPublicKey.data,
> +                     aIssuerCert->derPublicKey.len);

In pkixocsp.cpp we hash issuer.subjectPublicKeyInfo.subjectPublicKey for CertIDs.

is derPublicKey the entire subjectPublicKeyInfo? Seems safest to be consistent with pkixocsp's CertID processing by using issuer.subjectPublicKeyInfo.subjectPublicKey. If we are using something different then we should document the difference and explain why we think it doesn't hurt.

@@ +78,5 @@
> +  , mValidThrough(aValidThrough)
> +{
> +  // XXX how to handle failure here?
> +  IDHash(mIDHash, aCert, aIssuerCert);
> +}

I suggest that you make OCSPCache::Entry a plain old struct without any constructor, and create a InitializeEntry function or similar. Better than ignoring the error from IDHash, and also makes it clear that copying will be fast.

@@ +90,5 @@
> +{
> +  Clear();
> +}
> +
> +int32_t

document that -1 means "not found."

@@ +99,5 @@
> +  if (mEntries.length() == 0) {
> +    return -1;
> +  }
> +
> +  uint8_t idHash[SHA384_LENGTH];

Here's another place to use that typedef I suggested above.

@@ +105,5 @@
> +  if (rv != SECSuccess) {
> +    return -1;
> +  }
> +
> +  // mEntries is sorted with the most-recently-used entry at the end.

// [...] so searching from the end is going to give the best performance.

@@ +124,5 @@
> +    ScopedPtr<char, PORT_Free_string> cn(CERT_GetCommonName(&aCert->subject));
> +    ScopedPtr<char, PORT_Free_string> cnIssuer(CERT_GetCommonName(&aIssuerCert->subject));
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG, (aMessage, cn.get(), cnIssuer.get()));
> +  }
> +#endif

I wonder if we're going to get unused parameter errors when PR_LOGGING isn't defined. If this fails on tryserver then we can just do something like this:

#ifdef PR_LOGGING
<your code>
#else
void
OCSPCache::LogWithCerts(const char*, const CERTCertificate*, const CERTCertificate*) { }
#endif

@@ +130,5 @@
> +
> +void
> +OCSPCache::MakeMostRecentlyUsed(size_t aIndex,
> +                                const MutexAutoLock& /* aProofOfLock */)
> +{

Please add a comment like this:

Since mEntries is in most-recently-used order, chances are that aIndex is close to the end of the list, so O(mEntries.erase()) should be much less than O(mEntries/2).

@@ +172,5 @@
> +
> +  MutexAutoLock lock(mMutex);
> +
> +  int32_t index = FindInternal(aCert, aIssuerCert, lock);
> +

Please add some comments explaining the special handling of revoked certificates.

@@ +194,5 @@
> +  }
> +
> +  if (mEntries.length() == MaxEntries) {
> +    LogWithCerts("OCSPCache::Put(%s, %s) too full - evicting an entry", aCert,
> +                 aIssuerCert);

Add a comment explaining why we don't evict revoked or unknown certs here.

@@ +214,5 @@
> +    }
> +  }
> +
> +  Entry* newEntry = new Entry(aCert, aIssuerCert, aErrorCode, aThisUpdate,
> +                              aValidThrough);

Interesting. Normally in Gecko we don't check operator new because operator new will never return null. But, eventually, we want this code to work outside of Gecko without libmozalloc.

Perhaps we should write it as new (nothrow) Entry(...) and then do the null check that cviecco suggests. I will add a note about this to the internal design documents.

::: security/certverifier/OCSPCache.h
@@ +30,5 @@
> +// OCSPCache can store and retrieve encoded OCSP responses. Each response
> +// is keyed on the certificate that purportedly corresponds to it (where
> +// certificates are distinguished based on serial number, issuer, and
> +// issuer public key). A maximum of 1024 distinct entries can be stored.
> +// OCSPCache is thread-safe.

It seems like this needs to be re-wrapped at 80 characters.

I would say "is keyed on (serial number, issuer name, and issuer public key), like an OCSP CertID."

@@ +73,5 @@
> +    PRErrorCode mErrorCode;
> +    PRTime mThisUpdate;
> +    PRTime mValidThrough;
> +    // The SHA-384 hash of the concatenation of the DER encodings of the
> +    // serial number, issuer name, and issuer key.

See my other comments about the hash. We should document this more precisely here and/or refer to the more precise description in the [Cert]IDHash function's comments.

@@ +74,5 @@
> +    PRTime mThisUpdate;
> +    PRTime mValidThrough;
> +    // The SHA-384 hash of the concatenation of the DER encodings of the
> +    // serial number, issuer name, and issuer key.
> +    uint8_t mIDHash[SHA384_LENGTH];

Use my suggested typedef here.
Attachment #8385717 - Flags: review?(brian) → review-
Comment on attachment 8385720 [details] [diff] [review]
patch v4 part 3 of 3: cache OCSP responses

Review of attachment 8385720 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/NSSCertDBTrustDomain.cpp
@@ +242,5 @@
>    }
>  
>    if (mOCSPFetching == LocalOnlyOCSPForEV) {
> +    PR_SetError(cachedResponseErrorCode != 0 ? cachedResponseErrorCode
> +                                             : SEC_ERROR_OCSP_UNKNOWN_CERT, 0);

I think it is fine the way it is written.

@@ +339,5 @@
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("NSSCertDBTrustDomain: caching OCSP response"));
> +    if (mOCSPCache.Put(cert, issuerCert, error, thisUpdate, validThrough)
> +          != SECSuccess) {
> +      return SECFailure;

I can see either way as being reasonable. AFAICT, mOCSPCache.Put will only fail for very bad reasons.

::: security/certverifier/NSSCertDBTrustDomain.h
@@ +84,3 @@
>    const SECTrustType mCertDBTrustType;
>    const OCSPFetching mOCSPFetching;
> +  OCSPCache& mOCSPCache; // non-owning!

Pretty sure I asked him to make it a reference. The reference makes it clear that it will be non-null. (We use pointers for NSS types like CERTCertificate mostly out of convention with old code that used pointers for those types.)

::: security/insanity/include/insanity/pkix.h
@@ +115,5 @@
>                                      CERTCertificate* issuerCert,
>                                      PRTime time,
> +                                    const SECItem* encodedResponse,
> +                 /* optional out */ PRTime* thisUpdate = nullptr,
> +                 /* optional out */ PRTime* validThrough = nullptr);

Why default the parameters? In general I think defaulted parameters are a source of errors and should be avoided.
Attachment #8385720 - Flags: review?(brian) → review+
Comment on attachment 8385718 [details] [diff] [review]
patch v4 part 2 of 3: gtest for OCSP cache

Review of attachment 8385718 [details] [diff] [review]:
-----------------------------------------------------------------

I need to review this more, but here are some initial comments. I imagine some of these comments apply throughout the patch.

::: security/manager/ssl/tests/gtest/OCSPCacheTest.cpp
@@ +59,5 @@
> +
> +  CERTCertificate subject;
> +  MakeFakeCert(&subject, "CN=subject1", "issuer1", "001", "key001");
> +  CERTCertificate issuer;
> +  MakeFakeCert(&issuer, "CN=issuer1", "issuer1", "000", "key000");

But subject.derIssuer != issuer.derSubject!!!

@@ +60,5 @@
> +  CERTCertificate subject;
> +  MakeFakeCert(&subject, "CN=subject1", "issuer1", "001", "key001");
> +  CERTCertificate issuer;
> +  MakeFakeCert(&issuer, "CN=issuer1", "issuer1", "000", "key000");
> +  PutAndGet(cache, &subject, &issuer, 0, PR_Now());

Do we need to do something to make sure that any assertion failures in the called function are propogated as test failures after we call it? I seem to remember the need to call an additional GTest function after calling any function that may itself make aassertions. Because, otherwise, how could the ASSERT_* calls in PutAndGet cause TestPutAndGet to return early?

@@ +68,5 @@
> +}
> +
> +TEST_F(OCSPCacheTest, TestVariousGets)
> +{
> +  NSS_NoDB_Init(nullptr);

Do we really need to call this for every test? NSS_NoDB_Init is global initialization so it should only be called once.

@@ +69,5 @@
> +
> +TEST_F(OCSPCacheTest, TestVariousGets)
> +{
> +  NSS_NoDB_Init(nullptr);
> +  mozilla::psm::InitCertVerifierLog();

Same here. Surprised we have to call this more than once. Note that GTest has a mechanism for global per-process initialization.

@@ +77,5 @@
> +  PRTime timeIn = PR_Now();
> +  for (int i = 0; i < 1024; i++) {
> +    CERTCertificate subject;
> +    char subjectBuf[64];
> +    snprintf(subjectBuf, sizeof(subjectBuf), "CN=subject%04d", i);

I wonder if this builds on Windows with warnings-as-errors enabled. Windows hates these functions. May be better to just use PR_smprintf.

@@ +112,5 @@
> +
> +  CERTCertificate issuer;
> +  PRTime timeIn = PR_Now();
> +  MakeFakeCert(&issuer, "CN=issuer1", "issuer1", "000", "key000");
> +  for (int i = 0; i < 1025; i++) {

Explain the significance of 1025.

@@ +138,5 @@
> +  CERTCertificate notEvictedSubject;
> +  MakeFakeCert(&notEvictedSubject, "CN=subject0000", "issuer1", "0000", "key000");
> +  PRTime timeIn = PR_Now();
> +  PutAndGet(cache, &notEvictedSubject, &issuer, SEC_ERROR_REVOKED_CERTIFICATE, timeIn);
> +  for (int i = 1; i < 1025; i++) {

Explain the significance of 1025. Explain that this is adding 1025 different revoked certificates, which will fill up the cache with 1024 un-evictable entries and cause the 1025th to fail.
(In reply to Camilo Viecco (:cviecco) from comment #22)
> @@ +176,5 @@
> > +
> > +  if (index >= 0) {
> > +    if (mEntries[index]->mErrorCode == SEC_ERROR_REVOKED_CERTIFICATE) {
> > +      LogWithCerts("OCSPCache::Put(%s, %s) already in cache as revoked - "
> > +                   "not replacing", aCert, aIssuerCert);
> 
> I would just return early here (to simplyfy if/else logic below), there is
> no need to push to front (ie optimize for the revoked case)

I made this more clear, but I think it's important to maintain the most-recently-used aspect of the cache storage.

(In reply to Brian Smith (:briansmith, was :bsmith; NEEDINFO? for response) from comment #28)
> @@ +55,5 @@
> > +  if (rv != SECSuccess) {
> > +    return rv;
> > +  }
> > +  rv = PK11_DigestOp(context.get(), aIssuerCert->derPublicKey.data,
> > +                     aIssuerCert->derPublicKey.len);
> 
> In pkixocsp.cpp we hash issuer.subjectPublicKeyInfo.subjectPublicKey for
> CertIDs.
> 
> is derPublicKey the entire subjectPublicKeyInfo? Seems safest to be
> consistent with pkixocsp's CertID processing by using
> issuer.subjectPublicKeyInfo.subjectPublicKey. If we are using something
> different then we should document the difference and explain why we think it
> doesn't hurt.

derPublicKey is the entire DER encoding of the certificate's public key (the certificate decoding in NSS keeps a copy of the entire thing as well as decoding it into the subjectPublicKeyInfo).
Since we use the DER tag/length to make sure hash collisions can't be easily found, I think this should stay how it is.
(In reply to Brian Smith (:briansmith, was :bsmith; NEEDINFO? for response) from comment #30)
> @@ +60,5 @@
> > +  CERTCertificate subject;
> > +  MakeFakeCert(&subject, "CN=subject1", "issuer1", "001", "key001");
> > +  CERTCertificate issuer;
> > +  MakeFakeCert(&issuer, "CN=issuer1", "issuer1", "000", "key000");
> > +  PutAndGet(cache, &subject, &issuer, 0, PR_Now());
> 
> Do we need to do something to make sure that any assertion failures in the
> called function are propogated as test failures after we call it? I seem to
> remember the need to call an additional GTest function after calling any
> function that may itself make aassertions. Because, otherwise, how could the
> ASSERT_* calls in PutAndGet cause TestPutAndGet to return early?

I haven't been able to find good documentation on this. I did test that a failure in PutAndGet causes our test infrastructure to report a failure. From that point on, it doesn't really matter if the rest of the test runs or not, because that first failure would need to be fixed in any case.
(In reply to Camilo Viecco (:cviecco) from comment #27)
> @@ +333,5 @@
> > +  SECStatus rv = VerifyEncodedOCSPResponse(*this, cert, issuerCert, time,
> > +                                           encodedResponse, &thisUpdate,
> > +                                           &validThrough);
> > +  PRErrorCode error = (rv == SECSuccess ? 0 : PR_GetError());
> > +  if (rv == SECSuccess || error == SEC_ERROR_REVOKED_CERTIFICATE ||
> 
> error == 0 instead of rv == SECSuccess

I don't see an important reason for making this change, unless you feel strongly about it.
Attachment #8385718 - Attachment is obsolete: true
Attachment #8385718 - Flags: review?(brian)
Attachment #8387899 - Flags: review?(cviecco)
Attachment #8385720 - Attachment is obsolete: true
Comment on attachment 8387906 [details] [diff] [review]
patch v4.5 part 3 of 3: cache OCSP responses

Review of attachment 8387906 [details] [diff] [review]:
-----------------------------------------------------------------

r- only for the full cache DoS issue. Didnt we agree that failing this was was bad?

::: security/certverifier/NSSCertDBTrustDomain.cpp
@@ +342,5 @@
> +      error == SEC_ERROR_OCSP_UNKNOWN_CERT) {
> +    PR_LOG(gCertVerifierLog, PR_LOG_DEBUG,
> +           ("NSSCertDBTrustDomain: caching OCSP response"));
> +    if (mOCSPCache.Put(cert, issuerCert, error, thisUpdate, validThrough)
> +          != SECSuccess) {

so you are hard failing with a full cache? I think we had agreed that this was a bad idea?
Comment on attachment 8387906 [details] [diff] [review]
patch v4.5 part 3 of 3: cache OCSP responses

Review of attachment 8387906 [details] [diff] [review]:
-----------------------------------------------------------------

Ignore this previous comment. Need to see the other patches first.
Attachment #8387906 - Flags: review?(cviecco)
Attachment #8387898 - Flags: review?(cviecco) → review+
Comment on attachment 8387906 [details] [diff] [review]
patch v4.5 part 3 of 3: cache OCSP responses

Review of attachment 8387906 [details] [diff] [review]:
-----------------------------------------------------------------

r+
Attachment #8387906 - Flags: review+
Comment on attachment 8387899 [details] [diff] [review]
patch v4.5 part 2 of 3: gtest for OCSP cache

Review of attachment 8387899 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nit request

::: security/manager/ssl/tests/gtest/OCSPCacheTest.cpp
@@ +28,5 @@
> +
> +// Makes a fake certificate with just the fields we need for testing here.
> +// (And those values are almost entirely bogus.)
> +// stackCert should be stack-allocated memory.
> +void

Can this functions be static?
Attachment #8387899 - Flags: review?(cviecco) → review+
(In reply to David Keeler (:keeler) from comment #32)
> I haven't been able to find good documentation on this. I did test that a
> failure in PutAndGet causes our test infrastructure to report a failure.
> From that point on, it doesn't really matter if the rest of the test runs or
> not, because that first failure would need to be fixed in any case.

See https://code.google.com/p/googletest/wiki/AdvancedGuide#Using_Assertions_in_Sub-routines
No longer depends on: 980628
Attached patch fixes-build-cviecco (obsolete) — Splinter Review
Attachment #8389413 - Flags: review?(dkeeler)
Comment on attachment 8389413 [details] [diff] [review]
fixes-build-cviecco

Review of attachment 8389413 [details] [diff] [review]:
-----------------------------------------------------------------

Great!

::: security/certverifier/NSSCertDBTrustDomain.h
@@ +18,5 @@
>  void DisableMD5();
>  
>  extern const char BUILTIN_ROOTS_MODULE_DEFAULT_NAME[];
>  
> +void PORT_Free_string(char* str);

Maybe eventually this should go in a little utils file or something, but I suppose it's fine for now. We can re-visit it later.

::: security/certverifier/OCSPCache.cpp
@@ +17,5 @@
>  
>  #include "OCSPCache.h"
>  
> +#include "NSSCertDBTrustDomain.h"
> +

nit: unnecessary blank line

@@ +22,2 @@
>  #include "pk11pub.h"
> +#include "secerr.h"

why was this necessary to include?

@@ +27,5 @@
>  #endif
>  
>  namespace mozilla { namespace psm {
>  
> +

nit: unnecessary blank line

@@ +34,5 @@
>  {
>    PK11_DestroyContext(context, true);
>  }
>  
> +typedef insanity::pkix::ScopedPtr<PK11Context, Insanity_PK11_DestroyContext_true> ScopedPK11Context;

nit: maybe try to break this into pieces so it all fits in 80 characters? I'm imagining something like this:

typedef insanity::pkix::ScopedPtr<PK11Context,
                                  Insanity_PK11_DestroyContext_true>
                                  ScopedPK11Context;

(although that looks a bit ugly, particularly with a non-constant width font)

@@ +35,5 @@
>    PK11_DestroyContext(context, true);
>  }
>  
> +typedef insanity::pkix::ScopedPtr<PK11Context, Insanity_PK11_DestroyContext_true> ScopedPK11Context;
> +

nit: unnecessary blank line

@@ +139,5 @@
>  {
>  #ifdef PR_LOGGING
>    if (PR_LOG_TEST(gCertVerifierLog, PR_LOG_DEBUG)) {
> +    insanity::pkix::ScopedPtr<char, mozilla::psm::PORT_Free_string> cn(CERT_GetCommonName(&aCert->subject));
> +    insanity::pkix::ScopedPtr<char, mozilla::psm::PORT_Free_string> cnIssuer(CERT_GetCommonName(&aIssuerCert->subject));

not a big deal, but if you feel like it, maybe break these lines at/near 80 characters?
Attachment #8389413 - Flags: review?(dkeeler) → review+
Attachment #8389413 - Attachment is obsolete: true
Comment on attachment 8389435 [details] [diff] [review]
fixes-build-cviecco (v 1.1)

Review of attachment 8389435 [details] [diff] [review]:
-----------------------------------------------------------------

Keeping r+ from keeler
Attachment #8389435 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/840df518d026
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.