Closed Bug 1364159 Opened 7 years ago Closed 7 years ago

add optimization to potentially avoid calling CERT_CreateSubjectCertList in NSSCertDBTrustDomain::FindIssuer

Categories

(Core :: Security: PSM, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: keeler, Assigned: keeler)

References

Details

(Whiteboard: [psm-assigned])

Attachments

(1 file)

CERT_CreateSubjectCertList is not an inexpensive function call, since it enumerates the certificate database (i.e. reads from disk a lot). If we're verifying for a TLS handshake, however, we should already have in memory a certificate chain sent by the peer (there are some cases where we won't, such as session resumption (see bug 731478)). If we can, we should use those certificates before falling back to calling CERT_CreateSubjectCertList.
Comment on attachment 8868262 [details]
bug 1364159 - potentially avoid calling CERT_CreateSubjectCertList in NSSCertDBTrustDomain::FindIssuer

https://reviewboard.mozilla.org/r/139846/#review143546

Looks reasonable to me, but I have some questions:
1. Do you have an estimate on how much faster this would make verifications?
2. Could we partially fix Bug 1063276 (pass a cert chain instead of just a cert), then use that cert chain to perform this optimisation?

::: security/certverifier/NSSCertDBTrustDomain.cpp:149
(Diff revision 1)
>  
> +// Remove from newCandidates any CERTCertificates in alreadyTried.
> +// alreadyTried is likely to be small or empty.
> +static void
> +RemoveCandidatesAlreadyTried(UniqueCERTCertList& newCandidates,
> +                             UniqueCERTCertList& alreadyTried)

const?

::: security/certverifier/NSSCertDBTrustDomain.cpp:169
(Diff revision 1)
> +
> +// Add to matchingCandidates any CERTCertificates from candidatesIn that have a
> +// DER-encoded subject name equal to the given subject name.
> +static Result
> +AddMatchingCandidates(UniqueCERTCertList& matchingCandidates,
> +                      UniqueCERTCertList& candidatesIn,

const?

::: security/certverifier/NSSCertDBTrustDomain.cpp:191
(Diff revision 1)
> +                                             certDuplicate.get());
> +      if (srv != SECSuccess) {
> +        return MapPRErrorCodeToResult(PR_GetError());
> +      }
> +      // matchingCandidates now owns certDuplicate
> +      certDuplicate.release();

We should prepend `Unused << ` here for clarity, and because not doing so might cause static analysis failures (I don't remember if the checks apply to both mozilla::UniquePtr and std::unique_ptr though).

::: security/manager/ssl/nsNSSCallbacks.cpp:1090
(Diff revision 1)
>    if (!cert) {
>      return;
>    }
>  
> +  UniqueCERTCertList peerCertChain(SSL_PeerCertificateChain(fd));
> +  MOZ_ASSERT(cert, "SSL_PeerCertChain failed in TLS handshake callback?");

Looks like `cert` should be `peerCertChain` instead.
`SSL_PeerCertChain` should probably also be `SSL_PeerCertificateChain` instead, but this doesn't really matter.
Attachment #8868262 - Flags: review?(cykesiopka.bmo) → review+
Comment on attachment 8868262 [details]
bug 1364159 - potentially avoid calling CERT_CreateSubjectCertList in NSSCertDBTrustDomain::FindIssuer

https://reviewboard.mozilla.org/r/139846/#review144216

Looks good.

::: security/certverifier/NSSCertDBTrustDomain.cpp:183
(Diff revision 1)
> +      continue; // probably just too big - continue processing other candidates
> +    }
> +    if (InputsAreEqual(candidateSubjectName, subjectName)) {
> +      UniqueCERTCertificate certDuplicate(CERT_DupCertificate(node->cert));
> +      if (!certDuplicate) {
> +        return Result::FATAL_ERROR_NO_MEMORY;

Seems weird to error here when we skip the out-of-memory condition above without bailing out.

Functions like FindIssuerInner keep going unless they hit a library failure.
Attachment #8868262 - Flags: review?(jjones) → review+
Comment on attachment 8868262 [details]
bug 1364159 - potentially avoid calling CERT_CreateSubjectCertList in NSSCertDBTrustDomain::FindIssuer

https://reviewboard.mozilla.org/r/139846/#review143546

Thanks!

Unfortunately this probably won't make a huge difference on modern machines and with the current default cert db format. On older hardware and when we switch to the sql-based db, it might be noticeable (again, though, this is really only a slight optimization).

As for bug 1063276, I think as long as we have some places where we aren't guaranteed to have a certificate chain, we should probably keep things as-is for now.

> const?

Good call.

> const?

Fixed.

> We should prepend `Unused << ` here for clarity, and because not doing so might cause static analysis failures (I don't remember if the checks apply to both mozilla::UniquePtr and std::unique_ptr though).

Sounds good.

> Looks like `cert` should be `peerCertChain` instead.
> `SSL_PeerCertChain` should probably also be `SSL_PeerCertificateChain` instead, but this doesn't really matter.

Good catch.
Comment on attachment 8868262 [details]
bug 1364159 - potentially avoid calling CERT_CreateSubjectCertList in NSSCertDBTrustDomain::FindIssuer

https://reviewboard.mozilla.org/r/139846/#review144216

Thanks!

> Seems weird to error here when we skip the out-of-memory condition above without bailing out.
> 
> Functions like FindIssuerInner keep going unless they hit a library failure.

What's being skipped on lines 175-178 is that mozilla::pkix doesn't handle inputs greater than 16k. In those situations we generally just pretend such certificates don't exist rather than returning a catastrophic error. (See e.g. https://dxr.mozilla.org/mozilla-central/source/security/certverifier/NSSCertDBTrustDomain.cpp#103 )
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/600b709c2634
potentially avoid calling CERT_CreateSubjectCertList in NSSCertDBTrustDomain::FindIssuer r=Cykesiopka,jcj
https://hg.mozilla.org/mozilla-central/rev/600b709c2634
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
See Also: → 1290639
Hello. This bug may have contributed to a sudden change in the Telemetry probe CERT_VALIDATION_SUCCESS_BY_CA[1] which seems to have occurred in Nightly 20170524[2][3].

Quite a few buckets went to 0 and quite a few others had a significant increase. I guess this means we're seeing radically different certs being successfully validated?

Is this a good thing? A bad thing?

Did this happen on purpose? Is this expected?

Is this probe measuring something useful?

[1]: http://alerts.telemetry.mozilla.org/index.html#/detectors/1/metrics/1059/alerts/?from=2017-05-24&to=2017-05-24
[2]: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5bc1c758ab57c1885dceab4e7837e58af27b998c&tochange=291a11111bdd05c5cd55dd552da4b1285ceba9b2
[3]: https://mzl.la/2rqaWMR
Flags: needinfo?(dkeeler)
I'm pretty sure this isn't a bad thing. Before this patch, Firefox would use whatever issuer certificates it could find in its certificate cache and built-in list to verify a server certificate. As of this patch, Firefox first essentially attempts to first build a path that uses the certificates presented by the server (and then it falls back to the old behavior). Due to prolific cross-signing by CAs, there can often be many paths from a server certificate to a trust anchor in our program. I think the telemetry changes indicate that for many sites, Firefox would previously find a trust path that used some cross-signed intermediate not actually sent by the server. Now, though, Firefox should more often find a trust path that looks like what the server sent, hence the change. While I wasn't explicitly expecting this, it does make sense.

As for if the probe is useful, I think it's good to have an idea of the relative usage of CAs in our program (if a root consistently has 0 hits, it might be a good candidate for removal).
Flags: needinfo?(dkeeler)
There are an awful lot of entries in the histogram that have 0 hits, but I don't know how many of them have corresponding roots in the codebase. Maybe a bug would be a good first step in auditing this data.

Speaking of auditing, this probe presently lacks an "alert_emails" field, so only I (well, and some other telemetry folks) get notified when there's sudden changes like this. Would this be a good component to file a bug in to get this fixed?
Some roots may have already been removed. The mapping of bucket number to root CA is here: https://dxr.mozilla.org/mozilla-central/source/security/manager/tools/KnownRootHashes.json (anything over 188 is not yet assigned, so it makes sense that those would get 0 hits).

Adding an alert_emails field would be great. Core :: Security: PSM would be the right component. I imagine seceng-telemetry@mozilla.com would be a good start.
See Also: → 1369747
I think that this bug introduced an undesirable behavior by accident.  It seems that Firefox is now reporting (and maybe using) the longest possible chain the server sends, rather than the shortest possible chain.

For example, if the server sends a chain like:

(server name) -> Online CA -> Root G3 -> Root G2 -> Root G1

And all three roots are in the trust store, it seems that G1 is getting the credit now.  This is why many of the buckets went to zero.

The reason this is undesirable is because it means that we no longer have data on what percentage are solely relying on G1.  If we want to retire older roots, then it is critical log the shortest chain.
Flags: needinfo?(dkeeler)
Thanks, Peter - I filed bug 1400913.
Flags: needinfo?(dkeeler)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: