Closed Bug 1085506 Opened 5 years ago Closed 5 years ago

add telemetry for all TLS handshake certificate verification errors encountered

Categories

(Core :: Security: PSM, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox38 --- fixed
firefox39 --- fixed

People

(Reporter: keeler, Assigned: keeler)

References

Details

Attachments

(1 file, 1 obsolete file)

We already have telemetry for overridable certificate errors (SSL_CERT_ERROR_OVERRIDES), but we don't have information on how often users encounter certificate verification errors that can't be overridden.
Attached patch patch (obsolete) — Splinter Review
This is a first pass at implementing this. I basically took all the error codes mozilla::pkix could return when verifying a certificate and mapped them linearly (leaving holes for the ones that are overridable, so if we change their overridability, we don't have to re-order everything).
Assignee: nobody → dkeeler
Status: NEW → ASSIGNED
Attachment #8523244 - Flags: feedback?(rlb)
Comment on attachment 8523244 [details] [diff] [review]
patch

Review of attachment 8523244 [details] [diff] [review]:
-----------------------------------------------------------------

Should we just do this for all errors?  I don't see any reason we would want to *not* know about overrideable ones.

::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +424,5 @@
>        return SECFailure;
>  
>      default:
> +    {
> +      uint32_t probeValue = MapNonOverridableErrorToProbeValue(defaultErrorCodeToReport);

It seems like we could just put this right before the switch to measure everything.
Attachment #8523244 - Flags: feedback?(rlb) → feedback+
Attached patch patch v2Splinter Review
Attachment #8523244 - Attachment is obsolete: true
Attachment #8570708 - Flags: review?(rlb)
Summary: add telemetry for non-overridable certificate errors → add telemetry for all TLS handshake certificate verification errors encountered
Comment on attachment 8570708 [details] [diff] [review]
patch v2

Review of attachment 8570708 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  Just a couple of readability nits.

::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +338,5 @@
> +  // possible values in telemetry would be cumbersome.
> +  // Right now there are ~50 non-fatal and 4 fatal errors mozilla::pkix might
> +  // return. This transformation and the eventual telemetry probe the values
> +  // will be stored in can handle 80 non-fatal and 20 fatal errors, which
> +  // should be sufficient until we re-write everything in rust.

This comment is a little convoluted.  I would suggest something of the form

> // Since FATAL_ERROR_FLAG is 0x800, fatal error codes are quite a bit larger
> // than non-fatal.  To conserve space, we re-map these error codes so that
> // they start at 80 (decimal) instead of 0x800.  Currently, there are ~50
> // non-fatal errors, so starting fatal errors at 80 should provide plenty
> // of space.

The 80/20 split seems slightly un-aesthetic, but I see why you're doing it.

@@ +342,5 @@
> +  // should be sufficient until we re-write everything in rust.
> +  static_assert(FATAL_ERROR_FLAG == 0x800,
> +                "mozilla::pkix::FATAL_ERROR_FLAG is not what we were expecting");
> +  if (probeValue & FATAL_ERROR_FLAG) {
> +    probeValue -= FATAL_ERROR_FLAG;

Seems like it might read a little more clearly to unset the flag using an XOR:

> probeValue ^= FATAL_ERROR_FLAG
Attachment #8570708 - Flags: review?(rlb) → review+
[Feature/regressing bug #]: This telemetry is needed to measure the impact of removing the Equifax Secure CA, which currently shows high usage in telemetry (it is bin 4 of http://mzl.la/1KfvGLo).  Thus we need to land this patch a few weeks before landing bug 1132496 in Firefox.

[User impact if declined]: Without this telemetry it will be difficult for us to tell whether the Equifax Secure CA removal is causing breakage for users, and thus whether we need to revert it.

[Describe test coverage new/current, TBPL]: Manual.

[Risks and why]: Low risk.  Just adds a new telemetry histogram to collect TLS errors.

[String/UUID change made/needed]: None
Attachment #8570708 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/2704f2276325
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Blocks: 1137470
Attachment #8570708 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.