Closed Bug 1196039 Opened 4 years ago Closed 4 years ago

Telemetry for certificate lifetime

Categories

(Core :: Security: PSM, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed

People

(Reporter: kmckinley, Assigned: kmckinley)

Details

Attachments

(1 file, 6 obsolete files)

In order to help understand the impact of changing certificate validation policies based on cert lifetime, we need to understand the distribution of certificate lifetimes seen by users.

Collect two probes:
certificate lifetime: notAfter - notBefore
certificate lifetime remaining: notAfter - now

Ideally, measure by days up to a year, then by year.
Assignee: nobody → kmckinley
Status: NEW → ASSIGNED
There was no easy way to get the number of seconds in a Duration, so I added that method. I am not 100% happy with the buckets it comes up with. Below is a very small test with 128 buckets.

Alternatively, we could do an enum bucket with the number of days.

SSL_OBSERVED_CERTIFICATE_LIFETIME
2592 samples, average = 110.2, sum = 285609

  47 |  0  0%
  50 |  1  0%
  88 |#########################  2509  1%
 363 |  29  0%
 384 |  3  0%
 406 |  4  0%
 455 |  1  0%
 604 |  2  0%
 639 |  2  0%
 676 |  2  0%
 715 |  10  0%
 757 |  5  0%
1063 |  8  0%
1125 |  2  0%
1191 |  8  0%
1772 |  2  0%
2100 |  4  0%
2223 |  0  0%
Attachment #8658986 - Flags: review?(rlb)
One bucket per day up to 1 year, then by year up to 5, 5-10 years, 10-20 years, >20 years. Integer division truncates, so 89.9 days would result in 89, not 90.
Attachment #8658986 - Attachment is obsolete: true
Attachment #8658986 - Flags: review?(rlb)
Attachment #8660081 - Flags: review?(rlb)
Flags: needinfo?(vladan.bugzilla)
Comment on attachment 8660081 [details] [diff] [review]
Add telemetry for detecting certificate lifetimes.

Review of attachment 8660081 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/NSSCertDBTrustDomain.cpp
@@ +896,5 @@
>  
> +  uint64_t durationInDays = validityDuration.getDurationInSeconds() / Time::ONE_DAY_IN_SECONDS;
> +
> +  if (durationInDays <= 365) {
> +    Telemetry::Accumulate( mozilla::Telemetry::SSL_OBSERVED_CERTIFICATE_LIFETIME, durationInDays);

Please use separate buckets for endEntityOrCA==MustBeEndEntity and endEntityOrCA==MustBeCA. It would also be better to have a separate bucket for root CAs and intermediate CAs. To do that, you'd probably have to move the measurement to NSSCertDBTrustDomain::IsChainValid.

Also, it would be better to separate out SSL end-entity certificates from OCSP signing end-entity certificates. This would require you to have access to the value of requiredEKUIfPresent.

@@ +899,5 @@
> +  if (durationInDays <= 365) {
> +    Telemetry::Accumulate( mozilla::Telemetry::SSL_OBSERVED_CERTIFICATE_LIFETIME, durationInDays);
> +  } else {
> +    uint64_t durationInYears = durationInDays / 365;
> +    if (durationInYears <= 5) {

It would be nice to have one-year granularity for up to 10 years.
Now we report telemetry for each cert in the chain in different histograms, depending on whether it is a root, intermediate, or end entity cert.

Also, the buckets are now days (0-365) and years 1,2,3,4,5,6,7,8,9,10, 10<lifetime<=20, 20<lifetime.
Attachment #8660081 - Attachment is obsolete: true
Attachment #8660081 - Flags: review?(rlb)
Attachment #8660220 - Flags: review?(rlb)
Attachment #8660220 - Flags: review?(brian)
Comment on attachment 8660220 [details] [diff] [review]
separate end entity certificates from root & intermediate certificates

Review of attachment 8660220 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for making these updates. I didn't review the patch carefully but I it does seem to be doing what I suggested.

Note that this patch doesn't measure the lifetimes of OCSP response signing certificates. IMO that is also interesting.
Attachment #8660220 - Flags: review?(brian) → feedback+
Comment on attachment 8660220 [details] [diff] [review]
separate end entity certificates from root & intermediate certificates

Review of attachment 8660220 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/certverifier/NSSCertDBTrustDomain.cpp
@@ +807,5 @@
> +  if (Telemetry::CanRecordExtended()) {
> +    // Record the observed certificate validity lifetime to telemetry
> +    CERTCertListNode* curNode = CERT_LIST_HEAD(certList);
> +    Telemetry::ID telemetryId = mozilla::Telemetry::SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME;
> +    

trailing space

@@ +811,5 @@
> +    
> +    PRTime notBefore, notAfter;
> +    SECStatus rv;
> +
> +    while(curNode) {

I think the spacing is a little unorthodox? https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style
Does the NSS module have its own style guide?

@@ +819,5 @@
> +            (notAfter - notBefore) : (notBefore - notAfter))
> +          / PR_USEC_PER_SEC)
> +        / Time::ONE_DAY_IN_SECONDS;
> +
> +      uint64_t telemetryValue = durationInDays;

Recording exact certificate lifetimes (precise to 1 day) seems like a bit of a privacy risk. Do we really need to distinguish lifetimes of say 200 vs 201 days? Can you just have buckets for < 1 year, 1 year, 2 years, etc?

@@ +833,5 @@
> +          telemetryValue = 12 + 365;
> +        }
> +      }
> +
> +      Telemetry::Accumulate( telemetryId, telemetryValue);

extra space

::: toolkit/components/telemetry/Histograms.json
@@ +1368,5 @@
>      "n_values": 16,
>      "description": "SSL Handshake Key Exchange Algorithm for resumed handshake (null=0, rsa=1, dh=2, fortezza=3, ecdh=4)"
>    },
> +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> +    "expires_in_version": "55",

This expiry is 1.5 years in the future, set it to a shorter expiry (say 5 versions) and you can bump it again if you think the measurement is still useful at that point (you'll get an expiry reminder email)

@@ +1369,5 @@
>      "description": "SSL Handshake Key Exchange Algorithm for resumed handshake (null=0, rsa=1, dh=2, fortezza=3, ecdh=4)"
>    },
> +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> +    "expires_in_version": "55",
> +    "kind": "enumerated",

add an alert_emails so we know the owner & so you get an automated notification when the distribution of data in the histogram changes significantly

@@ +1370,5 @@
>    },
> +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> +    "expires_in_version": "55",
> +    "kind": "enumerated",
> +    "n_values": 1024,

this seems like a lot of buckets, why not e.g. 20?

@@ +1371,5 @@
> +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> +    "expires_in_version": "55",
> +    "kind": "enumerated",
> +    "n_values": 1024,
> +    "description": "The total lifetime (notAfter - notBefore) of end entity certificates we see in the wild. 0d-365d,1y,2y,3y,4y,5y,10y,20y,>20y"

provide a mapping of bucket # to expiry range in the description
Flags: needinfo?(vladan.bugzilla)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #6)
> Comment on attachment 8660220 [details] [diff] [review]
> separate end entity certificates from root & intermediate certificates
> 
> Review of attachment 8660220 [details] [diff] [review]:
> 
> Recording exact certificate lifetimes (precise to 1 day) seems like a bit of
> a privacy risk. Do we really need to distinguish lifetimes of say 200 vs 201
> days? Can you just have buckets for < 1 year, 1 year, 2 years, etc?

We plan to use this information so we can make decisions on number of connections affected when we look to enhance or relax scrutiny of certificates presented by web servers, which has a direct affect on the security and privacy of their communications. Can you elaborate on the privacy risk you see here?
Comment on attachment 8660220 [details] [diff] [review]
separate end entity certificates from root & intermediate certificates

Review of attachment 8660220 [details] [diff] [review]:
-----------------------------------------------------------------

Unfortunately, I think this needs to be refactored and moved to GatherSuccessfulValidationTelemetry.  On the plus side, the refactor will make it quite a bit simpler.

::: security/certverifier/NSSCertDBTrustDomain.cpp
@@ +803,5 @@
>        return Result::ERROR_KEY_PINNING_FAILURE;
>      }
>    }
>  
> +  if (Telemetry::CanRecordExtended()) {

Vladan: Is the recommended practice here to gate telemetry code with this sort of thing, or just consider that an internal of Telemetry::Accumulate?  It seems like the latter would be preferable, in order preserve your ability to update the Telemetry API.

@@ +806,5 @@
>  
> +  if (Telemetry::CanRecordExtended()) {
> +    // Record the observed certificate validity lifetime to telemetry
> +    CERTCertListNode* curNode = CERT_LIST_HEAD(certList);
> +    Telemetry::ID telemetryId = mozilla::Telemetry::SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME;

You're already inside of "::mozilla" here, so you can just start with "Telemetry::", like you do below with the call to Telemetry::Accumulate().

@@ +811,5 @@
> +    
> +    PRTime notBefore, notAfter;
> +    SECStatus rv;
> +
> +    while(curNode) {

Yeah, space before paren.  However, I don't think this iteration is actually worth doing.  We only need data on the lifetime of end-entity certificates, since that's where we might take some action.  CA certificates are already covered by OneCRL.

There's also a question of which chains we want to collect this metric for. With the check here in IsChainValid, we will measure every complete chain regardless of revocation status. Since we're really only interested in EE revocation, this just makes the picture muddier.  I would prefer that we only collect statistics for EE certificates that we accept as valid.

With that in mind, I think the right place for this to be is in AuthCertificate, where we collect other telemetry on accepted certificate chains (GatherSuccessfulValidationTelemetry).

https://dxr.mozilla.org/mozilla-central/source/security/manager/ssl/SSLServerCertVerification.cpp#1149

That said, I think most of the rest of this can be copy/pasted.

Vladan: This is PSM, not NSS.  And don't get me started on NSS style (see Bug 1118245).

@@ +812,5 @@
> +    PRTime notBefore, notAfter;
> +    SECStatus rv;
> +
> +    while(curNode) {
> +      rv = CERT_GetCertTimes(curNode->cert, &notBefore, &notAfter);

You need to check this return value.

@@ +814,5 @@
> +
> +    while(curNode) {
> +      rv = CERT_GetCertTimes(curNode->cert, &notBefore, &notAfter);
> +
> +      uint64_t durationInDays = (((notAfter > notBefore) ?

This check is done after CheckIssuerIndependentProperties, so you're guaranteed notAfter > notBefore.  I would just silently fail gathering telemetry if this criterion isn't met.  (if (notBefore > notAfter) { MOZ_ASSERT(false); return; })

@@ +817,5 @@
> +
> +      uint64_t durationInDays = (((notAfter > notBefore) ?
> +            (notAfter - notBefore) : (notBefore - notAfter))
> +          / PR_USEC_PER_SEC)
> +        / Time::ONE_DAY_IN_SECONDS;

Spacing here is awkward.

@@ +819,5 @@
> +            (notAfter - notBefore) : (notBefore - notAfter))
> +          / PR_USEC_PER_SEC)
> +        / Time::ONE_DAY_IN_SECONDS;
> +
> +      uint64_t telemetryValue = durationInDays;

No, we really do need more granularity than years.  We're trying to see if there's a reasonable threshold we can set for not doing OCSP checking, which means we need at least week-level granularity.  There's no privacy risk here because certificate lifetimes tend to not to be unique to a cert, but common to all certs issued by a CA.  (And we already have _BY_CA metrics.)

@@ +822,5 @@
> +
> +      uint64_t telemetryValue = durationInDays;
> +
> +      if (durationInDays > 365) {
> +        uint64_t durationInYears = durationInDays / 365;

Please replace the literal 365 instances with a variable / #define.  

Not also that there are not always 365 days in a year (e.g., 2016 will have 366).  It might be worth a comment noting that the telemetry is approximate.

@@ +824,5 @@
> +
> +      if (durationInDays > 365) {
> +        uint64_t durationInYears = durationInDays / 365;
> +
> +        if (durationInYears <= 10) {

Given that we're focused on EE certificates, it's not useful to collect this far out -- we reject EE certificates that last more than 39 months.

https://dxr.mozilla.org/mozilla-central/source/security/certverifier/NSSCertDBTrustDomain.cpp#930

In reality, though, it's not really that interesting to learn about the long tail of the distribution; we're really only interested in the low end of the scale.  Why don't we simplify this and just collect days up to 2 years?  With a single bin for >2 years.

@@ +835,5 @@
> +      }
> +
> +      Telemetry::Accumulate( telemetryId, telemetryValue);
> +
> +      if (CERT_LIST_END(curNode, certList)) {

As noted above, we only care about EE, so this clause can be deleted.

::: security/pkix/include/pkix/Time.h
@@ +147,5 @@
>    }
>  
> +  inline
> +  uint64_t getDurationInSeconds() const
> +  { return durationInSeconds; }

This doesn't appear to be necessary now that you're using the NSS functions instead of the moz::pkix functions for validity time.

::: toolkit/components/telemetry/Histograms.json
@@ +1368,5 @@
>      "n_values": 16,
>      "description": "SSL Handshake Key Exchange Algorithm for resumed handshake (null=0, rsa=1, dh=2, fortezza=3, ecdh=4)"
>    },
> +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> +    "expires_in_version": "55",

Vladan: I think this duration is appropriate.  We're measuring trends in the PKI here, so we need longitudinal data.

@@ +1370,5 @@
>    },
> +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> +    "expires_in_version": "55",
> +    "kind": "enumerated",
> +    "n_values": 1024,

How about we compromise on 800?  :)  That gives us two years of days, plus some overhead.

@@ +1371,5 @@
> +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> +    "expires_in_version": "55",
> +    "kind": "enumerated",
> +    "n_values": 1024,
> +    "description": "The total lifetime (notAfter - notBefore) of end entity certificates we see in the wild. 0d-365d,1y,2y,3y,4y,5y,10y,20y,>20y"

"The lifetime of accepted HTTPS server certificates, in days"

@@ +1373,5 @@
> +    "kind": "enumerated",
> +    "n_values": 1024,
> +    "description": "The total lifetime (notAfter - notBefore) of end entity certificates we see in the wild. 0d-365d,1y,2y,3y,4y,5y,10y,20y,>20y"
> +  },
> +  "SSL_OBSERVED_INTERMEDIATE_CERTIFICATE_LIFETIME": {

Remove.

@@ +1379,5 @@
> +    "kind": "enumerated",
> +    "n_values": 1024,
> +    "description": "The total lifetime (notAfter - notBefore) of intermediate certificates we see in the wild. 0d-365d,1y,2y,3y,4y,5y,10y,20y,>20y"
> +  },
> +  "SSL_OBSERVED_ROOT_CERTIFICATE_LIFETIME": {

Remove.
Attachment #8660220 - Flags: review?(rlb) → review-
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #3)
> Comment on attachment 8660081 [details] [diff] [review]
> Add telemetry for detecting certificate lifetimes.
> 
> Review of attachment 8660081 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: security/certverifier/NSSCertDBTrustDomain.cpp
> @@ +896,5 @@
> >  
> > +  uint64_t durationInDays = validityDuration.getDurationInSeconds() / Time::ONE_DAY_IN_SECONDS;
> > +
> > +  if (durationInDays <= 365) {
> > +    Telemetry::Accumulate( mozilla::Telemetry::SSL_OBSERVED_CERTIFICATE_LIFETIME, durationInDays);
> 
> Please use separate buckets for endEntityOrCA==MustBeEndEntity and
> endEntityOrCA==MustBeCA. It would also be better to have a separate bucket
> for root CAs and intermediate CAs. To do that, you'd probably have to move
> the measurement to NSSCertDBTrustDomain::IsChainValid.

I don't really think it's a good idea to use telemetry to measure CA certificates.  The set of CA certificates is pretty much known (given the disclosure requirements; except for technically-constrained roots), so we don't need measurements from the wild.


> Also, it would be better to separate out SSL end-entity certificates from
> OCSP signing end-entity certificates. This would require you to have access
> to the value of requiredEKUIfPresent.

Even if we were interested in this (I can't immediately think of an application of this knowledge), it would be better to collect this measurement in the OCSP code.
(In reply to Richard Barnes [:rbarnes] from comment #9)
> > Please use separate buckets for endEntityOrCA==MustBeEndEntity and
> > endEntityOrCA==MustBeCA. It would also be better to have a separate bucket
> > for root CAs and intermediate CAs. To do that, you'd probably have to move
> > the measurement to NSSCertDBTrustDomain::IsChainValid.
> 
> I don't really think it's a good idea to use telemetry to measure CA
> certificates.  The set of CA certificates is pretty much known (given the
> disclosure requirements; except for technically-constrained roots), so we
> don't need measurements from the wild.

That's ridiculously untrue, but I can understand why it may be better to expedite getting end-entity certificate lifetime measured.

> > Also, it would be better to separate out SSL end-entity certificates from
> > OCSP signing end-entity certificates. This would require you to have access
> > to the value of requiredEKUIfPresent.
> 
> Even if we were interested in this (I can't immediately think of an
> application of this knowledge), it would be better to collect this
> measurement in the OCSP code.

Really? You do realize that you don't check revocation status of OCSP response signing certificates at all, and that it would be a *very* good idea to move to making them short-lived due to that, right?
(In reply to Richard Barnes [:rbarnes] from comment #8)
> Given that we're focused on EE certificates, it's not useful to collect this
> far out -- we reject EE certificates that last more than 39 months.
> 
> https://dxr.mozilla.org/mozilla-central/source/security/certverifier/
> NSSCertDBTrustDomain.cpp#930

I haven't looked at the patch closely to really see what's being collected, but note that this is only true when attempting to verify an EV cert for EV. DV certs aren't limited yet. Even if the EV cert is for some reason valid for longer than 39 months, it just gets downgraded to DV.
(In reply to Kate McKinley [:kmckinley] from comment #7)
> We plan to use this information so we can make decisions on number of
> connections affected when we look to enhance or relax scrutiny of
> certificates presented by web servers, which has a direct affect on the
> security and privacy of their communications. Can you elaborate on the
> privacy risk you see here?

My concern is that collecting a very precise measurement could leak information about the user's browsing history to Telemetry. For example, if a duration of 234 days was very rare, then it's possible to infer that any samples in bucket "234" indicate that a user visited a specific site. Maybe this concern is unfounded?

Also, I don't like histograms with a lot of buckets (> ~100). Each bucket takes up memory in the browser (including Fennec) and the server needs to do more work to aggregate the histograms.

So I would basically recommend using only as much precision as you need to answer your questions.

By the way, how do you plan to study the trends? The Telemetry dashboard isn't good for visualizing the evolution of *distributions* over time (only averages & percentiles). You'll need to write your own webpage for this (using telemetry.js) or use Spark to do a custom analysis on the Telemetry backend.
(In reply to Richard Barnes [:rbarnes] from comment #8)
> > +  if (Telemetry::CanRecordExtended()) {
> 
> Vladan: Is the recommended practice here to gate telemetry code with this
> sort of thing, or just consider that an internal of Telemetry::Accumulate? 
> It seems like the latter would be preferable, in order preserve your ability
> to update the Telemetry API.

Yup, you can just call Accumulate and let Telemetry figure out if it's enabled or not

> No, we really do need more granularity than years.  We're trying to see if
> there's a reasonable threshold we can set for not doing OCSP checking, which
> means we need at least week-level granularity.  There's no privacy risk here
> because certificate lifetimes tend to not to be unique to a cert, but common
> to all certs issued by a CA.  (And we already have _BY_CA metrics.)

Ok. I'd prefer week-granularity if possible
> > +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> > +    "expires_in_version": "55",
> 
> Vladan: I think this duration is appropriate.  We're measuring trends in the
> PKI here, so we need longitudinal data.

Ok, the aggregated histogram data will be available effectively forever, but note that the raw individual pings will be kept for up to a year.

Also note that this histogram is opt-in, so the Release channel population will only report it if they explicitly opted into "extended Telemetry" at some point (about ~1% of population). This histogram will always be reported on the pre-release channels.

> How about we compromise on 800?  :)  That gives us two years of days, plus
> some overhead.

I'll let you guys decide on the minimum required resolution. I did like the week-resolution idea. 365*2/7 = ~100 buckets
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #10)
> (In reply to Richard Barnes [:rbarnes] from comment #9)
> > > Also, it would be better to separate out SSL end-entity certificates from
> > > OCSP signing end-entity certificates. This would require you to have access
> > > to the value of requiredEKUIfPresent.
> > 
> > Even if we were interested in this (I can't immediately think of an
> > application of this knowledge), it would be better to collect this
> > measurement in the OCSP code.
> 
> Really? You do realize that you don't check revocation status of OCSP
> response signing certificates at all, and that it would be a *very* good
> idea to move to making them short-lived due to that, right?

My point is that it's not actionable from the Firefox end.  It may be a good idea for OCSP signers to have shorter-lived certs, but we *already* don't check revocation for them.  I guess we could just reject sufficiently long-lived OCSP signing certs altogether, but since there's no BRs on that, I would be very hesitant.  Especially since the CA can just do OCSP off the intermediate anyway.
(In reply to Richard Barnes [:rbarnes] from comment #14)
> My point is that it's not actionable from the Firefox end.  It may be a good
> idea for OCSP signers to have shorter-lived certs, but we *already* don't
> check revocation for them.  I guess we could just reject sufficiently
> long-lived OCSP signing certs altogether, but since there's no BRs on that,
> I would be very hesitant.  Especially since the CA can just do OCSP off the
> intermediate anyway.

Understood. My point is that (perhaps in another bug) it would be good to measure such things so that we can have enough information to write new BRs.
Attachment #8660220 - Attachment is obsolete: true
Attachment #8661450 - Flags: review?(vladan.bugzilla)
Attachment #8661450 - Flags: review?(rlb)
Comment on attachment 8661450 [details] [diff] [review]
Measure end-entity certificate lifetime in weeks, up to 2 years

Review of attachment 8661450 [details] [diff] [review]:
-----------------------------------------------------------------

r+ for the histogram
Attachment #8661450 - Flags: review?(vladan.bugzilla) → review+
Updated email address.
Attachment #8661450 - Attachment is obsolete: true
Attachment #8661450 - Flags: review?(rlb)
Attachment #8661452 - Flags: review?(vladan.bugzilla)
Attachment #8661452 - Flags: review?(rlb)
Attachment #8661452 - Flags: review?(vladan.bugzilla) → review+
Comment on attachment 8661452 [details] [diff] [review]
Measure end-entity certificate lifetime in weeks, up to 2 years

Review of attachment 8661452 [details] [diff] [review]:
-----------------------------------------------------------------

Getting closer, but needs to be more paranoid about null pointers.

::: security/manager/ssl/SSLServerCertVerification.cpp
@@ +1142,5 @@
>    AccumulateTelemetryForRootCA(Telemetry::CERT_VALIDATION_SUCCESS_BY_CA,
>                                 rootCert);
>  }
>  
> +const uint64_t ONE_WEEK_IN_SECONDS = (7 * (24 * 60 *60)); // doesn't account for leap seconds

The comments make these lines overly long.  I would just put a single comment above these, say, "These values are approximate, not taking leap days / seconds into account"

@@ +1143,5 @@
>                                 rootCert);
>  }
>  
> +const uint64_t ONE_WEEK_IN_SECONDS = (7 * (24 * 60 *60)); // doesn't account for leap seconds
> +const uint64_t ONE_YEAR_IN_WEEKS   = 52; // not exactly, but close enough

Nit: Blank line after this would be nice.

@@ +1148,5 @@
> +// Gathers telemetry on the certificate lifetimes we observe in the wild
> +void
> +GatherEndEntityTelemetry(const ScopedCERTCertList& certList)
> +{
> +  CERTCertListNode* endEntity = CERT_LIST_HEAD(certList);

You need to check that endEntity is not null, and endEntity->cert.  Follow the "PR_ASSERT(val) / if (!val) {return}" pattern that GatherCATelemetry follows.

@@ +1152,5 @@
> +  CERTCertListNode* endEntity = CERT_LIST_HEAD(certList);
> +
> +  PRTime notBefore, notAfter;
> +
> +  if (CERT_GetCertTimes(endEntity->cert, &notBefore, &notAfter) == PR_SUCCESS) {

I would just do this and notBefore/notAfter as sanity checks before proceeding.  Apply the same pattern for these as above.

To be clear, we should not collect telemetry when notBefore >= notAfter.  That's a bug, and should be caught way before we get here.

@@ +1157,5 @@
> +    uint64_t durationInWeeks =
> +      (((notAfter > notBefore) ? (notAfter - notBefore)
> +                               : (notBefore - notAfter)) /
> +       PR_USEC_PER_SEC) /
> +      ONE_WEEK_IN_SECONDS;

Once you fix the above, you can just do (notAfter - notBefore) / PR_USEC_PER_SEC) / ONE_WEEK_IN_SECONDS
Attachment #8661452 - Flags: review?(rlb) → review-
Attachment #8661959 - Flags: review?(rlb)
Comment on attachment 8661959 [details] [diff] [review]
Telemetry for certificate lifetime

Review of attachment 8661959 [details] [diff] [review]:
-----------------------------------------------------------------

I assume you meant to mark the other patch as obsolete?

r=me with the below fixed.

::: security/manager/ssl/SSLServerCertVerification.cpp
@@ +1164,5 @@
> +  }
> +
> +  PRTime notBefore, notAfter;
> +
> +  if (CERT_GetCertTimes(endEntityCert, &notBefore, &notAfter) == PR_SUCCESS) {

It's not tragic to do it as you have it here, but I would prefer (1) NS_FAILED / NS_SUCCEEDED and (2) that this be a speed bump like the other checks above, rather than introducing a new block.  In other words, if (NS_FAILED(...)) { return; }

@@ +1165,5 @@
> +
> +  PRTime notBefore, notAfter;
> +
> +  if (CERT_GetCertTimes(endEntityCert, &notBefore, &notAfter) == PR_SUCCESS) {
> +    PR_ASSERT(!(notAfter < notBefore));

Slight preference for using >= here.
Attachment #8661959 - Flags: review?(rlb) → review+
Comment on attachment 8661959 [details] [diff] [review]
Telemetry for certificate lifetime

Review of attachment 8661959 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/SSLServerCertVerification.cpp
@@ +1156,5 @@
> +  if (!endEntityNode) {
> +    return;
> +  }
> +
> +  CERTCertificate * endEntityCert = endEntityNode->cert;

drive-by nit: no space before the *

@@ +1162,5 @@
> +  if (!endEntityCert) {
> +    return;
> +  }
> +
> +  PRTime notBefore, notAfter;

nit: declarations each on their own line - i.e.:

PRTime notBefore;
PRTime notAfter;

@@ +1164,5 @@
> +  }
> +
> +  PRTime notBefore, notAfter;
> +
> +  if (CERT_GetCertTimes(endEntityCert, &notBefore, &notAfter) == PR_SUCCESS) {

Except that CERT_GetCertTimes returns a SECStatus, so none of NS_FAILED/NS_SUCCEEDED or PR_SUCCESS are applicable here. SECSuccess/SECFailure are what we want to use.
I do agree with checking for an error and returning early, though:

if (CERT_GetCertTimes(endEntityCert, &notBefore, &notAfter) != SECSuccess) {
  return;
}

@@ +1175,5 @@
> +      / PR_USEC_PER_SEC
> +      / ONE_WEEK_IN_SECONDS;
> +
> +    if (durationInWeeks > (2*ONE_YEAR_IN_WEEKS)) {
> +      durationInWeeks = (2*ONE_YEAR_IN_WEEKS) + 1;

nit: spaces around operators: (2 * ONE_YEAR_IN_WEEKS)

::: toolkit/components/telemetry/Histograms.json
@@ +1371,5 @@
> +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> +    "expires_in_version": "55",
> +    "alert_emails": ["seceng-telemetry@mozilla.com"],
> +    "kind": "enumerated",
> +    "n_values": 125,

should this be 106, then? (since we won't have any buckets with an index (starting from 0) greater than 105?)
(In reply to David Keeler [:keeler] (use needinfo?) from comment #22)
> Comment on attachment 8661959 [details] [diff] [review]
> Telemetry for certificate lifetime
> 
> Review of attachment 8661959 [details] [diff] [review]:
> -----------------------------------------------------------------
> ::: toolkit/components/telemetry/Histograms.json
> @@ +1371,5 @@
> > +  "SSL_OBSERVED_END_ENTITY_CERTIFICATE_LIFETIME": {
> > +    "expires_in_version": "55",
> > +    "alert_emails": ["seceng-telemetry@mozilla.com"],
> > +    "kind": "enumerated",
> > +    "n_values": 125,
> 
> should this be 106, then? (since we won't have any buckets with an index
> (starting from 0) greater than 105?)

Once a histogram has been defined, we can't change the number of buckets. This gives us a few extras if we decide we want to track something additional in this histogram. For example, if we want to track by year.
Attachment #8661959 - Attachment is obsolete: true
Comment on attachment 8662486 [details] [diff] [review]
Telemetry for certificate lifetime

Carrying over previous review
Attachment #8662486 - Flags: review+
Attachment #8661452 - Attachment is obsolete: true
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2955c61e6ecf
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
You need to log in before you can comment on or make changes to this bug.