add telemetry for how many certificate overrides users have

RESOLVED FIXED in Firefox 34

Status

()

Core
Security: PSM
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: keeler, Assigned: keeler)

Tracking

unspecified
mozilla36
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox34 fixed, firefox35 fixed, firefox36 fixed)

Details

Attachments

(3 attachments, 5 obsolete attachments)

It would be worthwhile to see what the distribution of certificate error overrides is. For example, is it the case that many users have many overrides? Or do relatively few users account for the majority of overrides?
Created attachment 8509672 [details] [diff] [review]
patch
Assignee: nobody → dkeeler
Status: NEW → ASSIGNED
Attachment #8509672 - Flags: review?(jjones)
(Assignee)

Updated

3 years ago
Attachment #8509672 - Flags: review?(mmc)

Comment 2

3 years ago
Comment on attachment 8509672 [details] [diff] [review]
patch

Review of attachment 8509672 [details] [diff] [review]:
-----------------------------------------------------------------

While I'm clearly not the most familiar with the code in question, based on some digging I certainly see no issues with your implementation.
Attachment #8509672 - Flags: review?(jjones) → review+
Comment on attachment 8509672 [details] [diff] [review]
patch

Review of attachment 8509672 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Histograms.json
@@ +6461,5 @@
>    },
> +  "SSL_PERMANENT_CERT_ERROR_OVERRIDES": {
> +    "expires_in_version": "never",
> +    "kind": "exponential",
> +    "high": 1000,

! Hopefully no one has that many :) Do you know how this will bucketize?
Attachment #8509672 - Flags: review?(mmc) → review+
Thanks for the reviews!

(In reply to [:mmc] Monica Chew (please use needinfo) from comment #3)
> ::: toolkit/components/telemetry/Histograms.json
> @@ +6461,5 @@
> >    },
> > +  "SSL_PERMANENT_CERT_ERROR_OVERRIDES": {
> > +    "expires_in_version": "never",
> > +    "kind": "exponential",
> > +    "high": 1000,
> 
> ! Hopefully no one has that many :) Do you know how this will bucketize?

Indeed :)
I thought about this more and changed it to a high value of 1024 and 10 buckets, so I'm expecting it to bucket roughly like so: 0-1, 1-2, 2-4, 4-8, ..., 512-1024.

Here's to make sure it compiles:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1cd56e994c3a
Created attachment 8510442 [details] [diff] [review]
patch v1.1

This is the patch as landed.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f69fa3c13d1f
Attachment #8509672 - Attachment is obsolete: true
Attachment #8510442 - Flags: review+
Created attachment 8510697 [details] [diff] [review]
fix nsCertOverrideService

It turns out nsCertOverrideService depends on NSS being initialized, which causes a circular dependency with the telemetry patch. This patch "fixes" nsCertOverrideService so it doesn't depend on NSS to be initialized (I say "fixes" because there's still a lot wrong with that implementation, but that's a bug for another day).
Attachment #8510697 - Flags: review?(mmc)
Comment on attachment 8510697 [details] [diff] [review]
fix nsCertOverrideService

Review of attachment 8510697 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for fixing this.

::: security/manager/ssl/src/nsCertOverrideService.cpp
@@ +98,5 @@
>    }
>  
> +  // Note that the names of these variables would seem to indicate that at one
> +  // point another hash algorithm was used and is still supported for backwards
> +  // compatibility. This is not the case. It has always been SHA256.

:P A good argument for YAGNI.
Attachment #8510697 - Flags: review?(mmc) → review+
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #8)
> :P A good argument for YAGNI.

I know, right? Thanks for the quick review.
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=aef14b976865
Created attachment 8512181 [details] [diff] [review]
part 1/2: decouple nsCertOverrideService from NSS

This is as the patch landed (took out too many #includes).
Attachment #8510697 - Attachment is obsolete: true
Attachment #8512181 - Flags: review+
Created attachment 8512184 [details] [diff] [review]
part 2/2: telemetry for permanent cert overrides

Part 2 as landed.
Attachment #8510442 - Attachment is obsolete: true
Attachment #8512184 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/50650e0f0edf
https://hg.mozilla.org/mozilla-central/rev/b591ad43d53e
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
sorry had to back this out since this was causing perma failures in xperf on win7 opt like https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3353727&repo=mozilla-inbound
Status: RESOLVED → REOPENED
Flags: needinfo?(dkeeler)
Resolution: FIXED → ---
(Assignee)

Updated

3 years ago
Depends on: 1090355
So it looks like this patch triggers the reading of the cert_override.txt file on every startup. I don't think cert_override.txt currently gets read on every startup, right?
True. Unfortunately, I'm not aware of any other way of ensuring that the telemetry measurement is taken at least once per run. I'm open to suggestions for how to improve this :)
Flags: needinfo?(dkeeler)
Btw, you should set the "expires_in_version" field for histograms used in temporary experiments. We have over 1000 histograms right now and more are being added all the time.

You should also set the "alert_emails" field so that your team gets automatically notified when there's a change in the distribution of the histogram. You can request a mail alias for this purpose from ServiceNow.
(In reply to David Keeler (:keeler) [use needinfo?] from comment #16)
> True. Unfortunately, I'm not aware of any other way of ensuring that the
> telemetry measurement is taken at least once per run. I'm open to
> suggestions for how to improve this :)

You could modify the code that reads cert_override.txt to record the value in Telemetry. That way there's no additional I/O cost to having this probe. It wouldn't get recorded every run, but I'm guessing it would not skew the probe's data either?

You could also trigger the Telemetry probe off of a shutdown event such as "quit-application-granted" (for desktop) and "application-background" for mobile. Generally, I think startup time is more valuable than shutdown time. And if the cert_override.txt data was read & cached during the session, there would be no additional IO cost.
(In reply to Vladan Djeric (:vladan) from comment #18)
> You could modify the code that reads cert_override.txt to record the value
> in Telemetry. That way there's no additional I/O cost to having this probe.
> It wouldn't get recorded every run, but I'm guessing it would not skew the
> probe's data either?

This is what I eventually went with. Even if it doesn't get recorded every run, we will probably be able to figure out what percentage of users it represents based on relative volume. In any case it seems to happen every run for me anyway (although that might just be due to my profile/tiles/etc.).

> You could also trigger the Telemetry probe off of a shutdown event such as
> "quit-application-granted" (for desktop) and "application-background" for
> mobile. Generally, I think startup time is more valuable than shutdown time.
> And if the cert_override.txt data was read & cached during the session,
> there would be no additional IO cost.

This didn't seem to work - the telemetry would be accumulated but then not stored or transmitted anywhere upon shutdown.

Anyway, thanks for the input! I'll see if the talos changes are still necessary...
Created attachment 8513819 [details] [diff] [review]
part 2/2: telemetry for permanent overrides

This changed enough such that I think I should at least get a quick sign-off on it. Try looked good: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=81ee878b05ec
Attachment #8512184 - Attachment is obsolete: true
Attachment #8513819 - Flags: review?(mmc)
Comment on attachment 8513819 [details] [diff] [review]
part 2/2: telemetry for permanent overrides

Review of attachment 8513819 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/src/nsCertOverrideService.cpp
@@ +625,5 @@
> +  return PL_DHASH_NEXT;
> +}
> +
> +void
> +nsCertOverrideService::DoPermanentOverrideCountTelemetry()

CountPermanentOverrideTelemetry is slightly better, I think
Attachment #8513819 - Flags: review?(mmc) → review+
Created attachment 8514459 [details] [diff] [review]
parat 2/2: telemetry for permanent overrides as landed

Thanks for the review! Let's hope this sticks, now:
https://hg.mozilla.org/integration/mozilla-inbound/rev/70af2ce2bdfc
https://hg.mozilla.org/integration/mozilla-inbound/rev/c1d47db66068
Attachment #8513819 - Attachment is obsolete: true
Attachment #8514459 - Flags: review+
Bustage fix to include forgotten telemetry header: https://hg.mozilla.org/integration/mozilla-inbound/rev/ce3e04b35c42
Created attachment 8518351 [details] [diff] [review]
patch for uplift

Approval Request Comment
[Feature/regressing bug #]: permanent certificate override telemetry
[User impact if declined]: we won't get this data and be able to act on it as quickly, otherwise
[Describe test coverage new/current, TBPL]: seems to be working in Nightly
[Risks and why]: low - it's just telemetry
[String/UUID change made/needed]: none (this patch is just the telemetry - it turns out the other changes weren't strictly necessary)
Attachment #8518351 - Flags: review+
Attachment #8518351 - Flags: approval-mozilla-beta?
Attachment #8518351 - Flags: approval-mozilla-aurora?
Attachment #8518351 - Flags: approval-mozilla-beta?
Attachment #8518351 - Flags: approval-mozilla-beta+
Attachment #8518351 - Flags: approval-mozilla-aurora?
Attachment #8518351 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/339b62ba542d
status-firefox34: --- → affected
status-firefox35: --- → fixed
status-firefox36: --- → fixed

Updated

3 years ago
Blocks: 467317
You need to log in before you can comment on or make changes to this bug.