Closed Bug 1085509 Opened 10 years ago Closed 10 years ago

add telemetry for how many certificate overrides users have

Categories

(Core :: Security: PSM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36
Tracking Status
firefox34 --- fixed
firefox35 --- fixed
firefox36 --- fixed

People

(Reporter: keeler, Assigned: keeler)

References

Details

Attachments

(3 files, 5 obsolete files)

It would be worthwhile to see what the distribution of certificate error overrides is. For example, is it the case that many users have many overrides? Or do relatively few users account for the majority of overrides?
Attached patch patch (obsolete) — Splinter Review
Assignee: nobody → dkeeler
Status: NEW → ASSIGNED
Attachment #8509672 - Flags: review?(jjones)
Comment on attachment 8509672 [details] [diff] [review]
patch

Review of attachment 8509672 [details] [diff] [review]:
-----------------------------------------------------------------

While I'm clearly not the most familiar with the code in question, based on some digging I certainly see no issues with your implementation.
Attachment #8509672 - Flags: review?(jjones) → review+
Comment on attachment 8509672 [details] [diff] [review]
patch

Review of attachment 8509672 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Histograms.json
@@ +6461,5 @@
>    },
> +  "SSL_PERMANENT_CERT_ERROR_OVERRIDES": {
> +    "expires_in_version": "never",
> +    "kind": "exponential",
> +    "high": 1000,

! Hopefully no one has that many :) Do you know how this will bucketize?
Attachment #8509672 - Flags: review?(mmc) → review+
Thanks for the reviews!

(In reply to [:mmc] Monica Chew (please use needinfo) from comment #3)
> ::: toolkit/components/telemetry/Histograms.json
> @@ +6461,5 @@
> >    },
> > +  "SSL_PERMANENT_CERT_ERROR_OVERRIDES": {
> > +    "expires_in_version": "never",
> > +    "kind": "exponential",
> > +    "high": 1000,
> 
> ! Hopefully no one has that many :) Do you know how this will bucketize?

Indeed :)
I thought about this more and changed it to a high value of 1024 and 10 buckets, so I'm expecting it to bucket roughly like so: 0-1, 1-2, 2-4, 4-8, ..., 512-1024.

Here's to make sure it compiles:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1cd56e994c3a
Attached patch patch v1.1 (obsolete) — Splinter Review
This is the patch as landed.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f69fa3c13d1f
Attachment #8509672 - Attachment is obsolete: true
Attachment #8510442 - Flags: review+
Attached patch fix nsCertOverrideService (obsolete) — Splinter Review
It turns out nsCertOverrideService depends on NSS being initialized, which causes a circular dependency with the telemetry patch. This patch "fixes" nsCertOverrideService so it doesn't depend on NSS to be initialized (I say "fixes" because there's still a lot wrong with that implementation, but that's a bug for another day).
Attachment #8510697 - Flags: review?(mmc)
Comment on attachment 8510697 [details] [diff] [review]
fix nsCertOverrideService

Review of attachment 8510697 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for fixing this.

::: security/manager/ssl/src/nsCertOverrideService.cpp
@@ +98,5 @@
>    }
>  
> +  // Note that the names of these variables would seem to indicate that at one
> +  // point another hash algorithm was used and is still supported for backwards
> +  // compatibility. This is not the case. It has always been SHA256.

:P A good argument for YAGNI.
Attachment #8510697 - Flags: review?(mmc) → review+
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #8)
> :P A good argument for YAGNI.

I know, right? Thanks for the quick review.
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=aef14b976865
This is as the patch landed (took out too many #includes).
Attachment #8510697 - Attachment is obsolete: true
Attachment #8512181 - Flags: review+
Part 2 as landed.
Attachment #8510442 - Attachment is obsolete: true
Attachment #8512184 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/50650e0f0edf
https://hg.mozilla.org/mozilla-central/rev/b591ad43d53e
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
sorry had to back this out since this was causing perma failures in xperf on win7 opt like https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3353727&repo=mozilla-inbound
Status: RESOLVED → REOPENED
Flags: needinfo?(dkeeler)
Resolution: FIXED → ---
So it looks like this patch triggers the reading of the cert_override.txt file on every startup. I don't think cert_override.txt currently gets read on every startup, right?
True. Unfortunately, I'm not aware of any other way of ensuring that the telemetry measurement is taken at least once per run. I'm open to suggestions for how to improve this :)
Flags: needinfo?(dkeeler)
Btw, you should set the "expires_in_version" field for histograms used in temporary experiments. We have over 1000 histograms right now and more are being added all the time.

You should also set the "alert_emails" field so that your team gets automatically notified when there's a change in the distribution of the histogram. You can request a mail alias for this purpose from ServiceNow.
(In reply to David Keeler (:keeler) [use needinfo?] from comment #16)
> True. Unfortunately, I'm not aware of any other way of ensuring that the
> telemetry measurement is taken at least once per run. I'm open to
> suggestions for how to improve this :)

You could modify the code that reads cert_override.txt to record the value in Telemetry. That way there's no additional I/O cost to having this probe. It wouldn't get recorded every run, but I'm guessing it would not skew the probe's data either?

You could also trigger the Telemetry probe off of a shutdown event such as "quit-application-granted" (for desktop) and "application-background" for mobile. Generally, I think startup time is more valuable than shutdown time. And if the cert_override.txt data was read & cached during the session, there would be no additional IO cost.
(In reply to Vladan Djeric (:vladan) from comment #18)
> You could modify the code that reads cert_override.txt to record the value
> in Telemetry. That way there's no additional I/O cost to having this probe.
> It wouldn't get recorded every run, but I'm guessing it would not skew the
> probe's data either?

This is what I eventually went with. Even if it doesn't get recorded every run, we will probably be able to figure out what percentage of users it represents based on relative volume. In any case it seems to happen every run for me anyway (although that might just be due to my profile/tiles/etc.).

> You could also trigger the Telemetry probe off of a shutdown event such as
> "quit-application-granted" (for desktop) and "application-background" for
> mobile. Generally, I think startup time is more valuable than shutdown time.
> And if the cert_override.txt data was read & cached during the session,
> there would be no additional IO cost.

This didn't seem to work - the telemetry would be accumulated but then not stored or transmitted anywhere upon shutdown.

Anyway, thanks for the input! I'll see if the talos changes are still necessary...
This changed enough such that I think I should at least get a quick sign-off on it. Try looked good: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=81ee878b05ec
Attachment #8512184 - Attachment is obsolete: true
Attachment #8513819 - Flags: review?(mmc)
Comment on attachment 8513819 [details] [diff] [review]
part 2/2: telemetry for permanent overrides

Review of attachment 8513819 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/src/nsCertOverrideService.cpp
@@ +625,5 @@
> +  return PL_DHASH_NEXT;
> +}
> +
> +void
> +nsCertOverrideService::DoPermanentOverrideCountTelemetry()

CountPermanentOverrideTelemetry is slightly better, I think
Attachment #8513819 - Flags: review?(mmc) → review+
Attached patch patch for upliftSplinter Review
Approval Request Comment
[Feature/regressing bug #]: permanent certificate override telemetry
[User impact if declined]: we won't get this data and be able to act on it as quickly, otherwise
[Describe test coverage new/current, TBPL]: seems to be working in Nightly
[Risks and why]: low - it's just telemetry
[String/UUID change made/needed]: none (this patch is just the telemetry - it turns out the other changes weren't strictly necessary)
Attachment #8518351 - Flags: review+
Attachment #8518351 - Flags: approval-mozilla-beta?
Attachment #8518351 - Flags: approval-mozilla-aurora?
Attachment #8518351 - Flags: approval-mozilla-beta?
Attachment #8518351 - Flags: approval-mozilla-beta+
Attachment #8518351 - Flags: approval-mozilla-aurora?
Attachment #8518351 - Flags: approval-mozilla-aurora+
Blocks: 467317
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: