SSL_OCSP_STAPLING telemetry reports 0% OCSP stapling rate in Nightly 34+

RESOLVED FIXED in Firefox 35

Status

()

Core
Security: PSM
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: briansmith, Assigned: briansmith)

Tracking

({perf, regression})

Trunk
mozilla37
perf, regression
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox35 fixed, firefox36 fixed, firefox37 fixed)

Details

Attachments

(2 attachments)

Nightly 27: 8%
Nightly 28: 20%
Nightly 29: 7%
Nightly 30: 6%
Nightly 31: 2%
Nightly 32: 1%
Nightly 33: 1%
Nightly 34: 0%
Nightly 35: 0%
Nightly 36: 0%
Nightly 37: 0%

Firefox 31 was the first release with mozilla::pkix, IIRC. It appears the switch to mozilla::pkix made the numbers change substantially. IIRC, in that release the counting was moved from security/manager/ssl/src to security/certverifier. It might be the case that non-SSL-handshake verifications are now being counted whereas before they weren't. They shouldn't be being counted.

I believe recently we also added a call to CertVerifier for SPDY connection coalescing. That might be the cause of the 2% -> 1% or 1% -> 0% drops. Again, those calls shouldn't be counted.

As it is now, the telemetry isn't useful for measuring OCSP stapling effectiveness.
Also, the above comments are assuming that the problem is in the counting, and not in the actual processing of OCSP stapling. If the problem is that we're no longer processing stapled OCSP responses then that is a much bigger problem.
Created attachment 8535442 [details] [diff] [review]
fix-ocsp-stapling-telemetry.patch

There are three problems:

1. When we verify an SSL certificate, sometimes we call BuildCertChain 2 or 3 times for a single validation, due to multiple possibilities for the keyUsage extension. The telemetry is potentially counted for every BuildCertChain call, although I think this shouldn't actually ever happen in practice.

2. We call CertVerifier::VerifyCert to verify SSL certificates outside of SSL handshakes--e.g. in the connection coalescing logic. Thus, even though we only want to count telemetry for SSL handshakes, we end up counting other cases as well.

3. We call CertVerifier::VerifyCert for non-SSL-certificate verifications. We shouldn't count these validations in the telemetry, but we do.

All three of these have the effect of making it appear as though there is less OCSP stapling happening in the world than there actually is.

I would like to uplift this patch to Firefox 35 after it bakes in Nightly for a while.
Assignee: nobody → brian
Status: NEW → ASSIGNED
Attachment #8535442 - Flags: review?(dkeeler)
Comment on attachment 8535442 [details] [diff] [review]
fix-ocsp-stapling-telemetry.patch

Review of attachment 8535442 [details] [diff] [review]:
-----------------------------------------------------------------

Great - thanks for doing this, Brian.
Attachment #8535442 - Flags: review?(dkeeler) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/610eb25d2d63
Target Milestone: --- → mozilla37
Comment on attachment 8535442 [details] [diff] [review]
fix-ocsp-stapling-telemetry.patch

Approval Request Comment
[Feature/regressing bug #]: bug 915930, bug 1049095, and probably others.
[User impact if declined]: We won't have the telemetry we need to inform us about necessary changes to revocation checking mechanisms in Firefox.
[Describe test coverage new/current, TBPL]: This has existing xpcshell tests.
[Risks and why]: This is a telemetry-only refactoring. Very low risk.
[String/UUID change made/needed]: None.
Attachment #8535442 - Flags: approval-mozilla-beta?
Attachment #8535442 - Flags: approval-mozilla-aurora?
status-firefox35: --- → affected
status-firefox36: --- → affected
status-firefox37: --- → affected
https://hg.mozilla.org/mozilla-central/rev/610eb25d2d63
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
status-firefox37: affected → fixed
Attachment #8535442 - Flags: approval-mozilla-beta?
Attachment #8535442 - Flags: approval-mozilla-beta+
Attachment #8535442 - Flags: approval-mozilla-aurora?
Attachment #8535442 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/63f7bbd24da3
https://hg.mozilla.org/releases/mozilla-beta/rev/f378b68991bc
status-firefox35: affected → fixed
status-firefox36: affected → fixed
Created attachment 8539692 [details] [diff] [review]
fix-ocsp-stapling-telemetry-2.patch

I believe this additional patch should effectively be a no-op, but I think it greatly helps clarify the fix. In particular, I think the primary reason the telemetry was wrong was that we were counting it for intermediate certificates, not just end-entities. Since intermediates never have stapled responses, this greatly skewed the counts.

Since the original patch landed on Nightly, the telemetry is now showing a ~12% stapling rate on Nightly. Similarly, there's a 13% stapling rate on Aurora over the last two days. Beta is at 6% but AFAICT that's just counting a single day where there are probably 1/2 people on the older beta and 1/2 people on the newer beta. So, it looks like we should expect ~12% stapling rate, which is actually pretty good, compared to the historical numbers noted above.
Attachment #8539692 - Flags: review?(dkeeler)
Comment on attachment 8539692 [details] [diff] [review]
fix-ocsp-stapling-telemetry-2.patch

Review of attachment 8539692 [details] [diff] [review]:
-----------------------------------------------------------------

Good call.
Attachment #8539692 - Flags: review?(dkeeler) → review+
Comment on attachment 8539692 [details] [diff] [review]
fix-ocsp-stapling-telemetry-2.patch

Review of attachment 8539692 [details] [diff] [review]:
-----------------------------------------------------------------

https://hg.mozilla.org/integration/mozilla-inbound/rev/33d139e87c89

[Uplift Approval]
This is a tiny improvement on the previously-uplifted patches, which only affects telemetry, and which mostly just adds documentation. It has basically zero risk. No string changes or interface changes are made in this patch. This patch is needed to make sure our OCSP stapling telemetry is accurate so that we can make decisions on revocation checking policy.
Attachment #8539692 - Flags: checkin+
Attachment #8539692 - Flags: approval-mozilla-beta?
Attachment #8539692 - Flags: approval-mozilla-aurora?
Comment on attachment 8539692 [details] [diff] [review]
fix-ocsp-stapling-telemetry-2.patch

approving now so they can get into this week's only beta.
Attachment #8539692 - Flags: approval-mozilla-beta?
Attachment #8539692 - Flags: approval-mozilla-beta+
Attachment #8539692 - Flags: approval-mozilla-aurora?
Attachment #8539692 - Flags: approval-mozilla-aurora+
I'm not sure if the normal needs-landing-on-aurora-and/or-beta reports will notice these patches, so I landed them myself:

https://hg.mozilla.org/releases/mozilla-aurora/rev/b304acd94272
https://hg.mozilla.org/releases/mozilla-beta/rev/54e086d2dcc3
Flags: qe-verify-
https://hg.mozilla.org/mozilla-central/rev/33d139e87c89
A follow-up: Before this bug was fixed, the telemetry was showing that approximately 0% of full handshakes were doing OCSP stapling. After this bug was fixed, we can now see that over 10% of handshakes are doing OCSP stapling.
You need to log in before you can comment on or make changes to this bug.