Closed Bug 1758372 Opened 11 months ago Closed 8 months ago

Google Trust Services: Incorrect OCSP response for issued certificate

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: matthias, Assigned: cadecairns)

Details

(Whiteboard: [ca-compliance])

Attachments

(6 files)

116.57 KB, image/png
38.31 KB, application/octet-stream
88.92 KB, application/octet-stream
76.46 KB, application/octet-stream
109.68 KB, application/octet-stream
93.43 KB, application/octet-stream
Attached image OCSP_Unauthorized.PNG

Today I noticed a (presumably temporary) problem with GTS' OCSP responders, which I verified with the OCSP function of crt.sh, as shown in the attached screenshot. I didn't have a console ready to export the OCSP response, and I don't know how to export OCSP error results from Firefox.

According to crt.sh, the responder replied with an 'unauthorized' response, which, according to section 4.9.10 of GTS' CPS, is not a response status that GTS' OCSP responders use.

PS. Further browser-related info: After posting I'll also add a screenshot of the FF browser window for further documentation, in case it has some useful information. My browser was configured to require good OCSP responses (security.OCSP.require = true), which is likely why I discovered the issue.

As mentioned, here is the screenshot of the failing OCSP responses for the certificate served by www.youtube.com, all active and valid certificates of which are signed by GTS.

Thanks for alerting us. GTS have assigned an engineer to investigate and will provide an update by the end of this week.

Assignee: bwilson → doughornyak
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

1. How your CA first became aware of the problem

This bug was filed in Bugzilla.

2. A timeline of the actions your CA took in response.

YYYY-MM-DD (UTC) Description
2022-03-07 03:57 A new certificate for YouTube is issued by Google Trust Services’ GTS CA 1C3.
2022-03-07 ??:?? The bug reporter temporarily receives a SEC_ERROR_OCSP_SERVER_ERROR message while validating a GTS-issued certificate for YouTube using Firefox in OCSP hard fail mode.
2022-03-07 14:14 crt.sh recorded the response “Unauthorized” when querying the GTS OCSP responder for the newly-issued certificate.
2022-03-07 15:51 This bug is filed in Bugzilla.
2022-03-07 22:00 The bug is acknowledged and the incident response process is initiated.
2022-03-10 14:25 The engineering investigation concludes that the OCSP check by crt.sh was not performed for the currently-served certificate but for the newly issued one, and that the initial OCSP message was related to a temporary error during the OCSP query. Therefore, the GTS Policy Authority determines that this is not an incident.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.

N/A. See section 6.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.)

N/A. No issues were identified where certificates are involved.

5. In a case involving certificates, the complete certificate data for the problematic certificates.

N/A.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Based on our review of the initial report, we believe that the reporter assumed that the currently served certificate is the latest issued one, for which the “Unauthorized” OCSP response recorded by crt.sh was received. However, further investigation revealed that this is not the case, and the report identifies two independent issues:

  1. The OCSP request failure that occurred in the reporter’s web browser, which prompted the investigation that led to this report.

  2. The “Unauthorized” OCSP response that crt.sh recorded from our OCSP responders, which was for a request made against the newly-issued certificate. That certificate had not yet been deployed to Google’s production infrastructure.

For the issue reported by the web browser, we believe that a network error may have occurred when the OCSP request was made to our responder. The error that Firefox displays is SEC_ERROR_OCSP_SERVER_ERROR, which, according to Mozilla’s documentation, indicates that “The OCSP server experienced an internal error.” In reviewing the corresponding source code, we see this error may arise if a network error occurs during transmission of the OCSP request, or if the responder served a valid OCSP response but its status was InternalError. We conducted a review of our responder logs around the time of the report and did not detect any abnormal error rates, which may indicate that the issue occurred before reaching our infrastructure. However, given the volume of requests our responder receives, we cannot be more precise in our analysis without further information.

For the “Unauthorized” OCSP response that crt.sh recorded for a precertificate, this can be explained by how our OCSP responders currently work for certificates issued by GTS CA 1C3. Whenever a certificate is issued, a corresponding OCSP response is created and is pushed to all OCSP responders. However, it may take some time for the update to propagate to our global infrastructure. Google’s current architecture delays the rollout of all certificates, ensuring that no certificates are served to relying parties without the corresponding OCSP response having propagated to all OCSP servers. Based on the timelines provided, we can confirm that the original certificate served to the reporter is not the newly-issued certificate which is being queried via crt.sh.

To the point raised about “Unauthorized” not being a response status described in section 4.9.10 of our CPS, this section of our CPS does not list the individual response codes returned by our OCSP responder and only describes that certificate serial numbers are deemed to be either “assigned”, “reserved”, or “unused”.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

The information provided on this bug does not indicate that there is a risk that needs to be mitigated since the initial error message is related to a temporary problem, and the “Unauthorized” response for the second certificate is intended behavior. We would like to propose that Mozilla resolves this bug as Invalid. However, if there is more information that might change our analysis, we are happy to review it.

Assignee: doughornyak → cadecairns
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → INVALID

(In reply to Cade Cairns from comment #3)

For the “Unauthorized” OCSP response that crt.sh recorded for a precertificate, this can be explained by how our OCSP responders currently work for certificates issued by GTS CA 1C3. Whenever a certificate is issued, a corresponding OCSP response is created and is pushed to all OCSP responders. However, it may take some time for the update to propagate to our global infrastructure. Google’s current architecture delays the rollout of all certificates, ensuring that no certificates are served to relying parties without the corresponding OCSP response having propagated to all OCSP servers. Based on the timelines provided, we can confirm that the original certificate served to the reporter is not the newly-issued certificate which is being queried via crt.sh.

CAs are required to operate OCSP services for all unexpired certificates, whether or not they are being "served to relying parties". This is similar to previous incidents. In Bug 1724276, the CA failed to operate OCSP services for certificates that were never even delivered to subscribers, let alone served to relying parties. In Bug 1640805, the CA served out-of-date OCSP responses because the responses were slow to propagate through the CDN. Those issues were appropriately treated as compliance incidents, and fixed by the CAs.

Ben: can this bug be re-opened for consistency with prior incidents?

Flags: needinfo?(bwilson)
Status: RESOLVED → REOPENED
Flags: needinfo?(bwilson)
Resolution: INVALID → ---

Please respond more fully and address concerns regarding this OCSP behavior and steps that will be taken to remedy it.

Flags: needinfo?(cadecairns)
Whiteboard: [ca-compliance]

For clarification, OCSP needs to give an appropriate response even for pre-certificates. See:
1 - Section 4.9.10 of the Baseline Requirements says "The OCSP responder MAY provide definitive responses about "reserved" certificate serial numbers, as if there was a corresponding Certificate that matches the Precertificate [RFC6962]."
2 - https://wiki.mozilla.org/CA/Required_or_Recommended_Practices#Precertificates.
3 - section 5.4 of the proposed Mozilla Root Store Policy, https://github.com/mozilla/pkipolicy/compare/master...BenWilson-Mozilla:2.8

Thank you for your comments, Andrew and Ben.

We believe that this case is different from the bugs that Andrew cited, because:

  • Relative to bug 1724276: We operate OCSP services for all certificates. Our responder responded with “Unauthorized” because data had not yet consistently propagated globally. In Bug 1724276, the OCSP responses for certain precertificates were not issued at all.
  • Relative to bug 1640805: Our OCSP responders consistently publish and propagate status updates to end-entity certificates in under 24 hours, typically in under 10 minutes. In Bug 1640805, the OCSP response propagation was greater than the 24 hour revocation timeline defined in the BRs.

To clarify, we publish OCSP responses for all certificates immediately upon creation of the corresponding precertificate, regardless of delivery to the customer, and these responses are signed before sending the precertificate to the CT logs for inclusion. The point in question here is that it may take some time for the signed update to achieve global consistency in our infrastructure.

Our OCSP responders conform to RFC 6960 and meet the CA/B Forum Baseline Requirements for serving status information for publicly trusted certificates. Specifically:

  • BRs 4.10.2: “The CA SHALL operate and maintain its CRL and OCSP capability with resources sufficient to provide a response time of ten seconds or less under normal operating conditions.” - The mean response time under normal operating conditions across all our responders is 350ms or less.
  • BRs 4.10.2: “The CA SHALL maintain an online 24x7 Repository that application software can use to automatically check the current status of all unexpired Certificates issued by the CA.” - The service is available 24/7 globally with 99.995% availability.
  • BRs 4.9.1.1: “The CA SHOULD revoke a certificate within 24 hours and MUST revoke a Certificate within 5 days ...” - Our responders publish updated status information as described after this set of bullet points.
  • BRs 4.9.10: “OCSP responses MUST have a validity interval less than or equal to ten days;” - Our response validity period is seven days minus one hour and we re-issue every OCSP response at least every 24 hours.
  • BRs 4.9.10: “For OCSP responses with validity intervals less than sixteen hours, then the CA SHALL update the information provided via an Online Certificate Status Protocol prior to one-half of the validity period before the nextUpdate.” - Does not apply; our validity interval is greater than sixteen hours.
  • BRs 4.9.10: “For OCSP responses with validity intervals greater than or equal to sixteen hours, then the CA SHALL update the information provided via an Online Certificate Status Protocol at least eight hours prior to the nextUpdate, and no later than four days after the thisUpdate.” - Our responses are updated every 24 hours + (0, 30] minutes. This excludes the time to propagate the data globally and we have monitoring and alerting in place to detect issues in propagation.
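As an illustration only, the timing rules quoted from BRs 4.9.10 above can be written as a small check. This is a sketch under stated assumptions, not GTS tooling: the function name `check_ocsp_schedule` and its parameters are hypothetical, "prior to" is read as "no later than", and propagation delay is deliberately left out, so the check says nothing about when a response becomes reachable at the responder URL.

```python
from datetime import timedelta

MAX_VALIDITY = timedelta(days=10)      # BRs 4.9.10: validity interval <= ten days
SHORT_INTERVAL = timedelta(hours=16)   # threshold between the two update rules


def check_ocsp_schedule(validity, refresh_every):
    """Return the list of violated timing rules (empty if none).

    validity      -- nextUpdate minus thisUpdate of each response
    refresh_every -- worst-case delay after thisUpdate before a newer
                     response is issued (hypothetical parameter)
    """
    problems = []
    if validity > MAX_VALIDITY:
        problems.append("validity interval exceeds ten days")
    if validity < SHORT_INTERVAL:
        # Update no later than one-half of the validity period before nextUpdate.
        if refresh_every > validity / 2:
            problems.append("updated later than half the validity period")
    else:
        # Update at least eight hours before nextUpdate ...
        if refresh_every > validity - timedelta(hours=8):
            problems.append("updated less than eight hours before nextUpdate")
        # ... and no later than four days after thisUpdate.
        if refresh_every > timedelta(days=4):
            problems.append("updated more than four days after thisUpdate")
    return problems


# Figures stated in this comment: validity of seven days minus one hour,
# re-issued every 24 hours plus at most 30 minutes.
gts_validity = timedelta(days=7) - timedelta(hours=1)
gts_refresh = timedelta(hours=24, minutes=30)
print(check_ocsp_schedule(gts_validity, gts_refresh))  # prints []
```

An empty list only means the signing schedule fits these three rules; it does not cover the availability requirement in BRs 4.10.2.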

Our responders publish status information and achieve global consistency for ~99.7% of issuances within 10 minutes of signing as an upper bound. The remaining small proportion of issuance represents our subscribers’ busiest sites, inclusive of sites such as YouTube, which use a different response path for scalability reasons. Global propagation for this remaining ~0.3% of certificates may take 6-14 hours under normal conditions. This is within 24 hours as required by BRs 4.9.1.1. However, we have procedures to trigger an emergency publication to achieve global consistency within 15-20 minutes for this small set of certificates in the event of a revocation or other urgent publication need. We also have ongoing efforts to further improve the propagation time without sacrificing scalability and reliability for these larger sites.

Flags: needinfo?(cadecairns)

(In reply to Cade Cairns from comment #7)

Our responder responded with “Unauthorized” because data had not yet consistently propagated globally

Does this mean that, while a new OCSP status is propagating through your systems, the OCSP responses for that certificate will be 'unauthorized' (as opposed to an "unknown" or "reserved" response), regardless of the actual authority of the OCSP responder and of any previous or future OCSP status?

Our OCSP responders conform to RFC 6960 and meet the CA/B Forum Baseline Requirements for serving status information for publicly trusted certificates. Specifically:

RFC 6960 does not include an "it's fine to respond with 'unauthorized' while your systems are doing routine tasks" clause, so I'm having trouble understanding which part of the RFC you refer to. More specifically:

The response "unauthorized" is returned in cases where the client is
not authorized to make this query to this server or the server is not
capable of responding authoritatively (cf. [RFC5019], Section 2.2.3).
- RFC 6960, Section 2.3. Exception Cases

It seems to me that if you're relying on the 'server is not capable of responding authoritatively' clause, then that would be a very broad interpretation; and I would appreciate it if you were able to further explain your reasoning.
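For reference, the response status codes that RFC 6960 defines can be listed as a small enum. This is a sketch for discussion: the numeric values follow the `OCSPResponseStatus` definition in the RFC, while the Python class and the helper `carries_cert_status` are hypothetical names introduced here.

```python
from enum import IntEnum


class OCSPResponseStatus(IntEnum):
    """OCSPResponseStatus values as enumerated in RFC 6960."""
    SUCCESSFUL = 0
    MALFORMED_REQUEST = 1
    INTERNAL_ERROR = 2
    TRY_LATER = 3
    # Value 4 is not used.
    SIG_REQUIRED = 5
    UNAUTHORIZED = 6


def carries_cert_status(status):
    """Only a 'successful' response includes response bytes, and therefore
    a per-certificate status (good, revoked or unknown)."""
    return status == OCSPResponseStatus.SUCCESSFUL


for status in OCSPResponseStatus:
    print(f"{status.value} {status.name}: cert status included = {carries_cert_status(status)}")
```

Seen this way, "unauthorized" is one of the error statuses that carries no certificate status at all, which is why a relying party in hard-fail mode cannot proceed on it.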

BRs 4.10.2: “The CA SHALL maintain an online 24x7 Repository that application software can use to automatically check the current status of all unexpired Certificates issued by the CA.” - The service is available 24/7 globally with 99.995% availability.

but at the same time:

The remaining small proportion of issuance represents our subscribers’ busiest sites, inclusive of sites such as YouTube, which use a different response path for scalability reasons. Global propagation for this remaining ~0.3% of certificates may take 6-14 hours under normal conditions

... which to me implies that OCSP status might only be available after 6-14 hours of OCSP downtime (response "unauthorized") for 0.3% of your certificates. I.e. for a predictable 0.3% of your certificates, your expected OCSP availability in the first month after publishing a certificate is only 99 to 98.1%. Frankly, I find that unacceptable, and hardly "24/7 [...] of all unexpired Certificates issued by the CA".
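The availability figures above can be reproduced with a short calculation. This sketch assumes a 30-day month (the comment does not say which month length was used) and treats the whole propagation window as downtime; the function name `availability` is introduced here for illustration.

```python
def availability(outage_hours, window_hours):
    """Percentage of the window during which OCSP status is available."""
    return 100.0 * (1 - outage_hours / window_hours)


MONTH_HOURS = 30 * 24  # assumption: a 30-day month

for outage in (6, 14):  # the stated 6-14 hour propagation window
    print(f"{outage} h without OCSP: {availability(outage, MONTH_HOURS):.2f}%")
# 6 h without OCSP: 99.17%
# 14 h without OCSP: 98.06%
```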

Using the provided screenshot:

03:57:12 UTC Precertificate logged at https://ct.googleapis.com/logs/xenon2022
14:14 UTC OCSP failed "unauthorized"

That's over 10 hours of "unauthorized" responses to OCSP requests of a public certificate, if I understand your responses correctly. About exactly in the middle of the 6-14h time window, but very bad regardless.
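The gap between the two timestamps can be checked as follows. This is a sketch that assumes both events fall on 2022-03-07 (the date given in the incident timeline) and that the crt.sh check happened at 14:14:00, since the seconds are not recorded.

```python
from datetime import datetime, timedelta, timezone

# Precertificate logged to the CT log (from the screenshot).
logged = datetime(2022, 3, 7, 3, 57, 12, tzinfo=timezone.utc)
# crt.sh OCSP check; seconds are an assumption.
failed = datetime(2022, 3, 7, 14, 14, 0, tzinfo=timezone.utc)

gap = failed - logged
print(gap)  # 10:16:48

# The gap lies inside the stated 6-14 hour propagation window.
print(timedelta(hours=6) <= gap <= timedelta(hours=14))  # True
```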

I'll attach another set of OCSP responses for a single random* sample retrieved from crt.sh, each of which shows a delay of at least 1 hour between issuance of the pre-certificate and an "unauthorized" response, and one of which shows a delay of at least 18 hours. I fail to understand how you get to 99.995% availability with such large windows of non-responses.

* random being "pick one issued recently from the same CA as the problematic OCSP-responder certificate, and find one that has not been revoked". That proved difficult, as many that I found were revoked seconds after they were logged by their first CT log.

Flags: needinfo?(cadecairns)
Attached file More_OCSP_Madness.PNG
Attached file OCSP_Madness_2.PNG
Attached file OCSP_Madness_3.PNG
Attached file OCSP_Madness_4.PNG

(In reply to Cade Cairns from comment #7)

Relative to bug 1640805: Our OCSP responders consistently publish and propagate status updates to end-entity certificates in under 24 hours, typically in under 10 minutes. In Bug 1640805, the OCSP response propagation was greater than the 24 hour revocation timeline defined in the BRs.

The relevant part of Bug 1640805 is that the CA tried to claim that their obligations under the BRs were met as soon as they signed the OCSP response, rather than once the response was actually available at the responder URL. This interpretation was rejected in Bug 1640805 Comment 11. By appealing to "propagation", GTS is making the same discredited argument.

Although the BRs define a 24 hour timeframe, it clearly applies to the revocation of a certificate, not to the initial publication of OCSP responses. There is nothing in the BRs that permits a CA to fail to respond to OCSP requests for 14 hours after a certificate is issued.

Note that in Bug 1753123, Let's Encrypt concluded two things:

  1. Responding with "unauthorized" is a failure to provide OCSP responses, and

  2. Failing to provide OCSP responses is "most accurately described as a violation of Section 4.10.2 of the BRs, which requires that the CA maintain a service that application software can 'use to automatically check the current status of all unexpired Certificates issued by the CA' (emphasis added)" and that "although there are other sections of the BRs which pertain to OCSP responses and their update periods, all such requirements appear to pertain to extant OCSP responses, of which these affected certs had none."

Was GTS aware of Let's Encrypt's analysis in Bug 1753123?

I would also like to note the inconsistency in GTS' responses. In Comment 3, GTS' excuse was that the certificates with non-operational OCSP weren't being served to relying parties. When this was called out as being unsupported by the BRs or prior precedent, GTS changed their explanation, and is now claiming in Comment 7 that the delay is OK because it's less than 24 hours. It doesn't seem like GTS has a coherent understanding of the requirements of a publicly-trusted CA. Instead of designing a compliant OCSP responder, GTS has instead done what is convenient, and is now providing post hoc justifications for their design.

GTS needs to review https://wiki.mozilla.org/CA/Responding_To_An_Incident and provide a new, detailed incident report, which includes explanations for why this non-compliance occurred despite similar incidents by other CAs, as well as a binding timeline for fixing this non-compliance.

Below is a list of certificates issued by GTS which my monitor has identified as lacking proper OCSP services. A striking aspect of this list is that every single one of these certificates has a lifetime of just 24 hours. If it takes 14 hours to publish an OCSP response for these certificates (and two of the certificates below have already passed the 10 hour mark), then the availability of the OCSP services for these certificates is less than 50%. I agree with Comment 8 that this cannot possibly count as the "24x7" availability required by the BRs.

          dns_names           | cert_lifetime | time_since_discovery |          check_time           |                              problem                               |                           cert_sha256                            
------------------------------+---------------+----------------------+-------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------
 {haplorrhini.com}            | 24:00:00      | 10:15:58.546021      | 2022-03-21 17:16:53.334005+00 | error parsing OCSP response: ocsp: error from server: unauthorized | f956e6bf1b75c2667c12c8a6c4f8312a010734a51c49f14bf7bdf0064becf4be
 {haplorrhini.com}            | 24:00:00      | 10:15:58.024169      | 2022-03-21 17:11:55.760379+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 7c21b29a5cb79dec2c7aa6deee5ce6f60353df6581bf985f2fc48033c463d4cd
 {haplorrhini.com}            | 24:00:00      | 09:56:51.151686      | 2022-03-21 17:16:50.902548+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 7d3e04243f040afb08e14ebabae71cf24d2155caa166803a067d6a28ae4f1d6c
 {haplorrhini.com}            | 24:00:00      | 09:56:51.147187      | 2022-03-21 17:16:50.900931+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 7d6f2da394b838041fdd8b22d7eb250ebf99b435e521bda828c35a6da1c7e5e1
 {haplorrhini.com}            | 24:00:00      | 09:46:49.557663      | 2022-03-21 17:11:56.602378+00 | error parsing OCSP response: ocsp: error from server: unauthorized | db539e11667e5676cae27fd8b3b69ac89da5e58b31152a0ef6a69598f86cb4e9
 {haplorrhini.com}            | 24:00:00      | 09:46:48.73198       | 2022-03-21 17:16:45.657113+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 9d17067c4972f2ff9dc8235926f7ec53ba20da1751637674de2c8639c2a80eb8
 {haplorrhini.com}            | 24:00:00      | 09:26:41.316777      | 2022-03-21 17:11:56.602039+00 | error parsing OCSP response: ocsp: error from server: unauthorized | f237f70a9dba6f1471195062de2cc376b1feda5a799dc534bcd1311f389b2bcb
 {haplorrhini.com}            | 24:00:00      | 09:26:41.219531      | 2022-03-21 17:16:49.927318+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 27f220329e8364a0a1b699293d480d97d089fa67d3586906ca3fa93ed1f32732
 {haplorrhini.com}            | 24:00:00      | 09:21:40.488728      | 2022-03-21 17:11:54.202538+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 8641b5698b69a4e50a4c8b6615352436ed88a21a0c2ae9e96b1c530f72263630
 {xn--ir8h.haplorrhini.com}   | 24:00:00      | 09:21:40.48837       | 2022-03-21 17:16:46.769942+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 14e538f82e10997b7f61678d8e74493ac548ab11afc55af14cd6be9bb266caad
 {haplorrhini.com}            | 24:00:00      | 09:16:38.109655      | 2022-03-21 17:16:50.892539+00 | error parsing OCSP response: ocsp: error from server: unauthorized | baa44d3f9b1867f9708f5f31e5872b1049f2c78ed2379f1695e3b862004f8d22
 {production.haplorrhini.com} | 24:00:00      | 09:16:38.068643      | 2022-03-21 17:16:50.872752+00 | error parsing OCSP response: ocsp: error from server: unauthorized | f359370bacf34bc2c43ad1335025467e1d4fde4cb867d2ca79f957c5725244e9
 {production.haplorrhini.com} | 24:00:00      | 09:16:38.017494      | 2022-03-21 17:16:50.886852+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 39d5fe07c41a5c2a01210177e7bfdbc082fb37cf6adf84f6d6a873ccefd1583e
 {production.haplorrhini.com} | 24:00:00      | 09:16:38.004634      | 2022-03-21 17:16:50.873765+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 6ba87f1119b6b689be1ff3dcbf5804c93564651acb26756cdcbaf70d858449f4
 {xn--ir8h.haplorrhini.com}   | 24:00:00      | 09:16:37.817978      | 2022-03-21 17:16:50.888851+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 0f54a6b205a9958606606aa7a7f705196946cf0baa329df040996097577112d8
 {haplorrhini.com}            | 24:00:00      | 08:56:29.865504      | 2022-03-21 17:16:46.743983+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 0fe76318ebc9965034fbbf06df5aba1b1e3b3b04406b39cf3fd81810897aba89
 {haplorrhini.com}            | 24:00:00      | 08:56:29.863027      | 2022-03-21 17:16:47.077613+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 8ff4c631081e468d3dec498e1cf4a4639f62c74d2f799a8a1b58bb3d5b96b96a
 {haplorrhini.com}            | 24:00:00      | 08:46:25.941402      | 2022-03-21 17:16:43.210695+00 | error parsing OCSP response: ocsp: error from server: unauthorized | c6051b563f015c3a34c2c64bea9a555558964171d52966e4862c7734b3c65af5
 {haplorrhini.com}            | 24:00:00      | 08:46:25.778574      | 2022-03-21 17:12:00.784845+00 | error parsing OCSP response: ocsp: error from server: unauthorized | d2287fb39085a1d0f816e14826eb5e4e44b70b229a0e3479d0d8cf0c1a9a7ecb
 {haplorrhini.com}            | 24:00:00      | 08:26:18.756881      | 2022-03-21 17:11:56.590516+00 | error parsing OCSP response: ocsp: error from server: unauthorized | c43c3a1fb707e9c84b7ad4767171138d6cec289ca11c63ce5930dd6e649e9773
 {haplorrhini.com}            | 24:00:00      | 08:26:17.834875      | 2022-03-21 17:12:00.975912+00 | error parsing OCSP response: ocsp: error from server: unauthorized | cafca8bd1fe5d20c78cea603c664a6938c5c8e70820cd01750d28d5774fc253f
 {haplorrhini.com}            | 24:00:00      | 08:21:15.547998      | 2022-03-21 17:16:48.597908+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 9b67415e7f26671f441b8dc6bb27eb3c54135c4e44a9ddc4091e3f3205e039ef
 {haplorrhini.com}            | 24:00:00      | 08:21:15.486887      | 2022-03-21 17:11:57.000033+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 48c98b3100c2994451e22361d6ec411b73d415d8a84c9ee8e15497fb19434933
 {haplorrhini.com}            | 24:00:00      | 07:56:08.204446      | 2022-03-21 17:11:56.588157+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 8f99a6f6a281b5fcb093900a133c4a65dd0a5e92a9b978ed8991b91b64ee041f
 {haplorrhini.com}            | 24:00:00      | 07:56:06.755732      | 2022-03-21 17:11:56.588453+00 | error parsing OCSP response: ocsp: error from server: unauthorized | a6d420b161d249fd5745b052af14721c227249ea07543c925db9cb3bbe6db200
 {haplorrhini.com}            | 24:00:00      | 07:46:03.06116       | 2022-03-21 17:16:53.841087+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 6ce8919a8c218f132e0e480e1f258f906ecea16bf877c5336739146743d37649
 {haplorrhini.com}            | 24:00:00      | 07:46:03.036185      | 2022-03-21 17:16:53.846934+00 | error parsing OCSP response: ocsp: error from server: unauthorized | ce9e6d9967ace74ad59ae71b639ba6c91ed64f1501b3b7af716479041b037412
 {haplorrhini.com}            | 24:00:00      | 07:25:56.626494      | 2022-03-21 17:16:53.815201+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 5b2c1dec280226b00eb07915e0b1e023f93b3dfab4d712051cefe349eff5d44e
 {haplorrhini.com}            | 24:00:00      | 07:25:55.604054      | 2022-03-21 17:12:00.973714+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 20d4b2a0cc00f2407d0960f505d285981802307101c697f2168ac11803d6f1cd
 {haplorrhini.com}            | 24:00:00      | 07:15:52.043603      | 2022-03-21 17:11:57.029556+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 025ceccfc955890c8e7f75cf9f9438b176d3a1cd0f3ada4692f46e07971dd35c
 {haplorrhini.com}            | 24:00:00      | 07:15:51.588221      | 2022-03-21 17:11:57.03881+00  | error parsing OCSP response: ocsp: error from server: unauthorized | 7d1950f98f6f0e38e856804fab0fb77a26d267c7707c7f9aa72ef4289a9ecb7c
 {haplorrhini.com}            | 24:00:00      | 06:56:44.829145      | 2022-03-21 17:12:00.987684+00 | error parsing OCSP response: ocsp: error from server: unauthorized | c1de75fadd2c8eeaa1f4e08dccc095b1f38026630334622bbbbb884f85aed68b
 {haplorrhini.com}            | 24:00:00      | 06:56:44.828705      | 2022-03-21 17:12:00.975521+00 | error parsing OCSP response: ocsp: error from server: unauthorized | dc206265667cb297761adcac9650a5b5a6b01325461a2df1795734930fc33771
 {haplorrhini.com}            | 24:00:00      | 06:46:40.563688      | 2022-03-21 17:16:51.369568+00 | error parsing OCSP response: ocsp: error from server: unauthorized | cfc9c2357a9465fa5515438c04dc1f0e1eabb0a7050af828328c2a73b10eb218
 {haplorrhini.com}            | 24:00:00      | 06:46:40.38386       | 2022-03-21 17:11:54.062265+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 47f825f43ef5f234223047154724b468af3e0f2acac9d277eb27561499735bda
 {haplorrhini.com}            | 24:00:00      | 06:31:34.280751      | 2022-03-21 17:16:44.672454+00 | error parsing OCSP response: ocsp: error from server: unauthorized | ff7646ed9185fba7d588d9d3cd26cd449256d4b010e9e34d2521dc8a33b544fb
 {haplorrhini.com}            | 24:00:00      | 06:26:33.747664      | 2022-03-21 17:12:00.77058+00  | error parsing OCSP response: ocsp: error from server: unauthorized | 548b4a89b92e4a545ca97a57f080f4a8ec1534c615210402eb628cb8f4eea027
 {haplorrhini.com}            | 24:00:00      | 06:16:29.859388      | 2022-03-21 17:11:57.008753+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 5d520ff08b6839dfce97f376dcd1578c00e8b2077f06dc4ba13ffc7efe3e1d29
 {haplorrhini.com}            | 24:00:00      | 06:16:29.795963      | 2022-03-21 17:11:57.012104+00 | error parsing OCSP response: ocsp: error from server: unauthorized | f5cdc8af5b4900c41c2e0adde914fd7de154d6c39f56b15b04c9ca9c82c45632
 {haplorrhini.com}            | 24:00:00      | 05:56:22.196293      | 2022-03-21 17:11:56.441595+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 1f699fcb968aec2c5e41b12aff167569ceb11415150c06e1904fbf651a059ae8
 {haplorrhini.com}            | 24:00:00      | 05:56:22.181369      | 2022-03-21 17:12:01.002404+00 | error parsing OCSP response: ocsp: error from server: unauthorized | c3456d92f975b39c4027f581efda6de492295255be7580ffc81a1d02850d9582
 {haplorrhini.com}            | 24:00:00      | 05:46:19.129548      | 2022-03-21 17:11:55.718304+00 | error parsing OCSP response: ocsp: error from server: unauthorized | a77445977530bdfce8135b1175b7e6cc16b1c1bd31ac4330813ebd957ccedfb3
 {haplorrhini.com}            | 24:00:00      | 05:46:19.067849      | 2022-03-21 17:11:55.667934+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 46097ae8af707cfe14204a17301446ee63787194dd592831fb51b5deb7c558f0
 {haplorrhini.com}            | 24:00:00      | 05:26:12.635796      | 2022-03-21 17:16:46.417576+00 | error parsing OCSP response: ocsp: error from server: unauthorized | df0631c05dce5ff8355f0e5a9b01715386312118d92480b0f440489a58b9d101
 {haplorrhini.com}            | 24:00:00      | 05:26:12.635757      | 2022-03-21 17:16:46.417304+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 2f98dafd959610efcaee0df3b8c44ce9b530552d220f9e971f369f9edba7a2d4
 {haplorrhini.com}            | 24:00:00      | 05:21:09.788501      | 2022-03-21 17:16:46.403333+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 6e898c1af5ac41b5402ffda30e1e23e223e7f2e3511d1eeef064872f611a7878
 {haplorrhini.com}            | 24:00:00      | 05:16:09.193967      | 2022-03-21 17:11:56.437005+00 | error parsing OCSP response: ocsp: error from server: unauthorized | bb4307aa7585db010894b3d94efff6aff7cdc7b775c28da07928fc69690da108
 {haplorrhini.com}            | 24:00:00      | 04:56:03.225265      | 2022-03-21 17:12:01.079335+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 61a18e534247384fec81de6d50141cb80dbd93ce74d0be4d2afd487fd775ed17
 {haplorrhini.com}            | 24:00:00      | 04:56:01.509731      | 2022-03-21 17:16:53.429532+00 | error parsing OCSP response: ocsp: error from server: unauthorized | d65da0c26317d40b2a025e9fd190edbde29b6e41c27211613e1464879985f6e4
 {haplorrhini.com}            | 24:00:00      | 04:45:58.089008      | 2022-03-21 17:12:01.035442+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 3d8139d0752965d087d4fdb2c6c1e8e3e7395fcd4c2226a2b0cd0c6ed3e271a7
 {haplorrhini.com}            | 24:00:00      | 04:45:57.959841      | 2022-03-21 17:12:01.020879+00 | error parsing OCSP response: ocsp: error from server: unauthorized | fd7d582f17fd7d8badb8f369176f2f1143b1c89dac7414829cd4bf65882cd634
 {haplorrhini.com}            | 24:00:00      | 04:26:52.368582      | 2022-03-21 17:16:49.387422+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 0a82d0f77c97e6444a179abccb75c15975e490e2cf0534080fd1dcfbba9433b7
 {haplorrhini.com}            | 24:00:00      | 04:26:51.394041      | 2022-03-21 17:16:49.387937+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 60d066ccbe51cef07b76a9068d030fe223490ce20d44eb47e284118e06397db4
 {haplorrhini.com}            | 24:00:00      | 04:16:48.542517      | 2022-03-21 17:16:52.218469+00 | error parsing OCSP response: ocsp: error from server: unauthorized | d079c2c7dfd37ae2d211fd61d45f152ca0e2916c0afb535ff81e1e1de8c27e08
 {haplorrhini.com}            | 24:00:00      | 04:16:48.533434      | 2022-03-21 17:16:52.204095+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 6bafca32cb576833fbd0b7824291ede00592bb44750dcce78ac04580d13abaa7
 {haplorrhini.com}            | 24:00:00      | 03:56:42.034576      | 2022-03-21 17:16:44.090784+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 3643e0027fd200da161208ba5f11a6d1b2543cf6c1d22b819ce286dc91ab314b
 {haplorrhini.com}            | 24:00:00      | 03:56:41.311022      | 2022-03-21 17:16:41.161146+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 4659e53792e28f1f16aaa7b3b7aeb180af4837e1a73e3b082e8a282319f22198
 {haplorrhini.com}            | 24:00:00      | 03:51:39.094808      | 2022-03-21 17:16:42.336134+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 0fe619cf11d65247ecbca3708d0bd52be733f98e5495d55d7d5ea68420f4a3d2
 {haplorrhini.com}            | 24:00:00      | 03:51:39.086079      | 2022-03-21 17:16:44.487631+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 55510903fc9cb1796a15e73a3932f0ad903066b97baa0a6b347b0524ef616978
 {haplorrhini.com}            | 24:00:00      | 03:26:30.653782      | 2022-03-21 17:16:41.217447+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 4b70e847e20c00a5b524de089919a2d3ca7d02f8f58e1de6e8d431a06da38910
 {haplorrhini.com}            | 24:00:00      | 03:26:30.652523      | 2022-03-21 17:16:41.214611+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 16335bc7c76bcc59184263fe4c7fa1d37c8bb5689c465a0ef93b1f56e60bcd60
 {haplorrhini.com}            | 24:00:00      | 03:16:27.095458      | 2022-03-21 17:16:41.249507+00 | error parsing OCSP response: ocsp: error from server: unauthorized | b3aeaaf62b7310aa8e6ebda5eda5e0b749077704cbdee74c706627e00079afb2
 {haplorrhini.com}            | 24:00:00      | 03:16:27.062306      | 2022-03-21 17:16:41.249206+00 | error parsing OCSP response: ocsp: error from server: unauthorized | b8defb726d4bf728f5bda664ff1363d53512308a62ad329fdf577595f5ad7ae1
 {haplorrhini.com}            | 24:00:00      | 02:56:18.684873      | 2022-03-21 17:16:40.222705+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 594db4ce99de580c7a22520a593d136f168832adae75c28d2c12c707bf3cd329
 {haplorrhini.com}            | 24:00:00      | 02:56:18.666113      | 2022-03-21 17:16:40.215815+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 36d6de1500b95b60c31f3384da8bbb915a3d8ae57432a94957110114b2ea13bd
 {haplorrhini.com}            | 24:00:00      | 02:46:15.609845      | 2022-03-21 17:16:39.897957+00 | error parsing OCSP response: ocsp: error from server: unauthorized | adb381623485dd0c2e64037949192a1504e556873db993ce5ce2cb1305ad278b
 {haplorrhini.com}            | 24:00:00      | 02:46:15.550942      | 2022-03-21 17:16:39.916104+00 | error parsing OCSP response: ocsp: error from server: unauthorized | f1fa78c0f30e2835a396fdd1792183cef44c8408ee0d2171cc59e7b48ead5b15
 {haplorrhini.com}            | 24:00:00      | 02:26:08.370987      | 2022-03-21 17:16:39.874125+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 3d3f96d191ef61db22b82ce92db165ee6851692b7b64eaf3bc6d037b0022d362
 {haplorrhini.com}            | 24:00:00      | 02:26:08.300835      | 2022-03-21 17:16:39.874026+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 3178a015635b7d58113254d1b6a39fac31662c0435d5da23a0c3a98ec510f84f
 {haplorrhini.com}            | 24:00:00      | 02:16:04.876763      | 2022-03-21 17:16:39.789087+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 01569e350e5945e3cb6828b4982dc13521ae30ffa450838af65511edd00b1320
 {haplorrhini.com}            | 24:00:00      | 02:16:04.668075      | 2022-03-21 17:16:39.785406+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 37b373086931cef9988454e241904cd8f217f010839b4059e11cb7ece0e65baa
 {haplorrhini.com}            | 24:00:00      | 01:55:58.003494      | 2022-03-21 17:16:39.76938+00  | error parsing OCSP response: ocsp: error from server: unauthorized | cab7ee65e11e9435e88acb2f9cd0809effed84e2294bb289b8c13ccc54243808
 {haplorrhini.com}            | 24:00:00      | 01:55:57.638295      | 2022-03-21 17:16:39.770237+00 | error parsing OCSP response: ocsp: error from server: unauthorized | e7145ea7caf547d0a3555906c6c81b7ff8a86daa1f9cc97f35db515838ad856b
 {haplorrhini.com}            | 24:00:00      | 01:45:54.757045      | 2022-03-21 17:16:39.657384+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 5c073b086ec6239948ca0a9046a96f10371516d9137fd4c599109e5d988fb5e2
 {haplorrhini.com}            | 24:00:00      | 01:45:54.729347      | 2022-03-21 17:16:39.655725+00 | error parsing OCSP response: ocsp: error from server: unauthorized | d787cd29331f1ceb2d30b878fcedeb13f040c38d2dcba78d2752390d75c5cd71
 {haplorrhini.com}            | 24:00:00      | 01:26:48.322211      | 2022-03-21 17:16:39.485199+00 | error parsing OCSP response: ocsp: error from server: unauthorized | ee69b05264532876b6109aa77144958c9f20882ee9a1b4bc4446b243c170a31b
 {haplorrhini.com}            | 24:00:00      | 01:26:48.305641      | 2022-03-21 17:16:39.499817+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 45af702f5f66619db402df1126234a045b5e3a4ef835c9d2e337ea4c00a8f83b
 {haplorrhini.com}            | 24:00:00      | 01:21:46.663619      | 2022-03-21 17:16:40.597603+00 | error parsing OCSP response: ocsp: error from server: unauthorized | d8f131af5e9f29b9bef20b2f5ef8ce6ebfbeae4350d4a89c051e15432677abd3
 {haplorrhini.com}            | 24:00:00      | 01:21:46.662318      | 2022-03-21 17:16:40.595439+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 418978bf12f3f6bf9a47dd152557bd0fa1b423d5ee95b49f27173efc7cef37a1
 {haplorrhini.com}            | 24:00:00      | 00:56:39.67183       | 2022-03-21 17:16:39.332395+00 | error parsing OCSP response: ocsp: error from server: unauthorized | eb996000fa7948a18d899c7057f9e6e92c3dd79a3c7d083d1c545d561e0c21b4
 {haplorrhini.com}            | 24:00:00      | 00:56:39.645055      | 2022-03-21 17:16:39.329064+00 | error parsing OCSP response: ocsp: error from server: unauthorized | b05f7f936b3f3909d373b93badbd012376665fa3c659bfced297f8a40e61b990
 {haplorrhini.com}            | 24:00:00      | 00:46:34.740536      | 2022-03-21 17:16:39.22921+00  | error parsing OCSP response: ocsp: error from server: unauthorized | bf83b4f767daab780629c001f08c0df1e4400e6275298a770015eeeb9c063dcc
 {haplorrhini.com}            | 24:00:00      | 00:46:34.472827      | 2022-03-21 17:16:39.227+00    | error parsing OCSP response: ocsp: error from server: unauthorized | f95be775497064bb641f594c1e9867f1ed329f436f41d92cbe0794798b29b822
 {haplorrhini.com}            | 24:00:00      | 00:26:27.179973      | 2022-03-21 17:16:39.037799+00 | error parsing OCSP response: ocsp: error from server: unauthorized | a3342b8121528efc5ed3f64097f908066ea8af5a78e4a1b1a224bfaf61825298
 {haplorrhini.com}            | 24:00:00      | 00:26:27.173258      | 2022-03-21 17:16:40.594664+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 7f2b4e3647f238386ccbb457c309820489e905f1f91162482645c0232d59f260
 {haplorrhini.com}            | 24:00:00      | 00:16:23.945803      | 2022-03-21 17:16:39.024978+00 | error parsing OCSP response: ocsp: error from server: unauthorized | ccdbd31f8d05965c1d48a2bff809b4708487a45e4afbb7aad631d57c4872ea78
 {haplorrhini.com}            | 24:00:00      | 00:16:23.945187      | 2022-03-21 17:16:39.023913+00 | error parsing OCSP response: ocsp: error from server: unauthorized | 77b327dea053d7166c0ef33547dd6fbbac78bc4b745e29ff988494c5cb4bf2d7

Thank you for your comments, Matthias and Andrew.

First, we want to make it clear that we acknowledge the value in reducing the propagation delay for the OCSP responses for the certificates associated with the ~0.3% of our issuance.

In our earlier response we mentioned our ongoing efforts to further improve propagation time and this is where those efforts have been focused.

In response to your comments we are actively working on an update to our original response to describe steps we are taking to improve on the propagation delay and will update the bug with those details once the update is ready.

1. How your CA first became aware of the problem

This bug was filed in Bugzilla.

2. A timeline of the actions your CA took in response.

YYYY-MM-DD (UTC) Description
2020-05-26 06:02 Bug 1640805 is filed in Bugzilla regarding a delay of over 24 hours by Digicert to propagate updated OCSP status information after a revocation request was received.
2020-06-19 16:00 Bug 1640805 is discussed during a Compliance Review Meeting. Our assessment is that we could propagate revocation information within a short time if needed and that we want to prioritize improving propagation times.
2020-07-06 21:12 Following discussion on Bugzilla, an incident report is added for Bug 1640805.
2020-07-30 16:00 Bug 1640805 was discussed again during a Compliance Review Meeting due to the update to the bug. Improvements to propagation time were being worked on.
2021-01-18 17:55 Bug 1687330 is filed in Bugzilla regarding a failure to propagate updated OCSP status information to an individual responder of the CA’s many responders.
2021-01-27 16:00 Bug 1687330 is in the agenda of the bi-weekly Compliance Review Meeting. The bug is not deemed relevant based on our assessment of our OCSP response push mechanism.
2021-07-28 16:00 The EJBCA bug described in ECA-10215 that later contributed to Bug 1724276 was reviewed by our engineering leads, since our secondary CA platform is based on EJBCA. We monitored the problem and later updated to the latest version once available.
2021-08-05 18:49 Bug 1724276 is filed in Bugzilla regarding an incorrect OCSP response returned by QuoVadis/PKIoverheid for some precertificates.
2021-09-16 13:25 Secondary analysis related to Bug 1724276 is added to our internal bug tracking system. We conclude that we should continue waiting for a fix to EJBCA and apply it once it becomes available.
2022-02-01 23:12 Bug 1753123 is filed on Bugzilla regarding a software defect that resulted in incorrect OCSP responses by Let’s Encrypt.
2022-02-18 15:34 Technical notes related to Bug 1753123 are added to our internal bug tracking system. Our assessment concluded that we would not face a similar issue after reviewing our certificate issuance flow.
2022-02-23 13:20 Secondary analysis related to Bug 1753123 is added to our internal bug tracking system. Our assessment further considered technical controls to identify similar programming issues.
2022-03-07 15:51 This bug is filed in Bugzilla.
2022-03-22 14:00 We met with another engineering team within Google to discuss taking immediate steps to reduce the propagation time for OCSP responders running a legacy software version.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.

N/A. See section 6.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.)

N/A. No problematic certificates related to this report were issued.

5. In a case involving certificates, the complete certificate data for the problematic certificates.

N/A.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Google Trust Services’ OCSP responders have evolved several times since we began operation. We currently operate responders running two different software versions: a newer version, which offers rapid propagation of status information, and a legacy version that we are in the process of migrating away from, which has long provided the scalability and reliability we need. Our earlier comments in response to this bug were intended to describe our current state and compliance with various requirements. In this report we provide detail on our current state, including improvements we are making and how we evaluated other relevant incidents that were referenced in earlier comments.

The newer version implements an improved approach to OCSP propagation, reducing the time to provide updated status information by several orders of magnitude. However, this feature is not without cost. There is an inherent reliability risk to removing any staged rollout and pushing data globally immediately. There is also risk in taking on dependencies which do not have a “100%” (24x7) availability SLO. This new responder currently serves ~99.7% of our total issuances, as we described in Comment 7. For new issuances, the response is generally available before the precertificate is sent to CT. Other updates take, on average, less than 10 minutes because of cache behavior. We are evaluating this software over several months to ensure it meets our subscribers’ high availability requirements before we complete our migration.

An earlier version is still in use for a small proportion of issuances representing our subscribers’ busiest sites, inclusive of sites such as YouTube. This represents the remaining ~0.3% of our total issuance. Although this is a small number of sites, it represents a comparatively high proportion of status information requests we serve. This version of the software works by propagating batches of OCSP updates across a large, global network of nodes several times per day using a staggered approach to ensure there is no unintended availability impact. Because of this approach, it takes some time for an update to achieve global consistency. This earlier version of our responder and our processes have each undergone many improvements over the past several years to reach their current state, including through lessons learned in past bugs Bug 1634795, Bug 1630079, Bug 1630040, and Bug 1522975. This responder version is also used for issuances by our secondary CA platform, which is a rarely-used backup we described in greater detail in Bug 1731164. We are currently retiring this secondary CA with a tentative goal of the end of Q3 of this year.

As we described in Comment 7, we publish OCSP responses for all certificates immediately upon creation of the corresponding precertificate, regardless of delivery to the customer, and these responses are signed before sending the precertificate to the CT logs for inclusion. However, as described above, status information for a small subset of issuances takes longer to achieve global consistency.

We acknowledge the value in reducing the propagation time for OCSP responses so we can achieve consistent performance across all responses we provide and are already accelerating our efforts to make improvements, which we describe in section 7. We believe there is room to improve the BR language to include clear, measurable service level objectives, taking into consideration the use of CDNs and caching strategies that are prevalent in the ecosystem today. To provide a relevant quote from the Google SRE book that describes our intent,

Choosing and publishing SLOs to users sets expectations about how a service will perform. This strategy can reduce unfounded complaints to service owners about, for example, the service being slow. Without an explicit SLO, users often develop their own beliefs about desired performance, which may be unrelated to the beliefs held by the people designing and operating the service. This dynamic can lead to both over-reliance on the service, when users incorrectly believe that a service will be more available than it actually is (as happened with Chubby: see The Global Chubby Planned Outage), and under-reliance, when prospective users believe a system is flakier and less reliable than it actually is.

Regarding public incident reports involving other CAs, we monitor and evaluate all bugs posted to Bugzilla to ensure that we can rapidly adapt to changing requirements and avoid issues other CAs have faced. We described our approach in Bug 1708516 Comment 44. Bugzilla bugs are triaged by at least two engineers and their assessments are recorded in our internal bug tracking system. This includes OCSP-related bugs. Two such bugs referenced in earlier comments are discussed below.

Bug 1753123 underwent such an assessment, during which we evaluated our certificate issuance flow to ensure that OCSP information is generated for any assigned serial number, inclusive of precertificates, and that we would not face a similar issue. We also considered technical controls to identify similar programming issues. This bug describes an issue where certificates never had OCSP data available due to a software defect.

Bug 1640805 was reviewed following our previous assessment process, since that bug is now almost two years old and was raised before we implemented our newer review process. This bug describes an issue where revocation information was not available for more than 24 hours after a certificate problem report was received, in violation of BRs 4.9.1.1. As we described in Comment 7, the propagation time of our OCSP responses is less than 24 hours. In addition, we have procedures to trigger an emergency publication to achieve global consistency within 15-20 minutes for this small set of certificates in the event of a revocation or other urgent publication need, ensuring a revocation can be performed and the updated status information will subsequently become available within a 24-hour window. The certificates identified in Comment 14 are related to a service health check process; all other issuances related to the CA in question are not deployed for several days following issuance and would fall under the emergency publication process in such an event.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

Since posting Comment 7, we have been engaging with internal groups within Google to discuss how we can accelerate our efforts to deploy improvements.

We are planning a series of steps to reduce the propagation time for the responders running the earlier version of our software while we continue to prepare the migration to the newer version. This represents an effort that involves multiple engineering teams so, while we can’t commit to a timeline now, we will continue to update this bug as we make improvements.

In addition, we are committed to bringing proposals to the CA/B Forum to ensure there is clear language with measurable service level objectives for revocation services. We plan to submit a proposal by Thursday, April 14, 2022.

Flags: needinfo?(cadecairns)

This version of the software works by propagating batches of OCSP updates across a large, global network of nodes several times per day using a staggered approach to ensure there is no unintended availability impact. Because of this approach, it takes some time for an update to achieve global consistency

I'm having trouble understanding why you'd need global consistency to resolve OCSP requests.

Updating the state of a certificate - sure, that would need some propagation, but it's not like there'll be many (if any) conflicting updates on the state of any certificate: Each goes from nonexistent to issued to either revoked or expired, with limited opportunities to introduce conflicting information.

But resolving the OCSP response based on the state of a certificate, shouldn't that be not much more than querying what (locally) is considered the latest known state of the certificate and returning the relevant OCSP response?
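
To make the question concrete, here is a minimal sketch of the lifecycle described above. It assumes a hypothetical forward-only state model; the names `CertState`, `merge`, and `local_status` are illustrative and do not describe GTS's implementation. Expiry is left out because it follows from the certificate's validity period rather than from an update. Since states only move forward, two nodes that receive the same updates in a different order end up agreeing:

```python
from enum import IntEnum


class CertState(IntEnum):
    """Hypothetical lifecycle states; a higher value is further along."""
    UNKNOWN = 0   # the node has not heard of this serial yet
    ISSUED = 1
    REVOKED = 2   # terminal: a revoked certificate never becomes good again


def merge(current: CertState, update: CertState) -> CertState:
    """Apply an update, keeping whichever state is further along.

    Because states only move forward, the result does not depend on the
    order in which updates arrive.
    """
    return max(current, update)


def local_status(store: dict[str, CertState], serial: str) -> str:
    """Answer from the node's latest locally known state."""
    state = store.get(serial, CertState.UNKNOWN)
    return {CertState.UNKNOWN: "unknown",
            CertState.ISSUED: "good",
            CertState.REVOKED: "revoked"}[state]


# Two nodes receive the same updates in a different order and still agree.
node_a: dict[str, CertState] = {}
node_b: dict[str, CertState] = {}
for update in (CertState.ISSUED, CertState.REVOKED):
    node_a["01ab"] = merge(node_a.get("01ab", CertState.UNKNOWN), update)
for update in (CertState.REVOKED, CertState.ISSUED):
    node_b["01ab"] = merge(node_b.get("01ab", CertState.UNKNOWN), update)
print(local_status(node_a, "01ab"), local_status(node_b, "01ab"))
```

Both nodes report `revoked`. Note that this model assumes each node can produce a signed answer from its local state, which is a separate question from how the state itself is propagated.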

Summary: Google Trust Services: Incorrect response for issued certificate → Google Trust Services: Incorrect OCSP response for issued certificate

Hi Matthias,

The earlier version of our OCSP responder works by pushing pre-signed responses to a large, global network of nodes to be served, as described in our previous comment.

I believe the model you are suggesting would require delegated OCSP responders with access to a global database and to keys, which would not fit with our distributed architecture and reliability goals.
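
The distinction can be sketched as follows. This is an illustrative model only, with hypothetical names (`PreSignedNode`, `push`, `serve`), and is not a description of the production code: a node that holds pre-signed responses needs neither a signing key nor a database of certificate state, but it can only return what has already been pushed to it.

```python
class PreSignedNode:
    """Hypothetical edge node that serves responses signed elsewhere."""

    def __init__(self) -> None:
        # serial number -> opaque, already-signed OCSP response bytes
        self._responses: dict[str, bytes] = {}

    def push(self, batch: dict[str, bytes]) -> None:
        """Receive a batch of pre-signed responses from the central signer."""
        self._responses.update(batch)

    def serve(self, serial: str) -> tuple[str, bytes]:
        """Return (OCSP response status, payload) for a request."""
        signed = self._responses.get(serial)
        if signed is None:
            # No key on the node, so it cannot sign an answer of its own;
            # it returns the unsigned "unauthorized" error status instead.
            return ("unauthorized", b"")
        return ("successful", signed)


node = PreSignedNode()
print(node.serve("01ab")[0])   # before the batch arrives
node.push({"01ab": b"<signed response bytes>"})
print(node.serve("01ab")[0])   # after the batch arrives
```

In this model the first request yields `unauthorized` and the second yields `successful`, which is why the time taken to push a batch determines when a response becomes available at a given node.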

We have been able to use this issue to drive the reprioritization of the work that needed to be done to improve the propagation time of OCSP responses for the remaining certificates.

Because of these changes, when implemented, propagation time for the remaining status information will be similar to the new software version, which we described in Comment 16. This change will cover all issued certificates except for the certificates issued for our test web pages used to satisfy BRs 2.2. Those certificates are issued by our secondary CA platform, which is in the process of being retired. Once retired, those certificates will also no longer be dependent on the slower distribution mechanism. We will begin testing soon and plan to have the OCSP changes fully deployed by May 6, 2022.

Separate discussions are underway for our other action item to bring proposals to the CA/B Forum to ensure the language is clear with measurable service level objectives for revocation services.

Google Trust Services will continue to monitor this bug for any additional updates or questions. We will post another update in a week.

We have begun testing a change that will improve propagation time for the remaining status information as described in Comment 16.

In addition, we are working with CA/B Forum members on a ballot to add clear and measurable service level objectives for revocation services to the Baseline Requirements.

Google Trust Services will continue to monitor this bug for any additional updates or questions. We will post another update in a week.

We are continuing to test our change to improve propagation time for the remaining status information as described in Comment 16 and are on track to meet our May 6 plan.

We are also continuing our work with CA/B Forum members on a ballot to add clear and measurable service level objectives for revocation services to the Baseline Requirements.

Google Trust Services will continue to monitor this bug for any additional updates or questions. We will post another update in a week.

We are preparing deployment of changes to improve propagation time for the remaining status information as described in Comment 16 and are still on track to meet our May 6 plan.

We are also continuing our work with CA/B Forum members on a ballot to add clear and measurable service level objectives for revocation services to the Baseline Requirements.

Google Trust Services will continue to monitor this bug for any additional updates or questions. We will post another update in a week.

We have completed global deployment of changes to improve propagation time for the remaining status information as described in Comment 16. Their behavior for propagation of status information is now similar to our newer software version. However, as we described in Comment 19, this change does not cover our secondary CA platform, which is low-issuance and used to issue certificates for our test web pages used to satisfy BRs 2.2. We are currently retiring that platform with a tentative goal of the end of Q3 of this year.

We are continuing our work with CA/B Forum members on a ballot to add clear and measurable service level objectives for revocation services to the Baseline Requirements. We believe the draft ballot will soon be at a point where it can be presented as a proposal.

With these changes now complete, we request that this issue be closed.

Flags: needinfo?(bwilson)

Could you clarify: for your primary CA platform, is GTS now able to always provide OCSP services for all non-expired certificates, or can there still be a 10 minute delay between issuance and availability of the OCSP response?

Flags: needinfo?(cadecairns)

Hi Andrew,

GTS is now able to provide OCSP services for all non-expired certificates issued by our primary CA platform. However, factors beyond our control in services we depend upon could potentially delay achieving global consistency.

We expect status information for new issuances to be globally available for all certificates within 10 seconds 99.9% of the time and within two minutes 99.99% of the time. Issues such as network latency or a datacenter problem could cause delays. In the event that a GET request was made for a newly-issued certificate before its status information is available from the responder, the OCSP “unauthorized” response could be cached for up to five minutes with our current configuration.
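
The caching effect can be sketched as below. This is a simplified model under stated assumptions: the 300-second lifetime mirrors the five minutes mentioned above, while `TtlCache`, `origin`, and the explicit `now` argument are hypothetical names chosen for the example and do not describe the actual cache or responder configuration. For simplicity the sketch applies the same lifetime to every cached entry.

```python
from typing import Callable

CACHE_TTL_SECONDS = 300  # "up to five minutes", per the comment above


class TtlCache:
    """Hypothetical cache in front of an OCSP responder (GET requests)."""

    def __init__(self, origin: Callable[[str], str], ttl: float) -> None:
        self._origin = origin
        self._ttl = ttl
        # request key -> (time the entry was stored, cached status)
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, key: str, now: float) -> str:
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: origin is not consulted
        status = self._origin(key)   # miss or expired: ask the responder
        self._entries[key] = (now, status)
        return status


available: set[str] = set()          # serials the responder can answer for
cache = TtlCache(lambda k: "successful" if k in available else "unauthorized",
                 CACHE_TTL_SECONDS)

print(cache.get("01ab", now=0))      # requested before propagation
available.add("01ab")                # status information arrives afterwards
print(cache.get("01ab", now=60))     # cached error is still served
print(cache.get("01ab", now=300))    # entry expired, fresh answer fetched
```

The three calls print `unauthorized`, `unauthorized`, and `successful`: a single early request can keep the error visible to later clients until the cached entry expires, even though the responder itself already has the correct status.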

Flags: needinfo?(cadecairns)

I will close this next Wed. 18-May-2022 unless there are additional issues to discuss.

Google Trust Services is monitoring this bug for any additional updates or questions.

Status: REOPENED → RESOLVED
Closed: 11 months ago → 8 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program