Bug 1910805 Comment 59 Edit History


(In reply to DigiCert from comment #58)
> Comment 12: We start the revocation clock when we receive notice of an incident sufficient to indicate a problem or, if detected internally, when we are aware that revocation is required. This is the only logical conclusion as overly vague notices may require more than 24 hours before we can even confirm there is an issue. This is consistent with the Baseline requirements section 4.9.1 that requires us to revoke when we obtain evidence or are aware of an issue.

I'd like to clarify that my question was more about what specific dates and times that would mean in this incident, as there seem to be multiple clocks running by DigiCert's analysis of the situation. For other incidents, the clock starts on notification of an incident occurring in the CA's environment (CPR received and read, issue confirmation is iffy per-CA), but not when full collation of the certificates occurs. If it were the latter, then 'finding' the last certificate to delay the clock starting would be rather beneficial to a CA and hard for an outsider to discover.

Consider that there have been delayed revocation incidents historically where a CA has found a batch of certificates they missed but they did not start a fresh clock for them. Given that is how other CAs operate, it should be considered normal practice.

> > I have a meta-question: has DigiCert reviewed previous delayed revocation incidents for any interesting questions that also apply to this incident? What are the answers? 😊 
> 
> Yes – we routinely review delayed revocation bugs. There aren’t interesting answers. Adoption of ACME slowly improves and crypto-agility is spreading. Hostility to delayed revocation is definitely the biggest change when you consider incidents such as this one: https://bugzilla.mozilla.org/show_bug.cgi?id=1715672. Delayed Revocation bugs are relatively new.  Or this one: https://bugzilla.mozilla.org/show_bug.cgi?id=1800756 which didn’t really gather any attention. 

Incidents from 2021 and 2022 are a poor metric to judge appropriate compliance in 2024 and onwards, especially given that clarification of guidance has been sought and provided multiple times since then. Would it be fair to assert that these are the most recent incidents that DigiCert finds noteworthy for compliance guidance according to their internal assessments? To be clear, I'm trying to figure out why these in particular were highlighted and considered worth mentioning.

> > How many subscribers and certificates were affected by "exceptional circumstances" (assuming, arguendo, that any such circumstances can exist)? 
> 
> Unknown. By the time we parsed into exceptional vs. non-exceptional, all certificates were revoked. 
> ...
> > How many subscribers claimed how many certificates were "exceptional" for which DigiCert disagreed? 
> 
> Unknown – we revoked all certificates before the analysis was completed. 

Given that Comment 10 makes it clear that data was gathered, how is this possible? I appreciate that there may be an imperfect collection of data, but surely there must be a non-zero number of certificates/subscribers marked with clear 'exceptional circumstances' reasons to provide a lower bound?

> > What analysis is there of the risk to relying parties and the public by not revoking over 83,000 certificates on time? 
> 
> This is a trap question. Per Mozilla’s previous policy: Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable.

For some clarity, that non-compliant certificate quote is a partial excerpt; here is the full one from the 'Responding to an Incident' Mozilla wiki as it stood when this incident occurred:
>The decision and rationale for delaying revocation will be disclosed in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include detailed and substantiated explanations for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.

Particular attention is due, as always, to the last sentence. We've had incidents historically where a CA has asserted that certificates were used in air traffic control and other dangerous scenarios. It is unfortunately not a trap question, even if ideally it were one and none of these scenarios existed.

Please understand that we do not know the environments in which DigiCert's customers use their public certificates, and while DigiCert doesn't have full oversight, they should have a rough idea given the data gathered in this exercise. Either a risk analysis was performed, or it was not.

There isn't going to be any fallout for a yes or no response to that question. We're trying to figure out how to improve things going forward and any partial analysis would be beneficial. We've been told that data was collected and a report would appear.

(In reply to Tim Hollebeek from comment #10)
> Now that all of the certificates have been revoked, we're taking a careful look at the data we collected during the replacement effort. This is a good opportunity for us to provide some high-quality data about the current agility, or lack thereof, of the webpki as it exists today, and what the challenges actually are so that we can discuss pragmatic steps that improve the situation for everyone.

Did this ever occur, and if so could you point at any potential public reports that would help the community? It would be very beneficial to answering the questions that are considered 'Unknown'.
