Closed Bug 1910805 Opened 1 year ago Closed 7 months ago

DigiCert: Delayed revocation of 1910322

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jeremy.rowley, Assigned: dcbugzillaresponse)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36

Steps to reproduce:

This is a preliminary report.

DigiCert filed a bug about an issue in which an underscore was not prepended to a random value when CNAME records were used for domain control validation: https://bugzilla.mozilla.org/show_bug.cgi?id=1910322.
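For readers less familiar with the DNS Change method (BR 3.2.2.4.7), the difference at issue can be sketched in a few lines. This is a minimal illustration that assumes a CNAME-based flow in which the random value becomes the single label prepended to the domain. The function names are hypothetical, and this is not DigiCert's implementation.

```python
def validation_name(random_value: str, domain: str) -> str:
    """Build the record name a CNAME-based DNS Change check would query.

    Hypothetical layout: the random value becomes a Domain Label that
    begins with an underscore and is prepended to the domain."""
    return f"_{random_value}.{domain.rstrip('.')}"


def has_underscore_prefix(record_name: str, domain: str) -> bool:
    """Return True if exactly one label is prepended to the domain and that
    label begins with '_' (the prefix the flawed flow omitted)."""
    suffix = "." + domain.rstrip(".")
    name = record_name.rstrip(".")
    if not name.endswith(suffix):
        return False
    label = name[: -len(suffix)]
    return "." not in label and label.startswith("_")


if __name__ == "__main__":
    print(validation_name("abc123", "example.com"))                   # _abc123.example.com
    print(has_underscore_prefix("abc123.example.com", "example.com"))  # False
```

Without the underscore, the validation label is indistinguishable from an ordinary hostname under the domain, which is why the prefix matters.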

DigiCert was working to revoke all of the certificates in 24 hours, but after discussions with the relevant root programs and the community about the impact of such an action, DigiCert has decided to delay revocation and will revoke all certificates within the next 120 hours.

July 29, 2024 02:17 – DigiCert files the preliminary incident report. https://bugzilla.mozilla.org/show_bug.cgi?id=1910322

July 29, 2024 22:36 – DigiCert identifies the impacted certificates and notifies the affected customers that revocation will occur within 24 hours.

July 30, 2024 11:33 – DigiCert receives notice that a customer has filed for a Temporary Restraining Order (TRO) against revocations.

July 30, 2024 02:10–12:56 UTC – DigiCert talks to multiple root programs and customers to discuss the critical infrastructure impact and the exceptional circumstances surrounding this revocation.

July 30, 2024 19:01 – Court grants TRO against DigiCert, prohibiting revocation. https://www.courtlistener.com/docket/68995396/alegeus-technologies-llc-v-digicert/.

July 30, 2024 19:52 – DigiCert decides to delay revocation by up to 120 hours. All certificates impacted by this incident, regardless of circumstances, will be revoked no later than Saturday, August 3rd, 2024, 19:30 UTC.

All certificates missed the 24-hour deadline because we were not tracking any allowed delays. Untangling who actually required a delay from who did not will take time. We plan to post an updated list of serials that are not revoked, along with the reason each was not revoked, and to revoke the certificates of anyone not needing a delay tomorrow.

Component: CA Certificate Root Program → CA Certificate Compliance

"but after discussions with the relevant root programs and the community about the impact of such an action"

Can you please point to the community discussions, and also detail the discussions with root programs?

I'd encourage those representatives from those programs who monitor Bugzilla to post details of their communications with DigiCert here, too.

When did you (formally) receive the court order? Do I understand it correctly that the court order was granted/received after the 24 hour deadline?

Flags: needinfo?(jeremy.rowley)
Assignee: nobody → jeremy.rowley
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [leaf-revocation-delay]

We received the court order at the time listed above. The court order was granted on July 30, 2024 at 18:50. The 24-hour window expired at 22:36 that day, 24 hours after we had the serial numbers. There's ambiguity in the guidelines about when the clock should start, but that's the start time we've always used.
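The question of when the clock starts (raised again later in this bug) can be made concrete with a little datetime arithmetic. This is a sketch using the timestamps from the timeline above; which start time the BRs actually intend is the open question, so the code computes the deadline for both candidates. The helper name `deadline` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def deadline(clock_start: datetime, hours: int = 24) -> datetime:
    """Return the end of a revocation window of `hours` starting at clock_start."""
    return clock_start + timedelta(hours=hours)


# Timestamps from the timeline in this bug (UTC). The timeline lists the TRO
# at 19:01; the comment above gives 18:50. Either gives the same result here.
report_filed = datetime(2024, 7, 29, 2, 17, tzinfo=UTC)
serials_identified = datetime(2024, 7, 29, 22, 36, tzinfo=UTC)
tro_granted = datetime(2024, 7, 30, 19, 1, tzinfo=UTC)

if __name__ == "__main__":
    for label, start in (("report filed", report_filed),
                         ("serials identified", serials_identified)):
        end = deadline(start)
        print(f"clock starts at {label}: deadline {end:%Y-%m-%d %H:%M}, "
              f"TRO granted before deadline: {tro_granted < end}")
    # Time elapsed between filing the report and having the serial list.
    print(serials_identified - report_filed)  # 20:19:00
```

If the clock started when the report was filed, the 24 hours ended at 02:17 on July 30, before the TRO existed. If it started at serial identification, it ended at 22:36, after the TRO was granted.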

It's surprising and upsetting that DigiCert said they did "not expect" to disregard the Baseline Requirements, as so often happens when revocation is required, and then did an about-face and trotted out the same old list of reasons CAs give to delay revocation, plus an extremely alarming new reason.

It would have been disappointing but less surprising if DigiCert had said as much from the beginning.

In response to comment #3

In the hours before the time DigiCert was originally planning to revoke, the Chrome Root Program was made aware of the acute challenges several major DigiCert customers were facing in completing replacement of affected certificates within the 24 hours required by the CA/Browser Forum TLS Baseline Requirements. DigiCert reached out to inform us of the situation and to ask about our view of delaying revocation.

We communicated:

  • We expect CAs to adhere to the CA/Browser Forum TLS Baseline Requirements, their policies and the Chrome Root Program Policy.
  • We do not have the authority to grant exceptions to the CA/Browser Forum TLS Baseline Requirements. These are consensus-driven requirements not owned by any one organization.
  • That DigiCert was best positioned to weigh the risks and ecosystem impacts of either revoking affected certificates following strict adherence to the TLS Baseline Requirements or delaying revocation (as has been the case with all similar incidents in the ecosystem, and consistent with past incident responses by our team) and that at no point was Chrome demanding a specific course of action.
  • If revocation was delayed, a subsequent incident report would need to be opened (in alignment with the Chrome Root Program Policy and as we see here in this report).
  • The delayed revocation of certificates in this incident report alone is not cause for enforcement action; we always consider the wider context and the factors significant to the Chrome Root Program.
  • We would continue to evaluate the incident as more information becomes available. We requested that they post more detail to Bugzilla about the causes for a delay.

These communications were in line with what we posted on the original incident report.

We’ve previously voiced our concern about delayed revocation becoming routine rather than exceptional. In this case, we concur that Subscribers were facing real challenges in completing their certificate replacement, which had major implications for critical infrastructure, and we note DigiCert's efforts to limit further delay to only exceptional circumstances over the next few days.

Blocks: 1911183

DigiCert has revoked all 83,267 certificates affected by bug 1910322, as of Saturday, August 3, 2024, 20:47 UTC. In addition, DigiCert has discovered 1,308 S/MIME certificates that are affected, and these will be revoked by Friday, August 9 at 20:30 UTC.

The previously mentioned legal issues have been resolved between the parties. We will be providing more information later about some of the situations we encountered and reasons we received for being unable to replace within 24 hours to help inform future discussions about possible solutions, but the #1 reason is still the same one that’s affected customers for years: the vast majority of organizations in the industry still do not use automation to issue, maintain, and replace their certificates. This needs to change or progress is impossible.

However, this should not overshadow the efforts of many people, both at DigiCert and at the affected organizations, who worked hard to make this week go as smoothly as possible. I’d like to thank the root programs for their assistance and consultation as we worked to get this resolved on as short a timescale as possible.

As we noted in bug 1910322, all of the S/MIME certificates are now revoked as well.

Now that all of the certificates have been revoked, we're taking a careful look at the data we collected during the replacement effort. This is a good opportunity for us to provide some high-quality data about the current agility, or lack thereof, of the Web PKI as it exists today, and about what the challenges actually are, so that we can discuss pragmatic steps that improve the situation for everyone.

Incident Report

Summary

DigiCert filed a bug about an issue in which an underscore was not prepended to a random value when CNAME records were used for domain control validation: https://bugzilla.mozilla.org/show_bug.cgi?id=1910322.
DigiCert was working to revoke all of the certificates within 24 hours, but after discussions with the relevant root programs and the community about the impact of such an action, DigiCert revoked all certificates within 120 hours.

Impact

DigiCert revoked 83,267 certificates in 5 days, instead of 24 hours as required by the current Baseline Requirements.

Timeline

All times are UTC.
July 29, 2024 02:17 – DigiCert files the preliminary incident report. https://bugzilla.mozilla.org/show_bug.cgi?id=1910322
July 29, 2024 22:36 – DigiCert pulls the data on impacted certificates and sends a communication to customers about the revocation requirement. The alert informs customers that revocation will occur within 24 hours of when the certificates were discovered.
July 30, 2024 11:33 – DigiCert receives notice that a customer has filed for a Temporary Restraining Order (TRO) against revocations.
July 30, 2024 02:10–12:56 – DigiCert talks to multiple root programs and customers to discuss the critical infrastructure impact and the exceptional circumstances surrounding this revocation.
July 30, 2024 19:01 – Court grants TRO against DigiCert, prohibiting revocation. https://www.courtlistener.com/docket/68995396/alegeus-technologies-llc-v-digicert/.
July 30, 2024 19:52 – DigiCert decides to delay revocation by up to 120 hours. All certificates impacted by this incident, regardless of circumstances, will be revoked no later than Saturday, August 3rd, 2024, 19:30 UTC.
July 30, 2024 23:12 – DigiCert files this delayed revocation bug.
July 31, 2024 – DigiCert attempts to identify which customers have exceptional circumstances. This information was not being tracked before July 30, as we did not intend to delay revocation. Customers start mass-reporting exceptional circumstances.
Aug 1, 2024 – Reviewing the exceptional circumstances overwhelms the team, and a decision is made to delay all revocations until Aug 3 except where the customer has indicated that the replacement is complete.
Aug 3, 2024 19:30 – All TLS certificates revoked.
Aug 7, 2024 17:47 – Final list of affected S/MIME certificates.
Aug 9, 2024 21:48 – All S/MIME certificates revoked.

Root Cause Analysis

The root cause of the delay was that 24 hours was far too short a window to replace the number of certificates impacted. We found that many customers did not see the notice within the 24-hour timeframe, and most customers were unable to replace their certificates in time. We sent notices via email, phone, status page, and an update within the management console. Customers' lack of automation, such as ACME, and the lack of ARI support in our system were certainly factors. Although most customers likely did not qualify for exceptional circumstances, we were rejecting all delayed revocation requests, so we did not track or evaluate what qualified as exceptional.
This plan changed after we spoke with the community and with customers and found exceptional circumstances. After we decided a delayed revocation was warranted, we attempted to determine which customers had exceptional circumstances. However, unraveling which customers had exceptional circumstances vs. ordinary circumstances competed for time with getting certificates replaced as fast as possible. Although we attempted to gather the reasons for customers' revocation delays, reviewing the reasons and determining whether each use case was exceptional took so much time that we hit the updated revocation drop-dead deadline before we could finish.

Lessons Learned

  • The industry needs a better notification process. Some customers thought the email notification was phishing. Others claimed not to have received it. We added banner messaging to the system, but not all customers logged into their accounts during the 5-day period, and those who did not missed the banner message.
  • Some customers that use automation are still unable to replace certificates within 24 hours

What went well

  • DigiCert revoked 83,267 web certificates in 5 days
  • We had good community engagement during the process and ensured people were informed of the status.

What didn't go well

  • Some notifications were not received, so many customers were unaware of the revocation
  • Most of these customers still don’t use automation

Where we got lucky

Action Items

Action Item | Kind | Due Date
Implement ARI | Prevent | 2038-01-19
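Because ARI is the only action item listed, it may help to see what ARI changes on the client side. This is a minimal sketch that assumes the client has already fetched a renewalInfo response whose JSON carries a `suggestedWindow` with RFC 3339 `start` and `end` times, as in the ACME Renewal Information extension. Fetching, polling intervals, and Retry-After handling are omitted, and the helper names are hypothetical. As later comments point out, this only helps subscribers who already run automated clients.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def parse_timestamp(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as "2024-07-29T06:00:00Z"."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def choose_renewal_time(renewal_info_json: str, rng: random.Random) -> datetime:
    """Pick a uniformly random instant inside the server's suggestedWindow."""
    window = json.loads(renewal_info_json)["suggestedWindow"]
    start = parse_timestamp(window["start"])
    end = parse_timestamp(window["end"])
    span_seconds = (end - start).total_seconds()
    return start + timedelta(seconds=rng.uniform(0, span_seconds))


def should_renew_now(renewal_info_json: str, now: datetime, rng: random.Random) -> bool:
    """Renew immediately if the chosen time has already passed.

    A CA can signal immediate renewal by suggesting a window that has
    already passed, so every ARI-aware client renews on its next check."""
    return choose_renewal_time(renewal_info_json, rng) <= now


if __name__ == "__main__":
    info = '{"suggestedWindow": {"start": "2024-07-29T00:00:00Z", "end": "2024-07-29T06:00:00Z"}}'
    now = datetime(2024, 7, 30, tzinfo=timezone.utc)
    print(should_renew_now(info, now, random.Random()))  # True: the whole window is in the past
```

Picking a random point in the window, rather than always renewing at its start, spreads renewals out so a mass event does not produce a single spike of issuance requests.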

Appendix

Details of affected certificates

See populations on bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1910322

Can DigiCert explain when they consider the clock to have started for the 24h/5d revocation period? I presume the action item is a placeholder as well given it doesn't cover the scope of the issues.

This incident report doesn't offer any reasons to think that DigiCert intends to or will be organizationally capable of revoking a nontrivial number of certificates within 1 day in the future.

One lesson from this incident seems to be that notifying customers of upcoming revocation is misguided, difficult to scale and works against compliance and the integrity of the web PKI. If DigiCert had chosen to revoke all certificates immediately, before notifying customers, this whole affair would have been avoided.

Tim, at the risk of "mis-restating" your responses (per bug 1896053 comment 53), I expect you didn't intend for this bug's sole action item to be more than thirteen years in the future. Perhaps you can help the community with the real due date.

Yeah, Tim, sorry about that date. The right one is 2024-12-15.

Tim, can you please provide a list of the certificates that you viewed as in scope for the TRO from the District Court?

  1. I have a meta-question: has DigiCert reviewed previous delayed revocation incidents for any interesting questions that also apply to this incident? What are the answers? :-)

  2. How many subscribers and certificates were affected by "exceptional circumstances" (assuming, arguendo, that any such circumstances can exist)?

  3. What were those circumstances?

  4. How many subscribers claimed how many certificates were "exceptional" for which DigiCert disagreed?

  5. What analysis is there of the risk to relying parties and the public by not revoking over 83,000 certificates on time?

  6. Beyond implementing ARI -- as an optional feature? -- what will be done to prevent this situation from recurring?

Finally, a question for Google and any other root programs involved in the discussions: can you elaborate on the situation from your perspective?

From Mozilla: we are working on a draft that would revise our policy regarding delayed revocations and will circulate it on the m-d-s-p list for comment when it is ready.

Flags: needinfo?(jeremy.rowley)

To what extent does DigiCert anticipate that the single action item provided in the incident report -- "implement ARI" -- will actually prevent the problems described in this incident from occurring again in the future? The incident report notes that "Most of these customers still don’t use automation", a situation which would seem to imply that ARI would not, in fact, assist these customers in any meaningful way.

I would ask that the action items section of this incident report be expanded to provide action items which will clearly mitigate all of the issues identified.

Flags: needinfo?(tim.hollebeek)
Assignee: jeremy.rowley → tim.hollebeek
Whiteboard: [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [leaf-revocation-delay] Next update 2024-10-01

This is a complex issue that is being actively discussed in the industry. We had a long discussion in Bergamo about the current situation and the fact that it makes exactly zero people happy, but proposals on how to move forward are all over the place and still being very actively discussed. I would refer people who haven't been following closely to those discussions, and the discussions that will inevitably happen in Seattle, for more background.

Attempting to resolve these industry-wide problems, which have existed and been debated for almost a decade now, through this particular Bugzilla bug is probably not productive. It would be best to focus incident discussions on the particular incident in question. There are plenty of other opportunities (MDSP, CABF, etc.) for discussing the wider issues in the industry.

Flags: needinfo?(tim.hollebeek)

I'm dismayed at the apparent implication that the broader Mozilla community is somehow a second-class citizen in these issues, given that seemingly the relevant part of the discussion surrounding these issues is happening in venues that the vast majority of the Mozilla community are not able (due to a lack of corporate sponsorship for their work) or willing (due to location, concerns around climate impact, etc) to attend.

I believe that my question, if Comment 20 is intended as a response to same, is relevant to this particular incident, given that my question arises directly from the incident report provided by Digicert in this particular bugzilla.

[In response to Comment 17]

Hi Matt,

In general, the Chrome Root Program continues to prioritize improving agility and resilience across the Web PKI such that revocation events, for whatever underlying reason, are less disruptive to the respective subscriber organizations, relying parties, and the Internet ecosystem as a whole.

While we’ve taken steps to promote agility across the Web PKI (e.g., standardizing short-lived certificates within the TLS BRs) and established expectations related to the availability of automation solutions for applicants to the Chrome Root Store, incidents such as this, and many others over the last 6+ months, are a stark reminder that there’s more work to be done.

While optimistic about ARI, our understanding is that its adoption is still limited (by both CAs and subscribers). In the absence of broader adoption, its ability to positively impact large-scale revocation events will be limited (though, being better prepared is still better). We’re eager to explore ways of increasing ARI adoption (and other solutions like it) and would be interested in community feedback on how to do so.

We describe some other ideas on our "Moving Forward, Together" page - and we’ll continue to collaborate with members of the community as we pursue a more resilient web.

Personally, I also wonder about the opportunity for “blue sky” innovation(s). For example, if we could imagine an ACME client/server relationship that successfully performed multiple DCV methods during a single certificate request workflow, might we be able to make a case that revocation was not required given 1) the circumstances of an incident and 2) the circumstances of the previous validation process?

High-level Example: Assume…

  • I own ryandickson.com.
  • I make a request to a CA that includes demonstrating control over the requested domain using TLS BRs Methods 3.2.2.4.7, 3.2.2.4.19, and 3.2.2.4.20.
  • All three methods succeed, and a certificate is issued.
  • Weeks later, we learn one of those methods was flawed (e.g., the “_” issue reported in bug 1910322).
  • Given that Methods 3.2.2.4.19 and 3.2.2.4.20 were completed successfully at the time of the initial request and are not subject to the same failure mode described in bug 1910322, bypass the revocation that would otherwise result from the 3.2.2.4.7 flaw.

The above approach would require more careful consideration and discussion within the community if it were to be pursued more seriously, but it's something I've been thinking about lately.
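The rule in the example above reduces to a simple predicate. This is a minimal sketch that assumes the CA stores, for each issued certificate, which validation methods were attempted and whether each succeeded. `ValidationRecord` and `revocation_required` are hypothetical names, and the BRs do not currently permit skipping revocation this way.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class ValidationRecord:
    """One domain control validation attempt stored with an issued certificate."""
    method: str       # BR section number, e.g. "3.2.2.4.7"
    succeeded: bool


def revocation_required(records: Iterable[ValidationRecord], flawed_methods: Set[str]) -> bool:
    """Hypothetical rule: revoke only if no successful validation used a
    method that is unaffected by the discovered flaw."""
    return not any(r.succeeded and r.method not in flawed_methods for r in records)


if __name__ == "__main__":
    example_request = [
        ValidationRecord("3.2.2.4.7", True),
        ValidationRecord("3.2.2.4.19", True),
        ValidationRecord("3.2.2.4.20", True),
    ]
    single_method = [ValidationRecord("3.2.2.4.7", True)]
    print(revocation_required(example_request, {"3.2.2.4.7"}))  # False
    print(revocation_required(single_method, {"3.2.2.4.7"}))    # True
```

The check only works if per-method outcomes are recorded at issuance time; a single "validated" flag would not be enough to apply it after the fact.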

-Ryan

Bug 1910322 comment 33 didn’t answer my questions from bug 1910322 comment 28. I’m moving the conversation to this bug because it’s about the delayed revocation.

The TRO dismissal is already public record (https://storage.courtlistener.com/recap/gov.uscourts.utd.149707/gov.uscourts.utd.149707.9.0.pdf).

I was aware of the TRO dismissal, and that wasn’t the point of the original question. The point was to understand what actions DigiCert took immediately after receiving the TRO to defend its ability to perform a mandatory revocation in a timely manner. Even if a 24-hour revocation is rendered impossible by the court’s action, there is still a radical difference between revoking before 48 hours have elapsed and waiting the full 120 hours.

DigiCert had enough time to file a motion to dissolve that contained important information missing from the Alegeus filing, including the legal language granting DigiCert the right to revoke certificates on time and passages from the BRs indicating that the deadline was in no way arbitrary. It is reasonable to believe such a filing could have made a difference to DigiCert’s ability to revoke far in advance of the moment when it appears revocation finally occurred, and in the case of a delayed DCV revocation, this is a meaningful difference.

So to rephrase my first question from bug 1910322 comment 28, we cannot find any filing from DigiCert with the District Court in the attempt to defend DigiCert’s right to follow the BR-mandated behaviors to which Alegeus presumably previously agreed.

Question 1: Did DigiCert make such a filing? If so, please point us to it or post it on this incident.

our terms are easily found on our website

Specificity matters here. The point of my second question was to secure the specific wording from the specific agreement to which Alegeus had agreed, and to confirm that DigiCert indeed correctly (as I’m sure you did) had a binding agreement for revocation in effect with Alegeus when the filing with the District Court occurred.

You have a document called “DIGITAL CERTIFICATE SUBSCRIBER AGREEMENT” at https://www.digicert.com/content/dam/digicert/pdfs/legal/GSA-Normal-Subscriber-Agreement.pdf which states in part,

2.7. Certificate Revocation. DigiCert may revoke a Certificate, without notice, for the reasons stated in the CPS, including if DigiCert reasonably believes that:

(i) Applicant requested revocation of the Certificate or did not authorize the issuance of the Certificate;
(ii) Applicant has materially breached this Agreement or an obligation it has under the CPS;
(iii) Applicant is added to a government list of prohibited persons or entities or is operating from a prohibited destination under the laws of the United States;
(iv) the Certificate contains inaccurate or misleading information;
(v) the Private Key associated with a Certificate was disclosed or Compromised;
(vi) this Agreement terminates;
(vii) industry standards or DigiCert’s CPS require Certificate revocation,
(viii) the Certificate was (a) used outside of its intended purpose, (b) used to sign malicious code or software that is downloaded to a computer without the user’s consent, (c) used or issued contrary to law, the CPS, or applicable industry standards, or (d) used, directly or indirectly, for illegal or fraudulent purposes; or
(ix) revocation is necessary to protect the rights, confidential information, operations, or reputation of DigiCert or a third party

It appears that DigiCert’s right to revoke in this circumstance is unambiguously established by sections vii, viii (c), and ix.

Question 2: Please confirm, as I expect, that Alegeus was bound at issuance time by one or more “Subscriber Agreements” or similar documents that granted DigiCert the right to revoke certificates according to the BRs.

Question 3: Is the above language the specific language that bound Alegeus for the certificates in question? If not, please provide the specific language that did bind Alegeus at that time.

Tim,

Answer 1: We made the filing at the time already noted in the public record (https://www.courtlistener.com/docket/68995396/alegeus-technologies-llc-v-digicert/).

Answer 2: Confirmed.

Answer 3: Our legal counsel has advised us that, related to this TRO and the incident involving Alegeus, we are not permitted to specify anything beyond what was publicly filed. We are happy to answer questions related to the incident itself, but any specific questions on the TRO will need to be referred to our legal counsel.

Your message and questions seem to imply that DigiCert did not do enough to override a court order not to revoke a specific subscriber’s certificate. You are entitled to your opinion, but I believe we acted promptly to resolve the issue and dissolve the court order. We will use this opportunity to learn how to better manage incidents and improve the revocation process, but, given that neither of us is a lawyer, I don’t think we’re qualified to debate legal nuances here.

Hi Matt – I certainly did not intend to imply that the Mozilla community is a second-class citizen. Instead, I think the discussion should happen on MDSP with public contributions. There is already a thread where ideas are being discussed. I think we won’t resolve the fundamental problem with revocation on this bug, but we could resolve it on MDSP. We should have the conversation in the public forum, where all CAs that had delayed revocations can participate.

Ryan – This is an idea we’ve been discussing internally and like quite a bit. Our engineering team has it on their backlog to facilitate verification using multiple methods. We are definitely interested in blue sky approaches towards preventing revocation and minimizing the impact of security issues and are happy to collaborate.

I think we won’t resolve the fundamental problem with revocation on this bug

My intention was not to attempt to resolve large-scope revocation issues within this bug. Instead, I was attempting to identify what I felt was a limitation in the action items that DigiCert presented in their report with regard to this specific incident.

If DigiCert does not have any specific ideas as to how the root causes of this incident can be ameliorated, it should make that clear, and if DigiCert intends to solicit the contributions of the MDSP community (or other communities of practice) in an attempt to identify comprehensive mitigations, then I would expect those activities to be clearly identified action items to be tracked and reported on.

I, too, am quite interested in the idea of using multiple independent methods of verification, and I don't see any reason to revoke a certificate if one method is found to be deficient, as long as at least one completed method is still valid.

Matt,

I get what you're saying. Let me think about it a little bit and see what I can do.

-Tim

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-10-01 → [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31

Thanks Matt for clarifying what you were looking for. I did misunderstand what you were asking.

To address your questions, we do not have additional specific ideas on how to accelerate revocation to the 24 hour mark for all certificates, especially when systems are identified as critical and third parties are brought into the discussion. We are also unsure on how to address future restraining orders and legal action taken by subscribers that could limit compliance. We already started the discussion on MDSP, but that conversation ended up superseded by the new Mozilla revocation proposal. We are very supportive of this proposal and believe a go-forward industry-wide validity restriction on domain names is appropriate where the CABF requirements are missed. We are still exploring the effort to use multiple validation methods, but we do not plan to make it an action item for this bug because using multiple validation methods does not address the root issue of crypto-agility. Therefore, right now, the only action item we have for this bug is to implement ARI support. We would be happy to add additional items if there are ways we can address the root cause of delayed revocations moving beyond the 24 hours.

One action item I think would be useful is something addressing the (at the most generous interpretation of the timeline) 20 hours it apparently took to get a list of impacted certificates (July 29, 2024 02:17 to July 29, 2024 22:36). I'm sympathetic to the possibility that collecting the list of impacted certificates might not have been trivial, given the complex nature of the issue and the variety of legacy systems that DigiCert is apparently wrangling, but burning 20 of the permitted 24 hours to gather the certificate list seems like something worth improving.

I think that a good meta-action item would be to comprehensively review the reasons given by customers for why they requested a delay in revocation. From the incident report, "Customers start[ed] mass reporting exceptional circumstances", which sounds like there's probably enough data already on hand to produce some sort of useful conclusions.

Assuming that the reasons given by customers are anything other than "because we weren't notified immediately" (which implementing ARI would solve), additional action items could be identified to address the given reasons. Some hypothetical examples, off the top of my head:

  • "transition certificates for systems that cannot possibly meet the BRs off the WebPKI to another PKI" for the "laws don't let us change certs quickly!" and "our CAB only meets once a quarter!" people;
  • "warn on/refuse issuance of certificates for names that still return HPKP headers to HTTPS requests", to catch at least the lowest of the low-hanging legacy pinning fruit;
  • "limit the timespan that a public key can be used in certificates", to discourage end-entity key pinning;
  • "issue from multiple intermediates at random", to discourage intermediate pinning; and
  • undoubtedly the weakest of all, but at least it's relatively easy to do -- "education campaign to discourage practices that contribute to requests for delayed revocation", the content of which would presumably be informed by the information gleaned from the analysis of excuses given.
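The HPKP suggestion in the list above is straightforward to mechanize. This is a minimal sketch that assumes the CA has already fetched the HTTPS response headers for the requested name with some HTTP client (not shown). `sends_hpkp` is a hypothetical helper, and whether a match should trigger a warning or a refusal is the policy choice the suggestion leaves open.

```python
from typing import Mapping

# Response header names defined by RFC 7469 (HTTP Public Key Pinning).
HPKP_HEADER_NAMES = {"public-key-pins", "public-key-pins-report-only"}


def sends_hpkp(headers: Mapping[str, str]) -> bool:
    """Return True if a response's headers include an HPKP header.

    HTTP header names are case-insensitive, so names are lower-cased first."""
    return any(name.lower() in HPKP_HEADER_NAMES for name in headers)


if __name__ == "__main__":
    pinned = {"Public-Key-Pins": 'pin-sha256="AAAA"; max-age=600'}
    unpinned = {"Strict-Transport-Security": "max-age=31536000"}
    print(sends_hpkp(pinned), sends_hpkp(unpinned))  # True False
```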

(In reply to Tim Hollebeek from comment #28)

…we do not have additional specific ideas on how to accelerate revocation to the 24 hour mark for all certificates, especially when systems are identified as critical and third parties are brought into the discussion. We are also unsure on how to address future restraining orders and legal action taken by subscribers that could limit compliance.
. . .
Therefore, right now, the only action item we have for this bug is to implement ARI support. We would be happy to add additional items if there are ways we can address the root cause of delayed revocations moving beyond the 24 hours.

We can think of several action items that seem appropriate here:

  • Create a firm policy not to deliberately delay revocation accompanied by controls to ensure this policy is enforced and an unambiguous public commitment to this effect by DigiCert.
  • Establish a firm policy that outbound communications announcing mandatory revocations contain no suggestion that Subscribers can request delays, with controls in place to ensure compliance.
  • Add unambiguous language to all enterprise MSAs stating DigiCert’s right to revoke certificates at any time for any reason on any timeline.
  • Add language to all enterprise MSAs that any attempt to use the legal system to subvert proper CA behaviors is breach of contract.
  • Establish a firm, public policy that any Subscriber attempting to use the legal system to subvert proper CA operations will be banned from future certificate issuance from DigiCert, with controls in place to ensure compliance.

Other CAs with recent delayed revocation incidents, including Telia and Hongkong Post, have created policies and made public commitments to discontinue the practice of delaying revocation. Based on the last sentence of the quoted passage above, I’m hopeful that DigiCert has come around to understanding it should follow their example and do the same.

Thanks Tim and Matt. We really appreciate the feedback and suggestions. There are a couple of misconceptions that I’d like to address before answering your specific questions.

Pulling CA data is not a difficult task. We consolidated our systems years ago and created a central data lake for all publicly trusted certificates. Because everything is centralized, pulling information from the data lake and cross-referencing the CA data with contact information can be done quickly. However, the data lake is non-relational and access-restricted, meaning that only the BI team has the proper ability to pull information. During this incident, the issue was discovered when the BI team was not readily available. Once we contacted the BI team and they began to pull data, the discovery process went quickly. Although there are lots of opinions on how to count time on CA bugs, the 20 hours covers the span from reviewing the data to sending the emails. Sending emails can take time when large volumes are involved.

We deprecated most of our legacy systems several years ago, around 2019, when this issue was introduced. In fact, the deprecation of legacy systems contributed to this bug being introduced into the validation system. The only remaining legacy systems we have that issue publicly trusted certificates are the Quovadis systems, which we are actively working to decommission. Currently, all non-Quovadis public issuance is through two different CAs: one that issues TLS and code signing certificates, and another that issues S/MIME certificates. These two CAs were separate before the adoption of the BRs and were kept separate partially to avoid overlapping S/MIME and TLS certificates and their respective requirements. With the adoption of the S/MIME BRs, we would like to consolidate the two CAs. We do not have a timeline for this consolidation, but that isn’t really relevant to this bug. The important point is that legacy systems were not a factor in this delayed revocation. The mistake was made on our go-forward systems at the time when we were consolidating our systems away from Symantec systems.

Although we are aware that some customers pinned certificates, certificate pinning was not a reason for the delay. When we were gathering information on why customers could not replace their certificates within 24 hours, the major concern cited was “not enough time”. This isn’t an interesting or valid reason. However, once the decision was made to delay revocation for one customer, we had insufficient time to deep-dive into every customer’s stated reason for delay. We could not separate “exceptional circumstances” from ordinary, non-exceptional circumstances in time. Allowing an exception for one customer led many customers to start requesting delays, citing exceptional circumstances and critical infrastructure. Of course, not everyone claiming exceptional circumstances had truly exceptional circumstances, but sorting through the requests before the 24-hour timeline passed was not feasible. Analyzing the reasons provided won’t be terribly productive given that all the certificates were revoked within five days and the underlying root cause is the lack of automation adoption, regardless of the reason submitted to DigiCert.

To address your specific suggestions:

"transition certificates for systems that cannot possibly meet the BRs off the WebPKI to another PKI" for the "laws don't let us change certs quickly!" and "our CAB only meets once a quarter!" people; 

DigiCert has promoted moving off public trust for all non-Web PKI certs since 2017. We’ve announced this in most major conferences we’ve held or attended. We constantly talk to customers about separating public and private trust during contract negotiations and in public forums. Our CPS states: “Customers should also avoid mixing certificates trusted for the web with non- web PKI.” As this is a customer education task that will continue with no foreseeable end-date, an action item here will not accelerate customer adoption of this viewpoint.

"warn on/refuse issuance of certificates for names that still return HPKP headers to HTTPS requests", to catch at least the lowest of the low-hanging legacy pinning fruit; 

I like this idea, especially because such a check would make an automated system, rather than human review, responsible for verifying that pinning isn’t an issue. I will speak with our engineering team about how long this might take to implement. However, as noted above, pinning was not a major reason for the delayed revocations, and implementing this would remediate only a (very) small percentage of customers. This would not be a great action item for this bug as pinning was not a significant reason for the delay.

"limit the timespan that a public key can be used in certificates", to discourage end-entity key pinning; 

We support the industry moving to 90-day certificates. Again, pinning wasn’t the major issue. A lack of automation adoption is the primary culprit. Moving towards 90-day certs will help solve this lackluster adoption of automation by most large enterprises. Shorter validity periods force automation, leading to better crypto-agility and eliminating the real reason for delayed revocations.

"issue from multiple intermediates at random", to discourage intermediate pinning; and 

The lack of automation adoption by large and highly regulated enterprises is the biggest issue in delayed revocation, not pinning. Offering the services for free hasn’t helped improve automation. Even those with automation cannot replace certificates in 24 hours because most have not adopted ARI. This is why promoting ARI is the biggest thing CAs can do to meet the 24-hour requirement.

undoubtedly the weakest of all, but at least it's relatively easy to do -- "education campaign to discourage practices that contribute to requests for delayed revocation", the content of which would presumably be informed by the information gleaned from the analysis of excuses given. 

Agreed on the need for better education around why short revocation timelines are important. We will be talking about the new Mozilla policy with our customers and informing them of the consequences of delayed revocation.

Create a firm policy not to deliberately delay revocation accompanied by controls to ensure this policy is enforced and an unambiguous public commitment to this effect by DigiCert. 

I don’t think we can say that delayed revocation will never happen again. We are excited for the (much improved) Mozilla policy, which allows delayed revocation in certain cases and with set consequences. We believe the consequences under that policy are fair and appropriate while still allowing emergency delays in revocation where systems will be impacted. We will follow whatever rules the browsers set, with the correct approach being to update the BRs with better expectations. We support the hard work the browsers and CAB Forum members are doing and appreciate the excellent discussions led by Mozilla. We also agree with Chrome’s points made this week that the existing rules should be followed regarding future incidents. We cannot predict in advance what the next issue will be, and we support Chrome's reasonable position that delayed revocations should be reported and discussed.

Establish a firm policy that outbound communications announcing mandatory revocations contain no suggestion that Subscribers can request delays, with controls in place to ensure compliance. 

I think this contradicts the Mozilla policy and ignores the fact that browsers can request delayed revocation in exceptional circumstances. We will ensure all communication accurately reflects the applicable root store policies.

Add unambiguous language to all enterprise MSAs stating DigiCert’s right to revoke certificates at any time for any reason on any timeline. 

This already exists. Specifically, our Terms of Use state that “DigiCert may revoke a Certificate without notice for the reasons stated in the CPS”. Notice is never required, nor is a timeline specified. However, I do not think CAs should be permitted to revoke without reason. Doing so leads to the silly actions we’ve seen in the past, where CAs try to leverage revocation as a way to retain customers.

Add language to all enterprise MSAs that any attempt to use the legal system to subvert proper CA behaviors is breach of contract. 

As I’ve noted before, we are not lawyers, but I do not think we can restrict people’s right to use the legal system. We primarily operate in the US, where the court system is readily available. I suggest that you consult your own legal counsel on whether this is an appropriate control for Sectigo, with the understanding that not all lawyers are going to agree with one another.

Establish a firm, public policy that any Subscriber attempting to use the legal system to subvert proper CA operations will be banned from future certificate issuance from DigiCert, with controls in place to ensure compliance. 

This runs contrary to what the Mozilla policy states. As mentioned above, we like the improved Mozilla policy and will adhere to that when it is adopted.

Thank you both for your suggestions. I’ll get back to you with the response from our engineering team on checking for pinning.

Tim, apologies, but I tried to read your last post in several viewers and the formatting made it incredibly difficult to do so. I know this is not in any way a requirement, but can you please re-post using a more "reader friendly" format? Using code blocks for non-code or long sentences is not displayed very kindly for humans by Bugzilla :)

Thanks!

Hi Tim,

(I'd like to echo Dimitris' observation that using a monospace, non-wrapping block makes it really hard to read what you've written)

Thanks for the further information. I've got some comments, as you might have guessed.

During this incident, discovery of the issue occurred when the BI team was not readily available. Once we contacted this team and the BI team began to pull data, the discovery process went quickly. [...] 20 hours marks the time spent from reviewing data to sending emails. Sending emails can take time when large volumes are impacted.

A few thoughts come to mind from this paragraph:

  1. I feel that the incident timeline is lacking in sufficient detail to reflect the situation you're describing. Can you break down the 20 hours that these separate steps were in process into their constituent parts? That is, how much of the 20 hours went to each of "Finding BI team", "BI team finds impacted certificates/subscribers", "Reviewing data to sending emails", and "sending emails"?
  2. If the "BI team" (business intelligence, I assume?) is a critical part of responding to a misissuance incident, presumably one useful action item would be to make sure that sufficient members of that team are available in a timely fashion -- or else modifying processes to allow other suitably qualified individuals to access the data they need.
  3. If "sending emails" was a significant part of the 20 hours, perhaps a useful action item would be to review how much mail sending capacity is available to DigiCert at short notice.

Of course, not everyone claiming exceptional circumstances had truly exceptional circumstances, but sorting through the requests before the 24-hour timeline passed was not feasible.

That sounds to me like a possible action item could be "provide (or at least investigate the feasibility of providing) a system to allow customers (or customer-facing people within DigiCert, such as account managers) to request revocation delay in a standardised form to permit timely analysis of such requests".

Analyzing the reasons provided won’t be terribly productive given that all the certificates were revoked within five days

I don't think I understand the rationale here. The timeline of revocation seems orthogonal to the desire to identify root causes of delay requests, so that appropriate action items can be developed to mitigate the need for delay requests in the future.

the underlying root cause is the lack of automation adoption, regardless of the reason submitted to DigiCert.

To me, that sounds like a possible action item would be "retire all non-automated methods of certificate issuance", to further encourage the adoption of automation.

DigiCert has promoted moving off public trust for all non-Web PKI certs since 2017. We’ve announced this in most major conferences we’ve held or attended. We constantly talk to customers about separating public and private trust during contract negotiations and in public forums. Our CPS states: “Customers should also avoid mixing certificates trusted for the web with non- web PKI.” As this is a customer education task that will continue with no foreseeable end-date, an action item here will not accelerate customer adoption of this viewpoint.

I deliberately separated out "customer education" from "get non-WebPKI-compatible issuance off the WebPKI", because the approaches I was envisaging to do the latter go far beyond just screaming into the void. I would imagine that the process would be more along the lines of identifying customers who are requesting non-WebPKI-compatible issuance, and actively moving them to private PKIs -- and by "actively" I mean going as far as saying "DigiCert will no longer provide issuance to these systems as of $DATE". Identifying customers would presumably start with those who made revocation delay requests, and then move on to identifying common characteristics of those customers and finding other customers that have similar characteristics.

This would not be a great action item for this bug as pinning was not a significant reason for the delay.

Perhaps I'm misunderstanding the purpose of post-incident action items. To my mind, anything that is planned to be done to reduce the frequency, scope, severity, or MTTR of a future incident is a valid candidate for an action item. I'm getting the impression that you are of the opinion that there's some minimum "impact score" that is needed for something to qualify as an action item. Is my impression correct, or am I misunderstanding something else about your stance?

We support the industry moving to 90-day certificates. Again, pinning wasn’t the major issue. A lack of automation adoption is the primary culprit. Moving towards 90-day certs will help solve this lackluster adoption of automation by most large enterprises. Shorter validity periods force automation, leading to better crypto-agility and eliminating the real reason for delayed revocations.

I'm glad to hear that DigiCert supports moving to 90 day certificates. Perhaps "DigiCert adopts 90 day maximum validity period" should be an action item? It would probably have at least as much impact on automation adoption as implementing ARI.

I'm somewhat confused, though, since the above paragraph was in response to a suggestion to limit the amount of time that a given public key could be used in certificate issuance. That is a somewhat orthogonal concern, as keys do not intrinsically have a concept of "validity period".

We are excited for the (much improved) Mozilla policy, which allows delayed revocation in certain cases and with set consequences. We believe the consequences under that policy are fair and appropriate while still allowing emergency delays in revocation where systems will be impacted.

Which policy is this? I haven't seen any proposed Mozilla policy that implies or states that delayed revocation is "allowed". Rather, my understanding of the recent discussions on mdsp has been around codifying consequences for certain classes of delayed revocation, which would seem to be the antithesis of "allowing" delayed revocation. I would hate for anyone -- CA, subscriber, or relying party -- to get the incorrect impression that Mozilla "allows" delayed revocation under any circumstance.

We will follow whatever rules the browsers set, with the correct approach being to update the BRs with better expectations.

This is a curious framing, given that the BRs tend to stick to describing what should be done, and don't venture into consequences for non-compliance. Are you suggesting that the CABF should be expanded to codify "baseline consequences" as well as "baseline requirements"?

Dimitris, sorry, that wasn't intentional. There must be some sort of markdown shortcut for code blocks that I accidentally triggered.

Hi Matt,

I think that, given the required revocation timing and the lack of any notice requirement in the BRs or Mozilla policy, discussion of the 20 hours required to gather all of the certificates and contact information, get organized, and send emails won’t be that productive. Even reducing the timeline by half would not have given customers enough time to replace all of their certificates.

That sounds to me like a possible action item could be "provide (or at least investigate the feasibility of providing) a system to allow customers (or customer-facing people within DigiCert, such as account managers) to request revocation delay in a standardised form to permit timely analysis of such requests".

I would not want to include this in the product as delays in revocation are not allowed. Adding a form or something similar would convey that delayed revocations are permissible under the policy.

Perhaps I'm misunderstanding the purpose of post-incident action items. To my mind, anything that is planned to be done to reduce the frequency, scope, severity, or MTTR of a future incident is a valid candidate for an action item. I'm getting the impression that you are of the opinion that there's some minimum "impact score" that is needed for something to qualify as an action item. Is my impression correct, or am I misunderstanding something else about your stance?

Any action item needs to have “some” impact. Because key pinning was not a permitted reason for delayed revocation under the Mozilla policy and never a reason that we allowed delayed revocation, detection of pinning would not accelerate replacement.

I'm glad to hear that DigiCert supports moving to 90 day certificates. Perhaps "DigiCert adopts 90 day maximum validity period" should be an action item? It would probably have at least as much impact on automation adoption as implementing ARI.

We are happy to endorse a ballot that moves the industry to 90-day certificates.

Which policy is this? I haven't seen any proposed Mozilla policy that implies or states that delayed revocation is "allowed". Rather, my understanding of the recent discussions on mdsp has been around codifying consequences for certain classes of delayed revocation, which would seem to be the antithesis of "allowing" delayed revocation. I would hate for anyone -- CA, subscriber, or relying party -- to get the incorrect impression that Mozilla "allows" delayed revocation under any circumstance.

Yes. We like clarity in every policy we follow. We support policies that clearly prohibit delayed revocation without wiggle room. We also like ballots that clearly define the consequences of delaying revocation. This prevents customers from thinking there are exceptions to the BRs and from escalating to browsers. The current policy’s focus on exceptional circumstances lacks a clear definition of what counts as an exceptional circumstance and does not define the penalty when a CA delays revocation on that basis.

This is a curious framing, given that the BRs tend to stick to describing what should be done, and don't venture into consequences for non-compliance. Are you suggesting that the CABF should be expanded to codify "baseline consequences" as well as "baseline requirements"?

Consequences would be out of scope for the BRs, as only the browsers can enforce the BRs. The browsers made that very clear in the early days of the CAB Forum. The BRs only have the teeth the browsers give them; therefore, the browsers should define the consequences instead of having them incorporated into the BRs.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31 → [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30

(In reply to Tim Hollebeek from comment #31)

I don’t think we can say that delayed revocation will never happen again.

I’m quite sure you can say that DigiCert will not deliberately delay revocation again, which was explicit in the suggested action item. I concur that the possibility of accidental delay due to authentic technical or procedural error exists, as currently exhibited by bug 1924385. However, based on the CA community’s performance this year, accidental delayed revocation does not appear to be the meaningful problem we need to address.
So to be clear, I am not suggesting that DigiCert make any promises it cannot keep. Just promises it can.

Establish a firm policy that outbound communications announcing mandatory revocations contain no suggestion that Subscribers can request delays, with controls in place to ensure compliance.

I think this contradicts the Mozilla policy and ignores the fact that browsers can request delayed revocation in exceptional circumstances. We will ensure all communication accurately reflects the applicable root store policies.

That, I must say, is an unexpected interpretation of Mozilla policy.

Question 1: Please elaborate on your rationale here. Where in Mozilla policy does it require that CAs encourage Subscribers to request active disobedience to BR requirements?

Question 2: By this line of reasoning, are CAs required to encourage Subscribers to ask them to disobey other BR requirements? What is the full set of BR requirements that in your view Mozilla policy forces CAs to invite Subscribers to fight against?

Question 3: Where in Mozilla policy does it state that Mozilla “can request delayed revocation in exceptional circumstances”? (We only see Section 6.1, which mandates that “CAs MUST also revoke any certificates issued in violation of the then-current version of this policy according to the timeline defined in section 4.9.1 of the TLS Baseline Requirements” without listing any sort of carve-out for Mozilla itself.)

Add unambiguous language to all enterprise MSAs stating DigiCert’s right to revoke certificates at any time for any reason on any timeline.

This already exists. Specifically, our Terms of Use state that “DigiCert may revoke a Certificate without notice for the reasons stated in the CPS”.

In fact, no. My suggested action item refers to the MSA specifically for a reason. Your “Terms of Use” is a different thing. The documents submitted as evidence by Alegeus Technologies in obtaining its rapidly granted temporary restraining order included its MSA with DigiCert but not the Terms of Use or any other agreement granting DigiCert the right to revoke certificates. Had this language been included in the MSA, the District Court may have gone in a different direction, or perhaps Alegeus would not have attempted the gambit at all.

This means there is a clear and obvious action item to mitigate the risk of delayed revocation due to future, improper legal actions, which is to include the right to revoke in the MSA. Your response does not account for this need. Perhaps now that I’ve explained it more clearly, DigiCert can add this action item.

Establish a firm, public policy that any Subscriber attempting to use the legal system to subvert proper CA operations will be banned from future certificate issuance from DigiCert, with controls in place to ensure compliance.

This runs contrary to what the Mozilla policy states.

This again is an untraditional interpretation of Mozilla policy.

Question 4: Please elaborate on your rationale. Where in Mozilla policy does it require CAs to offer certificates to specific parties?

Question 5: Do you interpret that Mozilla policy forces a CA to open certificate enrollment to any Subscriber under any circumstance, or are there specific qualities of a Subscriber that force the CA to make certificates available to it? What are these qualities, how does Alegeus meet them, and where is this specified in Mozilla policy?

As mentioned above, we like the improved Mozilla policy and will adhere to that when it is adopted.

You seem to be leaning on the proposed Mozilla policy as a substitute for addressing the specific failures in this incident. There are two clear problems with this approach.

  1. It is a proposal. It may never become enforced policy. If it does, it may be in modified form that no longer addresses the need. I am a big believer in action items that put success in the hands of the actor, rather than sitting around and hoping someone else solves the problem for you. DigiCert is able to make changes that would have prevented this violation of the BRs, had they been in effect. These changes will prevent repetition of this error. Passively waiting for someone else to clean up your messes is a poor practice for anyone, and especially for a public CA.
  2. It is a future event. Even if the Mozilla proposal does become policy, that will take time, perhaps significant time. In the meanwhile, DigiCert has no action items to address the root causes of this incident. There is no reason to believe you will behave correctly if a similar situation occurs again, should it be in advance of this possible browser action.

Question 1: Please elaborate on your rationale here. Where in Mozilla policy does it require that CAs encourage Subscribers to request active disobedience to BR requirements?

We did not say this. Please refrain from putting words in my mouth. I said the current Mozilla policy has language that allows for delayed revocation, not that CAs are required to encourage Subscribers to request disobedience. DigiCert requires, not encourages, compliance with the browser policies.

Question 2: By this line of reasoning, are CAs required to encourage Subscribers to ask them to disobey other BR requirements? What is the full set of BR requirements that in your view Mozilla policy forces CAs to invite Subscribers to fight against?

Again, please do not put words in my mouth. I do not appreciate this. CAs are not required to encourage Subscribers to disobey requirements, and I have never said this.

Question 3: Where in Mozilla policy does it state that Mozilla “can request delayed revocation in exceptional circumstances”? (We only see Section 6.1, which mandates that “CAs MUST also revoke any certificates issued in violation of the then-current version of this policy according to the timeline defined in section 4.9.1 of the TLS Baseline Requirements” without listing any sort of carve-out for Mozilla itself.)

In fact, no. My suggested action item refers to the MSA specifically for a reason. Your “Terms of Use” is a different thing. The documents submitted as evidence by Alegeus Technologies in obtaining its rapidly granted temporary restraining order included its MSA with DigiCert but not the Terms of Use or any other agreement granting DigiCert the right to revoke certificates. Had this language been included in the MSA, the District Court may have gone in a different direction, or perhaps Alegeus would not have attempted the gambit at all.

This shows a lack of understanding of how legal contracts and incorporation by reference work. Our Terms of Use are incorporated by reference into our MSA, so legally they are one and the same. We are not going to go back and forth with you anymore on legal process.

Question 4: Please elaborate on your rationale. Where in Mozilla policy does it require CAs to offer certificates to specific parties?

At this point, I think you are just making stuff up.

Question 5: Do you interpret that Mozilla policy forces a CA to open certificate enrollment to any Subscriber under any circumstance, or are there specific qualities of a Subscriber that force the CA to make certificates available to it? What are these qualities, how does Alegeus meet them, and where is this specified in Mozilla policy?

Again, we do not think the Mozilla policy says any of this, nor did I post anything that said as much. We believe the Mozilla policy expects entities to comply with the applicable law. Not sure why you are so fixated on one person filing a TRO that only lasted a few days.

Monitoring for updates.

We continue work on incident-reporting and compliance requirements aimed at reducing delayed revocation, so this bug will remain open until at least February 1, 2025. Meanwhile, CAs should review https://github.com/mozilla/www.ccadb.org/pull/186.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30 → [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01

I said the current Mozilla policy has language that allows for delayed revocation

You made a similar claim in response to one of my questions, and I asked at the time for references, which you did not answer. I'd like to repeat that request now: please cite the specific wording in current or proposed Mozilla policy which explicitly allows (as in, permits, approves of, or sanctions) delayed revocation under any circumstances.

Not sure why you are so fixated on one person filing a TRO that only lasted a few days.

I can't speak for Tim Callan, but my attention was caught by the TRO because it is an unprecedented event in the WebPKI, and humans are wired to have their attention drawn to novelty. Further, if allowed to proliferate, it would potentially be used by subscribers en masse to do an end-run around important technical security controls.

For myself, I've been extremely underwhelmed, and somewhat perplexed, by DigiCert's public response to that part of this incident. If I were running a CA, I would be extremely disconcerted by the TRO, because the options for trust stores to mitigate this hazard are limited, and extremely unpleasant for my hypothetical CA.

For example, a trust store might decide they have to reduce the required revocation time limit to "near-instant" (at least, quicker than any subscriber could possibly obtain a TRO), and institute extremely draconian mandatory consequences for delayed revocation, if for no other reason than to give CAs either sufficient legal top cover for challenging any possible legal action, or sufficient motivation to defy a court's order. Again, if I were running a CA, none of the available options for trust stores would be things I'd want to have happen, or even come within a million miles of being considered.

Flags: needinfo?(tim.hollebeek)

You made a similar claim in response to one of my questions, and I asked at the time for references, which you did not answer. I'd like to repeat that request now: please cite the specific wording in current or proposed Mozilla policy which explicitly allows (as in, permits, approves of, or sanctions) delayed revocation under any circumstances.

The process for delayed revocation is found here https://wiki.mozilla.org/CA/Responding_To_An_Incident:

If your CA will not be revoking the certificates within the time period required by the BRs, our expectations are that:

  • A separate incident report will be filed in Bugzilla.
  • The decision and rationale for delaying revocation will be disclosed in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include detailed and substantiated explanations for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.
  • Any decision to not comply with the timeline specified in the Baseline Requirements must also be accompanied by a clear timeline describing if and when the problematic certificates will be revoked or expire naturally, and supported by the rationale to delay revocation.
  • The issue will need to be listed as a finding in your CA’s next BR audit statement.
  • Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable.
  • You will perform an analysis to determine the factors that prevented timely revocation of the certificates, and include a set of remediation actions in the final incident report that aim to prevent future revocation delays.

With regards to questions about the TRO, our legal folks are preparing a response.

I don't believe any of that language states that delayed revocation is acceptable, permitted, approved, or sanctioned. To my reading, it in fact clearly reiterates that it is not: "Any decision to not comply...". It is simply clarifying that delayed revocation is an incident like any other, must be accompanied by an incident report like any other incident, and placing some additional requirements on the contents of that incident report. This section exists not because delayed revocation is acceptable, but because delayed revocation incidents are often deliberate (rather than mistakes) and therefore require special handling.

Our legal counsel has prepared the following, which we hope will put this issue to bed.

Perhaps a brief explanation of how temporary restraining orders work is in order.

A TRO is a short-term legal order issued by a court to prevent alleged immediate harm or irreparable damage. A person or company seeking a TRO files a petition with the court, providing evidence of the alleged threat or harm. A TRO petition is a one-sided submission, and the court may decide whether to provide the temporary relief requested without any input from the other side. This is particularly true in urgent cases, as the judge may issue a TRO ex parte, meaning without notifying the other party. This is done to provide immediate protection to the requesting party, particularly when the alleged harm will occur immediately. Given the “emergency” nature of TROs, a TRO petitioner benefits from a lower burden of proof than usual, being required to show only a “reasonable fear” of harm. And in determining whether to grant a TRO, courts focus on preventing the potential harm, even if it means temporarily restricting the other party’s rights before a full hearing can be held.

This is what occurred here. Believing it would face dire and immediate consequences if its certificates were revoked in the 24-hour time period, Alegeus filed an ex parte motion for a TRO in Utah federal court the morning of July 30. Approximately one hour after the motion was filed, and before DigiCert could appear or provide any input, the court granted the TRO ex parte, prohibiting DigiCert from “revoking the security certificates for the Alegeus Websites for a period of seven (7) days, or until the Court is able to schedule a hearing on the Motion, whichever is earlier.” Before the court ever scheduled a hearing (which is typically scheduled 10 to 20 days after the TRO is granted), on August 2, after Alegeus’s certificates had been revoked, DigiCert and Alegeus filed a joint motion to vacate the TRO, which the court granted on August 3.

Even though DigiCert’s TOU and MSA prohibited Alegeus from taking the action it did, once it filed for a TRO and the court almost immediately granted it, DigiCert’s hands were tied. (Note that after Alegeus filed its TRO motion, the court did not consult the legal agreement between DigiCert and Alegeus to see whether it prohibited a TRO, nor is any US court required to do so.) DigiCert could not disobey the court’s order without incurring legal consequences. And even if DigiCert’s TOU and MSA were somehow even more explicit in prohibiting the filing of a TRO, there is nothing DigiCert can do to prevent a TRO request being filed. Once the TRO was filed and granted here, DigiCert worked rapidly to renew Alegeus’s certificates so that it could, jointly with Alegeus, petition the court to vacate its order; once the court agreed, DigiCert was able to immediately revoke Alegeus’s defective certificates. Under the circumstances, there was no need to challenge the legitimacy of the relief sought under DigiCert’s terms because we instead resolved the situation amicably with Alegeus.

All of that said, in a different situation and going forward, we may well respond differently. We choose not to telegraph our legal strategy in plain sight. But, to be clear, there is nothing deficient in the terms in DigiCert’s TOU and MSA.

Flags: needinfo?(tim.hollebeek)

You made a similar claim in response to one of my questions, and I asked at the time for references, which you did not answer. I'd like to repeat that request now: please cite the specific wording in current or proposed Mozilla policy which explicitly allows (as in, permits, approves of, or sanctions) delayed revocation under any circumstances.

The process for delayed revocation is found here https://wiki.mozilla.org/CA/Responding_To_An_Incident

I'd like to second Aaron Gable's belief that the document cited does not explicitly allow (as in permit, approve of, or sanction) delayed revocation. It documents requirements for incident reports about delayed revocation incidents, but I do not believe that documenting reporting requirements for delayed revocation incidents "allows" delayed revocation any more than documenting reporting requirements for other kinds of incidents "allows" those incidents.

Are there any other bases for the prior assertions that Mozilla allows delayed revocation?

As for the TRO issue, whilst I acknowledge DigiCert's decision to play their cards close to their chest, legal-strategy-wise, there are other parties involved (specifically the relying parties for DigiCert's certificates, and the wider WebPKI community) that would prefer some assurance that TROs are not going to become the tool of choice for CA customers who have decided to ignore their legal agreements and build systems that are incapable of appropriately participating in the WebPKI.

In the absence of clear and transparent information from CAs that shows that they are taking the problem of legal action being used to side-step important technical security controls as seriously as other participants in the WebPKI community, the community as a whole will have to rely more heavily on the threat of punitive measures in the future, to motivate compliance, which is disappointing.

On the revocation language, Mozilla policy is currently being updated precisely because there is some ambiguity here. We appreciate Ben’s attempts to provide improved clarity here (https://bugzilla.mozilla.org/show_bug.cgi?id=1896053#c58) and look forward to the additional discussion he has scheduled for the February face to face. All the root programs, including Mozilla, have historically and consistently communicated that if revocation cannot be achieved within the mandated timeframe, an incident should be filed. We hope future policy updates make the situation even more clear.

On the TRO, DigiCert understands the importance of, and is committed to, adhering to the Baseline Requirements, as Mozilla's policy requires. The BRs include the requirement in Section 9.15 “Compliance with applicable law” that “The CA SHALL issue Certificates and operate its PKI in accordance with all law applicable to its business and the Certificates it issues in every jurisdiction in which it operates.” This can require a careful balancing of security concerns and legal obligations, such as adhering to court orders. CAs do not have the discretion to ignore lawfully issued court orders simply because a customer may be trying to use the courts to avoid complying with a compliance requirement.

We proactively disclosed the occurrence and provided a detailed incident report. We also worked to have the TRO dismissed quickly. We took this matter very seriously and acted in full accordance with the law.

It is important to appreciate that, going forward, there will be legal or judicial circumstances, not unlike court-ordered TROs (which are often granted without notice to CAs), that prevent CAs from immediately revoking certificates. In other words, CAs cannot disobey court orders, and it is doubtful that CAs can preempt 100% of such legal challenges with added language to subscriber agreements, or notices provided elsewhere. However, CAs can take steps to strengthen their legal posture when facing such situations. To that end, the draft of policy/guidance I'm currently working on proposes, "Subscriber Agreements: CA operators MUST include language in customer agreements requiring subscribers’ timely cooperation with revocation timelines and acknowledging the CA’s obligations to adhere to applicable policies and standards." There will be an opportunity in the near future to comment on this draft language, or something similar, in m-d-s-p. Thanks, Ben

Thank you, Ben. Looking forward to the proposed language.

(In reply to Tim Hollebeek from comment #45)

On the revocation language, Mozilla policy is currently being updated precisely because there is some ambiguity here.

Tim, forgive my frankness, but I can't let this comment pass without questioning whether you're being deliberately disingenuous or if you really do believe that "Mozilla policy" has "some ambiguity here". Could you please provide an unambiguous yes/no answer to the following question:

Q: Does DigiCert accept Aaron's interpretation in comment 42, echoed by Matt in comment 44, that delayed revocation is always a violation of the Mozilla Root Store Policy?

In my opinion, there is absolutely no "ambiguity" in the current version of the MRSP (v2.9) regarding delayed revocation. Every delayed revocation is a BR violation, the MRSP does not currently mention delayed revocation at all, Mozilla does not grant exceptions to the BR revocation requirements, and so Aaron and Matt must be correct.

The wiki page is NOT part of the MRSP; as Aaron explained, it only comes into play when a CA violates the MRSP.

We appreciate Ben’s attempts to provide improved clarity here (https://bugzilla.mozilla.org/show_bug.cgi?id=1896053#c58) and look forward to the additional discussion he has scheduled for the February face to face.

Ben's efforts are (in his words) "aimed at reducing delayed revocation". Since the MRSP always forbids delayed revocation, it's clear that he is not attempting to "provide improved clarity" on anything that the MRSP permits. Furthermore, in the MDSP post a few days ago that introduced his latest "attempts", Ben reiterated the existing expectation that "Revocation must occur promptly in compliance with the timelines set in section 4.9.1 of the TLS Baseline Requirements (TLS BRs). Mozilla does not grant exceptions to these timelines."

All the root programs, including Mozilla, have historically and consistently communicated that if revocation cannot be achieved within the mandated timeframe, an incident should be filed. We hope future policy updates make the situation even more clear.

Linking "the situation" described in that first sentence to "future policy updates" is a non sequitur in my view, because delayed revocation is always a Mozilla Root Store Policy violation.

All of the root programs, including Mozilla, have historically and consistently communicated that they do not provide exceptions to the TLS BR revocation timelines. The mere fact that "an incident should be filed" should be enough to convince you that delayed revocation is always a policy violation! This has always been clear in the past, is still clear today, and so I don’t see how it could be made any "more clear" in the future!

Flags: needinfo?(tim.hollebeek)

First, I don't think implying that people are being deliberately disingenuous is necessary or appropriate.

I believe that delayed revocation is a violation of the Baseline Requirements as the Baseline Requirements do not contemplate revocations beyond the 24 hour/5 day timeline. I also think it violates the Mozilla policy based on the fact that Mozilla states that CAs are expected to comply with the BRs. However, there is also a long history, of which you are well-aware, of publishing procedures to be followed in the event that revocation cannot be performed in a timely manner. The Chrome team even mentioned at the recent Face to Face that in their view, the procedures are working as designed.

We continue to appreciate Ben's efforts to lead a productive discussion on potential policy changes in this area, and would prefer that people spend their efforts on moving such discussions forward.

Flags: needinfo?(tim.hollebeek)

(In reply to Tim Hollebeek from comment #49)

I believe that delayed revocation is a violation of the Baseline Requirements as the Baseline Requirements do not contemplate revocations beyond the 24 hour/5 day timeline. I also think it violates the Mozilla policy based on the fact that Mozilla states that CAs are expected to comply with the BRs. However, there is also a long history, of which you are well-aware, of publishing procedures to be followed in the event that revocation cannot be performed in a timely manner. The Chrome team even mentioned at the recent Face to Face that in their view, the procedures are working as designed.

Would this be the Portsmouth F2F over 6 months ago? DigiCert originally portrayed it as follows:

It was during the revocation discussion after lunch. I said that the 5 day rule should be changed as it's too short, which is causing delayed revocation bugs. The browsers said this is normal and they'd rather have the bug posted than change the revocation windows. I followed up again, asking roughly the same thing, as I was surprised by the answer. Inigo was at the front since he was leading the discussion but both Ryan and Clint answered the question. Obviously I'm paraphrasing what I recall since the meeting minutes for that discussion are non-existent. There are references to the discussion later in the minutes though. The Server Cert WG chair really needs to enforce better minute taking.

However, later clarifications from the Chrome Root Program cleared up this misinterpretation for the community:

Beyond what I hope was a clear update communicating Chrome’s position at F2F 62 (slides), I went back and reviewed my minutes from F2F 60 to review the discussion described by Jeremy in Comment 20.

In reviewing them, my talking points focused on:

  • The absence of data that concludes an extended revocation window is an appropriate or necessary response for the ecosystem. (during this discussion, members of the community suggested a 15-day window was appropriate, and we still disagree with that proposal given observed behavior in recent bugs).
  • How linting might prevent the issues necessitating revocation. This holds true, today - and was further corroborated given the numerous profile conformance issues observed in the Spring. Almost all of the revocation delay bugs could have been prevented through freely available linting tools, and should have been prevented by more thorough examinations of profile alignment with the BRs.
  • The importance of automation, specifically ACME and ARI. Not only to help us move past revocation delays, but also be better prepared for the next Heartbleed-style event.
  • The opportunity for Private PKIs to address customer use cases that are not a good fit for the Web PKI.

I believe we’ve been consistent in our perspective.

If there has been a more recent F2F that has changed any of these points I would be interested in any public documentation supporting it. Frankly speaking even outside of such F2F meetings the incidents of delayed revocation have been discussed to such a degree that no CA should be ignorant of it being an absolute policy violation, and not something to be taken lightly.

The existence of procedures to deal with failure to follow policy should never be interpreted as condoning it. These aspects have been discussed in-depth, but if there is further clarification required please make it clear where the issues lie.

No, it was Seattle, and the discussion continues. It's why Ben has been setting the "next update" on delayed revocation bugs to next February, when Mozilla's proposals will be discussed again. As I noted elsewhere, we're very appreciative of Ben's patient and professional efforts to find a path forward here.

Onward to 2025!

Nothing new here.

Going forward, we will be posting updates as DigiCert, as suggested by a proposed update to CCADB policy. This will make it clear these are official DigiCert responses.

Assignee: tim.hollebeek → dcbugzillaresponse

We think Ben is doing great work on new proposals that make Mozilla's expectations more clear in this area, and we think they will make future revocation events go much smoother.

Flags: needinfo?(bwilson)

In bug 1937210 comment 10 you wrote,

If you have additional issues with bug 1910805, please take them there.

DigiCert could have addressed the unanswered questions in this thread without requiring this post, especially after I listed them for you. The quoted comment indicates you have no such intention, so here we go.

Comment 3 asks,

"but after discussions with the relevant root programs and the community about the impact of such an action"
Can you please point to the community discussions, and also detail the discussions with root programs?

Did DigiCert ever answer this question? Can you either point to the eventual answer or answer it?

Comment 12 asks,

Can DigiCert explain when they consider the clock to have started for the 24h/5d revocation period? I presume the action item is a placeholder as well given it doesn't cover the scope of the issues.

Did DigiCert ever answer this question? Can you either point to the eventual answer or answer it?

Comment 16 asks,

Tim, can you please provide a list of the certificates that you viewed as in scope for the TRO from the District Court?

As far as I can tell, you haven’t provided that list. Please do so.

Comment 17 asks,

  1. I have a meta-question: has DigiCert reviewed previous delayed revocation incidents for any interesting questions that also apply to this incident? What are the answers? :-)
  2. How many subscribers and certificates were affected by "exceptional circumstances" (assuming, arguendo, that any such circumstances can exist)?
  3. What were those circumstances?
  4. How many subscribers claimed how many certificates were "exceptional" for which DigiCert disagreed?
  5. What analysis is there of the risk to relying parties and the public by not revoking over 83,000 certificates on time?
  6. Beyond implementing ARI -- as an optional feature? -- what will be done to prevent this situation from recurring?

I don’t see answers to any of these questions. Please answer them or point to the answers you previously gave.

Comment 44 asks,

I'd like to second Aaron Gable's belief that the document cited does not explicitly allow (as in permit, approve of, or sanction) delayed revocation. It documents requirements for incident reports about delayed revocation incidents, but I do not believe that documenting reporting requirements for delayed revocation incidents "allows" delayed revocation any more than documenting reporting requirements for other kinds of incidents "allows" those incidents.
Are there any other bases for the prior assertions that Mozilla allows delayed revocation?

From what I can see, DigiCert hasn’t answered this question. Please point to your answer or answer it.

Flags: needinfo?(dcbugzillaresponse)

The commentary about our intent is completely without merit, and also unnecessary. Thank you for posting on the correct incident this time. We will review your questions and provide a response.

Comment 3: Sorry, “and the community” was a mistake. We spoke to the root programs about the revocation. We also posted publicly that the revocation was likely delayed. The root programs who were spoken to have already addressed the nature of those conversations. See, for example, https://bugzilla.mozilla.org/show_bug.cgi?id=1910805#c7. The original bug also included information about the delayed revocation before the revocation was delayed.

Comment 12: We start the revocation clock when we receive notice of an incident sufficient to indicate a problem or, if detected internally, when we are aware that revocation is required. This is the only logical conclusion as overly vague notices may require more than 24 hours before we can even confirm there is an issue. This is consistent with Baseline Requirements Section 4.9.1, which requires us to revoke when we obtain evidence or are aware of an issue.

Comment 16: We gave a full list of all certificates affected by this bug. The TRO itself covered only a single certificate. The TRO and the entity which filed it are a matter of public record. As noted in https://bugzilla.mozilla.org/show_bug.cgi?id=1910805#c24, we will not discuss any further details surrounding the TRO.

Comment 17: As noted in comment 22, a lot of these questions are the subject of much discussion in the wider community, and DigiCert is an active participant in those discussions. Like Google, DigiCert is committed to improving agility and resilience in the WebPKI. However, some specific insights:

I have a meta-question: has DigiCert reviewed previous delayed revocation incidents for any interesting questions that also apply to this incident? What are the answers? 😊

Yes – we routinely review delayed revocation bugs. There aren’t interesting answers. Adoption of ACME slowly improves and crypto-agility is spreading. Hostility to delayed revocation is definitely the biggest change, when you consider incidents such as https://bugzilla.mozilla.org/show_bug.cgi?id=1715672 (delayed revocation bugs are relatively new) or https://bugzilla.mozilla.org/show_bug.cgi?id=1800756, which didn’t really gather any attention.

How many subscribers and certificates were affected by "exceptional circumstances" (assuming, arguendo, that any such circumstances can exist)?

Unknown. By the time we parsed into exceptional vs. non-exceptional, all certificates were revoked.

What were those circumstances?

Browsers indicated that there were exceptional circumstances.

How many subscribers claimed how many certificates were "exceptional" for which DigiCert disagreed?

Unknown – we revoked all certificates before the analysis was completed.

What analysis is there of the risk to relying parties and the public by not revoking over 83,000 certificates on time?

This is a trap question. Per Mozilla’s previous policy: Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable.

Beyond implementing ARI -- as an optional feature? -- what will be done to prevent this situation from recurring?

We’ve informed customers that both Google and Mozilla have clarified that delayed revocation is not acceptable and that there is no such thing as exceptional circumstances.

Flags: needinfo?(dcbugzillaresponse)

(In reply to DigiCert from comment #58)

Comment 12: We start the revocation clock when we receive notice of an incident sufficient to indicate a problem or, if detected internally, when we are aware that revocation is required. This is the only logical conclusion as overly vague notices may require more than 24 hours before we can even confirm there is an issue. This is consistent with Baseline Requirements Section 4.9.1, which requires us to revoke when we obtain evidence or are aware of an issue.

I'd like to clarify that my question was more about what specific dates and times that would mean in this incident, as there seem to be multiple clocks running in DigiCert's analysis of the situation. For other incidents, the clock starts on notification of an incident occurring in the CA's environment (CPR received and read; whether issue confirmation counts is iffy per CA), not when full collation of the certificates occurs. If it were the latter, then 'finding' the last certificate to delay the clock starting would be rather beneficial to a CA and hard for an outsider to discover.

Consider that there have been delayed revocation incidents historically where a CA has found a batch of certificates they missed but they did not start a fresh clock for them. Given that is how other CAs operate, it should be considered normal practice.
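To make the clock-start point concrete, here is a minimal sketch under stated assumptions: the BR Section 4.9.1.1 revocation reasons are collapsed into a hypothetical 24-hour window and 5-day window, the incident clock starts once at notification, and every affected certificate, including a batch discovered later, is measured against that single deadline. The names `WINDOWS`, `revocation_deadline`, and `is_late`, and the timestamps, are illustrative only and not taken from any CA's system.

```python
from datetime import datetime, timedelta, timezone

# Assumption: BR 4.9.1.1 reasons simplified into two hypothetical categories.
WINDOWS = {
    "24h": timedelta(hours=24),
    "5d": timedelta(days=5),
}


def revocation_deadline(clock_start: datetime, window: str) -> datetime:
    """Latest permitted revocation time for an incident whose clock started at clock_start."""
    return clock_start + WINDOWS[window]


def is_late(revoked_at: datetime, clock_start: datetime, window: str) -> bool:
    """True if a certificate was revoked after the incident's single deadline.

    The deadline depends only on the incident clock start, not on when this
    particular certificate was identified, so a batch found late does not
    get a fresh clock.
    """
    return revoked_at > revocation_deadline(clock_start, window)


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    found_late = start + timedelta(hours=20)    # batch identified 20h after notification
    revoked = found_late + timedelta(hours=10)  # revoked 10h after being identified
    print(revocation_deadline(start, "24h"))    # 2024-01-02 00:00:00+00:00
    print(is_late(revoked, start, "24h"))       # True
```

Even though the late batch was revoked within 24 hours of being identified, it still misses the incident's deadline because its clock is the original one.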

I have a meta-question: has DigiCert reviewed previous delayed revocation incidents for any interesting questions that also apply to this incident? What are the answers? 😊

Yes – we routinely review delayed revocation bugs. There aren’t interesting answers. Adoption of ACME slowly improves and crypto-agility is spreading. Hostility to delayed revocation is definitely the biggest change, when you consider incidents such as https://bugzilla.mozilla.org/show_bug.cgi?id=1715672 (delayed revocation bugs are relatively new) or https://bugzilla.mozilla.org/show_bug.cgi?id=1800756, which didn’t really gather any attention.

Incidents from 2021 and 2022 are a poor metric to judge appropriate compliance in 2024 and onwards, especially given clarification in guidance has been sought and provided multiple times since then. Would it be fair to assert that these are the most recent incidents that DigiCert finds noteworthy for compliance guidance according to their internal assessments? To be clear, I'm trying to figure out why these in particular were highlighted and considered worth mentioning.

How many subscribers and certificates were affected by "exceptional circumstances" (assuming, arguendo, that any such circumstances can exist)?

Unknown. By the time we parsed into exceptional vs. non-exceptional, all certificates were revoked.
...

How many subscribers claimed how many certificates were "exceptional" for which DigiCert disagreed?

Unknown – we revoked all certificates before the analysis was completed.

Given that Comment 10 makes it clear that data was gathered, how is this possible? I appreciate that there may be an imperfect collection of data, but surely there must be a non-zero number of certificates/subscribers marked with clear 'exceptional circumstances' reasons to provide a lower bound?

What analysis is there of the risk to relying parties and the public by not revoking over 83,000 certificates on time?

This is a trap question. Per Mozilla’s previous policy: Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable.

For clarity, that non-compliant certificate quote is a partial excerpt; here is the full passage from the 'Responding to an Incident' Mozilla wiki as it stood when this incident occurred:

The decision and rationale for delaying revocation will be disclosed in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include detailed and substantiated explanations for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.

Particular attention, as always, goes to the last sentence. We've had incidents historically where a CA has asserted that certificates were used in air traffic control, and other dangerous scenarios. It is unfortunately not a trap question, even if ideally it were one and none of these scenarios existed.

Please understand that we do not know the environments in which DigiCert's customers use their public certificates, and while DigiCert doesn't have full oversight, they should have a rough idea given the data gathered in this exercise. Either there was risk analysis performed, or there was not.

There isn't going to be any fallout for a yes or no response to that question. We're trying to figure out how to improve things going forward and any partial analysis would be beneficial. We've been told that data was collected and a report would appear.

(In reply to Tim Hollebeek from comment #10)

Now that all of the certificates have been revoked, we're taking a careful look at the data we collected during the replacement effort. This is a good opportunity for us to provide some high-quality data about the current agility, or lack thereof, of the webpki as it exists today, and what the challenges actually are so that we can discuss pragmatic steps that improve the situation for everyone.

Did this ever occur, and if so could you point at any potential public reports that would help the community? It would be very beneficial to answering the questions that are considered 'Unknown'.

Flags: needinfo?(dcbugzillaresponse)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01 → [ca-compliance] [leaf-revocation-delay] Next update 2025-03-03

In Comment 58 it’s stated:

We’ve informed customers that both Google and Mozilla have clarified that delayed revocation is not acceptable and that there is no such thing as exceptional circumstances.

For clarity, as we have not seen or heard how customers were informed, we hope that the message communicated to customers was consistent with what the Chrome Root Program has stated publicly multiple times (e.g., [1][2][3][4][5]). Primarily, that is, delayed revocations becoming routine rather than exceptional is unacceptable.

As an administrative note, we’d encourage DigiCert to err on the side of clarity and specificity when participating in the public incident reporting process - as over the past few months, we have noted examples of general statements made by DigiCert that have required further clarification.

For example, Comment 49 appears to misappropriate comments offered by Chrome at F2F 63 in response to Mozilla’s (then planned) revocation policy, where we stated disagreement with the proposal that required revocation “fire drills". To be clear, we consider the incident reporting process to be “working as intended” when (1) CA Owners file detailed incident reports that demonstrate an understanding of the issue’s root cause(s) and describe action items that convincingly convey why the precipitating issue will not reoccur, and (2) there is observed commitment to completing those actions.

When a member of the community asked for clarification on that commentary, no such discussion is presented in Comment 51, rather it is just indicated that “the discussion continues.” From our view, there is nothing left to discuss - our expectations have been made clear, and we’ve been consistent in that messaging (referenced above).

Thanks for the comments, Chris! We appreciate the feedback.

We’ve informed customers that both Google and Mozilla have clarified that delayed revocation is not acceptable and that there is no such thing as exceptional circumstances.

For clarity, as we have not seen or heard how customers were informed, we hope that the message communicated to customers was consistent with what the Chrome Root Program has stated publicly multiple times (e.g., [1][2][3][4][5]). Primarily, that is, delayed revocations becoming routine rather than exceptional is unacceptable.

We’ve communicated to customers a few items related to delayed revocation. First, we clarified that the Mozilla policy was being updated and pointed customers to the discussions, encouraging them to participate in the public discussion if they have strong opinions. We support Mozilla’s transparency policies and always encourage more people to take part in the discussion and policy making process. Given that we didn’t see a significant increase in people chiming in on MDSP, I’m not sure we were that effective in encouraging better participation. Second, we advised customers that exceptional circumstances must be truly exceptional. Just alleging exceptional circumstances is insufficient, even if the customer is related to critical infrastructure. Given how readily available automation is, I doubt we’ll see exceptional circumstances. Third, we’ve messaged customers that automation should be enabled by default wherever possible. Revocations need to be a non-event for certificate subscribers.

As an administrative note, we’d encourage DigiCert to err on the side of clarity and specificity when participating in the public incident reporting process - as over the past few months, we have noted examples of general statements made by DigiCert that have required further clarification.

Thank you for the feedback. We will look for ways to provide additional clarity and transparency. We did not realize our statements were getting more general. Sorry about that.

For example, Comment 49 appears to misappropriate comments offered by Chrome at F2F 63 in response to Mozilla’s (then planned) revocation policy, where we stated disagreement with the proposal that required revocation “fire drills". To be clear, we consider the incident reporting process to be “working as intended” when (1) CA Owners file detailed incident reports that demonstrate an understanding of the issue’s root cause(s) and describe action items that convincingly convey why the precipitating issue will not reoccur, and (2) there is observed commitment to completing those actions.

Thanks Chris. We did find the F2F 63 presentation enlightening. We had not taken from it that the process was working as intended, so we misunderstood that part of the presentation. We appreciate you clarifying that.

Flags: needinfo?(dcbugzillaresponse)

[In response to Comment 61]

Thank you for this follow-up!

We appreciate the added context provided, but we now have a bit of confusion on our end given the most recent response appears to contradict the statement made in Comment 58.

Comment 58 stated: “We’ve informed customers that both Google and Mozilla have clarified that delayed revocation is not acceptable and that there is no such thing as exceptional circumstances.”

Comment 61 stated: “Second, we advised customers that exceptional circumstances must be truly exceptional. Just alleging exceptional circumstances is insufficient, even if the customer is related to critical infrastructure.”

We do not consider the above statements to be compatible with one another (interpreted as “Chrome says delayed revocation is not acceptable and there are no exceptional circumstances.” versus “For circumstances to be considered exceptional, they must truly be so.”)

Can you help us better understand how Chrome's views were presented?

Flags: needinfo?(dcbugzillaresponse)

In response to comment 62

Thank you for this follow-up!

Comment 58 stated: “We’ve informed customers that both Google and Mozilla have clarified that delayed revocation is not acceptable and that there is no such thing as exceptional circumstances.”

Comment 61 stated: “Second, we advised customers that exceptional circumstances must be truly exceptional. Just alleging exceptional circumstances is insufficient, even if the customer is related to critical infrastructure.”

We do not consider the above statements to be compatible with one another (interpreted as “Chrome says delayed revocation is not acceptable and there are no exceptional circumstances.” versus “For circumstances to be considered exceptional, they must truly be so.”)

Can you help us better understand how Chrome's views were presented?

Sorry for the confusion! After the Google presentation during the CABF meeting and updated language in the Mozilla policy, DigiCert decided that critical infrastructure, by itself, is insufficient to constitute exceptional circumstances. We communicated this decision to customers. In the past, both we and other CAs believed that critical infrastructure and an allegation that revocation would disrupt services related to the critical infrastructure was a sufficient reason to delay revocation. The updated language, discussion in other recent bugs, presentation at the CABF F2F meeting, and the updated Mozilla policy clarified that this is not the case. We believe this messaging aligns with the Mozilla and Google policies, both before this incident and after.

Our communication to customers stated that delayed revocation would not be acceptable, even if the customer alleged critical infrastructure, without their providing additional proof that automation was not possible as well as details on the potential damage caused by the revocation.

With the growing adoption of ACME and other automation for domain validation and lifecycle management, as well as the addition of ARI, delayed revocation will become less of a problem. We hope that the community will be receptive to evaluating “exceptional circumstances” should they arise in future, to further improve automation.

Flags: needinfo?(dcbugzillaresponse)

Will there be a response to Comment 59 posted February 1st?

In response to comment 59

I'd like to clarify that my question was more about what specific dates and times that would mean in this incident, as there seem to be multiple clocks running in DigiCert's analysis of the situation. For other incidents, the clock starts on notification of an incident occurring in the CA's environment (CPR received and read; issue confirmation varies per CA), not when full collation of the certificates occurs. If it were the latter, then 'finding' the last certificate later to delay the start of the clock would be rather beneficial to a CA and hard for an outsider to discover.

Consider that there have been delayed revocation incidents historically where a CA has found a batch of certificates they missed but they did not start a fresh clock for them. Given that is how other CAs operate, it should be considered normal practice.

To clarify, we always start a clock when we identify a certificate that requires revocation, regardless of whether a CPR was submitted internally or externally. If submitted externally, we start the clock when we receive sufficient information to confirm an issue. This is typically after receiving the first CPR, but some can be so vague that we can’t tell what the issue is. While investigating, if we find a wider problem than first indicated in the original CPR, we start a new clock for the expanded certificates. This does not modify the clock for the certificates associated with the previous CPR. We believe this is a logical approach, as an investigation into problem A could turn up problem B that was not part of the original issue. Our practice is to kick off a thorough investigation of all impacted certificates and pull the data as soon as an incident is identified.
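A minimal sketch of the clock model described in this reply, assuming a fixed 24-hour window that starts when the issue affecting a batch of certificates is confirmed. The `Batch` type, the `deadlines` and `hours_late` helpers, and the rule that a serial appearing in two batches keeps its earlier deadline are illustrative assumptions, not DigiCert's actual tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed 24-hour window, counted from when the issue affecting a batch is confirmed.
REVOCATION_WINDOW = timedelta(hours=24)


@dataclass(frozen=True)
class Batch:
    """A hypothetical group of certificates confirmed as affected at one time."""
    label: str
    confirmed_at: datetime
    serials: tuple[str, ...]


def deadlines(batches: list[Batch]) -> dict[str, datetime]:
    """Give each serial a deadline from its own batch's confirmation time.

    Later batches start new clocks only for serials not already seen;
    a serial found again keeps its earlier deadline (an assumption).
    """
    result: dict[str, datetime] = {}
    for batch in sorted(batches, key=lambda b: b.confirmed_at):
        for serial in batch.serials:
            result.setdefault(serial, batch.confirmed_at + REVOCATION_WINDOW)
    return result


def hours_late(deadline: datetime, revoked_at: datetime) -> float:
    """Hours past the deadline; 0.0 if revoked on time."""
    return max(0.0, (revoked_at - deadline).total_seconds() / 3600)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical confirmation time
plan = deadlines([
    Batch("initial CPR", t0, ("01", "02")),
    Batch("expanded scope", t0 + timedelta(hours=10), ("02", "03")),
])
print(plan["02"] - t0, plan["03"] - t0)  # 1 day, 0:00:00 1 day, 10:00:00
print(hours_late(plan["01"], t0 + timedelta(hours=120)))  # 96.0
```

Under this model, a certificate revoked 120 hours after its batch was confirmed is 96 hours late, the same delay figure given in the revised incident report. It also shows the concern raised in comment 59: recording a later confirmation time for a batch moves that batch's deadline later.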

Incidents from 2021 and 2022 are a poor metric to judge appropriate compliance in 2024 and onwards, especially given clarification in guidance has been sought and provided multiple times since then. Would it be fair to assert that these are the most recent incidents that DigiCert finds noteworthy for compliance guidance according to their internal assessments? I'm trying to figure out why these in particular were highlighted and considered worth mentioning to be clear.

This response was answering the meta question asked. We’ve reviewed a lot of delayed revocation bugs, and most are not very informative about the measures CAs can take to prevent delayed revocation (other than simply not delaying revocation). The examples cited show that, until recently, delayed revocation bugs did not receive the same scrutiny or hostility as they do now. The norm for delayed revocation bugs is much different now than it was in 2021, although the reasons for delay haven’t changed much.

For clarity, the non-compliant certificate quote is a partial excerpt. Here is the full passage from the 'Responding to an Incident' Mozilla wiki page as it read when this incident occurred:

The decision and rationale for delaying revocation will be disclosed in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include detailed and substantiated explanations for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.

Particular attention, as always, goes to the last sentence. We've had incidents historically where a CA has asserted that certificates were used in air traffic control and other dangerous scenarios. It is unfortunately not a trap question, even if ideally it were one and none of these scenarios existed.

Please understand that we do not know the environments in which DigiCert's customers use their public certificates, and while DigiCert doesn't have full oversight, it should have a rough idea given the data gathered in this exercise. Either there was risk analysis performed, or there was not.

There isn't going to be any fallout for a yes or no response to that question. We're trying to figure out how to improve things going forward and any partial analysis would be beneficial. We've been told that data was collected and a report would appear.

(In reply to Tim Hollebeek from comment #10)

Now that all of the certificates have been revoked, we're taking a careful look at the data we collected during the replacement effort. This is a good opportunity for us to provide some high-quality data about the current agility, or lack thereof, of the webpki as it exists today, and what the challenges actually are so that we can discuss pragmatic steps that improve the situation for everyone.

Did this ever occur, and if so could you point at any potential public reports that would help the community? It would be very beneficial to answering the questions that are considered 'Unknown'.

Tim has reviewed this information, but is out of the office this week. He will post a response with some findings regarding these questions when he returns next week.

If you look at the timing on comment 10, we hadn't even gotten the incident report up at that point, and a lot has happened in the last six months.

We get asked about delayed revocation reasons a lot, and we have looked at the responses quite a bit, but the reason we haven't posted much about them is because the results are not particularly enlightening. The responses are self-reported under very tight time constraints by organizations that have a strong interest in one particular outcome. In this particular case, that timeline was 24 hours, so the timeline was even shorter than usual. We have significant concerns about whether the collected information is actually reliable.

While we appreciate all the work that went into the exception process, we've basically reached the same conclusion as the Mozilla community, which is that this process did not ever work particularly well. The community has made it clear that there are no legitimate reasons for delaying revocation, and we accept that.

(In reply to DigiCert from comment #58)

What were those circumstances?

Browsers indicated that there were exceptional circumstances.

Please explain this statement. Are you saying that browser vendors told you that for some or all of the affected certificates there were circumstances that prevented on-time revocation?

Which browser vendors? What did they say about exceptional circumstances? To which certificates did these circumstances apply? What due diligence did DigiCert do about these claims?

And why didn’t you mention this before comment 58?

Flags: needinfo?(dcbugzillaresponse)

(In reply to DigiCert from comment #57)

The commentary about our intent is completely without merit, and also unnecessary.

On January 14, 2025 bug 1937210 comment 9 pointed to ten questions from this bug that had not been answered. On January 20, in bug 1937210 comment 10, DigiCert replied to that comment, adding, “If you have additional issues with bug 1910805, please take them there.” On January 25, my comment 56 quoted that comment and said,

DigiCert could have addressed the unanswered questions in this thread without requiring this post, especially after I listed them for you. The quoted comment indicates you have no such intention, so here we go.

DigiCert did indeed indicate that it had no intention to address these questions until they were specifically pointed out in this bug, despite DigiCert being made aware of the unanswered questions on January 14. That means my comment about intent was merited.

CCADB’s “Incident Reports” gives the direction,

Once the report is posted, CA Owners should respond promptly to questions that are asked, and in no circumstances should a question linger without a response for more than one week, even if the response is only to acknowledge the question and provide a later date when an answer will be delivered.

Eleven days passed between DigiCert becoming aware that these questions remained unanswered and comment 57. This gap, combined with the dismissive remark from bug 1937210 comment 10, indicates that DigiCert indeed had no intention of addressing these unanswered questions, despite knowing they remained unanswered. This choice is contrary to CCADB guidance for incident management and the spirit of Bugzilla. I had hoped that pointing out this error in judgement would help DigiCert recognize the error and do better next time. Therefore, my comment about intent was necessary.

I don’t want to be pedantic, but I do think these points matter. The comment quoted at the top of this post, without full context, might give the false impression that the original statement was inappropriate or of no import. As my statement in comment 56 was both relevant and material, this kind of dismissive response doesn’t belong in this context.

On comment 68, we are obviously not going to comment further on our communications with browsers without their approval. We would be happy to identify the root program that made that comment if they wish to go public.

Several of the implications made in comment 68 are unsubstantiated and false, and we ask that our competitor stop making assertions about what we did or did not know, and what our intentions were. Such unfounded assertions are extremely unhelpful.

Flags: needinfo?(dcbugzillaresponse)

I don't know why we're trying to act like which root program was spoken to is some major secret. Comment 7 already made it very clear that DigiCert have had conversations with the Chrome Root Program. The Chrome Root Program are also very clear that the ~70 certificates impacted by the TRO, and "several major DigiCert customers" require explanations if they went into delayed revocation. This is also what is required by the Mozilla Root Store Program, and we've discussed this part to death. Even in Comment 41 DigiCert quotes the requirement that:

When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.

What has been lacking are any explanations beyond the TRO. We'll put those ~70 certs aside and instead focus on the 83k other certs that should have been revoked in 24h and instead got pushed to 5 days and still lack the most basic required information. The Lessons Learned in Comment 11 is pertinent in my opinion:

The industry needs a better notification process. Some customers thought the email notification was phishing. Others claimed not to receive it. We added banner messaging to the system, but not all customers logged into their accounts during the 5-day period, and those that did not missed the banner message.

This tells me that a large portion of customers did not respond at all, and that DigiCert took this lack of response as implied exceptional circumstances during this incident. Could DigiCert confirm if this presumption reflects reality? How will DigiCert handle a non-response going forward?

We're trying to find some way forward so that if this happens again, bar the TRO, revocations actually occur as required under the baseline requirements. This kind of discussion requires admitting that a fault has occurred somewhere in your practices for any remedy to be created.

Let us not forget that in the legal weeds of ~70 certs being legally barred from revocation, ~83k others were intentionally left alone. We need action items to show how this will not happen again. To date we only have one action item, implementing ARI, but I see no comment even mentioning that work being complete. We're approaching 3 months past its due date. The response to this incident has been a mess to be frank and an updated report trying to clarify DigiCert's ongoing actions would be beneficial.

It should not take 7 months of persistent questioning to get the barest of information out of a CA.

(In reply to DigiCert from comment #58)

The TRO itself covered only a single certificate.

You repeat this claim in bug 1950144 comment 12:

The TRO affected only 1 certificate

Please explain this statement.

The original filing on 7/30/24 refers to,

the security certificates (collectively, the “Security Certificates”) for 72 of its websites

The 7/30/24 order from the District Court in Salt Lake City says,

IT IS HEREBY ORDERED that DigiCert is prohibited from revoking the security certificates for the Alegeus Websites for a period of seven (7) days, or until the Court is able to schedule a hearing on the Motion, whichever is earlier.

Certificates. Plural.

The timeline says:

July 30 2024 11:33 - DigiCert receives notice customer has filed for a Temporary Restraining Order (TRO) against revocations.

Revocations. Plural.

Later, comment 43 says,

on August 2, after Alegeus’s certificates had been revoked, DigiCert and Alegeus filed a joint motion to vacate the TRO, which the court granted on August 3.

Certificates. Plural.

I hope it’s obvious why the statement in comment 58 was surprising.

Question 1: Where did DigiCert get the information that a single certificate was in scope for this order? Please point to the specific wording or source of this information or explain the detailed rationale that led to this conclusion.

Question 2: What is that certificate? Please provide a crt.sh link.

Question 3: What changed between the posting of comment 43 and the posting of comment 58 that led to this new conclusion?

Question 4: Why was comment 58 the first time you mentioned this change?

Flags: needinfo?(dcbugzillaresponse)

(In reply to comment 69)

Several of the implications made in comment 68 are unsubstantiated and false, and we ask that our competitor stop making assertions about what we did or did not know, and what our intentions were. Such unfounded assertions are extremely unhelpful.

Question 5: We agree that false assertions have no place on Bugzilla. Please point out the specific assertions you’re referring to.

In response to Wayne in comment 70:

We heard from a lot of customers before the 24-hour deadline. Many of those we heard from claimed not to have received the initial communication and learned about the issue through Reddit or another channel. We received about 500 comments requesting delayed revocation once people heard there might be a delay for exceptional circumstances. At that point, we made a decision to delay them all and stopped collecting exceptional circumstance emails. We did not review the comments collected before revoking the certificates, as the justifications provided were not the basis for the delay. These comments are unlikely to have included truly exceptional circumstances. Once we established that one or more customers had exceptional circumstances, we delayed all certificate revocations.

Below is a revised incident report that updates the Summary, Root Cause Analysis, Lessons Learned, and Action Items sections of the prior Incident Report and that provides additional information as requested by you and other posters. Hopefully this gives you the information you requested. We acknowledge that a violation of the BRs occurred, that DigiCert’s fault was it delayed revocation, and that the remedy is to prevent all delayed revocation.

Revised Incident Report

Summary

DigiCert posted a bug due to an issue where an underscore was not prepended to a random value when using CNAME for domain control validation https://bugzilla.mozilla.org/show_bug.cgi?id=1910322.

DigiCert was working to revoke all of the certificates in 24 hours. DigiCert decided to delay revocation until 120 hours, making the delay 96 hours for all certificates impacted by the bug.

Root Cause Analysis

The delay was due to the inability of many non-automated subscribers to replace affected certificates within 24 hours, and DigiCert’s decision near the end of the 24 hours that some of the customers had exceptional circumstances. Although we received delay requests during the 24-hour period, we denied all such requests regardless of the reason provided until, after consulting with the root programs, the decision was made that some customers had exceptional circumstances. Once we decided to delay revocation for some certificates due to exceptional circumstances, there was a flood of requests for delays. Although we tried to review all incoming emails, parsing through the volume and confirming exceptional circumstances was not practical in the time frame. Because we could not determine true exceptional circumstances, DigiCert delayed the entire set of certificates. The lack of automation by customers was a factor in a number of the “exceptional circumstances” reported.

Lessons Learned

  • Exceptional circumstances is not defined, nor is it a consistent standard. Therefore, we do not believe any customer can meet this standard going forward. Any delay is a violation of the BRs.
  • If a CA indicates in any way that delayed revocations may be possible, for any reason, the CA will immediately be inundated with requests.
  • DigiCert needed a better notification process. Some customers thought the email notification of revocation was phishing. Others claimed that they did not receive it. DigiCert provided notices through other mechanisms as well, including in the product and in a blog post from our CEO. However, there was confusion about why the revocation had to occur within 24 hours and whether this was a legitimate communication.
  • Some customers that use automation are still unable to replace certificates within 24 hours due to failures within their system and the lack of ARI support.

What went well

  • DigiCert kept the community informed during the revocation.

What didn't go well

  • DigiCert did not revoke any of the affected certificates within the required 24-hour timeframe.
  • Some notifications were not received, so some customers were unaware of the revocation.
  • Most customers still don’t use automation, or require a manual step or policy approval before kicking off the automated install.
  • Many customers requested delayed revocation where no exceptional circumstances existed, overwhelming our ability to review requests for “legitimacy” in a short timeframe.

Action Items

  • Clarify messaging to customers that DigiCert will not delay revocation beyond the timeframes required by the Baseline Requirements and that there is no such thing as exceptional circumstances.
  • Proselytize, promote and incentivize automation and ARI in order to prepare all customers to be able to replace their certificates within a maximum of 24 hours for any reason.
  • Other reasonable items suggested by the community that could help prevent delayed revocations.

In response to Tim Callan in comment 71

Tim, the correct number of certificates affected by the TRO is 72. Re-reading the old posts, we see there was confusion in our posts. We apologize for the error. The single certificate stated in comment 58 and 1950144#c12 was supposed to be the number of customers covered by the TRO, not the number of certificates. There was one customer who filed the TRO and that customer had 72 impacted certificates. Again, apologies for the confusion on the number of certificates compared to the number of customers under the TRO. We've attached the list of these certificates.

Flags: needinfo?(dcbugzillaresponse)
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-03-03 → [ca-compliance] [leaf-revocation-delay]

Response to comments 71 and 72:

We believe all questions in comment 71 were answered by our previous post but wanted to reiterate to ensure proper tracking:

Question 1: Where did DigiCert get the information that a single certificate was in scope for this order? Please point to the specific wording or source of this information or explain the detailed rationale that led to this conclusion.

The one certificate comment was a mistake. The TRO was filed by a single company but covered 72 certificates.

Question 2: What is that certificate? Please provide a crt.sh link.

Provided

Question 3: What changed between the posting of comment 43 and the posting of comment 58 that led to this new conclusion?

A miscommunication on the scope of the TRO. One customer, many certs.

Question 4: Why was comment 58 the first time you mentioned this change?

As stated above, there was a miscommunication on the bug about the number of certificates compared to the number of entities covered by the TRO.

Question 5: We agree that false assertions have no place on Bugzilla. Please point out the specific assertions you’re referring to.

We think this is irrelevant to the current bug, which is focused on the delayed revocation.

Closure Summary

Incident description

DigiCert delayed revocation of the certificates affected by bug 1910322.

Incident Root Cause(s)

DigiCert delayed revocation beyond the 24-hour requirement of the BRs.

Remediation description

DigiCert actively communicated with customers to promote adoption of automated solutions, including ACME; posted that exceptional circumstances are not defined well enough for any customer ever to meet the bar, and acknowledged that exceptional circumstances are not part of the Baseline Requirements; and promoted the use of ARI to its customer base. DigiCert revoked all impacted certificates.

Commitment summary

We have updated our internal incident management processes to remove any reference to critical infrastructure, and “exceptional circumstances.” We implemented ARI for customers to use in automatically replacing certificates impacted by an incident. We are working with our customers to encourage and assist them in adopting automated solutions for certificate management.

As all action items have been completed and all questions have been answered, we request closure of this bug.

I will take a look at closing this during the week of Mar. 24-28, 2025.

Flags: needinfo?(bwilson)

(In response to comment 74)

Question 5: We agree that false assertions have no place on Bugzilla. Please point out the specific assertions you’re referring to.

We think this is irrelevant to the current bug, which is focused on the delayed revocation.

This is surprising. Were you of the opinion that it was relevant to this bug when you posted comment 69? Do you now agree that my assertions in comment 68 were in fact true and accurate? When and why did your opinion change?

Flags: needinfo?(dcbugzillaresponse)

In response to comment 77:
Again, this is not relevant to the topic of this bug, which is delayed revocation. All questions relevant to the topic of this bug have been answered and all action items have been completed.

Flags: needinfo?(dcbugzillaresponse)

According to this bug, DigiCert has implemented ARI. Via DigiCert's documentation I found the ARI endpoint https://one.digicert.com/mpki/api/v1/acme/v2/renewal-info. However, when I try to query this endpoint for various DigiCert certificates (I tried digicert.com, trusted-root-g4.chain-demos.digicert.com, and facebook.com) I get a 400 Bad Request error:

Could DigiCert please clarify the status of their ARI implementation?

Flags: needinfo?(dcbugzillaresponse)

DigiCert's ARI is deployed and is working as designed. Since the certificate in question was not renewed via an ACME-enabled process, you received a 400 error. Our chain demo site currently uses DigiCert automation instead of ACME because of the need for automated deployment of expired and revoked certificates. We are in the process of moving the chain demos for valid certificates to ACME instead of DigiCert automation. However, not every ecosystem supports ACME (Imperva, for example), and in those we have run into obstacles.

You (Andrew) should be able to see it working here: https://one.digicert.com/mpki/api/v1/acme/v2/renewal-info/ak5Qv5honVt7IHXUWQF5SGaSMgY.BElCa2yC4fcsQwHPGGMBtw==. This is probably obvious, but if the certificate is not part of our ecosystem (i.e., the DigiCert ACME service is not managing that certificate), then DigiCert's ACME ARI won't renew it. We return a 400 error where renewal is not authorized. We understand that others may return a different error code; Let's Encrypt, for example, returns a 404. If you have recommendations on better alignment of error codes, please let us know.
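The final path segment of the renewal-info URL above is an ARI certificate identifier. As we understand the ARI specification (RFC 9773), it is the base64url encoding of the certificate's Authority Key Identifier keyIdentifier, a period, and the base64url encoding of the serial number's DER content octets (no tag or length), with trailing `=` padding stripped from both parts; the example URL above keeps `==` on the serial part. The sketch below builds and parses identifiers under those assumptions. The helper names are hypothetical and this is not DigiCert's code.

```python
import base64


def _b64url(data: bytes) -> str:
    # base64url with trailing "=" padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Accept padded or unpadded input by normalising the padding first
    text = text.rstrip("=")
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def serial_der_bytes(serial: int) -> bytes:
    """DER INTEGER content octets (no tag/length) for a positive serial."""
    if serial <= 0:
        raise ValueError("certificate serial numbers must be positive")
    # One extra byte when needed keeps the high bit clear, as DER requires.
    return serial.to_bytes(serial.bit_length() // 8 + 1, "big")


def ari_cert_id(aki_key_id: bytes, serial: int) -> str:
    """AKI keyIdentifier and serial, base64url-encoded and joined by '.'."""
    return f"{_b64url(aki_key_id)}.{_b64url(serial_der_bytes(serial))}"


def parse_ari_cert_id(cert_id: str) -> tuple[bytes, int]:
    """Split an identifier back into AKI bytes and an integer serial."""
    aki_part, serial_part = cert_id.split(".")
    return _b64url_decode(aki_part), int.from_bytes(_b64url_decode(serial_part), "big")


def renewal_info_url(base: str, cert_id: str) -> str:
    """Append the identifier to a renewal-info base URL."""
    return base.rstrip("/") + "/" + cert_id


example = "ak5Qv5honVt7IHXUWQF5SGaSMgY.BElCa2yC4fcsQwHPGGMBtw=="  # from the URL above
aki, serial = parse_ari_cert_id(example)
print(len(aki))  # 20
print(ari_cert_id(aki, serial))  # ak5Qv5honVt7IHXUWQF5SGaSMgY.BElCa2yC4fcsQwHPGGMBtw
```

The decoder tolerates padding so that both forms of the identifier parse, while the encoder emits the unpadded form.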

Flags: needinfo?(dcbugzillaresponse)

(In response to comment 74)

The one certificate comment was a mistake. The TRO was filed by a single company but covered 72 certificates.

In comment 58, on January 31, DigiCert wrote,

The TRO itself covered only a single certificate.

On February 28, bug 1950144 comment 12 states,

The TRO affected only 1 certificate

I get that mistakes happen. What is unsettling about this mistake, though, is that the exact same mistake occurred twice on two entirely different incident threads 28 days apart in time. It’s hard to blame that on one of those unthinking errors that busy people can make when they are trying to get too much done in a day. This is a persistent misunderstanding of a basic fact that made its way into official DigiCert posts over a significant amount of time.

Which leads me to ask about the quality processes DigiCert has in place for its official online reporting.

Question 6: Does DigiCert include a fact checking or peer review step as a procedural part of its Bugzilla posting process?

Question 7: How is it that this same misunderstanding of a basic fact of the subject matter on which you were posting persisted across multiple incidents and a full month of time?

These are material questions as the public incident reporting process depends on CAs providing accurate information. Failing to guard against meaningful factual errors in posting – even those made in good faith – undermines the community’s power to enforce quality, evaluate CAs, and learn lessons. Though only DigiCert can say for sure, we appear to have such a failure here. It’s worth understanding what went wrong and how to guard against it in the future.

For what it’s worth, Sectigo has a procedure whereby all posts we make on our open bugs must be peer-reviewed by at least one of a panel of expert reviewers, and in practice nearly all posts we make on other CAs’ bugs as well. These aren’t just proofreaders; they are team members who are intimately familiar with our public CA and compliance operations. Prior to posting complex comments, authors frequently solicit multiple reviews, and sometimes these comments go through several rounds before we deem them ready for posting. While no process is perfect, this has worked well for us, and we recommend it to any CA.

Flags: needinfo?(dcbugzillaresponse)

Question 6: Does DigiCert include a fact checking or peer review step as a procedural part of its Bugzilla posting process?

Yes.

Question 7: How is it that this same misunderstanding of a basic fact of the subject matter on which you were posting persisted across multiple incidents and a full month of time?

The initial question was read incorrectly. The reviewers interpreted your question as asking for the number of organizations covered by the TRO, despite you asking about the number of certificates in both questions.

Flags: needinfo?(dcbugzillaresponse)

Thanks for the info in Comment 80. It's worth noting that Sectigo's ARI endpoint returns information about all Sectigo certificates, even those not issued using ACME. This is rather valuable, since it allows certificate monitoring tools to raise an alarm when renewal is imminently needed. Such tools can be deployed by subscribers on a faster time scale than full ACME automation and can help mitigate the impact of mass revocation by providing another way to notify subscribers. Considering the difficulty that DigiCert faced when notifying subscribers about this incident (per Comment 11), I think it would be a very good idea for DigiCert's ARI endpoint to support all certificates, not just ACME-issued ones. Is that planned?
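The monitoring use case described in this comment can be sketched roughly as follows. This assumes the ARI response format from RFC 9773: a JSON object whose `suggestedWindow` member holds RFC 3339 `start` and `end` timestamps. The `renewal_status` function, its labels, and the decision to treat both 400 and 404 as "not tracked by this endpoint" (the DigiCert and Let's Encrypt behaviors described in comment 80) are hypothetical choices for illustration, not any CA's or tool's documented behavior.

```python
import json
from datetime import datetime, timezone


def _parse_rfc3339(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset for older versions.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)


def renewal_status(status_code: int, body: str, now: datetime) -> str:
    """Classify an ARI response for a hypothetical monitoring alarm."""
    if status_code in (400, 404):
        # Both codes are treated as "this endpoint does not track the certificate",
        # covering the DigiCert (400) and Let's Encrypt (404) behavior described in comment 80.
        return "untracked"
    if status_code != 200:
        return "error"
    window = json.loads(body)["suggestedWindow"]
    start = _parse_rfc3339(window["start"])
    end = _parse_rfc3339(window["end"])
    if now >= end:
        return "overdue"    # past the whole window: alarm loudly
    if now >= start:
        return "renew-now"  # inside the window: renewal is due
    return "ok"


body = json.dumps({"suggestedWindow": {"start": "2025-03-01T00:00:00Z",
                                       "end": "2025-03-02T00:00:00Z"}})
now = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
print(renewal_status(200, body, now))  # renew-now
print(renewal_status(400, "", now))    # untracked
```

Treating 400 and 404 alike keeps the sketch simple, but it also hides genuine malformed-request errors; a real monitoring tool would likely want to distinguish them.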

Flags: needinfo?(dcbugzillaresponse)

(In reply to comment 43)

Sorry to return to such an old comment, but we have not had the chance to properly discuss this TRO. There are still loose ends here we need to tie up.

Even though DigiCert’s TOU and MSA prohibited Alegeus from taking the action it did

Perhaps you can help me here. We all know DigiCert has refused to produce the TOU, so we don’t know what that document did specifically to protect DigiCert’s right to revoke. However, Alegeus was kind enough to provide the MSA in its filing. I have read it and can spot no reference to CA-initiated or mandatory revocation, nor any reference to Temporary Restraining Orders.

Question 8: Please point to the specific language in the MSA that prohibited Alegeus’s actions. If any explanation is needed to understand what actions it was prohibiting and how, provide that color commentary as well.

In response to comment 83

That is not planned at this time.

In response to comment 84

“We all know DigiCert has refused to produce the TOU”

This is a false statement. Our TOU is publicly available and prominently displayed on our legal repository. The TOU is, at the time of this writing, the first document listed. See section 18.j. for the wording that addresses your question.

@Ben, as we’ve posted a Closure Summary and again answered all questions, we request this bug be closed.

Flags: needinfo?(dcbugzillaresponse)

I'll take a look at closing this on Wed. 2-Apr-2025.

Flags: needinfo?(bwilson)

(In reply to DigiCert from comment #85)

In response to comment 84

“We all know DigiCert has refused to produce the TOU”

This is a false statement. Our TOU is publicly available and prominently displayed on our legal repository. The TOU is, at the time of this writing, the first document listed. See section 18.j. for the wording that addresses your question.

You made it explicitly clear that the community cannot work on the assumption that your published TOU contains the same language that bound Alegeus when the order was issued. Your first response to the question of such language was the phenomenally unhelpful Bug 1910322, comment 33, which simply stated,

our terms are easily found on our website

…without doing readers the courtesy of including it in the bug. So I went to the website and in comment 23 of this bug included the language I found there, specifically section 2.7, Certificate Revocation. My question 3 in that comment was to find out if this language bound Alegeus at the time. A straightforward question easily answered.

Instead, in comment 24 DigiCert replied,

Our legal counsel has advised us that, related to this TRO and the incident involving Alegeus, we are not permitted to specify anything beyond what was publicly filed. We are happy to answer questions related to the incident itself, but any specific questions on the TRO will need to be referred to our legal counsel.

The next mention we see from DigiCert is six weeks later in comment 43, which includes a rather generic explanation of Temporary Restraining Orders but does not include the requested language. It opens with,

Our legal counsel has prepared the following, which we hope will put this issue to bed.

Combined with comment 24, DigiCert is communicating that it has no intention to provide this language.

In the very next comment, comment 44, a community member calls out DigiCert’s generally unforthcoming treatment of this topic, saying in part,

whilst I acknowledge DigiCert's decision to play their cards close to their chest, legal-strategy-wise, there are other parties involved (specifically the relying parties for DigiCert's certificates, and the wider WebPKI community)...

DigiCert’s reply in comment 45 includes some light discussion of the BR requirement to follow the law but offers no new information, such as the language that at this point had been requested nearly four months prior. Then up through comment 85 DigiCert makes no mention I can find of revocation language or the TOU.

In summary: DigiCert refused to produce the TOU quite explicitly in comment 24 and has not produced it subsequently, up to the time of this post.

In comment 69 you wrote,

unfounded assertions are extremely unhelpful.

We agree.

Question 9: Please point to the specific reference on Bugzilla prior to comment 84 where you provided the TOU that bound Alegeus at filing time.

Question 10: Do you now mean to tell us that the TOU in your legal repository today was in effect for Alegeus at filing time?

@Ben, as we’ve posted a Closure Summary and again answered all questions, we request this bug be closed.

I will point out that the very comment to which DigiCert is explicitly replying as it makes this unfounded assertion contains a clear, labeled question that DigiCert didn’t even pretend to attempt to answer. Frankly, this is insulting to the Mozilla community and the Bugzilla incident reporting process.

Question 11: When you posted comment 85, were you aware that the very comment you were replying to contained an unanswered question that was labeled as such?

Ben, considering DigiCert’s extensive track record on this bug of failing to answer questions and making patently false statements, I propose that we need be in no hurry to close this bug. There is more to come. Comment 84 starts by saying,

Sorry to return to such an old comment, but we have not had the chance to properly discuss this TRO. There are still loose ends here we need to tie up.

This statement remains true.

Flags: needinfo?(dcbugzillaresponse)

In response to comment 87

Tim, you point out:

You made it explicitly clear that the community cannot work on the assumption that your published TOU contains the same language that bound Alegeus when the order was issued. Your first response to the question of such language was the phenomenally unhelpful 1910322 comment 33, which simply stated,

our terms are easily found on our website
…without doing readers the courtesy of including it in the bug

So you acknowledge that the question was answered with the required information, even though the document itself was not posted. That response contained the link to the TRO filing, which, as you pointed out in comment 36, contained the MSA, which references the TOU. The TOU can easily be found in our legal repository and was available at the time of your original post. Web Archive links provided for ease of access:

Our TOU, which we refer to as the Certificate Terms of Use in our MSA, has not changed between now and when Alegeus filed its motion for a TRO on July 30, 2024. For ease of reference, here is a link to our Certificate Terms of Use: www.digicert.com/certificate-terms.

We could have provided the link as an attachment, but that doesn’t make your previous statement less false. The question was answered. You keep asking the same question that was already answered.

To ensure the ease of tracing questions to answers:

Question 8: Please point to the specific language in the MSA that prohibited Alegeus’s actions. If any explanation is needed to understand what actions it was prohibiting and how, provide that color commentary as well.

The MSA incorporates the TOU by reference. In the TOU, see section 18.j. for the wording you’re looking for. Again, for ease of reference, Section 18(j) states: “DigiCert may revoke a Certificate without notice for the reasons stated in the CPS, including if DigiCert reasonably believes that: ...(j) Industry Standards or DigiCert’s CPS require Certificate revocation, or revocation is necessary to protect the rights, confidential information, operations, or reputation of DigiCert or a third party.”

Flags: needinfo?(dcbugzillaresponse)

Please note that as an action item on Bug 1957499, we are reviewing this Bug and all of our other open Bugs to ensure that we have fully answered each posted question. Our goal is to complete that review by the end of the week and to post any outstanding responses as quickly as possible thereafter.

Flags: needinfo?(bwilson)

We have completed our review of this bug for unanswered questions pursuant to our action item in Bug 1957499. We believe that we answered all explicit questions. However, we identified two comments that may deserve a further response.

The first is Comment 30 from Tim Callan.

We already addressed Tim’s first two bullet points. The fifth bullet would be improper because it punishes Subscribers for exercising their legal rights. The third and fourth bullets were touched on during the recent CA/B Forum face-to-face meeting in Tokyo and merit additional discussion:

  • Add unambiguous language to all enterprise MSAs stating DigiCert’s right to revoke certificates at any time for any reason on any timeline.
  • Add language to all enterprise MSAs that any attempt to use the legal system to subvert proper CA behaviors is breach of contract.

The first bullet language is already a component of our MSA. Our MSA, as defined in the MSA itself, incorporates this right explicitly from our Certificate Terms of Use (https://www.digicert.com/content/dam/digicert/pdfs/legal/Certificate-Terms-of-Use.pdf).

As to the second bullet point, we have discussed it with our Legal team. They have advised that this would not be a legally effective or appropriate measure. Courts will not enforce an agreement between private parties stating that one or both parties cannot seek recourse through the legal system for judicial review of matters relevant to the contract or to the parties' relationship or interactions. What Tim is asking for here is simply not legally feasible, and, if attempted, it could jeopardize the enforceability of the entire MSA.

The second is Comment 33 from Matt Palmer. Our response in Comment 35 addressed most of Matt’s comments and questions. However, one portion was left out of that response:

A few thoughts come to mind from this paragraph:

  1. I feel that the incident timeline is lacking in sufficient detail to reflect the situation you're describing. Can you break down the 20 hours that these separate steps were in process into their constituent parts? That is, how much of the 20 hours went to each of "Finding BI team", "BI team finds impacted certificates/subscribers", "Reviewing data to sending emails", and "sending emails"?
  2. If the "BI team" (business intelligence, I assume?) is a critical part of responding to a misissuance incident, presumably one useful action item would be to make sure that sufficient members of that team are available in a timely fashion -- or else modifying processes to allow other suitably qualified individuals to access the data they need.
  3. If "sending emails" was a significant part of the 20 hours, perhaps a useful action item would be to review how much mail sending capacity is available to DigiCert at short notice.

We’re looking into these items. Regarding item #1, pulling the certificate data, drafting the Subscriber communication, sending the emails, etc. are all very human intensive endeavors involving multiple teams which may not have specific timestamps the way a system function does. Pulling together the detailed timeline to give a full breakdown is difficult, but we are reviewing potential bottlenecks with the involved teams to improve the response speed. We will report back on these findings by next week, and, if appropriate, update the action items on this Bug accordingly.

Action Item Update

In reviewing this Bug last week for unanswered questions, we looked deeper into comments made by Matt Palmer regarding the activities undertaken during the first 24 hours of this incident, seeking to identify any bottlenecks.

We found that the biggest bottleneck in the early stages of incident response is getting a complete list of affected certificates. DigiCert places great emphasis on providing accurate information to the community in a timely manner, and has adopted a cautious approach knowing that incomplete data has played a controversial role in some Bugzilla discussions.

Incidents often require investigations into both the technical content of a certificate and the process by which it was issued, and they frequently involve “corner cases” that fall outside the daily experience of our Business Intelligence (BI) team.

The BI team defers somewhat to the Compliance team’s expertise to define the right set of data to pull. This coordination across teams has slowed response time.

To address this, as an additional action item to this Bug, DigiCert will provide additional training to our BI team on certificate profile parameters and how to pull very specific lists of certificates, based on various criteria, from the database. This will increase our ability to respond quickly in some incident events.

Action Item Update

We’ve completed the review of the data analysis process that supports our certificate reporting procedures. We typically rely on the Compliance Data team, who are most adept at the technical searches involved in certificate incidents, but on occasion those resources are tied up with audit and internal audit activities. To ensure that we can rapidly pull accurate data, we will cross-train two other teams involved in data collection and management at DigiCert (Data Engineering and Business Intelligence).

To address the bottlenecks we have experienced in the past, we will take the following actions:

  1. Cross train the other two teams in pulling the kind of detailed certificate data required. We will begin with the Data Engineering team because they’ve already worked closely with the Compliance Data team so are already generally familiar with the data structures required.
    a. Target for completion: 30 May 2025
  2. Set a target SLA of 6 hours, regardless of the time or day of the week, to provide defined certificate datasets required in support of an incident report (acknowledging that the evolving understanding of an incident may require multiple different datasets).
  3. As part of initial and ongoing training, we will routinely simulate incident response scenarios in which the cross-trained teams are asked for a very particular set of certificate data. Their resulting reports will be cross-checked by the Compliance Data team to ensure accuracy and to identify any additional training needed.

Action Item Update

We are executing on the plan outlined in comment 92. Cross-training (action item #1) has started and is expected to be completed on schedule. Additionally, we plan to perform a test (action item #3) to verify that training is sufficient so that we achieve the defined SLA (action item #2).

Please set a nextUpdate of May 30th, so that we can provide a report on that date of the results of the training and test.

Whiteboard: [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [leaf-revocation-delay] Next update 2025-05-30

Action Item Update

Cross-training of the teams has been completed. We ran our first unannounced test of a data pull and required our data team to complete the exercise without assistance from compliance experts. The data team successfully returned the information within the SLA. We are planning monthly unannounced tests of increasing complexity going forward, both to maintain the new skills and identify any gaps where additional training might be helpful as part of our continuous improvement processes.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-05-30 → [ca-compliance] [leaf-revocation-delay]

Report Closure Summary

Incident Description:

DigiCert filed Bug 1910805 to report a delayed revocation incident related to Bug 1910322. The original incident involved more than 80,000 certificates affected by a validation flaw in which an underscore was not prepended to the random value used in CNAME-based domain validation.
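To make the flaw concrete, here is a minimal Python sketch. It assumes the Baseline Requirements wording for DNS-based validation (section 3.2.2.4.7). That wording accepts a Random Value found at the Authorization Domain Name itself, or at the Authorization Domain Name prefixed with a single Domain Label that begins with an underscore. The function name is hypothetical. The sketch checks only the owner name of the DNS record, not the record's value or the DNS lookup itself.

```python
def owner_name_is_compliant(owner: str, adn: str) -> bool:
    """Check where a DNS validation record sits relative to the ADN.

    Accepts the ADN itself, or the ADN prefixed with exactly one Domain
    Label that begins with "_" (e.g. "_<random-value>.example.com").
    A name such as "<random-value>.example.com", with no leading
    underscore, is rejected.
    """
    # DNS names are case-insensitive; ignore a trailing root dot.
    owner = owner.rstrip(".").lower()
    adn = adn.rstrip(".").lower()
    if owner == adn:
        return True
    suffix = "." + adn
    if not owner.endswith(suffix):
        return False
    prefix = owner[: -len(suffix)]
    # Exactly one extra label, and it must start with an underscore.
    return prefix.startswith("_") and "." not in prefix
```

Under this check, a record at the hypothetical name `_3b9f0c.example.com` passes, and one at `3b9f0c.example.com` fails. The failing case has the shape of the missing-underscore issue reported in Bug 1910322.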

Under the CA/Browser Forum TLS Baseline Requirements, DigiCert was required to revoke all affected certificates within 24 hours of discovering the non-compliance. However, DigiCert extended revocation to as long as 120 hours, up to 96 hours past the required deadline.

Incident Root Cause(s):

The delay resulted from two factors. First, many non-automated subscribers were unable to replace affected certificates within 24 hours. Second, near the end of that 24-hour window, DigiCert decided that some customers had exceptional circumstances. We received delay requests during the 24-hour period. We denied all of them, regardless of the reason given, until the decision was made, after consulting with the root programs, that some customers had exceptional circumstances. Once we decided to delay revocation for some certificates due to exceptional circumstances, we received a flood of delay requests. Although we tried to review all incoming emails, it was not practical to work through that volume and confirm exceptional circumstances within the available time. Because we could not determine which circumstances were truly exceptional, DigiCert delayed revocation of the entire set of certificates to the 120-hour mark. The lack of automation on the customer side was a factor in a number of the reported “exceptional circumstances.”

Remediation Description:

DigiCert has implemented the following remediation actions to address the root causes: 

  • Enhanced mass revocation procedures: Developed mass revocation plans, a process that is being expanded under the updated Mozilla Policy requirements. 

  • Improved personnel training and capacity: In response to community concerns, we expanded the number of trained personnel capable of executing data queries and notification operations. Implemented regular training drills to ensure all personnel maintain proficiency in mass revocation procedures. 

  • Customer automation advocacy: Launched enhanced outreach programs to promote and incentivize customer adoption of automated certificate management, such as ACME and ACME Renewal Information (ARI) capabilities.

  • Clear policy communication: Updated customer communications to explicitly state that DigiCert will not delay revocation beyond timeframes required by the TLS Baseline Requirements. 

Commitment Summary:

DigiCert commits to the following ongoing initiatives beyond the specific remediation actions: 

  • Industry leadership on automation: Continue promoting industry-wide adoption of automated certificate management through marketing, educational content, and technical support to reduce ecosystem dependence on manual certificate replacement processes. 

  • Mass revocation preparedness: Maintain and regularly test documented mass revocation procedures through periodic drills and scenario planning exercises to ensure organizational readiness for future incidents. 

  • Community engagement: Actively participate in industry discussions about mass revocation best practices and contribute lessons learned to help establish consistent approaches across the CA ecosystem. 

  • Transparency and accountability: Maintain DigiCert's commitment to transparent reporting of incidents and timely response to community questions and concerns on public forums. 

Closure Request:

The incident has provided valuable lessons that have strengthened both DigiCert's operational capabilities and the broader industry's understanding of mass revocation challenges. DigiCert believes the comprehensive nature of the remediation actions addresses not only the immediate causes of this incident but also contributes to long-term improvements in ecosystem resilience.

All Action Items disclosed in this report have been completed as described, and we request its closure.

Flags: needinfo?(incident-reporting)

This is a final call for comments or questions on this Incident Report.

Otherwise, it will be closed on approximately 2025-06-10.

Whiteboard: [ca-compliance] [leaf-revocation-delay] → [close on 2025-06-10] [ca-compliance] [leaf-revocation-delay]

Thank you for setting a closing date. We have nothing further to report on this incident.

Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Flags: needinfo?(incident-reporting)
Resolution: --- → FIXED
Whiteboard: [close on 2025-06-10] [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [leaf-revocation-delay]