Microsoft PKI Services: Failure to Revoke in 5 Days for 1962829
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: u654666, Assigned: CentralPKI, NeedInfo)
References
(Blocks 1 open bug)
Details
(Whiteboard: [ca-compliance] [leaf-revocation-delay])
Attachments
(2 files)
Preliminary Incident Report
Summary
- Incident description:
  This incident is related to Bugzilla bug https://bugzilla.mozilla.org/show_bug.cgi?id=1962829. Microsoft PKI Services made a change to a previous policy document that introduced a copy-and-paste error. The error was not noticed until after the document had been superseded, but certificates issued under that superseded document are still active. While reformatting the document to include various tables, a new detail was added that did not align with how we have been operating since inception. Specifically, CPS Version 3.2.4 incorrectly stated that keyEncipherment is not present in Subscriber certificates, even though it had always been present and continues to be present.
  Microsoft PKI Services understands that Baseline Requirements 4.9.1.1 require that, when “the CA is made aware that the Certificate was not issued in accordance with these Requirements (BRs) or the CA’s Certificate Policy or Certification Practice Statement”, the certificate be revoked within 5 days.
  We have not revoked these certificates, and this is the reason we have opened this Failure to Revoke in 5 days bug.
  We understand this does not meet the expectations of the BRs and we look forward to discussing and resolving this issue with the industry.
- Relevant policies:
  - TLS Baseline Requirements Section 4.9.1.1 Reasons for Revoking a Subscriber Certificate
- Source of incident disclosure:
  This incident is related to Bugzilla bug https://bugzilla.mozilla.org/show_bug.cgi?id=1962829.
So are we to believe, even with Entrust's removal which started with a similar event, and all of the discussions around delayed revocation and the consequences over the past year - Microsoft are choosing simply not to revoke all the affected certificates and hoping they remain a trusted CA?
From searching I see that Microsoft's CA does not issue directly to other people where Microsoft do not control the private key. This means all keys and certificates are held within one organization. This surely means revocation and issuing of new certificates is 'easy' compared to a CA such as Let's Encrypt? What possible reason is there not to do this?
Questions:
- Are Microsoft really intending not to revoke impacted certificates as they are required to?
- What are the specific, detailed, technical and commercial reasons for not revoking?
- Do Microsoft expect to remain a trusted CA after such an incident?
It makes no sense to revoke certificates because of a clear typo in a document. It's like recalling washing machines because of a typo in the manual. Of course, if the typo was security relevant or the washing machine had a defect, recalling would be the only right thing to do.
(In reply to Max D from comment #1)
So are we to believe, even with Entrust's removal which started with a similar event, and all of the discussions around delayed revocation and the consequences over the past year - Microsoft are choosing simply not to revoke all the affected certificates and hoping they remain a trusted CA?
From searching I see that Microsoft's CA does not issue directly to other people where Microsoft do not control the private key. This means all keys and certificates are held within one organization. This surely means revocation and issuing of new certificates is 'easy' compared to a CA such as Let's Encrypt? What possible reason is there not to do this?
Questions:
- Are Microsoft really intending not to revoke impacted certificates as they are required to?
- What are the specific, detailed, technical and commercial reasons for not revoking?
- Do Microsoft expect to remain a trusted CA after such an incident?
Microsoft PKI Services has not finalized a plan related to revocation yet. We’re planning to provide more information in a Full Incident Report that includes responses to these questions. Until then, we opened a Preliminary Incident Report to acknowledge that we did not revoke certificates within 5 days as expected in Baseline Requirements section 4.9.1.1.
Comment 4 • 10 months ago
We understand that a full incident report will be provided, but we wanted to point to our related post in 1962829 to reaffirm how we view incident reporting. We will continue to evaluate this incident as more information becomes available and we look forward to a significant commitment to changes that definitively and convincingly resolve the underlying issues.
We have three questions for consideration with developing the full incident report:
(1) From past Chrome Root Program surveys and policy “preflight” processes, we understood Microsoft PKI Services to be highly automated, having a very small percentage of associated domains (~10%) relying on manual certificate issuance and management. Other public references 1, 2, 3, and 4 also led us to believe that automation and mass revocation was of minimal concern to Microsoft, outside of “CRL bloating” as referenced in the last link. What role is Microsoft PKI Services’ current automation solution and past lessons learned playing in responding to this incident?
(2) While acknowledging Microsoft PKI Services' reported level of automation and technical implementation does not rely on ACME, we’d like to ask about Microsoft PKI Service's equivalent of an ARI-like solution. What solution(s) similar to ARI does Microsoft PKI Services have in place to mitigate the impact of future events similar to this incident?
(3) How did CA and community member responses to past large-scale revocation events, many of which resulted in commitments to improved automation solutions and ARI, play a role in Microsoft PKI Services preparation for responding to this incident?
Comment 5 • 10 months ago (Assignee)
Comment 6 • 10 months ago (Assignee)
Response to Comment #4
(1) Automation and Lessons Learned
"From past Chrome Root Program surveys and policy “preflight” processes, we understood Microsoft PKI Services to be highly automated, having a very small percentage of associated domains (~10%) relying on manual certificate issuance and management. Other public references 1, 2, 3, and 4 also led us to believe that automation and mass revocation was of minimal concern to Microsoft, outside of “CRL bloating” as referenced in the last link. What role is Microsoft PKI Services’ current automation solution and past lessons learned playing in responding to this incident?"
We are in a much better position to auto-rotate our certificates and conduct a mass revocation from a CA perspective. However, there is high potential for business impact to subscribers as many still rely on deployments to consume new certificates. Given the CRL bloat issue, we are evaluating revocation options for the ICAs and will share the details as part of our full report.
(2) ARI-like Solutions
"While acknowledging Microsoft PKI Services' reported level of automation and technical implementation does not rely on ACME, we’d like to ask about Microsoft PKI Service's equivalent of an ARI-like solution. What solution(s) similar to ARI does Microsoft PKI Services have in place to mitigate the impact of future events similar to this incident?"
Microsoft has the capability to centrally renew all certificates issued by specific Issuers managed in Key Vault and internal vaults—a process we've successfully executed in the past and can repeat if necessary. While many subscribers adopt renewed certificates within 24–48 hours, some do not. Additionally, despite our guidance, many customers still use certificate pinning. As a result, even though we can renew certificates centrally, immediate revocation would negatively impact those customers.
(3) Preparation for revocation
"How did CA and community member responses to past large-scale revocation events, many of which resulted in commitments to improved automation solutions and ARI, play a role in Microsoft PKI Services preparation for responding to this incident?"
Greater than 90% of the impacted time-valid certificates are no longer in use, however due to the CRL bloat issue it is not possible to revoke them without impacting the certificates which are still in use. We are implementing CRL partitioning to handle this issue in our new CAs. Furthermore, as part of our effort to reduce the certificate lifetime we have already reduced most of our certificate lifetime to 6 months by default, with a goal to meet or exceed the industry lifetime requirements.
Comment 7 • 10 months ago
(In reply to Microsoft PKI Services from comment #6)
While many subscribers adopt renewed certificates within 24–48 hours, some do not. Additionally, despite our guidance, many customers still use certificate pinning. As a result, even though we can renew certificates centrally, immediate revocation would negatively impact those customers.
Is the intention of this response to indicate that negative impact on customers is a reason to avoid prompt revocation? Given the many discussions in Bugzilla on the topic in the last year, I feel it’s virtually certain that Microsoft is aware that subscriber impact due to missing automation or ill-advised practices like pinning is not considered to be sufficient reason to delay or avoid revocation of mis-issued certificates. But I’m not sure how else to interpret that answer, I confess!
Comment 8 • 10 months ago
Thank you for the responses in Comment 6. We understand (and hope) that these might be elaborated upon in the delivery of the full incident report, but we’d like to explicitly ask for more information to better understand this statement from Microsoft PKI Services:
Greater than 90% of the impacted time-valid certificates are no longer in use, however due to the CRL bloat issue it is not possible to revoke them without impacting the certificates which are still in use.
(1) What does it mean when you say these certificates are “no longer in use”?
(2) Can you describe how you are measuring “use”?
(3) Why do you suspect the percentage of certificates “no longer in use” is so high compared to the total population?
(4) Can you expand upon the impact of the “not in use” leaf revocations to the smaller percentage of certificates that would be considered “in use”? Is this offered from the assumption/perspective that the systems/user agents relying on the affected “in use” certificates would be polling the bloated CRL, timing out due to size, and failing closed? To be clear, we are not offering commentary on your conclusion, we are trying to better understand its basis.
Comment 9 • 10 months ago (Assignee)
Full Incident Report
Summary
- CA Owner CCADB unique ID: A002577
- Incident description: Microsoft made an erroneous revision to our Microsoft PKI Services policy document, specifically CPS Version 3.2.4, which incorrectly stated that keyEncipherment is not present in RSA Subscriber certificates, even though it has always been present. This error was overlooked until after the document had been superseded. According to Baseline Requirements 4.9.1.1, certificates issued under this erroneous policy need to be revoked within 5 days. While we understand this obligation, revocation is delayed due to the potential negative impact on client-side validation caused by the large size of Microsoft’s Certificate Revocation Lists (CRLs). Microsoft PKI Services plans to revoke the certificates in batches to manage CRL size and avoid client-side validation issues. The revocation process will begin on 5/28/25 and is expected to be completed by 11/15/2025. CRL partitioning will be implemented to prevent similar issues in the future.
- Timeline summary:
  - Non-compliance start date: 2024-07-21 (the non-compliance start date in the preliminary incident report incorrectly stated 2024-07-01)
  - Non-compliance identified date: 2025-04-25
  - Non-compliance end date: 2025-04-21
- Relevant policies: TLS Baseline Requirements Section 4.9.1.1 Reasons for Revoking a Subscriber Certificate
- Source of incident disclosure: This incident was self-reported and related to Bugzilla bug https://bugzilla.mozilla.org/show_bug.cgi?id=1962829 which was third-party reported.
Impact
- Total number of certificates: 100,322,979
- Total number of "remaining valid" certificates: 75,361,465
- Affected certificate types: Organization Validated TLS Subscriber Certificates
- Incident heuristic: This incident impacts all OV Subscriber certificates with RSA keys issued between 2024-07-21 and 2025-04-21.
- Was issuance stopped in response to this incident, and why or why not?: No. Microsoft did not stop issuance as the affected CPS version had already been superseded prior to discovery of the issue and issuance continues under corrected documentation.
- Analysis: We have the capability to bulk revoke certificates and have exercised it in previous bugs requiring revocation (e.g. 1962830). However, revoking tens of millions of certificates at once will create CRLs >600MB in size, and negatively impact client-side validation of certificates. After considering various options, we have decided to take a staged approach to revocation.
- Additional considerations: Most Subscribers of the certificates issued by the CA require support for TLS 1.2, which requires keyEncipherment to be set as per RFC 5246: "keyEncipherment bit MUST be set if the key usage extension is present." While this does not excuse the typographical mistake, it helps reinforce that this was a typo for a setting that was never planned to be changed.
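As a rough illustration of the CRL size concern in the Analysis above, the following back-of-the-envelope sketch estimates how large the revoked-certificate list becomes for a given number of entries. The figure of about 40 bytes per entry is an assumption (a serial number of up to 20 bytes, a UTCTime revocation date, and DER framing, with no per-entry extensions); it is not a number taken from this report, and the report does not say how the affected certificates are distributed across issuing CAs and their CRLs.

```python
def estimate_crl_bytes(entries: int, bytes_per_entry: int = 40) -> int:
    """Rough size of the revoked-certificate list alone.

    bytes_per_entry is an assumed average, not a value from the report;
    the CRL header and signature are ignored because they are tiny here.
    """
    return entries * bytes_per_entry


def max_entries(size_limit_bytes: int, bytes_per_entry: int = 40) -> int:
    """How many entries fit under a size limit with the same assumption."""
    return size_limit_bytes // bytes_per_entry


remaining_valid = 75_361_465  # "remaining valid" count from the Impact section

print(estimate_crl_bytes(remaining_valid))  # 3014458600 bytes, about 3.0 GB in total
print(max_entries(600 * 10**6))             # 15000000 entries fit in 600 MB
```

Under this assumption, revoking every remaining valid certificate would add roughly 3 GB of entries in total across all affected CRLs, so any single CRL covering more than about 15 million of them would exceed 600 MB.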
Timeline
- 2024-07-21: Public TLS CPS 3.2.4 published with new tables that included a typo in Section 7.1.2.7.11 stating keyEncipherment was not present in Subscriber certificates with RSA keys even though this was being set at the time and continues to be set
- 2025-04-21: Public TLS CPS 3.3.0 published that replaced multiple tables with new Appendix B Certificate Profiles section where keyEncipherment may be set, but did not distinguish between ECC and RSA public keys
- 2025-04-25: Third-party researcher emailed a Certificate Problem Report to Microsoft PKI Services identifying mismatches between Subscriber certificates and CPS document language related to bug 1962829
- 2025-04-29: Public TLS CPS 3.3.1 published that retained language in Appendix B that keyEncipherment may be set, but did not distinguish between ECC and RSA public keys.
- 2025-05-09: This bug was opened as we did not meet the Baseline Requirement guidelines stated in 4.9.1.1 to revoke certificates within 5 days.
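The mismatch in this timeline comes down to a single bit in the keyUsage extension. As a minimal sketch (not Microsoft's tooling), the function below decodes the DER BIT STRING that carries a keyUsage value, using the bit order defined in RFC 5280 section 4.2.1.3, so that the presence of keyEncipherment can be checked against what a CPS states. The function names are hypothetical, and only short-form DER lengths are handled, which is enough for keyUsage.

```python
# Bit names from RFC 5280 section 4.2.1.3, listed in bit order (bit 0 first).
KEY_USAGE_BITS = [
    "digitalSignature", "nonRepudiation", "keyEncipherment",
    "dataEncipherment", "keyAgreement", "keyCertSign",
    "cRLSign", "encipherOnly", "decipherOnly",
]


def decode_key_usage(der: bytes) -> set[str]:
    """Decode a DER BIT STRING holding a keyUsage value into bit names."""
    if len(der) < 3 or der[0] != 0x03:
        raise ValueError("not a DER BIT STRING")
    if der[1] != len(der) - 2:
        raise ValueError("unexpected length (only short-form lengths are handled)")
    payload = der[3:]  # der[2] is the unused-bits count; unused bits are zero in DER
    names = set()
    for i, name in enumerate(KEY_USAGE_BITS):
        byte_index, bit_index = divmod(i, 8)
        # Bit 0 is the most significant bit of the first payload byte.
        if byte_index < len(payload) and payload[byte_index] & (0x80 >> bit_index):
            names.add(name)
    return names


def has_key_encipherment(der: bytes) -> bool:
    """True if the keyEncipherment bit is set (hypothetical helper name)."""
    return "keyEncipherment" in decode_key_usage(der)


# 03 02 05 A0 encodes digitalSignature + keyEncipherment.
print(sorted(decode_key_usage(bytes.fromhex("030205a0"))))
# 03 02 07 80 encodes digitalSignature only.
print(has_key_encipherment(bytes.fromhex("03020780")))
```

A certificate whose keyUsage decodes to the first value carries keyEncipherment, which is what the RSA Subscriber certificates contained while CPS 3.2.4 said the bit was not present.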
Related Incidents
| Bug | Date | Description |
|---|---|---|
| 1962829 | 2025-04-25 | Microsoft PKI Services: Policy document bug. Microsoft PKI Services introduced a typographical error in CPS Version 3.2.4 while reformatting the document, incorrectly stating that keyEncipherment is not present in Subscriber certificates. This contradicts longstanding practice and affects still-active certificates tied to the superseded document. |
Root Cause Analysis
Contributing Factor 1: CRL bloat risk prevents timely mass revocation due to large number of unexpired subscriber certificates
- Description: There are ~75M impacted, unexpired certificates. Revoking all these certificates at once will create CRLs >600MB in size and negatively impact client-side validation of valid certificates. After considering various options, we have decided to take a staged approach to revocation. Our plan is to revoke certificates in batches on a weekly basis, maintaining a CRL size which does not negatively impact clients, and leaving room for additional revocations in case other incidents occur. Given the volume of certificates and the anticipated space available on the CRL at any given moment, this means many certificates will expire before we are able to revoke them. We expect to begin the revocations on 5/28/25 and complete before 11/15/2025.
- Timeline:
  - Since inception: Known risk associated with lack of CRL partitioning
  - 2025-04-25: Inconsistency between published CPS and issued certificates identified (Bug 1962829)
  - 2025-05-09: Microsoft opens Bug 1965612, acknowledging delay due to CRL size concerns
- Detection: The risk was known in advance from previous revocation planning efforts, and work was already underway on risk mitigations. The problem was reaffirmed during planning of this incident’s revocation response.
- Interaction with other factors: N/A.
- Root Cause Analysis methodology used: 5-Whys
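The staged approach in the Description above can be sketched as a simple weekly simulation. This is only an illustration under stated assumptions: the CRL cap, the batch ordering (longest remaining validity first) and the rule that an entry leaves the CRL in the week its certificate expires are all hypothetical choices, since the report does not give the batch criteria or the target CRL size.

```python
def plan_batches(backlog: dict[int, int], crl_cap: int, weeks: int) -> tuple[int, int, int]:
    """Simulate weekly batch revocation under a cap on CRL entries.

    backlog maps expiry week -> number of affected, unrevoked certificates.
    Returns (revoked, expired_without_revocation, still_pending).
    """
    backlog = dict(backlog)          # work on a copy
    on_crl: dict[int, int] = {}      # expiry week -> entries currently on the CRL
    revoked = 0
    expired_unrevoked = 0
    for week in range(weeks):
        # Simplification: entries for expired certificates drop off the CRL.
        for w in [w for w in on_crl if w <= week]:
            del on_crl[w]
        # Certificates that expire before being revoked leave the backlog.
        for w in [w for w in backlog if w <= week]:
            expired_unrevoked += backlog.pop(w)
        free = crl_cap - sum(on_crl.values())
        # Assumed ordering: revoke the longest-lived certificates first.
        for w in sorted(backlog, reverse=True):
            if free <= 0:
                break
            take = min(free, backlog[w])
            backlog[w] -= take
            on_crl[w] = on_crl.get(w, 0) + take
            revoked += take
            free -= take
            if backlog[w] == 0:
                del backlog[w]
    return revoked, expired_unrevoked, sum(backlog.values())


# Toy numbers: 10 certificates expire in week 2 and 10 in week 5, with a cap of 8 entries.
print(plan_batches({2: 10, 5: 10}, crl_cap=8, weeks=6))  # (8, 12, 0)
```

In the toy run, the cap fills in the first week and 12 of the 20 certificates expire without ever being revoked, which mirrors the report's statement that many certificates will expire before they can be revoked.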
Lessons Learned
- What went well: None
- What didn’t go well:
  - Without CRL partitioning in place, MS PKI could not execute revocation of tens of millions of certificates in a manner that does not negatively impact relying parties. To eliminate CRL size as an issue in the future, we will complete implementation of CRL partitioning which we started before this incident and roll it out before 11/15/2025.
  - We considered an alternate plan to revoke the issuing CAs. However, we are relying on cross-signing of our ICAs so that our subscribers can support legacy devices which do not trust our root, and we do not have warm standby cross-signed ICAs to move subscribers to. We will add warm standby ICAs so that ICA revocation becomes a viable option. We are working out the details for this and will share a target date before 6/14/25.
  - We agree with the observations from the community about the benefits of reducing the volume of publicly trusted certificates. We have started an investigation, and suspect subscriber implementation issues may be a contributing factor to the large number of certificates. We will complete the investigation before 6/27/25. If we identify additional repair items during this investigation, we will append them to this bug.
  - Because there was no change to established certificate profiles, the misstatement in the CPS was initially viewed as a documentation issue rather than mis-issuance, which delayed reporting of this bug.
  - We had a playbook and mechanisms to do revocations, but it needed to be scaled carefully and validated to be able to safely revoke millions of certificates.
- Where we got lucky: The erroneous CPS version had already been superseded: The impacted CPS (v3.2.4) was replaced by v3.3.0 before the issue was detected, eliminating the need to stop issuance upon discovery of the incident (see 1962829 - Microsoft PKI Services: Policy document bug for related action items).
- Additional: None
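CRL partitioning, mentioned in the lessons above, splits one large CRL into several smaller ones, with each certificate pointing at exactly one partition through its CRL Distribution Point. The sketch below shows one possible assignment scheme, hashing the serial number into a fixed number of shards. This is an assumption for illustration only: the report does not describe how Microsoft PKI Services assigns certificates to partitions, and the base URL and function names are hypothetical.

```python
import hashlib


def crl_shard(serial: int, shard_count: int) -> int:
    """Map a certificate serial number to a shard index in [0, shard_count)."""
    # Serial numbers are at most 20 octets, so 20 bytes always suffice.
    digest = hashlib.sha256(serial.to_bytes(20, "big")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def crl_url(base_url: str, serial: int, shard_count: int) -> str:
    """Build the CRL Distribution Point URL for a certificate (hypothetical layout)."""
    return f"{base_url}/{crl_shard(serial, shard_count)}.crl"


# The shard must be fixed at issuance time, because the URL is embedded in the certificate.
print(crl_url("http://crl.example.test/ica1", 0x1234ABCD, shard_count=64))
```

With a scheme like this, a mass revocation spreads its entries across all shards, so each relying party downloads only the partition named in the certificate it is validating rather than one very large CRL.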
Action Items
| Action Item | Kind | Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Track % of impacted certificates revoked. | 11/15/2025 | Not Started |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | % certificates confirmed via CT logs and CDP endpoints | 11/15/2025 | In Progress |
| Stand up cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Publish and disclose standby ICAs in CT logs. Validate readiness through test issuance. | TBD | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | TSG documentation will be created and training compliance tracked through internal processes. | TBD | New |
| Reduce usage of public PKI | Prevent | Root Cause 1 | % reduction in public trusted certificates, unexpired certificates | TBD | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Playbook validated with multiple rounds of revocations. Tracked internally. | 7/27/2025 | In Progress |
Appendix
- See attached file for the full list of affected certificates.
- Relevant CPS Policy Documents:
Comment 10 • 10 months ago (Assignee)
Update to certificate text file
We are currently unable to upload the impacted certificate text file due to size limitations. We will provide a persistent URI for access in our next update.
Comment 11 • 10 months ago (Assignee)
Response to Comment 7
Is the intention of this response to indicate that negative impact on customers is a reason to avoid prompt revocation? Given the many discussions in Bugzilla on the topic in the last year, I feel it’s virtually certain that Microsoft is aware that subscriber impact due to missing automation or ill-advised practices like pinning is not considered to be sufficient reason to delay or avoid revocation of mis-issued certificates. But I’m not sure how else to interpret that answer, I confess!
Thank you for the follow up. Though being able to safely revoke certificates is a consideration, our primary constraint is CRL sizes: revoking all ~75 million impacted/time-valid certificates at once would result in CRLs exceeding 600MB, which would impair revocation checking for relying parties. As a result, we are proposing revocation in batches, which we will detail in our full incident report.
Comment 12 • 10 months ago (Assignee)
Response to Comment 8
(1) Definition of “No Longer in Use”
“What does it mean when you say these certificates are ‘no longer in use’?”
There are two scenarios. Our subscribers store private keys in vaults such as Azure Key Vault. The first scenario is that the subscriber has deleted the certificate and its key from the vault. The second scenario is that the subscriber has enrolled and started using a new version of the certificate and is no longer using the previous version, which has not yet expired.
(2) Measuring Certificate “Use”
“Can you describe how you are measuring ‘use’?”
We have an inventory of certificates in subscriber vaults and telemetry of subscriber certificate usage.
(3) High Percentage of Certificates Not in Use
“Why do you suspect the percentage of certificates ‘no longer in use’ is so high compared to the total population?”
We are investigating why the total population is so high. We suspect, but have not yet confirmed, that the population is high due to subscriber implementation issues, aka a “leak”. If this is the case, it would explain why the percentage “no longer in use” is high compared to the total population.
(4) Impact of Revoking “Not in Use” Certificates
“Can you expand upon the impact of the ‘not in use’ leaf revocations to the smaller percentage of certificates that would be considered ‘in use’? Is this offered from the assumption/perspective that the systems/user agents relying on the affected ‘in use’ certificates would be polling the bloated CRL, timing out due to size, and failing closed? To be clear, we are not offering commentary on your conclusion, we are trying to better understand its basis.”
Yes, the concern is that revoking all these certificates at once will create CRLs >600MB in size, and client-side validation will experience delays and/or time-outs that may result in failing closed. Additionally, some clients have technical limits on the size of the CRLs and the number of entries that they can process, and those clients will be unable to process these CRLs at all.
Comment 13 • 10 months ago
(In reply to Microsoft PKI Services from comment #11)
Response to Comment 7
Is the intention of this response to indicate that negative impact on customers is a reason to avoid prompt revocation? Given the many discussions in Bugzilla on the topic in the last year, I feel it’s virtually certain that Microsoft is aware that subscriber impact due to missing automation or ill-advised practices like pinning is not considered to be sufficient reason to delay or avoid revocation of mis-issued certificates. But I’m not sure how else to interpret that answer, I confess!
Thank you for the follow up. Though being able to safely revoke certificates is a consideration, our primary constraint is CRL sizes: revoking all ~75 million impacted/time-valid certificates at once would result in CRLs exceeding 600MB, which would impair revocation checking for relying parties. As a result, we are proposing revocation in batches, which we will detail in our full incident report.
The proposal of revoking 'the issuing CAs', or rather the affected intermediaries, and moving to new ones would handle the vast majority of your certificates. As it stands all that has been proposed is a massive step backwards for CA standards, all on the basis that a subset of subscribers are relying on the cross-signing for legacy device support. That it is taking a month to propose a potential, and lackluster, revocation plan shows a lack of regard for how the WebPKI as a whole has advanced over the years.
That the proposed revocation plan goes over a 6-month period is telling on how much effort Microsoft PKI have placed on reading any recent incidents. Moreso is the 'Related Incidents' part of the incident report that presumes the incidents must only relate to Microsoft PKI - a gross misunderstanding on incident reporting guidelines.
“Related Incidents” MUST consider incidents beyond those corresponding to the CA Owner subject of this report.
The Let's Encrypt: certificate lifetimes 90 days plus one second incident was 4 years ago. That focused on 185 million certificates, and drastically shifted how CAs operate to stop this being an issue going forward.
Q1: What learnings did Microsoft PKI take from that and similar incidents to make sure they would be capable of a mass-revocation event in keeping with the timelines in CCADB's Incident Reporting Guidelines?
The 'Timeline' section is remarkably quiet on what Microsoft PKI have been doing for 4 weeks. From statements provided it seems no assessment of the corpus of certificates has even occurred, but will start soon. This is called a 'Final' Incident Report because the work should have been completed already.
Q2: Does Microsoft PKI think this is acceptable practice in 2025? What has your CA been doing this entire time?
The Root Cause Analysis section is also lacking in completeness. It seems to have been written to focus solely on why this particular revocation plan is the only feasible way forward. It does not address the complete lack of attention to incidents in the past few years that would introduce best practices to Microsoft PKI that make this a non-issue.
Q3: Can Microsoft PKI talk us through the time it would take to generate new intermediaries and transition as many certificates across as possible? Note that this is not including any cross-signing.
Q4: What are the limitations on cross-signing with a new intermediary in getting this handled in a timely manner?
Q5: Are there any plans currently for dealing with root CAs being rotated and the impact on subscribers leaning on legacy-device use? See Chrome Root Program's Root CA Term-Limit as an example.
Given the entire point of these incidents is to tell the WebPKI ecosystem what has happened, and what you will do to ensure this does not happen again I'm rather baffled at the Action Items. As far as I can see these focus solely on dealing with the current problem, not making sure this can never happen again. There seems to be blinders on the CRL throughout the report, when alternative means of handling this exist but seem to be getting disregarded in favor of a 6-month plan at minimum.
2025-04-25 is when the Certificate Problem Report was sent in.
2025-05-10 is the start of this incident with a preliminary incident report.
2025-05-23 is when the final incident report was published.
Q6: Can Microsoft PKI please explain how this is in keeping with CCADB's Incident Reporting Guidelines for handling incident reports? Note: 'When are reports expected?'
I strongly advise that Microsoft PKI at the very least read this comment by Mozilla in a recent incident.
Q7: With the above comment in mind, can Microsoft PKI please explain how this plan is showing public trust in adhering to best practices to date?
For those with censys access this should cover the majority of impacted certificates.
(labels="trusted" and validation.nss.has_trusted_path=true and not labels="revoked") and parsed.extensions.extended_key_usage.server_auth="true" and parsed.validity_period.not_before: {`2024-07-21` to `2025-04-21`} and parsed.issuer.organization=`Microsoft Corporation` and parsed.subject_key_info.key_algorithm.name=`RSA`
Results: 70,044,495
I will leave further analysis to other parties, suffice to say Chrome Root Program did hint at issues in question 7 of the other incident.
Comment 14 • 10 months ago
[In response to Comment 12.]
Thank you for providing answers to the questions posted in Comment 8 and for providing the Full Incident Report in Comment 9.
We have a few additional questions:
(1) Can you please describe the TLS server authentication certificate automation solution(s) in place to help us understand what is and is not considered in scope of the solutions available to subscribers? Statements in Comment 6 indicate “While many subscribers adopt renewed certificates within 24–48 hours, some do not.”
We interpret that to mean while the process of requesting a certificate (which may include key generation and performing domain control validation) is automated for some subscribers, the retrieval and installation of the corresponding certificate might not be in-scope for the automation solution.
(2) Can you help us understand the percent of affected certificates that are relying on “Azure Key Vault” or “internal vaults” where “Microsoft has the capability to centrally renew all certificates issued by specific Issuers”? For example: “XX% of the affected certificates can automatically be renewed and automatically configured for use due to Microsoft certificate lifecycle management solutions.”
(3) The context of a “leak” as presented below is unclear to us. Can you please explain this to us in a different way?
We suspect, but have not yet confirmed, that the population is high due to subscriber implementation issues, aka a “leak”.
Is this referencing scenarios where certificates were requested and issued, but then later abandoned by subscribers without requesting revocation?
(4) Comment 9 states:
Our plan is to revoke certificates in batches on a weekly basis, maintaining a CRL size which does not negatively impact clients, and leaving room for additional revocations in case other incidents occur.
Can you please share:
- (a) The criteria used for determining which certificates will be included in each week’s “batch”?
- (b) The CRL size being targeted to accomplish the stated goal of using this batch strategy?
- (c) How Microsoft PKI Services concluded the target size described immediately above will not negatively impact clients?
(5) We understand that Microsoft PKI Services was aware of its “CRL bloat” concerns related to mass revocation events in February 2025, and presumably earlier. Can you help us understand why, given the existence of this concern and the community’s emphasis on improving response to mass revocation events over the past year, Microsoft PKI Services did not move forward with planning (minimally) or implementing partitioned CRLs sooner?
(6) Comment 6 includes:
Microsoft has the capability to centrally renew all certificates issued by specific Issuers managed in Key Vault and internal vaults—a process we've successfully executed in the past and can repeat if necessary.
Can you share which DCV method(s) is being relied upon during these types of renewals?
(7) Comment 6 includes:
Furthermore, as part of our effort to reduce the certificate lifetime we have already reduced most of our certificate lifetime to 6 months by default, with a goal to meet or exceed the industry lifetime requirements.
Given 90% of the impacted time-valid certificates were found to no longer be in use, and when considered against the degree of automation we understand to be in place, could the default validity be decreased further to reduce the likelihood of “stale” or unused TLS certificates?
(8) Related to the above, has Microsoft considered the use of short-lived certificates, as defined by the TLS BRs, for these subscriber use cases?
Comment 15•10 months ago (Assignee)
We apologize for the delay in uploading the impacted certificates. The high volume of certificates caused unexpected issues: https://prsspublishingstorage.blob.core.windows.net/public-tls-certs/all/crtshurls.txt
Comment 16•10 months ago (Assignee)
Response to Comment 14
(1) TLS Server Authentication Certificate Automation Scope
Can you please describe the TLS server authentication certificate automation solution(s) in place to help us understand what is and is not considered in scope of the solutions available to subscribers? Statements in Comment 6 indicate “While many subscribers adopt renewed certificates within 24–48 hours, some do not.” We interpret that to mean while the process of requesting a certificate (which may include key generation and performing domain control validation) is automated for some subscribers, the retrieval and installation of the corresponding certificate might not be in-scope for the automation solution.
Most subscribers use a vault such as Azure Key Vault for certificate management. The vault enrolls the certificate, and the subscriber retrieves the keys and certificate metadata from the vault and puts them into use. Triggering re-enrollment at the vault is fully automated, and vaults support automated distribution of certificates to subscriber nodes. However, not all subscribers have adopted the automated distribution solution yet.
(2) Percentage of Certificates Managed via Vaults
Can you help us understand the percent of affected certificates that are relying on “Azure Key Vault” or “internal vaults” where “Microsoft has the capability to centrally renew all certificates issued by specific Issuers”? For example: “XX% of the affected certificates can automatically be renewed and automatically configured for use due to Microsoft certificate lifecycle management solutions.”
Greater than 99% of impacted certificates are managed through vaults and can be centrally renewed.
(3) Clarification of “Leak”
The context of a “leak” as presented below is unclear to us. Can you please explain this to us in a different way? We suspect, but have not yet confirmed, that the population is high due to subscriber implementation issues, aka a "leak". Is this referencing scenarios where certificates were requested and issued, but then later abandoned by subscribers without requesting revocation?
Yes, your interpretation is correct. That said, as we mentioned, we are following up with subscribers to understand whether these are implementation issues or valid use cases.
(4) Weekly Revocation Batch Strategy
Comment 9 states: Our plan is to revoke certificates in batches on a weekly basis, maintaining a CRL size which does not negatively impact clients, and leaving room for additional revocations in case other incidents occur. Can you please share:
• (a) The criteria used for determining which certificates will be included in each week’s “batch”?
• (b) The CRL size being targeted to accomplish the stated goal of using this batch strategy?
• (c) How Microsoft PKI Services concluded the target size described immediately above will not negatively impact clients?
(a) The primary criterion is telemetry that tells us if the certificate is in use or not. A secondary criterion is the certificate expiration date, which allows us to demonstrate revocation of larger batch sizes over time while preventing the CRL from exceeding the target size.
(b) Our goal is for the CRL size to not exceed 10MB.
(c) We used the recommended CRL size from the Windows TRP (10MB) and a large known existing CRL (13.3MB) as reference. As a precaution, we will scale up the revocation batch size over time while observing the impact on clients.
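For illustration only, the batch criteria described in (a) through (c) can be sketched as below. This is not Microsoft PKI Services' tooling. The `Cert` record, the 40-byte average CRL entry size, the reserve of 50,000 entries, and the choice to put the earliest-expiring certificates first are all assumptions made for the sketch; only the 10MB target comes from the response above.

```python
from dataclasses import dataclass
from datetime import date

TARGET_CRL_BYTES = 10_000_000  # 10MB goal from the response (decimal MB is an assumption)
BYTES_PER_ENTRY = 40           # assumed average DER size of one CRL entry
RESERVE_ENTRIES = 50_000       # hypothetical headroom kept for other incidents


@dataclass(frozen=True)
class Cert:
    """Hypothetical record for one impacted certificate."""
    serial: str
    not_after: date
    in_use: bool  # result of the usage telemetry described in (a)


def entry_budget(current_crl_bytes: int) -> int:
    """Number of entries that can be added before the CRL reaches the target."""
    free_bytes = max(0, TARGET_CRL_BYTES - current_crl_bytes)
    return max(0, free_bytes // BYTES_PER_ENTRY - RESERVE_ENTRIES)


def select_batch(certs: list[Cert], current_crl_bytes: int) -> list[Cert]:
    """Pick the next batch: unused certificates first, then earliest expiry."""
    budget = entry_budget(current_crl_bytes)
    ordered = sorted(certs, key=lambda c: (c.in_use, c.not_after))
    return ordered[:budget]
```

Under these assumed numbers an empty CRL leaves room for 200,000 entries per issuer, which gives a sense of scale against an impacted population in the tens of millions.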
(5) CRL Partitioning Timeline
We understand that Microsoft PKI Services was aware of its “CRL bloat” concerns related to mass revocation events in February 2025, and presumably earlier. Can you help us understand why, given the existence of this concern and the community’s emphasis on improving response to mass revocation events over the past year, Microsoft PKI Services did not move forward with planning (minimally) or implementing partitioned CRLs sooner?
Planning and implementation of CRL partitioning started before this incident and is currently being tested in a non-production environment. However, it has not been completed in time to be a mitigating factor in this incident.
(6) DCV Method for Central Renewals
Comment 6 includes: Microsoft has the capability to centrally renew all certificates issued by specific Issuers managed in Key Vault and internal vaults—a process we've successfully executed in the past and can repeat if necessary. Can you share which DCV method(s) is being relied upon during these types of renewals?
The domain control validation (DCV) method used during these types of certificate renewals is BR Section 3.2.2.4.2 – Email, Fax, SMS, or Postal Mail to Domain Contact. We use the email to the Domain Contact method specifically to validate the domains and this process is automated.
(7) Certificate Validity Period
Comment 6 includes:
Furthermore, as part of our effort to reduce the certificate lifetime we have already reduced most of our certificate lifetime to 6 months by default, with a goal to meet or exceed the industry lifetime requirements.
Given 90% of the impacted time-valid certificates were found to no longer be in use, and when considered against the degree of automation we understand to be in place, could the default validity be decreased further to reduce the likelihood of “stale” or unused TLS certificates?
Our goal is for our default validity to meet or exceed industry requirements. In addition, we are working with subscribers to request shorter lifetime certificates based on their scenarios.
(8) Use of Short-Lived Certificates
Related to the above, has Microsoft considered the use of short-lived certificates, as defined by the TLS BRs, for these subscriber use cases?
Yes, we are evaluating the use of short-lived subscriber certificates.
Comment 17•10 months ago (Assignee)
Response to Comment 13
(1) Learnings from Past Incidents
What learnings did Microsoft PKI take from that and similar incidents to make sure they would be capable of a mass-revocation event in keeping with the timelines in CCADB's Incident Reporting Guidelines?
The 'Timeline' section is remarkably quiet on what Microsoft PKI have been doing for 4 weeks. From statements provided it seems no assessment of the corpus of certificates has even occurred, but will start soon. This is called a 'Final' Incident Report because the work should have been completed already.
We have completed our internal assessment of the full corpus of impacted certificates. Due to file size limitations, we were unable to upload this directly to Bugzilla. We have provided access via public blob storage.
Two learnings we had from previous incidents were the criticality of CRL partitioning and the value of reducing certificate lifetime. Work had already started on CRL partitioning before this incident occurred but was not complete in time to be a mitigating factor. On certificate lifetime, we took a first step by reducing the lifetime of new certificates for the majority of subscribers from 1 year to 6 months starting in October 2024.
(2) Acceptability of Current Practices
Does Microsoft PKI think this is acceptable practice in 2025? What has your CA been doing this entire time?
The Root Cause Analysis section is also lacking in completeness. It seems to have been written to focus solely on why this particular revocation plan is the only feasible way forward. It does not address the complete lack of attention to incidents in the past few years that would introduce best practices to Microsoft PKI that make this a non-issue.
Thank you for the feedback. We recognize the importance of aligning with evolving best practices in the Web PKI ecosystem. As part of our long-term strategy to eliminate the conditions that led to this issue, we are investing in CRL partitioning, standing up warm standby ICAs, reducing the number of publicly trusted certificates, and exploring short-lived certificate models. These actions are designed to ensure we can support timely and large-scale revocation going forward.
Additionally, as mentioned in Comment 11 of Bug 1962829 we have opened action items to enhance our process to ensure we are adapting best practices from all Bugzilla incidents moving forward.
(3) Time to Transition to New Intermediaries
Can Microsoft PKI talk us through the time it would take to generate new intermediaries and transition as many certificates across as possible? Note that this is not including any cross-signing.
If we had warm standby ICAs, we could start transitioning subscribers immediately. The goal of the repair item to have warm standby CAs is to remove the lag associated with the creation of new ICAs. Subscribers can be transitioned through a combination of automation and an internal campaign on the order of days.
(4) Limitations on Cross-Signing
What are the limitations on cross-signing with a new intermediary in getting this handled in a timely manner?
A new cross-signing arrangement requires negotiation, legal review, and formal execution of a contract with the third-party CA. This process introduces time constraints that make it unsuitable for immediate response actions.
(5) Root CA Rotation and Legacy Devices
Are there any plans currently for dealing with root CAs being rotated and the impact on subscribers leaning on legacy-device use? See Chrome Root Program's Root CA Term-Limit as an example.
Given the entire point of these incidents is to tell the WebPKI ecosystem what has happened, and what you will do to ensure this does not happen again I'm rather baffled at the Action Items. As far as I can see these focus solely on dealing with the current problem, not making sure this can never happen again. There seems to be blinders on the CRL throughout the report, when alternative means of handling this exist but seem to be getting disregarded in favor of a 6-month plan at minimum.
2025-04-25 is when the Certificate Problem Report was sent in.
2025-05-10 is the start of this incident with a preliminary incident report.
2025-05-23 is when the final incident report was published.
We are aware of the Chrome Root Program’s root CA term limits. While customer workloads for our subscribers have legacy device dependencies today, we expect those dependencies to diminish as those devices age out of the ecosystem in the coming years. Our reliance on cross-signing to support those devices will phase out accordingly.
Some of our action items are focused on resolving this incident, but others—such as implementing CRL partitioning and establishing Warm Standby ICAs—are intended to address the root causes that currently limit timely revocation. These forward-looking efforts are critical to ensuring we can respond more quickly and reliably to similar issues in the future.
(6) Incident Reporting Timeliness
Can Microsoft PKI please explain how this is in-keeping with CCADB's Incident Reporting Guidelines for handling incident reports? Note: 'When are reports expected?'
I strongly advise that Microsoft PKI at the very least read this comment by Mozilla in a recent incident.
We acknowledge that this bug (1965612) should have been opened earlier, ideally when it became clear that revocation within 5 days would not be feasible. While we filed the Preliminary Report for Bug 1962829 on 2025-05-09, we agree that separate reporting for revocation delays was warranted sooner.
We have noted this delay in the “What Did Not Go Well” section of our Full Incident Report and have committed to a related repair item: improving our internal processes for early scoping and rapid incident triage, including ensuring new bugs are filed in a timely manner when distinct revocation challenges arise.
(7) Demonstrating Public Trust and Best Practices
With the above comment in mind, can Microsoft PKI please explain how this plan is showing public trust in adhering to best practices to date?
For those with censys access this should cover the majority of impacted certificates.
(labels="trusted" and validation.nss.has_trusted_path=true and not labels="revoked") and parsed.extensions.extended_key_usage.server_auth="true" and parsed.validity_period.not_before: {2024-07-21 to 2025-04-21} and parsed.issuer.organization=Microsoft Corporation and parsed.subject_key_info.key_algorithm.name=RSA
Results: 70,044,495
I will leave further analysis to other parties, suffice to say Chrome Root Program did hint at issues in question 7 of the other incident.
We are performing batch revocations to demonstrate our ability to revoke at scale while preserving CRL space for additional revocations if necessary. In parallel, we’re advancing efforts to stand up warm standby CAs, reduce subscriber reliance on publicly trusted certificates, reduce certificate lifetime and investigate short-lived certificates, and implement partitioned CRLs—all aimed at minimizing the risk of delayed revocation going forward.
Comment 18•10 months ago (Assignee)
Revocation Delay Status Update
- the number of certificates that have been revoked:
11,000
- the number of certificates that have not yet been revoked:
72,070,777
- the number of certificates planned for revocation that have expired:
3,290,688
- an estimate for when all remaining revocations will be completed:
We will continue to revoke certificates in batches until 11/15/2025, as mentioned in our Full Incident Report.
Comment 19•10 months ago (Assignee)
Update to Action Items
We are actively working on all repair items associated with this incident. In addition, we have updated the due dates for several action items to reflect current progress and planning.
| Action Item | Kind | Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Track % of impacted certificates revoked. | 11/15/2025 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | % certificates confirmed via CT logs and CDP endpoints | 11/15/2025 | In Progress |
| Stand up cross-signed warm standby CAs. We are currently in the planning stages and will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Publish and disclose standby ICAs in CT logs. Validate readiness through test issuance. | 9/30/2025 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | TSG documentation will be created and training compliance tracked through internal processes. | 7/31/2025 | New |
| Reduce usage of public PKI | Prevent | Root Cause 1 | % reduction in public trusted certificates, unexpired certificates | 9/30/2025 | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Playbook validated with multiple rounds of revocations. Tracked internally. | 7/27/2025 | In Progress |
Comment 20•10 months ago
(In reply to Microsoft PKI Services from comment #16)
Response to Comment 14
(4) Weekly Revocation Batch Strategy
Comment 9 states: Our plan is to revoke certificates in batches on a weekly basis, maintaining a CRL size which does not negatively impact clients, and leaving room for additional revocations in case other incidents occur. Can you please share:
• (a) The criteria used for determining which certificates will be included in each week’s “batch”?
• (b) The CRL size being targeted to accomplish the stated goal of using this batch strategy?
• (c) How Microsoft PKI Services concluded the target size described immediately above will not negatively impact clients?
(a) The primary criterion is telemetry that tells us if the certificate is in use or not. A secondary criterion is the certificate expiration date, which allows us to demonstrate revocation of larger batch sizes over time while preventing the CRL from exceeding the target size.
(b) Our goal is for the CRL size to not exceed 10MB.
(c) We used the recommended CRL size from the Windows TRP (10MB) and a large known existing CRL (13.3MB) as reference. As a precaution, we will scale up the revocation batch size over time while observing the impact on clients.
Could you provide a link to the "Windows TRP" document referenced here?
Is this referring to Program Requirements - Microsoft Trusted Root Program section 3.a.5 "If an AIA extension with a valid OCSP URL is NOT included, then the resulting CRL File should be <10MB." and repeated in 3.c.3.c "Maximum size of the CRL file (either full CRL or partitioned CRL) should not exceed 10M."? I note that both of those are SHOULD recommendations/requirements, not MUST.
Which CAs' CRLs have you referenced to develop that maximum size goal?
The blog article An analysis of CRL sizes, posted mid-last year, shows some historical data at or above that limit; following the CRL links in that article allows easy discovery of current examples like Digicert Global G2 TLS RSA SHA256 2020 CA1 (around 12.7MB at 2025-05-31T02:20Z).
(5) CRL Partitioning Timeline
We understand that Microsoft PKI Services was aware of its “CRL bloat” concerns related to mass revocation events in February 2025, and presumably earlier. Can you help us understand why, given the existence of this concern and the community’s emphasis on improving response to mass revocation events over the past year, Microsoft PKI Services did not move forward with planning (minimally) or implementing partitioned CRLs sooner?
Planning and implementation of CRL partitioning started before this incident and is currently being tested in a non-production environment. However, it has not been completed in time to be a mitigating factor in this incident.
Could you share more detail about your timeline for CRL partitioning, such as when planning began and when the implementation was first in a state where it was considered ready for testing?
While trying to find the "Windows TRP" document referenced above I see Microsoft published guidance for deploying PKI on Windows Server 2003 on or before 2014/05/27 which recommends CRL partitioning. I find it surprising that Microsoft PKI Services have not adopted that guidance internally in the intervening decade.
(In reply to Microsoft PKI Services from comment #17)
Response to Comment 13
(1) Learnings from Past Incidents
What learnings did Microsoft PKI take from that and similar incidents to make sure they would be capable of a mass-revocation event in keeping with the timelines in CCADB's Incident Reporting Guidelines?
The 'Timeline' section is remarkably quiet on what Microsoft PKI have been doing for 4 weeks. From statements provided it seems no assessment of the corpus of certificates has even occurred, but will start soon. This is called a 'Final' Incident Report because the work should have been completed already.
We have completed our internal assessment of the full corpus of impacted certificates. Due to file size limitations, we were unable to upload this directly to Bugzilla. We have provided access via public blob storage.
Two learnings we had from previous incidents were the criticality of CRL partitioning and the value of reducing certificate lifetime. Work had already started on CRL partitioning before this incident occurred but was not complete in time to be a mitigating factor. On certificate lifetime, we took a first step by reducing the lifetime of new certificates for the majority of subscribers from 1 year to 6 months starting in October 2024.
You mention that you took learnings from previous incidents; however, you do not reference any incidents from other CAs in the "Related Incidents" section of your Full Incident Report. The CCADB Full Incident Report template and accompanying explanation of this field explicitly state:
“Related Incidents” MUST consider incidents beyond those corresponding to the CA Owner subject of this report.
Which incident(s) did you review when implementing the October 2024 default lifetime changes? Which incident(s) did you review when planning/implementing the CRL partitioning changes?
This would be useful information to include in the timeline for Contributing Factor 1 as part of your Root Cause Analysis.
Which incident(s) did you review when developing the mass revocation plan for this incident?
While you have mentioned that until recently Microsoft PKI Services only informally monitored and reviewed incidents, based on your responses to this question in comment 13 and question 3 of comment 4 it appears you did use other incidents to guide your response/revocation plan in this instance. Explicit references to the other mass revocation incidents and revocation delay incidents you reviewed should also be included in the Related Incidents section.
Comment 21•9 months ago
[In response to Comment 16.]
Thank you for providing answers to the questions posted in Comment 14.
We have a few follow-up questions and comments:
Question 1: In its response, Microsoft PKI Services stated:
Greater than 99% of impacted certificates are managed through vaults and can be centrally renewed.
Can you please share:
- a) Approximately what percent of subscribers have adopted the automated distribution solution?
- b) Approximately what percent of affected certificates are represented by those subscribers?
- c) Whether Microsoft has triggered the automatic renewal and replacement for these subscribers’ certificates?
Question 2: In its response, Microsoft PKI Services stated:
The domain control validation (DCV) method used during these types of certificate renewals is BR Section 3.2.2.4.2 – Email, Fax, SMS, or Postal Mail to Domain Contact.
Given this method has been sunset, can you share what Microsoft PKI Services is planning for certificates issued beginning July 15, 2025?
Question 3: In its response, Microsoft PKI Services stated:
Our goal is for our default validity to meet or exceed industry requirements. In addition, we are working with subscribers to request shorter lifetime certificates based on their scenarios.
Can you please share more? This response doesn’t offer actionable detail or directly answer the question posed. Asked differently, considering that 90% of affected certificates were determined not in use at the time of this incident, can that be interpreted to indicate the default validity should be less than six months given observed real-world use of these certificates?
Question 4: Do you have any indication that the not-in-use certificates corresponded to Azure resources that were intentionally short-lived (e.g., someone standing up a test environment for a few days, confirming something, and then deleting it — however in the process the corresponding TLS certificate is orphaned)? Like you, we’re trying to understand the user patterns that resulted in the significant amount of non-use.
Comment 1: The Evaluation Criteria included in subsequent updates can be improved by offering concrete objectives. For example “% reduction in public trusted certificates, unexpired certificates” doesn’t offer sufficient detail to understand how Microsoft is evaluating this Action Item, or how a member of the community can help.
Comment 2: Studying data disclosed to the CCADB, we do observe CA records trusted in Chrome that disclose a full and complete CRL, but NOT a partitioned array and whose corresponding size is larger than 10 MB. We’re not aware of specific issues related to these CAs, though we would be interested to learn if there are, in fact, specific issues.
- http://crl.quovadisglobal.com/hinicag2.crl (99.45 MB)
- http://certificates.godaddy.com/mastergodaddy2issuing.crl (39.39 MB)
- http://httpcrl.trust.telia.com/teliasoneramobileidcav2.crl (21.97 MB)
- https://www.accv.es/fileadmin/Archivos/certificados/accvca120_der.crl (17.45 MB)
- http://www.accv.es/fileadmin/Archivos/certificados/accvca120_der.crl (serves the same file as above)
- http://crl.sectigo.com/SectigoRSADomainValidationSecureServerCA.crl (10.77 MB)
Question 5: Looking at some of the Microsoft PKI Services CRLs relevant to this incident (e.g., http://www.microsoft.com/pkiops/crl/Microsoft%20Azure%20RSA%20TLS%20Issuing%20CA%2007.crl and http://www.microsoft.com/pkiops/crl/Microsoft%20Azure%20RSA%20TLS%20Issuing%20CA%2008.crl ) - the current size (~4KB) is significantly less than the stated 10 MB goal. We understand Microsoft is intending to gradually ramp-up revocations, but the timing of that ramp-up is unclear.
Can you help us understand how Microsoft PKI Services intends to balance its planned revocations and the desire to leave room for additional revocations in case other incidents occur (described in Comment 9)?
Comment 22•9 months ago (Assignee)
Weekly Status Update
We are actively progressing through all repair items identified in the incident report. All action items are currently in progress. We remain on track to meet the expected due dates outlined in the full report.
In addition, we encountered a new revocation-related issue yesterday that resulted in a non-compliance. We will be reporting this through a separate Bugzilla entry and will link it as a related incident once posted.
Comment 23•9 months ago (Assignee)
Revocation Delay Status Update
The number of certificates that have been revoked:
- 347,644
The number of certificates that have not yet been revoked:
- 69,172,142
The number of certificates planned for revocation that have expired:
- 2,554,134
An estimate for when all remaining revocations will be completed:
- We will continue to revoke certificates in batches until 11/15/2025 as mentioned in our FIR
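To put these figures in proportion, a small calculation of the share revoked so far can help. This is a sketch only: the function name is hypothetical, and it assumes the revoked and not-yet-revoked counts are disjoint and that the expired count is reported separately from both.

```python
def revoked_share(revoked: int, not_yet_revoked: int) -> float:
    """Fraction of the revoked-plus-outstanding population already revoked."""
    total = revoked + not_yet_revoked
    return revoked / total if total else 0.0


# Figures from this status update.
share = revoked_share(347_644, 69_172_142)
print(f"{share:.2%} of revoked plus outstanding certificates are revoked")
```

The same function can be applied to the figures in the earlier status update to compare progress between reports.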
Comment 24•9 months ago
(In reply to Microsoft PKI Services from comment #23)
Revocation Delay Status Update
The number of certificates that have been revoked:
- 347,644
The number of certificates that have not yet been revoked:
- 69,172,142
The number of certificates planned for revocation that have expired:
- 2,554,134
An estimate for when all remaining revocations will be completed:
- We will continue to revoke certificates in batches until 11/15/2025 as mentioned in our FIR
Do we have a more detailed revocation plan yet? Currently we seem to be stalling until November when the final certificates will expire instead of being revoked as required.
If there is no intent for these certificates to ever be revoked, why are they listed as 'planned for revocation'?
I am dismayed at the attention this incident is receiving and the lack of pro-activeness. We have regular reports late Friday, and no sign that this is being treated with any severity internally. If we are to learn of the plan through weekly questioning then please advise us in advance so the questions can be more thoroughly worded.
Comment 25•9 months ago (Assignee)
(In reply to Andrew from comment #20)
(In reply to Microsoft PKI Services from comment #16)
Response to Comment 14
(4) Weekly Revocation Batch Strategy
Comment 9 states: Our plan is to revoke certificates in batches on a weekly basis, maintaining a CRL size which does not negatively impact clients, and leaving room for additional revocations in case other incidents occur. Can you please share:
• (a) The criteria used for determining which certificates will be included in each week’s “batch”?
• (b) The CRL size being targeted to accomplish the stated goal of using this batch strategy?
• (c) How Microsoft PKI Services concluded the target size described immediately above will not negatively impact clients?(a) The primary criterion is telemetry that tells us if the certificate is in use or not. A secondary criterion is the certificate expiration date, which allows us to demonstrate revocation of larger batch sizes over time while preventing the CRL from exceeding the target size.
(b) Our goal is for the CRL size to not exceed 10MB.
(c) We used the recommended CRL size from the Windows TRP (10MB) and a large known existing CRL (13.3MB) as reference. As a precaution, we will scale up the revocation batch size over time while observing the impact on clients.Could you provide a link the "Windows TRP" document referenced here?
Is this referring to Program Requirements - Microsoft Trusted Root Program section 3.a.5 "If an AIA extension with a valid OCSP URL is NOT included, then the resulting CRL File should be <10MB." and repeated in 3.c.3.c "Maximum size of the CRL file (either full CRL or partitioned CRL) should not exceed 10M."? I note that both of those are SHOULD recommendations/requirements, not MUST.Which CAs CRLs have you referenced to develop that maximum size goal?
The blog article An analysis of CRL sizes posted mid-last year shows some historical data at or above that limit, following the CRL links in that article allows easy discovery of current examples like Digicert Global G2 TLS RSA SHA256 2020 CA1 (around 12.7MB at 2025-05-31T02:20Z).(5) CRL Partitioning Timeline
We understand that Microsoft PKI Services was aware of its “CRL bloat” concerns related to mass revocation events in February 2025, and presumably earlier. Can you help us understand that given the existence of this concern and the community’s emphasis on improving response to mass revocation events over the past year, Microsoft PKI Services did not move forward with planning (minimally) or implementing partitioned CRLs sooner?
Planning and implementation of CRL partitioning started before this incident and is currently being tested in a non-production environment. However, it has not been completed in time to be a mitigating factor in this incident.
Could you share more detail about your timeline for CRL partitioning, such as when planning began and when the implementation was first in a state where it was considered ready for testing?
While trying to find the "Windows TRP" document referenced above I see Microsoft published guidance for deploying PKI on Windows Server 2003 on or before 2014/05/27 which recommends CRL partitioning. I find it surprising that Microsoft PKI Services have not adopted that guidance internally in the intervening decade.(In reply to Microsoft PKI Services from comment #17)
Response to Comment 13
(1) Learnings from Past Incidents
What learnings did Microsoft PKI take from that and similar incidents to make sure they would be capable of a mass-revocation event in keeping with the timelines in CCADB's Incident Reporting Guidelines?
The 'Timeline' section is remarkably quiet on what Microsoft PKI have been doing for 4 weeks. From statements provided it seems no assessment of the corpus of certificates has even occurred, but will start soon. This is called a 'Final' Incident Report because the work should have been completed already.We have completed our internal assessment of the full corpus of impacted certificates. Due to file size limitations, we were unable to upload this directly to Bugzilla. We have provided access via public blob storage.
Two learnings we had from previous incidents were the criticality of CRL partitioning and the value of reducing certificate lifetime. Work had already started on CRL partitioning before this incident occurred but was not complete in time to be a mitigating factor. On certificate lifetime, we took a first step by reducing the lifetime of new certificates for the majority of subscribers from 1 year to 6 months starting in October 2024.
You mention that you took learnings from previous incidents, however you do not reference any incidents from other CAs in the "Related Incidents" section of your Full Incident Report. The CCADB Full Incident Report template and accompanying explanation of this field explicitly states:
“Related Incidents” MUST consider incidents beyond those corresponding to the CA Owner subject of this report.
Which incident(s) did you review when implementing the October 2024 default lifetime changes? Which incident(s) did you review when planning/implementing the CRL partitioning changes?
This would be useful information to include in the timeline for Contributing Factor 1 as part of your Root Cause Analysis.
Which incident(s) did you review when developing the mass revocation plan for this incident?
While you have mentioned that until recently Microsoft PKI Services only informally monitored and reviewed incidents, based on your responses to this question in comment 13 and question 3 of comment 4 it appears you did use other incidents to guide your response/revocation plan in this instance. Explicit references to the other mass revocation incidents and revocation delay incidents you reviewed should also be included in the Related Incidents section.
(1) Windows TRP link
” Could you provide a link the "Windows TRP" document referenced here? Is this referring to Program Requirements - Microsoft Trusted Root Program section 3.a.5 "If an AIA extension with a valid OCSP URL is NOT included, then the resulting CRL File should be <10MB." and repeated in 3.c.3.c "Maximum size of the CRL file (either full CRL or partitioned CRL) should not exceed 10M."? I note that both of those are SHOULD recommendations/requirements, not MUST.”
Yes, the reference to "Windows TRP" in our response corresponds to the Microsoft Trusted Root Program requirements.
We acknowledge that both of these are "SHOULD" recommendations rather than "MUST" requirements. Our decision to target a CRL size around 10MB aligns with these recommendations and reflects a conservative approach aimed at minimizing potential impact to relying parties during revocation processing. This leaves some space for any revocations that we may need to do for potential problem reports.
(2) Referenced CRLs
"Which CAs CRLs have you referenced to develop that maximum size goal? The blog article An analysis of CRL sizes posted mid-last year shows some historical data at or above that limit, following the CRL links in that article allows easy discovery of current examples like Digicert Global G2 TLS RSA SHA256 2020 CA1 (around 12.7MB at 2025-05-31T02:20Z)."
We referenced CRLs from several widely deployed CAs when evaluating an acceptable maximum size. Specifically, we identified CRLs mentioned in How Big Are CRLs That Are Found In The Wild? | technotes.seastrom.com and in the link you mentioned.
We chose a 10MB target as a conservative threshold, aligning with Windows TRP recommendations and in line with some of the largest CRLs we found in the links mentioned above.
(3) CRL Partitioning Timeline
"Could you share more detail about your timeline for CRL partitioning, such as when planning began and when the implementation was first in a state where it was considered ready for testing? While trying to find the "Windows TRP" document referenced above I see Microsoft published guidance for deploying PKI on Windows Server 2003 on or before 2014/05/27 which recommends CRL partitioning. I find it surprising that Microsoft PKI Services have not adopted that guidance internally in the intervening decade. "
Implementation of CRL partitioning in our CA service started in November 2024. We were already testing the changes in our pre-production environment when this bug was reported, but that testing identified issues, which we are working to resolve.
Specific to your question about the guidance from 2014, the method described in that reference specifies rolling the CA key every year to reduce the CRL size. That method does work to limit CRL size but has many other limitations that prohibit it from being a good option for managing our CA infrastructure. This article uses the word “partitioning” but describes a method different from what we have been discussing in this bug recently.
(4) Related Incidents
"You mention that you took learnings from previous incidents, however you do not reference any incidents from other CAs in the "Related Incidents" section of your Full Incident Report. The CCADB Full Incident Report template and accompanying explanation of this field explicitly states: “Related Incidents” MUST consider incidents beyond those corresponding to the CA Owner subject of this report. Which incident(s) did you review when implementing the October 2024 default lifetime changes? Which incident(s) did you review when planning/implementing the CRL partitioning changes? This would be useful information to include in the timeline for Contributing Factor 1 as part of your Root Cause Analysis."
Thank you for the clarification. We acknowledge the requirement to include relevant incidents from other CAs in the “Related Incidents” section of the Full Incident Report, as defined in the CCADB template guidance.
In relation to the default lifetime changes, multiple factors drove that decision: the evolution of Microsoft’s own internal standards, the evolution of industry requirements, and learnings from past incidents like Bugzilla 1715672. Similarly, CRL partitioning was already in our plans prior to this incident, based on our own internal analysis as well as learnings from incidents like Bugzilla 1715672. As we have outlined in the action items for Bug 1962829, we are formalizing the process for Bugzilla bug reviews, which will not only help us learn from other incidents systematically but will also make correlation of incidents easier.
We will update the “Related Incidents” section and the Root Cause Analysis timeline to reflect this.
(5) Incident Review for Mass Revocation Plan
"Which incident(s) did you review when developing the mass revocation plan for this incident? While you have mentioned that until recently Microsoft PKI Services only informally monitored and reviewed incidents, based on your responses to this question in comment 13 and question 3 of comment 4 it appears you did use other incidents to guide your response/revocation plan in this instance. Explicit references to the other mass revocation incidents and revocation delay incidents you reviewed should also be included in the Related Incidents section."
We referenced the following incidents as part of our response planning and will include them in our Related Incidents section:
1890896 - Entrust: CPS typographical (text placement) error
1910805 - DigiCert: Delayed revocation of 1910322
1715672 - Let's Encrypt: Failure to revoke for Certificate Lifetime Incident
Comment 26 • 9 months ago (Assignee)
Response to Comment 21 - Chrome Root Program
Question 1
”In its response, Microsoft PKI Services stated:
Greater than 99% of impacted certificates are managed through vaults and can be centrally renewed. Can you please share:
- a) Approximately what percent of subscribers have adopted the automated distribution solution?
- b) Approximately what percent of affected certificates are represented by those subscribers?
- c) Whether Microsoft has triggered the automatic renewal and replacement for these subscribers’ certificates?”
a) Based on our analysis to date, which covers 50% of the impacted certificate population, we have confirmed 95% are auto distributed within 5 days of renewal. We will continue analyzing the remaining 50% and will provide an update once that is complete. Note that this is true for the population of the affected certificates, and based on how customer workloads for the subscriber services evolve, this mix could change in the future.
b) The subscribers analyzed so far represent approximately 50% of the affected certificate volume.
c) Microsoft has not triggered a rotation of the affected certificates. Of the affected certificates, ~98% have already been deleted, expired or renewed. >99% of the remaining certs will be automatically rotated before July 31st.
Question 2
” In its response, Microsoft PKI Services stated:
The domain control validation (DCV) method used during these types of certificate renewals is BR Section 3.2.2.4.2 – Email, Fax, SMS, or Postal Mail to Domain Contact.
Given this method has been sunset, can you share what Microsoft PKI Services is planning for certificates issued beginning July 15, 2025?”
We will be using the Email to DNS CAA Contact method outlined in section 3.2.2.4.13 of the BRs starting on July 15, 2025. We also support the DNS Change method as outlined in section 3.2.2.4.7 of the BRs.
Question 3
” In its response, Microsoft PKI Services stated:
Our goal is for our default validity to meet or exceed industry requirements. In addition, we are working with subscribers to request shorter lifetime certificates based on their scenarios.
Can you please share more? This response doesn’t offer actionable detail or directly answer the question posed. Asked differently, considering that 90% of affected certificates were determined not in use at the time of this incident, can that be interpreted to indicate the default validity should be less than six months given observed real-world use of these certificates?”
Yes, based on our investigation, 75% of the impacted certificates could have had a 30 day lifetime based on the lifecycle of the underlying resource using the certificate. We plan to work with subscribers with scenarios like this to move them to 30 day certificates.
Question 4
”Do you have any indication that the not-in-use certificates corresponded to Azure resources that were intentionally short-lived (e.g., someone standing up a test environment for a few days, confirming something, and then deleting it — however in the process the corresponding TLS certificate is orphaned)? Like you, we’re trying to understand the user patterns that resulted in the significant amount of non-use.”
There are 2 major categories of workloads that we have found which are driving a high % of not-in-use certificates, which would benefit from moving to short lived certificates:
- Short-lived customer workloads
- Synthetic testing workloads for customer experience
In these cases, endpoints are created and then deleted in a short period, causing the certs to be created but then no longer used even though they remain valid.
Comment 1
” The Evaluation Criteria included in subsequent updates can be improved by offering concrete objectives. For example “% reduction in public trusted certificates, unexpired certificates” doesn’t offer sufficient detail to understand how Microsoft is evaluating this Action Item, or how a member of the community can help.”
Thank you for the suggestion. We have updated the evaluation criteria for our action items and will include them in our weekly update.
Comment 2
”Studying data disclosed to the CCADB, we do observe CA records trusted in Chrome that disclose a full and complete CRL, but NOT a partitioned array and whose corresponding size is larger than 10 MB. We’re not aware of specific issues related to these CAs, though we would be interested to learn if there are, in fact, specific issues.
- http://crl.quovadisglobal.com/hinicag2.crl (99.45 MB)
- http://certificates.godaddy.com/mastergodaddy2issuing.crl (39.39 MB)
- http://httpcrl.trust.telia.com/teliasoneramobileidcav2.crl (21.97 MB)
- https://www.accv.es/fileadmin/Archivos/certificados/accvca120_der.crl (17.45 MB)
- http://www.accv.es/fileadmin/Archivos/certificados/accvca120_der.crl (serves the same file as above)
- http://crl.sectigo.com/SectigoRSADomainValidationSecureServerCA.crl (10.77 MB)”
When setting the initial targets, we researched CRLs from several widely deployed CAs when evaluating an acceptable maximum size. Specifically, we identified CRLs mentioned in the following articles – How Big Are CRLs That Are Found In The Wild? | technotes.seastrom.com and An analysis of CRL sizes. At the time of analysis, the largest CRL we found in these articles was approximately 13MB. We chose a 10MB target as a conservative threshold, aligning with Windows TRP recommendations, in line with some of the largest CRLs we found in the links mentioned above, and leaving room to grow up to 13 MB.
During our recent revocation efforts, we received an escalation from a Microsoft service reporting that the CRL was too large (5MB at the time). We will continue to follow Windows TRP recommendations and monitor potential impacts closely.
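As a rough illustration of the sizing reasoning above, the sketch below estimates how many revoked-certificate entries fit under a CRL size target. The per-entry size and the fixed overhead are assumptions chosen for illustration, not figures from this incident; real entry sizes depend on serial number length and per-entry extensions such as reason codes. The 10MB target is taken here as 10 * 1024 * 1024 bytes, which is also an assumption.

```python
# Illustrative only: the byte counts below are assumptions, not measured values.
ASSUMED_BYTES_PER_ENTRY = 40    # serial number + revocation date + optional reason code
ASSUMED_FIXED_OVERHEAD = 1_000  # issuer name, update times, extensions, signature


def estimated_size(entries: int,
                   bytes_per_entry: int = ASSUMED_BYTES_PER_ENTRY,
                   overhead: int = ASSUMED_FIXED_OVERHEAD) -> int:
    """Estimate the CRL size in bytes for a given number of revoked entries."""
    return overhead + entries * bytes_per_entry


def max_entries(target_bytes: int,
                bytes_per_entry: int = ASSUMED_BYTES_PER_ENTRY,
                overhead: int = ASSUMED_FIXED_OVERHEAD) -> int:
    """Return how many revoked entries fit within target_bytes."""
    if target_bytes <= overhead:
        return 0
    return (target_bytes - overhead) // bytes_per_entry


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024  # assumed interpretation of the 10MB target
    print("entries under target:", max_entries(ten_mb))
```

Changing `ASSUMED_BYTES_PER_ENTRY` to a value measured from a real CRL would make the estimate more meaningful.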
Question 5
” Looking at some of the Microsoft PKI services CRLs relevant to this incident (e.g., http://www.microsoft.com/pkiops/crl/Microsoft%20Azure%20RSA%20TLS%20Issuing%20CA%2007.crl and http://www.microsoft.com/pkiops/crl/Microsoft%20Azure%20RSA%20TLS%20Issuing%20CA%2008.crl ) the current size (~4KB) is significantly less than the stated 10 MB goal. We understand Microsoft is intending to gradually ramp-up revocations, but the timing of that ramp-up is unclear.”
Please see the attached weekly revocation plan, which details the ramp-up schedule and how many certificates we plan to revoke each week.
Comment 27 • 9 months ago (Assignee)
Revocation Plan
Comment 28 • 9 months ago
(In reply to Microsoft PKI Services from comment #26)
Response to Comment 21 - Chrome Root Program
Question 1
”In its response, Microsoft PKI Services stated:
Greater than 99% of impacted certificates are managed through vaults and can be centrally renewed. Can you please share:
- a) Approximately what percent of subscribers have adopted the automated distribution solution?
- b) Approximately what percent of affected certificates are represented by those subscribers?
- c) Whether Microsoft has triggered the automatic renewal and replacement for these subscribers’ certificates?”
a) Based on our analysis to date, which covers 50% of the impacted certificate population, we have confirmed 95% are auto distributed within 5 days of renewal. We will continue analyzing the remaining 50% and will provide an update once that is complete. Note that this is true for the population of the affected certificates, and based on how customer workloads for the subscriber services evolve, this mix could change in the future.
b) The subscribers analyzed so far represent approximately 50% of the affected certificate volume.
c) Microsoft has not triggered a rotation of the affected certificates. Of the affected certificates, ~98% have already been deleted, expired or renewed. >99% of the remaining certs will be automatically rotated before July 31st.
Q1: By July 31st what percentage of certificates that should have been revoked will Microsoft have revoked as per the plan? Please include certificates that have expired since the start of May in that total as they should have been in the revocation to begin with.
Q2: The sampling is based on 50% of the affected certificates and we're getting results that ~98% are no longer in use. What is the barrier to moving the remaining ~2% to a different intermediary to work around the perceived CRL issue?
Question 3
” In its response, Microsoft PKI Services stated:
Our goal is for our default validity to meet or exceed industry requirements. In addition, we are working with subscribers to request shorter lifetime certificates based on their scenarios.
Can you please share more? This response doesn’t offer actionable detail or directly answer the question posed. Asked differently, considering that 90% of affected certificates were determined not in use at the time of this incident, can that be interpreted to indicate the default validity should be less than six months given observed real-world use of these certificates?”
Yes, based on our investigation, 75% of the impacted certificates could have had a 30 day lifetime based on the lifecycle of the underlying resource using the certificate. We plan to work with subscribers with scenarios like this to move them to 30 day certificates.
That is good to hear.
Q3: Based off of data available so far how many subscribers can be moved to short-lived certificates bypassing the need for revocation entirely?
Q4: Are there any plans in the near future to move these subscribers to short-lived certificates?
Comment 2
”Studying data disclosed to the CCADB, we do observe CA records trusted in Chrome that disclose a full and complete CRL, but NOT a partitioned array and whose corresponding size is larger than 10 MB. We’re not aware of specific issues related to these CAs, though we would be interested to learn if there are, in fact, specific issues.
- http://crl.quovadisglobal.com/hinicag2.crl (99.45 MB)
- http://certificates.godaddy.com/mastergodaddy2issuing.crl (39.39 MB)
- http://httpcrl.trust.telia.com/teliasoneramobileidcav2.crl (21.97 MB)
- https://www.accv.es/fileadmin/Archivos/certificados/accvca120_der.crl (17.45 MB)
- http://www.accv.es/fileadmin/Archivos/certificados/accvca120_der.crl (serves the same file as above)
- http://crl.sectigo.com/SectigoRSADomainValidationSecureServerCA.crl (10.77 MB)”
When setting the initial targets, we researched CRLs from several widely deployed CAs when evaluating an acceptable maximum size. Specifically, we identified CRLs mentioned in the following articles – How Big Are CRLs That Are Found In The Wild? | technotes.seastrom.com and An analysis of CRL sizes. At the time of analysis, the largest CRL we found in these articles was approximately 13MB. We chose a 10MB target as a conservative threshold, aligning with Windows TRP recommendations, in line with some of the largest CRLs we found in the links mentioned above, and leaving room to grow up to 13 MB.
During our recent revocation efforts, we have received an escalation from a Microsoft service regarding the CRL size being too large (5MB at the time). We will continue to follow windows TRP recommendations and monitor potential impacts closely.
Q5: Could Microsoft elaborate on the service that is being impacted by a 5MB CRL? As elaborated there are multiple CAs pushing well past that boundary, and Microsoft's own data says that 10MB is a conservative threshold.
Q6: Are there any known publicly-used services that would be impacted by a CRL going past 10MB, or is this figure solely reliant on an unsourced figure from an old document?
Question 5
” Looking at some of the Microsoft PKI services CRLs relevant to this incident (e.g., http://www.microsoft.com/pkiops/crl/Microsoft%20Azure%20RSA%20TLS%20Issuing%20CA%2007.crl and http://www.microsoft.com/pkiops/crl/Microsoft%20Azure%20RSA%20TLS%20Issuing%20CA%2008.crl ) the current size (~4KB) is significantly less than the stated 10 MB goal. We understand Microsoft is intending to gradually ramp-up revocations, but the timing of that ramp-up is unclear.”
Please see the attached weekly revocation plan which details out the ramp plan for how many certs we plan to revoke on a weekly basis.
(In reply to Microsoft PKI Services from comment #27)
Created attachment 9494207 [details]
Bug1965612_Microsoft PKI Service_Revocation Plan.xlsx
Revocation Plan
Q7: Is there a public version of that revocation plan? The version that is attached does not seem to be intended for public usage.
Comment 29 • 9 months ago (Assignee)
Revocation Plan CSV
Comment 30 • 9 months ago (Assignee)
Weekly Status Update
We are actively working on all repair items associated with this incident. In addition, we have updated the evaluation criteria per suggestion in Comment 21 to better align with CCADB guidance. After further review, we updated the due date of the last action item.
Action Items
| Action Item | Kind | Root Cause(s) | Updated Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 11/15/2025 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 11/15/2025 | In Progress |
| Stand up cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 9/30/2025 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 7/31/2025 | In Progress |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 9/30/2025 | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 09/01/2025 | In Progress |
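
To make the "partitioned CRLs" action item concrete, the following is a minimal sketch of one way a CA could assign each certificate to a CRL partition at issuance and derive the CDP URL to embed. It is not a description of Microsoft's design: the hash-based assignment, the partition count, and the URL pattern are all hypothetical and chosen only for illustration.

```python
import hashlib


def partition_for_serial(serial_hex: str, partition_count: int) -> int:
    """Map a certificate serial number (even-length hex string) to a partition index.

    Hashing the serial is a hypothetical assignment scheme that spreads
    certificates roughly evenly across partitions.
    """
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    digest = hashlib.sha256(bytes.fromhex(serial_hex)).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def cdp_url(base_url: str, serial_hex: str, partition_count: int) -> str:
    """Build the CRL Distribution Point URL for a certificate (hypothetical URL pattern)."""
    index = partition_for_serial(serial_hex, partition_count)
    return f"{base_url}/{index}.crl"


if __name__ == "__main__":
    # Example values are made up; crl.example.com is a placeholder host.
    print(cdp_url("http://crl.example.com/ca1", "0a1b2c3d", 16))
```

Because the partition is fixed when the certificate is issued, a later revocation only grows the one partition named in that certificate's CDP, which is what keeps each individual CRL file small.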
Related Incidents
Additionally, we mentioned in Comment 25 that we will include additional incidents in our “Related Incidents” section. Please see the updated section below:
| Bug | Date | Description |
|---|---|---|
| 1962829 | 2025-04-25 | Microsoft PKI Services: Policy document bug. Microsoft PKI Services introduced a typographical error in CPS Version 3.2.4 while reformatting the document, incorrectly stating that keyEncipherment is not present in Subscriber certificates. This contradicts longstanding practice and affects still-active certificates tied to the superseded document. |
| 1890896 | 2024-04-10 | Entrust CPS policyQualifier error. This incident involved a typographical error in Entrust’s CPS that mistakenly added a policyQualifier (cpsURI) requirement to OV TLS certificates—resulting in over 6,000 misissued certificates. Although the issue stemmed from documentation rather than technical controls, the outdated or inaccurate CPS language led to non-compliance. |
| 1910805 | 2024-07-30 | DigiCert delayed revocation due to TRO. This incident involved DigiCert’s delayed revocation of certificates originally identified in Bug 1910322 (CNAME validation error). Due to a Temporary Restraining Order (TRO), revocation—which should have occurred within 24 hours under the Baseline Requirements—was delayed by five days. |
| 1715672 | 2021-06-09 | Let’s Encrypt certificate validity issue. This incident involved Let’s Encrypt (ISRG) issuing certificates valid for 90 days plus one second due to a CPS timestamp calculation issue, in violation of the CA/Browser Forum Baseline Requirements. ISRG opted not to revoke the certificates, as they determined revocation would not benefit the Web PKI. |
Comment 31 • 9 months ago (Assignee)
Response to Comment 24 from Wayne
"Do we have a more detailed revocation plan yet? Currently we seem to be stalling until November when the final certificates will expire instead of being revoked as required.
If there is no intent for these certificate to ever be revoked, why are they listed as 'planned for revocation'?
I am dismayed at the attention this incident is receiving and the lack of pro-activeness. We have regular reports late Friday, and no sign that this is being treated with any severity internally. If we are to learn of the plan through weekly questioning then please advise us in advance so the questions can be more thoroughly worded."
We acknowledge the concerns raised and want to clarify that Microsoft remains fully committed to revoking as many of the affected certificates as we can while managing the CRL size constraints described in our full incident report.
Earlier this week, we published an updated revocation plan with scheduled batches through November. Revocations began on May 28, 2025, and certificates marked as “planned for revocation” are actively queued for upcoming batches. We also acknowledge your request for figures on certificates that had already expired before we began revocations, and will provide them in the next update.
We recognize that our responses have been largely following the 7-day response window due to the number of bugs we are concurrently managing and hope to shorten this to 3 days in the future.
We appreciate the feedback and will continue improving the clarity of our weekly updates to provide better visibility into progress and planning.
Comment 32 • 9 months ago (Assignee)
Revocation Delay Status Update
- the number of certificates that have been revoked: 400,000
- the number of certificates that have not yet been revoked: 64,687,168
- the number of certificates planned for revocation that have expired: 2,737,202
- an estimate for when all remaining revocations will be completed: we will continue to revoke certificates in batches until 11/15/2025 as mentioned in our FIR
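
For readers who want these counts as proportions, the sketch below computes each category's share of the combined total. It assumes, for illustration only, that the three counts are disjoint; the update does not state whether expired certificates are also included in the not-yet-revoked figure.

```python
# Figures copied from the status update; treating them as disjoint is an assumption.
STATUS = {
    "revoked": 400_000,
    "not_yet_revoked": 64_687_168,
    "expired_before_revocation": 2_737_202,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's fraction of the combined total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no certificates counted")
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    for name, fraction in shares(STATUS).items():
        print(f"{name}: {fraction:.2%}")
```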
Comment 33 • 9 months ago
(In reply to Microsoft PKI Services from comment #29)
Created attachment 9494215 [details]
Bug1965612_Microsoft PKI Service_Revocation Plan.csv
Revocation Plan CSV
There is a rather concerning line in this plan that requires far more information:
*Note: There is a company wide change advisory that may impact our ability to revoke this week. We will provide further details once we have that clarity.
This is regarding a revocation period of 2025-10-27 to 2025-11-02.
Q1: Are we to interpret that as Microsoft PKI not being able to handle revocation for a week due to an org-wide freeze? More details would be appreciated, even if absolute clarity is not available yet.
Q2: Has this happened before?
Q3: If this has happened before, where was the inability to handle revocation disclosed in any of your prior audits?
Q4: Given this is the 3rd-last 'revocation week', what exactly is stopping an increase in revocations up to this date to make it irrelevant?
The action items note:
Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025
Q5: Is this plan now ready, and can we see it?
The current plan is showing 15 million certificates will be eventually revoked by November, while 56 million will be left to expire.
Q6: Can Microsoft PKI give examples of any prior incident where this has occurred, never mind was considered acceptable practice?
Q7: Will Microsoft PKI be advising the Microsoft Root Store that this is to be considered the high standard to be held against all other CAs they govern?
Q8: Can Microsoft PKI explain why other Root Programs should take this plan in good faith, in spite of CRL evidence to the contrary and no change in plans appearing to date?
Comment 34 • 9 months ago
Thank you for surfacing that out of the spreadsheet and into the discussion, Wayne. I agree that it needs elaboration.
Question: in the event of a “change advisory”, are there other duties of a CA, beyond revocation for this incident, that Microsoft will not be performing? Specifically, will other revocations be performed in keeping with the BRs, and will Microsoft continue to issue certificates during that time?
Comment 35 • 9 months ago (Assignee)
Response to Comment 28 – Wayne
Question 1
“Q1: By July 31st what percentage of certificates that should have been revoked will Microsoft have revoked as per the plan? Please include certificates that have expired since the start of May in that total as they should have been in the revocation to begin with.”
Of the affected certs that would otherwise have expired by August 3rd, we would have revoked 16% (7% of the overall population of affected certificates).
Question 2
“Q2: The sampling is based on 50% of the affected certificates and we're getting results that ~98% are no longer in use. What is the barrier to moving the remaining ~2% to a different intermediary to work around the perceived CRL issue?”
As previously mentioned in the Lessons Learned section of our Full Incident Report, we do not have ICAs available to move our subscribers to. As a repair action, we are working on standing up warm standby ICAs for future use in such circumstances.
Question 3
” In its response, Microsoft PKI Services stated:
Our goal is for our default validity to meet or exceed industry requirements. In addition, we are working with subscribers to request shorter lifetime certificates based on their scenarios.
Can you please share more? This response doesn’t offer actionable detail or directly answer the question posed. Asked differently, considering that 90% of affected certificates were determined not in use at the time of this incident, can that be interpreted to indicate the default validity should be less than six months given observed real-world use of these certificates?”
“Yes, based on our investigation, 75% of the impacted certificates could have had a 30 day lifetime based on the lifecycle of the underlying resource using the certificate. We plan to work with subscribers with scenarios like this to move them to 30 day certificates.
That is good to hear.”
“Q3: Based off of data available so far how many subscribers can be moved to short-lived certificates bypassing the need for revocation entirely?”
To obviate the need for revocation, we will need to move the subscribers to 7-day (or less) validity certificates, for which we do not yet have a defined plan. As shared previously, we are working on identifying and moving workloads to 30-day certificates in the interim.
Question 4
"Q4: Are there any plans in the near future to move these subscribers to short-lived certificates?"
We currently do not have short-term plans to move workloads to certificates with a validity of 7 days or less. We are working on moving eligible workloads to 30-day certificates and then to progressively shorter lifetimes.
Question 5
”Studying data disclosed to the CCADB, we do observe CA records trusted in Chrome that disclose a full and complete CRL, but NOT a partitioned array and whose corresponding size is larger than 10 MB. We’re not aware of specific issues related to these CAs, though we would be interested to learn if there are, in fact, specific issues.
- http://crl.quovadisglobal.com/hinicag2.crl (99.45 MB)
- http://certificates.godaddy.com/mastergodaddy2issuing.crl (39.39 MB)
- http://httpcrl.trust.telia.com/teliasoneramobileidcav2.crl (21.97 MB)
- https://www.accv.es/fileadmin/Archivos/certificados/accvca120_der.crl (17.45 MB)
- http://www.accv.es/fileadmin/Archivos/certificados/accvca120_der.crl (serves the same file as above)
- http://crl.sectigo.com/SectigoRSADomainValidationSecureServerCA.crl (10.77 MB)
When setting the initial targets, we researched CRLs from several widely deployed CAs when evaluating an acceptable maximum size. Specifically, we identified CRLs mentioned in the following articles – How Big Are CRLs That Are Found In The Wild? | technotes.seastrom.com and An analysis of CRL sizes. At the time of analysis, the largest CRL we found in these articles was approximately 13MB. We chose a 10MB target as a conservative threshold, aligning with Windows TRP recommendations, in line with some of the largest CRLs we found in the links mentioned above, and leaving room to grow up to 13 MB.
During our recent revocation efforts, we have received an escalation from a Microsoft service regarding the CRL size being too large (5MB at the time). We will continue to follow windows TRP recommendations and monitor potential impacts closely.
Q5: Could Microsoft elaborate on the service that is being impacted by a 5MB CRL? As elaborated there are multiple CAs pushing well past that boundary, and Microsoft's own data says that 10MB is a conservative threshold.
One Microsoft service which has clients on memory constrained Android devices has reported end user failures when processing a ~5MB CRL. This incident confirmed the need to carefully manage CRL sizes to avoid end-user impact. These observations do not alter our revocation plans. Rather, they inform our batch sizing to ensure revocation in conformance with our plan while maintaining broad client compatibility.
Question 6
"Q6: Are there any known publicly-used services that would be impacted by a CRL going past 10MB, or this figure solely reliant on unsourced figure from an old document?"
Please see the response to Question 5.
Question 7
(In reply to Microsoft PKI Services from comment #27)
Created attachment 9494207 [details]
Bug1965612_Microsoft PKI Service_Revocation Plan.xlsx
Revocation Plan
Q7: Is there a public version of that revocation plan? The version that is attached does not seem to be intended for public usage.
Thank you for pointing this out. The revocation plan was corrected and republished that same day.
| Assignee | ||
Comment 36•9 months ago
Response to Comment 33 - Wayne
Question 1
"Q1: Are we to interpret that as Microsoft PKI not being able to handle revocation for a week due to an org-wide freeze? More details would be appreciated, even if absolute clarity is not available yet."
We understand the concern and appreciate the opportunity to clarify. The comment in the plan — "may impact our ability to revoke this week" — was not intended to indicate that revocation would be paused or unavailable. Revocation remains a critical function, and our systems and teams are equipped to execute it throughout the advisory period.
The company-wide change advisory referenced is part of our internal change management process. These advisories introduce additional oversight to ensure that any changes made during sensitive operational windows are executed safely and deliberately. The note was included out of an abundance of caution while we evaluate the optimal path to proceed without introducing risk to adjacent systems (which may include revoking the targeted certs in the week prior).
Question 2
"Q2: Has this happened before?"
Company-wide change advisories are regularly scheduled events within Microsoft’s change management process. These advisories introduce additional oversight but do not prevent critical operations such as certificate revocation. Revocations have always been permitted during these periods. This is not a new or exceptional situation, and we have not experienced an advisory that has blocked or delayed our ability to revoke certificates.
Question 3
"Q3: If this has happened before, where was the inability to handle revocation disclosed in any of your prior audits?"
There have been no instances where a change advisory prevented or delayed our ability to perform required revocations.
Question 4
"Q4: Given this is the 3rd-last 'revocation week', what exactly is stopping an increase in revocations up to this date to make it irrelevant?"
Thanks for the suggestion. We will consider it as an option.
Question 5
"Q5: Is this plan now ready, and can we see it?"
Outlined below is a high-level plan for setting up Warm Standby Certificate Authorities (CAs) to ensure continuity and rapid response in case of CA revocation. The plan includes the creation, cross-signing, distribution, readiness timeline, and usage policy for the standby CAs.
- CA Creation: Create warm standby RSA and ECC Certificate Authorities (CAs) from the Microsoft G1 root that meet all the expected baseline requirements. CCADB will be updated with the CA entries. (Mid July)
- Cross-Signing: Obtain cross-signatures for the newly created CAs from DigiCert, following the same process as used for existing CAs. CCADB will be updated after cross-signing. (Early August)
- Roll-out of CRL partitioning to CAs: Issue a small batch of certs. CRL partitioning can be verified using CT log entries for this small batch of certs. (September)
- Distribution to MS Fleet: Distribute the CAs through the internal distribution pipeline to ensure availability and integration within the existing Subscriber infrastructure. (Mid August – Early October)
- Microsoft Fleet Ready to Consume Certificates: Ensure that the CAs are fully ready in production to start issuing publicly trusted TLS certificates by 10/15/2025. We plan to adopt the standby practice for all future iterations of CA creation.
- Usage Policy: These CAs are designated solely for standby purposes. They will only be activated in scenarios where existing CAs need to be revoked.
Question 6
"Q6: Can Microsoft PKI give examples of any prior incident where this has occurred, nevermind was considered acceptable practice?"
As previously noted in this bug, we acknowledge that our response plan deviates from the Baseline Requirements, and we are not presenting it as acceptable precedent.
As outlined in the full incident report, having CRL partitioning in place and/or having ready warm standby CAs would have allowed us to meet a more aggressive timeline for revocations, and both of those are part of our repair actions.
Question 7
"Q7: Will Microsoft PKI be advising the Microsoft Root Store that this is to be considered the high standard to be held against all other CAs they govern?"
No. We are not presenting our current revocation approach as a standard or as a benchmark for others. Our focus is on remediating the issue as responsibly and transparently as possible, not redefining expectations for root programs.
The Microsoft Trusted Root Program is operated independently from Microsoft PKI Services. Like other Root Programs, it sets its own requirements and enforcement expectations. We continue to support consistent application of Root Program policies and acknowledge that this incident highlights areas where our internal controls and infrastructure must improve.
Question 8
"Q8: Can Microsoft PKI explain why other Root Programs should take this plan in good faith, in spite of CRL evidence to the contrary and no change in plans appearing to date?"
We started at a lower number of certificates at the start of the revocations, so CRL sizes remained small. Since then, our revocations have been progressively ramping up.
| Assignee | ||
Comment 37•9 months ago
Response to Comment 34 - Mike Shaver
"Thank you for surfacing that out of the spreadsheet and into the discussion, Wayne. I agree that it needs elaboration.
Question: in the event of a “change advisory”, are there other duties of a CA, beyond revocation for this incident, that Microsoft will not be performing? Specifically, will other revocations be performed in keeping with the BRs, and will Microsoft continue to issue certificates during that time?"
We appreciate the follow-up. As noted in our response to Comment 33, the change advisory introduces additional oversight—not a freeze—and does not prevent revocation activity related to this incident.
To clarify further: all other CA duties, including unrelated revocations and certificate issuance, will continue during this period in accordance with the Baseline Requirements. The advisory does not limit our ability to meet our obligations as a publicly trusted CA.
| Assignee | ||
Comment 38•9 months ago
Weekly Update
We are actively progressing through all repair items identified in the incident report. All action items are currently in progress. We remain on track to meet the expected due dates outlined in the full incident report.
| Assignee | ||
Comment 39•9 months ago
Revocation Delay Status Update
- the number of certificates that have been revoked:
- 399,350
- the number of certificates that have not yet been revoked:
- 64,687,168
- the number of certificates planned for revocation that have expired:
- 2,737,852
- Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 40•9 months ago
Weekly Update
We are actively progressing through all repair items identified in the incident report. All action items are currently in progress. We remain on track to meet the expected due dates outlined in the full incident report.
| Assignee | ||
Comment 41•9 months ago
Revocation Delay Status Update
- the number of certificates that have been revoked this week:
- 600,960
- the number of certificates that have not yet been revoked:
- 58,197,100
- the number of certificates planned for revocation that have expired:
- 2,248,735
- Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
Comment 42•9 months ago
This is the fifth status update and I'm still unsure on the methodology or math involved.
Could Microsoft PKI please talk us through how they arrived at these figures? Comment 32 and 39 are especially odd, but overall I don't see the math quite adding up.
Comment 43•8 months ago
We have a few follow-up questions that will help us manage CRLite effectively in light of this incident:
1. Which issuing CAs and respective quantities were involved?
Please provide the names of the issuing CAs and quantities of affected certificates for those CAs.
2. For each of those issuing CAs, what percentage of certificates will be revoked vs. not revoked?
This information will help us assess CRL size and plan for distribution of revocation information using CRLite.
3. Has Microsoft considered revoking any of the issuing CAs involved?
If so, this could simplify our response because revocation at the ICA level would allow us to manage this incident via OneCRL, avoiding the scalability challenges of enumerating revocation information for millions of certificates through CRLite.
Thanks.
| Assignee | ||
Comment 44•8 months ago
Weekly Update
We are actively progressing through all repair items identified in the incident report. All action items are currently in progress. We remain on track to meet the expected due dates outlined in the full incident report.
| Assignee | ||
Comment 45•8 months ago
Response to Comment 42 - Wayne
Thanks, Wayne — we appreciate your continued engagement and the opportunity to clarify.
We acknowledge two key issues in our prior reporting which could be contributing to lack of clarity on the math:
- (1) Cumulative vs. Weekly Totals: Our previous updates reported weekly figures, which may have caused confusion. Moving forward, we will report cumulative totals to provide clearer visibility.
- (2) Duplicate Data on 6/20/2025: The numbers shared on that date were inadvertently duplicated.
The corrected figures for 6/20/2025 are:
- Revoked: 399,350
- Total: 61,245,835
- Expired Planned: 3,041,333
- Remaining Active: 57,805,152
As of this week (7/3/2025), our cumulative figures are:
- Total certificates revoked (planned to date): 2,558,954 (2,558,644)
- Remaining active certificates (total affected): 55,020,856 (72,070,777)
- Total certificates expired and not revoked: 14,490,967
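The arithmetic behind these figures can be checked directly. The following is a minimal sketch rather than any part of our reporting tooling: the helper name `unaccounted` is hypothetical, and the inputs are copied from the figures listed above. A result of zero means the revoked, expired, and remaining buckets add up to the stated total.

```python
def unaccounted(total: int, revoked: int, expired: int, remaining: int) -> int:
    """Return how many certificates the three buckets fail to account for."""
    return total - (revoked + expired + remaining)


# Corrected 6/20/2025 figures from this comment.
june_20 = unaccounted(
    total=61_245_835, revoked=399_350, expired=3_041_333, remaining=57_805_152
)

# Cumulative 7/3/2025 figures, measured against the total affected count (72,070,777).
july_3 = unaccounted(
    total=72_070_777, revoked=2_558_954, expired=14_490_967, remaining=55_020_856
)

print(june_20, july_3)  # both differences are 0
```

The same check can be applied to any later weekly update by substituting that week's three figures.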
| Assignee | ||
Comment 46•8 months ago
Revocation Delay Status Update
As mentioned in Comment 45 here are the revocation delay status updates:
- Total certificates revoked (planned to date): 2,558,954 (2,558,644)
- Remaining active certificates (total affected): 55,020,856 (72,070,777)
- Total certificates expired and not revoked: 14,490,967
| Assignee | ||
Comment 47•8 months ago
Response to Comment 43 - Ben
Question 1:
"Which issuing CAs and respective quantities were involved?
Please provide the names of the issuing CAs and quantities of affected certificates for those CAs."
| Issuing and Intermediate CAs | Impacted Certs |
|---|---|
| Microsoft Azure RSA TLS Issuing CA 04 | 26,342,303 |
| Microsoft Azure RSA TLS Issuing CA 07 | 24,014,300 |
| Microsoft Azure RSA TLS Issuing CA 03 | 26,328,523 |
| Microsoft Azure RSA TLS Issuing CA 08 | 23,637,853 |
Question 2:
"For each of those issuing CAs, what percentage of certificates will be revoked vs. not revoked?
This information will help us assess CRL size and plan for distribution of revocation information using CRLite."
These numbers represent revocations from May 28th through November 15th, as projected in our revocation plan:
| Issuing and Intermediate CAs | % Revoked | % Not Revoked |
|---|---|---|
| Microsoft Azure RSA TLS Issuing CA 04 | 16% | 84% |
| Microsoft Azure RSA TLS Issuing CA 07 | 15% | 85% |
| Microsoft Azure RSA TLS Issuing CA 03 | 16% | 84% |
| Microsoft Azure RSA TLS Issuing CA 08 | 15% | 85% |
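For CRL sizing purposes, the two tables above can be combined into approximate revoked-entry counts per issuing CA. This is a minimal sketch under stated assumptions: the counts and percentages are copied from the tables, the percentages are rounded to whole numbers, and the helper name `projected_revocations` is hypothetical, so the results are rough estimates rather than plan figures.

```python
# Impacted certificate counts per issuing CA, copied from the Question 1 table.
impacted = {
    "Microsoft Azure RSA TLS Issuing CA 04": 26_342_303,
    "Microsoft Azure RSA TLS Issuing CA 07": 24_014_300,
    "Microsoft Azure RSA TLS Issuing CA 03": 26_328_523,
    "Microsoft Azure RSA TLS Issuing CA 08": 23_637_853,
}

# Projected share revoked by November 15th, copied from the Question 2 table.
percent_revoked = {
    "Microsoft Azure RSA TLS Issuing CA 04": 16,
    "Microsoft Azure RSA TLS Issuing CA 07": 15,
    "Microsoft Azure RSA TLS Issuing CA 03": 16,
    "Microsoft Azure RSA TLS Issuing CA 08": 15,
}


def projected_revocations(count: int, percent: int) -> int:
    """Approximate number of CRL entries, rounded down to a whole certificate."""
    return count * percent // 100


projected = {
    ca: projected_revocations(impacted[ca], percent_revoked[ca]) for ca in impacted
}

for ca, entries in projected.items():
    print(f"{ca}: ~{entries:,} revoked entries")
print(f"All four CAs: ~{sum(projected.values()):,} revoked entries")
```

Because the published percentages are rounded, each per-CA estimate could be off by roughly half a percentage point of that CA's impacted count.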
Question 3:
"Has Microsoft considered revoking any of the issuing CAs involved?
If so, this could simplify our response because revocation at the ICA level would allow us to manage this incident via OneCRL, avoiding the scalability challenges of enumerating revocation information for millions of certificates through CRLite."
Yes, we have considered revoking the ICAs. As mentioned in our Full Incident Report – Lessons Learned, ultimately we do not have a warm standby cross-signed ICA to move subscribers to.
| Assignee | ||
Comment 48•8 months ago
Weekly Status Update
We would like to request a change to the cadence of our action item updates for this bug. Several of the action items currently tracked are not due for several months, and as such, we propose to provide our next update on Friday, August 1st, unless action items are completed sooner.
Please note that this change would only apply to the action item updates. We will continue to provide weekly updates on our revocation progress as usual.
| Action Item | Kind | Root Cause(s) | Updated Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 11/15/2025 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 11/15/2025 | In Progress |
| Stand up cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 9/30/2025 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 7/31/2025 | In Progress |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 9/30/2025 | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 09/01/2025 | In Progress |
Let us know if there are any concerns with this approach. Thank you.
| Assignee | ||
Comment 49•8 months ago
Revocation Delay Status Update
- Total certificates revoked (planned to date):
- 3,358,954 (3,358,644)
- Remaining active certificates (total affected):
- 51,822,781 (72,070,777)
- Total certificates expired and not revoked (to date):
- 16,517,878
- Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
Comment 50•8 months ago
We’d like to improve our understanding of Microsoft PKI Services' ability to stand up new Issuing CAs given comments made in this Incident Report.
Background:
- This Incident Report was opened on May 09, 2025.
- The high-level plan for setting up "Warm Standby" CAs provided in Comment 36 initially described a project completion date of October 15, 2025. This was subsequently updated in Comment 48 to September 30, 2025.
- Given these dates, it appears it will take Microsoft PKI Services approximately 145 days (from the opening of the report to September 30, 2025) to stand up a fleet of new issuing CAs that are considered usable.
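For reference, the interval can be reproduced with simple date arithmetic. This is a minimal sketch with illustrative variable names; the dates are the ones cited in the background above. The elapsed difference to September 30 is 144 days, or 145 when both the opening date and the target date are counted.

```python
from datetime import date

report_opened = date(2025, 5, 9)     # Incident Report opened
initial_target = date(2025, 10, 15)  # completion date given in Comment 36
revised_target = date(2025, 9, 30)   # completion date given in Comment 48

# Elapsed days between the opening of the report and each target date.
days_to_revised = (revised_target - report_opened).days
days_to_initial = (initial_target - report_opened).days

print(f"To revised target: {days_to_revised} days elapsed")
print(f"To initial target: {days_to_initial} days elapsed")
```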
Questions:
(1) Is Microsoft PKI Services satisfied with the approximately 145-day timeline for standing up a fleet of new issuing CAs, as outlined in this bug?
- (a) If so, how does this align with Microsoft PKI Services' internal targets for operational readiness and community expectations for rapid response in a crisis or for routine CA rotation?
- (b) If not, what specific, measurable plans are being implemented to significantly reduce this timeline for future deployments, and what are the updated target completion dates for these improvements?
(2) The inability to immediately revoke the ICAs subject of this Incident Report (as discussed in Comments 9 and 47) due to the absence of cross-signed "Warm Standby" CAs highlights a critical dependency.
- (a) Can Microsoft PKI Services share more detail on the specific barriers or dependencies that caused this extended ICA deployment timeline?
- (b) What concrete steps are being taken to ensure a significantly more rapid deployment capability in the future, should similar circumstances repeat?
- (c) What is the new expected timeline for standing up cross-signed "Warm Standby" CAs under urgent conditions, and what factors will influence this?
(3) Can you help us quantify the specific risk(s) of moving forward with the use of the new ICAs prior to cross-certification given Microsoft PKI Services’ independent trust status?
(4) Can you explain how Microsoft PKI Services weighed the trade-offs of the above described risks (e.g., impact on subscribers if revocation of the (a) leafs or (b) corresponding ICAs subject of this discussion occurred without cross-signed warm standby CAs) against instead more quickly aligning with ecosystem expectations via revoking ICA certificates, as required by Baseline Requirements 4.9.1.1?
(5) Beyond addressing the current incident, what are Microsoft PKI Services' proactive, ongoing plans for routinely standing up and rotating new issuing CAs to practice continuous improvement in its operational practices while also enhancing cryptographic agility?
(6) This report, when considered with Bug 1974592 raises additional concern regarding the operational rigor and maturity of Microsoft PKI Services' existing ICA creation process. How does Microsoft PKI Services plan to address the combined implications of both incident reports regarding CA agility and operational robustness? If discussion is better scoped to 1974592, that works for us.
| Assignee | ||
Comment 51•8 months ago
Response to Comment 50 - Chrome Root Program
Question 1 a & b:
"(1) Is Microsoft PKI Services satisfied with the approximately 145-day timeline for standing up a fleet of new issuing CAs, as outlined in this bug?
• (a) If so, how does this align with Microsoft PKI Services' internal targets for operational readiness and community expectations for rapid response in a crisis or for routine CA rotation?
• (b) If not, what specific, measurable plans are being implemented to significantly reduce this timeline for future deployments, and what are the updated target completion dates for these improvements?"
Thank you for raising this important point. Microsoft PKI Services is not satisfied with the current ~145-day timeline to stand up a fleet of new issuing CAs. The primary driver is the lack of available fit-for-purpose CAs. Fit for purpose here also includes the ability to support trust on legacy devices, which requires us to cross-sign these CAs. This is adding delays to our current plan. Going forward, our plan is to eliminate these delays by always having warm standby CAs available, to which we can quickly switch our subscribers if an incident requires revocation of the active ICAs.
Question 2 a-c:
"(2) The inability to immediately revoke the ICAs subject of this Incident Report (as discussed in Comments 9 and 47) due to the absence of cross-signed "Warm Standby" CAs highlights a critical dependency.
• (a) Can Microsoft PKI Services share more detail on the specific barriers or dependencies that caused this extended ICA deployment timeline?
• (b) What concrete steps are being taken to ensure a significantly more rapid deployment capability in the future, should similar circumstances repeat?
• (c) What is the new expected timeline for standing up cross-signed "Warm Standby" CAs under urgent conditions, and what factors will influence this?"
Due to a need to support clients on legacy devices which do not trust the Microsoft PKI CAs, we currently rely on ICA-level cross-signing to establish that trust. In this case, the absence of pre-established, cross-signed warm standby ICAs prevented immediate ICA revocation. Further, the need for new cross-signing adds to the time required to stand up the new ICAs. While we are working on getting new ICAs from our existing root cross-signed, for our next-generation root we are shifting strategy to cross-sign at the root level. Once we have migrated our subscribers to this new root (target Q2 CY26), this approach will allow us to eliminate the time required for cross-signing when standing up new ICAs in emergency situations. Please note that our migration plan already includes creation of warm standby ICAs for this new root, so as we migrate subscribers to ICAs from this new root, there will always be warm standby ICAs available.
In response to (c), as stated above, our plan is to eliminate the need for cross-signed ICAs in the future. Once we have completed migration of our subscribers to our new cross-signed root, we will no longer need cross-signing at the ICA level.
Question 3:
"(3) Can you help us quantify the specific risk(s) of moving forward with the use of the new ICAs prior to cross-certification given Microsoft PKI Services’ independent trust status?"
We estimate that 4% of traffic to subscriber services originates from legacy devices which do not trust our current root. This requires us to issue certificates from a cross-signed CA at this time.
Question 4:
"(4) Can you explain how Microsoft PKI Services weighed the trade-offs of the above described risks (e.g., impact on subscribers if revocation of the (a) leafs or (b) corresponding ICAs subject of this discussion occurred without cross-signed warm standby CAs) against instead more quickly aligning with ecosystem expectations via revoking ICA certificates, as required by Baseline Requirements 4.9.1.1?"
We considered the following factors when evaluating ICA revocation impacts:
- Lack of Alternatives: At the time of the incident, we did not have any other ICAs available to transition subscribers to.
- Scope of Non-Compliance: The ICAs themselves were not universally non-compliant—only certificates issued during a specific window were affected. Revoking the ICAs would have invalidated both compliant and non-compliant certificates, unnecessarily disrupting active, valid use cases.
For leaf level revocation, revoking tens of millions of leaf certificates would have resulted in CRLs exceeding 600MB, which many clients cannot process—leading to revocation checking failures and degraded reliability across the ecosystem. Ultimately, we chose a phased leaf revocation plan to contain the impact to affected certificates while preserving service continuity and working toward service and operational improvements which will enable faster, standards-aligned responses in the future.
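As a rough illustration of how revocation volume translates into CRL size and partition count, here is a minimal sketch. It is not our sizing model: the 35 bytes per entry is an assumed average chosen for illustration (real entry size depends on serial number length and any entry extensions), the 20 million entry count is likewise illustrative, and the function names are hypothetical. The 10 MB ceiling is the target discussed earlier in this bug.

```python
import math

# ASSUMPTION: average encoded size of one revoked-certificate entry, for illustration only.
ASSUMED_BYTES_PER_ENTRY = 35

# 10 MB target discussed earlier in this bug.
TARGET_CRL_BYTES = 10 * 1000 * 1000


def estimated_crl_bytes(revoked_entries: int,
                        bytes_per_entry: int = ASSUMED_BYTES_PER_ENTRY) -> int:
    """Estimate CRL size from the entry count, ignoring fixed header and signature overhead."""
    return revoked_entries * bytes_per_entry


def partitions_needed(revoked_entries: int,
                      max_bytes: int = TARGET_CRL_BYTES,
                      bytes_per_entry: int = ASSUMED_BYTES_PER_ENTRY) -> int:
    """Smallest number of evenly filled partitions that keeps each one under max_bytes."""
    size = estimated_crl_bytes(revoked_entries, bytes_per_entry)
    return max(1, math.ceil(size / max_bytes))


entries = 20_000_000  # illustrative entry count for a single issuing CA
size_mb = estimated_crl_bytes(entries) / 1_000_000
print(f"~{size_mb:.0f} MB unpartitioned, {partitions_needed(entries)} partitions at 10 MB")
```

Under these assumptions a single unpartitioned CRL lands in the hundreds of megabytes, which is the order of magnitude that motivates both batch sizing and CRL partitioning.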
Question 5:
"(5) Beyond addressing the current incident, what are Microsoft PKI Services' proactive, ongoing plans for routinely standing up and rotating new issuing CAs to practice continuous improvement in its operational practices while also enhancing cryptographic agility?"
We are adopting a continuous readiness model where warm standby ICAs are always maintained and replaced immediately upon activation. This ensures we are regularly exercising the full lifecycle, from creation to deployment, rather than only reacting to incidents or relying on long ICA validity periods.
Question 6:
"(6) This report, when considered with Bug 1974592 raises additional concern regarding the operational rigor and maturity of Microsoft PKI Services' existing ICA creation process. How does Microsoft PKI Services plan to address the combined implications of both incident reports regarding CA agility and operational robustness? If discussion is better scoped to 1974592, that works for us."
The issue reported in Bug 1974592 was an implementation bug found in a feature we extended to CA creations just prior to the creation of the impacted ICAs, and we have identified the associated repair actions in that bug. Further discussion on those repair actions is likely best suited for that bug.
| Assignee | ||
Comment 52•8 months ago
Weekly Status update:
We would like to follow up our earlier request regarding action item update cadence: the remaining action items are mostly targeted for September and beyond, and as such, we propose to provide our next action item update on Friday, August 1, unless any are completed sooner.
This change would apply only to action item updates. We will continue providing weekly updates on revocation progress as usual.
Please let us know if this cadence is acceptable.
| Assignee | ||
Comment 53•8 months ago
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 4,108,951 (4,158,644)
Remaining active certificates (total affected):
- 48,251,935 (72,070,777)
Total certificates expired and not revoked (to date):
- 19,288,724
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
Comment 54•8 months ago
(In reply to Microsoft PKI Services from comment #52)
Weekly Status update:
We would like to follow up our earlier request regarding action item update cadence: the remaining action items are mostly targeted for September and beyond, and as such, we propose to provide our next action item update on Friday, August 1, unless any are completed sooner.
Mozilla has concerns about the sufficiency of the current action items and intends to propose modifications and/or additions—potentially including topics such as accelerating the timeframe by which Microsoft will shorten certificate lifetimes, improved details to Microsoft's mass revocation planning, implementation of sharded/partitioned CRLs, and redundant use of diversified issuing CAs.
Comment 55•8 months ago
I think Ben is correct. This progress seems slow, like we are supposed to forget and let the issue continue.
Delayed revocations were a big reason other CAs were distrusted. Microsoft are of course too-big-to-fail so Mozilla (or Google or Apple or Microsoft on the 'trust' storing department) will not dis-trust them. Really they should, but it is obvious they will not.
Still, Microsoft could show some fake concern and revoke faster, or, as Ben suggests, move now to short-lived certs. Microsoft control all the keys here, they are all 'managed' by Microsoft - every single one is within Microsoft or part of Microsoft.
Why can Microsoft not issue 47-day certificates now?
Comment 56•8 months ago
(In response to Comment 51...)
General comments:
- We’re having a hard time identifying direct responses to questions asked in Comment 50. We’d encourage Microsoft PKI Services to more directly address comments and questions from the community going forward.
- We share the concerns communicated in comments 54 and 55 regarding the delivery schedule of Microsoft PKI Services’ Action Items.
Follow-up questions:
(Q1) The response to “Question 2 a-c” emphasizes Microsoft PKI Services’ plan to change its existing cross-certification relationship with DigiCert from cross-certifying ICAs to instead a next generation root. This change is described as targeted for completion in Q2 2026. Can Microsoft please provide a more specific and measurable plan, to include key milestones, assumptions, and dependencies that if not met, would shift this timeline to a later date? Particularly of interest is when this new root(s) will be established, and at what point there will no longer be a dependency on the existing hierarchies for issuance and validation.
(Q2) How should root store operators and members of the public consider Microsoft PKI Services’ response to this incident as an indicator for how it intends to reliably uphold community expectations going forward?
(Q3) Can Microsoft PKI Services directly confirm that in the absence of the above described root-level cross-certificate, all subsequently created ICAs will have the same risk as those affected by this incident? Said differently, can Microsoft PKI Services acknowledge that the immediate action to stand-up a fleet of cross-certified “warm stand-by” CAs may not reliably meet the intended goal if for some reason it’s later identified that those CAs are flawed in some way?
(Q4) Can Microsoft PKI Services explain why cross-certifying the existing, in-use Microsoft roots was not considered a simpler and more robust solution than continuing to cross-sign leaf-issuing intermediates?
(Q5) In response to Question 3, Microsoft PKI Services stated: “We estimate that 4% of traffic to subscriber services originates from legacy devices which do not trust our current root. This necessitates the need to issue certificates from a cross-signed CA at this time.” Can you please share how you determined 4% of traffic originates from devices that do not trust the current root(s)?
(Q6) If not the 4% described above, can Microsoft PKI Services share the threshold it would otherwise consider acceptable to move forward without the cross-certificate(s)?
(Q7) Can Microsoft PKI Services explain why the delayed revocation of the CA certificates responsible for issuing the misissued TLS server authentication certificates (subject of Bug 1962829) until new ICAs are cross-signed should not be interpreted as Microsoft PKI Services prioritizing the reduction of subscriber impact over its obligations to the TLS Baseline Requirements?
(Q8) The response to Comment 50 Question 5 does not provide sufficient detail to help us understand, in practical terms, how Microsoft PKI Services is planning to establish new ICAs, or how it plans to rotate issuance to new CAs once established. Can you please provide more specificity and directly address the question?
| Assignee | ||
Comment 57•8 months ago
Weekly Status Update
We are actively making progress on the action items identified in the full incident report. No changes to status at this time.
| Assignee | ||
Comment 58•8 months ago
Response to Comment 55 - JR Moir
In relation to reducing certificate lifetimes, MS PKI Services currently supports 1-month certificates, but the certificate validity period is chosen by subscribers based on their cadence and constraints. Our current plan for enforcing shorter certificate lifetimes follows the timeline outlined in Ballot SC-081v3.
| Assignee | ||
Comment 59•8 months ago
Response to Comment 54 - Ben Wilson
We will focus on providing more details for these topics. We would be happy to consider any repair actions that you would like to propose. In the meantime, we will continue to provide the action item updates on a weekly basis.
| Assignee | ||
Comment 60•8 months ago
Revocation Delay Status Update
- Total certificates revoked (planned to date):
- 4,676,112 (4,958,644)
- Remaining active certificates (total affected):
- 44,451,262 (72,070,777)
- Total certificates expired and not revoked (to date):
- 22,289,397
- Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 61•8 months ago
|
||
Response to Comment 56 - Chrome Root Program
Question 1
"(Q1) The response to “Question 2 a-c” emphasizes Microsoft PKI Services’ plan to change its existing cross-certification relationship with DigiCert from cross-certifying ICAs to instead a next generation root. This change is described as targeted for completion in Q2 2026. Can Microsoft please provide a more specific and measurable plan, to include key milestones, assumptions, and dependencies that if not met, would shift this timeline to a later date? Particularly of interest is when this new root(s) will be established, and at what point there will no longer be a dependency on the existing hierarchies for issuance and validation."
- We have already created the cross-signed root (published in CCADB), and are in the process of creating and deploying ICAs from this new root. This is expected to complete in early August.
- We have a dependency on CRL partitioning being available on these ICAs before we make them available for enrollment. CRL partitioning is targeted to be available by late October, at which point the ICAs will be available for enrollment to our subscribers. Although technically we do not have to wait for CRL sharding to start enrollment, issuing certificates from the new CAs without it would make them vulnerable to the same issues as the existing CAs.
- Subscriber migration is expected to complete by April 2026, at which point issuance from existing ICAs will cease.
Question 2
"(Q2) How should root store operators and members of the public consider Microsoft PKI Services’ response to this incident as an indicator for how it intends to reliably uphold community expectations going forward?"
We discovered gaps in our readiness to deal with a revocation event at this scale. Based on these gaps, we have identified action items to address those gaps (CRL partitioning, stand-by CAs, eliminating long lead time for new ICAs creation by eliminating need for cross-signing the ICAs, and mass revocation playbook as required by the Mozilla root program requirements). Microsoft remains committed to uphold the CAB/F and TRP requirements.
Question 3
"(Q3) Can Microsoft PKI Services directly confirm that in the absence of the above described root-level cross-certificate, all subsequently created ICAs will have the same risk as those affected by this incident? Said differently, can Microsoft PKI Services acknowledge that the immediate action to stand-up a fleet of cross-certified “warm stand-by” CAs may not reliably meet the intended goal if for some reason it’s later identified that those CAs are flawed in some way?"
The major limiting factor in not being able to revoke in a timely manner was lack of CRL partitioning on the existing CAs. The Warm Stand-bys are planned to have CRL partitioning. So will not suffer from the same issues.
Further, our plan is to stop using cross-signed ICAs and move to issuance from ICAs off the newly cross-signed G2 root. Once the ICAs from this new root are available for enrollment, if we discover issues in the future with the current or the newly cross-signed (warm stand-by) ICAs from the G1 root, we will accelerate migration of the workload to the ICAs from the cross-signed G2 root.
Question 4
"(Q4) Can Microsoft PKI Services explain why cross-certifying the existing, in-use Microsoft roots was not considered a simpler and more robust solution than continuing to cross-sign leaf-issuing intermediates?"
Even prior to this incident, our existing plan was to deprecate issuance from the cross signed G1 ICAs (since those are expiring in August 2026) and replace them with the ICAs off the G2 CA. The plan for cross-signing of the G2 CA was already in flight and we chose to rely on that plan as the primary path. That said, the idea of cross signing the G1 Microsoft root has merit and we will consider it for future readiness.
Question 5
"(Q5) In response to Question 3, Microsoft PKI Services stated: “We estimate that 4% of traffic to subscriber services originates from legacy devices which do not trust our current root. This necessitates the need to issue certificates from a cross-signed CA at this time.” Can you please share how you determined 4% of traffic originates from devices that do not trust the current root(s)?"
The 4% estimate is based on analysis of aggregated, non-identifying telemetry data for major Microsoft subscriber services over multi-week periods. We analyzed browser and platform trust data to determine which clients trust the Microsoft Gen1 root hierarchy.
Our methodology included:
- Identifying the earliest versions of major platforms (Windows, macOS, iOS, Android, Firefox, Chrome, Edge) that trust the Gen1 roots.
- Mapping browser traffic to these trust anchors using user-agent strings and platform metadata.
- Categorizing traffic from clients that either do not trust the Gen1 roots or do not disclose trust anchor information.
Question 6
"(Q6) If not the 4% described above, can Microsoft PKI Services share the threshold it would otherwise consider acceptable to move forward without the cross-certificate(s)?"
The decisions related to legacy device support for Microsoft services are business decisions which are owned by the respective services. That said, our plan to cross-sign our G2 CA at the root level, and cease issuance from the G1 cross-signed ICAs will obviate the need for cross signing any additional ICAs in the future.
Question 7
"(Q7) Can Microsoft PKI Services explain why the delayed revocation of the CA certificates responsible for issuing the misissued TLS server authentication certificates (subject of Bug 1962829) until new ICAs are cross-signed should not be interpreted as Microsoft PKI Services prioritizing the reduction of subscriber impact over its obligations to the TLS Baseline Requirements?"
Revocation of the ICAs was considered as an option, but revoking the ICAs would have impacted active subscriber certificates which were not mis-issued (with no alternate available for them).
Question 8
"(Q8)The response to Comment 50 Question 5 does not provide sufficient detail to help us understand, in practical terms, how Microsoft PKI Services is planning to establish new ICAs, or how it plans to rotate issuance to new CAs once established. Can you please provide more specificity and directly address the question?"
In relation to the ICAs from the G2 CAs: we are creating double the number of required ICAs. Half of them will be used for issuance, and the other half will not (those will serve as Warm Stand-bys for the G2 ICAs). Subscribers will be migrated to the new CAs in a staged fashion. If an incident requires rotation and revocation, we can run emergency campaigns with all of our subscribers to complete that activity.
We are interpreting this question as “what is our capability to migrate issuance to new CAs for all our subscribers”. If that is not the intent of the question, please clarify.
Comment 62•8 months ago
|
||
After reading most comments, I am interested in the technical details of the revocations.
According to http://www.microsoft.com/pkiops/crl/Microsoft%20Azure%20RSA%20TLS%20Issuing%20CA%2007.crl, the revocation entry for one certificate needs about 50 bytes, so a 10 MB CRL contains only about 200,000 entries. As mentioned in the file "Bug1965612_Microsoft PKI Service_Revocation Plan.csv", 800,000 certificates will be revoked each week. Since a total of 4 ICAs are involved, in the best case each ICA has 200,000 certificates to be revoked per week.
For better understanding, let's consider the revocation of certificates issued by only one ICA, namely 200,000 certificates per week, and assume that 20% of the revoked certificates will expire in the next CRL issuing period. Then we have the following data:
- Week 1: 200,000 entries in CRL (10 MB)
- Week 2: 40,000 (20% of 200,000) certificates expired, then we have 160,000 remaining entries + 200,000 new entries = 360,000 entries (18MB)
- Week 3: 72,000 (20% of 360,000) certificates expired, then we have 288,000 remaining + 200,000 new entries = 488,000 entries (24.4MB)
- Week 4: about 590,000 entries in total (29.5MB)
- and so on.
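The week-by-week figures above can be reproduced with a short simulation. This is a minimal sketch that uses only the assumptions stated in this comment (about 50 bytes per entry, 200,000 new revocations per week for one ICA, 20% of listed certificates expiring each week); the function names are illustrative.

```python
# Assumptions taken from the comment above: ~50 bytes per CRL entry,
# 200,000 new revocations per week for one ICA, and 20% of the listed
# certificates expiring (and dropping off the CRL) each week.
BYTES_PER_ENTRY = 50
NEW_PER_WEEK = 200_000
EXPIRY_RATE = 0.20


def crl_entries_by_week(weeks):
    """Return the number of CRL entries at the end of each week."""
    entries = 0
    history = []
    for _ in range(weeks):
        expired = round(entries * EXPIRY_RATE)  # entries removed after expiry
        entries = entries - expired + NEW_PER_WEEK
        history.append(entries)
    return history


def size_mb(entries):
    """Approximate CRL size in megabytes (1 MB = 1,000,000 bytes)."""
    return entries * BYTES_PER_ENTRY / 1_000_000


for week, entries in enumerate(crl_entries_by_week(4), start=1):
    print(f"Week {week}: {entries:,} entries ({size_mb(entries):.1f} MB)")
```

Under these assumptions the entry count keeps growing toward NEW_PER_WEEK / EXPIRY_RATE = 1,000,000 entries, or roughly 50 MB, which is five times the 10 MB target.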
The maximum size of 10 MB per CRL only holds if all the certificates in the CRL expire within the next week. That raises the question: what is the point of revoking only the certificates that will expire shortly, but not the certificates with longer validity periods?
Comment 63•8 months ago
|
||
And just for correctness:
"Additional considerations: Most Subscribers of the certificates issued by the CA require support for TLS 1.2, which requires keyEncipherment to be set as per RFC 5246: "keyEncipherment bit MUST be set if the key usage extension is present)." While this does not excuse the typographical mistake, it helps re-enforce that that this was a typo for a setting that was never planned to be changed."
RFC 5246 does not require keyEncipherment to be set in an RSA certificate. The requirement applies only to the key exchange algorithms "RSA and RSA_PSK", not to "DHE_RSA and ECDHE_RSA".
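To make the distinction concrete, here is a small sketch of the key usage bit that RFC 5246 (section 7.4.2) expects in an RSA server certificate for each of the four key exchange algorithms named above, when the key usage extension is present. The dictionary and helper names are illustrative and not taken from any library.

```python
# Key usage bit expected by RFC 5246, section 7.4.2, for an RSA server
# certificate when the key usage extension is present.
# Only the four key exchange algorithms discussed in this thread are listed.
REQUIRED_KEY_USAGE = {
    "RSA": "keyEncipherment",
    "RSA_PSK": "keyEncipherment",
    "DHE_RSA": "digitalSignature",
    "ECDHE_RSA": "digitalSignature",
}


def usable_key_exchanges(key_usage_bits):
    """Return the key exchange algorithms permitted by the given key usage bits."""
    return sorted(
        kx for kx, bit in REQUIRED_KEY_USAGE.items() if bit in key_usage_bits
    )


# A certificate with only digitalSignature still works for (EC)DHE suites.
print(usable_key_exchanges({"digitalSignature"}))
# A certificate with both bits also supports static RSA key exchange.
print(usable_key_exchanges({"digitalSignature", "keyEncipherment"}))
```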
Comment 64•7 months ago
|
||
(Responding to Comment 61)
Thank you for your response to our questions in Comment 56. A few additional follow-ups are listed below.
In response to Question 1 of Comment 56, Microsoft PKI Services stated:
“Subscriber migration is expected to complete by April 2026, at which point issuance from existing ICAs will cease.”
(Q1) What stops Microsoft PKI Services from accomplishing this migration sooner?
In response to Question 2 of Comment 56, Microsoft PKI Services stated:
“We discovered gaps in our readiness to deal with a revocation event at this scale. Based on these gaps, we have identified action items to address those gaps (CRL partitioning, stand-by CAs, eliminating long lead time for new ICAs creation by eliminating need for cross-signing the ICAs, and mass revocation playbook as required by the Mozilla root program requirements). Microsoft remains committed to uphold the CAB/F and TRP requirements.”
We struggle to reconcile this statement with prior knowledge and Microsoft's own history.
-
Microsoft has been aware of the benefits of CRL partitioning since at least May 2014 when it published guidance recommending it for Windows Server 2003 PKI deployments. Again, it is surprising that Microsoft PKI Services had not adopted this guidance internally in the intervening decade.
-
Microsoft PKI Services’ public response to the Mozilla Policy 3.0 Survey focused on “mass revocation” readiness and challenges cited concerns related to CRL bloat.
-
The TLS Baseline Requirements have always included expectations for timely certificate revocation, and these expectations have been increasingly emphasized within the community over the past year (e.g., discussions within the CA/Browser Forum, Mozilla’s Mass Revocation Policy and surrounding discussions, and here in Bugzilla). Microsoft's current prolonged revocation plan appears to contradict these long-standing and recently amplified expectations.
From our view and when considering the above, Microsoft PKI Services’ handling of this incident depicts an organization that was operating in a capacity where it was (and seemingly still is) unprepared to take steps necessary to adhere to the expectations described in the TLS Baseline Requirements. If it was not already aware of these shortcomings when Bug 1962829 was disclosed, it raises significant concerns about Microsoft PKI Services’ long-standing operational readiness and reliability when considering the inherent risks posed to the public-trust ecosystem.
(Q2) Can you offer more substantial commentary, or even better, enact more meaningful change that demonstrates Microsoft PKI Services’s commitment to reliably upholding the public-trust requirements? (we offer some examples below)
In Comment 14 of this incident report we asked “We understand that Microsoft PKI Services was aware of its “CRL bloat” concerns related to mass revocation events in February 2025, and presumably earlier. Can you help us understand that given the existence of this concern and the community’s emphasis on improving response to large revocation events over the past year, Microsoft PKI Services did not move forward with planning (minimally) or implementing partitioned CRLs sooner?”
The response was “Planning and implementation of CRL partitioning started before this incident and is currently being tested in a non-production environment. However, it has not been completed in time to be a mitigating factor in this incident.”
(Q3) We’d like to understand why CRL partitioning was “not completed in time to be a mitigating factor in this incident.” Can you please explain this in more detail?
In response to Question 3 of Comment 56, Microsoft PKI Services stated:
“The major limiting factor in not being able to revoke in a timely manner was lack of CRL partitioning on the existing CAs. The Warm Stand-bys are planned to have CRL partitioning. So will not suffer from the same issues.”
(Q4) This only appears true once all leafs are migrated to an ICA with partitioned CRLs. Is there something that we are missing?
In response to Question 7 of Comment 56, Microsoft PKI Services stated:
“Revocation of the ICAs was considered as an option, but revoking the ICAs would have impacted active subscriber certificates which were not mis-issued (with no alternate available for them).”
This does not directly address the question presented to Microsoft PKI Services.
However, the response to Question 6 states:
“The decisions related to legacy device support for Microsoft services are business decisions which are owned by the respective services. That said, our plan to cross-sign our G2 CA at the root level, and cease issuance from the G1 cross-signed ICAs will obviate the need for cross signing any additional ICAs in the future.”
We interpret this to indicate that Microsoft PKI Services is allowing external needs (i.e., “business decisions which are owned by the respective services.”) to take precedence over its obligations to the TLS Baseline Requirements.
This response also appears to ignore that non-Microsoft PKI Services CA service providers could be an option for the affected subscribers.
(Q5) The responses in Comment 56 do not address the (mis?)perception that Microsoft PKI Services is misprioritizing its responsibilities. We will again ask for Microsoft PKI Services to explain why its response to this incident should not be interpreted as prioritizing subscriber needs over its obligations to the TLS Baseline Requirements as a publicly-trusted CA Owner?
In response to Question 8 of Comment 56, Microsoft PKI Services stated:
“We are interpreting this question as “what is our capability to migrate issuance to new CAs for all our subscribers”. If that is not the intent of the question, please clarify.”
(Q6) This question was to understand how you will in practice migrate subscribers across issuing CAs. For example, GlobalSign describes rotating ICAs on a quarterly basis. With this clarification, does your answer change?
In response to Comment 58:
Microsoft PKI Services stated: “In relation to reducing certificate lifetimes, MS PKI Services currently supports 1 month certificates. But the certificate validity period is chosen by the subscribers based on their cadence and constraints. Our current plan for enforcing shorter certificate lifetimes follows the timeline outlined in Ballot SC-081v3.”
Despite supporting 1-month certificates and allowing validity to be chosen by subscribers, approximately 90% of the certificates affected by Bug 1962829 were determined by Microsoft as not in use. That suggests the existing approach could and should be improved.
As one possible alternative, Microsoft PKI Services could issue short-lived certificates (i.e., those that do not need to be revoked) by default, and issue longer-lived certificates only when explicitly requested by the applicant, for the validity requested.
(Comment) Given the circumstances of this report and Microsoft’s response, we strongly encourage Microsoft PKI Services to more aggressively pursue a remedy to this incident that includes a reduction of validity well in advance of the timelines included in SC-081 as a demonstration of its commitment to promoting agility, resilience, and improved security across the ecosystem.
Comment 65•7 months ago
|
||
This comment follows up on Comment #54. While we are pleased by MPS’s commitment to implement CRL partitioning and provision standby ICAs, we remain concerned that the current set of action items may not fully address the operational gaps that led to the delayed revocations and their impacts on the broader ecosystem.
MPS has already championed shorter certificate lifetimes by transitioning a number of users to six-month certificates by default, Comment #6. And in its responses, MPS has said that it is evaluating the use of short-lived certificates, Comment #16, and it has also discussed efforts to migrate a large fraction of the certificates it issues to 30-day lifetimes, Comment #26.
Mozilla requests that MPS commit to these efforts as part of its formal Action Items, with a clear timetable.
Specifically, we would like MPS to commit to concrete steps to increase the adoption of 30-day certificates, including:
- adoption targets with clear evaluation criteria for success; and
- specific actions to promote subscriber adoption, such as making 30-day lifetimes the default issuance profile.
In parallel, we would like MPS to make “Short-lived Subscriber Certificates”—as defined in the TLS Baseline Requirements (≤10 days until March 15, 2026; ≤7 days thereafter)— available as a profile, and if suitable, the default option for short-lived cloud deployments. As outlined in section 4.9.1.1 of the TLS BRs, MPS wouldn’t need to provide any revocation services for such certificates, thereby improving both scalability and resilience.
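The thresholds quoted above can be expressed as a simple check. This is a rough sketch only: it uses notBefore as a stand-in for the issuance date and a plain notAfter minus notBefore difference, whereas the TLS BRs define the validity period precisely (including how the endpoints are counted), so boundary cases may differ by a second. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Cutover date quoted above: <=10 days before March 15, 2026; <=7 days after.
CUTOVER = datetime(2026, 3, 15, tzinfo=timezone.utc)


def max_short_lived_validity(not_before):
    """Maximum validity for a short-lived certificate issued at not_before.

    Assumption: notBefore approximates the issuance date.
    """
    return timedelta(days=7) if not_before >= CUTOVER else timedelta(days=10)


def is_short_lived(not_before, not_after):
    """True if the validity span fits within the short-lived threshold."""
    return (not_after - not_before) <= max_short_lived_validity(not_before)


issued = datetime(2025, 8, 1, tzinfo=timezone.utc)
print(is_short_lived(issued, issued + timedelta(days=10)))  # 10-day cert in 2025
print(is_short_lived(issued, issued + timedelta(days=30)))  # 30-day cert in 2025
```

A 30-day profile does not qualify under either threshold, so it would still need revocation services; only the 10-day or 7-day profile removes that dependency.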
Given the high volume of unused certificates reported, MPS’s recent informal commitments, and control over its issuance and deployment infrastructure, we believe these steps are not only achievable, but also would significantly enhance agility and reduce reliance on large-scale revocation in the event of future incidents.
Additionally, to help Mozilla assess MPS’s alignment with our expectations and readiness improvements, we’d appreciate learning additional, clarifying details, as follows:
A. Mass Revocation Planning
Mozilla requires that MPS adopt a Mass Revocation Plan on or before September 1, 2025. The newly adopted section 5.7.1.2 of the TLS BRs requires that by December 1 MPS include a statement in its CPS that MPS maintains a Mass Revocation Plan. In addition to these requirements, MPS must perform annual operational testing and incorporate lessons learned into the plan.
The plan must cover plan activation criteria, customer contact mechanisms, differentiation of automated and manual steps, time-based objectives for triage and revocation, subscriber notifications, role assignments, training, testing methods, and post-event or post-test analysis.
Can MPS:
(1) share more detail about the structure, testing approach, and frequency for its mass revocation plan;
(2) confirm that its mass revocation plan includes the foregoing required components;
(3) describe the testing methodology used (e.g., simulations, tabletops);
(4) indicate whether and how the Plan and CPS updates are being adopted before the required deadlines; and
(5) share how it intends to validate its readiness internally or through audit processes?
B. Partitioned/Sharded CRLs and G2 Root-Based Hierarchy Migration
Bug comments indicate that CRL partitioning is a gating factor for deploying the G2-based ICAs into production, and we understand that G2 ICA creation is expected to be completed in early August. That milestone is imminent.
To better understand feasibility and preparedness in meeting MPS’s deadlines:
(1) can MPS share the specific design being used for partitioning (e.g. based on serial number, time, hash, etc.)?
(2) has MPS scheduled its key ceremony, and can a specific target date be shared?
(3) how many ICAs will be created?
(4) what are the target dates in late October for MPS’s deployment under the new G2 hierarchy?
(5) besides CRL partitioning, what other dependencies are there on deploying the ICAs into production/standby?
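As an illustration of one of the designs named in question (1), the sketch below assigns a certificate to a CRL partition by hashing its serial number. It is not a description of MPS's design, which has not been disclosed in this bug; the shard count, URL pattern, and function names are all hypothetical.

```python
import hashlib


def crl_shard_for_serial(serial_number, shard_count):
    """Map a certificate serial number to a CRL shard index (hash-based)."""
    length = max(1, (serial_number.bit_length() + 7) // 8)
    digest = hashlib.sha256(serial_number.to_bytes(length, "big")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce.
    return int.from_bytes(digest[:8], "big") % shard_count


def crl_url(ica_name, shard):
    """Hypothetical CRL distribution point URL for a shard (example domain)."""
    return f"http://crl.example.test/{ica_name}/{shard}.crl"


serial = 0x1A2B3C4D5E6F
shard = crl_shard_for_serial(serial, shard_count=16)
print(crl_url("example-ica-01", shard))
```

Whatever scheme is chosen, the partition has to be fixed at issuance, because the certificate's CRL Distribution Point must point to the partition that will later list it. That is why certificates already issued from the existing ICAs cannot simply be moved onto partitioned CRLs.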
Again, we encourage MPS to update the Action Items section of its incident report to reflect any newly confirmed timelines or improvements, including those in response to community feedback and our current requests.
Comment 66•7 months ago
|
||
The Microsoft root program says that all certificates need to have CRL or OCSP information. Even though it would be very nice to have certificates that do not need revocation, it is not possible now.
| Assignee | ||
Comment 67•7 months ago
|
||
Response to Comment 62 - Lijun Liao
Thank you for the thoughtful analysis Lijun. You are correct that the CRL size and certificate expiration timelines are critical for our revocation strategy for this bug. Our approach to revoking certificates in weekly batches was designed to balance several competing priorities:
- CRL Size Management: As noted in Comment 16, we are aligning with the Windows Trusted Root Program’s recommendation to maintain CRL sizes at or below 10MB.
- Certificate Expiration: By revoking certificates that are nearing expiration, we ensure that they fall off the CRLs to make space of more certificates to be revoked. This allows us to maximize the revocations while operating within the CRL size constraints.
| Assignee | ||
Comment 68•7 months ago
|
||
Response to Comment 63 - Lijun Liao
We acknowledge the correction, and thank you for it. You're right that RFC 5246 only requires keyEncipherment for RSA and RSA_PSK key exchange, not for DHE_RSA or ECDHE_RSA.
| Assignee | ||
Comment 69•7 months ago
|
||
Weekly Status Update
We are actively making progress on the action items identified in the full incident report. Also, action item #4 has been marked as complete.
| Action Item | Kind | Root Cause(s) | Updated Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 11/15/2025 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 11/15/2025 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 9/30/2025 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 7/31/2025 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 9/30/2025 | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 09/01/2025 | In Progress |
| Assignee | ||
Comment 70•7 months ago
|
||
Revocation Delay Status Update
- Total certificates revoked (planned to date):
- 5,396,112 (5,758,644)
- Remaining active certificates (total affected):
- 40,401,165 (72,070,777)
- Total certificates expired and not revoked (to date):
- 25,539,494
- Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
Comment 71•7 months ago
|
||
(In reply to Microsoft PKI Services from comment #67)
Response to Comment 62 - Lijun Liao
Thank you for the thoughtful analysis Lijun. You are correct that the CRL size and certificate expiration timelines are critical for our revocation strategy for this bug. Our approach to revoking certificates in weekly batches was designed to balance several competing priorities:
- CRL Size Management: As noted in Comment 16, we are aligning with the Windows Trusted Root Program’s recommendation to maintain CRL sizes at or below 10MB.
- Certificate Expiration: By revoking certificates that are nearing expiration, we ensure that they fall off the CRLs to make space of more certificates to be revoked. This allows us to maximize the revocations while operating within the CRL size constraints.
If the certificates will expire in one week, no revocation is needed (similar to short-lived certificates). You can skip the revocation; it has no real effect. So my understanding is that this revocation strategy serves only to say that "Microsoft is taking the revocation action".
I would go in another direction. Since only 10% of the certificates are active, I would use the limited resources to revoke these active certificates. Instead of revoking all certificates that expire within one week, I might have the capacity to revoke the active certificates that expire in the next n weeks (where n seems to be between 4 and 10).
Comment 72•7 months ago
|
||
(In reply to Microsoft PKI Services from comment #67)
- Certificate Expiration: By revoking certificates that are nearing expiration, we ensure that they fall off the CRLs to make space of more certificates to be revoked. This allows us to maximize the revocations while operating within the CRL size constraints.
In my understanding, it is an incorrect assumption that the expiration of a certificate automatically exempts you from the obligation to track it in a revocation list. The BR (7.2.2) even recommends (SHOULD) updating revocation entries (including dates) if new information about the compromise becomes known.
Comment 73•7 months ago
|
||
At what point do we realize that Microsoft has no intent to meet the revocation timeline and is just slow-walking revocation? More certificates are expiring than are being revoked. CRL size is simply an excuse.
It's also interesting that browsers do not seem to care. Questions are asked and not answered, but no action is taken.
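The comparison between expirations and revocations can be checked against the two Revocation Delay Status Updates posted so far (Comment 60 and Comment 70). A small sketch using only the figures from those comments; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusUpdate:
    revoked: int            # total certificates revoked
    planned: int            # revocations planned to date
    remaining: int          # remaining active certificates
    expired_unrevoked: int  # expired without being revoked


TOTAL_AFFECTED = 72_070_777
comment_60 = StatusUpdate(4_676_112, 4_958_644, 44_451_262, 22_289_397)
comment_70 = StatusUpdate(5_396_112, 5_758_644, 40_401_165, 25_539_494)

# Change between the two updates.
newly_revoked = comment_70.revoked - comment_60.revoked
newly_expired = comment_70.expired_unrevoked - comment_60.expired_unrevoked

print(f"Newly revoked: {newly_revoked:,}")
print(f"Newly expired without revocation: {newly_expired:,}")
print(f"Expired per revoked certificate: {newly_expired / newly_revoked:.1f}")
print(f"Share of affected certificates revoked so far: "
      f"{comment_70.revoked / TOTAL_AFFECTED:.1%}")
```

Between the two updates, 720,000 certificates were revoked while 3,250,097 expired without revocation, roughly 4.5 expirations for every revocation.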
Comment 74•7 months ago
|
||
If this is the “Secure Future Initiative”[1][2] that Microsoft promised to its customers and the general public after heavy criticism from the security community, US Senators and others following multiple security disasters, then maybe it should be questioned whether cloud providers and OS/browser vendors should be allowed to run public Web PKI CAs in the first place. There are many apparent conflicts of interest surfacing here.
What is the consensus on this question of principle? This probably has been discussed in the past.
[1] https://www.bleepingcomputer.com/news/microsoft/microsoft-pledges-to-bolster-security-as-part-of-secure-future-initiative/
[2] https://www.microsoft.com/en-us/trust-center/security/secure-future-initiative
| Assignee | ||
Comment 75•7 months ago
|
||
Response to Comment 64 – Chrome Root Program
Question 1
"In response to Question 1 of Comment 56, Microsoft PKI Services stated:
“Subscriber migration is expected to complete by April 2026, at which point issuance from existing ICAs will cease.”
(Q1) What stops Microsoft PKI Services from accomplishing this migration sooner?"
April ’26 was our high confidence date for completing the migration. We are accelerating work, and our new target is by end of Feb ’26. We will look to accelerate further as we implement that plan. The major publicly trackable milestones are below, and we will provide updates to the community as we hit those milestones:
- Issuance of certificates from G2 ICAs with partitioned CRLs begins
- Issuance of certificates from G1 ICAs with partitioned CRLs begins
- Issuance fully migrated to partitioned CRLs (G1 or G2)
We will also provide regular updates on burndown for G1 to G2 transition.
Question 2
"In response to Question 2 of Comment 56, Microsoft PKI Services stated:
“We discovered gaps in our readiness to deal with a revocation event at this scale. Based on these gaps, we have identified action items to address those gaps (CRL partitioning, stand-by CAs, eliminating long lead time for new ICAs creation by eliminating need for cross-signing the ICAs, and mass revocation playbook as required by the Mozilla root program requirements). Microsoft remains committed to uphold the CAB/F and TRP requirements.”
We struggle to reconcile this statement with prior knowledge and Microsoft's own history.
• Microsoft has been aware of the benefits of CRL partitioning since at least May 2014 when it published guidance recommending it for Windows Server 2003 PKI deployments. Again, it is surprising that Microsoft PKI Services had not adopted this guidance internally in the intervening decade.
• Microsoft PKI Services’ public response to the Mozilla Policy 3.0 Survey focused on “mass revocation” readiness and challenges cited concerns related to CRL bloat.
• The TLS Baseline Requirements have always included expectations for timely certificate revocation, and these expectations have been increasingly emphasized within the community over the past year (e.g., discussions within the CA/Browser Forum, Mozilla’s Mass Revocation Policy and surrounding discussions, and here in Bugzilla). Microsoft's current prolonged revocation plan appears to contradict these long-standing and recently amplified expectations.
From our view and when considering the above, Microsoft PKI Services’ handling of this incident depicts an organization that was operating in a capacity where it was (and seemingly still is) unprepared to take steps necessary to adhere to the expectations described in the TLS Baseline Requirements. If it was not already aware of these shortcomings when Bug 1962829 was disclosed, it raises significant concerns about Microsoft PKI Services’ long-standing operational readiness and reliability when considering the inherent risks posed to the public-trust ecosystem.
(Q2) Can you offer more substantial commentary, or even better, enact more meaningful change that demonstrates Microsoft PKI Services’s commitment to reliably upholding the public-trust requirements? (we offer some examples below)"
In addition to the already committed repairs, we are also committing to the below repair actions –
- Accelerate migration to ICAs with partitioned CRLs (see response to Q1)
- Accelerate the reduction of default certificate lifetimes (see response to last comment)
- Plan for frequent ICA rotations to maintain operational readiness and crypto agility (Q6).
With these changes in place, we will be in a much improved state to be able to respond to revocation events at a scale such as this.
Question 3
"In Comment 14 of this incident report we asked “We understand that Microsoft PKI Services was aware of its “CRL bloat” concerns related to mass revocation events in February 2025, and presumably earlier. Can you help us understand that given the existence of this concern and the community’s emphasis on improving response to large revocation events over the past year, Microsoft PKI Services did not move forward with planning (minimally) or implementing partitioned CRLs sooner?”
The response was “Planning and implementation of CRL partitioning started before this incident and is currently being tested in a non-production environment. However, it has not been completed in time to be a mitigating factor in this incident.”
(Q3) We’d like to understand why CRL partitioning was “not completed in time to be a mitigating factor in this incident.” Can you please explain this in more detail?"
The underlying CA software we use did not support partitioned CRLs without re-keying CAs. Given our current volumes, utilizing that method would have required us to re-key CAs at a frequency which is not operationally viable.
Development and testing of operationally viable CA software features was already in progress prior to the Feb 2025 MRP survey. Testing and bug fixes are now complete, and the CA software update is slated for release by mid-August, after which we will validate it and roll it out to production as a prerequisite to starting the G2 ICA migration.
Question 4
"In response to Question 3 of Comment 56, Microsoft PKI Services stated:
“The major limiting factor in not being able to revoke in a timely manner was lack of CRL partitioning on the existing CAs. The Warm Stand-bys are planned to have CRL partitioning. So will not suffer from the same issues.”
(Q4) This only appears true once all leafs are migrated to an ICA with partitioned CRLs. Is there something that we are missing?"
You are correct that the CRL partitioning risk will only be eliminated once all the leaf certificates have been renewed after CRL partitioning is implemented. To address this issue, per the plan provided in our response to Q1, we will accelerate migration to G2 ICAs so that all new certificates are issued sooner from CAs with partitioned CRLs.
Question 5
"In response to Question 7 of Comment 56, Microsoft PKI Services stated:
“Revocation of the ICAs was considered as an option, but revoking the ICAs would have impacted active subscriber certificates which were not mis-issued (with no alternate available for them).”
This does not directly address the question presented to Microsoft PKI Services.
However, the response to Question 6 states:
“The decisions related to legacy device support for Microsoft services are business decisions which are owned by the respective services. That said, our plan to cross-sign our G2 CA at the root level, and cease issuance from the G1 cross-signed ICAs will obviate the need for cross signing any additional ICAs in the future.”
We interpret this to indicate that Microsoft PKI Services is allowing external needs (i.e., “business decisions which are owned by the respective services.”) to take precedence over its obligations to the TLS Baseline Requirements.
This response also appears to ignore that non-Microsoft PKI Services CA service providers could be an option for the affected subscribers.
(Q5) The responses in Comment 56 do not address the (mis?)perception that Microsoft PKI Services is misprioritizing its responsibilities. We will again ask for Microsoft PKI Services to explain why its response to this incident should not be interpreted as prioritizing subscriber needs over its obligations to the TLS Baseline Requirements as a publicly-trusted CA Owner?"
The inability to precisely target revocations at only the affected certificates in a revocation event of this scale is the primary driver for the delayed revocations. To address this issue and to reinforce MPS’s commitment to the public TLS requirements, we are making and accelerating significant investments to improve our systems, per the plan provided in our response to Q2.
Question 6
"In response to Question 8 of Comment 56, Microsoft PKI Services stated:
“We are interpreting this question as “what is our capability to migrate issuance to new CAs for all our subscribers”. If that is not the intent of the question, please clarify.”
(Q6) This question was to understand how you will in practice migrate subscribers across issuing CAs. For example, GlobalSign describes rotating ICAs on a quarterly basis. With this clarification, does your answer change?"
Though there is no BR requirement stipulating ICA rotation schedules, we recognize the benefits of frequent ICA rotations (operational readiness, eliminating CA pinning by subscribers, crypto agility etc.). By mid-October, we will develop a plan for doing scheduled ICA rotations at a fixed cadence.
Comment
"In response to Comment 58:
Microsoft PKI Services stated: “In relation to reducing certificate lifetimes, MS PKI Services currently supports 1 month certificates. But the certificate validity period is chosen by the subscribers based on their cadence and constraints. Our current plan for enforcing shorter certificate lifetimes follows the timeline outlined in Ballot SC-081v3.”
Despite supporting 1-month certificates and allowing validity to be chosen by subscribers, approximately 90% of the certificates affected by Bug 1962829 were determined by Microsoft as not in use. That seems to describe that the existing approach could and should be improved.
As one possible alternative, one might imagine that by default Microsoft PKI Services could issue short-lived certificates (i.e., those that do not need to be revoked), and instead could issue longer-lived certificates when explicitly requested by the applicant - for the validity requested.
(Comment) Given the circumstances of this report and Microsoft’s response, we strongly encourage Microsoft PKI Services to more aggressively pursue a remedy to this incident that includes a reduction of validity well in advance of the timelines included in SC-081 as a demonstration of its commitment to promoting agility, resilience, and improved security across the ecosystem."
We are committed to reducing the default validity periods of certificates well ahead of the BR-required dates, including making 7-day profiles available. We will share the details of the plan by 08/22. We have added an action item for this to the repair actions.
Additional Committed Action Items
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | New |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | TBD (based on plan milestones) | New |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-16 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | TBD | New |
| Assignee | ||
Comment 76•7 months ago
Response to Comment 65 - Ben Wilson
Reduction in Certificate validity
"This comment follows up on Comment #54. While we are pleased by MPS’s commitment to implement CRL partitioning and provision standby ICAs, we remain concerned that the current set of action items may not fully address the operational gaps that led to the delayed revocations and their impacts on the broader ecosystem.
MPS has already championed shorter certificate lifetimes by transitioning a number of users to six-month certificates by default, Comment #6. And in its responses, MPS has said that it is evaluating the use of short-lived certificates, Comment #16, and it has also discussed efforts to migrate a large fraction of the certificates it issues to 30-day lifetimes, Comment #26.
Mozilla requests that MPS commit to these efforts as part of its formal Action Items, with a clear timetable.
Specifically, we would like MPS to commit to concrete steps to increase the adoption of 30-day certificates, including:
• adoption targets with clear evaluation criteria for success; and
• specific actions to promote subscriber adoption, such as making 30-day lifetimes the default issuance profile.
In parallel, we would like MPS to make “Short-lived Subscriber Certificates”—as defined in the TLS Baseline Requirements (≤10 days until March 15, 2026; ≤7 days thereafter)— available as a profile, and if suitable, the default option for short-lived cloud deployments. As outlined in section 4.9.1.1 of the TLS BRs, MPS wouldn’t need to provide any revocation services for such certificates, thereby improving both scalability and resilience.
Given the high volume of unused certificates reported, MPS’s recent informal commitments, and control over its issuance and deployment infrastructure, we believe these steps are not only achievable, but also would significantly enhance agility and reduce reliance on large-scale revocation in the event of future incidents."
We estimate that we can move 25% of our subscriber certificates to 30-day certificates over the next 9-12 months. To address your questions related to reduction of certificate validity periods, please see the action item added as part of Comment 75.
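For readers comparing the 30-day target with the "Short-lived Subscriber Certificates" thresholds quoted above, the following is a minimal sketch of that threshold check. It assumes the cutover applies to certificates issued on or after March 15, 2026, and it takes the validity as a plain duration; the function and constant names are illustrative and are not taken from any MPS system.

```python
from datetime import date, timedelta

# Thresholds as quoted in the comment above: <=10 days until March 15, 2026,
# <=7 days thereafter. Treating the cutover day itself as "thereafter" is an
# assumption of this sketch.
CUTOVER = date(2026, 3, 15)
LIMIT_BEFORE_CUTOVER = timedelta(days=10)
LIMIT_FROM_CUTOVER = timedelta(days=7)


def is_short_lived(issued_on: date, validity: timedelta) -> bool:
    """Return True if a certificate of this validity would count as short-lived."""
    limit = LIMIT_BEFORE_CUTOVER if issued_on < CUTOVER else LIMIT_FROM_CUTOVER
    return validity <= limit
```

Under these assumptions, a 30-day certificate is not short-lived in either period, so it would still need revocation services.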
Mass Revocation Plan
"A. Mass Revocation Planning
Mozilla requires that MPS adopt a Mass Revocation Plan on or before September 1, 2025. The newly adopted section 5.7.1.2 of the TLS BRs requires that by December 1 MPS include a statement in its CPS that MPS maintains a Mass Revocation Plan. In addition to these requirements, MPS must perform annual operational testing and incorporate lessons learned into the plan.
The plan must cover plan activation criteria, customer contact mechanisms, differentiation of automated and manual steps, time-based objectives for triage and revocation, subscriber notifications, role assignments, training, testing methods, and post-event or post-test analysis.
Can MPS:
(1) share more detail about the structure, testing approach, and frequency for its mass revocation plan;
(2) confirm that its mass revocation plan includes the foregoing required components;
(3) describe the testing methodology used (e.g., simulations, tabletops);
(4) indicate whether and how the Plan and CPS updates are being adopted before the required deadlines; and
(5) share how it intends to validate its readiness internally or through audit processes?"
This incident highlighted critical areas for improving our mass revocation readiness. We are actively working on the plan to comply with all the MRP requirements by the September 1st deadline.
We will provide a more detailed response to these specific questions before September 5th.
Partitioned/Sharded CRLs and G2 Root-Based Hierarchy Migration
"Bug comments indicate that CRL partitioning is a gating factor for deploying the G2-based ICAs into production, and we understand that G2 ICA creation is expected to be completed in early August. That milestone is imminent.
To better understand feasibility and preparedness in meeting MPS’s deadlines:
(1) can MPS share the specific design being used for partitioning (e.g. based on serial number, time, hash, etc.)?
(2) has MPS scheduled its key ceremony, and can a specific target date be shared?
(3) how many ICAs will be created?
(4) what are the target dates in late October for MPS’s deployment under the new G2 hierarchy?
(5) besides CRL partitioning, what other dependencies are there on deploying the ICAs into production/standby?"
- At issuance, each certificate is randomly assigned a CRL partition number, which determines the specific CRL that will contain its serial number if revoked. The certificate’s CDP extension points to that partition’s CRL, which includes an IDP extension indicating the scope of the partition it covers.
- For security reasons, Microsoft does not disclose the exact dates of the key ceremonies. We have already created 7 G2 ICAs which are disclosed in CCADB, and the remaining G2 and G1 CAs will be created by 8/15.
- Microsoft plans to create 12 G2 CAs—4 RSA and 2 ECC for certificate issuance, and 4 RSA and 2 ECC as warm standbys. Additionally, 6 CAs—4 RSA and 2 ECC—will be created from G1 root for warm standby purposes.
- Our current target date for G2 CA availability for enrollment is 10/27. However, as mentioned in Comment 75, we are planning to accelerate that availability.
- The primary dependency is CRL partitioning; no other dependencies are known at this time for G2 ICAs. For G1 ICAs, we have an additional dependency to have them cross-signed.
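The random partition assignment described in the first answer above could be sketched roughly as follows. This is an illustration only: the partition count, the URL pattern, and all function names are hypothetical and are not disclosed anywhere in this thread.

```python
import secrets
from collections import defaultdict

NUM_PARTITIONS = 64  # hypothetical; the real partition count is not stated
CRL_URL_TEMPLATE = "http://crl.example.test/{ca}/partition-{n}.crl"  # placeholder


def assign_partition() -> int:
    """Pick a CRL partition uniformly at random at issuance time."""
    return secrets.randbelow(NUM_PARTITIONS)


def cdp_url(ca_name: str, partition: int) -> str:
    """Build the partition-specific CRL URL a certificate's CDP would carry."""
    return CRL_URL_TEMPLATE.format(ca=ca_name, n=partition)


def group_revocations(revoked):
    """Group (serial, partition) pairs into per-partition CRL entry lists."""
    crls = defaultdict(list)
    for serial, partition in revoked:
        crls[partition].append(serial)
    return dict(crls)
```

Because each certificate is bound to one partition at issuance, a revocation only grows that partition's CRL, which is what keeps any single CRL small during a mass revocation.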
| Assignee | ||
Comment 77•7 months ago
Response to Comment 71 - Lijun Liao
"If the certificates will expire in one week, no revocation is needed (similar to the short-lived certificates). You can skip the revocation, such revocation does not have any real sense. So my understanding of this revocation strategy is just to say the "Microsoft is taking the revocation action".
For me, I would like to follow other direction. Since only 10% of the certificates are active. I will use the limited resource to revoke these active certificates. Instead of revoking all certificates expired in one week, I may have the capability to revoke the active certificates expired in the next n weeks (where n seems to be between 4-10)."
We acknowledge that revoking certificates close to expiration may have limited operational impact. However, the challenge was not about convenience; it was about avoiding a scenario where revocation itself would destabilize the ecosystem.
At the time of this incident, approximately 4.5M of the impacted certificates were active. Revoking all of them immediately, without partitioned CRLs, would have produced CRLs so large that relying-party software could not process them reliably. This would have caused widespread failures across the ecosystem, including for parties unrelated to the incident.
Given this constraint, we executed the maximum safe revocation possible under the circumstances while accelerating the structural fix, partitioned CRLs, that permanently removes this limitation. Per Comment 75, we are also expediting migration off these CAs.
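To give a sense of scale for the CRL size concern, here is a back-of-the-envelope sketch. Only the 4.5M figure comes from this comment. The per-entry size and the partition count are assumptions made for illustration, and the calculation treats all entries as landing in one CRL, whereas in practice they would be spread across several issuing CAs.

```python
ACTIVE_IMPACTED = 4_500_000  # figure stated in the comment above
BYTES_PER_ENTRY = 40         # ASSUMPTION: rough DER size of one revoked-certificate entry
NUM_PARTITIONS = 64          # ASSUMPTION: hypothetical partition count


def crl_size_mb(entries: int, bytes_per_entry: int = BYTES_PER_ENTRY) -> float:
    """Approximate size in megabytes of the entry list of a CRL."""
    return entries * bytes_per_entry / 1_000_000


# Upper bound: every revocation in a single, unpartitioned CRL.
single_crl_mb = crl_size_mb(ACTIVE_IMPACTED)

# Same revocations spread evenly over the hypothetical partitions.
per_partition_mb = crl_size_mb(ACTIVE_IMPACTED // NUM_PARTITIONS)

print(f"unpartitioned: ~{single_crl_mb:.0f} MB, per partition: ~{per_partition_mb:.1f} MB")
```

The absolute numbers depend entirely on the assumed entry size and partition count; the point is only that partitioning divides the per-CRL size by roughly the number of partitions.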
| Assignee | ||
Comment 78•7 months ago
Response to Comment 72 - Stephan Verbücheln
"In my understanding, it is an incorrect assumption that the expiration of a certificate automatically exempts you from the obligation to track it in a revocation list. The BR (7.2.2) even recommends (SHOULD) to update revocation entries (including dates) if new information about the compromise gets known."
We appreciate the clarification regarding BR 7.2.2. Our process follows BR 4.10, where we remove CRL entries for expired certificates. To confirm, our strategy does not assume that expiration exempts revocation obligations. Rather, our decision to defer revocation for certificates nearing expiration was driven by the need to manage CRL size and avoid ecosystem-wide failures. We remain committed to revoking compromised certificates regardless of expiration status when new compromise information is discovered.
| Assignee | ||
Comment 79•7 months ago
Weekly Status Update
We are actively progressing on the full set of action items outlined in the incident report. Below is the complete and updated list, which now includes several newly added items.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-09-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | In Progress |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | New |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | TBD (based on plan milestones) | New |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-16 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | TBD | New |
| Assignee | ||
Comment 80•7 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 6,006,812 (6,558,644)
-
Remaining active certificates (total affected):
- 36,079,260 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 29,061,399
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 81•7 months ago
Weekly Status Update
We are actively progressing on the full set of action items outlined in the incident report. We have updated the Due Date for the last repair item. Please see full list below:
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-09-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | In Progress |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | In Progress |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | TBD (based on plan milestones) | New |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-28 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | 2025-10-17 | New |
| Assignee | ||
Comment 82•7 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 6,806,812 (7,358,644)
-
Remaining active certificates (total affected):
- 30,930,510 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 33,410,149
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
Comment 83•7 months ago
Mozilla appreciates MPS’ progress in improving certificate lifecycle management, its commitment to reducing certificate validity periods ahead of required timelines, and its adoption of a phased implementation plan to issue more shorter-lived certificates.
We note that in Comment #76 MPS estimates that it can move 25% of its subscriber certificates to 30-day certificates over the next 9-12 months, which is a meaningful first step, but in Comment #26 MPS said that approximately 75% of impacted certificates could have had 30-day validity based on the lifecycle of the underlying resources. Given that prior estimate, we would appreciate additional context around the mention of this lower 25% migration target.
Can MPS please explain the rationale and factors involved in the 25% 9-12-month goal? Did MPS identify implementation issues or subscriber constraints that pushed it out? Will the plan to be provided next week include additional adoption targets (e.g. 50%, 75%, 90%) and more aggressive dates?
Understanding the assumptions and barriers informing MPS' staged plan will help the community better assess MPS’ path toward broader adoption of shorter-lived certificates.
Thanks.
| Assignee | ||
Comment 84•7 months ago
Response to Comment 83 - Ben Wilson
"Mozilla appreciates MPS’ progress in improving certificate lifecycle management, its commitment to reducing certificate validity periods ahead of required timelines, and its adoption of a phased implementation plan to issue more shorter-lived certificates.
We note that in Comment #76 MPS estimates that it can move 25% of its subscriber certificates to 30-day certificates over the next 9-12 months, which is a meaningful first step, but in Comment #26 MPS said that approximately 75% of impacted certificates could have had 30-day validity based on the lifecycle of the underlying resources. Given that prior estimate, we would appreciate additional context around the mention of this lower 25% migration target.
Can MPS please explain the rationale and factors involved in the 25% 9-12-month goal? Did MPS identify implementation issues or subscriber constraints that pushed it out? Will the plan to be provided next week include additional adoption targets (e.g. 50%, 75%, 90%) and more aggressive dates?
Understanding the assumptions and barriers informing MPS' staged plan will help the community better assess MPS’ path toward broader adoption of shorter-lived certificates.
Thanks."
In Comment 26 we referred to a 75% migration potential, which reflected the upper bound of what is technically feasible. Since then, however, we have engaged with the subscriber community and learned of additional constraints: workload durations not being known in advance, upstream dependencies for the subscribers, and safe deployment norms. These are reflected in the more realistic 25% figure mentioned in Comment 76. We are continuing to work with our largest subscribers to understand and address their constraints. As those constraints are resolved, we hope to make faster progress on this front.
| Assignee | ||
Comment 85•7 months ago
Below please find the lifetime reduction plan as part of this action item:
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
Lifetime Reduction Plan
Our goal is to reduce the default certificate validity to 47-day certificates by May 2026. Below are the dates when transitions to each default validity period will begin. Note that once a transition begins, it can take up to 6 weeks for it to saturate through our entire subscriber base. Also note that these are defaults, and subscribers can ask for and receive exceptions (the max validity periods with exception are noted in the last column). Beyond this committed plan, we are investigating a plan to introduce 7-day default validity before the end of CY26.
| Certificate issued on or after | Expected saturation date for policy changes | Default validity period | Exception maximums |
|---|---|---|---|
| ~September 15, 2025 | November 1, 2025 | 100 days | 360 days |
| March 15, 2026 | May 1, 2026 | 47 days | 200 days |
Though it is difficult to estimate how many of our subscribers may choose to exercise exceptions, we have estimated the bounds of adoption based on historical subscriber behavior. Based on these projections, we estimate that subscribers who account for approximately 65% of the certificates are likely to adopt the defaults (or shorter) when they are rolled out (e.g. once the change of the default to 100 days is rolled out, we expect ~65% of certificates issued after that date to have a validity of 100 days or less). We expect the subscribers for the remaining 35% to take between 6 and 18 months to make the necessary changes on their end to adopt the shorter validity periods. Note that these are estimates based on historical data. Changes in subscriber behavior (e.g. more subscribers than projected delaying adoption, or changes to a usage pattern for a high-volume customer of a subscriber service) can skew these numbers. Adoption of these changes can be publicly tracked via validity data on certificates in crt.sh.
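The schedule in the table above can be expressed as a simple lookup, sketched here for anyone who wants to compare observed validity periods against the plan. The September 15, 2025 start is approximate in the plan itself, and the type and function names are illustrative, not part of any MPS tooling.

```python
from datetime import date
from typing import NamedTuple, Optional


class Phase(NamedTuple):
    starts: date
    default_days: int
    exception_max_days: int


# Values copied from the Lifetime Reduction Plan table.
PHASES = [
    Phase(date(2025, 9, 15), 100, 360),  # start date is approximate ("~September 15")
    Phase(date(2026, 3, 15), 47, 200),
]


def phase_for(issued_on: date) -> Optional[Phase]:
    """Return the latest phase that has started by the issuance date, if any."""
    current = None
    for phase in PHASES:
        if issued_on >= phase.starts:
            current = phase
    return current


def within_plan(issued_on: date, validity_days: int, has_exception: bool) -> bool:
    """Check a validity period against the default or exception maximum."""
    phase = phase_for(issued_on)
    if phase is None:
        return True  # the plan states no limit before the first phase
    limit = phase.exception_max_days if has_exception else phase.default_days
    return validity_days <= limit
```

Since the plan allows up to 6 weeks for each change to saturate, certificates issued shortly after a phase start may legitimately still carry the earlier default.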
| Assignee | ||
Comment 86•7 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 7,606,812 (8,158,644)
-
Remaining active certificates (total affected):
- 26,644,634 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 36,896,025
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
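The three Revocation Delay Status Updates so far (Comments 80, 82 and 86) can be cross-checked with a short calculation. The sketch below uses only the figures reported in those comments; the field names are illustrative.

```python
TOTAL_AFFECTED = 72_070_777

# (comment number, total revoked, remaining active, expired and not revoked)
UPDATES = [
    (80, 6_006_812, 36_079_260, 29_061_399),
    (82, 6_806_812, 30_930_510, 33_410_149),
    (86, 7_606_812, 26_644_634, 36_896_025),
]


def deltas(updates):
    """Compute the changes between each pair of consecutive status updates."""
    out = []
    for prev, cur in zip(updates, updates[1:]):
        out.append({
            "from_comment": prev[0],
            "to_comment": cur[0],
            "newly_revoked": cur[1] - prev[1],
            "newly_expired": cur[3] - prev[3],
            "active_drop": prev[2] - cur[2],
        })
    return out


for d in deltas(UPDATES):
    # True when the drop in active certificates is fully explained by
    # new revocations plus new expirations.
    d["consistent"] = d["active_drop"] == d["newly_revoked"] + d["newly_expired"]
    print(d)
```

In both intervals the reported figures reconcile, and expiration accounts for a much larger share of the decline in active certificates than revocation does.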
| Assignee | ||
Comment 87•7 months ago
Weekly Status Update
We are actively progressing on the full set of action items outlined in the incident report. We have completed action item #7.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-09-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | In Progress |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | TBD (based on plan milestones) | New |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-28 | In Progress |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | 2025-10-17 | In Progress |
| Assignee | ||
Comment 88•6 months ago
Weekly Status Report
We are actively progressing on the full set of action items outlined in the incident report.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-09-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | In Progress |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | TBD (based on plan milestones) | New |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-28 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | 2025-10-17 | In Progress |
| Assignee | ||
Comment 89•6 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 8,408,144 (8,958,644)
-
Remaining active certificates (total affected):
- 22,063,419 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 40,677,240
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 90•6 months ago
Follow Up to Comment 76 - Ben Wilson
A. Mass Revocation Planning
Mozilla requires that MPS adopt a Mass Revocation Plan on or before September 1, 2025. The newly adopted section 5.7.1.2 of the TLS BRs requires that by December 1 MPS include a statement in its CPS that MPS maintains a Mass Revocation Plan. In addition to these requirements, MPS must perform annual operational testing and incorporate lessons learned into the plan.
The plan must cover plan activation criteria, customer contact mechanisms, differentiation of automated and manual steps, time-based objectives for triage and revocation, subscriber notifications, role assignments, training, testing methods, and post-event or post-test analysis.
Can MPS:
(1) share more detail about the structure, testing approach, and frequency for its mass revocation plan;
We added a new Knowledge Base (KB) article to our internal wiki to formally document our Mass Revocation Plan (MRP). This MRP meets all of the requirements laid out in Mozilla’s TRP Policy and is laid out in sections, such as Definitions, Scope, Activation Criteria, Subscriber Notification, Automation Points, Targets and Timelines, Training, Plan Testing and Third-Party Assessment. Upon the implementation of the first version of the MRP, we conducted a Tabletop exercise with a facilitator providing prompts to the team with the testing scenario and the team walking through the processes that would be followed given the prompts. The results of the tabletop will be used to update and improve the MRP. The frequency of testing in our MRP meets the annual requirement from the Mozilla TRP policy.
(2) confirm that its mass revocation plan includes the foregoing required components;
Our MRP includes all of the foregoing required components.
(3) describe the testing methodology used (e.g., simulations, tabletops);
As described above, the testing methodology we used in August 2025 was a tabletop exercise. In the future, our testing will likely continue to involve tabletops and evolve to include more complex scenarios and simulations.
(4) indicate whether and how the Plan and CPS updates are being adopted before the required deadlines; and
The first version of the MRP was implemented in late August 2025. We have already drafted the CPS updates related to MRP and will have them in place before the Dec 1 deadline.
(5) share how it intends to validate its readiness internally or through audit processes?
We intend to validate readiness through a third-party assessment covering our audit period (May 1, 2025 through April 30, 2026). This assessment will be completed within 3 months of the end of our audit cycle (April 30, 2026 + 3 months, i.e., by July 30, 2026).
| Assignee | ||
Comment 91•6 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 9,208,144 (9,758,644)
-
Remaining active certificates (total affected):
- 17,344,568 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 44,596,091
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 92•6 months ago
Weekly Status Update
We are actively progressing on the full set of action items outlined in the incident report. Action Item #6 was completed on 8/29 and has been marked Complete. We have also updated the due date for action item #8.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-09-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | In Progress |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | New |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-28 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | 2025-10-17 | New |
Comment 93•6 months ago
Microsoft PKI Services has typically been providing weekly status updates on Fridays for this incident. Given that the last update was 10 days ago (September 5, 2025), it appears the update for last week was missed.
Can Microsoft PKI Services please provide the weekly revocation status update?
Additionally, the status reports have shown a consistent gap between the "Total certificates revoked" and the "(planned to date)" target in every update since mid-July (starting with the July 18 report).
Can Microsoft PKI Services please explain the circumstances that have caused the execution of revocations to lag behind the plan (published in Comment 27) for the past two months?
| Assignee | ||
Comment 94•6 months ago
Revocation Delay Status Update
As of 9/12:
-
Total certificates revoked (planned to date):
- 9,808,144 (10,558,644)
-
Remaining active certificates (total affected):
- 13,845,195 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 47,295,464
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 95•6 months ago
Weekly Status Update
We are actively progressing on the full set of action items outlined in the incident report. Please note the update to action item #8. We have also closed action item #5: based on our investigation with subscribers who account for 90+% of the certificate volume, we have determined that this usage is a valid use of public TLS certificates. There are, however, opportunities to reduce certificate lifetimes, which are now tracked as part of action item #7.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-09-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | In Progress |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-28 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | 2025-10-17 | New |
| Assignee | ||
Comment 96•6 months ago
Response to Comment 93 - Chrome Root Program
We apologize for the confusion. Both the weekly status update and the revocation delay update were posted on Friday under Bug 1979475 by accident. We’ve now included those updates here in the correct bug (Bug 1965612) for visibility.
Regarding the consistent gap between the “Total certificates revoked” and the “(planned to date)” target since mid-July, we acknowledge the concern and will provide a detailed explanation with this week's update.
| Assignee | ||
Comment 97•6 months ago
Response to Comment 93 Cont'd - Chrome Root Program
The revocation plan was originally constructed by taking the remaining population of affected certificates and distributing an even weekly target across the four impacted issuing CAs while maintaining a 10MB CRL Size, yielding an approximate goal of ~800,000 revocations per week.
During execution, due to the actual distribution of certificates across the 4 issuing CAs, we found that we could not revoke 800k certificates a week and maintain a 10MB CRL for each CA. There were certain weeks where we could only revoke fewer while hitting the CRL size limit on one or more of the CAs. We’ve increased our CRL threshold to 15MB and will publish an updated revocation plan next week based on this new threshold.
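The interaction between an even weekly target and a per-CA CRL size cap can be illustrated with a small sketch. This is not Microsoft's planning logic: the CA names, pending counts, current CRL entry counts, and the 40-bytes-per-entry figure are all hypothetical placeholders. The model simply gives each CA an equal share of the weekly target and then clips that share to what the CA still has pending and to its remaining CRL headroom.

```python
# Illustrative model only. Every figure below is a hypothetical placeholder,
# not data from this incident.
BYTES_PER_ENTRY = 40   # assumed average size of one revoked-certificate entry
MB = 1_000_000         # assumed meaning of "MB" for the size cap


def max_entries(cap_mb, bytes_per_entry=BYTES_PER_ENTRY):
    """Convert a CRL size cap in MB into an approximate entry cap."""
    return (cap_mb * MB) // bytes_per_entry


def plan_week(pending, on_crl, cap_mb, weekly_target):
    """Split weekly_target evenly across CAs, then clip each share to the
    CA's pending revocations and to its remaining CRL headroom."""
    share = weekly_target // len(pending)
    cap = max_entries(cap_mb)
    batch = {}
    for ca, left in pending.items():
        headroom = max(0, cap - on_crl.get(ca, 0))
        batch[ca] = min(share, left, headroom)
    return batch


# Hypothetical, deliberately uneven distribution across four issuing CAs.
pending = {"CA-A": 500_000, "CA-B": 300_000, "CA-C": 100_000, "CA-D": 50_000}
on_crl = {"CA-A": 200_000, "CA-B": 100_000, "CA-C": 50_000, "CA-D": 10_000}

for cap_mb in (10, 15):
    batch = plan_week(pending, on_crl, cap_mb, weekly_target=800_000)
    print(f"{cap_mb} MB cap: {batch} -> total {sum(batch.values()):,}")
```

Under these made-up inputs the weekly total stays below the 800,000 target at both caps: CAs with little pending work cannot use their full share, and the busiest CA is limited by CRL headroom. Raising the cap relaxes only the second constraint.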
| Assignee | ||
Comment 98•6 months ago
Weekly Status Update
We are actively progressing on the full set of action items outlined in the incident report. Please note the update to action item #3.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-11-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | In Progress |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-28 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | 2025-10-17 | New |
Comment 99•6 months ago
(In reply to Microsoft PKI Services from comment #97)
Response to Comment 93 Cont'd - Chrome Root Program
The revocation plan was originally constructed by taking the remaining population of affected certificates and distributing an even weekly target across the four impacted issuing CAs while maintaining a 10MB CRL Size, yielding an approximate goal of ~800,000 revocations per week.
During execution, due to the actual distribution of certificates across the 4 issuing CAs, we found that we could not revoke 800k certificates a week and maintain a 10MB CRL for each CA. There were certain weeks where we could only revoke fewer while hitting the CRL size limit on one or more of the CAs. We’ve increased our CRL threshold to 15MB and will publish an updated revocation plan next week based on this new threshold.
What changed in Microsoft's prior analysis that made any CRL beyond 10MB not feasible?
Is the 'Revocation Delay Status Update' no longer occurring on Fridays?
| Assignee | ||
Comment 100•6 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 10,507,085 (11,358,644)
-
Remaining active certificates (total affected):
- 11,368,451 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 48,972,208
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
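The gap raised in Comment 93 can be recomputed directly from the figures posted in this thread. The sketch below uses only the "revoked (planned to date)" pairs from Comments 86, 89, 91, 94 and 100; the function and variable names are illustrative.

```python
# (label, revoked, planned to date), copied from the Revocation Delay
# Status Updates in Comments 86, 89, 91, 94 and 100.
REPORTS = [
    ("Comment 86", 7_606_812, 8_158_644),
    ("Comment 89", 8_408_144, 8_958_644),
    ("Comment 91", 9_208_144, 9_758_644),
    ("Comment 94", 9_808_144, 10_558_644),
    ("Comment 100", 10_507_085, 11_358_644),
]


def shortfalls(reports):
    """Return (label, planned - revoked) for each report."""
    return [(label, planned - revoked) for label, revoked, planned in reports]


def deltas(reports):
    """Return (label, revoked_delta, planned_delta) between consecutive reports."""
    out = []
    for (_, r0, p0), (label, r1, p1) in zip(reports, reports[1:]):
        out.append((label, r1 - r0, p1 - p0))
    return out


for label, gap in shortfalls(REPORTS):
    print(f"{label}: shortfall vs plan = {gap:,}")

for label, revoked_delta, planned_delta in deltas(REPORTS):
    print(f"{label}: revoked +{revoked_delta:,}, planned +{planned_delta:,}")
```

On these figures the planned total rises by 800,000 between consecutive reports, while the shortfall is about 550,000 in the first three reports and grows in the last two.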
| Assignee | ||
Comment 101•6 months ago
Response to Comment 99 - Wayne
What changed in Microsoft's prior analysis that made any CRL beyond 10MB not feasible?
Is the 'Revocation Delay Status Update' no longer occurring on Fridays?
As noted in Comment 97, our original plan assumed an even distribution of revocations across the four issuing CAs, which allowed us to target ~800k revocations per week while keeping CRLs under 10 MB.
In practice, uneven distribution caused some CAs to hit the 10 MB limit early, forcing us to reduce batch sizes. Based on this experience and telemetry, we’ve raised the limit to 15 MB to accelerate progress and will publish an updated plan soon. It will take an additional week to produce our updated revocation forecast.
In regards to the status updates, these updates are still being posted weekly. The recent gap was due to an admin error where we posted to the wrong bug. We’ve corrected this and will continue Friday updates here.
| Assignee | ||
Comment 102•6 months ago
Weekly Status Update
We are actively progressing on the full set of action items outlined in the incident report. Please note action item #8 has been marked complete.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-11-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-28 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | 2025-10-17 | In Progress |
Comment 103•6 months ago
(In reply to Microsoft PKI Services from comment #101)
Response to Comment 99 - Wayne
What changed in Microsoft's prior analysis that made any CRL beyond 10MB not feasible?
Is the 'Revocation Delay Status Update' no longer occurring on Fridays?
As noted in Comment 97, our original plan assumed an even distribution of revocations across the four issuing CAs, which allowed us to target ~800k revocations per week while keeping CRLs under 10 MB.
In practice, uneven distribution caused some CAs to hit the 10 MB limit early, forcing us to reduce batch sizes. Based on this experience and telemetry, we’ve raised the limit to 15 MB to accelerate progress and will publish an updated plan soon. It will take an additional week to produce our updated revocation forecast.
From my perspective, we have had a soft limit of 10MB, self-imposed by your CA, for the 4 months of this incredibly protracted 'revocation' process. There were hints previously that it might grow to 13MB over time, but no action seemed to occur until some minor pushback from the Chrome Root Program. In Comment 96, to correct the actual revocations not aligning with previous claims, the CRL size suddenly grows in ways that would have been greatly beneficial months ago.
Given the prior claims of such changes following a plan, and the lack of any increase to the CRL size to date, I'm sure others can see my hesitance to believe the CA is handling this with due care. Now let me rephrase my previous question.
Q1: What telemetry informed your CA's change in approach, and when did this information start to get collected?
Q2: How often was this information checked for how feasible changes would be in the ecosystem you're monitoring?
Q3: Who was capable of making a decision to allow this change, and how often were they being provided with information to make a determined response?
Q4: When was the hard decision made to change the CRL size in practice, and how long did it take to change in production?
Q5: Why did the decision to change the CRL size approach occur?
In regards to the status updates, these updates are still being posted weekly. The recent gap was due to an admin error where we posted to the wrong bug. We’ve corrected this and will continue Friday updates here.
While updates are required to be posted at least weekly, there was a small commitment made three and a half months ago to improve response times to questions.
We recognize that our responses have been largely following the 7-day response window due to the number of bugs we are concurrently managing and hope to shorten this to 3 days in the future.
Q6: Is there a defined plan for when 'in the future' this will occur?
Q7: What has been a barrier to such a change occurring and how could we help?
The incidents to date have been generously light on questions that would impose strong time commitments. I do worry that words are being said in these incidents that do not reflect how your CA intends to act going forward, but instead to make yourselves look better. Please try to show other CAs how to behave and set a higher standard for yourselves.
| Assignee | ||
Comment 104•6 months ago
|
||
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 10,949,085 (12,158,644)
-
Remaining active certificates (total affected):
- 8,457,212 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 51,083,447
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 105•5 months ago
|
||
(In reply to Wayne from comment #103)
(In reply to Microsoft PKI Services from comment #101)
Response to Comment 99 - Wayne
What changed in Microsoft's prior analysis that made any CRL beyond 10MB not feasible?
Is the 'Revocation Delay Status Update' no longer occurring on Fridays?
As noted in Comment 97, our original plan assumed an even distribution of revocations across the four issuing CAs, which allowed us to target ~800k revocations per week while keeping CRLs under 10 MB.
In practice, uneven distribution caused some CAs to hit the 10 MB limit early, forcing us to reduce batch sizes. Based on this experience and telemetry, we’ve raised the limit to 15 MB to accelerate progress and will publish an updated plan soon. It will take an additional week to produce our updated revocation forecast.
From my perspective we've had a soft limit of 10MB, self-imposed by your CA, for 4 months into this incredibly protracted 'revocation' process. There were hints previously of perhaps it growing to 13MB over time, but no action seemed to occur until some minor pushback from Chrome Root Program. In Comment 96, to correct the actual revocations not aligning with previous claims, the CRL size suddenly grows in ways that would have been greatly beneficial months ago.
Given the prior claims of such changes following a plan, and a lack of any increase to the CRL size to date I'm sure others can see my hesitance to believe the CA is handling this with any due care. Now let me rephrase my previous question.
Q1: What telemetry informed your CA's change in approach, and when did this information start to get collected?
Q2: How often was this information checked for how feasible changes would be in the ecosystem you're monitoring?
Q3: Who was capable of making a decision to allow this change, and how often were they being provided with information to make a determined response?
Q4: When was the hard decision made to change the CRL size in practice, and how long did it take to change in production?
Q5: Why did the decision to change the CRL size approach occur?
In regards to the status updates, these updates are still being posted weekly. The recent gap was due to an admin error where we posted to the wrong bug. We’ve corrected this and will continue Friday updates here.
While updates are required to be posted at least weekly, there was a small commitment made three and a half months ago to improve response times to questions.
We recognize that our responses have been largely following the 7-day response window due to the number of bugs we are concurrently managing and hope to shorten this to 3 days in the future.
Q6: Is there a defined plan for when 'in the future' this will occur?
Q7: What has been a barrier to such a change occurring and how could we help?
The incidents to date have been generously light on questions that would impose strong time commitments. I do worry that words are being said in these incidents that do not reflect how your CA intends to act going forward, but instead to make yourselves look better. Please try to show other CAs how to behave and set a higher standard for yourselves.
With Regard to Q1-Q5
Our primary monitoring mechanisms were incident reports from customers and monitoring of our internal systems for impacts from increasing CRL sizes. We started this monitoring immediately after we began the batched revocations. The information was checked continuously via our internal alerting and incident reporting mechanisms. We did have one customer report, as mentioned in comment 35. As mentioned in Comment 99, revocations were unevenly distributed across CAs; based on that and the monitoring above, we made the decision to start raising the CRL size limits. Monitoring results and any incidents were shared at least weekly (and on an ad-hoc basis) with MPS leadership, who approved the decision to increase the CRL size. The change to the CRL size was approved in early August, and we issued our first ~13MB CRL on August 21st.
With Regard to Q6-Q7
Between the time we made the referenced comment and now, our open Bugzilla bug load has not decreased, and we have added repair actions to accelerate long-term fixes for the root causes of this bug. Once our open bug count drops and we are past the early stages of rolling out the long-term fixes (Jan 2026), we anticipate being able to respond more quickly to comments.
| Assignee | ||
Comment 106•5 months ago
|
||
Weekly Status update:
We are actively progressing on the full set of action items outlined in the incident report. No major changes.
Updated Revocation Plan
Based on a CRL size limit of 15MB, the distribution of certificates across CAs, and the CRL drop-off schedule, we project that we will be able to revoke approximately 2.75m certificates between 10/4 and 11/15. The exact number is difficult to project given variability in exact revocation times and CRL drop-off times, but below is an approximate projection of how many certificates we expect to revoke per week:
| Week | # of Certs |
|---|---|
| 10/4 - 10/10 | 800K |
| 10/11 - 10/17 | 750K |
| 10/18 - 10/24 | 500K |
| 10/25 - 10/31 | 300K |
| 10/31 - 11/7 | 300K |
| 11/7 - 11/14 | 150K |
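As a quick arithmetic check, the weekly figures in the table above can be summed and compared with the approximate 2.75m total stated in the plan. This is a minimal Python sketch; the variable names are illustrative, and the values are copied from the table in thousands of certificates.

```python
# Weekly projected revocations, in thousands, copied from the table above.
weekly_projection_k = {
    "10/4 - 10/10": 800,
    "10/11 - 10/17": 750,
    "10/18 - 10/24": 500,
    "10/25 - 10/31": 300,
    "10/31 - 11/7": 300,
    "11/7 - 11/14": 150,
}

# Sum of all weekly projections, in thousands.
total_k = sum(weekly_projection_k.values())

# "approximately 2.75m" from the plan text, expressed in thousands.
stated_total_k = 2750

print(f"Sum of weekly projections: {total_k}K")
print(f"Difference from stated total: {total_k - stated_total_k}K")
```

The rows sum to 2,800K, about 50K above the stated approximation, which is consistent with the plan describing these as rough projections.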
| Assignee | ||
Comment 107•5 months ago
|
||
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 11,367,376 (12,158,644) (Note: this number reflects revocations through Thursday 2025-10-02. Approximately 220k more are in progress as of this writing today)
-
Remaining active certificates (total affected):
- 5,309,154 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 53,431,505
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 108•5 months ago
|
||
Weekly Status update:
We are actively progressing on the full set of action items outlined in the incident report. Please see the updated due date for #10.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-11-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-28 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | 2025-12-31 | New |
| Assignee | ||
Comment 109•5 months ago
|
||
Revocation Delay Status Update
- Total certificates revoked (planned to date):
- 12,598,376 (12,158,644)
- Remaining active certificates (total affected):
- 2,713,492 (72,070,777)
- Total certificates expired and not revoked (to date):
- 55,227,167
- Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 110•5 months ago
|
||
Corrections have been made to last week’s revocation and weekly update posts. Please review the details below:
Revocation Delay Update Corrections:
- Total certificates revoked (planned to date):
- 12,598,376 (13,758,644)
Weekly Status Updates:
Please see the updated due date for #10.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2025-11-15 | In Progress |
| Migrate cert issuance to use partitioned CRLs | Prevent | Root Cause 1 | Percentage of newly issued certificates appearing in CT logs with updated CDP endpoints pointing to partitioned CRLs. Logs and CRL URLs can be independently verified by the public. | 2025-11-15 | In Progress |
| Standup cross-signed warm standby CAs. We are currently in planning stages. We will have the plan ready before 06/14/2025 | Prevent | Root Cause 1 | Standby ICAs will be disclosed in CT logs with test certificates. Public can verify issuance and presence of standby ICAs through CT logs and Microsoft’s published CA repository. | 2025-11-30 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. The results of these exercises will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. We will provide regular updates on the burndown for G1 to G2 transition. | 2026-02-28 | New |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as we execute the plan. | 2025-12-31 | New |
| Assignee | ||
Comment 111•5 months ago
|
||
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. Please note that we are removing Action Item #2 due to redundancy. Its objective is already being addressed through our ongoing efforts under Action Item #8, which includes migrating customers to our new G2 ICAs with CRL partitioning.
| Assignee | ||
Comment 112•5 months ago
|
||
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 13,361,709 (14,508,644)
-
Remaining active certificates (total affected):
- 905,014 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 56,235,645
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 113•5 months ago
|
||
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 114•5 months ago
|
||
Revocation Delay Status Update
We have found a discrepancy in our active certificates remaining data point, and have updated below.
-
Total certificates revoked (planned to date):
- 565,932 (15,008,644)
-
Remaining active certificates (total affected):
- 822,119 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 57,321,017
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
Comment 115•5 months ago
|
||
Is there a typo in
Total certificates revoked (planned to date):
565,932 (15,008,644)
as this appears to be ~13,000,000 fewer than last reported.
Comment 116•5 months ago
|
||
Why is it taking so long? Why is Microsoft not making a new CRL every hour instead of every week? It would not solve the size problem, but it would solve the delay problem.
| Assignee | ||
Comment 117•5 months ago
|
||
Response to Comment 115 - Malcom D
Yes, this was a typo; thank you for identifying it. Please see the corrected number below:
- Total certificates revoked (planned to date):
- 13,927,641 (15,008,644)
| Assignee | ||
Comment 118•4 months ago
|
||
Response to Comment 116 - Stephan Verbücheln
Why is it taking so long? Why is Microsoft not making a new CRL every hour instead of every week? It would not solve the size problem, but it would solve the delay problem.
We are publishing CRLs at a higher frequency than weekly and performing continuous revocation batches to minimize delays. While revocations occur on an ongoing basis, we report aggregate metrics on a weekly basis.
| Assignee | ||
Comment 119•4 months ago
|
||
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 120•4 months ago
|
||
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 14,049,083 (15,308,644)
-
Remaining active certificates (total affected):
- 700,697 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 57,321,017
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
| Assignee | ||
Comment 121•4 months ago
|
||
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 122•4 months ago
|
||
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 14,252,559 (15,608,644)
-
Remaining active certificates (total affected):
- 497,221 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 57,321,017
-
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
Comment 123•4 months ago
|
||
(In reply to Microsoft PKI Services from comment #122)
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 14,252,559 (15,608,644)
Remaining active certificates (total affected):
- 497,221 (72,070,777)
Total certificates expired and not revoked (to date):
- 57,321,017
Estimate for remaining revocations:
- We will continue to revoke certificates in batches until 11/15/2025
The past 6 weeks of planned certificates for revocation are:
10-03: 12,158,644
10-10: 13,758,644 * Corrected 10-13
10-17: 14,508,644
10-24: 15,008,644
10-31: 15,308,644
11-08: 15,608,644
There are 1,356,085 certificates still planned for revocation, just shy of 10% of the total planned to date. This is today's total planned figure minus the total revoked figure. In theory there are at most 2 revocation batches left (i.e. today 11/08/2025, and next week on 11/15/2025).
Q1: Will the revocations still occur on the planned schedule, or are there any outliers that we should know about in advance?
Not entirely sure on how maths works over there either, and this isn't the first time I've raised such a concern...
The total affected certificate count is 72,070,777. The amount of certificates expired and not revoked, to date, is 57,321,017. This leaves us with 14,749,760 certificates.
It is worth noting that 14.7 million is slightly less than the 15.6m planned revocations to date. We're off by 858,884 certificates, not a clean figure so someone didn't typo something either.
The expired count is also the same as last week, however there was a mention of an internal freeze around now that would somehow make your CA incapable of handling revocation and issuance for a period of time if my memory serves.
Q2: Could Microsoft please explain the simple discrepancies in their provided data to date?
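The arithmetic behind the figures above can be reproduced directly. The following is a minimal Python sketch that uses only the numbers quoted in this comment; the variable names are illustrative.

```python
# Figures quoted from the comment 122 status update.
total_affected = 72_070_777
expired_not_revoked = 57_321_017
revoked_to_date = 14_252_559
planned_to_date = 15_608_644

# Certificates still planned for revocation (planned minus revoked).
still_planned = planned_to_date - revoked_to_date

# Affected certificates that have not expired unrevoked (affected minus expired).
not_expired = total_affected - expired_not_revoked

# How far the planned revocation total exceeds that remainder.
gap = planned_to_date - not_expired

print(still_planned)  # 1356085
print(not_expired)    # 14749760
print(gap)            # 858884
print(f"{still_planned / planned_to_date:.1%}")  # 8.7%
```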
These incidents are supposed to show other CAs your self-reflection, as well as steps others can take to improve if similar issues were to occur. We're well over 6 months into handling this 'mass revocation event', although by the numbers expiration would be more accurate. Lessons beyond the initial incident report should have been internalized but we're lacking details in the incident.
Q3: What other lessons have your organization learned that you would encourage other CAs to do?
Q4: If this incident were to occur again on November 16th, would Microsoft be capable of handling it within the BR timeframe?
I appreciate there are outstanding action items, however more than 6 months of remediation is considerable time to improve.
Q5: Is this the quality and timeliness of incident response that should be considered standard going forward?
Q6: Will Microsoft commit to a postmortem on this incident report on how they could have handled it better?
| Assignee | ||
Comment 124•4 months ago
|
||
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 15,136,769 (15,758,644)
- Please note that the number of certificates revoked this week is larger than the number we reported as outstanding last week. As part of the ongoing work to improve our mass revocation capabilities, we improved our tracking, which helped us discover an error in the estimation method for counting expired/not revoked certificates. Corrections are reflected in the numbers below.
-
Remaining active certificates (total affected):
- 85,620 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 56,848,388
-
Estimate for remaining revocations:
- March 6th 2026
| Assignee | ||
Comment 125•4 months ago
|
||
Weekly Status Update
We’re actively progressing through all action items outlined in the incident report. Please note that Action Item #2 is being removed due to redundancy. The main factors considered in this decision are – 1. Migration to G2 CAs with partitioned CRLs is already underway, and 2. We have already stood up warm standbys for the G2 ICAs. As a result, we will not proceed with setting up G1 warm standbys at this time. If a need to bring standbys into rotation arises, we plan to utilize G2 CAs rather than G1 warm standbys.
Additionally, we are extending the due date for Action Item #1. Our published target was to finish revocations by 11/15/2025, and we have come very close to that, with approximately 85.6K unexpired, unrevoked certs remaining. However, we underestimated the size of the final batches and have reached the 15 MB CRL size limit. To avoid impacting other certs and to preserve space for additional revocations if new incidents occur, we will wait for CRL space to become available before revoking more certificates. We will revoke the majority of the remaining certificates in batches while maintaining a CRL size of 15 MB. Additionally, between 10K and 20K certificates (~0.02%), for which automation and telemetry gaps are still being addressed, will be further delayed for revocation. For these certificates, we will first migrate the subscribers to the new CRL-sharded ICAs and then revoke their impacted certs. The target date to complete all revocations is March 6th 2026. We will continue to provide weekly progress updates on the revocation numbers.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-06 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. Regular updates will be provided on the burndown for G1 to G2 transition. | 2026-02-28 | In Progress |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2025-12-31 | In Progress |
| Assignee | ||
Comment 126•4 months ago
|
||
Response to Comment 123 - Wayne
Question 1:
Q1: Will the revocations still occur on the planned schedule, or are there any outliers that we should know about in advance?
Our published target was to finish revocations by 11/15/2025, and we have come very close to that, with approximately 85.6K unexpired, unrevoked certs remaining. However, we underestimated the size of the final batches and have reached the 15 MB CRL size limit. To avoid impacting other certs and to preserve space for additional revocations if new incidents occur, we will wait for CRL space to become available before revoking more certificates. We will revoke the majority of the remaining certificates in batches while maintaining a CRL size of 15 MB. Additionally, between 10K and 20K certificates (~0.02%), for which automation and telemetry gaps are still being addressed, will be further delayed for revocation. For these certificates, we will first migrate the subscribers to the new CRL-sharded ICAs and then revoke their impacted certs. The target date to complete all revocations is March 6th 2026. We will continue to provide weekly progress updates on the revocation numbers.
Question 2:
Not entirely sure on how maths works over there either, and this isn't the first time I've raised such a concern...
The total affected certificate count is 72,070,777. The amount of certificates expired and not revoked, to date, is 57,321,017. This leaves us with 14,749,760 certificates.
It is worth noting that 14.7 million is slightly less than the 15.6m planned revocations to date. We're off by 858,884 certificates, not a clean figure so someone didn't typo something either.
The expired count is also the same as last week, however there was a mention of an internal freeze around now that would somehow make your CA incapable of handling revocation and issuance for a period of time if my memory serves.
Q2: Could Microsoft please explain the simple discrepancies in their provided data to date?
The variance in numbers stems from the fact that, earlier in the process, we were unable to revoke certain certificates as originally planned (see comment 97). Some of these certificates subsequently expired. We continued to count them under “planned to revoke” to honor our initial commitment, but because they had expired, they were also included in the “expired” category. Further, as part of the ongoing work to improve our mass revocation capabilities, we have improved our tracking, which helped us discover an error in the estimation method for counting expired/not revoked certificates. Corrections are reflected in the weekly metrics update.
As noted in comment 36, the internal change freeze does not impact our revocation schedule.
Question 3:
Q3: What other lessons have your organization learned that you would encourage other CAs to do?
Please see response to Question 6.
Question 4:
Q4: If this incident were to occur again on November 16th, would Microsoft be capable of handling it within the BR timeframe?
Over the past six months, we have made considerable progress in addressing the underlying root causes of this bug (CRL partitioning, new CAs, migration to new CAs, lifetime reduction, warm standbys), which are all in various stages of rollout. Once all repair actions are complete, we do expect to be in a much better position to be able to meet the BR stipulated timelines.
Question 5:
I appreciate there are outstanding action items, however more than 6 months of remediation is considerable time to improve.
Q5: Is this the quality and timeliness of incident response that should be considered standard going forward?
We always aim to improve our process and response to incidents. If you have specific actions that you would like to suggest, we are open to feedback.
Question 6:
Q6: Will Microsoft commit to a postmortem on this incident report on how they could have handled it better?
Per CCADB incident guidelines, as part of the closure report we will include any additional learnings on top of the already committed repair actions.
Comment 127•4 months ago
Once all repair actions are complete, we do expect to be in a much better position to be able to meet the BR stipulated timelines.
-
What actions, if any, do you plan to take to move this expectation from “a much better position to be able” to “being able” to follow the BRs?
-
How did you underestimate the size of the final batch?
-
When did you discover this underestimation?
-
How did you discover that you had underestimated?
| Assignee | ||
Comment 128•4 months ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 129•4 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 15,136,769 (15,758,644)
-
Remaining active certificates (total affected):
- 84,176 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 56,849,832 (72,070,777)
-
Estimate for remaining revocations:
- March 6th 2026
| Assignee | ||
Comment 130•4 months ago
Response to Comment 127 - Zacharias
Question 1
- What actions, if any, do you plan to take to move this expectation from “a much better position to be able” to “being able” to follow the BRs?
As mentioned in comment 126, numerous investments and improvements have been made as part of remediation of this bug to address the root causes of not being able to meet the stipulated timelines. We stay committed to meeting BR requirements for revocation timelines and readiness. To address the question around further actions, as part of our regular mass revocation readiness planning, we plan on conducting ongoing testing to measure not only the effectiveness of these repairs, but also the overall preparedness as mandated by Ballot SC089 (and MRP), and apply learning from those tests as part of continuous improvement.
Question 2
- How did you underestimate the size of the final batch?
We underestimated the size of the final batch because the CRL drop-off rates were estimations rather than exact calculations (drop off rates are particularly difficult to precisely project given the volume of certificates and frequency of revocations). The Certificate Revocation List reached the 15 MB operational limit mid-week, and did not drop as quickly as projected, which prevented additional revocations until entries age out and space becomes available.
Question 3
- When did you discover this underestimation?
See response to #2.
Question 4
- How did you discover that you had underestimated?
See response to #2.
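The drop-off described in the response to Question 2 could, in principle, be computed rather than estimated when the notAfter date of every revoked certificate is known. The following is a minimal sketch only, assuming (the thread does not state this) that a CRL entry is dropped once its certificate has expired and that every entry costs a fixed number of bytes. `BYTES_PER_ENTRY` and the function names are hypothetical; only the 15 MB operational limit comes from the thread.

```python
from datetime import date

# Assumptions for illustration only: the thread gives the 15 MB operational
# limit but not the per-entry size or the exact rule for removing entries.
CRL_LIMIT_BYTES = 15 * 1024 * 1024
BYTES_PER_ENTRY = 40  # hypothetical average size of one revoked-certificate entry


def entries_on(day: date, not_after_dates: list[date]) -> int:
    """Count CRL entries still listed on `day`, assuming an entry is
    dropped once its certificate's notAfter date has passed."""
    return sum(1 for not_after in not_after_dates if not_after >= day)


def free_slots_on(day: date, not_after_dates: list[date]) -> int:
    """Number of additional revocations that would fit under the limit on `day`."""
    used_bytes = entries_on(day, not_after_dates) * BYTES_PER_ENTRY
    return max(0, (CRL_LIMIT_BYTES - used_bytes) // BYTES_PER_ENTRY)
```

Running `free_slots_on` for each day of a planned batch week would show ahead of time whether the CRL has room, provided the assumed removal rule and entry size match the real system.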
Comment 131•4 months ago
(In reply to Microsoft PKI Services from comment #130)
As mentioned in comment 126, numerous investments and improvements have been made as part of remediation of this bug to address the root causes of not being able to meet the stipulated timelines. We stay committed to meeting BR requirements for revocation timelines and readiness.
Previously we were told that Microsoft PKI were not ready to stick by BR requirements until March 2026 at the earliest:
Over the past six months, we have made considerable progress in addressing the underlying root causes of this bug (CRL partitioning, new CAs, migration to new CAs, lifetime reduction, warm standbys), which are all in various stages of rollout. Once all repair actions are complete, we do expect to be in a much better position to be able to meet the BR stipulated timelines.
Q1: Which is true: Microsoft PKI are capable of sticking to BR-mandated timelines as of today, or when "repairs are complete"?
To address the question around further actions, as part of our regular mass revocation readiness planning, we plan on conducting ongoing testing to measure not only the effectiveness of these repairs, but also the overall preparedness as mandated by Ballot SC089 (and MRP), and apply learning from those tests as part of continuous improvement.
Q2: Could you clarify what is mandated by SC089, and what is additional as 'MRP'?
Question 2
- How did you underestimate the size of the final batch?
We underestimated the size of the final batch because the CRL drop-off rates were estimations rather than exact calculations (drop off rates are particularly difficult to precisely project given the volume of certificates and frequency of revocations). The Certificate Revocation List reached the 15 MB operational limit mid-week, and did not drop as quickly as projected, which prevented additional revocations until entries age out and space becomes available.
Q3: Why was anything involved an estimate? Does Microsoft PKI lack inventory of their certificates issued and plan of action for revocation?
We have already heard claims of 'planned' revocations not accounting for expiration which can not be charitably read in a way compliant with a functional CA.
Q4: Given the 15MB limit was entirely fabricated late in the game after claims of a 10MB limit - there was no technical limitation as shown by far larger CRLs in practice Comment 21. Why is this still being used as an excuse for non-compliance?
Question 3
- When did you discover this underestimation?
See response to #2.
The underestimation mentioned in question 2 was in regards to CRL mismatches noticed mid-week. It does not account for hundreds of thousands of expired certificates that were 'planned' to be revoked. This is after over 6 months of alleged planning.
Q5: Can Microsoft PKI please explain when they discovered an underestimation would occur for their revocation numbers, without referring to another response?
Question 4
- How did you discover that you had underestimated?
See response to #2.
Again, the final batch was hundreds of thousands of certificates bigger. We have no excuse to date as to why November 15th is an absolute cut-off for Microsoft PKI to be capable of revocation until next March. There should be no technical or procedural block for Microsoft PKI to revoke these remaining 85k certificates within 24h or 5d if they make bold claims as to be able to stick by the baseline requirements.
Q6: Is there any intention of handling all of the remaining certificates prior to March 2026? If not, why not?
Q7: There is no transparency as to whether Microsoft PKI is truly separate from Microsoft's Root Program, what is the functional difference day-to-day on the personnel side?
Q8: Can the personnel involved honestly state how many Copilot, or similar, briefly skimmed summaries are posted to this mailing list under the guise of official communications from your compliance team?
Q9: Could the above be the actual root cause for these 'estimations' being off when plans have allegedly been drafted from a strict list of certificates to be revoked?
To say there is a lack of trust or reliability in actions to date is to be overly generous. Please try to show you are at least putting a modicum of effort into this process.
| Assignee | ||
Comment 132•3 months ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 133•3 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 15,136,769 (15,758,644)
-
Remaining active certificates (total affected):
- 81,095 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 56,852,913 (72,070,777)
-
Estimate for remaining revocations:
- March 6th 2026
| Assignee | ||
Comment 134•3 months ago
Response to Comment 131 - Wayne
Question 1:
Q1: Which is true: Microsoft PKI are capable of sticking to BR-mandated timelines as of today, or when "repairs are complete"?
To address the question around further actions, as part of our regular mass revocation readiness planning, we plan on conducting ongoing testing to measure not only the effectiveness of these repairs, but also the overall preparedness as mandated by Ballot SC089 (and MRP), and apply learning from those tests as part of continuous improvement.
As mentioned in comment #126, Microsoft PKI fully anticipates meeting the BR stipulated timelines once all repair items have been completed.
Question 2
Q2: Could you clarify what is mandated by SC089, and what is additional as 'MRP'?
Question 2
How did you underestimate the size of the final batch?
We underestimated the size of the final batch because the CRL drop-off rates were estimations rather than exact calculations (drop off rates are particularly difficult to precisely project given the volume of certificates and frequency of revocations). The Certificate Revocation List reached the 15 MB operational limit mid-week, and did not drop as quickly as projected, which prevented additional revocations until entries age out and space becomes available.
By MRP, we meant Mozilla Root Program. We did not mean to imply that Mass Revocation Plan was outside the scope of SC089. The ballot refers to the formalized mass revocation plans. Please see Ballot SC089 for further details.
Question 3
Q3: Why was anything involved an estimate? Does Microsoft PKI lack inventory of their certificates issued and plan of action for revocation?
We have already heard claims of 'planned' revocations not accounting for expiration which can not be charitably read in a way compliant with a functional CA.
Given the volume of certificates and frequency of revocations, precise calculation of CRL drop-off rate was difficult to pinpoint. That said, once all outstanding certificates are moved to CAs with partitioned CRLs (G2 migration repair action), we will no longer have CRL size as a limiting factor, which will make having to do revocation planning based on CRL size limitations a non-issue.
Question 4
Q4: Given the 15MB limit was entirely fabricated late in the game after claims of a 10MB limit - there was no technical limitation as shown by far larger CRLs in practice Comment 21. Why is this still being used as an excuse for non-compliance?
Question 3
When did you discover this underestimation?
See response to #2.
The underestimation mentioned in question 2 was in regards to CRL mismatches noticed mid-week. It does not account for hundreds of thousands of expired certificates that were 'planned' to be revoked. This is after over 6 months of alleged planning.
We have provided an explanation of the rationale for the need to manage CRL sizes in both comment #26 and comment #35. Further, in comment #101 we have provided information on the decision to go to 15 MB CRLs. Additionally, in comment #126 we have outlined the rationale for staying close to 15 MB.
Question 5
Q5: Can Microsoft PKI please explain when they discovered an underestimation would occur for their revocation numbers, without referring to another response?
Question 4
How did you discover that you had underestimated?
See response to #2.
Again, the final batch was hundreds of thousands of certificates bigger. We have no excuse to date as to why November 15th is an absolute cut-off for Microsoft PKI to be capable of revocation until next March. There should be no technical or procedural block for Microsoft PKI to revoke these remaining 85k certificates within 24h or 5d if they make bold claims as to be able to stick by the baseline requirements.
As mentioned in comment #127, we discovered the underestimation mid-week, which was on or around 11/12.
Question 6
Q6: Is there any intention of handling all of the remaining certificates prior to March 2026? If not, why not?
As mentioned in comment #126, the target date to complete all outstanding revocations is March 6th 2026. We plan to revoke about 45k remaining certificates on December 9th, pending CRL size, followed by a second batch in early January as CRL capacity allows. The remaining certificates, including the 10–20k requiring manual rotation, will be handled post-migration per comment #126.
Question 7
Q7: There is no transparency as to whether Microsoft PKI is truly separate from Microsoft's Root Program, what is the functional difference day-to-day on the personnel side?
Microsoft’s internal organizational details are not relevant to the discussion of this incident.
Question 8
Q8: Can the personnel involved honestly state how many Copilot, or similar, briefly skimmed summaries are posted to this mailing list under the guise of official communications from your compliance team?
Posts in this forum are official posts from the Microsoft PKI Services organization.
Question 9
Q9: Could the above be the actual root cause for these 'estimations' being off when plans have allegedly been drafted from a strict list of certificates to be revoked?
To say there is a lack of trust or reliability in actions to date is to be overly generous. Please try to show you are at least putting a modicum of effort into this process.
See Response to #8
| Assignee | ||
Comment 135•3 months ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 136•3 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 15,136,769 (15,758,644)
-
Remaining active certificates (total affected):
- 77,926 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 56,856,082 (72,070,777)
-
Estimate for remaining revocations:
- March 6th 2026
Comment 137•3 months ago
3169 certificates expired, and no revocations occurred.
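The 3169 figure can be reproduced from the two preceding status updates (comments 133 and 136). A small sketch of that comparison follows; the `Snapshot` type and `weekly_delta` function are hypothetical names introduced here, while the numbers are copied from those updates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One weekly status update, using the three counts reported in the thread."""
    revoked: int
    remaining_active: int
    expired_not_revoked: int


def weekly_delta(prev: Snapshot, curr: Snapshot) -> dict[str, int]:
    """Week-over-week change in each reported count."""
    return {
        "newly_revoked": curr.revoked - prev.revoked,
        "newly_expired": curr.expired_not_revoked - prev.expired_not_revoked,
        "drop_in_active": prev.remaining_active - curr.remaining_active,
    }


comment_133 = Snapshot(15_136_769, 81_095, 56_852_913)
comment_136 = Snapshot(15_136_769, 77_926, 56_856_082)
delta = weekly_delta(comment_133, comment_136)
```

Here `delta` shows 0 newly revoked, 3,169 newly expired, and a drop of 3,169 in active certificates, so the whole reduction in active certificates that week is accounted for by expiry.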
Q1: Are there any updates on the revocations planned in the following months?
(In reply to Microsoft PKI Services from comment #134)
Question 1:
Q1: Which is true: Microsoft PKI are capable of sticking to BR-mandated timelines as of today, or when "repairs are complete"?
To address the question around further actions, as part of our regular mass revocation readiness planning, we plan on conducting ongoing testing to measure not only the effectiveness of these repairs, but also the overall preparedness as mandated by Ballot SC089 (and MRP), and apply learning from those tests as part of continuous improvement.
As mentioned in comment #126, Microsoft PKI fully anticipates meeting the BR stipulated timelines once all repair items have been completed.
My point is you are giving entirely contradictory answers to people in the same incident for the same questions. The latest update to the ongoing repairs has the due date shifted from Dec 31st 2025 to Feb 28th 2026.
Q2: Does Microsoft PKI intend to stay out of compliance until at least February 28th 2026?
Question 2
Q2: Could you clarify what is mandated by SC089, and what is additional as 'MRP'?
By MRP, we meant Mozilla Root Program. We did not mean to imply that Mass Revocation Plan was outside the scope of SC089. The ballot refers to the formalized mass revocation plans. Please see Ballot SC089 for further details.
Fair enough, that was a bit confusing when originally written.
Q3: Why was anything involved an estimate? Does Microsoft PKI lack inventory of their certificates issued and plan of action for revocation?
We have already heard claims of 'planned' revocations not accounting for expiration which can not be charitably read in a way compliant with a functional CA.
Given the volume of certificates and frequency of revocations, precise calculation of CRL drop-off rate was difficult to pinpoint. That said, once all outstanding certificates are moved to CAs with partitioned CRLs (G2 migration repair action), we will no longer have CRL size as a limiting factor, which will make having to do revocation planning based on CRL size limitations a non-issue.
I fail to see actual answers to my question, let me rephrase.
Q3: Did Microsoft PKI create specific revocation dates attached to specific certificates for revocation during their planning?
Regardless of whether revocation was per-certificate or bucketed, it does rely on knowing the impacted certificates and at the very least their not_after date to account for when they'd drop off of the CRL. Otherwise I'm not entirely sure what is being planned.
Q4: As of today does Microsoft PKI have a fixed list of certificates impacted that it is checking against to provide these weekly figures?
Q5: If the list is fixed, how are these figures generated as to cause problems weekly? I do mean, how, as in the literal step-by-step process that is causing these errors.
At no point in any step of this process should estimates be involved, all of the information is at your disposal. The most generous read I can give is that 90% of the CRL list was set aside for this revocation and the other 10% was pushing past projections. Again, something that could be communicated in this incident in a clear and transparent manner, but I am instead stuck with guessing as to what the CA has been doing these past 7 months.
Q6: Could Microsoft PKI please try to explain the situation through poetry how they take the list of planned certificates and decide which need to be revoked?
Q4: Given the 15MB limit was entirely fabricated late in the game after claims of a 10MB limit - there was no technical limitation as shown by far larger CRLs in practice Comment 21. Why is this still being used as an excuse for non-compliance?
Question 3
When did you discover this underestimation?
See response to #2.
The underestimation mentioned in question 2 was in regards to CRL mismatches noticed mid-week. It does not account for hundreds of thousands of expired certificates that were 'planned' to be revoked. This is after over 6 months of alleged planning.
We have provided an explanation of the rationale for the need to manage CRL sizes in both comment #26 and comment #35. Further, in comment #101 we have provided information on the decision to go to 15 MB CRLs. Additionally, in comment #126 we have outlined the rationale for staying close to 15 MB.
None of which explain why 3 months are required to handle revocation of 77,926 certificates. That is with over 7 months of notice and planning to get to this stage.
Q5: Can Microsoft PKI please explain when they discovered an underestimation would occur for their revocation numbers, without referring to another response?
Question 4
How did you discover that you had underestimated?
See response to #2.
Again, the final batch was hundreds of thousands of certificates bigger. We have no excuse to date as to why November 15th is an absolute cut-off for Microsoft PKI to be capable of revocation until next March. There should be no technical or procedural block for Microsoft PKI to revoke these remaining 85k certificates within 24h or 5d if they make bold claims as to be able to stick by the baseline requirements.
As mentioned in comment #127, we discovered the underestimation mid-week, which is on or around 11/12.
So in summary Microsoft PKI became aware that their planning was going to fall short on November 12th, 3 days before the deadline they set. Presumably after becoming aware discussions occurred.
Q7: Were any alternative proposals suggested that would have handled the remaining certificates before March 6th 2026?
Q8: What were the proposals, and why was the current plan considered the most feasible?
These are things that other CAs would benefit from knowing to inform their processes and decision making.
Q6: Is there any intention of handling all of the remaining certificates prior to March 2026? If not, why not?
As mentioned in comment #126, the target date to complete all outstanding revocations is March 6th 2026. We plan to revoke about 45k remaining certificates on December 9th, pending CRL size, followed by a second batch in early January as CRL capacity allows. The remaining certificates, including the 10–20k requiring manual rotation, will be handled post-migration per comment #126.
Q9: Were a key compromise to occur for these 10-20k remaining certificates would Microsoft PKI be willing to revoke them before the 'post-migration' period?
December 1st mandated the inclusion of Mass Revocation Plans into the CPS (5.7.1.2), which has been added. Notably while these are not public plans they are intended to show that a CA is capable of handling a mass revocation.
Mozilla Root Program Policy, 6.1.3 states:
Beginning September 1, 2025, each CA operator MUST:
- engage in proactive communication and advise subscribers well in advance about the revocation timelines and explicitly warn them against using publicly-trusted TLS server certificates on systems that cannot tolerate timely revocation;
By Microsoft PKI's own admission they have a customer base 3 months after that cannot tolerate timely revocation. We have manual-revocation certificates still being nudged to the back of the queue.
Dec 9th: ~45k planned
Early Jan: 'More' planned
Post-migration: 10-20k manual rotation certificates.
Q10: Have the remaining subscribers been warned that they are using certificates in a manner that is not tolerating timely revocation?
Q11: If I'm a subscriber with a certificate due to be revoked what functionally am I being told other than a revocation will occur before next March? Can I bring that date forward at all, or request one that's more convenient for me?
Q7: There is no transparency as to whether Microsoft PKI is truly separate from Microsoft's Root Program, what is the functional difference day-to-day on the personnel side?
Microsoft’s internal organizational details are not relevant to the discussion of this incident.
It was my hope that Microsoft PKI would state they were entirely separate, and would therefore have the ability to bring in more expertise to provide an internal overview of how to proceed in a more professional and compliant manner going forward. Given that no transparency is intentional, there is no point in pursuing any hopes of improvement in that manner. We're not talking trade secrets here, just whether institutional knowledge could be called upon.
Q8: Can the personnel involved honestly state how many Copilot, or similar, briefly skimmed summaries are posted to this mailing list under the guise of official communications from your compliance team?
Posts in this forum are official posts from the Microsoft PKI Services organization.
Q12: Is Copilot used in the summation or generation of responses to this incident?
Anything you post would be considered an official post from the organization, that was nowhere near the question posed. I had been hoping for an explanation for the competence in reading comprehension and general lack of answers in this incident.
Q9: Could the above be the actual root cause for these 'estimations' being off when plans have allegedly been drafted from a strict list of certificates to be revoked?
To say there is a lack of trust or reliability in actions to date is to be overly generous. Please try to show you are at least putting a modicum of effort into this process.
See Response to #8
Q13: Is Copilot involved in generating any figures to provide weekly revocation/expiration figures?
I am looking for some transparency here as it would explain a lot about the responses to date, and where improvements could be made going forward. This is supposed to be a blameless postmortem analysis, so making it clear that an unreliable tool is being used would be quite beneficial to explaining things.
| Assignee | ||
Comment 138•3 months ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. We have also updated the due date for action item #7. This is because we need to plan for additional features in our service and upstream certificate lifecycle management automation systems to be able to safely and transparently rotate the ICAs.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-06 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Prevent | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. Regular updates will be provided on the burndown for G1 to G2 transition. | 2026-02-28 | In Progress |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2025-12-31 | In Progress |
| Assignee | ||
Comment 139•3 months ago
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 15,818,802 (15,758,644)
-
Remaining active certificates (total affected):
- 31,555 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 56,857,420 (72,070,777)
-
Estimate for remaining revocations:
- March 6th 2026
| Assignee | ||
Comment 140•3 months ago
Response to Comment 137 - Wayne
Q1:
3169 certificates expired, and no revocations occurred.
Q1: Are there any updates on the revocations planned in the following months?
As noted in Comment 134 (response to Question 6), we planned to revoke a batch of ~45,000 certificates on December 9, which was completed (see weekly status update). And as mentioned in comment 134 and comment 126, another round of revocations is planned for January, with a final batch planned in early March.
Q2:
Q2: Does Microsoft PKI intend to stay out of compliance until at least February 28th 2026?
Yes. Microsoft PKI will remain out of compliance for the affected requirements until all repair items are completed. This is expected for an active incident, and the Bugzilla report remains open to reflect the ongoing remediation work.
Q3:
Q3: Did Microsoft PKI create specific revocation dates attached to specific certificates for revocation during their planning?
We do have a full inventory of our certificates, including their metadata. Due to the number of affected certificates, we planned for batches using batch sizes, and the certificates to be revoked were dynamically picked via automation while operating within the CRL size limits.
Q4 & Q5
Q4: As of today does Microsoft PKI have a fixed list of certificates impacted that it is checking against to provide these weekly figures?
Q5: If the list is fixed, how are these figures generated as to cause problems weekly? I do mean, how, as in the literal step-by-step process that is causing these errors.
Yes. We have a fixed list of outstanding affected certificates and we have assigned the certificates to one of 3 batches that were mentioned in comment 126. All current figures and schedules are based on this known inventory, not estimates, and we will continue to provide weekly updates to maintain transparency.
Q6:
Q6: Could Microsoft PKI please try to explain the situation through poetry how they take the list of planned certificates and decide which need to be revoked?
Until approximately the end of October, since the number of certificates available for revocation was significantly larger than the number we could revoke, the revocation decisions were made dynamically based on the factors below:
• The issuing CA for the certificate
• The current CRL size for that CA relative to the operational limit
• The certificate’s expiration date. Certificates which were closer to expiration were given precedence so that they would fall off the CRL sooner, thus making space for more.
For certificates expiring after 11/15, we moved to tracking revocations on a per certificate basis. As of this update, we have 31,555 certificates which now remain to be revoked. Those certificates will be revoked in one of the 2 remaining batches.
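The three factors listed above can be read as a greedy selection. The sketch below is one possible interpretation, not the automation actually used: the `Cert` type, the `pick_batch` function, and the per-CA entry cap `max_entries` are all hypothetical, and the CRL limit is modelled as an entry count rather than a byte size for simplicity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cert:
    serial: str
    issuing_ca: str
    not_after: date


def pick_batch(candidates: list[Cert],
               crl_entries: dict[str, int],
               max_entries: int) -> list[Cert]:
    """Pick certificates to revoke, soonest-expiring first, without pushing
    any issuing CA's CRL past `max_entries` (a hypothetical per-CA cap)."""
    room: dict[str, int] = {}
    batch: list[Cert] = []
    for cert in sorted(candidates, key=lambda c: c.not_after):
        ca = cert.issuing_ca
        if ca not in room:
            # Remaining capacity on this CA's CRL before the batch starts.
            room[ca] = max(0, max_entries - crl_entries.get(ca, 0))
        if room[ca] > 0:
            batch.append(cert)
            room[ca] -= 1
    return batch
```

Preferring the earliest notAfter dates means the new entries leave the CRL soonest, which matches the stated goal of making space for later revocations.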
Q7:
Q7: Were any alternative proposals suggested that would have handled the remaining certificates before March 6th 2026?
There were no alternatives available which could have worked within the CRL size limitations. Moving all subscribers to CAs with CRL Sharding is the only way for us to eliminate this dependency, and we have already started working on migrating customers to such CAs (repair item #6).
Q8:
Q8: What were the proposals, and why was the current plan considered the most feasible?
Please see response to Question 7.
Q9:
Q9: Were a key compromise to occur for these 10-20k remaining certificates would Microsoft PKI be willing to revoke them before the 'post-migration' period?
Yes, in the case of a key compromise we would revoke the affected certificates.
Q10:
Q10: Have the remaining subscribers been warned that they are using certificates in a manner that is not tolerating timely revocation?
Yes, this is part of our subscriber agreement. As part of this revocation/migration exercise we have re-emphasized this point to our subscribers, including the ones that own the remaining 10-20K.
Q11:
Q11: If I'm a subscriber with a certificate due to be revoked what functionally am I being told other than a revocation will occur before next March? Can I bring that date forward at all, or request one that's more convenient for me?
Subscribers are instructed to replace impacted certificates by migrating to a new CA before the scheduled revocation date. Earlier revocation can be requested if migration is complete sooner.
Q12:
Q12: Is Copilot used in the summation or generation of responses to this incident?
All our responses are drafted by humans (who may use assistive AI tools to summarize/clarify their drafts). Every response goes through multiple rounds of human review to ensure accuracy and alignment with organizational standards.
Q13:
Q13: Is Copilot involved in generating any figures to provide weekly revocation/expiration figures?
No, Copilot is not involved in generating weekly revocation or expiration figures.
Comment 141•3 months ago
(In reply to Microsoft PKI Services from comment #139)
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,818,802 (15,758,644)
Remaining active certificates (total affected):
- 31,555 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,857,420 (72,070,777)
Estimate for remaining revocations:
- March 6th 2026
I expected this to be corrected before now but alas.
Q1: Why are there more certificates revoked than planned?
Q2: Why are there more certificates revoked and expired than the total impacted certificate figure?
Q3: Despite this being a recurring issue throughout this incident no improvements to basic maths and proofreading your data before publication have occurred. What is the cause of these ongoing issues, and will this be improved in the immediate future?
A large part of the incident response is an openness and transparency on the CA's behalf with intent to improve. The answers and weekly updates to date have only shown a further erosion of any trust that remained. There is a lack of oversight and inventory ongoing that is obvious to outside observers, but it is the lack of self-reflection that has been most damning.
(In reply to Microsoft PKI Services from comment #140)
Response to Comment 137 - Wayne
Q1:
3169 certificates expired, and no revocations occurred.
Q1: Are there any updates on the revocations planned in the following months?
As noted in Comment 134 (response to Question 6), we planned to revoke a batch of ~45,000 certificates on December 9, which was completed (see the weekly status update). As mentioned in Comment 126 and Comment 134, another round of revocations is planned for January, with a final batch planned in early March.
The action item of "migration of all customers to G2 ICAs with CRL partitioning" was moved to late February. That was where the January 'revocations' were alleged to be planned. The reason I am asking is a) because of that, and b) so we have all of the information in one place rather than multiple conflicting comments.
Q4: What does the plan going forward look like with specificity to dates, cert quantity, and whether the certs will be revoked or left to expire?
Q2:
Q2: Does Microsoft PKI intend to stay out of compliance until at least February 28th 2026?
Yes. Microsoft PKI will remain out of compliance for the affected requirements until all repair items are completed. This is expected for an active incident, and the Bugzilla report remains open to reflect the ongoing remediation work.
It is expected in terms of those impacted certificates that Microsoft PKI would be out of compliance. The question was in terms of overall Mass Revocation Planning, where you were required to be in compliance months ago outwith this incident.
Q5: Can Microsoft PKI please explain with specificity which parts of the baseline requirements and root program policies they will not be compliant with for the foreseeable future?
Q3:
Q3: Did Microsoft PKI create specific revocation dates attached to specific certificates for revocation during their planning?
We do have a full inventory of our certificates, including their metadata. Because of the number of affected certificates, we planned revocations in batches of a set size; the certificates to be revoked in each batch were picked dynamically by automation while operating within the CRL size limits.
Q4 & Q5
Q4: As of today does Microsoft PKI have a fixed list of certificates impacted that it is checking against to provide these weekly figures?
Q5: If the list is fixed, how are these figures generated as to cause problems weekly? I do mean, how, as in the literal step-by-step process that is causing these errors.
Yes. We have a fixed list of outstanding affected certificates and we have assigned the certificates to one of 3 batches that were mentioned in comment 126. All current figures and schedules are based on this known inventory, not estimates, and we will continue to provide weekly updates to maintain transparency.
As stated at the start the latest Revocation Delay Status Update has no bearing on reality.
Q6: Can Microsoft PKI please give an actual explanation for how they generate and produce these figures on a weekly basis? A counter-example of how you obtained the wildly erroneous figures purported to be the state of this incident last week would also be beneficial.
Frankly, we are well beyond a simple typo.
Q6:
Q6: Could Microsoft PKI please try to explain the situation through poetry how they take the list of planned certificates and decide which need to be revoked?
Until roughly the end of October, the number of certificates available for revocation was significantly larger than the number we could revoke, so revocation decisions were made dynamically based on the following factors:
• The issuing CA for the certificate
• The current CRL size for that CA relative to the operational limit
• The certificate’s expiration date. Certificates closer to expiration were given precedence so that they would fall off the CRL sooner, making space for more.
For certificates expiring after 11/15, we moved to tracking revocations on a per-certificate basis. As of this update, 31,555 certificates remain to be revoked. Those certificates will be revoked in one of the two remaining batches.
Thank you for this information; it is the closest we've gotten to hearing how decisions are being made.
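For illustration only, the three selection factors in the Q6 response above (issuing CA, CRL headroom for that CA, and expiration date) could be combined in a greedy pass like the one below. This is a rough sketch, not Microsoft's automation: the `Cert` record, the `pick_batch` helper, and the use of a per-CA entry count as a stand-in for CRL size are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cert:
    # Hypothetical inventory record; field names are assumptions.
    serial: str
    issuing_ca: str
    not_after: date


def pick_batch(certs, crl_entries, crl_limit):
    """Greedily pick certificates to revoke without exceeding a per-CA cap.

    certs: iterable of Cert still awaiting revocation
    crl_entries: dict mapping issuing CA name -> entries currently on its CRL
    crl_limit: assumed operational cap on entries per CRL
    """
    headroom = {ca: crl_limit - used for ca, used in crl_entries.items()}
    batch = []
    # Soonest-expiring first, so revoked entries fall off the CRL sooner.
    for cert in sorted(certs, key=lambda c: c.not_after):
        if headroom.get(cert.issuing_ca, 0) > 0:
            batch.append(cert)
            headroom[cert.issuing_ca] -= 1
    return batch
```

A CA whose CRL is already at the cap contributes nothing to the batch, which matches the described behaviour of deferring certificates until space frees up.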
Q7:
Q7: Were any alternative proposals suggested that would have handled the remaining certificates before March 6th 2026?
There were no alternatives available which could have worked within the CRL size limitations. Moving all subscribers to CAs with CRL Sharding is the only way for us to eliminate this dependency, and we have already started working on migrating customers to such CAs (repair item #6).
Q8:
Q8: What were the proposals, and why was the current plan considered the most feasible?
Please see response to Question 7.
Q7: Can Microsoft PKI please explain with examples why it will take until March to revoke the remaining 31k certificates?
Q9:
Q9: Were a key compromise to occur for these 10-20k remaining certificates would Microsoft PKI be willing to revoke them before the 'post-migration' period?
Yes, in the case of a key compromise we would revoke the affected certificates.
Q8: Would revocation be handled within the baseline requirements timeline, or pushed off until March? That the certificates would be planned for revocation is hardly reassuring.
Q10: Have the remaining subscribers been warned that they are using certificates in a manner that is not tolerating timely revocation?
Yes, this is part of our subscriber agreement. As part of this revocation/migration exercise we have re-emphasized this point to our subscribers, including the ones that own the remaining 10-20K.
Q9: Were any certificates initially in batch 1 or 2, but changed to a later batch? If so, how many and what reasons were provided.
Q11: If I'm a subscriber with a certificate due to be revoked what functionally am I being told other than a revocation will occur before next March? Can I bring that date forward at all, or request one that's more convenient for me?
Subscribers are instructed to replace impacted certificates by migrating to a new CA before the scheduled revocation date. Earlier revocation can be requested if migration is complete sooner.
Q10: How many subscribers requested an earlier replacement, and upon issuance did this result in their original certificate being revoked earlier?
Q12:
Q12: Is Copilot used in the summation or generation of responses to this incident?
All our responses are drafted by humans (who may use assistive AI tools to summarize/clarify their drafts). Every response goes through multiple rounds of human review to ensure accuracy and alignment with organizational standards.
Q13:
Q13: Is Copilot involved in generating any figures to provide weekly revocation/expiration figures?
No, Copilot is not involved in generating weekly revocation or expiration figures.
Then sadly the human review is quite lacking and needs to be improved upon. I appreciate there is a lot of upheaval occurring internally, but that does not reduce the expectations bestowed upon you as a certificate authority.
| Assignee | ||
Comment 142•3 months ago
Weekly Status Update
We continue to make progress on the action items outlined in the incident reports. As previously noted, the due date for Action Item #7 was expected to be updated; this change was inadvertently not reflected in the table and has now been corrected. In addition, we plan to complete our second revocation batch on January 7, revoking 19,520 certificates based on current CRL capacity.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-06 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Prevent | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. Regular updates will be provided on the burndown for G1 to G2 transition. | 2026-02-28 | In Progress |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2026-02-01 | In Progress |
| Assignee | ||
Comment 143•3 months ago
Response to Comment 141 - Part 1 - Wayne
To keep responses clear and organized, this comment addresses Questions 1–4. We will follow up with responses to Questions 5–10 in a subsequent comment.
Q1:
Q1: Why are there more certificates revoked than planned?
We apologize for the incorrect revoked figure in the prior update; this was a human error in the published value. There are not more certificates revoked than planned. The previously reported revoked figure (15,818,802) was incorrect. The correct value is 15,181,802, which is below the planned figure of 15,758,644.
Q2:
Q2: Why are there more certificates revoked and expired than the total impacted certificate figure?
With the corrected revoked value, the figures reconcile as expected. The earlier mismatch was caused by the incorrect revoked figure in the prior update. Using the corrected values, the totals align to the total impacted certificate count of 72,070,777.
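The reconciliation described here can be checked with simple arithmetic. The sketch below assumes that revoked, remaining active, and expired-and-not-revoked are mutually exclusive categories that together cover every affected certificate; the function name is made up for the example, and the figures are the ones quoted in this thread.

```python
TOTAL_AFFECTED = 72_070_777


def reconciles(revoked, remaining, expired, total=TOTAL_AFFECTED):
    """Return True when the three categories sum to the total affected count."""
    return revoked + remaining + expired == total


# Figures as first published (the revoked value contained the error).
published = reconciles(revoked=15_818_802, remaining=31_555, expired=56_857_420)
# The same update with the corrected revoked value.
corrected = reconciles(revoked=15_181_802, remaining=31_555, expired=56_857_420)
print(published, corrected)  # False True
```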
Q3:
Q3: Despite this being a recurring issue throughout this incident no improvements to basic maths and proofreading your data before publication have occurred. What is the cause of these ongoing issues, and will this be improved in the immediate future?
A large part of the incident response is an openness and transparency on the CA's behalf with intent to improve. The answers and weekly updates to date have only shown a further erosion of any trust that remained. There is a lack of oversight and inventory ongoing that is obvious to outside observers, but it is the lack of self-reflection that has been most damning.
We acknowledge the concern regarding accuracy and presentation of incident updates. While the underlying data sources and remediation activities have remained consistent, we recognize the importance of continually improving how information is validated and communicated.
To address this, we are refining our existing workflows by adding additional validation and review steps prior to publication. These enhancements are intended to further reduce manual handling and improve consistency and reliability in future updates. We remain committed to transparency and continuous improvement and expect these refinements to improve the quality of future reporting.
Q4:
Q4: What does the plan going forward look like with specificity to dates, cert quantity, and whether the certs will be revoked or left to expire?
As noted in our weekly status update, our next revocation batch will occur on 1/7/2026, when we plan to revoke 19,520 certificates.
We plan to complete the final revocation batch for the remaining certificates after migration, by March 6th, as noted in Comment 126.
| Assignee | ||
Comment 144•3 months ago
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,181,802 (15,758,644)
Remaining active certificates (total affected):
- 31,117 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,857,858 (72,070,777)
Estimate for remaining revocations:
- March 6th, 2026
| Assignee | ||
Comment 145•3 months ago
Response to Comment 141 - Part 2 - Wayne
To keep responses clear and organized, this comment addresses Questions 5-10.
Q5:
Q5: Can Microsoft PKI please explain with specificity which parts of the baseline requirements and root program policies they will not be compliant with for the foreseeable future?
This Bugzilla remains open for non-compliance with TLS Baseline Requirements Section 4.9.1.1 (Reasons for Revoking a Subscriber Certificate), as documented in our Full Incident Report.
This is the only Baseline Requirements section implicated by this incident; we remain in compliance with all other Baseline Requirements and root program policy requirements, except those specific to other Bugzilla bugs that Microsoft PKI Services currently has open. This Bugzilla bug will remain open until the committed remediation actions are completed and compliance with this requirement is fully restored.
Q6:
Q6: Can Microsoft PKI please give an actual explanation for how they generate and produce these figures on a weekly basis? A counter-example of how you obtained the wildly erroneous figures purported to be the state of this incident last week would also be beneficial.
Weekly figures are generated from a fixed inventory of affected certificates that serves as the source of truth for this incident. Counts for revoked, expired, and remaining certificates are derived through automated aggregation against this inventory.
The incorrect value reported in the prior update was introduced during the preparation of the published status summary, not from the underlying inventory or aggregation logic. We have added additional validation and review steps prior to publication to reduce the risk of similar errors going forward.
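As a rough illustration of the process described in this answer, the sketch below derives the weekly counts from a fixed inventory and then compares a drafted summary against those counts before publication. The status labels, field names, and both helper functions are hypothetical; the actual inventory schema and tooling are not described in this thread.

```python
from collections import Counter


def weekly_figures(inventory):
    """Count certificates per status from a fixed inventory.

    inventory: dict mapping serial -> status, where status is one of
    "revoked", "expired", or "active" (hypothetical labels).
    """
    counts = Counter(inventory.values())
    return {
        "revoked": counts["revoked"],
        "expired": counts["expired"],
        "remaining": counts["active"],
        "total": len(inventory),
    }


def validate_for_publication(published, computed):
    """Return the names of fields whose published value differs from the computed one."""
    return [name for name, value in computed.items() if published.get(name) != value]
```

A transposed digit in the drafted summary would show up as a non-empty list from `validate_for_publication`, so the update could be held back until the draft matches the aggregated values.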
Q7:
Q7: Can Microsoft PKI please explain with examples why it will take until March to revoke the remaining 31k certificates?
As previously described in Comment 134, the affected certificates were divided into three revocation batches based on operational constraints.
- Batch 1 and Batch 2 consist of certificates that could be revoked within existing CRL size limits and automation capabilities. These batches were executed in December and are planned for January, respectively.
- Batch 3, which accounts for the remaining ~10k certificates, was noted in Comment 126 to include certificates with automation and telemetry gaps that are still being addressed. As a result, these certificates will be revoked after subscribers are migrated to CAs with CRL partitioning, with revocation planned following completion of that migration.
Q8:
Q8: Would revocation be handled within the baseline requirements timeline, or pushed off until March? That the certificates would be planned for revocation is hardly reassuring.
In the event of a key compromise affecting any certificate within the scope you defined, we would treat that as a separate incident, independent of the remediation work described in this Bugzilla. In such a scenario, we would follow the Baseline Requirements for key compromise response, including the associated revocation timelines.
The March 6, 2026 date applies only to the completion of the planned, incident-driven revocation batches associated with this Bugzilla.
Q9:
Q9: Were any certificates initially in batch 1 or 2, but changed to a later batch? If so, how many and what reasons were provided.
Our original projection for the first batch was 48K revocations, but we adjusted this to 45K to stay within CRL size constraints. The remaining ~3K certificates will be processed as part of the second batch. No certificates from batch 1 or 2 were moved to later batches.
Q10:
Q10: How many subscribers requested an earlier replacement, and upon issuance did this result in their original certificate being revoked earlier?
We have not encountered this scenario to date. Impacted subscribers are being actively migrated as part of a coordinated remediation effort, rather than through ad hoc early replacement requests.
If a subscriber completes migration ahead of schedule, the original certificate becomes eligible for earlier revocation, subject to the same operational constraints described elsewhere in this incident.
| Assignee | ||
Comment 146•3 months ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 147•3 months ago
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,181,802 (15,758,644)
Remaining active certificates (total affected):
- 30,673 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,858,302 (72,070,777)
Estimate for remaining revocations:
- March 6th, 2026
| Assignee | ||
Comment 148•2 months ago
Weekly Status Update
We are actively progressing through all repair items identified in the incident report. No major changes at this time.
| Assignee | ||
Comment 149•2 months ago
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,181,802 (15,758,644)
Remaining active certificates (total affected):
- 30,144 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,858,831 (72,070,777)
Estimate for remaining revocations:
- March 6th, 2026
| Assignee | ||
Comment 150•2 months ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 151•2 months ago
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,201,377 (15,758,644)
Remaining active certificates (total affected):
- 10,569 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,858,831 (72,070,777)
Estimate for remaining revocations:
- March 6th, 2026
Please also note that we completed the revocations for Batch 2 this week, revoking 19,575 certificates.
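The Batch 2 figure can be cross-checked against the previous status update (Comment 149). The sketch below assumes that a certificate leaves the "remaining" count only by being revoked or by expiring; the function name and dictionary keys are made up for the example, and the figures are taken from Comments 149 and 151.

```python
def deltas_consistent(prev, curr):
    """True when the drop in remaining equals newly revoked plus newly expired."""
    newly_revoked = curr["revoked"] - prev["revoked"]
    newly_expired = curr["expired"] - prev["expired"]
    drop = prev["remaining"] - curr["remaining"]
    return (
        newly_revoked >= 0
        and newly_expired >= 0
        and drop == newly_revoked + newly_expired
    )


comment_149 = {"revoked": 15_181_802, "remaining": 30_144, "expired": 56_858_831}
comment_151 = {"revoked": 15_201_377, "remaining": 10_569, "expired": 56_858_831}

batch_2 = comment_151["revoked"] - comment_149["revoked"]
print(batch_2, deltas_consistent(comment_149, comment_151))  # 19575 True
```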
| Assignee | ||
Comment 152•2 months ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 153•2 months ago
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,201,377 (15,758,644)
Remaining active certificates (total affected):
- 10,332 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,859,068 (72,070,777)
Estimate for remaining revocations:
- March 6th, 2026
| Assignee | ||
Comment 154•2 months ago
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,201,377 (15,758,644)
Remaining active certificates (total affected):
- 9,782 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,859,618 (72,070,777)
Estimate for remaining revocations:
- March 6th, 2026
| Assignee | ||
Comment 155•2 months ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. Please note the update to the due date of action item #9. The due date is being extended as additional time is required to complete the rotation plan.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-06 | In Progress |
| Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Prevent | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. Regular updates will be provided on the burndown for G1 to G2 transition. | 2026-02-28 | In Progress |
| Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2026-03-06 | In Progress |
| Assignee | ||
Comment 156•1 month ago
Weekly Status Update
We are actively progressing through all repair items identified in the incident report. No major changes at this time.
| Assignee | ||
Comment 157•1 month ago
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,201,377 (15,758,644)
Remaining active certificates (total affected):
- 9,013 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,860,387 (72,070,777)
Estimate for remaining revocations:
- March 6th, 2026
Comment 158•1 month ago
(In reply to Microsoft PKI Services from comment #155)
Please note the update to the due date of action item #9. The due date is being extended as additional time is required to complete the rotation plan.
Can you give more detail on why an extension of a month is necessary for completing an action item that does not involve any user-facing changes? What roadblocks were encountered that prevented completing this plan on the originally-committed timeline? Why was extending the remediation timeline the best solution to those roadblocks?
Comment 159•1 month ago
(In reply to Microsoft PKI Services from comment #155)
Please note the update to the due date of action item #9. The due date is being extended as additional time is required to complete the rotation plan.
There only appear to be 8 action items in this list, so I am at a loss to identify #9.
It would be useful to explicitly number the action items when referring to them by their index number alone.
| Assignee | ||
Comment 160•1 month ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
| Assignee | ||
Comment 161•1 month ago
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,201,377 (15,758,644)
Remaining active certificates (total affected):
- 8,297 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,861,103 (72,070,777)
Estimate for remaining revocations:
- March 6th, 2026
| Assignee | ||
Comment 162•1 month ago
Response to Comment 159 - Malcolm D
There only appear to be 8 action items in this list, so I am at a loss to identify #9.
It would be useful to explicitly number the action items when referring to them by their index number alone.
Thank you for the correction. There are only eight repair items; we were referring to action item #8 in our previous response, not #9. As suggested, we will explicitly number the action items.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| #1 - Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-06 | In Progress |
| #2 - Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| #3 - Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| #4 - Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| #5 - Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| #6 - Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Prevent | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| #7 - Complete migration of all customers to G2 ICAs with CRL partitioning and eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from G2 ICAs with partitioned CRLs, visible in CT logs. Public can verify through CCADB hierarchy updates and issuance patterns in CT. Internal tracking will confirm deprecation of non-partitioned ICAs. Regular updates will be provided on the burndown for G1 to G2 transition. | 2026-02-28 | In Progress |
| #8 - Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2026-03-06 | In Progress |
| Assignee | ||
Comment 163•1 month ago
Response to Comment 158 - Aaron Gable
Can you give more detail on why an extension of a month is necessary for completing an action item that does not involve any user-facing changes? What roadblocks were encountered that prevented completing this plan on the originally-committed timeline? Why was extending the remediation timeline the best solution to those roadblocks?
During planning, we identified additional upstream dependencies that require further planning and coordination. These dependencies were the primary blocker to complete the work within the original timeline. Extending the remediation date allows us to appropriately address these dependencies.
Comment 164•1 month ago
(In reply to Microsoft PKI Services from comment #163)
Response to Comment 158 - Aaron Gable
Can you give more detail on why an extension of a month is necessary for completing an action item that does not involve any user-facing changes? What roadblocks were encountered that prevented completing this plan on the originally-committed timeline? Why was extending the remediation timeline the best solution to those roadblocks?
During planning, we identified additional upstream dependencies that require further planning and coordination. These dependencies were the primary blocker to complete the work within the original timeline. Extending the remediation date allows us to appropriately address these dependencies.
Could you answer the 3 questions provided in detail and with specificity as to when you became aware, the dependencies involved, and why other plans were not possible?
If you do not wish to actually participate in responding to questions in a good faith manner it would be more intellectually honest to say so. Having to craft specific questions that are deigned to be responded to only late on Fridays is not respectful to the process. It's even counter to prior claims made 7 months ago in Comment 31:
Response to Comment 24 from Wayne
"Do we have a more detailed revocation plan yet? Currently we seem to be stalling until November when the final certificates will expire instead of being revoked as required.
If there is no intent for these certificate to ever be revoked, why are they listed as 'planned for revocation'?
I am dismayed at the attention this incident is receiving and the lack of pro-activeness. We have regular reports late Friday, and no sign that this is being treated with any severity internally. If we are to learn of the plan through weekly questioning then please advise us in advance so the questions can be more thoroughly worded."
We acknowledge the concerns raised and want to clarify that Microsoft remains fully committed to revoking as many of the affected certificates as we can while managing the CRL size constraints described in our full incident report.
Earlier this week, we published an updated revocation plan with scheduled batches through November. Revocations began on May 28, 2025, and certificates marked as “planned for revocation” are actively queued for upcoming batches. We would also like to acknowledge your asks related to certs that were already expired before we began revocations, and will provide that in the next update.
We recognize that our responses have been largely following the 7-day response window due to the number of bugs we are concurrently managing and hope to shorten this to 3 days in the future.
We appreciate the feedback and will continue improving the clarity of our weekly updates to provide better visibility into progress and planning.
My comments from that date still stand and are a testament to the substantial progress your CA has made in improving their practices.
Comment 165•1 month ago
|
||
Can we get a week-by-week breakdown table of certificates that have been revoked this week, expired this week, and still live? I'm trying to reconstruct the revocation history and as soon as June 2025 the numbers become either erroneous or at least difficult to follow.
Comment 166•1 month ago
|
||
We’d like to remind Microsoft of a few of the expectations set in the CCADB Incident Reporting Guidelines (IRGs).
The IRGs encourage CA Owners to “avoid generic statements that do not address specific issues raised in the report or in response to community questions or comments.” As an example, and as identified in Comment 164, we do not consider the response in Comment 163 to directly address the questions asked in Comment 158.
Comment 167•1 month ago
|
||
This incident is categorized as a 'Failure to Revoke in 5 Days,' yet it has now been 278 days since this bug was opened. As of this writing, a significant volume of affected, non-expired certificates remain unrevoked, well past the CA's (original) stated deadline of November 15, 2025.
The justifications regarding CRL size constraints and subscriber impact are unpersuasive given the time elapsed; technical debt does not excuse prolonged non-compliance with the BRs.
Furthermore, this CA Owner is also a Root Store Operator and, in my opinion, should be modeling exemplary adherence to root program requirements. Continued failure to revoke sets a dangerous precedent for the ecosystem, signaling to other CAs that revocation timelines are perhaps negotiable, if the entity is sufficiently large.
| Assignee | ||
Comment 168•1 month ago
|
||
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 15,201,377 (15,758,644)
-
Remaining active certificates (total affected):
- 7,846 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 56,861,554 (72,070,777)
-
Estimate for remaining revocations:
- March 6th, 2026
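The three counts above should together account for every affected certificate. A minimal Python sketch of that cross-check, using only the figures reported in this update (the variable and function names are illustrative, not part of any Microsoft tooling):

```python
# Figures copied from the status update above.
total_affected = 72_070_777
revoked = 15_201_377
remaining_active = 7_846
expired_not_revoked = 56_861_554


def reconcile(total, *buckets):
    """Return total minus the sum of its buckets; 0 means the figures agree."""
    return total - sum(buckets)


gap = reconcile(total_affected, revoked, remaining_active, expired_not_revoked)
print("unaccounted certificates:", gap)

# Share of the affected population in each bucket.
for label, count in [
    ("revoked", revoked),
    ("expired without revocation", expired_not_revoked),
    ("still active", remaining_active),
]:
    print(f"{label}: {count / total_affected:.2%}")
```

With these inputs the gap is 0, so the three buckets sum exactly to the reported total.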
| Assignee | ||
Comment 169•1 month ago
|
||
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. Please note that we are updating the scope of Action Item #7 to focus on the core objective of eliminating all new certificate issuance from non-partitioned ICAs.
The original repair item addressed CRL bloat through migration to G2 ICAs with CRL partitioning. However, G2 migration is taking longer than planned. A key reason for the delay is that we discovered more use of certificates for client authentication than originally anticipated. G2 supports only server authentication, so subscribers must make code changes before moving to G2. To mitigate the CRL bloat issue according to the original schedule, we are now also implementing CRL partitioning on our G1 ICAs. As a result, by February 28th, 2026, all new certificate issuance from Microsoft PKI Services will be from our G1 or G2 ICAs with partitioned CRLs, addressing the CRL bloat issue.
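To illustrate why partitioning addresses CRL bloat: with partitioned (sharded) CRLs, each certificate is assigned at issuance to one of several smaller CRLs, so a mass revocation is spread across many files instead of growing a single one. The sketch below is a hypothetical illustration only. The shard count, the hash-based assignment, and the function names are assumptions and do not describe the actual implementation on the G1 or G2 ICAs.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 64  # assumed shard count, chosen only for illustration


def partition_for(serial_hex: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a certificate serial number to a CRL partition (hypothetical scheme)."""
    digest = hashlib.sha256(bytes.fromhex(serial_hex)).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def entries_per_partition(revoked_serials):
    """Count how many revoked serials land in each partition."""
    return Counter(partition_for(s) for s in revoked_serials)


# Simulate 100,000 revoked certificates with synthetic 16-byte serials.
revoked = [i.to_bytes(16, "big").hex() for i in range(1, 100_001)]
sizes = entries_per_partition(revoked)

print("entries in a single unpartitioned CRL:", len(revoked))
print("entries in the largest partition:", max(sizes.values()))
```

In a real deployment the partition is fixed when the certificate is issued, because the certificate's CRL Distribution Point has to name the partition that will carry its revocation entry.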
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| #1 Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-06 | In Progress |
| #2 Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| #3 Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| #4 Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. | 2025-09-01 | Complete |
| #5 Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| #6 Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. | 2025-09-22 | Complete |
| #7 Eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from ICAs with partitioned CRLs (G1 or G2), visible in CT logs. Public can verify through CCADB hierarchy updates showing CRL partitioning. | 2026-02-28 | In Progress |
| #8 Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs. | 2026-03-06 | In Progress |
| Assignee | ||
Comment 170•1 month ago
|
||
Response to Comment 164 - Wayne
Could you answer the 3 questions provided in detail and with specificity as to when you became aware, the dependencies involved, and why other plans were not possible?
- When did we become aware of the dependencies, and what were they?
In January 2026, while drafting the ICA rotation plan, we discovered new cases where internal relying parties are using ICA pinning as a threat mitigation. We needed additional time to devise solutions for these cases to prevent disruptions as ICAs move to shorter lifetimes. At this time, we believe all major dependencies have been identified and will publish a comprehensive plan by March 6th.
- What roadblocks did we encounter that prevented completing the plan on the original timeline?
The main roadblock was the discovery of internal ICA pinning usage. We identified this dependency during our planning which must be incorporated into the rotation plan to ensure future ICA rotation can be executed without service disruption.
- Why was extending the timeline the best solution?
We needed additional time to account for pinning scenarios by internal relying parties to ensure the plan would be viable.
In regards to our response cadence, we thank you for the feedback. We acknowledge that in Comment 31 we said we hoped to shorten response times to ~3 days, and we have not consistently met that expectation. Going forward, we are adjusting our internal processes to respond within ~3 business days more consistently when feasible, rather than routinely using the end of the 7-day window.
| Assignee | ||
Comment 171•1 month ago
|
||
Response to Comment 165 - Bkerley
Can we get a week-by-week breakdown table of certificates that have been revoked this week, expired this week, and still live? I'm trying to reconstruct the revocation history and as soon as June 2025 the numbers become either erroneous or at least difficult to follow.
Thank you for requesting additional clarity on our weekly revocation telemetry. We understand the importance of transparent, consistent reporting.
We will provide a comprehensive week-by-week breakdown covering the entire revocation period from May 28, 2025 through present, showing the number of:
- Certificates revoked each week
- Certificates expired each week
- Certificates remaining active at the end of each week
This breakdown will be included in next week’s update. We are finalizing the table to ensure consistent presentation across the entire reporting period.
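For readers following along, the requested breakdown can be derived from per-certificate records alone. The sketch below is an assumption-laden illustration, not the tooling used for this incident: the `CertRecord` fields, the treatment of a certificate as ending on its `not_after` date, and the sample data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class CertRecord:
    not_after: date             # expiry date of the certificate
    revoked_on: Optional[date]  # None if the certificate was never revoked


def week_start(d: date, origin: date) -> date:
    """Start of the 7-day reporting week (counted from origin) that contains d."""
    return origin + timedelta(days=((d - origin).days // 7) * 7)


def weekly_breakdown(records, origin: date, end: date):
    """Return (week_start, revoked, expired, active_at_week_end) rows."""
    revoked, expired = Counter(), Counter()
    active = 0
    for r in records:
        was_revoked = r.revoked_on is not None and r.revoked_on <= r.not_after
        ended = r.revoked_on if was_revoked else r.not_after
        if ended < origin:
            continue  # already expired or revoked before the reporting period
        active += 1
        if ended <= end:
            (revoked if was_revoked else expired)[week_start(ended, origin)] += 1
    rows = []
    week = origin
    while week <= end:
        active -= revoked[week] + expired[week]
        rows.append((week, revoked[week], expired[week], active))
        week += timedelta(days=7)
    return rows


# Synthetic sample data, for illustration only.
sample = [
    CertRecord(date(2025, 6, 30), date(2025, 5, 29)),  # revoked in week 1
    CertRecord(date(2025, 6, 5), None),                # expired in week 2
    CertRecord(date(2025, 12, 1), date(2025, 6, 12)),  # revoked in week 3
    CertRecord(date(2026, 4, 15), None),               # still active
    CertRecord(date(2025, 5, 1), None),                # expired before the period
]
rows = weekly_breakdown(sample, origin=date(2025, 5, 28), end=date(2025, 6, 17))
for row in rows:
    print(row)
```

Weeks are counted from May 28, 2025, the date revocations began, so each row covers seven days starting on that weekday.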
| Assignee | ||
Comment 172•1 month ago
|
||
Response to Comment 166 - Chrome Root Program
We’d like to remind Microsoft of a few of the expectations set in the CCADB Incident Reporting Guidelines (IRGs).
The IRGs encourage CA Owners to “avoid generic statements that do not address specific issues raised in the report or in response to community questions or comments.” As an example, and as identified in Comment 164, we do not consider the response in Comment 163 to directly address the questions asked in Comment 158.
Thank you for the reminder regarding the CCADB Incident Reporting Guidelines. We acknowledge that Comment 163 did not fully address the specific questions raised in Comment 158. In Comment 170 (in response to Comment 164), we have now provided the detailed information that should have been included earlier. We will ensure all questions are addressed directly with clear, factual details, consistent with CCADB IRG expectations from the outset.
| Assignee | ||
Comment 173•1 month ago
|
||
Response to Comment 167 - Erakura
This incident is categorized as a 'Failure to Revoke in 5 Days,' yet it has now been 278 days since this bug was opened. As of this writing, a significant volume of affected, non-expired certificates remain unrevoked, well past the CA's (original) stated deadline of November 15, 2025.
The justifications regarding CRL size constraints and subscriber impact are unpersuasive given the time elapsed; technical debt does not excuse prolonged non-compliance with the BRs.
Furthermore, this CA Owner is also a Root Store Operator and, in my opinion, should be modeling exemplary adherence to root program requirements. Continued failure to revoke sets a dangerous precedent for the ecosystem, signaling to other CAs that revocation timelines are perhaps negotiable, if the entity is sufficiently large.
We recognize the concern regarding the duration of this incident. While the effort has spanned 278 days and revocation of impacted certificates is still in progress, we remain focused on driving this action item to closure.
Of the original ~72 million affected certificates, we have revoked ~15 million, and an additional ~56 million have naturally expired. As of February 9, 2026, 8,297 certificates remain unrevoked. As we shared in Comment 126, these remaining certificates represent a subset with automation and telemetry gaps that we committed to revoking by March 6, 2026.
We acknowledge our responsibility, as both a CA operator and Root Store Operator, to set the standard for compliance with industry requirements. In this incident, we fell short of that expectation. We are fully committed to resolving all remaining revocations and remediation items.
Comment 174•1 month ago
|
||
(In reply to Microsoft PKI Services from comment #170)
- When did we become aware of the dependencies, and what were they?
In January 2026, while drafting the ICA rotation plan, we discovered new cases where internal relying parties are using ICA pinning as a threat mitigation. We needed additional time to devise solutions for these cases to prevent disruptions as ICAs move to shorter lifetimes. At this time, we believe all major dependencies have been identified and will publish a comprehensive plan by March 6th.
Q1: Why was the ICA rotation plan being drafted in January 2026? This was originally to be finalised by 2025-10-17.
Q2: What precisely are the new cases of cert pinning of intermediaries that were unknown before January 2026? Cert pinning has been explicitly brought up as an issue multiple times in this incident prior to 2026.
Q3: Given the intention is to let every certificate expire rather than do your due diligence, I don't see what potential solutions could exist given the G1s were going to be retired. Could you elaborate on what feasible solutions are relevant to these expiring certificates?
As per Comment 61:
Even prior to this incident, our existing plan was to deprecate issuance from the cross signed G1 ICAs (since those are expiring in August 2026) and replace them with the ICAs off the G2 CA.
- What roadblocks did we encounter that prevented completing the plan on the original timeline?
The main roadblock was the discovery of internal ICA pinning usage. We identified this dependency during our planning which must be incorporated into the rotation plan to ensure future ICA rotation can be executed without service disruption.
See above. Cert pinning is not an acceptable reason for delay, and that it's being brought up so late shows a disregard for ongoing practices and related incidents.
- Why was extending the timeline the best solution?
We needed additional time to account for pinning scenarios by internal relying parties to ensure the plan would be viable.
Q4: When you say internal relying parties, you are explicitly meaning that this is not about third-parties and only relevant to Microsoft and its internal divisions, correct?
Frankly you should consider the root compromised and take this as a fire drill for any of your internal departments to be ready for when this occurs. The ongoing excuses do not show any intent to improve, make it clear that plans are being made at the last minute, and that there is not any internal coverage of certificate implementation to minimize damage when breaches occur.
In regards to our response cadence, we thank you for the feedback. We acknowledge that in Comment 31 we said we hoped to shorten response times to ~3 days, and we have not consistently met that expectation. Going forward, we are adjusting our internal processes to respond within ~3 business days more consistently when feasible, rather than routinely using the end of the 7-day window.
The point was not merely the 3-day response time, but also the lack of clear answers to questions posed.
| Assignee | ||
Comment 175•1 month ago
|
||
Response to Comment 174 - Wayne
Q1: Why was the ICA rotation plan being drafted in January 2026? This was originally to be finalised by 2025-10-17.
The ICA rotation plan was originally scheduled to be finalized by October 17, 2025. However, work on this repair item was delayed because several foundational repair items took longer to complete than originally anticipated.
CRL partitioning implementation and rollout on G2 ICAs took longer than expected. Once we started rollout of these CAs we discovered larger than anticipated usage of certificates for Client Authentication. This resulted in a change in strategy to implement CRL partitioning on G1 ICAs as well, in addition to provisioning warm stand-bys with Client Authentication. We understand the importance of hitting our ETAs, however, in this case, we prioritized addressing the root cause (lack of CRL partitioning) before ICA rotation planning.
Q2: What precisely are the new cases of cert pinning of intermediaries that were unknown before January 2026? Cert pinning has been explicitly brought up as an issue multiple times in this incident prior to 2026.
The pinning discussions earlier in this incident (Comments 6 and 75) focused on external relying parties using certificate pinning.
In January 2026, during detailed stakeholder engagement for the ICA rotation plan, we discovered specific ICAs are referenced in internal operational and security configurations. These configurations were implemented as security controls contrary to guidance.
We are working with these services to change their security controls so they are not affected by ICA lifecycle events.
Q3: Given the intention is to let every certificate expire rather than do your due diligence, I don't see what potential solutions could exist given the G1s were going to be retired. Could you elaborate on what feasible solutions are relevant to these expiring certificates?
Could you clarify what you're asking in this question? We want to ensure we provide an accurate and complete response.
Q4: When you say internal relying parties, you are explicitly meaning that this is not about third-parties and only relevant to Microsoft and its internal divisions, correct?
Correct, the new discoveries are for internal Microsoft systems and operational configurations, not external third-party relying parties. As referenced in our response to Q2, we are working with them to ensure that they will be unaffected by future ICA lifecycle events. We have asked our internal relying parties to remove ICA pinning with high priority and urgency; and if that does not meet their security requirements, to work with us on alternative solutions.
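To illustrate why ICA pinning couples a relying party to ICA lifecycle events: a client that pins the fingerprint of one specific intermediate rejects any chain issued under a replacement intermediate, while a client that anchors trust at the root is unaffected by the rotation. The sketch below is a simplified, hypothetical model. The placeholder byte strings stand in for DER-encoded certificates, pins are taken over the whole certificate rather than the public key, and none of the names or values come from any Microsoft system.

```python
import hashlib


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a certificate's encoded bytes."""
    return hashlib.sha256(cert_der).hexdigest()


def chain_matches_pins(chain, pins) -> bool:
    """True if any certificate in the presented chain matches a pinned fingerprint."""
    return any(fingerprint(cert) in pins for cert in chain)


# Placeholder certificates (hypothetical values).
root = b"placeholder-root"
old_ica = b"placeholder-ica-old"
new_ica = b"placeholder-ica-new"

ica_pins = {fingerprint(old_ica)}  # relying party pinned to one intermediate
root_pins = {fingerprint(root)}    # relying party anchored at the root, for contrast

chain_before_rotation = [b"placeholder-leaf-1", old_ica, root]
chain_after_rotation = [b"placeholder-leaf-2", new_ica, root]

ica_pin_before = chain_matches_pins(chain_before_rotation, ica_pins)
ica_pin_after = chain_matches_pins(chain_after_rotation, ica_pins)
root_pin_after = chain_matches_pins(chain_after_rotation, root_pins)

print("ICA pin, before rotation:", ica_pin_before)
print("ICA pin, after rotation:", ica_pin_after)
print("root pin, after rotation:", root_pin_after)
```

The ICA-pinned check passes before the rotation and fails after it, which is the disruption an ICA rotation plan has to account for. The root-pinned check is shown only for contrast and is not a statement about which alternative the internal services will adopt.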
Comment 176•1 month ago
|
||
(In reply to Microsoft PKI Services from comment #175)
Q3: Given the intention is to let every certificate expire rather than do your due diligence, I don't see what potential solutions could exist given the G1s were going to be retired. Could you elaborate on what feasible solutions are relevant to these expiring certificates?
Could you clarify what you're asking in this question? We want to ensure we provide an accurate and complete response.
I was under the impression that the G1 ICAs were to be retired and the new ICAs were to be used going forward. Therefore any impacted cert relevant to this incident will expire by March 6th. Comment 61 seems to align with my memory of events:
Question 3
"(Q3) Can Microsoft PKI Services directly confirm that in the absence of the above described root-level cross-certificate, all subsequently created ICAs will have the same risk as those affected by this incident? Said differently, can Microsoft PKI Services acknowledge that the immediate action to stand-up a fleet of cross-certified “warm stand-by” CAs may not reliably meet the intended goal if for some reason it’s later identified that those CAs are flawed in some way?"
The major limiting factor in not being able to revoke in a timely manner was lack of CRL partitioning on the existing CAs. The Warm Stand-bys are planned to have CRL partitioning, so they will not suffer from the same issues.
Further, our plan is to stop the use of cross-signed ICAs and move to issuance from ICAs off the newly cross-signed G2 root. Once the ICAs from this new root are available for enrollment, in case we discover issues in the future with the current or the newly cross-signed (warm stand-by) ICAs from the G1 root, we will accelerate migration of the workload to the ICAs from the cross-signed G2 root.
Question 4
"(Q4) Can Microsoft PKI Services explain why cross-certifying the existing, in-use Microsoft roots was not considered a simpler and more robust solution than continuing to cross-sign leaf-issuing intermediates?"
Even prior to this incident, our existing plan was to deprecate issuance from the cross signed G1 ICAs (since those are expiring in August 2026) and replace them with the ICAs off the G2 CA. The plan for cross-signing of the G2 CA was already in flight and we chose to rely on that plan as the primary path. That said, the idea of cross signing the G1 Microsoft root has merit and we will consider it for future readiness.
The ICAs were due to expire by August 2026, and issuance was to be deprecated.
Therefore I'm not seeing how ICAs that should have already stopped being used for new issuance are part of the roadblock.
To clarify in my original question I was quoting Comment 170:
In January 2026, while drafting the ICA rotation plan, we discovered new cases where internal relying parties are using ICA pinning as a threat mitigation. We needed additional time to devise solutions for these cases to prevent disruptions as ICAs move to shorter lifetimes. At this time, we believe all major dependencies have been identified and will publish a comprehensive plan by March 6th.
Given the specific ICAs are no longer relevant and are due to expire, I'm unsure as to why this was a major issue. I'm also rather confused as to why none of these internal systems flagged this as an issue after their latest certificates were issued, but that's outside the scope of this incident.
| Assignee | ||
Comment 177•1 month ago
|
||
Update to Comment 171
Response to Comment 165 - Bkerley
Can we get a week-by-week breakdown table of certificates that have been revoked this week, expired this week, and still live? I'm trying to reconstruct the revocation history and as soon as June 2025 the numbers become either erroneous or at least difficult to follow.
Thank you for requesting additional clarity on our weekly revocation telemetry. We understand the importance of transparent, consistent reporting.
We will provide a comprehensive week-by-week breakdown covering the entire revocation period from May 28, 2025 through present, showing the number of:
• Certificates revoked each week
• Certificates expired each week
• Certificates remaining active at the end of each week
This breakdown will be included in next week’s update. We are finalizing the table to ensure consistent presentation across the entire reporting period.
We would like to provide an update to this request:
We've encountered a discrepancy between our internal telemetry and crt.sh, which impacts our ability to construct an accurate week-by-week breakdown. We will investigate and address the discrepancy and will post the week-by-week breakdown by March 13th.
| Assignee | ||
Comment 178•1 month ago
|
||
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. No changes at this time.
Comment 179•1 month ago
|
||
(In reply to Microsoft PKI Services from comment #177)
Update to Comment 171
Response to Comment 165 - Bkerley
Can we get a week-by-week breakdown table of certificates that have been revoked this week, expired this week, and still live? I'm trying to reconstruct the revocation history and as soon as June 2025 the numbers become either erroneous or at least difficult to follow.
Thank you for requesting additional clarity on our weekly revocation telemetry. We understand the importance of transparent, consistent reporting.
We will provide a comprehensive week-by-week breakdown covering the entire revocation period from May 28, 2025 through present, showing the number of:
• Certificates revoked each week
• Certificates expired each week
• Certificates remaining active at the end of each week
This breakdown will be included in next week’s update. We are finalizing the table to ensure consistent presentation across the entire reporting period.
We would like to provide an update to this request:
We've encountered a discrepancy between our internal telemetry and crt.sh, which impacts our ability to construct an accurate week-by-week breakdown. We will investigate and address the discrepancy and will post the week-by-week breakdown by March 13th.
Could you elaborate on this discrepancy that will take 3 weeks to investigate and address...?
To be clear the breakdown requested was for work already done in this incident and every figure should be at-hand and require no significant work. It is due to the inaccuracies and lack of clear communication to date, previously highlighted, that a third-party has to request such simple information.
| Assignee | ||
Comment 180•1 month ago
|
||
Revocation Delay Status Update
-
Total certificates revoked (planned to date):
- 15,201,377 (15,758,644)
-
Remaining active certificates (total affected):
- 7,275 (72,070,777)
-
Total certificates expired and not revoked (to date):
- 56,862,125 (72,070,777)
-
Estimate for remaining revocations:
- March 6th, 2026
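Comparing this update with the previous Revocation Delay Status Update (Comment 168) shows how each count moved in the interval. A minimal sketch using only the figures from the two updates (the dictionary keys are illustrative labels):

```python
TOTAL_AFFECTED = 72_070_777

# Figures from Comment 168 and from this update.
previous = {"revoked": 15_201_377, "active": 7_846, "expired_unrevoked": 56_861_554}
current = {"revoked": 15_201_377, "active": 7_275, "expired_unrevoked": 56_862_125}

# Both snapshots should account for the full affected population.
for name, snapshot in (("previous", previous), ("current", current)):
    print(name, "sums to total:", sum(snapshot.values()) == TOTAL_AFFECTED)

# Change in each bucket between the two updates.
delta = {key: current[key] - previous[key] for key in current}
print(delta)
```

Both snapshots sum to the reported total. Between them the revoked count is unchanged, and the 571-certificate drop in active certificates equals the increase in certificates that expired without revocation.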
| Assignee | ||
Comment 181•1 month ago
|
||
Response to Comment 176 - Wayne
(In reply to Microsoft PKI Services from comment #175)
Q3: Given the intention is to let every certificate expire rather than do your due diligence, I don't see what potential solutions could exist given the G1s were going to be retired. Could you elaborate on what feasible solutions are relevant to these expiring certificates?
Could you clarify what you're asking in this question? We want to ensure we provide an accurate and complete response.
I was under the impression that the G1 ICAs were to be retired and the new ICAs were to be used going forward. Therefore any impacted cert relevant to this incident will expire by March 6th. Comment 61 seems to align with my memory of events:
Question 3
"(Q3) Can Microsoft PKI Services directly confirm that in the absence of the above described root-level cross-certificate, all subsequently created ICAs will have the same risk as those affected by this incident? Said differently, can Microsoft PKI Services acknowledge that the immediate action to stand-up a fleet of cross-certified “warm stand-by” CAs may not reliably meet the intended goal if for some reason it’s later identified that those CAs are flawed in some way?"
The major limiting factor in not being able to revoke in a timely manner was lack of CRL partitioning on the existing CAs. The Warm Stand-bys are planned to have CRL partitioning, so they will not suffer from the same issues.
Further, our plan is to stop the use of cross-signed ICAs and move to issuance from ICAs off the newly cross-signed G2 root. Once the ICAs from this new root are available for enrollment, in case we discover issues in the future with the current or the newly cross-signed (warm stand-by) ICAs from the G1 root, we will accelerate migration of the workload to the ICAs from the cross-signed G2 root.
Question 4
"(Q4) Can Microsoft PKI Services explain why cross-certifying the existing, in-use Microsoft roots was not considered a simpler and more robust solution than continuing to cross-sign leaf-issuing intermediates?"
Even prior to this incident, our existing plan was to deprecate issuance from the cross signed G1 ICAs (since those are expiring in August 2026) and replace them with the ICAs off the G2 CA. The plan for cross-signing of the G2 CA was already in flight and we chose to rely on that plan as the primary path. That said, the idea of cross signing the G1 Microsoft root has merit and we will consider it for future readiness.
The ICAs were due to expire by August 2026, and issuance was to be deprecated.
Therefore I'm not seeing how ICAs that should have already stopped being used for new issuance are part of the roadblock.
To clarify in my original question I was quoting Comment 170:
In January 2026, while drafting the ICA rotation plan, we discovered new cases where internal relying parties are using ICA pinning as a threat mitigation. We needed additional time to devise solutions for these cases to prevent disruptions as ICAs move to shorter lifetimes. At this time, we believe all major dependencies have been identified and will publish a comprehensive plan by March 6th.
Given the specific ICAs are no longer relevant and are due to expire, I'm unsure as to why this was a major issue. I'm also rather confused as to why none of these internal systems flagged this as an issue after their latest certificates were issued, but that's outside the scope of this incident.
Thank you for your clarification. You are correct that our original plan was to stop using the G1 ICAs and migrate all issuance to G2 ICAs by March 6th. As we executed the plan, we ran into issues which required extending usage of the G1 ICAs beyond the March 6th date.
As part of the original plan, we stood up the new G2 ICAs (Server Auth only), configured them for CRL partitioning, and started migrating subscribers to them. As noted in Comment 175, as we started moving subscribers to these G2 ICAs, we discovered two issues: internal relying party ICA pinning and larger than anticipated usage of certificates for Client Auth. This discovery required us to change plans and extend dependence on G1 ICAs beyond the originally planned date of March 6th. Once that change in plan occurred, we prioritized partitioning of the G1 ICAs over the ICA rotation plan to address one of the root causes of this bug, which was the risk of CRL bloat.
Further, we would like to clarify that the last of the impacted certificates does not expire on March 6th. The last certificate expires on April 15th.
| Assignee | ||
Comment 182•29 days ago
|
||
Response to Comment 179 - Wayne
We would like to provide an update to this request:
We've encountered a discrepancy between our internal telemetry and crt.sh, which impacts our ability to construct an accurate week-by-week breakdown. We will investigate and address the discrepancy and will post the week-by-week breakdown by March 13th.
Could you elaborate on this discrepancy that will take 3 weeks to investigate and address...?
To be clear the breakdown requested was for work already done in this incident and every figure should be at-hand and require no significant work. It is due to the inaccuracies and lack of clear communication to date, previously highlighted, that a third-party has to request such simple information.
We have been producing the weekly revocation updates using a snapshot of the crt.sh metadata impacted by this incident. That snapshot was deleted once the December revocation batch was completed, as we did not believe we would need that dataset anymore. At that time, we took a smaller snapshot of the remaining certificates, which served as the source for our weekly status tracking since then.
To provide the weekly breakdown requested in Comment 165, we now need to recreate the snapshot of the full population of impacted certificates from crt.sh. We are using crt.sh as the authoritative data source since it is the publicly available source that can be cross-checked against. This repopulation effort is taking time due to API throttling and the volume of certificates associated with this incident.
Comment 183•29 days ago
(In reply to Microsoft PKI Services from comment #182)
We have been producing the weekly revocation updates using a snapshot of the crt.sh metadata impacted by this incident. That snapshot was deleted once the December revocation batch was completed, as we did not believe we would need that dataset anymore. At that time, we took a smaller snapshot of the remaining certificates, which served as the source for our weekly status tracking since then.
To provide the weekly breakdown requested in Comment 165, we now need to recreate the snapshot of the full population of impacted certificates from crt.sh. We are using crt.sh as the authoritative data source since it is the publicly available source that can be cross-checked against. This repopulation effort is taking time due to API throttling and the volume of certificates associated with this incident.
Could you explain in detail what 'internal telemetry' exists with regards to handling this incident?
Some foundational issues with this answer so far:
- There was no 'discrepancy', just a lack of any data.
- From a methodological standpoint crt.sh does not have full coverage of all of your subscriber certificates - it's a well-intended best-capture off of CT logs and observed certificates in the wild
- You should have internal tooling to manage your own certificates, the authoritative source should be the CA itself.
- Usage of a third-party raises serious concerns on the reliability and trustworthiness of 'internal telemetry', and why it was originally used to 'construct' a breakdown.
- This data should be held for audit purposes, ad-hoc estimations from third-party services are not good practice
- All of this data should have been already checked and double checked for both revocation purposes, and for this incident report
- I highly doubt that crt.sh was intended for per-cert analysis of 77m certificates in this fashion, especially commercially, especially every week.
- I also extremely doubt that their infrastructure was built for this and they have been DDoSed by a CA for approximately a year
This does confirm, and cast a different light on, prior claims of estimations and planned revocation. There does not seem to be a single source of truth on Microsoft PKI's side for per-certificate revocation, expiration, and per-week 'plans' at all. None of the previous weekly reports can actually be trusted, and the currently provided impacted certificate figures are at best vague estimations with no fixed basis in reality.
In what world is this considered compliant and expected for a CA operating in the 21st century?
On the other side:
Could we please have Sectigo give their viewpoint on crt.sh's usage in this manner? What is the rough financial cost of a CA doing this, and given the rate limits in place if you were being compliant from a single endpoint how long would this take to check all of the impacted certificates? Are there any other unforeseen issues that you would be aware of if a CA were to attempt to use this method in practice?
I have done revocation analysis in the past using crt.sh, so have some idea of the practical limits. However let's just ask the people who run the services rather than making pointless estimations in what should be a factual, evidenced report.
Comment 184•28 days ago
Hi Microsoft PKI team,
I'm interested in the organisational issues that resulted in the gaps of process that led to this delrev incident still continuing through March with unrevoked certificates, and how we can benefit as a PKI ecosystem from the situation to improve processes.
I notice that the action items produced thus far focus mainly on the initial issue preventing the immediate revocation of the misissued certificates, and was hoping to flush out some additional improvements from issues that have come to light in the ensuing >9 months of delrev.
We have been producing the weekly revocation updates using a snapshot of the crt.sh metadata impacted by this incident. That snapshot was deleted once the December revocation batch was completed, as we did not believe we would need that dataset anymore. At that time, we took a smaller snapshot of the remaining certificates, which served as the source for our weekly status tracking since then.
Q1: In Comment 182 you mention that crt.sh data was deleted because "we did not believe we would need that dataset anymore". Can you elaborate on how that belief was formed in light of the WebTrust auditing requirements around data record retention, and what process gaps existed such that data being used as a source of truth in an ongoing incident was deleted? What remediation steps are being taken to add sufficient controls around the deletion of data that is involved in incidents? I'm especially interested to see lessons that other CAs can benefit from here.
Q2: Additionally, I'd like some more detail around the usage of crt.sh data as primary source for this incident as mentioned. What social and technical flaws have you identified that prevented you from generating the list of affected certificates from your own audit records (especially once the crt.sh data snapshot disappeared), and what remediation steps is Microsoft PKI taking to address it? As a relying party with a history in incident response, I know how easy it can be to develop a sort of "tunnel vision" in the immediate aftermath of an incident, but I'm curious how that lasted this long. What are some lessons the community can take from this?
| Assignee | ||
Comment 185•27 days ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. Please note that we have completed the work for Action Item #7.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| #1 Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-06 | In Progress |
| #2 Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| #3 Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| #4 Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| #5 Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| #6 Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Prevent | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| #7 Eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from ICAs with partitioned CRLs (G1 or G2), visible in CT logs. Public can verify through CCADB hierarchy updates showing CRL partitioning implementation on active ICAs and issuance patterns in CT logs. | 2026-02-28 | Complete |
| #8 Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2026-03-06 | In Progress |
| Assignee | ||
Comment 186•27 days ago
Revocation Delay Status Update
- Total certificates revoked (planned to date): 15,201,377 (15,758,644)
- Remaining active certificates (total affected): 6,798 (72,070,777)
- Total certificates expired and not revoked (to date): 56,862,602 (72,070,777)
- Estimate for remaining revocations: March 6th, 2026
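The figures in this update can be cross-checked arithmetically. The following is a minimal sketch, assuming every affected certificate falls into exactly one of the three reported categories (revoked, still active, or expired without revocation); the variable names are illustrative and not taken from any Microsoft tooling.

```python
# Figures as reported in this status update.
revoked = 15_201_377
remaining_active = 6_798
expired_not_revoked = 56_862_602
total_affected = 72_070_777

# Assumption: the three categories are mutually exclusive and exhaustive.
accounted = revoked + remaining_active + expired_not_revoked

print("accounted for:", accounted)
print("matches total affected:", accounted == total_affected)  # True
print(f"share revoked: {revoked / total_affected:.1%}")
```

A match only shows that the reported numbers are internally consistent; it does not show that they agree with CT logs or with the CA's own records.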
Comment 187•27 days ago
(In reply to Microsoft PKI Services from comment #186)
Revocation Delay Status Update
Total certificates revoked (planned to date):
- 15,201,377 (15,758,644)
Remaining active certificates (total affected):
- 6,798 (72,070,777)
Total certificates expired and not revoked (to date):
- 56,862,602 (72,070,777)
Estimate for remaining revocations:
- March 6th, 2026
Given recent statements how do you know for certain these figures are true and accurate, and not a mistake that the third-party crt.sh is making?
| Assignee | ||
Comment 188•24 days ago
Response to Comment 183 - Wayne
Could you explain in detail what 'internal telemetry' exists with regards to handling this incident?
Some foundational issues with this answer so far:
- There was no 'discrepancy', just a lack of any data.
- From a methodological standpoint crt.sh does not have full coverage of all of your subscriber certificates - it's a well-intended best-capture off of CT logs and observed certificates in the wild
- You should have internal tooling to manage your own certificates, the authoritative source should be the CA itself.
- Usage of a third-party raises serious concerns on the reliability and trustworthiness of 'internal telemetry', and why it was originally used to 'construct' a breakdown.
- This data should be held for audit purposes, ad-hoc estimations from third-party services are not good practice
- All of this data should have been already checked and double checked for both revocation purposes, and for this incident report
- I highly doubt that crt.sh was intended for per-cert analysis of 77m certificates in this fashion, especially commercially, especially every week.
- I also extremely doubt that their infrastructure was built for this and they have been DDoSed by a CA for approximately a year
This does confirm, and cast a different light on, prior claims of estimations and planned revocation. There does not seem to be a single source of truth on Microsoft PKI's side for per-certificate revocation, expiration, and per-week 'plans' at all. None of the previous weekly reports can actually be trusted, and the currently provided impacted certificate figures are at best vague estimations with no fixed basis in reality.
In what world is this considered compliant and expected for a CA operating in the 21st century?
Microsoft PKI Services maintains authoritative internal certificate inventory systems that track all issued certificates, their status, and lifecycle events in accordance with BR 5.4.1. We use these systems as the source of truth for our CA operations.
We used crt.sh to cross-check our internal data against publicly verifiable reference points. Our process involved performing an initial query of the affected certificate population at the start of this incident, which we used as our baseline dataset for validation purposes.
To clarify the issue raised in Comment 165: we deleted our cross-check validation dataset, believing it would no longer be needed. This deletion prevented us from efficiently cross-checking our internal data against the public record when the weekly breakdown was subsequently requested.
Comment 189•23 days ago
(In reply to Microsoft PKI Services from comment #186)
Microsoft PKI Services maintains authoritative internal certificate inventory systems that track all issued certificates, their status, and lifecycle events in accordance with BR 5.4.1. We use these systems as the source of truth for our CA operations.
If it is the source of truth then it should not take 3 weeks to produce a report of what has occurred in this incident from that data. That you are checking a third-party service implies a lack of trust in your own data.
We used crt.sh to cross-check our internal data, against publicly verifiable reference points. Our process involved performing an initial query of the affected certificate population, at the start of this incident, which we used as our baseline dataset for validation purposes.
If it was only at the initial query - then there's no revocation data to check, and expiration data should be identical to your own dataset. The usage of crt.sh in this manner would only be beneficial if your records were unreliable, that you mistrusted your own revocation (CRL/OCSP) statuses, or both. I can't see a good faith interpretation that speaks well to your compliance here.
To clarify the issue raised in Comment 165: we deleted our cross-check validation dataset, believing it would no longer be needed. This deletion prevented us from efficiently cross-checking our internal data against the public record when the weekly breakdown was subsequently requested.
A cross-reference dataset still existing in December very strongly implies that this was not a one-off check against crt.sh but part of a regular workflow. As part of your WebTrust audit requirements these should never be deleted anyway, even if we ignore that this is an ongoing incident crossing multiple audit periods.
That the dataset was deleted in December when further planned revocations existed also speaks volumes. This is not a particularly complex problem, and we're getting unreliable answers that contradict your previous statements.
| Assignee | ||
Comment 190•23 days ago
Response to Comment 184 - Macy
Hi Microsoft PKI team,
I'm interested in the organisational issues that resulted in the gaps of process that led to this delrev incident still continuing through March with unrevoked certificates, and how we can benefit as a PKI ecosystem from the situation to improve processes.
I notice that the action items produced thus far focus mainly on the initial issue preventing the immediate revocation of the misissued certificates, and was hoping to flush out some additional improvements from issues that have come to light in the ensuing >9 months of delrev.
We have been producing the weekly revocation updates using a snapshot of the crt.sh metadata impacted by this incident. That snapshot was deleted once the December revocation batch was completed, as we did not believe we would need that dataset anymore. At that time, we took a smaller snapshot of the remaining certificates, which served as the source for our weekly status tracking since then.
Q1: In Comment 182 you mention that crt.sh data was deleted because "we did not believe we would need that dataset anymore". Can you elaborate on how that belief was formed in light of the WebTrust auditing requirements around data record retention, and what process gaps existed such that data being used as a source of truth in an ongoing incident was deleted? What remediation steps are being taken to add sufficient controls around the deletion of data that is involved in incidents? I'm especially interested to see lessons that other CAs can benefit from here.
Thank you for the questions. To clarify a few points, as detailed in Comment 188, the deleted data was our crt.sh cross-check validation dataset, not our authoritative internal certificate inventory or audit logs, which remain fully intact and compliant with BR 5.4.1 retention requirements.
Going forward we will retain all relevant data until formal incident closure. A postmortem review will be included in the closure report summary to identify additional process improvements.
Q2: Additionally, I'd like some more detail around the usage of crt.sh data as primary source for this incident as mentioned. What social and technical flaws have you identified that prevented you from generating the list of affected certificates from your own audit records (especially once the crt.sh data snapshot disappeared), and what remediation steps is Microsoft PKI taking to address it? As a relying party with a history in incident response, I know how easy it can be to develop a sort of "tunnel vision" in the immediate aftermath of an incident, but I'm curious how that lasted this long. What are some lessons the community can take from this?
To clarify, crt.sh was not our primary source. As stated above, our authoritative internal systems are the source of truth.
Comment 191•23 days ago
(In reply to Microsoft PKI Services from comment #190)
Q1: In Comment 182 you mention that crt.sh data was deleted because "we did not believe we would need that dataset anymore". Can you elaborate on how that belief was formed in light of the WebTrust auditing requirements around data record retention, and what process gaps existed such that data being used as a source of truth in an ongoing incident was deleted? What remediation steps are being taken to add sufficient controls around the deletion of data that is involved in incidents? I'm especially interested to see lessons that other CAs can benefit from here.
Thank you for the questions. To clarify a few points, as detailed in Comment 188, the deleted data was our crt.sh cross-check validation dataset, not our authoritative internal certificate inventory or audit logs, which remain fully intact and compliant with BR 5.4.1 retention requirements.
That question was with regard to the WebTrust auditing requirements; however, if we continue with this misdirection and read the BRs a little bit more:
5.5.2 Retention period for archive
Archived audit logs (as set forth in Section 5.5.1) SHALL be retained for a period of at least two (2) years from their record creation timestamp, or as long as they are required to be retained per Section 5.4.3, whichever is longer.
Additionally, the CA and each Delegated Third Party SHALL retain, for at least two (2) years:
1. All archived documentation related to the security of Certificate Systems, Certificate Management Systems, Root CA Systems and Delegated Third Party Systems (as set forth in Section 5.5.1); and
2. All archived documentation relating to the verification, issuance, and revocation of certificate requests and Certificates (as set forth in Section 5.5.1) after the later occurrence of:
   1. such records and documentation were last relied upon in the verification, issuance, or revocation of certificate requests and Certificates; or
   2. the expiration of the Subscriber Certificates relying upon such records and documentation.
Note: While these Requirements set the minimum retention period, the CA MAY choose a greater value as more appropriate in order to be able to investigate possible security or other types of incidents that will require retrospection and examination of past records archived.
Perhaps there is a way to read that and be compliant by your standards. That certainly reads to me that any third-party sources compiled to check your own verification of revocation/expiration must be retained for audit purposes. I'd go a step further and suggest that it should cover any notes during an incident for an auditor to review.
...Also strictly speaking this incident should be operating under BR 2.1.4 published 1 March 2025 as it covers the incident start date. A check of the relevant CPS for your CA also has a lackluster 5.5.2 section:
5.5.2 Retention Period for Archive
CA SHALL retain all documentation relating to a Certificate’s activities for a period of at least two (2) years after the Certificate ceases to be valid.
That's the entirety of your CA's interpretation of the baseline requirements for retention periods. To save a future WebTrust auditor trouble: Principle 7 - Criteria 7.4.
Going forward we will retain all relevant data until formal incident closure. A postmortem review will be included in the closure report summary to identify additional process improvements.
How can you have a closure report that contains a postmortem review with further action items to be complete? Would that be held in a separate incident?
Q2: Additionally, I'd like some more detail around the usage of crt.sh data as primary source for this incident as mentioned. What social and technical flaws have you identified that prevented you from generating the list of affected certificates from your own audit records (especially once the crt.sh data snapshot disappeared), and what remediation steps is Microsoft PKI taking to address it? As a relying party with a history in incident response, I know how easy it can be to develop a sort of "tunnel vision" in the immediate aftermath of an incident, but I'm curious how that lasted this long. What are some lessons the community can take from this?
To clarify, crt.sh was not our primary source. As stated above, our authoritative internal systems are the source of truth.
Even as a secondary source the question deserves some answers.
Comment 192•23 days ago
(In reply to Microsoft PKI Services from comment #190)
Q1: In Comment 182 you mention that crt.sh data was deleted because "we did not believe we would need that dataset anymore". Can you elaborate on how that belief was formed in light of the WebTrust auditing requirements around data record retention, and what process gaps existed such that data being used as a source of truth in an ongoing incident was deleted? What remediation steps are being taken to add sufficient controls around the deletion of data that is involved in incidents? I'm especially interested to see lessons that other CAs can benefit from here.
Thank you for the questions. To clarify a few points, as detailed in Comment 188, the deleted data was our crt.sh cross-check validation dataset, not our authoritative internal certificate inventory or audit logs, which remain fully intact and compliant with BR 5.4.1 retention requirements.
Going forward we will retain all relevant data until formal incident closure. A postmortem review will be included in the closure report summary to identify additional process improvements.
Hi Microsoft PKI Services, thank you for clarifying that the data deletion in question was of a "cross-check validation dataset". Can you answer my questions re: its deletion? What process gaps, how the belief was formed, what controls are being added so that all artifacts created as part of incident response are retained?
The point of these incident tickets is to work out process improvements in public for the benefit of the ecosystem, and news of the deletion has definitely raised some eyebrows among relying parties. I'd really like some details on the how of things because when we're left to fill in the blanks, none of it looks particularly good.
Q2: Additionally, I'd like some more detail around the usage of crt.sh data as primary source for this incident as mentioned. What social and technical flaws have you identified that prevented you from generating the list of affected certificates from your own audit records (especially once the crt.sh data snapshot disappeared), and what remediation steps is Microsoft PKI taking to address it? As a relying party with a history in incident response, I know how easy it can be to develop a sort of "tunnel vision" in the immediate aftermath of an incident, but I'm curious how that lasted this long. What are some lessons the community can take from this?
To clarify, crt.sh was not our primary source. As stated above, our authoritative internal systems are the source of truth.
Can you provide the complete list of affected certificates? I looked around the bugs for this incident and haven't found the list for the misissuance (#1962829) or the delayed revocation (this bug), just the smaller list in the similar-but-different #1962830. I know it's a lot of certs, but it would be helpful for third parties looking to analyse the situation (and I believe it is strongly encouraged in the CCADB incident report guidelines).
| Assignee | ||
Comment 193•22 days ago
Response to Comment 187 - Wayne
Given recent statements how do you know for certain these figures are true and accurate, and not a mistake that the third-party crt.sh is making?
As mentioned previously, crt.sh was used as an external cross-reference. However, to your question about the accuracy of the weekly breakdown: in our work to restate the full weekly breakdown since the start of the incident, we have found inaccuracies in the data that was originally reported. We are investigating the root cause of these inaccuracies, and we will have the restated breakdown, root cause analysis, and, if needed, any new repair items by March 13th.
Comment 194•21 days ago
(In reply to Wayne from comment #183)
...
- From a methodological standpoint crt.sh does not have full coverage of all of your subscriber certificates - it's a well-intended best-capture off of CT logs and observed certificates in the wild
Good summary.
crt.sh aggregates CT log entries. However, all of the root store policies still permit subscriber certificates to be issued without any CT logging.
- You should have internal tooling to manage your own certificates, the authoritative source should be the CA itself.
+100
...
- I highly doubt that crt.sh was intended for per-cert analysis of 77m certificates in this fashion, especially commercially, especially every week.
- I also extremely doubt that their infrastructure was built for this and they have been DDoSed by a CA for approximately a year
...
Could we please have Sectigo give their viewpoint on crt.sh's usage in this manner?
My philosophy has always been that everyone is welcome to fetch all of the data in crt.sh...at least in theory. In practice, there are obviously practical limitations, but we've never been able to formally define what crt.sh's infrastructure is or isn't "built for".
What is the rough financial cost of a CA doing this
I have absolutely no idea how to go about measuring that.
and given the rate limits in place if you were being compliant from a single endpoint how long would this take to check all of the impacted certificates?
That's probably impossible to measure without actually performing the analysis.
Key tip though, for anyone who doesn't already know: For bulk analysis, scraping https://crt.sh/ webpages is not the best approach, not least because the rate-limiting will be applied per-certificate. Instead, directly access a read-only replica of the crt.sh database and write your own SQL queries. Wherever possible, write SQL queries that operate over limited data-sets, because long-running queries are likely to get terminated early.
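To illustrate the "limited data-sets" advice, here is a minimal sketch of splitting a large ID range into small windows so that each query stays short. The helper name, the query template, and the table and column names in it are hypothetical placeholders, not the confirmed crt.sh schema, and no database connection is made here.

```python
from typing import Iterator, Tuple


def id_windows(first_id: int, last_id: int, window: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) ID ranges that together cover first_id..last_id."""
    if window <= 0:
        raise ValueError("window must be positive")
    low = first_id
    while low <= last_id:
        high = min(low + window - 1, last_id)
        yield low, high
        low = high + 1


# Hypothetical parameterised query: table and column names are placeholders only.
QUERY_TEMPLATE = (
    "SELECT id FROM some_certificate_table "
    "WHERE issuer_id = %(issuer_id)s AND id BETWEEN %(low)s AND %(high)s"
)


def build_batches(issuer_id: int, first_id: int, last_id: int, window: int) -> list:
    """Return one parameter dict per window, ready to pass to a DB driver."""
    return [
        {"issuer_id": issuer_id, "low": low, "high": high}
        for low, high in id_windows(first_id, last_id, window)
    ]


if __name__ == "__main__":
    for params in build_batches(issuer_id=1, first_id=1, last_id=10, window=4):
        print(QUERY_TEMPLATE, params)
```

Each window can be run, stored, and retried independently, so an early-terminated query costs one small batch rather than the whole analysis.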
| Assignee | ||
Comment 195•20 days ago
Response to Comment 191 - Wayne
That question was in regards to the WebTrust auditing requirements, however if we continue with this misdirection and read the BRs a little bit more:
5.5.2 Retention period for archive
Archived audit logs (as set forth in Section 5.5.1) SHALL be retained for a period of at least two (2) years from their record creation timestamp, or as long as they are required to be retained per Section 5.4.3, whichever is longer.
Additionally, the CA and each Delegated Third Party SHALL retain, for at least two (2) years:
1. All archived documentation related to the security of Certificate Systems, Certificate Management Systems, Root CA Systems and Delegated Third Party Systems (as set forth in Section 5.5.1); and
2. All archived documentation relating to the verification, issuance, and revocation of certificate requests and Certificates (as set forth in Section 5.5.1) after the later occurrence of:
   1. such records and documentation were last relied upon in the verification, issuance, or revocation of certificate requests and Certificates; or
   2. the expiration of the Subscriber Certificates relying upon such records and documentation.
Note: While these Requirements set the minimum retention period, the CA MAY choose a greater value as more appropriate in order to be able to investigate possible security or other types of incidents that will require retrospection and examination of past records archived.
Based on our interpretation of the Baseline Requirements, we do not believe the deletion of the crt.sh metadata constitutes non-compliance, as the documentation subject to retention under BR 5.5.2 pertains to audit-relevant decision points, specifically, certificate identification for revocation, timing rationale, and the revocation actions themselves. The crt.sh dataset was not used for these decision points, and was only used as an external cross-reference. We will consult with our auditors to confirm our interpretation and implement any recommended actions.
How can you have a closure report that contains a postmortem review with further action items to be complete? Would that be held in a separate incident?
Postmortem-identified process improvements will be documented in the Commitment Summary section of the closure report. Per CCADB incident reporting guidelines these represent "ongoing commitments made in response to this incident beyond those described in the Action Items section" that "should be considered distinct from, but complementary to Action Items" and do not require the incident to remain open.
| Assignee | ||
Comment 196•20 days ago
Response to Comment 192 - Macy
Hi Microsoft PKI Services, thank you for clarifying that the data deletion in question was of a "cross-check validation dataset". Can you answer my questions re: its deletion? What process gaps, how the belief was formed, what controls are being added so that all artifacts created as part of incident response are retained?
We are reviewing our internal processes to clarify what is considered “relevant data” for incident management, and the handling of information in these types of incidents. As part of our incident postmortem, we are working with our auditors to verify and obtain their recommendations on information retention and will include any findings as part of ongoing commitments in our Closure Report.
Can you provide the complete list of affected certificates? I looked around the bugs for this incident and haven't found the list for the misissuance (#1962829) or the delayed revocation (this bug), just the smaller list in the similar-but-different #1962830. I know it's a lot of certs, but it would be helpful for third parties looking to analyse the situation (and I believe it is strongly encouraged in the CCADB incident report guidelines).
Yes, we plan to include a full list of affected certificates along with our week-by-week breakdown, which will be provided on March 13th.
| Assignee | ||
Comment 197•20 days ago
Weekly Status Update
We are actively working through the full set of action items outlined in the incident report. On March 6th we completed the third batch of revocations under Action Item #1, revoking 6,783 certificates. During our final analysis of certificates to conclude the revocations repair item, we found an error in our query, which led to the discovery of an additional 602 certificates. Due to our continued telemetry/automation gaps, we will be revoking these certificates by 3/23.
In relation to action item #8, please see our high-level ICA rotation plan below. That action item has been marked as complete.
ICA Rotation Plan:
Our goal is to shorten usage of ICAs to one year or less and ICA lifetimes to 3 years or less. We plan to make this change at our next ICA rotation, and it will become the norm thereafter. The next regular ICA rotation is expected to begin at least a year prior to the expiration of our current G2 ICAs in June 2029. To accomplish this, we will employ a layered lifecycle model for the ICAs that maintains multiple generations of ICAs, ensuring continuity of service, reducing reliance on any single ICA, and providing resilience during both planned transitions and security-driven events. To make these regular ICA rotations repeatable, in this intervening period we will make investments in solving residual pinning among the relying parties, and in reducing human involvement for standing up and rolling out new ICAs.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| #1 Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-23 | In Progress |
| #2 Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| #3 Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| #4 Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| #5 Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| #6 Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Prevent | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| #7 Eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from ICAs with partitioned CRLs (G1 or G2), visible in CT logs. Public can verify through CCADB hierarchy updates showing CRL partitioning implementation on active ICAs and issuance patterns in CT logs. | 2026-02-28 | Complete |
| #8 Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2026-03-06 | Complete |
| Assignee | ||
Comment 198•20 days ago
Revocation Delay Status Update
-
Total certificates revoked:
- 6,783
-
Remaining active certificates:
- 602
-
Total certificates expired and not revoked:
- 676
-
Estimate for remaining revocations:
- March 23rd, 2026
This week's report reflects only certificates revoked, expired, and remaining active for this week. This is a change from our previous methodology of reporting aggregated totals. Given the data inaccuracies disclosed in Comment 193, we determined it was not appropriate to continue reporting on figures we know to be incorrect. A full corrected week-by-week breakdown will be published next week. Additionally, as disclosed in Comment 197, an additional 602 certificates were identified as requiring revocation; these are reflected in this week's remaining active and estimated revocation figures.
Comment 199•20 days ago
Okay so... the week update for 2026-02-27 in Comment 186 stated:
- Remaining active certificates (total affected):
- 6,798 (72,070,777)
...
- Estimate for remaining revocations:
- March 6th, 2026
But your latest figures state 6,783 revoked - meaning if it's 'all' done, 15 expired. Instead we have 602 remaining active, and 676 expired (+74???) but not revoked? There's also the slight issue of your CA needing 16 days to handle 602 newly found certificates after 10 months of being involved in rectifying this incident. Trying to claim that you stated this in Comment 193 ignores that you restricted the 'inaccuracies' to the original report - and not every figure to date.
Q1: If we're off by ~9% (at least) this late in the lifecycle, how can you have any confidence in your CA having an authoritative certificate list for this incident? This isn't exactly a small figure...
It's all well and good saying that your previous... methodology was producing fabricated figures but we've not been told how this occurred. This is despite multiple months of pressing you on the lack of any accuracy to date and repeated assurances that you definitely checked your math. We're past blissful ignorance and small discrepancies at this stage.
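For anyone following along, the discrepancy described above can be tallied with a short sketch. The figures come from Comment 186 and Comment 198; reading the "~9%" as the 602 newly found certificates relative to the 6,798 previously reported as remaining is an assumption, and the variable names are illustrative only.

```python
# Figures as posted in the thread (Comment 186 and Comment 198).
remaining_feb_27 = 6_798      # "Remaining active certificates" on 2026-02-27
revoked = 6_783               # "Total certificates revoked" in Comment 198
still_active = 602            # "Remaining active certificates" in Comment 198
expired_not_revoked = 676     # "Total certificates expired and not revoked"

# If the 2026-02-27 figure were authoritative, these three should sum to it.
accounted_for = revoked + still_active + expired_not_revoked
surplus = accounted_for - remaining_feb_27

# Assumed reading of "~9%": newly found certificates vs. the prior remaining count.
newly_found_share = still_active / remaining_feb_27

print(f"accounted for: {accounted_for:,}")
print(f"surplus over reported remaining: {surplus:,}")
print(f"newly found share: {newly_found_share:.1%}")
```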
Q2: Can you just walk me through these very small and very basic numbers?
Yes, you'll give an 'updated' figure next week, but your weekly statement as-is lacks any confidence in any accuracy or truthfulness.
Q3: Can you detail what your previous methodology entailed? As in step-by-step from "I need to produce this report" to "Here it is on bugzilla".
Q4: How will this new methodology differ to provide accurate figures?
Q5: Why are you now confident in your abilities to create a functional new methodology in this relatively fast timeframe?
I'm not sure how telemetry/automation gaps mean you need 16 days to revoke 602 certs. It's not an insurmountable figure at all, even for manual revocation. The more I try to find any reason to your statements the more baffling it gets, but that is par for the course at this stage.
Q6: What exact problems are you running into for the remaining 602 certificates?
Q7: Can you give precise figures when you claim it's CRL bloat - for 602 certificates - in your statement?
Q8: What vital services are these certificates involved in that the BRs are once again considered incompatible with your capabilities?
Q9: Is there a more detailed plan for ICA rotation?
Q10: What roadblocks existed to completing that action item?
From our perspective it took your CA 10 months to consider a 3-year ICA timeline with 1 year setup, 1 year active, 1 year falloff setup. All in the complex detailed plan of a single paragraph.
My condolences in advance to the auditors involved in this, although I do wonder how none of these issues were ever identified before.
Also thank you Rob for providing some context on crt.sh's position here, while I'm aware of the SQL interface I'm doubtful it was involved here. For napkin math 77 million seconds is 891 days, so even 100 requests/second gets us to 8.9 days. Just something to consider...
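The napkin math above can be reproduced with a minimal sketch, assuming one request per certificate and the rounded figure of 77 million; the function name is hypothetical and nothing here reflects how any lookup was actually performed.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def days_to_fetch(total_requests: int, requests_per_second: float) -> float:
    """Days needed to issue total_requests at a steady request rate."""
    return total_requests / requests_per_second / SECONDS_PER_DAY


# One request per second, i.e. 77 million seconds of wall-clock time.
print(f"{days_to_fetch(77_000_000, 1):.0f} days at 1 request/second")
# One hundred requests per second.
print(f"{days_to_fetch(77_000_000, 100):.1f} days at 100 requests/second")
```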
Comment 200•15 days ago
(In reply to Microsoft PKI Services from comment #170)
Response to Comment 164 - Wayne
Could you answer the 3 questions provided in detail and with specificity as to when you became aware, the dependencies involved, and why other plans were not possible?
- When did we become aware of the dependencies, and what were they?
In January 2026, while drafting the ICA rotation plan, we discovered new cases where internal relying parties are using ICA pinning as a threat mitigation. We needed additional time to devise solutions for these cases to prevent disruptions as ICAs move to shorter lifetimes. At this time, we believe all major dependencies have been identified and will publish a comprehensive plan by March 6th.
- What roadblocks did we encounter that prevented completing the plan on the original timeline?
The main roadblock was the discovery of internal ICA pinning usage. We identified this dependency during our planning which must be incorporated into the rotation plan to ensure future ICA rotation can be executed without service disruption.
- Why was extending the timeline the best solution?
We needed additional time to account for pinning scenarios by internal relying parties to ensure the plan would be viable.
In regards to our response cadence, we thank you for the feedback. We acknowledge that in Comment 31 we said we hoped to shorten response times to ~3 days, and we have not consistently met that expectation. Going forward, we are adjusting our internal processes to respond within ~3 business days more consistently when feasible, rather than routinely using the end of the 7 day window.
Are these "internal relying parties" internal to MSPKI, or "internal" as in other parts of Microsoft?
There have been many incidents on Bugzilla about prioritizing avoiding business disruption over timely revocation in recent years.
What lessons have MSPKI learned from studying these incidents?
Specifically regarding these two would be very enlightening:
| Assignee | ||
Comment 201•15 days ago
Response to Comment 199 - Wayne
Q1: If we're off by ~9% (at least) this late in the lifecycle, how can you have any confidence in your CA having an authoritative certificate list for this incident? This isn't exactly a small figure...
It's all well and good saying that your previous... methodology was producing fabricated figures but we've not been told how this occurred. This is despite multiple months of pressing you on the lack of any accuracy to date and repeated assurances that you definitely checked your math. We're past blissful ignorance and small discrepancies at this stage.
We intend to provide a breakdown of our entire revocation history with a detailed analysis next week that will describe the gap that we encountered.
Q2: Can you just walk me through these very small and very basic numbers?
As mentioned in Comment 198, we did not report the aggregated numbers in last week's Revocation Update as we were aware of these discrepancies. We will provide further details on the data discrepancies in our week-by-week breakdown.
Q3: Can you detail what your previous methodology entailed? As in step-by-step from "I need to produce this report" to "Here it is on bugzilla".
Q4: How will this new methodology differ to provide accurate figures?
Our prior methodology was:
- Establish baseline totals for the incident population.
- Execute revocations in accordance with the revocation plans published in this bug.
- After all revocation batches for the week are completed, update the revoked / remaining active / expired (not revoked) counts based on the batch outcomes.
- Publish those figures in our weekly Revocation Delay Status Updates.
As previously stated, the week-by-week breakdown will explain the drivers of the discrepancies.
Q5: Why are you now confident in your abilities to create a functional new methodology in this relatively fast timeframe?
We identified issues and addressed them. This increases our confidence in the approach. The issues will be described in our week-by-week breakdown.
Q6: What exact problems are you running into for the remaining 602 certificates?
Due to gaps in telemetry and automation, it will take 2 weeks to obtain acknowledgement from our subscribers and complete certificate rotation and revocation for these 602 certificates.
Q7: Can you give precise figures when you claim it's CRL bloat - for 602 certificates - in your statement?
CRL Bloat is not a factor preventing us from revoking the 602 certificates at this time.
Q8: What vital services are these certificates involved in that the BRs are once again considered incompatible with your capabilities?
The decision is not driven by service criticality; it is driven by gaps in telemetry and automation associated with these certificates.
Q9: Is there a more detailed plan for ICA rotation?
We have internal plans; however, the level of detail provided is consistent with other public CA communications on intermediate rotations (e.g., Atlas TLS ICA Rotations :: GlobalSign Atlas Documentation, New Intermediate Certificates - Let's Encrypt).
Q10: What roadblocks existed to completing that action item?
From our perspective it took your CA 10 months to consider a 3-year ICA timeline with 1 year setup, 1 year active, 1 year falloff setup. All in the complex detailed plan of a single paragraph.
As mentioned in Comment 175, the discovery of additional dependencies (pinning, use of client auth) and the need to prioritize solutions for complexities encountered in executing the move to partitioned CRLs meant we needed additional time to complete the plan.
Comment 202•15 days ago
A month ago, bkerley asked what seems like a pretty basic question:
(In reply to bkerley from comment #165)
Can we get a week-by-week breakdown table of certificates that have been revoked this week, expired this week, and still live? I'm trying to reconstruct the revocation history and as soon as June 2025 the numbers become either erroneous or at least difficult to follow.
The fact that such a table hasn't been supplied thus far, and the descriptions of reliance on crt.sh for certificate statistics, lead to some questions that I hope are softballs, but fear in my heart are not:
- Does Microsoft PKI Services keep a record of all the certificates that it issues? If not, what are the bounds on the set that are recorded and retained? How were those bounds decided upon?
- Does Microsoft PKI Services keep a record of all the certificates that it revokes? If not, etc.
but finally and most importantly IMO - Does Microsoft PKI Services feel that their handling of this revocation event is an example that should be followed by other CAs that are seeking to be included in future MRSP updates?
For Mozilla:
- Is Microsoft's extremely lengthy and confusing revocation process consistent with Mozilla's expectations for root CAs who have the technical means to issue a certificate for literally any site on the internet?
- If not (please please), what remedies would you consider to be appropriate for the extended and (IMO) sloppy delrev incident that we've all been enduring alongside Microsoft for the last ten months?
- If a cross-signed CA were to apply for root inclusion after an incident like this, what would be considered to be sufficient evidence that they were in fact appropriate for inclusion?
Comment 203•14 days ago
The decision is not driven by service criticality; it is driven by gaps in telemetry and automation associated with these certificates.
Whose gaps in telemetry and automation? Yours or the subscribers'? Subscribers lacking capabilities is not a valid reason to delay revocation. If the gaps are within MSPKI, please explain in more detail so that the community can understand.
CRL Bloat is not a factor preventing us from revoking the 602 certificates at this time.
So is this a separate incident or a new root cause? Because CRL bloat is the only root cause listed.
Comment 204•13 days ago
(In reply to Microsoft PKI Services from comment #181)
Further, we would like to clarify that the last of the impacted certificates does not expire on March 6th. The last certificate expires on April 15th.
With the imminent release of the remaining certificates breakdown dropping next week, aka today, how confident are Microsoft that the last certificate expires April 15th...?
I'm doing a cursory glance at censys with a query I made last year (and updated to their new platform) and I'm not seeing quite what you're claiming.
Any information and transparency on the remaining "602" certificates will be refreshing to see.
| Assignee | ||
Comment 205•13 days ago
Week-by-Week Breakdown
We are providing the corrected week-by-week breakdown covering the full revocation period from May 28, 2025 through present. This table supersedes previously posted aggregate rollups that contained inaccuracies as noted in past comments. All figures below are derived from our internal certificate lifecycle systems of record.
Week definition: Weeks are reported as Friday–Thursday, and “Remaining Active (EOW)” reflects the population still active at the end of the stated week.
| # | Period Start | Period End | Remaining (Start of period) | Expired This Period | Cumulative Expired | Revoked This Period | Cumulative Revoked |
|---|---|---|---|---|---|---|---|
| 1 | 4/25/2025 | 5/1/2025 | 81,240,662 | 2,302,023 | 2,302,023 | - | - |
| 2 | 5/2/2025 | 5/8/2025 | 78,938,639 | 2,755,098 | 5,057,121 | - | - |
| 3 | 5/9/2025 | 5/15/2025 | 76,183,541 | 2,788,260 | 7,845,381 | - | - |
| 4 | 5/16/2025 | 5/22/2025 | 73,395,281 | 2,951,414 | 10,796,795 | - | - |
| 5 | 5/23/2025 | 5/29/2025 | 70,443,867 | 2,889,587 | 13,686,382 | 1,000 | 1,000 |
| 6 | 5/30/2025 | 6/5/2025 | 67,553,280 | 2,707,867 | 16,394,249 | 351,817 | 352,817 |
| 7 | 6/6/2025 | 6/12/2025 | 64,493,596 | 2,560,652 | 18,954,901 | 300,000 | 652,817 |
| 8 | 6/13/2025 | 6/19/2025 | 61,632,944 | 2,554,067 | 21,508,968 | 422,563 | 1,075,380 |
| 9 | 6/20/2025 | 6/26/2025 | 58,656,314 | 2,483,804 | 23,992,772 | 616,079 | 1,691,459 |
| 10 | 6/27/2025 | 7/3/2025 | 55,556,431 | 2,291,466 | 26,284,238 | 806,870 | 2,498,329 |
| 11 | 7/4/2025 | 7/10/2025 | 52,458,095 | 2,291,088 | 28,575,326 | 455,123 | 2,953,452 |
| 12 | 7/11/2025 | 7/17/2025 | 49,711,884 | 2,609,770 | 31,185,096 | 786,187 | 3,739,639 |
| 13 | 7/18/2025 | 7/24/2025 | 46,315,927 | 2,902,482 | 34,087,578 | 717,158 | 4,456,797 |
| 14 | 7/25/2025 | 7/31/2025 | 42,696,287 | 3,286,341 | 37,373,919 | 520,000 | 4,976,797 |
| 15 | 8/1/2025 | 8/7/2025 | 38,889,946 | 3,190,706 | 40,564,625 | 802,471 | 5,779,268 |
| 16 | 8/8/2025 | 8/14/2025 | 34,896,769 | 2,991,273 | 43,555,898 | 784,260 | 6,563,528 |
| 17 | 8/15/2025 | 8/21/2025 | 31,121,236 | 3,480,495 | 47,036,393 | 735,744 | 7,299,272 |
| 18 | 8/22/2025 | 8/28/2025 | 26,904,997 | 3,134,841 | 50,171,234 | 754,323 | 8,053,595 |
| 19 | 8/29/2025 | 9/4/2025 | 23,015,833 | 3,518,006 | 53,689,240 | 612,358 | 8,665,953 |
| 20 | 9/5/2025 | 9/11/2025 | 18,885,469 | 3,032,651 | 56,721,891 | 716,909 | 9,382,862 |
| 21 | 9/12/2025 | 9/18/2025 | 15,135,909 | 2,349,701 | 59,071,592 | 512,729 | 9,895,591 |
| 22 | 9/19/2025 | 9/25/2025 | 12,273,479 | 2,265,825 | 61,337,417 | 487,433 | 10,383,024 |
| 23 | 9/26/2025 | 10/2/2025 | 9,520,221 | 1,698,006 | 63,035,423 | 825,437 | 11,208,461 |
| 24 | 10/3/2025 | 10/9/2025 | 6,996,778 | 1,355,429 | 64,390,852 | 1,143,748 | 12,352,209 |
| 25 | 10/10/2025 | 10/16/2025 | 4,497,601 | 1,467,826 | 65,858,678 | 870,008 | 13,222,217 |
| 26 | 10/17/2025 | 10/23/2025 | 2,159,767 | 390,497 | 66,249,175 | 412,326 | 13,634,543 |
| 27 | 10/24/2025 | 10/30/2025 | 1,356,944 | 37,975 | 66,287,150 | 80,282 | 13,714,825 |
| 28 | 10/31/2025 | 11/6/2025 | 1,238,687 | 21,088 | 66,308,238 | - | 13,714,825 |
| 29 | 11/7/2025 | 11/13/2025 | 1,217,599 | 23,562 | 66,331,800 | 1,074,620 | 14,789,445 |
| 30 | 11/14/2025 | 11/20/2025 | 119,417 | 5,121 | 66,336,921 | 28,447 | 14,817,892 |
| 31 | 11/21/2025 | 11/27/2025 | 85,849 | 3,502 | 66,340,423 | - | 14,817,892 |
| 32 | 11/28/2025 | 12/4/2025 | 82,347 | 3,022 | 66,343,445 | - | 14,817,892 |
| 33 | 12/5/2025 | 12/11/2025 | 79,325 | 2,205 | 66,345,650 | 42,833 | 14,860,725 |
| 34 | 12/12/2025 | 12/18/2025 | 34,287 | 442 | 66,346,092 | 1,084 | 14,861,809 |
| 35 | 12/19/2025 | 12/25/2025 | 32,761 | 473 | 66,346,565 | - | 14,861,809 |
| 36 | 12/26/2025 | 1/1/2026 | 32,288 | 464 | 66,347,029 | - | 14,861,809 |
| 37 | 1/2/2026 | 1/8/2026 | 31,824 | 407 | 66,347,436 | 19,425 | 14,881,234 |
| 38 | 1/9/2026 | 1/15/2026 | 11,992 | 247 | 66,347,683 | - | 14,881,234 |
| 39 | 1/16/2026 | 1/22/2026 | 11,745 | 564 | 66,348,247 | - | 14,881,234 |
| 40 | 1/23/2026 | 1/29/2026 | 11,181 | 788 | 66,349,035 | - | 14,881,234 |
| 41 | 1/30/2026 | 2/5/2026 | 10,393 | 728 | 66,349,763 | - | 14,881,234 |
| 42 | 2/6/2026 | 2/12/2026 | 9,665 | 502 | 66,350,265 | - | 14,881,234 |
| 43 | 2/13/2026 | 2/19/2026 | 9,163 | 528 | 66,350,793 | - | 14,881,234 |
| 44 | 2/20/2026 | 2/26/2026 | 8,635 | 537 | 66,351,330 | - | 14,881,234 |
| 45 | 2/27/2026 | 3/5/2026 | 8,098 | 578 | 66,351,908 | - | 14,881,234 |
| 46 | 3/6/2026 | 3/12/2026 | 7,520 | 139 | 66,352,047 | 6,783 | 14,888,017 |
| 47 | 3/13/2026 | 3/19/2026 | 598 | 1 | 66,352,048 | - | 14,888,017 |
| 48 | 3/20/2026 | 3/26/2026 | 597 | 6 | 66,352,054 | - | 14,888,017 |
| 49 | 3/27/2026 | 4/2/2026 | 591 | 63 | 66,352,117 | - | 14,888,017 |
| 50 | 4/3/2026 | 4/9/2026 | 528 | 2 | 66,352,119 | - | 14,888,017 |
| 51 | 4/10/2026 | 4/16/2026 | 526 | 526 | 66,352,645 | - | 14,888,017 |
| 52 | 4/17/2026 | 4/23/2026 | - | - | 66,352,645 | - | 14,888,017 |
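Third parties who want to sanity-check the table above can do so with a minimal sketch like the following. It assumes that the "Remaining (Start of period)" value of each row equals the previous row's remaining value minus that row's expired and revoked counts; only a handful of rows are transcribed, and the names `ROWS`, `remaining_at_end`, and `check_consecutive` are hypothetical.

```python
# Row number -> (remaining at start, expired this period, revoked this period).
# A "-" in the table is transcribed as 0.
ROWS = {
    1: (81_240_662, 2_302_023, 0),
    2: (78_938_639, 2_755_098, 0),
    5: (70_443_867, 2_889_587, 1_000),
    6: (67_553_280, 2_707_867, 351_817),
    7: (64_493_596, 2_560_652, 300_000),
    46: (7_520, 139, 6_783),
    47: (598, 1, 0),
}

# Final cumulative totals from row 52 of the table.
FINAL_CUMULATIVE_EXPIRED = 66_352_645
FINAL_CUMULATIVE_REVOKED = 14_888_017


def remaining_at_end(row):
    """Population left after subtracting this period's expirations and revocations."""
    start, expired, revoked = row
    return start - expired - revoked


def check_consecutive(rows):
    """Return row numbers whose end-of-period count disagrees with the next row's start."""
    mismatches = []
    for number in sorted(rows):
        following = rows.get(number + 1)
        if following is not None and remaining_at_end(rows[number]) != following[0]:
            mismatches.append(number)
    return mismatches


print("mismatched rows:", check_consecutive(ROWS))
# Every certificate in the starting population should end up expired or revoked.
print(FINAL_CUMULATIVE_EXPIRED + FINAL_CUMULATIVE_REVOKED == ROWS[1][0])
```

Extending `ROWS` to all 52 rows would apply the same check to the full table.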
Root Cause Analysis - Weekly Reporting Data Inaccuracies:
We have provided weekly revocation‑delay progress updates throughout this incident, including how many certificates were revoked, expired, and remaining active per CCADB incident reporting guidelines. Following a request for a week‑by‑week table, we recompiled the data, covering the full revocation period from May 28, 2025, through present. During this recompilation, we identified errors in previously shared aggregated rollups. Two root causes contributed to these errors:
(1) Incorrect reported impacted population baseline due to duplicates
At the start of this incident, according to CCADB guidelines, we posted a list of impacted certificates. CCADB requires impacted certificates to be published as crt.sh URLs, which reference SHA256 thumbprints. Microsoft PKI Services (MPS) does not store SHA256 thumbprints and instead tracks certificates using serial numbers. Instead of generating crt.sh URLs by calculating SHA256 thumbprints from our internal dataset, we downloaded the list of impacted certificates directly from crt.sh. We used this downloaded dataset to create the post for the impact list and used the count of entries in the downloaded dataset as the baseline count of impacted certificates. However, the crt.sh downloaded dataset in some cases contained multiple rows for the same certificate, so the reported baseline count of impacted certificates was higher than the actual number of impacted certificates. This counting error cascaded through subsequent weekly updates. For the re-compilation of the week-by-week report, we used only MPS' internal datasets, and the totals are now accurate. Processes used to identify and revoke impacted certificates only used MPS' internal datasets and did not use the crt.sh downloaded dataset. The crt.sh downloaded dataset was used only for generating the post for the impact list and the baseline count for the weekly reports. In the future we will not use crt.sh data in this manner.
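As an illustration of the approach described above (computing SHA-256 thumbprints from an internal dataset rather than downloading rows from crt.sh), here is a minimal sketch. It assumes the DER-encoded bytes of each certificate are available; the function names are hypothetical, the `?sha256=` URL form is an assumption about how the links are built, and the sample inputs are placeholders rather than real certificates.

```python
import hashlib


def sha256_thumbprint(der_bytes: bytes) -> str:
    """SHA-256 over the DER encoding of a certificate, as uppercase hex."""
    return hashlib.sha256(der_bytes).hexdigest().upper()


def crtsh_url(der_bytes: bytes) -> str:
    """Build a crt.sh lookup URL for a certificate from its thumbprint."""
    return f"https://crt.sh/?sha256={sha256_thumbprint(der_bytes)}"


def unique_thumbprints(der_certificates) -> set:
    """Collapse repeated rows so each certificate is counted once."""
    return {sha256_thumbprint(der) for der in der_certificates}


# Placeholder byte strings standing in for DER certificates; the first is repeated.
dataset = [b"cert-one", b"cert-one", b"cert-two"]
print(len(dataset), "rows,", len(unique_thumbprints(dataset)), "unique certificates")
```

Counting the set rather than the rows gives a baseline that repeated entries cannot inflate.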
(2) Revocation counts were overstated by counting batch execution rows (duplicate thumbprints) instead of unique thumbprints
Our internal systems revoked certificates in batches to manage capacity. We found that there were batches that contained duplicates of the same thumbprint and reported these certificates as successfully revoked in each instance even though the certificate was only revoked once. We reported the total batch row count as the number of certificates revoked without taking into account the distinct set of certificate thumbprints, which overstated the weekly revoked figures. However, we can confirm despite the reported counts being higher than the count of certs revoked, we did revoke all the certificates we intended to revoke in the incident except for the 602 remaining per comment 198. We have updated our reporting tooling to ensure deduplication is applied when producing revocation counts going forward.
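The difference between counting batch execution rows and counting distinct thumbprints can be shown with a small sketch. The `(batch_id, thumbprint)` row shape and the function name are hypothetical and do not describe the actual reporting tooling.

```python
def revoked_counts(batch_rows):
    """Return (rows reported as revoked, distinct certificates revoked)."""
    row_count = len(batch_rows)
    unique_count = len({thumbprint for _batch_id, thumbprint in batch_rows})
    return row_count, unique_count


# Hypothetical batch log: thumbprint "AA" appears in two batches but is one certificate.
batch_rows = [
    ("batch-1", "AA"),
    ("batch-1", "BB"),
    ("batch-2", "AA"),
    ("batch-2", "CC"),
]

rows, unique = revoked_counts(batch_rows)
print(f"{rows} batch rows, {unique} distinct certificates revoked")
```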
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Update procedures to generate SHA256 thumbprints when we need to produce impacted list of certificates via the CCADB guidelines. | Prevent | Root Cause #1 | Impacted certificate lists published to Bugzilla will be generated from our internal datasets and will include internally calculated SHA‑256 certificate thumbprints, eliminating reliance on crt.sh metadata. | 2026-03-03 | Complete |
| Update procedures for counting revocations to use unique certificate counts rather than batch execution totals. | Prevent | Root Cause #2 | For each weekly revocation‑delay update, the reported “revoked” count is derived from, and equals, the count of distinct certificate thumbprints revoked for the reporting period. | 2026-03-03 | Complete |
The full list of impacted certificates as it pertains to this bug, based on our most recent investigations can be found here:
Batch 1
Batch 2
Batch 3
Batch 4
Batch 5
Batch 6
Batch 7
Batch 8
Batch 9
Batch 10
Batch 11
Batch 12
Batch 13
Batch 14
Batch 15
Batch 16
Batch 17
Batch 18
Batch 19
Batch 20
| Assignee | ||
Comment 206•13 days ago
Weekly Status Update
We are actively working on the remaining repair item. In addition, we have included the completed repair items from the week‑by‑week breakdown in one consolidated list.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| #1 Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-23 | In Progress |
| #2 Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| #3 Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| #4 Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| #5 Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| #6 Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Prevent | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| #7 Eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from ICAs with partitioned CRLs (G1 or G2), visible in CT logs. Public can verify through CCADB hierarchy updates showing CRL partitioning implementation on active ICAs and issuance patterns in CT logs. | 2026-02-28 | Complete |
| #8 Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2026-03-06 | Complete |
| Update procedures to generate SHA256 thumbprints when we need to produce impacted list of certificates via the CCADB guidelines. | Prevent | Root Cause #1 | Impacted certificate lists published to Bugzilla will be generated from our internal datasets and will include internally calculated SHA‑256 certificate thumbprints, eliminating reliance on crt.sh metadata. | 2026-03-03 | Complete |
| Update procedures for counting revocations to use unique certificate counts rather than batch execution totals. | Prevent | Root Cause #2 | For each weekly revocation‑delay update, the reported “revoked” count is derived from, and equals, the count of distinct certificate thumbprints revoked for the reporting period. | 2026-03-03 | Complete |
| Assignee | ||
Comment 207•13 days ago
Revocation Delay Status Update
-
Total certificates revoked (to date):
- 14,888,017
-
Remaining active certificates:
- 598
-
Total certificates expired and not revoked (to date):
- 66,352,048
-
Estimate for remaining revocations:
- March 23, 2026
*Please note that this list is based on our recompilation.
Comment 208•13 days ago
Others have commented on Comment 201, however I must also suggest you provide a methodology and not handwave this mess.
(In reply to Microsoft PKI Services from comment #205)
Week-by-Week Breakdown
We are providing the corrected week-by-week breakdown covering the full revocation period from May 28, 2025 through present. This table supersedes previously posted aggregate rollups that contained inaccuracies as noted in past comments. All figures below are derived from our internal certificate lifecycle systems of record.
Week definition: Weeks are reported as Friday–Thursday, and “Remaining Active (EOW)” reflects the population still active at the end of the stated week.
| # | Period Start | Period End | Remaining (Start of period) | Expired This Period | Cumulative Expired | Revoked This Period | Cumulative Revoked |
|---|---|---|---|---|---|---|---|
| 1 | 4/25/2025 | 5/1/2025 | 81,240,662 | 2,302,023 | 2,302,023 | - | - |
| 2 | 5/2/2025 | 5/8/2025 | 78,938,639 | 2,755,098 | 5,057,121 | - | - |
| 3 | 5/9/2025 | 5/15/2025 | 76,183,541 | 2,788,260 | 7,845,381 | - | - |
| 4 | 5/16/2025 | 5/22/2025 | 73,395,281 | 2,951,414 | 10,796,795 | - | - |
| 5 | 5/23/2025 | 5/29/2025 | 70,443,867 | 2,889,587 | 13,686,382 | 1,000 | 1,000 |
| 6 | 5/30/2025 | 6/5/2025 | 67,553,280 | 2,707,867 | 16,394,249 | 351,817 | 352,817 |
| 7 | 6/6/2025 | 6/12/2025 | 64,493,596 | 2,560,652 | 18,954,901 | 300,000 | 652,817 |
| 8 | 6/13/2025 | 6/19/2025 | 61,632,944 | 2,554,067 | 21,508,968 | 422,563 | 1,075,380 |
| 9 | 6/20/2025 | 6/26/2025 | 58,656,314 | 2,483,804 | 23,992,772 | 616,079 | 1,691,459 |
| 10 | 6/27/2025 | 7/3/2025 | 55,556,431 | 2,291,466 | 26,284,238 | 806,870 | 2,498,329 |
| 11 | 7/4/2025 | 7/10/2025 | 52,458,095 | 2,291,088 | 28,575,326 | 455,123 | 2,953,452 |
| 12 | 7/11/2025 | 7/17/2025 | 49,711,884 | 2,609,770 | 31,185,096 | 786,187 | 3,739,639 |
| 13 | 7/18/2025 | 7/24/2025 | 46,315,927 | 2,902,482 | 34,087,578 | 717,158 | 4,456,797 |
| 14 | 7/25/2025 | 7/31/2025 | 42,696,287 | 3,286,341 | 37,373,919 | 520,000 | 4,976,797 |
| 15 | 8/1/2025 | 8/7/2025 | 38,889,946 | 3,190,706 | 40,564,625 | 802,471 | 5,779,268 |
| 16 | 8/8/2025 | 8/14/2025 | 34,896,769 | 2,991,273 | 43,555,898 | 784,260 | 6,563,528 |
| 17 | 8/15/2025 | 8/21/2025 | 31,121,236 | 3,480,495 | 47,036,393 | 735,744 | 7,299,272 |
| 18 | 8/22/2025 | 8/28/2025 | 26,904,997 | 3,134,841 | 50,171,234 | 754,323 | 8,053,595 |
| 19 | 8/29/2025 | 9/4/2025 | 23,015,833 | 3,518,006 | 53,689,240 | 612,358 | 8,665,953 |
| 20 | 9/5/2025 | 9/11/2025 | 18,885,469 | 3,032,651 | 56,721,891 | 716,909 | 9,382,862 |
| 21 | 9/12/2025 | 9/18/2025 | 15,135,909 | 2,349,701 | 59,071,592 | 512,729 | 9,895,591 |
| 22 | 9/19/2025 | 9/25/2025 | 12,273,479 | 2,265,825 | 61,337,417 | 487,433 | 10,383,024 |
| 23 | 9/26/2025 | 10/2/2025 | 9,520,221 | 1,698,006 | 63,035,423 | 825,437 | 11,208,461 |
| 24 | 10/3/2025 | 10/9/2025 | 6,996,778 | 1,355,429 | 64,390,852 | 1,143,748 | 12,352,209 |
| 25 | 10/10/2025 | 10/16/2025 | 4,497,601 | 1,467,826 | 65,858,678 | 870,008 | 13,222,217 |
| 26 | 10/17/2025 | 10/23/2025 | 2,159,767 | 390,497 | 66,249,175 | 412,326 | 13,634,543 |
| 27 | 10/24/2025 | 10/30/2025 | 1,356,944 | 37,975 | 66,287,150 | 80,282 | 13,714,825 |
| 28 | 10/31/2025 | 11/6/2025 | 1,238,687 | 21,088 | 66,308,238 | - | 13,714,825 |
| 29 | 11/7/2025 | 11/13/2025 | 1,217,599 | 23,562 | 66,331,800 | 1,074,620 | 14,789,445 |
| 30 | 11/14/2025 | 11/20/2025 | 119,417 | 5,121 | 66,336,921 | 28,447 | 14,817,892 |
| 31 | 11/21/2025 | 11/27/2025 | 85,849 | 3,502 | 66,340,423 | - | 14,817,892 |
| 32 | 11/28/2025 | 12/4/2025 | 82,347 | 3,022 | 66,343,445 | - | 14,817,892 |
| 33 | 12/5/2025 | 12/11/2025 | 79,325 | 2,205 | 66,345,650 | 42,833 | 14,860,725 |
| 34 | 12/12/2025 | 12/18/2025 | 34,287 | 442 | 66,346,092 | 1,084 | 14,861,809 |
| 35 | 12/19/2025 | 12/25/2025 | 32,761 | 473 | 66,346,565 | - | 14,861,809 |
| 36 | 12/26/2025 | 1/1/2026 | 32,288 | 464 | 66,347,029 | - | 14,861,809 |
| 37 | 1/2/2026 | 1/8/2026 | 31,824 | 407 | 66,347,436 | 19,425 | 14,881,234 |
| 38 | 1/9/2026 | 1/15/2026 | 11,992 | 247 | 66,347,683 | - | 14,881,234 |
| 39 | 1/16/2026 | 1/22/2026 | 11,745 | 564 | 66,348,247 | - | 14,881,234 |
| 40 | 1/23/2026 | 1/29/2026 | 11,181 | 788 | 66,349,035 | - | 14,881,234 |
| 41 | 1/30/2026 | 2/5/2026 | 10,393 | 728 | 66,349,763 | - | 14,881,234 |
| 42 | 2/6/2026 | 2/12/2026 | 9,665 | 502 | 66,350,265 | - | 14,881,234 |
| 43 | 2/13/2026 | 2/19/2026 | 9,163 | 528 | 66,350,793 | - | 14,881,234 |
| 44 | 2/20/2026 | 2/26/2026 | 8,635 | 537 | 66,351,330 | - | 14,881,234 |
| 45 | 2/27/2026 | 3/5/2026 | 8,098 | 578 | 66,351,908 | - | 14,881,234 |
| 46 | 3/6/2026 | 3/12/2026 | 7,520 | 139 | 66,352,047 | 6,783 | 14,888,017 |
| 47 | 3/13/2026 | 3/19/2026 | 598 | 1 | 66,352,048 | - | 14,888,017 |
| 48 | 3/20/2026 | 3/26/2026 | 597 | 6 | 66,352,054 | - | 14,888,017 |
| 49 | 3/27/2026 | 4/2/2026 | 591 | 63 | 66,352,117 | - | 14,888,017 |
| 50 | 4/3/2026 | 4/9/2026 | 528 | 2 | 66,352,119 | - | 14,888,017 |
| 51 | 4/10/2026 | 4/16/2026 | 526 | 526 | 66,352,645 | - | 14,888,017 |
| 52 | 4/17/2026 | 4/23/2026 | - | - | 66,352,645 | - | 14,888,017 |
The starting figure for 'remaining' certificates is "81,240,662", but this doesn't tell us the actual impacted figures for the original incident, which now need to be corrected.
Q1: When can we expect an updated final incident report containing the figures for the full corpus and figures on how many were missed vs prior claims? The transparency is key.
And as alluded to in Comment 204 I'm having trouble finding some impacted certificates in certs_final_final_for_real_copy(2).csv. Here is but one example:
Serial: 33018b8a39a455abb952e0d7670000018b8a39
It's in the Azure Roots, the validity period matches, and it expires in a day so what gives. I did collate the 20 txt files, and would suggest zipping them for your newer incidents.
Q2: Can you please explain why, as a simple example, the above certificate was missed and how many more are missing?
Root Cause Analysis - Weekly Reporting Data Inaccuracies:
We have provided weekly revocation‑delay progress updates throughout this incident, including how many certificates were revoked, expired, and remaining active per CCADB incident reporting guidelines. Following a request for a week‑by‑week table, we recompiled the data, covering the full revocation period from May 28, 2025, through the present. During this recompilation, we identified errors in previously shared aggregated rollups. Two root causes contributed to these errors:
(1) Incorrect reported impacted population baseline due to duplicates
At the start of this incident, according to CCADB guidelines, we posted a list of impacted certificates. CCADB requires impacted certificates to be published as crt.sh URLs, which reference SHA256 thumbprints. Microsoft PKI Services (MPS) does not store SHA256 thumbprints and instead tracks certificates using serial numbers. Instead of generating crt.sh URLs by calculating SHA256 thumbprints from our internal dataset, we downloaded the list of impacted certificates directly from crt.sh. We used this downloaded dataset to create the post for the impact list and used the count of entries in the downloaded dataset as the baseline count of impacted certificates. However, the crt.sh downloaded dataset in some cases contained multiple rows for the same certificate, so the reported baseline count of impacted certificates was higher than the actual number of impacted certificates. This counting error cascaded through subsequent weekly updates. For the re-compilation of the week-by-week report, we used only MPS' internal datasets, and the totals are now accurate. Processes used to identify and revoke impacted certificates only used MPS' internal datasets and did not use the crt.sh downloaded dataset. The crt.sh downloaded dataset was used only for generating the post for the impact list and the baseline count for the weekly reports. In the future we will not use crt.sh data in this manner.
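For reference, a SHA-256 thumbprint is simply the SHA-256 digest of a certificate's DER encoding, so it can be derived from any dataset that retains the certificates themselves. A precertificate and its final certificate share a serial number but have different encodings, which is one way a thumbprint-keyed dataset can hold more than one row per serial number. A minimal sketch, assuming a hypothetical list of `(serial, der_bytes)` records (the helper names and sample bytes are placeholders, not real certificates or MPS tooling):

```python
import hashlib


def sha256_thumbprint(der_bytes: bytes) -> str:
    """SHA-256 digest of the DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def crtsh_url(der_bytes: bytes) -> str:
    """Build a crt.sh lookup URL from the thumbprint."""
    return "https://crt.sh/?sha256=" + sha256_thumbprint(der_bytes)


def count_thumbprints_and_serials(records):
    """Return (distinct thumbprints, distinct serial numbers)."""
    thumbprints = {sha256_thumbprint(der) for _, der in records}
    serials = {serial.lower() for serial, _ in records}
    return len(thumbprints), len(serials)


# Placeholder data: two different encodings under one serial (as with a
# precertificate and its final certificate), plus one unrelated entry.
records = [
    ("0a01", b"placeholder precertificate bytes"),
    ("0a01", b"placeholder final certificate bytes"),
    ("0b02", b"another placeholder certificate"),
]
print(count_thumbprints_and_serials(records))  # (3, 2)
```

Counting by thumbprint gives three entries here while counting by serial gives two, so a baseline has to state which of the two it is counting.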
Q3: Why on earth are you not storing SHA256 hashes for all of your certificates? Not only is it part of incident reporting, it's the intended method for relying parties to report impacted certificates.
Q4: If I were to report a compromised certificate to Microsoft PKI in February 2026 by identifying its SHA256 hash, would you then check crt.sh, or similar third-party service such as censys to figure out what serial number the certificate contained?
Q5: Can Microsoft PKI explain what a pre-certificate is?
Q6: Can Microsoft PKI explain why a pre-certificate and a final certificate should have the same serial number?
Q7: Did the personnel involved in revocation have any experience or understanding of certificate issuance?
I will not entertain the rest of that paragraph; needless to say, there have been endless warnings on the use of Microsoft PKI's math to date. This includes Microsoft PKI alleging they have fixed all the issues and that there is nothing to be worried about. (Comment 126, Comment 143).
(2) Revocation counts were overstated by counting batch execution rows (duplicate thumbprints) instead of unique thumbprints
Our internal systems revoked certificates in batches to manage capacity. We found that there were batches that contained duplicates of the same thumbprint and reported these certificates as successfully revoked in each instance even though the certificate was only revoked once. We reported the total batch row count as the number of certificates revoked without taking into account the distinct set of certificate thumbprints, which overstated the weekly revoked figures. However, we can confirm despite the reported counts being higher than the count of certs revoked, we did revoke all the certificates we intended to revoke in the incident except for the 602 remaining per comment 198. We have updated our reporting tooling to ensure deduplication is applied when producing revocation counts going forward.
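The overcount described here is the difference between counting batch rows and counting distinct certificates. A minimal sketch of the deduplicated count, assuming hypothetical batch records with a `thumbprint` field (the record layout is an assumption for illustration, not MPS's actual schema):

```python
def revocation_counts(batches):
    """Return (total batch rows, distinct thumbprints) across all batches."""
    total_rows = 0
    distinct = set()
    for batch in batches:
        for row in batch:
            total_rows += 1
            # Normalise case so the same thumbprint is not counted twice.
            distinct.add(row["thumbprint"].lower())
    return total_rows, len(distinct)


# Hypothetical batches: thumbprint "aa11" appears in both.
batches = [
    [{"thumbprint": "AA11"}, {"thumbprint": "bb22"}],
    [{"thumbprint": "aa11"}, {"thumbprint": "cc33"}],
]
print(revocation_counts(batches))  # (4, 3)
```

Reporting the first number overstates the revocations by one in this example; the second number is the figure a weekly update should carry.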
Given that you intended to revoke pre-certificates alongside final certificates it does not sound like you did what you intended to do. As a simple example you gave a list of 2 certificates to revoke not recognizing they were duplicates and got 1 revocation. That the 2 certificates were unintended duplicates on the serial number side does not change that you intended to revoke 2 certificates.
And yes, I am aware that it's a serial number in the CRL for revocation purposes, but the point still stands. This impacts your original claims of "crl bloat": when we scale it to, say, the millions, it starts to become an issue.
Another slight issue is that "crl bloat" is not why the 600 certificates could not be handled as you're required to do, but I don't believe there are any possible answers you could provide on that front that would improve the situation.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Update procedures to generate SHA256 thumbprints when we need to produce impacted list of certificates via the CCADB guidelines. | Prevent | Root Cause #1 | Impacted certificate lists published to Bugzilla will be generated from our internal datasets and will include internally calculated SHA‑256 certificate thumbprints, eliminating reliance on crt.sh metadata. | 03/03/2026 | Complete |
| Update procedures for counting revocations to use unique certificate counts rather than batch execution totals. | Prevent | Root Cause #2 | For each weekly revocation‑delay update, the reported “revoked” count is derived from, and equals, the count of distinct certificate thumbprints revoked for the reporting period. | 03/03/2026 | Complete |
Can I recommend you start managing revocations per-certificate rather than your current approach, which has caused ... all of this? Even 10 months in you're incapable of providing accurate figures of what you've done each week, the impacted certificates are questionable, and we're no closer to you being capable of operating to the baseline requirements.
The full list of impacted certificates as it pertains to this bug, based on our most recent investigations can be found here:
After explaining to the class why serial numbers are bad and cause problems for revocation, you then provide a final corpus of only serial numbers.
Q8: Given that allegedly the unnumbered action item above includes a procedure to produce SHA256 'thumbprints' and was 'complete' 2026-03-03 why have you not provided any?
The fact that this action item materialized and has immediately not shown any substance is not helping matters. Likewise for prior claims (Comment 162) that:
As suggested, we will explicitly number the action items.
Comment 206 makes it clear that there is no human review of this before it's posted, just a merge of action items, as it's apparent at a glance that something doesn't fit. The root cause numbers aren't even coherent either, pointing at either the final incident report or an offhand comment that isn't part of any incident report - despite appearing to be a unique and notifiable event.
The update to action item 1 implies that the remaining certificates will be revoked by 2026-03-23, but that isn't listed on the breakdown table.
Bluntly, as the original reporter of this mess, I have no confidence in Microsoft PKI's role in WebPKI going forward, and they are making zero attempts to improve that position. We're still lacking any firm dates for when they believe they will be in compliance and capable of operating as a functional CA in the WebPKI. The most recently discovered missing certificates and revocation dates make that very clear, as does their reasoning.
An open question to Mozilla, Apple, Chrome, and Microsoft Root Programs (noting that Microsoft PKI have stated they are independent, but also they don't have an account here anyway):
Is this incident response the bar we are now setting for CAs going forward?
Should all future incidents take 10+ months, with 200+ comments trying to get a CA to perform their basic duties?
Where is the threshold for using OneCRL, or a similar mechanism to take the decision out of a misbehaving CA's hands?
What can we do to improve things going forward?
I appreciate this is likely best taken to MRSP, but it does mainly pertain to this incident currently and it is an evolving issue.
Comment 209•10 days ago
Returning after some poor health, it is unexpected to see this Microsoft PKI bug still open and still not resolved.
I agree with Wayne's last few comments, but I feel the answer to his questions is 'No, but Microsoft are too big to fail, so we will not do anything'.
Microsoft have now proven that they cannot competently run a CA. This has been true for months: mis-issuance of millions of certificates, and a pathetic attempt to barely fix the problem.
It is proof that the root-programs will not take action when it is a 'big' CA. It is safer for a big CA to mis-issue everything and claim a year to fix than to admit smaller problems.
I think we can expect a similar reaction when incidents happen again in the future.
Why would a big CA do any better?
Comment 210•9 days ago
(In reply to Mike Shaver (:shaver emeritus) from comment #202)
For Mozilla:
- Is Microsoft's extremely lengthy and confusing revocation process consistent with Mozilla's expectations for root CAs who have the technical means to issue a certificate for literally any site on the internet?
- If not (please please), what remedies would you consider to be appropriate for the extended and (IMO) sloppy delrev incident that we've all been enduring alongside Microsoft for the last ten months?
- If a cross-signed CA were to apply for root inclusion after an incident like this, what would be considered to be sufficient evidence that they were in fact appropriate for inclusion?
Hi Mike,
The revocation obligation in this case arose from a discrepancy between Microsoft PKI Services' CPS and the certificate profile being used in practice. The CPS stated that the keyEncipherment key usage bit was absent, when in fact it had always been present: the certificates themselves were technically compliant with respect to the key usage, but the CPS documentation was incorrect. Once this discrepancy was identified, section 4.9.1.1(12) of the TLS Baseline Requirements required revocation of the affected certificates within five (5) days.
Mozilla has taken steps in recent years to reduce delayed revocations and to strengthen CA Operators' ability to respond to large-scale revocation events. These efforts include supporting shorter certificate lifetimes; requiring CA Operators to develop, maintain, and test mass revocation plans; and encouraging operational practices that improve automation, visibility, and readiness for large-scale certificate replacement.
Mozilla generally expects publicly trusted CA Operators to maintain sufficient capability to meet TLS BR revocation timelines. When incidents reveal gaps in that capability, we expect those gaps to be clearly identified and addressed.
Mozilla evaluates incidents and determines appropriate responses based on several factors: the nature and scope of the incident; the CA Operator's transparency, cooperation, and community engagement; the timeliness and effectiveness of remediation; and whether structural improvements are implemented to prevent recurrence.
This incident revealed limitations in Microsoft PKI Services' ability to perform revocation at the required scale. Microsoft has acknowledged these gaps and committed to remediation, including: CRL partitioning; migration to hierarchies designed to avoid CRL scalability constraints; reduced certificate lifetimes; development and testing of a mass-revocation plan; and introduction of standby Issuing CAs with improved lifecycle management.
Microsoft reported that revocation at the required scale would have produced CRLs large enough to risk disruption for relying parties, and it proposed a staged revocation approach. While this approach resulted in many certificates expiring rather than being revoked, we have been monitoring progress as reported in this bug. Microsoft originally reported approximately 72 million affected certificates. Based on status updates provided in this bug, approximately 15 million were revoked (while a large majority expired without revocation). Given that the number of certificates still now requiring revocation appears to be less than 600, Mozilla expects Microsoft to complete those revocations promptly so that this incident can be closed.
Root inclusion decisions are made on a case-by-case basis. Mozilla considers each applicant's operational maturity, incident history, transparency, remediation effectiveness, and demonstrated ability to meet ecosystem requirements. A CA Operator that had experienced a comparable incident would need to provide clear evidence that the underlying causes had been resolved and that it has the capability to meet those requirements going forward. Mozilla applies these expectations consistently across all participants in the Root Program.
Once the remaining revocations are completed and this incident is closed, Mozilla will evaluate Microsoft PKI Services, including the timeliness of revocation and the effectiveness of remediation, and will consider the results in its ongoing oversight of the Root Program.
Mozilla appreciates the continued community engagement and scrutiny this incident has prompted. Such feedback helps inform Mozilla’s ongoing evaluation of incidents and our Root Program policies.
Thanks,
Ben
Comment 211•9 days ago
[In response to Comment 208]
As is the case with all incidents filed to the “CA Compliance” Bugzilla component, the Chrome Root Program has closely followed along with this report and the others opened by Microsoft PKI Services over the past few months. We appreciate the community remaining actively engaged and providing timely, diligent feedback across this and other ongoing reports. We especially would like to thank Wayne for the observant and thoughtful participation in the incident reporting process.
As we did a few months ago, it’s important to restate that the Chrome Root Program expects CA Owners to comply with our program policy, their own policies, and the TLS BRs. However, the enforcement mechanisms and the visibility of actions in response to when those expectations are not met might not always be immediate or publicly evident. The frequency of incidents for a CA Owner, the quality of responses (detail, transparency, etc.), and commitment to make meaningful and demonstrable change aligned with evidenced continuous improvement are all significant factors when we evaluate CA Owners for continued inclusion in the Chrome Root Store. It should also come as no surprise that these same criteria are considered when CA Owners are looking to replace roots. We emphasized these views, and more, before.
Also said before, while a single incident of delayed revocation by a CA might not lead to an explicit enforcement action, it does feed into our constant evaluation and assessment of a CA Owner's ability to comply with the policies and commitments they have made to the ecosystem, and their ability to competently and reliably serve the community. These evaluations may result in the removal of a CA Owner from the Chrome Root Store, or the application of other technical controls that affect how the certificates they issue are trusted in Chrome.
Regarding future improvements, and related to the Chrome Root Program, we believe we have championed progress towards the systemic resolution of a major contributing factor to this incident's root cause: lack of agility. Within the CA/Browser Forum, proposing short-lived certificates, endorsing shorter certificate lifetimes, and proposing the future sunset of manual validation methods. Within our policy, requiring automated solution support, requiring ARI or ARI-like features, encouraging shorter subordinate CA lifetimes, and requiring Markdown or AsciiDoc formatted policy documents that will lend themselves to automation going forward. We also suggest several illustrative behaviors, like utilizing partitioned CRLs, that are considered positive signals for Chrome Root Store inclusion acceptance. During the last two CA/Browser Forum F2F events, we’ve also encouraged improvements to both (a) profile simplicity and (b) system-generated and enforced profiles (where the actual certificate profiles on the CA are used to generate aspects of the CPS, rather than allowing humans to generate it where they are more likely to make the same type of mistakes that began Microsoft PKI Services’ incident). Similarly, we’ve respectfully pushed back on non-compelling ideas that create additional opportunities for delayed remediation and can work against agility.
Generally speaking, the volume of comments on a bug is not an absolute indicator of quality. However, the extent of community participation, the ongoing concerns raised, and the numerous data corrections required in this thread do speak volumes.
Overall, Microsoft PKI Services' handling of this incident has fallen short of expectations. The prolonged response highlighted significant gaps in operational readiness for mass revocation (despite it being required by the TLS BRs, and before that, the Mozilla Root Store Policy), delays in CA agility, challenges in generating authoritative reporting metrics, unfamiliarity with Bugzilla incident history, and misalignment with the CCADB Incident Reporting Guidelines. It’s especially concerning that, with so many of Microsoft PKI Services’ actions marked as completed, there is still an inability to perform a timely revocation of 602 certificates.
Though Microsoft has made some positive commitments in response to this incident (e.g., reduced default certificate lifetime), given the views offered in this bug, it’s evident community member confidence is strained. We welcome community feedback on any additional steps Microsoft PKI Services can take to restore confidence, but ultimately the responsibility to prove ongoing reliability, transparency, and maturity rests with Microsoft PKI Services, as it does with any CA Owner trusted by default in Chrome.
| Assignee | ||
Comment 212•9 days ago
Response to Comment 200 - Zacharias
These “internal relying parties”: are they internal to MSPKI, or are they “internal” as in other parts of Microsoft?
Regarding the clarification on “internal relying parties”: these are Microsoft services outside of Microsoft PKI Services (MPS) that consume certificates issued by our CAs. They are not internal to the MPS service itself.
There have been many incidents on Bugzilla about prioritizing avoiding business disruption over timely revocation in recent years.
What lessons have MSPKI learned from studying these incidents?
Specifically regarding these two would be very enlightening:
- DigiCert: Delayed revocation of 1910322
- Entrust: Delayed revocation of EV TLS certificates with missing cPSuri
We understand the concern behind this question. The DigiCert and Entrust incidents reinforced industry expectations around revocation handling for publicly trusted CAs. We were aware of those precedents and the expectations they set at an industry level.
Some of the key lessons reinforced by those incidents were:
- CA Pinning is incompatible with public WebPKI revocation timelines and cannot be handled during an incident.
- Revocation obligations should not be deferred based on business impact.
While we understood these expectations, they had not yet been fully translated into controls that would prevent similar outcomes in our environment. The remediation work in this bug is focused on closing that gap by making revocation readiness a standing capability, rather than something addressed during an incident.
| Assignee | ||
Comment 213•9 days ago
Response to Comment 202 - Mike Shaver
1. Does Microsoft PKI Services keep a record of all the certificates that it issues? If not, what are the bounds on the set that are recorded and retained? How were those bounds decided upon?
Yes. Microsoft PKI Services (MPS) maintains authoritative internal certificate inventories that serve as the source of truth for per-certificate lifecycle data as mentioned in Comment 188. The bounds are based on Baseline Requirements 5.4.
2. Does Microsoft PKI Services keep a record of all the certificates that it revokes? If not, etc.
but finally and most importantly IMO
Yes. The same systems of record and audit logs retain the revocation history for each certificate (including revocation timing), and those audit logs remain intact and are being retained in accordance with BR 5.4.3.
3. Does Microsoft PKI Services feel that their handling of this revocation event is an example that should be followed by other CAs that are seeking to be included in future MRSP updates?
Absolutely not — we are not presenting this incident as a model. This incident identified critical gaps in our operational readiness and reporting at this scale, which we have been addressing through the remediation work documented in this bug.
| Assignee | ||
Comment 214•9 days ago
Response to Comment 203 - Zacharias
Whose gaps in telemetry and automation? You or the subscribers? Subscribers lacking capabilities is not a valid reason to delay revocation. If the gaps are within MSPKI, please explain in a more detailed manner so that the community can understand.
Microsoft PKI Services provides mechanisms that support automated certificate rotation and revocation. For a subset of services that are not yet onboarded to these automation mechanisms, certificate rotations require additional manual coordination. This affects execution timing. This is the telemetry/automation gap that we are referring to.
So this is a separate incident or a new root cause? Because CRL bloat is the only root cause listed.
We regard this issue as a general and common one not specific to this incident. However, as part of the remediations for this bug and to meet MRSP requirements, we have notified all of our subscribers of our revocation responsibilities and the need for automation.
| Assignee | ||
Comment 215•9 days ago
Response to Comment 204 - Wayne
With the imminent release of the remaining certificates breakdown dropping next week, aka today, how confident are Microsoft that the last certificate expires April 15th...?
I'm doing a cursory glance at censys with a query I made last year (and updated to their new platform) and I'm not seeing quite what you're claiming.
Any information and transparency on the remaining "602" certificates will be refreshing to see.
As noted in Comment 201 and detailed in the week‑by‑week breakdown in Comment 205, we identified and corrected specific issues that affected earlier aggregate reporting.
The remaining expiration dates, including April 15, are derived directly from internal certificate lifecycle records. Based on this recompilation, we have high confidence in the remaining figures and expirations.
| Assignee | ||
Comment 216•8 days ago
Response to Comment 208 - Wayne
Q1: When can we expect an updated final incident report containing the figures for the full corpus and figures on how many were missed vs prior claims? The transparency is key.
The “81,240,662” figure represents the total impacted certificates for this bug. Since this bug relates to delayed revocation, we interpret “impacted certificates” to be all certificates that we failed to revoke in a timely manner once we became aware of the problem. For that reason, the start date for counting the impacted certificates for this bug was set to 4/25/2025 (the date we became aware of the problem).
That said, we acknowledge that we should update the Total Impacted certificates for Bugzilla 1962829. The number of total impacted certificates for Bugzilla 1962829 is 99,957,701. Comment 67 in that bug reflects this update.
Can you clarify what you mean by “figures on how many were missed vs prior claims?”
And as alluded to in Comment 204 I'm having trouble finding some impacted certificates in certs_final_final_for_real_copy(2).csv. Here is but one example:
Serial: 33018b8a39a455abb952e0d7670000018b8a39
It's in the Azure Roots, the validity period matches, and it expires in a day so what gives. I did collate the 20 txt files, and would suggest zipping them for your newer incidents.
Q2: Can you please explain why, as a simple example, the above certificate was missed and how many more are missing?
The certificate referenced is a precertificate. The definitions section of the CCADB Incident Reporting Guidelines states: “Unless otherwise stated, “certificate” on this page refers to a final certificate, distinct from a precertificate (as described in RFC 6962).”
Based on this guidance we only included final certificates in our list. We would welcome feedback on this interpretation if others interpret this differently.
Q3: Why on earth are you not storing SHA256 hashes for all of your certificates? Not only is it part of incident reporting, it's the intended method for relying parties to report impacted certificates.
Microsoft PKI Services has historically tracked certificates using serial numbers and SHA1 thumbprints for performance optimization, though SHA256 thumbprints could be calculated on demand. We agree that storing the SHA256 hash is good practice and a lesson learned from this incident. We have added this as an internal repair item to begin tracking SHA256 thumbprints in our data sources to better support future incident reporting.
Q4: If I were to report a compromised certificate to Microsoft PKI in February 2026 by identifying its SHA256 hash, would you then check crt.sh, or similar third-party service such as censys to figure out what serial number the certificate contained?
Historically, whenever we have received certificate problem reports, they have been based on certificate files, which contain metadata we can correlate back to our system. Although we have never been in a situation where only the SHA256 hash is available, we agree this is a potential scenario and that it is good practice to store the SHA256 hash as well. As mentioned above, we have taken an internal repair item to track this value moving forward.
Q5: Can Microsoft PKI explain what a pre-certificate is?
Q6: Can Microsoft PKI explain why a pre-certificate and a final certificate should have the same serial number?
The definition and explanation of a precertificate are in BR 7.1.2.9.
Q7: Did the personnel involved in revocation have any experience or understanding of certificate issuance?
Yes
Q8: Given that allegedly the unnumbered action item above includes a procedure to produce SHA256 'thumbprints' and was 'complete' 2026-03-03 why have you not provided any?
The current procedure relies on retroactive calculation of the SHA256 hashes. We do not plan to retroactively calculate SHA256 hashes for this population of certs. As mentioned above, to be better prepared for incidents of this scale, we have added an internal repair item to start tracking SHA256 thumbprints in our internal data sources moving forward.
| Assignee | ||
Comment 217•8 days ago
Update to Revocation
We would like to provide an update to Action Item #1. Of the 602 certificates left for revocation, 597 were revoked today and 5 expired. No further certificates remain outstanding.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| #1 Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-23 | Complete |
| #2 Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| #3 Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| #4 Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| #5 Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| #6 Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Prevent | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| #7 Eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from ICAs with partitioned CRLs (G1 or G2), visible in CT logs. Public can verify through CCADB hierarchy updates showing CRL partitioning implementation on active ICAs and issuance patterns in CT logs. | 2026-02-28 | Complete |
| #8 Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2026-03-06 | Complete |
| Update procedures to generate SHA256 thumbprints when we need to produce impacted list of certificates via the CCADB guidelines. | Prevent | Root Cause #1 | Impacted certificate lists published to Bugzilla will be generated from our internal datasets and will include internally calculated SHA‑256 certificate thumbprints, eliminating reliance on crt.sh metadata. | 2026-03-03 | Complete |
| Update procedures for counting revocations to use unique certificate counts rather than batch execution totals. | Prevent | Root Cause #2 | For each weekly revocation‑delay update, the reported “revoked” count is derived from, and equals, the count of distinct certificate thumbprints revoked for the reporting period. | 2026-03-03 | Complete |
Comment 218•8 days ago
(In reply to chrome-root-program from comment #211)
Though Microsoft has made some positive commitments in response to this incident (e.g., reduced default certificate lifetime), given the views offered in this bug, it’s evident community member confidence is strained. We welcome community feedback on any additional steps Microsoft PKI Services can take to restore confidence, but ultimately the responsibility to prove ongoing reliability, transparency, and maturity rests with Microsoft PKI Services, as it does with any CA Owner trusted by default in Chrome.
The change in tack and having a human response in the past 24 hours is encouraging. As far as specific steps go, we've already been told that a postmortem will occur. We have precedent for these postmortem documents being produced, and it would give Microsoft PKI Services more freedom to delve into the details of what went wrong, and how a similar issue in a month won't cause a relapse.
As far as fine-detailed actions they can take, we'd functionally be reiterating a discussion done in depth 2 years ago. I'd rather point Microsoft PKI Services to read the previous incidents and to align with practices made standard from then onwards.
(In reply to Microsoft PKI Services from comment #212)
We understand the concern behind this question. The DigiCert and Entrust incidents reinforced industry expectations around revocation handling for publicly trusted CAs. We were aware of those precedents and the expectations they set at an industry level.
Some of the key lessons reinforced by those incidents were:
- CA Pinning is incompatible with public WebPKI revocation timelines and cannot be handled during an incident.
- Revocation obligations should not be deferred based on business impact.
While we understood these expectations, they had not yet been fully translated into controls that would prevent similar outcomes in our environment. The remediation work in this bug is focused on closing that gap by making revocation readiness a standing capability, rather than something addressed during an incident.
While I appreciate that glacial improvements are being made, there is the concern over why it's taken this long for action to be taken. I don't solely mean in response to this incident, but that Mozilla has had policies in place for mass revocation before it got placed into the baseline requirements. When the issue was debated in CABF, that should have been when plans and actions started happening to coincide with the regulation deadline.
We've yet to hear why that never occurred. While it is slightly out of scope for this issue specifically, this issue should not have occurred had the mass revocation playbook existed and reflected operational reality. This should have occurred when you met with DigiCert every two to four weeks. While it would be nice to see any discussions between those parties in the past year for compliance issues, that is unlikely. I would note that DigiCert should have found out these issues proactively given their relationship with Microsoft PKI Services.
(In reply to Microsoft PKI Services from comment #216)
Response to Comment 208 - Wayne
Q1: When can we expect an updated final incident report containing the figures for the full corpus and figures on how many were missed vs prior claims? The transparency is key.
The “81,240,662” figure represents the total impacted certificates for this bug. Since this bug relates to delayed revocation, we interpret “impacted certificates” to be all certificates that we failed to revoke in a timely manner once we became aware of the problem. For that reason, the start date for counting the impacted certificates for this bug was set to 4/25/2025 (the date we became aware of the problem).
That said, we acknowledge that we should update the Total Impacted certificates for Bugzilla 1962829. The number of total impacted certificates for Bugzilla 1962829 is 99,957,701. Comment 67 in that bug reflects this update.
Understood, and thank you for updating the original incident.
Can you clarify what you mean by “figures on how many were missed vs prior claims?”
I can, however it mainly relates to the original incident at this point in time. We have a final figure of 99.9 million, but the original impacted figure was stagnant since the initial disclosure.
Ideally we would have a week-by-week breakdown of divergence between: Impacted/Valid/Expired/Revoked and what those figures looked like with the corpus now vs then. However I do appreciate that could be burdensome and really I'm not looking for 1:1 numbers matching in that regard, but an overall picture of how the 'query' issue impacted the plan in practice. A similar trend on CRL size, vs what it was intended to be without the precert issue would be nice too.
Q2: Can you please explain why, as a simple example, the above certificate was missed and how many more are missing?
The certificate referenced is a precertificate. Per the CCADB Incident Reporting Guidelines definition section, it states “Unless otherwise stated, “certificate” on this page refers to a final certificate, distinct from a precertificate (as described in RFC 6962).”
Based on this guidance we only included final certificates in our list. We would welcome feedback on this interpretation if others interpret this differently.
Yup, that is correct - it's a precert without a final certificate and I missed that detail.
Q3: Why on earth are you not storing SHA256 hashes for all of your certificates? Not only is it part of incident reporting, it's the intended method for relying parties to report impacted certificates.
Microsoft PKI Services has historically tracked certificates using serial number and SHA1 thumbprints for performance optimization, though SHA256 thumbprints could be calculated on demand. We agree that storing the SHA256 hash is a good practice and learning from this incident. We have added this as an internal repair item to begin tracking SHA256 thumbprints in our data sources to better support future incident reporting.
I hope there is an absolute move away from SHA-1. With NIST advocating that it stop being used from January 2031, it would surely impact FedRAMP. Work is already underway in CABF to remove it.
Q4: If I were to report a compromised certificate to Microsoft PKI in February 2026 by identifying its SHA256 hash, would you then check crt.sh, or similar third-party service such as censys to figure out what serial number the certificate contained?
Historically, whenever we receive certificate problem reports, they are based on a certificate file, which contains metadata we can correlate back to our system. Although we have never been in a situation where only the SHA256 hash is available, we agree this is a potential scenario and it is a good practice to store the SHA256 hash as well. As mentioned above, we have taken an internal repair item to track this value moving forward.
I can't help but notice that you did not answer my question. What would be your source for tying the SHA256 hash to a serial number?
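To make the question concrete, here is a minimal sketch of what an internal lookup from a reported SHA256 hash to a serial number could look like. It is not Microsoft PKI Services' implementation; the `(serial, DER bytes)` record shape and the function names are assumptions for illustration only, standing in for whatever the CA's own issuance records contain.

```python
import hashlib


def sha256_thumbprint(der_bytes: bytes) -> str:
    """Return the uppercase hex SHA-256 hash of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest().upper()


def build_index(records):
    """Map SHA-256 thumbprint -> serial number.

    `records` is an iterable of (serial_hex, der_bytes) pairs taken from a
    hypothetical internal issuance database (an assumption, not a real schema).
    """
    return {sha256_thumbprint(der): serial for serial, der in records}


def lookup_serial(index, reported_hash: str):
    """Resolve a reported hash to a serial, or None if it is unknown.

    Accepts lowercase, uppercase, or colon-separated hex.
    """
    normalized = reported_hash.replace(":", "").strip().upper()
    return index.get(normalized)
```

With an index like this built from the CA's own records, a problem report that contains only a SHA256 hash can be resolved without consulting crt.sh or any other third party.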
Q5: Can Microsoft PKI explain what a pre-certificate is?
Q6: Can Microsoft PKI explain why a pre-certificate and a final certificate should have the same serial number?
The definition and explanation of a precertificate is in BR 7.1.2.9.
Q7: Did the personnel involved in revocation have any experience or understanding of certificate issuance?
Yes
I'm still perplexed that this was discovered so late, as it seems like an incredibly odd mistake to repeat. Given this is all about a query mistake, could we see the query?
Q8: Given that allegedly the unnumbered action item above includes a procedure to produce SHA256 'thumbprints' and was 'complete' 2026-03-03 why have you not provided any?
The current procedure relies on retroactive calculations for the SHA256 hashes. We do not plan to retroactively calculate SHA256 hashes for this population of certs. As mentioned above, to be better prepared for incidents of this scale, we have added an internal repair item to start tracking SHA256 thumbprints in our internal data sources moving forward.
Then from our perspective you have failed the evaluation criteria for this action item. The entire action item is that you would produce SHA256 hashes when compiling any impacted certificate corpuses. This was set as complete on 2026-03-03, yet on 2026-03-13 we received an updated final batch that was non-compliant with your own standards.
I'm looking for some consistency and the ability to hold yourself to your own self-imposed standards here.
(In reply to Microsoft PKI Services from comment #217)
Update to Revocation
We would like to provide an update to Action Item #1. Of the 602 certificates left for revocation, 597 were revoked today and 5 expired. No further certificates remain outstanding.
I'm glad this was complete before 2026-03-23, although I must question the sharp turnaround in response time and divergence from previous plans. It is good that someone over there can get the basics done when required.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| #1 Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-23 | Complete |
| #2 Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| #3 Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| #4 Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| #5 Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| #6 Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| #7 Eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from ICAs with partitioned CRLs (G1 or G2), visible in CT logs. Public can verify through CCADB hierarchy updates showing CRL partitioning implementation on active ICAs and issuance patterns in CT logs. | 2026-02-28 | Complete |
| #8 Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2026-03-06 | Complete |
| Update procedures to generate SHA256 thumbprints when we need to produce impacted list of certificates via the CCADB guidelines. | Prevent | Root Cause #1 | Impacted certificate lists published to Bugzilla will be generated from our internal datasets and will include internally calculated SHA‑256 certificate thumbprints, eliminating reliance on crt.sh metadata. | 2026-03-03 | Complete |
| Update procedures for counting revocations to use unique certificate counts rather than batch execution totals. | Prevent | Root Cause #2 | For each weekly revocation‑delay update, the reported “revoked” count is derived from, and equals, the count of distinct certificate thumbprints revoked for the reporting period. | 2026-03-03 | Complete |
You didn't read my comment on the issues with your action items and sloppiness. To quote Comment 208:
The fact that this action item materialized and has immediately not shown any substance is not helping matters. Likewise for prior claims (Comment 162) that:
As suggested, we will explicitly number the action items.
Comment 206 makes it clear that there is no human review of this before it's posted, just a merge of action items, as it's apparent at a glance that something doesn't fit. The root cause numbers aren't even coherent either, pointing at either the final incident report or an offhand comment that isn't part of any incident report - despite appearing to be a unique and notifiable event.
The update to action item 1 implies that the remaining certificates will be revoked by 2026-03-23, but that isn't listed on the breakdown table.
Your Final Incident Report only has 1 root cause - CRL bloat. We've not had an updated incident report this entire time, even as more root causes were made evident. What is this mysterious root cause 2?
Comment 219•6 days ago
In response to Comment 218:
This should have occurred when you met with DigiCert every two to four weeks. While it would be nice to see any discussions between those parties in the past year for compliance issues, that is unlikely. I would note that DigiCert should have found out these issues proactively given their relationship with Microsoft PKI Services.
Microsoft PKI Services’ Root CA certificates are directly trusted in the different Root stores, with the inherent compliance obligations under the different Root program policies. DigiCert provides cross-signing to certain of those CAs to provide additional ubiquity. DigiCert has a policy of only cross-signing CAs that are already independently part of Root program/CCADB supervision.
As noted by Wayne, we take our oversight responsibilities seriously. We meet regularly with Microsoft PKI Services to discuss changes in Root and CCADB policy as well as pending changes to CA/B Forum standards. Historically we worked with Microsoft to implement pre-issuance linting, and more recently on their plans to improve their webPKI CA operations and test their mass revocation plans. On an ongoing basis we review changes to their CPS as well as their audit reports for compliance with our CP and with external standards.
| Assignee | ||
Comment 220•6 days ago
Weekly Status Update
We have completed the revocation of all outstanding impacted certificates and there are no remaining certificates that must be revoked. All identified actions associated with the bug are now complete. We are conducting a detailed postmortem of this incident before submitting the closure report. We will publish the learnings from this exercise by April 8th.
Updated•6 days ago
|
Comment 221•5 days ago
This bug has too many comments about too many distinct failures and root causes and unanswered questions to keep track of. I recommend that Microsoft PKI Services split this out into separate incident reports in separate Bugzilla bugs for each of the identified failures.
At a minimum I would like to see these incident reports:
- Keep the current Bug 1965612 and its incident report focused on the initial revocation delay caused by lack of CRL partitioning, cross-signatures done at the intermediate level and lack of standby intermediates.
- File a new bug with a new incident report about the incorrect reported counts of revoked certificates identified in Comment 124.
- File a new bug with a new incident report about the incorrect reported counts of affected and revoked certificates identified in Comment 177.
- File a new bug with a new incident report about the delay between Comment 177 and Comment 205 to disclose details about the reporting error, which is more than the 14 days allowed by the CCADB incident reporting guidelines.
- File a new bug with a new incident report about the automation and telemetry gaps identified in Comment 125 and why this resulted in a further revocation delay for 10K-20K certificates.
- File a new bug with a new incident report about why ICA pinning and Client Auth resulted in a further revocation delay identified in Comment 175.
- File a new bug with a new incident report about the failure to discover 602 certificates as identified in Comment 197.
- File a new bug with a new incident report about the failure to revoke the 602 certificates within 5 days after they were discovered in Comment 197.
These are all separate but related incidents, and keeping them all in the same bug and not filling out the incident report template for each of them makes it close to impossible for the community to track remediation of these. Filling out the incident report template for each of them might also naturally answer many of the questions for these related incidents, that Microsoft PKI Services has not yet managed to answer.
| Assignee | ||
Comment 222•3 days ago
Response to Comment 218 - Wayne
Ideally we would have a week-by-week breakdown of divergence between: Impacted/Valid/Expired/Revoked and what those figures looked like with the corpus now vs then. However I do appreciate that could be burdensome and really I'm not looking for 1:1 numbers matching in that regard, but an overall picture of how the 'query' issue impacted the plan in practice. A similar trend on CRL size, vs what it was intended to be without the precert issue would be nice too.
In regards to the CRL size inquiry, per Comment 67, we revoked the maximum number of certificates to hit our CRL threshold. Pre-certificates did not impact CRL size. The query issue did not impact our revocation plan as mentioned in Comment 205.
I can't help but notice that you did not answer my question. What would be your source for tying the SHA256 hash to a serial number?
As of today we would rely on a 3rd party solution to obtain further metadata of the certificate. As mentioned in Comment 216, we plan to incorporate the SHA256 hash into our metrics. Additionally, we have amended this internal item to retroactively include this hash for unexpired certificates.
I'm still perplexed that this was discovered so late, as it seems like an incredibly odd mistake to repeat. Given this is all about a query mistake, could we see the query?
We have addressed this question in Comment 205.
Your Final Incident Report only has 1 root cause - CRL bloat. We've not had an updated incident report this entire time, even as more root causes were made evident. What is this mysterious root cause 2?
Root Cause #2 was in reference to the second root cause (revocation counts were overstated by counting batch execution rows (duplicate thumbprints) instead of unique thumbprints) that was identified in the latest analysis conducted for our week-by-week breakdown. We have updated the verbiage in our action item table to make this explicit.
| Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| #1 Revoke impacted certificates (in batches beginning 5/28/2025) | Mitigate | Root Cause 1 | Percent of impacted certificates revoked will be tracked and published monthly. Verification possible via Certificate Transparency (CT) logs and serial number disclosure via Microsoft’s CRL. | 2026-03-23 | Complete |
| #2 Create training and TSG Documentation to educate team on revocation expectations | Prevent | Root Cause 1 | Training completion rates will be tracked internally. Effectiveness will be evaluated through internal audits and inclusion of the training materials in external audit reviews. | 2025-07-31 | Complete |
| #3 Reduce usage of public PKI | Prevent | Root Cause 1 | Publish a monthly percentage reduction of unexpired, publicly trusted certificates issued from impacted hierarchies. Public can track progress using CT log data filtered for affected intermediates. | 2025-09-30 | Complete |
| #4 Exercise and refine the mass revocation playbook | Prevent | Root Cause 1 | Effectiveness will be assessed through internal tracking of simulated revocation scenarios, including coverage and execution timing. Results will inform iterative improvements to the playbook. While objective external metrics are limited, Microsoft will evaluate the impact through internal reviews and incorporate this action into relevant audit scopes. | 2025-09-01 | Complete |
| #5 Publish a phased plan to reduce the default certificate validity period, with the long-term goal of transitioning to short-lived certificates. | Preventive | Root Cause 1 | Effectiveness will be measured by publication of the plan by 2025-08-22. Public can verify via the published plan and future CPS updates reflecting the proposed changes. | 2025-08-22 | Complete |
| #6 Begin implementation of the phased certificate lifecycle reduction plan, including updates to issuance systems and CPS. | Preventive | Root Cause 1 | Effectiveness will be measured by issuance of certificates with reduced validity periods, visible in CT logs, and updated CPS language. Public can verify through CT data and CPS version history. | 2025-09-22 | Complete |
| #7 Eliminate issuance from non-partitioned ICAs. | Mitigate | Root Cause 1 | Effectiveness will be measured by the percentage of certificates issued from ICAs with partitioned CRLs (G1 or G2), visible in CT logs. Public can verify through CCADB hierarchy updates showing CRL partitioning implementation on active ICAs and issuance patterns in CT logs. | 2026-02-28 | Complete |
| #8 Develop and publish a plan for regular ICA rotations to maintain operational readiness and crypto agility. | Prevent | Root Cause 1 | Effectiveness will be measured by publication of the ICA rotation plan. ICA rotation can be publicly verified through CCADB and CT logs as the plan is executed. | 2026-03-06 | Complete |
| #9 Update procedures to generate SHA256 thumbprints when we need to produce impacted list of certificates via the CCADB guidelines. | Prevent | N/A see Comment 205 | Impacted certificate lists published to Bugzilla will be generated from our internal datasets and will include internally calculated SHA‑256 certificate thumbprints, eliminating reliance on crt.sh metadata. | 2026-03-03 | Complete |
| #10 Update procedures for counting revocations to use unique certificate counts rather than batch execution totals. | Prevent | N/A see Comment 205 | For each weekly revocation‑delay update, the reported “revoked” count is derived from, and equals, the count of distinct certificate thumbprints revoked for the reporting period. | 2026-03-03 | Complete |
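The counting error described for Root Cause #2 (batch execution rows versus unique thumbprints) can be illustrated with a short sketch. The `(batch_id, thumbprint)` row shape below is a hypothetical schema chosen for the example, not the actual revocation log format.

```python
from collections import Counter


def revocation_counts(batch_rows):
    """Summarise revocation batch rows.

    `batch_rows` is an iterable of (batch_id, thumbprint) pairs; this shape
    is an assumption made for illustration.
    """
    # Normalise case so the same certificate is not counted twice.
    thumbprints = [thumbprint.upper() for _, thumbprint in batch_rows]
    per_certificate = Counter(thumbprints)
    return {
        "batch_rows": len(thumbprints),              # the overstated figure
        "unique_certificates": len(per_certificate),  # the figure to report
        "duplicate_rows": len(thumbprints) - len(per_certificate),
    }
```

A certificate that appears in more than one batch contributes to `batch_rows` each time but to `unique_certificates` only once, which is the difference action item #10 is meant to close.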
Comment 223•1 day ago
Could you please provide some quantitative data on past and current certificate lifetime distributions. Here is a suggested framework:
- ≤7 days
- 8–30 days
- 31–47 days
- 48–100 days
- 101–200 days
- 201–397 days
Also, please explain how changes in these distributions demonstrate a shift to shorter lifetimes? Thanks.
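As a rough sketch of how the suggested framework could be applied, the following buckets certificate validity periods into the ranges listed above. The input shape (pairs of notBefore/notAfter datetimes), the `>397 days` overflow bucket, and the use of fractional days are assumptions for illustration; real figures would come from the CA's issuance records or CT log data.

```python
from collections import Counter
from datetime import datetime, timedelta

# Upper bound in days (inclusive) and label for each suggested bucket.
BUCKETS = [
    (7, "≤7 days"),
    (30, "8–30 days"),
    (47, "31–47 days"),
    (100, "48–100 days"),
    (200, "101–200 days"),
    (397, "201–397 days"),
]
OVERFLOW = ">397 days"  # assumption: catch-all for anything longer


def bucket_for(lifetime_days: float) -> str:
    """Return the label of the first bucket whose upper bound fits."""
    for upper, label in BUCKETS:
        if lifetime_days <= upper:
            return label
    return OVERFLOW


def lifetime_distribution(validity_periods):
    """Return {bucket label: share of certificates} for (not_before, not_after) pairs."""
    counts = Counter()
    for not_before, not_after in validity_periods:
        days = (not_after - not_before).total_seconds() / 86400
        counts[bucket_for(days)] += 1
    total = sum(counts.values())
    labels = [label for _, label in BUCKETS] + [OVERFLOW]
    return {label: (counts[label] / total if total else 0.0) for label in labels}


if __name__ == "__main__":
    start = datetime(2026, 1, 1)
    sample = [(start, start + timedelta(days=d)) for d in (5, 90, 90, 180)]
    for label, share in lifetime_distribution(sample).items():
        print(f"{label}: {share:.0%}")
```

Running the same function over issuance from two different periods and comparing the shares per bucket would show whether the distribution has moved toward the shorter ranges.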
Comment 224•1 day ago
(In reply to Microsoft PKI Services from comment #222)
Response to Comment 218 - Wayne
In regards to the CRL size inquiry, per Comment 67, we revoked the maximum number of certificates to hit our CRL threshold. Pre-certificates did not impact CRL size. The query issue did not impact our revocation plan as mentioned in Comment 205.
Okay, so revocation was just blindly revoking certificates until some threshold on the CRL side was hit. That does explain the lack of any concrete numbers on the plans so far.
I can't help but notice that you did not answer my question. What would be your source for tying the SHA256 hash to a serial number?
As of today we would rely on a 3rd party solution to obtain further metadata of the certificate.
While I'm unaware of any policy where this is strictly non-compliant, it seems to fall into the "no one ever felt the need to write this down" patently obvious category. Can we agree this approach opens a can of worms and is asking for trouble?
I'm still perplexed that this was discovered so late, as it seems like an incredibly odd mistake to repeat. Given this is all about a query mistake, could we see the query?
We have addressed this question in Comment 205.
I can't see the query in that response. Can you show me the query before and after?
Your Final Incident Report only has 1 root cause - CRL bloat. We've not had an updated incident report this entire time, even as more root causes were made evident. What is this mysterious root cause 2?
Root Cause #2 was in reference to the second root cause (revocation counts were overstated by counting batch execution rows (duplicate thumbprints) instead of unique thumbprints) that was identified in the latest analysis conducted for our week-by-week breakdown. We have updated the verbiage in our action item table to make this explicit.
Root causes are for incident reports. Can you provide an up to date incident report for this incident?
Comment 221 makes it clear that this incident has been a mess of new issues and root causes appearing. For the record, my initial email for the incident that kicked this off was sent on 2025-04-25, 11 months ago; or rather, it will be precisely 11 months in 1 hour.
We've gotten to the end on the impacted certificate list through slow attrition and expiry, with a final push for 600 revocations at the end. The why of how we got to each part of the stage is largely unanswered despite this being the most commented on incident that I'm aware of, albeit that's because it's one of the longest running ones.
If in a month a similar issue is found, I have little faith in Microsoft PKI Services being capable of providing a better response. I don't mean solely revocation, but in identifying issues, root causes, remediation steps, and being proactively transparent in each blunder along the way. Please try to show us that you have improved going forward; we have little to show for that so far.
| Assignee | ||
Comment 225•1 day ago
Response to Comment 221 - Jesper
This bug has too many comments about too many distinct failures and root causes and unanswered questions to keep track of. I recommend that Microsoft PKI Services split this out into separate incident reports in separate Bugzilla bugs for each of the identified failures.
At a minimum I would like to see these incident reports:
- Keep the current Bug 1965612 and its incident report focused on the initial revocation delay caused by lack of CRL partitioning, cross-signatures done at the intermediate level and lack of standby intermediates.
- File a new bug with a new incident report about the incorrect reported counts of revoked certificates identified in Comment 124.
- File a new bug with a new incident report about the incorrect reported counts of affected and revoked certificates identified in Comment 177.
- File a new bug with a new incident report about the delay between Comment 177 and Comment 205 to disclose details about the reporting error, which is more than the 14 days allowed by the CCADB incident reporting guidelines.
- File a new bug with a new incident report about the automation and telemetry gaps identified in Comment 125 and why this resulted in a further revocation delay for 10K-20K certificates.
- File a new bug with a new incident report about why ICA pinning and Client Auth resulted in a further revocation delay identified in Comment 175.
- File a new bug with a new incident report about the failure to discover 602 certificates as identified in Comment 197.
- File a new bug with a new incident report about the failure to revoke the 602 certificates within 5 days after they were discovered in Comment 197.
These are all separate but related incidents, and keeping them all in the same bug and not filling out the incident report template for each of them makes it close to impossible for the community to track remediation of these. Filling out the incident report template for each of them might also naturally answer many of the questions for these related incidents, that Microsoft PKI Services has not yet managed to answer.
Following internal review, Microsoft PKI Services does not view the identified items as separate reportable incidents under the CA/Browser Forum Baseline Requirements or CCADB Incident Reporting Guidelines, which define an incident as a failure to meet applicable policy commitments (e.g., certificate misissuance or failure to revoke within required timelines).
While these issues involved reporting or execution errors, they did not result in certificate non‑compliance and are directly related to the original revocation delay and its remediation.
However, we agree that our automation and telemetry gap did further delay revocation. We will publish an amendment to include this second root cause before April 8th.
Comment 226•1 day ago
Previously I received this response:
(In reply to Microsoft PKI Services from comment #214)
Response to Comment 203 - Zacharias
Whose gaps in telemetry and automation? You or the subscribers? Subscribers lacking capabilities is not a valid reason to delay revocation. If the gaps are within MSPKI, please explain in a more detailed manner so that the community can understand.
Microsoft PKI Services provides mechanisms that support automated certificate rotation and revocation. For a subset of services that are not yet onboarded to these automation mechanisms, certificate rotations require additional manual coordination. This affects execution timing. This is the telemetry/automation gap that we are referring to.
I interpret this to mean that the gaps belong to subscribers, internal to Microsoft but not internal to MSPKI.
Now we get this:
However, we agree that our automation and telemetry gap did further delay revocation. We will publish an amendment to include this second root cause before April 8th.
This is very confusing, as this comment implies that the gaps are internal to MSPKI.
But my ultimate conclusion is that MSPKI decided to delay revocation to avoid service disruption within Microsoft instead of honoring their commitment to webPKI community and the BRs.