KIR: Delayed revocation within seven (7) days for bug 1921598
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: piotr.grabowski, Assigned: piotr.grabowski)
Details
(Whiteboard: [ca-compliance] [ca-revocation-delay])
Incident Report
This is a preliminary report.
Summary
KIR issued the SZAFIR Trusted CA3 Intermediate CA certificate without the Reserved Certificate Policy Identifiers that indicate adherence to and compliance with the S/MIME BR, as described in https://bugzilla.mozilla.org/show_bug.cgi?id=1921598
According to SBR [https://cabforum.org/uploads/CA-Browser-Forum-SMIMEBR-1.0.6.pdf] section 4.9.1.2, the Issuing CA SHALL revoke a Subordinate CA Certificate within seven (7) days.
This has not been completed. A full incident report will be provided no later than Friday October 11th 2024.
Comment 1•10 months ago (Assignee)
Incident Report
Summary
KIR issued the SZAFIR Trusted CA3 Intermediate CA certificate without the Reserved Certificate Policy Identifiers that indicate adherence to and compliance with the S/MIME BR, as described in https://bugzilla.mozilla.org/show_bug.cgi?id=1921598
According to SBR [https://cabforum.org/uploads/CA-Browser-Forum-SMIMEBR-1.0.6.pdf] section 4.9.1.2, the Issuing CA SHALL revoke a Subordinate CA Certificate within seven (7) days.
Impact
1 Intermediate CA certificate, issued on Oct 11, 2023 – 10:49 UTC.
https://crt.sh/?caid=278655
Timeline
Sep 28, 2024 – 09:36 UTC – Incident report https://bugzilla.mozilla.org/show_bug.cgi?id=1921598 was posted: KIR: Intermediate CA - SZAFIR Trusted CA3 - Certificate Policies extension - non-compliance
Sep 28, 2024 – 10:00 UTC – First inspection, assessment and forwarding of the information.
Sep 30, 2024 – 11:00 UTC – Remediation plan was initialized.
Oct 02, 2024 – 11:00 UTC – End-user certificates from the affected subCA were grouped by system and usage.
Oct 03, 2024 – 11:00 UTC – Management as well as personnel responsible for the affected certificates were informed about the potential severity of the problem.
Oct 03, 2024 – 16:12 UTC – Preliminary report was posted (KIR: Delayed revocation within seven (7) days for bug 1921598), indicating that the full incident report would be provided no later than Friday October 11th 2024.
Oct 08, 2024 – 08:30 UTC – Issuance of new CA SZAFIR Trusted CA5 with new keys.
Oct 10, 2024 – 11:00 UTC – Remediation plan was communicated.
Oct 11, 2024 – 16:40 UTC – Post of this Incident Report.
Root Cause Analysis
First of all, this issue has been prioritized at the highest level within KIR.
The main root cause of the delayed revocation is that revoking the subCA within the required period would cause outages in critical infrastructure.
The root cause can be divided into 2 issues:
a) Issues with the new chain in back-end systems.
Banks, government entities and other institutions must go through a complex and lengthy process to distrust the affected subCA that should be revoked and trust the new subCA instead. Deploying a new subCA certificate in their back-end systems requires approval and coordinated effort from multiple internal departments, which makes the process slow. The back-end systems use their own limited truststores with custom implementations. In many cases third-party entities have to be involved to plan and handle such a change. Many entities also have maintenance windows for deploying changes that are announced well in advance, and they usually do not plan to deploy any changes at the end of the year.
b) Issues with subscribers' certificates
Regardless of the issues with the new chain in back-end truststores, end-user certificates also have to be replaced before the new subCA can be used.
Most subscribers are in critical industries where replacement is usually manual, especially on devices (mostly HSMs) without automation. Because of the risk of error, this requires more time and change control. A rapid replacement could cause an outage that directly impacts the Subscriber's end-users and could introduce security issues, especially in some of the industries where these certificates are installed. Some other installations require a restart, which even automated deployments cannot perform outside their change management window.
8355 end-user certificates are affected by the potential subCA revocation.
End-user certificates issued from the affected subCA are used in the following systems:
Secure data exchange financial system 84,5%
Clearing settlement system 3,5%
Electronic identity system 1,0%
Others 11,0%
Our user migration plan:
until 2024-11-01: 10%
until 2024-12-01: 40%
until 2025-01-01: 65%
until 2025-02-01: 87%
until 2025-03-07: 100%
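As a rough illustration, the percentages above can be turned into approximate certificate counts. This is only a sketch: the total, the usage shares and the milestone percentages are taken from the report, while the rounding and the variable and function names are assumptions made for the example. The milestone percentages are read here as cumulative shares of migrated certificates.

```python
from datetime import date

TOTAL_AFFECTED = 8355  # end-user certificates, from the report

# Share of affected certificates per system, in percent (from the report).
USAGE_SHARE = {
    "Secure data exchange financial system": 84.5,
    "Clearing settlement system": 3.5,
    "Electronic identity system": 1.0,
    "Others": 11.0,
}

# Migration targets in percent (from the report), read as cumulative.
MIGRATION_PLAN = {
    date(2024, 11, 1): 10,
    date(2024, 12, 1): 40,
    date(2025, 1, 1): 65,
    date(2025, 2, 1): 87,
    date(2025, 3, 7): 100,
}


def approx_count(percent: float, total: int = TOTAL_AFFECTED) -> int:
    """Round a percentage of the total to a whole number of certificates."""
    return round(total * percent / 100)


usage_counts = {name: approx_count(p) for name, p in USAGE_SHARE.items()}
plan_counts = {day: approx_count(p) for day, p in MIGRATION_PLAN.items()}

for name, count in usage_counts.items():
    print(f"{name}: ~{count} certificates")
for day, count in plan_counts.items():
    remaining = TOTAL_AFFECTED - count
    print(f"until {day.isoformat()}: ~{count} migrated, ~{remaining} remaining")
```

The counts are approximate because the report gives percentages rounded to one decimal place.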
Lessons Learned
What went well
On October 08, 2024 the new CA "SZAFIR Trusted CA5" was put into operation.
The remediation plan has been accepted and communicated.
Active user migration started on October 10, 2024.
What didn't go well
Where we got lucky
Action Items
Action Item | Kind | Due Date |
---|---|---|
SZAFIR Trusted CA5 was put into operation | mitigate | 2024-10-08 (completed) |
Remediation plan was already accepted and communicated | mitigate | 2024-10-10 (completed) |
We will educate subscribers about, and actively support their migration from, custom truststore implementations in their back-end systems to more flexible solutions that allow simple and secure certificate management, or to public truststores | prevent | 2024-12-30 |
We will educate subscribers, improve their understanding of immediate revocation requirements, and help them prepare for a swift certificate replacement process, so that certificates can be replaced within the revocation deadline when needed. | prevent | 2024-12-30 |
For situations that require special treatment, subscribers will also be advised to move off publicly trusted certificates or consider using a private PKI, and to prepare other contingency plans for enforced certificate revocation to minimize disruption to their systems | prevent | 2024-12-30 |
Although the incident does not directly affect TLS/SSL end-user certificates, we will also consider providing the ARI extension in our ACME implementation | prevent | 2025-03-01 |
With these actions implemented, we are confident that we will be able to meet the BR requirements for timely revocation going forward.
Appendix
Details of affected certificates
Based on Incident Reporting Template v. 2.0
Comment 2•10 months ago
From 28th September 2024 to 7th March 2025 I count 161 days. This is slightly longer than the 7 days required.
The action items are to give yourselves 3 months to provide some information to customers. None of those ensure that this will not happen again. On the contrary reading your report on the RCA it seems like pinning these intermediates is by design and no one is in a rush to do anything at all.
What aspect of your operations requires the use of public trust stores at this point? I can see no attempt to entertain any adherence to compliance within reasonable timescales.
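The day count above can be checked with a few lines of date arithmetic. This is a minimal sketch: the two dates come from the timeline and the migration plan, the seven-day limit from SBR section 4.9.1.2, and starting the clock on the day bug 1921598 was posted is an assumption made for the example. The difference between the two dates is 160 days, so 161 corresponds to counting both the first and the last day.

```python
from datetime import date, timedelta

REPORTED = date(2024, 9, 28)           # bug 1921598 posted (timeline)
PLANNED_COMPLETION = date(2025, 3, 7)  # 100% milestone in the migration plan
REVOCATION_LIMIT = timedelta(days=7)   # SBR section 4.9.1.2

elapsed = PLANNED_COMPLETION - REPORTED
inclusive_days = elapsed.days + 1       # counts both endpoints

# Assumption: the seven-day clock starts on the day the report was posted.
deadline = REPORTED + REVOCATION_LIMIT
days_past_deadline = (PLANNED_COMPLETION - deadline).days

print(f"elapsed: {elapsed.days} days ({inclusive_days} counting both endpoints)")
print(f"seven-day deadline: {deadline.isoformat()}")
print(f"planned completion is {days_past_deadline} days past that deadline")
```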
Comment 3•10 months ago
(In reply to Piotr Grabowski from comment #1)
8355 end-user certificates are affected by the potential subCA revocation.
End-user certificates issued from affected subCA are used in following systems
Secure data exchange financial system 84,5%
Clearing settlement system 3,5%
Electronic identity system 1,0%
Others 11,0%
Could you give some specific examples of how these certificates are used in secure data exchange financial systems? You said that there would be outages, so could you also provide some examples of how these systems interact with your OCSP service and published CRL, and provide details about the behaviour of these systems in that scenario?
Comment 4•10 months ago (Assignee)
(In reply to Wayne from comment #2)
From 28th September 2024 to 7th March 2025 I count 161 days. This is slightly longer than the 7 days required.
The action items are to give yourselves 3 months to provide some information to customers. None of those ensure that this will not happen again. On the contrary reading your report on the RCA it seems like pinning these intermediates is by design and no one is in a rush to do anything at all.
What aspect of your operations requires the use of public trust stores at this point? I can see no attempt to entertain any adherence to compliance within reasonable timescales.
Dear Wayne,
Thank you for raising your concerns. We acknowledge the severity of the situation, particularly regarding the timeline and the perception of our actions. As noted in the incident report, management and personnel were informed about the gravity of the problem early on (Oct 03, 2024 – 11:00 UTC – Management as well as personnel responsible for the affected certificates were informed about the potential severity of the problem.) Since then, KIR has been actively collaborating with subscribers, their management, and third-party stakeholders to expedite certificate replacement while balancing the criticality of the affected systems.
Regarding the pinning of intermediate certificates: during the first stages of collaboration we found that in most subscriber back-end systems this is indeed by design, to ensure security and stability. However, KIR has already started to actively support subscribers and third parties in finding a more agile and efficient approach to managing root and intermediate certificates in their applications.
We support them in choosing the best approaches for redesign, for development in several technologies and programming languages, and for further maintenance. We are fully committed to the unpinning process, as indicated by our action items.
To address your question about the necessity of public trust stores, KIR subscribers usually operate in highly secure environments where trust store implementations are limited and heavily focused on security. So far this approach has ensured that only specific, trusted certificates are accepted, but it has also made the unpinning process more complex. Nonetheless, we are working closely with subscribers to navigate this challenge.
Lastly, regarding the compliance timelines, we understand that there appears to be a delay in addressing this issue. However, KIR has adopted this strategy in its remediation plan to ensure the proper balance between operational security and compliance requirements. We took into consideration complexity, approval processes, coordinated efforts, the third parties involved, maintenance windows and other important factors in order to handle the migration as quickly as possible.
We appreciate your feedback and assure you that every entity involved is working diligently and without undue delay to resolve this issue as quickly as possible. We hope to execute the revocation even earlier than the given estimate.
Comment 5•10 months ago (Assignee)
(In reply to Mathew Hodson from comment #3)
(In reply to Piotr Grabowski from comment #1)
8355 end-user certificates are affected by the potential subCA revocation.
End-user certificates issued from affected subCA are used in following systems
Secure data exchange financial system 84,5%
Clearing settlement system 3,5%
Electronic identity system 1,0%
Others 11,0%
Could you give some specific examples of how these certificates are used in secure data exchange financial systems? You said that there would be outages, so could you also provide some examples of how these systems interact with your OCSP service and published CRL, and provide details about the behaviour of these systems in that scenario?
Dear Mathew,
In these cases, the subscribers' certificates issued from the affected CA are mainly used for mutual authentication in a distributed financial system that ensures the secure exchange of data and information between banks and institutions authorized to obtain information about their customers. Authentication and authorization is a two-step process: at the frontend level, basic certificate data and its status (OCSP, CRL) are checked, and then at the backend level, full validation of the certificate as well as its issuer takes place.
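The two-step process described above can be sketched roughly as follows. This is a toy model under stated assumptions, not a description of the real systems: `Cert`, `fingerprint`, `frontend_check`, `backend_check` and `authenticate` are hypothetical names, the byte strings stand in for real DER encodings, and a set of revoked fingerprints stands in for OCSP/CRL lookups. The backend stage models a limited truststore that pins issuer certificates.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    der: bytes                       # placeholder for the DER encoding
    issuer: Optional["Cert"] = None  # None for the top of this toy chain


def fingerprint(cert: Cert) -> str:
    """SHA-256 fingerprint of the (placeholder) certificate encoding."""
    return hashlib.sha256(cert.der).hexdigest()


def frontend_check(cert: Cert, revoked: Set[str]) -> bool:
    """Stage 1: status check; a revoked issuer also invalidates the leaf."""
    current: Optional[Cert] = cert
    while current is not None:
        if fingerprint(current) in revoked:
            return False
        current = current.issuer
    return True


def backend_check(cert: Cert, pinned_issuers: Set[str]) -> bool:
    """Stage 2: the issuer must be in the limited, pinned truststore."""
    return cert.issuer is not None and fingerprint(cert.issuer) in pinned_issuers


def authenticate(cert: Cert, revoked: Set[str], pinned_issuers: Set[str]) -> bool:
    return frontend_check(cert, revoked) and backend_check(cert, pinned_issuers)


# Hypothetical scenario: CA3 is pinned, CA5 is the replacement subCA.
ca3 = Cert("SZAFIR Trusted CA3", b"placeholder-ca3")
ca5 = Cert("SZAFIR Trusted CA5", b"placeholder-ca5")
old_leaf = Cert("subscriber", b"placeholder-leaf-old", issuer=ca3)
new_leaf = Cert("subscriber", b"placeholder-leaf-new", issuer=ca5)

revoked: Set[str] = set()
pinned: Set[str] = {fingerprint(ca3)}

old_ok_before = authenticate(old_leaf, revoked, pinned)
new_ok_before_repin = authenticate(new_leaf, revoked, pinned)

revoked.add(fingerprint(ca3))            # subCA revoked before migration
old_ok_after_revocation = authenticate(old_leaf, revoked, pinned)
new_ok_after_revocation = authenticate(new_leaf, revoked, pinned)

pinned.add(fingerprint(ca5))             # back-end truststore updated
new_ok_after_repin = authenticate(new_leaf, revoked, pinned)
```

In this model, revoking the old subCA before both the truststore update and the end-user certificate replacement are done leaves neither certificate accepted, which matches the two issues named in the root cause analysis.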
Comment 6•10 months ago (Assignee)
We keep our migration plan on track and under control.
Comment 7•9 months ago (Assignee)
We keep our migration plan on track and under control.
Comment 8•9 months ago (Assignee)
We have migrated 50% of the affected end-user certificates.
Comment 9•8 months ago (Assignee)
We keep our migration plan on track and under control.
Comment 10•7 months ago (Assignee)
We keep our migration plan on track and under control.
Comment 11•6 months ago (Assignee)
We keep our migration plan on track and under control.
Comment 12•6 months ago (Assignee)
We keep our migration plan on track and under control.
Comment 13•5 months ago (Assignee)
The affected certificate https://crt.sh/?caid=278655 has been revoked.
Comment 14•5 months ago (Assignee)
Incident Report Closure Summary
Incident Description: KIR issued the SZAFIR Trusted CA3 Intermediate CA certificate without the Reserved Certificate Policy Identifiers that indicate adherence to and compliance with the S/MIME BR, as described in https://bugzilla.mozilla.org/show_bug.cgi?id=1921598
According to SBR [https://cabforum.org/uploads/CA-Browser-Forum-SMIMEBR-1.0.6.pdf] section 4.9.1.2, the Issuing CA SHALL revoke a Subordinate CA Certificate within seven (7) days. This was not completed within that period.
Incident Root Cause(s): The main root cause of the delayed revocation is that revoking the subCA within the required period would have caused outages in critical infrastructure. The root cause can be divided into 2 issues: a) Issues with the new chain in back-end systems b) Issues with subscribers' certificates
Remediation Description: SZAFIR Trusted CA5 was put into operation. The remediation plan has been executed. We educated subscribers, improved their understanding of immediate revocation requirements, and helped them prepare for a swift certificate replacement process, so that certificates can be replaced within the revocation deadline when needed. They were also advised to move off publicly trusted certificates or consider using a private PKI, and to prepare other contingency plans for enforced certificate revocation to minimize disruption to their systems. We also plan to provide the ARI extension for ACME.
Commitment Summary: All checks described above are in place.
All Action Items disclosed in this Incident Report have been completed as described, and we request its closure.
Comment 15•5 months ago
(In reply to Piotr Grabowski from comment #14)
Incident Root Cause(s): The main root cause of the delayed revocation is that revoking the subCA within the required period would have caused outages in critical infrastructure. The root cause can be divided into 2 issues: a) Issues with the new chain in back-end systems b) Issues with subscribers' certificates
Enormous word count on Bugzilla in the past year has been dedicated to explaining to public CAs that their failure to enforce BR-mandated revocation timelines is not the Subscriber’s failure but the CA’s. It’s disappointing at this late date to see a closing summary with no acknowledgement of the CA’s responsibility in this incident and no action items that directly address the decision-making process and criteria that led to this event.
KIR delayed revocation of more than 8000 certificates, some of them for as long as five months. You can’t simply blame this on your Subscribers. But that is what you have done. Looking at the Incident Report or the Closing Summary, I see nothing to indicate that KIR would not behave the exact same way if a similar misissuance event were to occur today.
I believe before this bug is closed, KIR should look at its policies and decision-making criteria. KIR should effect policy changes sufficient to prevent repetition of this massive and sweeping revocation delay, even in the event of misissuance of the same scope. KIR needs to acknowledge that it and it alone is responsible for delays in BR-mandated revocation. KIR needs to agree that there is no “exceptional circumstances” carve-out in the BRs or any root program and that delaying revocation for exceptional circumstances is never permitted.
Though it is not a hard requirement for closing a delrev bug, most CAs are choosing to establish policies not to deliberately delay revocation at all. Making this firm public commitment and establishing a strict internal policy matching it would go a long way in making this bug ready to close.
Comment 16•5 months ago (Assignee)
Thank you for your detailed feedback. We acknowledge the concerns raised regarding the responsibility of the CA in ensuring compliance with BR-mandated revocation timelines.
To clarify, we do not intend to shift responsibility to Subscribers, and we recognize that the delays in revocation were a failure on our part. The primary cause of the delay was the risk of significant outages due to infrastructure dependencies, which we should have anticipated and mitigated earlier. That being said, we understand that adherence to BR timelines must remain a priority regardless of operational challenges.
We take your feedback seriously and have already reviewed our internal policies and decision-making criteria to ensure that similar delays do not occur in the future. Specifically, this is what we did:
- Reevaluated our incident response processes to ensure that future revocation events are handled in strict compliance with BRs, without undue delays.
- Strengthened internal policies to explicitly state that revocation delays for exceptional circumstances are not permitted, in alignment with BRs and root program requirements.
We recognize that a firm commitment to immediate revocation is necessary to meet industry expectations and that is why KIR has already established strict policies to ensure that revocation is never deliberately delayed. Furthermore, we have developed and implemented multiple tools and procedures to significantly accelerate the revocation process and prevent similar delays in the future.
Comment 17•5 months ago
(In reply to Piotr Grabowski from comment #4)
To address your question about the necessity of public trust stores, KIR subscribers usually operate in highly secure environments where trust store implementations are limited and heavily focused on security. So far this approach has ensured that only specific, trusted certificates are accepted, but it has also made the unpinning process more complex. Nonetheless, we are working closely with subscribers to navigate this challenge.
From your description of limited trust store implementations, and in other comments you even mention custom implementations, it is not clear why this needs to rely on public trust at all. Of the specified categories, 89% of affected certificates, none appear to be something meant for public consumption.
Comment 18•5 months ago (Assignee)
(In reply to Zacharias from comment #17)
(In reply to Piotr Grabowski from comment #4)
To address your question about the necessity of public trust stores, KIR subscribers usually operate in highly secure environments where trust store implementations are limited and heavily focused on security. So far this approach has ensured that only specific, trusted certificates are accepted, but it has also made the unpinning process more complex. Nonetheless, we are working closely with subscribers to navigate this challenge.
From your description of limited trust store implementations, and in other comments you even mention custom implementations, it is not clear why this needs to rely on public trust at all. Of the specified categories, 89% of affected certificates, none appear to be something meant for public consumption.
Thank you for your thoughtful question, Zacharias.
The requirement for a specific type of certificate is dictated by the system design, which is beyond KIR’s control. The system enforces strict security measures that mandate the use of predefined trust models, ensuring compatibility and compliance with established security policies. While we understand the concerns regarding reliance on public trust stores, these design decisions were made to balance security and operational requirements.
The classification and impact of these certificates can be quite complex. While 89% of the affected certificates fall within specified categories and do not appear to be intended for public consumption, the systems governing them are intricate. Understanding the full impact requires a deep dive into the underlying infrastructure and usage of these certificates within the broader ecosystem. That said, we worked very closely with our subscribers to address any challenges arising from these constraints, and we also advised them to move off publicly trusted certificates or consider using a private PKI, and to prepare other contingency plans for enforced certificate revocation to minimize disruption to their systems.
Comment 19•4 months ago
This is a final call for comments or questions on this Incident Report.
Otherwise, this bug will be closed on approximately 2025-05-08.