Closed Bug 1861682 Opened 1 year ago Closed 1 year ago

SwissSign: EV delayed revocation

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: roman.fischer, Assigned: roman.fischer)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Attachments

(1 file)

Incident Report

Summary

This bug is related to Bugzilla 1860750 (https://bugzilla.mozilla.org/show_bug.cgi?id=1860750) and reports the delayed revocation of certificates covered by that bug.
Some customers asked us to delay revocation of certain mis-issued certificates reported in that bug beyond the 5-day period mandated by section 4.9.1.1 of the CA/B Forum Baseline Requirements.
Normally we would not consider such requests and revoke after 5 days.

However, three factors specific to this case had to be considered:
- The mis-issuance was purely compliance related and had no security impact, neither on the end users trusting the mis-issued certificates nor on the customers running the services that use them.
- Some affected customers are part of critical infrastructure: one hospital, one social-security and insurance company, one system-relevant bank, and one cloud-service provider that provides services to a bank and to suppliers for power plants in Austria.
- Some affected customers use certificate pinning in their mobile apps (iOS and Android).

The customers were able to explain to us that they do have emergency processes in place that would allow them to replace the certificates within 5 days. But these emergency processes skip some testing, QA, validation… steps and thus have inherent risks.

By comparing the risk of the mis-issued certificates versus the risks of invoking the emergency processes, we came to the conclusion that it makes more sense to grant a delayed revocation in this specific instance than to take the risks inherent in the emergency processes.

Impact

Delaying revocation beyond the mandated 5 days is a violation of the CA/B Forum Baseline Requirements and must be reported as a separate incident, which we do with this Bugzilla.
We do not expect any other impact on the ecosystem by this delay.

Timeline

2023-10-24 to 2023-10-27 Collection of customer requests and arguments; risk analysis

2023-10-27
08:15 UTC Audit body informed
13:00 UTC Compliance team decided each individual customer case based on a risk assessment
14:15 UTC Posting of this Bugzilla

Root Cause Analysis

Customer A: Hospital

Hospital with 7 affected EV certificates used in systems involved in patient information, lab result submission, medical image transmission and communication with doctors, staff and patients.
Delayed revocation by 03.11.2023 23:59
Certificate serial numbers:
42EF7BF359B02A42706E5E2C40E276E40F7BA812
18A4961601F80A445193FA328107FFA674B6D5D0
46B3735602E5A856FE69869DCA58BB2D5A77C88D
76068A4461457E2DF2E33A99032859991A603753
1D676C87B779E178F4B642C2DC465727675588F8
1678209186902A45F71A4CFF940ED029B3E351B7
3EB6372BD64AA36DB9EE6AF6F507AA547012D91B

Customer B: Social security and insurance provider

Social security and insurance provider with 1 affected EV certificate pinned in an app that is used to submit accident reports to doctors.
Delayed revocation by 10.11.2023 23:59
Certificate serial number:
1EC9A835D6206546EB3941FC94C739D17F5F0B56

Customer C: System-relevant bank

Bank classified by Swiss Government as system-relevant with 1 affected EV certificate used in securing e-banking and interbank communication.
Delayed revocation by 10.11.2023 23:59
Certificate serial number:
495CB02ABF5615C750C9C7E1D560E1A876C589CD

Customer D: Cloud provider for critical infrastructure

A cloud-service provider that provides services to a bank and suppliers for power-plants in Austria using 5 EV certificates.
Delayed revocation by 03.11.2023 23:59
Certificate serial numbers:
366A0D6A170A032F089765DE5531D0B3707E7CCE
51865A482C32017C5B29D2586AE022072F740BB8
593AADB29A0E95C1294A43BABBEBA6DC8F377BA3
67E04E668A89463153BE60F5EF460CADB453E10C
656F0CDC80509900A21B9B683429B89DB6D59AFD

Customers E-K: Banks with certificate pinning in mobile apps

These customers are banks whose mobile banking apps use certificate pinning for additional security. Besides the risks described above of emergency updates that add the replacement EV certificates to the apps' list of pinned certificates, some time must be allowed for end users to actually update the apps on their devices.
Delayed revocation depending on each customer's committed timeline:

Delayed revocation by 03.11.2023 23:59:
7A755A903274AE59DADBA8E1D01FC480577450FE
1D32587EDB82836667DCECDAC3A82981AD6370B4
342A900F6DF242046B393AD15EECBBEC8F356B52

6758736A7702646E246B3E1F63FDCF79EC27CE7D
4ABAFF23B0D39FE9289D45BF403B8B9EDAD50793

12128A433E30D4242B828B2842AD2031335172D7

1CE9D60B80B01E37BEF2EFCA2161B6236DE3A97A
29CBBD0C164E7612A9E6733E27CCA1CB05A92492
21A1F92FAF9BB9C39D46E1E4A12A7ADF4B3EEF7A
268E50B9B91C41F59DB79185B83B3B77B39A39C3
7C958E5C9527F1EC5063CAAB8126197CBA9E48CC

6FDE7650E5520634617EC91415DB90113792BE1A
34A0799962AAF9DF1EA6CE9957C64EB2AF7DAB2C
6CF24D41F67496E4410763822FCE68C59DB1CEBE
3B1B02D582989AD96E68D7B1F64A6660C2083F39
3490E513E529DE3FC9FE0E19D761F8F31BB256BB
48963B684A667C6AF0EE9B3655287FED0A63C119
0295F085D0A19513705897D7C14AA900D730EEF9
6BAFB05F6C1A8C0FE6506815BFF4654A00814CD5
6E89EE7D8D048F783886D765AD4823D532EB1CE5
3571DA5CD01B13EA885A805BF7CB9D9A54FBEAA4

Delayed revocation by 10.11.2023 23:59:
78A96E0C18A4F3CFD9B4E7BAD3627C63BB8D061E

Delayed revocation by 16.11.2023 23:59:
7C509C7F33A496A42A08D87819B0A0DA2A2A8324
54E0AB24AF10C2EE522A1139D4B2125A6D03CFCF
769B3D69418B62DAA30B744058074A4F08A34CA9
375D597A818DBA2FBF83AF02AE3FD96A8C27A5A5

Lessons Learned

What went well

  • Almost all customers reacted very quickly to the mis-issuance notification and were able to replace the affected certificates within the 5-day timeframe
  • The customers who requested delayed revocation were also very responsive and reduced the number of certificates to be revoked later to a very small set

What didn't go well

  • A lot of customers don't / can't use automation such as ACME due to system isolation or other architectural design decisions
  • Some of our revocation request emails landed in the spam folders of customers

Where we got lucky

  • Experience from this incident showed that a revocation within 24 hours in the case of a really bad security incident would break payment systems for most of Switzerland. This is a very high risk that needs to be addressed together with our customers. And we are sure that other CAs and their customers face similar challenges!

Action Items

Action Item | Kind | Due Date
Revocation of remaining mis-issued certificates according to the deadlines given above | reactive | 16.11.2023
Weekly updates on revoked certificates | reactive | 03.11.2023

Appendix

Details of affected certificates

See attached CSV.

Assignee: nobody → roman.fischer
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2023-11-16
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2023-11-16 → [ca-compliance] [leaf-revocation-delay] Next update 2023-11-03

Thank you for the report. A few initial comments:

Request for update #1: The Impact section should be improved by including the total count of certificates impacted (e.g., “The Impact section should contain a short description of the size and nature of the incident.” as detailed on the CCADB incident report page).

Request for update #2: The Timeline section should be updated to include the separate incident that determined the need for revocation, which was 1860750 on October 23, 2023. All of the events detailed on the CCADB incident report page need to be included in the Timeline.

Request for update #3: What appears to be at least one of the contributing factors to the root cause of the incident is stated in the Summary section of this report, but it is not reflected in the Lessons Learned or Action Items sections:

The customers were able to explain to us that they do have emergency processes in place that would allow them to replace the certificates within 5 days. But these emergency processes skip some testing, QA, validation… steps and thus have inherent risks.

Request for update #4: The Action Items list should be updated to minimally include a Prevent, Mitigate, or Detect type of action for each thing under "What didn't go well" and should include an action to address "Where we got lucky" (so that one does not have to rely on luck in the future). Ideally, at least one of each of these three types is included in the list of action items. This guidance is reflected on the CCADB incident report page.

Question: Has SwissSign explored other options that better position it to 1) adhere to its policy commitments, and 2) avoid causing undue harm to its customers? If so, what are those options and can they become action items? For example, are there additional steps that SwissSign can take to eliminate the opportunity for the mis-issuance that necessitated the revocation described in 1860750 - like adopting a linting framework - or creating custom lints to support SwissSign customer use cases?

Dear Chris,

Please excuse our oversight. Below are the updated sections that lacked content / details (changes in italic for clarity).

Re "Request for update #1":

Impact (updated)

Delaying revocation beyond the mandated 5 days is a violation of the CA/B Forum Baseline Requirements and must be reported as a separate incident, which we do with this Bugzilla.
We decided to delay the revocation of 40 EV certificates. The certificates are listed in the attached CSV and discussed in the Root Cause section.
We do not expect any other impact on the ecosystem by this delay.
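
The total of 40 can be cross-checked against the serial number lists in the Root Cause Analysis section. The following is a minimal Python sketch of that tally. The per-group counts were obtained by counting the serial numbers listed in this report (not taken from the attached CSV), and the tuple layout and variable names are purely illustrative.

```python
from collections import Counter

# (customer group, revocation deadline, number of listed serial numbers)
# Counts transcribed by hand from the lists in this report.
delayed = [
    ("A", "2023-11-03", 7),
    ("B", "2023-11-10", 1),
    ("C", "2023-11-10", 1),
    ("D", "2023-11-03", 5),
    ("E-K", "2023-11-03", 21),
    ("E-K", "2023-11-10", 1),
    ("E-K", "2023-11-16", 4),
]

# Sum the counts per deadline.
per_deadline = Counter()
for _group, deadline, count in delayed:
    per_deadline[deadline] += count

total = sum(per_deadline.values())

for deadline in sorted(per_deadline):
    print(deadline, per_deadline[deadline])
print("total", total)
```

Grouping by deadline rather than by customer makes it easy to compare the tally with the numbers reported in each weekly revocation update.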

Re "Request for update #2":

Timeline (updated)

2023-10-23
05:30 UTC Compliance incident raised (led to Bugzilla 1860750)
2023-10-24 to 2023-10-27 Collection of customer requests and arguments; risk analysis
2023-10-27
08:15 UTC Audit body informed
13:00 UTC Compliance team decided each individual customer case based on a risk assessment
14:15 UTC Posting of this Bugzilla
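
For orientation, the 5-day window can be computed from the first timeline entry. The sketch below is illustrative only and rests on two assumptions that the report does not state: that the 5-day clock started when the compliance incident was raised, and that the granted deadlines (given without a time zone) are in UTC. All variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 5-day clock starts when the compliance incident was raised.
incident_raised = datetime(2023, 10, 23, 5, 30, tzinfo=timezone.utc)
revocation_window = timedelta(days=5)
mandated_deadline = incident_raised + revocation_window

# Latest granted deadline in this report (16.11.2023 23:59).
# Assumption: UTC, since the report gives no time zone for the deadlines.
latest_granted = datetime(2023, 11, 16, 23, 59, tzinfo=timezone.utc)
overrun = latest_granted - mandated_deadline

print(f"Mandated deadline: {mandated_deadline:%Y-%m-%d %H:%M} UTC")
print(f"Longest overrun: {overrun.days} full days")
```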

Re "Request for update #3":
Thanks for this request. Looking at it again, we think that the quoted section is actually the effective root cause of this delayed revocation, and we suggest putting it in the Root Cause Analysis section rather than in Lessons Learned or Action Items. Do you agree?

Root Cause Analysis

Customers who asked for delayed revocation were able to explain to us that they do have emergency processes in place that would allow them to replace the certificates within 5 days. But these emergency processes skip some testing, QA, validation… steps and thus have inherent risks.

By comparing the risk of the mis-issued certificates versus the risks of invoking the emergency processes, we came to the conclusion that it makes more sense to grant a delayed revocation in this specific instance than to take the risks inherent in the emergency processes.
(Rest of chapter remains as posted above.)

Re "Request for update #4":
We struggle to find any possible actions on our side to address the problem of customers not being able to automate certificate deployment any further to avoid having to activate risky emergency processes in the future. These customers are themselves under stringent regulatory requirements (e.g. Healthcare, Banking, Critical Infrastructure) that can limit the architectures / processes they can implement and influence their risk assessments when it comes to certificate usage. If there are any ideas that we could forward to our customers, then we're more than willing to take these up.

Similarly, we currently don't see how we can help customers to ensure that they can replace pinned critical certificates in Mobile Apps, B2B applications and similar cases within 24h or even in 5 days as there are multiple parties involved (such as app stores). Any ideas that we can forward to our customers are highly appreciated.

Action Items (updated)

Action Item | Kind | Due Date
Revocation of remaining mis-issued certificates according to the deadlines given above | mitigate | 16.11.2023
Weekly updates on revoked certificates | mitigate | 03.11.2023
Analysis of why some revocation request emails landed in the spam folders of customers and mitigating measures if any improvement on our side is at all possible | mitigate | 16.11.2023

Re "Question":

Has SwissSign explored other options that better position it to 1) adhere to its policy commitments, and

We do not think that there are more options available. Either we revoke the certificate in the given timeline of 5 days or we grant extended timelines for each individual case as we did. But maybe we misunderstand your point?

2) avoid causing undue harm to its customers?

By taking the risks to our customers (and the ecosystem they serve) seriously and weighing them against the risks of delayed revocation, we believe we have avoided undue harm to them as far as possible.

are there additional steps that SwissSign can take to eliminate the opportunity for the mis-issuance that necessitated the revocation described in 1860750 - like adopting a linting framework - or creating custom lints to support SwissSign customer use cases?

Thank you for the input regarding a more stringent linting framework. Ryan made a similar point in Comment 6 of Bugzilla 1860750, to which we responded in Comment 7.

Again, we're more than happy for any other suggestions about how to avoid delayed revocations in the future.

Update 2023-11-06

The 33 certificates scheduled to be revoked by 03.11.2023 23:59 at the latest have been revoked.

Thank you for updating those sections of this incident report.

We struggle to find any possible actions on our side to address the problem of customers not being able to automate certificate deployment any further to avoid having to activate risky emergency processes in the future. These customers are themselves under stringent regulatory requirements (e.g. Healthcare, Banking, Critical Infrastructure) that can limit the architectures / processes they can implement and influence their risk assessments when it comes to certificate usage.

Automation is one approach to addressing the contributing factor of some Subscribers having existing processes related to certificate lifecycle management that “skip some testing, QA, validation”, but there could be others. Were other approaches or solutions considered and explored? For example:

  • investigating how those risky Subscriber emergency processes might be improved to include testing, so that their risk is reduced and SwissSign can uphold its policy commitments?
  • exploring and understanding the specific regulatory requirement(s) that hamper the Subscribers' certificate lifecycle agility?

Dear Chris,

Re Improvement of Emergency processes

We're considering communicating the lessons from this incident to our user base (via newsletter, blog post or a similar format). We might suggest that they put extra mitigations in place to avoid single points of failure.
For example, customers with certificate-pinning mobile apps could pin several certificates from separate vendors, using different crypto algorithms and expiration cycles, instead of depending on a single pinned certificate.
We'll also remind our customers again of the importance of being able to replace certificates within 24 hours or 5 days.
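
The idea of pinning several keys rather than one can be sketched as follows. This is a minimal Python illustration, not a description of any customer's app: the byte strings are placeholders standing in for real DER-encoded public keys, and `spki_pin`, `is_trusted` and `PIN_SET` are hypothetical names. It assumes pins are stored as base64-encoded SHA-256 digests of the public key, which is one common convention.

```python
import base64
import hashlib


def spki_pin(spki_der: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of a DER-encoded public key."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def is_trusted(spki_der: bytes, pin_set: frozenset) -> bool:
    """Accept the key if its pin matches any entry in the pin set."""
    return spki_pin(spki_der) in pin_set


# Placeholder bytes standing in for real DER-encoded public keys.
primary_key = b"placeholder-primary-key-from-vendor-1"
backup_key = b"placeholder-backup-key-from-vendor-2"
unpinned_key = b"placeholder-key-that-was-never-pinned"

# The app ships with both pins, so the primary certificate can be revoked
# and replaced by the backup without an emergency app update.
PIN_SET = frozenset({spki_pin(primary_key), spki_pin(backup_key)})

print(is_trusted(primary_key, PIN_SET))
print(is_trusted(backup_key, PIN_SET))
print(is_trusted(unpinned_key, PIN_SET))
```

With a backup pin already deployed, revoking the primary certificate does not depend on app store review times or on end users updating their apps.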

Re Understanding customer regulatory requirements

We did indeed look at some of the regulatory requirements and technical constraints the customers cited (e.g. Swiss banking regulation, minimum duration of mobile app updates). We didn't find any points that our customers hadn't already considered.

But to restate the (for us) obvious: Our first priority is to revoke within the required timeframe, placing the security of the ecosystem above our or our customers' profits or resource restrictions.

Update 2023-11-13

The 3 certificates scheduled to be revoked by 10.11.2023 23:59 at the latest have been revoked.

Update 2023-11-17

The 4 certificates scheduled to be revoked by 16.11.2023 23:59 at the latest have been revoked.
With this, all certificates affected by Bugzilla 1860750 are now revoked.

Unless there are further questions, we request that this Bugzilla be closed.

I'll close this tomorrow - 1-Dec-2023.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2023-11-03 → [ca-compliance] [leaf-revocation-delay]
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED