Bug 1722089 (Open)

SSL.com: Issuance of 3 EV TLS certificates without 2-person validation of the organization information

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: support, Assigned: support, NeedInfo)

Details

(Whiteboard: [ca-compliance])

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36

Steps to reproduce:

This is a preliminary incident report. Our investigation into this matter is ongoing.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

The issue was discovered by our validation team while performing a routine check of an EV TLS order.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

  • 2021-07-10T02:34:38-00:00 EV TLS order reviewed and approved by one of our Validation Specialists.
  • 2021-07-12T03:38:36-00:00 Customer care requests an update on the above-mentioned order.
  • 2021-07-12T09:15:41-00:00 The issuance of an EV TLS certificate with serial number 776752935acf6697078d9cb5547921a6 is triggered with the invocation of an API call by one of our resellers.
  • 2021-07-13T16:05:00-00:00 In the process of performing 2p approval, one of our senior Validation Specialists determines that the certificate was issued without stored evidence of 2p approval in the system. The issue is reported by registering an internal ticket.
  • 2021-07-13T16:37:00-00:00 The ticket is processed with high priority by a senior software engineer. A bug is discovered in our API and a hotfix is immediately deployed. Investigation continues to determine the population of the possibly affected certificates.
  • 2021-07-14T20:22:00-00:00 The population of possibly affected certificates is identified as eleven (11) EV TLS certificates.
  • 2021-07-15T15:00:00-00:00 The issue is picked up by our Security Auditing department and investigation begins.
  • 2021-07-16T00:46:58-00:00 Additional technical information is shared by the software engineers.
  • 2021-07-16T02:07:05-00:00 Audit logs of the approvals of the abovementioned population are presented. Security Auditors initiate a detailed review to determine whether and for which exact certificates there is a lack of 2p approval evidence.
  • 2021-07-19T19:31:00-00:00 After detailed review of the above-mentioned population by our Security Auditors and discussions with the validation team, two (2) additional problematic certificates are found. Subsequent review confirms that validation evidence supported issuance for all three (3) affected certificates; however, immediate revocation is requested due to the lack of 2p approval at the time of issuance.
  • 2021-07-19T21:56:00-00:00 All three (3) affected certificates revoked.
  • 2021-07-20T11:00:00-00:00 Security Auditors start gathering all the information and compiling a preliminary incident report.
  • 2021-07-23: Filed initial Bugzilla report.

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

A hotfix was deployed and tested immediately after the issue was detected. No similar issuance is currently possible.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

Three (3) EV TLS certificates, issued between 2021-05-18 and 2021-07-12.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

Impacted certificates (and their corresponding pre-certificates):

S/N: 4f3ff4dc563aa9c46f18254998643ba6 (https://crt.sh/?id=4549278351)
S/N: 45ba1526442ba72cc562493d99bcf757 (https://crt.sh/?id=4834857365)
S/N: 776752935acf6697078d9cb5547921a6 (https://crt.sh/?id=4851729123)

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The bug was introduced when the 2-person approval was migrated from the CA software to the RA Portal on 2021-04-13. The purpose of this change was to allow full processing of validations in the RA Portal by Validation Specialists.

A faulty IF check in the RA API code allowed issuance of EV TLS certificates with single-person approval. The bug did not affect EV Code Signing or any certificate type other than EV TLS. This issue also did not affect other validations performed via our RA Portal; it was limited to our API, which is currently used only by specific resellers for the issuance of a limited number of certificates.

The bug passed undetected to the production systems despite the fact that the change was reviewed both within the development department (code review) and by the compliance department (change review). The issue was also not detected during our quarterly certificate reviews due to the low number of affected certificates.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Immediate actions are described in steps 2-4 of this report and include the emergency deployment of the code fix to prevent further problematic issuances, the involvement of our internal auditing department, the thorough review of the possibly affected certificates and the revocation of the affected certificates after confirming the issue.

Initial review focuses on:

  • why the bug was introduced in the first place and in particular why it affected only EV TLS certificates but not EV Code Signing, which follow the same 2p approval principle;
  • how the issue was not detected during the code or compliance review, before reaching production.

Our investigation is ongoing, and analysis is being conducted to reveal any underlying weaknesses and, based on the results, decide on any additional measures and improvements to our systems and processes, so that such occurrences are not repeated in the future.

A full incident report shall be filed here when our investigation is complete. In the meantime, we will post regular updates.

Assignee: bwilson → support
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance]

This is an update to report current progress on this issue.

Our analysis is ongoing. Our plan is to complete our initial review within the next week and update the ticket at that time.

Chris: When you provide your further update, can you also help explain the delta between the initial 11 suspect certificates and the later determination that only 3 certificates were affected? There's nothing wrong with collecting too many certs and then reviewing to drill down, but I think sharing a bit more detail about how the 11 were determined, how only 3 were affected in practice, and what set the other 8 certificates apart would be useful for transparency here.

Flags: needinfo?(support)

This is an update to report our current progress on this issue.

Our initial review has focused on answering the following questions:

Q: Why was this bug introduced, and in particular, why did it affect only EV TLS certificates but not EV Code Signing, which follow the same 2p approval principle?

A: Upon further investigation, the mis-issuance resulted from a flawed certificate down-stepping process, which only occurs in EV TLS products and involves more complicated logic. In particular, after a certificate requester for an EV TLS certificate completes the DV process, a DV TLS certificate may be issued. A flawed condition statement in the API code allowed the certificate request to issue an EV certificate instead of the DV counterpart. (EV CS issuance was not affected, as this down-stepping mechanism does not apply to that product type.)

Q: How was the issue not detected during the code or compliance review before reaching production?

A: Our investigation confirms that the change passed through both code review and compliance review; however, neither of these reviews revealed the issue.

  • The former included a documented code review by a separate developer. The review did include the code in question; however, it failed to reveal the bug in the conditional logic.
  • The latter included a detailed review of the staging system's behavior after the change, against our CPS and the applicable requirements; however, it did not include API testing, because API usage was not considered a separate case.

Our review is ongoing and we shall continue to post regular updates here.

Regarding the population: this API bug was introduced when 2-person approval was migrated from the CA software to the RA Portal on 2021-04-13 and was fixed on 2021-07-13 (see our timeline above). Within this period, a total of twenty-four (24) EV TLS certificates were issued via this API. Eleven (11) certificates were deemed suspicious, due to the fact that our initial investigation did not reveal 2p approval records for these. Per our IMP, this required manual review of all suspicious certificates (as our target population), through which we confirmed 2-person approval records for eight (8) of them, leaving the remaining three (3) certificates to be treated as mis-issued.
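The reconciliation described above is plain set arithmetic. A minimal sketch, assuming each certificate can be matched by an identifier against (1) the 2p records found by the standard search and (2) the 2p records found only through manual review. The function name and the `cert-NN` identifiers are hypothetical placeholders; only the counts (24 issued, 11 suspicious, 8 confirmed, 3 mis-issued) come from this comment.

```python
def reconcile(issued, indexed_2p, manually_confirmed_2p):
    """Split issued certificates into suspicious and mis-issued sets.

    suspicious: issued certificates with no 2p record found by the standard search.
    mis_issued: suspicious certificates for which manual review also found no 2p record.
    """
    suspicious = issued - indexed_2p
    mis_issued = suspicious - manually_confirmed_2p
    return suspicious, mis_issued


# Hypothetical placeholder identifiers; only the counts come from the report.
issued = {f"cert-{i:02d}" for i in range(24)}                     # 24 EV TLS certs via the API
indexed_2p = {f"cert-{i:02d}" for i in range(13)}                 # 24 - 11 found by standard search
manually_confirmed_2p = {f"cert-{i:02d}" for i in range(13, 21)}  # 8 found by manual review

suspicious, mis_issued = reconcile(issued, indexed_2p, manually_confirmed_2p)
print(len(suspicious), len(mis_issued))  # 11 3
```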

Flags: needinfo?(support)

Eleven (11) certificates were deemed suspicious, due to the fact that our initial investigation did not reveal 2p approval records for these. Per our IMP, this required manual review of all suspicious certificates (as our target population), through which we confirmed 2-person approval records for eight (8) of them

I guess I'm trying to understand: Why were 2P approval records hard to find in the initial corpus?

Yes, this is ultimately a question about how data is recorded, and about understanding how quickly a CA can answer questions about the BR status of its issuance. This isn't so much a critique of how you (ultimately) reconciled this as a recognition that such disparate systems are just as likely to end up undercounting as overcounting.

Flags: needinfo?(support)

This is an update to report current progress on this issue.

We have determined that one user account was misusing this API (designed for another, non-TLS product type) to submit TLS requests. They have been warned not to do this again, and to only use documented API endpoints and procedures for their intended purposes going forward. Our investigation has found no other similar misuse of this API.

Regarding our record retrieval: all issuing API endpoints call the function that allowed this TLS mis-issuance. Requests sent via APIs intended for TLS issuance result in capture and organization of audit information for ready review.

However, this API was designed for another (non-TLS) product type and was not authorized for use with TLS products. Although the capture of audit information is standardized, the use of the non-TLS API thus didn't allow for proper organizing of audit information for our quick, standard TLS reporting. Manual review was required to find records of TLS issuance requests made via this particular API.
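One way to picture why manual review was needed: if the standard TLS report selects audit records by the endpoint a request arrived on, a TLS request sent through a non-TLS endpoint is captured but never selected. A minimal sketch under that assumption; the record fields, endpoint names and report functions are hypothetical, since the comment above does not describe the actual audit schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    serial: str
    endpoint: str  # hypothetical name of the API endpoint the request arrived on
    product: str   # product type of the certificate actually issued


# Hypothetical mapping from endpoint to the product family it was designed for.
ENDPOINT_FAMILY = {"tls_orders": "TLS", "non_tls_orders": "NON_TLS"}
TLS_PRODUCTS = {"DV TLS", "OV TLS", "EV TLS"}


def tls_report_by_endpoint(records):
    """Select by the endpoint's intended family: misses TLS sent via non-TLS endpoints."""
    return [r.serial for r in records if ENDPOINT_FAMILY[r.endpoint] == "TLS"]


def tls_report_by_product(records):
    """Select by what was actually issued, regardless of the endpoint used."""
    return [r.serial for r in records if r.product in TLS_PRODUCTS]


records = [
    AuditRecord("a", "tls_orders", "EV TLS"),
    AuditRecord("b", "non_tls_orders", "EV TLS"),  # misuse: TLS via a non-TLS endpoint
    AuditRecord("c", "non_tls_orders", "EV Code Signing"),
]
print(tls_report_by_endpoint(records))  # ['a']
print(tls_report_by_product(records))   # ['a', 'b']
```

Both records were captured; only the selection key differs, which matches the statement that capture was standardized while organization for TLS reporting was not.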

We are taking this opportunity to review our auditing methods with an eye to strengthening our capture and organization of relevant information.

We also intend to review our code and compliance review methodology to improve our ability to anticipate and prevent similar unintended operations.

Our review is ongoing and we shall continue to post regular updates here.

Flags: needinfo?(support)

(In reply to Chris Kemmerer from comment #5)

We have determined that one user account was misusing this API (designed for another, non-TLS product type) to submit TLS requests. They have been warned not to do this again, and to only use documented API endpoints and procedures for their intended purposes going forward. Our investigation has found no other similar misuse of this API.

Comment #3 stated:

A flawed condition statement in the API code allowed the certificate request to issue an EV certificate instead of the DV counterpart

So which is it? A bug in the API, or a misuse of a supported API? If the latter, why would "warned not to do this again" be at all appropriate for mitigating the risk? I'm just trying to make sense of Comment #5, because it raises far more concerns than Comment #3, and I'm hoping this is just poor communication on SSL.com's part.

Flags: needinfo?(support)

(In reply to Ryan Sleevi from comment #6)

So which is it? A bug in the API, or a misuse of a supported API? If the latter, why would "warned not to do this again" be at all appropriate for mitigating the risk? I'm just trying to make sense of Comment #5, because it raises far more concerns than Comment #3, and I'm hoping this is just poor communication on SSL.com's part.

We regret any confusion; we focused on answering your query regarding record retrieval. In our reply in Comment 5, we should have explicitly corrected the wording in Comment 3 that read "A flawed condition statement in the API code".

Misuse of the API (meant for non-TLS products) didn't allow for proper organizing of audit information for quick, standard TLS evidence reporting (addressing Comment 4).

Mis-issuance occurred via calls by the API to a flawed certificate down-stepping process.

Flags: needinfo?(support)

We are monitoring this bug for further questions.

Flags: needinfo?(support)

(In reply to Chris Kemmerer from comment #0)

A full incident report shall be filed here when our investigation is complete. In the meantime, we will post regular updates.

What is the timeline to complete your investigation and post a full incident report?

(In reply to Chris Kemmerer from comment #5)

We also intend to review our code and compliance review methodology to improve our ability to anticipate and prevent similar unintended operations.

What is the timeline to complete the review of your code and compliance review methodology?

(In reply to Mathew Hodson from comment #9)

What is the timeline to complete your investigation and post a full incident report?

Our plan is to complete our analysis and file our final report for this issue next week.

What is the timeline to complete the review of your code and compliance review methodology?

This will be part of our final report.

This is our final report on this issue.

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

The issue was discovered by our validation team while performing a routine check of an EV TLS order.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2021-07-10T02:34:38-00:00 EV TLS order reviewed and approved by one of our Validation Specialists.

2021-07-12T03:38:36-00:00 Customer care requests an update on the above-mentioned order.

2021-07-12T09:15:41-00:00 The issuance of an EV TLS certificate with serial number 776752935acf6697078d9cb5547921a6 is triggered with the invocation of an API call by one of our resellers.

2021-07-13T16:05:00-00:00 In the process of performing 2p approval, one of our senior Validation Specialists determines that the certificate was issued without stored evidence of 2p approval in the system. The issue is reported by registering an internal ticket.

2021-07-13T16:37:00-00:00 The ticket is processed with high priority by a senior software engineer. A bug is discovered in our API and a hotfix is immediately deployed. Investigation continues to determine the population of the possibly affected certificates.

2021-07-14T20:22:00-00:00 The population of possibly affected certificates is identified as eleven (11) EV TLS certificates.

2021-07-15T15:00:00-00:00 The issue is picked up by our Security Auditing department and investigation begins.

2021-07-16T00:46:58-00:00 Additional technical information is shared by the software engineers.

2021-07-16T02:07:05-00:00 Audit logs of the approvals of the abovementioned population are presented. Security Auditors initiate a detailed review to determine whether and for which exact certificates there is a lack of 2p approval evidence.

2021-07-19T19:31:00-00:00 After detailed review of the above-mentioned population by our Security Auditors and discussions with the validation team, two (2) additional problematic certificates are found. Subsequent review confirms that validation evidence supported issuance for all three (3) affected certificates; however, immediate revocation is requested due to the lack of 2p approval at the time of issuance.

2021-07-19T21:56:00-00:00 All three (3) affected certificates revoked.

2021-07-20T11:00:00-00:00 Security Auditors start gathering all the information and compiling a preliminary incident report.

2021-07-23T21:02:00-00:00 Filed initial Bugzilla report, with the full report to follow pending completion of in-depth review.

2021-07-30 to 2021-09-03 Ongoing investigation and discussions between the engineering, validation and compliance departments to analyze any underlying weaknesses and, based on the results, decide on any additional measures and improvements to our systems and processes, so that such occurrences are not repeated in the future. The analysis also takes into account incident no. 1724520, which was registered on 2021-08-06. Weekly updates were posted to the public bug to inform the community about the ongoing investigation/analysis and to address any questions raised.

2021-09-06: Started drafting the final Bugzilla report.

2021-09-10: Filed final Bugzilla report (this document).

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

A hotfix was deployed and tested immediately after the issue was detected. No similar issuance is currently possible.

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

Three (3) EV TLS certificates, issued between 2021-05-18 and 2021-07-12.

5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

Impacted certificates (and their corresponding pre-certificates):

S/N: 4f3ff4dc563aa9c46f18254998643ba6 (https://crt.sh/?id=4549278351)

S/N: 45ba1526442ba72cc562493d99bcf757 (https://crt.sh/?id=4834857365)

S/N: 776752935acf6697078d9cb5547921a6 (https://crt.sh/?id=4851729123)

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The bug was introduced when the 2-person approval was migrated from the CA software to the RA Portal on 2021-04-13. The purpose of this change was to allow full processing of validations in the RA Portal by the Validation Specialists.

How and why the mistakes were made or bugs introduced:

The source of the bug was a faulty IF check in the certificate down-stepping process, which only occurs in EV TLS products and involves more complicated logic. This down-stepping mechanism allows issuance of a DV TLS certificate after a certificate applicant for an EV TLS certificate completes the DV process. Its purpose is for customers to receive a DV TLS certificate while waiting for the extended validation process to be completed.

In this case, the abovementioned flawed condition statement allowed the certificate request to issue an EV certificate instead of the DV counterpart, after the order was validated by the first Validation Specialist. The scope of the bug was limited to EV TLS certificates issued through an API endpoint which was not intended for TLS issuance.

Our investigation determined that issuance via the RA Portal was not affected by this bug. The bug also did not affect EV Code Signing or other types of certificates, because this down-stepping mechanism only applies to TLS certificates.
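The down-stepping decision can be modelled as a small function. A minimal sketch, assuming an order exposes whether DV validation is complete and the set of distinct Validation Specialists who approved the EV organization information. The actual faulty IF check was not published, so `product_to_issue_flawed` only reproduces the reported outcome (EV issued after the first approval); all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class EvTlsOrder:
    dv_complete: bool = False
    # Distinct Validation Specialists who approved the EV organization information.
    approvers: Set[str] = field(default_factory=set)


def product_to_issue_flawed(order: EvTlsOrder) -> Optional[str]:
    """Hypothetical flawed check: one EV approval is treated as enough to issue EV."""
    if not order.dv_complete:
        return None
    if len(order.approvers) >= 1:
        return "EV TLS"
    return "DV TLS"  # down-step while EV validation is pending


def product_to_issue_fixed(order: EvTlsOrder) -> Optional[str]:
    """EV only with two distinct approvers (2-person rule); otherwise step down to DV."""
    if not order.dv_complete:
        return None
    if len(order.approvers) >= 2:
        return "EV TLS"
    return "DV TLS"


one_approval = EvTlsOrder(dv_complete=True, approvers={"specialist_a"})
print(product_to_issue_flawed(one_approval))  # EV TLS (mis-issuance)
print(product_to_issue_fixed(one_approval))   # DV TLS
```

Storing approvers as a set means the same specialist approving twice still counts once, which is the property the 2-person rule depends on.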

Upon further investigation, we determined that only one user account was misusing this API to submit TLS requests.

(A side effect impacted the efficiency of our evidence gathering. In particular, our auditing mechanisms did capture all the relevant API calls, but the use of this non-TLS API for TLS issuance didn't allow for proper organization of audit information for our standard TLS reporting.)

Apart from fixing the bug and reminding the user of their obligation to only use documented API endpoints and procedures for their intended purposes going forward, we extended our investigation and proceeded with in-depth manual review of all possibly affected certificates to confirm no other such case exists.

How they avoided detection until now:

The bug passed undetected to the production systems despite the fact that the change was reviewed both within the development department (code review) and by the compliance department (change review). Our investigation confirmed the following:

  • With regard to the code, a documented code review by a separate developer took place before approving the pull request, in accordance with our standard code approval policy. The review did include the code in question; however, it failed to reveal the bug in the conditional logic (see point 7 for more details on this).
  • A documented compliance review by our Security Auditors took place before approving the migration of the 2-person approval from the CA software to the RA Portal, in accordance with our Change Management Policy. It included a detailed review of the staging system's behavior after the change, against both our CP/CPS and the applicable requirements. The review did not include scenarios which involved misuse of API endpoints (see point 7 for more details on this).

The issue was also not detected during our quarterly certificate reviews due to the low number of affected certificates.

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Immediate actions are described in steps 2-4 of this report and include the emergency deployment of the code fix to prevent further problematic issuances, the involvement of our internal auditing department, the thorough review of the possibly affected certificates and the revocation of the affected certificates after confirming the issue.

After determining why the bug was introduced in the first place and in particular why it affected only EV TLS certificates but not EV Code Signing certificates, which follow the same 2p approval principle (see point 6 above), our analysis focused on why the issue was not detected during the code or compliance review, before reaching production.

Several discussions and meetings took place between all involved departments: engineering, validation and compliance. Our objectives were dictated by our Incident Management Policy: (a) to understand where and why the process failed, and (b) to decide on and introduce the proper mitigating measures to address any underlying weaknesses.

With regard to the compliance review, our investigation determined that several tests and use cases with different actors were executed by our internal Security Auditors. In particular, the review included the following:

  • Verify that the 2P rule is enforced by the RA for all EV TLS certs;
  • Review implications with regards to the process/workflow of EV TLS application, validation, approval and enrollment;
  • Identify any edge cases and review implementation for these.

Based on the test cases and the review notes, we consider the depth of the review to be satisfactory. The problem was that misuse of the API was not considered an edge case; in other words, the process failed to widen the review to include such cases. Based on our analysis, the underlying weakness was that insufficient information was passed to the compliance team regarding the details of the RA system and its use in the validation process. This resulted in API misuse being overlooked as an edge case.

Our plan to address the above shortcomings in the compliance review process includes the following actions:

(a) collaboration between the compliance department, the engineering department and any other stakeholders shall be strengthened, so that all necessary information is duly passed to the compliance reviewers to minimize the risk of overlooking some aspect of the process in question. (This action is already in progress: dedicated developer resources have been assigned to liaise with the compliance department, and our Change Management Policy is being updated to improve the flow of information and the level of inter-departmental coordination.)

(b) technical documentation shall be extended to specify all the steps of the validation process in full detail. This documentation shall be used by the compliance department to prepare more complete compliance reviews and to broaden the scope (where required) of compliance testing.

(c) a registry of critical system components shall be made available to the compliance department and shall be consulted in every change review, so that any such critical component is considered in compliance reviews.

The plan is to complete action (a) by the end of October 2021, and actions (b) and (c) by the end of the year.
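Action (c) could take many forms. A minimal sketch, assuming the registry maps source path prefixes to named critical components so that a change review can list which of them a proposed change touches. The paths, component names and function are hypothetical illustrations, not SSL.com's registry.

```python
# Hypothetical registry: source path prefix -> critical component name.
CRITICAL_COMPONENTS = {
    "ra/approval/": "2-person approval",
    "ra/downstep/": "EV-to-DV down-stepping",
    "api/": "issuing API endpoints",
}


def critical_components_touched(changed_paths):
    """Return the sorted names of registered critical components a change touches."""
    touched = {
        name
        for path in changed_paths
        for prefix, name in CRITICAL_COMPONENTS.items()
        if path.startswith(prefix)
    }
    return sorted(touched)


change = ["ra/approval/rules.py", "api/orders.py", "docs/README.md"]
print(critical_components_touched(change))
# ['2-person approval', 'issuing API endpoints']
```

A coarse prefix match errs toward pulling compliance reviewers into a change rather than leaving them out.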

Our investigation confirmed that the code review process has been in place, requires review by an additional developer, and is enforced in our code repository. Daily meetings, standups and collaboration tools are actively used within the development team (and thus by code reviewers) to request clarifications or raise issues when performing code reviews.

Based on our analysis, this particular bug was extremely difficult to detect via code review, since it required a combination of two conditions: a partially (1p) approved order and a misuse of an API which was not intended for TLS issuance. This highlights the fact that detection of such a bug in code is unlikely when reviewed outside of the context of the process.

Given the above, we believe that the detection of such bugs requires a more collaborative and systemic testing practice; a combination of code reviews, acceptance testing and compliance reviews, sharing the knowledge of both the system and the process.

We plan to update our software development lifecycle requirements to mandate more rigorous and collaborative testing and code review standards. This shall also extend automated testing (including unit and feature testing) of all critical areas of issuance and validation. Changes in our testing infrastructure (e.g. automated test environments) are part of this effort to increase our testing capacity. Going forward, we are also planning to adopt periodic code audits covering all abovementioned critical areas.
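The point that the bug needed two conditions at once (a 1p-approved order and a non-TLS endpoint) is where exhaustive, invariant-based tests tend to help more than line-by-line review. A minimal sketch, assuming the issuance path can be driven directly from a test; `issue` is a hypothetical stand-in that reproduces the reported defect rather than SSL.com's code, and the check asserts the invariant "no EV TLS certificate without two distinct approvers" across every combination instead of only the documented TLS path.

```python
import itertools

# Hypothetical endpoint names; the report distinguishes only TLS vs non-TLS endpoints.
ENDPOINTS = ["tls_orders", "non_tls_orders"]


def issue(endpoint, dv_complete, approvers):
    """Stand-in for the system under test, modelling the reported defect:
    via the non-TLS endpoint, a single EV approval was enough to issue EV."""
    if not dv_complete:
        return None
    required = 2 if endpoint == "tls_orders" else 1  # the modelled defect
    return "EV TLS" if len(approvers) >= required else "DV TLS"


def two_person_violations():
    """Try every endpoint/DV/approval-count combination; collect 2p-rule violations."""
    found = []
    for endpoint, dv_complete, n in itertools.product(ENDPOINTS, [False, True], [0, 1, 2]):
        approvers = {f"specialist_{i}" for i in range(n)}
        if issue(endpoint, dv_complete, approvers) == "EV TLS" and len(approvers) < 2:
            found.append((endpoint, n))
    return found


print(two_person_violations())  # [('non_tls_orders', 1)]
```

Only the combination of the misused endpoint and a single approval is reported; a test written per documented endpoint would never have exercised it.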

This update of our SDLC policies and procedures is already underway. Our plan is to finalize this update in the next month and proceed with implementation in the next 2 quarters.

What is your progress on (a), (b) and (c) from point 7 above? Has your SDLC documentation been updated to include greater rigor and collaboration in testing and code review? In other words, what is the status of your remediation and prevention steps?
Thanks,
Ben

Items 2, 3 and 5 have been completed; 1 and 4 are in progress. In particular:

Regarding improvements in our testing infrastructure and practices (items no. 2 and 3), we have retooled our tests to run faster and in parallel. Automated unit and integration testing, including testing for all critical areas of issuance and validation, has been more thoroughly incorporated into the development workflow. Additionally, two separate environments, for Development and QA, are now available and in use, for further testing of pre-production code, as well as testing of infrastructure changes.

For item no. 5, UI changes have been put in place to make production and sandbox environments immediately distinguishable.

Regarding strengthening contractual controls (item no. 4), the legal team is working on updates to the applicable agreements to specify our expectations for our partners in more detail. We expect these to be either incorporated into our existing agreements or issued as an additional code of conduct. This task is expected to be completed in Q1 2022.

For the SDLC (item no.1), please refer to our update in Bug #1722089.

Flags: needinfo?(bwilson)

(In reply to Chris Kemmerer from comment #13)

For the SDLC (item no.1), please refer to our update in Bug #1722089.

You seem to have copied the wrong bug number there.

(In reply to Mathew Hodson from comment #14)

(In reply to Chris Kemmerer from comment #13)

For the SDLC (item no.1), please refer to our update in Bug #1722089.

You seem to have copied the wrong bug number there.

Thank you, and apologies; the reference should be to Bug 1724520.
