Closed Bug 1724520 Opened 3 years ago Closed 2 years ago

SSL.com: Incorrect Domain Validation for 1 TLS certificate with FQDN having "www." string within domain labels

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: support, Assigned: support)

Details

(Whiteboard: [ca-compliance] [dv-misissuance])

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36

Steps to reproduce:

This is a preliminary incident report. Our investigation into this matter is ongoing.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

On 2021-07-26, a customer informed us that, while an order for a TLS certificate was being processed, a certificate was issued with a malformed and non-validated common name.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2021-07-26T16:27:22-00:00 Issuance of a TLS certificate with serial number 5D1429EA7D3D94548E68F98FF46397B4 is triggered by one of our customers. The certificate is issued to CN=062wwww.ww.com, SAN=062wwww.ww.com, www.062wwww.ww.com.

2021-07-26T16:28:07-00:00 The customer notices the issue and proceeds with immediate revocation of the abovementioned certificate.

2021-07-26T18:56:27-00:00 The customer reports the issue to SSL.com.

2021-07-26T19:10:06-00:00 Our engineers pick up the issue and immediately begin technical investigation.

2021-07-26T20:18:00-00:00 Security auditing department is informed via an internal security event ticket.

2021-07-27T08:25:00-00:00 The issue is discussed in a dedicated meeting between engineering, security auditing and PKI management.

2021-07-27T11:49:00-00:00 Our engineers reproduce the issue and locate its source.

2021-07-27T18:00:00-00:00 An internal meeting takes place with the participation of the head engineer and security auditors to review the gathered information.

2021-07-27T18:50:00-00:00 A hotfix is deployed to the RA system; mitigation is confirmed.

2021-07-28T05:19:00-00:00 Engineering provides details about the affected domain patterns.

2021-07-28T19:24:00-00:00 Confirmation that no other certificates are affected by the issue, after searching our entire certificate database for the affected domain patterns.

2021-07-29T16:21:00-00:00 In accordance with our Incident Management Policy, the security event is formally escalated to an incident.

2021-08-03T15:30:00-00:00 An internal meeting takes place with the participation of the head engineer, the security auditor and senior validation specialists to analyze the weaknesses which allowed the issue to occur.

2021-08-04T17:00:00-00:00 Security auditors start compiling a preliminary incident report.

2021-08-06: Filed initial Bugzilla report.

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

A hotfix was deployed and tested within one (1) day after the issue was detected. No similar issuances can be performed currently.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

One (1) TLS certificate, issued on 2021-07-26.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

Impacted certificates (and their corresponding pre-certificates):

S/N: 5D1429EA7D3D94548E68F98FF46397B4 (https://crt.sh/?id=4931323746)

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The issue was caused by a bug in our codebase which would manifest only under very strict circumstances, thus no other similar cases have been found in our entire certificate population.

For the issue to occur, two (2) conditions had to be met at the same time:

  1. the applicant had to demonstrate control over an FQDN, and

  2. the applicant then had to craft a request for an altered version of the FQDN which would include a "www." string within, but not at the beginning of, domain labels.

To make it clearer, in this particular case, the customer (1) demonstrated control over 062www.com and (2) submitted a request for the domain 062wwww.ww.com.

Immediately after issuance, the customer noticed the problematic domain name in the certificate and self-revoked it. The issue was reported to SSL.com within the same day.

As mentioned above, the issue avoided detection until now because it would manifest only under strict circumstances which were not foreseen by our code/unit tests.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Immediate actions are described in steps 2-4 of this report and include the immediate revocation of the problematic certificate, the deployment of the code fix to prevent further problematic issuances, the involvement of our engineering, internal auditing and PKI management departments and the review of the certificate population to confirm no other such cases exist.

Our investigation is ongoing and analysis is being conducted to reveal any underlying weaknesses and, according to the results, decide any additional measures and improvements in our systems and processes, so that such occurrences are not repeated in the future. In our analysis, we are also taking into account information gathered from our investigation into incident no. 1722089.

A full incident report shall be filed here when our investigation is complete. In the meantime, we will post regular updates.

Assignee: bwilson → support
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

(In reply to Chris Kemmerer from comment #0)

For the issue to occur, two (2) conditions had to be met at the same time:

  1. the applicant had to demonstrate control over an FQDN, and
  2. the applicant then had to craft a request for an altered version of the FQDN which would include a "www." string within, but not at the beginning of, domain labels.

I'm not really sure I understand this description here. Could you share more details about this bug? Perhaps expressing in pseudo-code? It's really difficult to understand how something like this could happen, and how it's being fixed.

Flags: needinfo?(support)
Type: defect → task

This is an update regarding our ongoing review.

We are working on a more detailed description of this bug and shall post it here as early as possible next week.

Regular updates will also be made here to report progress on this issue.

Flags: needinfo?(support)

(In reply to Ryan Sleevi from comment #1)

(In reply to Chris Kemmerer from comment #0)

For the issue to occur, two (2) conditions had to be met at the same time:

  1. the applicant had to demonstrate control over an FQDN, and
  2. the applicant then had to craft a request for an altered version of the FQDN which would include a "www." string within, but not at the beginning of, domain labels.

I'm not really sure I understand this description here. Could you share more details about this bug? Perhaps expressing in pseudo-code? It's really difficult to understand how something like this could happen, and how it's being fixed.

Our BasicSSL product only allows a single, non-wildcard FQDN. This product provides users with a certificate that covers both the base domain (domain.com) and its 'www.' counterpart (www.domain.com) as long as control over the base domain is satisfied.

The following logic applies:

  • If the FQDN does not begin with 'www.', then 'www.' + FQDN is added as a SAN entry.
  • If the FQDN does begin with 'www.', then an extra SAN is included with the leading 'www.' removed (again, provided domain validation to the right of 'www.' is satisfied).
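As a rough illustration of the SAN rule in the two bullets above, the following Ruby sketch derives the SAN list for a single-FQDN order. The method name `san_entries` is hypothetical and is not taken from our codebase, and the domain validation requirement is deliberately left out of the sketch.

```ruby
# Hypothetical helper (name assumed for illustration only).
# Returns the requested FQDN plus its 'www.' counterpart.
def san_entries(fqdn)
  if fqdn.start_with?('www.')
    # Leading 'www.' present: also cover the name with that prefix removed.
    [fqdn, fqdn.delete_prefix('www.')]
  else
    # No leading 'www.': also cover 'www.' + FQDN.
    [fqdn, "www.#{fqdn}"]
  end
end

san_entries('domain.com')      # => ["domain.com", "www.domain.com"]
san_entries('www.domain.com')  # => ["www.domain.com", "domain.com"]
```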

The 'www.' bug was a result of this Ruby code snippet which replaces 'www.' with '' regardless of where 'www.' appears:

name.gsub('www.', '')

The intended behavior should only remove the leading 'www.' so the previous code snippet was replaced with the following:

name.delete_prefix("www.")

In this case, the original code snippet removed the 'www.' from 062wwww.ww.com, resulting in using 062www.com in the domain challenge. The customer did have control of this domain and therefore successfully validated the order. As a side effect, the original domain 062wwww.ww.com, along with 'www.' prepended as a SAN entry, was subsequently sent in the certificate request which resulted in an issued certificate.
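To make the difference between the two calls concrete, here is a minimal Ruby sketch applied to the names from this incident. The wrapper method names `strip_www_original` and `strip_www_fixed` are hypothetical and exist only for illustration; only the `gsub` and `delete_prefix` calls come from the snippets above. Note that `String#delete_prefix` requires Ruby 2.5 or later.

```ruby
# Hypothetical wrappers around the two calls quoted above.

# Original behavior: removes every occurrence of 'www.', wherever it appears.
def strip_www_original(name)
  name.gsub('www.', '')
end

# Replacement behavior: removes 'www.' only when it is the leading prefix.
def strip_www_fixed(name)
  name.delete_prefix('www.')
end

strip_www_original('www.example.com')  # => "example.com"
strip_www_fixed('www.example.com')     # => "example.com"

# The name requested in this incident:
strip_www_original('062wwww.ww.com')   # => "062www.com" (a different domain)
strip_www_fixed('062wwww.ww.com')      # => "062wwww.ww.com" (unchanged)
```

The two calls agree when 'www.' appears only as the leading label, and differ as soon as the string 'www.' occurs anywhere else in the name.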

Our review of potentially impacted certificates confirms that this is the only certificate affected by this bug.

(In reply to Chris Kemmerer from comment #3)

In this case, the original code snippet removed the 'www.' from 062wwww.ww.com, resulting in using 062www.com in the domain challenge. The customer did have control of this domain and therefore successfully validated the order. As a side effect, the original domain 062wwww.ww.com, along with 'www.' prepended as a SAN entry, was subsequently sent in the certificate request which resulted in an issued certificate.

It's hard to believe that the customer coincidentally also controlled 062www.com. Was the customer aware of the bug, and were they purposefully trying to trigger it?

It's hard to believe that the customer coincidentally also controlled 062www.com. Was the customer aware of the bug, and were they purposefully trying to trigger it?

Yes, the tester was attempting to perform issuance in our dev/sandbox environment but issued from the production environment instead.

We are monitoring this bug for further questions.

Flags: needinfo?(support)

We intend to file our final report for this issue this week.

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

On 2021-07-26, a customer informed us that, while an order for a TLS certificate was being processed, a certificate was issued with a malformed and non-validated common name.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2021-07-26T16:27:22-00:00 Issuance of a TLS certificate with serial number 5D1429EA7D3D94548E68F98FF46397B4 is triggered by one of our customers. The certificate is issued to CN=062wwww.ww.com, SAN=062wwww.ww.com, www.062wwww.ww.com.

2021-07-26T16:28:07-00:00 The customer notices the issue and proceeds with immediate revocation of the abovementioned certificate.

2021-07-26T18:56:27-00:00 The customer reports the issue to SSL.com.

2021-07-26T19:10:06-00:00 Our engineers pick up the issue and immediately begin technical investigation.

2021-07-26T20:18:00-00:00 Security auditing department is informed via an internal security event ticket.

2021-07-27T08:25:00-00:00 The issue is discussed in a dedicated meeting between engineering, security auditing and PKI management.

2021-07-27T11:49:00-00:00 Our engineers reproduce the issue and locate its source.

2021-07-27T18:00:00-00:00 An internal meeting takes place with the participation of the head engineer and security auditors to review the gathered information.

2021-07-27T18:50:00-00:00 A hotfix is deployed to the RA system; mitigation is confirmed.

2021-07-28T05:19:00-00:00 Engineering provides details about the affected domain patterns.

2021-07-28T19:24:00-00:00 Confirmation that no other certificates are affected by the issue, after searching our entire certificate database for the affected domain patterns.

2021-07-29T16:21:00-00:00 In accordance with our Incident Management Policy, the security event is formally escalated to an incident.

2021-08-03T15:30:00-00:00 An internal meeting takes place with the participation of the head engineer, the security auditor and senior validation specialists to analyze the weaknesses which allowed the issue to occur.

2021-08-04T17:00:00-00:00 Security auditors start compiling a preliminary incident report.

2021-08-06T21:36:00-00:00: Filed initial Bugzilla report, with full report to follow pending completion of in-depth review.

2021-08-09 to 2021-09-13 Ongoing investigation and discussions between the engineering, validation and compliance departments to analyze any underlying weaknesses and, according to the results, decide any additional measures and improvements in our systems and processes, so that such occurrences are not repeated in the future. The analysis is also related to incident no. 1722089 (see Comment 11). Regular updates were made to the public bug to inform about the ongoing investigation/analysis and address questions raised by the community.

2021-09-13: Started drafting the final Bugzilla report.

2021-09-17: Filed final Bugzilla report (this document).

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

A hotfix was deployed and tested within one (1) day after the issue was detected. No similar issuances can be performed currently.

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

One (1) TLS certificate, issued on 2021-07-26.

Our review of potentially impacted certificates confirms that this is the only certificate affected by this bug.
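This report does not describe how the search for affected domain patterns was implemented. As a minimal sketch under that caveat, one way to flag candidate names is to check whether the original `gsub` call and the corrected `delete_prefix` call disagree for a given name. The method name `affected_by_www_bug?` and the sample list below are assumptions made for illustration; they are not data from our certificate database.

```ruby
# Hypothetical predicate: true when stripping every 'www.' yields a different
# result than stripping only a leading 'www.', i.e. when 'www.' occurs
# somewhere other than (or in addition to) the leading prefix.
def affected_by_www_bug?(name)
  name.gsub('www.', '') != name.delete_prefix('www.')
end

# Illustrative sample only (not real certificate data).
sample_names = [
  'example.com',
  'www.example.com',
  '062wwww.ww.com',
  'www.062wwww.ww.com',
  'wwww.example.com'
]

flagged = sample_names.select { |name| affected_by_www_bug?(name) }
# => ["062wwww.ww.com", "www.062wwww.ww.com", "wwww.example.com"]
```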

5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

Impacted certificates (and their corresponding pre-certificates):

  1. S/N: 5D1429EA7D3D94548E68F98FF46397B4 (https://crt.sh/?id=4931323746)

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The issue was caused by a bug in our codebase which would manifest only under very strict circumstances, thus no other similar cases have been found in our entire certificate population.

For the issue to occur, two (2) conditions had to be met at the same time:

  • the applicant needed to demonstrate control over an FQDN, and

  • the applicant also needed to craft a request for an altered version of the FQDN which would include a "www." string within, but not at the beginning of, domain labels. To make it clearer, in this particular case, the customer (a) demonstrated control over "062www.com" and (b) submitted a request for the domain "062wwww.ww.com".

Immediately after issuance, the customer noticed the problematic domain name in the certificate and self-revoked it. The issue was reported to SSL.com the same day.

As mentioned above, the issue avoided detection before now because it would manifest only under strict circumstances which were not foreseen by our code/unit tests. Since this was the only such case and it was immediately reported (and revoked), our quarterly certificate reviews did not capture this issue.

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Immediate actions are described in steps 2-4 of this report and include the immediate revocation of the problematic certificate, the deployment of the code fix to prevent further problematic issuances, the involvement of our engineering, internal auditing and PKI management departments and the review of the certificate population to confirm no other such cases exist.

In accordance with our Incident Management Policy, we conducted a full investigation of the issue and an analysis of the root cause to reveal any underlying weaknesses and determine the proper mitigation measures.

Upon investigation, the bug was determined to only affect our DV certificates, and only under a narrow set of conditions. When we receive certificate requests for a domain including “www.” in the subdomain field (e.g. "www.example.com"), we also include the naked base domain ("example.com"). During our analysis, we found a faulty call that did not account for the position of the substring “www.” in the FQDN. This would allow an arbitrary “www.” in the middle of an FQDN to be replaced by a blank string, leaving an incorrect FQDN, after validation of the correct FQDN. As a side effect, the constructed FQDN, along with “www.” prepended as a SAN entry, would subsequently be sent in the final certificate request, thus resulting in issuance.
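The mismatch described above (one name checked in the domain challenge, a different name sent in the certificate request) can also be expressed as a pre-issuance consistency check. The Ruby sketch below is purely illustrative and is not something this report states was implemented; the method name `names_covered_by_validation?` is hypothetical, and it assumes a product that allows only the validated FQDN and its 'www.' counterpart.

```ruby
# Hypothetical pre-issuance check (assumption, not our actual control):
# every name in the certificate request must be either the validated FQDN
# or 'www.' + the validated FQDN.
def names_covered_by_validation?(validated_fqdn, requested_names)
  allowed = [validated_fqdn, "www.#{validated_fqdn}"]
  requested_names.all? { |name| allowed.include?(name) }
end

# Expected case: the request matches what was validated.
names_covered_by_validation?('domain.com', ['domain.com', 'www.domain.com'])
# => true

# This incident: control was shown for 062www.com, but other names were requested.
names_covered_by_validation?('062www.com', ['062wwww.ww.com', 'www.062wwww.ww.com'])
# => false
```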

In this particular case, the mis-issuance was triggered by a customer who noticed a possible bug when their request to issue a certificate to “062www.com” could not be processed because the system was stripping the “www.” part of the FQDN. While troubleshooting, the customer crafted a request for the domain “062wwww.ww.com” in order to test issuance. The customer tested against the production environment instead of the sandbox environment, thus resulting in the issuance of the offending certificate.

Per further investigation, this code snippet was authored in November 2018 and passed unnoticed during the review/testing of the change.

For the root cause analysis, several discussions and meetings took place between all the involved departments: engineering, validation and compliance. In our analysis, we also took into account information from bug no. 1722089.

We consider that the root causes and mitigation measures for this issue link to the Final Incident Report of that bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1722089#c11) and our plan to:

  • update our software development lifecycle requirements to mandate more rigorous and collaborative testing and code review standards;

  • extend automated testing (including unit and feature testing) of all critical areas of issuance and validation;

  • increase our testing capacity by improving our testing infrastructure (e.g. automated test environments).

The timeline for the implementation of the above is also specified in the Final Incident Report of bug no. 1722089.

Although the nature of this issue is such that it only applies under a strictly constrained set of circumstances, we consider that this particular bug should have been caught by the QA controls in effect at the time the code was authored.

Our processes and controls have significantly improved since then, the capacity of the development and compliance teams has grown, and a comprehensive re-factoring plan is already in progress. In our Final Incident Report of bug no. 1722089 we announced our intention to adopt periodic code audits going forward; in the light of this bug, we have decided to raise the priority of that initiative. Our target is to initiate the first iteration in Q4 2021.

Apart from mitigation related directly to code management and review, our analysis also has led us to commit to the following actions:

  • Strengthening of our contractual controls and obligations with customers. Our intention is to complete this task before the end of the year; however, this initiative has legal implications, and it is therefore difficult to set a firm timeline.

  • Creation of a clear and persistent indicator in the sandbox environment to easily differentiate it from the production environment. Implementation of this is underway and is due to be completed next week.

Hi,
Please provide a status report and update on the progress of your improvements to your:
1 - SDLC documentation;
2 - automated testing;
3 - testing environment;
4 - contractual controls (including what that entails - its relevance to this incident); and
5 - sandbox environment.
Thanks,
Ben

Our plan has made significant progress in very important initiatives. The first iteration of updates was completed in October, but due to the key role of some changes (especially with regards to SDLC and Change Management) we decided to extend our plan with another iteration before finalization.

In particular:

Regarding item (a), weekly meetings have been adopted to ensure closer collaboration between all departments, and to coordinate our efforts, especially regarding inter-departmental and compliance issues.

Regarding item (b), we estimate the review and update of our documented validation procedures is 50% complete. All EV procedures and a significant part of our OV/IV procedures have been reviewed and updated where necessary by the validation working group. A weekly meeting takes place under the supervision of the compliance department to coordinate efforts and ensure the project is on track and its results are satisfactory. Based on the progress thus far, we expect to reach 100% before the end of the year. Afterwards, a final compliance evaluation shall take place to sign-off.

Regarding item (c), new SDLC and Change Management policies have been prepared by the development and compliance departments. Both policies incorporate a more rigorous and collaborative paradigm with the involvement of impacted roles (e.g. engineers at the network, system and CA levels, compliance, productization, management) in all applicable stages of a project (such as design phase, final testing, review and sign-off).

Our new SDLC also extends automated testing (including unit and feature testing) of all critical areas of issuance and validation. This initiative is aligned/complemented by improvements in the testing infrastructure and testing practices described in Bug #1724520.

These new policies also specify different levels of testing and review by the end users and the compliance department, depending on the nature and/or criticality of the change. For this reason, we have created a Critical Functions List (CFL) which is intended to capture any function that is critical in nature and affects validation, issuance, compliance or security. The CFL is currently under review by our compliance department. We expect it to be formally adopted before the end of this year, and to be updated as required to reflect any future changes in our infrastructure and/or operations.

The two teams continue to work towards a deeper integration between the two policies. Taking into account the other initiatives which run in parallel, we now expect to deliver final versions of these policies to our PMA for approval in January 2022.

Our plan is to provide updates to this bug based on our progress.

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] → [ca-compliance] Next update 2022-01-17

This is an update to report current progress on this issue.

Regarding item (b), the review and update of our documented validation procedures is 90% complete, with the only pending item being the update of our email address validation documentation (for the rare cases in which it needs to be conducted manually).

The Validation Department carried out a major restructuring of the validation documentation. A separate process is documented for each of the following four (4) validation components:

  1. Validation of organizations and individuals (OV/IV)
  2. Extended validation of organizations (EV)
  3. Domain validation
  4. Email address validation

Each process is supplemented by documented procedures which describe the detailed steps to perform the validation in the context of the certificate type (Server Authentication, Code Signing, Document Signing, Email Protection, Client Authentication).

The updated documentation, and any subsequent update going forward, shall be approved by the Compliance Department after review/QA.

Regarding item (c), our SDLC documentation has been completed and its integration with our Change Management Policy is underway. End-of-year rush and human resources re-assignment to the investigation of bug #1750631 have delayed this initiative, with a new ETA of February 2022. The Critical Functions List has been completed, reviewed by the Compliance Department and approved as version 1.0. The CFL is owned by the Development Department, which is also responsible for maintaining it going forward, and any material change to the CFL shall be marked as a new version.

Whiteboard: [ca-compliance] Next update 2022-01-17 → [ca-compliance] Next update 2022-02-18

This is an update to report current progress on this issue.

Regarding item (b), all validation procedure documentation has now been delivered and is undergoing review and QA by the Compliance Department.

Regarding item (c), integration of our Change Management Policy with SDLC documentation continues. As a part of this process, and after close review of proposed controls for software development, revisions of the SDLC as submitted are under consideration. This is to ensure complete congruity between the SDLC and our CM practices and procedures. We expect formal adoption of the new SDLC within the next two weeks.

This is an update to report current progress on outstanding items for this issue.

Regarding item (b), the following completely revamped validation procedure documentation components have been approved by the Compliance Department and formally adopted:

  • Extended validation of organizations (EV)
  • Domain validation (DV)

Final review and approval of the following components is underway and is expected to be completed within the next two weeks:

  • Validation of organizations and individuals (OV/IV)
  • Email address validation (S/MIME)

Regarding item (c), we reached an important milestone for our Change Management process: the formal adoption of an enhanced, integrated Software Development Lifecycle Policy (SDLC) which applies to our in-house development process, and which introduces a more collaborative and proactive model.

We have chosen to design and implement the SDLC as a Policy in order to clearly specify the required / expected controls and the responsibilities of each party in our software development process. Several cycles of improvements and continuous collaboration between the Software Engineering team, the Compliance team and other stakeholders were required for this purpose. As part of this process, integration with other applicable policies and conformity of the SDLC to our Change Management Policy was verified before adoption.

Whiteboard: [ca-compliance] Next update 2022-02-18 → [ca-compliance] Next update 2022-06-17

This is an update to report our progress on outstanding items for this issue.

Regarding item (b), all remaining validation procedure documentation components have been approved by the Compliance Department and formally adopted:

  • Validation of organizations and individuals (OV/IV)
  • Email address validation (S/MIME)

This concludes our remediation actions for this bug.

I'll close this on Wed. 22-June-2022, unless there are other issues to discuss.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(support)
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] Next update 2022-06-17 → [ca-compliance] [dv-misissuance]