Plus Incident Report
- How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
On May 29, 2019 at 12:55 pm PT, a DigiCert support engineer who was assisting a customer noticed that the SANs in the TLS certificate did not match the specified domain validation scope. The support engineer escalated to his manager, who initiated research with both the development and compliance teams. This launched a full investigation. The DigiCert engineering team traced the issue to a seldom-used feature of legacy DigiCert. That feature, known as “Certificates Plus,” allowed a customer to automatically add www.domain to an order that contained domain, and vice versa. Unfortunately, the feature added the domains after validation completed but before issuance instead of during the initial order process. This allowed improper issuance of example.com if only www.example.com was verified.
Engineering had previously investigated this code while the Comodo issue referenced here was pending: https://groups.google.com/forum/#!msg/mozilla.dev.security.policy/PoMZvss_PRo/TK8L-lK0EwAJ. The report back at the time was that the system ensured proper validation. However, the individuals who performed that assessment were let go some time ago, so we cannot ask them how this was missed during the previous review. We recognize that this is an issue that required immediate attention.
- A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
May 29, 2019 – Support engineer noticed a mismatch between the validation scope and the SANs.
May 29, 2019 – Escalated to compliance to initiate root cause investigation.
May 30, 2019 – Root cause identified as Plus feature. Faulty logic was disabled.
May 31, 2019 – Began running reports to identify impacted certificates.
June 1, 2019 – Began internally reviewing problematic cert report. Revised script to eliminate false positives.
June 3, 2019 – Investigation completed and identified final list of 1,069 problematic certs. Customers notified of three options: revalidate, reissue to remove problem domain, or revoke within 24 hours.
June 4, 2019 – 390 certificates were reissued/revalidated; 679 certificates were revoked.
- Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
Yes, we have stopped issuing certificates with this problem. The DigiCert Plus feature with the faulty logic was disabled across all platforms on May 30, 2019. The Plus feature will be re-coded to enable the option only when the CN is the base domain. We are also refactoring that part of the code shortly to eliminate its ability to touch orders after validation but before issuance. We want to ensure that nothing inserted between validation and issuance can change the order. This way all validation and issuance runs through the same set of services, keeping all compliance tests and operations focused and consistent in that code.
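As a rough illustration of the re-coded rule (not the actual DigiCert code), the companion name could be computed once, at order time, so that every name on the order passes through validation. The function name `plus_names` is hypothetical, and the "base domain" test is a deliberate simplification; a real system would likely need a Public Suffix List lookup.

```python
def plus_names(common_name):
    """Return the names to place on the order at order time (hypothetical helper).

    Simplifying assumption: a CN is eligible for the Plus companion name if it
    is not already a www name and is not a wildcard. Determining the true
    registrable (base) domain would need a Public Suffix List lookup.
    """
    cn = common_name.strip().lower().rstrip(".")
    if cn.startswith("www.") or cn.startswith("*."):
        # No companion is added, so an unvalidated base domain never appears.
        return [cn]
    # Assumes that validating cn also covers its www subdomain.
    return [cn, "www." + cn]


print(plus_names("example.com"))      # ['example.com', 'www.example.com']
print(plus_names("www.example.com"))  # ['www.example.com']
```

Because the list is built before validation starts, nothing needs to be added to the order afterwards.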
- A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
1,069 problematic certificates identified
The first certificate was issued on April 27, 2017
The last certificate was issued on May 28, 2019
All certs were logged to CT
- The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
<Attach file to bug report> Incident - Plus\serials-crtsh.csv
- Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
A legacy system contained a seldom used path to issuance that was adding the base domain to certificates for a www subdomain (e.g., www.example.com) after validation of the subdomain had been completed. The system added the base domain to the certificate after passing through our RA system and our compliance checks, just before the information was passed to the CA for signing. We attached a diagram of the flow and systems to show how the systems interconnect.
The issue only occurred in isolated cases when validation was done using file auth or DNS TXT records and the customer elected to include our “plus” feature, an older setting that gives customers a free alternative domain. Unfortunately, the feature was designed to work both ways (adding a www subdomain to base-domain certs and adding the base domain to www-subdomain certs). The total impact is 1,069 certs where the www subdomain was validated but the base domain was not.
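The condition described above (a SAN that falls outside what was validated) can be sketched as a simple check. This is an illustrative sketch only, not the script used for our reports; `is_covered` and `uncovered_sans` are hypothetical names, wildcards are ignored, and it assumes a validated domain also covers its own subdomains.

```python
def is_covered(san, validated_domains):
    """True if san equals a validated domain or is a subdomain of one."""
    san = san.lower().rstrip(".")
    for domain in validated_domains:
        domain = domain.lower().rstrip(".")
        if san == domain or san.endswith("." + domain):
            return True
    return False


def uncovered_sans(sans, validated_domains):
    """Return the SANs that no validated domain accounts for."""
    return [s for s in sans if not is_covered(s, validated_domains)]


# Only the www subdomain was validated: the base domain is flagged.
print(uncovered_sans(["www.example.com", "example.com"], ["www.example.com"]))
# ['example.com']

# The base domain was validated: both names are covered.
print(uncovered_sans(["www.example.com", "example.com"], ["example.com"]))
# []
```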
We acknowledge that issuance without the proper level of validation is a BR violation, as evidenced in the Mozilla discussion <https://groups.google.com/forum/#!msg/mozilla.dev.security.policy/PoMZvss_PRo/TK8L-lK0EwAJ> in 2016. We thought we had investigated issuance paths like this back in March 2017 to ensure we did not have this issue, but somehow we missed a part of the code logic. As mentioned, the engineers who did the code review in 2017 are no longer with DigiCert, so we cannot ask them what happened and how this was missed. The real failure is similar to the one in our CAA record checking: an engineer who didn’t understand the importance of the CAB Forum requirements didn’t realize they had missed a significant issue. We also don’t have formal hand-off procedures, so there wasn’t a code review by the engineering resources taking over responsibility for that code. The new engineer is quite good. BJ is his name.
The certs represent 0.003% of total TLS issuance, which is why they were not caught earlier by an audit. None of these were selected during the 3% audits. We do have automated auditing components, but they primarily check certificate profiles and content, not validation information.
- List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
We are taking a multi-pronged approach to resolving the situation and eliminating the possibility of future errors as follows:
1. May 30, 2019 - We turned off the feature for issuing plus certificates without the proper validation at the subdomain or base domain. This prevented future mis-issuances from this particular system.
2. June 30, 2019 - We have scheduled training for the engineering staff during the last week of June. This will be mandatory for any engineer working on CA code, RA code, or any other code that interfaces with the CA and RA systems.
During that training, we are going to assign a developer who is ultimately responsible for each code segment and for ensuring compliance with each portion of the BRs. This compliance requirement includes writing unit tests for the various BRs under their responsibility so that compliance checks run before code is released into production. We will emphasize that peer reviews require a comprehensive understanding of the requirements. We will run the training annually or more often, as necessitated by BR changes. We believe this will improve the quality and operation of our code substantially.
During the training, we will also come up with a better hand-off procedure for when employees leave. We want a procedure that establishes a new chain of custody for code and provides additional oversight in case things were missed during an employee’s last days. We’ll ensure that the hand-off procedures follow industry best practices, including a transitional code review. Any additional ideas for the training, or topics that you’d like to see included, are appreciated.
3. October 31, 2019 – We previously mentioned that we have a new validation engine that we’ve developed. This includes a new workbench for validation staff and rigorous compliance controls. Migration to this new capability is underway with a target completion date of October 31, 2019. At that point all certificates issued, regardless of platform, will go through the new domain validation tool. At that time, all orders and requests will go through the RA into the CA, without the possibility of order modifications post validation. Any new modifications will reset the process and go through the new validation engine.
4. April 30, 2020 - We are currently working on legacy system migration, including migration of legacy DigiCert and Symantec systems. The first DigiCert system has about 4 more customers that we are migrating. The others have more users but are being migrated at the same time. The target for beginning the decommissioning process for the systems is January 2020. The target date for final shut down is April 2020. This largely depends on how the shutdown process proceeds. We are constantly working towards these dates and will post updates as things change.
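To make items 2 and 3 more concrete, here is a minimal sketch of what a compliance unit test for the "any modification resets the process" rule might look like. It is an assumption-laden illustration, not the new validation engine: the `Order` and `State` classes and their methods are hypothetical.

```python
import unittest
from enum import Enum


class State(Enum):
    PENDING = "pending"
    VALIDATED = "validated"


class Order:
    """Hypothetical order model: any change to the names resets validation."""

    def __init__(self, names):
        self.names = list(names)
        self.state = State.PENDING

    def mark_validated(self):
        self.state = State.VALIDATED

    def add_name(self, name):
        self.names.append(name)
        # A modified order must go back through validation.
        self.state = State.PENDING

    def issue(self):
        if self.state is not State.VALIDATED:
            raise RuntimeError("order must be validated before issuance")
        return tuple(self.names)


class OrderComplianceTest(unittest.TestCase):
    def test_modification_resets_validation(self):
        order = Order(["www.example.com"])
        order.mark_validated()
        order.add_name("example.com")
        self.assertIs(order.state, State.PENDING)
        with self.assertRaises(RuntimeError):
            order.issue()

    def test_unmodified_order_issues(self):
        order = Order(["www.example.com"])
        order.mark_validated()
        self.assertEqual(order.issue(), ("www.example.com",))


# Run the two tests and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderComplianceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A test of this shape would fail if a code path added a name after validation without sending the order back through validation, which is the class of error described in this report.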