1. How your CA first became aware of the problem.
At 2022-10-20 10:30 UTC, we noticed that https://crt.sh/?zlint=1+week was flagging errors for a considerable number of certificates that were recently issued by Sub-CAs operated by Sectigo.
2. A timeline of the actions your CA took in response.
All times are in UTC.
2014-03-17 - R&D commits a code change to our CA platform that adds a function named unsetBits, which is intended to unset certain bit(s) in a Key Usage BITSTRING value.
2014-03-27 06:05 - IT Operations deploys the code change to our Production system.
2014-03-28 14:37:33 - We issue the first affected certificate.
2022-10-09 15:18 - ZLint v3.4.0 is released.
2022-10-19 20:49:49 - I upgrade ZLint from v3.3.1 to v3.4.0 on the crt.sh servers.
2022-10-20 10:30 - We notice that https://crt.sh/?zlint=1+week is flagging a new error for several of our Sub-CAs.
2022-10-20 11:12:27 - R&D rebuilds our certificate issuance application "cert_producer", upgrading its ZLint dependency from v3.3.1 to v3.4.0. We anticipate that this will quickly block all further misissuance related to this incident.
2022-10-20 11:17 - R&D asks IT Operations to deploy the updated cert_producer ASAP.
2022-10-20 11:33:33 - IT Operations deploys the updated cert_producer to our Production environment. We watch the cert_producer logs to confirm that the updated preissuance linting is blocking issuance of some certificates and logging the same error seen on https://crt.sh/?zlint=1+week.
2022-10-20 11:47 - R&D identifies the root cause of the problem, which is that the unsetBits function omits the step of recomputing the 'unused bits' octet in the Key Usage BITSTRING.
2022-10-20 11:55 - R&D completes an initial assessment of the scope of impact, determining that the unsetBits function has only ever been used to unset the keyEncipherment bit in situations where this has been set in the selected certificate profile but where the leaf certificate request has an ECC key.
2022-10-20 12:03 - We realize that most of our ECC Sub-CAs are configured to override our CA system's default Key Usage configuration for the relevant certificate type with their own Key Usage configuration that doesn't set the keyEncipherment bit. Consequently, we conclude that the scope of impact is less than previously thought, affecting only issuance from the small number of ECC Sub-CAs that don't specify their own Key Usage configuration plus leaf certificates with ECC keys that are issued by RSA Sub-CAs.
2022-10-20 12:30 - We update the configuration of the affected ECC Sub-CAs so that they do specify their own Key Usage configuration (that does not set the keyEncipherment bit) and are therefore able to issue correctly formed leaf certificates even though the unsetBits bug is not yet fixed.
2022-10-20 15:35:59 - R&D commits a bugfix for the unsetBits function.
2022-10-20 15:59 - R&D implements a small test application that exercises the unsetBits function directly with a series of test cases, which are intended to cover both our current use (unsetting keyEncipherment) and any potential future uses of the function.
2022-10-20 16:01 - We confirm by visual inspection that each of the encoded Key Usage extensions emitted by the test application is correctly DER encoded, including specifying the correct number of 'unused bits'.
2022-10-20 16:21 - R&D realizes - due to us only using Certlint (a general-purpose RFC5280 linter) and not ZLint when performing preissuance linting for non-server certificates - that the bug in unsetBits could potentially be causing malformed Key Usage extensions in S/MIME certificates with ECC keys that are issued by RSA Sub-CAs. R&D recommends to Project Management that deploying the bugfix immediately is the best option to mitigate this concern, even though deployments of the affected code component incur some service interruption and therefore are normally only permitted at off-peak times with plenty of prior warning to customers.
2022-10-20 16:44 - QA team completes regression testing and confirms the acceptability of the test cases and testing performed by R&D.
2022-10-20 17:16 - Project Management explains the situation to the Risk / Release Management teams and requests approval to deploy the updated version of the unsetBits function as an urgent hotfix.
2022-10-20 17:28 - Risk / Release Management teams conclude their discussions and provide a sufficient number of approvals to meet the required quorum.
2022-10-20 17:29 - IT Operations confirms readiness to deploy the bugfix.
2022-10-20 17:33 - Support Operations provides, and requests feedback from other stakeholders on, a first draft of an emergency deployment notice to be provided to customers.
2022-10-20 19:01 - We approve the final version of the deployment notice, which advises customers of a brief service interruption to occur 30 minutes later. Support Operations posts the notice on our status.io page.
2022-10-20 19:34:30 - Service interruption begins.
2022-10-20 19:37 - Bugfix deployment is completed.
2022-10-20 19:40:35 - Normal service resumes.
2022-10-20 21:10 - We re-attempt issuance for the 188 certificate requests that had been blocked by the new ZLint Key Usage lint since the deployment of the updated version of cert_producer earlier in the day. All 188 are issued successfully, with correctly formed Key Usage extensions.
2022-10-20 22:17:05 - R&D completes implementation of the script mentioned in comment 0 that will identify all certificates that have been misissued due to the unsetBits bug.
2022-10-20 22:18 - We set the script running on our production CA database.
2022-10-21 09:10 - Incident Response team commences review of the first draft of comment 0.
2022-10-21 15:24 - I post comment 0.
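The visual inspection recorded at 16:01 checks a property that can also be tested mechanically. The following is a minimal sketch only, not the code used by our CA platform or by ZLint: the function name is_der_named_bitstring is hypothetical, and it assumes the input is the contents octets of a BITSTRING (the 'unused bits' octet followed by the data octets) that encodes a named bit list such as Key Usage.

```python
def is_der_named_bitstring(content: bytes) -> bool:
    """Return True if `content` is a valid DER encoding of a named bit list.

    `content` holds the BIT STRING contents octets: one 'unused bits'
    octet followed by zero or more data octets.
    """
    if not content:
        return False
    unused, data = content[0], content[1:]
    if unused > 7:
        return False
    if not data:
        # An empty bit string must declare zero unused bits.
        return unused == 0
    last = data[-1]
    # DER requires every unused (padding) bit to be zero.
    if last & ((1 << unused) - 1):
        return False
    # For a named bit list, trailing zero bits are removed, so the
    # lowest-order bit that is counted as "used" must be set.
    return bool(last & (1 << unused))


if __name__ == "__main__":
    for sample in ("05a0", "0780", "0580"):
        print(sample, is_der_named_bitstring(bytes.fromhex(sample)))
```

Under these assumptions, 05 a0 (digitalSignature plus keyEncipherment) and 07 80 (digitalSignature only) are accepted, whereas 05 80, the encoding produced by the bug, is rejected.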
3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.
Prior to resolving the problem, we did not take steps to stop all certificate issuance from the affected parts of our PKI. Instead, as described in the timeline, we opted to respond rapidly in a manner that would block further misissuance without causing a lengthy service disruption for our customers.
4 & 5. A summary of, and the complete certificate data for, the problematic certificates.
We had hoped to provide full details of the affected certificates in this incident report, but at the time of writing the script mentioned in comment 0 is still running, and so we have not yet finished identifying the problematic certificates. We will provide a summary, and the complete certificate data, as soon as we can.
The script is scanning every publicly trusted certificate and precertificate ever issued by our CA system, finding every instance of the byte sequence 0x040403020580, which we've determined is the only malformed DER Key Usage BITSTRING that will have been produced due to the bug described in this incident report. The runtime is measured in weeks simply because we have issued a huge number of certificates and precertificates. Selectively scanning the issued certificates from Sub-CAs we knew to be affected would have been quicker, but we wanted to do an exhaustive scan that will prove or disprove our beliefs and assumptions about the scope of impact.
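For illustration, the byte-sequence test at the heart of such a scan can be sketched as follows. This is not the actual script: the names has_malformed_key_usage and find_affected, and the idea of supplying certificates as (identifier, DER bytes) pairs, are assumptions made for the example. The sequence decodes as an OCTET STRING of length 4 (04 04) wrapping a BITSTRING of length 2 (03 02) that declares 5 unused bits (05) but contains only the digitalSignature bit (80).

```python
from typing import Iterable, List, Tuple

# 04 04 = OCTET STRING, 4 bytes; 03 02 = BIT STRING, 2 bytes;
# 05 = stale 'unused bits' count; 80 = only digitalSignature set.
MALFORMED_KEY_USAGE = bytes.fromhex("040403020580")


def has_malformed_key_usage(cert_der: bytes) -> bool:
    """Report whether the DER bytes contain the malformed sequence."""
    return MALFORMED_KEY_USAGE in cert_der


def find_affected(certs: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Return the identifiers of certificates containing the sequence.

    `certs` yields (identifier, DER bytes) pairs; how they are read from
    a CA database is outside the scope of this sketch.
    """
    return [cert_id for cert_id, der in certs if has_malformed_key_usage(der)]
```

A raw byte match of this kind does not parse the certificate, so in principle the same six bytes could appear elsewhere in a certificate by coincidence; any hit would need to be confirmed by decoding the Key Usage extension.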
Meanwhile, to meet Mozilla's timeliness expectations for CA incident reports, we are providing the other sections of this incident report today.
6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
A feature/oddity of our ASN.1 BITSTRING code is that the 'unused bits' octet is managed by the ASN.1 handling code rather than by the DER encoder, but the Comodo R&D engineer responsible for creating the unsetBits function did not realise that that function would need to recalculate the value of this octet.
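To make the missing step concrete, here is a minimal sketch of a bit-clearing routine that does recalculate the 'unused bits' octet. It is an illustration only and not the code in our CA platform: the name unset_bits, its signature, and the choice to operate directly on the BITSTRING contents octets are all assumptions, and the input is assumed to be validly encoded with zeroed padding bits.

```python
from typing import Iterable


def unset_bits(content: bytes, bits: Iterable[int]) -> bytes:
    """Clear named bits and re-encode the BIT STRING contents minimally.

    `content` is the 'unused bits' octet followed by the data octets.
    Bit 0 is the most significant bit of the first data octet, so in a
    Key Usage value digitalSignature is bit 0 and keyEncipherment is bit 2.
    """
    data = bytearray(content[1:])
    for bit in bits:
        byte_index, offset = divmod(bit, 8)
        if byte_index < len(data):
            data[byte_index] &= ~(0x80 >> offset) & 0xFF
    # Drop trailing octets that no longer carry any set bit.
    while data and data[-1] == 0:
        data.pop()
    if not data:
        return b"\x00"
    # Recompute 'unused bits' as the number of trailing zero bits in the
    # final octet. Omitting this recomputation is the step that leaves a
    # stale count behind.
    last = data[-1]
    unused = (last & -last).bit_length() - 1
    return bytes([unused]) + bytes(data)


if __name__ == "__main__":
    # digitalSignature + keyEncipherment, with keyEncipherment cleared.
    print(unset_bits(bytes.fromhex("05a0"), [2]).hex())
```

With the recomputation in place, clearing keyEncipherment from 05 a0 yields 07 80. Keeping the original 'unused bits' octet instead would yield 05 80, which is the malformed value described in this report.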
This problem went undetected for over eight and a half years. During that time, our suite of unit tests grew extensively, but no test was added or even conceived that would have detected the problem; no customers, relying parties, or security researchers reported the problem to us; and none of our preissuance linting tools detected the problem, prior to ZLint v3.4.0.
We are aware of two other recent CA incident bugs that cover different Key Usage encoding errors that, like this Sectigo bug, went undetected for significant periods of time.
7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
Remediation of the CA system bug was planned and completed on the day of discovery, as detailed in the timeline.