Closed Bug 1796803 Opened 3 months ago Closed 2 months ago

Sectigo: Issuance of ECC leaf certificates with non-DER encoded keyUsage

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rob, Assigned: rob)

References

Details

(Whiteboard: [ca-compliance])

Attachments

(4 files)

Yesterday morning we discovered via https://crt.sh/?zlint=1+week that some ECC leaf certificates issued by Sectigo's CA platform contain an incorrect number of unused bits in their keyUsage BITSTRINGs.

To immediately mitigate further misissuance, we accelerated our plans to upgrade our preissuance linting system to use the latest ZLint release, which we deployed an hour after discovering the problem.

To comprehensively prevent further misissuance, we quickly identified the root cause and scope of impact, and then prepared and tested a bugfix, which we deployed later the same day.

Once further misissuance could no longer occur, we prepared a script that will identify all the misissued ECC leaf certificates. We set this script running on our CA database later the same day, but we are not yet sure how long it will take to complete.

A full incident report will be forthcoming.

Thanks, Rob, for your rapid response. This is very noteworthy.

+1 to Ben's comment.

If there is any commentary you can share on lessons-learned re: implementing ZLint for preissuance linting, it would certainly be appreciated.

(In reply to Ryan Dickson from comment #2)

If there is any commentary you can share on lessons-learned re: implementing ZLint for preissuance linting, it would certainly be appreciated.

Hi Ryan. Are you referring to the hoops that must be jumped through to persuade ZLint to operate on an unsigned TBSCertificate object instead of a fully-formed X.509 Certificate? If so, I can certainly share details of Sectigo's approach.

Flags: needinfo?(ryandickson)

The script mentioned in comment 0 is continuing to run at this time. Its estimated progress is just over 72%. We have noticed the progress indicator has slowed down somewhat since its initial start, which makes estimating the time of completion difficult. We will keep the community updated on its progress.

In the meantime, we are preparing the full incident report.

@rob, sorry for the delayed response.

Yes - any information on linting the unsigned TBSCertificate instead of a fully-formed certificate (or pre-certificate) would be helpful. I intend to spend more time studying linting projects like ZLint in the coming weeks, so at the least, this will be helpful to me personally (and I suspect, others following along here on Bugzilla). Thanks for your offer!

Flags: needinfo?(ryandickson)

1. How your CA first became aware of the problem.

At 2022-10-20 10:30 UTC, we noticed that https://crt.sh/?zlint=1+week was flagging errors for a considerable number of certificates that were recently issued by Sub-CAs operated by Sectigo.

2. A timeline of the actions your CA took in response.

All times are in UTC.

2014-03-17 - R&D commits a code change to our CA platform that adds a function named unsetBits, which is intended to unset certain bit(s) in a Key Usage BITSTRING value.

2014-03-27 06:05 - IT Operations deploys the code change to our Production system.

2014-03-28 14:37:33 - We issue the first affected certificate.
...

2022-10-09 15:18 - ZLint v3.4.0 is released.

2022-10-19 20:49:49 - I upgrade ZLint from v3.3.1 to v3.4.0 on the crt.sh servers.

2022-10-20 10:30 - We notice that https://crt.sh/?zlint=1+week is flagging a new error for several of our Sub-CAs.

2022-10-20 11:12:27 - R&D rebuilds our certificate issuance application "cert_producer", upgrading its ZLint dependency from v3.3.1 to v3.4.0. We anticipate that this will quickly block all further misissuance related to this incident.

2022-10-20 11:17 - R&D asks IT Operations to deploy the updated cert_producer ASAP.

2022-10-20 11:33:33 - IT Operations deploys the updated cert_producer to our Production environment. We watch the cert_producer logs to confirm that the updated preissuance linting is blocking issuance of some certificates and logging the same error seen on https://crt.sh/?zlint=1+week.

2022-10-20 11:47 - R&D identifies the root cause of the problem, which is that the unsetBits function omits the step of recomputing the 'unused bits' octet in the Key Usage BITSTRING.

2022-10-20 11:55 - R&D completes an initial assessment of the scope of impact, determining that the unsetBits function has only ever been used to unset the keyEncipherment bit in situations where this has been set in the selected certificate profile but where the leaf certificate request has an ECC key.

2022-10-20 12:03 - We realize that most of our ECC Sub-CAs are configured to override our CA system's default Key Usage configuration for the relevant certificate type with their own Key Usage configuration that doesn't set the keyEncipherment bit. Consequently, we conclude that the scope of impact is less than previously thought, affecting only issuance from the small number of ECC Sub-CAs that don't specify their own Key Usage configuration plus leaf certificates with ECC keys that are issued by RSA Sub-CAs.

2022-10-20 12:30 - We update the configuration of the affected ECC Sub-CAs so that they do specify their own Key Usage configuration (that does not set the keyEncipherment bit) and are therefore able to issue correctly formed leaf certificates even though the unsetBits bug is not yet fixed.

2022-10-20 15:35:59 - R&D commits a bugfix for the unsetBits function.

2022-10-20 15:59 - R&D implements a small test application that exercises the unsetBits function directly with a series of test cases, which are intended to cover both our current use (unsetting keyEncipherment) and any potential future uses of the function.

2022-10-20 16:01 - We confirm by visual inspection that each of the encoded Key Usage extensions emitted by the test application is correctly DER encoded, including specifying the correct number of 'unused bits'.

2022-10-20 16:21 - R&D realizes that, because we use only Certlint (a general-purpose RFC 5280 linter) and not ZLint for preissuance linting of non-server certificates, the bug in unsetBits could still be causing malformed Key Usage extensions in S/MIME certificates with ECC keys that are issued by RSA Sub-CAs. R&D recommends to Project Management that deploying the bugfix immediately is the best way to mitigate this concern, even though deployments of the affected code component incur some service interruption and are therefore normally permitted only at off-peak times with plenty of prior warning to customers.

2022-10-20 16:44 - QA team completes regression testing and confirms the acceptability of the test cases and testing performed by R&D.

2022-10-20 17:16 - Project Management explains the situation to the Risk / Release Management teams and requests approval to deploy the updated version of the unsetBits function as an urgent hotfix.

2022-10-20 17:28 - Risk / Release Management teams conclude their discussions and provide a sufficient number of approvals to meet the required quorum.

2022-10-20 17:29 - IT Operations confirms readiness to deploy the bugfix.

2022-10-20 17:33 - Support Operations provides, and requests feedback from other stakeholders on, a first draft of an emergency deployment notice to be provided to customers.

2022-10-20 19:01 - We approve the final version of the deployment notice, which advises customers of a brief service interruption to occur 30 minutes later. Support Operations posts the notice on our status.io page.

2022-10-20 19:34:30 - Service interruption begins.

2022-10-20 19:37 - Bugfix deployment is completed.

2022-10-20 19:40:35 - Normal service resumes.

2022-10-20 21:10 - We re-attempt issuance for the 188 certificate requests that had been blocked by the new ZLint Key Usage lint since the deployment of the updated version of cert_producer earlier in the day. All 188 are issued successfully, with correctly formed Key Usage extensions.

2022-10-20 22:17:05 - R&D completes implementation of the script mentioned in comment 0 that will identify all certificates that have been misissued due to the unsetBits bug.

2022-10-20 22:18 - We set the script running on our production CA database.

2022-10-21 09:10 - Incident Response team commences review of the first draft of comment 0.

2022-10-21 15:24 - I post comment 0.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.

Prior to resolving the problem, we did not take steps to stop all certificate issuance from the affected parts of our PKI. Instead, as described in the timeline, we opted to respond rapidly in a manner that would block further misissuance without causing a lengthy service disruption for our customers.

4 & 5. A summary of, and the complete certificate data for, the problematic certificates.

We had hoped to provide full details of the affected certificates in this incident report, but at the time of writing the script mentioned in comment 0 is still running, and so we have not yet finished identifying the problematic certificates. We will provide a summary, and the complete certificate data, as soon as we can.

The script is scanning every publicly trusted certificate and precertificate ever issued by our CA system, finding every instance of the byte sequence 0x040403020580, which we've determined is the only malformed DER Key Usage BITSTRING that will have been produced due to the bug described in this incident report. The runtime is measured in weeks simply because we have issued a huge number of certificates and precertificates. Selectively scanning the issued certificates from Sub-CAs we knew to be affected would have been quicker, but we wanted to do an exhaustive scan that will prove or disprove our beliefs and assumptions about the scope of impact.
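As an illustration of the byte-sequence scan described above, here is a minimal Python sketch. It is not the actual script: the names `MALFORMED_KEY_USAGE`, `is_affected`, and `affected_serials`, and the assumption that records arrive as (serial number, DER bytes) pairs, are hypothetical. A raw substring match could in principle also hit the same six bytes inside an unrelated field, and the report does not say whether the real script parses the certificate any further.

```python
from typing import Iterable, Iterator, Tuple

# OCTET STRING (04 04) wrapping a BIT STRING (03 02) whose 'unused bits'
# octet says 5 although the content octet 0x80 has 7 trailing zero bits.
MALFORMED_KEY_USAGE = bytes.fromhex("040403020580")


def is_affected(cert_der: bytes) -> bool:
    """Return True if the DER blob contains the malformed Key Usage bytes."""
    return MALFORMED_KEY_USAGE in cert_der


def affected_serials(records: Iterable[Tuple[str, bytes]]) -> Iterator[str]:
    """Yield the serial number of every (serial, der) record that matches."""
    for serial, der in records:
        if is_affected(der):
            yield serial
```

Streaming the records through a generator keeps memory use flat, which matters when the input is every certificate and precertificate in a large database.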

Meanwhile, to meet Mozilla's timeliness expectations for CA incident reports, we are providing the other sections of this incident report today.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

A feature/oddity of our ASN.1 BITSTRING code is that the 'unused bits' octet is managed by the ASN.1 handling code rather than by the DER encoder. The Comodo R&D engineer who created the unsetBits function did not realise that the function would therefore need to recalculate the value of this octet.
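To make this class of bug concrete, here is a minimal Python sketch. It is not Sectigo's code: the function names and the in-memory layout (one 'unused bits' octet followed by the content octets) are assumptions chosen for illustration. It clears the keyEncipherment bit from a Key Usage value of digitalSignature plus keyEncipherment, first without and then with recomputing the 'unused bits' octet.

```python
KEY_ENCIPHERMENT = 2  # named bit 2 of KeyUsage in RFC 5280


def unset_bit_buggy(value: bytes, bit: int) -> bytes:
    """Clear a bit but leave the leading 'unused bits' octet untouched."""
    out = bytearray(value)
    index = 1 + bit // 8
    if index < len(out):
        out[index] &= ~(0x80 >> (bit % 8)) & 0xFF
    return bytes(out)


def unset_bit_fixed(value: bytes, bit: int) -> bytes:
    """Clear a bit, then drop trailing zeros and recompute 'unused bits'."""
    content = bytearray(unset_bit_buggy(value, bit)[1:])
    while content and content[-1] == 0:
        content.pop()  # DER: no trailing all-zero octets in a named-bit list
    if not content:
        return b"\x00"  # empty bit string: zero unused bits, no content
    unused = 0
    while not (content[-1] >> unused) & 1:
        unused += 1  # count trailing zero bits in the last octet
    return bytes([unused]) + bytes(content)


def key_usage_extn_value(value: bytes) -> bytes:
    """Wrap the value as BIT STRING inside OCTET STRING (short lengths only)."""
    inner = b"\x03" + bytes([len(value)]) + value
    return b"\x04" + bytes([len(inner)]) + inner


original = bytes([0x05, 0xA0])  # digitalSignature + keyEncipherment
print(key_usage_extn_value(unset_bit_buggy(original, KEY_ENCIPHERMENT)).hex())
# 040403020580 (the malformed sequence)
print(key_usage_extn_value(unset_bit_fixed(original, KEY_ENCIPHERMENT)).hex())
# 040403020780 (correct DER)
```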

This problem went undetected for over eight and a half years. During that time, our suite of unit tests grew extensively, but no test was added or even conceived that would have detected the problem; no customers, relying parties, or security researchers reported the problem to us; and none of our preissuance linting tools detected the problem, prior to ZLint v3.4.0.
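For illustration, a unit test of the kind that could catch this class of error might assert the DER rules for a named-bit BIT STRING, which under X.690 must have all trailing zero bits removed. The sketch below is hypothetical: `check_named_bit_string` is not part of ZLint or of Sectigo's test suite, and it handles short-form lengths only.

```python
from typing import List


def check_named_bit_string(der: bytes) -> List[str]:
    """Return a list of DER problems found in an encoded named-bit BIT STRING."""
    if len(der) < 3 or der[0] != 0x03:
        return ["not a BIT STRING with an 'unused bits' octet"]
    if der[1] != len(der) - 2:
        return ["length octet does not match the content length"]
    unused, content = der[2], der[3:]
    problems = []
    if unused > 7:
        problems.append("'unused bits' must be in the range 0..7")
    if not content:
        if unused != 0:
            problems.append("empty bit string must declare 0 unused bits")
        return problems
    last = content[-1]
    if last == 0:
        problems.append("trailing all-zero octet was not removed")
    elif unused <= 7:
        if last & ((1 << unused) - 1):
            problems.append("bits declared unused are not all zero")
        if not (last >> unused) & 1:
            problems.append("trailing zero bits were not removed")
    return problems


print(check_named_bit_string(bytes.fromhex("03020580")))
# ['trailing zero bits were not removed']
print(check_named_bit_string(bytes.fromhex("03020780")))
# []
```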

We are aware of two other recent CA incident bugs that cover different Key Usage encoding errors that, like this Sectigo bug, went undetected for significant periods of time.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

Remediation of the CA system bug was planned and completed on the day of discovery, as detailed in the timeline.

(In reply to Ryan Dickson from comment #5)

@rob, sorry for the delayed response.

Yes - any information on linting the unsigned TBSCertificate instead of a fully-formed certificate (or pre-certificate) would be helpful. I intend to spend more time studying linting projects like ZLint in the coming weeks, so at the least, this will be helpful to me personally (and I suspect, others following along here on Bugzilla). Thanks for your offer!

Hi Ryan. I've just filed https://github.com/zmap/zlint/pull/697, which demonstrates our approach to pre-issuance linting by proposing 2 new ZLint API functions that, if adopted by the ZLint project, would make it a lot easier for other CAs to do pre-issuance linting properly using ZLint.

I first implemented this approach in 2017 in the open-source code for the https://crt.sh/linttbscert tool.

Hope this helps!
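This thread does not spell out the mechanics of the approach in the pull request. As a general illustration only, one way to feed an unsigned TBSCertificate to a linter that expects a complete certificate is to wrap it in a Certificate structure with a placeholder signature. The Python sketch below shows that idea; `wrap_tbs` and its helpers are hypothetical names, the placeholder is not a valid signature, and this may differ from what the pull request or Sectigo's platform actually does.

```python
from typing import Tuple


def read_tlv(buf: bytes, pos: int) -> Tuple[int, int, int]:
    """Return (tag, content_start, content_end) of the DER TLV at pos."""
    tag, first = buf[pos], buf[pos + 1]
    if first < 0x80:
        return tag, pos + 2, pos + 2 + first
    n = first & 0x7F  # long form: next n octets hold the length
    length = int.from_bytes(buf[pos + 2 : pos + 2 + n], "big")
    return tag, pos + 2 + n, pos + 2 + n + length


def der_len(n: int) -> bytes:
    """Encode a DER length in short or long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def wrap_tbs(tbs: bytes, dummy_sig: bytes = b"\x00" * 8) -> bytes:
    """Build Certificate ::= SEQUENCE { tbs, signatureAlgorithm, dummy sig }."""
    tag, start, end = read_tlv(tbs, 0)
    if tag != 0x30 or end != len(tbs):
        raise ValueError("input is not a single DER SEQUENCE")
    pos = start
    tag, _, end = read_tlv(tbs, pos)
    if tag == 0xA0:  # optional [0] EXPLICIT version
        pos = end
        tag, _, end = read_tlv(tbs, pos)
    if tag != 0x02:
        raise ValueError("expected serialNumber INTEGER")
    pos = end
    tag, _, end = read_tlv(tbs, pos)
    if tag != 0x30:
        raise ValueError("expected signature AlgorithmIdentifier")
    alg_id = tbs[pos:end]  # reuse the inner algorithm as the outer one
    sig = b"\x03" + der_len(len(dummy_sig) + 1) + b"\x00" + dummy_sig
    body = tbs + alg_id + sig
    return b"\x30" + der_len(len(body)) + body
```

Copying the AlgorithmIdentifier from inside the TBSCertificate keeps the inner and outer algorithm fields identical, which RFC 5280 requires of a real certificate.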

The script mentioned in comment 0 has finished running. There are 322,161 unique serial numbers in the currently unexpired certificates/precertificates that are affected. The first was issued on 2021-10-11, and the last was issued on 2022-10-20. The latest notAfter date is 2023-11-19.

To work around Bugzilla's maximum attachment size (10MB), I have split the list into four compressed CSV files. Each entry has 3 fields:

  • Serial Number crt.sh URL
  • Certificate crt.sh URL
  • Precertificate crt.sh URL

Product: NSS → CA Program
See Also: → 1800756

The Sectigo WebPKI Incident Response team has considered https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation and has concluded that we will not revoke the affected certificates mentioned in this bug within the usual 5-day window. As a result, we have opened bug 1800756 to explain how and why we came to this decision.

Ben, since there appear to be no further questions or comments relating to this misissuance incident, we propose that this bug should now be closed.

Flags: needinfo?(bwilson)

I will close this on or about Monday, 28-Nov-2022, unless there are additional questions or issues to be discussed.

Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED