Closed Bug 1752452 Opened 2 years ago Closed 2 years ago

Certainly: TLS Using ALPN TLS Version and OID

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wthayer, Assigned: wthayer)

Details

(Whiteboard: [ca-compliance] [dv-misissuance])

Attachments

(18 files)

9.44 MB, application/octet-stream
9.45 MB, application/octet-stream
9.44 MB, application/octet-stream
9.44 MB, application/octet-stream
9.45 MB, application/octet-stream
9.44 MB, application/octet-stream
9.44 MB, application/octet-stream
9.45 MB, application/octet-stream
9.44 MB, application/octet-stream
9.46 MB, application/octet-stream
9.45 MB, application/octet-stream
9.46 MB, application/octet-stream
9.45 MB, application/octet-stream
9.26 MB, application/octet-stream
9.44 MB, application/octet-stream
9.44 MB, application/octet-stream
9.45 MB, application/octet-stream
7.46 MB, application/octet-stream

On 25 January 2022, Certainly was notified that Let’s Encrypt had just filed bug #1751984. Certainly personnel reviewed the preliminary report and determined that Certainly was vulnerable to the same issue. Certainly disabled validations using the impacted TLS-ALPN-01 method and revoked all certificates whose validation relied on this method. The preliminary timeline is:

  • 1/25 19:40 UTC Let’s Encrypt publishes https://bugzilla.mozilla.org/show_bug.cgi?id=1751984
  • 1/25 20:51 UTC Certainly declares an incident after assessing that we are likely also affected
  • 1/25 22:30 UTC Change deployed to disable TLS-ALPN-01 validations
  • 1/26 20:30 UTC All affected certificates issued to this point are revoked
  • 1/26 21:23 to 21:40 UTC 8 additional certificates issued that relied on prior TLS-ALPN-01 validations
  • 1/26 21:40 UTC 8 additional certificates revoked

A full incident report is being prepared.

Assignee: bwilson → wthayer
Status: NEW → ASSIGNED
Whiteboard: [ca-compliance]

Please answer the questions below:

  1. Why did Fastly fail to revoke the problematic certificates within 24 hours (that is, before 1/26 20:51)?
  2. Please disclose a list of the problematic certificates with crt.sh links.
Flags: needinfo?(wthayer)
Attached file part1.tar.gz
Attached file part2.tar.gz
Attached file part3.tar.gz
Attached file part4.tar.gz
Attached file part5.tar.gz
Attached file part6.tar.gz
Attached file part7.tar.gz
Attached file part8.tar.gz
Attached file part9.tar.gz
Attached file part10.tar.gz
Attached file part11.tar.gz
Attached file part12.tar.gz
Attached file part13.tar.gz
Attached file part14.tar.gz
Attached file part15.tar.gz
Attached file part16.tar.gz
Attached file part17.tar.gz
Attached file part18.tar.gz

Incident Report

1. How your CA first became aware of the problem.

Certainly continuously monitors Let’s Encrypt status notifications and Mozilla bugs filed in the NSS/CA Certificate Compliance component. We received an email when Let’s Encrypt created https://bugzilla.mozilla.org/show_bug.cgi?id=1751984.

2. A timeline of the actions your CA took in response.

1/25 19:40 UTC Let’s Encrypt publishes https://bugzilla.mozilla.org/show_bug.cgi?id=1751984.
1/25 20:51 UTC Certainly declares an incident after assessing that we are likely also affected.
1/25 22:30 UTC Change deployed within Boulder to disable new TLS-ALPN-01 validations. Issuance testing disabled.
1/26 19:00 UTC Boulder release 2022-01-18a containing TLS-ALPN-01 bug fixes is deployed.
1/26 20:30 UTC All affected certificates are revoked.
1/26 21:23 UTC Issuance testing re-enabled.
1/26 21:23 to 21:26 UTC 8 additional certificates issued that relied on prior TLS-ALPN-01 validations.
1/26 21:26 UTC Issuance testing disabled again.
1/26 21:40 UTC Existing TLS-ALPN-01 authorizations revoked and 8 additional certificates revoked.
1/26 21:41 UTC Issuance testing re-enabled.
1/27 19:51 UTC TLS-ALPN-01 validation method re-enabled.
1/28 00:08 UTC Preliminary incident report posted to Bugzilla.
2/2 20:30 UTC Incident remediation plan discussed and finalized during weekly compliance review meeting.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.

The performance of domain validations using the TLS-ALPN-01 method was halted on 1/25 at 22:30 UTC. After revocations of certificates were completed, but prior to revocation of pre-existing authorizations, 8 new certificates were issued using pre-existing authorizations between 1/26 21:23 and 21:26 UTC. Pre-existing authorizations using this method were revoked on 1/26 at 21:40 UTC and the 8 new certificates were then also revoked.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

Certainly identified a total of 337,621 unexpired certificates that relied on TLS-ALPN-01 validations, issued over the prior 90 days. Since Boulder does not log the version of the TLS connection that was used during the validation, we cannot prove that all the validations were compliant.
All of these certificates were issued in the course of ongoing system testing that was configured to perform validations using the affected TLS-ALPN-01 method. Certainly is not currently issuing certificates to third-party Applicants.
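
To illustrate why the missing TLS version in the validation logs matters, here is a minimal sketch, not Boulder code. It assumes the requirement at issue is that TLS-ALPN-01 validation connections negotiate TLS 1.2 or higher, as RFC 8737 specifies. The record layout and the `tls_version` field name are hypothetical; the version strings follow the format returned by Python's `ssl.SSLSocket.version()`.

```python
# Hypothetical validation-log check; the "tls_version" field is an assumption.
TLS_VERSIONS_OLDEST_FIRST = ["SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2", "TLSv1.3"]
MINIMUM_TLS_VERSION = "TLSv1.2"  # assumed minimum, per RFC 8737


def classify_validation(record):
    """Classify one logged validation as compliant, non-compliant, or unknown."""
    version = record.get("tls_version")
    if version not in TLS_VERSIONS_OLDEST_FIRST:
        # No (or unrecognized) version was logged, so compliance cannot be proven.
        return "unknown"
    rank = TLS_VERSIONS_OLDEST_FIRST.index
    if rank(version) >= rank(MINIMUM_TLS_VERSION):
        return "compliant"
    return "non-compliant"


if __name__ == "__main__":
    for record in ({"tls_version": "TLSv1.3"}, {"tls_version": "TLSv1.1"}, {}):
        print(record, "->", classify_validation(record))
```

A record without a logged version falls into the "unknown" bucket, which corresponds to the situation described above: every certificate relying on such a validation has to be treated as affected.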

5. In a case involving certificates, the complete certificate data for the problematic certificates.

Certainly is not currently able to log to production CT logs because our certificates are not yet trusted by any browser. We are in the process of configuring our system to log to the special-purpose Google Submariner log that recently accepted our roots, but the affected certificates have not been logged there. In lieu of providing links to logged certificates, files containing the PEM-encoded certificates are attached to this bug.
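
For readers who want to tally the attached certificates, the following is a rough sketch only. It assumes each attached `.tar.gz` archive contains plain text files holding one or more PEM blocks; the internal layout of the archives is not described in this bug, and the function names are illustrative.

```python
import re
import tarfile

# Matches one PEM certificate block, including its BEGIN/END markers.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_certificates(text):
    """Return every PEM certificate block found in text, in order."""
    return PEM_BLOCK.findall(text)


def count_certificates_in_archive(path):
    """Count PEM certificates across all regular files in a .tar.gz archive."""
    total = 0
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and links
            data = archive.extractfile(member).read()
            total += len(split_pem_certificates(data.decode("ascii", errors="replace")))
    return total
```

Calling `count_certificates_in_archive("part1.tar.gz")` on each downloaded attachment and summing the results would give a total to compare against the figure stated in the report, provided the layout assumption holds.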

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Certainly has implemented the open-source Boulder CA. This incident was caused by bugs in the Boulder code as described in https://bugzilla.mozilla.org/show_bug.cgi?id=1751984. Please refer to that bug for a truly excellent, detailed root cause analysis.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

Certainly made a conscious decision to implement the Boulder CA system. It is open-source, developed by a highly experienced team, and heavily scrutinized. We fundamentally believe that the choice of Boulder enhances the trustworthiness of Certainly, despite this incident.

Certainly monitors policy changes to ensure that we remain fully compliant with the Mozilla root store policy. We will continue to closely review and test changes to Boulder that may affect our compliance with Mozilla policy. For each new Boulder release, Certainly takes reasonable steps to review the impact of changes to code and configurations. We also review policy updates and change requests related to Boulder that may help us to proactively identify compliance concerns.

We disabled the TLS-ALPN-01 validation method roughly 100 minutes after declaring an incident, but did not revoke existing authorizations until about 25 hours after the declaration. We have now documented a procedure for revoking existing validations and are planning to document a faster procedure that will allow us to halt issuance in a matter of minutes.
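
The elapsed times quoted above can be checked against the timeline in section 2 with a short calculation. This is an illustrative sketch: the timestamps are copied from the timeline, while the labels and helper names are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def utc(month, day, hour, minute):
    """Build a 2022 UTC timestamp from the timeline's month/day and time."""
    return datetime(2022, month, day, hour, minute, tzinfo=timezone.utc)


INCIDENT_DECLARED = utc(1, 25, 20, 51)
MILESTONES = {
    "TLS-ALPN-01 validations disabled": utc(1, 25, 22, 30),
    "affected certificates revoked": utc(1, 26, 20, 30),
    "existing authorizations revoked": utc(1, 26, 21, 40),
}


def elapsed_since_declaration():
    """Map each milestone to the time elapsed since the incident was declared."""
    return {label: when - INCIDENT_DECLARED for label, when in MILESTONES.items()}


def format_delta(delta):
    """Render a timedelta as hours and minutes."""
    hours, minutes = divmod(int(delta.total_seconds() // 60), 60)
    return f"{hours}h {minutes:02d}m"


if __name__ == "__main__":
    for label, delta in elapsed_since_declaration().items():
        side = "within" if delta <= timedelta(hours=24) else "beyond"
        print(f"{label}: {format_delta(delta)} ({side} 24 hours)")
```

The results are 1h 39m for disabling validations, 23h 39m for the first round of certificate revocations, and 24h 49m for revoking existing authorizations, consistent with the "roughly 100 minutes" and "about 25 hours" figures.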

When we disabled the TLS-ALPN-01 method, we failed to also revoke existing TLS-ALPN-01 authorizations, which was necessary to fully prevent issuance based on untrusted authorizations. We had disabled our issuance testing tooling when we disabled the TLS-ALPN-01 validation method. After deploying the Boulder patch that fixed the bugs that could have allowed misissuance to occur, we re-enabled the issuance testing tooling before revoking the unexpired authorizations. As a result, 8 new certificates were issued that relied on authorizations performed prior to the patch deployment, in the window before those authorizations were revoked. We have documented the procedure for revoking existing authorizations when disabling a validation method to prevent this issue from occurring in the future.
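
The gap described above can be shown with a toy model. This is not Boulder's API or data model; the class, method, and field names are hypothetical and exist only to show why disabling a validation method does not, by itself, stop issuance that reuses earlier authorizations.

```python
from dataclasses import dataclass, field


@dataclass
class Authorization:
    domain: str
    method: str
    revoked: bool = False


@dataclass
class ToyCA:
    """Hypothetical CA model: issuance only needs an unrevoked authorization."""
    disabled_methods: set = field(default_factory=set)
    authorizations: list = field(default_factory=list)

    def validate(self, domain, method):
        if method in self.disabled_methods:
            raise RuntimeError(f"{method} validations are disabled")
        authz = Authorization(domain, method)
        self.authorizations.append(authz)
        return authz

    def can_issue(self, domain):
        # Reuses any earlier authorization that has not been revoked.
        return any(a.domain == domain and not a.revoked for a in self.authorizations)

    def disable_method(self, method, revoke_existing):
        self.disabled_methods.add(method)
        if revoke_existing:
            for authz in self.authorizations:
                if authz.method == method:
                    authz.revoked = True


if __name__ == "__main__":
    for revoke_existing in (False, True):
        ca = ToyCA()
        ca.validate("example.test", "tls-alpn-01")
        ca.disable_method("tls-alpn-01", revoke_existing=revoke_existing)
        print(f"revoke_existing={revoke_existing}: can_issue={ca.can_issue('example.test')}")
```

With `revoke_existing=False` the model still allows issuance from the earlier authorization, which mirrors how the 8 additional certificates came to be issued; revoking authorizations in the same step closes that window.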

(In reply to Charles Wang from comment #1)

Please answer the questions below:

  1. Why did Fastly fail to revoke the problematic certificates within 24 hours (that is, before 1/26 20:51)?

As the timeline shows, all of the problematic certificates in existence prior to 1/26 20:51 UTC were revoked at 1/26 20:30 UTC, within 24 hours of declaring an incident. The additional 8 certificates could only be revoked after they were issued. I hope that the full incident report helps to explain what happened.

  2. Please disclose a list of the problematic certificates with crt.sh links.

They have all been attached to this bug, and the full incident report explains why crt.sh links were not provided.

Flags: needinfo?(wthayer)

Certainly has completed our remediation of this incident. We are monitoring this bug and will respond to any new comments.

I will close this bug next Wednesday, 16-Feb-2022, unless there are additional questions or concerns.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [dv-misissuance]