Closed Bug 1795483 Opened 2 years ago Closed 2 years ago

Let's Encrypt: Delayed revocation for removed gTLD

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: james, Assigned: james)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Let's Encrypt has identified that we did not revoke three certificates within the 5 day timeline given by BRs Section 4.9.1.1. These certificates' subjects included names whose use became no longer permitted when their gTLD was later removed from the root zone. The affected certificates were:

https://crt.sh/?sha256=92bf51da84f363be1cd992e4fa3ebc6e16b4b1fdb9c9400f833cdd874f54ad17
https://crt.sh/?sha256=22886a189f72f040a2ec9a9bb500e9bdf013469b9f8660fd33b25b6570852750
https://crt.sh/?sha256=76ad2a24d6fc3c97db51a70c537701e7c91af12a579b50b228f845df664ce19e

We have now revoked these certificates. We expect to post a complete incident report by Wednesday, October 19, 2022.

Assignee: bwilson → james
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [delayed-revocation-leaf]

Summary

Let's Encrypt monitors the contents of the ICANN gTLD list and is notified when gTLDs are removed from the list. The .cancerresearch gTLD was removed on October 6, 2022, but Let's Encrypt did not revoke three unexpired certificates under that gTLD within 5 days. This appears to be a violation of BRs Section 4.9.1.1, which requires that "The CA...MUST revoke a Certificate within 5 days if...The CA is made aware of any circumstance indicating that use of a Fully-Qualified Domain Name or IP address in the Certificate is no longer legally permitted."

Incident Report

How we first became aware of the problem.

During investigation into an issuance event under the recently-removed .bugatti gTLD, Let's Encrypt SRE queried for issuances under other recently-removed gTLDs and discovered that there were still-valid certificates under the .cancerresearch gTLD, which had been removed more than 5 days previously.

Timeline of incident and actions taken in response.

All times are UTC.

2022-09-22

2022-10-05

  • The .cancerresearch gTLD is removed from the root zone.

2022-10-06

  • 19:03 Automation notifies SRE that the .cancerresearch gTLD was removed.

2022-10-07

  • The .bugatti gTLD is removed from the root zone.

2022-10-10

  • 19:00 Automation notifies SRE that the .bugatti gTLD was removed.

2022-10-11

  • 19:03 The 5-day window for revoking .cancerresearch certificates closes. (incident begins)

2022-10-12

2022-10-13

  • 19:00 SRE adds .bugatti and .cancerresearch to the list of "high-risk domains" to prevent future issuance.

2022-10-14

  • 16:22 SRE alerted due to recent issuance for a domain under recently removed gTLD .bugatti.
  • 16:23 On-call SRE acknowledges the alert and begins investigating.
  • 16:58 SRE determines that the issuance was based on pre-existing validation documents, and marks the certificate for revocation. This was a near-miss, as the .bugatti certificate was discovered within the five-day window for revocation under BRs 4.9.1.1.
  • 18:29 SRE identifies similar unexpired certificates under gTLD .cancerresearch.
  • 19:31 All affected certificates are revoked. (incident ends)
  • 20:24 SRE confirms no additional affected certificates under any other recently removed gTLDs.
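
The deadline arithmetic behind the timeline above can be checked with a short sketch. The timestamps are taken from the timeline; the function and variable names are illustrative only and are not part of any Let's Encrypt tooling.

```python
from datetime import datetime, timedelta, timezone

# BRs Section 4.9.1.1 gives the CA 5 days from the moment it is made aware.
REVOCATION_WINDOW = timedelta(days=5)


def revocation_deadline(made_aware_at: datetime) -> datetime:
    """Return the latest time at which revocation is still within the window."""
    return made_aware_at + REVOCATION_WINDOW


# Timestamps from the timeline (all UTC).
made_aware = datetime(2022, 10, 6, 19, 3, tzinfo=timezone.utc)   # automation notifies SRE
revoked = datetime(2022, 10, 14, 19, 31, tzinfo=timezone.utc)    # all certificates revoked

deadline = revocation_deadline(made_aware)
overdue = revoked - deadline

print(deadline.isoformat())  # 2022-10-11T19:03:00+00:00
print(overdue)               # 3 days, 0:28:00
```

Under these timestamps the certificates remained unrevoked for 3 days and 28 minutes past the deadline.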

Whether we have stopped the process giving rise to the problem or incident.

We have stopped issuance for names under the .cancerresearch gTLD. We have revoked the affected certificates.

Summary of the affected certificates.

Three certificates for names under the .cancerresearch gTLD were still valid (not revoked and not expired) at 2022-10-11 19:03, 5 days after Let's Encrypt was made aware that the .cancerresearch gTLD was no longer listed in ICANN's registry and thus the names could not be under the subscriber's control.

Complete certificate data for the affected certificates.

https://crt.sh/?sha256=92bf51da84f363be1cd992e4fa3ebc6e16b4b1fdb9c9400f833cdd874f54ad17
https://crt.sh/?sha256=22886a189f72f040a2ec9a9bb500e9bdf013469b9f8660fd33b25b6570852750
https://crt.sh/?sha256=76ad2a24d6fc3c97db51a70c537701e7c91af12a579b50b228f845df664ce19e

Explanation of how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

When a gTLD is removed from ICANN's list of gTLDs, an automated process produces an alert notifying Let's Encrypt SRE of this fact. The documented procedure for this alert involves adding the removed gTLD to our list of top-level domains for which issuance is forbidden, in order to prevent any future issuance under that gTLD based on cached validation documents.
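
As a rough illustration of how such a removal alert could work, the sketch below compares two snapshots of a TLD list and reports the entries that disappeared. It assumes a text format with one TLD per line and `#` comment lines; the snapshot contents and function names are made up for the example and do not describe Let's Encrypt's actual automation.

```python
def parse_tld_list(text: str) -> set[str]:
    """Parse a one-TLD-per-line list, skipping blank lines and '#' comments."""
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            tlds.add(line.lower())
    return tlds


def removed_tlds(previous: str, current: str) -> list[str]:
    """Return TLDs present in the previous snapshot but missing from the current one."""
    return sorted(parse_tld_list(previous) - parse_tld_list(current))


# Hypothetical snapshots, for illustration only.
previous_snapshot = "# older snapshot\nBUGATTI\nCANCERRESEARCH\nCOM\nORG\n"
current_snapshot = "# newer snapshot\nBUGATTI\nCOM\nORG\n"

for tld in removed_tlds(previous_snapshot, current_snapshot):
    # In a real system this would page the on-call engineer instead of printing.
    print(f"ALERT: gTLD .{tld} was removed from the list")
```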

In addition, Let's Encrypt has an automated process which (among other things) compares all issuance in the preceding 72 hours against the list of forbidden top-level domains, and fires an alert if any such issuance is detected. This alert almost never fires; in particular, it had never fired for this reason in the preceding year. This led the team to assume that, by the time a gTLD is removed, it has been inactive for a while and there are no extant certificates to be concerned about.

This incident shows that this is not the case. Increasing automation means that ACME clients are now sometimes left running indefinitely, as evidenced by the issuance for a .bugatti name (using cached validation documents) after the gTLD had been removed from the root zone.

Fortunately, our automation caught the issuance for the .bugatti name, and we were able to revoke that certificate in a timely fashion. Unfortunately, our automation did not make us aware of the extant .cancerresearch certificates. This is because the job which checks for issuance against the high-risk domains list runs every day and scans only the preceding 72 hours' worth of issuance. Even if the .cancerresearch gTLD had been added to the high-risk domains list immediately, the last issuance under that gTLD was more than 72 hours earlier and would not have been scanned.
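
The blind spot described above can be reproduced with a minimal sketch of a windowed scan. The record type, the names, and the issuance timestamps are hypothetical; only the 72-hour window and the two gTLDs come from this report.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Issuance(NamedTuple):
    name: str
    issued_at: datetime


SCAN_WINDOW = timedelta(hours=72)


def under_forbidden_tld(name: str, forbidden: set[str]) -> bool:
    """True if the rightmost label of the name is on the forbidden list."""
    return name.lower().rstrip(".").rsplit(".", 1)[-1] in forbidden


def recent_forbidden_issuance(issuances, forbidden, now):
    """Flag forbidden-TLD issuance, but only within the scan window."""
    cutoff = now - SCAN_WINDOW
    return [i for i in issuances
            if i.issued_at >= cutoff and under_forbidden_tld(i.name, forbidden)]


forbidden = {"bugatti", "cancerresearch"}
now = datetime(2022, 10, 14, 16, 22, tzinfo=timezone.utc)
issuances = [
    # Hypothetical timestamps: one recent issuance, one well outside the window.
    Issuance("example.bugatti", now - timedelta(hours=1)),
    Issuance("example.cancerresearch", now - timedelta(days=30)),
]

flagged = recent_forbidden_issuance(issuances, forbidden, now)
print([i.name for i in flagged])  # ['example.bugatti']
```

The older certificate is never flagged, no matter when its gTLD is added to the forbidden list, because the scan is bounded by issuance time rather than by certificate validity.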

The root cause here is that our automation was not designed to detect previously issued certificates for recently removed gTLDs, and our human processes did not document the need to search for such certificates.

List of steps we are taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

We have already revoked the certificates in question. We will update our documentation for how to handle the gTLD-removal alert to include searching for and administratively revoking any unexpired certificates under that gTLD. We will also raise the priority of this alert to ensure that it is handled quickly, reducing the window in which certificates for a recently removed gTLD can be issued or remain valid.
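
The search step added to the runbook could look something like the following sketch, which selects every certificate that is still valid (not revoked and not expired) and contains a name under the removed gTLD. The `Certificate` record and the sample data are assumptions made for illustration; they do not reflect Let's Encrypt's database schema or tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Certificate:
    serial: str                # hypothetical identifier
    dns_names: list[str]
    not_after: datetime
    revoked: bool = False


def certs_to_revoke(certs, removed_tld: str, now: datetime):
    """Return unexpired, unrevoked certificates with any name under removed_tld."""
    suffix = "." + removed_tld.lower()
    return [
        c for c in certs
        if not c.revoked
        and c.not_after > now
        and any(n.lower().rstrip(".").endswith(suffix) for n in c.dns_names)
    ]


now = datetime(2022, 10, 14, 18, 29, tzinfo=timezone.utc)
certs = [
    # Made-up sample records.
    Certificate("01", ["example.cancerresearch"], datetime(2022, 12, 1, tzinfo=timezone.utc)),
    Certificate("02", ["old.cancerresearch"], datetime(2022, 10, 1, tzinfo=timezone.utc)),
    Certificate("03", ["example.com"], datetime(2022, 12, 1, tzinfo=timezone.utc)),
    Certificate("04", ["gone.cancerresearch"], datetime(2022, 12, 1, tzinfo=timezone.utc), revoked=True),
]

print([c.serial for c in certs_to_revoke(certs, "cancerresearch", now)])  # ['01']
```

Unlike the windowed scan, this selection is keyed on validity rather than issuance time, so it also finds certificates issued well before the gTLD was removed.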

It is important to note that, due to the phrasing of the Baseline Requirements Section 4.9.1.1, one possible remediation item would be to halt the automation which makes us aware of gTLD removals (emphasis added):

The CA [...] MUST revoke a Certificate within 5 days if [...] The CA is made aware of any circumstance indicating that use of a Fully-Qualified Domain Name or IP address in the Certificate is no longer legally permitted (e.g. a court or arbitrator has revoked a Domain Name Registrant's right to use the Domain Name, a relevant licensing or services agreement between the Domain Name Registrant and the Applicant has terminated, or the Domain Name Registrant has failed to renew the Domain Name);

That would obviously be a net negative for the Web PKI, so we are not taking that route. However, we think it is valuable to open a conversation about the exact intent and meaning of this requirement, whether the removal of a gTLD from the root zone qualifies as a revocation event under this clause, and how the phrasing can be improved to avoid placing additional restrictions on CAs which choose to proactively monitor for such events.

Remediation | Status | Date
Revoke remaining valid certificates under removed gTLDs | Complete | 2022-10-14
Extend the gTLD removal alert runbook to include revoking any remaining valid certificates | Not started | 2022-11-23
Improve the gTLD removal alert runbook to highlight required response time | Not started | 2022-11-23

We're continuing to work on remediation items for this incident, and are monitoring this bug for any questions or comments.

Same status: We're continuing to work on remediation items for this incident, and are monitoring this bug for any questions or comments. We'd be curious to hear other Web PKI participants' thoughts about the intent and meaning of BRs Section 4.9.1.1.

Same status: We're continuing to work on remediation items for this incident, and are monitoring this bug for any questions or comments. We do expect to complete our remediation items shortly.

Product: NSS → CA Program

We're continuing to work on remediation items for this incident, and are monitoring this bug for any questions or comments. We still expect to complete our remediation items shortly, by our target date of 2022-11-23.

We finished updating our gTLD removal alert runbook today, 2022-11-23, to include revoking any remaining valid certificates and highlight required response time.

This concludes our committed remediation items. We do not intend to supply any further updates on this issue, and ask that it be closed. We will monitor it for questions and comments.

I will take a look at closing this on Wed. 30-Nov-2022, unless there are issues or questions to be discussed.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Whiteboard: [ca-compliance] [delayed-revocation-leaf] → [ca-compliance] [leaf-revocation-delay]