Closed Bug 1742704 Opened 3 years ago Closed 3 years ago

Let's Encrypt: Potential Denial of Service against websites with broad private key reuse

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jcj, Assigned: bwilson)

Details

(Keywords: sec-other, Whiteboard: [ca-compliance] [uncategorized])

Summary

On 2021.10.14 we discovered a bug in our ACME implementation that affected certificate revocation. Within eight hours of discovering the bug, we had a patch deployed, but we understand this bug affects other ACME-implementing certificate authorities, and possibly non-ACME certificate authorities as well.

We immediately provided a summary of the bug to the CA/B Forum Management list as a first step in responsible disclosure. For completeness and posterity, we are filing this bug, although we consider this to have more of a software security impact via denial of service than a compliance incident.

The Bug (as sent to the CABForum Management list)

In ACME, anyone can request revocation of a certificate by demonstrating control over any one of the following:

  1. the private key used for that certificate
  2. the issuing ACME account key
  3. all of the FQDNs in the certificate

However, when the specified reason for the revocation order is "keyCompromise", demonstrating control of the private key should have been the only acceptable method. The "keyCompromise" revocation reason triggers a cascading revocation of all other certificates based on the Subject Public Key Info (SPKI) hash. Because of the bug, the revoker actually only proved control of the FQDNs in the certificate. We discovered this bug during an investigation with a subscriber when we revoked their certificates because of the “keyCompromise” reason and they were (rightfully) unable to find the source of the compromise.
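The intended rule can be sketched as a small authorization check. This is a minimal illustration under stated assumptions, not Let's Encrypt's actual code: the `Proof` enum and the `revocation_allowed` function are hypothetical names, and validation of which reason codes a CA accepts at all is omitted.

```python
from enum import Enum


class Proof(Enum):
    """How the requester authenticated the revocation request (hypothetical model)."""
    CERTIFICATE_KEY = "private key of the certificate"
    ISSUING_ACCOUNT = "issuing ACME account key"
    DOMAIN_CONTROL = "control of all FQDNs in the certificate"


# CRL reason code for keyCompromise (RFC 5280, section 5.3.1).
KEY_COMPROMISE = 1


def revocation_allowed(proof: Proof, reason_code: int) -> bool:
    """Return True if a request authenticated by `proof` may use `reason_code`.

    Any of the three proofs may revoke the certificate itself, but only
    possession of the certificate's private key may assert keyCompromise,
    because that reason cascades to every certificate sharing the key.
    """
    if reason_code == KEY_COMPROMISE:
        return proof is Proof.CERTIFICATE_KEY
    return True


# The incident scenario: the requester proved control of the FQDNs only.
print(revocation_allowed(Proof.DOMAIN_CONTROL, KEY_COMPROMISE))   # False
print(revocation_allowed(Proof.CERTIFICATE_KEY, KEY_COMPROMISE))  # True
print(revocation_allowed(Proof.DOMAIN_CONTROL, 0))                # True (unspecified)
```

Under this sketch the request that triggered the incident would have been refused with "keyCompromise", while the same requester could still have revoked the single certificate with another reason.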

This bug could be used maliciously against a deficient ACME implementation by requesting revocation using the erroneous controls related to the “keyCompromise” reason for a certificate that is known to use a shared private key, thus triggering a massive revocation of all certificates associated with that key. An attacker could be a disgruntled employee or an outside attacker who recognized the large number of certificates attached to one private key by searching in Censys or crt.sh by SPKI hash.
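Operators can estimate their own exposure to this kind of cascade by counting how many of their certificates share one key. The following is a rough sketch, assuming the SPKI hash is SHA-256 over the DER-encoded SubjectPublicKeyInfo and that the SPKI bytes have already been extracted from each certificate; the `inventory` mapping, the serial strings, and the placeholder byte strings are all hypothetical.

```python
import hashlib
from collections import Counter


def spki_hash(spki_der: bytes) -> str:
    """Hex SHA-256 over DER-encoded SubjectPublicKeyInfo bytes (assumed hash)."""
    return hashlib.sha256(spki_der).hexdigest()


def certs_per_key(inventory: dict[str, bytes]) -> Counter:
    """Count certificates per SPKI hash; `inventory` maps serial -> SPKI bytes."""
    return Counter(spki_hash(spki) for spki in inventory.values())


# Placeholder bytes stand in for real DER-encoded SPKI structures.
reused_key = b"placeholder SPKI for a widely reused key"
inventory = {
    "serial-1": reused_key,
    "serial-2": reused_key,
    "serial-3": b"placeholder SPKI for a dedicated key",
}
key_hash, count = certs_per_key(inventory).most_common(1)[0]
print(count)  # 2
```

A high count for a single hash marks a key whose compromise, or mistaken revocation for compromise, would take down many certificates at once.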

Incident Report

1. How your CA first became aware of the problem.

An ACME subscriber revoked a certificate by demonstrating control over all the FQDNs in the certificate, and selected the reason ‘keyCompromise’. As a result, our automated systems attempted to revoke all the certificates that shared its SPKI hash. The SPKI hash matched ~130,000 certificates, and the automated revocation put enough load on our systems to trigger alerts. Our SRE team's investigation of those alerts eventually led to this finding.
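The cascade described above can be modeled in a few lines. This is a simplified sketch, not the real bad-key-revoker: the `Cert` record, the `cascade_revoke` function, and the in-memory list are hypothetical stand-ins for a certificate database, and the hash is assumed to be SHA-256 over the SPKI bytes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Cert:
    serial: str
    spki_der: bytes  # placeholder for DER-encoded SubjectPublicKeyInfo
    revoked: bool = False


def cascade_revoke(blocked_spki_der: bytes, certs: list[Cert]) -> list[str]:
    """Revoke every unrevoked cert sharing the blocked key; return their serials."""
    blocked = hashlib.sha256(blocked_spki_der).digest()
    newly_revoked = []
    for cert in certs:
        if not cert.revoked and hashlib.sha256(cert.spki_der).digest() == blocked:
            cert.revoked = True
            newly_revoked.append(cert.serial)
    return newly_revoked


shared = b"placeholder SPKI for a widely reused key"
certs = [
    Cert("serial-1", shared, revoked=True),  # the certificate revoked by request
    Cert("serial-2", shared),                # same key, swept up by the cascade
    Cert("serial-3", b"placeholder SPKI for a different key"),
]
print(cascade_revoke(shared, certs))  # ['serial-2']
```

With ~130,000 matching certificates instead of one, a single accepted "keyCompromise" request fans out into the load this section describes.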

2. A timeline of the actions your CA took in response.

2021-09-13 15:59:28 - Certificate1 was issued to regID1 for fqdn1
2021-10-13 03:33:49 - regID2 was created.
2021-10-13 06:04:38 - Certificate2 was issued to regID2 covering fqdn1; i.e. that regID had achieved authorization for that FQDN.
2021-10-13 06:05:44 - Certificate1 was revoked by regID2 with reason keyCompromise.
2021-10-13 06:05:56 - bad-key-revoker, an automated tool that checks for follow-up work after a revocation for key-compromise, ingested the blocked keyHash and began to revoke all other certificates sharing that keyHash.
2021-10-13 15:30 - After hours of investigating resource problems related to bad-key-revoker running, we decided to manually revoke the certificates and notify the owner of regID1.
2021-10-13 20:33:19 - The owner of regID1 responded to our notification that we would be revoking the certificates associated with regID1
2021-10-14 10:04:39 - The owner of regID1 reported:

"From our first analysis in our [...] logs, only authorized personnel and instances have accessed the private key, which was stored exclusively in [controlled secret storage] as per best practices."

2021-10-14 16:27 - We identified that "we revoked the certificate we were handed because the revoker proved control of the domains within the cert, and did not have to prove control of the key of the certificate."
2021-10-14 17:30 - We confirmed this behavior using test accounts in the staging environment.
2021-10-14 17:30 - We began work on a hotfix.
2021-10-14 18:11 - We identified the relevance of CPS Section 4.9.12. Our staff agreed unanimously that stopping the revoke endpoint would be a policy violation more severe than the potential denial of service.
2021-10-14 21:39 - We merged the hotfix.
2021-10-14 22:36 - We deployed the hotfix release to staging.
2021-10-15 00:06 - We deployed the hotfix release to production.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.

We did not stop certificate issuance during this time.

4. In a case involving certificates, a summary of the problematic certificates.

N/A

5. In a case involving TLS server certificates, the complete certificate data for the problematic certificates.

N/A

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

2020-05-19 - We began running bad-key-revoker, a daemon that revokes certificates sharing an SPKI hash with a certificate revoked for keyCompromise. We introduced it in response to a surge in compromised-key reports that had required manual processing: e.g. Bug 1639794.

2021-07-19 - CPS 4.0 was published, changing Section 4.9.12 to its current language in CPS 4.1. This change was inspired by Ben Wilson of Mozilla's Comment 6 in the ISRG Root X2 inclusion request, Bug 1701317:

Clearly specify methods to demonstrate private key compromise (MRSP § 6)
Effective with version 2.7.1 of the Mozilla Root Store Policy, CAs are supposed to specify how parties can demonstrate key compromise to them in section 4.9.12 of their CPS.
Should be fixed. ISRG indicated in the April 2021 CA Survey "Section 4.9.12 of our CPS already specifies the methods that parties may use to demonstrate private key compromise." Section 4.9.12 of the CPS references section 4.9.3, which states "Revocation requests may be made by anyone, at any time, via the Certificate Revocation interface of the ACME Protocol defined in RFC 8555 section 7.6. Successful revocation requests with a reason code of keyCompromise will result in the affected key being blocked for future issuance and all currently valid certificates with that key will be revoked." It would be better if that explanation appeared in section 4.9.12 itself and not as a cross-reference. Also, I am not sure that the process provided by ISRG constitutes a demonstration of key compromise.

The MRSP states that:

Section 4.9.12 of a CA's CP/CPS MUST clearly specify the methods that parties may use to demonstrate private key compromise.

Our policy review team interpreted Section 4.9.12 as specifying how a reporter may demonstrate a key compromise, not as a limitation on how ISRG may accept a demonstration of a key compromise.

For example, Section 4.9.2 allows ISRG management to revoke a certificate at its discretion. The policy review team did not read 4.9.12 as a limitation of that discretion. Therefore the code path which allows a certificate to be revoked with reason keyCompromise despite key compromise not having been demonstrated was not removed as part of the policy review.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

  • Fix the immediate bug and deploy to production - DONE 2021-10-15 00:06
  • Notify the CA/B Forum management list of our discovery - DONE 2021-10-15 23:00
  • Hold an internal post-mortem to identify next steps - DONE 2021-10-25 21:00

Embargo

We have heard back from another certificate authority via the CA/Browser Forum that this should remain embargoed until 6 December 2021 at the earliest; I have set the whiteboard appropriately.

Keywords: sec-other
Whiteboard: [Embargo until no earlier than 6 December 2021] → [Embargo until until 12 December 2021]

This can be made publicly viewable now.

Status: NEW → ASSIGNED
Flags: needinfo?(dveditz)
Group: crypto-core-security
Flags: needinfo?(dveditz)
Whiteboard: [Embargo until until 12 December 2021]

I'm going to close this bug tomorrow, unless anyone has additional comments or questions.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] [uncategorized]
Summary: Potential Denial of Service against websites with broad private key reuse → Let's Encrypt: Potential Denial of Service against websites with broad private key reuse