Let's Encrypt: Potential Denial of Service against websites with broad private key reuse
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: jcj, Assigned: bwilson)
Details
(Keywords: sec-other, Whiteboard: [ca-compliance] [uncategorized])
Summary
On 2021-10-14 we discovered a bug in our ACME implementation that affected certificate revocation. We deployed a patch within eight hours of discovering the bug, but we understand that it affects other ACME-implementing certificate authorities, and possibly non-ACME certificate authorities as well.
We immediately provided a summary of the bug to the CA/B Forum Management list as a first step in responsible disclosure. For completeness and posterity, we are filing this bug, although we consider it more a software security issue (denial of service) than a compliance incident.
The Bug (as sent to the CABForum Management list)
In ACME, anyone can order revocation of a certificate by demonstrating control over:
- the private key used for that certificate
- the issuing ACME account key
- all of the FQDNs in the certificate
However, when the specified reason for the revocation order is "keyCompromise", demonstrating control of the private key should have been the only acceptable method. The "keyCompromise" revocation reason triggers a cascading revocation of all other certificates that share the same Subject Public Key Info (SPKI) hash. Because of the bug, a revoker could specify "keyCompromise" after proving control of only the FQDNs in the certificate. We discovered this bug during an investigation with a subscriber: we had revoked their certificates for the "keyCompromise" reason, and they were (rightfully) unable to find the source of the compromise.
This bug could be used maliciously against a deficient ACME implementation: an attacker requests revocation with the "keyCompromise" reason, without proving possession of the key, for a certificate that is known to use a shared private key, thus triggering a mass revocation of all certificates associated with that key. The attacker could be a disgruntled employee or an outsider who identified the large number of certificates attached to one private key by searching Censys or crt.sh by SPKI hash.
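The intended rule can be sketched as a small authorization check. This is an illustrative Python sketch under stated assumptions, not the CA's actual implementation: the names `Proof`, `RevocationRequest`, and `is_authorized` are hypothetical, and the only external fact used is that keyCompromise is CRLReason code 1 in RFC 5280.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Proof(Enum):
    """How the requester authenticated the revocation request (hypothetical)."""
    CERTIFICATE_KEY = auto()  # request signed with the certificate's private key
    ACCOUNT_KEY = auto()      # request signed by the issuing ACME account
    DOMAIN_CONTROL = auto()   # another account authorized for every FQDN


KEY_COMPROMISE = 1  # CRLReason code for keyCompromise (RFC 5280)


@dataclass(frozen=True)
class RevocationRequest:
    proof: Proof
    reason: int


def is_authorized(req: RevocationRequest) -> bool:
    """Return True if the request may proceed with the stated reason."""
    if req.reason == KEY_COMPROMISE:
        # Only possession of the certificate's private key demonstrates
        # compromise; the buggy path skipped this check.
        return req.proof is Proof.CERTIFICATE_KEY
    # For any other reason, each of the three proofs is sufficient.
    return True
```

With this check in place, a requester who holds only domain authorizations can still revoke the certificate for another reason, but cannot trigger the key-wide cascade.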
Incident Report
How your CA first became aware of the problem.
An ACME subscriber revoked a certificate by demonstrating control over all the FQDNs in the certificate, and selected the reason "keyCompromise". As a result, our automated systems attempted to revoke all certificates with the same SPKI hash. The SPKI hash matched ~130,000 certificates, and the automated revocation put enough load on our systems to trigger alerts. Our SRE team's investigation of those alerts eventually led to this finding.
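To illustrate why a single request can fan out to so many certificates, here is a minimal sketch of a cascade keyed on the SPKI hash. It is not the code of our automated systems: the function names are hypothetical, SHA-256 is an assumed hash algorithm, and the SPKI values are placeholder bytes rather than DER extracted from real certificates.

```python
import hashlib
from collections import defaultdict


def spki_hash(spki_der: bytes) -> str:
    """Hex SHA-256 over the DER-encoded SubjectPublicKeyInfo (assumed algorithm)."""
    return hashlib.sha256(spki_der).hexdigest()


def build_index(certs):
    """Map each SPKI hash to the serials of all certificates using that key.

    `certs` is an iterable of (serial, spki_der) pairs; extracting the SPKI
    bytes from a real certificate is out of scope for this sketch.
    """
    index = defaultdict(list)
    for serial, spki_der in certs:
        index[spki_hash(spki_der)].append(serial)
    return index


def cascade(index, revoked_spki_der, already_revoked):
    """Serials that must also be revoked once a key is marked compromised."""
    matches = index.get(spki_hash(revoked_spki_der), [])
    return [serial for serial in matches if serial not in already_revoked]


# Placeholder data: two certificates share key A, one uses key B.
certs = [("01", b"key-A"), ("02", b"key-A"), ("03", b"key-B")]
index = build_index(certs)
to_revoke = cascade(index, b"key-A", already_revoked={"01"})
print(to_revoke)  # ['02']
```

The size of the cascade depends only on how widely the key is reused, which is why one keyCompromise revocation matched so many certificates.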
A timeline of the actions your CA took in response.
2021-09-13 15:59:28 - Certificate1 was issued to regID1 for fqdn1.
2021-10-13 03:33:49 - regID2 was created.
2021-10-13 06:04:38 - Certificate2 was issued to regID2 covering fqdn1; i.e. that regID had achieved authorization for that FQDN.
2021-10-13 06:05:44 - Certificate1 was revoked by regID2 with reason keyCompromise.
2021-10-13 06:05:56 - bad-key-revoker, an automated tool that checks for follow-up work after a revocation for key compromise, ingested the blocked keyHash and began to revoke all other certificates sharing that keyHash.
2021-10-13 15:30 - After hours of investigating resource problems related to bad-key-revoker running, we decided to manually revoke the certificates and notify the owner of regID1.
2021-10-13 20:33:19 - The owner of regID1 responded to our notification that we would be revoking the certificates associated with regID1.
2021-10-14 10:04:39 - The owner of regID1 reported:
"From our first analysis in our [...] logs, only authorized personnel and instances have accessed the private key, which was stored exclusively in [controlled secret storage] as per best practices."
2021-10-14 16:27 - We identified that "we revoked the certificate we were handed because the revoker proved control of the domains within the cert, and did not have to prove control of the key of the certificate."
2021-10-14 17:30 - We confirmed this behavior using test accounts in the staging environment.
2021-10-14 17:30 - We began work on a hotfix.
2021-10-14 18:11 - We identified the relevance of CPS Section 4.9.12. Our staff agreed unanimously that stopping the revoke endpoint would be a policy violation more severe than the potential denial of service.
2021-10-14 21:39 - We merged the hotfix.
2021-10-14 22:36 - We deployed the hotfix release to staging.
2021-10-15 00:06 - We deployed the hotfix release to production.
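As a quick check of the "within eight hours" figure in the Summary, the interval between identifying the cause and the production deployment can be computed directly. This sketch assumes both timestamps are in the same time zone and that the 00:06 production deployment fell on 2021-10-15, the calendar day after the 22:36 staging deployment.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps taken from the timeline; the deployment date is an assumption
# (the day after the 22:36 staging deployment).
identified = datetime.strptime("2021-10-14 16:27", FMT)
deployed = datetime.strptime("2021-10-15 00:06", FMT)

elapsed = deployed - identified
print(elapsed)  # 7:39:00
```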
Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.
We did not stop certificate issuance during this time.
In a case involving certificates, a summary of the problematic certificates.
N/A
In a case involving TLS server certificates, the complete certificate data for the problematic certificates.
N/A
Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
2020-05-19 - We began running bad-key-revoker, a daemon to revoke certificates sharing an SPKI hash with one that is revoked with keyCompromise. This came about in the context of newly high volumes of compromised key reports that required manual processing: e.g. Bug 1639794.
2021-07-19 - CPS 4.0 was published, changing Section 4.9.12 to its current language in CPS 4.1. This change was inspired by Ben Wilson of Mozilla's Comment 6 in the ISRG Root X2 inclusion request, Bug 1701317:
Clearly specify methods to demonstrate private key compromise (MRSP § 6)
Effective with version 2.7.1 of the Mozilla Root Store Policy, CAs are supposed to specify how parties can demonstrate key compromise to them in section 4.9.12 of their CPS.
Should be fixed. ISRG indicated in the April 2021 CA Survey "Section 4.9.12 of our CPS already specifies the methods that parties may use to demonstrate private key compromise." Section 4.9.12 of the CPS references section 4.9.3, which states "Revocation requests may be made by anyone, at any time, via the Certificate Revocation interface of the ACME Protocol defined in RFC 8555 section 7.6. Successful revocation requests with a reason code of keyCompromise will result in the affected key being blocked for future issuance and all currently valid certificates with that key will be revoked." It would be better if that explanation appeared in section 4.9.12 itself and not as a cross-reference. Also, I am not sure that the process provided by ISRG constitutes a demonstration of key compromise.
The MRSP states that:
Section 4.9.12 of a CA's CP/CPS MUST clearly specify the methods that parties may use to demonstrate private key compromise.
Our policy review team interpreted Section 4.9.12 as specifying how a reporter may demonstrate a key compromise, not as a limitation on how ISRG may accept a demonstration of a key compromise.
For example, Section 4.9.2 allows ISRG management to revoke a certificate at its discretion. The policy review team did not read 4.9.12 as a limitation of that discretion. Therefore the code path which allows a certificate to be revoked with reason keyCompromise despite key compromise not having been demonstrated was not removed as part of the policy review.
List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
- Fix the immediate bug and deploy to production - DONE 2021-10-15 00:06
- Notify the CA/B Forum management list of our discovery - DONE 2021-10-15 23:00
- Hold an internal post-mortem to identify next steps - DONE 2021-10-25 21:00
Embargo
We have heard back from another certificate authority via the CA/Browser Forum that this should remain embargoed until 6 December 2021 at the earliest; I have set the whiteboard appropriately.
Comment 2 (Assignee) • 3 years ago
I'm going to close this bug tomorrow, unless anyone has additional comments or questions.