Closed Bug 1921573 Opened 2 months ago Closed 21 days ago

Let's Encrypt: No Meaningful Subject Distinguished Name

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: preston, Assigned: preston)

Details

(Whiteboard: [ca-compliance] [dv-misissuance])

Attachments

(1 file)

On 2024-09-27 at 19:30 UTC, Let’s Encrypt Policy Management Authority (PMA) discovered a conflict between two sections of v5.3 of our combined CP/CPS.

Section 3.1.2 states:

ISRG certificates include a "Subject" field which identifies the subject entity (i.e. organization or FQDN). The subject entity is identified using a distinguished name.

Section 7.1 states that our Subscriber Certificates' Subject Distinguished Name is of the form:

CN=none, or one of the values from the Subject Alternative Name extension

In 2023, Let’s Encrypt changed both our code (Boulder) and our policy (the Section 7.1 quoted above) to allow issuing certificates with no Common Name, as recommended by Section 7.1.2.7.2 of the Baseline Requirements. At that time, we missed Section 3.1.2's conflicting statements about the Subject field.

Upon confirming this conflict, we disabled certificate issuance at 20:19 UTC, published an updated CP/CPS at 20:32 UTC, and re-enabled certificate issuance at 20:38 UTC. We are currently gathering data for affected serials and will revoke all unexpired affected certificates within 5 days. We will provide a full incident report on or before Friday, 2024-10-04.

Assignee: nobody → preston
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [dv-misissuance]

We identified 133,613 unexpired affected certificates; specifically, those which were issued prior to 2024-09-27 20:32 UTC with an empty Subject Common Name field. We revoked these certificates as of 2024-10-01 21:52 UTC.

Incident Report

Summary

On 2024-09-27 at 19:30 UTC, Let’s Encrypt Policy Management Authority (PMA) discovered that two sections of our CP/CPS had conflicting descriptions of the contents of our Subscriber Certificates’ Subject field.

Section 7.1 “Certificate profile” states that our Subscriber Certificates' Subject Distinguished Name is of the form:

CN=none, or one of the values from the Subject Alternative Name extension

However, Section 3.1.2 states:

ISRG certificates include a "Subject" field which identifies the subject entity (i.e. organization or FQDN). The subject entity is identified using a distinguished name.

As detailed below, Let’s Encrypt does issue some Subscriber Certificates with an empty Common Name field, and therefore a wholly empty Subject. Although such certificates are in compliance with Section 7.1, they do not comply with Section 3.1.2. Thus they had to be revoked within 5 days with reason code 4 (superseded) per the Baseline Requirements, Section 4.9.1.1(12).

Impact

This incident impacted certificates issued between 2023-11-29 and 2024-09-27 with an empty Subject. This totaled 443,453 certificates across the whole incident period, of which 133,613 were unexpired at the time we performed revocation.

We briefly halted issuance while we brought our CP/CPS in line with our intended practices, then resumed issuance after an updated CP/CPS had been published.

Timeline

All times are UTC.

2023-03-04

  • 18:17: CPS GitHub working copy profiles updated to allow empty Common Names, but Section 3.1.2 not changed

2023-03-09

  • 18:31: Feature flag added to Boulder which allows it to omit the Common Name in specific circumstances
  • 18:57: CPS v4.5 approved by Let’s Encrypt Policy Management Authority (PMA)

2023-03-10

2023-11-06

2023-11-29

  • 19:09: Aforementioned feature flag is set in our Production environment (INCIDENT BEGINS)
  • 19:23: The first affected certificate is issued

2024-09-10

  • 18:05: Boulder updated to issue certificates without a Common Name in more circumstances

2024-09-26

  • 18:09: Above change deployed to production

2024-09-27

  • 19:30: Potential incident detected during quarterly CP/CPS review by PMA
  • 19:43: Incident declared
  • 20:16: Conflicting language in Section 3.1.2 removed from CP/CPS GitHub working copy
  • 20:19: Issuance halted
  • 20:21: CP/CPS v5.4 approved by PMA
  • 20:32: CP/CPS v5.4 published (INCIDENT ENDS)
  • 20:38: Issuance re-enabled
  • 21:38: Began collecting list of affected serials
  • 22:28: Preliminary incident report posted

2024-09-30

  • 02:21: Preliminary set of potentially-affected serials identified
  • 02:53: Began serving ARI immediate-renewal recommendations for all identified serials
  • 17:45: Determined that preliminary collection missed some affected serials issued after the 2024-09-26 deploy
  • 21:27: Secondary set of potentially-affected serials identified

2024-10-01

  • 03:54: Began serving ARI immediate-renewal recommendations for additional serials
  • 19:44: Confirmed total set of 133,613 affected unexpired serials
  • 21:49: Revocation begins
  • 21:52: Revocation complete

Root Cause Analysis

Historically, Let’s Encrypt has always included a Subject Common Name in our Subscriber certificates. When Ballot SC-062v2 in 2023 updated the BRs Section 7.1 Certificate profile to mark the Subject Common Name field as “NOT RECOMMENDED”, we decided to phase out Common Names from the certificates we issue. Let’s Encrypt first changed Section 7.1 of our own CPS to allow issuing certificates with no Common Name. We then added a feature flag to our issuance software (Boulder) to allow an empty CN in specific circumstances. When we enabled that flag on 2023-11-29, the incident began.

Specifically, Boulder is willing to issue certificates that do not contain a Subject Common Name when there is no Subject Alternative Name short enough to fit within the Common Name’s 64-character limit. When it does so, the resulting certificate has an empty Subject, which is in violation of Section 3.1.2.
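
The selection rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not Boulder's actual code (Boulder is written in Go). The helper name `choose_common_name` and the choice of the first fitting name are assumptions made for the example; the report only says that a Common Name is omitted when no SAN fits.

```python
from typing import List, Optional

# Upper bound on the Subject Common Name length (ub-common-name in RFC 5280).
MAX_CN_LENGTH = 64


def choose_common_name(sans: List[str]) -> Optional[str]:
    """Return a SAN to use as the CN, or None if none of them fits.

    Hypothetical helper: picking the *first* fitting name is an assumption.
    """
    for name in sans:
        if len(name) <= MAX_CN_LENGTH:
            return name
    # No SAN fits, so the certificate would be issued with an empty Subject.
    return None


# Example hostnames are made up for illustration.
long_name = "a" * 60 + ".example.com"  # 72 characters, too long for a CN
print(choose_common_name([long_name, "example.org"]))
print(choose_common_name([long_name]))
```

A return value of `None` corresponds to the case that conflicted with Section 3.1.2: no Common Name, and therefore an empty Subject.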

On the surface, this incident may appear similar to a previous incident in which our actual behavior diverged from our CPS. However, in that case the divergence was because a CPS review was not conducted at the time that the CA’s behavior changed. In this instance, we ensured our CP/CPS was updated ahead of making corresponding changes to CA behavior.

Rather, this incident happened because, when we did update the profiles in Section 7.1 of our CPS, we did not also update Section 3.1.2 at the same time. We missed Section 3.1.2 because we believed that all restrictions on our certificate contents were confined to Section 7.1, which exists for that express purpose. Having two separate sections impose requirements on the same certificate contents is somewhat surprising and undesirable. The root cause here is that separate sections of our CP/CPS placed separate requirements on certificate content, and keeping separate sections in sync is inherently hard.

Lessons Learned

What went well

  • Our existing PMA document review caught the discrepancy.
  • Our incident response procedures kicked in quickly and effectively after PMA discovered the issue.
  • Our PMA procedures enabled us to quickly review and publish an updated CP/CPS.
  • Boulder's “incident table” feature allowed us to easily serve ACME Renewal Information (ARI) “renew immediately” suggested renewal windows for all affected certificates, allowing ARI-compatible clients to renew certificates prior to revocation.

What didn't go well

  • Our PMA document review did not detect the issue in the first review after the conflict was introduced, increasing the length of the incident period.

Where we got lucky

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Conduct CP/CPS review specifically looking for any statements about Certificate contents outside of Section 7 | Detect | 2024-10-18 |
| Publish new CP/CPS, consolidating statements regarding certificate contents into Section 7 | Prevent | 2024-11-01 |

Appendix

Details of affected certificates

Attached is a zstd-compressed text file containing 133,613 URLs of the form https://crt.sh/?sha256=<fingerprint>. These URLs provide the complete details of all affected unexpired final certificates.

Thank you for this report. A few questions:

  1. In the Timeline Section, you noted that: “17:45: Determined that preliminary collection missed some affected serials issued after the 2024-09-26 deploy”. To better understand why certain certificates were missed, can you provide more details? There could be value for others in understanding how this occurred.
  2. Can you share information, if available, related to ARI's role in supporting renewal of affected certificates? For example, are you able to approximate the number of subscriber certificates affected by this incident that were using an ACME client that supports ARI, and the number of those subscriber certificates you observed renew in the recommended renewal windows (i.e., demonstrating ARI worked as intended)?
  3. Given this experience, does Let's Encrypt have any lessons learned or perceived best practices related to the use of ARI, or how the community can best support broader adoption such that it can be even more impactful in the future?

Hi Chris, thanks for your questions.

  1. In the Timeline Section, you noted that: “17:45: Determined that preliminary collection missed some affected serials issued after the 2024-09-26 deploy”. To better understand why certain certificates were missed, can you provide more details? There could be value for others in understanding how this occurred.

For background, when we first began issuing certificates with no Common Name, we would only do so when all of the requested Subject Alternative Names were too long to fit in the CN. The 2024-09-26 deploy added another circumstance: if the CSR received during the ACME Finalize request contained a too-long CN, we would move that name to the SANs and leave the CN blank, rather than promoting a different shorter name into the CN.

When gathering the affected serials for this incident, our initial strategy searched Boulder's issuedNames table for certificates where all of the names were longer than 64 characters. However, this could not account for certificates where only the requested CN had been too long. As a result, we missed a number of certificates issued after the 2024-09-26 Boulder deploy.

As a secondary strategy, we searched through audit logs emitted by Boulder for lines that indicated successful issuance, which include the full DER-encoded precertificate. With this methodology, we checked for the missing Common Name directly, and we were able to catch the missing set of certificates.
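
To illustrate why the first strategy undercounted, the sketch below models the post-deploy CN behavior and both search predicates in Python. Everything here is hypothetical: the function names, the in-memory representation, and the example hostnames are assumptions, and the real searches ran against Boulder's issuedNames table and audit logs rather than Python lists.

```python
from typing import List, Optional

MAX_CN_LENGTH = 64


def cn_after_deploy(requested_cn: Optional[str], sans: List[str]) -> Optional[str]:
    """Assumed model of the post-2024-09-26 behavior: a too-long requested CN
    is left blank instead of being replaced by a shorter SAN."""
    if requested_cn is not None:
        return requested_cn if len(requested_cn) <= MAX_CN_LENGTH else None
    # No CN requested: fall back to the first SAN that fits (an assumption).
    for name in sans:
        if len(name) <= MAX_CN_LENGTH:
            return name
    return None


def first_strategy_flags(names: List[str]) -> bool:
    """Initial search: flag a certificate only if every name exceeds 64 characters."""
    return all(len(name) > MAX_CN_LENGTH for name in names)


def second_strategy_flags(issued_cn: Optional[str]) -> bool:
    """Follow-up search: check the issued certificate for a missing CN directly."""
    return issued_cn is None


# A CSR whose requested CN is too long but which also lists a short name.
long_name = "a" * 60 + ".example.com"
names = [long_name, "example.com"]
issued_cn = cn_after_deploy(long_name, names)

print(first_strategy_flags(names))       # the name-length search
print(second_strategy_flags(issued_cn))  # the direct check
```

In this example the certificate has no Common Name, yet one of its names is short, so a search keyed on "all names too long" does not flag it while the direct check does.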

Fun fact: if you log your DER certificate contents in hexadecimal (as opposed to, say, base64), certain pieces such as field tags and algorithm OIDs have predictable string representations that can be very efficiently searched for without having to parse the whole DER blob. This fact helped us speed up our search significantly.
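
As a rough illustration of the hex-search idea, the Python sketch below looks for the DER encoding of the commonName attribute type (OID 2.5.4.3, encoded as the bytes 06 03 55 04 03) inside a hex string. The function names and the sample fragments are assumptions for illustration, not the tooling actually used. A real search would also need to tell the Issuer's CN apart from the Subject's, which this sketch does not attempt.

```python
from typing import List

# DER for the commonName attribute type: tag 0x06 (OID), length 3, value 55 04 03.
CN_OID_HEX = "0603550403"


def aligned_offsets(hex_blob: str, pattern: str) -> List[int]:
    """Return byte offsets where `pattern` occurs on a byte boundary.

    A plain substring search can also match halfway through a byte
    (at an odd character index), so those hits are discarded.
    """
    hex_blob = hex_blob.lower()
    pattern = pattern.lower()
    offsets = []
    start = hex_blob.find(pattern)
    while start != -1:
        if start % 2 == 0:
            offsets.append(start // 2)
        start = hex_blob.find(pattern, start + 1)
    return offsets


def count_cn_oids(hex_blob: str) -> int:
    return len(aligned_offsets(hex_blob, CN_OID_HEX))


# Hand-built fragments: a Name holding only CN=R3, and an empty Name (SEQUENCE of length 0).
NAME_WITH_CN = "300d310b3009060355040313025233"
EMPTY_NAME = "3000"

print(count_cn_oids(NAME_WITH_CN + EMPTY_NAME))    # one name carries a CN
print(count_cn_oids(NAME_WITH_CN + NAME_WITH_CN))  # both names carry a CN
```

The byte-boundary check matters because hex text has two characters per byte; without it, unrelated bytes such as 10 60 35 50 40 30 would produce a false match.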

  2. Can you share information, if available, related to ARI's role in supporting renewal of affected certificates? For example, are you able to approximate the number of subscriber certificates affected by this incident that were using an ACME client that supports ARI, and the number of those subscriber certificates you observed renew in the recommended renewal windows (i.e., demonstrating ARI worked as intended)?

At this time, we don't have a good estimate for the total number of affected subscriber certificates that were using an ARI-supporting ACME client. Since ARI works best when clients have ample time to check it and act accordingly, our tight timeline between serving ARI immediate-renewal suggestions and carrying out revocation limited the number of clients that were able to poll ARI in time to renew. This makes it difficult to differentiate between clients that failed to renew because they don't support ARI and those that failed to renew because they simply didn't check ARI soon enough.

Based on a rough analysis of our issuance logs prior to revocation, we can at least estimate that there were around 7,500 renewals attributable to ARI. This number comes from a considerable spike in issuance we observed for certificates without a Subject Common Name on 2024-09-30, about 15 hours after we began serving immediate-renewal suggestions via ARI. However, only a negligible portion of that issuance made use of the replaces field on the ACME order object, defined since draft-ietf-acme-ari-02, which would have made it possible to directly correlate renewals to the use of ARI.

  3. Given this experience, does Let's Encrypt have any lessons learned or perceived best practices related to the use of ARI, or how the community can best support broader adoption such that it can be even more impactful in the future?

In our view, the biggest takeaway from this incident is that ARI is most effective when used early in the revocation timeline. Our suggestion to other CAs implementing ARI would be to 1. have a mechanism for serving immediate-renewal suggestions for a set of certificates, and 2. use it aggressively.

When maximizing the number of clients that can renew before revocation, it's better to cause unnecessary renewals because of false-positives than to delay renewal for legitimately-affected certificates. For that reason, we chose to populate our ARI suggestions as soon as we had results from our first search strategy, rather than waiting for corroboration from the second. We believe this to be the optimal strategy for minimizing subscriber disruption when faced with a revocation deadline.
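
For readers unfamiliar with how a client acts on an ARI response, here is a minimal Python sketch. It assumes the `suggestedWindow` object with `start` and `end` timestamps defined in the ARI draft, and follows the draft's recommended approach of picking a random time in the window and renewing immediately if that time has already passed. The function names and the sample response values are hypothetical.

```python
import random
from datetime import datetime, timedelta, timezone


def parse_rfc3339(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def should_renew_now(renewal_info: dict, now: datetime, rng: random.Random) -> bool:
    """Decide whether a client should renew right away (illustrative only)."""
    window = renewal_info["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    # Pick a uniformly random instant inside the suggested window.
    span_seconds = (end - start).total_seconds()
    chosen = start + timedelta(seconds=rng.uniform(0, span_seconds))
    # A window that lies entirely in the past always yields "renew now".
    return chosen <= now


# Made-up response: a window that has already closed, i.e. "renew immediately".
immediate = {
    "suggestedWindow": {
        "start": "2024-09-29T00:00:00Z",
        "end": "2024-09-29T01:00:00Z",
    }
}
now = datetime(2024, 9, 30, 2, 53, tzinfo=timezone.utc)
print(should_renew_now(immediate, now, random.Random()))
```

Because the decision is only made when the client next polls, the benefit of an immediate-renewal window depends on how soon after publication that poll happens.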

Thank you for the responses in Comment 5.

We appreciate the added background and “Fun fact” provided for why certificates were missed during the preliminary collection. We equally appreciate the suggestions for others implementing ARI.

Three additional questions:

  1. Can you share the polling interval (Retry-After header) configured by Let’s Encrypt?
  2. Do you have any evidence to suggest that interval was not reliably upheld by ACME clients? If the interval was not reliably upheld by ACME clients, do you have any recommendations for how this might be improved?
  3. Do you intend to perform any additional regularly-scheduled testing or simulations to observe ARI-response, and possibly study potential improvement opportunities?

In response to Comment 6:

  1. Can you share the polling interval (Retry-After header) configured by Let’s Encrypt?

Boulder currently hard-codes this value to 6 hours.

  2. Do you have any evidence to suggest that interval was not reliably upheld by ACME clients? If the interval was not reliably upheld by ACME clients, do you have any recommendations for how this might be improved?

Yes and no. We have, in fact, observed a significant portion of ARI clients polling our API at a rate slower than the suggested 6 hours. We estimate the proportion of clients doing so to be as high as 50%, and it appears that the majority of those clients have a polling rate of around 24 hours. That said, those clients are doing what we told them to do, according to the semantics of the Retry-After header, which only suggest a minimum time to wait between requests. Clients may still use a longer interval if they choose.

We're investigating the value of configuring a lower duration in the Retry-After header going forward, but it may not provide much benefit to those 50% of users that are already polling more slowly than our Retry-After header suggests.
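
The interaction between the server's Retry-After value and a client's own schedule can be sketched as below. This is an illustrative Python model, not any particular client's implementation. `next_poll_delay` and the client-side interval parameter are hypothetical, and only the delay-in-seconds form of Retry-After is handled.

```python
from datetime import timedelta


def next_poll_delay(retry_after_header: str, client_interval: timedelta) -> timedelta:
    """Wait at least as long as the server asks, but never less than the
    client's own polling interval (hypothetical client behavior)."""
    server_minimum = timedelta(seconds=int(retry_after_header))
    return max(server_minimum, client_interval)


six_hours = str(6 * 60 * 60)  # the value Boulder currently sends, in seconds
one_hour = str(60 * 60)       # a hypothetical lower value

# A client that polls once a day is unaffected by either server value.
print(next_poll_delay(six_hours, timedelta(hours=24)))
print(next_poll_delay(one_hour, timedelta(hours=24)))

# A client with a short interval of its own is governed by the server value.
print(next_poll_delay(six_hours, timedelta(hours=1)))
```

Under this model, lowering Retry-After only changes behavior for clients whose own interval is shorter than the server's current value. For scale, roughly 43 hours passed between the first ARI suggestions (2024-09-30 02:53) and the start of revocation (2024-10-01 21:49), so a client polling every 24 hours would have checked only once or twice for the first set of serials.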

  3. Do you intend to perform any additional regularly-scheduled testing or simulations to observe ARI-response, and possibly study potential improvement opportunities?

As ARI is implemented in more ACME clients, we'll be keeping an eye on the progression of its adoption. We're excited that the draft specification entered working group Last Call at the IETF this month, and we look forward to it becoming a full RFC soon. However, we have no specific plans at this time to run regularly-scheduled testing or simulations.

At 17:30 on 2024-10-16, we performed a review of our CP/CPS and identified a list of statements outside of Section 7 that relate to Certificate contents. This completes the first of our two remediation items.

| Action Item | Status | Due Date |
| --- | --- | --- |
| Conduct CP/CPS review specifically looking for any statements about Certificate contents outside of Section 7 | Complete | 2024-10-18 |
| Publish new CP/CPS, consolidating statements regarding certificate contents into Section 7 | Pending | 2024-11-01 |

We are making progress on a new version of our CP/CPS that addresses the statements about certificate contents we identified in our review. We are on track to publish the new version on or before 2024-11-01.

We have published v5.5 of our CP/CPS, completing our second and final remediation item for this incident.

| Action Item | Status | Due Date |
| --- | --- | --- |
| Conduct CP/CPS review specifically looking for any statements about Certificate contents outside of Section 7 | Complete | 2024-10-18 |
| Publish new CP/CPS, consolidating statements regarding certificate contents into Section 7 | Complete | 2024-11-01 |

We will continue to monitor this bug for questions.

Flags: needinfo?(bwilson)

Unless there are additional issues or questions to address, I will close this on or about Wed. 6-Nov-2024.

Status: ASSIGNED → RESOLVED
Closed: 21 days ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED