Closed Bug 1919162 Opened 2 months ago Closed 7 days ago

IdenTrust: TLS Certificates with outdated certificate profile

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: roots, Assigned: roots, NeedInfo)

Details

(Whiteboard: [ca-compliance] [ov-misissuance])

Preliminary Incident Report

Summary

On 2024-09-10, we discovered 7 active TLS subscriber certificates that were issued with an outdated certificate profile that the TLS BRs have not permitted since 2023-09-15:

  • Inclusion of a userNotice in the Certificate Policies extension
  • Attributes in the Subject field not arranged in the required relative order.

The 7 certificates were revoked on 2024-09-13.

We are still gathering the details for the root cause analysis and corrective measures to avoid recurrence. We will provide a complete incident report by 2024-09-27.

Assignee: nobody → roots
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [ov-misissuance]

Complete Incident Report

Summary

A customer recently reported difficulties in generating a TLS certificate. Upon investigation, we identified the root cause: the certificate profile had not been updated to align with the latest TLS Baseline Requirements (hereinafter: BR) that came into effect in September 2023. This oversight led to two primary issues:
1. Outdated Policy Extension: The certificate profile incorrectly included a userNotice in the certificate policy extension, which is no longer compliant.
2. Incorrect Subject Field Attributes: The attributes within the Subject field were not arranged in the predefined order as mandated by the current BR.
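The two profile defects above can be expressed as simple checks. The following is a minimal sketch, not the linter used in this incident: the function names are hypothetical, and the attribute order is an assumption based on the BR subject-attribute table that should be verified against the BR version in force. The two OIDs are the policy qualifier identifiers defined in RFC 5280.

```python
# Relative attribute order ASSUMED from the TLS BR subject-attribute table;
# verify against the BR version in force before relying on it.
ASSUMED_BR_ORDER = [
    "domainComponent",
    "countryName",
    "stateOrProvinceName",
    "localityName",
    "postalCode",
    "streetAddress",
    "organizationName",
    "surname",
    "givenName",
    "organizationalUnitName",
    "commonName",
]

ID_QT_CPS = "1.3.6.1.5.5.7.2.1"      # CPS pointer qualifier (RFC 5280)
ID_QT_UNOTICE = "1.3.6.1.5.5.7.2.2"  # userNotice qualifier (RFC 5280)


def subject_in_relative_order(attrs, reference=ASSUMED_BR_ORDER):
    """True if the known attributes appear in the same relative order as reference."""
    ranks = [reference.index(a) for a in attrs if a in reference]
    return ranks == sorted(ranks)


def has_user_notice(policy_qualifier_oids):
    """True if any policy qualifier is a userNotice."""
    return ID_QT_UNOTICE in policy_qualifier_oids


def profile_findings(subject_attrs, policy_qualifier_oids):
    """Return a list of human-readable findings for one (hypothetical) profile."""
    findings = []
    if not subject_in_relative_order(subject_attrs):
        findings.append("subject attributes out of relative order")
    if has_user_notice(policy_qualifier_oids):
        findings.append("userNotice present in certificatePolicies")
    return findings
```

A profile that emits commonName before countryName and carries both qualifiers would yield both findings; a profile with correctly ordered attributes and only a CPS pointer would yield none.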

Impact

We identified 7 active TLS subscriber certificates issued between 2023-09-26 and 2024-04-12 using an outdated certificate profile.

Timeline

2023-09-14: Updated TLS certificate profiles to be compliant with BR v2.0.0
2024-06-01: Updated validation process with certificate linting
2024-08-21: Customer reported issue retrieving TLS certificate due to linting.
2024-08-30: Corrected the certificate profile to be compliant with the BR.
2024-09-03: Started internal investigation looking for misissued certificates for this customer
2024-09-10: Found 7 active misissued TLS subscriber certificates issued between 2023-09-26 and 2024-04-12
2024-09-12: Notified customer of the finding, requesting revocation within 48 hours
2024-09-13: Completed the revocation of the 7 certificates
2024-09-16: Conducted a comprehensive examination of all active certificate profile configurations. This review confirmed that no additional certificate profiles subject to the BR require updates at this time.

Root Cause Analysis

Contributing Factors
Low Usage Profile: The certificate profile in question had very low usage, which led to it being overlooked during the update process.

Communication Gap: The customer Delivery Team managing the enterprise account was not included in the certificate profile update process, leading to a lack of awareness about the outdated profile. Thus, it failed to validate this profile.

Outcome: As a result of these factors, the enterprise customer was left with an outdated certificate profile while other profiles were successfully updated.

Lessons Learned

What went well

Prompt revocation within the expected timelines upon learning of the mis-issuance

What didn't go well

Failure to update a certificate profile

Where we got lucky

This issue was not widely spread and affected only 7 TLS subscriber certificates.

Action Items

To prevent this issue from happening again, we have improved our standard operating procedures (SOP) for certificate profile updates. Now, any changes to the certificate profile will require signoff from the Delivery Team before being deployed in production.

Action Item | Kind | Due Date
Update SOP to include Delivery Team | Prevent | Complete

Appendix

Details of affected certificates

https://crt.sh/?id=10480417558
https://crt.sh/?id=11365552002
https://crt.sh/?id=11365550460
https://crt.sh/?id=10665016513
https://crt.sh/?id=10529046035
https://crt.sh/?id=10528420733
https://crt.sh/?id=12697783853

Thank you for this report. We have a few questions and request some updates.

Update #1: The timeline should be updated to reflect minute-level granularity aligned with the guidance on the CCADB Incident Report page. Additionally, were there any other relevant activities that are material to this incident? For example, when was the linting process described on 2024-08-21 tested and implemented? It might also be useful to know the times at which issuance stopped and restarted, if that activity occurred.

Update #2: Can the RCA be updated with additional detail, to include how each independent issue avoided detection until they were discovered?

Also, considering the following questions could aid in better understanding the root cause(s):

  1. Something being overlooked due to low utilization seems probable, but can you expand on the certificate profile update process to add context to how low utilization played a role in causing this incident?
  2. What system or process was supposed to update the profile and how did that fail?
  3. Has IdenTrust considered routinely phasing-out profiles that are not broadly relied upon by its Subscribers?
  4. When IdenTrust identifies the need to create an additional profile, how are existing profiles considered for potential phase-out?
  5. Can you expand on why the Delivery Team was not included in this process and the Team's expected role in profile updates? (i.e., why should they be included and how might you expect them to identify items like incorrect RDN ordering?)
  6. Can you elaborate on the role played by certificate linting in this incident?

Update #3: Possibly in combination with an updated RCA, can you elaborate on why the Delivery Team signoff action item will prevent this type of issue from happening again in production? Are there other process or system related actions that could occur in a development or test environment to ensure profile accuracy before humans are involved and before deploying in production?

(In reply to Chris Clements from comment #2)
We value your feedback, and in response to the requests for more information to clarify the situation, we have revised and updated the timeline and root cause analysis below. We hope this updated version provides the additional details you were looking for.

Revised/Updated Timeline (all times are UTC)

2023-09-14 09:47: Updated TLS certificate profiles to be compliant with BR v2.0.0
2024-06-01 20:40: Updated database with Zlint libraries
2024-07-20 19:36: Deployed the Zlint version 3.6.2 in production
2024-08-21 15:04: Customer reported issue retrieving TLS certificate
2024-08-22 15:30: Determined the customer issue was caused by the 2024-07-20 deployment of the updated Zlint libraries.
2024-08-30 16:42: Corrected the certificate profile to be compliant with the TLS BR.
2024-09-03 10:01: Started internal investigation looking for misissued certificates for this customer
2024-09-10 21:09: Found 7 active misissued TLS subscriber certificates issued between 2023-09-26 and 2024-04-12
2024-09-12 02:12: Notified customer of the finding, requesting revocation within 48 hours
2024-09-13 12:47: Completed the revocation of the 7 certificates
2024-09-16 22:01: Conducted a comprehensive examination of all active certificate profile configurations. This review confirmed that no additional certificate profiles subject to the BR require updates at this time.
2024-09-16 22:13: Disclosed preliminary incident report in Bugzilla and notified root stores (Apple and Microsoft)
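For readers checking the intervals in this timeline, the elapsed times can be computed directly from the UTC timestamps. This is a small sketch that assumes the customer notification entry refers to 2024-09-12 02:12 UTC; the helper names are illustrative only.

```python
from datetime import datetime, timezone


def utc(stamp):
    """Parse a 'YYYY-MM-DD HH:MM' timeline entry as a UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


def hours_minutes(delta):
    """Split a timedelta into whole hours and remaining minutes."""
    total_minutes = int(delta.total_seconds() // 60)
    return divmod(total_minutes, 60)


found = utc("2024-09-10 21:09")     # misissued certificates identified
notified = utc("2024-09-12 02:12")  # customer notified (assumed year 2024)
revoked = utc("2024-09-13 12:47")   # revocation completed

print(hours_minutes(revoked - notified))  # (34, 35)
print(hours_minutes(revoked - found))     # (63, 38)
```

Under that assumption, revocation completed 34 hours 35 minutes after the customer was notified, and 63 hours 38 minutes after the certificates were identified.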

Root Cause Analysis Revised/Updated

Contributing Factors
Low Usage Profile: The certificate profile in question had very low usage, which led to it being overlooked during the update process.
This certificate configuration was initially implemented for two separate internal teams, each with their own distinct operational processes. While the primary user group eventually discontinued its use, the secondary team continued to integrate this configuration into their ongoing workflow, maintaining its relevance within their operational framework. Due to infrequent usage, it was presumed that this particular certificate profile had fallen into disuse.
Communication Gap: The customer Delivery Team managing the enterprise account was not included in the certificate profile update process, leading to a lack of awareness about the outdated profile. Thus, it failed to validate this profile.
For any new certificate profile, we have a Standard Operating Procedure (SOP) in place for archiving inactive certificate profiles, which is overseen by the Delivery team.
Unfortunately, this was not a new certificate profile and the assumption of having a certificate profile in disuse was not communicated to the Delivery team, leading to an unintended oversight.
The updated SOP now requires signoff from the Delivery team on any certificate profile request (new or update) before the request is deployed in production.
Outcome: As a result of these factors, the enterprise customer was left with an outdated certificate profile while other profiles were successfully updated.
The deployment of the updated Zlint library into our production environment highlighted a previously undetected issue with an outdated certificate profile. This discovery, as detailed in the revised timeline, brought the matter to our immediate attention.

Thank you for the updates in Comment 3. Some follow-up questions.

Related to the updated Timeline:

  1. We interpret the update to imply that certificate issuance was not stopped during this incident, but please correct us if that interpretation is wrong.

2024-08-30 16:42: Corrected the certificate profile to be compliant with the TLS BR.

  1. To us, this implies an understanding that the certificate profile was previously incorrect and that certificates may have been misissued. It’s not clear why it took until 2024-09-10 to discover the 7 misissued certificates. Can you help us better understand the activities occurring between these two dates and the process used for identifying the misissued certificates?

Related to the updated RCA:

Due to infrequent usage, it was presumed that this particular certificate profile had fallen into disuse.

  1. It’s not clear who made this presumption and how that impacted the specific system or process that was supposed to update the certificate profile. Can you add more context?

For any new certificate profile, we have a Standard Operating Procedure (SOP) in place for archiving inactive certificate profiles, which is overseen by the Delivery team.

  1. Has IdenTrust considered routinely phasing-out profiles that are not broadly relied upon by its Subscribers? If so, what does that process look like? If not, how come?

The updated SOP now requires that any type of certificate profile request (new/update) requires the signoff from the delivery team before the request is deployed in production.

  1. Can you expand on the Delivery Team’s expected role in profile updates? (i.e., why should they be included and how might you expect them to identify items like incorrect RDN ordering?)

(In reply to Chris Clements from comment #4)

Related to the updated Timeline:

  1. We interpret the update to imply that certificate issuance was not stopped during this incident, but please correct us if that interpretation is wrong.

The issuance process remained active throughout the incident. However, it's important to note that customers were effectively prevented from issuing certificates due to the linter's functionality. As per the incident timeline, the customer initially reported their inability to issue a certificate on 2024-08-21 at 15:04. This automatic prevention by the linter eliminated the need for a manual stoppage of the issuance process.
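The behavior described above, where a failing lint blocks issuance without a manual stoppage, can be sketched as a pre-issuance gate. This is an illustrative outline only: `LintResult`, `LintFailure`, `issue_if_clean`, and the `sign` callback are hypothetical names and do not represent the ZLint API or IdenTrust's issuance platform.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class LintResult:
    name: str
    passed: bool


class LintFailure(Exception):
    """Raised when at least one lint fails; nothing is signed in that case."""


def issue_if_clean(
    tbs_cert: Any,
    lints: List[Callable[[Any], LintResult]],
    sign: Callable[[Any], Any],
) -> Any:
    """Run every lint on the to-be-signed certificate and sign only if all pass."""
    results = [lint(tbs_cert) for lint in lints]
    failed = [r.name for r in results if not r.passed]
    if failed:
        # Refuse to sign; the caller sees which lints blocked issuance.
        raise LintFailure("issuance blocked by lints: " + ", ".join(failed))
    return sign(tbs_cert)
```

With a gate of this shape, an outdated profile surfaces as a failed request for the customer rather than as a misissued certificate, which matches the customer report on 2024-08-21.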

  1. To us, this implies an understanding that the certificate profile was previously incorrect and that certificates may have been misissued. It’s not clear why it took until 2024-09-10 to discover the 7 misissued certificates. Can you help us better understand the activities occurring between these two dates and the process used for identifying the misissued certificates?

Here are the steps taken to investigate the misissued certificates:
• Upon identifying the incorrect certificate profile for this customer, we initiated a comprehensive review of all customer profiles to ensure no other profiles were outdated. This process involved meticulous comparison and validation, ultimately confirming that only one certificate profile was incorrect.
• Concurrently, we examined the issuance records of all certificates issued since 2023-09-14, across all customers, to identify any other potentially misissued certificates.
• The thorough investigation of profiles and issued certificates required significant time and effort, leading to the identification of 7 active misissued certificates.
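The record review described in these steps amounts to filtering issuance records by profile, issuance date, and current status. A minimal sketch follows, assuming a hypothetical `IssuanceRecord` shape; the field names and the in-memory list are illustrative and do not reflect the actual issuance database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class IssuanceRecord:
    serial: str
    profile_id: str
    not_before: date
    not_after: date
    revoked: bool = False


def active_certs_from_profile(
    records: Iterable[IssuanceRecord],
    profile_id: str,
    since: date,
    today: date,
) -> List[IssuanceRecord]:
    """Unexpired, unrevoked certificates issued from one profile on or after `since`."""
    return [
        r
        for r in records
        if r.profile_id == profile_id
        and r.not_before >= since
        and r.not_after >= today
        and not r.revoked
    ]
```

Calling this with `since=date(2023, 9, 14)` and the identifier of the outdated profile would return the set of active certificates that need revocation.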

Related to the updated RCA:

Due to infrequent usage, it was presumed that this particular certificate profile had fallen into disuse.

  1. It’s not clear who made this presumption and how that impacted the specific system or process that was supposed to update the certificate profile. Can you add more context?

The Product team is responsible for submitting updates to the certificate profile. However, on 2023-09-14, they failed to validate the proposed updates with the Delivery team who would have recognized that the enterprise certificate profile in question also required updates.

  1. Has IdenTrust considered routinely phasing-out profiles that are not broadly relied upon by its Subscribers? If so, what does that process look like? If not, how come?

Yes, the Client Delivery team routinely conducts exercises to archive older configurations. During this process, they assess changes in client requirements, review active certificates and API access, and identify any expired or expiring subordinate Certificate Authorities (CAs), roots, and contracts. If any questions arise, the team reaches out to customers to discuss the use case for the certificates and to determine whether they are still needed.

  1. Can you expand on the Delivery Team’s expected role in profile updates? (i.e., why should they be included and how might you expect them to identify items like incorrect RDN ordering?)

Our Delivery team oversees the enterprise customers' configuration for accessing the CA issuing platform via APIs. This team enables standard certificate profiles or custom profiles; a custom profile was involved in this incident. With the updated process requiring Delivery to review and sign off, they will review whether the proposed change submitted by the Product team can affect any of the enterprise customers under their direct control.
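The signoff requirement could be complemented by an automated inventory check that flags any active profile subject to the BR that a proposed update does not cover. The sketch below is one possible shape for such a check, not a description of IdenTrust's process; the dictionary layout, the `"delivery"` signoff label, and both function names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def profiles_missing_from_update(
    active_profiles: Dict[str, dict],
    update_scope: Set[str],
) -> List[str]:
    """Active BR-scoped profile IDs that the proposed update does not touch."""
    return sorted(
        pid
        for pid, info in active_profiles.items()
        if info["subject_to_br"] and pid not in update_scope
    )


def ready_to_deploy(
    active_profiles: Dict[str, dict],
    update_scope: Set[str],
    signoffs: Iterable[str],
) -> bool:
    """Deploy only if no BR-scoped profile is left out and Delivery has signed off."""
    missing = profiles_missing_from_update(active_profiles, update_scope)
    return not missing and "delivery" in set(signoffs)
```

A check of this kind does not depend on anyone remembering a low-usage profile: a custom enterprise profile absent from the update scope would block deployment until it is either updated or archived.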

We believe we have addressed all outstanding items related to this issue and consider it resolved.

Flags: needinfo?(roots)

I'll pull this case back up on Friday, 1-Nov-2024, and consider closing it if there have been no questions or issues for IdenTrust to address.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 7 days ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED