Open Bug 1886110 Opened 7 months ago Updated 1 day ago

TWCA: Revocation delay for TLS certificates with non-critical basicConstraints

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: chtsai, Assigned: chtsai)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ca-compliance] [ov-misissuance] [leaf-revocation-delay] Next update 2024-10-31)

Attachments

(10 files)

Incident Report

This is a preliminary report.

Summary

TWCA mis-issued 16,481 OV TLS certificates whose basicConstraints extension was not marked critical, which does not conform to BR Section 7.1.2.7.6. All affected certificates MUST be revoked within 5 days. However, for various customer-side reasons, and after balancing regulatory compliance against customer impact, we were ultimately unable to revoke all certificates in time.

Impact

2,551 out of the 16,481 mis-issued OV TLS certificates (15%) were not revoked in time.
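As a quick check on the figure above, the share can be recomputed from the two counts given in this report. This is only an illustrative sketch; the function name is hypothetical and not part of any TWCA tooling.

```python
def unrevoked_share(unrevoked: int, total: int) -> float:
    """Return the unrevoked certificates as a percentage of all affected ones."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * unrevoked / total


# Counts taken from this report: 2,551 of 16,481 missed the 5-day deadline.
share = unrevoked_share(2_551, 16_481)
print(f"{share:.1f}% of affected certificates were not revoked in time")
```

To one decimal place this gives 15.5%, in line with the rounded 15% stated above.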

Timeline

All times are UTC+8.

2023-09-15:

  • 08:00 Version 2.0.0 of the TLS BRs became effective.

2024-03-13:

  • 12:00 During the investigation of a previous bug (Bug 1883620), we discovered that some certificates had their basicConstraints not set as critical, which does not comply with BR Section 7.1.2.7.6.

  • 20:28 Preliminary report for the incident is posted (Bug 1885132).

2024-03-14:

  • 12:00 All affected certificates are identified, totaling 16,481 OV TLS certificates.

2024-03-18:

  • 12:00 According to BR requirements, all mis-issued certificates should be revoked before this time.

2024-03-19:

  • 20:40 Posting this report.
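The 5-day deadline in the timeline above can be reproduced with simple date arithmetic. The sketch below assumes the period is counted from the 2024-03-13 12:00 discovery entry, which is what the timeline implies; the names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# The report gives all times in UTC+8.
UTC8 = timezone(timedelta(hours=8))

# Assumption: the 5-day period starts at the 2024-03-13 12:00 discovery entry.
REVOCATION_WINDOW = timedelta(days=5)


def revocation_deadline(discovered_at: datetime) -> datetime:
    """Return the latest time by which revocation must be complete."""
    return discovered_at + REVOCATION_WINDOW


discovered = datetime(2024, 3, 13, 12, 0, tzinfo=UTC8)
deadline = revocation_deadline(discovered)
print(deadline.isoformat())                           # 2024-03-18T12:00:00+08:00
print(deadline.astimezone(timezone.utc).isoformat())  # 2024-03-18T04:00:00+00:00
```

Keeping the timestamps timezone-aware makes it straightforward to state the same deadline in UTC.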

Root Cause Analysis

In response to this misissuance event, we mobilized all available personnel to contact the customers who were unable to reissue and replace their certificates automatically. Unfortunately, many certificates still could not be revoked by the deadline.

We understand Mozilla's position that it is the ultimate responsibility of the CA to determine whether the harm caused by complying with BR requirements outweighs the risk transferred to individuals relying on the Web PKI by choosing not to meet this requirement.

After carefully confirming that delaying the revocation would not harm the customers (especially in terms of cybersecurity), ultimately, we set a new revocation deadline with the users.

For those certificates that cannot be revoked within the stipulated timeframe, we have preliminarily analyzed the reasons and summarized them in the following table.

We expect that, within the next week, fewer than 3% of the affected certificates will remain unrevoked.

| Type | Reasons for not revoking the certificates within 5 days | Number of certificates | Action |
| --- | --- | --- | --- |
| A | These certificates are installed on NAS devices, which automatically check for new patches to trigger automatic reissuance of the certificates. However, because some NAS devices are configured not to install patches automatically or are not powered on, the certificates have not been replaced. Customers must be notified manually so that they can apply the patches manually. | 2,069 | We will mandatorily revoke all certificates on March 25th. |
| B | These certificates are pinned within mobile apps, and replacing them requires the app to be re-released. Customers, mostly in the financial industry, have indicated that the process from testing to launching the app cannot be completed within 5 days. | 51 | Continuing to coordinate with the customer and setting a revocation deadline. |
| C | These certificates are wildcard or multi-domain certificates, involving many websites. Customers have reported that they are unable to replace all site certificates within 5 days. | 120 | Continuing to coordinate with the customer and setting a revocation deadline. |
| D | Customers have encountered technical issues during the certificate replacement process that cannot be resolved within 5 days, including abnormal connections after the replacement, incomplete certificate chains, or the replacement not taking effect. TWCA technical personnel are currently assisting in addressing these issues. | 33 | Continuing to coordinate with the customer and setting a revocation deadline. |
| E | Replacing the certificates requires coordination with external vendors and depends on their availability, so customers have requested an extension of the revocation deadline. | 15 | Continuing to coordinate with the customer and setting a revocation deadline. |
| F | Replacing the certificates requires relying parties to cooperate in changing the certificates they have bound. Customers have reported that not all of their relying parties can make the change within 5 days. | 16 | Continuing to coordinate with the customer and setting a revocation deadline. |
| G | The customer has critical activities underway and cannot shut down and replace certificates within the specified deadline. | 66 | Continuing to coordinate with the customer and setting a revocation deadline. |
| H | Various communication failures with customers (e.g., the responsible person being on leave) have made it impossible to replace the certificates within 5 days. | 25 | Continuing to coordinate with the customer and setting a revocation deadline. |
| I | The customer did not specify the reasons for needing to delay the revocation. | 155 | Continuing to coordinate with the customer and setting a revocation deadline. |

Lessons Learned

What went well

  • Most customers were willing to cooperate and completed replacing their certificates in time.
  • A high proportion of certificates were promptly reissued through automated processes.

What didn't go well

  • We have looked into the situations of other CAs (such as Bug 1861682, Bug 1877388) and, unfortunately, many of the issues occurring with other CAs have also occurred with us.

  • Some customers use the certificate in critical applications. Replacing certificates using emergency processes may impose higher risks.

  • Replacing wildcard certificates within the specified time is particularly challenging. Moreover, we find it difficult to enumerate the number of sites affected by multi-domain certificates and to detect whether customers have completed the replacement.

  • It is challenging to explain to some customers the reasons for the urgent revocation and replacement of certificates, as well as the differences in appearance between the certificates before and after replacement.

Where we got lucky

N/A

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Continue to revoke the certificates that have been delayed in revocation. | Mitigate | 2024-04-30 |
| Weekly update on revocation progress. | Mitigate | 2024-04-30 |
| Ensure customers are aware of possible situations where certificates must be replaced quickly for security reasons | Prevent | 2024-04-30 |
| Ensure customers are aware of the risk of certificate pinning and alternative methods | Prevent | 2024-04-30 |

Details of affected certificates

Refer to the attached file.

Assignee: nobody → chtsai
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [ov-misissuance]

2024-03-25:

  • 11:00 Among the 16,481 affected certificates, 16,391 certificates have been revoked or expired, with the remaining 90 certificates (0.5%) yet to be revoked.

2024-04-01:

  • 11:00 Among the 16,481 affected certificates, 16,453 certificates have been revoked or expired, with the remaining 28 certificates (0.17%) yet to be revoked.

Thank you for this report and for including the tabled list of revocation challenges. Two questions as you continue to make progress towards your first Action Item.

  • Could a private PKI (i.e., not required to adhere to the TLS BRs or root program policies by virtue of not being publicly trusted) possibly be a better fit for some of the affected customer use cases? For example, you describe certificates installed on NAS devices that are not powered on. Can you help us understand the customer motivation for using a publicly-trusted CA for this use case, if you are aware?
  • What role might automation play to ensure timely revocation and reduced negative impact of these types of events in the future?

(In reply to Chris Clements from comment #5)

Thank you for this report and for including the tabled list of revocation challenges. Two questions as you continue to make progress towards your first Action Item.

  • Could a private PKI (i.e., not required to adhere to the TLS BRs or root program policies by virtue of not being publicly trusted) possibly be a better fit for some of the affected customer use cases? For example, you describe certificates installed on NAS devices that are not powered on. Can you help us understand the customer motivation for using a publicly-trusted CA for this use case, if you are aware?
    • We have evaluated using a private PKI, but in this scenario customers host public websites on their NAS devices, so publicly trusted certificates are needed.
    • For other cases, we have assessed and provided recommendations. Customers who have the capability to establish their own PKI are not included in the aforementioned affected cases. Typically, customers with policy issues or certificate management problems will still apply for publicly-trusted CA certificates.
  • What role might automation play to ensure timely revocation and reduced negative impact of these types of events in the future?
    • We have observed that customers who have implemented automated mechanisms have significantly shorter response times, ensuring that certificates can be replaced within the revocation deadline.
    • Because the replacement process affects such customers less, they tend to be more cooperative and usually do not require additional technical support (such as certificate installation).
    • If the incident is related to cybersecurity issues, promptly revoking and replacing certificates can reduce potential cybersecurity risks.

2024-04-08:

  • 11:00 Among the 16,481 affected certificates, 16,470 certificates have been revoked or expired, with the remaining 11 certificates yet to be revoked.

2024-04-12:

  • 18:30 Among the 16,481 affected certificates, 16,478 certificates have been revoked or expired, with the remaining 3 certificates yet to be revoked.

2024-04-15:

  • 18:37 Among the 16,481 affected certificates, 16,479 certificates have been revoked or expired, with the remaining 2 certificates yet to be revoked.
Whiteboard: [ca-compliance] [ov-misissuance] → [ca-compliance] [ov-misissuance] [leaf-revocation-delay]
Attachment #9397506 - Attachment description: Unrevoked List (Until 2024/04/19) → Unrevoked List (Until 2024/04/15)

2024-04-26:

  • Since April 15th, there has been no change in status, and unfortunately, 2 certificates have not yet been revoked (for the same customer). We are making our utmost effort to continue communicating and coordinating with the user.
Completed Items (synchronized with Bug 1884568):
  • Updated the correct customer contact information through this incident.
  • Updated user contracts, adding descriptions related to the termination of validity.
  • Completed sales staff education and training:
    • Explanation of requirements according to BR standards.
    • Alternative solutions for certificate binding.
    • Explanation of risks associated with wildcard certificates.
  • Feedback has been progressively received from the surveys previously sent to customers; internally, we will continue to evaluate this as a reference for future improvements in the certificate application process.
Unfinished Items
  • The remaining two wildcard certificates involve multiple hosts; most hosts have either completed the replacement or switched to certificates from other CAs. However, some hosts still require configuration changes to be made by vendors. We will allow an additional two-week extension.
  • Unrevoked certificates:

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Continue to revoke the certificates that have been delayed in revocation. | Mitigate | 2024-05-15 |
| Weekly update on revocation progress. | Mitigate | 2024-05-15 |
| Ensure customers are aware of possible situations where certificates must be replaced quickly for security reasons | Prevent | Done |
| Ensure customers are aware of the risk of certificate pinning and alternative methods | Prevent | Done |

2024-05-07:

  • 18:02 The remaining two certificates have been revoked. All certificates affected by this incident have either expired or been revoked.
| Action Item | Kind | Due Date |
| --- | --- | --- |
| Continue to revoke the certificates that have been delayed in revocation. | Mitigate | Done |
| Weekly update on revocation progress. | Mitigate | Done |
| Ensure customers are aware of possible situations where certificates must be replaced quickly for security reasons | Prevent | Done |
| Ensure customers are aware of the risk of certificate pinning and alternative methods | Prevent | Done |

(In reply to chtsai from comment #15)

2024-05-07:

  • 18:02 The remaining two certificates have been revoked. All certificates affected by this incident have either expired or been revoked.

I am seeing both certificates still responding 'OCSP: Good' and not being listed in the relevant CRL:
https://crt.sh/?id=10841100654&opt=ocsp
https://crt.sh/?id=10964835099&opt=ocsp

Could you confirm that these have been revoked?

I would recommend timezones be included with any future timeline responses for clarity. I will note that TWCA provided an exceptional breakdown of pending revocations reasons originally, good work there.

Flags: needinfo?(chtsai)

(In reply to Wayne from comment #16)

(In reply to chtsai from comment #15)

2024-05-07:

  • 18:02 The remaining two certificates have been revoked. All certificates affected by this incident have either expired or been revoked.

I am seeing both certificates still responding 'OCSP: Good' and not being listed in the relevant CRL:
https://crt.sh/?id=10841100654&opt=ocsp
https://crt.sh/?id=10964835099&opt=ocsp

Could you confirm that these have been revoked?

Since the next CRL & OCSP publication time has not yet arrived, the current certificate status hasn't been updated. The actual revocation dates for the two certificates are as follows:

I would recommend timezones be included with any future timeline responses for clarity. I will note that TWCA provided an exceptional breakdown of pending revocations reasons originally, good work there.

Thank you for your attention.

Flags: needinfo?(chtsai)

(In reply to chtsai from comment #17)

Since the next CRL & OCSP publication time has not yet arrived, the current certificate status hasn't been updated. The actual revocation dates for the two certificates are as follows:

I may be missing something, but if the OCSP and CRL resources haven’t been updated, in what sense is this revoked from the perspective of a relying party?

The purpose of revocation is to prevent relying parties from trusting the revoked certificates. If the relying parties can’t see that the certificates are revoked then they might as well not be.

I am surprised that this would be a point of confusion, but maybe even more explicit language is necessary in the BRs about what it means to actually “revoke” a certificate?

Flags: needinfo?(chtsai)

Section 4.9.5 of the Baseline Requirements says (emphasis added):

The period from receipt of the Certificate Problem Report or revocation-related notice to published revocation MUST NOT exceed the time frame set forth in Section 4.9.1.1.

There is also significant precedent here in Bugzilla and on MDSP for the fact that a certificate is not considered revoked until that revocation has been published and globally propagated (i.e. all well-behaved caches have dropped the previous status).

(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #18)

I am surprised that this would be a point of confusion, but maybe even more explicit language is necessary in the BRs about what it means to actually “revoke” a certificate?

This is an area where clarity could be improved and has been documented as an issue for the Server Certificate Working Group to consider/address: https://github.com/cabforum/servercert/issues/252.

That said, precedent of interpretation of expectations here is readily available, as Aaron notes -- as far as I can tell, there is likely always room for improvement in the TBRs, Root Program Policies, RFCs, and everything in between, but that doesn't excuse practices which violate the requirements even when it aids in understanding how such practices came into effect.

(In reply to chtsai from comment #17)

Since the next CRL & OCSP publication time has not yet arrived, the current certificate status hasn't been updated.

You should be able to issue a CRL or an OCSP response BEFORE the nextUpdate value of existing CRL/OCSP responses.

Attached file G2_before.crl
Flags: needinfo?(chtsai)
Attached file G2_after.crl
Attached file G3_before.crl
Attached file G3_after.crl

We have read everyone's comments and understood the concerns. Here, we are sharing some information for everyone's reference to facilitate discussion. If the clarity of the BR standards is involved, we will remain neutral and listen to all perspectives.

The two revoked certificates belong to two different generations of UCA, so we've provided the detailed timeline below.

Timeline

  • 2024-05-07 16:20:00 (UTC+8) The G2UCA CRL has been published.
  • 2024-05-07 16:20:00 (UTC+8) The G3UCA CRL has been published.
  • 2024-05-07 17:59:52 (UTC+8) The certificate has been marked as revoked in the G2UCA database
  • 2024-05-07 18:00:02 (UTC+8) The certificate has been marked as revoked in the G3UCA database
  • 2024-05-08 04:20:00 (UTC+8) The G2UCA CRL has been published, and the certificate is included for the first time.
  • 2024-05-08 04:20:00 (UTC+8) The G3UCA CRL has been published, and the certificate is included for the first time.

Information supplement

  • The times presented in the incident report are the times at which the certificates were marked as revoked in the database, which are the revocationDate values that appear in the CRL. They are not the times at which the revocation information was published or disclosed.
  • The current CRL is issued twice daily, at 04:20 AM and 04:20 PM.
  • The CRL includes all non-expired revoked certificates under that CA. It is issued with a validity period of 4 days, meeting the requirements of BR 4.9.7.
  • The issuance frequency and validity period of OCSP are the same as the CRL.
  • Our CRL and OCSP both have forced issuance/publication features. For certificates that need to be revoked within 24 hours (such as key compromises), we manually trigger this mechanism.
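To make the gap between the database marking time and the published revocation concrete, the sketch below computes the first scheduled CRL publication after a given marking time, using the twice-daily schedule stated above. It assumes no forced publication is triggered and that a slot must fall strictly after the marking time; the function and constant names are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))

# Twice-daily schedule stated above: 04:20 and 16:20 (UTC+8).
PUBLICATION_TIMES = (time(4, 20), time(16, 20))


def next_scheduled_publication(marked_at: datetime) -> datetime:
    """Return the first scheduled CRL publication strictly after marked_at."""
    local = marked_at.astimezone(UTC8)
    # Check today's and tomorrow's slots, so a late-evening entry rolls over.
    for day_offset in (0, 1):
        day = (local + timedelta(days=day_offset)).date()
        for slot in PUBLICATION_TIMES:
            candidate = datetime.combine(day, slot, tzinfo=UTC8)
            if candidate > local:
                return candidate
    raise AssertionError("unreachable: tomorrow's first slot is always later")


marked = datetime(2024, 5, 7, 17, 59, 52, tzinfo=UTC8)  # G2UCA database entry
published = next_scheduled_publication(marked)
print(published.isoformat())  # 2024-05-08T04:20:00+08:00
print(published - marked)     # 10:20:08
```

For the G2UCA entry this reproduces the 2024-05-08 04:20 publication in the timeline, a little over ten hours after the certificate was marked as revoked.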

In sync with Bug 1884568, we have scheduled the ACME ARI development plan.
We request to set the next update for June 30, 2024. Thank you.

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Continue to revoke the certificates that have been delayed in revocation. | Mitigate | Done |
| Weekly update on revocation progress. | Mitigate | Done |
| Ensure customers are aware of possible situations where certificates must be replaced quickly for security reasons | Prevent | Done |
| Ensure customers are aware of the risk of certificate pinning and alternative methods | Prevent | Done |
| The CA system supports ACME Renewal Information (ARI) | Mitigate | 2024-09-30 |

Similar to Bug 1884568, we request setting the next update to June 30, 2024. Thank you.

Whiteboard: [ca-compliance] [ov-misissuance] [leaf-revocation-delay] → [ca-compliance] [ov-misissuance] [leaf-revocation-delay] Next update 2024-06-30

ACME ARI development is currently progressing as expected. We request setting the next update to August 30, 2024. Thank you.

Whiteboard: [ca-compliance] [ov-misissuance] [leaf-revocation-delay] Next update 2024-06-30 → [ca-compliance] [ov-misissuance] [leaf-revocation-delay] Next update 2024-08-30
Blocks: 1911183

The ACME ARI feature has been successfully launched this week, and all the action items we committed to have been completed.

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Continue to revoke the certificates that have been delayed in revocation. | Mitigate | Done |
| Weekly update on revocation progress. | Mitigate | Done |
| Ensure customers are aware of possible situations where certificates must be replaced quickly for security reasons | Prevent | Done |
| Ensure customers are aware of the risk of certificate pinning and alternative methods | Prevent | Done |
| The CA system supports ACME Renewal Information (ARI) | Mitigate | Done |

We appreciate the update related to landing ARI.

Can you please share more detail related to:

  • a) which version of the draft RFC has TWCA adopted?
  • b) how will ARI be made available to customers? (for example, will it be available to all customers, or a specific subset?)
  • c) how will customers be encouraged to adopt ARI?
  • d) how will TWCA monitor ARI adoption?
  • e) how will TWCA verify its ARI implementation is working as intended for its customer community, and how often will it perform these evaluations?
Flags: needinfo?(chtsai)

(In reply to Ryan Dickson from comment #31)

We appreciate the update related to landing ARI.

Can you please share more detail related to:

  • a) which version of the draft RFC has TWCA adopted?
  • b) how will ARI be made available to customers? (for example, will it be available to all customers, or a specific subset?)
  • c) how will customers be encouraged to adopt ARI?
  • d) how will TWCA monitor ARI adoption?
  • e) how will TWCA verify its ARI implementation is working as intended for its customer community, and how often will it perform these evaluations?

Hi Ryan,
This response indicates that we have received your inquiry. We expect to reply by August 26 at the latest. Thank you for your attention.

Flags: needinfo?(chtsai)

(In reply to Ryan Dickson from comment #31)

We appreciate the update related to landing ARI.

Can you please share more detail related to:

  • a) which version of the draft RFC has TWCA adopted?

Our implementation is based on draft-ietf-acme-ari-03. We are aware that newer versions are available, and we will regularly monitor version changes and update our CA software as necessary.

  • b) how will ARI be made available to customers? (for example, will it be available to all customers, or a specific subset?)

We have not pre-defined a specific group for ARI support; all ACME users can utilize the ARI mechanism.

  • c) how will customers be encouraged to adopt ARI?

For ACME users, we will create and maintain a list of ACME clients that support ARI, and assist customers in using the software included in the list. Additionally, we will emphasize to customers that using supported software can eliminate the need for manual intervention when immediate certificate revocation is required.

  • d) how will TWCA monitor ARI adoption?

In our system design, all ACME messages, including the renewal-info resource, are written to the database. Through the backend management interface, we can analyze ARI query statuses and the proportion of certificate renewals handled via the ARI mechanism. This data will help us adjust future business strategies accordingly.

  • e) how will TWCA verify its ARI implementation is working as intended for its customer community, and how often will it perform these evaluations?

Regarding server availability:

  • Our operations team will use monitoring tools to track the server's uptime. For the ACME service, we will monitor the directory resource to ensure it responds as expected, with monitoring frequencies set in minutes.

Regarding ARI availability:

  • TWCA's test site will use ACME software that supports ARI to periodically update certificates with the ACME server. Monitoring tools will track this site to ensure the validity of its certificates.

Drills:

  • We plan to conduct at least one drill annually, simulating an immediate revocation on the test site mentioned above, to ensure that the site's certificates can be reissued within the suggestedWindow time frame.
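For readers unfamiliar with how a client might act on suggestedWindow, here is a minimal client-side sketch. It assumes the renewal-info response is a JSON object whose suggestedWindow has RFC 3339 "start" and "end" timestamps, as described in the ARI draft, and that the client picks a uniformly random time in the window. The response body and helper names are made up for illustration and do not describe TWCA's implementation.

```python
import json
import random
from datetime import datetime, timezone


def parse_rfc3339(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pick_renewal_time(renewal_info: dict, rng: random.Random) -> datetime:
    """Pick a uniformly random instant inside suggestedWindow."""
    window = renewal_info["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    if end < start:
        raise ValueError("suggestedWindow ends before it starts")
    return start + (end - start) * rng.random()


def should_renew_now(renewal_info: dict, now: datetime, rng: random.Random) -> bool:
    # A selected time in the past means the client should renew right away,
    # which is how a CA can pull renewals forward ahead of a revocation.
    return pick_renewal_time(renewal_info, rng) <= now


# Hypothetical response body; the timestamps are made up for illustration.
response_body = """
{"suggestedWindow": {"start": "2024-05-01T00:00:00Z",
                     "end": "2024-05-02T00:00:00Z"}}
"""
info = json.loads(response_body)
now = datetime(2024, 5, 3, tzinfo=timezone.utc)
print(should_renew_now(info, now, random.Random(0)))  # True: the window has passed
```

In a drill like the one described above, moving the window into the past would cause a client following this logic to renew on its next renewal-info check.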

We are continuously monitoring this issue.

Whiteboard: [ca-compliance] [ov-misissuance] [leaf-revocation-delay] Next update 2024-08-30 → [ca-compliance] [ov-misissuance] [leaf-revocation-delay]
Whiteboard: [ca-compliance] [ov-misissuance] [leaf-revocation-delay] → [ca-compliance] [ov-misissuance] [leaf-revocation-delay] Next update 2024-10-31