Open Bug 1892419 Opened 9 months ago Updated 2 months ago

Chunghwa Telecom: Delayed Revocation Due to GTLSCA EKU Misissuance

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: leox, Assigned: leox)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01)

Attachments

(8 files)

Incident Report

Summary

On March 19, 2024, GTLSCA received notification from the CHT Root CA team that there was an issue with incorrectly set EKU fields in certain certificates. We revoked the three listed certificates on the day of notification and advised users to reapply. After a detailed investigation over one day, we identified that 6,450 certificates were affected. As this error was assessed not to impact key security or information security, we have initiated this bug report to explain why we need to delay the revocation of these mistakenly issued certificates. Nonetheless, we will continue to monitor the situation and ask users to replace and install new certificates.

How we first became aware of the problem.

On March 19, 2024, GTLSCA received a notification of a certificate misissuance from the CHT Root CA team and immediately commenced remediation.

Impact

There are 6,450 certificates affected, all of which will be revoked and updated. However, the impact assessment indicates that only the certificate fields have changed, and user connection security is not affected.

Timeline

All times are UTC+8.

2023-09-15

  • TLS BR v2.0.0 became effective.

2024-02-26

  • While reviewing CPS v1.0.3 against the BRs, the GTLSCA PMA found that BR v2.0.0 states the extKeyUsage extension MUST be marked non-critical, and proposed adjusting the certificate profile.
  • Updated the GTLSCA CPS v1.0.4 draft.

2024-03-05

  • Provided the certificate profile and updated the documentation based on the BRs.
  • Submitted GTLSCA CPS v1.0.4 to the Root CA for review.

2024-03-11

  • 14:16 Updated the certificate profile to mark the extKeyUsage extension as non-critical.
  • 14:34 Certificates issued after this point comply with the BRs.

2024-03-19

  • 09:35 The RAO counter informed the subscribers of the 3 certificates reported by the Chrome Root Program about this incident and asked them to re-apply as soon as possible. GTLSCA revoked the problematic certificates within hours, after reissuing new ones.
  • 09:41 Asked the RAO to suspend review and issuance of new applications, and started an action plan and initial investigation.
  • 10:55 Confirmed the impact scope and verified that the problem had been fixed on 2024-03-11. The preliminary investigation was completed.
  • 11:15 Certificate issuance resumed.
  • 11:21 Revoked the first two problematic certificates in the report.
  • 11:27 Held the first investigation report meeting; confirmed the problem and the impact scope.
  • 12:39 Revoked the third problematic certificate in the report.
  • 15:00 Held the second report meeting; proposed solutions with minimal impact on users and measures to avoid recurrence.
  • 16:32 Provided the incident report to the CHT Root CA team.

2024-03-22

  • Posted this incident report.

2024-03-24

  • Per the BRs, all problematic certificates should have been revoked by this date.

Root Cause Analysis (Why we delayed revoking the majority of the problematic certificates.)

This issue arose from a misreading of the EKU profile requirements in TLS BR v2.0.0: we overlooked the requirement that the EKU extension MUST NOT be marked critical. We corrected this problem on March 11, 2024.
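For readers who want to see what the profile error looks like on the wire, the following is a minimal sketch (not GTLSCA's tooling) that inspects a single DER-encoded X.509 Extension and reports whether it is an extKeyUsage extension (OID 2.5.29.37) with the critical flag set to TRUE. The function names and the two hand-built sample extensions are illustrative assumptions; real linting would parse the whole certificate with a dedicated library.

```python
# Sketch: is a DER-encoded X.509 Extension an EKU marked critical?
# Extension ::= SEQUENCE { extnID OID,
#                          critical BOOLEAN DEFAULT FALSE,
#                          extnValue OCTET STRING }

EKU_OID_CONTENT = bytes([0x55, 0x1D, 0x25])  # content octets of OID 2.5.29.37


def read_tlv(data: bytes, offset: int = 0):
    """Return (tag, value, next_offset) for the DER TLV starting at offset."""
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:  # long form: low 7 bits give the number of length octets
        num_octets = length & 0x7F
        length = int.from_bytes(data[offset:offset + num_octets], "big")
        offset += num_octets
    return tag, data[offset:offset + length], offset + length


def eku_marked_critical(extension_der: bytes) -> bool:
    """True only if the extension is extKeyUsage and its critical flag is TRUE."""
    tag, body, _ = read_tlv(extension_der)
    if tag != 0x30:
        raise ValueError("expected a SEQUENCE")
    oid_tag, oid, pos = read_tlv(body)
    if oid_tag != 0x06 or oid != EKU_OID_CONTENT:
        return False  # some other extension
    next_tag, value, _ = read_tlv(body, pos)
    # An absent BOOLEAN means DEFAULT FALSE; DER encodes TRUE as 0xFF.
    return next_tag == 0x01 and value == b"\xff"


# Hypothetical sample extensions carrying id-kp-serverAuth (1.3.6.1.5.5.7.3.1).
EKU_VALUE = bytes.fromhex("040c300a06082b06010505070301")
CRITICAL_EKU = bytes.fromhex("3016" "0603551d25" "0101ff") + EKU_VALUE
NONCRITICAL_EKU = bytes.fromhex("3013" "0603551d25") + EKU_VALUE

print(eku_marked_critical(CRITICAL_EKU))     # the misissued shape
print(eku_marked_critical(NONCRITICAL_EKU))  # the corrected shape
```

The only difference between the two samples is the three-byte BOOLEAN TRUE (`01 01 ff`) following the OID, which is the field the corrected profile omits.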

The subscribers affected by this mis-issuance are all government units and government agency websites. We have assessed that the cause of this mis-issuance does not involve a key issue, only a certificate field issue, and will not affect subscribers' information security. In addition, because of how government agencies operate administratively, moving from notification to the start of processing requires sign-off and approval from agency supervisors at every level, and some agencies must engage IT vendors to do the work, so it is difficult to complete the replacement within 5 days. We are therefore postponing revocation to a fixed deadline so that the certificates of all websites can be replaced smoothly.

We understand Mozilla's position that CA should comply with BR's requirements. However, considering different circumstances, the harm caused may exceed the choice to meet this requirement, and the risk cannot be transferred to users who use TLS certificates.

In addition, we have never before encountered such a large number of certificate revocation and renewal requests. Our existing software supports revocation, but it was not designed for bulk certificate renewal, so a large part of the time in this incident went to developing a program for renewing certificates.

Action Items

Action Item | Status | Due Date
Reissue new certificates | Finished | 2024-04-11
Notify users to change certificates | Started | 2024-04-25
Keep track of issues | Started | 2024-05-15
Revoke all problematic certs | Not yet started | 2024-05-15

Lessons Learned

What went well

  • Most customers were willing to cooperate and replaced their certificates in time.

What didn’t go well

  • Our automation mechanism is still being rolled out, so most subscribers still go through manual review. Partly because of this, replacing certificates takes more time.
  • To keep user services uninterrupted, the replacement process had to be completed first, and we strengthened testing out of caution. In practice, a lot of time was spent on this work item, including version release and environment installation.
  • The replacement certificates were also issued in batches. Unfortunately, the Ching Ming Festival holiday fell in between, and re-issuance took 10 days in total.
  • In the announcement of the scheduled cancellation time, in order to enable users to cooperate with the replacement as soon as possible, the owner's government agency does not agree to specify the expected withdrawal time to avoid causing public resentment.

Where we got lucky

  • We can take this opportunity to familiarize ourselves with the problem reporting process and use Bugzilla to document issues.

Appendix

Details of the affected certificates: A list of 6,450 unexpired certificates to be revoked is attached to this post.

So a couple of points:

There are 6,450 certificates affected, all of which will be revoked and updated. However, the impact assessment indicates that only the certificate fields have changed, and user connection security is not affected.

Per the CCADB incident report guidelines:

The Impact section should contain a short description of the size and nature of the incident. For example: how many certificates, OCSP responses, or CRLs were affected; whether the affected objects share features (such as issuance time, signature algorithm, or validation type); and whether the CA Owner had to cease issuance during the incident.

A security impact assessment is specifically not a part of this section. The impact section is missing information such as, do all of these certificates belong to the same subscriber? If not, then how many subscribers are impacted by this?

We understand Mozilla's position that CA should comply with BR's requirements. However, considering different circumstances, the harm caused may exceed the choice to meet this requirement, and the risk cannot be transferred to users who use TLS certificates.

I believe it is only the Mozilla Root Program that understands that there may be some exceptional circumstances for failure to revoke: https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

The decision and rationale for delaying revocation will be disclosed in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include detailed and substantiated explanations for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.

Any decision to not comply with the timeline specified in the Baseline Requirements must also be accompanied by a clear timeline describing if and when the problematic certificates will be revoked or expire naturally, and supported by the rationale to delay revocation.

You will perform an analysis to determine the factors that prevented timely revocation of the certificates, and include a set of remediation actions in the final incident report that aim to prevent future revocation delays.

So a few things that are missing here:

  1. Where are the action items that explain how you're going to prevent another failure to revoke? Your action items listed in this incident response do not currently address this at all.
  2. Where is the per-subscriber rationale for not revoking the 6,450 certificates?

Also, keep in mind that other root programs may not recognize the "exceptionality" for lack of timely revocation.

What's deeply concerning to me with this incident:

In addition, we have never before encountered such a large number of certificate revocation and renewal requests. Our existing software supports revocation, but it was not designed for bulk certificate renewal, so a large part of the time in this incident went to developing a program for renewing certificates.

If you're unable to revoke the certificates you have issued, then you have no business being a CA. It's one of the primary requirements of being a CA.

Do you have the ability to revoke every single certificate you've issued that is still currently active?

In the announcement of the scheduled cancellation time, in order to enable users to cooperate with the replacement as soon as possible, the owner's government agency does not agree to specify the expected withdrawal time to avoid causing public resentment.

I'm not entirely sure what this is trying to say, but I want to make it clear: The primary obligations you have as a CA are to the root programs and the public. Your primary obligation can not be, and is not your customers/subscribers. If your customers want to dictate their own revocation timelines, they should not be using certificates that are publicly trusted.

Flags: needinfo?(leox)
Whiteboard: [ca-compliance] [leaf-revocation-delay]
Attached file revoke0409-crtsh-2.csv

On April 9th, 2 affected certificates were subject to a revocation due to the application for domain name changes.

Flags: needinfo?(leox)

Update: April 24th was the final deadline for notifying subscribers of the replacement. We commenced the revocation of the first batch of certificates, totaling 2,318.

On April 25th, we provided some subscribers with a one-day flexibility period and proceeded with a large batch of certificate revocations, totaling 3,974. In this incident report, a total of 6,297 certificates (97.58%) were revoked, including the three initially reported on March 19th. The remaining 156 certificates (2.42%) require some time for the subscribers' official procedures to be processed. We have agreed on the revocation dates with the subscribers, and the revocations will be carried out in batches subsequently.

(In reply to amir from comment #1)

So a couple of points:
There are 6,450 certificates affected, all of which will be revoked and updated. However, the impact assessment indicates that only the certificate fields have changed, and user connection security is not affected.

A security impact assessment is specifically not a part of this section. The impact section is missing information such as, do all of these certificates belong to the same subscriber? If not, then how many subscribers are impacted by this?

We calculated the number of affected subscribers was 2,727.

The Impact section should contain a short description of the size and nature of the incident. For example: how many certificates, OCSP responses, or CRLs were affected; whether the affected objects share features (such as issuance time, signature algorithm, or validation type); and whether the CA Owner had to cease issuance during the incident.

The incident was caused by an incorrect certificate profile in which the critical flag of the extKeyUsage extension (OID 2.5.29.37) was set to TRUE. We ceased certificate issuance from about 09:35 to 11:15 (UTC+8) on March 19, as soon as we received the notification from the Root CA team. GTLSCA resumed issuance after the investigation confirmed that the problem had been fixed on 2024-03-11.

The following three certificates were revoked on March 19 after receiving the notice. They are not included in the 6,450 affected records, so the total affected scope should be 6,450 + 3.
https://crt.sh/?sha256=6d0259f547f96db2c3ce4f23e0673249d7466ab73447abeb4277db9d3865aa8e
https://crt.sh/?sha256=992244b8fc9b1d663e4a7fb6c9b5f85610a6155542697ad06ad16bbeeb2a8d0f
https://crt.sh/?sha256=03713f1213e50cce391677d1f5dc2d99d4eb62f4424b2b5737329c37d285bd21

So a few things that are missing here:
Where are the action items that explain how you're going to prevent another failure to revoke? Your action items listed in this incident response do not currently address this at all.

In fact, it is not a failure to revoke, but to allow the subscriber's website to replace and install a new certificate in accordance with our replacement notification, and then revoke it to ensure website availability.

Where is the per-subscriber rationale for not revoking the 6,450 certificates?

In this project we plan to initially issue new certificates using the same keys for users to install, and then revoke the old certificates. As these are official government websites, and considering the pressure from government agencies and public opinion, we cannot immediately revoke all certificates without compromising security. Doing so would quickly become news, and we would face further censure from government authorities.

If you're unable to revoke the certificates you have issued, then you have no business being a CA. It's one of the primary requirements of being a CA. Do you have the ability to revoke every single certificate you've issued that is still currently active?

Actually, we need some time to reach the contact window at each government agency so that they can update their certificates before we proceed with the revocation, after negotiation with the CA owner.

I'm not entirely sure what this is trying to say, but I want to make it clear: The primary obligations you have as a CA are to the root programs and the public. Your primary obligation can not be, and is not your customers/subscribers. If your customers want to dictate their own revocation timelines, they should not be using certificates that are publicly trusted.

We have a timeline, and we have contacted the impacted subscribers and asked them to expedite installing the new certificates, so that we can revoke the original ones within the planned timeframe.

Thank you for your advice, Amir.

Attached file revoke0426-crtsh-1.csv

Update:

  • 4/26 revoked 1 certificate
  • 5/1 revoked 91 certificates

Total affected certificates: 6,453
Total certificates revoked: 6,389 (99.01%)
Remaining: 64 (0.99%)
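The running totals in these updates can be cross-checked with a short script. This is a sketch using only the batch sizes reported in this thread; it assumes the 2,318-certificate batch was revoked on April 24 (the update mentions it alongside that deadline), and the `progress` helper is a hypothetical name.

```python
# Total in scope: the 6,450 listed certificates plus the 3 revoked on March 19.
TOTAL_AFFECTED = 6_450 + 3

# (date, certificates revoked) as reported in the thread's updates.
BATCHES = [
    ("2024-03-19", 3),
    ("2024-04-09", 2),
    ("2024-04-24", 2_318),  # assumed date of the first large batch
    ("2024-04-25", 3_974),
    ("2024-04-26", 1),
    ("2024-05-01", 91),
]


def progress(batches, total):
    """Return (date, cumulative revoked, remaining, percent revoked) per batch."""
    rows = []
    revoked = 0
    for day, count in batches:
        revoked += count
        rows.append((day, revoked, total - revoked, round(100 * revoked / total, 2)))
    return rows


for day, revoked, remaining, pct in progress(BATCHES, TOTAL_AFFECTED):
    print(f"{day}: revoked {revoked:,} ({pct}%), remaining {remaining:,}")
```

The cumulative figures after the April 25 and May 1 batches line up with the 6,297 (97.58%) and 6,389 (99.01%) totals reported above.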

In fact, it is not a failure to revoke, but to allow the subscriber's website to replace and install a new certificate in accordance with our replacement notification, and then revoke it to ensure website availability.

Right, but you've not provided the per-subscriber reasoning for delaying the revocation.

and considering the pressure from government agencies and public opinion, we cannot immediately revoke all certificates without compromising security. Doing so would quickly become news, and we would face further censure from government authorities.

I think this is a deeply concerning statement. There are a few consequences to this statement:

  1. If mass revocation is required in the future, you're going to fail the mass revocation again.
  2. You're placing the subscriber's needs ahead of the rules for the WebPKI.

Your primary stakeholder is the public. Not the government. What happens if the government pressures you to maliciously issue a certificate for a DNS name?

Actually, we need some time to reach the contact window at each government agency so that they can update their certificates before we proceed with the revocation, after negotiation with the CA owner.

That is literally against the Baseline Requirements.

I'm very concerned with this response. This is gross negligence on your behalf, and I'm hoping that you can provide assurances on how you will reconcile your required commitments with the BRs, with the pressure from the government.

Flags: needinfo?(leox)

To clarify, when I said "That is literally against the Baseline Requirements", I was referring to this specific incident.

Generally, in misissuances that require revocation, you have 24/120 hours after finding out about the incident to communicate with your subscribers. However, the revocation date can not be changed based on what your subscribers ask of you.

In this specific incident, revocation MUST be completed at most 5 days after finding out about the incident. Exceptions to this are not provided.
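As a worked example of the 24-hour versus 5-day windows, the sketch below computes the deadline for this incident and how far past it a given revocation date falls. The 09:35 (UTC+8) awareness time on March 19 is an assumption taken from the first timeline entry, and the function names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # the report's timeline uses UTC+8

# The two revocation windows mentioned in the thread (24 hours / 120 hours).
WINDOWS = {"24h": timedelta(hours=24), "5d": timedelta(days=5)}


def revocation_deadline(aware_at: datetime, window: str = "5d") -> datetime:
    """Latest permitted revocation time, counted from when the CA became aware."""
    return aware_at + WINDOWS[window]


def days_late(deadline: datetime, revoked_on: date) -> int:
    """Whole calendar days between the deadline date and the revocation date."""
    return max(0, (revoked_on - deadline.date()).days)


# Assumption: awareness at 09:35 on 2024-03-19, per the first timeline entry.
aware = datetime(2024, 3, 19, 9, 35, tzinfo=UTC8)
deadline = revocation_deadline(aware)

print(deadline.isoformat())
print(days_late(deadline, date(2024, 4, 25)))  # the large April 25 batch
print(days_late(deadline, date(2024, 5, 13)))  # the last reported batch
```

Under these assumptions the deadline falls on 2024-03-24, which matches the date given in the incident report's timeline.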

Assignee: nobody → leox
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Summary: Chunghwa Telecom: Instructions for Delayed Revocation Due to GTLSCA EKU Misissuance → Chunghwa Telecom: Delayed Revocation Due to GTLSCA EKU Misissuance
Attached file revoke0510-crtsh-5.csv
Flags: needinfo?(leox)
Attached file revoke0513-crtsh-2.csv

Update:

  • 5/10 revoked 5 certificates
  • 5/13 revoked 2 certificates

Total affected certificates: 6,453
Total certificates revoked: 6,453 (100%)
Remaining: 0

All affected certificates have been revoked.

In the event of similar incidents in the future, we will first assess whether they relate to key access security. If it is a major security issue, we will promptly report to the government and comply with BR regulations by revoking all certificates. However, if it involves only changes to the certificate fields and does not pertain to major security issues, we will explain the reason for the bug here. After confirming the schedule with the government, we will proceed with phased and batch revocations to ensure the availability of government websites is not affected.

GTLSCA stands for Government TLS CA, a TLS CA operated by Chunghwa Telecom under the commission of the government. It is responsible for issuing certificates to domestic government agency websites and browser services. The government retains the right to commission and may also entrust other qualified operators to manage the CA.

If the government forces us to maliciously issue certificates for DNS names, we will follow the BR validation process. If the DNS name instructed by the government cannot pass validation, we will not be able to issue the certificate to the government. All records will be preserved, and neither the government nor we can unilaterally issue certificates.

In the event of similar incidents in the future, we will first assess whether they relate to key access security. If it is a major security issue, we will promptly report to the government and comply with BR regulations by revoking all certificates. However, if it involves only changes to the certificate fields and does not pertain to major security issues, we will explain the reason for the bug here. After confirming the schedule with the government, we will proceed with phased and batch revocations to ensure the availability of government websites is not affected.

This comment is concerning for many reasons:

  1. It fails to inspire confidence that Chunghwa Telecom reliably adheres to the BRs or the commitments described in its own policies. Given Chunghwa Telecom has cross-certified GTLSCA, it also calls into question how well Chunghwa Telecom is ensuring GTLSCA’s commitments are being upheld.
  2. It relies upon vague and highly subjective phrasing (i.e., “major security issue”) that leaves room for downplaying what other organizations might consider serious security incidents.
  3. It prioritizes customer needs over those of the ecosystem, and appears to be in clear violation of well-defined and consensus-driven requirements (i.e., the TLS BRs). The revocation timelines described in Section 4.9 of the TLS BRs must be considered as superseding the desires of a customer.
  4. There are technical measures Chunghwa Telecom could take to more meaningfully ensure revocation timelines are met in the future. These measures may include 1) promoting automation (e.g., ACME + ARI) and 2) further encouraging its use by (a) deprecating domain validation methods that cannot rely on automation and (b) reducing certificate validity to further encourage use of automation tools made available to customers. Can you explain whether Chunghwa Telecom is exploring this combined set of activities? If not, can you describe barriers in doing so?

If the government forces us to maliciously issue certificates for DNS names, we will follow the BR validation process. If the DNS name instructed by the government cannot pass validation, we will not be able to issue the certificate to the government. All records will be preserved, and neither the government nor we can unilaterally issue certificates.

This statement is also a cause for concern. In the CCADB, GTLSCA G1 has a CA Owner identified that is a different organization than Chunghwa Telecom. I interpret that to mean that the CA’s private key (or any versions of that key that would be capable of signing publicly-trusted certificates) are not under Chunghwa Telecom’s direct control. Can you explain how Chunghwa Telecom meaningfully ensures that all domain validations are done so in a manner consistent with the BRs?

Flags: needinfo?(leox)

(In reply to Ryan Dickson from comment #13)

This comment is concerning for many reasons:

  1. It fails to inspire confidence that Chunghwa Telecom reliably adheres to the BRs or the commitments described in its own policies. Given Chunghwa Telecom has cross-certified GTLSCA, it also calls into question how well Chunghwa Telecom is ensuring GTLSCA’s commitments are being upheld.
  2. It relies upon vague and highly subjective phrasing (i.e., “major security issue”) that leaves room for downplaying what other organizations might consider serious security incidents.
  3. It prioritizes customer needs over those of the ecosystem, and appears to be in clear violation of well-defined and consensus-driven requirements (i.e., the TLS BRs). The revocation timelines described in Section 4.9 of the TLS BRs must be considered as superseding the desires of a customer.

Our terminology in the above explanation was not precise enough. The significant security issue mentioned refers to items (items 1-5) that must be revoked within 24 hours according to BR section 4.9.1.1, which will be completed on schedule.

Since this is the first time we have encountered a situation requiring a large number of revocations, we were somewhat unprepared. In the future, we will actively implement automated solutions and improve existing procedures. This will ensure that, even in cases of major revocations as mentioned above, we will comply with BR and CPS regulations and complete the revocation within 5 days. If we are still unable to complete it on schedule, we will file an issue on Bugzilla as required and provide progress tracking.

  4. There are technical measures Chunghwa Telecom could take to more meaningfully ensure revocation timelines are met in the future. These measures may include 1) promoting automation (e.g., ACME + ARI) and 2) further encouraging its use by (a) deprecating domain validation methods that cannot rely on automation and (b) reducing certificate validity to further encourage use of automation tools made available to customers. Can you explain whether Chunghwa Telecom is exploring this combined set of activities? If not, can you describe barriers in doing so?

Thank you for your valuable suggestions. GTLSCA currently supports the ACME mechanism, and we are in the promotion phase. We need to conduct usage training for our subscribers, which will take some time for everyone to familiarize themselves with the new application process. We are also preparing documentation for the automatic renewal mechanism, along with training for our customer service staff, to better provide guidance to our users.

This statement is also a cause for concern. In the CCADB, GTLSCA G1 has a CA Owner identified that is a different organization than Chunghwa Telecom. I interpret that to mean that the CA’s private key (or any versions of that key that would be capable of signing publicly-trusted certificates) are not under Chunghwa Telecom’s direct control. Can you explain how Chunghwa Telecom meaningfully ensures that all domain validations are done so in a manner consistent with the BRs?

Although GTLSCA G1 is owned by MODA, it is fully entrusted to Chunghwa Telecom for operation. Our bilateral contract explicitly stipulates adherence to BR regulations. Each year, MODA commissions a qualified third-party auditor to conduct a thorough audit of Chunghwa Telecom based on BR audit standards, ensuring that all operational processes comply with BR regulations.

Flags: needinfo?(leox)
Blocks: 1911183
Whiteboard: [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31 → [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30

We continue work on incident-reporting and compliance requirements aimed at reducing delayed revocation, so this bug will remain open until at least February 1, 2025. Meanwhile, CAs should review https://github.com/mozilla/www.ccadb.org/pull/186.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30 → [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01