Chunghwa Telecom: Failure to check restrictive CAA record during Migration
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: tmkuo, Assigned: tmkuo)
Details
(Whiteboard: [ca-compliance] [ov-misissuance])
Attachments
(3 files)
Incident Report
Summary
Due to the phase-out of the old root certificate, eCA-G1, CHT recently initiated a plan to migrate existing TLS certificates in GTLSCA to the Single-purpose PKI (HiPKI). Since both GTLSCA and HiPKI OV TLS CA are operated by CHT, we accepted the existing evidence of domain control for these valid GTLSCA certificates by reusing validation data or documents, and proceeded directly with certificate reissuance through HiPKI OV TLS CA.
On March 1, 2025, CHT received a notification from the Chrome Root Program that it had noted an atypical surge in TLS certificate issuance (for a specific domain) by CHT and had observed new certificates issued for several domains whose CAA records appear to conflict with the permitted issuer values described in the CPS of HiPKI OV TLS CA. After holding a preliminary incident discussion meeting with the CA response team, we decided to promptly revoke all certificates issued during the migration period. In addition, we revoked previously issued certificates that raised similar concerns regarding the domain control validation method.
Impact
On September 10, 2024, we reissued a total of 121 certificates. Between February 21, 2025, and February 26, 2025, we reissued a total of 11,739 certificates. However, only 24 certificates were sent to subscribers for installation testing. In this process, a total of 11,860 certificates were revoked (CRLReason #4, superseded).
Timeline
All times are UTC+8.
2025-03-01:
- 05:20 CHT received a notification from the Chrome Root Program indicating an unusual surge in certificate issuance and potential issues with the checking of CAA records set.
- 11:27 CHT holds a preliminary incident discussion meeting.
- 12:28 Convene the CA team members and computer room personnel.
- 12:55 Develop a large-scale revocation process and SOP, and inventory the list of affected certificates that are prepared for revocation.
- 13:04 Begin the process of modifying the program for large-scale revocation and adjusting parameters.
- 15:05 Program delivery for test environment verification testing.
- 15:29 Deployment in the test environment failed, proceeding with the second round of program modifications and parameter adjustments.
- 17:30 Completed verification testing in the test environment.
- 17:35 Program delivered for deployment to the production environment.
- 17:47 Begin revoking the first problematic certificate.
- 18:27 Begin large-scale revocation process; by 20:25, 11,855 certificates have been revoked.
- 21:19 Revoke one more, bringing the total to 11,856 certificates revoked.
2025-03-03:
- 13:30 After a detailed investigation and inventory, an additional 4 affected certificates were identified.
- 14:34 Revoke four more problematic certificates, bringing the total to 11,860 certificates revoked.
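The elapsed time between the Chrome Root Program notification and each revocation milestone follows directly from the timestamps above. A minimal Python sketch of that arithmetic, using only the times listed in this timeline (all UTC+8); the helper and variable names are illustrative, and the sketch takes no position on which revocation deadline applies:

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # the report states all times are UTC+8


def at(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timeline entry as a UTC+8 datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=TZ)


notified = at("2025-03-01 05:20")
milestones = {
    "first revocation": at("2025-03-01 17:47"),
    "11,855 revoked": at("2025-03-01 20:25"),
    "11,856 revoked": at("2025-03-01 21:19"),
    "11,860 revoked": at("2025-03-03 14:34"),
}


def elapsed_hours(label: str) -> float:
    """Hours between the notification and the named milestone."""
    return (milestones[label] - notified).total_seconds() / 3600


for label in milestones:
    print(f"{label}: {elapsed_hours(label):.2f} h after notification")
```

The final four revocations fall a little over 57 hours after the notification, while the first three milestones all fall within 16 hours.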
Root Cause Analysis
The issuance relied solely on DCV reuse; the process of migrating existing TLS certificates from GTLSCA to HiPKI did not thoroughly check the CAA records. For example, a domain serving a CAA record with the issue tag "letsencrypt.org" or "GTLSCA-G1" prohibits HiPKI OV TLS CA from issuing. In addition, the migration process did not follow the complete verification process required by HiPKI OV TLS CA. As a result, the evidence of domain control may be unreliable.
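The decision described above can be illustrated with a simplified sketch. It assumes the relevant CAA record set has already been retrieved and is passed in as `(tag, value)` pairs, and it uses the issuer domain names that the CPS excerpt quoted later in this thread lists for HiPKI OV TLS CA. The function names are hypothetical, only the `issue` tag is evaluated, and this is not CHT's actual implementation:

```python
# Issuer domain names taken from the CPS statement quoted in this thread.
HIPKI_ISSUER_DOMAINS = {"pki.hinet.net", "tls.hinet.net"}


def issuer_domain(value: str) -> str:
    """Return the issuer domain part of an 'issue' value, dropping parameters."""
    return value.split(";", 1)[0].strip().lower()


def caa_permits_issuance(records, ca_domains=HIPKI_ISSUER_DOMAINS) -> bool:
    """Decide whether a CAA record set permits issuance by this CA.

    records: list of (tag, value) pairs from the relevant CAA record set.
    Simplified: 'issuewild', critical flags, and unknown tags are out of
    scope for this sketch.
    """
    issue_values = [issuer_domain(v) for tag, v in records if tag.lower() == "issue"]
    if not issue_values:
        return True  # no 'issue' property, so this check imposes no restriction
    return any(v in ca_domains for v in issue_values)


print(caa_permits_issuance([("issue", "letsencrypt.org")]))  # False
print(caa_permits_issuance([("issue", "pki.hinet.net")]))    # True
```

Under these assumptions, both example values from the root cause description ("letsencrypt.org" and "GTLSCA-G1") lead to a refusal.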
Lessons Learned
What went well
- The large-scale revocation process and SOP had been rehearsed beforehand, and we are glad it proved useful this time, allowing the affected certificates to be revoked successfully within a short period.
What didn't go well
- During the migration process, more attention should have been paid to checks such as CAA records and to ensuring BR compliance.
Where we got lucky
- Only 24 certificates were sent to subscribers for installation testing.
Action Items
| Action Item | Kind | Due Date |
|---|---|---|
| Seeking other large-scale migration methods for lesson learned | Prevent | 2025-04-15 |
| Personnel retraining as to BR compliance | Prevent | 2025-03-10 |
Appendix
Details of affected certificates
Details of the affected certificates: A list of 11,860 revoked certificates is attached to this post.
Based on Incident Reporting Template v. 2.0
Comment 1•11 months ago
We commend Chunghwa Telecom for prioritizing this high-volume revocation event and completing it within mandated timelines. The management of this incident is a case study in CA improvement over time.
Comment 2•11 months ago
Thank you for filing this public incident report. We have one request and several questions, detailed below.
Request:
The updated CCADB Incident Reporting Guidelines (IRGs) went into effect on March 1, 2025 (announced here and here). The format used for this report has been superseded and is no longer appropriate. Please correct the report to rely on the in-force IRGs.
Questions:
(1) Can you help us understand why the superseded Incident Report Template (i.e., Version 2.0) was used instead of the current effective version?
(2) The incident report disclosed in Comment 0 includes: “The issuance was only made due to DCV reuse, the process of migrating existing TLS certificates from GTLSCA to HiPKI did not thoroughly check the CAA records.” Additionally, the CCADB describes “政府伺服器數位憑證管理中心 - G1” as owned by the Ministry of Digital Affairs.
Given the above statement, can you provide a meaningful overview to help us understand how DCV occurs for certificates issued by each (a) 政府伺服器數位憑證管理中心 - G1 and (b) HiPKI OV TLS CA - G1?
Specifically, we’re interested in understanding which entity (i.e., either CHT or MODA) is considered “The CA” in relation to the following requirement from Section 3.2.2.4 of the TLS BRs:
“The CA SHALL confirm that prior to issuance, the CA has validated each Fully-Qualified Domain Name (FQDN) listed in the Certificate as follows…”
We note this is the second time CHT’s DCV procedures have come into question.
(3) Who owns the systems that are responsible for performing DCV for the certificates issued by 政府伺服器數位憑證管理中心 - G1? (i.e., are they owned by MODA or CHT?)
(4) Who operates the systems that are responsible for performing DCV for the certificates issued by 政府伺服器數位憑證管理中心 - G1? (i.e., are they operated by MODA or is this responsibility delegated to CHT?)
(5) Which audit documents consider the systems and related processes described in Questions 3 and 4 “in-scope?”
(6) Please describe the number of issued certificates for which CHT did not perform CAA checking prior to issuance, as required by Section 3.2.2.8 of the TLS BRs. The answer should consider issuance beyond those certificates described in this report.
(7) How does HiPKI OV TLS CA - G1, and more broadly CHT, ordinarily check CAA prior to certificate issuance, as required by Section 3.2.2.8 of the TLS BRs?
(8) How does 政府伺服器數位憑證管理中心 - G1 ordinarily check CAA prior to certificate issuance, as required by Section 3.2.2.8 of the TLS BRs?
(9) Please explain the circumstances that resulted in the certificates disclosed in this incident report from having bypassed CAA checking. (e.g., was CAA checking intentionally disabled while supporting the customer migration, or is CAA checking not part of the standard routine HiPKI OV TLS CA - G1 certificate issuance process?) The Root Cause Analysis provided states HiPKI did not thoroughly check the CAA records, but we’re trying to understand why.
(10) The incident report disclosed in Comment 0 includes “In addition, the process of migrating did not follow the complete verification process required by HiPKI OV TLS CA”. Can you help us understand what this means? It’s not clear what part of the verification process was not followed.
| Assignee | ||
Comment 3•11 months ago
Full Incident Report
Summary
- CA Owner CCADB unique ID: A000015
- Incident description: Due to the phase-out of the old root certificate, eCA-G1, CHT recently initiated a plan to migrate existing TLS certificates in GTLSCA to the Single-purpose PKI (HiPKI). Since both GTLSCA and HiPKI OV TLS CA are operated by CHT, we accepted the existing evidence of domain control for these valid GTLSCA certificates by reusing validation data or documents, and proceeded directly with certificate reissuance through HiPKI OV TLS CA.
On March 1, 2025, CHT received a notification from the Chrome Root Program that it had noted an atypical surge in TLS certificate issuance (for a specific domain) by CHT and had observed new certificates issued for several domains whose CAA records appear to conflict with the permitted issuer values described in the CPS of HiPKI OV TLS CA. After holding a preliminary incident discussion meeting with the CA response team, we decided to promptly revoke all certificates issued during the migration period. In addition, we revoked previously issued certificates that raised similar concerns regarding the domain control validation method.
- Timeline summary:
  - Non-compliance start date: 2024-09-10
  - Non-compliance identified date: 2025-03-01
  - Non-compliance end date: 2025-03-03
- Relevant policies: Section 4.2.1 of our CPS (Version 1.0), Section 3.2.2.8 CAA Records of TLS BR (Version 2.1.3)
This is a violation of our CPS, which states: “If a CAA record exists that does not list “pki.hinet.net” or “tls.hinet.net” as a CAA Issuer Domain Name, HiPKI OV TLS CA will not issue the certificate.”
- Source of incident disclosure: Self Reported
Impact
- Total number of certificates: 11,860 (revocations)
- Total number of "remaining valid" certificates: 0
- Affected certificate types: This incident affects OV certificates (with OID 2.23.140.1.2.2).
- Incident heuristic: 11,860; the full corpus of affected certificates is disclosed in the Appendix.
- Was issuance stopped in response to this incident, and why or why not?: Yes. Issuance was stopped after learning of this issue, and all the affected certificates were revoked immediately.
- Analysis:
- Additional considerations:
Timeline
All times are UTC+8.
2024-09-10:
- 11:28 Conducted the first small batch issuance with a quantity of 121 certificates.
2025-02-21:
- 11:39 Conducted a mass issuance operation. As of 2025-02-26, a total of 11,739 certificates have been reissued.
2025-03-01:
- 05:20 CHT received a notification from the Chrome Root Program indicating an unusual surge in certificate issuance and potential issues with the checking of CAA records set.
- 11:27 CHT holds a preliminary incident discussion meeting.
- 12:28 Convene the CA team members and computer room personnel.
- 12:55 Develop a large-scale revocation process and SOP, and inventory the list of affected certificates that are prepared for revocation.
- 13:04 Begin the process of modifying the program for large-scale revocation and adjusting parameters.
- 15:05 Program delivery for test environment verification testing.
- 15:29 Deployment in the test environment failed, proceeding with the second round of program modifications and parameter adjustments.
- 17:30 Completed verification testing in the test environment.
- 17:35 Program delivered for deployment to the production environment.
- 17:47 Begin revoking the first problematic certificate.
- 18:27 Begin large-scale revocation process; by 20:25, 11,855 certificates have been revoked.
- 21:19 Revoke one more, bringing the total to 11,856 certificates revoked.
2025-03-03:
- 13:30 After a detailed investigation and inventory, an additional 4 affected certificates were identified.
- 14:34 Revoke four more problematic certificates, bringing the total to 11,860 certificates revoked.
Root Cause Analysis
Contributing Factor 1: insufficient checks
- Description: The issuance relied solely on DCV reuse; the process of migrating existing TLS certificates from GTLSCA to HiPKI did not thoroughly check the CAA records. For example, a domain serving a CAA record with the issue tag "letsencrypt.org" or "GTLSCA-G1" prohibits HiPKI OV TLS CA from issuing. In addition, the migration process did not follow the complete verification process required by HiPKI OV TLS CA. As a result, the evidence of domain control may be unreliable.
- Timeline:
  - 2024-09-10 11:28: Conducted the first small batch issuance with a quantity of 121 certificates.
  - 2025-02-21 11:39: Conducted a mass issuance operation. As of 2025-02-26, a total of 11,739 certificates had been reissued.
- Detection: CHT received a notification from the Chrome Root Program indicating potential issues with the checking of CAA records set.
- Interaction with other factors: We were inexperienced with large-scale migration methods; we should seek more advice from other CAs or experts before proceeding with reissuance operations.
Lessons Learned
- What went well: The large-scale revocation process and SOP had been rehearsed beforehand, and we are glad it proved useful this time, allowing the affected certificates to be revoked successfully within a short period.
- What didn’t go well: During the migration process, more attention should have been paid to checks such as CAA records and to ensuring BR compliance.
- Where we got lucky: Only 24 certificates were sent to subscribers for installation testing.
- Additional: We should seek more advice from other CAs or experts before proceeding with reissuance operations.
Action Items
| Action Item | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Personnel retraining as to BR compliance | Prevent | Root Cause # 1 | Insufficient checks is the main factor, and we will conduct retraining actions | 2025-03-10 | Ongoing |
| Seeking other large-scale migration methods for lesson learned | Prevent | What didn’t go well | Inexperience with large-scale migration methods, we should seek more advice from other CAs or experts before proceeding reissue operations. | 2025-04-15 | Ongoing |
Appendix
See Attached.
| Assignee | ||
Comment 4•11 months ago
(In reply to chrome-root-program from comment #2)
Thank you for filing this public incident report. We have one request and several questions, detailed below.
Request:
The updated CCADB Incident Reporting Guidelines (IRGs) went into effect on March 1, 2025 (announced here and here). The format used for this report has been superseded and is no longer appropriate. Please correct the report to rely on the in-force IRGs.
The report has been corrected in the post above.
Questions:
(1) Can you help us understand why the superseded Incident Report Template (i.e., Version 2.0) was used instead of the current effective version?
On the morning of March 1, 2025, while preparing to report this incident, I accessed the CCADB website to retrieve the full incident report template and noticed the announcement of a new version taking effect. Possibly due to time zone differences or a cached webpage, I still saw the old template, so I used it directly. In any case, the first post of this report used the previous version 2.0 format rather than the latest version 3.0; that was our oversight. This mistake has been corrected in our second post, which uses the correct template.
(2) The incident report disclosed in Comment 0 includes: “The issuance was only made due to DCV reuse, the process of migrating existing TLS certificates from GTLSCA to HiPKI did not thoroughly check the CAA records.” Additionally, the CCADB describes “政府伺服器數位憑證管理中心 - G1” as owned by the Ministry of Digital Affairs.
Given the above statement, can you provide a meaningful overview to help us understand how DCV occurs for certificates issued by each (a) 政府伺服器數位憑證管理中心 - G1 and (b) HiPKI OV TLS CA - G1?
Specifically, we’re interested in understanding which entity (i.e., either CHT or MODA) is considered “The CA” in relation to the following requirement from Section 3.2.2.4 of the TLS BRs:
“The CA SHALL confirm that prior to issuance, the CA has validated each Fully-Qualified Domain Name (FQDN) listed in the Certificate as follows…”
We note this is the second time CHT’s DCV procedures have come into question.
Regardless of whether it is 政府伺服器數位憑證管理中心 - G1 or HiPKI OV TLS CA - G1, domain control validation (DCV) will be conducted according to section 3.2.2.4 of the TLS Baseline Requirements (BR) prior to issuance. MODA outsources GTLSCA service to CHT via contract, and CHT takes full responsibility for GTLSCA’s operation and management.
GTLSCA subscribers whose FQDN ends with "gov.tw" are required to apply for and configure DNS through the GSN network service, which is built and maintained by CHT under the government's commission. Domain validation is therefore completed using the method in BR 3.2.2.4.12. For other government agency FQDNs, such as ".taipei", whose domain registrars are not GSN, GTLSCA performs domain control validation under 3.2.2.4.18. Finally, these FQDNs are checked for CAA records before issuance.
As mentioned in comment 0, CHT recently initiated a plan to migrate existing TLS certificates in GTLSCA to the Single-purpose PKI (HiPKI). Since both GTLSCA and HiPKI OV TLS CA are operated by CHT, we reused DCV and proceeded directly with certificate reissuance through HiPKI OV TLS CA using a batch program to shorten the processing time (the CA did perform linting and other checks), but failed to check the CAA records, which is a fundamental function of the RA. These practices are contrary to the statements in our CPS.
(3) Who owns the systems that are responsible for performing DCV for the certificates issued by 政府伺服器數位憑證管理中心 - G1? (i.e., are they owned by MODA or CHT?)
MODA outsources GTLSCA service to CHT via contract, and CHT takes full responsibility for GTLSCA’s operation and management.
(4) Who operates the systems that are responsible for performing DCV for the certificates issued by 政府伺服器數位憑證管理中心 - G1? (i.e., are they operated by MODA or is this responsibility delegated to CHT?)
They are operated by CHT.
(5) Which audit documents consider the systems and related processes described in Questions 3 and 4 “in-scope?”
This is covered in the GTLSCA 2024 Annual Audit Report, which states ".. Maintained and manage By ChunghwaTelecom...".
(6) Please describe the number of issued certificates for which CHT did not perform CAA checking prior to issuance, as required by Section 3.2.2.8 of the TLS BRs. The answer should consider issuance beyond those certificates described in this report.
We confirm that a total of 11,860 certificates were issued during this migration process through a batch issuance program without appropriate CAA checking being performed prior to issuance. In the standard process, subscribers submit applications and undergo domain control validation and CAA checking, as described in our CPS.
(7) How does HiPKI OV TLS CA - G1, and more broadly CHT, ordinarily check CAA prior to certificate issuance, as required by Section 3.2.2.8 of the TLS BRs?
As mentioned here, HiPKI OV TLS CA - G1 supports automatic CAA checking prior to issuance using the dig command.
(8) How does 政府伺服器數位憑證管理中心 - G1 ordinarily check CAA prior to certificate issuance, as required by Section 3.2.2.8 of the TLS BRs?
政府伺服器數位憑證管理中心 - G1 uses a regularly updated verification system (with data retrieval conducted through the dig command) to check CAA records before certificate issuance, ensuring compliance with the requirements of TLS BR Section 3.2.2.8.
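To make the dig-based retrieval above concrete, here is a minimal sketch of turning `dig +short CAA <name>` output into structured records. The sample output, the `example.com` addresses, and the `parse_dig_caa` name are illustrative assumptions; this is not the verification system used by either CA, and a production check would also need to handle tree climbing, lookup failures, and DNSSEC.

```python
import shlex


def parse_dig_caa(output: str):
    """Parse lines of `dig +short CAA <name>` output into (flags, tag, value).

    Each CAA answer line is expected to look like: 0 issue "letsencrypt.org"
    Lines that do not match that shape (CNAME targets, error text) are skipped.
    """
    records = []
    for line in output.splitlines():
        try:
            parts = shlex.split(line)  # strips the quotes around the value
        except ValueError:
            continue  # unbalanced quotes, not a CAA answer line
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        flags, tag, value = parts
        records.append((int(flags), tag, value))
    return records


# Hypothetical output for illustration only.
sample = '0 issue "letsencrypt.org"\n0 iodef "mailto:admin@example.com"\n'
print(parse_dig_caa(sample))
```

The `(tag, value)` part of each tuple is what an issuance decision would then compare against the CA's own issuer domain names.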
(9) Please explain the circumstances that resulted in the certificates disclosed in this incident report from having bypassed CAA checking. (e.g., was CAA checking intentionally disabled while supporting the customer migration, or is CAA checking not part of the standard routine HiPKI OV TLS CA - G1 certificate issuance process?) The Root Cause Analysis provided states HiPKI did not thoroughly check the CAA records, but we’re trying to understand why.
CAA checking is part of our RA's standard routine process prior to sending the request to the CA system for certificate issuance. The root cause of the bypassed CAA checking is that we used a batch program for this large-scale migration.
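One way to picture the difference between the standard path and a batch path is a single issuance entry point that every caller, interactive or batch, must go through. The sketch below is purely illustrative: `issue_certificate`, `batch_reissue`, `caa_check`, and `sign` are hypothetical names, and nothing here describes how CHT's RA or CA systems are actually built.

```python
class CAACheckFailed(Exception):
    """Raised when the CAA check refuses issuance for a name."""


def issue_certificate(fqdn, caa_check, sign):
    """Single issuance entry point: sign() is reached only after caa_check passes."""
    if not caa_check(fqdn):
        raise CAACheckFailed(fqdn)
    return sign(fqdn)


def batch_reissue(fqdns, caa_check, sign):
    """Batch path that reuses issue_certificate instead of calling sign directly."""
    issued, refused = [], []
    for fqdn in fqdns:
        try:
            issued.append(issue_certificate(fqdn, caa_check, sign))
        except CAACheckFailed:
            refused.append(fqdn)
    return issued, refused


# Stub check and signer for illustration only.
issued, refused = batch_reissue(
    ["ok.example", "blocked.example"],
    caa_check=lambda name: name != "blocked.example",
    sign=lambda name: f"cert:{name}",
)
print(issued, refused)
```

In this arrangement a batch run can be faster than the interactive flow, but it cannot reach the signing step for a name whose CAA check fails.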
(10) The incident report disclosed in Comment 0 includes “In addition, the process of migrating did not follow the complete verification process required by HiPKI OV TLS CA”. Can you help us understand what this means? It’s not clear what part of the verification process was not followed.
The same as the answer to question (9).
Comment 5•11 months ago
(In response to Comment 3.)
(1) Thank you for updating the report to rely on the improved incident reporting format! A few corrections and clarifications are requested, described below.
- (A) In the “Summary” section, “Source of incident disclosure” should be “Third Party Reported.” CHT only became aware of this issue due to the third party report submitted by the Chrome Root Program.
- (B) The “Timeline” should be improved to describe when the “batch process” was first defined, implemented, and how this was evaluated for compliance against the TLS BRs and root program expectations before being adopted on 2024-09-10. Other expected timeline items do not appear present as described necessary on CCADB.org. For example, the timeline should also describe when CHT realized it violated ecosystem requirements, when the updated CRL that made the revocation status change available to relying parties was published, and when issuance was resumed.
- (C) The “Related Incidents” section is missing.
- (D) The “Root Cause Analysis” section is lacking detail and does not allow us to understand why this incident took place. It describes one contributing factor (i.e., “insufficient checks”), but the circumstances that allowed that contributing factor to exist and avoid detection are unclear. Were there no other contributing factors?
- (E) The “Action Items” section is lacking detail and does not sufficiently describe a plan that would reliably ensure similar incidents do not occur in the future. Personnel training is considered especially ineffective when compared to the adoption of strong technical controls. Additionally, the “Evaluation Criteria” provided is not consistent with the expectations on CCADB.org and does not describe how CHT will measure the effectiveness of the Action Item in addressing the Root Cause, or how the public can contribute to this process.
(2) The provided timeline describes certificates revoked on 2025-03-03 that were not considered in scope of the first round of revocations.
- (A) Given these revocations took place more than 24 hours following the incident’s disclosure (and the first round of revocations), should these revocations be considered “delayed?” If not, can you explain why?
- (B) Can you explain why these certificates were not considered in the first batch of revocations?
- (C) What improvements does CHT intend to make such that it can more effectively scope the impact of incidents?
- (D) How will improvements in response to (C) be captured as Action Items?
(3) We’re still having a hard time understanding the relationship between MODA and CHT in the context of a publicly-trusted CA. Can you clarify whether CHT should be considered a Delegated Third Party from the perspective of 政府伺服器數位憑證管理中心 - G1, and explain reasoning for that conclusion?
(4) In response to Question 3 in Comment 2, it was stated that MODA owns the systems that are responsible for performing DCV for the certificates issued by 政府伺服器數位憑證管理中心 - G1.
This means the TLS BR requirement described in Question 2 of Comment 2 (“The CA SHALL confirm that prior to issuance, the CA has validated each Fully-Qualified Domain Name (FQDN) listed in the Certificate as follows…”) can be interpreted as “MODA SHALL confirm that prior to issuance, it has validated each Fully-Qualified Domain Name (FQDN) listed in the Certificate as follows…”
We feel this distinction is important, as to us, it seems to mean MODA should be considered the organization that performed DCV for the domains in question, not CHT. Extending this further, it then seems inappropriate for CHT to have reused MODA’s validation data for its own use by HiPKI OV TLS CA - G1.
If we consider other CA Owners with a similar type of relationship as we understand exists between CHT and MODA, for example Amazon Trust Services (owner of “Amazon Root CA 1”) and DigiCert (owner of “Amazon RSA 2048 M01”) - it would seem equally inappropriate for DigiCert to reuse DCV collected in support of issuing a certificate to an Amazon customer for certificate issuance from a non-Amazon affiliated ICA (e.g., “DigiCert Global G3 TLS ECC SHA384 2020 CA1”). To be clear, this example is purely hypothetical and is not intended to question the practices of Amazon Trust Services or DigiCert.
Other opinions from the community are also welcome and appreciated on the understanding above and questions presented below.
- (A) Does CHT agree with this conclusion? If not, can you help us understand why?
- (B) Did CHT consider whether it was eligible to reuse the MODA DCV data before issuing the certificates from HiPKI OV TLS CA - G1 affected by this incident? If so, what was considered and what were the conclusions?
- (C) Have CHT-owned ICAs reused DCV data belonging to a different CA Owner (e.g., 政府伺服器數位憑證管理中心 - G1) in the past?
(5) Are the systems performing DCV for 政府伺服器數位憑證管理中心 - G1 the same or different than those relied upon for performing DCV for HiPKI OV TLS CA - G1?
(6) Do GTLSCA subscribers accept the same Subscriber Agreement as HiPKI OV TLS CA - G1 subscribers? We’re having a hard time finding an agreement that states applicability for the GTLSCA.
Given the GTLSCA CPS is hosted separately from HiPKI’s, we assume there is a separate agreement.
(7) For what reason(s) did CHT deviate from its standard issuance process for HiPKI OV TLS CA - G1 to issue the certificates affected by this incident?
(8) What compliance considerations were made prior to executing the “batch” issuance process, where CHT deviated from its standard issuance process?
(9) Has CHT relied upon non-standard issuance processes in the past? If so, when, and what were the circumstances for doing so?
(10) In response to Question 7 in Comment 2, we were referred to Bug 1532436. We see no reference to CAA in that bug.
To improve our understanding, can you help us with a few real-world examples? For example, could you share the procedures used for validating DCV and CAA for the following certificates issued by 政府伺服器數位憑證管理中心 - G1:
- (A) www.kcc.gov.tw - issued February 13, 2025
- (B) 117-56-73-149.rpb.gov.tw - issued August 26, 2024
- (C) domain.gsn.gov.tw - issued February 26, 2025
- (D) qec.kh.edu.tw - issued September 13, 2024
Providing relevant logs (e.g., dig output) and applicable code snippets, assuming they do not reveal sensitive information, to corroborate your response, would be helpful!
Thank you very much!
Comment 6•11 months ago
Hi, a couple of questions:
Where we got lucky
Only 24 certificates were sent to subscribers for installation testing.
Could you describe the process you use for sending out certificates to subscribers? For example, if this is a zip file, what content does the zipfile contain?
All times are UTC+8.
2024-09-10:
11:28 Conducted the first small batch issuance with a quantity of 121 certificates.
2025-02-21:
11:39 Conducted a mass issuance operation. As of 2025-02-26, a total of 11,739 certificates have been reissued.
(Nit: Please use minute-level granularity, as described here, if possible.)
There was a gap of a couple of months between these. Did any further discussions/meetings take place regarding this plan? Did anyone raise DCV reuse as a risk? Was CAA brought up?
Why was 2025-02-26 chosen as the date for executing the plan?
(Similar to question 5 from comment 5) Could you please describe, as much as you are able to, the physical separation (if any), of the physical hardware providing DCV for 政府伺服器數位憑證管理中心 - G1 in relation to the physical hardware providing DCV for HiPKI OV TLS CA - G1?
Comment 7•11 months ago
I would like to better understand whether CHT’s actions resulted from a misunderstanding of compliance obligations, or if they point to a broader structural issue within CHT’s CA operations and its relationship with MODA.
From my understanding of the requirements and based on CHT’s comments, CHT was responsible for performing domain validation, and that responsibility could not be delegated to nor performed by MODA; and CHT’s reuse of domain validation data from MODA was a significant mistake that led to non-compliance.
To fully assess the compliance implications, I believe it would be helpful to review the relevant portions of the CHT-MODA agreement to verify how responsibilities were assigned. Can CHT provide a copy of the relevant sections of this agreement? Even if it is in Chinese, it can be translated for review.
Thank you.
| Assignee | ||
Comment 8•11 months ago
We are reposting the full incident report below, with the date of first issuance revised. Responses to the other questions will be provided in our next reply.
Full Incident Report
Summary
- CA Owner CCADB unique ID: A000015
- Incident description: Due to the phase-out of the old root certificate, eCA-G1, CHT recently initiated a plan to migrate existing TLS certificates in GTLSCA to the Single-purpose PKI (HiPKI). Since both GTLSCA and HiPKI OV TLS CA are operated by CHT, we accepted the existing evidence of domain control for these valid GTLSCA certificates by reusing validation data or documents, and proceeded directly with certificate reissuance through HiPKI OV TLS CA.
On March 1, 2025, CHT received a notification from the Chrome Root Program that it had noted an atypical surge in TLS certificate issuance (for a specific domain) by CHT and had observed new certificates issued for several domains whose CAA records appear to conflict with the permitted issuer values described in the CPS of HiPKI OV TLS CA. After holding a preliminary incident discussion meeting with the CA response team, we decided to promptly revoke all certificates issued during the migration period. In addition, we revoked previously issued certificates that raised similar concerns regarding the domain control validation method.
- Timeline summary:
  - Non-compliance start date: 2024-09-04
  - Non-compliance identified date: 2025-03-01
  - Non-compliance end date: 2025-03-03
- Relevant policies: Section 4.2.1 of our CPS (Version 1.0), Section 3.2.2.8 CAA Records of TLS BR (Version 2.1.3)
This is a violation of our CPS, which states: “If a CAA record exists that does not list “pki.hinet.net” or “tls.hinet.net” as a CAA Issuer Domain Name, HiPKI OV TLS CA will not issue the certificate.”
- Source of incident disclosure: Third Party Reported
Impact
- Total number of certificates: 11,860 (revocations)
- Total number of "remaining valid" certificates: 0
- Affected certificate types: This incident affects OV certificates (with OID 2.23.140.1.2.2).
- Incident heuristic: 11,860; the full corpus of affected certificates is disclosed in the Appendix.
- Was issuance stopped in response to this incident, and why or why not?: Yes. Issuance was stopped after learning of this issue, and all the affected certificates were revoked immediately.
- Analysis:
- Additional considerations:
Timeline
All times are UTC+8.
2024-08-29:
- To facilitate the rapid migration of existing TLS certificates from GTLSCA to HiPKI, we defined and developed the “batch process” to conduct reissuance by relying on DCV reuse.
2024-09-03:
- 19:00 The “batch process” was first deployed.
2024-09-04:
- 09:00 Ran the initial validation test of the “batch process” in the production environment, issuing 1 certificate, which is included in the attachment.
2024-09-10:
- 11:28 Conducted the small batch issuance with a quantity of 119 certificates.
2024-10-18:
- 14:28 Use the “batch process” to issue an extra certificate which is in original attachment as well.
2025-02-21:
- 11:39 Conducted a mass issuance operation. As of 2025-02-26, a total of 11,739 certificates have been reissued.
2025-03-01:
- 05:20 CHT received a notification from the Chrome Root Program indicating an unusual surge in certificate issuance and potential issues with the checking of CAA record sets.
- 11:27 CHT held a preliminary incident discussion meeting.
- 12:28 Convened the CA team members and computer room personnel.
- 12:55 Developed a large-scale revocation process and SOP, and inventoried the affected certificates to be revoked.
- 13:04 Began modifying the program for large-scale revocation and adjusting parameters.
- 15:05 Delivered the program to the test environment for verification testing.
- 15:29 Deployment in the test environment failed; proceeded with a second round of program modifications and parameter adjustments.
- 17:30 Completed verification testing in the test environment.
- 17:35 Delivered the program for deployment to the production environment.
- 17:47 Began revoking the first problematic certificate.
- 18:27 Began the large-scale revocation process; by 20:25, 11,855 certificates had been revoked.
- 21:19 Revoked one more, bringing the total to 11,856 certificates revoked.
2025-03-03:
- 13:30 After a detailed investigation and inventory, an additional 4 affected certificates were identified.
- 14:34 Revoked four more problematic certificates, bringing the total to 11,860 certificates revoked.
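The issuance and revocation counts in the timeline above can be cross-checked with a few lines of arithmetic. This is only a reconciliation sketch; the dictionary keys are labels chosen here for readability, and the numbers are copied from the timeline entries.

```python
# Certificates issued by the "batch process", per the timeline entries above.
issued = {
    "2024-09-04": 1,
    "2024-09-10": 119,
    "2024-10-18": 1,
    "2025-02-21 to 2025-02-26": 11_739,
}

# Certificates revoked, per the timeline entries above.
revoked = {
    "by 2025-03-01 20:25": 11_855,
    "2025-03-01 21:19": 1,
    "2025-03-03 14:34": 4,
}

total_issued = sum(issued.values())
total_revoked = sum(revoked.values())

# Any non-zero difference would mean the two halves of the timeline disagree.
print(total_issued, total_revoked, total_issued - total_revoked)
```

Both totals come to 11,860, which matches the figure reported in the Impact section.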
Related Incidents
| Bug | Date | Description |
|---|---|---|
| 1904038 | 2024-06-21 | "Test Website - Valid" URL disclosed to CCADB is expired. |
| 1947034 | 2025-02-09 | Outdated and stale policy documents disclosed to the CCADB. |
| 1917224 | 2025-02-09 | Delay in submitting annual audit report. |
Root Cause Analysis
Contributing Factor 1: insufficient checks
- Description: The certificates were issued solely on the basis of DCV reuse, and the process of migrating existing TLS certificates from GTLSCA to HiPKI did not thoroughly check the CAA records. For example, a domain may serve a CAA record with the issue tag value "letsencrypt.org" or "GTLSCA-G1", which prohibits HiPKI OV TLS CA from issuing. In addition, the migration process did not follow the complete verification process required by HiPKI OV TLS CA. As a result, the evidence of domain control may be unreliable.
- Timeline:
2024-09-04 09:00 Conducted the first small batch issuance with 1 certificate.
2024-09-10 11:28 Conducted a small batch issuance of 119 certificates.
2024-10-18 14:28 Used the “batch process” to issue an extra certificate.
2025-02-21 11:39 Conducted a mass issuance operation. By 2025-02-26, a total of 11,739 certificates had been reissued.
- Detection: CHT received a notification from the Chrome Root Program indicating potential issues with the checking of CAA record sets.
- Interaction with other factors: We were inexperienced with large-scale migration methods and should have sought more advice from other CAs or experts before proceeding with reissue operations.
Lessons Learned
- What went well: The large-scale revocation process and SOP had been rehearsed beforehand, and we are glad it proved useful this time, allowing the affected certificates to be revoked successfully within a short period.
- What didn’t go well: More attention should have been paid during the migration process, for example to checking CAA records and ensuring BR compliance.
- Where we got lucky: Only 24 certificates were sent to subscribers for installation testing.
- Additional: We should seek more advice from other CAs or experts before proceeding with reissue operations.
Action Items
| Action Item | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Personnel retraining on BR compliance | Prevent | Root Cause # 1 | Insufficient checks were the main factor, and we conducted retraining. | 2025-03-10 | Completed |
| Seek lessons learned from other large-scale migration methods | Prevent | What didn’t go well | We were inexperienced with large-scale migration methods and should seek more advice from other CAs or experts before proceeding with reissue operations. | 2025-04-15 | Ongoing |
Appendix
See Attached.
Comment 9 (Assignee) • 11 months ago
(Update) We are reposting the full incident report below, with the date of first issuance revised. Responses to the other questions will be provided in the next reply.
Full Incident Report
Summary
- CA Owner CCADB unique ID: A000015
- Incident description: Due to the phase-out of the old root certificate, eCA-G1, CHT recently initiated a plan to migrate existing TLS certificates in GTLSCA to the Single-purpose PKI (HiPKI). Since both GTLSCA and HiPKI OV TLS CA are operated by CHT, we accepted the existing evidence of domain control for these valid GTLSCA certificates by reusing validation data or documents, and proceeded directly with certificate reissuance through HiPKI OV TLS CA.
On March 1, 2025, CHT received a notification from the Chrome Root Program stating that it had noted an atypical surge in TLS certificate issuance (for a specific domain) by CHT and had observed new certificates issued for several domains whose CAA records appear to conflict with the permitted CAA Issuer Domain Names described in the CPS of HiPKI OV TLS CA. After holding a preliminary incident discussion meeting with the CA response team, we decided to promptly revoke all certificates issued during the migration period. In addition, we revoked previously issued certificates that raised similar concerns regarding the domain control validation method.
- Timeline summary:
- Non-compliance start date: 2024-09-04
- Non-compliance identified date: 2025-03-01
- Non-compliance end date: 2025-03-03
- Relevant policies: Section 4.2.1 of our CPS (Version 1.0) and Section 3.2.2.8 (CAA Records) of the TLS BRs (Version 2.1.3).
This is a violation of our CPS, which states: “If a CAA record exists that does not list “pki.hinet.net” or “tls.hinet.net” as a CAA Issuer Domain Name, HiPKI OV TLS CA will not issue the certificate.”
- Source of incident disclosure: Third Party Reported
Impact
- Total number of certificates: 11,860 (all revoked)
- Total number of "remaining valid" certificates: 0
- Affected certificate types: This incident affects OV certificates (with OID 2.23.140.1.2.2).
- Incident heuristic: 11,860 certificates; the full corpus of affected certificates is disclosed in the Appendix.
- Was issuance stopped in response to this incident, and why or why not?: Yes. Issuance was stopped after we learned of this issue, and all the affected certificates were revoked immediately.
- Analysis:
- Additional considerations:
Timeline
All times are UTC+8.
2024-08-29:
- To facilitate the rapid migration of existing TLS certificates from GTLSCA to HiPKI, we defined and developed the “batch process” to conduct reissuance by relying on DCV reuse.
2024-09-03:
- 19:00 The “batch process” was first deployed.
2024-09-04:
- 09:00 Ran the initial validation test in the production environment with the “batch process”, issuing 1 certificate, which is included in the attachment.
2024-09-10:
- 11:28 Conducted a small batch issuance of 119 certificates.
2024-10-18:
- 14:28 Used the “batch process” to issue an extra certificate, which is also included in the original attachment.
2025-02-21:
- 11:39 Conducted a mass issuance operation. By 2025-02-26, a total of 11,739 certificates had been reissued.
2025-03-01:
- 05:20 CHT received a notification from the Chrome Root Program indicating an unusual surge in certificate issuance and potential issues with the checking of CAA record sets.
- 11:10 CHT became aware of the incident.
- 11:27 CHT held a preliminary incident discussion meeting; at this point CHT realized it had violated ecosystem requirements.
- 12:28 Convened the CA team members and computer room personnel.
- 12:55 Developed a large-scale revocation process and SOP, and inventoried the affected certificates to be revoked.
- 13:04 Began modifying the program for large-scale revocation and adjusting parameters.
- 15:05 Delivered the program to the test environment for verification testing.
- 15:29 Deployment in the test environment failed; proceeded with a second round of program modifications and parameter adjustments.
- 17:30 Completed verification testing in the test environment.
- 17:35 Delivered the program for deployment to the production environment.
- 17:47 Began revoking the first problematic certificate.
- 18:27 Began the large-scale revocation process; by 20:25, 11,855 certificates had been revoked.
- 20:26 Checked the logs and audit records to confirm that the total number of entries was accurate, and performed random checks to confirm that the OCSP/CRL status had been updated to "Revoked (superseded)".
- 21:19 Revoked one more, bringing the total to 11,856 certificates revoked.
- 21:52 Checked and confirmed that the OCSP status of this revoked certificate had been updated.
2025-03-03:
- 13:30 After a detailed investigation and inventory, an additional 4 affected certificates were identified.
- 14:34 Revoked four more problematic certificates, bringing the total to 11,860 certificates revoked. CHT completed revocation of the affected certificates.
- 14:59 Checked the OCSP status of these four newly revoked certificates and confirmed that the CRL status had been updated.
- 15:10 The incident ended.
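The elapsed times implied by the timeline above can be computed directly. The sketch below compares them with the 24-hour and 5-day revocation windows that appear in Section 4.9.1.1 of the TLS BRs. It does not decide which window applies to this incident or when the clock starts; both the notification time and the awareness time are shown as possible starting points, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # the report gives all times in UTC+8

# Timestamps copied from the timeline above.
notified = datetime(2025, 3, 1, 5, 20, tzinfo=UTC8)           # Chrome Root Program notification
aware = datetime(2025, 3, 1, 11, 10, tzinfo=UTC8)             # CHT became aware of the incident
first_round_end = datetime(2025, 3, 1, 21, 19, tzinfo=UTC8)   # 11,856 certificates revoked
final_revocation = datetime(2025, 3, 3, 14, 34, tzinfo=UTC8)  # 11,860 certificates revoked

WINDOWS = {"24 hours": timedelta(hours=24), "5 days": timedelta(days=5)}


def elapsed_report(start: datetime, end: datetime) -> dict:
    """Return the elapsed time and whether it fits inside each window."""
    elapsed = end - start
    report = {"elapsed": elapsed}
    for name, limit in WINDOWS.items():
        report[name] = elapsed <= limit
    return report


for label, start in (("notification", notified), ("awareness", aware)):
    for what, end in (("first round", first_round_end),
                      ("final revocation", final_revocation)):
        print(label, "->", what, elapsed_report(start, end))
```

Measured from the 05:20 notification, the final revocation at 14:34 on 2025-03-03 falls 57 hours and 14 minutes later, which is inside a 5-day window but outside a 24-hour one.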
Related Incidents
| Bug | Date | Description |
|---|---|---|
| 1904038 | 2024-06-21 | "Test Website - Valid" URL disclosed to CCADB is expired. |
| 1947034 | 2025-02-09 | Outdated and stale policy documents disclosed to the CCADB. |
| 1917224 | 2025-02-09 | Delay in submitting annual audit report. |
Root Cause Analysis
Contributing Factor 1: insufficient checks
- Description: The certificates were issued solely on the basis of DCV reuse, and the process of migrating existing TLS certificates from GTLSCA to HiPKI did not thoroughly check the CAA records. For example, a domain may serve a CAA record with the issue tag value "letsencrypt.org" or "GTLSCA-G1", which prohibits HiPKI OV TLS CA from issuing. In addition, the migration process did not follow the complete verification process required by HiPKI OV TLS CA. As a result, the evidence of domain control may be unreliable.
- Timeline:
2024-09-04 09:00 Conducted the first small batch issuance with 1 certificate.
2024-09-10 11:28 Conducted a small batch issuance of 119 certificates.
2024-10-18 14:28 Used the “batch process” to issue an extra certificate.
2025-02-21 11:39 Conducted a mass issuance operation. By 2025-02-26, a total of 11,739 certificates had been reissued.
- Detection: CHT received a notification from the Chrome Root Program indicating potential issues with the checking of CAA record sets.
- Interaction with other factors: We were inexperienced with large-scale migration methods and should have sought more advice from other CAs or experts before proceeding with reissue operations.
Contributing Factor 2: lack of understanding of compliance
- Description: During the preliminary incident discussion meeting on the morning of March 1st, we also convened the development team to review the verification process. The development team pointed out that the “batch process” had indeed failed to implement the CAA checking mechanism. At this point CHT realized it had violated ecosystem requirements.
- Timeline:
2024-08-29 The “batch process” was first defined and the requirements were handed over to the development team for implementation.
2024-09-03 19:00 The “batch process” was first deployed.
2025-03-01 11:27 CHT held a preliminary incident discussion meeting and became aware that it had failed to check CAA records during the migration.
- Detection: During the discussion meeting on the morning of March 1st, CHT realized it had violated ecosystem requirements.
- Interaction with other factors: The development team lacked an understanding of compliance with the TLS BRs and root program expectations. In addition, the management team should have conducted a more in-depth review of the verification process of the "batch process".
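The failure mode described above is a batch path that omitted a check the standard issuance path performs. One way to guard against that, sketched here under stated assumptions, is a gate that refuses to run a batch unless pre-issuance checks are supplied and that holds back any name failing a check. All names (`GateResult`, `gate_batch`, the check callables) are hypothetical and do not describe CHT's systems; an exception raised by a check simply propagates and stops the batch.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class GateResult:
    cleared: List[str] = field(default_factory=list)  # FQDNs that passed every check
    blocked: List[str] = field(default_factory=list)  # FQDNs held back from issuance


def gate_batch(fqdns: Iterable[str],
               checks: Iterable[Callable[[str], bool]]) -> GateResult:
    """Split a batch into cleared and blocked names using the supplied checks."""
    checks = list(checks)
    if not checks:
        # Fail closed: a batch with no checks configured must not proceed.
        raise ValueError("refusing to run a batch with no pre-issuance checks")
    result = GateResult()
    for fqdn in fqdns:
        if all(check(fqdn) for check in checks):
            result.cleared.append(fqdn)
        else:
            result.blocked.append(fqdn)
    return result


if __name__ == "__main__":
    # Stand-in check for illustration only; a real check would query DNS.
    def stub_caa_check(fqdn: str) -> bool:
        return not fqdn.startswith("restricted.")

    outcome = gate_batch(["a.example", "restricted.example"], [stub_caa_check])
    print(outcome.cleared, outcome.blocked)
```

Passing the checks in as callables keeps the gate independent of how DCV or CAA lookups are implemented, so the same gate could wrap both a standard and a batch path.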
Lessons Learned
- What went well: The large-scale revocation process and SOP had been rehearsed beforehand, and we are glad it proved useful this time, allowing the affected certificates to be revoked successfully within a short period. As a good practice, the management team should reconfirm compliance during the program development phase and identify any issues early, before going live.
- What didn’t go well: More attention should have been paid during the migration process, for example to checking CAA records and ensuring BR compliance. In particular, our development team lacked an understanding of compliance with the TLS BRs and root program expectations.
- Where we got lucky: Only 24 certificates were sent to subscribers for installation testing. We have also asked our development team to strengthen the program delivery process to require confirmation from quality assurance personnel familiar with the BRs before going live.
- Additional: Batch issuance may not be feasible, so we will continue to explore alternative solutions that can improve efficiency and ensure system compliance. We will seek more advice from other CAs, experts, or browsers before proceeding with reissue operations.
Action Items
| Action Item | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Personnel retraining on BR compliance | Prevent | Root Cause # 1 | (1) Insufficient checks were the main factor, and we conducted retraining. (2) There should be no batch migration of certificates that were verified by a different CA. | 2025-03-10 | Completed |
| Strengthen the program delivery process | Prevent | Root Cause # 2 | Require confirmation by program quality assurance personnel familiar with the BRs before going live. | 2025-03-12 | Completed |
| Seek lessons learned from other large-scale migration methods | Prevent | What didn’t go well | We were inexperienced with large-scale migration methods and should seek more advice from other CAs, experts, or browsers before proceeding with reissue operations. | 2025-04-15 | Ongoing |
Appendix
See Attached.
Comment 10 (Assignee) • 11 months ago
GTLSCA Dig screenshots
Comment 11 (Assignee) • 11 months ago
OVCA Dig log screenshots
Comment 12 (Assignee) • 10 months ago
(2) The provided timeline describes certificates revoked on 2025-03-03 that were not considered in scope of the first round of revocations.
- (A) Given these revocations took place more than 24 hours following the incident’s disclosure (and the first round of revocations), should these revocations be considered “delayed?” If not, can you explain why?
I believe we have complied with the requirements of Section 4.9.1.1, item 12 of the BR. It did not take more than five days from receiving the notification to revoke all 11,860 affected certificates.
Section 4.9.1.1, item 12 of the BR states:
"The CA is made aware that the Certificate was not issued in accordance with these Requirements or the CA’s Certificate Policy or Certification Practice Statement (CRLReason #4, superseded);"
- (B) Can you explain why these certificates were not considered in the first batch of revocations?
We sent only 24 certificates to subscribers, and these were installed on websites for testing a few days before revocation. We received the notification on the morning of Saturday, March 1st, and at around 13:00 we informed the subscribers of these 24 certificates that the issued certificates would be revoked. However, 4 subscribers could not be reached during the holiday, so we contacted them and revoked their certificates at around 14:34 on the first working day, March 3rd.
- (C) What improvements does CHT intend to make such that it can more effectively scope the impact of incidents?
The verification process should be conducted in accordance with the CPS. After this incident, there should be no batch migration in which data verified by a different CA is used for issuance. Instead, each request should be resubmitted individually and complete the DCV procedure of the issuing CA, followed by the related checks, such as CAA checking, according to the standard verification procedures.
If a similar case occurs, we will initiate the bulk revocation SOP that we have established and rehearsed many times. First, we will notify subscribers that we will revoke the certificates within the specified time frame. Once subscribers respond, we will revoke the certificates immediately. If no response is received within the time limit, we will proceed with revocation within the time frame stipulated by the BRs to control the impact of the incident.
- (D) How will improvements in response to (C) be captured as Action Items?
Enhance internal management processes and adopt agile and systematic tracking to prevent the recurrence of failure to complete related fundamental tasks.
(3) We’re still having a hard time understanding the relationship between MODA and CHT in the context of a publicly-trusted CA. Can you clarify whether CHT should be considered a Delegated Third Party from the perspective of 政府伺服器數位憑證管理中心 - G1, and explain reasoning for that conclusion?
MODA is the policy authority of GTLSCA. MODA procures GTLSCA operation and certificate verification services from CHT for all governmental website certificates. The GTLSCA system is hosted in CHT’s data center and managed by CHT.
Additionally, GTLSCA is a Sub-CA under CHT's PKI hierarchy, with its operation and technical control fully managed by CHT. We therefore do not think the Delegated Third Party concept you mentioned fits the current situation.
(4) In response to Question 3 in Comment 2, it was stated that MODA owns the systems that are responsible for performing DCV for the certificates issued by 政府伺服器數位憑證管理中心 - G1.
This means the TLS BR requirement described in Question 2 of Comment 2 (“The CA SHALL confirm that prior to issuance, the CA has validated each Fully-Qualified Domain Name (FQDN) listed in the Certificate as follows…”) can be interpreted as “MODA SHALL confirm that prior to issuance, the it has validated each Fully-Qualified Domain Name (FQDN) listed in the Certificate as follows…”
We feel this distinction is important, as to us, it seems to mean MODA should be considered the organization that performed DCV for the domains in question, not CHT. Extending this further, it then seems inappropriate for CHT to have reused MODA’s validation data for its own use by HiPKI OV TLS CA - G1.
If we consider other CA Owners with a similar type of relationship as we understand exists between CHT and MODA, for example Amazon Trust Services (owner of “Amazon Root CA 1”) and DigiCert (owner of “Amazon RSA 2048 M01”) - it would seem equally inappropriate for DigiCert to reuse DCV collected in support of issuing a certificate to an Amazon customer for certificate issuance from a non-Amazon affiliated ICA (e.g., “DigiCert Global G3 TLS ECC SHA384 2020 CA1”). To be clear, this example is purely hypothetical and is not intended to question the practices of Amazon Trust Services or DigiCert.
Other opinions from the community are also welcome and appreciated on the understanding above and questions presented below.
- (A) Does CHT agree with this conclusion? If not, can you help us understand why?
MODA only procures GTLSCA operation and certificate verification services from CHT, and the GTLSCA system (including the DCV system) is hosted in CHT's data center and managed by CHT. As mentioned, GTLSCA is a Sub-CA under CHT's PKI hierarchy, so CHT is responsible for performing domain validation for subscribers in compliance with the BR.
We believe this is different from the case of Amazon and DigiCert. MODA can be considered a principal and does not participate in any DCV verification cases.
- (B) Did CHT consider whether it was eligible to reuse the MODA DCV data before issuing the certificates from HiPKI OV TLS CA - G1 affected by this incident? If so, what was considered and what were the conclusions?
As answered in question (A) above, and if we understand the question correctly, we did not consider this to be the same case, because we regarded both CAs as being operated by CHT.
- (C) Have CHT-owned ICAs reused DCV data belonging to a different CA Owner (e.g., 政府伺服器數位憑證管理中心 - G1) in the past?
Never. This is the first time we have performed a large-scale migration between CHT-operated ICAs.
(5) Are the systems performing DCV for 政府伺服器數位憑證管理中心 - G1 the same or different than those relied upon for performing DCV for HiPKI OV TLS CA - G1?
They are different. GTLSCA and HiPKI OV TLS CA have their own CA systems, development teams, and management teams.
(6) Do GTLSCA subscribers accept the same Subscriber Agreement as HiPKI OV TLS CA - G1 subscribers? We’re having a hard time finding an agreement that states applicability for the GTLSCA.
Given the GTLSCA CPS is hosted separately from HiPKI’s, we assume there is a separate agreement.
No, GTLSCA and HiPKI OV TLS CA - G1 have different Subscriber Agreements. We must acknowledge that our migration plan was half-baked and that we lacked experience.
(7) For what reason(s) did CHT deviate from its standard issuance process for HiPKI OV TLS CA - G1 to issue the certificates affected by this incident?
The primary reason was to facilitate a rapid migration, but the team's lack of experience and insufficient consideration led to this incident and the erroneous practice.
(8) What compliance considerations were made prior to executing the “batch” issuance process, where CHT deviated from its standard issuance process?
In this incident, we did not demonstrate the professionalism and assessment expected of a publicly-trusted CA.
The company’s senior management also held our team accountable for CHT’s deviation from its standard issuance process. The team was warned to adhere to compliance requirements and professional evaluations and not to make hasty decisions. For major decisions in the future, we will seek external expert opinions before proceeding.
(9) Has CHT relied upon non-standard issuance processes in the past? If so, when, and what were the circumstances for doing so?
No, this is the first time.
(10) In response to Question 7 in Comment 2, we were referred to Bug 1532436. We see no reference to CAA in that bug.
To improve our understanding, can you help us with a few real-world examples? For example, could you share the procedures used for validating DCV and CAA for the following certificates issued by 政府伺服器數位憑證管理中心 - G1:
(A) www.kcc.gov.tw - issued February 13, 2025
(B) 117-56-73-149.rpb.gov.tw - issued August 26, 2024
(C) domain.gsn.gov.tw - issued February 26, 2025
(D) qec.kh.edu.tw - issued September 13, 2024
Providing relevant logs (e.g., dig output) and applicable code snippets, assuming they do not reveal sensitive information, to corroborate your response, would be helpful!
Thank you very much!
For validating DCV:
- (A) www.kcc.gov.tw - issued February 13, 2025
- (B) 117-56-73-149.rpb.gov.tw - issued August 26, 2024
- (C) domain.gsn.gov.tw - issued February 26, 2025
GTLSCA performs DCV using the method in BR 3.2.2.4.12 by checking whether there is registration information about these FQDNs (ending with "gov.tw") in GNS network service.
- (D) qec.kh.edu.tw - issued September 13, 2024
GTLSCA performs DCV using the method in BR 3.2.2.4.12 by
(a) using "ping qec.kh.edu.tw" to obtain the IP address of this FQDN, e.g. 163.16.5.55; and then
(b) checking whether there is any registration information, such as the user unit, for this IP address (163.16.5.55) in the TANet Whois Database.
For CAA checking:
GTLSCA provides a verification system (using a dig command such as dig caa www.kcc.gov.tw) for RAOs to perform CAA checking, see here. However, GTLSCA does not retain verification trail logs; instead, the RAO manually takes a screenshot of the current verification result and stores it in a file as evidence.
HiPKI OV TLS CA, by contrast, does retain verification trail logs, see here.
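To illustrate the difference between a screenshot and a retained verification trail, the sketch below turns CAA answer lines into a structured, timestamped log entry. It assumes input in the short answer style that `dig +short` prints for CAA queries (flags, tag, quoted value); the function names and log fields are hypothetical, the sample line is illustrative rather than real output, and nothing here describes the actual GTLSCA or HiPKI systems.

```python
import json
import re
from datetime import datetime, timezone

# One CAA answer per line, e.g.: 0 issue "letsencrypt.org"
CAA_LINE = re.compile(r'^(\d+)\s+(\S+)\s+"(.*)"$')


def parse_dig_short_caa(output: str) -> list:
    """Parse short-form CAA answer lines into dictionaries; skip other lines."""
    records = []
    for line in output.splitlines():
        match = CAA_LINE.match(line.strip())
        if match:
            flags, tag, value = match.groups()
            records.append({"flags": int(flags), "tag": tag, "value": value})
    return records


def caa_log_entry(fqdn: str, dig_output: str, checked_at: datetime) -> str:
    """Build a JSON log line that keeps both the raw and the parsed result."""
    return json.dumps(
        {
            "fqdn": fqdn,
            "checked_at": checked_at.isoformat(),
            "raw": dig_output,
            "records": parse_dig_short_caa(dig_output),
        },
        sort_keys=True,
    )


if __name__ == "__main__":
    sample = '0 issue "letsencrypt.org"'  # illustrative, not captured output
    print(caa_log_entry("www.example.com", sample, datetime.now(timezone.utc)))
```

Keeping the raw text next to the parsed records lets a later reviewer re-check the parser's interpretation against what the resolver actually returned.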
Comment 13 (Assignee) • 10 months ago
(In reply to amir from comment #6)
Hi, a couple of questions:
Where we got lucky
Only 24 certificates were sent to subscribers for installation testing.
Could you describe the process you use for sending out certificates to subscribers? For example, if this is a zip file, what content does the zipfile contain?
We sent notification emails with the zipfile to the contacts of the 24 governmental agencies that participated in the installation testing. The zipfile contained the subscriber certificate together with the issuing CA and Root CA certificates of the trusted certificate chain.
All times are UTC+8.
2024-09-10:
11:28 Conducted the first small batch issuance with a quantity of 121 certificates.
2025-02-21:
11:39 Conducted a mass issuance operation. As of 2025-02-26, a total of 11,739 certificates have been reissued.
(Nit: Please use minute-level granularity, as described here, if possible.)
There was a couple of months of gap between these. Did any further discussions/meetings take place regarding this plan? Did anyone raise the DCV reuse being a risk? Was CAA brought up?
Why was 2025-02-26 chosen for the execution of the plan date?
Several discussion meetings were held, and the relevant dates are briefly described below.
Timeline
2024-05-09:
- CHT first requested the Root Program’s approval of an external subordinate CA cross-certificate issued to GTLSCA from HiPKI Root CA.
2024-06-11:
- CHT received the rejection notification from the Root Program.
2024-08-29, 2024-08-30, 2024-09-01:
- The CHT team held several meetings regarding this batch process (DCV reuse and technical feasibility were discussed, but CAA checking was overlooked) to prepare for the mass transfer of valid certificates issued by GTLSCA.
2024-09-03:
- 19:00 The “batch process” was first deployed.
2024-09-04:
- 09:00 Ran the initial validation test in the production environment with the “batch process”; 1 certificate was issued.
2024-09-10:
- 11:28 Conducted a small batch issuance of 119 certificates.
2024-10-18:
- 14:28 Used the “batch process” to issue an extra certificate.
[A total of 121 certificates were issued with the batch process but were not sent out to subscribers.]
2024-12-13:
- CHT again requested the Root Program’s approval of the external subordinate CA cross-certificate issued to GTLSCA from HiPKI Root CA.
2025-01-17:
- The Root Program rejected the cross-certificate request.
2025-02-12, 2025-02-18:
- The CHT team decided to expedite the migration of valid GTLSCA certificates by using the batch process again, which was carried out in February 2025.
[A total of 11,739 certificates were reissued with the batch process, and only 24 certificates were sent to subscribers for installation testing.]
(Similar to question 5 from comment 5) Could you please describe, as much as you are able to, the physical separation (if any), of the physical hardware providing DCV for 政府伺服器數位憑證管理中心 - G1 in relation to the physical hardware providing DCV for HiPKI OV TLS CA - G1?
GTLSCA and HiPKI OV TLS CA have their own CA systems, development teams, and management teams, as well as independent CA facilities and physical control mechanisms.
Comment 14 (Assignee) • 10 months ago
(In reply to Ben Wilson from comment #7)
I would like to better understand whether CHT’s actions resulted from a misunderstanding of compliance obligations, or if they point to a broader structural issue within CHT’s CA operations and its relationship with MODA.
From my understanding of the requirements and based on CHT’s comments, CHT was responsible for performing domain validation, and that responsibility could not be delegated to nor performed by MODA; and CHT’s reuse of domain validation data from MODA was a significant mistake that led to non-compliance.
To fully assess the compliance implications, I believe it would be helpful to review the relevant portions of the CHT-MODA agreement to verify how responsibilities were assigned. Can CHT provide a copy of the relevant sections of this agreement? Even if it is in Chinese, it can be translated for review.
Thank you.
Due to the confidentiality agreement between MODA and CHT, I am not able to provide you the related documents. However, I can provide you with the Request for Proposal document about this GTLSCA Project for your reference in private.
Comment 15 • 10 months ago
Hello CHT Team,
Thank you for providing answers to the questions we presented in Comment 5.
General observations:
- The “Related Incidents” section of the full incident report provided in Comment 9 is intended to present a discussion of other incidents sharing the same or similar root cause(s) and/or failure modes as the subject report. Other than being incident reports opened by CHT, it’s not clear to us how the cited reports are related to the matters described in this report. As cited on CCADB.org, the description column is supposed to explain that relationship.
- Many of the responses lack evidence or supporting details to help the reader understand the positions cited. For example, Comment 12 cites the CHT/MODA relationship is different than that of Amazon/DigiCert example, but lacks detail to help a reader understand why that’s the case.
Additional questions:
(1) The response to question 2A from Comment 5 states: “I believe we have complied with the requirements of Section 4.9.1.1, item 12 of the BR. It did not take more than five days from receiving the notification to revoke all 11,860 affected certificates.”
We do not agree that 4.9.1.1 Item 12 of the TLS BRs is the appropriate reference, but instead, 4.9.1.1 Item 5 (“The CA obtains evidence that the validation of domain authorization or control for any Fully-Qualified Domain Name or IP address in the Certificate should not be relied upon (CRLReason #4, superseded).” Item #5 obligates CAs to revoke affected certificates within 24 hours.
- (A) Do you agree with 4.9.1.1 Item 5 being the correct benchmark? If you do not agree, we’re open to better understanding your position, in which case please offer an explanation.
- (B) If you do agree, does that change your opinion on the response provided regarding possible revocation delay?
(2) It’s still unclear why CHT should not be considered a delegated third party concerning DCV for MODA’s issuance — or why CHT considered it appropriate for CHT to reuse MODA’s DCV data. Can you please help us understand this better?
(3) We’d like to make sure we fully understand what’s being presented in the screenshots uploaded to Comments 10 and 11.
- (A) Should “dig_domain.gsn.gov.tw.png” (from Comment 10) be interpreted as the CAA check log record captured prior to the issuance of this certificate? If not, can you explain how we should interpret this image?
- (B) If the answer to the immediately above question is “Yes” - we interpret “dig_domain.gsn.gov.tw.png” to show a CAA record was set (i.e., globalsign.com) which GTLSCA was not permitted to issue against. Is that CHT’s understanding? If not, can you please explain why?
- (C) If the answer to 3B, above, is “Yes” - does that mean the scope of this incident described by CHT in Comment 3 (i.e., that the incident was limited to the “batch” process as customers were migrated across CHT’s publicly-trusted hierarchies) is incorrect?
- (D) If the answer to 3B, above, is “Yes” - does that not mean this (and presumably others, like this one) should also be considered mis-issued?
Comment 16 • 10 months ago
The CCADB Incident Reporting Guidelines set an expectation that “CA Owners should respond promptly to comments and questions, and MUST respond within 7 days, even if only to acknowledge the request and provide a timeline for a full response.”
It has been over 7 days since we asked the questions presented in Comment 15. Can the CHT Team please share when we can expect answers to these questions?
Comment 17•10 months ago (Assignee)
We will reply today; we must first confirm some of the questions with our team.
Comment 18•10 months ago (Assignee)
(In reply to chrome-root-program from comment #15)
Hello CHT Team,
Thank you for providing answers to the questions we presented in Comment 5.
General observations:
The “Related Incidents” section of the full incident report provided in Comment 9 is intended to present a discussion of other incidents sharing the same or similar root cause(s) and/or failure modes as the subject report. Other than being incident reports opened by CHT, it’s not clear to us how the cited reports are related to the matters described in this report. As cited on CCADB.org, the description column is supposed to explain that relationship.
Many of the responses lack evidence or supporting details to help the reader understand the positions cited. For example, Comment 12 cites the CHT/MODA relationship is different than that of Amazon/DigiCert example, but lacks detail to help a reader understand why that’s the case.
Additional questions:
(1) The response to question 2A from Comment 5 states: “I believe we have complied with the requirements of Section 4.9.1.1, item 12 of the BR. It did not take more than five days from receiving the notification to revoke all 11,860 affected certificates.”
We do not agree that 4.9.1.1 Item 12 of the TLS BRs is the appropriate reference, but instead, 4.9.1.1 Item 5 (“The CA obtains evidence that the validation of domain authorization or control for any Fully-Qualified Domain Name or IP address in the Certificate should not be relied upon (CRLReason #4, superseded).” Item #5 obligates CAs to revoke affected certificates within 24 hours.
(A) Do you agree with 4.9.1.1 Item 5 being the correct benchmark? If you do not agree, we’re open to better understanding your position, in which case please offer an explanation.
(B) If you do agree, does that change your opinion on the response provided regarding possible revocation delay?
After the previous discussions, we agree with the perspective you proposed. Since TLS BR 4.9.1.1 Item 5 is an appropriate reference, there was indeed a delay in revocation by CHT in this incident.
We will treat this case as a lesson learned and watch for similar cases, so that future situations meet the requirements of the TLS BRs, especially Item 5 of Section 4.9.1.1, and are addressed promptly.
(2) It’s still unclear why CHT should not be considered a delegated third party concerning DCV for MODA’s issuance — or why CHT considered it appropriate for CHT to reuse MODA’s DCV data. Can you please help us understand this better?
To clarify: after the previous discussions, CHT does not deny that its reuse of GTLSCA/MODA DCV data may have been inappropriate.
In our opinion, if CHT were a delegated third party, CHT would not be the CA and would not be responsible for performing domain validation for subscribers; it would only be authorized by the CA to perform or fulfill one or more of the CA requirements set forth in the BRs. That is why we believe this situation differs from that of Amazon and DigiCert: in our case, MODA procures GTLSCA operations and certificate validation services from CHT, rather than CHT purchasing services or CA authorization from GTLSCA/MODA.
(3) We’d like to make sure we fully understand what’s being presented in the screenshots uploaded to Comments 10 and 11.
- (A) Should “dig_domain.gsn.gov.tw.png” (from Comment 10) be interpreted as the CAA check log record captured prior to the issuance of this certificate? If not, can you explain how we should interpret this image?
Yes, it is the CAA check log record captured prior to issuance. However, the GTLSCA RAO did not halt the issuance of this certificate, which resulted in its misissuance.
- (B) If the answer to the immediately above question is “Yes” - we interpret “dig_domain.gsn.gov.tw.png” to show a CAA record was set (i.e., globalsign.com) which GTLSCA was not permitted to issue against. Is that CHT’s understanding? If not, can you please explain why?
Yes, that is our understanding. The CAA record indicates that only GlobalSign is authorized to issue for the domain, and GTLSCA is not. GTLSCA should not have issued this certificate.
- (C) If the answer to 3B, above, is “Yes” - does that mean the scope of this incident described by CHT in Comment 3 (i.e., that the incident was limited to the “batch” process as customers were migrated across CHT’s publicly-trusted hierarchies) is incorrect?
Yes, we agree. We are conducting further reviews to determine whether more certificates are affected.
- (D) If the answer to 3B, above, is “Yes” - does that not mean this certificate (and presumably others like it) should also be considered mis-issued?
Yes, we confirm that these certificates should be considered misissued. We are expanding the scope of our investigation to ensure that all affected certificates are revoked and documented in the incident report.
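For readers less familiar with CAA processing, the check discussed in 3A and 3B can be sketched roughly as follows. This is a minimal illustration of the issue/issuewild rules in RFC 8659, not CHT's or GTLSCA's actual implementation. The names `CaaRecord`, `caa_permits`, and the issuer domain `ca.example` are hypothetical, and the sketch assumes the relevant CAA record set has already been located by climbing the DNS tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaaRecord:
    flags: int  # bit 0x80 is the "issuer critical" flag
    tag: str    # e.g. "issue", "issuewild", "iodef"
    value: str  # e.g. "globalsign.com" or ";"


def issuer_domain(value: str) -> str:
    """Return the issuer domain from an issue/issuewild value, dropping parameters."""
    return value.split(";", 1)[0].strip().lower()


def caa_permits(records, ca_domains, wildcard=False):
    """Return True if the record set lets a CA identified by `ca_domains` issue."""
    # Assumption: this hypothetical CA understands only these three tags.
    known = {"issue", "issuewild", "iodef"}
    # An unrecognized property with the critical flag set forbids issuance.
    if any(r.flags & 0x80 and r.tag.lower() not in known for r in records):
        return False
    issue = [r for r in records if r.tag.lower() == "issue"]
    issuewild = [r for r in records if r.tag.lower() == "issuewild"]
    # For wildcard names, issuewild takes precedence over issue when present.
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True  # no applicable restriction in this record set
    allowed = {d.lower() for d in ca_domains}
    return any(issuer_domain(r.value) in allowed for r in relevant)
```

With a record set containing only `issue "globalsign.com"`, `caa_permits(records, {"ca.example"})` returns `False`, so any CA other than the one named in the record has to refuse issuance.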
Comment 19•10 months ago (Assignee)
Regarding the mis-issuance discussed in responses 3C and 3D of Comment 18: we have always trained GTLSCA RAOs to conduct CAA record checks before issuing certificates. On the evening of March 26, while preparing to respond to Comment 18, we reconfirmed with the CA system development team and the reviewing RAO and discovered that the GTLSCA system did not block issuance when the CAA records did not permit it, as the BRs require. We will report this mis-issuance incident on Bugzilla.
Comment 20•10 months ago
(In reply to Tsung-Min Kuo from comment #18)
(In reply to chrome-root-program from comment #15)
(1) The response to question 2A from Comment 5 states: “I believe we have complied with the requirements of Section 4.9.1.1, item 12 of the BR. It did not take more than five days from receiving the notification to revoke all 11,860 affected certificates.”
We do not agree that 4.9.1.1 Item 12 of the TLS BRs is the appropriate reference, but instead, 4.9.1.1 Item 5 (“The CA obtains evidence that the validation of domain authorization or control for any Fully-Qualified Domain Name or IP address in the Certificate should not be relied upon (CRLReason #4, superseded).” Item #5 obligates CAs to revoke affected certificates within 24 hours.
(A) Do you agree with 4.9.1.1 Item 5 being the correct benchmark? If you do not agree, we’re open to better understanding your position, in which case please offer an explanation.
(B) If you do agree, does that change your opinion on the response provided regarding possible revocation delay?
After the previous discussions, we agree with the perspective you proposed. Since TLS BR 4.9.1.1 Item 5 is an appropriate reference, there was indeed a delay in revocation by CHT in this incident.
We will treat this case as a lesson learned and watch for similar cases, so that future situations meet the requirements of the TLS BRs, especially Item 5 of Section 4.9.1.1, and are addressed promptly.
Have you created, or will you create, a delayed revocation bug for this?
Comment 21•10 months ago
(In reply to chrome-root-program from comment #15)
(1) The response to question 2A from Comment 5 states: “I believe we have complied with the requirements of Section 4.9.1.1, item 12 of the BR. It did not take more than five days from receiving the notification to revoke all 11,860 affected certificates.”
We do not agree that 4.9.1.1 Item 12 of the TLS BRs is the appropriate reference, but instead, 4.9.1.1 Item 5 (“The CA obtains evidence that the validation of domain authorization or control for any Fully-Qualified Domain Name or IP address in the Certificate should not be relied upon (CRLReason #4, superseded).” Item #5 obligates CAs to revoke affected certificates within 24 hours.
Hi Chrome Root Program,
I actually disagree with this analysis. Checking CAA records is not part of "validation of domain authorization or control", it is (quoting the BRs, Section 3.2.2.8) "part of the Certificate issuance process". After all, the results of a CAA check can never cause a CA to issue a certificate; it can only cause a CA to not issue a certificate that has otherwise been fully validated.
It's easy to see why CAA might be construed as part of DCV, given that it also requires reaching out to DNS zones under the Applicant's control, and especially given Section 3.2.2.8's presence within Section 3.2.2 "Authentication of Organization and Domain Identity". But the Validation Subcommittee of the CA/Browser Forum has recently recognized that this placement of the section is a bug, not a feature, and is working on a ballot which moves CAA requirements to Section 4.2.2.1 (part of "Certificate Application Processing") where it more correctly belongs.
Comment 22•10 months ago
I would like to second the interpretation that checking CAA records for "issue/issuewild" tags is not part of the Domain Control Validation process. However, if used as described in methods 3.2.2.4.13 and 3.2.2.4.17, then the CAA records are part of the DCV process, and fall under the provisions of 4.9.1.1 item 5.
Comment 23•10 months ago
I disagree with both Comment 22 and 21.
I have not followed the CA/B forum's findings on this, but if the implication is that this moves to a 5 day window of revocation, I would disagree with that too.
CAA (Certificate Authority Authorization) should be considered in the authorization part of domain authorization and control.
I also have issues with:
After all, the results of a CAA check can never cause a CA to issue a certificate; it can only cause a CA to not issue a certificate that has otherwise been fully validated.
From my interpretation of the BRs, domain issuance can be summed up as: "Always issue the certificate, except in ${list_of_scenarios}". For example, that list includes stuff like:
- Applicant does not have control over the name.
- CAA records prevent our CA from issuing.
- etc...
Given this, I've personally not considered it useful to split the issuance/DCV process into positive or negative signals. They're effectively all negative signals in my mind. Following that logic, I'd not relegate CAA checking to fall outside of domain control validation and authorization.
Given that, I'd consider violating CAA to fall under both:
The Subscriber notifies the CA that the original certificate request was not authorized and does not retroactively grant authorization (CRLReason #9, privilegeWithdrawn);
and
The CA obtains evidence that the validation of domain authorization or control for any Fully-Qualified Domain Name or IP address in the Certificate should not be relied upon (CRLReason #4, superseded).
There are really four separate ways CAA checking can be incorrect:
- CA just outright didn't check for CAA.
- CA had an implementation/logic error.
- DNS issues on the side of the CA.
- DNS issues on the side of the applicant.
Out of all of these, I'd argue only the fourth wouldn't fall within a 24 hour revocation window (or, potentially any revocation window since at the time of issuance, CAA passed successfully).
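The "always issue, except..." framing above can be sketched as a set of checks that each return a reason to refuse. This is only an illustration of that mental model: the check names and request fields are hypothetical and do not come from the BRs.

```python
from typing import Callable, Dict, List, Optional

Request = Dict[str, bool]
Check = Callable[[Request], Optional[str]]


def domain_control(req: Request) -> Optional[str]:
    # A missing field is treated as a failure, so the check fails closed.
    if req.get("applicant_controls_name"):
        return None
    return "applicant does not control the name"


def caa(req: Request) -> Optional[str]:
    if req.get("caa_permits_ca"):
        return None
    return "CAA records prevent this CA from issuing"


def blockers(req: Request, checks: List[Check]) -> List[str]:
    """Collect every reason not to issue; an empty list means issuance may proceed."""
    reasons = []
    for check in checks:
        reason = check(req)
        if reason is not None:
            reasons.append(reason)
    return reasons
```

Under this model, a request that passes DCV but fails the CAA check still yields a non-empty list from `blockers(req, [domain_control, caa])`, so every check acts as a negative signal.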
Comment 24•10 months ago
Responding to Comments 21, 22, and 23.
Thanks for your participation in the discussion!
We agree that the framing of our response in Comment 15 improperly considers CAA checking as part of the DCV process. These are indeed two separate processes, which we nonetheless consider related, especially given the recent addition of MPIC requirements via SC-067. DCV's purpose is to prove the applicant's control or authority over the domains in the requested certificate. In contrast, CAA checking is about respecting the domain owner's explicit authorization of which CAs are allowed to issue certificates for their domains.
From our view, it seems a 24-hour revocation window is more appropriate than a 5-day window. If CAA is to be interpreted as a security function explicitly enabled by a domain owner that is intended to prevent certificate issuance by unauthorized CAs, it feels odd to treat this differently than 4.9.1.1 (2), which states “The Subscriber notifies the CA that the original certificate request was not authorized and does not retroactively grant authorization (CRLReason #9, privilegeWithdrawn);”. We recognize that a request not being authorized and issuance not being authorized are indeed distinct, but from our view they appear to communicate the same conclusion in that from the subscriber’s perspective issuance should not have taken place.
As always, additional discussion is welcome and we’re open to changing our perspective. Because we’d like to be mindful of inadvertently distracting the discussion on this bug, we’ve cross-posted the discussion at public@ccadb.org to see if there are opportunities for better alignment going forward.
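To make the two windows concrete, the sketch below computes both deadlines from the notification time given in the report's timeline (2025-03-01 05:20, UTC+8). It assumes, for illustration only, that the clock starts when the notification is received; the BRs define the actual trigger, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # the report states all times are UTC+8


def revocation_deadlines(evidence_time: datetime) -> dict:
    """Return the 24-hour and 5-day revocation deadlines for a given start time."""
    return {
        "24_hours": evidence_time + timedelta(hours=24),
        "5_days": evidence_time + timedelta(days=5),
    }


# Assumption: the start time is the Chrome Root Program notification in the timeline.
notified = datetime(2025, 3, 1, 5, 20, tzinfo=UTC8)
deadlines = revocation_deadlines(notified)
for name, deadline in deadlines.items():
    print(name, deadline.isoformat())
```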
Comment 25•10 months ago (Assignee)
(In reply to Zacharias from comment #20)
(In reply to Tsung-Min Kuo from comment #18)
(In reply to chrome-root-program from comment #15)
(1) The response to question 2A from Comment 5 states: “I believe we have complied with the requirements of Section 4.9.1.1, item 12 of the BR. It did not take more than five days from receiving the notification to revoke all 11,860 affected certificates.”
We do not agree that 4.9.1.1 Item 12 of the TLS BRs is the appropriate reference, but instead, 4.9.1.1 Item 5 (“The CA obtains evidence that the validation of domain authorization or control for any Fully-Qualified Domain Name or IP address in the Certificate should not be relied upon (CRLReason #4, superseded).” Item #5 obligates CAs to revoke affected certificates within 24 hours.
(A) Do you agree with 4.9.1.1 Item 5 being the correct benchmark? If you do not agree, we’re open to better understanding your position, in which case please offer an explanation.
(B) If you do agree, does that change your opinion on the response provided regarding possible revocation delay?
After the previous discussions, we agree with the perspective you proposed. Since TLS BR 4.9.1.1 Item 5 is an appropriate reference, there was indeed a delay in revocation by CHT in this incident.
We will treat this case as a lesson learned and watch for similar cases, so that future situations meet the requirements of the TLS BRs, especially Item 5 of Section 4.9.1.1, and are addressed promptly.
Have you created, or will you create, a delayed revocation bug for this?
Yes, I will file a delayed revocation bug within the next few days.
Comment 26•10 months ago (Assignee)
We are continuing to monitor this issue.
Comment 27•10 months ago
We are continuing to monitor this issue.
Comment 28•9 months ago (Assignee)
We are continuing to monitor this issue.
Comment 29•9 months ago (Assignee)
Report Closure Summary
- Incident description: During the certificate migration from GTLSCA to HiPKI OV TLS CA, CHT reused Domain Control Validation (DCV) data because both CAs are operated by CHT, but this reuse was later confirmed to be inappropriate. In addition, we did not re-verify the CAA records during migration, which resulted in HiPKI OV TLS CA issuing a batch of certificates for domains whose CAA records did not authorize our issuance. As soon as we received a notification from the Chrome Root Program on March 1, 2025, indicating potential issues with the checking of CAA record sets, we initiated an incident investigation and conducted a large-scale revocation.
- Incident Root Cause(s):
(a) The certificate migration process did not go through the complete verification procedures of HiPKI OV TLS CA, in particular the checking of CAA records for the relevant domains. For example, some domains set a CAA record with the issue tag "letsencrypt.org" or "GTLSCA-G1", which does not permit HiPKI OV TLS CA to issue. This procedural oversight led to the issuance of non-compliant certificates.
(b) We were inexperienced with large-scale migration methods; we should have sought more advice from other CAs or experts before proceeding with the reissuance.
- Remediation description: In this incident, we developed and executed a large-scale revocation process, revoking a total of 11,860 certificates (CRLReason #4: superseded). Over the past two months or so, we retrained our CA team members on TLS BR compliance and reviewed the certificate migration process to strengthen verification discipline. We concluded that the batch process is not feasible, and we will not use it again.
- Commitment summary: All improvement actions evaluated in this incident have been completed on schedule and incorporated into our certificate management and review operations. CHT is committed to continuously enhancing the certificate issuance process to ensure full compliance with the TLS BRs. We take this incident as a lesson learned, and we will continue to explore alternative solutions that improve efficiency and ensure system compliance, so that we are fully compliant the next time we encounter the same situation.
All Action Items disclosed in this report have been completed as described, and we request its closure.
Comment 30•9 months ago
This is a final call for comments or questions on this Incident Report.
Otherwise, this bug will be closed on approximately 2025-05-08.