Closed Bug 1888882 Opened 2 years ago Closed 1 year ago

CFCA: Delayed revocation of TLS certificates(basicConstraints extension not marked as critical)

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gaofei, Assigned: gaofei)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Attachments

(5 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36

Steps to reproduce:

While investigating and handling incident https://bugzilla.mozilla.org/show_bug.cgi?id=1886135, we found a total of 2,098 certificates in which the basicConstraints extension was not marked as critical.
Under Section 4.9.1.1 of the TLS Baseline Requirements, the affected certificates must be revoked within 5 days. So far, 840 certificates have been revoked. The remaining certificates are being processed as a priority, and revocation will be completed as soon as possible.
A new incident report is expected to be available on April 6th.

Assignee: nobody → gaofei
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [leaf-revocation-delay]

Summary

CFCA issued a total of 2,098 SSL certificates in which the basicConstraints extension was not marked as critical. For a detailed description, see https://bugzilla.mozilla.org/show_bug.cgi?id=1886135. The affected certificates had to be revoked within 5 days, but for a variety of reasons we were unable to revoke all of them in time.

Impact

As of 2024-04-09, 1,409 certificates have been revoked and 689 have not. We plan to complete all revocations by the end of April.

Timeline

All times are UTC+8.

2024-03-04 08:00 Ryan sent CFCA (rc@cfca.com.cn) a first email asking CFCA to check whether its certificates had a basicConstraints extension error. CFCA did not check the email in time.
2024-03-18 20:59 Ryan sent a second email to CFCA (gaofei@cfca.com.cn, qiudawei@cfca.com.cn).
2024-03-19 16:30 CFCA received Ryan's second email and began analysis and confirmation.
2024-03-19 17:15 CFCA confirmed that the problem existed and decided to stop issuing certificates.
2024-03-19 18:35 CFCA completed the system configuration change. The basicConstraints extension is now marked critical in newly issued certificates.
2024-03-19 20:10 CFCA discussed how to handle the affected certificates and decided to revoke them.
2024-03-20 08:00 CFCA notified business and operations personnel and contacted customers to inform them that their certificates would be revoked.
2024-03-29 18:00 16 certificates revoked.
2024-04-01 18:00 833 certificates revoked.
2024-04-09 18:00 553 certificates revoked.
As of 2024-04-09, 1,409 certificates have been revoked and 689 have not. We plan to complete all revocations by the end of April.
The next revocation status update is expected on April 13.
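The 5-day window in BR Section 4.9.1.1 can be made concrete with a small calculation. This is only a sketch. It assumes the clock starts at the 2024-03-19 17:15 (UTC+8) confirmation entry in the timeline and treats "5 days" as 120 elapsed hours. Which event actually starts the clock is a policy question that the code does not decide. The helper name `revocation_deadline` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# All timestamps in this report are UTC+8.
CST = timezone(timedelta(hours=8))


def revocation_deadline(start: datetime, days: int = 5) -> datetime:
    """Hypothetical helper: latest revocation time for a window of `days`,
    measured as elapsed time from `start` (assumption: 5 days == 120 hours)."""
    return start + timedelta(days=days)


# Assumed start of the clock: the 2024-03-19 17:15 confirmation entry.
confirmed = datetime(2024, 3, 19, 17, 15, tzinfo=CST)
deadline = revocation_deadline(confirmed)
print(deadline.isoformat())  # 2024-03-24T17:15:00+08:00

# The first revocation batch in the timeline (2024-03-29 18:00) falls after it.
first_batch = datetime(2024, 3, 29, 18, 0, tzinfo=CST)
print(first_batch > deadline)  # True
```

If the clock were instead taken to start at the 2024-03-04 08:00 email, the same function would give 2024-03-09 08:00.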

Root Cause Analysis

We respect the requirements of the BRs and other rules. After confirming that the misissued certificates existed, we discussed the matter and decided to revoke all of them. We then began notifying business and operations personnel and contacted customers to inform them that their certificates would be revoked.

  1. We contacted customers by phone and email. However, some contact persons had changed or were on vacation and could not be reached in time, which delayed notifying customers.
  2. Most of these certificates are still installed and deployed manually. In addition, many companies restrict when operational changes can be made and require a formal emergency change request, which results in very long response times.
  3. There is no automated system to help users apply for, download, install, and test certificates.

Lessons Learned

What went well

Some users were understanding and cooperated to complete the certificate replacement.

What didn't go well

  1. The basicConstraints issue had no impact on users' business systems. Users were more worried about how installing replacement certificates would affect those systems, and some raised concerns or refused.
  2. Many companies have requirements for the operation window period. For manual operations under special circumstances, they need procedures to apply for emergency changes, resulting in a very long response time.

Where we got lucky

n/a

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| During the certificate application process, encourage applicants to provide multiple backup contacts | Mitigation | 2024-06-18 |
| Encourage and guide users to use more automated certificate deployment systems or ACME tools | Mitigation | Long-term |

Appendix

Details of affected certificates

Attached file all certificates.txt

About 180 certificates will be revoked tomorrow. We will update the list after the revocation is completed.

Completed the revocation of 201 certificates. There are still 488 certificates waiting to be revoked.

If an incident like this happens again, what are your action items to prevent a delayed revocation?

Encourage and guide users to use more automated certificate deployment systems or ACME tools

I am glad to see this exists, but that's not really a realistic action plan since it doesn't have any deliverable associated with it.

Many companies have requirements for the operation window period. For manual operations under special circumstances, they need procedures to apply for emergency changes, resulting in a very long response time.

I am not entirely sure how this is relevant here. This is not your problem to deal with.

There are still 488 certificates waiting to be revoked.

What is special about these 488 certificates? It seems like you're ignoring the BRs and just waiting for customers to give you the okay to revoke. Is that, in fact, what's happening here?

Completed the revocation of 248 certificates.

(In reply to amir from comment #7)

If an incident like this happens again, what are your action items to prevent a delayed revocation?

  1. Plan for handling a large number of certificates within 5 days
    Although we maintain that the security of the PKI ecosystem comes first, revocation was still delayed. This was partly due to the large number of certificates to be revoked, insufficient preparation and response, and delays in execution. Based on this incident, we will develop a certificate revocation handling plan:
    (1) Organize multiple emergency response teams and define the work for each stage in more detail: incident confirmation, problem repair, customer notification, certificate issuance, and certificate replacement.
    (2) Break the process down so that every step is completed within the 5 days, with tasks scheduled to the hour.
    (3) Conduct a drill based on this incident to confirm that the whole process can be completed within 5 days.
  2. Complete customer notification and training on the 5-day revocation requirement for misissued certificates
    (1) For all customers, reiterate and emphasize the 5-day certificate revocation requirement.
    (2) Select customers whose revocations were slow this time, investigate their detailed change processes and the approvals they require, help them establish a procedure for completing emergency certificate changes within 5 days, and conduct a simulation exercise.
    (3) Make the certificate revocation time requirements, the emergency change procedure, and emergency change drills part of the services provided to new customers.
  3. Make improvements based on the problems that emerged while handling this incident: delays in notifying customers, and the lack of an automated system to help users install and test certificates.
    (1) Update customer contact information and add backup contacts.
    (2) System development and upgrades. We are currently conducting a comprehensive analysis that covers multiple systems and functions, and detailed plans are still being prepared.
  4. Classify customers
    For customers whose certificates are misissued and who cannot help complete revocation within 5 days, we will weigh the situation carefully and give priority to the security of the PKI ecosystem.

Encourage and guide users to use more automated certificate deployment systems or ACME tools

I am glad to see this exists, but that's not really a realistic action plan since it doesn't have any deliverable associated with it.

This action item will take a long time, but we will accelerate it. By 2025, we aim to have as many as possible of our existing customers with large certificate volumes using an automated certificate deployment system.

Many companies have requirements for the operation window period. For manual operations under special circumstances, they need procedures to apply for emergency changes, resulting in a very long response time.

I am not entirely sure how this is relevant here. This is not your problem to deal with.

There are still 488 certificates waiting to be revoked.

What is special about these 488 certificates? It seems like you're ignoring the BRs and just waiting for customers to give you the okay to revoke. Is that, in fact, what's happening here?

We must state again that we have always believed the security of the PKI ecosystem is more important than customers. In 2023 we also had cases where certificates needed to be revoked, and we completed those revocations within the required time. The issue in this incident involves an optional extension and does not affect PKI security or customers' use of the certificates. Nevertheless, we did not hesitate: we chose to revoke all of the certificates and began immediately.
The certificates not yet revoked mostly belong to banks and some infrastructure operators. These organizations are subject to regulatory requirements, operational process requirements, and restrictions on when changes can be made. We also need to improve our capacity to process large numbers of certificates.
In our previous reply, we described four measures we will take to prevent delayed revocation.
We are now conducting a comprehensive analysis to identify and summarize problems in business systems, technical measures, process management, risk identification, and other areas. We will provide a separate CFCA analysis report and plan. We hope that detailed analysis and improved systems and technology will prevent such problems from recurring. We will classify and prioritize all improvement measures, publish them on Bugzilla, and update their implementation status every 1-2 weeks.
We also welcome others to share their experience in management and execution, so that we can learn from it and apply it to our future work.

In response to this incident, we analyzed the existing problems and risks in our business systems, policy implementation and supervision, technical measures, and other areas. To address these problems, we have formulated the following improvement measures and schedule.

  1. Revise three specifications: "Bugzilla Incident Handling Specifications", "Risk Incident Handling Specifications" and "CT Management Specifications".
    (1) Revise the incident query scope, query frequency, and report template in the "Bugzilla Incident Handling Specifications". The query scope covers incidents involving all CAs in the "CA Certificate Compliance" component. Query results are reported by email to designated CFCA managers.
    Status: Completed. The specification was revised on 2024-04-20.
    (2) Revise the incident confirmation time requirements in the "Risk Incident Handling Specifications" and add requirements for checking the mailboxes that receive risk incident reports.
    Status: Completed. The specification was revised on 2024-04-20.
    (3) Revise the "Risk Incident Handling Specifications" to add a mechanism for reporting CT standard query results.
    Status: Completed. The specification modification was completed on 2024-04-20.

  2. Add 1 new specification: "SSL Certificate Standard Tracking Specification"
    For the BRs, the EV Guidelines, the Microsoft/Mozilla/Google/Apple root store policies, the Chrome/Apple Certificate Transparency policies, and other standards, define detailed requirements for tracking, interpreting, and implementing them.
    Status: In progress
    Time plan: It is planned to be completed before June 30, 2024.

  3. Develop a certificate incident handling plan, detailing the work content and timetable for incident confirmation, problem repair, customer notification, certificate issuance, certificate replacement, etc.
    Status: Not started
    Time plan: It is planned to be completed before June 30, 2024.

  4. Improve execution and supervision
    (1) Form a supervision group with representatives from product, compliance, development and testing, and operations.
    (2) Define the supervision group's working criteria and scope.
    (3) The group will regularly review how the various policies are being implemented and issue inspection reports.
    (4) The product team will make corrections based on the inspection reports.
    Status: Not started
    Time plan: It is planned to be completed before June 30, 2024.

  5. Complete customer notification and training on revoking misissued certificates
    (1) For all customers, reiterate and emphasize the 5-day certificate revocation requirement.
    (2) Select customers whose revocations were slow this time, investigate their detailed change processes and the approvals they require, help them establish a procedure for emergency certificate replacement within 5 days, and conduct simulation exercises.
    (3) Incorporate the certificate revocation time requirements, the emergency change procedure, and emergency change drills into the services provided to new customers.
    Status: Not started
    Time plan: It is planned to be completed before July 30, 2024.

  6. Compile the incidents reported by other CAs in the past year and check whether CFCA has the same problems
    Status: In progress
    Time plan: It is planned to be completed before May 15, 2024.

  7. Break the BR requirements down into check items, build a checklist, and verify the value of each configuration item against the current system.
    Status: In progress
    Time plan: It is planned to be completed before May 30, 2024.

  8. Upgrade Zlint to the latest version
    Status: In progress
    Time plan: It is planned to be completed before June 7, 2024.
    Note: When a new Zlint version is released but CFCA has not yet integrated it, CFCA will add a manual check, running the Zlint client by hand against issued certificates.

  9. Add PKIlint automatic detection
    Status: Not started
    Time plan: It is planned to be completed before July 20, 2024.
    Note: We will use Zlint and PKIlint together. Every certificate will be checked by both Zlint and PKIlint before issuance, and issuance is allowed only if both checks pass.

  10. Classify customers
    For customers whose certificates are misissued and who cannot help complete revocation within 5 days, we will weigh the situation carefully and give priority to the security of the PKI ecosystem.
    Status: Not started
    Time plan: It is planned to be completed before July 30, 2024.
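Item 9 describes an issuance gate in which a certificate must pass both Zlint and PKIlint. The sketch below shows only the gating logic, under stated assumptions. `run_zlint_stub` and `run_pkilint_stub` are hypothetical stand-ins that operate on a simplified dictionary; they are not the real Zlint or PKIlint interfaces. Both stubs reuse one check, the defect at issue in this bug (basicConstraints present but not critical), only to keep the example short. The real tools implement their own rule sets.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    linter: str
    severity: str  # simplified levels: "error" or "warn"
    message: str


def check_bc_critical(tbs: dict) -> list[str]:
    """Hypothetical check: a basicConstraints extension, if present, must be critical."""
    ext = tbs.get("extensions", {}).get("basicConstraints")
    if ext is not None and not ext.get("critical", False):
        return ["basicConstraints present but not marked critical"]
    return []


def run_zlint_stub(tbs: dict) -> list[Finding]:
    # Stand-in for a Zlint run; NOT Zlint's real interface.
    return [Finding("zlint", "error", m) for m in check_bc_critical(tbs)]


def run_pkilint_stub(tbs: dict) -> list[Finding]:
    # Stand-in for a PKIlint run; NOT PKIlint's real interface.
    return [Finding("pkilint", "error", m) for m in check_bc_critical(tbs)]


def may_issue(tbs: dict) -> tuple[bool, list[Finding]]:
    """Gate: issue only if neither linter reports an error-level finding."""
    findings = run_zlint_stub(tbs) + run_pkilint_stub(tbs)
    blocking = [f for f in findings if f.severity == "error"]
    return not blocking, findings


bad = {"extensions": {"basicConstraints": {"ca": False, "critical": False}}}
good = {"extensions": {"basicConstraints": {"ca": False, "critical": True}}}
print(may_issue(bad)[0])   # False
print(may_issue(good)[0])  # True
```

Warnings are treated as non-blocking here. This matches the later Zlint status update, where a warning was reviewed and accepted. Whether warnings should block issuance is a policy choice.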

Status update:

  1. The "SSL Certificate Standard Tracking Specification" is being drafted.
  2. We have compiled statistics on incidents at other CAs and are checking CFCA for similar problems or risks. This work will be completed as scheduled by May 15, 2024.
  3. The BR requirements are being broken down into check items; the checklist is in progress.
  4. The upgrade to the latest version of Zlint is in development and will be completed as scheduled by June 7, 2024.
  5. The functions and limitations of PKIlint are being evaluated, and development plans are being formulated.
Summary: CFCA:Delayed revocation of TLS certificates(basicConstraints extension not marked as critical) → CFCA: Delayed revocation of TLS certificates(basicConstraints extension not marked as critical)

So far, we have reviewed a total of 177 incidents at other CAs from March 1, 2023 to the present. Based on these incidents and the measures those CAs took, we plan to make improvements in the following areas:

  1. CP/CPS update
    Add a monitoring script to check whether the document has been updated within 365 days
  2. CT function
    Optimize the pre-certificate processing mechanism when the number of CTs is insufficient
  3. CRL and OCSP
    Optimize CRL and OCSP monitoring, and enhance monitoring/alerts for CRL/OCSP publishing failures
  4. Certificate test website page
    Increase monitoring to ensure that the certificates of each test website meet the requirements

We are discussing the schedule for these measures.
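The first measure, a script that checks whether the CP/CPS has been updated within 365 days, could look roughly like the following. This is a minimal sketch. The function name and the 30-day "due soon" warning are hypothetical; only the 365-day window comes from the measure itself.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=365)  # window named in the CP/CPS measure


def cps_status(last_updated: date, today: date, warn_days: int = 30) -> str:
    """Hypothetical monitor: classify the CP/CPS by time since its last update."""
    age = today - last_updated
    if age > MAX_AGE:
        return "overdue"
    if age > MAX_AGE - timedelta(days=warn_days):
        return "due-soon"
    return "ok"


print(cps_status(date(2024, 1, 15), date(2024, 6, 1)))  # ok
print(cps_status(date(2023, 7, 1), date(2024, 6, 15)))  # due-soon
print(cps_status(date(2023, 5, 1), date(2024, 5, 1)))   # overdue (366 days: 2024 is a leap year)
```

Counting elapsed days, rather than comparing calendar years, catches the leap-year case in the last line.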

It has been quite a while since you reported the status of certificates for revocation. What are the total numbers of certificates revoked, certificates expired of their own accord, and active certificates remaining?

Flags: needinfo?(gaofei)

2,062 certificates have been revoked, 28 have expired, and 8 remained unrevoked (these were revoked on May 25, 2024).

Flags: needinfo?(gaofei)

Status update:

  1. Breaking the BR requirements down into check items and building a checklist: in progress.
  2. The latest version of Zlint is being tested.
  3. The "SSL Certificate Standard Tracking Specification" is being drafted.
  4. The "Incident handling plan/pre-plan" is in progress.
  5. Optimizing precertificate processing when the number of CTs is insufficient: planned for completion on August 20.
  6. Optimizing CRL and OCSP monitoring: scheduled for completion on August 30.
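The CRL/OCSP monitoring improvement could include freshness alerts of the following kind. This is a sketch under assumptions. It takes `thisUpdate`/`nextUpdate` values that have already been parsed out of a published CRL (parsing is out of scope). `crl_alerts` is a hypothetical name. The 7-day publication interval and 24-hour margin are placeholder thresholds, to be set from the applicable policy. The same checks could be applied to the `thisUpdate`/`nextUpdate` fields of OCSP responses.

```python
from datetime import datetime, timedelta, timezone


def crl_alerts(this_update: datetime, next_update: datetime, now: datetime,
               max_publish_interval: timedelta = timedelta(days=7),
               margin: timedelta = timedelta(hours=24)) -> list[str]:
    """Hypothetical monitor: return alert messages for one published CRL.
    Thresholds are placeholder assumptions, not policy values."""
    alerts = []
    if now >= next_update:
        alerts.append("CRL expired: nextUpdate has passed")
    elif next_update - now <= margin:
        alerts.append("CRL close to nextUpdate; check that republication is running")
    if now - this_update > max_publish_interval:
        alerts.append("no new CRL published within the expected interval")
    return alerts


UTC = timezone.utc
now = datetime(2024, 8, 30, tzinfo=UTC)
print(crl_alerts(datetime(2024, 8, 29, tzinfo=UTC), datetime(2024, 9, 5, tzinfo=UTC), now))  # []
print(len(crl_alerts(datetime(2024, 8, 20, tzinfo=UTC), datetime(2024, 8, 27, tzinfo=UTC), now)))  # 2
```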

Revocation of the remaining certificates was completed on May 25. Of the 2,098 affected certificates, 2,070 have been revoked and 28 have expired.
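The reported figures can be cross-checked against the 2,098 affected certificates: every certificate should fall into exactly one of revoked, expired, or still unrevoked. A trivial sketch, with a hypothetical helper name:

```python
TOTAL_AFFECTED = 2098  # affected certificates reported in this bug


def accounted_for(revoked: int, expired: int, unrevoked: int) -> bool:
    """True if every affected certificate falls into exactly one bucket."""
    return revoked + expired + unrevoked == TOTAL_AFFECTED


print(accounted_for(2062, 28, 8))  # True: status reported before May 25
print(accounted_for(2070, 28, 0))  # True: final status after May 25
```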

Status update:
1. Revise three specifications: "Bugzilla Incident Handling Specifications", "Risk Incident Handling Specifications" and "CT Management Specifications".
Status: Completed

2. Add 1 new specification: "SSL Certificate Standard Tracking Specification"
Status: In progress

3. Develop a certificate incident handling plan, detailing the work content and timetable for incident confirmation, problem repair, customer notification, certificate issuance, certificate replacement, etc.
Status: In progress

4. Improve execution and supervision
Status: In progress
The team has been formed, and its working standards and guidelines are under discussion.

5. Complete customer notification and training on revoking misissued certificates
Status: In progress
Training has been completed for 3 key customers.

6. Compile the incidents reported by other CAs in the past year and check whether CFCA has the same problems
Status: Completed
177 incidents from roughly the past year have been reviewed. The CT function optimization and the CRL/OCSP monitoring improvements are scheduled for the third quarter.

7. Break the BR requirements down into check items, build a checklist, and verify the value of each configuration item against the current system.
Status: Completed
The checklist currently has more than 60 items, and more will be added. The main categories are identification items and requirements, certificate format (fields, field contents, key extensions), CRL format, and OCSP format.

8. Upgrade Zlint to the latest version
Status: In progress
This is 1-2 weeks behind the original plan; completion is expected on June 21.

9. Add PKIlint automatic detection
Status: In progress
The preliminary assessment has been completed and the requirements have been finalized.

10. Classify customers
Status: Not started

There is no change in the status of the above tasks yet; we will post an update next week.

Status update:
1. Revise three specifications: "Bugzilla Incident Handling Specifications", "Risk Incident Handling Specifications" and "CT Management Specifications".
Status: Completed

2. Add 1 new specification: "SSL Certificate Standard Tracking Specification"
Status: Completed

3. Develop a certificate incident handling plan, detailing the work content and timetable for incident confirmation, problem repair, customer notification, certificate issuance, certificate replacement, etc.
Status: In progress
The content to be published has been completed and is being reviewed and revised.

4. Improve execution and supervision
Status: In progress
The team has been formed, and its working standards and guidelines are under discussion.

5. Complete customer notification and training on revoking misissued certificates
Status: In progress
Training has been completed for 4 key customers.

6. Compile the incidents reported by other CAs in the past year and check whether CFCA has the same problems
Status: Completed
177 incidents from roughly the past year have been reviewed. The CT function optimization and the CRL/OCSP monitoring improvements are scheduled for the third quarter.

7. Break the BR requirements down into check items, build a checklist, and verify the value of each configuration item against the current system.
Status: Completed
The checklist currently has more than 60 items, and more will be added. The main categories are identification items and requirements, certificate format (fields, field contents, key extensions), CRL format, and OCSP format.

8. Upgrade Zlint to the latest version
Status: In progress
Scheduled for completion on June 26.

9. Add PKIlint automatic detection
Status: In progress
The preliminary assessment has been completed and the requirements have been finalized.

10. Classify customers
Status: In progress

Status update:
1. Revise three specifications: "Bugzilla Incident Handling Specifications", "Risk Incident Handling Specifications" and "CT Management Specifications".
Status: Completed

2. Add 1 new specification: "SSL Certificate Standard Tracking Specification"
Status: Completed

3. Develop a certificate incident handling plan, detailing the work content and timetable for incident confirmation, problem repair, customer notification, certificate issuance, certificate replacement, etc.
Status: Completed
The SSL certificate incident handling plan has been completed, and we are working with the relevant departments to arrange a drill.

4. Improve execution and supervision
Status: Completed
Implementation is starting, and the first inspection will be completed in about 2 weeks.

5. Complete customer notification and training on revoking misissued certificates
Status: In progress
Training has been completed for 4 key customers.

6. Compile the incidents reported by other CAs in the past year and check whether CFCA has the same problems
Status: Completed
177 incidents from roughly the past year have been reviewed. The CT function optimization and the CRL/OCSP monitoring improvements are scheduled for the third quarter.

7. Break the BR requirements down into check items, build a checklist, and verify the value of each configuration item against the current system.
Status: Completed
The checklist currently has more than 60 items, and more will be added. The main categories are identification items and requirements, certificate format (fields, field contents, key extensions), CRL format, and OCSP format.

8. Upgrade Zlint to the latest version
Status: Completed
One check returns a warning (w_ext_subject_key_identifier_not_recommended_subscriber). Our policy is to continue including this extension.

9. Add PKIlint automatic detection
Status: In progress
The preliminary assessment has been completed and the requirements have been finalized.

10. Classify customers
Status: In progress

Gao,

We are now 22 comments and four months into this bug, and we have yet to see a Root Cause Analysis that acknowledges the true cause of this bug or any Action Items that address that cause. This despite the fact that many other bugs on this forum are dealing with the exact same problem and there are literally dozens of comments explaining this to a dozen or more CAs in active delayed revocation bugs.

The true root cause of this bug is that CFCA deliberately did not revoke the affected certificates when the rules stipulate that it must. Other factors such as the Subscriber’s unwillingness to install replacement certificates by the revocation deadline are not relevant to that root cause or the CA’s responsibility. Action items such as implementation of pre-certificate linting, while advisable, are not action items for this issue, which is CFCA’s knowing failure to revoke. Even action items such as updates to certificate revocation processes and employee training, though relevant to this bug, do not address this true root cause.

In bug 1889062 comment 15 Mozilla specifies that acknowledgement and action items of this sort are necessary to resolve an intentional delrev bug:

  • detailed changes to policies and procedures to ensure timely revocation, including new guidelines, checklists, and approval processes; and
  • monitoring and auditing to ensure compliance with such policies and procedures and to identify any lapses quickly

I see that you are diligently posting updates every week and working on your action items, but somehow you have missed that these action items are not sufficient to resolve the issue. If you address the missing components now, then maybe you will be able to close this bug once you have delivered all the items you have on your list. Here is what you need to do:

  1. Read and follow not only your own bugs but other CAs’ bugs also. Most especially, if other CAs are having the exact same problem as your current bug (and there are many in this case), you should follow what is being said there as it applies to you as well. This needs to be a standard practice not just for this incident but any incident opened against your CA any time in the future.
  2. Update your Incident Report to clarify that the root cause of this problem was not any lack of capability among your Subscribers but rather CFCA’s deliberate decision not to revoke certificates on time.
  3. Include in this Incident Report one or more Action Items that convincingly stand to resolve the true root cause so that this problem will never occur again.
  4. Appropriate Action Items include a change in policy to clarify that deliberate delayed revocation will never occur again, a clear commitment here on Bugzilla that you will not delay revocation, and checks to ensure this policy is followed.

We have prepared a response, but to better clarify the root cause and the effective actions that address it, we need more discussion. We plan to respond on July 11.

Thanks for your suggestion. We had already identified following other CAs' incidents as an improvement measure, and in May we reviewed the incidents of the past year. In April we also revised the "Bugzilla Incident Handling Specifications" and immediately began implementing them. They require that incidents at other CAs be compiled at least once a week, sent to designated managers, and shared with product managers. This will be a long-term measure.
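Weekly tracking of other CAs' incidents can be supported by a scripted query. The following is a minimal sketch, not a description of CFCA's actual tooling. It assumes the public Bugzilla REST search endpoint on bugzilla.mozilla.org and the "CA Program" product / "CA Certificate Compliance" component shown at the top of this bug. The function names, the 7-day window, and the chosen fields are illustrative.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE = "https://bugzilla.mozilla.org/rest/bug"


def weekly_query_url(now: datetime, days: int = 7) -> str:
    """Build a REST search for CA compliance bugs changed in the last `days` days."""
    since = (now.astimezone(timezone.utc) - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {
        "product": "CA Program",
        "component": "CA Certificate Compliance",
        "last_change_time": since,  # bugs changed at or after this time
        "include_fields": "id,summary,status,last_change_time",
    }
    return f"{BASE}?{urlencode(params)}"


def fetch_bugs(url: str) -> list[dict]:
    """Run the search (needs network access); results are under the "bugs" key."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["bugs"]


url = weekly_query_url(datetime(2024, 7, 8, tzinfo=timezone.utc))
print(url)
```

The resulting list could then be formatted into the weekly email to designated managers. That step is left out here.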

1. Read and follow not only your own bugs but other CAs’ bugs also. Most especially, if other CAs are having the exact same problem as your current bug (and there are many in this case), you should follow what is being said there as it applies to you as well. This needs to be a standard practice not just for this incident but any incident opened against your CA any time in the future.

We are combining the results of the improvement measures taken so far, conducting an in-depth root cause analysis, and formulating practical action items that address the root causes. We respect the rules of the Web PKI community, and CFCA will complete its remediation plan through appropriate measures and actions. We ask to respond in detail to Tim's suggestions by next Friday.

(In reply to Tim Callan from comment #23)

Root Cause Analysis

We respect the requirements of the BRs and other rules. After confirming that the misissued certificates existed, we discussed the matter and decided to revoke all of them. We then began notifying business and operations personnel and contacted customers to inform them that their certificates would be revoked.
A large number of certificates needed to be processed. The reasons revocation could not be completed within the required time are summarized as follows:

  1. We contacted customers by phone and email. However, some contacts had changed or were on vacation and could not be reached in time, which delayed notifying customers.
  2. Changes to customer systems require approval, and the approval process is lengthy.
  3. Critical infrastructure, such as financial and government systems, requires security testing before a certificate is replaced, and changes can be made only after the tests pass.
  4. Some users deploy wildcard and multi-domain certificates across multiple business systems, and each system needs additional time for replacement and upgrade.
  5. Some customer business systems serve multiple institutions, and the timing of changes must be coordinated with each institution.
    If these certificates had been revoked before being replaced, many critical services would have been significantly affected. During the revocation process, CFCA chose, based on customer feedback, not to revoke these certificates on time, which resulted in delayed revocation.
  2. Update your Incident Report to clarify that the root cause of this problem was not any lack of capability among your Subscribers but rather CFCA’s deliberate decision not to revoke certificates on time.

Action Items

| Action items | Type | Due date |
| --- | --- | --- |
| When users apply for or obtain certificates, they will be clearly informed of the revocation scenarios and time limits. CFCA will strictly follow the specification requirements to perform certificate revocation. | Prevention | 2024-08-16 |
| Formulate a certificate incident handling plan, detailing the work content and timetable for incident confirmation, problem repair, customer notification, certificate issuance, and certificate replacement. | Prevention | 2024-06-30 |
| Revise the "Bugzilla Incident Handling Specification" to require weekly tracking of incidents at other CAs and checks for similar problems at CFCA. | Prevention | 2024-05-16 |
| Establish a supervision team and improve execution and supervision to ensure that policy and process requirements are fully followed during business operations. | Prevention | 2024-06-30 |
| During the certificate application process, encourage applicants to provide multiple alternative contacts. | Mitigation | Long-term |
| Encourage and remind customers to limit the number of wildcard and multi-domain certificates they request. | Mitigation | Long-term |
| Encourage and guide users to use more automated certificate deployment systems or ACME tools | Mitigation | Long-term |

  1. Include in this Incident Report one or more Action Items that convincingly stand to resolve the true root cause so that this problem will never occur again.

CFCA will comply with the revocation requirements of the relevant Baseline Requirements (BR) policies and complete certificate revocation within the time required by the rules.

  1. Appropriate Action Items include a change in policy to clarify that deliberate delayed revocation will never occur again, a clear commitment here on Bugzilla that you will not delay revocation, and checks to ensure this policy is followed.

Update Action Items format

Action Items

| Action items | Type | Due date |
|---|---|---|
| When users apply for or obtain certificates, they will be clearly informed of the revocation scenarios and time limits. CFCA will strictly follow the specification requirements to perform certificate revocation | Prevention | 2024-08-16 |
| Formulate a certificate incident handling plan, detailing the work content and timetable for incident confirmation, problem repair, customer notification, certificate issuance, and certificate replacement | Prevention | 2024-06-30 |
| Revise the "Bugzilla Incident Handling Specification" to require weekly tracking of incidents reported by other CAs and checking whether CFCA has similar problems | Prevention | 2024-05-16 |
| Establish a supervision team. Improve execution and supervision to ensure that policies and process requirements are fully followed during business operations | Prevention | 2024-06-30 |
| During the certificate application process, encourage applicants to provide multiple alternative contacts | Mitigation | Long-term |
| Encourage and remind customers to limit the number of wildcard and multi-domain certificate applications | Mitigation | Long-term |
| Encourage and guide users to adopt automated certificate deployment systems or ACME tools | Mitigation | Long-term |

When applying for or obtaining a certificate, users will be clearly informed of the scenarios and time limits for certificate revocation, and CFCA will strictly follow the regulatory requirements to perform certificate revocation.
This project is proceeding as planned, and there are no other updates for the time being.

Blocks: 1911183

There are no updates yet; we request that the next update be set to August 20, 2024.

Whiteboard: [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [leaf-revocation-delay] Next update 2024-08-20

When users apply for or obtain certificates, they will be clearly informed of the scenarios and time limits for certificate revocation. CFCA has completed the related rules and training.
We have completed all the measures in the Action Items.

CFCA has completed the improvement measures, and there is no new discussion at present. We request that this incident be closed.

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-08-20 → [ca-compliance] [leaf-revocation-delay]
Flags: needinfo?(bwilson)

I will publish something on or around 1-Oct-2024 concerning delayed revocations.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31 → [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30

We continue work on incident-reporting and compliance requirements aimed at reducing delayed revocation, so this bug will remain open until at least February 1, 2025. Meanwhile, CAs should review https://github.com/mozilla/www.ccadb.org/pull/186.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30 → [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01

We will review the guidance and follow its requirements.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01 → [ca-compliance] [leaf-revocation-delay] Next update 2025-03-03

Before closing this incident, CFCA should repeat its commitment to revoke TLS certificates timely in accordance with section 4.9.1 of the TLS Baseline Requirements.

Mozilla acknowledges that some of CFCA’s subscribers operate under complex regulatory or bureaucratic constraints. Still, CFCA will need to provide additional Action Items aimed at handling finance-related and government-managed subscribers and ensuring that no external policies prevent timely revocation. Examples include: requiring government and finance-related entities to provide written confirmation that they can comply with revocation timelines before issuance; ensuring that they have plans for replacing certificates within 24 hours of a misissuance or security incident; and streamlined approval processes so that TLS certificates can be replaced without problematic bureaucratic approval chains.

Finally, we will need a completed Closure Summary.

Flags: needinfo?(gaofei)

CFCA made a commitment to revoke certificates in a timely manner in its response on July 19, 2024. Of course, we are willing to reiterate: CFCA will revoke TLS certificates in a timely manner in accordance with Section 4.9.1 of the TLS Baseline Requirements.

Thank you for suggesting additional Action Items; we will provide a complete Closure Summary later.

Flags: needinfo?(gaofei)

The complete closure summary is being revised and will be available upon completion.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-03-03 → [ca-compliance] [leaf-revocation-delay]

We were notified some time ago that some certificates issued between September 15, 2023 and March 19, 2024 had been missed and not revoked. We therefore spent some time analyzing and addressing this issue, opened a new incident (https://bugzilla.mozilla.org/show_bug.cgi?id=1949131), and provided an incident report there.
For the current incident, we will provide a Closure Summary this week.

Action Items

| Action items | Type | Due date |
|---|---|---|
| Our ACME-compatible system has completed an MVP release; we will continue development so that it can be deployed later this year | Prevention | Long-term |
| Send formal requests to customers for written confirmation that they can comply with revocation and incident response time limits | Prevention/Mitigation | Long-term |

Report Closure Summary

  • Incident description: In March 2024, CFCA received a report email from Ryan pointing out that some certificates had a basicConstraints extension that was not marked as critical. We took action immediately after confirming the problem, but did not comply with Section 4.9.1 of the TLS Baseline Requirements: some of the affected certificates were not revoked within 5 days.
  • Incident Root Cause(s): Some customer contacts had changed, were on vacation, or were otherwise unavailable and could not be reached in time; changes to customer systems require approval, which takes a long time; wildcard and multi-domain certificates are deployed across different systems, which added up to a longer replacement time; cross-dependencies between services made it hard to progress revocation in a timely manner.
  • Remediation description: Clearly inform customers of the revocation scenarios and time limits before they obtain a certificate; set up a certificate incident handling plan so that our team can handle incidents more quickly; revise the "Bugzilla Incident Handling Specification"; establish a supervision team and improve execution and supervision; encourage applicants to provide multiple alternative contacts; encourage and remind customers to limit the number of wildcard and multi-domain certificate applications; encourage and guide users to adopt automated certificate deployment systems or ACME tools that support prompt incident response; our ACME-compatible system has completed an MVP release, and we will continue development so that it can be deployed later this year; send formal requests to customers for written confirmation that they can comply with revocation and incident response time limits.
  • Commitment summary: CFCA is committed to continuously improving its services. We will keep strictly following the new policies and processes, stay up to date on industry issues, and better guide customers in certificate-related activities.
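The 5-day revocation window referred to in the summary above can be checked mechanically once the confirmation time and each certificate's revocation time are known. The following is a minimal Python sketch, assuming the window is measured as 120 hours from the moment the problem was confirmed; the function names and certificate records are hypothetical and do not reflect CFCA's actual systems or data.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Assumption: the 5-day window is measured as 120 hours from the moment the
# CA confirmed the problem.
REVOCATION_WINDOW = timedelta(days=5)
CST = timezone(timedelta(hours=8))  # UTC+8, the timezone used in this bug


def revocation_deadline(confirmed_at: datetime) -> datetime:
    """Latest time at which an affected certificate may be revoked."""
    return confirmed_at + REVOCATION_WINDOW


def late_revocations(
    confirmed_at: datetime, revoked_at_by_serial: Dict[str, Optional[datetime]]
) -> List[str]:
    """Serials revoked after the deadline, or not revoked at all (None)."""
    deadline = revocation_deadline(confirmed_at)
    return sorted(
        serial
        for serial, revoked_at in revoked_at_by_serial.items()
        if revoked_at is None or revoked_at > deadline
    )


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    confirmed = datetime(2024, 1, 10, 9, 0, tzinfo=CST)
    revocations = {
        "01": datetime(2024, 1, 12, 10, 0, tzinfo=CST),  # within the window
        "02": datetime(2024, 1, 15, 9, 0, tzinfo=CST),   # exactly at the deadline
        "03": datetime(2024, 1, 16, 12, 0, tzinfo=CST),  # late
        "04": None,                                      # not yet revoked
    }
    print("deadline:", revocation_deadline(confirmed).isoformat())
    print("late or outstanding:", late_revocations(confirmed, revocations))
```

Treating a revocation exactly at the deadline as on time is a design choice of this sketch; a stricter reading would use `>=` in the comparison.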

All Action Items disclosed in this report have been completed as described, and we request its closure.

Ben - I have a question on this bug. Is there a delayed response bug associated with it? I noticed that many of the responses go over the 7-day requirement. For example, you set the next update to 3-3-25 but comment 38 wasn't made until 3-12-25. Are you doing away with the delayed response bugs going forward or is another bug needed to explain why CFCA is missing updates?

Flags: needinfo?(bwilson)

Hi Jeremy,
Thanks for pointing this out. The expectation remains that CAs update their incident bugs at least weekly, unless a Next Update is set. The primary goal is to ensure accountability and timely updates, rather than to create additional overhead with new bugs. If there is a delay, the CA should provide an explanation in its next comment. One or two delays do not automatically require a separate bug, but if delays persist, then a new bug should be opened to address the issue with the CA.
Alternatively, community members often request that the CA open a delayed response bug, which is a reasonable approach. In this particular case, I acknowledge that my extensions of the Next Update might have contributed to CFCA’s delayed status reports, as these delayed revocation cases remained in limbo while I worked on broader solutions for the delayed revocation issue, which affected multiple CAs. Given that context, I am open to further discussion on whether a separate incident bug should be opened here. Let me know your thoughts.
Thanks,
Ben

Flags: needinfo?(bwilson)

The delay is not about the Next Update field. Many of the updates are late. For example, 8 days passed between the 2nd and 3rd updates. Between 4/30 and 5/8 is 8 days; between 5/8 and 5/17 is 9 days; 5/27-6/6 = 10 days; 6/13-6/21 = 8 days; 6/21-6/30 = 9 days, etc. These were all before you set the Next Update field. Next Update dates are also consistently missed once they are set. For example, you set a Next Update of 3/3 and CFCA didn't post until 3/12. You set a Next Update of 8/20 and CFCA didn't post until 8/21.

I don't think filing a separate bug on the missed days would be that helpful. Instead, could CFCA just state here that it understands updates need to be made at least every 7 days?
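The update rule discussed in this exchange (a comment at least every 7 days, unless a Next Update date is set) is simple date arithmetic. Below is a minimal Python sketch, assuming the year for the quoted month/day dates; `due_date` and `days_late` are hypothetical helper names, and the gaps are recomputed from the date pairs exactly as written in the comment above.

```python
from datetime import date, timedelta
from typing import Optional

# Expected maximum gap between updates when no Next Update date is set.
MAX_GAP = timedelta(days=7)


def due_date(last_update: date, next_update: Optional[date] = None) -> date:
    """Date by which the next comment is due.

    If a Next Update date is set on the bug, it governs; otherwise the next
    comment is due 7 days after the previous one.
    """
    return next_update if next_update is not None else last_update + MAX_GAP


def days_late(posted: date, due: date) -> int:
    """Days by which a comment missed its due date (0 if on time)."""
    return max((posted - due).days, 0)


if __name__ == "__main__":
    # Consecutive update dates as quoted above; the year 2024 is assumed.
    pairs = [
        (date(2024, 4, 30), date(2024, 5, 8)),
        (date(2024, 5, 8), date(2024, 5, 17)),
        (date(2024, 5, 27), date(2024, 6, 6)),
        (date(2024, 6, 13), date(2024, 6, 21)),
        (date(2024, 6, 21), date(2024, 6, 30)),
    ]
    for prev, posted in pairs:
        gap = (posted - prev).days
        late = days_late(posted, due_date(prev))
        print(f"{prev} -> {posted}: {gap} days, {late} day(s) late")

    # Next Update set to 2025-03-03; the next comment was posted 2025-03-12.
    # The previous-update date here is a placeholder; next_update governs.
    print(days_late(date(2025, 3, 12), due_date(date(2025, 2, 1), date(2025, 3, 3))))
```

Under this reading, a post even one day after a Next Update date (for example, 8/20 to 8/21) already counts as late.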

Hi Ben & Jeremy:

Thank you for pointing this out. You are right; we now give weekly updates top priority, and we fully agree that they benefit the community as a whole.

This was the first time we had dealt with such a large number of revocations, and it was a challenge for our team members and related systems.
While handling the incident, we learned a lot and made many improvements. In brief:

  • A certificate incident handling plan was established;
  • A supervision team was established;
  • Gao Fei and Qiu Dawei, two key members, have left our team; five more staff have joined the team; I will lead the team from now on, and I (Michael) have been added as a Primary POC;
  • A team member has been assigned to check the status of all related bugs at least twice a week;
  • Our ACME-compatible system has completed an MVP release;

We are also looking for automation tools or solutions that could help us with this (any suggestions are highly appreciated).

As Jeremy requested, CFCA states that we understand updates need to be made at least every 7 days, and we will follow this rule.

Since all Action Items disclosed in the above report have been completed as described, we request its closure.

I will close this next week.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED