GDCA: Delayed revocation of SSL/TLS certificates with Non-critical Basic Constraints
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: capoc, Assigned: capoc)
References
(Blocks 1 open bug)
Details
(Whiteboard: [ca-compliance] [leaf-revocation-delay])
Actual results:
As noted in Bug 1888060, GDCA issued 20 SSL/TLS certificates with a non-critical Basic Constraints extension between 15 September and 8 October 2023. As of 17:10 on 2 April 2024 (UTC+8), all of these certificates have either been revoked or expired. Of these, 13 were not revoked within 5 days of receiving the Certificate Problem Report, which is a violation of section 4.9.1.1 of the Baseline Requirements.
This is a preliminary incident report acknowledging the revocation delay, and we will provide more details later.
Incident Report
Delayed revocation of SSL/TLS certificates with Non-critical Basic Constraints
Summary
GDCA issued 20 SSL/TLS certificates between 15 September and 8 October 2023 in which the Basic Constraints extension was included but not marked as critical. As of 17:00 on 2 April 2024 (UTC+8), all of these certificates have either been revoked or expired. Of these, 13 were not revoked within 5 days after the issue was confirmed, which does not comply with section 4.9.1.1 of the Baseline Requirements.
Impact
13 certificates have not been revoked within 5 days after confirming the issue.
Timeline
All times are UTC+8.
2023-9-15:
-First problematic certificate issued.
2023-10-8:
-Last problematic certificate issued.
2024-3-26:
-23:54: Our CPR mechanism received a certificate problem report from a concerned third party, stating that we had mis-issued a number of SSL/TLS certificates with Basic Constraints not set as critical and that this should be treated as an incident.
2024-3-27:
-14:15: We confirmed this issue and published a preliminary incident report on Bugzilla.
-We began to contact the impacted customers to communicate about certificate revocation and replacement.
2024-4-2:
-17:00: We completed the revocation of all problematic certificates.
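The size of the delay can be checked with simple date arithmetic on the timeline entries. The following is a minimal sketch, not part of GDCA's report: the variable and function names are hypothetical, and because the report describes the 5-day clock both as starting at receipt of the problem report and at confirmation of the issue, both readings are computed. The completion time is when the last certificate was revoked, so the result is an upper bound for any individual certificate.

```python
from datetime import datetime, timedelta, timezone

# All timestamps are taken from the timeline above (UTC+8).
UTC8 = timezone(timedelta(hours=8))
report_received = datetime(2024, 3, 26, 23, 54, tzinfo=UTC8)
issue_confirmed = datetime(2024, 3, 27, 14, 15, tzinfo=UTC8)
revocation_done = datetime(2024, 4, 2, 17, 0, tzinfo=UTC8)

# Assumption: a 5-day revocation window, as cited in the summary.
WINDOW = timedelta(days=5)


def overrun(clock_start, completed, window=WINDOW):
    """Return how far `completed` falls past the deadline (zero if on time)."""
    deadline = clock_start + window
    return max(completed - deadline, timedelta(0))


overrun_from_receipt = overrun(report_received, revocation_done)
overrun_from_confirmation = overrun(issue_confirmed, revocation_done)

print("Past deadline, clock from receipt:     ", overrun_from_receipt)
print("Past deadline, clock from confirmation:", overrun_from_confirmation)
```

Under either reading, the final revocation falls more than a day past the 5-day window.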
Root Cause Analysis
The root cause of the delayed certificate revocation lies in the complex and lengthy process that government entities must follow to revoke and replace certificates. Deploying a new SSL/TLS certificate for a government website requires approval and coordinated effort from multiple internal departments, which makes the time required to complete the process unpredictable; it generally takes several days or even longer.
Lessons Learned
What went well
We began to contact the customers to communicate about certificate revocation and replacement shortly after confirming the issue.
What didn't go well
We were not able to communicate effectively with the customers to meet the required revocation timeline while not causing disruptions to the normal website operations.
Where we got lucky
The number of affected certificates is small, and the scope of impact on subscribers is limited.
Action Items
| Action Item | Kind | Due Date |
|---|---|---|
| Revoke and replace the problematic certificates. | Mitigate | Completed |
Appendix
N/A
Details of affected certificates
The following certificates were not revoked within the required timeline:
https://crt.sh/?id=10493475350
https://crt.sh/?id=10493412807
https://crt.sh/?id=10477997256
https://crt.sh/?id=10478083391
https://crt.sh/?id=10478023477
https://crt.sh/?id=10469199584
https://crt.sh/?id=10478050726
https://crt.sh/?id=10410654215
https://crt.sh/?id=10419351537
https://crt.sh/?id=10494148757
https://crt.sh/?id=10469206693
https://crt.sh/?id=10493688664
https://crt.sh/?id=10385760399
Comment 2•1 year ago
Has full remediation of this issue taken place? Under the topic, "What didn't go well", I think there is a lesson to be learned that hasn't been expressed. You say, "We were not able to communicate effectively with the customers to meet the required revocation timeline while not causing disruptions to the normal website operations." That leads to the follow-up question, what are some things you can do, if any, to improve your ability to communicate quickly with your customers, especially those who will have trouble replacing certificates because their website operations will be disrupted?
(In reply to Ben Wilson from comment #2)
Has full remediation of this issue taken place? Under the topic, "What didn't go well", I think there is a lesson to be learned that hasn't been expressed. You say, "We were not able to communicate effectively with the customers to meet the required revocation timeline while not causing disruptions to the normal website operations." That leads to the follow-up question, what are some things you can do, if any, to improve your ability to communicate quickly with your customers, especially those who will have trouble replacing certificates because their website operations will be disrupted?
Hello Ben,
In response to the challenges encountered in communicating with our customers, we have optimized our communication strategy and adopted a customized communication approach. Given the diversity of our customers (such as government entities), we have had in-depth conversations with our customers, and assisted them in developing an internal approval process specifically for SSL/TLS certificate replacement based on their certificate replacement cycle and the existing internal approval process. Currently, the optimized internal approval process for certificate replacement has been updated for such customers. Additionally, we have extended this optimized certificates replacement approval process to other similar customers to further enhance the efficiency of certificate replacement and revocation.
Action Items
| Action Item | Kind | Due Date |
|---|---|---|
| Revoke and replace the problematic certificates. | Mitigate | Completed |
| Assist the customers to develop and optimize the internal approval process for SSL/TLS certificates replacement. | Prevent | Completed |
| Extend the optimized certificates replacement approval process to other similar customers. | Prevent | Completed |
Thanks.
Comment 7•1 year ago
Does anyone have any comments, questions, or suggestions? If not, then I'd suggest that this be closed sometime next week (May 27-31).
Comment 9•1 year ago
The remediation items here are "we will try even harder", and I think the BRs and WebPKI in general demand a response tied to an outcome, not an activity. It should not be sufficient to say "we will tell our customers to fix things" or even "we will help our customers fix things". If customer operational capability is to be a reason for violation of the BRs, then the mitigation here should result in those operational problems no longer being the case, and there being evidence for it.
What does GDCA commit to doing that will ensure that these customers and their policies don't lead to delayed revocation in the future? How will GDCA determine that their customers' replacement practices will no longer interfere with GDCA's ability to meet its commitments to root programs and the WebPKI ecosystem?
How will GDCA ensure that they do not issue certificates to subscribers with critical services, who have not also provided assurances that they can replace certificates within 24 hours or themselves (the subscriber) take responsibility for any service disruption if that does not occur?
If a service is essential to society and cannot operate successfully within the constraints of the BRs, then it should not be using WebPKI, and CAs should be ensuring that they do not issue WebPKI certificates to such services.
Comment 10•1 year ago
(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #9)
The remediation items here are "we will try even harder", and I think the BRs and WebPKI in general demand a response tied to an outcome, not an activity. It should not be sufficient to say "we will tell our customers to fix things" or even "we will help our customers fix things". If customer operational capability is to be a reason for violation of the BRs, then the mitigation here should result in those operational problems no longer being the case, and there being evidence for it.
What does GDCA commit to doing that will ensure that these customers and their policies don't lead to delayed revocation in the future? How will GDCA determine that their customers' replacement practices will no longer interfere with GDCA's ability to meet its commitments to root programs and the WebPKI ecosystem?
How will GDCA ensure that they do not issue certificates to subscribers with critical services, who have not also provided assurances that they can replace certificates within 24 hours or themselves (the subscriber) take responsibility for any service disruption if that does not occur?
If a service is essential to society and cannot operate successfully within the constraints of the BRs, then it should not be using WebPKI, and CAs should be ensuring that they do not issue WebPKI certificates to such services.
Hello Mike,
For a particular type of customers (such as government entities) whose certificate replacement approval process can be lengthy and complex, we have assisted them to analyze and redesign the various aspects in the approval process to reduce the redundant steps and improve the approval efficiency, and eventually shorten the time required for certificate replacement.
Additionally, to ensure that the customer's certificate replacement process does not interfere with our commitment to the WebPKI ecosystem, we will further increase the resources of our customer support team and provide 7x24 technical support during the certificate replacement process, so that any issues encountered by our customers can be quickly resolved. This should ensure that certificates can be revoked within the required timelines.
Comment 11•1 year ago
(In reply to capoc from comment #10)
For a particular type of customers (such as government entities) whose certificate replacement approval process can be lengthy and complex, we have assisted them to analyze and redesign the various aspects in the approval process to reduce the redundant steps and improve the approval efficiency, and eventually shorten the time required for certificate replacement.
I don’t think that this meets the requirements of the BRs.
9.6.3 is the BR section dealing with the subscriber warranties and acknowledgements, which I believe to be legally binding upon the Subscriber.
Part 8 reads
- Acknowledgment and Acceptance: An acknowledgment and acceptance that the CA is entitled to revoke the certificate immediately if the Applicant were to violate the terms of the Subscriber Agreement or Terms of Use or if revocation is required by the CA’s CP, CPS, or these Baseline Requirements.
Issuing a certificate to a subscriber who did not acknowledge and accept that immediate revocation may occur in the case of BR violation is misissuance. By my understanding of the BRs, and that of a well-informed anonymous expert who I consulted, you should not have issued replacement certificates if the subscriber did not accept that revocation can happen instantly at any time.
Did your subscribers accept and acknowledge that such revocation could occur? If so, why are you not holding them to their commitment?
You may also wish to amend section 1.4.2 of your CPS, to prohibit use in life-critical or similar contexts, but that’s sort of a separate issue.
Edit: I have also sent a question on this topic to the public CCADB list for clarification.
Comment 12•1 year ago
Hello,
We appreciate the questions and concerns you raised. We have been committed to adhering to the BRs and other relevant requirements, placing the security of the WebPKI ecosystem as our priority. We immediately decided to revoke all the affected certificates upon confirming the mis-issuance and began to contact our customers to work on the certificate revocation and replacement.
In the meantime, we will once again reiterate and emphasize the certificate revocation timelines as required by the BRs and our CPS to our customers. We plan to strengthen customer education efforts in future as part of our business procedures to enhance their understanding of the revocation timeline requirements, with the goal of jointly maintaining a more secure and reliable WebPKI ecosystem.
We are committed to continuously improving and optimizing our services to ensure high level of compliance and best practices.
Comment 13•1 year ago
Are there any further comments or questions?
Comment 14•1 year ago
Ben,
I propose that this entire thread lacks basic acknowledgement from GDCA of its responsibility to the WebPKI ecosystem and the fact that BR revocation rules are not optional for a CA. The original root cause analysis in comment 1 places the blame entirely on Subscribers and their reported inability to handle an unexpected revocation event. There is no acknowledgement of the fact that GDCA is deliberately choosing Subscriber convenience over the security of Relying Parties. Even after prompting by you, no action items address the actual failure that demanded the creation of this incident report.
Multiple attempts by Mike Shaver to explain why GDCA’s words and action are misaligned with both the spirit and the letter of the law for a public CA have fallen on deaf ears. A huge amount of similar discussion on other, contemporary bugs has gone unnoticed. Even the most recent comment from GDCA continues to suggest that somehow this event is in the Subscriber’s control and that talking to Subscribers about certificate agility is the remedy for this error and others like it:
In the meantime, we will once again reiterate and emphasize the certificate revocation timelines as required by the BRs and our CPS to our customers. We plan to strengthen customer education efforts in future as part of our business procedures to enhance their understanding of the revocation timeline requirements, with the goal of jointly maintaining a more secure and reliable WebPKI ecosystem.
I put it to you that, after more than two months, there is no reason to believe that this CA has changed for the better regarding its failure to revoke on time. The lack of acknowledgement of the CA’s responsibility or action items that stand to prevent this kind of problem in the future leaves me with no confidence that it will handle the next misissuance differently.
We at Sectigo are given to understand that one of the primary purposes of Bugzilla is for CAs to learn from mistakes and continually improve. A disappointingly large number of CAs are suffering from this same malady right now, with almost no progress in these CAs seeing the light. I propose that Bugzilla can play a meaningful role in establishing clarity for CAs on their and their Subscribers’ proper roles in mandatory revocations and reduce the willful failure to revoke on time as the epidemic problem it is today.
My suggestion is that this and all instances of the deliberate delay of mandatory revocations beyond their prescribed time periods be treated the same way: The failure to revoke on time incident should remain open until the CA
- offers an accurate and complete analysis of the problem
- convincingly shows a change in attitude toward deliberate late revocation
- proposes and delivers action items that credibly offer positive change.
I suggest that for any such incident the CA is expected to
- monitor the bug for comments or questions
- maintain weekly posting cadence
- demonstrate responsiveness and candor in its postings.
I propose that offending CAs should carry this responsibility until such time as they meet these three requirements.
Comment 15•1 year ago
Thanks, Tim. I will leave this bug open so that GDCA can provide a clear and strong statement committing to improve its treatment of revocations, including acknowledgement of the seriousness of delayed revocations and their impact on public trust and security.
Furthermore, GDCA must re-file an updated, amended, and improved Incident Report with all required elements, such as timelines, certificates for which revocation was delayed, subscribers that had delayed their revocation and the reasons why, root cause analysis, and a restated list of Action Items to address root causes and prevent future revocation delays. Action Items should include:
- technological improvements that describe any technological upgrades or changes to the infrastructure that will help in faster detection and response to incidents requiring revocation;
- detailed changes to policies and procedures to ensure timely revocation, including new guidelines, checklists, and approval processes; and
- monitoring and auditing to ensure compliance with such policies and procedures and to identify any lapses quickly.
GDCA's future progress reporting here in Bugzilla is also essential so that stakeholders can gauge GDCA's ongoing efforts and improvements.
Comment 16•1 year ago
Hi Ben!
Unfortunately I can't comment on https://bugzilla.mozilla.org/show_bug.cgi?id=1872738 since it's closed, but I think the same applies to that bug.
ARI integration as an action item was pushed back multiple times, and was not done by the time that incident was closed.
Do you think we can consider reopening that one until the action items are done and complete?
Maybe this can also be feedback for the bug handling as a whole, that incident action items need to be complete before an incident is closed?
Comment 17•1 year ago
Following is the updated incident report:
Incident Report
Delayed revocation of SSL/TLS certificates with Non-critical Basic Constraints
Summary
GDCA issued 20 SSL/TLS certificates between 15 September and 8 October 2023 in which the Basic Constraints extension was included but not marked as critical. As of 17:00 on 2 April 2024 (UTC+8), all of these certificates have either been revoked or expired. Of these, 13 were not revoked within 5 days after the issue was confirmed, which does not comply with section 4.9.1.1 of the Baseline Requirements.
Impact
13 certificates have not been revoked within 5 days after confirming the issue.
Timeline
All times are UTC+8.
2023-9-15:
-First problematic certificate issued.
2023-10-8:
-Last problematic certificate issued.
2024-3-26:
-23:54: Our CPR mechanism received a certificate problem report from a concerned third party, stating that we had mis-issued a number of SSL/TLS certificates with Basic Constraints not set as critical and that this should be treated as an incident.
2024-3-27:
-14:15: We confirmed this issue and published a preliminary incident report on Bugzilla.
-We began to contact the impacted customers to communicate about certificate revocation and replacement.
2024-4-2:
-17:00: We completed the revocation of all problematic certificates.
Root Cause Analysis
GDCA has always believed that CAs must strictly abide by the BRs. We immediately contacted the subscribers to proceed with the certificate revocation and replacement upon confirming the mis-issuance.
The reason for the delayed certificate revocation is that we only require each subscriber to provide contact information for a single person, rather than multiple contacts for emergency cases like this. As a result, some of the subscribers in this case could not be contacted in time and therefore did not have sufficient time to revoke and replace the mis-issued certificates. Additionally, the affected certificates primarily involve government entities, for which the process of revoking and replacing certificates is complex. Deploying a new SSL/TLS certificate for a government website requires approval and coordinated effort from multiple internal departments, which makes the time required to complete the process unpredictable; it generally takes several days or even longer.
Lessons Learned
What went well
We began to contact the customers to communicate about certificate revocation and replacement shortly after confirming the issue.
What didn't go well
We only require each subscriber to provide the contact information of a single person, not the multiple contacts for emergency cases like this, resulting in some of the subscribers in this case not being able to be contacted in time, and therefore they didn’t have sufficient time to revoke and replace the mis-issued certificates to meet the required revocation timeline while not causing disruptions to the normal website operations.
Where we got lucky
The number of affected certificates is small, and the scope of impact on subscribers is limited.
Action Items
In response to this delayed revocation incident, we have reviewed and analyzed our business processes, PKI system technology, customer service, and monitoring management and propose the following remediation steps:
| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| Establish a rapid certificate revocation response team to handle situations that require urgent certificate revocations. | Prevent | 05 June 2024 | Completed |
| Encourage the applicants to provide multiple backup contacts (emergency contacts) during the certificate application process to ensure timely communication. | Prevent | 18 June 2024 | In progress |
| Emphasize the revocation requirements as described in our Subscriber Agreement and our CPS to our customers to make sure they understand the possible situations that require rapid certificate replacement for security reasons, reiterate the revocation requirements as described in the BRs and our CPS to our major customers, ensuring the agreement and assistance from them in case revocation is required. | Prevent | 20 June 2024 | In progress |
| Formulate the GDCA Certificate Emergency Revocation Management Specification | Prevent | 20 June 2024 | In progress |
| Strengthen our team efforts to regularly monitor any updates to the BRs and establish a two-person cross-check process to cross check each configuration item in our current system. | Prevent | 28 June 2024 | In progress |
| Deploy pkilint in production environment. | Prevent | 10 July 2024 | In progress |
| Develop a certificate emergency revocation drill plan and conduct certificate revocation simulation drills for our major customers. | Prevent | 12 July 2024 | In progress |
| Upgrade the zlint to the latest version in our production environment, and incorporate pkilint to work alongside zlint for pre-issuance linting, certificate issuance will be permitted only when both linters return no anomaly values. | Prevent | 19 July 2024 | In progress |
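
The last action item describes a pre-issuance gate in which issuance proceeds only when both linters report nothing. A minimal sketch of that gating logic follows. It makes no claim about how zlint or pkilint are actually invoked: the two linter functions are hypothetical stand-ins, the finding label is made up, and the `may_issue` helper is an illustrative name rather than anything GDCA has described.

```python
from typing import Callable, Iterable, List

# Assumption: a "linter" is any callable that takes the encoded
# to-be-issued certificate and returns a list of finding strings.
Linter = Callable[[bytes], List[str]]


def may_issue(candidate: bytes, linters: Iterable[Linter]) -> bool:
    """Permit issuance only if every linter returns no findings."""
    findings = [f for lint in linters for f in lint(candidate)]
    return not findings


# Hypothetical stand-ins for real linter integrations.
def stub_linter_a(candidate: bytes) -> List[str]:
    # Flags a marker string in place of real certificate parsing.
    if b"BC-NONCRITICAL" in candidate:
        return ["basic constraints not critical (made-up label)"]
    return []


def stub_linter_b(candidate: bytes) -> List[str]:
    return []


clean_ok = may_issue(b"clean-candidate", [stub_linter_a, stub_linter_b])
flagged_ok = may_issue(b"BC-NONCRITICAL", [stub_linter_a, stub_linter_b])
print(clean_ok, flagged_ok)
```

Collecting findings from every linter before deciding, rather than stopping at the first failure, keeps the full set of findings available for the issuance log.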
Appendix
N/A
Details of affected certificates
The following certificates were not revoked within the required timeline:
https://crt.sh/?id=10493475350
https://crt.sh/?id=10493412807
https://crt.sh/?id=10477997256
https://crt.sh/?id=10478083391
https://crt.sh/?id=10478023477
https://crt.sh/?id=10469199584
https://crt.sh/?id=10478050726
https://crt.sh/?id=10410654215
https://crt.sh/?id=10419351537
https://crt.sh/?id=10494148757
https://crt.sh/?id=10469206693
https://crt.sh/?id=10493688664
https://crt.sh/?id=10385760399
Comment 18•1 year ago
I’m curious about hearing details on this:
Develop a certificate emergency revocation drill plan and conduct certificate revocation simulation drills for our major customers.
Beyond that, did GDCA just find out about the revocation requirements? From this report I can’t help but read it as GDCA just didn’t think that they’re ever going to need to revoke certs.
So I’m curious, before this incident, had there been any internal discussions on how your CA will handle revocations?
Also, if a subscriber didn’t respond to you, it seems like your assumption was that you can’t revoke the certificate until they respond to you.
Can you explain how that is compatible with the BRs? If it’s not, then did you know that strategy is not compatible with the BRs prior to this incident?
Comment 19•1 year ago
Mozilla's incident response policy for delayed revocation includes the following expectations:
- The decision and rationale for delaying revocation will be disclosed in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include detailed and substantiated explanations for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.
- Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable
This report does not contain detailed per-subscriber rationale, which should include details about the harm that would be caused by prompt revocation, and how ways of mitigating that harm were not available.
It also does not contain any description of how GDCA worked with their auditors and root programs to ensure that the risk analysis and plan were acceptable. Please provide that information as well.
Comment 20•1 year ago
(In reply to amir from comment #18)
I’m curious about hearing details on this:
Develop a certificate emergency revocation drill plan and conduct certificate revocation simulation drills for our major customers.
Our preliminary plan is to randomly select some subscribers from our key customer group to conduct a simulated certificate revocation exercise before 12 July 2024. The detailed process is as follows:
(1) Initiate the emergency response process (Day 1)
Simulating the situations where certificate revocation is required, our team evaluates the potential impact of certificate revocation to our business and subscribers, and then confirms the initiation of the emergency revocation process. The leader of the rapid response team organizes the relevant members (after-sales personnel and technical support engineers etc.) to have an emergency meeting to confirm the specific steps and responsibility allocation of the emergency revocation process and initiates the emergency response.
(2) Notify the subscribers and prepare for certificates replacement (Day 1)
- The after-sales personnel are responsible for contacting the subscribers within the specified time to convey the emergency situation and explain the necessity of certificate reissuance.
- The technical support engineers prepare the necessary technical support to ensure quick response to any questions from subscribers.
(3) Follow up on the internal approval process of the subscribers (Day 2 and Day 3)
- The after-sales personnel maintain close communication with the subscribers to understand the progress of the internal approval process.
- The after-sales personnel provide necessary information and documents to help subscribers expedite the approval process.
- The quick response team leader monitors the timeline of the approval process, ensuring that it proceeds according to plan and is completed by Day 3 after the problematic certificates are identified.
(4) Certificates replacement (Day 4)
- The after-sales personnel deliver the new certificate to the subscriber and follow up on the certificate replacement.
- Technical support engineers assist the subscriber in completing the certificate reconfiguration and ensure the website's certificate works correctly.
- The quick response team leader monitors the timeline of the certificate replacement process, ensuring it is completed by Day 4 after identifying any problematic certificates.
- The risk control specialist checks the progress on Day 4 and reviews the quick response team's timeline. If the replacement is not completed at this point, the risk control specialist will notify the quick response team to enforce mandatory certificate revocation by Day 5.
(5) Revocation of problematic certificates (Day 5)
- Perform certificate revocation, ensuring that the revocation status of the problematic certificate is “revoked”.
- Record the detailed system revocation logs, including the revocation time, certificate serial number and other relevant information.
- Notify the subscriber of the certificate revocation.
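The five-day schedule above can be sketched as a small decision function. This is an illustrative reading of the drill plan, not GDCA's tooling: the `DrillStatus` type, the `next_action` helper, and the action labels are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DrillStatus:
    day: int                        # drill day, counted from 1
    replacement_done: bool = False  # has the subscriber deployed the new cert?


def next_action(status: DrillStatus) -> str:
    """Map a drill day to the step described in the plan (hypothetical labels)."""
    if status.day < 1:
        raise ValueError("drill days are counted from 1")
    if status.day >= 5:
        # Day 5: revoke, whether or not the replacement is finished.
        return "revoke"
    if status.day == 4:
        # Day 4: risk control check; escalate if replacement is incomplete.
        if status.replacement_done:
            return "verify_replacement"
        return "escalate_for_mandatory_revocation"
    if status.day == 1:
        return "initiate_response_and_notify"
    # Days 2 and 3: follow up on the subscriber's internal approval.
    return "follow_up_approval"


for day in range(1, 6):
    print(day, next_action(DrillStatus(day=day)))
```

The key property is that the Day 5 branch does not depend on `replacement_done`: the schedule ends in revocation regardless of subscriber progress.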
After this incident, we assisted our key customers in optimizing their internal approval process (comment #10) and developed an emergency certificate revocation drill plan for them. We hope to test the effectiveness of the optimized emergency revocation process and internal approval process through the certificate revocation simulation drill.
Beyond that, did GDCA just find out about the revocation requirements? From this report I can’t help but read it as GDCA just didn’t think that they’re ever going to need to revoke certs.
So I’m curious, before this incident, had there been any internal discussions on how your CA will handle revocations?
No, we have always been aware of the revocation requirements as stipulated by the BR, and have been constantly monitoring the latest updates to the BRs. We have always believed that the security of the WebPKI ecosystem is more important. We started internal discussions on how to complete the revocation of these certificates immediately after we confirmed the mis-issuance, and began to contact the customers to communicate about certificate revocation and replacement.
Also, if a subscriber didn’t respond to you, it seems like your assumption was that you can’t revoke the certificate until they respond to you.
Can you explain how that is compatible with the BRs? If it’s not, then did you know that strategy is not compatible with the BRs prior to this incident?
GDCA has always been committed to complying with BRs, and we believe that certificates must be revoked if revocation is required by the BRs. To ensure compliance with the revocation requirements of the BRs, GDCA will now reiterate the revocation requirements as set out by the BRs and our CPS with our subscribers. Additionally, in the GDCA Certificate Emergency Revocation Management Specification we intend to formulate, we will add a step to execute mandatory revocation on the fifth day if we are not able to contact the subscriber for 4 consecutive days (with a contact frequency of no less than 3 times per day).
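The unreachable-subscriber rule described here (no contact for 4 consecutive days, with at least 3 attempts per day, then mandatory revocation on the fifth day) can be expressed as a simple check. This is a sketch of the rule as stated in this comment only; the function name and the log format are assumptions, and it does not model the case where a subscriber is reached but does not replace the certificate in time.

```python
from typing import List, Tuple

MIN_ATTEMPTS_PER_DAY = 3  # "no less than 3 times per day"
UNREACHABLE_DAYS = 4      # "4 consecutive days"


def mandatory_revocation_due(daily_log: List[Tuple[int, bool]]) -> bool:
    """Decide whether the Day 5 mandatory revocation step applies.

    daily_log holds one (attempts, reached) pair per day, Day 1 first.
    This format is a hypothetical choice for illustration.
    """
    if len(daily_log) < UNREACHABLE_DAYS:
        return False
    first_days = daily_log[:UNREACHABLE_DAYS]
    return all(
        attempts >= MIN_ATTEMPTS_PER_DAY and not reached
        for attempts, reached in first_days
    )


unreachable = [(3, False), (4, False), (3, False), (3, False)]
reached_on_day_two = [(3, False), (3, True), (3, False), (3, False)]
print(mandatory_revocation_due(unreachable))
print(mandatory_revocation_due(reached_on_day_two))
```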
We will learn a lesson from this incident and continuously improve our business processes, PKI system technologies, customer service, and monitoring management to prevent the recurrence of such issue.
Comment 21•1 year ago
(In reply to Mike Shaver (:shaver emeritus) from comment #19)
Mozilla's incident response policy for delayed revocation includes the following expectations:
- The decision and rationale for delaying revocation will be disclosed in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include detailed and substantiated explanations for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.
- Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable
This report does not contain detailed per-subscriber rationale, which should include details about the harm that would be caused by prompt revocation, and how ways of mitigating that harm were not available.
The subscribers with delayed certificates revocation are mainly government entities or government-managed social organizations that provide a wide range of basic online services for the public. These services include healthcare, education, social security, transportation, public resources, and other services that are essential to the normal functioning of society and the daily lives of the public. We contacted those subscribers to proceed with certificate revocation and replacement immediately upon confirming the mis-issuance. However, due to the special nature of these government entities, the process they must follow to revoke and replace certificates is complex: deploying a new SSL/TLS certificate for a government website requires approval and coordinated effort from multiple internal departments. This makes the time required to complete the process unpredictable; it generally takes several days or even longer. The continued provision of critical online services by these organizations would be severely affected if prompt revocation were implemented.
We fully understand the expectation of prompt revocation of affected SSL/TLS certificates from the WebPKI community. We are working closely with affected subscribers to assist them in optimizing their internal approval processes and improving the certificate revocation plan to ensure the revocation and replacement of certificates happen within the required time frame.
It also does not contain any description of how GDCA worked with their auditors and root programs to ensure that the risk analysis and plan were acceptable. Please provide that information as well.
During our WebTrust audit in April, we discussed the details of Bug 1888060 (certificates mis-issuance) and Bug 1889062 (delayed revocation) with our auditor. We have now completed the audit and submitted the audit report, which includes summaries of both incidents. The summary regarding the certificate revocation delay in the audit report is as follows:
GDCA had disclosed an incident (Bug 1889062) on Mozilla’s Bugzilla Platform on 2 April 2024. In the incident, As noted in Bug 1888060, GDCA had issued 20 SSL/TLS certificates with Non-critical Basic Constraints extension from 15 September to 8 October 2023, and as of 5:10PM, 2 April 2024 (UTC+8), all the certificates had either been revoked or expired. There were 13 certificates which had not been revoked within 5 days since receiving the Certificate Problem Report, leading to a violation of Baseline Requirements 4.9.1.1. After the revocations delayed, the cause analysis of the derived incident and the remediations conducted by GDCA have been illustrated in the process of public discussion. The matter on the public platform is still open for discussions and monitored for status update.
Regarding our communication with root programs: after publishing the certificate mis-issuance incident report (Bug 1888060), which is related to the certificate revocation delay incident, we separately sent the link to Bug 1888060 to all other root programs in which we participate to ask for their opinions, suggestions, and questions. Since Bug 1889062 is closely related to Bug 1888060, we did not separately share the link to Bug 1889062 with the other root programs in which we participate.
With regard to the risk analysis and our proposed remediation plan, we are open to feedback and welcome opinions and suggestions from root programs and other concerned parties on whether or not they are acceptable.
Comment 22•1 year ago
How did you include summaries of incidents in an audit report when the incidents in question have not resolved? I'm used to seeing incidents only appearing in an audit after they are considered fully resolved, and the auditor can fully evaluate the situation with complete context. That does not mean you should not be consulting your auditors, but more that it should be separate from any WebTrust audit itself.
Will these incidents now appear in two consecutive WebTrust audits?
Comment 23•1 year ago
(In reply to capoc from comment #21)
(In reply to Mike Shaver (:shaver emeritus) from comment #19)
Mozilla's incident response policy for delayed revocation includes the following expectations:
- The decision and rationale for delaying revocation will be disclosed in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include detailed and substantiated explanations for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.
This report does not contain detailed per-subscriber rationale, which should include details about the harm that would be caused by prompt revocation, and how ways of mitigating that harm were not available.
The subscribers with delayed certificates revocation are mainly government entities or government-managed social organizations that provide a wide range of basic online services for the public.
GDCA continues to violate the clear expectations of Mozilla's delayed revocation incident policy: despite repeated explicit requests, GDCA had not provided detailed and substantiated explanations for why the situation is exceptional, on a per-Subscriber basis.
Why does GDCA refuse to comply with this aspect of the Mozilla policy?
(In reply to Mike Shaver (:shaver emeritus) from comment #19)
- Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable
It also does not contain any description of how GDCA worked with their auditors and root programs to ensure that the risk analysis and plan were acceptable. Please provide that information as well.
During our WebTrust audit in April, we discussed the details of Bug 1888060 (certificates mis-issuance) and Bug 1889062 (delayed revocation) with our auditor. We have now completed the audit and submitted the audit report, which includes summaries of both incidents. The summary regarding the certificate revocation delay in the audit report is as follows:
If your WebTrust audit was in April, then it couldn't have been a consultation about the risk analysis of delayed revocation, because you'd already decided to delay revocation starting in March.
I think it's pretty clear that the incident response policy's requirement for working with root programs and auditors requires them to be involved before the 5-day window has expired, so that the CA can take appropriate action if the conclusion is that the risk analysis is not correct, or that the remediation plan is not sufficient. Ben: could you provide your position on this matter? I'll try to remember to file a bug to update the MRSP language to be clearer here if you feel it appropriate.
After the revocations delayed, the cause analysis of the derived incident and the remediations conducted by GDCA have been illustrated in the process of public discussion.
The "cause analysis" has not been illustrated here to the satisfaction of Mozilla's policy, as I have said above. Did your auditors agree that you had met the Mozilla incident policy for delayed revocation, based on the information provided here at the time?
Comment 24•1 year ago
(In reply to Mike Shaver (:shaver emeritus) from comment #23)
(In reply to capoc from comment #21)
(In reply to Mike Shaver (:shaver emeritus) from comment #19)
Mozilla's incident response policy for delayed revocation includes the following expectations:
(In reply to Mike Shaver (:shaver emeritus) from comment #19)
- Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable
It also does not contain any description of how GDCA worked with their auditors and root programs to ensure that the risk analysis and plan were acceptable. Please provide that information as well.
During our WebTrust audit in April, we discussed the details of Bug 1888060 (certificates mis-issuance) and Bug 1889062 (delayed revocation) with our auditor. We have now completed the audit and submitted the audit report, which includes summaries of both incidents. The summary regarding the certificate revocation delay in the audit report is as follows:
If your WebTrust audit was in April, then it couldn't have been a consultation about the risk analysis of delayed revocation, because you'd already decided to delay revocation starting in March.
I think it's pretty clear that the incident response policy's requirement for working with root programs and auditors requires them to be involved before the 5-day window has expired, so that the CA can take appropriate action if the conclusion is that the risk analysis is not correct, or that the remediation plan is not sufficient. Ben: could you provide your position on this matter? I'll try to remember to file a bug to update the MRSP language to be clearer here if you feel it appropriate.
I believe we should add a comment to https://github.com/mozilla/pkipolicy/issues/276, or we should open a new issue, in order to clarify the meaning of "Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable", which is currently found here: https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation, if that is what you are referring to.
Still, to try and answer the question, I believe there are three periods of time to consider - before misissuance, after discovery of misissuance, and after the 5-day period in which to revoke. Before mis-issuance, CAs should be working proactively on risk and remediation plans. During the 5-day period, CAs should be filing a bug and explaining their plans for remediation, which, if time allows, would require involvement of auditors, supervisory bodies, and root programs. Finally, it appears that remediation plans may need to be fine-tuned after the 5 days, which I think is primarily meant by "to ensure your analysis of the risk and plan of remediation is acceptable". During remediation efforts, as described in a bug, the CA should be engaging with auditors, supervisory bodies, and root programs to "ensure" that the risk analysis and remediation plan are acceptable to those bodies.
| Assignee | ||
Comment 25•1 year ago
Following is a status update of the proposed action items:
| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| Establish a rapid certificate revocation response team to handle situations that require urgent certificate revocations. | Prevent | 05 June 2024 | Completed |
| Encourage the applicants to provide multiple backup contacts (emergency contacts) during the certificate application process to ensure timely communication. | Prevent | 18 June 2024 | Completed |
| Emphasize the revocation requirements described in our Subscriber Agreement and our CPS to our customers so that they understand the situations that may require rapid certificate replacement for security reasons. Reiterate the revocation requirements described in the BRs and our CPS to our major customers to secure their agreement and assistance when revocation is required. | Prevent | 20 June 2024 | Completed |
| Formulate the GDCA Certificate Emergency Revocation Management Specification | Prevent | 20 June 2024 | Completed |
| Strengthen our team efforts to regularly monitor any updates to the BRs and establish a two-person cross-check process to cross check each configuration item in our current system. | Prevent | 28 June 2024 | Completed |
| Deploy pkilint in production environment. | Prevent | 10 July 2024 | In progress |
| Develop a certificate emergency revocation drill plan and conduct certificate revocation simulation drills for our major customers. | Prevent | 12 July 2024 | In progress |
| Upgrade zlint to the latest version in our production environment and incorporate pkilint to work alongside zlint for pre-issuance linting. Certificate issuance will be permitted only when both linters return no anomaly values. | Prevent | 19 July 2024 | In progress |
| Assignee | ||
Comment 26•1 year ago
(In reply to Wayne from comment #22)
How did you include summaries of incidents in an audit report when the incidents in question have not resolved? I'm used to seeing incidents only appearing in an audit after they are considered fully resolved, and the auditor can fully evaluate the situation with complete context. That does not mean you should not be consulting your auditors, but more that it should be separate from any WebTrust audit itself.
Although this incident was not submitted within our audit period (1 March 2023 - 29 February 2024), it has been mentioned in the audit report. As it is still under public discussion, the audit report does not yet provide a conclusive description.
Will these incidents now appear in two consecutive WebTrust audits?
Yes, they will appear in our next audit report as well.
| Assignee | ||
Comment 27•1 year ago
(In reply to Mike Shaver (:shaver emeritus) from comment #23)
GDCA continues to violate the clear expectations of Mozilla's delayed revocation incident policy: despite repeated explicit requests, GDCA had not provided detailed and substantiated explanations for why the situation is exceptional, on a per-Subscriber basis.
Why does GDCA refuse to comply with this aspect of the Mozilla policy?
The subscribers with delayed certificates revocation are mainly government entities or government-managed social organizations that provide a wide range of basic online services for the public. Please check the attached per-subscriber rationale for the revocation delay.
If your WebTrust audit was in April, then it couldn't have been a consultation about the risk analysis of delayed revocation, because you'd already decided to delay revocation starting in March.
I think it's pretty clear that the incident response policy's requirement for working with root programs and auditors requires them to be involved before the 5-day window has expired, so that the CA can take appropriate action if the conclusion is that the risk analysis is not correct, or that the remediation plan is not sufficient. Ben: could you provide your position on this matter? I'll try to remember to file a bug to update the MRSP language to be clearer here if you feel it appropriate.
After the revocations delayed, the cause analysis of the derived incident and the remediations conducted by GDCA have been illustrated in the process of public discussion.
The "cause analysis" has not been illustrated here to the satisfaction of Mozilla's policy, as I have said above. Did your auditors agree that you had met the Mozilla incident policy for delayed revocation, based on the information provided here at the time?
We did not consult our auditor about the risk analysis in relation to the revocation delay before the 5-day window expired. We began communicating about this incident with our auditor during the on-site audit in mid-April.
| Assignee | ||
Comment 28•1 year ago
| Assignee | ||
Comment 29•1 year ago
Following is a status update of the proposed action items:
| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| Establish a rapid certificate revocation response team to handle situations that require urgent certificate revocations. | Prevent | 05 June 2024 | Completed |
| Encourage the applicants to provide multiple backup contacts (emergency contacts) during the certificate application process to ensure timely communication. | Prevent | 18 June 2024 | Completed |
| Emphasize the revocation requirements described in our Subscriber Agreement and our CPS to our customers so that they understand the situations that may require rapid certificate replacement for security reasons. Reiterate the revocation requirements described in the BRs and our CPS to our major customers to secure their agreement and assistance when revocation is required. | Prevent | 20 June 2024 | Completed |
| Formulate the GDCA Certificate Emergency Revocation Management Specification | Prevent | 20 June 2024 | Completed |
| Strengthen our team efforts to regularly monitor any updates to the BRs and establish a two-person cross-check process to cross check each configuration item in our current system. | Prevent | 28 June 2024 | Completed |
| Deploy pkilint in production environment. | Prevent | 10 July 2024 | Completed |
| Develop a certificate emergency revocation drill plan and conduct certificate revocation simulation drills for our major customers. | Prevent | 12 July 2024 | Completed |
| Upgrade zlint to the latest version in our production environment and incorporate pkilint to work alongside zlint for pre-issuance linting. Certificate issuance will be permitted only when both linters return no anomaly values. | Prevent | 19 July 2024 | Completed |
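
The last action item in the table above describes a gate in which issuance proceeds only when both linters report nothing. A minimal sketch of that gating logic follows. It assumes each linter can be wrapped as a function that takes the to-be-signed certificate bytes and returns a list of findings; the wrappers `fake_zlint` and `fake_pkilint` and the finding text are hypothetical placeholders and do not reflect the real zlint or pkilint interfaces.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: a linter is any callable that takes DER bytes and returns a list
# of finding strings, where an empty list means "no anomalies".
Linter = Callable[[bytes], List[str]]


def issuance_permitted(tbs_der: bytes, linters: Iterable[Linter]) -> Tuple[bool, List[str]]:
    """Run every linter; permit issuance only if none of them reports a finding."""
    findings: List[str] = []
    for lint in linters:
        findings.extend(lint(tbs_der))
    return (len(findings) == 0, findings)


# Hypothetical stand-ins; a real deployment would wrap zlint and pkilint here.
def fake_zlint(der: bytes) -> List[str]:
    return []


def fake_pkilint(der: bytes) -> List[str]:
    # Placeholder rule used only to exercise the gate.
    return ["basicConstraints not marked critical"] if der == b"bad" else []
```

Running all linters before deciding, rather than stopping at the first finding, keeps the full list of findings available for the issuance log.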
| Assignee | ||
Comment 30•1 year ago
We are monitoring this bug for further comments or questions.
| Assignee | ||
Comment 31•1 year ago
We are monitoring this bug for further comments or questions.
| Assignee | ||
Comment 32•1 year ago
We are monitoring this bug for further comments or questions.
| Assignee | ||
Comment 33•1 year ago
We are monitoring this bug for further comments or questions.
| Assignee | ||
Comment 34•1 year ago
We are monitoring this bug for further comments or questions.
Comment 35•1 year ago
We continue work on incident-reporting and compliance requirements aimed at reducing delayed revocation, so this bug will remain open until at least February 1, 2025. Meanwhile, CAs should review https://github.com/mozilla/www.ccadb.org/pull/186.
Comment 36•9 months ago
Before closing this incident, GDCA should repeat its commitment to revoke TLS certificates timely in accordance with section 4.9.1 of the TLS Baseline Requirements.
Mozilla acknowledges that some of GDCA’s subscribers operate under complex regulatory or bureaucratic constraints. Still, GDCA will need to provide additional Action Items aimed at handling government-managed subscribers and ensuring that no external policies prevent timely revocation. Examples include: requiring government entities to provide written confirmation that they can comply with revocation timelines before issuance; ensuring that they have plans for replacing certificates within 24 hours of a misissuance or security incident; and streamlined approval processes so that TLS certificates can be replaced without problematic bureaucratic approval chains.
Finally, we will need a completed Closure Summary.
| Assignee | ||
Comment 37•9 months ago
GDCA commits to strictly complying with section 4.9.1 of the TLS Baseline Requirements to revoke TLS certificates in a timely manner. We will provide relevant Action Items later to support the implementation of this commitment.
| Assignee | ||
Comment 39•8 months ago
GDCA commits to strictly complying with section 4.9.1 of the TLS Baseline Requirements to revoke TLS certificates in a timely manner. In addition to the Action Items outlined last year, we propose the following Action Item to ensure timely revocation remains unaffected by external factors.
| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| Revise the Subscriber Agreement to explicitly state that GDCA will revoke certificates according to Section 4.9.1 of the Baseline Requirements or our CPS, and will require the subscribers to provide written confirmation. | Prevent | 26 March 2025 | In progress |
| Assignee | ||
Comment 40•8 months ago
Following is a status update of the proposed action item:
| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| Revise the Subscriber Agreement to explicitly state that GDCA will revoke certificates according to Section 4.9.1 of the Baseline Requirements or our CPS, and will require the subscribers to provide written confirmation. | Prevent | 26 March 2025 | Completed |
| Assignee | ||
Comment 41•8 months ago
Incident Closure Summary
Incident description:
GDCA issued 20 SSL/TLS certificates from 15 September to 8 October 2023 with the Basic Constraints extension included but not set as critical. Of these, 13 certificates were not revoked within 5 days after confirming the issue, which does not comply with section 4.9.1.1 of the Baseline Requirements.
Incident Root Cause(s):
We only required each subscriber to provide contact information for a single person. As a result, some of the subscribers in this case could not be contacted in time and did not have sufficient time to revoke and replace the mis-issued certificates. Additionally, the affected certificates primarily involve government entities, for which the process of revoking and replacing certificates is complex: deploying a new SSL/TLS certificate for a government website requires approval and coordinated effort from multiple internal departments. This makes the time required to complete the process unpredictable; it generally takes several days or even longer.
Remediation description:
(1) Formulated the GDCA Certificate Emergency Revocation Management Specification and established a rapid certificate revocation response team to handle situations that require urgent certificate revocations.
(2) Revised the Subscriber Agreement to explicitly state that GDCA will revoke certificates according to Section 4.9.1 of the Baseline Requirements or our CPS, and will require the subscribers to provide written confirmation.
(3) Developed a certificate emergency revocation drill plan and conducted certificate revocation simulation drills for our major customers.
(4) Encouraged the applicants to provide multiple emergency contacts during the certificate application process to ensure timely communication.
(5) Deployed the latest version of zlint in the production environment, and also deployed pkilint.
Commitment summary:
GDCA commits to strictly complying with section 4.9.1 of the TLS Baseline Requirements to revoke TLS certificates in a timely manner. We will continuously improve and optimize our services to maintain a more secure and reliable WebPKI ecosystem.
All Action Items disclosed in this report have been completed as described, and we request its closure.
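For readers following the 5-day window mentioned in the incident description, the arithmetic can be illustrated as below. The sketch assumes the clock starts when the Certificate Problem Report was received (2024-03-26 23:54 UTC+8, per the timeline in this bug); the thread also refers to "confirming the issue" as the starting point, so that choice is an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))   # all times in this bug are given in UTC+8
REVOCATION_WINDOW = timedelta(days=5)  # the 5-day limit cited from BR section 4.9.1.1


def revocation_deadline(clock_start: datetime) -> datetime:
    """Latest revocation time that still falls inside the 5-day window."""
    return clock_start + REVOCATION_WINDOW


def is_delayed(clock_start: datetime, revoked_at: datetime) -> bool:
    """True if revocation happened after the 5-day window closed."""
    return revoked_at > revocation_deadline(clock_start)


# Assumption: the clock starts at receipt of the Certificate Problem Report.
cpr_received = datetime(2024, 3, 26, 23, 54, tzinfo=UTC8)
deadline = revocation_deadline(cpr_received)  # 2024-03-31 23:54 UTC+8
```

Using timezone-aware datetimes avoids off-by-hours errors when revocation timestamps are recorded in UTC while the report timeline is in UTC+8.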
Comment 42•8 months ago
I intend to close this on Wed. 2-Apr-2025, unless there are questions or issues to discuss.
Thanks,
Ben