Closed Bug 1916478 Opened 1 year ago Closed 11 months ago

eMudhra emSign PKI Services: Delayed Revocation of SSL/TLS Certificates

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: naveen.ml, Assigned: naveen.ml)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Incident Report

Summary

An external researcher notified emSign CA about a potential compromise involving four SSL certificates belonging to a customer. Certificates with compromised keys must be revoked within 24 hours, but revocation was delayed by an internal process failure and, to a lesser extent, by the affected customer's lack of response. Once responsibility for the incident was routed to the correct team and contact with the customer was established, the certificates were revoked, resolving the issue. This incident underscores the importance of training and re-training support staff to triage incoming support requests appropriately, and of prompt customer communication, so that security concerns are addressed and mitigated in time.

Impact

The delay in revoking the compromised SSL certificates posed a potential risk, as the certificates remained active beyond the intended revocation period. However, there were no reported incidents of misuse during this period, minimizing the potential impact.

Timeline

All times are IST.

2024-08-31 23:19 - emSign PKI received a notification in the general support queue from an external researcher regarding the potential compromise of four certificates. The staff monitoring the queue processed it under the general standard operating procedure and failed to route it to the appropriate team in a timely manner.

2024-09-01 10:30 - The emSign PKI team became aware of the issue and began investigating the accessibility of the private key on their website. The earlier mis-categorization of the support ticket introduced a delay of approximately 11 hours in its processing.

2024-09-01 11:15 - The emSign PKI team confirmed that the private key had been compromised and raised an internal ticket with the PKI support team to initiate contact with the affected customer.

2024-09-01 12:45 - The SSL support team attempted to contact the customer using the registered contact information but did not receive any response to calls or emails.

2024-09-01 16:30 - The SSL support team made another attempt to contact the customer, but again, there was no response.

2024-09-02 09:15 - The SSL support team confirmed to emSign PKI that the customer had not responded.

2024-09-02 09:40 - The emSign PKI team acknowledged the situation and raised an internal incident (Incident No. EMINCPKI0019) with the compliance group, noting the delay in revocation due to the customer's lack of response.

2024-09-02 10:10 - The SSL support team successfully contacted the customer, informed them of the private key compromise, and notified the emSign PKI team to revoke the four affected SSL certificates.

2024-09-02 10:24 - The emSign PKI team revoked the four SSL certificates compromised due to the private key exposure.
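
The arithmetic implied by this timeline can be checked with a short Python sketch. It uses only the timestamps listed above (all IST) and measures the 24-hour window from the original report; the variable names are illustrative and not taken from any emSign system.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"

# Timestamps copied from the timeline (all IST, so no time zone conversion is needed).
reported = datetime.strptime("2024-08-31 23:19", FMT)   # report reaches the general support queue
pki_aware = datetime.strptime("2024-09-01 10:30", FMT)  # emSign PKI team becomes aware
revoked = datetime.strptime("2024-09-02 10:24", FMT)    # certificates revoked

# Assumption: the 24-hour window starts at the original report,
# not at the time the case reached the PKI team.
deadline = reported + timedelta(hours=24)

routing_delay = pki_aware - reported
total_elapsed = revoked - reported
overdue = revoked - deadline

print("deadline:", deadline.strftime(FMT))
print("routing delay:", routing_delay)
print("report to revocation:", total_elapsed)
print("past deadline by:", overdue)
```

Measured this way, the routing delay is 11 hours 11 minutes, and revocation took place 35 hours 5 minutes after the report, 11 hours 5 minutes past the 24-hour mark.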

Root Cause Analysis

The delay in revocation occurred because the support staff monitoring the general queue followed the general standard operating procedure and failed to route the information to the appropriate team in a timely manner. NOTE: there is a separate, dedicated queue for revocation requests, but this request came into the general support queue. In addition, the affected customer was unresponsive to multiple attempts to inform them about the compromise of their SSL private keys (although this is somewhat expected over a weekend). The emSign PKI team should have set its revocation cutoff time based on the original communication timestamp rather than on when the case was eventually routed to it. This failure to follow due process caused revocation to exceed the required timeframe.

Lessons Learned

This incident highlights a critical need to train and re-train support staff to recognize and triage revocation requests appropriately, even if they are submitted via an alternative channel. The team responsible for revocation (in the case of key compromise) should measure the deadline from the timestamp on the original request, not from when the request reached its queue. Additionally, this incident highlights the benefit to the industry of customers adopting more robust, automated processes for managing certificate revocations, such as ACME (Automated Certificate Management Environment), which could allow for immediate replacement of compromised certificates without requiring manual intervention.

What went well

The external researcher quickly identified and reported the compromised certificates, giving emSign CA the opportunity to initiate the revocation process as soon as possible. Internal teams acted promptly once the support request was routed appropriately.

What didn't go well

The revocation process was delayed due to the failure by support staff to recognize a revocation request in the general queue and the subsequent mis-categorization of the support ticket. The inability to reach the affected customer further contributed to the delay in the process unnecessarily, when the revocation could have proceeded regardless. These elements prevented the timely deactivation of the compromised certificates, leading to a potential security vulnerability.

Where we got lucky

Despite the delay, there were no reported incidents of misuse of the compromised certificates. The prompt action taken by the external researcher allowed the vulnerable certificates to be replaced.

Appendix

Details of affected certificates

The certificates whose private keys were exposed are listed below:

https://crt.sh/?id=13600202644
https://crt.sh/?id=14318127190
https://crt.sh/?id=14329256062
https://crt.sh/?id=14330892679

Next Steps:

Moving forward, the focus will be on training support staff to recognize the importance of certain types of requests that require expedited processing, irrespective of which channel the request is received through. Periodic validation of support staff competency will be reinforced.

Outside of addressing internal process failures, additional actions such as encouraging customers to implement automated certificate management solutions like ACME, which allow for immediate certificate replacement in the event of a compromise, will be a focus. Additionally, we will continue to educate customers on the importance of responsiveness during security incidents to ensure swift and effective mitigation actions.

Based on Incident Reporting Template v. 2.0

Assignee: nobody → naveen.ml
Blocks: 1911183
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [leaf-revocation-delay]
Whiteboard: [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31

Thank you for this report. We have a requested update and a few questions.

Update: Please provide a list of Action Items as detailed on the CCADB Incident Reports page. We can interpret your “Next Steps” as actions, but it’s not clear what type of action is being performed and when they are expected to be completed (which is where the CCADB Incident Reporting format helps.)

Additionally, it's not clear how you are “addressing internal process failures” outside of retraining, which is generally not sufficient for remediation (recent example [1] and past examples [2] and [3]).

Regarding questions:

  1. Can you help us understand the difference in role and responsibilities between the four different teams noted in the Timeline Section (i.e., emSign PKI team, PKI support team, SSL support team, compliance group)?
  2. In the RCA Section it’s stated: “The delay in revocation occurred because the support staff monitoring the general queue has followed the general standard operation procedure, and failed to route the information in a timely manner to the appropriate team.” Does this mean the SOP included a process to route the information correctly and that process was not followed OR that the SOP did not include a process for proper routing? Additionally, how does “routing” in this statement relate to the “mis-categorization” stated in the What didn’t go well Section?
  3. What other solutions can you implement in response to lack of customer response (something outside of your control) to position yourself from repeating this incident in the future?

Overall, we would encourage a more robust Root Cause Analysis be performed and measurable Action Items created that address the root cause(s) identified during that analysis.

Summary: emSign PKI Services: Delayed Revocation of SSL/TLS Certificates → eMudhra emSign PKI Services: Delayed Revocation of SSL/TLS Certificates

emSign's CPS (https://repository.emsign.com/cps/CP-CPS-v1.16.pdf) has the two following statements:

Section 1.5.2. Contact Person
Certificate Problem Reporting
Attn: Revocation Support
Email: problem-reporting@emsign.com

Section 4.9.12. Special Requirements in Relation to Key Compromise
The reports shall be sent by email to the contact as provided in Section 1.5.2 of the CP/CPS (Certificate Problem Reporting section).

Question for emSign: Was the communication received at 2024-08-31 23:19 sent to the proper Certificate Problem Reporting address? Given that the timeline says it was in the "general support queue" I assume it was not, but I'd like to make sure.

Question for the community: If the message was not sent to the proper Certificate Problem Reporting address, does the 24-hour clock start at that time anyway?

(In reply to Chris Clements from comment #1)

Thank you for this report. We have a requested update and a few questions.

Update: Please provide a list of Action Items as detailed on the CCADB Incident Reports page. We can interpret your “Next Steps” as actions, but it’s not clear what type of action is being performed and when they are expected to be completed (which is where the CCADB Incident Reporting format helps.)
We have formalized our Action Items in alignment with the CCADB Incident Reporting format. Below are the key actions being taken, along with their expected completion dates:

  1. Implement Automated Routing of Revocation Requests:
    Develop and implement automated routing for all revocation-related requests, regardless of the queue they arrive in, to ensure timely escalation to the appropriate team.
    o Completion Date: 2024-10-31
  2. Enhance SOPs for Support Staff:
    Update standard operating procedures to include specific routing protocols for revocation requests, ensuring that mis-categorization is prevented in the future. The updated SOPs will clearly define how to handle requests from all channels and escalate them to the appropriate team without delay.
    o Completion Date: 2024-09-30
  3. Training and Validation:
    Provide additional training to support staff to identify critical security-related requests and validate their routing based on the original timestamp to adhere to revocation timelines. Training sessions will also include role-specific scenarios to reinforce the importance of correct categorization and escalation.
    o Completion Date: 2024-10-15
  4. Adopt ACME for Customer-Side Revocation Management:
    Encourage and guide customers to implement ACME (Automated Certificate Management Environment) to enable immediate certificate replacement in the case of key compromise. This will involve webinars, detailed guides, and individualized support to assist customers in adopting ACME.
    o Status: Ongoing.

Additionally, it's not clear how you are “addressing internal process failures” outside of retraining, which is generally not sufficient for remediation (recent example [1] and past examples [2] and [3]).

Regarding questions:

  1. Can you help us understand the difference in role and responsibilities between the four different teams noted in the Timeline Section (i.e., emSign PKI team, PKI support team, SSL support team, compliance group)?

o emSign PKI Team: Responsible for overseeing all certificate-related operations, including investigation, incident response, and maintaining adherence to compliance standards. This team also acts as the point of escalation for security-critical incidents.
o PKI Support Team: Manages technical support requests related to PKI systems and provides technical expertise to resolve issues related to certificate management. This team is responsible for executing actions like revocation based on findings and recommendations from the emSign PKI Team.
o SSL Support Team: Handles customer support specifically for SSL certificates, including reaching out to customers in cases of suspected compromise. The SSL Support Team acts as the first line of contact for certificate-related inquiries and supports the customer-side interaction during incidents.
o Compliance Group: Ensures that all processes and actions are in line with regulatory requirements. They oversee the compliance aspects of incidents and ensure that incidents are handled within the stipulated guidelines to avoid any regulatory violations.

  2. In the RCA Section it’s stated: “The delay in revocation occurred because the support staff monitoring the general queue has followed the general standard operation procedure, and failed to route the information in a timely manner to the appropriate team.” Does this mean the SOP included a process to route the information correctly and that process was not followed OR that the SOP did not include a process for proper routing? Additionally, how does “routing” in this statement relate to the “mis-categorization” stated in the What didn’t go well Section?

The revocation request was received from the researcher through the generic support email ID instead of being directed to the dedicated Certificate Problem Reporting or revocation email ID, as outlined in our CP/CPS. The Standard Operating Procedure (SOP) at the time did not explicitly define how to handle revocation requests that were submitted through the general support queue rather than the dedicated revocation queue. This lack of clarity in the SOP led to the mis-categorization of the request and delayed escalation to the appropriate team. The term "mis-categorization" refers specifically to the incorrect assignment of the support ticket, which resulted in it being processed as a general inquiry rather than a high-priority revocation request. The routing issue in the SOP refers to the absence of a well-defined escalation path for such requests when submitted through an alternate channel.

  3. What other solutions can you implement in response to lack of customer response (something outside of your control) to position yourself from repeating this incident in the future?

To mitigate the impact of unresponsive customers in the future, we are implementing predefined escalation timelines for revocation in cases of private key compromise. This means that if we do not receive a response from the customer within a set period, we will proceed with the revocation based on the initial notification timestamp from the researcher or the trusted source reporting the compromise. Additionally, we will create an internal policy to expedite the revocation process if there is credible evidence of a compromise, regardless of customer confirmation. Corresponding updates to subscriber agreements will inform subscribers that eMudhra will take these steps on behalf of customers when required. These measures will ensure that compromised certificates are revoked promptly to protect the broader ecosystem.

Overall, we would encourage a more robust Root Cause Analysis be performed and measurable Action Items created that address the root cause(s) identified during that analysis.

We recognize the importance of a comprehensive Root Cause Analysis (RCA) and the need for measurable actions to address the root causes effectively. In addition to retraining staff, we have identified specific systemic and procedural improvements:

  1. Systemic Root Causes:
    o Ineffective Queue Management: The current support system allowed revocation requests to be mis-categorized due to limited differentiation between general and critical support inquiries. To address this, we are implementing automated categorization tools that use keyword recognition to flag and route high-priority revocation requests directly to the appropriate queue.
    o Lack of Clear SOP for Alternate Channel Requests: The SOPs did not adequately cover how to handle requests submitted via alternate channels. We will revise these SOPs and conduct regular audits to ensure compliance with the updated procedures.
  2. Measurable Action Items:
    o Automated Categorization Implementation: By implementing automation tools to categorize and prioritize revocation requests, we aim to minimize manual errors in routing. This change will be tested and rolled out by 2024-10-31, and we will conduct quarterly reviews to measure the efficiency and accuracy of automated routing.
    o SOP Revision and Training Completion: Updated SOPs will be rolled out by 2024-09-30, with mandatory training for all support staff to be completed by 2024-10-15. The training will include specific scenarios on alternate channel handling and the importance of maintaining strict revocation timelines.
    o Customer Communication Enhancement: We will initiate a communication program starting on 2024-09-20 to educate customers about adopting automated tools like ACME. We will track customer adoption rates and issue periodic reports to measure the success of this initiative.
    By addressing both the systemic gaps and procedural weaknesses identified in the Root Cause Analysis, we aim to create a more robust and resilient revocation process that ensures compliance with industry standards and timely response to security incidents.

(In reply to Naveen Kumar ML from comment #3)

The revocation request was received from the researcher through the generic support email ID instead of being directed to the dedicated Certificate Problem Reporting or revocation email ID, as outlined in our CP/CPS.

I believe "the researcher" refers to me, so I would like to comment on that. I directed the revocation request to the mail address listed in CCADB as the problem reporting contact.
If emSign operates a "dedicated" contact for certificate problem reporting, that should probably be the one listed in CCADB.

(In reply to Hanno Boeck from comment #4)

Thank you for bringing this to our attention. Upon review, we confirm that the current contact details in the CCADB should be updated to accurately reflect the dedicated Certificate Problem Reporting email address, as outlined in our CP/CPS. We had earlier updated the CP/CPS in line with guidelines to include a dedicated problem reporting address, but the one in CCADB still reflects the general support email ID. We apologize for the oversight and will now update the CCADB entry to reflect the correct problem reporting email address. In any case, we felt there was an obligation to report the incident given that we could have acted sooner.

Given that customers may send such sensitive requests to both queues, we have updated our SOPs and training so that agents managing both queues can react to similar revocation requests in the future and handle them within the time stipulated by the guidelines. We have added keyword-based triggers to allow for immediate redirection of support requests relating to certificate revocation to the dedicated queue. Our dedicated revocation channels, as specified in our CP/CPS, are monitored 24x7, as required by the CA/B Forum Guidelines.
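
A keyword-based trigger of the kind described above could look like the following Python sketch. The pattern list, the queue names, and the `route_ticket` function are all hypothetical; they illustrate the idea and do not describe eMudhra's actual tooling.

```python
import re

# Hypothetical keyword patterns; a real deployment would tune these against actual tickets.
REVOCATION_PATTERNS = [
    r"\brevo(ke[ds]?|king|cation)\b",
    r"\bcompromised?\b",
    r"\bprivate\s+key\b",
]
_REVOCATION_RE = re.compile("|".join(REVOCATION_PATTERNS), re.IGNORECASE)


def route_ticket(subject: str, body: str, arrival_queue: str) -> str:
    """Return the queue a ticket should be handled in.

    A ticket whose text matches a revocation keyword goes to the dedicated
    revocation queue, regardless of the queue it arrived in. Anything else
    stays in its arrival queue.
    """
    text = f"{subject}\n{body}"
    if _REVOCATION_RE.search(text):
        return "revocation"
    return arrival_queue


print(route_ticket("Private key exposed",
                   "The private key for these certificates is publicly accessible.",
                   "general"))
print(route_ticket("Invoice question",
                   "Please resend last month's invoice.",
                   "general"))
```

Keyword matching is deliberately over-inclusive here: a false positive costs a quick look by the revocation team, while a false negative can cost the 24-hour window.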

We want to reassure the community that we are committed to maintaining a high level of attention to detail and continuously refining our processes.

Thank you for all of the updates in Comment 3.

A few follow-up questions:

To mitigate the impact of unresponsive customers in the future, we are implementing predefined escalation timelines for revocation in cases of private key compromise. This means that if we do not receive a response from the customer within a set period, we will proceed with the revocation based on the initial notification timestamp from the researcher or the trusted source reporting the compromise.

  1. Is there an escalation process for other revocation reasonCodes?
  2. Can you help us understand the type of response or outcome you are expecting to receive from a customer related to revocation, as required by the TLS BRs?

Customer Communication Enhancement: We will initiate a communication program starting on 2024-09-20 to educate customers about adopting automated tools like ACME. We will track customer adoption rates and issue periodic reports to measure the success of this initiative.

  1. Would eMudhra consider sharing the adoption information with this community to include describing what you find to be working and not working, as it relates to your adoption efforts?
  2. In conjunction with ACME, has eMudhra considered also adopting ACME Renewal Information (ARI)?
  3. Has eMudhra considered technical controls to further encourage adoption of automation solutions? If so, can you please describe them? If not, how come?

Specific to Comment 5, please create a separate report for this incident (e.g., failure to ensure information stored in the CCADB is kept up to date as changes occur).

(In reply to Chris Clements from comment #6)

Thank you for all of the updates in Comment 3.

A few follow-up questions:

To mitigate the impact of unresponsive customers in the future, we are implementing predefined escalation timelines for revocation in cases of private key compromise. This means that if we do not receive a response from the customer within a set period, we will proceed with the revocation based on the initial notification timestamp from the researcher or the trusted source reporting the compromise.

  1. Is there an escalation process for other revocation reasonCodes?

Yes, eMudhra has an escalation process for other revocation reason codes in addition to private key compromise. Each revocation request, depending on the reason code, follows predefined timelines for customer response and internal action. For example, in cases such as certificate misuse, fraud, or cessation of operation, the same principle of predefined escalation timelines applies. If a customer does not respond within the defined period, eMudhra proceeds with revocation based on the initial report from a trusted source or internal investigation, ensuring compliance with the relevant industry standards, including the TLS BRs.

  2. Can you help us understand the type of response or outcome you are expecting to receive from a customer related to revocation, as required by the TLS BRs?

In relation to revocation requests, eMudhra seeks customer confirmation or acknowledgment of the reported issue, such as key compromise or misuse. The ideal outcome would be the customer’s verification that they are aware of the issue, alongside any steps they may have taken to resolve it (e.g., key replacement). If no response is received within the set timeline, eMudhra proceeds with revocation to ensure compliance with the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates (TLS BRs).

Customer Communication Enhancement: We will initiate a communication program starting on 2024-09-20 to educate customers about adopting automated tools like ACME. We will track customer adoption rates and issue periodic reports to measure the success of this initiative.

  1. Would eMudhra consider sharing the adoption information with this community to include describing what you find to be working and not working, as it relates to your adoption efforts?

eMudhra is open to sharing insights from our ACME adoption efforts with this community. We will be happy to periodically share information about adoption rates, challenges faced, and what strategies have proven effective or less effective as we engage with our customers to promote automation.

  2. In conjunction with ACME, has eMudhra considered also adopting ACME Renewal Information (ARI)?

Yes, eMudhra is actively considering the adoption of ACME Renewal Information (ARI), with a target of implementing the current draft by the end of the year.

  3. Has eMudhra considered technical controls to further encourage adoption of automation solutions? If so, can you please describe them? If not, how come?

eMudhra has considered several technical controls to encourage the adoption of automation solutions, particularly focusing on the ACME protocol (RFC 8555).
In addition to ACME, eMudhra provides APIs that allow customers to automate certificate issuance, renewal, and revocation. We also offer a CLM solution. Together, these controls help automate and streamline certificate management, including issuance, renewal, and revocation.
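
For readers unfamiliar with revocation over ACME, the Python sketch below builds the JSON payload that RFC 8555 (section 7.6) defines for a revokeCert request. It is illustrative only and not eMudhra's implementation; the function name is hypothetical, and signing the payload as a JWS and POSTing it to the server's revokeCert URL are out of scope.

```python
import base64
import json

KEY_COMPROMISE = 1  # CRLReason code for keyCompromise (RFC 5280)


def revocation_payload(cert_der: bytes, reason: int = KEY_COMPROMISE) -> str:
    """Build the JSON payload for an ACME revokeCert request.

    RFC 8555 section 7.6 expects the certificate as base64url-encoded DER
    (without padding) and an optional numeric revocation reason code.
    """
    cert_b64 = base64.urlsafe_b64encode(cert_der).rstrip(b"=").decode("ascii")
    return json.dumps({"certificate": cert_b64, "reason": reason})


# Placeholder bytes standing in for a real DER-encoded certificate.
print(revocation_payload(b"\x30\x82\x01\x00"))
```

Because the request can be signed with either the account key or the certificate's own key, a subscriber who automates this step can revoke and replace a compromised certificate without waiting for a support ticket to be triaged.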

Specific to Comment 5, please create a separate report for this incident (e.g., failure to ensure information stored in the CCADB is kept up to date as changes occur).

Incident has been created.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31 → [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30

We continue work on incident-reporting and compliance requirements aimed at reducing delayed revocation, so this bug will remain open until at least February 1, 2025. Meanwhile, CAs should review https://github.com/mozilla/www.ccadb.org/pull/186.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30 → [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01

To update on my earlier message from three months ago, I am pleased to share that the implementation of ACME Renewal Information (ARI) has now been successfully completed.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01 → [ca-compliance] [leaf-revocation-delay] Next update 2025-03-03

eMudhra should file an updated, full incident report that includes all relevant information learned or discussed herein since Comment #0 (9/3/2024). Also, the updated incident report should clearly explain the technical controls that have been added to prevent unintentional revocation delays from recurring. These should be listed as Action Items. Finally, eMudhra should confirm its commitment to adhere to section 4.9.1 of the TLS Baseline Requirements and ensure the timely revocation of certificates in accordance with that section of the TLS BRs.

Flags: needinfo?(naveen.ml)

Incident Report

Summary

On August 31, 2024, eMudhra received a private key compromise notification for four SSL/TLS certificates from an external researcher. As per TLS Baseline Requirements (BR) Section 4.9.1, these certificates should have been revoked within 24 hours. However, revocation was delayed due to misrouting of the request, which arrived at a general support email address instead of the dedicated Certificate Problem Reporting contact listed in our CP/CPS. Initially, it was believed that the researcher had chosen the general support address, but further review determined that the researcher had retrieved the Problem Reporting contact address from CCADB, where the entry was outdated. Additionally, customer non-responsiveness further contributed to the delay.
Once the issue was escalated to the appropriate team, eMudhra took immediate action, and the certificates were revoked on September 2, 2024. To prevent similar delays in the future, technical and procedural controls have been implemented to ensure timely processing of revocation requests in compliance with TLS BR 4.9.1.

Impact

The incident affected four SSL/TLS certificates, which were not revoked within the required 24-hour period due to misrouting of the request and customer non-responsiveness. As a result, these certificates remained active longer than intended.
Despite the delay, there was no evidence of misuse or compromise before revocation. OCSP responses and CRLs were correctly updated once revocation was completed, ensuring that relying parties could verify the certificate status. Certificate issuance was not impacted, and no service disruption was reported.
The incident highlighted the need for regular CCADB audits, improved request handling automation, and enhanced escalation workflows to prevent similar delays in the future.

Timeline

All times are IST.
• 2024-08-31 23:19 – Researcher submitted the revocation request via general support email.
• 2024-09-01 10:30 – eMudhra became aware of the issue and started internal investigation.
• 2024-09-01 11:15 – Confirmed private key compromise and escalated to the PKI support team.
• 2024-09-01 12:45 – Multiple attempts to contact the customer, but no response.
• 2024-09-02 09:15 – Internal escalation triggered due to lack of customer response.
• 2024-09-02 09:40 – Internal Incident (EMINCPKI0019) raised for delayed revocation.
• 2024-09-02 10:10 – Customer responded, and compromise notification was acknowledged.
• 2024-09-02 10:24 – eMudhra revoked the four affected certificates.

Root Cause Analysis

The revocation delay was caused by three primary factors:

  1. Incorrect Email Address in CCADB – The researcher correctly retrieved the Certificate Problem Reporting email address from CCADB, but the address listed there was outdated: it pointed to the general support mailbox rather than the dedicated address in our CP/CPS. As a result, the request was misrouted internally.
  2. Delayed Internal Escalation – Due to misrouting, the support team did not immediately recognize the request as high priority, leading to additional processing delays before it was escalated to the PKI team.
  3. Customer Non-Responsiveness – Multiple attempts were made to contact the affected customer, but no response was received in time. While customer confirmation is not required for key compromise revocations, lack of a predefined escalation mechanism caused unnecessary waiting.

Lessons Learned

This incident highlighted the critical importance of maintaining up-to-date problem reporting contact details in CCADB. Ensuring revocation requests are routed correctly from the start prevents delays. Additionally, a time-based escalation mechanism has been implemented to automatically trigger revocation in cases where the affected customer does not respond within the required timeframe. These improvements will ensure full compliance with TLS BR 4.9.1 and minimize future revocation delays.

What went well

The issue was detected internally before external enforcement action was needed. Incident response teams quickly escalated and resolved the issue once identified. New process enhancements have been successfully implemented.

What didn't go well

  • The problem reporting email in CCADB was outdated, leading to misrouting of the request.
  • Internal escalation was not immediate, causing additional processing delays.
  • Customer non-responsiveness delayed action further, as no predefined escalation mechanism was in place.

Where we got lucky

  • No misuse of the compromised certificates was reported before revocation.
  • The external researcher followed up, ensuring timely identification of the issue.

Action Items

Action Item | Kind | Due Date
Improve Communication & Email Routing: Ensure CCADB records remain updated, and revocation requests are automatically routed to the appropriate support team. More regular reviews of CCADB records to validate accuracy have also been implemented. | Prevent | Completed
Define Handling Process for Non-Responsive Customers: Establish clear timelines where, if a customer does not respond within 6 hours, revocation proceeds based on the initial notification timestamp. Improve customer communication for urgent security cases. | Prevent | Completed
Deploy ACME ARI for Automated Revocation Handling: Enables automatic renewal notifications to customers and allows automated revocation in cases where a certificate is compromised. | Mitigate | Completed
Strengthen Internal Processes & Training: Conduct quarterly audits and training to reinforce 24-hour revocation compliance and ensure support teams can identify and escalate revocation requests promptly. | Prevent | Completed
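
For context on the ACME ARI action item: ARI (ACME Renewal Information, RFC 9773) lets a CA publish a suggested renewal window per certificate, which ACME clients poll and act on. The report does not describe how ARI is wired into emSign's revocation workflow, so the sketch below shows only the client side, under stated assumptions: the sample response is hypothetical, and the helper names are illustrative. A CA expecting to revoke a certificate can move the window into the past so that well-behaved clients renew immediately.

```python
import json
import random
from datetime import datetime, timedelta, timezone

# Hypothetical renewalInfo response body; the field names follow RFC 9773.
SAMPLE_RENEWAL_INFO = """
{
  "suggestedWindow": {
    "start": "2024-09-01T00:00:00Z",
    "end": "2024-09-01T06:00:00Z"
  },
  "explanationURL": "https://ca.example/docs/key-compromise"
}
"""


def parse_rfc3339(ts: str) -> datetime:
    # datetime.fromisoformat() on older Python versions does not accept a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pick_renewal_time(renewal_info: dict, rng: random.Random) -> datetime:
    """Pick a uniformly random instant inside the suggested window."""
    window = renewal_info["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    offset = rng.uniform(0, (end - start).total_seconds())
    return start + timedelta(seconds=offset)


def should_renew_now(selected: datetime, now: datetime) -> bool:
    """If the selected time is already in the past, renew immediately."""
    return selected <= now


info = json.loads(SAMPLE_RENEWAL_INFO)
selected = pick_renewal_time(info, random.Random())
print(should_renew_now(selected, datetime.now(timezone.utc)))  # True: the sample window is in the past
```

Because renewal here depends on the client polling, it complements the CA's own revocation deadline rather than replacing it.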

Appendix

Details of affected certificates

https://crt.sh/?sha256=1CD5EC94CE7EB7FEFFC1329A39FD1E671498AFA225DC8F29C3638FBAE0A54AA8
https://crt.sh/?sha256=1CA07B23B2D53B4A983D17437A58F9BAC4AA724C65CF98E7D81590FDC2AA420A
https://crt.sh/?sha256=F398AEC5325890555B4BAAE65D7E4B9EAE6FD735BF5724AC307A00017B131D72
https://crt.sh/?sha256=95C43EB0CD7C4E1A8FAB78CA206DBB93417D123F74A21368446DB29170844EB0

Final Commitment

eMudhra remains fully committed to complying with TLS Baseline Requirements (Section 4.9.1) and ensuring timely revocation of all certificates in cases of key compromise or security threats. The implemented improvements ensure strict compliance with industry standards and prevent similar delays in the future.

Based on Incident Reporting Template v. 2.0

Flags: needinfo?(naveen.ml)

Please provide a status update.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-03-03 → [ca-compliance] [leaf-revocation-delay]

No action items are pending and no further action.

If there are no action items remaining, and you believe that this case can be closed, then please submit a Closure Summary:
https://www.ccadb.org/cas/incident-report#how-are-reports-closed
https://www.ccadb.org/cas/incident-report#closure-report
https://www.ccadb.org/cas/incident-report#incident-closure-summary
Thanks,
Ben

Flags: needinfo?(naveen.ml)

Report Closure Summary

  • Incident description:
    On August 31, 2024, eMudhra received a private key compromise notification for four SSL/TLS certificates from an external researcher. As per TLS Baseline Requirements (BR) Section 4.9.1, these certificates should have been revoked within 24 hours. However, revocation was delayed due to misrouting of the request, which was sent to a general support email instead of the dedicated Certificate Problem Reporting contact listed in our CP/CPS. Initially, it was believed that the researcher had chosen the general support address, but further review determined that the researcher had retrieved the Problem Reporting contact address from CCADB, where the listed address was outdated. Additionally, customer non-responsiveness contributed to the delay.
    Once the issue was escalated to the appropriate team, eMudhra took immediate action, and the certificates were revoked on September 2, 2024. To prevent similar delays in the future, technical and procedural controls have been implemented to ensure timely processing of revocation requests in compliance with TLS BR 4.9.1.

  • Incident Root Cause(s):
    The revocation delay was caused by three primary factors:

  1. Incorrect Email Address in CCADB – The researcher retrieved the Certificate Problem Reporting email address from CCADB, but the address listed there was outdated, causing the request to be misrouted internally.
  2. Delayed Internal Escalation – Due to misrouting, the support team did not immediately recognize the request as high priority, leading to additional processing delays before it was escalated to the PKI team.
  3. Customer Non-Responsiveness – Multiple attempts were made to contact the affected customer, but no response was received in time. Although customer confirmation is not required for key compromise revocations, the lack of a predefined escalation mechanism caused unnecessary waiting.
  • Remediation description:
    To address the identified issue, eMudhra has implemented the following corrective actions:
  1. Enhanced Communication & Email Routing: CCADB records are now regularly reviewed and updated to ensure accuracy. Revocation requests are automatically routed to the appropriate support team for prompt action.
  2. Defined Process for Non-Responsive Customers: A structured approach has been established, ensuring that if a customer does not respond within six hours, revocation proceeds based on the initial notification timestamp. Additionally, customer communication has been improved for urgent security cases.
  3. Deployment of ACME ARI for Automated Revocation Handling: This enables automatic renewal notifications and facilitates automated revocation in cases of certificate compromise, enhancing efficiency and security.
  4. Strengthened Internal Processes & Training: Quarterly audits and training sessions have been implemented to reinforce adherence to the 24-hour revocation policy. Support teams are now better equipped to identify and escalate revocation requests promptly.
  • Commitment summary:
    eMudhra remains fully committed to complying with TLS Baseline Requirements (Section 4.9.1) and ensuring timely revocation of all certificates in cases of key compromise or security threats. The implemented improvements ensure strict compliance with industry standards and prevent similar delays in the future.

All Action Items disclosed in this report have been completed as described, and we request its closure.

Last call. I'll pull this up for a review and closure later next week (Wed-Fri). Please provide any additional concerns or questions before then.

Flags: needinfo?(naveen.ml) → needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED