Closed Bug 1970727 Opened 10 months ago Closed 8 months ago

eMudhra: Failure to respond to a Problem Report within 24 hours

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rob, Assigned: naveen.ml)

References

Details

(Whiteboard: [ca-compliance] [policy-failure] [external])

On 2025-06-03 at 14:35 UTC I sent the following CPR to eMudhra:

Hi. https://crt.sh/mozilla-disclosures#disclosureincomplete is reporting "ECDSA verification failure" for several CRLs issued by eMudhra. Double-checking with "openssl crl -verify" also shows errors for them.

Are these CRLs signed using the wrong private key?

More than 48hrs later, I still have not received a reply.
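
The `openssl crl -verify` check mentioned in the CPR can be scripted roughly as follows. This is a minimal sketch, not the reporter's actual tooling: the function names and file paths are hypothetical, the issuing CA certificate is assumed to be available as a PEM file, and success is detected by looking for the "verify OK" message in the tool's output.

```python
import subprocess


def crl_verify_command(crl_path, issuer_pem_path, der=True):
    """Build the `openssl crl` argument list for a CRL signature check."""
    return [
        "openssl", "crl",
        "-in", crl_path,
        "-inform", "DER" if der else "PEM",
        "-CAfile", issuer_pem_path,  # certificate of the CA expected to have signed the CRL
        "-noout",                    # suppress printing of the encoded CRL
        "-verify",
    ]


def crl_signature_ok(crl_path, issuer_pem_path, der=True):
    """Return True only if openssl reports a successful verification."""
    result = subprocess.run(
        crl_verify_command(crl_path, issuer_pem_path, der),
        capture_output=True,
        text=True,
    )
    # Assumption: a good signature yields "verify OK"; anything else is treated as failure.
    return "verify OK" in (result.stdout + result.stderr)
```

A CRL signed with a key other than the one in the supplied CA certificate would be expected to make `crl_signature_ok` return `False`.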

See Also: → 1970728

I just emailed problem-reporting@emsign.com again to notify eMudhra that I'd created this bug and bug 1970728. On this occasion I immediately received an automated response email informing me that a ticket had been created in eMudhra's system.

As for my original Problem Report email, sent 2025-06-03: I haven't received a delivery failure message, but I also didn't receive an automated response email.

I don't have any records of automated responses when dealing with eMudhra historically. However, they were responding to that email address within 24h with follow-ups from personal accounts.

That was last June, however; things may have changed in their processes.

Assignee: nobody → naveen.ml
Status: NEW → ASSIGNED
Type: defect → task
Whiteboard: [ca-compliance] [policy-failure]

Preliminary Incident Report

Summary

  • Incident description: On June 3, 2025 at 14:35 UTC, a Third Party submitted a CPR (Certificate Problem Report) to problem-reporting@emsign.com regarding ECDSA verification failures for multiple CRLs issued by eMudhra. Despite this, no automated acknowledgement or substantive response was received by June 4, 2025 at 14:35 UTC, in breach of the Baseline Requirements for the Issuance and Management of Publicly-Trusted TLS Server Certificates section 4.9.5, which obligates a CA to investigate and provide a preliminary report within 24 hours of CPR receipt.
    Upon preliminary investigation after eMudhra became aware of the issue (i.e., June 5, 2025), it was discovered that the CPR message in question had been quarantined by anti-spam filters and, as a result, did not raise a ticket in our system. We are investigating the root cause of why this message was classified as spam when a subsequent one was not, and are reviewing our processes for messages classified as spam.

A full incident report will be submitted no later than June 9, 2025.

  • Relevant policies:
    o Baseline Requirements for the Issuance and Management of Publicly-Trusted TLS Server Certificates, section 4.9.5 (“Time within which CA must process the revocation request”): “Within 24 hours after receiving a Certificate Problem Report, the CA SHALL investigate … and provide a preliminary report on its findings to both the Subscriber and the entity who filed the CPR”
    o CCADB Incident Reporting Policy: A CA Owner’s failure to meet CA/Browser Forum commitments (e.g., not responding to a CPR within 24 hours) is classified as an incident that must be publicly reported (ccadb.org).

  • Source of incident disclosure: Third-party report via Mozilla Bugzilla Bug 1970727, opened by the Sectigo team on June 5, 2025, and the ticket raised as a result of the email received from Sectigo on June 5, 2025.

Full Incident Report

Summary

  • CA Owner CCADB unique ID: A005678

  • Incident description:
    On June 3, 2025, at 10:05:00 UTC, a third party (Sectigo) submitted a Certificate Problem Report (CPR) to problem-reporting@emsign.com, citing ECDSA signature verification failures for CRLs associated with six Intermediate CAs. However, the email was inadvertently quarantined by internal spam filters and did not reach the compliance or PKI operations teams.
    A follow-up email from the same sender was received on June 5, 2025, at 20:12:16 UTC, which successfully reached the inbox and the ticketing system, prompting immediate investigation.
    As the initial CPR was not acknowledged or responded to within the required 24 hours, this constituted a deviation from the CA/Browser Forum Baseline Requirements, Section 4.9.5, which mandates timely CPR processing and response.

  • Timeline summary:

    • Non-compliance start date: June 3, 2025 10:05:00 UTC (CPR received but quarantined)
    • Non-compliance identified date: June 5, 2025 20:12:16 UTC (follow-up email received and investigated)
    • Non-compliance end date: June 6, 2025 08:29:00 UTC (Preliminary Incident Report submitted)
  • Relevant policies:
    CA/Browser Forum Baseline Requirements, Section 4.9.5: Timeframe to process Certificate Problem Reports.

  • Source of incident disclosure:
    Third-party report via Mozilla Bugzilla Bug 1970727, raised by Sectigo, and a follow-up email citing the lack of response to their initial CPR.

Impact

  • Total number of certificates: N/A – This issue pertained to reporting on CRL signature errors and not certificate issuance.

  • Total number of "remaining valid" certificates: N/A

  • Affected certificate types: None. The CPR pertained to invalid CRL signatures.

  • Incident heuristic: Communication and process gap in CPR monitoring.

  • Was issuance stopped in response to this incident, and why or why not?: No. The incident did not involve certificate issuance or compromise. The response gap was limited to CPR acknowledgement timing.

  • Analysis:
    The initial CPR was sent to problem-reporting@emsign.com, an address configured to notify senior compliance personnel and generate automated incident tickets via our internal system.
    However, owing to an ongoing effort to reduce spam through stricter filtering policies, the CPR email in question was incorrectly flagged as spam at the mailbox level and quarantined, preventing it from reaching the intended recipients.
    The issue came to light only after a follow-up email from the same sender on June 5, 2025, at 20:12:16 UTC was delivered to the ticketing system and to other members of the compliance team. On this basis, the compliance team, in conjunction with in-house IT, retrieved the original email from quarantine and began immediate investigation and response.
    The delay was traced to:
    • Inaccurate classification by the spam filter.
    • Absence of a mandatory daily quarantine review process for critical mailboxes.
    This resulted in a failure to meet the 24-hour response window specified under BR 4.9.5.

  • Additional considerations:
    • The problem-reporting@emsign.com address is configured to notify designated compliance personnel and also generate tickets via our internal ticketing system.
    • This workflow has consistently functioned as intended, with all prior CPRs delivered to intended audience.
    • In this specific case, however, the message was filtered before it could reach the ticketing layer, and no secondary monitoring mechanism existed.
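
One possible form of secondary monitoring for the gap described above is a reconciliation between what the mail gateway accepted for the CPR address and what actually produced a ticket. The sketch below is illustrative only: it assumes both systems can export Message-ID values, and the function name and sample identifiers are hypothetical rather than a description of eMudhra's systems.

```python
def unticketed_message_ids(gateway_message_ids, ticketed_message_ids):
    """Return Message-IDs seen by the mail gateway that never produced a ticket.

    Order of the gateway log is preserved so the oldest gaps appear first.
    """
    ticketed = set(ticketed_message_ids)
    return [mid for mid in gateway_message_ids if mid not in ticketed]


# Hypothetical sample data: three messages reached the gateway, two became tickets.
gateway_log = ["<a1@example.com>", "<b2@example.com>", "<c3@example.com>"]
tickets = ["<a1@example.com>", "<c3@example.com>"]

missing = unticketed_message_ids(gateway_log, tickets)
print(missing)  # ['<b2@example.com>']
```

Any identifier returned by such a check would point to a message that was filtered, quarantined, or otherwise lost between the gateway and the ticketing layer.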

Timeline

Timeline (All times in UTC)
2025-06-03 10:05:00 : CPR email received but quarantined by the spam filter.
2025-06-04 10:05:00 : 24-hour response window expired (as per BR 4.9.5).
2025-06-05 20:12:16 : Follow-up email received and ticket created in our email system.
2025-06-06 04:05:00 : Internal teams forwarded the flagged critical follow-up email to the PKI and compliance team.
2025-06-06 04:30:00 : Internal incident created; IT team began reviewing quarantine email traces and found that the first email had been quarantined.
2025-06-06 08:29:00 : Preliminary incident response submitted within 24 hours of the follow-up bug filed by Sectigo.
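
The arithmetic behind the timeline above can be checked with a short sketch. It uses the timestamps stated in this report and treats the submission of the preliminary report as the end of the non-compliance period, as the timeline summary does; the function and variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# BR 4.9.5 window for a preliminary report after CPR receipt.
RESPONSE_WINDOW = timedelta(hours=24)


def response_deadline(received_at):
    """Latest time at which a preliminary report is still on time."""
    return received_at + RESPONSE_WINDOW


def overrun(received_at, responded_at):
    """Time by which the response missed the deadline (zero if on time)."""
    late_by = responded_at - response_deadline(received_at)
    return max(late_by, timedelta(0))


received = datetime(2025, 6, 3, 10, 5, tzinfo=timezone.utc)   # CPR received
responded = datetime(2025, 6, 6, 8, 29, tzinfo=timezone.utc)  # preliminary report submitted

print(response_deadline(received).isoformat())  # 2025-06-04T10:05:00+00:00
print(overrun(received, responded))             # 1 day, 22:24:00
```

On these figures the response arrived 46 hours and 24 minutes after the 24-hour window closed.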

Related Incidents

Bug | Date | Description
1967929 | 2025-05-22 | Failed to respond to a Certificate Problem Report within 24 hours, which violates Section 4.9.5 of the TLS BRs.
1905509 | 2024-06-29 | Failed to respond to a Certificate Problem Report within 24 hours, which violates Section 4.9.5 of the TLS BRs.
1886998 | 2024-03-22 | Failed to respond to a Certificate Problem Report within 24 hours, which violates Section 4.9.5 of the TLS BRs.
1886626 | 2024-03-20 | Failed to respond to a Certificate Problem Report within 24 hours, which violates Section 4.9.5 of the TLS BRs.
1959733 | 2025-04-10 | Failed to respond to a Certificate Problem Report within 24 hours, which violates Section 4.9.5 of the TLS BRs.
1916478 | 2024-10-14 | Failed to respond to a Certificate Problem Report within 24 hours, which violates Section 4.9.5 of the TLS BRs.

Root Cause Analysis

Contributing Factor 1: Spam Filter Quarantine of CPR Email

  • Description: The CPR email was flagged and quarantined by spam filtering software before it could reach either the ticketing system or compliance team members. The mailbox's SOP did not include daily quarantine checks for misclassified incident reports.
  • Timeline:
    2025-06-03 10:05:00 : The CPR email received from Sectigo was automatically quarantined and did not reach compliance personnel or the ticketing system.
    2025-06-05 20:12:16 : A follow-up email from Sectigo was received successfully, triggering investigation.
    2025-06-06 04:30:00 : Internal IT confirmed that the original CPR email had been quarantined.
    2025-06-06 08:29:00 : The issue was formally acknowledged and a preliminary report was submitted.
  • Detection: Discovered on June 5, 2025 20:12:16 UTC, after a follow-up email.
  • Interaction with other factors: The CPR was associated with a separate technical issue (CRLs signed incorrectly). While both incidents are distinct, the delay in CPR response extended the time required to address the CRL issue.
  • Root Cause Analysis methodology used: 5-Whys

Lessons Learned

  • What went well:
    o The follow-up alert mechanism and public Bugzilla visibility prompted timely investigation.
    o Ticketing workflow otherwise functions well for CPR processing.

  • What didn’t go well:
    o No daily quarantine monitoring was in place.
    o BR 4.9.5 compliance was inadvertently violated.

  • Where we got lucky:
    o The incident did not involve certificate mis-issuance or security compromise
    o The follow-up email helped limit escalation and enabled timely correction

  • Additional:
    o Daily quarantine review will be included in the updated CPR SOP.

Action Items

Action Item | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status
Implement daily review of quarantine for critical mailboxes | Detect | Root Cause # 1 | Daily review process documented and tracked in logs | 2025-06-06 | Completed

Appendix

N/A

As identified by your related incidents list, this isn't the first time spam filters have intercepted CPRs. As such, I have several questions relating not specifically to the handling of this CPR, but rather to how eMudhra approaches incidents reported by other CAs.

Question 1: Does eMudhra have in place a standard operating procedure to review all incidents reported by other CAs?

Question 2: Did eMudhra review any of the previous incidents attributing a missed CPR to a spam filter or other email delivery related failures?

Question 3: Did eMudhra review its CPR related procedures and/or (email) systems following previous reports of CPRs being intercepted by spam filters? If yes, what was the result of this review?

Question 1: Does eMudhra have in place a standard operating procedure to review all incidents reported by other CAs?

Yes, we do. This is done independently by the compliance team, and a monthly report summarizing the incidents and their grouping into key areas is published to Policy Authority members as well as the team that develops and operates the Web PKI infrastructure.

Question 2: Did eMudhra review any of the previous incidents attributing a missed CPR to a spam filter or other email delivery related failures?

We have reviewed past incidents attributed to a missed CPR. This is why, in an effort to move away from single-point dependence, we previously moved the CPR reporting email to a distribution list that includes both a ticketing system and certain key members of the compliance team. We hoped that this would enable timely reaction to incidents even if one person missed a report. As part of the ticketing system, an automated response was also set up to send a ticket ID for better tracking.

Question 3: Did eMudhra review its CPR related procedures and/or (email) systems following previous reports of CPRs being intercepted by spam filters? If yes, what was the result of this review?

We did indeed review the CPR-related procedures following previous reports of delayed responses, but failed to specifically act on daily checking of quarantined emails. We have now assigned this responsibility to two members of the team to ensure we don’t miss any incoming email.

(In reply to Naveen Kumar ML from comment #6)

Thank you for the prompt reply! This all sounds very reasonable and answers my questions completely.

I have a vague recollection (but can't seem to find the relevant incident report) that some CAs struggled previously with emails being filtered at such an early stage they weren't even delivered as quarantined; do you have any hints or experiences you could share for CAs potentially struggling with such issues?

(In reply to JSaares from comment #7)

While we haven’t observed CPR emails being dropped before quarantine in our environment, we do recognize that spam filters, especially those using adaptive or multi-layered logic, can behave like a black box. Even when configurations are set to quarantine rather than reject, some filters may silently drop messages based on scoring thresholds, domain reputation, or upstream routing behavior.

For others facing similar risks, a few practices we've considered or found useful include:

  1. Ensuring CPR-related addresses are excluded from “reject” or auto-delete rules.
  2. Monitoring quarantined messages daily for high-priority mailboxes.
  3. Reviewing upstream filters or provider-level controls that may silently block mail before it reaches internal systems.
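
As an illustration of the second practice, a daily quarantine check could be reduced to a filter like the one below. This is a sketch under stated assumptions: the `QuarantinedMessage` record, the `needs_escalation` helper, and the 12-hour threshold are hypothetical, and a real mail platform would supply the quarantine listing through its own export or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Mailboxes whose quarantined mail must be reviewed by a person.
CRITICAL_MAILBOXES = {"problem-reporting@emsign.com"}


@dataclass
class QuarantinedMessage:
    recipient: str
    sender: str
    subject: str
    quarantined_at: datetime


def needs_escalation(messages, now, max_age=timedelta(hours=12)):
    """Return quarantined messages for critical mailboxes that have waited at least max_age."""
    return [
        m for m in messages
        if m.recipient.lower() in CRITICAL_MAILBOXES
        and now - m.quarantined_at >= max_age
    ]


# Hypothetical sample: one held CPR and one unrelated quarantined message.
now = datetime(2025, 6, 4, 0, 0, tzinfo=timezone.utc)
held = [
    QuarantinedMessage("problem-reporting@emsign.com", "reporter@example.com",
                       "CRL verification failure",
                       datetime(2025, 6, 3, 10, 5, tzinfo=timezone.utc)),
    QuarantinedMessage("sales@example.org", "bulk@example.net", "Offer",
                       datetime(2025, 6, 3, 9, 0, tzinfo=timezone.utc)),
]
for message in needs_escalation(held, now):
    print(message.subject)  # CRL verification failure
```

Choosing a threshold well below 24 hours leaves time to investigate and respond within the BR 4.9.5 window once a false positive is found.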

No further action required at this time.

Please provide a Closure Report if you feel this is ready for closure.

Flags: needinfo?(naveen.ml)

Report Closure Summary

  • Incident description:
    On June 3, 2025, at 10:05 UTC, a Certificate Problem Report (CPR) was submitted by Sectigo to the designated problem-reporting email address (problem-reporting@emsign.com). Due to ongoing efforts to strengthen email filtering policies, the message was inadvertently flagged and quarantined by the spam filter at the mailbox level, preventing its delivery to the compliance team or internal ticketing system. As a result, the CPR was not acknowledged or responded to within the 24-hour timeframe mandated by CA/Browser Forum Baseline Requirements Section 4.9.5. The issue came to attention on June 5, 2025, at 20:12 UTC, when a follow-up message from the same sender successfully reached the inbox, triggering immediate investigation and response.
  • Incident Root Cause(s):
    The CPR email was misclassified and quarantined due to spam filtering logic, and the absence of a daily quarantine review process for the mailbox prevented timely detection. The standard workflow for CPR intake otherwise functions as intended, but this specific incident revealed a gap in monitoring quarantined messages.
  • Remediation description:
    To address the identified issue, we have implemented a daily review process for quarantined emails, focusing on critical mailboxes to ensure timely identification of false positives and appropriate handling of potential threats.
    Review process for quarantined items: the daily review process is documented and tracked in logs.
  • Commitment summary:
    eMudhra remains committed to timely and complete handling of all Certificate Problem Reports in accordance with CA/Browser Forum Baseline Requirements. We have addressed the specific filtering and monitoring gaps identified in this incident. All relevant corrective measures have been fully implemented to prevent recurrence, and internal processes have been updated to ensure continuous compliance. All Action Items disclosed in this report have been completed as described, and we request its closure.

Flags: needinfo?(naveen.ml)
Flags: needinfo?(incident-reporting)
Whiteboard: [ca-compliance] [policy-failure] → [ca-compliance] [policy-failure] [external]

This is a final call for comments or questions on this Incident Report.

Otherwise, it will be closed on approximately 2025-07-14.

Whiteboard: [ca-compliance] [policy-failure] [external] → [close on 2025-07-14] [ca-compliance] [policy-failure] [external]
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Flags: needinfo?(incident-reporting)
Resolution: --- → FIXED
Whiteboard: [close on 2025-07-14] [ca-compliance] [policy-failure] [external] → [ca-compliance] [policy-failure] [external]