1994454 - Sectigo: Failure to reply to Certificate Problem Reports within 24 hours

Martijn Katerbarg

Assignee

Description

•

4 months ago

Preliminary Incident Report

Summary

Incident description:

On October 13th we became aware of a problem with our Certificate Problem Report email queue, which resulted in us failing to reply to several Certificate Problem Reports within the required 24 hours.

Relevant policies: Baseline Requirements for the Issuance and Management of Publicly-Trusted TLS Server Certificates Version 2.1.7, Section 4.9.5
Source of incident disclosure: Third Party Reported

incident-reporting

Updated

•

4 months ago

Assignee: nobody → martijn.katerbarg

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Whiteboard: [ca-compliance] [policy-failure]

Martijn Katerbarg

Assignee

Comment 1

•

3 months ago

Full Incident Report

Summary

CA Owner CCADB unique ID: A000016
Incident description:

On October 13th we became aware of a problem with our Certificate Problem Report email queue, which resulted in us failing to reply to several Certificate Problem Reports within the required 24 hours.

Timeline summary:
- Non-compliance start date: 2025-09-19 – 09:51 UTC
- Non-compliance identified date: 2025-10-13 – 08:01 UTC
- Non-compliance end date: 2025-10-13 – 20:57 UTC
Relevant policies:

Baseline Requirements for the Issuance and Management of Publicly-Trusted TLS Server Certificates Version 2.1.7, Section 4.9.5

Source of incident disclosure: Third Party Reported

Impact

Total number of certificates: 150 Certificates were referenced by a total of 68 Certificate Problem Reports which fall under the scope of this incident.
Total number of "remaining valid" certificates: 0. All Certificate Problem reports have been triaged.
Affected certificate types: TLS (DV, OV, EV), S/MIME and Code Signing
Incident heuristic:

The incident affected all certificate problem reports (CPR) filed between 2025-09-18 – 09:51 UTC and 2025-10-13 – 08:01 UTC.

To further expand on the scope, the CPRs can be divided into:

53 CPRs filed by a single customer, of which:
o 3 were for their own TLS Certificates, resulting in 34 revocations.
o 52 were for their own S/MIME Certificates, resulting in 104 revocations.
o Note: 2 of the CPRs referenced both TLS and S/MIME certificates.
3 CPRs were deemed invalid.
12 further CPRs were reported, mostly by third parties, of which:
o 1 was for a single TLS certificate, resulting in 1 revocation.
o 1 was for an S/MIME certificate, resulting in 1 revocation.
o 10 were for Code Signing certificates, resulting in 10 revocations.
Was issuance stopped in response to this incident, and why or why not?: No. This incident affected certificate problem report handling, not certificate issuance.
Analysis: N/A.
Additional considerations: N/A.
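
The scope breakdown above can be cross-checked against the stated totals. The following is a minimal sketch using only the figures quoted in this report. The variable names are illustrative, and it assumes that every referenced certificate appears exactly once in the revocation counts.

```python
# Figures copied from the scope breakdown in this report.
single_customer_tls_cprs = 3
single_customer_smime_cprs = 52
single_customer_overlap = 2   # CPRs that referenced both TLS and S/MIME certificates
invalid_cprs = 3
other_cprs = 1 + 1 + 10       # TLS, S/MIME and Code Signing CPRs, mostly third-party

# CPRs referencing both types would otherwise be counted twice.
single_customer_cprs = (
    single_customer_tls_cprs + single_customer_smime_cprs - single_customer_overlap
)
total_cprs = single_customer_cprs + invalid_cprs + other_cprs

# Revocations per group, as listed in the breakdown.
revocations = {
    "single customer TLS": 34,
    "single customer S/MIME": 104,
    "other TLS": 1,
    "other S/MIME": 1,
    "other Code Signing": 10,
}
total_certificates = sum(revocations.values())

print(single_customer_cprs, total_cprs, total_certificates)  # 53 68 150
```

The results match the 53 single-customer CPRs, 68 CPRs in total and 150 certificates given in the Impact section.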

Timeline

All times are in UTC.

2025-09-17
o 20:22 We make changes to our (Salesforce based) CRM, specifically the “case origin” field. Unbeknownst to us at the time, this breaks the logic that directs CPR emails into the correct queue.
2025-09-18
o 08:01 We receive the first CPR since the change. It does not show up in the appropriate filter.
2025-10-12
o 20:39 I personally receive an email on my work address from a third party known to me, requesting assistance in getting code signing CPRs processed. The reporter states that emails sent to the CPR address have not been actioned as of late.
2025-10-13
o 08:01 I read the aforementioned email and start investigating with the appropriate teams.
o 08:35 We confirm our queue of CPR emails (or SSL Abuse, as we call these) is empty.
o 09:59 Based on emails we are able to trace, we build a custom filter to find all pending CPR emails. At this time visibility into pending cases is restored. Our abuse team starts processing the backlog of emails.
o 15:51 We open a ticket with our IT helpdesk to address the breakdown in our CRM.
o 20:57 All emails have been processed.
2025-10-15
o 18:27 Our SF team confirms the issue is due to a recent change to the Case Origin logic. The team continues to discuss the best option going forward.
2025-10-17
o 22:18 Our SF team reverts the recent change, restoring functionality of the Case Origin field and logic utilizing the field.
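
For readers who want the elapsed times implied by this timeline, the sketch below computes two intervals from the timestamps listed above. The helper `utc` and the variable names are illustrative only.

```python
from datetime import datetime, timezone


def utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timestamp from the timeline as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


crm_change = utc("2025-09-17 20:22")           # Case Origin change deployed
out_of_band_email = utc("2025-10-12 20:39")    # third party emails the author directly
visibility_restored = utc("2025-10-13 09:59")  # custom filter shows pending CPRs again
backlog_cleared = utc("2025-10-13 20:57")      # all emails processed

# Time during which incoming CPRs were hidden from the queue.
hidden_for = visibility_restored - crm_change
# Time from the out-of-band report to a fully processed backlog.
report_to_cleared = backlog_cleared - out_of_band_email

print(hidden_for)         # 25 days, 13:37:00
print(report_to_cleared)  # 1 day, 0:18:00
```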

Related Incidents

Bug	Date	Description
1905509	2024-06-29	Both incidents affect the timely response to a CPR, however the root cause differs.
1959733	2025-04-10	Both incidents affect the timely response to a CPR. In bug 1959733 CPR emails did not reach the CA due to email filtering at the ISP, which did not occur in our case.
1888881	2024-04-01	Both incidents affect the timely response to a CPR. In bug 1888881 CPR emails were not processed due to personnel deficiencies, which did not occur in our case.
1963629	2025-04-30	Both incidents affect the timely response to a CPR. Bug 1963629 involved a second CPR email address, which did not occur in our case.
1967929	2025-05-22	Both incidents affect the timely response to a CPR. In bug 1967929 CPR emails were not processed timely due to personnel deficiencies, which did not occur in our case.
1970727	2025-05-22	Both incidents affect the timely response to a CPR. In bug 1970727 the affected CPR email was incorrectly marked as spam.
1985466	2025-08-27	Both incidents affect the timely response to a CPR. In bug 1985466 CPR emails were not processed timely due to personnel deficiencies, which did not occur in our case.

Root Cause Analysis

Contributing Factor 1: Logic change to Case Origin field

Description:

We confirmed that the root cause is a change to the Case Origin field, which disrupted the logic in the TriggerUtility Apex class responsible for stamping the Case Reason. This logic relies on a custom label (Set_Reason_Case_Checker) to match specific origin values. With the Case Origin values changed, the logic could no longer be applied, resulting in blank case reasons. As the list view filter for showing pending CPRs relied on these fields being populated correctly, cases no longer showed up.

Timeline: The logic change took place in September 2025.
Detection: This was detected as part of our incident investigation.
Interaction with other factors: N/A
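
The failure mode described in Contributing Factor 1 can be illustrated with a small sketch. This is not the actual Apex code: the origin strings, the mapping and the function names below are hypothetical stand-ins for the TriggerUtility logic and the contents of the Set_Reason_Case_Checker label, neither of which is reproduced in this report.

```python
# Hypothetical label contents: origin values that should be stamped as CPR cases.
ORIGIN_TO_REASON = {"Email - SSL Abuse": "SSL Abuse"}


def stamp_case_reason(case: dict) -> dict:
    """Return a copy of the case with 'reason' derived from 'origin'.

    An origin value that is not in the mapping yields a blank reason,
    mirroring the blank Case Reason described above.
    """
    stamped = dict(case)
    stamped["reason"] = ORIGIN_TO_REASON.get(case["origin"], "")
    return stamped


def pending_cpr_view(cases: list) -> list:
    """List view filter that depends on the stamped reason being populated."""
    return [c for c in cases if c["reason"] == "SSL Abuse" and c["status"] == "New"]


# Case created before the change: the origin value matches the mapping.
before = stamp_case_reason({"id": 1, "origin": "Email - SSL Abuse", "status": "New"})
# Case created after the change: the origin value was renamed (assumed new value).
after = stamp_case_reason({"id": 2, "origin": "Email: SSL Abuse", "status": "New"})

visible = pending_cpr_view([before, after])
print([c["id"] for c in visible])  # [1]
```

Case 2 still exists and is still unprocessed, but it silently drops out of the view because the filter depends on a derived field that was never populated.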

Contributing Factor 2: Certificate Problem Report logic not tested as part of changes in CRM processing logic

Description:

Due to a gap in understanding the requirements, the CPR logic flow was not specifically tested when changes were made to the Case Origin field. The lack of testing led to the issue going undetected until externally reported.

Timeline: The logic change took place in September 2025.
Detection: This was detected as part of our incident investigation.
Interaction with other factors: The gap in understanding of requirements around CPRs directly caused the flow of CPRs to go untested when changes were made to the Case Origin field.

Contributing Factor 3: Gap in communications and understanding of requirements between SF / CRM team and SSL Abuse team

Description:

It’s become clear during our investigation that the SF/CRM team was not sufficiently aware of the flow of CPR cases, and the requirements which apply to us for responding to these in a timely fashion.

Timeline: This was detected in October 2025.
Detection: Self-detected as part of our investigation.
Interaction with other factors: The gap in understanding of requirements around CPRs directly caused the flow of CPRs to go untested when changes were made to the Case Origin field.

Lessons Learned

What went well:
- Once identified, we were quickly able to set up a workaround to capture backlog and future cases.
What didn’t go well:
- With the Case Origin logic change, we did not confirm whether there was any impact to the CPR processing logic.
- We detected gaps between the responsible departments in their understanding of the requirements in place.
Where we got lucky:
- One of the third-party reporters reached out through an out-of-band method.

Action Items

Action Item	Kind	Corresponding Root Cause(s)	Evaluation Criteria	Due Date	Status
Restore Case Origin logic	Prevent	Contributing Factor 1	By restoring the Case Origin logic, existing flows will return to normal operations.	2025-10-21	Completed
Add SLA with automated monitoring and alerting	Prevent	Contributing Factor 1	An automated SLA will be added that triggers an alert when a CPR was received more than 12 hours ago and has not yet been processed. The monitoring will use a less restrictive filter, which increases coverage and would have caught the cases affected by this bug.	2025-11-30	Ongoing
Add testing CPR flows as part of any SF/CRM changes	Prevent	Contributing Factor 2	Verifying that CPR logic still works after any change affecting cases, “email to case” logic or filters should prevent a recurrence of this root cause.	2025-10-30	Ongoing
Iterate and clarify the requirements placed on the SSLAbuse team to the SF/CRM team	Prevent	Contributing Factor 3	Already as part of this bug, the importance and requirements of CPRs have been made clear to the SF/CRM team, which is already leading to contributions from the team on further improvements, as outlined within these Action Items.	2025-09-21	Completed
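
The 12-hour alerting rule in the second action item could look roughly like the sketch below. It is an assumption-laden illustration, not the deployed monitoring: the mailbox address, the field names and the choice to filter on the recipient address (as one possible "less restrictive filter") are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

ALERT_AFTER = timedelta(hours=12)   # threshold stated in the action item
CPR_MAILBOX = "cpr@example.com"     # hypothetical address, for illustration only


def overdue_cprs(cases: list, now: datetime) -> list:
    """Return unprocessed cases sent to the CPR mailbox at least 12 hours ago.

    The filter keys on the recipient address instead of a derived field
    such as the case reason, so a blank reason does not hide a case.
    """
    return [
        c for c in cases
        if c["to_address"] == CPR_MAILBOX
        and not c["processed"]
        and now - c["received_at"] >= ALERT_AFTER
    ]


now = datetime(2025, 10, 13, 8, 1, tzinfo=timezone.utc)
cases = [
    # Old, unprocessed and with a blank reason: should trigger an alert.
    {"id": 1, "to_address": CPR_MAILBOX, "reason": "", "processed": False,
     "received_at": datetime(2025, 9, 18, 9, 51, tzinfo=timezone.utc)},
    # Received about six hours ago: still within the threshold.
    {"id": 2, "to_address": CPR_MAILBOX, "reason": "", "processed": False,
     "received_at": datetime(2025, 10, 13, 2, 0, tzinfo=timezone.utc)},
    # Already handled: never alerts.
    {"id": 3, "to_address": CPR_MAILBOX, "reason": "SSL Abuse", "processed": True,
     "received_at": datetime(2025, 9, 1, 0, 0, tzinfo=timezone.utc)},
]

alerts = overdue_cprs(cases, now)
print([c["id"] for c in alerts])  # [1]
```

Alerting at 12 hours leaves the other half of the 24-hour response window to act on the alert.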

Martijn Katerbarg

Assignee

Comment 2

•

3 months ago

Our third action item was completed as scheduled.

Based on the final remaining action item, we would like to request a next-update for 2025-11-30.

Flags: needinfo?(bwilson)

incident-reporting

Updated

•

3 months ago

Whiteboard: [ca-compliance] [policy-failure] → [ca-compliance] [policy-failure] Next update 2025-11-30

Ben Wilson

Updated

•

3 months ago

Flags: needinfo?(bwilson)

Martijn Katerbarg

Assignee

Comment 3

•

2 months ago

Yesterday our automated monitoring and alerting changes for CPRs went into production, completing our final action item.

We intend to post a Report Closure Summary by this time next week.

Martijn Katerbarg

Assignee

Comment 4

•

2 months ago

Report Closure Summary

Incident description:

A logic change to our Certificate Problem Report email queue resulted in us failing to reply to several Certificate Problem Reports (CPR) within the required 24 hours.

Incident Root Cause(s):

A logic change to the Case Origin field within our CRM, combined with insufficient testing on the CPR email queue, led to this incident.

Remediation description:

We have restored the Case Origin logic, added testing of CPR flows as part of any CRM changes, and added an automated SLA with alerting and escalation methods to the CPR queue.

Commitment summary:

Sectigo is committed to keeping these changes in place and to maintaining awareness within our organization of the expectations regarding CPRs.

All Action Items disclosed in this report have been completed as described, and we request its closure.

Flags: needinfo?(bwilson)

incident-reporting

Comment 5

•

2 months ago

This is a final call for comments or questions on this Incident Report.

Otherwise, it will be closed on approximately 2025-12-10.

Flags: needinfo?(bwilson) → needinfo?(incident-reporting)

Whiteboard: [ca-compliance] [policy-failure] Next update 2025-11-30 → [close on 2025-12-10] [ca-compliance] [policy-failure]

incident-reporting

Updated

•

2 months ago

Status: ASSIGNED → RESOLVED

Closed: 2 months ago

Flags: needinfo?(incident-reporting)

Resolution: --- → FIXED

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

People

(Reporter: martijn.katerbarg, Assigned: martijn.katerbarg)

References

Details

(Whiteboard: [close on 2025-12-10] [ca-compliance] [policy-failure])

Security

(public)
